WO2020082574A1

WO2020082574A1 - Generative adversarial network-based music generation method and device

Info

Publication number: WO2020082574A1
Application number: PCT/CN2018/123550
Authority: WO
Inventors: 王义文; 刘奡智; 王健宗; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-10-26
Filing date: 2018-12-25
Publication date: 2020-04-30
Also published as: CN109346043B; CN109346043A

Abstract

Embodiments of the present application provide a generative adversarial network-based music generation method and a device, pertaining to the technical field of artificial intelligence. The method comprises: acquiring a training music signal comprising a real multi-track polyphonic music signal and multiple real music signals in multiple pre-determined audio tracks; extracting a feature matrix from the training music signal as training music sample data; constructing a generative adversarial network model, training the generative adversarial network model, and acquiring a network parameter of the trained generative adversarial network model; acquiring a random music signal input by a user; and inputting the random music signal into the generative adversarial network model, such that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameter. The technical solution provided by the embodiments of the present application solves the problem in the prior art in which polyphonic music having multiple harmonious audio tracks is difficult to generate.

Description

Music generation method and device based on a generative adversarial network

This application claims priority to Chinese patent application No. 201811257179.3, filed with the Chinese Patent Office on October 26, 2018 and entitled "A music generation method and device based on a generative adversarial network", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of data processing technology, and in particular to a music generation method and device based on a generative adversarial network.

Background

Music is usually composed of multiple instruments or tracks, each with its own temporal dynamics, and these tracks unfold interdependently over time. The success of natural language generation and monophonic music generation therefore does not readily carry over to polyphonic music. Most existing techniques simplify the generation of polyphonic music in some way to make the problem more tractable, for example by generating only single-track monophonic music or by imposing a chronological ordering on polyphonic music.

Therefore, how to generate polyphonic music whose multiple audio tracks are coordinated with one another has become an urgent problem to be solved.

Summary

In view of this, the embodiments of the present application provide a music generation method and device based on a generative adversarial network, to solve the problem in the prior art that it is difficult to generate polyphonic music whose multiple audio tracks are coordinated.

To achieve the above object, according to one aspect of the present application, a music generation method based on a generative adversarial network model is provided. The method includes: acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extracting a feature matrix from the music training signal as music training sample data; constructing a generative adversarial network model, training the generative adversarial network model with the music training sample data, and obtaining the network parameters of the trained generative adversarial network model; acquiring a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset audio tracks; and inputting the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.

To achieve the above object, according to another aspect of the present application, a music generation device based on a generative adversarial network is provided. The device includes: a first acquisition unit, used to acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; an extraction unit, used to extract a feature matrix from the music training signal as music training sample data; a construction unit, used to construct a generative adversarial network model, train the generative adversarial network model with the music training sample data, and obtain the network parameters of the trained generative adversarial network model; a second acquisition unit, used to acquire a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset audio tracks; and a generation unit, used to input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.

To achieve the above object, according to another aspect of the present application, a non-volatile computer storage medium is provided. The storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute the above music generation method.

To achieve the above object, according to another aspect of the present application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the steps of the above music generation method when executing the computer program.

In this solution, a generative adversarial network model is constructed and the dynamic game between the discriminator and the generator is used to finally generate a multi-track polyphonic music signal whose tracks are coordinated with one another, thereby solving the problem in the prior art that it is difficult to generate polyphonic music with coordination among multiple audio tracks.

Brief description of the drawings

In order to more clearly explain the technical solutions of the embodiments of the present application, the drawings required in the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative labor.

FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a music generation device based on a generative adversarial network according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present application.

Detailed description

In order to better understand the technical solutions of the present application, the following describes the embodiments of the present application in detail with reference to the accompanying drawings.

It should be clear that the described embodiments are only a part of the embodiments of the present application, but not all the embodiments. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present application.

The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

It should be understood that although the terms first, second, third, etc. may be used to describe the terminals in the embodiments of the present application, these terminals should not be limited to these terms. These terms are only used to distinguish the terminals from each other. For example, without departing from the scope of the embodiments of the present application, the first acquiring unit may also be referred to as a second acquiring unit, and similarly, the second acquiring unit may also be referred to as a first acquiring unit.

Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".

FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application. As shown in FIG. 1, the method includes:

Step S101: Acquire a music training signal. The music training signal includes a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks;

Step S102: Extract a feature matrix from the music training signal as music training sample data;

Step S103: Construct a generative adversarial network model, train the generative adversarial network model with the music training sample data, and obtain the network parameters of the trained generative adversarial network model;

Step S104: Acquire a random music signal input by the user. The random music signal includes at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset audio tracks;

Step S105: Input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.

In this solution, a generative adversarial network model is constructed and the dynamic game between the discriminator and the generator is used to finally generate a multi-track polyphonic music signal whose tracks are coordinated with one another. The problem in the prior art that it is difficult to generate polyphonic music with coordination among multiple audio tracks is thereby solved.

Optionally, the music training signal is a real music signal collected in advance, for example MIDI data for 200 renditions of "Canon in D major". The music training signals include piano solos, violin solos, cello solos, ensembles, and so on. The multiple preset audio tracks correspond to different musical instruments, such as piano, strings, percussion and brass.

Optionally, extracting the feature matrix from the music training signal includes: extracting the start time, duration and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
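The extraction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Note` type, the field order (start, duration, pitch), the ordering of rows, and the units (beats and MIDI pitch numbers) are all hypothetical choices made for the example.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; units (beats, MIDI pitch) are assumed."""
    start: float
    duration: float
    pitch: int


def note_feature_vector(note: Note) -> List[float]:
    # One feature vector per note: [start time, duration, pitch].
    return [float(note.start), float(note.duration), float(note.pitch)]


def feature_matrix(notes: List[Note]) -> List[List[float]]:
    # Combine the per-note vectors into a matrix, one row per note.
    # Sorting by start time, then pitch, is an assumption for reproducibility.
    ordered = sorted(notes, key=lambda n: (n.start, n.pitch))
    return [note_feature_vector(n) for n in ordered]


notes = [Note(1.0, 0.5, 64), Note(0.0, 1.0, 60), Note(0.0, 1.0, 67)]
matrix = feature_matrix(notes)
```

Each row of `matrix` describes one note, so a training signal with N notes yields an N-by-3 matrix in this sketch.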

Optionally, the feature matrix can be extracted from the music training signal by means of a piano roll editor.
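A piano roll lays notes out on a grid of time against pitch, and that grid can itself serve as a feature matrix. The sketch below shows one way to build such a grid from (start, duration, pitch) triples. The resolution of 4 steps per beat, the 128-pitch MIDI range and the function name `to_piano_roll` are assumptions for illustration; the application does not specify them.

```python
from typing import List, Tuple

N_PITCHES = 128  # assumed MIDI pitch range 0..127


def to_piano_roll(notes: List[Tuple[float, float, int]],
                  steps_per_beat: int = 4) -> List[List[int]]:
    """Convert (start, duration, pitch) triples into a binary piano roll.

    Rows are time steps and columns are pitches; a cell is 1 while a note sounds.
    """
    if not notes:
        return []
    end = max(start + dur for start, dur, _ in notes)
    n_steps = int(round(end * steps_per_beat))
    roll = [[0] * N_PITCHES for _ in range(n_steps)]
    for start, dur, pitch in notes:
        first = int(round(start * steps_per_beat))
        last = int(round((start + dur) * steps_per_beat))
        for step in range(first, last):
            roll[step][pitch] = 1
    return roll


# Two overlapping notes: C4 for one beat, E4 for the second half of that beat.
roll = to_piano_roll([(0.0, 1.0, 60), (0.5, 0.5, 64)])
```

Unlike the per-note matrix, this representation has a fixed number of columns and makes simultaneous notes on one track visible in a single row.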

Optionally, constructing the generative adversarial network model, training it with the music training sample data, and obtaining the network parameters of the trained generative adversarial network model includes:

The first step is to build a generative adversarial network model, which includes at least one generator and one discriminator. The generator is used to perform rhythm adjustment on the input real music signals of multiple preset audio tracks and output the adjusted multi-track polyphony music signals, and the discriminator is used to determine whether the input music signals are output by the generator.

Generative adversarial networks (GANs) are inspired by the two-player game in game theory. The two players in a GAN model are a generator (generative model) and a discriminator (discriminative model). The generator captures the distribution of the music training sample data and generates samples that resemble the real signals; the closer they are to the real signals, the better. The discriminator is a binary classifier that estimates the probability that a sample comes from the music training sample data rather than from the generator. Common discriminators may include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (SVM) and neural networks. Common generators may include, but are not limited to, deep neural network models, hidden Markov models (HMM), naive Bayes models and Gaussian mixture models.
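For reference, the two-player game described above is commonly written in the GAN literature as the minimax objective below. The notation is an assumption introduced here and does not appear in this application: $G$ is the generator, $D$ is the discriminator, $x$ is a real training sample drawn from the data distribution $p_{\text{data}}$, and $z$ is the random input drawn from $p_{z}$.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator raises $V$ by assigning high probability to real samples and low probability to generated ones, while the generator lowers $V$ by producing samples that the discriminator scores as real.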

The second step is to train the generator and the discriminator. Specifically, the discriminator is fixed while the network parameters of the generator are adjusted, and the generator is fixed while the network parameters of the discriminator are adjusted. In this embodiment, the generator, through continuous learning, produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator, through continuous learning, improves its ability to distinguish generated multi-track polyphonic music signals from real ones. Through this confrontation between the generator and the discriminator, the multi-track polyphonic music signal produced by the generator eventually comes close to a real multi-track polyphonic music signal and successfully "deceives" the discriminator. A generative adversarial network model trained in this way can be used to improve the authenticity of the generated multi-track polyphonic music signal.
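The alternating schedule (fix one player, update the other) can be illustrated with a deliberately tiny model. Everything below is an assumption for illustration: scalar samples stand in for music feature matrices, the generator is an affine map, the discriminator is a logistic unit, and the learning rate and losses are arbitrary textbook choices rather than the application's.

```python
import math
from typing import Tuple

Params = Tuple[float, float]


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def discriminate(d: Params, x: float) -> float:
    # Toy discriminator: probability that x is a real sample.
    w, c = d
    return sigmoid(w * x + c)


def generate(g: Params, z: float) -> float:
    # Toy generator: affine map of the random input z.
    a, b = g
    return a * z + b


def discriminator_step(d: Params, g: Params, real: float, z: float, lr: float) -> Params:
    # Generator fixed: gradient descent on -[log D(real) + log(1 - D(fake))].
    fake = generate(g, z)
    p_real, p_fake = discriminate(d, real), discriminate(d, fake)
    grad_w = -(1.0 - p_real) * real + p_fake * fake
    grad_c = -(1.0 - p_real) + p_fake
    return (d[0] - lr * grad_w, d[1] - lr * grad_c)


def generator_step(d: Params, g: Params, z: float, lr: float) -> Params:
    # Discriminator fixed: gradient descent on -log D(G(z)).
    p_fake = discriminate(d, generate(g, z))
    grad_out = -(1.0 - p_fake) * d[0]
    return (g[0] - lr * grad_out * z, g[1] - lr * grad_out)


d_params, g_params = (0.0, 0.0), (1.0, 0.0)
for _ in range(100):
    # Each function returns only the parameters of the player being updated.
    d_params = discriminator_step(d_params, g_params, real=2.0, z=0.5, lr=0.1)
    g_params = generator_step(d_params, g_params, z=0.5, lr=0.1)
```

Because each step function returns new parameters for one player only, the other player is held fixed by construction, which mirrors the "fix one, adjust the other" description.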

A specific method of training the generator includes the following. First, the multi-track polyphonic music signal that the initial generator outputs from the real music signals of at least two preset audio tracks is input into a pre-trained discriminator, and the discriminator produces the probability that this multi-track polyphonic music signal is a real signal. Second, the loss function of the initial generator is determined from this probability and from the feature matrix similarity between the multi-track polyphonic music signal and the real music signals of the at least two preset audio tracks. Finally, the loss function is used to update the network parameters of the initial generator to obtain the generator, for example by backpropagating the loss through the initial generator. It should be noted that this training process is described only to explain how the generator's parameters are adjusted: the initial generator is the model before parameter adjustment, and the generator is the model after parameter adjustment. The adjustment is not limited to a single pass and can be repeated as many times as the degree of optimization of the generator and actual needs require.
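The text states that the generator loss depends on the discriminator's probability and on a feature matrix similarity, but it does not give a formula. The sketch below shows one plausible combination under explicit assumptions: a negative log-probability adversarial term, mean squared difference as the (dis)similarity measure, and a hypothetical `weight` balancing the two. None of these choices is confirmed by the application.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_squared_difference(a: Matrix, b: Matrix) -> float:
    # Assumed dissimilarity between two equally shaped, non-empty feature matrices.
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def generator_loss(p_real: float, generated: Matrix, reference: Matrix,
                   weight: float = 1.0, eps: float = 1e-12) -> float:
    # Adversarial term: small when the discriminator believes the output is real.
    adversarial = -math.log(max(p_real, eps))
    # Similarity term: small when the output stays close to the input tracks.
    return adversarial + weight * mean_squared_difference(generated, reference)


reference = [[0.0, 1.0, 60.0], [1.0, 1.0, 64.0]]
loss_confident = generator_loss(0.9, reference, reference)
loss_doubtful = generator_loss(0.5, reference, reference)
```

In this sketch the loss falls as the discriminator's probability rises and as the generated matrix approaches the reference, so minimizing it pushes the generator in both directions at once.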

The third step is to obtain the network parameters of the trained generative adversarial network model.

Optionally, there are many ways for the generative adversarial network model to automatically generate a multi-track polyphonic music signal from the random music signal and the network parameters. Three generation methods are provided below:

Method 1: The generative adversarial network model includes one generator and one discriminator, which can be understood as a composer model. The generator receives the multi-track polyphonic random music signal and generates new music signals of multiple preset audio tracks from it, and the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;

When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output, and together they form a brand new multi-track polyphonic music signal.

For example, the music signals of several different tracks of a piece written by a composer, such as a piano signal, a violin signal and a cello signal, are input into the generator as random input, where the coordination among these tracks is poor. Under the adjustment of the generator, the composer's multi-track polyphonic random signal yields new music signals of multiple preset audio tracks, and under the judgment of the discriminator, the generated new music signals of the preset audio tracks become closer to real signals and coordinated across the tracks.

Method 2: The generative adversarial network model includes multiple generators and multiple discriminators corresponding to the multiple generators. The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows: each generator receives the random music signal of one preset audio track and generates a new music signal of that preset audio track from it, and each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

When the discriminators determine that the new music signals of their corresponding preset audio tracks are all real signals, the new music signal of each preset audio track is output, and the new music signals of the multiple preset audio tracks form a brand new multi-track polyphonic music signal.

For example, a music signal of one instrument played by one musician, such as a piano, is input into each generator as random input. Here every musician plays the same tune but on a different instrument, and the musicians interfere with one another, which easily leads to a lack of coordination among the music signals. Under the adjustment of its corresponding generator, the random music signal of each instrument yields a new music signal of one preset audio track, and under the judgment of the corresponding discriminator, the generated new music signal of that preset audio track becomes closer to a real signal and the tracks become coordinated with one another.

Method 3: The generative adversarial network model includes multiple generators and one discriminator. The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows: each generator receives the random music signal of one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal of that preset audio track from the two; the discriminator judges whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;

When the discriminator determines that the new music signals of the preset audio tracks generated by all generators are real signals, the new music signals of the multiple preset audio tracks are output, and together they form a brand new multi-track polyphonic music signal.

For example, the piano signal in a tune created by a musician and the piano signal in the music of the same tune written by a composer are together used as the random music signal of one preset audio track, and under the adjustment of the corresponding generator a new music signal of that preset audio track (piano) is generated. In the same way, the music signals of the various instruments are each passed through their corresponding generator to produce new music signals, all of which are judged by the same discriminator, so that the multi-track polyphonic music signal composed of the new music signals of the multiple preset audio tracks is more realistic and coordinated across the tracks.
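The three methods differ mainly in how random inputs are routed to generators and in how many discriminators judge the outputs. The sketch below makes that routing explicit without modelling any network. The track names, the vector length `DIM`, the use of Gaussian noise, and the choice to concatenate the per-track and shared inputs in Method 3 are assumptions for illustration only.

```python
import random
from typing import Dict, List

TRACKS = ["piano", "violin", "cello"]  # assumed preset audio tracks
DIM = 4  # assumed length of each random input vector


def noise(rng: random.Random) -> List[float]:
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def generator_inputs(method: int, rng: random.Random) -> Dict[str, List[float]]:
    """Return the random input each generator receives, keyed by generator name."""
    if method == 1:
        # One generator, fed a single multi-track random signal.
        return {"shared_generator": noise(rng)}
    if method == 2:
        # One generator per track, each fed only its own track's random signal.
        return {track: noise(rng) for track in TRACKS}
    if method == 3:
        # One generator per track, each fed its own random signal followed by
        # a multi-track random signal that is common to all generators.
        shared = noise(rng)
        return {track: noise(rng) + shared for track in TRACKS}
    raise ValueError("method must be 1, 2 or 3")


def discriminator_count(method: int) -> int:
    # Methods 1 and 3 use a single discriminator; Method 2 uses one per generator.
    return len(TRACKS) if method == 2 else 1


rng = random.Random(0)
inputs = {m: generator_inputs(m, rng) for m in (1, 2, 3)}
```

In Method 3 the shared part of the input is identical for every generator, which is what gives the separate per-track generators a common source of information in this sketch.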

An embodiment of the present application provides a music generation device based on a generative adversarial network. The device is used to execute the above music generation method based on a generative adversarial network. As shown in FIG. 2, the device includes: a first acquisition unit 10, an extraction unit 20, a construction unit 30, a second acquisition unit 40 and a generation unit 50.

The first acquisition unit 10 is configured to acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks;

The extraction unit 20 is configured to extract a feature matrix from the music training signal as music training sample data;

The construction unit 30 is configured to construct a generative adversarial network model, train it with the music training sample data, and obtain the network parameters of the trained generative adversarial network model;

The second acquisition unit 40 is configured to acquire a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset audio tracks;

The generation unit 50 is configured to input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
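The order in which the five units hand data to one another can be sketched as a small pipeline. The class name, method names and the use of plain callables for each unit are hypothetical, and the placeholder units at the bottom perform no real training; they exist only to show the call order.

```python
from typing import Callable, List

Matrix = List[List[float]]
Generator = Callable[[List[float]], Matrix]


class MusicGenerationDevice:
    """Chains hypothetical stand-ins for units 10 to 50."""

    def __init__(self,
                 acquire_training: Callable[[], List[Matrix]],
                 extract: Callable[[List[Matrix]], Matrix],
                 build_and_train: Callable[[Matrix], Generator],
                 acquire_random: Callable[[], List[float]]) -> None:
        self.acquire_training = acquire_training  # first acquisition unit 10
        self.extract = extract                    # extraction unit 20
        self.build_and_train = build_and_train    # construction unit 30
        self.acquire_random = acquire_random      # second acquisition unit 40

    def generate(self) -> Matrix:
        # Generation unit 50: feed the random signal to the trained generator.
        signals = self.acquire_training()
        samples = self.extract(signals)
        generator = self.build_and_train(samples)
        return generator(self.acquire_random())


# Placeholder units, only to show the call order.
device = MusicGenerationDevice(
    acquire_training=lambda: [[[0.0, 1.0, 60.0]], [[0.0, 1.0, 64.0]]],
    extract=lambda signals: [row for signal in signals for row in signal],
    build_and_train=lambda samples: (lambda z: [[v + z[0] for v in row] for row in samples]),
    acquire_random=lambda: [0.5],
)
output = device.generate()
```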

In this solution, a generative adversarial network model is constructed and the dynamic game between the discriminator and the generator is used to finally generate a multi-track polyphonic music signal whose tracks are coordinated with one another. The efficiency of generating polyphonic music can be effectively improved, thereby solving the problem in the prior art that polyphonic music is generated with low efficiency.

Optionally, the extraction unit 20 includes an extraction subunit, a composition subunit, a combination subunit and a first acquisition subunit.

The extraction subunit is used to extract the start time, duration and pitch of each note in each music training signal; the composition subunit is used to determine the feature vector of each note from its start time, duration and pitch; the combination subunit is used to combine the feature vectors of the notes to obtain the feature matrix of the music training signal; and the first acquisition subunit is used to take the feature matrix of the music training signal as the music training sample data.

Optionally, the construction unit 30 includes a construction subunit, a training subunit and a second acquisition subunit.

The construction subunit is used to construct the generative adversarial network model, which includes at least one generator and one discriminator. The generator is used to perform rhythm adjustment on the input real music signals of multiple preset audio tracks and output the adjusted multi-track polyphonic music signal, and the discriminator is used to determine whether an input music signal was output by the generator.

The training subunit is used to train the generator and the discriminator. Specifically, the discriminator is fixed while the network parameters of the generator are adjusted, and the generator is fixed while the network parameters of the discriminator are adjusted. In this embodiment, the generator, through continuous learning, produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator, through continuous learning, improves its ability to distinguish generated multi-track polyphonic music signals from real ones. Through this confrontation between the generator and the discriminator, the multi-track polyphonic music signal produced by the generator eventually comes close to a real multi-track polyphonic music signal and successfully "deceives" the discriminator. A generative adversarial network model trained in this way can be used to improve the authenticity of the generated multi-track polyphonic music signal.

The second acquisition subunit is used to acquire the network parameters of the trained generative adversarial network model.

Optionally, the generative adversarial network model includes one generator and one discriminator, which can be understood as a composer model. The generator is used to receive the multi-track polyphonic random music signal and generate new music signals of multiple preset audio tracks from it, and the discriminator is used to judge whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;

Optionally, the generative adversarial network model includes multiple generators and multiple discriminators corresponding to the multiple generators. The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows: each generator receives the random music signal of one preset audio track and generates a new music signal of that preset audio track from it, and each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

For example, a music signal of one instrument played by one musician, such as a piano, is input into each generator as random input. Here every musician plays the same tune but on a different instrument, and the musicians interfere with one another, which easily leads to a lack of coordination among the music signals. Under the adjustment of its corresponding generator, the random music signal of each instrument yields a new music signal of one preset audio track, and under the judgment of the corresponding discriminator, the generated new music signal of that preset audio track becomes closer to a real signal and the tracks become coordinated with one another.

Optionally, the generative adversarial network model includes multiple generators and one discriminator. The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows: each generator receives the random music signal of one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal of that preset audio track from the two; the discriminator judges whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;

An embodiment of the present application provides a non-volatile computer storage medium. The storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to perform the following steps:

Acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extract a feature matrix from the music training signal as music training sample data; construct a generative adversarial network model, train the generative adversarial network model with the music training sample data, and obtain the network parameters of the trained generative adversarial network model; acquire a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset audio tracks; input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.

Optionally, when the program runs, the device where the storage medium is located is further controlled to perform the following steps: the generator receives the multi-track polyphonic random music signal and generates new music signals of multiple preset audio tracks from it, and the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;

Optionally, when the program runs, the device where the storage medium is located is further controlled to perform the following steps: each generator receives the random music signal of one preset audio track and generates a new music signal of that preset audio track from it, and each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

Optionally, when the program runs, the device where the storage medium is located is further controlled to perform the following steps: each generator receives the random music signal of one preset audio track together with the multi-track polyphonic random music signal, and generates a new music signal of that preset audio track from the two; the discriminator judges whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;

Optionally, when the program runs, the device where the storage medium is located is further controlled to perform the following steps: extracting the start time, duration and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.

As shown in FIG. 3, an embodiment of the present application provides a computer device 100, including a memory 102, a processor 101, and a computer program 103 stored in the memory 102 and executable on the processor 101. The processor implements the steps of the above music generation method when executing the computer program.

Optionally, the processor also implements the following steps when executing the computer program: the generator receives the multi-track polyphony random music signal and generates new music signals of multiple preset audio tracks based on the multi-track polyphony random music signal, and the discriminator determines whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;

Optionally, the processor also implements the following steps when executing the computer program: each generator receives a random music signal corresponding to one preset audio track and generates a new music signal of that preset audio track according to the random music signal of the preset audio track, and each discriminator determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

Optionally, the processor also implements the following steps when executing the computer program: each generator receives a random music signal corresponding to one preset audio track and a multi-track polyphony random music signal, and generates a new music signal of that preset audio track according to the random music signal of the preset audio track and the multi-track polyphony random music signal; the discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;
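The three optional arrangements above differ mainly in which random signals reach each generator. The following sketch shows only that input routing; it does not implement any generator or discriminator. The function names, the Gaussian sampling, the vector length, and the use of concatenation to combine a per-track signal with the shared multi-track signal are assumptions made for illustration and are not stated in this application.

```python
import random
from typing import List

Vector = List[float]


def noise(dim: int, rng: random.Random) -> Vector:
    """Draw one random signal; Gaussian sampling is an assumption."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def shared_only_inputs(dim: int, rng: random.Random) -> List[Vector]:
    """One generator: a single multi-track random signal drives all tracks."""
    return [noise(dim, rng)]


def per_track_inputs(n_tracks: int, dim: int, rng: random.Random) -> List[Vector]:
    """Several generators: each one receives its own per-track random signal."""
    return [noise(dim, rng) for _ in range(n_tracks)]


def combined_inputs(n_tracks: int, dim: int, rng: random.Random) -> List[Vector]:
    """Several generators: each one receives its own per-track random signal
    plus the same shared multi-track random signal (joined here by
    concatenation, which is an assumed way of combining the two)."""
    shared = noise(dim, rng)
    return [noise(dim, rng) + shared for _ in range(n_tracks)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
single = shared_only_inputs(4, rng)
independent = per_track_inputs(5, 4, rng)
combined = combined_inputs(5, 4, rng)
```

In the third arrangement the shared part is identical across generators, which is one plausible way for the tracks to stay coordinated while each track keeps its own variation.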

Optionally, the processor also implements the following steps when executing the computer program: extracting the start time, duration and pitch of each note in each music training signal; determining the feature vector of each note according to its start time, duration and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.

It should be noted that the terminals involved in the embodiments of the present application may include, but are not limited to, personal computers (PCs), personal digital assistants (PDAs), wireless handheld devices, tablet computers, mobile phones, MP3 players, MP4 players, etc.

It can be understood that the application may be an application program (nativeApp) installed on the terminal, or may also be a webpage program (webApp) of a browser on the terminal, which is not limited in this embodiment of the present application.

Those skilled in the art can clearly understand that for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only schematic. The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The foregoing storage media include: a USB flash drive (U disk), a removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

The above are only the preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the scope of protection of this application.

Claims

A music generation method based on a generative adversarial network, characterized in that the method includes:

Obtain a music training signal, the music training signal including a multi-track polyphony real music signal and real music signals of a plurality of preset audio tracks;

Extract a feature matrix from the music training signal as music training sample data;

Construct a generative adversarial network model, and train the generative adversarial network model with the music training sample data to obtain network parameters of the trained generative adversarial network model;

Acquire a random music signal input by a user, the random music signal comprising at least one of the following: a multi-track polyphony random music signal, and random music signals of a plurality of preset audio tracks;

Input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphony music signal according to the random music signal and the network parameters.
The method according to claim 1, wherein the generative adversarial network model includes a generator and a discriminator, and the generative adversarial network model automatically generating a multi-track polyphony music signal according to the random music signal and the network parameters includes:

The generator receives the multi-track polyphony random music signal and generates new music signals of a plurality of preset audio tracks according to the multi-track polyphony random music signal, and the discriminator judges whether the new music signals of the plurality of preset audio tracks generated by the generator are real signals or generated signals;

When the discriminator determines that the new music signals of the plurality of preset audio tracks are real signals, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The method according to claim 1, wherein the generative adversarial network model includes a plurality of generators and a plurality of discriminators corresponding to the plurality of generators, and the generative adversarial network model automatically generating a multi-track polyphony music signal according to the random music signal and the network parameters includes:

Each of the generators receives a random music signal corresponding to one preset audio track and generates a new music signal of that preset audio track according to the random music signal of the preset audio track, and each of the discriminators determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

When the discriminators determine that the new music signals of their corresponding preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The method according to claim 1, wherein the generative adversarial network model includes a plurality of generators and a discriminator, and the generative adversarial network model automatically generating a multi-track polyphony music signal according to the random music signal and the network parameters includes:

Each of the generators receives a random music signal corresponding to one preset audio track and a multi-track polyphony random music signal, and generates a new music signal of that preset audio track according to the random music signal of the preset audio track and the multi-track polyphony random music signal; the discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;

When the discriminator determines that the new music signal of the preset audio track generated by each generator is a real signal, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The method according to claim 1, wherein the extracting a feature matrix from the music training signal includes:

Extract the start time, duration and pitch of each note in each music training signal;

Determine the feature vector of the note according to the start time, duration and pitch of each note;

Combining feature vectors of the musical notes to obtain a feature matrix of the music training signal;

The feature matrix of the music training signal is used as the music training sample data.
A music generation device based on a generative adversarial network, characterized in that the device includes:

A first obtaining unit, used to obtain a music training signal, the music training signal including a multi-track polyphony real music signal and real music signals of a plurality of preset audio tracks;

An extraction unit, used to extract a feature matrix from the music training signal as music training sample data;

A construction unit, configured to construct a generative adversarial network model, and train the generative adversarial network model with the music training sample data to obtain network parameters of the trained generative adversarial network model;

A second obtaining unit, configured to obtain a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphony random music signal, and random music signals of a plurality of preset audio tracks;

A generating unit, configured to input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphony music signal according to the random music signal and the network parameters.
The device according to claim 6, wherein the generative adversarial network model includes multiple generators and multiple discriminators in one-to-one correspondence with the multiple generators; each of the generators receives a random music signal corresponding to one preset audio track and generates a new music signal of that preset audio track according to the random music signal of the preset audio track, and each of the discriminators determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

When the discriminators determine that the new music signals of their corresponding preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand new multi-track polyphony music signal.
The device according to claim 6, characterized in that the generative adversarial network model includes multiple generators and a discriminator; each of the generators receives a random music signal corresponding to one preset audio track and a multi-track polyphony random music signal, and generates a new music signal of that preset audio track according to the random music signal of the preset audio track and the multi-track polyphony random music signal; the discriminator judges whether the new music signal of the preset audio track generated by each of the generators is a real signal or a generated signal;

When the discriminator determines that the new music signal of the preset audio track generated by each generator is a real signal, the new music signals of the multiple preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand new multi-track polyphony music signal.
The device according to claim 6, wherein the generative adversarial network model includes a generator and a discriminator; the generator receives the multi-track polyphony random music signal and generates new music signals of multiple preset audio tracks according to the multi-track polyphony random music signal, and the discriminator determines whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;

When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand new multi-track polyphony music signal.
The device according to claim 6, wherein the extraction unit comprises:

An extraction subunit, used to extract the start time, duration and pitch of each note in each of the music training signals;

A forming subunit, used to determine the feature vector of the note according to the start time, duration and pitch of each note;

A combination subunit, configured to combine feature vectors of the musical notes to obtain a feature matrix of the music training signal;

The first obtaining subunit is configured to use the feature matrix of the music training signal as the music training sample data.
A computer non-volatile storage medium, characterized in that the storage medium includes a stored program, and when the program is running, the device where the storage medium is located is controlled to perform the following steps:

Obtain a music training signal, the music training signal including a multi-track polyphony real music signal and real music signals of a plurality of preset audio tracks;

Extract a feature matrix from the music training signal as music training sample data;

Construct a generative adversarial network model, and train the generative adversarial network model with the music training sample data to obtain network parameters of the trained generative adversarial network model;

Acquire a random music signal input by a user, the random music signal comprising at least one of the following: a multi-track polyphony random music signal, and random music signals of a plurality of preset audio tracks;

Input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphony music signal according to the random music signal and the network parameters.
The computer non-volatile storage medium according to claim 11, wherein when the program is running, the device where the storage medium is located is controlled to perform the following steps:

The generative adversarial network model includes a generator and a discriminator. The generator receives the multi-track polyphony random music signal and generates new music signals of a plurality of preset audio tracks based on the multi-track polyphony random music signal, and the discriminator determines whether the new music signals of the plurality of preset audio tracks generated by the generator are real signals or generated signals;

When the discriminator determines that the new music signals of the plurality of preset audio tracks are real signals, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer non-volatile storage medium according to claim 11, wherein when the program is running, the device where the storage medium is located is controlled to perform the following steps:

The generative adversarial network model includes a plurality of generators and a plurality of discriminators corresponding to the plurality of generators; each of the generators receives a random music signal corresponding to one preset audio track and generates a new music signal of that preset audio track according to the random music signal of the preset audio track, and each of the discriminators determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

When the discriminators determine that the new music signals of their corresponding preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer non-volatile storage medium according to claim 11, wherein when the program is running, the device where the storage medium is located is controlled to perform the following steps:

The generative adversarial network model includes a plurality of generators and a discriminator; each of the generators receives a random music signal corresponding to one preset audio track and a multi-track polyphony random music signal, and generates a new music signal of that preset audio track according to the random music signal of the preset audio track and the multi-track polyphony random music signal; the discriminator judges whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;

When the discriminator determines that the new music signal of the preset audio track generated by each generator is a real signal, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer non-volatile storage medium according to claim 11, wherein when the program is running, the device where the storage medium is located is controlled to perform the following steps:

Extract the start time, duration and pitch of each note in each music training signal;

Determine the feature vector of the note according to the start time, duration and pitch of each note;

Combining feature vectors of the musical notes to obtain a feature matrix of the music training signal;

The feature matrix of the music training signal is used as the music training sample data.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer program:

Obtain a music training signal, the music training signal including a multi-track polyphony real music signal and real music signals of a plurality of preset audio tracks;

Extract a feature matrix from the music training signal as music training sample data;

Construct a generative adversarial network model, and train the generative adversarial network model with the music training sample data to obtain network parameters of the trained generative adversarial network model;

Acquire a random music signal input by a user, the random music signal comprising at least one of the following: a multi-track polyphony random music signal, and random music signals of a plurality of preset audio tracks;

Input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphony music signal according to the random music signal and the network parameters.
The computer device according to claim 16, characterized in that the generative adversarial network model includes a generator and a discriminator; the generator receives the multi-track polyphony random music signal and generates new music signals of a plurality of preset audio tracks according to the multi-track polyphony random music signal, and the discriminator determines whether the new music signals of the plurality of preset audio tracks generated by the generator are real signals or generated signals;

When the discriminator determines that the new music signals of the plurality of preset audio tracks are real signals, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer device according to claim 16, characterized in that the generative adversarial network model includes a plurality of generators and a plurality of discriminators in one-to-one correspondence with the plurality of generators; each of the generators receives a random music signal corresponding to one preset audio track and generates a new music signal of that preset audio track according to the random music signal of the preset audio track, and each of the discriminators determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;

When the discriminators determine that the new music signals of their corresponding preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer device according to claim 16, characterized in that the generative adversarial network model includes a plurality of generators and a discriminator; each of the generators receives a random music signal corresponding to one preset audio track and a multi-track polyphony random music signal, and generates a new music signal of that preset audio track according to the random music signal of the preset audio track and the multi-track polyphony random music signal; the discriminator judges whether the new music signal of the preset audio track generated by each of the generators is a real signal or a generated signal;

When the discriminator determines that the new music signal of the preset audio track generated by each generator is a real signal, the new music signals of the plurality of preset audio tracks are output, and the new music signals of the plurality of preset audio tracks form a brand new multi-track polyphony music signal.
The computer device according to claim 16, wherein the processor further implements the following steps when executing the computer program:

Extract the start time, duration and pitch of each note in each music training signal;

Determine the feature vector of the note according to the start time, duration and pitch of each note;

Combining feature vectors of the musical notes to obtain a feature matrix of the music training signal;

The feature matrix of the music training signal is used as the music training sample data.