WO2020082574A1 - Generative adversarial network-based music generation method and device - Google Patents

Publication number
WO2020082574A1
WO2020082574A1 PCT/CN2018/123550 CN2018123550W WO2020082574A1 WO 2020082574 A1 WO2020082574 A1 WO 2020082574A1 CN 2018123550 W CN2018123550 W CN 2018123550W WO 2020082574 A1 WO2020082574 A1 WO 2020082574A1
Authority
WO
WIPO (PCT)
Prior art keywords
music
signal
track
preset audio
signals
Prior art date
Application number
PCT/CN2018/123550
Other languages
French (fr)
Chinese (zh)
Inventor
王义文
刘奡智
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020082574A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H2210/145 Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H2210/151 Music Composition or musical creation; Tools or processes therefor using templates, i.e. incomplete musical sections, as a basis for composing

Definitions

  • The present application relates to the field of data processing technology, and in particular to a music generation method and device based on a generative adversarial network.
  • The embodiments of the present application provide a music generation method and device based on a generative adversarial network, to solve the prior-art problem that it is difficult to generate polyphonic music that is coordinated across multiple audio tracks.
  • A music generation method based on a generative adversarial network model is provided.
  • The method includes: acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extracting a feature matrix from the music training signal as music training sample data; constructing a generative adversarial network model and training it with the music training sample data to obtain the trained network parameters of the model; acquiring a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, random music signals of multiple preset audio tracks; and inputting the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • A music generation device based on a generative adversarial network is provided.
  • The device includes: a first acquisition unit for acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; an extraction unit for extracting a feature matrix from the music training signal as music training sample data; a construction unit for constructing a generative adversarial network model and training it with the music training sample data to obtain the trained network parameters of the model; a second acquisition unit for acquiring a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, random music signals of multiple preset audio tracks; and a generation unit for inputting the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • A computer non-volatile storage medium is provided. The storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute the above music generation method.
  • A computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above music generation method when executing the computer program.
  • FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a music generation device based on a generative adversarial network according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present application.
  • Although the terms first, second, third, etc. may be used to describe the terminals in the embodiments of the present application, these terminals should not be limited by these terms. The terms are only used to distinguish the terminals from each other.
  • For example, the first acquisition unit may also be referred to as a second acquisition unit, and similarly, the second acquisition unit may also be referred to as a first acquisition unit.
  • Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting".
  • Similarly, the phrases "if determined" or "if (a stated condition or event) is detected" can be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application. As shown in FIG. 1, the method includes:
  • Step S101: Acquire a music training signal.
  • The music training signal includes a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks.
  • Step S102: Extract a feature matrix from the music training signal as music training sample data.
  • Step S103: Construct a generative adversarial network model and train it with the music training sample data to obtain the trained network parameters of the model.
  • Step S104: Acquire a random music signal input by the user.
  • The random music signal includes at least one of the following: a multi-track polyphonic random music signal, random music signals of multiple preset audio tracks.
  • Step S105: Input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • The music training signal is a real music signal collected in advance, for example, 200 MIDI files of "Canon in D major".
  • Music training signals include piano solos, violin solos, cello solos, ensembles, and so on. The multiple preset audio tracks correspond to different instruments, such as piano, strings, percussion, and brass.
  • Extracting the feature matrix from the music training signal includes: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
  • The feature matrix can be extracted from the music training signal using a piano roll editor.
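  • As a minimal sketch of the extraction step above, assume a piano-roll encoding in which rows are MIDI pitches and columns are discrete time steps. The `Note` type, the function names, and the binary on/off encoding are illustrative assumptions, not details given in the application.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    start: int     # onset, in time steps (assumed already quantized)
    duration: int  # length, in time steps
    pitch: int     # MIDI pitch number, 0-127


def note_to_vector(note: Note, n_steps: int) -> List[int]:
    """Feature vector of one note: 1 while the note sounds, 0 otherwise."""
    end = min(note.start + note.duration, n_steps)
    return [1 if note.start <= t < end else 0 for t in range(n_steps)]


def notes_to_matrix(notes: List[Note], n_steps: int,
                    n_pitches: int = 128) -> List[List[int]]:
    """Combine per-note vectors into an (n_pitches x n_steps) piano roll."""
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for note in notes:
        row = roll[note.pitch]
        for t, active in enumerate(note_to_vector(note, n_steps)):
            row[t] |= active  # overlapping notes on one pitch stay at 1
    return roll


# Two notes: C4 for steps 0-1, then E4 for steps 2-3.
notes = [Note(start=0, duration=2, pitch=60), Note(start=2, duration=2, pitch=64)]
roll = notes_to_matrix(notes, n_steps=4)
```

  • One such matrix per preset audio track, stacked together, would give a multi-track sample; that stacking is likewise an assumption of this sketch.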
  • Constructing a generative adversarial network model and training it with the music training sample data to obtain the trained network parameters of the model includes:
  • The first step is to build a generative adversarial network model, which includes at least one generator and one discriminator.
  • The generator performs rhythm adjustment on the input real music signals of multiple preset audio tracks and outputs the adjusted multi-track polyphonic music signal.
  • The discriminator determines whether an input music signal was output by the generator.
  • GAN stands for generative adversarial network.
  • The two players in the GAN model are the generator (a generative model) and the discriminator (a discriminative model).
  • The generator captures the distribution of the music training sample data and generates samples similar to the real signal.
  • The goal is for the generated samples to resemble the real signal as closely as possible.
  • The discriminator is a binary classifier that estimates the probability that a sample comes from the music training sample data rather than from the generator's output.
  • Common discriminators include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (SVM), and neural networks.
  • Common generators include, but are not limited to, deep neural network models, hidden Markov models (HMM), naive Bayes models, and Gaussian mixture models.
  • The second step is to train the generator and the discriminator; specifically, fix the discriminator and adjust the network parameters of the generator, then fix the generator and adjust the network parameters of the discriminator.
  • Through continuous learning, the generator produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator keeps improving its ability to distinguish generated multi-track polyphonic music signals from real ones.
  • Eventually, the multi-track polyphonic music signal produced by the generator is close to a real multi-track polyphonic music signal and successfully "deceives" the discriminator.
  • A generative adversarial network model trained in this way can be used to improve the authenticity of the generated multi-track polyphonic music signal.
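  • The alternating scheme (fix one network, update the other) can be illustrated with a deliberately tiny one-dimensional GAN. This is a toy sketch only: the linear generator, logistic discriminator, learning rate, and target distribution are all assumptions chosen so that the gradients can be written by hand, and none of them comes from the application.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class Generator:
    """Toy generator G(z) = a * z + b."""
    def __init__(self) -> None:
        self.a, self.b = 1.0, 0.0

    def __call__(self, z: float) -> float:
        return self.a * z + self.b


class Discriminator:
    """Toy discriminator D(x) = sigmoid(w * x + c), the probability x is real."""
    def __init__(self) -> None:
        self.w, self.c = 0.1, 0.0

    def __call__(self, x: float) -> float:
        return sigmoid(self.w * x + self.c)


def discriminator_step(g: Generator, d: Discriminator,
                       real: float, z: float, lr: float = 0.05) -> None:
    """Generator fixed: gradient ascent on log D(real) + log(1 - D(G(z)))."""
    fake = g(z)
    p_real, p_fake = d(real), d(fake)
    d.w += lr * ((1.0 - p_real) * real - p_fake * fake)
    d.c += lr * ((1.0 - p_real) - p_fake)


def generator_step(g: Generator, d: Discriminator,
                   z: float, lr: float = 0.05) -> None:
    """Discriminator fixed: gradient ascent on log D(G(z))."""
    p_fake = d(g(z))
    grad_fake = (1.0 - p_fake) * d.w  # d/d(fake) of log D(fake)
    g.a += lr * grad_fake * z
    g.b += lr * grad_fake


rng = random.Random(0)
g, d = Generator(), Discriminator()
for _ in range(200):
    real = rng.gauss(3.0, 0.5)   # stand-in for a real training sample
    z = rng.gauss(0.0, 1.0)      # random input to the generator
    discriminator_step(g, d, real, z)
    generator_step(g, d, z)
```

  • Each step touches only one network's parameters, which is the point of the sketch; a practical system would use neural networks and an automatic-differentiation library instead of hand-written gradients.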
  • The specific method of training the generator includes: first, a multi-track polyphonic music signal, output by the initial generator based on the real music signals of at least two preset audio tracks, is input into a pre-trained discriminator, and the discriminator produces the probability that the multi-track polyphonic music signal is a real signal; second, the loss function of the initial generator is determined based on this probability and on the feature-matrix similarity between the multi-track polyphonic music signal and the real music signals of the at least two preset audio tracks; finally, the loss function is used to update the network parameters of the initial generator to obtain the generator, for example by backpropagating the loss through the initial generator to update its network parameters.
  • The above training process is only used to explain how the parameters of the generator are adjusted. The initial generator can be regarded as the model before parameter adjustment, and the generator as the model after parameter adjustment.
  • The parameter adjustment process is not limited to a single pass; it can be repeated many times according to the degree of optimization of the generator and actual needs.
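  • The application does not state the exact form of the generator loss, only that it depends on the discriminator's probability and on a feature-matrix similarity. One possible combination, shown purely as an assumption, adds a negative log-probability term to a cosine-dissimilarity term; the function names and the `weight` parameter are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between two equally shaped feature matrices."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = (math.sqrt(sum(x * x for x in flat_a))
            * math.sqrt(sum(y * y for y in flat_b)))
    return dot / norm if norm else 0.0


def generator_loss(p_real: float, generated: Matrix, real: Matrix,
                   weight: float = 1.0, eps: float = 1e-12) -> float:
    """Assumed loss: -log D(G(.)) + weight * (1 - cosine similarity)."""
    adversarial = -math.log(max(p_real, eps))   # small when D is fooled
    dissimilarity = 1.0 - cosine_similarity(generated, real)
    return adversarial + weight * dissimilarity


real_roll = [[1.0, 0.0], [0.0, 1.0]]
fake_roll = [[1.0, 1.0], [0.0, 0.0]]
loss_good = generator_loss(0.9, real_roll, real_roll)
loss_bad = generator_loss(0.1, fake_roll, real_roll)
```

  • Under this assumed form, the loss falls both when the discriminator assigns a higher "real" probability and when the generated feature matrix moves closer to the real one.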
  • The third step is to obtain the network parameters of the trained generative adversarial network model.
  • Method 1: The generative adversarial network model includes one generator and one discriminator, which can be understood as a composer model.
  • The generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks from it, and the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a multi-track polyphonic random signal written by a composer is turned into new music signals for multiple preset audio tracks by the generator, and the discriminator's judgment pushes the generated signals closer to real signals, with coordination among the multiple audio tracks.
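  • The data flow of Method 1 can be sketched as follows. The callables stand in for trained networks, and the 0.5 acceptance threshold, the function names, and the toy track contents are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Optional

Track = List[float]
Tracks = Dict[str, Track]


def generate_if_real(generator: Callable[[List[float]], Tracks],
                     discriminator: Callable[[Tracks], float],
                     noise: List[float],
                     threshold: float = 0.5) -> Optional[Tracks]:
    """One generator maps one random signal to all tracks; one discriminator
    scores the whole multi-track result. Output only if judged real."""
    tracks = generator(noise)
    return tracks if discriminator(tracks) >= threshold else None


# Hypothetical stand-ins for the trained generator and discriminator.
def toy_generator(noise: List[float]) -> Tracks:
    return {"piano": [abs(v) for v in noise],
            "strings": [abs(v) / 2 for v in noise]}


def toy_discriminator(tracks: Tracks) -> float:
    # Pretend "coordination" just means all tracks have the same length.
    return 1.0 if len({len(t) for t in tracks.values()}) == 1 else 0.0


result = generate_if_real(toy_generator, toy_discriminator, [0.5, -1.0])
```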
  • Method 2: The generative adversarial network model includes multiple generators and multiple discriminators, one discriminator corresponding to each generator.
  • The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows.
  • Each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that preset audio track from it, and each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • When the discriminators determine that the new music signals of their respective preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a random music signal corresponding to the instrument played by one musician, such as the piano, is input into each generator. Each musician plays the same tune but on a different instrument. The musicians can interfere with each other, which easily leads to a lack of coordination among the music signals.
  • The random music signal of each instrument is turned into a new music signal for one preset audio track by the corresponding generator, and the discriminator's judgment pushes the generated signal closer to a real signal, with coordination among the multiple audio tracks.
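  • A sketch of the Method 2 wiring, with one generator and one discriminator per preset audio track. The lambdas are placeholders for trained networks, and the threshold and track names are assumptions.

```python
from typing import Callable, Dict, List, Optional

Track = List[float]


def generate_per_track(generators: Dict[str, Callable[[List[float]], Track]],
                       discriminators: Dict[str, Callable[[Track], float]],
                       noise: Dict[str, List[float]],
                       threshold: float = 0.5) -> Optional[Dict[str, Track]]:
    """Each generator sees only its own track's random signal, and each
    discriminator judges only its own track. Output requires every track
    to be judged real."""
    tracks: Dict[str, Track] = {}
    for name, generate in generators.items():
        track = generate(noise[name])
        if discriminators[name](track) < threshold:
            return None
        tracks[name] = track
    return tracks


# Hypothetical per-track stand-ins.
generators = {"piano": lambda z: [2 * v for v in z],
              "drums": lambda z: [v + 1 for v in z]}
discriminators = {"piano": lambda t: 0.9, "drums": lambda t: 0.8}
noise = {"piano": [0.5, 1.0], "drums": [0.0, 1.0]}
result = generate_per_track(generators, discriminators, noise)
```

  • Because no component sees more than one track, nothing in this wiring enforces agreement between tracks, which matches the coordination risk described for this arrangement.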
  • Method 3: The generative adversarial network model includes multiple generators and one discriminator.
  • The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows.
  • Each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal for that preset audio track from both inputs.
  • The discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal.
  • When the discriminator determines that the new music signals of the preset audio tracks generated by all generators are real signals, the new music signals of the multiple preset audio tracks are output, and they form a brand-new multi-track polyphonic music signal.
  • For example, the piano music signal from a tune created by a musician and the piano music signal from a composer's version of the same tune are together used as the random music signal of one preset audio track, and the corresponding generator turns them into a new music signal for that preset audio track (piano).
  • The music signals of the various instruments are each turned into a new music signal by their corresponding generator and are all judged by the same discriminator, so that the multi-track polyphonic music signal formed from the new music signals of the multiple preset audio tracks is more realistic and the audio tracks are coordinated with one another.
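  • A sketch of the Method 3 wiring: every generator receives a shared multi-track random signal plus its own per-track random signal, and a single discriminator scores all tracks together. Concatenating the two inputs is an assumption of this sketch, as are the threshold and the toy callables.

```python
from typing import Callable, Dict, List, Optional

Track = List[float]
Tracks = Dict[str, Track]


def generate_with_shared_noise(
        generators: Dict[str, Callable[[List[float]], Track]],
        discriminator: Callable[[Tracks], float],
        shared_noise: List[float],
        private_noise: Dict[str, List[float]],
        threshold: float = 0.5) -> Optional[Tracks]:
    """Each generator gets shared + per-track input; one discriminator
    judges the combined multi-track result."""
    tracks = {name: generate(shared_noise + private_noise[name])
              for name, generate in generators.items()}
    return tracks if discriminator(tracks) >= threshold else None


# Hypothetical stand-ins for trained networks.
generators = {"piano": lambda z: [sum(z)],
              "strings": lambda z: [max(z)]}
shared = [1.0, 2.0]
private = {"piano": [0.5], "strings": [3.0]}
result = generate_with_shared_noise(generators, lambda t: 0.7, shared, private)
```

  • The shared input gives all generators common information, and the single discriminator evaluates the tracks jointly, which is how this arrangement can encourage coordination among tracks.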
  • An embodiment of the present application provides a music generation device based on a generative adversarial network.
  • The device is used to execute the above music generation method based on a generative adversarial network.
  • The device includes: a first acquisition unit 10, an extraction unit 20, a construction unit 30, a second acquisition unit 40, and a generation unit 50.
  • The first acquisition unit 10 is configured to acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks.
  • The extraction unit 20 is configured to extract a feature matrix from the music training signal as music training sample data.
  • The construction unit 30 is configured to construct a generative adversarial network model and train it with the music training sample data to obtain the trained network parameters of the model.
  • The second acquisition unit 40 is configured to acquire a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, random music signals of multiple preset audio tracks.
  • The generation unit 50 is configured to input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • The music training signal is a real music signal collected in advance, for example, 200 MIDI files of "Canon in D major".
  • Music training signals include piano solos, violin solos, cello solos, ensembles, and so on. The multiple preset audio tracks correspond to different instruments, such as piano, strings, percussion, and brass.
  • The extraction unit 20 includes: an extraction subunit, a composition subunit, a combining subunit, and a first acquisition subunit.
  • The extraction subunit is used to extract the start time, duration, and pitch of each note in each music training signal; the composition subunit is used to determine the feature vector of each note from its start time, duration, and pitch.
  • The combining subunit is used to combine the feature vectors of the notes to obtain the feature matrix of the music training signal; the first acquisition subunit is used to take the feature matrix of the music training signal as the music training sample data.
  • The feature matrix can be extracted from the music training signal using a piano roll editor.
  • The construction unit 30 includes a construction subunit, a training subunit, and a second acquisition subunit.
  • The construction subunit is used to construct the generative adversarial network model.
  • The generative adversarial network model includes at least one generator and one discriminator.
  • The generator performs rhythm adjustment on the input real music signals of multiple preset audio tracks and outputs the adjusted multi-track polyphonic music signal, and the discriminator determines whether an input music signal was output by the generator.
  • GAN stands for generative adversarial network.
  • The two players in the GAN model are the generator (a generative model) and the discriminator (a discriminative model).
  • The generator captures the distribution of the music training sample data and generates samples similar to the real signal.
  • The goal is for the generated samples to resemble the real signal as closely as possible.
  • The discriminator is a binary classifier that estimates the probability that a sample comes from the music training sample data rather than from the generator's output.
  • Common discriminators include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (SVM), and neural networks.
  • Common generators include, but are not limited to, deep neural network models, hidden Markov models (HMM), naive Bayes models, and Gaussian mixture models.
  • The training subunit is used to train the generator and the discriminator; specifically, the discriminator is fixed while the network parameters of the generator are adjusted, and the generator is fixed while the network parameters of the discriminator are adjusted.
  • Through continuous learning, the generator produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator keeps improving its ability to distinguish generated multi-track polyphonic music signals from real ones.
  • Eventually, the multi-track polyphonic music signal produced by the generator is close to a real multi-track polyphonic music signal and successfully "deceives" the discriminator.
  • A generative adversarial network model trained in this way can be used to improve the authenticity of the generated multi-track polyphonic music signal.
  • The specific method of training the generator includes: first, a multi-track polyphonic music signal, output by the initial generator based on the real music signals of at least two preset audio tracks, is input into a pre-trained discriminator, and the discriminator produces the probability that the multi-track polyphonic music signal is a real signal; second, the loss function of the initial generator is determined based on this probability and on the feature-matrix similarity between the multi-track polyphonic music signal and the real music signals of the at least two preset audio tracks; finally, the loss function is used to update the network parameters of the initial generator to obtain the generator, for example by backpropagating the loss through the initial generator to update its network parameters.
  • The above training process is only used to explain how the parameters of the generator are adjusted. The initial generator can be regarded as the model before parameter adjustment, and the generator as the model after parameter adjustment.
  • The parameter adjustment process is not limited to a single pass; it can be repeated many times according to the degree of optimization of the generator and actual needs.
  • The second acquisition subunit is used to acquire the network parameters of the trained generative adversarial network model.
  • In one arrangement, the generative adversarial network model includes one generator and one discriminator, which can be understood as a composer model.
  • The generator is used to receive the multi-track polyphonic random music signal and generate new music signals for multiple preset audio tracks from it, and the discriminator is used to determine whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a multi-track polyphonic random signal written by a composer is turned into new music signals for multiple preset audio tracks by the generator, and the discriminator's judgment pushes the generated signals closer to real signals, with coordination among the multiple audio tracks.
  • In another arrangement, the generative adversarial network model includes multiple generators and multiple discriminators, one discriminator corresponding to each generator.
  • To generate a multi-track polyphonic music signal according to the random music signal and the network parameters, each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that preset audio track from it, and each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • When the discriminators determine that the new music signals of their respective preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a random music signal corresponding to the instrument played by one musician is input into each generator.
  • Each musician plays the same tune but on a different instrument. The musicians can interfere with each other, which easily leads to a lack of coordination among the music signals.
  • The random music signal of each instrument is turned into a new music signal for one preset audio track by the corresponding generator, and the discriminator's judgment pushes the generated signal closer to a real signal, with coordination among the multiple audio tracks.
  • In a further arrangement, the generative adversarial network model includes multiple generators and one discriminator, and generates a multi-track polyphonic music signal according to the random music signal and the network parameters as follows.
  • Each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal for that preset audio track from both inputs.
  • The discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal.
  • When the discriminator determines that the new music signals of the preset audio tracks generated by all generators are real signals, the new music signals of the multiple preset audio tracks are output, and they form a brand-new multi-track polyphonic music signal.
  • For example, the piano music signal from a tune created by a musician and the piano music signal from a composer's version of the same tune are together used as the random music signal of one preset audio track, and the corresponding generator turns them into a new music signal for that preset audio track (piano).
  • The music signals of the various instruments are each turned into a new music signal by their corresponding generator and are all judged by the same discriminator, so that the multi-track polyphonic music signal formed from the new music signals of the multiple preset audio tracks is more realistic and the audio tracks are coordinated with one another.
  • An embodiment of the present application provides a computer non-volatile storage medium. The storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to perform the following steps:
  • Acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extract a feature matrix from the music training signal as music training sample data; construct a generative adversarial network model and train it with the music training sample data to obtain the trained network parameters of the model; acquire a random music signal input by the user.
  • The random music signal includes at least one of the following: a multi-track polyphonic random music signal, random music signals of multiple preset audio tracks. Input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • When the program runs, the device where the storage medium is located may also perform the following steps: the generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks from it, and the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • When the program runs, the device where the storage medium is located may also perform the following steps: each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that preset audio track from it, and each discriminator determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • When the discriminators determine that the new music signals of their respective preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • When the program runs, the device where the storage medium is located may also perform the following steps: each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random music signal, and generates a new music signal for that preset audio track from both inputs; the discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal.
  • When the discriminator determines that the new music signals of the preset audio tracks generated by all generators are real signals, the new music signals of the multiple preset audio tracks are output, and they form a brand-new multi-track polyphonic music signal.
  • When the program runs, the device where the storage medium is located may also perform the following steps: extract the start time, duration, and pitch of each note in each music training signal; determine the feature vector of each note from its start time, duration, and pitch; combine the feature vectors of the notes to obtain the feature matrix of the music training signal; use the feature matrix of the music training signal as the music training sample data.
  • an embodiment of the present application provides a computer device 100, including a memory 102, a processor 101, and a computer program 103 stored in the memory 102 and executable on the processor 101.
  • the processor implements the following steps when executing the computer program:
  • the music training signals include real multi-track polyphonic music signals and real music signals of multiple preset audio tracks; a feature matrix is extracted from the music training signals as music training sample data; a generative adversarial network model is constructed and trained with the music training sample data to obtain the network parameters of the trained generative adversarial network model; and a random music signal input by the user is obtained.
  • the random music signal includes at least one of the following: a multi-track polyphonic random music signal, or random music signals of multiple preset audio tracks; the random music signal is input into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • the processor also implements the following steps when executing the computer program: the generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks based on it; the discriminator determines whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;
  • when the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, it outputs the new music signals of the multiple preset audio tracks.
  • the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: each generator receives the random music signal of its corresponding preset audio track and generates a new music signal for that preset audio track from it; each discriminator determines whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;
  • when each discriminator determines that the new music signal of its corresponding preset audio track is a real signal, the new music signal of that preset audio track is output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: each generator receives the random music signal of its corresponding preset audio track together with a multi-track polyphonic random music signal, and generates a new music signal for that preset audio track from both signals; the discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;
  • when the discriminator determines that the new music signals of the preset audio tracks generated by all generators are real signals, it outputs the new music signals of the multiple preset audio tracks, and these new music signals form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
  • terminals involved in the embodiments of the present application may include, but are not limited to, personal computers (PCs), personal digital assistants (PDAs), wireless handheld devices, tablet computers, mobile phones, MP3 players, MP4 players, etc.
  • the application may be an application program (nativeApp) installed on the terminal, or may also be a webpage program (webApp) of a browser on the terminal, which is not limited in this embodiment of the present application.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium.
  • the above software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

Embodiments of the present application provide a generative adversarial network-based music generation method and a device, pertaining to the technical field of artificial intelligence. The method comprises: acquiring a training music signal comprising a real multi-track polyphonic music signal and multiple real music signals in multiple pre-determined audio tracks; extracting a feature matrix from the training music signal as training music sample data; constructing a generative adversarial network model, training the generative adversarial network model, and acquiring a network parameter of the trained generative adversarial network model; acquiring a random music signal input by a user; and inputting the random music signal into the generative adversarial network model, such that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameter. The technical solution provided by the embodiments of the present application solves the problem in the prior art in which polyphonic music having multiple harmonious audio tracks is difficult to generate.

Description

Music generation method and device based on a generative adversarial network
This application claims priority to Chinese patent application No. 201811257179.3, filed with the Chinese Patent Office on October 26, 2018 and entitled "A music generation method and device based on a generative adversarial network", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technology, and in particular to a music generation method and device based on a generative adversarial network.
Background
Music is usually composed of multiple instruments/tracks, each with its own temporal dynamics, and these tracks unfold interdependently over time. The success of natural language generation and monophonic music generation does not readily generalize to polyphonic music. Most existing techniques simplify the generation of polyphonic music in some way to make the problem manageable. Such simplifications include generating only single-track monophonic music, imposing a chronological ordering on the notes of polyphonic music, and so on.
Therefore, how to generate polyphonic music in which multiple audio tracks are coordinated has become an urgent problem to be solved.
Summary of the Application
In view of this, embodiments of the present application provide a music generation method and device based on a generative adversarial network, to solve the problem in the prior art that polyphonic music with coordinated audio tracks is difficult to generate.
To achieve the above object, according to one aspect of the present application, a music generation method based on a generative adversarial network model is provided. The method includes: acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extracting a feature matrix from the music training signal as music training sample data; constructing a generative adversarial network model and training it with the music training sample data to obtain the network parameters of the trained generative adversarial network model; acquiring a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, or random music signals of multiple preset audio tracks; and inputting the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
To achieve the above object, according to one aspect of the present application, a music generation device based on a generative adversarial network is provided. The device includes: a first acquisition unit, configured to acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; an extraction unit, configured to extract a feature matrix from the music training signal as music training sample data; a construction unit, configured to construct a generative adversarial network model and train it with the music training sample data to obtain the network parameters of the trained generative adversarial network model; a second acquisition unit, configured to acquire a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, or random music signals of multiple preset audio tracks; and a generation unit, configured to input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
To achieve the above object, according to one aspect of the present application, a computer non-volatile storage medium is provided. The storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute the above music generation method.
To achieve the above object, according to one aspect of the present application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the steps of the above music generation method when executing the computer program.
In this solution, a generative adversarial network model is constructed, and the dynamic game between the discriminator and the generator is used to finally generate a multi-track polyphonic music signal whose audio tracks are coordinated with one another, thereby solving the problem in the prior art that polyphonic music with coordinated audio tracks is difficult to generate.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a music generation device based on a generative adversarial network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used to describe terminals in the embodiments of the present application, these terminals should not be limited by these terms. The terms are only used to distinguish the terminals from one another. For example, without departing from the scope of the embodiments of the present application, the first acquisition unit may also be referred to as the second acquisition unit, and similarly, the second acquisition unit may also be referred to as the first acquisition unit.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (the stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application. As shown in FIG. 1, the method includes:
Step S101: acquire a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks;
Step S102: extract a feature matrix from the music training signal as music training sample data;
Step S103: construct a generative adversarial network model and train it with the music training sample data to obtain the network parameters of the trained generative adversarial network model;
Step S104: acquire a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, or random music signals of multiple preset audio tracks;
Step S105: input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
In this solution, a generative adversarial network model is constructed, and the dynamic game between the discriminator and the generator is used to finally generate a multi-track polyphonic music signal whose audio tracks are coordinated with one another. This solves the problem in the prior art that polyphonic music with coordinated audio tracks is difficult to generate.
Optionally, the music training signal is a real music signal collected in advance; for example, MIDI data of 200 pieces of "Canon in D major" is collected beforehand. The music training signals include piano solos, violin solos, cello solos, ensemble pieces, and the like. The multiple preset audio tracks correspond to different instruments, such as piano, strings, percussion, and brass.
Optionally, extracting the feature matrix from the music training signal includes: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
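The extraction steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Note` class, the function names, the beat-based time unit, and the choice to sort notes by start time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    start: float     # onset time in beats (assumed unit)
    duration: float  # length in beats (assumed unit)
    pitch: int       # MIDI pitch number, 0-127


def note_to_vector(note: Note) -> List[float]:
    """Feature vector of one note: [start, duration, pitch]."""
    return [note.start, note.duration, float(note.pitch)]


def feature_matrix(notes: List[Note]) -> List[List[float]]:
    """Stack the note vectors row by row, ordered by start time, then pitch."""
    ordered = sorted(notes, key=lambda n: (n.start, n.pitch))
    return [note_to_vector(n) for n in ordered]


# Three hypothetical notes from one training signal.
notes = [Note(1.0, 1.0, 64), Note(0.0, 1.0, 60), Note(0.0, 2.0, 67)]
matrix = feature_matrix(notes)
```

Each row of `matrix` is the feature vector of one note; the stacked rows form the feature matrix that would serve as one training sample.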
Optionally, the feature matrix may be extracted from the music training signal by means of a piano roll editor.
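A piano roll is commonly represented as a binary time-by-pitch matrix. The sketch below shows one way such a matrix could be filled from (start, duration, pitch) triples; the function name, the 128-pitch MIDI range, and the resolution of four time steps per beat are assumptions, not details taken from the application.

```python
def piano_roll(notes, n_steps, steps_per_beat=4, n_pitches=128):
    """Return an n_steps x n_pitches matrix with 1 where a note sounds.

    `notes` is an iterable of (start_beats, duration_beats, midi_pitch).
    """
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for start, duration, pitch in notes:
        first = int(round(start * steps_per_beat))
        last = int(round((start + duration) * steps_per_beat))
        # Mark every time step in [first, last), clipped to the roll length.
        for t in range(first, min(last, n_steps)):
            roll[t][pitch] = 1
    return roll


# Two hypothetical notes: middle C for one beat, E starting half a beat later.
roll = piano_roll([(0.0, 1.0, 60), (0.5, 0.5, 64)], n_steps=8)
```

A multi-track signal could then be held as one such matrix per preset audio track.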
Optionally, constructing the generative adversarial network model and training it with the music training sample data to obtain the network parameters of the trained generative adversarial network model includes:
Step 1: construct the generative adversarial network model, which includes at least one generator and one discriminator. The generator adjusts the rhythm of the input real music signals of the multiple preset audio tracks and outputs the adjusted multi-track polyphonic music signal; the discriminator determines whether an input music signal was output by the generator.
Generative adversarial networks (GANs) are inspired by the two-player zero-sum game in game theory. The two players in a GAN model are the generator (generative model) and the discriminator (discriminative model). The generator captures the distribution of the music training sample data and generates samples that resemble real signals; the closer to a real signal, the better. The discriminator is a binary classifier that estimates the probability that a sample comes from the music training sample data rather than from the generator. Common discriminators include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (SVMs), and neural networks. Common generators include, but are not limited to, deep neural network models, hidden Markov models (HMMs), naive Bayes models, and Gaussian mixture models.
Step 2: train the generator and the discriminator. Specifically, the discriminator is held fixed while the network parameters of the generator are adjusted, and the generator is held fixed while the network parameters of the discriminator are adjusted. In this embodiment, through continuous learning the generator produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator improves its ability to distinguish generated multi-track polyphonic music signals from real ones. Through this contest between the generator and the discriminator, the multi-track polyphonic music signals produced by the generator eventually come close enough to real multi-track polyphonic music signals to "deceive" the discriminator. A generative adversarial network model trained in this way can be used to improve the realism of the generated multi-track polyphonic music signal.
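The alternating update in step 2 can be illustrated with a deliberately tiny model. In the sketch below the "music" is a single number, the generator is `G(z) = z + b`, and the discriminator is a logistic classifier `D(x) = sigmoid(w*x + c)`. All of these choices, along with the learning rate and the non-saturating generator loss, are assumptions made for illustration; the application does not specify them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_loss(w, c, real, fake):
    """Binary cross-entropy of the discriminator on real and generated samples."""
    eps = 1e-12
    real_term = sum(-math.log(sigmoid(w * x + c) + eps) for x in real) / len(real)
    fake_term = sum(-math.log(1.0 - sigmoid(w * x + c) + eps) for x in fake) / len(fake)
    return real_term + fake_term


def g_loss(w, c, fake):
    """Generator loss: how far the discriminator is from calling fakes real."""
    eps = 1e-12
    return sum(-math.log(sigmoid(w * x + c) + eps) for x in fake) / len(fake)


def discriminator_step(w, c, b, real, noise, lr):
    """Generator parameter b is held fixed; only w and c are updated."""
    fake = [z + b for z in noise]
    grad_w = (sum(-(1.0 - sigmoid(w * x + c)) * x for x in real) / len(real)
              + sum(sigmoid(w * x + c) * x for x in fake) / len(fake))
    grad_c = (sum(-(1.0 - sigmoid(w * x + c)) for x in real) / len(real)
              + sum(sigmoid(w * x + c) for x in fake) / len(fake))
    return w - lr * grad_w, c - lr * grad_c


def generator_step(w, c, b, noise, lr):
    """Discriminator parameters w and c are held fixed; only b is updated."""
    grad_b = sum(-(1.0 - sigmoid(w * (z + b) + c)) * w for z in noise) / len(noise)
    return b - lr * grad_b


rng = random.Random(0)
real = [rng.gauss(3.0, 0.5) for _ in range(64)]   # stand-in for real samples
noise = [rng.gauss(0.0, 0.5) for _ in range(64)]  # stand-in for random input
w, c, b = 0.0, 0.0, 0.0
for _ in range(200):
    w, c = discriminator_step(w, c, b, real, noise, lr=0.05)  # fix G, adjust D
    b = generator_step(w, c, b, noise, lr=0.05)               # fix D, adjust G
```

In a real system both players would be neural networks operating on feature matrices, but the structure of the loop (one player frozen while the other takes a gradient step) stays the same.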
The generator is trained as follows. First, a multi-track polyphonic music signal that the initial generator outputs from the real music signals of at least two preset audio tracks is input into a pre-trained discriminator, and the discriminator produces the probability that this multi-track polyphonic music signal is a real signal. Second, the loss function of the initial generator is determined from that probability and from the feature-matrix similarity between the multi-track polyphonic music signal and the real music signals of the at least two preset audio tracks. Finally, the loss function is used to update the network parameters of the initial generator, yielding the generator; for example, the loss is back-propagated to the initial generator to update its network parameters. It should be noted that this description only illustrates how the generator parameters are adjusted: the initial generator can be regarded as the model before parameter adjustment and the generator as the model after adjustment. The adjustment is not limited to a single pass and can be repeated as many times as the generator's degree of optimization and practical needs require.
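The application states that the generator loss depends on the discriminator's probability and on a feature-matrix similarity, but it does not give a formula. The sketch below is one plausible combination, offered purely as an assumption: a log-probability adversarial term plus a weighted cosine dissimilarity between flattened feature matrices. The function names and the `weight` parameter are hypothetical.

```python
import math


def matrix_similarity(a, b):
    """Cosine similarity between two equally shaped matrices (an assumed metric)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def generator_loss(p_real, generated, reference, weight=1.0):
    """Assumed loss: low when the discriminator believes the sample is real
    (p_real near 1) and the generated matrix resembles the reference matrix."""
    adversarial = -math.log(max(p_real, 1e-12))
    dissimilarity = 1.0 - matrix_similarity(generated, reference)
    return adversarial + weight * dissimilarity
```

With this form, raising the discriminator's probability or increasing the similarity both lower the loss, which matches the direction of the update described above.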
Step 3: obtain the network parameters of the trained generative adversarial network model.
Optionally, the generative adversarial network model can automatically generate the multi-track polyphonic music signal from the random music signal and the network parameters in several ways. Three are described below:
Mode 1: the generative adversarial network model includes one generator and one discriminator, and can be understood as a composer model. The generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks from it; the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;
When the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output, and together they form a brand-new multi-track polyphonic music signal.
For example, the music signals of several different tracks of a piece written by a composer, such as a piano signal, a violin signal, and a cello signal, are randomly input into the generator, and the coordination among these tracks is poor. The generator adjusts the composer's multi-track polyphonic random music signal into new music signals for multiple preset audio tracks, and under the discriminator's judgment the generated new music signals come closer to real signals, so that the multiple audio tracks are coordinated with one another.
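The "output only when the discriminator judges the signal real" behaviour of Mode 1 can be sketched as a simple gate around a trained generator and discriminator. The function name, the 0.5 threshold, and the retry-with-fresh-noise policy are assumptions made for illustration; the application only states that signals judged real are output.

```python
def generate_until_accepted(generator, discriminator, sample_noise,
                            threshold=0.5, max_tries=100):
    """Draw noise, generate a candidate, and return it once the discriminator's
    probability of "real" reaches the threshold; return None if none passes."""
    for _ in range(max_tries):
        candidate = generator(sample_noise())
        if discriminator(candidate) >= threshold:
            return candidate
    return None


# Toy stand-ins (hypothetical): the "generator" copies the noise value into two
# tracks and the "discriminator" scores the candidate by its first value.
draws = iter([0.1, 0.4, 0.9])
accepted = generate_until_accepted(
    generator=lambda z: [z, z],
    discriminator=lambda tracks: tracks[0],
    sample_noise=lambda: next(draws),
)
```

Here the first two candidates score below the threshold and are discarded; the third is returned.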
Mode 2: the generative adversarial network model includes multiple generators and multiple discriminators in one-to-one correspondence with the generators, and automatically generates the multi-track polyphonic music signal from the random music signal and the network parameters. Each generator receives the random music signal of its corresponding preset audio track and generates a new music signal for that preset audio track from it; each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal;
When each discriminator determines that the new music signal of its corresponding preset audio track is a real signal, the new music signal of that preset audio track is output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
For example, the music signal of one instrument played by one musician, such as a piano, is randomly input into each generator. Every musician plays the same piece but on a different instrument. The musicians interfere with one another, which easily leads to a lack of coordination among the music signals. The random music signal of each instrument is adjusted by its corresponding generator into a new music signal for one preset audio track, and under the judgment of the corresponding discriminator the generated new music signal comes closer to a real signal, so that the multiple audio tracks are coordinated with one another.
方式三:生成对抗网络模型包括多个生成器及一个判别器,生成对抗网络模型根据音乐随机信号及网络参数自动生成多轨复调音乐信号。每个生成器接收对应一个预设音轨的音乐随机信号及一个多轨复调音乐随机信号,并根据预设音轨的音乐随机信号及多轨复调音乐随机信号生成一个预设音轨的新音乐信号;判别器判断每个生成器生成的一个预设音轨的新音乐信号是真实信号还是生成的信号;Method 3: Generating an adversarial network model includes multiple generators and a discriminator. The generating an adversarial network model automatically generates a multi-track polyphony music signal according to the music random signal and network parameters. Each generator receives a random music signal and a multi-track polyphony random signal corresponding to a preset audio track, and generates a preset audio track according to the random music signal and the multi-track polyphony random signal of the preset audio track New music signal; the discriminator determines whether the new music signal of a preset track generated by each generator is a real signal or a generated signal;
当判别器判断出每个生成器生成的预设音轨的新音乐信号皆为真实信号时,输出多个预设音轨的新音乐信号,多个预设音轨的新音乐信号组成一个全新的多轨复调音乐信号。When the discriminator determines that the new music signals of the preset audio tracks generated by each generator are real signals, it outputs new music signals of multiple preset audio tracks, and the new music signals of multiple preset audio tracks form a brand new Multi-track polyphony music signal.
例如,将一个音乐家创作的一首曲子中的钢琴音乐信号和一个作曲家作出的同一曲子的音乐信号中的钢琴音乐信号共同作为一个预设音轨的音乐随机信号,在对应一个生成器的调整下,生成一个预设音轨(钢琴)的新音乐信号。从而使得由多种乐器作出的音乐信号一一在对应一个生成器的调整下一一生成新音乐信号,并接受同一个判别器的鉴别,使得生成的多个预设音轨的新音乐信号组成的多轨复调音乐信号更加真实,多个音轨之间具有协调性。For example, the piano music signal in a tune created by a musician and the piano music signal in the music signal of the same tune made by a composer are used as the music random signal of a preset track, corresponding to a generator ’s Under the adjustment, a new music signal of a preset track (piano) is generated. Thus, the music signals made by various musical instruments are generated one by one under the adjustment of a corresponding generator, and a new music signal is generated, and the discrimination of the same discriminator is accepted, so that the generated new music signals of multiple preset tracks are composed The multi-track polyphony music signal is more real, and there is coordination among multiple audio tracks.
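The wiring of Method 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `make_track_generator`, `generate_multitrack` and `shared_discriminator` are hypothetical, the generators are simple mixing functions standing in for trained networks, and the discriminator is a placeholder range check rather than a trained classifier.

```python
import random
from typing import Callable, Dict, List

Signal = List[float]
Generator = Callable[[Signal, Signal], Signal]

def make_track_generator(gain: float) -> Generator:
    """Placeholder generator (assumption): mixes the per-track random signal
    with the shared multi-track random signal."""
    def generate(track_noise: Signal, shared_noise: Signal) -> Signal:
        return [gain * t + (1.0 - gain) * s
                for t, s in zip(track_noise, shared_noise)]
    return generate

def generate_multitrack(generators: Dict[str, Generator],
                        track_noise: Dict[str, Signal],
                        shared_noise: Signal) -> Dict[str, Signal]:
    """Each generator sees its own track signal plus the shared signal."""
    return {name: gen(track_noise[name], shared_noise)
            for name, gen in generators.items()}

def shared_discriminator(tracks: Dict[str, Signal]) -> bool:
    """Placeholder for the single discriminator that judges every track
    (assumption: 'real' means all values lie in [-1, 1])."""
    return all(-1.0 <= v <= 1.0 for sig in tracks.values() for v in sig)

rng = random.Random(0)
shared = [rng.uniform(-1, 1) for _ in range(8)]
per_track = {name: [rng.uniform(-1, 1) for _ in range(8)]
             for name in ("piano", "strings")}
generators = {"piano": make_track_generator(0.7),
              "strings": make_track_generator(0.4)}

tracks = generate_multitrack(generators, per_track, shared)
if shared_discriminator(tracks):
    multitrack_signal = tracks  # the accepted tracks form the output
```

Feeding the same shared signal to every generator is what gives the tracks a common reference; the per-track signal lets each instrument differ.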
An embodiment of the present application provides a music generation device based on a generative adversarial network. The device is used to execute the GAN-based music generation method described above. As shown in FIG. 2, the device includes: a first acquisition unit 10, an extraction unit 20, a construction unit 30, a second acquisition unit 40, and a generation unit 50.
The first acquisition unit 10 is used to acquire music training signals, the music training signals including real multi-track polyphonic music signals and real music signals of multiple preset tracks;
The extraction unit 20 is used to extract a feature matrix from the music training signals as music training sample data;
The construction unit 30 is used to construct a GAN model and to train the GAN model with the music training sample data, obtaining the network parameters of the trained GAN model;
The second acquisition unit 40 is used to acquire a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset tracks;
The generation unit 50 is used to input the random music signal into the GAN model, so that the GAN model automatically generates a multi-track polyphonic music signal from the random music signal and the network parameters.
In this solution, a GAN model is constructed and the dynamic game between the discriminator and the generator is exploited to generate a multi-track polyphonic music signal whose tracks are coordinated with one another. This can effectively improve the efficiency of generating polyphonic music, thereby solving the problem in the prior art that polyphonic music is generated inefficiently.
Optionally, the music training signals are real music signals collected in advance; for example, MIDI data of 200 pieces of "Canon in D major" is collected in advance. The music training signals include piano solos, violin solos, cello solos, ensemble pieces, and so on. The multiple preset tracks represent different instruments, such as piano, strings, percussion, and brass.
Optionally, the extraction unit 20 includes: an extraction subunit, a composition subunit, a combination subunit, and a first acquisition subunit.
The extraction subunit is used to extract the start time, duration, and pitch of each note in each music training signal; the composition subunit is used to determine the feature vector of each note from its start time, duration, and pitch; the combination subunit is used to combine the feature vectors of the notes into the feature matrix of the music training signal; the first acquisition subunit is used to take the feature matrix of the music training signal as the music training sample data.
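The extraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application does not fix the layout of the feature vector, so the `[start, duration, pitch]` ordering, the sort order of the rows, and the function names `note_feature_vector` and `feature_matrix` are all hypothetical.

```python
from typing import List, Tuple

# One note as (start time in beats, duration in beats, MIDI pitch number).
Note = Tuple[float, float, int]

def note_feature_vector(start: float, duration: float, pitch: int) -> List[float]:
    """Feature vector of a single note (assumed layout: start, duration, pitch)."""
    return [float(start), float(duration), float(pitch)]

def feature_matrix(notes: List[Note]) -> List[List[float]]:
    """Stack the note feature vectors, one row per note, ordered by start
    time and then by pitch (the ordering is an assumption)."""
    ordered = sorted(notes, key=lambda n: (n[0], n[2]))
    return [note_feature_vector(*n) for n in ordered]

notes = [(0.5, 0.5, 64), (0.0, 0.5, 60), (0.0, 1.0, 48)]
matrix = feature_matrix(notes)  # 3 rows x 3 columns
```

The resulting matrix is what the text calls the music training sample data for one training signal.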
Optionally, the feature matrix can be extracted from the music training signal by means of a piano-roll editor.
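A piano roll is commonly represented as a binary time-by-pitch grid. The following Python sketch shows one way to build such a grid from note data; the resolution `steps_per_beat`, the 128-pitch range (the MIDI note-number range), and the function name `to_piano_roll` are assumptions, since the application does not describe the editor's internal format.

```python
from typing import List, Tuple

Note = Tuple[float, float, int]  # (start in beats, duration in beats, MIDI pitch)

def to_piano_roll(notes: List[Note],
                  steps_per_beat: int = 4,
                  n_pitches: int = 128) -> List[List[int]]:
    """Return a grid roll[t][p] that is 1 while pitch p sounds at time step t."""
    total_beats = max(start + duration for start, duration, _ in notes)
    n_steps = int(round(total_beats * steps_per_beat))
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for start, duration, pitch in notes:
        first = int(round(start * steps_per_beat))
        last = int(round((start + duration) * steps_per_beat))
        for t in range(first, last):
            roll[t][pitch] = 1
    return roll

roll = to_piano_roll([(0.0, 1.0, 60), (1.0, 0.5, 64)])
```

With a multi-track piece, one such grid per preset track can be stacked to give a per-track feature matrix.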
Optionally, the construction unit 30 includes a construction subunit, a training subunit, and a second acquisition subunit.
The construction subunit is used to construct the GAN model, which includes at least one generator and one discriminator. The generator is used to adjust the rhythm of the input real music signals of multiple preset tracks and to output the adjusted multi-track polyphonic music signal; the discriminator is used to determine whether an input music signal was output by the generator.
Generative adversarial networks (GANs) are inspired by the two-player zero-sum game in game theory; the two players in a GAN model are the generator (generative model) and the discriminator (discriminative model). The generator captures the distribution of the music training sample data and generates samples that resemble real signals; the more closely they resemble real signals, the better. The discriminator is a binary classifier that estimates the probability that a sample comes from the music training sample data rather than from the generator. Common discriminators may include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (SVM), neural networks, and so on. Common generators may include, but are not limited to, deep neural network models, hidden Markov models (HMM), naive Bayes models, Gaussian mixture models, and so on.
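The two-player game described above is usually written as the following minimax objective. This is the standard GAN formulation and is not stated in the application itself; here $x$ is assumed to be a real music training sample, $z$ a random music signal, $G$ the generator, and $D$ the discriminator's estimated probability that its input is real.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator raises $V$ by scoring real samples high and generated samples low, while the generator lowers $V$ by producing samples that the discriminator scores as real.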
The training subunit is used to train the generator and the discriminator; specifically, the discriminator is held fixed while the network parameters of the generator are adjusted, and the generator is held fixed while the network parameters of the discriminator are adjusted. In this embodiment, the generator learns continuously to generate increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator learns continuously to better distinguish generated multi-track polyphonic music signals from real ones. Through this contest between the generator and the discriminator, the multi-track polyphonic music signal produced by the generator eventually comes close enough to a real multi-track polyphonic music signal to "deceive" the discriminator. A GAN model trained in this way can be used to improve the realism of the generated multi-track polyphonic music signals.
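The alternating scheme (fix one network, update the other) can be made concrete with a deliberately tiny one-dimensional example in Python. Everything here is an assumption for illustration: `ToyGAN`, its scalar parameters, the learning rate, and the non-saturating generator update stand in for the real networks and are not taken from the application.

```python
import math
import random

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

class ToyGAN:
    """One-dimensional stand-in: G(z) = z + b, D(x) = sigmoid(w * x + c)."""

    def __init__(self) -> None:
        self.b = 0.0   # generator parameter
        self.w = 0.1   # discriminator weight
        self.c = 0.0   # discriminator bias

    def generate(self, z: float) -> float:
        return z + self.b

    def discriminate(self, x: float) -> float:
        return sigmoid(self.w * x + self.c)

    def discriminator_step(self, real: float, z: float, lr: float = 0.05) -> None:
        """Generator fixed: gradient ascent on log D(real) + log(1 - D(fake))."""
        fake = self.generate(z)
        d_real, d_fake = self.discriminate(real), self.discriminate(fake)
        self.w += lr * ((1.0 - d_real) * real - d_fake * fake)
        self.c += lr * ((1.0 - d_real) - d_fake)

    def generator_step(self, z: float, lr: float = 0.05) -> None:
        """Discriminator fixed: gradient ascent on log D(G(z))."""
        d_fake = self.discriminate(self.generate(z))
        self.b += lr * (1.0 - d_fake) * self.w

def train(gan: ToyGAN, steps: int, real_mean: float = 3.0, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        real = rng.gauss(real_mean, 0.1)
        gan.discriminator_step(real, rng.gauss(0.0, 0.1))  # update D only
        gan.generator_step(rng.gauss(0.0, 0.1))            # update G only

gan = ToyGAN()
train(gan, steps=200)
```

Each step touches only one player's parameters, which is the "fix one, adjust the other" behaviour the paragraph describes.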
The generator is trained as follows. First, a multi-track polyphonic music signal that the initial generator outputs from the real music signals of at least two preset tracks is input into a pre-trained discriminator, and the discriminator produces the probability that this multi-track polyphonic music signal is a real signal. Second, the loss function of the initial generator is determined from that probability and from the feature-matrix similarity between the multi-track polyphonic music signal and the real music signals of the at least two preset tracks. Finally, the loss function is used to update the network parameters of the initial generator, yielding the generator; for example, the loss is backpropagated to the initial generator to update its network parameters. It should be noted that this training process only illustrates how the generator's parameters are adjusted: the initial generator can be regarded as the model before adjustment and the generator as the model after adjustment. The adjustment is not limited to a single pass and can be repeated as many times as the generator's degree of optimization and practical needs require.
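The application states that the generator loss depends on the discriminator's probability and on a feature-matrix similarity, but gives no formula. The Python sketch below shows one plausible combination under explicit assumptions: cosine similarity as the similarity measure, a negative log-probability adversarial term, and a hypothetical `weight` balancing the two. The names `cosine_similarity` and `generator_loss` are illustrative, and both matrices are assumed to have the same shape.

```python
import math
from typing import List

Matrix = List[List[float]]

def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between two equally shaped feature matrices."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = (math.sqrt(sum(x * x for x in flat_a))
            * math.sqrt(sum(y * y for y in flat_b)))
    return dot / norm if norm else 0.0

def generator_loss(prob_real: float, generated: Matrix, reference: Matrix,
                   weight: float = 1.0) -> float:
    """Assumed loss: low when the discriminator believes the output is real
    and when the output's feature matrix resembles the reference."""
    adversarial = -math.log(max(prob_real, 1e-12))  # clamp avoids log(0)
    dissimilarity = 1.0 - cosine_similarity(generated, reference)
    return adversarial + weight * dissimilarity

reference = [[0.0, 1.0, 60.0], [1.0, 1.0, 64.0]]
generated = [[0.0, 0.5, 62.0], [1.0, 1.5, 65.0]]
loss = generator_loss(prob_real=0.3, generated=generated, reference=reference)
```

In a real system this scalar would be backpropagated through the generator; here it only shows how the two ingredients named in the text could be combined.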
The second acquisition subunit is used to acquire the network parameters of the trained GAN model.
Optionally, the GAN model includes one generator and one discriminator, which can be understood as a composer model. The generator is used to receive the multi-track polyphonic random music signal and to generate new music signals of multiple preset tracks from it, and the discriminator is used to determine whether the new music signals of the multiple preset tracks generated by the generator are real signals or generated signals;
When the discriminator determines that the new music signals of the multiple preset tracks are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
For example, the music signals of several different tracks of a piece written by a composer, such as a piano signal, a violin signal, and a cello signal, are randomly input into the generator, but the tracks are poorly coordinated. The composer's multi-track polyphonic random music signal is adjusted by the generator to produce new music signals of multiple preset tracks and, under the scrutiny of the discriminator, the generated new music signals of the preset tracks become closer to real signals, with coordination among the tracks.
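The output condition of this composer model (output only when the discriminator judges the tracks to be real) can be sketched as a simple gate in Python. The retry loop, the `threshold`, and the toy generator and discriminator below are assumptions added for illustration; the application only states the condition under which the tracks are output.

```python
from typing import Callable, Dict, List, Optional

Tracks = Dict[str, List[float]]

def generate_until_accepted(generator: Callable[[List[float]], Tracks],
                            discriminator: Callable[[Tracks], float],
                            noise_source: Callable[[], List[float]],
                            threshold: float = 0.5,
                            max_tries: int = 10) -> Optional[Tracks]:
    """Return the first candidate the discriminator scores as real, else None."""
    for _ in range(max_tries):
        candidate = generator(noise_source())
        if discriminator(candidate) >= threshold:
            return candidate
    return None

# Toy stand-ins (assumptions): one generator emits every track at once, and
# the discriminator scores tightly clustered values as more "real".
def toy_generator(noise: List[float]) -> Tracks:
    return {"piano": list(noise), "violin": list(reversed(noise))}

def toy_discriminator(tracks: Tracks) -> float:
    values = [v for sig in tracks.values() for v in sig]
    return 1.0 - (max(values) - min(values))

batches = iter([[0.9, 0.1], [0.4, 0.5]])
result = generate_until_accepted(toy_generator, toy_discriminator,
                                 lambda: next(batches))
```

Here the first candidate is rejected because its values are widely spread, and the second is accepted and becomes the multi-track output.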
Optionally, the GAN model includes multiple generators and multiple discriminators in one-to-one correspondence with the generators, and the GAN model automatically generates a multi-track polyphonic music signal from the random music signal and the network parameters. Each generator receives the random music signal of its corresponding preset track and generates a new music signal of one preset track from it, and each discriminator determines whether the new music signal of the preset track generated by its corresponding generator is a real signal or a generated signal;
When each discriminator determines that the new music signal of its corresponding preset track is a real signal, the new music signal of that preset track is output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
For example, a music signal of one instrument played by one musician, such as a piano, is randomly input into each generator. In this case every musician plays the same piece, but on a different instrument. The musicians interfere with one another, which easily leads to a lack of coordination among the music signals. The random music signal of each instrument is adjusted by its corresponding generator to produce a new music signal of one preset track and, under the scrutiny of the corresponding discriminator, the generated new music signal of that preset track becomes closer to a real signal, with coordination among the tracks.
Optionally, the GAN model includes multiple generators and one discriminator, and the GAN model automatically generates a multi-track polyphonic music signal from the random music signal and the network parameters. Each generator receives the random music signal of its corresponding preset track together with a multi-track polyphonic random music signal, and generates a new music signal of one preset track from both; the discriminator determines whether the new music signal of the preset track generated by each generator is a real signal or a generated signal;
When the discriminator determines that the new music signals of the preset tracks generated by all generators are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
For example, the piano music signal in a piece created by a musician and the piano music signal in a composer's version of the same piece are used together as the random music signal of one preset track, and under the adjustment of the corresponding generator a new music signal of one preset track (piano) is generated. In this way, the music signals of the various instruments are each turned into new music signals by their corresponding generators and are all assessed by the same discriminator, so that the multi-track polyphonic music signal composed of the generated new music signals of the preset tracks is more realistic and its tracks are coordinated.
An embodiment of the present application provides a non-volatile computer storage medium. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to perform the following steps:
Acquiring music training signals, the music training signals including real multi-track polyphonic music signals and real music signals of multiple preset tracks; extracting a feature matrix from the music training signals as music training sample data; constructing a GAN model and training the GAN model with the music training sample data to obtain the network parameters of the trained GAN model; acquiring a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset tracks; inputting the random music signal into the GAN model, so that the GAN model automatically generates a multi-track polyphonic music signal from the random music signal and the network parameters.
Optionally, when the program runs, the device on which the storage medium resides is further controlled to perform the following steps: the generator receives the multi-track polyphonic random music signal and generates new music signals of multiple preset tracks from it, and the discriminator determines whether the new music signals of the multiple preset tracks generated by the generator are real signals or generated signals;
When the discriminator determines that the new music signals of the multiple preset tracks are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when the program runs, the device on which the storage medium resides is further controlled to perform the following steps: each generator receives the random music signal of its corresponding preset track and generates a new music signal of one preset track from it, and each discriminator determines whether the new music signal of the preset track generated by its corresponding generator is a real signal or a generated signal;
When each discriminator determines that the new music signal of its corresponding preset track is a real signal, the new music signal of that preset track is output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when the program runs, the device on which the storage medium resides is further controlled to perform the following steps: each generator receives the random music signal of its corresponding preset track together with a multi-track polyphonic random music signal, and generates a new music signal of one preset track from both; the discriminator determines whether the new music signal of the preset track generated by each generator is a real signal or a generated signal;
When the discriminator determines that the new music signals of the preset tracks generated by all generators are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when the program runs, the device on which the storage medium resides is further controlled to perform the following steps: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; taking the feature matrix of the music training signal as the music training sample data.
As shown in FIG. 3, an embodiment of the present application provides a computer device 100, including a memory 102, a processor 101, and a computer program 103 stored in the memory 102 and executable on the processor 101. When executing the computer program, the processor implements the following steps:
Acquiring music training signals, the music training signals including real multi-track polyphonic music signals and real music signals of multiple preset tracks; extracting a feature matrix from the music training signals as music training sample data; constructing a GAN model and training the GAN model with the music training sample data to obtain the network parameters of the trained GAN model; acquiring a random music signal input by a user, the random music signal including at least one of the following: a multi-track polyphonic random music signal, and random music signals of multiple preset tracks; inputting the random music signal into the GAN model, so that the GAN model automatically generates a multi-track polyphonic music signal from the random music signal and the network parameters.
Optionally, when executing the computer program, the processor further implements the following steps: the generator receives the multi-track polyphonic random music signal and generates new music signals of multiple preset tracks from it, and the discriminator determines whether the new music signals of the multiple preset tracks generated by the generator are real signals or generated signals;
When the discriminator determines that the new music signals of the multiple preset tracks are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when executing the computer program, the processor further implements the following steps: each generator receives the random music signal of its corresponding preset track and generates a new music signal of one preset track from it, and each discriminator determines whether the new music signal of the preset track generated by its corresponding generator is a real signal or a generated signal;
When each discriminator determines that the new music signal of its corresponding preset track is a real signal, the new music signal of that preset track is output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when executing the computer program, the processor further implements the following steps: each generator receives the random music signal of its corresponding preset track together with a multi-track polyphonic random music signal, and generates a new music signal of one preset track from both; the discriminator determines whether the new music signal of the preset track generated by each generator is a real signal or a generated signal;
When the discriminator determines that the new music signals of the preset tracks generated by all generators are real signals, the new music signals of the multiple preset tracks are output; the new music signals of the multiple preset tracks together form a new multi-track polyphonic music signal.
Optionally, when executing the computer program, the processor further implements the following steps: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note from its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; taking the feature matrix of the music training signal as the music training sample data.
It should be noted that the terminals involved in the embodiments of the present application may include, but are not limited to, personal computers (PCs), personal digital assistants (PDAs), wireless handheld devices, tablet computers, mobile phones, MP3 players, MP4 players, and the like.
It can be understood that the application may be an application program installed on the terminal (native app) or a web program running in a browser on the terminal (web app); the embodiments of the present application place no limitation on this.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. Such a software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (20)

  1. A music generation method based on a generative adversarial network, characterized in that the method comprises:
    acquiring music training signals, the music training signals comprising real multi-track polyphonic music signals and real music signals of a plurality of preset tracks;
    extracting a feature matrix from the music training signals as music training sample data;
    constructing a generative adversarial network model, and training the generative adversarial network model with the music training sample data to obtain trained network parameters of the generative adversarial network model;
    acquiring a random music signal input by a user, the random music signal comprising at least one of: a multi-track polyphonic random music signal, and random music signals of a plurality of preset tracks;
    inputting the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  2. The method according to claim 1, characterized in that the generative adversarial network model comprises one generator and one discriminator, and the generative adversarial network model automatically generating a multi-track polyphonic music signal according to the random music signal and the network parameters comprises:
    the generator receiving the multi-track polyphonic random music signal and generating new music signals of a plurality of preset tracks according to the multi-track polyphonic random music signal, and the discriminator judging whether the new music signals of the plurality of preset tracks generated by the generator are real signals or generated signals;
    when the discriminator judges that the new music signals of the plurality of preset tracks are real signals, outputting the new music signals of the plurality of preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
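By way of non-limiting illustration, the output gating recited above (output the generated tracks only when the discriminator judges them to be real) can be sketched as a simple accept loop. The function names, the 0.5 threshold, the retry limit, and the Gaussian noise sampling are all assumptions made for this sketch; the claims only state that the random music signal is input by the user and do not specify a threshold. The toy generator and discriminator are stand-ins, not trained networks.

```python
import random
from typing import Callable, List, Optional

Tracks = List[List[float]]  # one list of values per preset track


def generate_accepted(
    generator: Callable[[List[float]], Tracks],
    discriminator: Callable[[Tracks], float],
    noise_dim: int,
    threshold: float = 0.5,   # assumed decision boundary for "real"
    max_tries: int = 100,     # assumed retry limit
    rng: Optional[random.Random] = None,
) -> Optional[Tracks]:
    """Return the first generated candidate the discriminator scores as real.

    Returns None if no candidate is accepted within max_tries attempts.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        # Stand-in for the random music signal (here sampled, not user-supplied).
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        candidate = generator(z)
        if discriminator(candidate) >= threshold:
            return candidate
    return None


def toy_generator(z: List[float]) -> Tracks:
    # Hypothetical generator: two tracks, the noise itself and its negation.
    return [list(z), [-v for v in z]]


def toy_discriminator(tracks: Tracks) -> float:
    # Hypothetical discriminator: "real" when the first track has a positive mean.
    mean = sum(tracks[0]) / len(tracks[0])
    return 1.0 if mean > 0 else 0.0


result = generate_accepted(toy_generator, toy_discriminator, noise_dim=4,
                           rng=random.Random(0))
rejected = generate_accepted(toy_generator, lambda tracks: 0.0, noise_dim=4,
                             max_tries=3, rng=random.Random(0))
```

In a real system the two callables would be the trained generator and discriminator networks; the loop itself is only one possible reading of the gating condition.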
  3. The method according to claim 1, characterized in that the generative adversarial network model comprises a plurality of generators and a plurality of discriminators in one-to-one correspondence with the plurality of generators, and the generative adversarial network model automatically generating a multi-track polyphonic music signal according to the random music signal and the network parameters comprises:
    each generator receiving the random music signal of a corresponding preset track and generating a new music signal of one preset track according to the random music signal of the preset track, and each discriminator judging whether the new music signal of the one preset track generated by its corresponding generator is a real signal or a generated signal;
    when the discriminators judge that the new music signals of their corresponding preset tracks are all real signals, outputting the new music signals of the preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  4. The method according to claim 1, characterized in that the generative adversarial network model comprises a plurality of generators and one discriminator, and the generative adversarial network model automatically generating a multi-track polyphonic music signal according to the random music signal and the network parameters comprises:
    each generator receiving the random music signal of a corresponding preset track and a multi-track polyphonic random music signal, and generating a new music signal of one preset track according to the random music signal of the preset track and the multi-track polyphonic random music signal; the discriminator judging whether the new music signal of the one preset track generated by each generator is a real signal or a generated signal;
    when the discriminator judges that the new music signals of the preset tracks generated by all of the generators are real signals, outputting the new music signals of the plurality of preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
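By way of non-limiting illustration, the three arrangements recited in claims 2 to 4 differ mainly in which random signals each generator receives. The sketch below only assembles those inputs; the helper names, the Gaussian sampling, and the use of concatenation to combine a per-track vector with the shared vector are assumptions of this sketch and are not specified in the claims.

```python
import random
from typing import List

Vector = List[float]


def gaussian(dim: int, rng: random.Random) -> Vector:
    """Sample one random vector (stand-in for a random music signal)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def inputs_shared(dim: int, rng: random.Random) -> List[Vector]:
    """Claim 2 arrangement: a single generator receives one shared
    multi-track random signal and emits all preset tracks."""
    return [gaussian(dim, rng)]


def inputs_per_track(num_tracks: int, dim: int, rng: random.Random) -> List[Vector]:
    """Claim 3 arrangement: each generator receives its own per-track
    random signal."""
    return [gaussian(dim, rng) for _ in range(num_tracks)]


def inputs_hybrid(num_tracks: int, dim: int, rng: random.Random) -> List[Vector]:
    """Claim 4 arrangement: each generator receives its own per-track random
    signal together with the same shared multi-track random signal.
    Concatenation is an assumed way of combining the two."""
    shared = gaussian(dim, rng)
    return [gaussian(dim, rng) + shared for _ in range(num_tracks)]


rng = random.Random(1)
shared_only = inputs_shared(4, rng)
per_track = inputs_per_track(3, 4, rng)
hybrid = inputs_hybrid(3, 4, rng)
```

In the hybrid case every input vector ends with the same shared component, which is what lets independently parameterized generators produce tracks that are coordinated with one another.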
  5. The method according to claim 1, characterized in that extracting a feature matrix from the music training signals comprises:
    extracting the start time, duration, and pitch of each note in each music training signal;
    determining a feature vector of each note according to the start time, duration, and pitch of the note;
    combining the feature vectors of the notes to obtain the feature matrix of the music training signal;
    using the feature matrix of the music training signal as the music training sample data.
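By way of non-limiting illustration, the feature extraction steps above can be sketched as follows. The `Note` type, the use of beats as the time unit, MIDI pitch numbers, and the ordering of rows by start time and then pitch are assumptions of this sketch; the claims only state that each note's start time, duration, and pitch form a feature vector and that the vectors are combined into a matrix.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    start: float     # start time in beats (unit is an assumption)
    duration: float  # duration in beats (unit is an assumption)
    pitch: int       # MIDI pitch number 0-127 (encoding is an assumption)


def note_vector(note: Note) -> List[float]:
    """Feature vector of one note: [start, duration, pitch]."""
    return [note.start, note.duration, float(note.pitch)]


def feature_matrix(notes: List[Note]) -> List[List[float]]:
    """Combine the note vectors into a matrix with one row per note.

    Rows are sorted by start time, then pitch (an assumed ordering)."""
    ordered = sorted(notes, key=lambda n: (n.start, n.pitch))
    return [note_vector(n) for n in ordered]


# A three-note chord followed by a single note; simultaneous notes
# (polyphony) simply share the same start time.
track = [
    Note(1.0, 0.5, 72),
    Note(0.0, 1.0, 64),
    Note(0.0, 1.0, 60),
    Note(0.0, 1.0, 67),
]
matrix = feature_matrix(track)
```

Each row of `matrix` describes one note, so a multi-track signal could be represented by one such matrix per preset track.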
  6. A music generation device based on a generative adversarial network, characterized in that the device comprises:
    a first acquisition unit, configured to acquire music training signals, the music training signals comprising real multi-track polyphonic music signals and real music signals of a plurality of preset tracks;
    an extraction unit, configured to extract a feature matrix from the music training signals as music training sample data;
    a construction unit, configured to construct a generative adversarial network model, and to train the generative adversarial network model with the music training sample data to obtain trained network parameters of the generative adversarial network model;
    a second acquisition unit, configured to acquire a random music signal input by a user, the random music signal comprising at least one of: a multi-track polyphonic random music signal, and random music signals of a plurality of preset tracks;
    a generation unit, configured to input the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  7. The device according to claim 6, characterized in that the generative adversarial network model comprises a plurality of generators and a plurality of discriminators in one-to-one correspondence with the plurality of generators; each generator receives the random music signal of a corresponding preset track and generates a new music signal of one preset track according to the random music signal of the preset track, and each discriminator judges whether the new music signal of the one preset track generated by its corresponding generator is a real signal or a generated signal;
    when the discriminators judge that the new music signals of their corresponding preset tracks are all real signals, the new music signals of the preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  8. The device according to claim 6, characterized in that the generative adversarial network model comprises a plurality of generators and one discriminator; each generator receives the random music signal of a corresponding preset track and a multi-track polyphonic random music signal, and generates a new music signal of one preset track according to the random music signal of the preset track and the multi-track polyphonic random music signal; the discriminator judges whether the new music signal of the one preset track generated by each generator is a real signal or a generated signal;
    when the discriminator judges that the new music signals of the preset tracks generated by all of the generators are real signals, the new music signals of the plurality of preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  9. The device according to claim 6, characterized in that the generative adversarial network model comprises one generator and one discriminator; the generator receives the multi-track polyphonic random music signal and generates new music signals of a plurality of preset tracks according to the multi-track polyphonic random music signal, and the discriminator judges whether the new music signals of the plurality of preset tracks generated by the generator are real signals or generated signals;
    when the discriminator judges that the new music signals of the plurality of preset tracks are real signals, the new music signals of the plurality of preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  10. The device according to claim 6, characterized in that the extraction unit comprises:
    an extraction subunit, configured to extract the start time, duration, and pitch of each note in each music training signal;
    a composition subunit, configured to determine a feature vector of each note according to the start time, duration, and pitch of the note;
    a combination subunit, configured to combine the feature vectors of the notes to obtain the feature matrix of the music training signal;
    a first acquisition subunit, configured to use the feature matrix of the music training signal as the music training sample data.
  11. A non-volatile computer storage medium, characterized in that the storage medium comprises a stored program, and when the program runs, a device in which the storage medium is located is controlled to perform the following steps:
    acquiring music training signals, the music training signals comprising real multi-track polyphonic music signals and real music signals of a plurality of preset tracks;
    extracting a feature matrix from the music training signals as music training sample data;
    constructing a generative adversarial network model, and training the generative adversarial network model with the music training sample data to obtain trained network parameters of the generative adversarial network model;
    acquiring a random music signal input by a user, the random music signal comprising at least one of: a multi-track polyphonic random music signal, and random music signals of a plurality of preset tracks;
    inputting the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  12. The non-volatile computer storage medium according to claim 11, characterized in that, when the program runs, the device in which the storage medium is located is controlled to perform the following steps:
    the generative adversarial network model comprises one generator and one discriminator; the generator receives the multi-track polyphonic random music signal and generates new music signals of a plurality of preset tracks according to the multi-track polyphonic random music signal, and the discriminator judges whether the new music signals of the plurality of preset tracks generated by the generator are real signals or generated signals;
    when the discriminator judges that the new music signals of the plurality of preset tracks are real signals, outputting the new music signals of the plurality of preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  13. The non-volatile computer storage medium according to claim 11, characterized in that, when the program runs, the device in which the storage medium is located is controlled to perform the following steps:
    the generative adversarial network model comprises a plurality of generators and a plurality of discriminators in one-to-one correspondence with the plurality of generators; each generator receives the random music signal of a corresponding preset track and generates a new music signal of one preset track according to the random music signal of the preset track, and each discriminator judges whether the new music signal of the one preset track generated by its corresponding generator is a real signal or a generated signal;
    when the discriminators judge that the new music signals of their corresponding preset tracks are all real signals, outputting the new music signals of the preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  14. The non-volatile computer storage medium according to claim 11, characterized in that, when the program runs, the device in which the storage medium is located is controlled to perform the following steps:
    the generative adversarial network model comprises a plurality of generators and one discriminator; each generator receives the random music signal of a corresponding preset track and a multi-track polyphonic random music signal, and generates a new music signal of one preset track according to the random music signal of the preset track and the multi-track polyphonic random music signal; the discriminator judges whether the new music signal of the one preset track generated by each generator is a real signal or a generated signal;
    when the discriminator judges that the new music signals of the preset tracks generated by all of the generators are real signals, outputting the new music signals of the plurality of preset tracks, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  15. The non-volatile computer storage medium according to claim 11, characterized in that, when the program runs, the device in which the storage medium is located is controlled to perform the following steps:
    extracting the start time, duration, and pitch of each note in each music training signal;
    determining a feature vector of each note according to the start time, duration, and pitch of the note;
    combining the feature vectors of the notes to obtain the feature matrix of the music training signal;
    using the feature matrix of the music training signal as the music training sample data.
  16. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the following steps:
    acquiring music training signals, the music training signals comprising real multi-track polyphonic music signals and real music signals of a plurality of preset tracks;
    extracting a feature matrix from the music training signals as music training sample data;
    constructing a generative adversarial network model, and training the generative adversarial network model with the music training sample data to obtain trained network parameters of the generative adversarial network model;
    acquiring a random music signal input by a user, the random music signal comprising at least one of: a multi-track polyphonic random music signal, and random music signals of a plurality of preset tracks;
    inputting the random music signal into the generative adversarial network model, so that the generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  17. The computer device according to claim 16, characterized in that the generative adversarial network model comprises one generator and one discriminator; the generator receives the multi-track polyphonic random music signal and generates new music signals of a plurality of preset tracks according to the multi-track polyphonic random music signal, and the discriminator judges whether the new music signals of the plurality of preset tracks generated by the generator are real signals or generated signals;
    when the discriminator judges that the new music signals of the plurality of preset tracks are real signals, the new music signals of the plurality of preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  18. The computer device according to claim 16, characterized in that the generative adversarial network model comprises a plurality of generators and a plurality of discriminators in one-to-one correspondence with the plurality of generators; each generator receives the random music signal of a corresponding preset track and generates a new music signal of one preset track according to the random music signal of the preset track, and each discriminator judges whether the new music signal of the one preset track generated by its corresponding generator is a real signal or a generated signal;
    when the discriminators judge that the new music signals of their corresponding preset tracks are all real signals, the new music signals of the preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  19. The computer device according to claim 16, characterized in that the generative adversarial network model comprises a plurality of generators and one discriminator; each generator receives the random music signal of a corresponding preset track and a multi-track polyphonic random music signal, and generates a new music signal of one preset track according to the random music signal of the preset track and the multi-track polyphonic random music signal; the discriminator judges whether the new music signal of the one preset track generated by each generator is a real signal or a generated signal;
    when the discriminator judges that the new music signals of the preset tracks generated by all of the generators are real signals, the new music signals of the plurality of preset tracks are output, the new music signals of the plurality of preset tracks forming a brand-new multi-track polyphonic music signal.
  20. The computer device according to claim 16, characterized in that the processor, when executing the computer program, further implements the following steps:
    extracting the start time, duration, and pitch of each note in each music training signal;
    determining a feature vector of each note according to the start time, duration, and pitch of the note;
    combining the feature vectors of the notes to obtain the feature matrix of the music training signal;
    using the feature matrix of the music training signal as the music training sample data.
PCT/CN2018/123550 2018-10-26 2018-12-25 Generative adversarial network-based music generation method and device WO2020082574A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811257179.3A CN109346043B (en) 2018-10-26 2018-10-26 Music generation method and device based on generation countermeasure network
CN201811257179.3 2018-10-26

Publications (1)

Publication Number Publication Date
WO2020082574A1 true WO2020082574A1 (en) 2020-04-30

Family

ID=65312008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123550 WO2020082574A1 (en) 2018-10-26 2018-12-25 Generative adversarial network-based music generation method and device

Country Status (2)

Country Link
CN (1) CN109346043B (en)
WO (1) WO2020082574A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110085202B (en) * 2019-03-19 2022-03-15 北京卡路里信息技术有限公司 Music generation method, device, storage medium and processor
CN110288965B (en) * 2019-05-21 2021-06-18 北京达佳互联信息技术有限公司 Music synthesis method and device, electronic equipment and storage medium
CN113496243A (en) * 2020-04-07 2021-10-12 北京达佳互联信息技术有限公司 Background music obtaining method and related product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193992A (en) * 2010-03-11 2011-09-21 姜胡彬 System and method for generating custom songs
CN107945811A (en) * 2017-10-23 2018-04-20 北京大学 A kind of production towards bandspreading resists network training method and audio coding, coding/decoding method
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN108461079A (en) * 2018-02-02 2018-08-28 福州大学 A kind of song synthetic method towards tone color conversion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271457B (en) * 2007-03-21 2010-09-29 中国科学院自动化研究所 Music retrieval method and device based on rhythm
CN107293289B (en) * 2017-06-13 2020-05-29 南京医科大学 Speech generation method for generating confrontation network based on deep convolution
CN108346433A (en) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN108597496B (en) * 2018-05-07 2020-08-28 广州势必可赢网络科技有限公司 Voice generation method and device based on generation type countermeasure network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN, ZHECHENG: "SVSGAN: SINGING VOICE SEPARATION VIA GENERATIVE ADVERSARIAL NETWORK", 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 13 September 2018 (2018-09-13), XP033401364 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936806A (en) * 2021-09-18 2022-01-14 复旦大学 Brain stimulation response model construction method, response method, device and electronic equipment
CN113936806B (en) * 2021-09-18 2024-03-08 复旦大学 Brain stimulation response model construction method, response method, device and electronic equipment
CN116959393A (en) * 2023-09-18 2023-10-27 腾讯科技(深圳)有限公司 Training data generation method, device, equipment and medium of music generation model
CN116959393B (en) * 2023-09-18 2023-12-22 腾讯科技(深圳)有限公司 Training data generation method, device, equipment and medium of music generation model

Also Published As

Publication number Publication date
CN109346043B (en) 2023-09-19
CN109346043A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
WO2020082574A1 (en) Generative adversarial network-based music generation method and device
US10657934B1 (en) Enhancements for musical composition applications
CN101796587B (en) Automatic accompaniment for vocal melodies
Dittmar et al. Music information retrieval meets music education
CN103959372A (en) System and method for providing audio for a requested note using a render cache
CN104040618A (en) System and method for producing a more harmonious musical accompaniment and for applying a chain of effects to a musical composition
US11521585B2 (en) Method of combining audio signals
US10504498B2 (en) Real-time jamming assistance for groups of musicians
Lerch et al. An interdisciplinary review of music performance analysis
WO2020082573A1 (en) Long-short-term neural network-based multi-part music generation method and device
JP2017058597A (en) Automatic accompaniment data generation device and program
Sabathé et al. Deep recurrent music writer: Memory-enhanced variational autoencoder-based musical score composition and an objective measure
CN112289300B (en) Audio processing method and device, electronic equipment and computer readable storage medium
Hutchings Talking Drums: Generating drum grooves with neural networks
CN112669811B (en) Song processing method and device, electronic equipment and readable storage medium
Nakano et al. Voice drummer: A music notation interface of drum sounds using voice percussion input
Nikolaidis et al. Playing with the masters: A model for improvisatory musical interaction between robots and humans
WO2022153875A1 (en) Information processing system, electronic musical instrument, information processing method, and program
JP2015060200A (en) Musical performance data file adjustment device, method, and program
Duggan Machine annotation of traditional Irish dance music
Nymoen et al. Self-awareness in active music systems
KR20140054810A (en) System and method for producing music recorded, and apparatus applied to the same
Yang et al. Unsupervised Musical Timbre Transfer for Notification Sounds
Tian A cross-cultural analysis of music structure
JP6459162B2 (en) Performance data and audio data synchronization apparatus, method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18937916

Country of ref document: EP

Kind code of ref document: A1