WO2020082574A1 - Method and device for generating music based on a generative adversarial network - Google Patents

Method and device for generating music based on a generative adversarial network

Info

Publication number
WO2020082574A1
WO2020082574A1 (PCT/CN2018/123550, CN2018123550W)
Authority
WO
WIPO (PCT)
Prior art keywords
music
signal
track
preset audio
signals
Prior art date
Application number
PCT/CN2018/123550
Other languages
English (en)
Chinese (zh)
Inventor
王义文
刘奡智
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020082574A1

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G10H1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101: Music composition or musical creation; tools or processes therefor
    • G10H2210/145: Composing rules, e.g. harmonic or musical rules, for use in automatic composition; rule generation algorithms therefor
    • G10H2210/151: Music composition or musical creation; tools or processes therefor using templates, i.e. incomplete musical sections, as a basis for composing

Definitions

  • The present application relates to the field of data processing technology, and in particular to a music generation method and device based on a generative adversarial network.
  • The embodiments of the present application provide a music generation method and device based on a generative adversarial network, to solve the problem that the prior art finds it difficult to generate coordinated polyphonic music across multiple audio tracks.
  • a music generation method based on a generative adversarial network model.
  • The method includes: acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extracting a feature matrix from the music training signal as music training sample data; constructing a generative adversarial network model, and training it with the music training sample data to obtain the trained network parameters of the generative adversarial network model; acquiring a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal and random music signals of multiple preset audio tracks; and inputting the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • A music generation device based on a generative adversarial network.
  • The device includes: a first acquisition unit for acquiring a music training signal, the music training signal including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; an extraction unit for extracting a feature matrix from the music training signal as music training sample data; a construction unit for constructing a generative adversarial network model and training it with the music training sample data to obtain the trained network parameters of the generative adversarial network model; a second acquisition unit for acquiring a random music signal input by the user, the random music signal including at least one of the following: a multi-track polyphonic random music signal and random music signals of multiple preset audio tracks; and a generating unit for inputting the random music signal into the generative adversarial network model, so that
  • the model automatically generates a multi-track polyphonic music signal based on the random music signal and the network parameters.
  • A non-volatile computer storage medium; the storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the above-mentioned music generation method.
  • A computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above-mentioned music generation method when executing the computer program.
  • FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a music generation device based on a generative adversarial network according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present application.
  • The terms first, second, third, etc. may be used to describe the terminals in the embodiments of the present application, but the terminals should not be limited by these terms. These terms are only used to distinguish the terminals from each other.
  • For example, a first acquiring unit may also be referred to as a second acquiring unit, and similarly, a second acquiring unit may also be referred to as a first acquiring unit.
  • The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting".
  • The phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • FIG. 1 is a flowchart of a music generation method based on a generative adversarial network according to an embodiment of the present application. As shown in FIG. 1, the method includes:
  • Step S101: Acquire a music training signal.
  • The music training signal includes a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks.
  • Step S102: Extract a feature matrix from the music training signal as music training sample data.
  • Step S103: Construct a generative adversarial network model, and train it with the music training sample data to obtain the trained network parameters of the generative adversarial network model.
  • Step S104: Acquire a random music signal input by the user.
  • The random music signal includes at least one of the following: a multi-track polyphonic random music signal and random music signals of multiple preset audio tracks.
  • Step S105: Input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • The music training signal is a real music signal collected in advance; for example, first collect MIDI data of 200 renditions of the "Canon in D major" in advance.
  • The music training signals include piano solos, violin solos, cello solos, ensembles, and so on. The multiple preset tracks correspond to different musical instruments, such as piano, strings, percussion, and brass.
  • Extracting the feature matrix from the music training signal includes: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note according to its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
  • The feature matrix can be extracted from the music training signal with a piano-roll editor.
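The note-to-matrix extraction just described can be sketched as a piano-roll construction. This is a minimal illustration, not the patent's exact encoding: the `piano_roll` helper, the 0.25-beat time step, and the binary on/off cells are all assumptions made for the example.

```python
import numpy as np

def piano_roll(notes, time_step=0.25, n_pitches=128):
    """Build a binary piano-roll feature matrix from (start, duration, pitch) notes.

    Rows are time steps, columns are MIDI pitches; a cell is 1 while a note sounds.
    """
    end = max(start + dur for start, dur, _ in notes)
    n_steps = int(np.ceil(end / time_step))
    roll = np.zeros((n_steps, n_pitches), dtype=np.int8)
    for start, dur, pitch in notes:
        i0 = int(start / time_step)
        i1 = int(np.ceil((start + dur) / time_step))
        roll[i0:i1, pitch] = 1
    return roll

# Three staggered notes of a C major chord: (start, duration, MIDI pitch)
notes = [(0.0, 1.0, 60), (0.5, 1.0, 64), (1.0, 1.0, 67)]
roll = piano_roll(notes)
print(roll.shape)  # (8, 128)
```

The per-track rolls would then be stacked into the multi-track feature matrix used as training sample data.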
  • Constructing the generative adversarial network model and training it with the music training sample data to obtain the trained network parameters includes the following steps.
  • The first step is to construct a generative adversarial network model, which includes at least one generator and one discriminator.
  • The generator is used to perform rhythm adjustment on the input real music signals of multiple preset audio tracks and to output the adjusted multi-track polyphonic music signal.
  • The discriminator is used to determine whether an input music signal was output by the generator.
  • GAN is short for Generative Adversarial Network.
  • The two players in the GAN model are a generator (generative model) and a discriminator (discriminative model).
  • the generator captures the distribution of the music training sample data and generates a sample similar to the real signal.
  • The goal is for the generated sample to resemble the real signal as closely as possible.
  • the discriminator is a binary classifier that discriminates the probability that a sample comes from the music training sample data (not the generator's generated data).
  • Common discriminators may include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (Support Vector Machine, SVM), neural networks, etc.
  • Common generators may include, but are not limited to, deep neural network models, hidden Markov models (Hidden Markov Model, HMM), naive Bayes models, Gaussian mixture models, and so on.
  • The second step is to train the generator and the discriminator: specifically, fix the discriminator and adjust the network parameters of the generator; then fix the generator and adjust the network parameters of the discriminator.
  • Through continuous learning, the generator produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator continuously improves its ability to distinguish generated multi-track polyphonic music signals from real ones.
  • Eventually, the multi-track polyphonic music signal produced by the generator is close to the real multi-track polyphonic music signal and successfully "deceives" the discriminator.
  • Such a trained generative adversarial network model can be used to improve the realism of the generated multi-track polyphonic music signal.
  • The specific method of training the generator includes: first, a multi-track polyphonic music signal output by the initial generator from the real music signals of at least two preset audio tracks is input into a pre-trained discriminator, which outputs the probability that the generated signal is real; second, the loss function of the initial generator is determined from this probability and from the feature-matrix similarity between the generated multi-track polyphonic music signal and the real music signals of the at least two preset tracks; finally, the loss function is used to update the network parameters of the initial generator to obtain the generator, for example by backpropagating the loss through the initial generator.
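One generator update of the kind described above can be sketched as follows. This is a toy illustration, not the patent's implementation: the linear generator, the logistic discriminator with frozen illustrative weights, and the squared-difference feature-similarity term are all assumptions, and a numerical gradient stands in for backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Frozen, "pre-trained" discriminator (weights are illustrative assumptions).
w_d, b_d = np.array([0.8, -0.3]), 0.1

def discriminator(x):
    """Probability that feature vector x is a real signal."""
    return sigmoid(w_d @ x + b_d)

# Toy linear generator: maps a 2-D random signal to a 2-D "feature vector".
W_g = np.eye(2) * 0.1
real_features = np.array([1.0, 0.5])   # feature vector of a real track (toy)

def generator_loss(W, z):
    fake = W @ z
    p_real = discriminator(fake)
    # Loss combines (a) how unreal the discriminator finds the sample and
    # (b) feature dissimilarity to the real signal, as the method describes.
    return -np.log(p_real) + 0.5 * np.sum((fake - real_features) ** 2)

# One update of the generator with the discriminator fixed,
# using a central-difference numerical gradient for clarity.
z = np.array([0.7, -0.2])
eps, lr = 1e-5, 0.1
grad = np.zeros_like(W_g)
for i in range(2):
    for j in range(2):
        Wp = W_g.copy(); Wp[i, j] += eps
        Wm = W_g.copy(); Wm[i, j] -= eps
        grad[i, j] = (generator_loss(Wp, z) - generator_loss(Wm, z)) / (2 * eps)

before = generator_loss(W_g, z)
W_g = W_g - lr * grad
after = generator_loss(W_g, z)
print(after < before)  # True: the update step reduced the generator loss
```

In the patent's alternating scheme, an analogous step would then be performed for the discriminator with the generator held fixed.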
  • The above training process is only used to explain how the generator's parameters are adjusted: the initial generator is the model before the parameter adjustment, and the generator is the model after the parameter adjustment.
  • The parameter adjustment process is not limited to a single iteration; it can be repeated many times according to the degree of optimization of the generator and actual needs.
  • The third step is to obtain the network parameters of the trained generative adversarial network model.
  • Method 1: the generative adversarial network model includes one generator and one discriminator, and can be understood as a composer model.
  • The generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks from it; the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • If the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, it outputs them.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • The multi-track polyphonic random signal produced by the composer is shaped by the generator into new music signals for multiple preset tracks; under the discrimination of the discriminator, the generated signals become closer to real signals and the multiple audio tracks become coordinated.
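Method 1 can be sketched as a single generator/discriminator pipeline at inference time. Everything below is a toy stand-in (the tanh generator, the energy-based discriminator, and the 0.5 acceptance threshold are assumptions); it only illustrates how one multi-track random signal flows through one generator gated by one discriminator.

```python
import numpy as np

rng = np.random.default_rng(1)
N_TRACKS, N_STEPS = 4, 16

# Toy generator: one weight matrix per output track (weights are placeholders).
W = rng.normal(0.0, 0.3, (N_TRACKS, N_STEPS, N_STEPS))

def generator(z):
    """Map one multi-track polyphonic random signal to N_TRACKS track signals."""
    return np.tanh(W @ z)          # shape: (N_TRACKS, N_STEPS)

def discriminator(tracks):
    """Toy stand-in: probability that the multi-track signal is 'real'."""
    energy = np.mean(tracks ** 2)
    return 1.0 / (1.0 + np.exp(-(1.0 - energy)))

z = rng.normal(0.0, 1.0, N_STEPS)   # the user's random input signal
tracks = generator(z)
if discriminator(tracks) > 0.5:     # output only signals the discriminator accepts
    print(tracks.shape)             # (4, 16)
```

The accepted per-track signals together form the new multi-track polyphonic output.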
  • Method 2: the generative adversarial network model includes multiple generators and multiple discriminators corresponding to the multiple generators.
  • The generative adversarial network model automatically generates multi-track polyphonic music signals according to the random music signals and the network parameters.
  • Each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that track; each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • If each discriminator determines that the new music signal of its corresponding preset audio track is a real signal, the new music signal of that track is output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a music signal corresponding to an instrument played by one musician, such as a piano, is input into each generator. Each musician plays the same tune but a different instrument; multiple musicians interfere with each other, which easily causes incoordination among the multiple music signals.
  • The random music signal of each musical instrument is adjusted by its corresponding generator into a new music signal for a preset track; under the discrimination of its discriminator, the generated signal becomes closer to a real signal and the multiple audio tracks become coordinated.
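Method 2 pairs each preset track with its own generator and discriminator. The sketch below is a toy illustration of that structure only; `make_generator`, `make_discriminator`, and the track names are assumptions, and each discriminator gates only its own track's output.

```python
import numpy as np

rng = np.random.default_rng(2)
TRACKS, STEPS = ["piano", "violin", "cello"], 16

def make_generator(seed):
    # Each track gets its own (placeholder) generator weights.
    w = np.random.default_rng(seed).normal(0.0, 0.3, (STEPS, STEPS))
    return lambda z: np.tanh(w @ z)

def make_discriminator():
    # Toy per-track discriminator: accepts signals with bounded energy.
    return lambda x: 1.0 / (1.0 + np.exp(-(1.0 - np.mean(x ** 2))))

# One (generator, discriminator) pair per preset audio track.
pairs = {t: (make_generator(i), make_discriminator()) for i, t in enumerate(TRACKS)}

song = {}
for track, (gen, disc) in pairs.items():
    z = rng.normal(0.0, 1.0, STEPS)       # this musician's random signal
    new = gen(z)
    if disc(new) > 0.5:                   # each discriminator gates its own track
        song[track] = new

print(sorted(song))  # ['cello', 'piano', 'violin']
```

The accepted per-track signals are then combined into the brand-new multi-track polyphonic music signal.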
  • Method 3: the generative adversarial network model includes multiple generators and one discriminator.
  • The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • Each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal for that preset audio track from the two.
  • The discriminator judges whether the new music signal of the preset track generated by each generator is a real signal or a generated signal.
  • If the discriminator determines that the new music signals of the preset audio tracks generated by the generators are real signals, it outputs the new music signals of the multiple preset audio tracks, which form a brand-new multi-track polyphonic music signal.
  • For example, the piano music signal in a tune created by a musician and the piano music signal in the same tune produced by a composer are used together as the random music signal of one preset track, and a new music signal of that preset track (piano) is generated under the adjustment of the corresponding generator.
  • The music signal of each instrument is adjusted by its corresponding generator into a new music signal, and all of them are judged by the same discriminator, so that the multi-track polyphonic music signal composed of the generated new track signals is more realistic and the multiple audio tracks are coordinated.
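Method 3 differs from Method 2 in two ways: every per-track generator is judged by one shared discriminator, and each generator conditions on both its own track's random signal and the shared multi-track random signal. The sketch below only illustrates that wiring; all functions, weights, and track names are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
TRACKS, STEPS = ["piano", "strings", "percussion"], 16

def make_generator(seed):
    # Each generator conditions on its own track's random signal AND the
    # shared multi-track polyphonic random signal (concatenated input).
    w = np.random.default_rng(seed).normal(0.0, 0.2, (STEPS, 2 * STEPS))
    return lambda z_track, z_shared: np.tanh(w @ np.concatenate([z_track, z_shared]))

def shared_discriminator(x):
    # One discriminator judges every track (toy energy test).
    return 1.0 / (1.0 + np.exp(-(1.0 - np.mean(x ** 2))))

generators = {t: make_generator(i) for i, t in enumerate(TRACKS)}
z_shared = rng.normal(0.0, 1.0, STEPS)    # the composer's multi-track random signal

song = {}
for track, gen in generators.items():
    z_track = rng.normal(0.0, 1.0, STEPS)  # this track's own random signal
    new = gen(z_track, z_shared)
    if shared_discriminator(new) > 0.5:    # one gate for all tracks
        song[track] = new

print(len(song))  # 3
```

Sharing the discriminator is what lets a single critic enforce coordination across all tracks, while the shared conditioning signal ties the generators to the same underlying tune.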
  • An embodiment of the present application provides a music generation device based on a generative adversarial network.
  • The device is used to execute the above-mentioned music generation method based on a generative adversarial network.
  • The device includes: a first acquisition unit 10, an extraction unit 20, a construction unit 30, a second acquisition unit 40, and a generation unit 50.
  • The first acquisition unit 10 is configured to acquire a music training signal; the music training signal includes a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks.
  • the extraction unit 20 is used to extract a feature matrix from the music training signal as music training sample data
  • The construction unit 30 is used to construct a generative adversarial network model and to train it through the music training sample data to obtain the trained network parameters of the generative adversarial network model.
  • The second acquisition unit 40 is configured to acquire the random music signal input by the user; the random music signal includes at least one of the following: a multi-track polyphonic random music signal and random music signals of multiple preset audio tracks.
  • The generating unit 50 is configured to input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • The music training signal is a real music signal collected in advance; for example, first collect MIDI data of 200 renditions of the "Canon in D major" in advance.
  • The music training signals include piano solos, violin solos, cello solos, ensembles, and so on. The multiple preset tracks correspond to different musical instruments, such as piano, strings, percussion, and brass.
  • The extraction unit 20 includes: an extraction subunit, a composition subunit, a combination subunit, and a first acquisition subunit.
  • The extraction subunit is used to extract the start time, duration, and pitch of each note in each music training signal; the composition subunit is used to determine the feature vector of each note according to its start time, duration, and pitch.
  • The combination subunit is used to combine the feature vectors of the notes to obtain the feature matrix of the music training signal; the first acquisition subunit is used to use the feature matrix of the music training signal as the music training sample data.
  • The feature matrix can be extracted from the music training signal with a piano-roll editor.
  • the construction unit 30 includes a construction subunit, a training subunit, and a second acquisition subunit.
  • The construction subunit is used to construct a generative adversarial network model.
  • The generative adversarial network model includes at least one generator and one discriminator.
  • The generator is used to perform rhythm adjustment on the input real music signals of multiple preset audio tracks and to output the adjusted multi-track polyphonic music signal; the discriminator is used to determine whether an input music signal was output by the generator.
  • GAN is short for Generative Adversarial Network.
  • The two players in the GAN model are a generator (generative model) and a discriminator (discriminative model).
  • the generator captures the distribution of the music training sample data and generates a sample similar to the real signal.
  • The goal is for the generated sample to resemble the real signal as closely as possible.
  • the discriminator is a binary classifier that discriminates the probability that a sample comes from the music training sample data (not the generator's generated data).
  • Common discriminators may include, but are not limited to, linear regression models, linear discriminant analysis, support vector machines (Support Vector Machine, SVM), neural networks, etc.
  • Common generators may include, but are not limited to, deep neural network models, hidden Markov models (Hidden Markov Model, HMM), naive Bayes models, Gaussian mixture models, and so on.
  • The training subunit is used to train the generator and the discriminator: specifically, fixing the discriminator to adjust the network parameters of the generator, and fixing the generator to adjust the network parameters of the discriminator.
  • Through continuous learning, the generator produces increasingly realistic and coordinated multi-track polyphonic music signals, while the discriminator continuously improves its ability to distinguish generated multi-track polyphonic music signals from real ones.
  • Eventually, the multi-track polyphonic music signal produced by the generator is close to the real multi-track polyphonic music signal and successfully "deceives" the discriminator.
  • Such a trained generative adversarial network model can be used to improve the realism of the generated multi-track polyphonic music signal.
  • The specific method of training the generator includes: first, a multi-track polyphonic music signal output by the initial generator from the real music signals of at least two preset audio tracks is input into a pre-trained discriminator, which outputs the probability that the generated signal is real; second, the loss function of the initial generator is determined from this probability and from the feature-matrix similarity between the generated multi-track polyphonic music signal and the real music signals of the at least two preset tracks; finally, the loss function is used to update the network parameters of the initial generator to obtain the generator, for example by backpropagating the loss through the initial generator.
  • The above training process is only used to explain how the generator's parameters are adjusted: the initial generator is the model before the parameter adjustment, and the generator is the model after the parameter adjustment.
  • The parameter adjustment process is not limited to a single iteration; it can be repeated many times according to the degree of optimization of the generator and actual needs.
  • The second acquisition subunit is used to acquire the network parameters of the trained generative adversarial network model.
  • Method 1: the generative adversarial network model includes one generator and one discriminator, and can be understood as a composer model.
  • The generator is used to receive the multi-track polyphonic random music signal and to generate new music signals for multiple preset audio tracks from it; the discriminator is used to judge whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • If the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, it outputs them.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • The multi-track polyphonic random signal produced by the composer is shaped by the generator into new music signals for multiple preset tracks; under the discrimination of the discriminator, the generated signals become closer to real signals and the multiple audio tracks become coordinated.
  • Method 2: the generative adversarial network model includes multiple generators and multiple discriminators corresponding to the multiple generators.
  • The generative adversarial network model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters. Each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that track; each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • If each discriminator determines that the new music signal of its corresponding preset audio track is a real signal, the new music signal of that track is output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • For example, a music signal corresponding to an instrument played by one musician is input into each generator.
  • Each musician plays the same tune but a different instrument; multiple musicians interfere with each other, which easily causes incoordination among the multiple music signals.
  • The random music signal of each musical instrument is adjusted by its corresponding generator into a new music signal for a preset track; under the discrimination of its discriminator, the generated signal becomes closer to a real signal and the multiple audio tracks become coordinated.
  • Method 3: the generative adversarial network model includes multiple generators and one discriminator, and automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • Each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random signal, and generates a new music signal for that preset audio track from the two.
  • The discriminator judges whether the new music signal of the preset track generated by each generator is a real signal or a generated signal.
  • If the discriminator determines that the new music signals of the preset audio tracks generated by the generators are real signals, it outputs the new music signals of the multiple preset audio tracks, which form a brand-new multi-track polyphonic music signal.
  • For example, the piano music signal in a tune created by a musician and the piano music signal in the same tune produced by a composer are used together as the random music signal of one preset track, and a new music signal of that preset track (piano) is generated under the adjustment of the corresponding generator.
  • The music signal of each instrument is adjusted by its corresponding generator into a new music signal, and all of them are judged by the same discriminator, so that the multi-track polyphonic music signal composed of the generated new track signals is more realistic and the multiple audio tracks are coordinated.
  • An embodiment of the present application provides a non-volatile computer storage medium; the storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to perform the following steps:
  • Acquire a music training signal; the music training signal includes a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks. Extract a feature matrix from the music training signal as music training sample data. Construct a generative adversarial network model and train it through the music training sample data to obtain the trained network parameters. Acquire the random music signal input by the user.
  • The random music signal includes at least one of the following: a multi-track polyphonic random music signal and random music signals of multiple preset audio tracks. Input the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • When the program runs, the device on which the storage medium is located also performs the following steps: the generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks from it; the discriminator judges whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals.
  • If the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, it outputs them.
  • The new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • When the program runs, the device on which the storage medium is located also performs the following steps: each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that track; each discriminator judges whether the new music signal of the preset audio track generated by its corresponding generator is a real signal or a generated signal.
  • If each discriminator determines that the new music signal of its corresponding preset audio track is a real signal, the new music signal of that track is output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • When the program runs, the device on which the storage medium is located also performs the following steps: each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random music signal, and generates a new music signal for that preset track from the two; the discriminator judges whether the new music signal of the preset track generated by each generator is a real signal or a generated signal.
  • If the discriminator determines that the new music signals of the preset audio tracks generated by the generators are real signals, it outputs the new music signals of the multiple preset audio tracks, which form a brand-new multi-track polyphonic music signal.
  • when the program runs, the device where the storage medium is located also performs the following steps: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note according to its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
  • an embodiment of the present application provides a computer device 100, including a memory 102, a processor 101, and a computer program 103 stored in the memory 102 and executable on the processor 101.
  • the processor implements the following steps when executing the computer program:
  • the music training signals include real multi-track polyphonic music signals and real music signals for multiple preset audio tracks; a feature matrix is extracted from the music training signals as music training sample data; a generative adversarial network model is constructed and trained with the music training sample data to obtain the network parameters of the trained model; a random music signal input by the user is obtained.
  • the random music signal includes at least one of the following: a multi-track polyphonic random music signal, or random music signals for multiple preset audio tracks; the random music signal is input into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters.
  • the processor also implements the following steps when executing the computer program: the generator receives the multi-track polyphonic random music signal and generates new music signals for multiple preset audio tracks based on it; the discriminator determines whether the new music signals of the multiple preset audio tracks generated by the generator are real signals or generated signals;
  • when the discriminator determines that the new music signals of the multiple preset audio tracks are real signals, the new music signals of the multiple preset audio tracks are output;
  • the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: each generator receives the random music signal corresponding to one preset audio track and generates a new music signal for that preset audio track based on it; each discriminator determines whether the new music signal of the preset audio track generated by the corresponding generator is a real signal or a generated signal;
  • when the discriminators determine that the new music signals corresponding to the preset audio tracks are all real signals, the new music signals of the preset audio tracks are output, and the new music signals of the multiple preset audio tracks form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: each generator receives the random music signal corresponding to one preset audio track together with the multi-track polyphonic random music signal, and generates a new music signal for that preset audio track based on both; the discriminator determines whether the new music signal of the preset audio track generated by each generator is a real signal or a generated signal;
  • when the discriminator determines that the new music signals of the preset audio tracks generated by each generator are real signals, the new music signals of the multiple preset audio tracks are output, and they form a brand-new multi-track polyphonic music signal.
  • the processor also implements the following steps when executing the computer program: extracting the start time, duration, and pitch of each note in each music training signal; determining the feature vector of each note according to its start time, duration, and pitch; combining the feature vectors of the notes to obtain the feature matrix of the music training signal; and using the feature matrix of the music training signal as the music training sample data.
  • terminals involved in the embodiments of the present application may include, but are not limited to, personal computers (PCs), personal digital assistants (PDAs), wireless handheld devices, tablet computers, mobile phones, MP3 players, MP4 players, and the like.
  • the application may be a native application (native app) installed on the terminal, or a web application (web app) running in a browser on the terminal, which is not limited in the embodiments of the present application.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium.
  • the above software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
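The feature extraction described in the embodiments above (start time, duration, and pitch per note, combined into feature vectors and then a feature matrix) can be sketched roughly as follows. The flat note encoding and matrix layout here are illustrative assumptions, not the patent's exact representation:

```python
import numpy as np

def note_feature_vector(start_time, duration, pitch):
    """Encode one note as a feature vector: [start_time, duration, pitch].

    A real system might instead one-hot encode pitch or quantize times to
    a beat grid; this flat encoding is purely illustrative.
    """
    return np.array([start_time, duration, pitch], dtype=np.float32)

def feature_matrix(notes):
    """Stack per-note feature vectors into the feature matrix of one
    music training signal (shape: num_notes x 3)."""
    return np.stack([note_feature_vector(*n) for n in notes])

# Toy training signal: (start time in beats, duration in beats, MIDI pitch)
notes = [(0.0, 1.0, 60), (1.0, 0.5, 64), (1.5, 0.5, 67)]
X = feature_matrix(notes)
print(X.shape)  # (3, 3)
```

The feature matrices of many such signals would then serve as the music training sample data.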
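As a structural sketch of the first scheme above, one generator maps the multi-track polyphonic random music signal to new signals for all preset tracks, and one discriminator gates the output. The layer shapes, the untrained random weights, and the 0.5 decision threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

NOISE_DIM = 16      # length of the multi-track polyphonic random music signal
NUM_TRACKS = 4      # number of preset audio tracks
TRACK_LEN = 32      # samples per generated track signal

# Untrained parameters; adversarial training would adjust these.
G_W = rng.standard_normal((NOISE_DIM, NUM_TRACKS * TRACK_LEN)) * 0.1
D_W = rng.standard_normal(NUM_TRACKS * TRACK_LEN) * 0.1

def generator(z):
    """Map the random music signal to new signals for all preset tracks."""
    return np.tanh(z @ G_W).reshape(NUM_TRACKS, TRACK_LEN)

def discriminator(tracks):
    """Return the probability that the multi-track signal is 'real'."""
    logit = tracks.reshape(-1) @ D_W
    return 1.0 / (1.0 + np.exp(-logit))

z = rng.standard_normal(NOISE_DIM)
tracks = generator(z)
p_real = discriminator(tracks)

# Output the signals only when the discriminator judges them real
# (the threshold is an illustrative choice); together the per-track
# signals form the brand-new multi-track polyphonic music signal.
if p_real > 0.5:
    multi_track_signal = tracks
print(tracks.shape)  # (4, 32)
```

The second and third schemes would instead instantiate one generator (and optionally one discriminator) per preset track, with the third scheme also feeding the shared multi-track random signal to every generator.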

Abstract

Embodiments of the present application relate to a music generation method and device based on a generative adversarial network, in the technical field of artificial intelligence. The method comprises: obtaining music training signals, including a real multi-track polyphonic music signal and real music signals of multiple preset audio tracks; extracting a feature matrix from the music training signals as music training sample data; constructing a generative adversarial network model, training it, and obtaining the network parameters of the trained model; obtaining a random music signal input by a user; and inputting the random music signal into the generative adversarial network model, so that the model automatically generates a multi-track polyphonic music signal according to the random music signal and the network parameters. The technical solution provided by the embodiments of the present application solves the problem in the prior art that polyphonic music with multiple harmonious audio tracks is difficult to generate.
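The training stage summarized in the abstract (train the generative adversarial network on real samples to obtain network parameters) can be illustrated on toy scalar data with hand-derived gradients. The one-parameter networks, learning rate, and data distribution below are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
lr, steps, batch = 0.05, 2000, 64

# Scalar samples around 3.0 stand in for feature-matrix entries
# extracted from real multi-track music signals.
def real_batch():
    return rng.normal(3.0, 0.5, batch)

# Generator G(z) = a*z + b ; Discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0
w, c = 0.1, 0.0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(steps):
    # Discriminator step: ascend E[log D(real)] + E[log(1 - D(fake))].
    xr = real_batch()
    z = rng.standard_normal(batch)
    xf = a * z + b
    sr, sf = sigmoid(w * xr + c), sigmoid(w * xf + c)
    w += lr * (np.mean((1 - sr) * xr) + np.mean(-sf * xf))
    c += lr * (np.mean(1 - sr) + np.mean(-sf))

    # Generator step: ascend E[log D(fake)] (non-saturating loss).
    z = rng.standard_normal(batch)
    xf = a * z + b
    sf = sigmoid(w * xf + c)
    dxf = (1 - sf) * w          # d log D(xf) / d xf
    a += lr * np.mean(dxf * z)
    b += lr * np.mean(dxf)

# The trained (a, b, w, c) play the role of the "network parameters"
# obtained from training; sampling the generator afterwards produces
# new data from the learned distribution.
samples = a * rng.standard_normal(1000) + b
print(float(samples.mean()))
```

In the patented method, the same alternating updates would operate on feature matrices of multi-track music rather than scalars.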
PCT/CN2018/123550 2018-10-26 2018-12-25 Method and device for music generation based on a generative adversarial network WO2020082574A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811257179.3A CN109346043B (zh) 2018-10-26 2018-10-26 Music generation method and device based on a generative adversarial network
CN201811257179.3 2018-10-26

Publications (1)

Publication Number Publication Date
WO2020082574A1 true WO2020082574A1 (fr) 2020-04-30

Family

ID=65312008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123550 WO2020082574A1 (fr) 2018-10-26 2018-12-25 Method and device for music generation based on a generative adversarial network

Country Status (2)

Country Link
CN (1) CN109346043B (fr)
WO (1) WO2020082574A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936806A (zh) * 2021-09-18 2022-01-14 复旦大学 Brain stimulation response model construction method, response method, device, and electronic device
CN116959393A (zh) * 2023-09-18 2023-10-27 腾讯科技(深圳)有限公司 Method, device, equipment and medium for generating training data for a music generation model

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN110085202B (zh) * 2019-03-19 2022-03-15 北京卡路里信息技术有限公司 Music generation method, device, storage medium, and processor
CN110288965B (zh) * 2019-05-21 2021-06-18 北京达佳互联信息技术有限公司 Music synthesis method, device, electronic device, and storage medium
CN113496243A (zh) * 2020-04-07 2021-10-12 北京达佳互联信息技术有限公司 Background music acquisition method and related products

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102193992A (zh) * 2010-03-11 2011-09-21 姜胡彬 System and method for generating customized songs
CN107945811A (zh) * 2017-10-23 2018-04-20 北京大学 Generative adversarial network training method for bandwidth extension, and audio encoding and decoding methods
CN108334497A (zh) * 2018-02-06 2018-07-27 北京航空航天大学 Method and device for automatically generating text
CN108461079A (zh) * 2018-02-02 2018-08-28 福州大学 Singing voice synthesis method oriented to timbre conversion

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN101271457B (zh) * 2007-03-21 2010-09-29 中国科学院自动化研究所 Melody-based music retrieval method and device
CN107293289B (zh) * 2017-06-13 2020-05-29 南京医科大学 Speech generation method based on a deep convolutional generative adversarial network
CN108346433A (zh) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 Audio processing method, device, equipment, and readable storage medium
CN108597496B (zh) * 2018-05-07 2020-08-28 广州势必可赢网络科技有限公司 Speech generation method and device based on a generative adversarial network

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN102193992A (zh) * 2010-03-11 2011-09-21 姜胡彬 System and method for generating customized songs
CN107945811A (zh) * 2017-10-23 2018-04-20 北京大学 Generative adversarial network training method for bandwidth extension, and audio encoding and decoding methods
CN108461079A (zh) * 2018-02-02 2018-08-28 福州大学 Singing voice synthesis method oriented to timbre conversion
CN108334497A (zh) * 2018-02-06 2018-07-27 北京航空航天大学 Method and device for automatically generating text

Non-Patent Citations (1)

Title
FAN, ZHECHENG: "SVSGAN: SINGING VOICE SEPARATION VIA GENERATIVE ADVERSARIAL NETWORK", 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 13 September 2018 (2018-09-13), XP033401364 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113936806A (zh) * 2021-09-18 2022-01-14 复旦大学 Brain stimulation response model construction method, response method, device, and electronic device
CN113936806B (zh) * 2021-09-18 2024-03-08 复旦大学 Brain stimulation response model construction method, response method, device, and electronic device
CN116959393A (zh) * 2023-09-18 2023-10-27 腾讯科技(深圳)有限公司 Method, device, equipment and medium for generating training data for a music generation model
CN116959393B (zh) * 2023-09-18 2023-12-22 腾讯科技(深圳)有限公司 Method, device, equipment and medium for generating training data for a music generation model

Also Published As

Publication number Publication date
CN109346043B (zh) 2023-09-19
CN109346043A (zh) 2019-02-15

Similar Documents

Publication Publication Date Title
WO2020082574A1 (fr) Method and device for music generation based on a generative adversarial network
US10657934B1 (en) Enhancements for musical composition applications
CN101796587B (zh) Automatic accompaniment for a vocal melody
Dittmar et al. Music information retrieval meets music education
CN103959372A (zh) System and method for providing audio for requested notes using a rendering cache
CN104040618A (zh) System and method for producing a more harmonious musical accompaniment and for applying a chain of effects to a musical composition
US11521585B2 (en) Method of combining audio signals
CN102576524A (zh) System and method for receiving, analyzing and editing audio to create musical compositions
Lerch et al. An interdisciplinary review of music performance analysis
WO2020082573A1 (fr) Method and device for generating multi-part music based on a long short-term memory neural network
JP2017058597A (ja) Automatic accompaniment data generation device and program
CN112289300B (zh) Audio processing method and device, electronic device, and computer-readable storage medium
Hutchings Talking Drums: Generating drum grooves with neural networks
CN112669811B (zh) Song processing method and device, electronic device, and readable storage medium
Nikolaidis et al. Playing with the masters: A model for improvisatory musical interaction between robots and humans
Nakano et al. Voice drummer: A music notation interface of drum sounds using voice percussion input
WO2022153875A1 (fr) Information processing system, electronic musical instrument, information processing method, and program
JP2015060200A (ja) Performance data file adjustment device, method, and program
JP6459162B2 (ja) Device, method, and program for synchronizing performance data and audio data
Duggan Machine annotation of traditional Irish dance music
Nymoen et al. Self-awareness in active music systems
KR20140054810A (ko) Accompaniment music production service system and method, and device applied thereto
Yang et al. Unsupervised Musical Timbre Transfer for Notification Sounds
Tian A cross-cultural analysis of music structure
WO2022172732A1 (fr) Information processing system, electronic musical instrument, information processing method, and machine learning system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18937916

Country of ref document: EP

Kind code of ref document: A1