CN113096621A - Music generation method, device and equipment based on specific style and storage medium - Google Patents


Info

Publication number
CN113096621A
Authority
CN
China
Prior art keywords
data
preset
performance
melody
encoder
Prior art date
Legal status
Pending
Application number
CN202110322904.6A
Other languages
Chinese (zh)
Inventor
刘奡智
韩宝强
肖京
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110322904.6A
Publication of CN113096621A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H 2210/111 Automatic composing, i.e. using predefined musical rules

Abstract

The invention relates to the field of artificial intelligence and discloses a music generation method, device, equipment and storage medium based on a specific style, which are used for generating musical compositions according to the specific style, thereby improving the efficiency of music generation and the controllability of the generated compositions. The music generation method based on a specific style comprises the following steps: acquiring original data; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and correcting errors in the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. In addition, the invention also relates to blockchain technology: the generated musical composition can be stored in blockchain nodes.

Description

Music generation method, device and equipment based on specific style and storage medium
Technical Field
The present invention relates to the field of audio conversion, and in particular, to a method, an apparatus, a device, and a storage medium for generating music based on a specific style.
Background
With the development of deep learning, music generation models and their variants have become particularly important in music generation. In the field of automatic music generation, a Transformer model can generate a work exceeding one minute in duration within a short time, and such models are widely applied to language models and translation tasks.
However, the existing music generation model has great limitations, the efficiency of music generation is low, and the style of the generated musical composition is not controllable.
Disclosure of Invention
The invention provides a music generation method, a device, equipment and a storage medium based on a specific style, which are used for generating musical compositions according to the specific style, and improving the generation efficiency of music and the controllability of the musical compositions.
The invention provides a music generation method based on a specific style in a first aspect, which comprises the following steps: acquiring original data, wherein the original data comprises a Musical Instrument Digital Interface (MIDI) file of a piano performance and audio data of a piano performance; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and correcting errors in the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the first aspect of the present invention, the marking the original data to generate intermediate data, where the intermediate data includes a plurality of events includes: marking the original data based on the note starting time and the note ending time to generate first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events; marking the original data based on a preset time increment value to generate second marking data, wherein the second marking data comprises a preset number of time-shifting events; marking the original data based on a preset quantization speed to generate third marking data, wherein the third marking data comprises a preset number of note playing speed events; and combining the first marking data, the second marking data and the third marking data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
Optionally, in a second implementation manner of the first aspect of the present invention, the inputting the intermediate data into a preset performance encoder and a preset melody encoder, and the generating the encoded data based on a relative attention mechanism and a feedforward neural network includes: performing feature extraction on the intermediate data to generate performance input data and melody input data; inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting to a feedforward neural network to obtain performance encoding data; inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting to a feedforward neural network to obtain melody encoded data; generating encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the first aspect of the present invention, the inputting the performance input data into a preset performance encoder, passing it through a multi-head relative attention layer of the preset performance encoder, and transmitting it to a feedforward neural network to obtain the performance encoded data includes: inputting the performance input data into a first-layer stack of the preset performance encoder, passing it through the multi-head relative attention layer of the first-layer stack, and transmitting it to the feedforward neural network of the first-layer stack to generate a first performance segment; inputting the first performance segment into a second-layer stack of the preset performance encoder, iterating a preset number of times, and generating a performance time segment based on the data output by the feedforward neural network of the last-layer stack; and compressing the performance time segment to generate the performance encoded data.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the correcting errors in the decoded data based on a preset adjustment mechanism to obtain target data and generating a final musical piece according to the target data, where the adjustment mechanism includes melody adjustment, performance adjustment and input interference, includes: correcting the target data based on a preset melody and performance mechanism, deleting abnormal data, and generating first adjustment data; and performing noise reduction processing according to the first adjustment data, reducing input interference, and generating the final musical composition.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after the errors in the decoded data are corrected based on the preset adjustment mechanism to obtain the target data and the final musical piece is generated according to the target data, the method further includes: performing a similarity evaluation on performance characteristics.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing similar evaluation on the performance characteristics includes: acquiring two musical works to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average pitch change, integral pitch change, average speed, speed change, average duration and duration change; respectively generating a plurality of evaluation index histograms of the two musical pieces based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; generating a normal distribution graph according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution graphs; and calculating the overlapping area of the normal distribution diagram of the evaluation index corresponding to the two musical works, and evaluating the similarity based on the overlapping area.
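The overlap-area comparison described above can be sketched as follows for a single evaluation index; the numerical grid and the example means and variances are illustrative assumptions, not values from the patent:

```python
import numpy as np

def normal_pdf(x, mean, std):
    # Density of a normal distribution fitted to one evaluation index.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def overlap_area(mean_a, std_a, mean_b, std_b):
    # Overlap = integral of the pointwise minimum of the two density curves,
    # approximated with a midpoint Riemann sum over a +/- 5 sigma grid.
    lo = min(mean_a - 5 * std_a, mean_b - 5 * std_b)
    hi = max(mean_a + 5 * std_a, mean_b + 5 * std_b)
    n = 20000
    dx = (hi - lo) / n
    x = lo + dx * (np.arange(n) + 0.5)
    return float(np.minimum(normal_pdf(x, mean_a, std_a),
                            normal_pdf(x, mean_b, std_b)).sum() * dx)

# Identical index distributions overlap almost completely;
# well-separated ones barely overlap.
same = overlap_area(4.0, 1.0, 4.0, 1.0)
far = overlap_area(4.0, 1.0, 12.0, 1.0)
```

A higher overlap area thus indicates that the two pieces are more similar with respect to that evaluation index; repeating the computation per index yields the groups of overlap areas described above.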
A second aspect of the present invention provides a music generating apparatus based on a specific genre, comprising: an acquisition module, configured to acquire original data, where the original data includes a Musical Instrument Digital Interface (MIDI) file of a piano performance and audio data of a piano performance; a marking module, configured to mark the original data to generate intermediate data, where the intermediate data includes a plurality of events; an encoding module, configured to input the intermediate data into a preset performance encoder and a preset melody encoder and generate encoded data based on a relative attention mechanism and a feedforward neural network; a decoding module, configured to input the encoded data into a preset decoder to generate decoded data; and an adjusting module, configured to correct errors in the decoded data based on a preset adjustment mechanism to obtain target data and generate a final musical composition, where the adjustment mechanism includes melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the second aspect of the present invention, the marking module includes: a first marking unit, configured to mark the original data based on a note start time and a note end time to generate first mark data, where the first mark data includes a preset number of note-on events and a preset number of note-off events; a second marking unit, configured to mark the original data based on a preset time increment value to generate second marking data, where the second marking data includes a preset number of time-shift events; a third marking unit for marking the original data based on a preset quantization speed to generate third marking data, wherein the third marking data comprises a preset number of note playing speed events; a merging unit, configured to merge the first tag data, the second tag data, and the third tag data to generate intermediate data, where the intermediate data includes a plurality of events.
Optionally, in a second implementation manner of the second aspect of the present invention, the encoding module includes: a feature extraction unit for performing feature extraction on the intermediate data to generate performance input data and melody input data; the first input unit is used for inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder and transmitting the performance input data to a feedforward neural network to obtain performance encoded data; the second input unit is used for inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder and transmitting the melody input data to a feedforward neural network to obtain melody encoded data; a first generation unit configured to generate encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the second aspect of the present invention, the first input unit is specifically configured to: input the performance input data into a first-layer stack of the preset performance encoder, pass it through the multi-head relative attention layer of the first-layer stack, and transmit it to the feedforward neural network of the first-layer stack to generate a first performance segment; input the first performance segment into a second-layer stack of the preset performance encoder, iterate a preset number of times, and generate a performance time segment based on the data output by the feedforward neural network of the last-layer stack; and compress the performance time segment to generate the performance encoded data.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the adjusting module includes: a correcting unit, configured to correct the target data based on a preset melody and performance mechanism, delete abnormal data, and generate first adjustment data; and a second generating unit, configured to perform noise reduction processing according to the first adjustment data, reduce input interference, and generate the final musical composition.
Optionally, in a fifth implementation manner of the second aspect of the present invention, after the errors in the decoded data are corrected based on the preset adjustment mechanism to obtain the target data and the final musical piece is generated according to the target data, the apparatus further includes: a similarity evaluation module.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the similarity evaluation module includes: the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for acquiring two musical works to be evaluated and determining evaluation indexes, and the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change; the calculation unit is used for respectively generating a plurality of evaluation index histograms of the two musical pieces based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; the third generating unit is used for generating a normal distribution graph according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution graphs; and the evaluation unit is used for calculating the overlapping area of the normal distribution diagram of the evaluation index corresponding to the two musical compositions and evaluating the similarity based on the overlapping area.
A third aspect of the present invention provides a music generating apparatus based on a specific genre, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the particular genre-based music generation device to perform the particular genre-based music generation method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-described specific-style-based music generation method.
In the technical solution provided by the invention, original data are obtained, where the original data include Musical Instrument Digital Interface (MIDI) files of piano performances and audio data of piano performances; the original data are marked to generate intermediate data, where the intermediate data include a plurality of events; the intermediate data are input into a preset performance encoder and a preset melody encoder, and encoded data are generated based on a relative attention mechanism and a feedforward neural network; the encoded data are input into a preset decoder to generate decoded data; and errors in the decoded data are corrected based on a preset adjustment mechanism to obtain target data and generate a final musical composition, where the adjustment mechanism includes melody adjustment, performance adjustment and input interference. In the embodiment of the invention, the musical composition is generated according to a specific style, so that the efficiency of music generation and the controllability of the musical composition are improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a music generation method based on a specific genre according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a music generation method based on a specific genre according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a music generating apparatus based on a specific genre according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a music generating apparatus based on a specific genre according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of a music generating device based on a specific genre according to the embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a music generation method, a device, equipment and a storage medium based on a specific style, which are used for generating musical compositions according to the specific style, and improving the generation efficiency of music and the controllability of the musical compositions.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of an embodiment of the present invention is described below, and referring to fig. 1, an embodiment of a music generation method based on a specific genre in an embodiment of the present invention includes:
101. raw data including a MIDI file of a musical instrument digital interface played by a piano and audio data played by the piano are acquired.
The server acquires the raw data, including MIDI files of piano performances and audio data of piano performances. The raw data were collected from related music websites and include 5,000 classical piano performance MIDI files and up to 20,000 hours of audio collected from piano performances.
It is to be understood that the executing subject of the present invention may be a music generating apparatus based on a specific genre, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
102. And marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
The server marks the original data and generates intermediate data, wherein the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note starting time and the note ending time to generate first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value to generate second marked data, wherein the second marked data comprise a preset number of time-shifting events; the server marks the original data based on a preset quantization speed to generate third mark data, wherein the third mark data comprise a preset number of note playing speed events; the server combines the first mark data, the second mark data and the third mark data to generate intermediate data, and the intermediate data comprises a plurality of events.
The server represents the raw data as a series of discrete markers: 88 note-on events, 88 note-off events, 100 time-shift events covering 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events of the 88 notes, which express the playing velocity of each note.
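The event vocabulary above (88 + 88 + 100 + 16 = 292 markers) can be sketched as follows; the function names and the flat token layout are illustrative assumptions, as the patent does not specify an ordering:

```python
# A minimal sketch of the discrete event vocabulary described above.
# The helper names and the token ordering are illustrative assumptions.

NUM_PITCHES = 88          # piano keys -> 88 note-on + 88 note-off events
NUM_TIME_SHIFTS = 100     # 10 ms .. 1 s in 10 ms increments
NUM_VELOCITY_BINS = 16    # quantized note-play velocities

VOCAB_SIZE = 2 * NUM_PITCHES + NUM_TIME_SHIFTS + NUM_VELOCITY_BINS  # 292

def note_on(pitch_index):              # pitch_index in 0..87
    return pitch_index

def note_off(pitch_index):
    return NUM_PITCHES + pitch_index

def time_shift(seconds):               # quantized to the nearest 10 ms
    steps = min(max(round(seconds / 0.01), 1), NUM_TIME_SHIFTS)
    return 2 * NUM_PITCHES + (steps - 1)

def velocity(midi_velocity):           # MIDI velocity 0..127 -> 16 bins
    bin_index = min(midi_velocity * NUM_VELOCITY_BINS // 128,
                    NUM_VELOCITY_BINS - 1)
    return 2 * NUM_PITCHES + NUM_TIME_SHIFTS + bin_index

# Example: the 40th key played at velocity 64 and held for 0.5 s.
tokens = [velocity(64), note_on(39), time_shift(0.5), note_off(39)]
```

A MIDI performance then becomes one flat sequence of such integer markers, which is the intermediate-data form the encoders below consume.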
103. The intermediate data is inputted to a preset performance encoder and a preset melody encoder, and the encoded data is generated based on a relative attention mechanism and a feedforward neural network.
The server inputs the intermediate data into a preset performance encoder and a preset melody encoder, and generates encoded data based on a relative attention mechanism and a feedforward neural network. An encoder compiles and converts signals or data into a form suitable for communication, transmission and storage; when audio data is input, the encoder converts the audio into a data format that can be stored on a computer. The performance encoder takes the performance part as input, and the melody encoder takes the melody part as input; each generates a corresponding time segment after encoding. Each encoder comprises a stack of 6 layers (a stack is a data structure whose items are arranged in sequence and can be inserted and deleted only at one end), and each layer comprises a multi-head relative attention layer and a feedforward neural network layer. Data passes through the relative attention layer of the first layer, then through that layer's feedforward neural network, and is transmitted to the relative attention layer of the next layer; this processing is iterated in sequence until the feedforward neural network of the last layer outputs the encoded data.
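A toy sketch of such a layer stack, with single-head standard attention standing in for the multi-head relative attention named in the patent, random stand-in weights, and layer normalization omitted for brevity:

```python
import numpy as np

# Each of the 6 layers applies a self-attention sublayer followed by a
# feed-forward sublayer, with residual connections. Single-head scaled
# dot-product attention is an assumption standing in for multi-head
# *relative* attention; all weights are random illustrative stand-ins.

rng = np.random.default_rng(0)
D = 16                      # model dimension (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(D))
    return scores @ v

def feed_forward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2      # ReLU MLP

def encoder(x, num_layers=6):
    for _ in range(num_layers):
        p = [rng.standard_normal((D, D)) * 0.1 for _ in range(5)]
        x = x + attention(x, *p[:3])         # attention sublayer + residual
        x = x + feed_forward(x, p[3], p[4])  # feed-forward sublayer + residual
    return x

seq = rng.standard_normal((10, D))           # 10 events, D-dim embeddings
out = encoder(seq)
```

The performance encoder and melody encoder would each run this kind of stack over their own input sequence before their outputs are combined.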
104. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoders and comprises a stack of 6 layers, each layer comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder receives the output of the encoders and also generates new markers. To ensure the accuracy of the newly generated markers, end-to-end model training may be performed. Given a sequence x of length n, the following formula is obtained:

p(x) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1}; θ)

where θ is the model parameter.
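As a numerical illustration of this chain-rule factorization (the conditional probabilities below are made-up stand-ins for a trained model's outputs, not values from the patent):

```python
import math

# Each token receives a conditional probability given its prefix; the
# sequence log-likelihood is the sum of the conditional log-probabilities,
# and exponentiating recovers the product-form sequence probability.

cond_probs = [0.5, 0.8, 0.9, 0.6]   # p(x_i | x_<i; theta) for 4 tokens

log_likelihood = sum(math.log(p) for p in cond_probs)
sequence_prob = math.exp(log_likelihood)   # equals the product of the terms
```

End-to-end training maximizes this log-likelihood over the training sequences.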
105. And correcting errors of the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
The server corrects errors in the decoded data based on a preset adjustment mechanism to obtain target data, and generates a final musical composition according to the target data; the adjustment mechanism includes melody adjustment, performance adjustment and input interference. Specifically, the server corrects the target data based on a preset melody and performance mechanism, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction processing according to the first adjustment data, reducing input interference, and generates the final musical composition.
The abnormal data include garbled characters caused by character-encoding problems, truncated characters, abnormal numerical values, and the like. By setting the preset melody and performance mechanisms, the data are audited and filtered and the abnormal data are deleted, which improves the quality of the work. The noise reduction processing is mainly based on the Fourier transform: signals in the time domain are converted into signals in the frequency domain, and the corresponding noise reduction is then carried out.
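A minimal sketch of this frequency-domain noise reduction, assuming a simple magnitude-threshold rule (the patent does not specify the exact suppression rule):

```python
import numpy as np

# Transform to the frequency domain, suppress low-magnitude components
# (treated as noise), and transform back. The keep-top-10%-of-magnitudes
# rule is an illustrative assumption.

def fft_denoise(signal, keep_ratio=0.1):
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    threshold = np.quantile(magnitudes, 1.0 - keep_ratio)
    spectrum[magnitudes < threshold] = 0.0    # zero out weak components
    return np.fft.irfft(spectrum, n=len(signal))

t = np.linspace(0, 1, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)                       # 5 Hz tone
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(1000)
denoised = fft_denoise(noisy)
```

Because the musical content concentrates its energy in a few strong frequency components, thresholding the spectrum removes most of the broadband noise while preserving the signal.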
In the embodiment of the invention, the musical composition is generated according to the specific style, so that the generation efficiency of the music and the controllability of the musical composition are improved.
Referring to fig. 2, another embodiment of the music generation method based on a specific genre according to the embodiment of the present invention includes:
201. raw data including a MIDI file of a musical instrument digital interface played by a piano and audio data played by the piano are acquired.
The server acquires the raw data, including MIDI files of piano performances and audio data of piano performances. The raw data were collected from related music websites and include 5,000 classical piano performance MIDI files and up to 20,000 hours of audio collected from piano performances.
It is to be understood that the executing subject of the present invention may be a music generating apparatus based on a specific genre, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
202. And marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
The server marks the original data and generates intermediate data, wherein the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note starting time and the note ending time to generate first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value to generate second marked data, wherein the second marked data comprise a preset number of time-shifting events; the server marks the original data based on a preset quantization speed to generate third mark data, wherein the third mark data comprise a preset number of note playing speed events; the server combines the first mark data, the second mark data and the third mark data to generate intermediate data, and the intermediate data comprises a plurality of events.
The server represents the raw data as a series of discrete markers: 88 note-on events, 88 note-off events, 100 time-shift events covering 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events of the 88 notes, which express the playing velocity of each note.
203. And performing feature extraction on the intermediate data to generate performance input data and melody input data.
The server performs feature extraction on the intermediate data to generate performance input data and melody input data. The feature extraction is mainly based on the Principal Component Analysis (PCA) algorithm, which uses the idea of dimensionality reduction to convert the data into several comprehensive index features. It mainly comprises the following steps: standardizing the original data; calculating the correlation coefficient matrix; calculating the eigenvalues and eigenvectors of the correlation coefficient matrix to obtain new index scalars; calculating the information contribution rate and cumulative contribution rate of the eigenvalues; selecting principal components according to a preset rule; and finally generating the performance input data and the melody input data according to the preset rule.
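The PCA steps above can be sketched as follows; the 90% cumulative-contribution threshold and the random example data are illustrative assumptions, not the patent's preset rule:

```python
import numpy as np

# Standardize, form the correlation matrix, eigendecompose, and keep
# enough components to reach a cumulative contribution rate.

def pca_features(data, target_contribution=0.90):
    z = (data - data.mean(axis=0)) / data.std(axis=0)    # standardize
    corr = np.corrcoef(z, rowvar=False)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                    # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()               # contribution rate
    cumulative = np.cumsum(contribution)                 # cumulative rate
    k = int(np.searchsorted(cumulative, target_contribution)) + 1
    return z @ eigvecs[:, :k]                            # component features

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 8))    # 200 samples, 8 raw index features
features = pca_features(x)
```

The resulting component features would then be split into the performance and melody input streams according to the preset rule.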
204. And inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting to a feedforward neural network to obtain performance encoding data.
The server inputs the performance input data into a preset performance encoder; the data passes through the multi-head relative attention layer of the preset performance encoder and is transmitted to the feedforward neural network to obtain the performance encoded data. Specifically, the server inputs the performance input data into the first-layer stack of the preset performance encoder; the data passes through the multi-head relative attention layer of the first-layer stack and is transmitted to the feedforward neural network of the first-layer stack to generate a first performance segment. The server then inputs the first performance segment into the second-layer stack of the preset performance encoder, iterates a preset number of times, and generates a performance time segment based on the data output by the feedforward neural network of the last-layer stack. Finally, the server compresses the performance time segment to generate the performance encoded data.
205. Inputting the melody input data into a preset melody encoder, passing it through the multi-head relative attention layer of the preset melody encoder, and transmitting it to the feedforward neural network to obtain melody encoded data.
The server inputs the melody input data into a preset melody encoder; the data passes through the multi-head relative attention layer of the preset melody encoder and is transmitted to the feedforward neural network, yielding the melody encoded data. Specifically, the server inputs the melody input data into the first layer stack of the preset melody encoder, where it passes through the multi-head relative attention layer of the first layer stack and is transmitted to the feedforward neural network of the first layer stack, generating a first melody segment. The server then inputs the first melody segment into the second layer stack of the preset melody encoder and iterates a preset number of times, generating a melody time segment from the data output by the feedforward neural network of the last layer stack. Finally, the server compresses the melody time segment to generate the melody encoded data.
206. The encoded data is generated based on the performance encoded data and the melody encoded data.
The server generates the encoded data based on the performance encoded data and the melody encoded data. The server combines the performance coded data and the melody coded data to generate final coded data.
207. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoder, comprising 6 layer stacks, each stack comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder receives the output of the encoder and also generates new tokens. To ensure the accuracy of the newly generated tokens, end-to-end model training may be performed: given a sequence x of length n, the model is trained to maximize the autoregressive likelihood

p(x) = ∏_{t=1}^{n} p(x_t | x_1, …, x_{t-1}; θ)

where θ denotes the model parameters.
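The end-to-end training objective can be illustrated with a toy negative log-likelihood computation; the per-step token distributions here are a hypothetical stand-in for the decoder's softmax outputs:

```python
import numpy as np

def sequence_nll(probs, tokens):
    """Negative log-likelihood of a token sequence:
    -sum_t log p(x_t | x_<t; theta), where probs[t] is the model's
    predicted distribution over the vocabulary at step t."""
    return -sum(np.log(probs[t][tok]) for t, tok in enumerate(tokens))

# Toy decoder output: 3 steps over a 4-token vocabulary (each row sums to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.6, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
tokens = [0, 1, 3]
loss = sequence_nll(probs, tokens)
```

Training minimizes this loss over θ, which is equivalent to maximizing the autoregressive likelihood of the observed sequences.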
208. Correcting errors in the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, where the adjusting mechanism includes melody adjustment, performance adjustment and input interference.
The server corrects errors in the decoded data based on a preset adjusting mechanism to obtain target data, and generates the final musical composition according to the target data, where the adjusting mechanism includes melody adjustment, performance adjustment and input interference. Specifically, the server corrects the decoded data based on the preset melody and performance mechanisms, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction on the first adjustment data to reduce input interference and generates the final musical composition.
The abnormal data includes garbled characters caused by character-encoding problems, truncated strings, abnormal numerical values, and the like. By setting the preset melody and performance mechanisms, the data is audited and filtered and the abnormal data is deleted, which improves the quality of the work. The noise reduction is mainly based on the Fourier transform, which converts the time-domain signal into a frequency-domain signal, where the corresponding noise reduction is performed.
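The Fourier-based noise reduction can be sketched as a simple frequency-domain magnitude-threshold filter; the threshold ratio and the test signal are assumed for illustration:

```python
import numpy as np

def denoise(signal, keep_ratio=0.1):
    """Transform to the frequency domain, zero out small coefficients
    (treated as noise), and transform back to the time domain."""
    spectrum = np.fft.rfft(signal)
    threshold = np.max(np.abs(spectrum)) * keep_ratio
    spectrum[np.abs(spectrum) < threshold] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# A 5 Hz tone corrupted by additive white noise
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.1 * np.random.default_rng(1).standard_normal(256)
restored = denoise(noisy)
```

The single strong tone coefficient survives the threshold while the spread-out noise coefficients are zeroed, so the restored signal is closer to the clean one than the noisy input.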
In the embodiment of the invention, the musical composition is generated according to the specific style, so that the generation efficiency of the music and the controllability of the musical composition are improved.
With reference to fig. 3, the method for generating music based on a specific genre in the embodiment of the present invention is described above, and a music generating apparatus based on a specific genre in the embodiment of the present invention is described below, where an embodiment of a music generating apparatus based on a specific genre in the embodiment of the present invention includes:
an acquiring module 301, configured to acquire original data, where the original data includes a Musical Instrument Digital Interface (MIDI) file of a piano performance and audio data of a piano performance;
a marking module 302, configured to mark original data to generate intermediate data, where the intermediate data includes multiple events;
an encoding module 303, configured to input the intermediate data into a preset performance encoder and a preset melody encoder, and generate encoded data based on a relative attention mechanism and a feed-forward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
an adjusting module 305, configured to perform error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generate a final musical composition according to the target data, where the adjusting mechanism includes melody adjustment, performance adjustment and input interference.
In the embodiment of the invention, the musical composition is generated according to the specific style, so that the generation efficiency of the music and the controllability of the musical composition are improved.
Referring to fig. 4, another embodiment of the music generating apparatus based on a specific genre according to the embodiment of the present invention includes:
an acquiring module 301, configured to acquire original data, where the original data includes a Musical Instrument Digital Interface (MIDI) file of a piano performance and audio data of a piano performance;
a marking module 302, configured to mark original data to generate intermediate data, where the intermediate data includes multiple events;
an encoding module 303, configured to input the intermediate data into a preset performance encoder and a preset melody encoder, and generate encoded data based on a relative attention mechanism and a feed-forward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
an adjusting module 305, configured to perform error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generate a final musical composition according to the target data, where the adjusting mechanism includes melody adjustment, performance adjustment and input interference.
Optionally, the marking module 302 includes:
a first marking unit 3021 for marking the original data based on the note-on time and the note-off time to generate first marking data, the first marking data including a preset number of note-on events and a preset number of note-off events;
a second marking unit 3022, configured to mark the original data based on a preset time increment value, and generate second mark data, where the second mark data includes a preset number of time-shift events;
a third marking unit 3023 for marking the original data based on a preset quantization speed to generate third marking data, the third marking data including a preset number of note-playing speed events;
a merging unit 3024, configured to merge the first marker data, the second marker data, and the third marker data to generate intermediate data, where the intermediate data includes a plurality of events.
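The three marking passes above (note-on/note-off events, time-shift events, and velocity events) resemble a common MIDI-like event vocabulary; this sketch assumes a 10 ms time step and 32 velocity bins, which are illustrative rather than values from the patent:

```python
def mark_note(pitch, velocity, start, end, time_step=0.01, velocity_bins=32):
    """Turn one note into the event tokens described above:
    a quantized-velocity event, a note-on, a time-shift, and a note-off."""
    events = []
    # Third marking pass: quantize MIDI velocity (0-127) into coarse bins
    events.append(("SET_VELOCITY", velocity * velocity_bins // 128))
    # First marking pass: note-on at the note's starting time
    events.append(("NOTE_ON", pitch))
    # Second marking pass: advance time by a preset increment value
    shift = round((end - start) / time_step)
    events.append(("TIME_SHIFT", shift))
    # First marking pass: note-off at the note's ending time
    events.append(("NOTE_OFF", pitch))
    return events

# Middle C held for half a second at velocity 80
tokens = mark_note(pitch=60, velocity=80, start=0.0, end=0.5)
```

The merging unit 3024 would concatenate such per-note token lists (interleaved by time) into the intermediate event stream.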
Optionally, the encoding module 303 includes:
a feature extraction unit 3031 for performing feature extraction on the intermediate data to generate performance input data and melody input data;
a first input unit 3032, configured to input performance input data into a preset performance encoder, pass through a multi-head relative attention layer of the preset performance encoder, and transmit the performance input data to a feed-forward neural network to obtain performance encoded data;
the second input unit 3033 is configured to input the melody input data into the preset melody encoder, pass through the preset multi-head relative attention layer of the melody encoder, and transmit the melody input data to the feedforward neural network to obtain melody encoded data;
a first generating unit 3034, configured to generate encoded data based on the performance encoded data and the melody encoded data.
Optionally, the first input unit 3032 is specifically configured to:
inputting the performance input data into a preset first layer stack of a performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting to a feedforward neural network of the first layer stack to generate a first performance segment; inputting the first playing segment into a second layer of stack of the preset playing encoder, iterating according to preset times, and generating a playing time segment based on data output by a feedforward neural network of the last layer of stack; and performing compression processing on the performance time slices to generate performance coded data.
Optionally, the adjusting module 305 includes:
a correcting unit 3051 for correcting the target data based on a preset melody and a playing mechanism, deleting abnormal data, and generating first adjustment data;
and the second generating unit 3052, configured to perform noise reduction processing according to the first adjustment data, reduce input interference, and generate a final musical piece.
Optionally, after the adjusting module 305, the music generating apparatus based on the specific genre further includes a similarity evaluating module 306, including:
the determining unit 3061 is configured to obtain two pieces of music to be evaluated, and determine evaluation indexes, where the evaluation indexes include note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change;
a calculation unit 3062, configured to generate a plurality of evaluation index histograms of the two musical pieces based on the evaluation indexes, and calculate a mean and a variance of each evaluation index to obtain a plurality of sets of means and variances;
a third generation unit 3063, configured to generate a normal distribution map according to the mean and variance of each evaluation index, so as to obtain multiple groups of normal distribution maps;
the evaluation unit 3064 is configured to calculate an overlapping area of the normal distribution map of the evaluation index corresponding to the two musical pieces, and perform similarity evaluation based on the overlapping area.
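The similarity evaluation in units 3061-3064 reduces, for each evaluation index, to computing the overlap area of two fitted normal distributions. A numerical sketch follows; the 5-sigma integration window and grid size are assumptions:

```python
import numpy as np

def normal_pdf(x, mean, std):
    # Probability density of a normal distribution fitted from mean and variance
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def overlap_area(mean1, std1, mean2, std2, grid=10000):
    """Numerically integrate min(pdf1, pdf2): 1.0 means identical distributions,
    values near 0 mean the two pieces differ strongly on this index."""
    lo = min(mean1 - 5 * std1, mean2 - 5 * std2)
    hi = max(mean1 + 5 * std1, mean2 + 5 * std2)
    x = np.linspace(lo, hi, grid)
    overlap = np.minimum(normal_pdf(x, mean1, std1), normal_pdf(x, mean2, std2))
    return float(np.sum(overlap) * (x[1] - x[0]))   # Riemann sum

# e.g. the note-density index of two pieces: close means -> large overlap
similar = overlap_area(4.0, 1.0, 4.2, 1.0)
different = overlap_area(4.0, 1.0, 9.0, 1.0)
```

Averaging such overlap areas over the eight evaluation indexes (note density, pitch range, pitch changes, speeds, durations) would give an overall similarity score for the two pieces.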
In the embodiment of the invention, the musical composition is generated according to the specific style, so that the generation efficiency of the music and the controllability of the musical composition are improved.
Fig. 3 and fig. 4 above describe the music generation apparatus based on a specific genre in the embodiment of the present invention in detail from the perspective of the modular functional entity; the following describes the music generation device based on a specific genre in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a music generating device based on a specific genre. The specific-genre-based music generating device 500 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 510, a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may be transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the specific-genre-based music generating device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and execute, on the specific-genre-based music generating device 500, the series of instruction operations in the storage medium 530.
The specific-genre-based music generating device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will appreciate that the device configuration shown in fig. 5 does not limit the specific-genre-based music generation device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The invention also provides a music generation device based on a specific style, which comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and when being executed by the processor, the computer readable instructions cause the processor to execute the steps of the music generation method based on the specific style in the embodiments.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the specific style-based music generation method.
A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating music based on a specific style, the method comprising:
acquiring original data, wherein the original data comprises a Musical Instrument Digital Interface (MIDI) file played by a piano and audio data played by the piano;
marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events;
inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network;
inputting the coded data into a preset decoder to generate decoded data;
and correcting errors of the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
2. The method of claim 1, wherein the tagging of the raw data generates intermediate data comprising a plurality of events, comprising:
marking the original data based on the note starting time and the note ending time to generate first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events;
marking the original data based on a preset time increment value to generate second marking data, wherein the second marking data comprises a preset number of time-shifting events;
marking the original data based on a preset quantization speed to generate third marking data, wherein the third marking data comprises a preset number of note playing speed events;
and combining the first marking data, the second marking data and the third marking data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
3. The specific-style-based music generation method of claim 1, wherein the inputting the intermediate data into a preset performance encoder and a preset melody encoder, the generating the encoded data based on a relative attention mechanism and a feedforward neural network comprises:
performing feature extraction on the intermediate data to generate performance input data and melody input data;
inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting to a feedforward neural network to obtain performance encoding data;
inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting to a feedforward neural network to obtain melody encoded data;
generating encoded data based on the performance encoded data and the melody encoded data.
4. The method of claim 3, wherein the inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting to the feedforward neural network to obtain the performance encoding data comprises:
inputting the performance input data into a first layer stack of the preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting to a feedforward neural network of the first layer stack to generate a first performance segment;
inputting the first performance segment into a second layer stack of the preset performance encoder, iterating a preset number of times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack;
and performing compression processing on the performance time slice to generate performance coded data.
5. The method of claim 1, wherein the correcting errors of the decoded data based on the preset adjusting mechanism to obtain target data and generating the final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference, comprises:
correcting the target data based on a preset melody and a playing mechanism, deleting abnormal data, and generating first adjusting data;
and performing noise reduction processing according to the first adjusting data, reducing input interference and generating a final musical composition.
6. The method of any of claims 1-5, wherein after the correcting errors of the decoded data based on the preset adjusting mechanism to obtain target data and generating the final musical composition according to the target data, the method further comprises:
similarity evaluation of the performance characteristics is performed.
7. The method of claim 6, wherein the performing similarity evaluation of the performance characteristics comprises:
acquiring two musical works to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average pitch change, integral pitch change, average speed, speed change, average duration and duration change;
respectively generating a plurality of evaluation index histograms of the two musical pieces based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances;
generating a normal distribution graph according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution graphs;
and calculating the overlapping area of the normal distribution diagram of the evaluation index corresponding to the two musical works, and evaluating the similarity based on the overlapping area.
8. A specific-style-based music generating apparatus, the specific-style-based music generating apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring original data, and the original data comprises a Musical Instrument Digital Interface (MIDI) file played by a piano and audio data played by the piano;
the marking module is used for marking the original data to generate intermediate data, and the intermediate data comprises a plurality of events;
the encoding module is used for inputting the intermediate data into a preset performance encoder and a preset melody encoder and generating encoded data based on a relative attention mechanism and a feedforward neural network;
the decoding module is used for inputting the coded data into a preset decoder to generate decoded data;
and the adjusting module is used for correcting errors of the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, playing adjustment and input interference.
9. A music generating apparatus based on a specific genre, characterized in that the music generating apparatus based on the specific genre comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the particular genre-based music generation device to perform the particular genre-based music generation method of any of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement a music generation method based on a specific genre as claimed in any one of claims 1-7.
CN202110322904.6A 2021-03-26 2021-03-26 Music generation method, device and equipment based on specific style and storage medium Pending CN113096621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110322904.6A CN113096621A (en) 2021-03-26 2021-03-26 Music generation method, device and equipment based on specific style and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110322904.6A CN113096621A (en) 2021-03-26 2021-03-26 Music generation method, device and equipment based on specific style and storage medium

Publications (1)

Publication Number Publication Date
CN113096621A true CN113096621A (en) 2021-07-09

Family

ID=76670047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110322904.6A Pending CN113096621A (en) 2021-03-26 2021-03-26 Music generation method, device and equipment based on specific style and storage medium

Country Status (1)

Country Link
CN (1) CN113096621A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113851098A (en) * 2021-08-31 2021-12-28 广东智媒云图科技股份有限公司 Melody style conversion method and device, terminal equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102480784A (en) * 2010-11-24 2012-05-30 中国移动通信集团公司 Method and system for evaluating fingerprint positioning error
CN103218438A (en) * 2013-04-18 2013-07-24 广东欧珀移动通信有限公司 Method of recommending online music based on playing record of mobile terminal and mobile terminal
CN105280170A (en) * 2015-10-10 2016-01-27 北京百度网讯科技有限公司 Method and device for playing music score
CN106409282A (en) * 2016-08-31 2017-02-15 得理电子(上海)有限公司 Audio frequency synthesis system and method, electronic device therefor and cloud server therefor
CN109859245A (en) * 2019-01-22 2019-06-07 深圳大学 Multi-object tracking method, device and the storage medium of video object
CN110148393A (en) * 2018-02-11 2019-08-20 阿里巴巴集团控股有限公司 Music generating method, device and system and data processing method
CN111554255A (en) * 2020-04-21 2020-08-18 华南理工大学 MIDI playing style automatic conversion system based on recurrent neural network
CN112037776A (en) * 2019-05-16 2020-12-04 武汉Tcl集团工业研究院有限公司 Voice recognition method, voice recognition device and terminal equipment
CN112102801A (en) * 2020-09-04 2020-12-18 北京有竹居网络技术有限公司 Method and device for generating main melody, electronic equipment and storage medium
CN112435642A (en) * 2020-11-12 2021-03-02 浙江大学 Melody MIDI accompaniment generation method based on deep neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113851098A (en) * 2021-08-31 2021-12-28 广东智媒云图科技股份有限公司 Melody style conversion method and device, terminal equipment and storage medium
CN113851098B (en) * 2021-08-31 2022-06-17 广东智媒云图科技股份有限公司 Melody style conversion method and device, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
US6355869B1 (en) Method and system for creating musical scores from musical recordings
US7081581B2 (en) Method and device for characterizing a signal and method and device for producing an indexed signal
CN111554255B (en) MIDI playing style automatic conversion system based on recurrent neural network
CN112331170B (en) Method, device, equipment and storage medium for analyzing Buddha music melody similarity
US20190213279A1 (en) Apparatus and method of analyzing and identifying song
CN113096621A (en) Music generation method, device and equipment based on specific style and storage medium
US20070208791A1 (en) Method and apparatus for the compression and decompression of audio files using a chaotic system
JP4132362B2 (en) Acoustic signal encoding method and program recording medium
US10431191B2 (en) Method and apparatus for analyzing characteristics of music information
CN113196381B (en) Acoustic analysis method and acoustic analysis device
CN104021793A (en) Method and apparatus for processing audio signal
EP1307992B1 (en) Compression and decompression of audio files using a chaotic system
Dubnov et al. Timbre recognition with combined stationary and temporal features
CN113035161A (en) Chord-based song melody generation method, device, equipment and storage medium
Ciamarone et al. Automatic Dastgah recognition using Markov models
CN112967734A (en) Music data identification method, device, equipment and storage medium based on multiple sound parts
CN115004294A (en) Composition creation method, composition creation device, and creation program
CN113066457B (en) Fan-exclamation music generation method, device, equipment and storage medium
Valero-Mas et al. Analyzing the influence of pitch quantization and note segmentation on singing voice alignment in the context of audio-based Query-by-Humming
CN112906872B (en) Method, device, equipment and storage medium for generating conversion of music score into sound spectrum
JP4156252B2 (en) Method for encoding an acoustic signal
JP4695781B2 (en) Method for encoding an acoustic signal
CN113066459B (en) Song information synthesis method, device, equipment and storage medium based on melody
CN113744760B (en) Pitch identification method and device, electronic equipment and storage medium
JP4662406B2 (en) Frequency analysis method and acoustic signal encoding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination