CN113096621B - Music generation method, device, equipment and storage medium based on specific style - Google Patents
- Publication number
- CN113096621B (application CN202110322904.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- performance
- preset
- melody
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
Abstract
The invention relates to the field of artificial intelligence, and discloses a music generation method, device, equipment and storage medium based on a specific style, which are used for generating musical compositions according to the specific style, thereby improving the efficiency of music generation and the controllability of the generated compositions. The music generation method based on the specific style comprises the following steps: acquiring original data; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. In addition, the invention also relates to blockchain technology: the generated musical compositions can be stored in blockchain nodes.
Description
Technical Field
The present invention relates to the field of audio conversion, and in particular, to a music generating method, apparatus, device and storage medium based on a specific style.
Background
With the development of deep learning, music generation models and their variants have become particularly important for music generation. In the field of automatic music generation, the Transformer model can generate works longer than one minute in a short time and is widely applied to language modeling and translation tasks.
However, currently existing music generation models have significant limitations: music generation is inefficient, and the style of the generated musical composition is not controllable.
Disclosure of Invention
The invention provides a music generation method, device, equipment and storage medium based on a specific style, which are used for generating a musical composition according to the specific style, thereby improving the efficiency of music generation and the controllability of the generated composition.
The first aspect of the present invention provides a music generation method based on a specific style, comprising: acquiring original data, wherein the original data comprise musical instrument digital interface MIDI files of piano performance and audio data of piano performance; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and carrying out error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the first aspect of the present invention, the marking the original data generates intermediate data, where the intermediate data includes a plurality of events including: marking the original data based on the note start time and the note end time, and generating first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events; marking the original data based on a preset time increment value, and generating second marking data, wherein the second marking data comprises a preset number of time shifting events; marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events; and merging the first mark data, the second mark data and the third mark data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
Optionally, in a second implementation manner of the first aspect of the present invention, the inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating the encoded data based on the relative attention mechanism and the feedforward neural network includes: extracting features of the intermediate data to generate performance input data and melody input data; inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting the performance input data to a feedforward neural network to obtain performance encoded data; inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody encoded data; and generating encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the first aspect of the present invention, the inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting the performance input data to a feedforward neural network, and obtaining performance coded data includes: inputting the performance input data into a first layer stack of the preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slice to generate playing code data.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generating a final musical composition according to the target data, wherein the adjustment mechanism includes melody adjustment, performance adjustment and input interference, includes: correcting the target data based on a preset melody and performance mechanism, deleting abnormal data, and generating first adjustment data; and performing noise reduction processing according to the first adjustment data, reducing input interference, and generating a final musical composition.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after the error correction is performed on the decoded data based on the preset adjustment mechanism to obtain target data and a final musical composition is generated according to the target data, the method further includes: performing a similarity evaluation of performance characteristics.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing a similarity evaluation of performance characteristics includes: acquiring two musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average velocity, velocity change, average duration and duration change; respectively generating a plurality of evaluation index histograms of the two compositions based on the evaluation indexes, and calculating the mean and variance of each evaluation index to obtain a plurality of groups of means and variances; generating a normal distribution diagram from the mean and variance of each evaluation index to obtain a plurality of groups of normal distribution diagrams; and calculating the overlapping area of the normal distribution diagrams of the corresponding evaluation indexes of the two compositions, and evaluating the similarity based on the overlapping area.
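For illustration, the overlap-area comparison just described might be computed as in the following minimal Python sketch. The per-index statistics dictionaries, the 4-sigma integration window, and the final averaging across indexes are assumptions of this sketch, not details fixed by the invention:

```python
import numpy as np
from scipy.stats import norm

def overlap_area(mean_a, std_a, mean_b, std_b, num_points=10_000):
    """Overlapping area of two normal curves fitted to one evaluation index
    (e.g. note density) of two compositions; 1.0 means the curves coincide."""
    lo = min(mean_a - 4 * std_a, mean_b - 4 * std_b)
    hi = max(mean_a + 4 * std_a, mean_b + 4 * std_b)
    x = np.linspace(lo, hi, num_points)
    overlap = np.minimum(norm.pdf(x, mean_a, std_a), norm.pdf(x, mean_b, std_b))
    return float(overlap.sum() * (x[1] - x[0]))  # rectangle-rule integration

def similarity(stats_a, stats_b):
    """stats_a/stats_b map each evaluation index name to its (mean, std);
    the per-index overlap areas are averaged into one similarity score."""
    areas = [overlap_area(*stats_a[k], *stats_b[k]) for k in stats_a]
    return sum(areas) / len(areas)

# Example: comparing two pieces on two of the eight evaluation indexes.
piece1 = {"note_density": (5.2, 1.1), "pitch_range": (30.0, 4.0)}
piece2 = {"note_density": (4.8, 1.3), "pitch_range": (27.0, 5.5)}
print(similarity(piece1, piece2))
```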
A second aspect of the present invention provides a music generating apparatus based on a specific style, comprising: the acquisition module is used for acquiring original data, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano; the marking module is used for marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; the coding module is used for inputting the intermediate data into a preset performance coder and a preset melody coder, and generating coded data based on a relative attention mechanism and a feedforward neural network; the decoding module is used for inputting the coded data into a preset decoder to generate decoded data; and the adjusting module is used for carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data and generate a final musical composition, and the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the second aspect of the present invention, the marking module includes: a first marking unit for marking the original data based on a note start time and a note end time, generating first marking data including a preset number of note-on events and a preset number of note-off events; a second marking unit for marking the original data based on a preset time increment value, generating second marking data, wherein the second marking data comprises a preset number of time shift events; a third marking unit for marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events; and the merging unit is used for merging the first marking data, the second marking data and the third marking data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
Optionally, in a second implementation manner of the second aspect of the present invention, the encoding module includes: the feature extraction unit is used for carrying out feature extraction on the intermediate data to generate performance input data and melody input data; the first input unit is used for inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder and transmitting the performance input data to a feedforward neural network to obtain performance coding data; the second input unit is used for inputting the melody input data into a preset melody encoder, transmitting the melody input data to a feedforward neural network through a multi-head relative attention layer of the preset melody encoder, and obtaining melody coded data; a first generation unit operable to generate encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the second aspect of the present invention, the first input unit is specifically configured to: inputting the performance input data into a first layer stack of the preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slice to generate playing code data.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the adjusting module includes: the correcting unit is used for correcting the target data based on a preset melody and performance mechanism, deleting abnormal data and generating first adjustment data; and the second generation unit is used for carrying out noise reduction processing according to the first regulation data, reducing input interference and generating a final musical piece.
Optionally, in a fifth implementation manner of the second aspect of the present invention, after the error correction is performed on the decoded data based on the preset adjustment mechanism to obtain target data and a final musical composition is generated according to the target data, the apparatus further includes: a similarity evaluation module.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the similarity evaluation module includes: the determining unit is used for obtaining two pieces of music to be evaluated and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change; the computing unit is used for respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and computing the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; the third generation unit is used for generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps; and the evaluation unit is used for calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music works and carrying out similarity evaluation based on the overlapping area.
A third aspect of the present invention provides a music generating device based on a specific style, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the music generating device based on a specific style to perform the music generation method based on a specific style described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the above-described style-specific music generation method.
In the technical scheme provided by the invention, the original data is acquired, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and carrying out error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generate a final musical composition, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. In the embodiment of the invention, the music works are generated according to the specific style, so that the music generation efficiency and the controllability of the music works are improved.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a music generation method based on a specific style in an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a music generation method based on a specific style in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a music generating apparatus based on a specific style according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a music generating apparatus based on a specific style according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of a music generating apparatus based on a specific style in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a music generation method, device, equipment and storage medium based on a specific style, which are used for generating a music work according to the specific style, so that the music generation efficiency and the controllability of the music work are improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and one embodiment of a music generating method based on a specific style in the embodiment of the present invention includes:
101. raw data including a musical instrument digital interface MIDI file of a piano performance and audio data of the piano performance are acquired.
The server acquires raw data, including MIDI files of piano performances and audio data of piano performances. The raw data are collected from related music websites and include MIDI files of 5,000 classical piano performances and up to 20,000 hours of piano-performance audio.
It is to be understood that the execution subject of the present invention may be a music generating device based on a specific style, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
102. The original data is marked to generate intermediate data, and the intermediate data comprises a plurality of events.
The server marks the original data to generate intermediate data, and the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note start time and the note end time, and generates first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value, and generates second marked data, wherein the second marked data comprises a preset number of time shifting events; the server marks the original data based on a preset quantization speed, and generates third mark data, wherein the third mark data comprises a preset number of note playing speed events; the server merges the first tag data, the second tag data and the third tag data to generate intermediate data, the intermediate data comprising a plurality of events.
The server represents the raw data as a series of discrete markers, including 88 note-on events, 88 note-off events, 100 time-shift events ranging from 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events for the 88 notes; after quantization, the velocity markers encode the speed at which each note is played.
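As a concrete illustration, such an event vocabulary might be laid out as in the following sketch; the token names, ordering and ranges are assumptions of this sketch rather than details fixed by the invention:

```python
# Minimal sketch of the discrete event vocabulary described above.
NUM_PITCHES = 88          # piano keys A0..C8
NUM_TIME_SHIFTS = 100     # 10 ms .. 1 s in 10 ms increments
NUM_VELOCITY_BINS = 16    # quantized note-play velocities

def note_on_token(pitch: int) -> int:
    """Pitch 0..87 -> token ids 0..87."""
    return pitch

def note_off_token(pitch: int) -> int:
    """Pitch 0..87 -> token ids 88..175."""
    return NUM_PITCHES + pitch

def time_shift_token(delta_ms: float) -> int:
    """Round a time gap to the nearest 10 ms step (capped at 1 s) -> token ids 176..275."""
    steps = min(max(round(delta_ms / 10), 1), NUM_TIME_SHIFTS)
    return 2 * NUM_PITCHES + (steps - 1)

def velocity_token(velocity: int) -> int:
    """Quantize a MIDI velocity 0..127 into 16 bins -> token ids 276..291."""
    bin_index = velocity * NUM_VELOCITY_BINS // 128
    return 2 * NUM_PITCHES + NUM_TIME_SHIFTS + bin_index

VOCAB_SIZE = 2 * NUM_PITCHES + NUM_TIME_SHIFTS + NUM_VELOCITY_BINS  # 292 events
```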
103. The intermediate data is input to a preset performance encoder and a preset melody encoder, and encoded data is generated based on the relative attentiveness mechanism and the feedforward neural network.
The server inputs the intermediate data into a preset performance encoder and a preset melody encoder, and generates encoded data based on the relative attention mechanism and the feedforward neural network. An encoder converts signals or data into a form that can be used for communication, transmission and storage; when audio data is input, the encoder converts the audio into a data format that can be stored on a computer. The performance encoder takes the performance part as input and the melody encoder takes the melody part as input, and each generates a corresponding time slice after encoding. Each encoder comprises 6 layer stacks, and each layer stack comprises a multi-head relative attention layer and a feedforward neural network layer. The data passes through the relative attention layer of the first layer stack, then through the feedforward neural network layer of the first layer stack, and is transmitted to the relative attention layer of the next layer stack; this iterative processing continues until the feedforward neural network of the last layer stack outputs the encoded data.
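A minimal sketch of this layer-stack arrangement follows (PyTorch). Here `nn.MultiheadAttention` stands in for the multi-head relative attention layer, which would additionally learn relative-position embeddings, and the model dimensions are assumed defaults, not values given by the invention:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One layer stack: a multi-head attention sublayer followed by a
    feedforward sublayer, each wrapped in a residual connection."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)      # self-attention over the sequence
        x = self.norm1(x + attn_out)          # residual around attention
        return self.norm2(x + self.ff(x))     # residual around feedforward

class PerformanceEncoder(nn.Module):
    """Six layer stacks applied in sequence, as described above."""
    def __init__(self, d_model=512, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([EncoderLayer(d_model) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:             # output of each stack feeds the next
            x = layer(x)
        return x                              # time slice before compression
```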
104. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoder, comprising 6 layer stacks, each layer stack comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder accepts the output of the encoder and generates new markers. To ensure the accuracy of the new markers, end-to-end model training can be performed: given a sequence x of length n, the following formula is obtained: log p(x; θ) = Σ_{i=1}^{n} log p(x_i | x_1, …, x_{i−1}; θ), where θ is a model parameter.
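One way this log-likelihood objective could be evaluated during training is sketched below, under an assumed teacher-forcing setup; the tensor shapes and function name are illustrative, not part of the claimed method:

```python
import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sum over i of log p(x_i | x_1..x_{i-1}; theta).
    logits: (n, vocab_size) decoder outputs under teacher forcing;
    targets: (n,) ground-truth next-token ids x_1..x_n."""
    log_probs = F.log_softmax(logits, dim=-1)                     # normalize to log-probabilities
    token_ll = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return token_ll.sum()                                         # maximize this (minimize its negation)
```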
105. And carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
The server performs error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generates a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. Specifically, the server corrects the target data based on a preset melody and performance mechanism, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction processing according to the first adjustment data, reducing input interference, and generates the final musical composition.
Abnormal data includes garbled characters caused by character-encoding problems, truncated characters, outliers, and the like. By setting the preset melody and performance mechanisms, the data are audited and filtered and the abnormal data are deleted, which improves the quality of the work. The noise reduction processing of the data is based mainly on the Fourier transform: signals in the time domain are converted into signals in the frequency domain, and the corresponding noise reduction processing is carried out there.
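A minimal sketch of such Fourier-domain noise reduction follows; the magnitude-threshold heuristic is an assumption of this sketch, not the specific filtering rule used by the invention:

```python
import numpy as np

def spectral_denoise(signal: np.ndarray, threshold_ratio: float = 0.05) -> np.ndarray:
    """Illustrative Fourier-domain noise reduction: transform to the frequency
    domain, zero out components below a magnitude threshold, transform back."""
    spectrum = np.fft.rfft(signal)                      # time domain -> frequency domain
    magnitude = np.abs(spectrum)
    mask = magnitude >= threshold_ratio * magnitude.max()
    return np.fft.irfft(spectrum * mask, n=len(signal)) # back to the time domain
```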
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
Referring to fig. 2, another embodiment of the music generating method based on a specific style in the embodiment of the present invention includes:
201. Raw data including a musical instrument digital interface MIDI file of a piano performance and audio data of the piano performance are acquired.
The server acquires raw data, including MIDI files of piano performances and audio data of piano performances. The raw data are collected from related music websites and include MIDI files of 5,000 classical piano performances and up to 20,000 hours of piano-performance audio.
It is to be understood that the execution subject of the present invention may be a music generating device based on a specific style, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
202. The original data is marked to generate intermediate data, and the intermediate data comprises a plurality of events.
The server marks the original data to generate intermediate data, and the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note start time and the note end time, and generates first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value, and generates second marked data, wherein the second marked data comprises a preset number of time shifting events; the server marks the original data based on a preset quantization speed, and generates third mark data, wherein the third mark data comprises a preset number of note playing speed events; the server merges the first tag data, the second tag data and the third tag data to generate intermediate data, the intermediate data comprising a plurality of events.
The server represents the raw data as a series of discrete markers, including 88 note-on events, 88 note-off events, 100 time-shift events ranging from 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events for the 88 notes; after quantization, the velocity markers encode the speed at which each note is played.
203. And extracting the characteristics of the intermediate data to generate performance input data and melody input data.
The server performs feature extraction on the intermediate data to generate performance input data and melody input data. Feature extraction is based mainly on the principal component analysis (PCA) algorithm, which uses the idea of dimensionality reduction to convert the data into several composite index features. The main steps are: standardizing the raw data; calculating the correlation coefficient matrix; calculating the eigenvalues and eigenvectors of the correlation coefficient matrix to obtain new index scalars; calculating the information contribution rate and cumulative contribution rate of the eigenvalues; selecting the principal components according to preset rules; and finally generating the performance input data and melody input data according to the preset rules.
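These PCA steps might be rendered as in the following sketch; the 95% cumulative-contribution cutoff stands in for the unspecified preset rule:

```python
import numpy as np

def pca_features(data: np.ndarray, contribution: float = 0.95) -> np.ndarray:
    """Sketch of the PCA steps listed above: standardize, form the correlation
    matrix, take its eigenvectors, keep components up to a cumulative
    contribution rate, and project the data onto them."""
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(standardized, rowvar=False)        # correlation coefficient matrix
    eigvals, eigvecs = np.linalg.eigh(corr)               # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]                     # sort components by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()       # cumulative contribution rate
    k = int(np.searchsorted(cumulative, contribution)) + 1
    return standardized @ eigvecs[:, :k]                  # principal-component scores
```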
204. And inputting the performance input data into a preset performance encoder, passing through the multi-head relative attention layer of the preset performance encoder, and transmitting to a feedforward neural network to obtain performance encoded data.
The server inputs the performance input data into a preset performance encoder, passes through the multi-head relative attention layer of the preset performance encoder, and transmits the performance input data to the feedforward neural network to obtain performance encoded data. Specifically, the server inputs performance input data into a first layer stack of a preset performance encoder, passes through a multi-head relative attention layer of the first layer stack, and transmits the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; the server inputs the first performance segment into a second layer stack of a preset performance encoder, iterates according to preset times, and generates a performance time segment based on data output by a feedforward neural network of the last layer stack; the server compresses the performance time slice to generate performance coded data.
205. The melody input data is input into a preset melody encoder, passes through the multi-head relative attention layer of the preset melody encoder, and is transmitted to a feedforward neural network to obtain melody encoded data.
The server inputs the melody input data into a preset melody encoder, passes through the multi-head relative attention layer of the preset melody encoder, and transmits the melody input data to the feedforward neural network to obtain melody encoded data. Specifically, the server inputs melody input data into a first layer stack of a preset melody encoder, passes through a multi-head relative attention layer of the first layer stack, and transmits the melody input data to a feedforward neural network of the first layer stack to generate a first performance fragment; the server inputs the first performance fragment into a second layer stack of the preset melody encoder, iterates according to preset times, and generates a melody time fragment based on data output by the feedforward neural network of the last layer stack; the server compresses the melody time fragment to generate melody encoded data.
206. The encoded data is generated based on the performance encoded data and the melody encoded data.
The server generates the encoded data based on the performance encoded data and the melody encoded data. The server performs a combination process of the performance coded data and the melody coded data to generate final coded data.
207. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoder, comprising 6 layer stacks, each layer stack comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder accepts the output of the encoder and generates new markers. To ensure the accuracy of the new markers, end-to-end model training can be performed: given a sequence x of length n, the following formula is obtained: log p(x; θ) = Σ_{i=1}^{n} log p(x_i | x_1, …, x_{i−1}; θ), where θ is a model parameter.
208. And carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
The server performs error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generates a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. Specifically, the server corrects the target data based on a preset melody and performance mechanism, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction processing according to the first adjustment data, reducing input interference, and generates the final musical composition.
Abnormal data includes garbled characters caused by character-encoding problems, truncated characters, outliers, and the like. By setting the preset melody and performance mechanisms, the data are audited and filtered and the abnormal data are deleted, which improves the quality of the work. The noise reduction processing of the data is based mainly on the Fourier transform: signals in the time domain are converted into signals in the frequency domain, and the corresponding noise reduction processing is carried out there.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
The method for generating music based on a specific style in the embodiment of the present invention is described above, and the apparatus for generating music based on a specific style in the embodiment of the present invention is described below, referring to fig. 3, an embodiment of the apparatus for generating music based on a specific style in the embodiment of the present invention includes:
An acquisition module 301, configured to acquire raw data, where the raw data includes a MIDI file of a musical instrument digital interface of a piano performance and audio data of the piano performance;
the marking module 302 is configured to mark the original data to generate intermediate data, where the intermediate data includes a plurality of events;
An encoding module 303 for inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
the adjustment module 305 is configured to perform error correction on the decoded data based on a preset adjustment mechanism, so as to obtain target data, and generate a final musical composition according to the target data, where the adjustment mechanism includes melody adjustment, performance adjustment, and input interference.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
Referring to fig. 4, another embodiment of the music generating apparatus based on a specific style according to an embodiment of the present invention includes:
An acquisition module 301, configured to acquire raw data, where the raw data includes a MIDI file of a musical instrument digital interface of a piano performance and audio data of the piano performance;
the marking module 302 is configured to mark the original data to generate intermediate data, where the intermediate data includes a plurality of events;
An encoding module 303 for inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
the adjustment module 305 is configured to perform error correction on the decoded data based on a preset adjustment mechanism, so as to obtain target data, and generate a final musical composition according to the target data, where the adjustment mechanism includes melody adjustment, performance adjustment, and input interference.
Optionally, the marking module 302 includes:
A first marking unit 3021 for marking the original data based on the note-on time and the note-off time, generating first marking data including a preset number of note-on events and a preset number of note-off events;
A second marking unit 3022 for marking the original data based on the preset time increment value, generating second marking data including a preset number of time shift events;
A third marking unit 3023 for marking the original data based on a preset quantization speed, generating third marking data including a preset number of note playing speed events;
The merging unit 3024 is configured to merge the first flag data, the second flag data, and the third flag data to generate intermediate data, where the intermediate data includes a plurality of events.
Optionally, the encoding module 303 includes:
A feature extraction unit 3031, configured to perform feature extraction on the intermediate data to generate performance input data and melody input data;
A first input unit 3032, configured to input performance input data into a preset performance encoder, and transmit the performance input data to the feedforward neural network through a multi-head relative attention layer of the preset performance encoder to obtain performance encoded data;
The second input unit 3033 is configured to input melody input data into a preset melody encoder, and transmit the melody input data to the feedforward neural network through a multi-head relative attention layer of the preset melody encoder to obtain melody encoded data;
the first generating unit 3034 generates the encoded data based on the performance encoded data and the melody encoded data.
Optionally, the first input unit 3032 is specifically configured to:
Inputting performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slices to generate playing code data.
Optionally, the adjusting module 305 includes:
the correcting unit 3051 is configured to correct the target data based on a preset melody and performance mechanism, delete the abnormal data, and generate first adjustment data;
and the second generating unit 3052 is configured to perform noise reduction processing according to the first adjustment data, reduce input interference, and generate a final musical piece.
Optionally, after the adjusting module 305, the music generating apparatus based on a specific style further includes a similarity evaluating module 306, including:
the determining unit 3061 is used for obtaining two pieces of music to be evaluated, and determining an evaluation index, wherein the evaluation index comprises note density, a pitch range, an average change of a pitch, an overall change of the pitch, an average speed, a speed change, an average duration and a duration change;
The computing unit 3062 is used for respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and computing the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances;
A third generating unit 3063, configured to generate a normal distribution map according to the mean and the variance of each evaluation index, so as to obtain a plurality of groups of normal distribution maps;
And the evaluation unit 3064 is used for calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music and performing similarity evaluation based on the overlapping area.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
The specific style-based music generating apparatus in the embodiment of the present invention is described in detail above in terms of the modularized functional entity in fig. 3 and 4, and the specific style-based music generating device in the embodiment of the present invention is described in detail below in terms of hardware processing.
Fig. 5 is a schematic structural diagram of a music generating device based on a specific style. The music generating device 500 based on a specific style may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the music generating device 500 based on a specific style. Furthermore, the processor 510 may be arranged to communicate with the storage medium 530 to execute, on the music generating device 500 based on a specific style, the series of instruction operations in the storage medium 530.
The music generating device 500 based on a specific style may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device structure shown in fig. 5 does not constitute a limitation of the music generating device based on a specific style, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The present invention also provides a music generating device based on a specific style, the device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the music generation method based on a specific style in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and may also be a volatile computer readable storage medium, having stored therein instructions which, when executed on a computer, cause the computer to perform the steps of the specific style based music generation method.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A music generation method based on a specific style, characterized in that the music generation method based on a specific style comprises:
acquiring original data, wherein the original data comprise musical instrument digital interface MIDI files of piano performance and audio data of piano performance;
Marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events;
Extracting features of the intermediate data to generate performance input data and melody input data;
Inputting the performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment;
Inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack;
compressing the playing time segment to generate playing code data;
Inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody coded data;
Generating encoded data based on the performance encoded data and the melody encoded data;
inputting the encoded data into a preset decoder to generate decoded data;
Performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference;
acquiring two pieces of musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change;
Respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances;
Generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps;
and calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music, and evaluating the similarity based on the overlapping area.
2. The music generation method based on a specific style according to claim 1, wherein the marking the original data to generate intermediate data, the intermediate data comprising a plurality of events, comprises:
marking the original data based on the note start time and the note end time, and generating first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events;
marking the original data based on a preset time increment value, and generating second marking data, wherein the second marking data comprises a preset number of time shifting events;
Marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events;
and merging the first mark data, the second mark data and the third mark data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
3. The music generation method according to claim 1, wherein the performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference, comprises:
correcting the target data based on a preset melody and performance mechanism, deleting abnormal data, and generating first adjustment data;
And carrying out noise reduction processing according to the first regulation data, reducing input interference and generating a final musical composition.
4. A specific style-based music generation apparatus, characterized in that the specific style-based music generation apparatus includes:
The acquisition module is used for acquiring original data, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano;
the marking module is used for marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events;
The coding module is used for extracting the characteristics of the intermediate data to generate performance input data and melody input data; inputting the performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; compressing the playing time segment to generate playing code data; inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody coded data; generating encoded data based on the performance encoded data and the melody encoded data;
the decoding module is used for inputting the coded data into a preset decoder to generate decoded data;
The adjusting module is used for carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference; acquiring two pieces of musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change; respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps; and calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music, and evaluating the similarity based on the overlapping area.
5. Specific style-based music generation equipment, characterized in that the equipment comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the equipment to perform the specific style-based music generation method of any one of claims 1 to 3.
6. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the specific style-based music generation method of any one of claims 1 to 3.
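Claim 3 leaves the correction and noise-reduction steps abstract, so the following Python sketch is only one plausible reading of them: "deleting abnormal data" becomes dropping note events that fall outside the piano range or have degenerate values, and "noise-reduction processing" becomes a moving-average smoothing of velocities. The `Note` type, `clean_notes`, `smooth_velocities`, and every threshold are hypothetical, not the claimed mechanism.

```python
from dataclasses import dataclass

# Hypothetical event type; the patent does not specify its event encoding.
@dataclass
class Note:
    pitch: int       # MIDI pitch, 0-127
    velocity: int    # MIDI velocity, 1-127
    start: float     # onset time in seconds
    duration: float  # note length in seconds

def clean_notes(notes):
    """'Deleting abnormal data', read here as dropping notes outside the
    piano range (A0=21 to C8=108) or with degenerate velocity/duration."""
    return [n for n in notes
            if 21 <= n.pitch <= 108 and n.duration > 0 and 1 <= n.velocity <= 127]

def smooth_velocities(notes, window=3):
    """'Noise-reduction processing', read here as a moving average over
    velocities to damp input disturbance; one smoothing choice of many."""
    smoothed = []
    for i, n in enumerate(notes):
        lo, hi = max(0, i - window // 2), min(len(notes), i + window // 2 + 1)
        avg = round(sum(m.velocity for m in notes[lo:hi]) / (hi - lo))
        smoothed.append(Note(n.pitch, avg, n.start, n.duration))
    return smoothed

# Usage: the out-of-range pitch 200 is deleted, then velocities are smoothed.
adjusted = smooth_velocities(clean_notes([
    Note(60, 80, 0.0, 0.5),
    Note(200, 90, 0.5, 0.5),
    Note(64, 100, 1.0, 0.5),
]))
```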
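The performance encoder in claim 4 follows a Transformer pattern: each "layer stack" pairs a multi-head relative attention layer with a feedforward neural network, the data iterates through a preset number of stacks, and the last stack's output is compressed into performance encoded data. The PyTorch sketch below is a minimal reading of that structure, not the patented architecture: it stands in standard self-attention for the relative attention the claim names, and the mean-pooling plus linear compression, the dimensions, and the class names are assumptions.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One 'layer stack': an attention layer feeding a feedforward neural
    network. The claim specifies multi-head *relative* attention (as in
    Music Transformer); standard self-attention is a simplified stand-in."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)        # multi-head self-attention
        x = self.norm1(x + attn_out)            # residual connection + norm
        return self.norm2(x + self.ff(x))       # feedforward network

class PerformanceEncoder(nn.Module):
    """Iterates the performance input through a preset number of layer
    stacks and compresses the last stack's output into performance encoded
    data. Mean-pooling plus a linear bottleneck is an assumed compression."""
    def __init__(self, n_layers=4, d_model=256, d_code=64):
        super().__init__()
        self.layers = nn.ModuleList(EncoderLayer(d_model) for _ in range(n_layers))
        self.compress = nn.Linear(d_model, d_code)

    def forward(self, performance_input):        # (batch, time, d_model)
        x = performance_input
        for layer in self.layers:                # first, second, ... stacks
            x = layer(x)                         # performance segment per stack
        return self.compress(x.mean(dim=1))      # performance encoded data

codes = PerformanceEncoder()(torch.randn(2, 128, 256))  # shape: (2, 64)
```

Since the claim gives the melody encoder a single attention-plus-feedforward pass, one `EncoderLayer` followed by its own projection would fill that role in this sketch, and the two codes could then be combined into the encoded data.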
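The similarity evaluation in claim 4 is concrete enough to sketch: per evaluation index, a normal distribution is fitted from the sample mean and variance, and similarity is read off the overlapping area of the two curves. In the Python sketch below, `overlap_area`, the 5-sigma integration bounds, and the averaging of per-index overlaps into a single score are assumptions; the claim does not say how per-index overlaps are combined.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var); assumes var > 0."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def overlap_area(mu1, var1, mu2, var2, steps=10000):
    """Numerically integrate min(pdf1, pdf2) over a 5-sigma window.
    1.0 means identical fitted distributions; near 0 means little overlap."""
    lo = min(mu1 - 5 * math.sqrt(var1), mu2 - 5 * math.sqrt(var2))
    hi = max(mu1 + 5 * math.sqrt(var1), mu2 + 5 * math.sqrt(var2))
    dx = (hi - lo) / steps
    return sum(min(normal_pdf(lo + (i + 0.5) * dx, mu1, var1),
                   normal_pdf(lo + (i + 0.5) * dx, mu2, var2)) * dx
               for i in range(steps))

def similarity(piece_a, piece_b):
    """piece_a, piece_b: dicts mapping an evaluation-index name (note
    density, pitch range, ...) to its per-segment values. Fits a normal
    distribution per index and averages the overlaps (the averaging is
    an assumption; the claim leaves the combination open)."""
    def stats(values):  # sample mean and population variance
        mu = sum(values) / len(values)
        return mu, sum((v - mu) ** 2 for v in values) / len(values)
    scores = []
    for key in piece_a:
        (m1, v1), (m2, v2) = stats(piece_a[key]), stats(piece_b[key])
        scores.append(overlap_area(m1, v1, m2, v2))
    return sum(scores) / len(scores)

# Usage with made-up per-segment measurements for two pieces:
a = {"note_density": [3.1, 2.8, 3.4], "pitch_range": [40.0, 44.0, 38.0]}
b = {"note_density": [2.9, 3.0, 3.3], "pitch_range": [35.0, 39.0, 42.0]}
print(round(similarity(a, b), 3))
```

The integral of min(p, q) equals one minus half the L1 distance between the two densities, so the overlap behaves as a similarity bounded in [0, 1].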
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110322904.6A CN113096621B (en) | 2021-03-26 | 2021-03-26 | Music generation method, device, equipment and storage medium based on specific style |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113096621A CN113096621A (en) | 2021-07-09 |
CN113096621B (en) | 2024-05-28
Family
ID=76670047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110322904.6A | Music generation method, device, equipment and storage medium based on specific style | 2021-03-26 | 2021-03-26
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113096621B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113851098B (en) * | 2021-08-31 | 2022-06-17 | 广东智媒云图科技股份有限公司 | Melody style conversion method and device, terminal equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102480784A (en) * | 2010-11-24 | 2012-05-30 | 中国移动通信集团公司 | Fingerprint positioning error evaluation method and system |
CN103218438A (en) * | 2013-04-18 | 2013-07-24 | 广东欧珀移动通信有限公司 | Method of recommending online music based on playing record of mobile terminal and mobile terminal |
CN105280170A (en) * | 2015-10-10 | 2016-01-27 | 北京百度网讯科技有限公司 | Method and device for playing music score |
CN106409282A (en) * | 2016-08-31 | 2017-02-15 | 得理电子(上海)有限公司 | Audio frequency synthesis system and method, electronic device therefor and cloud server therefor |
CN109859245A (en) * | 2019-01-22 | 2019-06-07 | 深圳大学 | Multi-object tracking method, device and the storage medium of video object |
CN110148393A (en) * | 2018-02-11 | 2019-08-20 | 阿里巴巴集团控股有限公司 | Music generating method, device and system and data processing method |
CN111554255A (en) * | 2020-04-21 | 2020-08-18 | 华南理工大学 | MIDI playing style automatic conversion system based on recurrent neural network |
CN112037776A (en) * | 2019-05-16 | 2020-12-04 | 武汉Tcl集团工业研究院有限公司 | Voice recognition method, voice recognition device and terminal equipment |
CN112102801A (en) * | 2020-09-04 | 2020-12-18 | 北京有竹居网络技术有限公司 | Method and device for generating main melody, electronic equipment and storage medium |
CN112435642A (en) * | 2020-11-12 | 2021-03-02 | 浙江大学 | Melody MIDI accompaniment generation method based on deep neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9721551B2 (en) * | 2015-09-29 | 2017-08-01 | Amper Music, Inc. | Machines, systems, processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptions |
US10799795B1 (en) * | 2019-03-26 | 2020-10-13 | Electronic Arts Inc. | Real-time audio generation for electronic games based on personalized music preferences |
Also Published As
Publication number | Publication date |
---|---|
CN113096621A (en) | 2021-07-09 |
Similar Documents
Publication | Title |
---|---|
CN112331170B (en) | Method, device, equipment and storage medium for analyzing Buddha music melody similarity |
CN113096621B (en) | Music generation method, device, equipment and storage medium based on specific style |
CN112435642B (en) | Melody MIDI accompaniment generation method based on deep neural network |
Cherniavsky et al. | Grammar-based compression of DNA sequences |
US10147443B2 (en) | Matching device, judgment device, and method, program, and recording medium therefor |
CN111063327A (en) | Audio processing method and device, electronic equipment and storage medium |
CN117408311A (en) | Small-sample malicious website detection method based on CNN, Transformer, and transfer learning |
CN113707112A (en) | Automatic music generation method using recursive skip-connection deep learning based on layer normalization |
EP3093781A1 (en) | System and method for transforming and compressing genomics data |
CN116168666A (en) | Music estimation device, music estimation method, and model generation device |
CN112967734B (en) | Music data identification method, device, equipment and storage medium based on multiple voice parts |
CN113066459B (en) | Melody-based song information synthesis method, device, equipment and storage medium |
US20190189100A1 (en) | Method and apparatus for analyzing characteristics of music information |
CN116259289A (en) | Automatic music description generation method |
CN113053336B (en) | Musical composition generation method, device, equipment and storage medium |
CN1023160C (en) | Digital speech coder with vector excitation source having improved speech quality |
CN113066457B (en) | Buddhist chant music generation method, device, equipment and storage medium |
CN112906872B (en) | Method, device, equipment and storage medium for converting a musical score into a sound spectrum |
JP2011009868A (en) | Encoding method, decoding method, encoder, decoder, and program |
CN113012667B (en) | Music track separation method, device, equipment and storage medium based on Buddha music |
CN113379875B (en) | Cartoon character animation generation method, device, equipment and storage medium |
CN118152488B (en) | Remote sensing big data storage and retrieval method and system, electronic equipment and storage medium |
CN115713065B (en) | Question generation method, electronic equipment and computer-readable storage medium |
CN113033778B (en) | Buddha music generation method, device, equipment and storage medium |
CN118886397A (en) | Multi-dimensional flow data synthesis method and system based on large language model |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |