CN113096621B - Music generation method, device, equipment and storage medium based on specific style - Google Patents
- Publication number
- CN113096621B (application CN202110322904.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- performance
- preset
- melody
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
Abstract
The invention relates to the field of artificial intelligence, and discloses a music generation method, device, equipment and storage medium based on a specific style, which are used for generating musical compositions according to the specific style, thereby improving the efficiency of music generation and the controllability of the generated compositions. The music generation method based on the specific style comprises the following steps: acquiring original data; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. In addition, the invention also relates to blockchain technology: the generated musical compositions can be stored in blockchain nodes.
Description
Technical Field
The present invention relates to the field of audio conversion, and in particular, to a music generating method, apparatus, device and storage medium based on a specific style.
Background
With the development of deep learning, music generation models and their variants have become particularly important for music generation. In the field of automatic music generation, the Transformer model can generate works longer than one minute in a short time and is widely applied to language modeling and translation tasks.
However, currently existing music generation models have significant limitations: music generation is inefficient, and the style of the generated musical composition is not controllable.
Disclosure of Invention
The invention provides a music generation method, device, equipment and storage medium based on a specific style, which are used for generating a musical composition according to the specific style, thereby improving the efficiency of music generation and the controllability of the generated composition.
The first aspect of the present invention provides a music generation method based on a specific style, comprising: acquiring original data, wherein the original data comprise musical instrument digital interface MIDI files of piano performance and audio data of piano performance; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and carrying out error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the first aspect of the present invention, the marking the original data generates intermediate data, where the intermediate data includes a plurality of events including: marking the original data based on the note start time and the note end time, and generating first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events; marking the original data based on a preset time increment value, and generating second marking data, wherein the second marking data comprises a preset number of time shifting events; marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events; and merging the first mark data, the second mark data and the third mark data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
Optionally, in a second implementation manner of the first aspect of the present invention, the inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating the encoded data based on the relative attention mechanism and the feedforward neural network includes: extracting features of the intermediate data to generate performance input data and melody input data; inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting the performance input data to a feedforward neural network to obtain performance encoded data; inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody encoded data; and generating encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the first aspect of the present invention, the inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder, and transmitting the performance input data to a feedforward neural network, and obtaining performance coded data includes: inputting the performance input data into a first layer stack of the preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slice to generate playing code data.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generating a final musical composition according to the target data, wherein the adjustment mechanism includes melody adjustment, performance adjustment and input interference, includes: correcting the target data based on a preset melody and performance mechanism, deleting abnormal data, and generating first adjustment data; and performing noise reduction processing according to the first adjustment data, reducing input interference, and generating a final musical composition.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after the error correction is performed on the decoded data based on the preset adjustment mechanism to obtain target data and a final musical composition is generated according to the target data, the method further includes: performing a similarity evaluation of performance characteristics.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing a similarity evaluation of performance characteristics includes: acquiring two musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average velocity, velocity change, average duration and duration change; respectively generating a plurality of evaluation index histograms of the two compositions based on the evaluation indexes, and calculating the mean and variance of each evaluation index to obtain a plurality of groups of means and variances; generating a normal distribution diagram from the mean and variance of each evaluation index to obtain a plurality of groups of normal distribution diagrams; and calculating the overlapping area of the normal distribution diagrams of the corresponding evaluation indexes of the two compositions, and evaluating the similarity based on the overlapping area.
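For illustration, the overlap-area comparison just described might be computed as in the following minimal Python sketch. The per-index statistics dictionaries, the 4-sigma integration window, and the final averaging across indexes are assumptions of this sketch, not details fixed by the invention:

```python
import numpy as np
from scipy.stats import norm

def overlap_area(mean_a, std_a, mean_b, std_b, num_points=10_000):
    """Overlapping area of two normal curves fitted to one evaluation index
    (e.g. note density) of two compositions; 1.0 means the curves coincide."""
    lo = min(mean_a - 4 * std_a, mean_b - 4 * std_b)
    hi = max(mean_a + 4 * std_a, mean_b + 4 * std_b)
    x = np.linspace(lo, hi, num_points)
    overlap = np.minimum(norm.pdf(x, mean_a, std_a), norm.pdf(x, mean_b, std_b))
    return float(overlap.sum() * (x[1] - x[0]))  # rectangle-rule integration

def similarity(stats_a, stats_b):
    """stats_a/stats_b map each evaluation index name to its (mean, std);
    the per-index overlap areas are averaged into one similarity score."""
    areas = [overlap_area(*stats_a[k], *stats_b[k]) for k in stats_a]
    return sum(areas) / len(areas)

# Example: comparing two pieces on two of the eight evaluation indexes.
piece1 = {"note_density": (5.2, 1.1), "pitch_range": (30.0, 4.0)}
piece2 = {"note_density": (4.8, 1.3), "pitch_range": (27.0, 5.5)}
print(similarity(piece1, piece2))
```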
A second aspect of the present invention provides a music generating apparatus based on a specific style, comprising: the acquisition module is used for acquiring original data, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano; the marking module is used for marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; the coding module is used for inputting the intermediate data into a preset performance coder and a preset melody coder, and generating coded data based on a relative attention mechanism and a feedforward neural network; the decoding module is used for inputting the coded data into a preset decoder to generate decoded data; and the adjusting module is used for carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data and generate a final musical composition, and the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
Optionally, in a first implementation manner of the second aspect of the present invention, the marking module includes: a first marking unit for marking the original data based on a note start time and a note end time, generating first marking data including a preset number of note-on events and a preset number of note-off events; a second marking unit for marking the original data based on a preset time increment value, generating second marking data, wherein the second marking data comprises a preset number of time shift events; a third marking unit for marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events; and the merging unit is used for merging the first marking data, the second marking data and the third marking data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
Optionally, in a second implementation manner of the second aspect of the present invention, the encoding module includes: the feature extraction unit is used for carrying out feature extraction on the intermediate data to generate performance input data and melody input data; the first input unit is used for inputting the performance input data into a preset performance encoder, passing through a multi-head relative attention layer of the preset performance encoder and transmitting the performance input data to a feedforward neural network to obtain performance coding data; the second input unit is used for inputting the melody input data into a preset melody encoder, transmitting the melody input data to a feedforward neural network through a multi-head relative attention layer of the preset melody encoder, and obtaining melody coded data; a first generation unit operable to generate encoded data based on the performance encoded data and the melody encoded data.
Optionally, in a third implementation manner of the second aspect of the present invention, the first input unit is specifically configured to: inputting the performance input data into a first layer stack of the preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slice to generate playing code data.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the adjusting module includes: the correcting unit is used for correcting the target data based on a preset melody and performance mechanism, deleting abnormal data and generating first adjustment data; and the second generation unit is used for carrying out noise reduction processing according to the first regulation data, reducing input interference and generating a final musical piece.
Optionally, in a fifth implementation manner of the second aspect of the present invention, after the error correction is performed on the decoded data based on the preset adjustment mechanism to obtain target data and a final musical composition is generated according to the target data, the apparatus further includes: a similarity evaluation module.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the similarity evaluation module includes: the determining unit is used for obtaining two pieces of music to be evaluated and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change; the computing unit is used for respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and computing the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; the third generation unit is used for generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps; and the evaluation unit is used for calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music works and carrying out similarity evaluation based on the overlapping area.
A third aspect of the present invention provides a music generating device based on a specific style, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the music generating device based on a specific style to perform the music generation method based on a specific style described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the above-described style-specific music generation method.
In the technical scheme provided by the invention, the original data is acquired, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano; marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events; inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network; inputting the encoded data into a preset decoder to generate decoded data; and carrying out error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generate a final musical composition, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. In the embodiment of the invention, the music works are generated according to the specific style, so that the music generation efficiency and the controllability of the music works are improved.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a music generation method based on a specific style in an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a music generation method based on a specific style in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a music generating apparatus based on a specific style according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a music generating apparatus based on a specific style according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of a music generating apparatus based on a specific style in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a music generation method, device, equipment and storage medium based on a specific style, which are used for generating a music work according to the specific style, so that the music generation efficiency and the controllability of the music work are improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and one embodiment of a music generating method based on a specific style in the embodiment of the present invention includes:
101. raw data including a musical instrument digital interface MIDI file of a piano performance and audio data of the piano performance are acquired.
The server acquires raw data, including MIDI files of piano performances and audio data of piano performances. The raw data are collected from related music websites and include MIDI files of 5,000 classical piano performances and up to 20,000 hours of piano-performance audio.
It is to be understood that the execution subject of the present invention may be a music generating device based on a specific style, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
102. The original data is marked to generate intermediate data, and the intermediate data comprises a plurality of events.
The server marks the original data to generate intermediate data, and the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note start time and the note end time, and generates first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value, and generates second marked data, wherein the second marked data comprises a preset number of time shifting events; the server marks the original data based on a preset quantization speed, and generates third mark data, wherein the third mark data comprises a preset number of note playing speed events; the server merges the first tag data, the second tag data and the third tag data to generate intermediate data, the intermediate data comprising a plurality of events.
The server represents the raw data as a series of discrete markers, including 88 note-on events, 88 note-off events, 100 time-shift events ranging from 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events for the 88 notes; after quantization, the velocity markers encode the speed at which each note is played.
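As a concrete illustration, such an event vocabulary might be laid out as in the following sketch; the token names, ordering and ranges are assumptions of this sketch rather than details fixed by the invention:

```python
# Minimal sketch of the discrete event vocabulary described above.
NUM_PITCHES = 88          # piano keys A0..C8
NUM_TIME_SHIFTS = 100     # 10 ms .. 1 s in 10 ms increments
NUM_VELOCITY_BINS = 16    # quantized note-play velocities

def note_on_token(pitch: int) -> int:
    """Pitch 0..87 -> token ids 0..87."""
    return pitch

def note_off_token(pitch: int) -> int:
    """Pitch 0..87 -> token ids 88..175."""
    return NUM_PITCHES + pitch

def time_shift_token(delta_ms: float) -> int:
    """Round a time gap to the nearest 10 ms step (capped at 1 s) -> token ids 176..275."""
    steps = min(max(round(delta_ms / 10), 1), NUM_TIME_SHIFTS)
    return 2 * NUM_PITCHES + (steps - 1)

def velocity_token(velocity: int) -> int:
    """Quantize a MIDI velocity 0..127 into 16 bins -> token ids 276..291."""
    bin_index = velocity * NUM_VELOCITY_BINS // 128
    return 2 * NUM_PITCHES + NUM_TIME_SHIFTS + bin_index

VOCAB_SIZE = 2 * NUM_PITCHES + NUM_TIME_SHIFTS + NUM_VELOCITY_BINS  # 292 events
```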
103. The intermediate data is input to a preset performance encoder and a preset melody encoder, and encoded data is generated based on the relative attentiveness mechanism and the feedforward neural network.
The server inputs the intermediate data into a preset performance encoder and a preset melody encoder, and generates encoded data based on the relative attention mechanism and the feedforward neural network. An encoder converts signals or data into a form that can be used for communication, transmission and storage; when audio data is input, the encoder converts the audio into a data format that can be stored on a computer. The performance encoder takes the performance part as input and the melody encoder takes the melody part as input, and each generates a corresponding time slice after encoding. Each encoder comprises 6 layer stacks, and each layer stack comprises a multi-head relative attention layer and a feedforward neural network layer. The data passes through the relative attention layer of the first layer stack, then through the feedforward neural network layer of the first layer stack, and is transmitted to the relative attention layer of the next layer stack; this iterative processing continues until the feedforward neural network of the last layer stack outputs the encoded data.
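A minimal sketch of this layer-stack arrangement follows (PyTorch). Here `nn.MultiheadAttention` stands in for the multi-head relative attention layer, which would additionally learn relative-position embeddings, and the model dimensions are assumed defaults, not values given by the invention:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One layer stack: a multi-head attention sublayer followed by a
    feedforward sublayer, each wrapped in a residual connection."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)      # self-attention over the sequence
        x = self.norm1(x + attn_out)          # residual around attention
        return self.norm2(x + self.ff(x))     # residual around feedforward

class PerformanceEncoder(nn.Module):
    """Six layer stacks applied in sequence, as described above."""
    def __init__(self, d_model=512, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([EncoderLayer(d_model) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:             # output of each stack feeds the next
            x = layer(x)
        return x                              # time slice before compression
```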
104. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoder, comprising 6 layer stacks, each layer stack comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder accepts the output of the encoder and generates new markers. To ensure the accuracy of the new markers, end-to-end model training can be performed: given a sequence x of length n, the following formula is obtained: log p(x; θ) = Σ_{i=1}^{n} log p(x_i | x_1, …, x_{i−1}; θ), where θ is a model parameter.
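One way this log-likelihood objective could be evaluated during training is sketched below, under an assumed teacher-forcing setup; the tensor shapes and function name are illustrative, not part of the claimed method:

```python
import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sum over i of log p(x_i | x_1..x_{i-1}; theta).
    logits: (n, vocab_size) decoder outputs under teacher forcing;
    targets: (n,) ground-truth next-token ids x_1..x_n."""
    log_probs = F.log_softmax(logits, dim=-1)                     # normalize to log-probabilities
    token_ll = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return token_ll.sum()                                         # maximize this (minimize its negation)
```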
105. And carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
The server performs error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generates a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. Specifically, the server corrects the target data based on a preset melody and performance mechanism, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction processing according to the first adjustment data, reducing input interference, and generates the final musical composition.
Abnormal data includes garbled characters caused by character-encoding problems, truncated characters, outliers, and the like. By setting the preset melody and performance mechanisms, the data are audited and filtered and the abnormal data are deleted, which improves the quality of the work. The noise reduction processing of the data is based mainly on the Fourier transform: signals in the time domain are converted into signals in the frequency domain, and the corresponding noise reduction processing is carried out there.
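A minimal sketch of such Fourier-domain noise reduction follows; the magnitude-threshold heuristic is an assumption of this sketch, not the specific filtering rule used by the invention:

```python
import numpy as np

def spectral_denoise(signal: np.ndarray, threshold_ratio: float = 0.05) -> np.ndarray:
    """Illustrative Fourier-domain noise reduction: transform to the frequency
    domain, zero out components below a magnitude threshold, transform back."""
    spectrum = np.fft.rfft(signal)                      # time domain -> frequency domain
    magnitude = np.abs(spectrum)
    mask = magnitude >= threshold_ratio * magnitude.max()
    return np.fft.irfft(spectrum * mask, n=len(signal)) # back to the time domain
```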
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
Referring to fig. 2, another embodiment of the music generating method based on a specific style in the embodiment of the present invention includes:
201. Raw data including a musical instrument digital interface MIDI file of a piano performance and audio data of the piano performance are acquired.
The server acquires raw data, including MIDI files of piano performances and audio data of piano performances. The raw data are collected from related music websites and include MIDI files of 5,000 classical piano performances and up to 20,000 hours of piano-performance audio.
It is to be understood that the execution subject of the present invention may be a music generating device based on a specific style, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
202. The original data is marked to generate intermediate data, and the intermediate data comprises a plurality of events.
The server marks the original data to generate intermediate data, and the intermediate data comprises a plurality of events. Specifically, the server marks the original data based on the note start time and the note end time, and generates first mark data, wherein the first mark data comprises a preset number of note-on events and a preset number of note-off events; the server marks the original data based on a preset time increment value, and generates second marked data, wherein the second marked data comprises a preset number of time shifting events; the server marks the original data based on a preset quantization speed, and generates third mark data, wherein the third mark data comprises a preset number of note playing speed events; the server merges the first tag data, the second tag data and the third tag data to generate intermediate data, the intermediate data comprising a plurality of events.
The server represents the raw data as a series of discrete markers, including 88 note-on events, 88 note-off events, 100 time-shift events ranging from 10 ms to 1 s in 10 ms increments, and 16 quantized velocity markers representing the note-play velocity events for the 88 notes; after quantization, the velocity markers encode the speed at which each note is played.
203. And extracting the characteristics of the intermediate data to generate performance input data and melody input data.
The server performs feature extraction on the intermediate data to generate performance input data and melody input data. Feature extraction is based mainly on the principal component analysis (PCA) algorithm, which uses the idea of dimensionality reduction to convert the data into several composite index features. The main steps are: standardizing the raw data; calculating the correlation coefficient matrix; calculating the eigenvalues and eigenvectors of the correlation coefficient matrix to obtain new index scalars; calculating the information contribution rate and cumulative contribution rate of the eigenvalues; selecting the principal components according to preset rules; and finally generating the performance input data and melody input data according to the preset rules.
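These PCA steps might be rendered as in the following sketch; the 95% cumulative-contribution cutoff stands in for the unspecified preset rule:

```python
import numpy as np

def pca_features(data: np.ndarray, contribution: float = 0.95) -> np.ndarray:
    """Sketch of the PCA steps listed above: standardize, form the correlation
    matrix, take its eigenvectors, keep components up to a cumulative
    contribution rate, and project the data onto them."""
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(standardized, rowvar=False)        # correlation coefficient matrix
    eigvals, eigvecs = np.linalg.eigh(corr)               # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]                     # sort components by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()       # cumulative contribution rate
    k = int(np.searchsorted(cumulative, contribution)) + 1
    return standardized @ eigvecs[:, :k]                  # principal-component scores
```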
204. And inputting the performance input data into a preset performance encoder, passing through the multi-head relative attention layer of the preset performance encoder, and transmitting to a feedforward neural network to obtain performance encoded data.
The server inputs the performance input data into a preset performance encoder, passes through the multi-head relative attention layer of the preset performance encoder, and transmits the performance input data to the feedforward neural network to obtain performance encoded data. Specifically, the server inputs performance input data into a first layer stack of a preset performance encoder, passes through a multi-head relative attention layer of the first layer stack, and transmits the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; the server inputs the first performance segment into a second layer stack of a preset performance encoder, iterates according to preset times, and generates a performance time segment based on data output by a feedforward neural network of the last layer stack; the server compresses the performance time slice to generate performance coded data.
205. The melody input data is input into a preset melody encoder, passes through the multi-head relative attention layer of the preset melody encoder, and is transmitted to a feedforward neural network to obtain melody encoded data.
The server inputs the melody input data into a preset melody encoder, passes through the multi-head relative attention layer of the preset melody encoder, and transmits the melody input data to the feedforward neural network to obtain melody encoded data. Specifically, the server inputs melody input data into a first layer stack of a preset melody encoder, passes through a multi-head relative attention layer of the first layer stack, and transmits the melody input data to a feedforward neural network of the first layer stack to generate a first performance fragment; the server inputs the first performance fragment into a second layer stack of the preset melody encoder, iterates according to preset times, and generates a melody time fragment based on data output by the feedforward neural network of the last layer stack; the server compresses the melody time fragment to generate melody encoded data.
206. The encoded data is generated based on the performance encoded data and the melody encoded data.
The server generates the encoded data based on the performance encoded data and the melody encoded data. The server performs a combination process of the performance coded data and the melody coded data to generate final coded data.
207. The encoded data is input to a preset decoder to generate decoded data.
The server inputs the encoded data into a preset decoder to generate decoded data. The decoder has the same network structure as the encoder, comprising 6 layer stacks, each layer stack comprising a multi-head relative attention layer and a feedforward neural network layer; the decoder accepts the output of the encoder and generates new markers. To ensure the accuracy of the new markers, end-to-end model training can be performed: given a sequence x of length n, the following formula is obtained: log p(x; θ) = Σ_{i=1}^{n} log p(x_i | x_1, …, x_{i−1}; θ), where θ is a model parameter.
208. And carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference.
The server performs error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generates a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference. Specifically, the server corrects the target data based on a preset melody and performance mechanism, deletes abnormal data, and generates first adjustment data; the server then performs noise reduction processing according to the first adjustment data, reducing input interference, and generates the final musical composition.
Abnormal data includes garbled characters caused by character-encoding problems, truncated characters, outliers, and the like. By setting the preset melody and performance mechanisms, the data are audited and filtered and the abnormal data are deleted, which improves the quality of the work. The noise reduction processing of the data is based mainly on the Fourier transform: signals in the time domain are converted into signals in the frequency domain, and the corresponding noise reduction processing is carried out there.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
The method for generating music based on a specific style in the embodiment of the present invention is described above, and the apparatus for generating music based on a specific style in the embodiment of the present invention is described below, referring to fig. 3, an embodiment of the apparatus for generating music based on a specific style in the embodiment of the present invention includes:
An acquisition module 301, configured to acquire raw data, where the raw data includes a MIDI file of a musical instrument digital interface of a piano performance and audio data of the piano performance;
the marking module 302 is configured to mark the original data to generate intermediate data, where the intermediate data includes a plurality of events;
An encoding module 303 for inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
the adjustment module 305 is configured to perform error correction on the decoded data based on a preset adjustment mechanism, so as to obtain target data, and generate a final musical composition according to the target data, where the adjustment mechanism includes melody adjustment, performance adjustment, and input interference.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
Referring to fig. 4, another embodiment of the music generating apparatus based on a specific style according to an embodiment of the present invention includes:
An acquisition module 301, configured to acquire raw data, where the raw data includes a MIDI file of a musical instrument digital interface of a piano performance and audio data of the piano performance;
the marking module 302 is configured to mark the original data to generate intermediate data, where the intermediate data includes a plurality of events;
An encoding module 303 for inputting the intermediate data into a preset performance encoder and a preset melody encoder, and generating encoded data based on a relative attention mechanism and a feedforward neural network;
a decoding module 304, configured to input the encoded data into a preset decoder, and generate decoded data;
the adjustment module 305 is configured to perform error correction on the decoded data based on a preset adjustment mechanism, so as to obtain target data, and generate a final musical composition according to the target data, where the adjustment mechanism includes melody adjustment, performance adjustment, and input interference.
Optionally, the marking module 302 includes:
A first marking unit 3021 for marking the original data based on the note-on time and the note-off time, generating first marking data including a preset number of note-on events and a preset number of note-off events;
A second marking unit 3022 for marking the original data based on the preset time increment value, generating second marking data including a preset number of time shift events;
A third marking unit 3023 for marking the original data based on a preset quantization speed, generating third marking data including a preset number of note playing speed events;
The merging unit 3024 is configured to merge the first flag data, the second flag data, and the third flag data to generate intermediate data, where the intermediate data includes a plurality of events.
Optionally, the encoding module 303 includes:
A feature extraction unit 3031, configured to perform feature extraction on the intermediate data to generate performance input data and melody input data;
A first input unit 3032, configured to input performance input data into a preset performance encoder, and transmit the performance input data to the feedforward neural network through a multi-head relative attention layer of the preset performance encoder to obtain performance encoded data;
The second input unit 3033 is configured to input melody input data into a preset melody encoder, and transmit the melody input data to the feedforward neural network through a multi-head relative attention layer of the preset melody encoder to obtain melody encoded data;
the first generating unit 3034 generates the encoded data based on the performance encoded data and the melody encoded data.
Optionally, the first input unit 3032 is specifically configured to:
Inputting performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; and compressing the playing time slices to generate playing code data.
Optionally, the adjusting module 305 includes:
the correcting unit 3051 is configured to correct the target data based on a preset melody and performance mechanism, delete the abnormal data, and generate first adjustment data;
and the second generating unit 3052 is configured to perform noise reduction processing according to the first adjustment data, reduce input interference, and generate a final musical piece.
Optionally, after the adjusting module 305, the music generating apparatus based on a specific style further includes a similarity evaluating module 306, including:
the determining unit 3061 is used for obtaining two pieces of music to be evaluated, and determining an evaluation index, wherein the evaluation index comprises note density, a pitch range, an average change of a pitch, an overall change of the pitch, an average speed, a speed change, an average duration and a duration change;
The computing unit 3062 is used for respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and computing the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances;
A third generating unit 3063, configured to generate a normal distribution map according to the mean and the variance of each evaluation index, so as to obtain a plurality of groups of normal distribution maps;
And the evaluation unit 3064 is used for calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music and performing similarity evaluation based on the overlapping area.
In the embodiment of the invention, musical compositions are generated according to a specific style, improving both the efficiency of music generation and the controllability of the generated compositions.
The specific style-based music generating apparatus in the embodiment of the present invention is described in detail above in terms of the modularized functional entity in fig. 3 and 4, and the specific style-based music generating device in the embodiment of the present invention is described in detail below in terms of hardware processing.
Fig. 5 is a schematic structural diagram of a music generating device based on a specific style. The music generating device 500 based on a specific style may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the music generating device 500 based on a specific style. Furthermore, the processor 510 may be arranged to communicate with the storage medium 530 to execute, on the music generating device 500 based on a specific style, the series of instruction operations in the storage medium 530.
The music generating device 500 based on a specific style may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device structure shown in fig. 5 does not constitute a limitation of the music generating device based on a specific style, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The present invention also provides a music generating device based on a specific style, the device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the music generation method based on a specific style in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and may also be a volatile computer readable storage medium, having stored therein instructions which, when executed on a computer, cause the computer to perform the steps of the specific style based music generation method.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A music generation method based on a specific style, characterized in that the music generation method based on a specific style comprises:
acquiring original data, wherein the original data comprise musical instrument digital interface MIDI files of piano performance and audio data of piano performance;
Marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events;
Extracting features of the intermediate data to generate performance input data and melody input data;
Inputting the performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment;
Inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack;
compressing the playing time segment to generate playing code data;
Inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody coded data;
Generating encoded data based on the performance encoded data and the melody encoded data;
inputting the encoded data into a preset decoder to generate decoded data;
Performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference;
acquiring two pieces of musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change;
Respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances;
Generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps;
and calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music, and evaluating the similarity based on the overlapping area.
2. The music generation method based on a specific style according to claim 1, wherein the marking the original data to generate intermediate data, the intermediate data comprising a plurality of events, comprises:
marking the original data based on the note start time and the note end time, and generating first marking data, wherein the first marking data comprises a preset number of note-on events and a preset number of note-off events;
marking the original data based on a preset time increment value, and generating second marking data, wherein the second marking data comprises a preset number of time shifting events;
Marking the original data based on a preset quantization speed, and generating third marking data, wherein the third marking data comprises a preset number of note playing speed events;
and merging the first mark data, the second mark data and the third mark data to generate intermediate data, wherein the intermediate data comprises a plurality of events.
3. The music generation method according to claim 1, wherein the performing error correction on the decoded data based on a preset adjustment mechanism to obtain target data and generating a final musical composition according to the target data, wherein the adjustment mechanism comprises melody adjustment, performance adjustment and input interference, comprises:
correcting the target data based on a preset melody and performance mechanism, deleting abnormal data, and generating first adjustment data;
And carrying out noise reduction processing according to the first regulation data, reducing input interference and generating a final musical composition.
4. A specific style-based music generation apparatus, characterized in that the specific style-based music generation apparatus includes:
The acquisition module is used for acquiring original data, wherein the original data comprises a musical instrument digital interface MIDI file for playing a piano and audio data for playing the piano;
the marking module is used for marking the original data to generate intermediate data, wherein the intermediate data comprises a plurality of events;
The coding module is used for extracting the characteristics of the intermediate data to generate performance input data and melody input data; inputting the performance input data into a first layer stack of a preset performance encoder, passing through a multi-head relative attention layer of the first layer stack, and transmitting the performance input data to a feedforward neural network of the first layer stack to generate a first performance fragment; inputting the first performance segment into a second layer stack of the preset performance encoder, iterating according to preset times, and generating a performance time segment based on data output by a feedforward neural network of the last layer stack; compressing the playing time segment to generate playing code data; inputting the melody input data into a preset melody encoder, passing through a multi-head relative attention layer of the preset melody encoder, and transmitting the melody input data to a feedforward neural network to obtain melody coded data; generating encoded data based on the performance encoded data and the melody encoded data;
the decoding module is used for inputting the coded data into a preset decoder to generate decoded data;
The adjusting module is used for carrying out error correction on the decoded data based on a preset adjusting mechanism to obtain target data, and generating a final musical composition according to the target data, wherein the adjusting mechanism comprises melody adjustment, performance adjustment and input interference; acquiring two pieces of musical compositions to be evaluated, and determining evaluation indexes, wherein the evaluation indexes comprise note density, pitch range, average change of pitch, overall change of pitch, average speed, speed change, average duration and duration change; respectively generating a plurality of evaluation index histograms of the two pieces of music based on the evaluation indexes, and calculating the mean value and the variance of each evaluation index to obtain a plurality of groups of mean values and variances; generating a normal distribution map according to the mean value and the variance of each evaluation index to obtain a plurality of groups of normal distribution maps; and calculating the overlapping area of the normal distribution diagrams of the evaluation indexes corresponding to the two pieces of music, and evaluating the similarity based on the overlapping area.
5. Specific style-based music generation equipment, characterized in that the equipment comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the equipment to perform the specific style-based music generation method of any one of claims 1 to 3.
6. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the specific style-based music generation method of any one of claims 1 to 3.
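Claim 3 leaves the correction and noise-reduction steps abstract, so the following Python sketch is only one plausible reading of them: "deleting abnormal data" becomes dropping note events that fall outside the piano range or have degenerate values, and "noise-reduction processing" becomes a moving-average smoothing of velocities. The `Note` type, `clean_notes`, `smooth_velocities`, and every threshold are hypothetical, not the claimed mechanism.

```python
from dataclasses import dataclass

# Hypothetical event type; the patent does not specify its event encoding.
@dataclass
class Note:
    pitch: int       # MIDI pitch, 0-127
    velocity: int    # MIDI velocity, 1-127
    start: float     # onset time in seconds
    duration: float  # note length in seconds

def clean_notes(notes):
    """'Deleting abnormal data', read here as dropping notes outside the
    piano range (A0=21 to C8=108) or with degenerate velocity/duration."""
    return [n for n in notes
            if 21 <= n.pitch <= 108 and n.duration > 0 and 1 <= n.velocity <= 127]

def smooth_velocities(notes, window=3):
    """'Noise-reduction processing', read here as a moving average over
    velocities to damp input disturbance; one smoothing choice of many."""
    smoothed = []
    for i, n in enumerate(notes):
        lo, hi = max(0, i - window // 2), min(len(notes), i + window // 2 + 1)
        avg = round(sum(m.velocity for m in notes[lo:hi]) / (hi - lo))
        smoothed.append(Note(n.pitch, avg, n.start, n.duration))
    return smoothed

# Usage: the out-of-range pitch 200 is deleted, then velocities are smoothed.
adjusted = smooth_velocities(clean_notes([
    Note(60, 80, 0.0, 0.5),
    Note(200, 90, 0.5, 0.5),
    Note(64, 100, 1.0, 0.5),
]))
```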
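The performance encoder in claim 4 follows a Transformer pattern: each "layer stack" pairs a multi-head relative attention layer with a feedforward neural network, the data iterates through a preset number of stacks, and the last stack's output is compressed into performance encoded data. The PyTorch sketch below is a minimal reading of that structure, not the patented architecture: it stands in standard self-attention for the relative attention the claim names, and the mean-pooling plus linear compression, the dimensions, and the class names are assumptions.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One 'layer stack': an attention layer feeding a feedforward neural
    network. The claim specifies multi-head *relative* attention (as in
    Music Transformer); standard self-attention is a simplified stand-in."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)        # multi-head self-attention
        x = self.norm1(x + attn_out)            # residual connection + norm
        return self.norm2(x + self.ff(x))       # feedforward network

class PerformanceEncoder(nn.Module):
    """Iterates the performance input through a preset number of layer
    stacks and compresses the last stack's output into performance encoded
    data. Mean-pooling plus a linear bottleneck is an assumed compression."""
    def __init__(self, n_layers=4, d_model=256, d_code=64):
        super().__init__()
        self.layers = nn.ModuleList(EncoderLayer(d_model) for _ in range(n_layers))
        self.compress = nn.Linear(d_model, d_code)

    def forward(self, performance_input):        # (batch, time, d_model)
        x = performance_input
        for layer in self.layers:                # first, second, ... stacks
            x = layer(x)                         # performance segment per stack
        return self.compress(x.mean(dim=1))      # performance encoded data

codes = PerformanceEncoder()(torch.randn(2, 128, 256))  # shape: (2, 64)
```

Since the claim gives the melody encoder a single attention-plus-feedforward pass, one `EncoderLayer` followed by its own projection would fill that role in this sketch, and the two codes could then be combined into the encoded data.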
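The similarity evaluation in claim 4 is concrete enough to sketch: per evaluation index, a normal distribution is fitted from the sample mean and variance, and similarity is read off the overlapping area of the two curves. In the Python sketch below, `overlap_area`, the 5-sigma integration bounds, and the averaging of per-index overlaps into a single score are assumptions; the claim does not say how per-index overlaps are combined.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var); assumes var > 0."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def overlap_area(mu1, var1, mu2, var2, steps=10000):
    """Numerically integrate min(pdf1, pdf2) over a 5-sigma window.
    1.0 means identical fitted distributions; near 0 means little overlap."""
    lo = min(mu1 - 5 * math.sqrt(var1), mu2 - 5 * math.sqrt(var2))
    hi = max(mu1 + 5 * math.sqrt(var1), mu2 + 5 * math.sqrt(var2))
    dx = (hi - lo) / steps
    return sum(min(normal_pdf(lo + (i + 0.5) * dx, mu1, var1),
                   normal_pdf(lo + (i + 0.5) * dx, mu2, var2)) * dx
               for i in range(steps))

def similarity(piece_a, piece_b):
    """piece_a, piece_b: dicts mapping an evaluation-index name (note
    density, pitch range, ...) to its per-segment values. Fits a normal
    distribution per index and averages the overlaps (the averaging is
    an assumption; the claim leaves the combination open)."""
    def stats(values):  # sample mean and population variance
        mu = sum(values) / len(values)
        return mu, sum((v - mu) ** 2 for v in values) / len(values)
    scores = []
    for key in piece_a:
        (m1, v1), (m2, v2) = stats(piece_a[key]), stats(piece_b[key])
        scores.append(overlap_area(m1, v1, m2, v2))
    return sum(scores) / len(scores)

# Usage with made-up per-segment measurements for two pieces:
a = {"note_density": [3.1, 2.8, 3.4], "pitch_range": [40.0, 44.0, 38.0]}
b = {"note_density": [2.9, 3.0, 3.3], "pitch_range": [35.0, 39.0, 42.0]}
print(round(similarity(a, b), 3))
```

The integral of min(p, q) equals one minus half the L1 distance between the two densities, so the overlap behaves as a similarity bounded in [0, 1].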
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110322904.6A CN113096621B (en) | 2021-03-26 | 2021-03-26 | Music generation method, device, equipment and storage medium based on specific style |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113096621A CN113096621A (en) | 2021-07-09 |
CN113096621B (en) | 2024-05-28
Family
ID=76670047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110322904.6A | Music generation method, device, equipment and storage medium based on specific style | 2021-03-26 | 2021-03-26
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113096621B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113851098B (en) * | 2021-08-31 | 2022-06-17 | 广东智媒云图科技股份有限公司 | Melody style conversion method and device, terminal equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102480784A (en) * | 2010-11-24 | 2012-05-30 | 中国移动通信集团公司 | Fingerprint positioning error evaluation method and system |
CN103218438A (en) * | 2013-04-18 | 2013-07-24 | 广东欧珀移动通信有限公司 | Method of recommending online music based on playing record of mobile terminal and mobile terminal |
CN105280170A (en) * | 2015-10-10 | 2016-01-27 | 北京百度网讯科技有限公司 | Method and device for playing music score |
CN106409282A (en) * | 2016-08-31 | 2017-02-15 | 得理电子(上海)有限公司 | Audio frequency synthesis system and method, electronic device therefor and cloud server therefor |
CN109859245A (en) * | 2019-01-22 | 2019-06-07 | 深圳大学 | Multi-object tracking method, device and the storage medium of video object |
CN110148393A (en) * | 2018-02-11 | 2019-08-20 | 阿里巴巴集团控股有限公司 | Music generating method, device and system and data processing method |
CN111554255A (en) * | 2020-04-21 | 2020-08-18 | 华南理工大学 | MIDI playing style automatic conversion system based on recurrent neural network |
CN112037776A (en) * | 2019-05-16 | 2020-12-04 | 武汉Tcl集团工业研究院有限公司 | Voice recognition method, voice recognition device and terminal equipment |
CN112102801A (en) * | 2020-09-04 | 2020-12-18 | 北京有竹居网络技术有限公司 | Method and device for generating main melody, electronic equipment and storage medium |
CN112435642A (en) * | 2020-11-12 | 2021-03-02 | 浙江大学 | Melody MIDI accompaniment generation method based on deep neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9721551B2 (en) * | 2015-09-29 | 2017-08-01 | Amper Music, Inc. | Machines, systems, processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptions |
US10799795B1 (en) * | 2019-03-26 | 2020-10-13 | Electronic Arts Inc. | Real-time audio generation for electronic games based on personalized music preferences |
Also Published As
Publication number | Publication date |
---|---|
CN113096621A (en) | 2021-07-09 |
Similar Documents
Publication | Title |
---|---|
CN112331170B (en) | Method, device, equipment and storage medium for analyzing Buddha music melody similarity |
CN113096621B (en) | Music generation method, device, equipment and storage medium based on specific style |
CN112435642B (en) | Melody MIDI accompaniment generation method based on deep neural network |
Cherniavsky et al. | Grammar-based compression of DNA sequences |
US10147443B2 (en) | Matching device, judgment device, and method, program, and recording medium therefor |
CN111063327A (en) | Audio processing method and device, electronic equipment and storage medium |
CN117408311A (en) | Small-sample malicious website detection method based on CNN, Transformer, and transfer learning |
CN113707112A (en) | Automatic music generation method using recursive skip-connection deep learning based on layer normalization |
EP3093781A1 (en) | System and method for transforming and compressing genomics data |
CN116168666A (en) | Music estimation device, music estimation method, and model generation device |
CN112967734B (en) | Music data identification method, device, equipment and storage medium based on multiple voice parts |
CN113066459B (en) | Melody-based song information synthesis method, device, equipment and storage medium |
US20190189100A1 (en) | Method and apparatus for analyzing characteristics of music information |
CN116259289A (en) | Automatic music description generation method |
CN113053336B (en) | Musical composition generation method, device, equipment and storage medium |
CN1023160C (en) | Digital speech coder with vector excitation source having improved speech quality |
CN113066457B (en) | Buddhist chant music generation method, device, equipment and storage medium |
CN112906872B (en) | Method, device, equipment and storage medium for converting a musical score into a sound spectrum |
JP2011009868A (en) | Encoding method, decoding method, encoder, decoder, and program |
CN113012667B (en) | Music track separation method, device, equipment and storage medium based on Buddha music |
CN113379875B (en) | Cartoon character animation generation method, device, equipment and storage medium |
CN118152488B (en) | Remote sensing big data storage and retrieval method and system, electronic equipment and storage medium |
CN115713065B (en) | Question generation method, electronic equipment and computer-readable storage medium |
CN113033778B (en) | Buddha music generation method, device, equipment and storage medium |
CN118886397A (en) | Multi-dimensional flow data synthesis method and system based on large language model |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |