CN112863465A - Music generation method and device based on context information and storage medium - Google Patents

Info

Publication number
CN112863465A
Authority
CN
China
Prior art keywords
sequence
information
music
melody
tuple list
Prior art date
Legal status
Granted
Application number
CN202110107935.XA
Other languages
Chinese (zh)
Other versions
CN112863465B (en)
Inventor
曾坤
吴尚达
朱明杰
林格
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110107935.XA priority Critical patent/CN112863465B/en
Publication of CN112863465A publication Critical patent/CN112863465A/en
Application granted granted Critical
Publication of CN112863465B publication Critical patent/CN112863465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0008 - Associated control or indicating means
    • G10H1/0025 - Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111 - Automatic composing, i.e. using predefined musical rules
    • G10H2210/145 - Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
    • G10H2250/00 - Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311 - Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a music generation method, device and storage medium based on context information. The method comprises the following steps: acquiring a music information file; converting a plurality of notes in the music information file into a first tuple list sequence; a melody generating step of generating a first melody sequence from the initial tuple list sequence; extracting the tail of the first melody sequence to obtain a first tail sound sequence; inputting the first tail sound sequence into a preset neural network model based on context information to obtain a first ending sequence; a melody connecting step of connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence; repeating the steps from the melody generating step to the melody connecting step N times; taking the second tuple list sequence obtained by the Nth melody connecting step as the final tuple list sequence; and decoding the final tuple list sequence to obtain a new music information file. By adopting the embodiments of the invention, the practicability of music generation can be improved.

Description

Music generation method and device based on context information and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a music generation method and device based on context information and a storage medium.
Background
Music generation methods can be classified into rule-based systems and neural network systems according to the structure of the model and the manner in which it processes data. The main idea of a rule-based system is to create musical compositions of a given style or genre from a set of predefined parameters, typically implemented as a set of tests or rules. The computer must satisfy these tests or rules when generating music, and finally obtains works that meet the requirements. The main idea of a neural network system is that the program itself contains little or no explicit music theory knowledge; it automatically learns intrinsic rules from example material provided by a user or programmer and then generates musical compositions similar to the example material based on these self-learned rules.
Generating music with a rule-based system was the most common solution before neural networks became popular. However, these solutions involve a large number of subjective choices that are difficult to verify, so the quality of the resulting works is often unsatisfactory.
In recent years, in order to achieve automatic generation of high-quality music, many researchers have proposed new schemes based on neural networks. With the advent of more and more music data sets, music generation models are better able to learn a musical style from a corpus and generate new scores. At present, neural-network-based music generation models mainly include recurrent neural networks, convolutional neural networks and generative adversarial networks.
However, current neural-network-based music generation methods cannot generate chords, notes shorter than a sixteenth note, or notes within irregular rhythms, and they do not emphasize the particularity of the ending of a piece, so the generated music lacks an obvious ending.
Disclosure of Invention
The embodiments of the invention provide a music generation method, device and storage medium based on context information, in which any rhythm and any chord in a music information file can be represented by a tuple list, and an ending sequence is finally generated by an ending generation model, ensuring that the generated music has obvious ending characteristics and improving the practicability of the generated music.
A first aspect of an embodiment of the present application provides a music generation method based on context information, where the method includes:
acquiring a music information file; the music information file comprises a plurality of musical notes and sequencing information among the musical notes;
converting a plurality of notes in the music information file into a first tuple list sequence and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for a corresponding note;
a melody generating step: generating a first melody sequence according to the initial tuple list sequence;
extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1;
inputting the first tail sound sequence into a preset neural network model based on context information to obtain a first ending sequence;
melody connection step: connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence;
repeating the steps from the melody generating step to the melody connecting step N times, where N ≥ 1; taking the second tuple list sequence obtained by the Nth melody connecting step as a final tuple list sequence;
and decoding the final tuple list sequence to obtain a new music information file.
In a possible implementation manner of the first aspect, the converting a plurality of notes in the music information file into a first tuple list sequence and using the first tuple list sequence as an initial tuple list sequence further includes:
traversing all tonalities of the initial tuple list sequence by semitone displacement, changing the absolute pitch of the notes corresponding to the initial tuple list sequence, and obtaining an initial tuple list sequence with an expanded amount of data.
In a possible implementation manner of the first aspect, the converting a plurality of notes in the music information file into a first tuple list sequence, and using the first tuple list sequence as an initial tuple list sequence specifically includes:
each note in the music information file is represented as a corresponding tuple, and each tuple comprises duration information and pitch information of the corresponding note; the duration information is expressed by rational numbers, and the pitch information is expressed by character strings.
In a possible implementation manner of the first aspect, the generating a first melody sequence according to the initial tuple list sequence specifically includes:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence, the future information is a reverse sequence with respect to the initial tuple list sequence;
and inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
In one possible implementation form of the first aspect, the context information-based neural network model is a model based on a deep unidirectional LSTM network.
In one possible implementation form of the first aspect, the music information files are MIDI files.
A second aspect of an embodiment of the present application provides a music generating apparatus based on context information, including:
the music acquisition module is used for acquiring a music information file; the music information file comprises a plurality of musical notes and sequencing information among the musical notes;
a data encoding module for converting a plurality of notes in the music information file into a first tuple list sequence and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for a corresponding note;
the melody generating module is used for generating a first melody sequence according to the initial tuple list sequence;
the ending generation module is used for extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1;
the ending generation module is further configured to input the first tail tone sequence into a preset neural network model based on context information to obtain a first ending sequence;
the ending generation module is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
and the music decoding module is used for decoding the final tuple list sequence to obtain a new music information file.
A third aspect of the embodiments of the present application provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the music generation method based on context information according to the foregoing embodiments.
Compared with the prior art, the music generation method, device and storage medium based on context information provided by the embodiments of the invention use a tuple list representation to express the special rhythm types and chords in a music information file, completely converting the duration and pitch information of the file into a tuple list sequence. In each tuple, the duration of the note is represented by a positive rational number, so any duration can be represented; and since the pitch attribute is stored as a string, multiple pitches can be filled in, which also supports the representation of chords well. An ending generation model is also introduced, dedicated to generating endings, to ensure that the generated music has obvious ending characteristics. The ending generation model ensures that every ending has a sense of closure without relying on post-processing, which greatly reduces the dependence on the user's own composing ability and greatly improves both the practicability of music generation and the practicability of the generated music.
In addition, the tonality of the music in the tuple list sequence can be expanded before the melody sequence is generated, ensuring that there is enough data under every key. This not only ensures that the music generation model can generate music in any tonality, but also greatly expands the available music data sets.
Drawings
Fig. 1 is a flowchart illustrating a music generating method based on context information according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a music generating apparatus based on context information according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a tuple list representation method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating operation of a melody generation module according to an embodiment of the invention;
fig. 5 is a schematic diagram of an operation of an end generation module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a first aspect of an embodiment of the present invention provides a music generating method based on context information, where the method includes:
s10, acquiring a music information file; the music information file includes a plurality of notes and ordering information among the plurality of notes.
S11, converting a plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for the corresponding note.
S12, melody generating step: generating a first melody sequence according to the initial tuple list sequence.
S13, extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1.
S14, inputting the first tail sound sequence into a preset neural network model based on context information to obtain a first ending sequence.
S15, melody connecting step: connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence.
S16, repeating the steps from the melody generating step to the melody connecting step N times, where N ≥ 1; and taking the second tuple list sequence obtained by the Nth melody connecting step as the final tuple list sequence.
S17, decoding the final tuple list sequence to obtain a new music information file.
In S16, steps S12 to S15 are executed in a loop N times, repeatedly extending the tuple list sequence; multiple iterations maximally ensure the overall harmony of the music recorded in the finally output music information file.
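For illustration, the iterative procedure of S12 to S17 can be sketched as follows. This is a minimal sketch only: the helper names (generate_melody, generate_ending, decode_to_midi), the tail length M and the iteration count N are assumptions introduced here for readability, not names or values given in this description.

# Illustrative sketch of the S12-S16 loop (assumed helper names, not from this description).
def generate_music(initial_tuples, generate_melody, generate_ending,
                   decode_to_midi, n_iterations=4, tail_length=8):
    """Iteratively grow a tuple list sequence and give it an explicit ending."""
    sequence = list(initial_tuples)              # S11: initial tuple list sequence
    for _ in range(n_iterations):                # S16: repeat S12-S15 N times
        melody = generate_melody(sequence)       # S12: first melody sequence
        tail = melody[-tail_length:]             # S13: last M tuples (tail sound sequence)
        ending = generate_ending(tail)           # S14: ending sequence from the context model
        sequence = melody + ending               # S15: melody connecting step
    return decode_to_midi(sequence)              # S17: decode the final tuple list sequence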
Compared with the prior art, the method provided by the embodiments of the invention uses a tuple list representation to express the special rhythm types and chords in a music information file, completely converting the duration and pitch information of the file into a tuple list sequence. In each tuple, the duration of the note is represented by a positive rational number, so any duration can be represented; and since the pitch attribute is stored as a string, multiple pitches can be filled in, which also supports the representation of chords well. An ending generation model is also introduced, dedicated to generating endings, to ensure that the generated music has obvious ending characteristics. The ending generation model ensures that every ending has a sense of closure without relying on post-processing, greatly reducing the dependence on the user's own composing ability.
Exemplarily, S11 further includes:
traversing all tonalities of the initial tuple list sequence by semitone displacement, changing the absolute pitch of the notes corresponding to the initial tuple list sequence, and obtaining an initial tuple list sequence with an expanded amount of data.
In contrast to the conventional method of unifying the key of the music data to a basic key, this step traverses all keys of all the music data in the tuple list sequence over the range [-6, +6) by semitone displacement, so that an equal amount of music data, and enough of it, is available as learning material for the model under any key. Finally, the amount of available music data can be expanded to up to 12 times that of the original data set.
Since only the tonic is changed when augmenting the data, the style characteristics of the music information file are preserved. Although transposition changes the absolute pitch of the notes, the rhythm of the notes and the relative pitch between notes are not changed. Therefore, apart from the change of register, the remaining information of the music information file is unchanged, and its style is unchanged.
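As a minimal sketch of this augmentation step (assuming, purely for illustration, that each tuple stores its pitches as space-separated MIDI note numbers; this encoding detail is not specified here):

# Illustrative data augmentation by semitone displacement over [-6, +6).
# Assumes each tuple is (duration, pitch_string) with pitches given as MIDI note numbers
# separated by spaces, e.g. (0.5, "60 64 67") for a chord and (1, "rest") for a rest.
def transpose_tuple(note, shift):
    duration, pitch = note
    if pitch == "rest":
        return (duration, pitch)                 # rests are unaffected by transposition
    shifted = " ".join(str(int(p) + shift) for p in pitch.split())
    return (duration, shifted)

def augment_by_transposition(sequence):
    """Return 12 copies of the sequence, one per key shift in [-6, +6)."""
    return [[transpose_tuple(note, shift) for note in sequence] for shift in range(-6, 6)]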
Exemplarily, S11 specifically includes:
each note in the music information file is represented as a corresponding tuple, and each tuple comprises duration information and pitch information of the corresponding note; the duration information is expressed by rational numbers, and the pitch information is expressed by character strings.
Referring to fig. 3, each note in the music information file is represented as a tuple (D, P), where D is the duration of the note, expressed as a rational number, and P is the pitch of the note, expressed as a character string. In particular, when the note is a chord, P contains multiple pitches, so the chord is converted into a string containing the information of several pitches; when the note is a rest, P contains a symbol indicating that it is an unvoiced note.
This tuple list representation can fully express music information files containing special rhythm patterns and chords, which lays the foundation for a music generation method that supports these two special kinds of notes.
It should be noted that fig. 3 is a schematic example of the tuple list representation of data with a special rhythm type and a chord. For a conventional rhythm type, the duration is expressed in decimal form; for a special rhythm type, the duration is expressed as a fraction. For a single tone, the pitch is represented as the corresponding note name and octave; for a chord, the pitches are converted into a sequence of numbers.
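A minimal encoding sketch in this spirit is given below, using the pretty_midi library to read the file. The pitch-string format, the chord grouping by onset time, and the fixed tempo assumption are illustrative choices, not the exact specification of this representation (rests, for instance, are not detected here).

# Illustrative (D, P) encoding of the first track of a MIDI file with pretty_midi.
from fractions import Fraction
from itertools import groupby
import pretty_midi

def encode_midi(path, seconds_per_beat=0.5):                  # 0.5 s per beat = 120 BPM, assumed
    pm = pretty_midi.PrettyMIDI(path)
    notes = sorted(pm.instruments[0].notes, key=lambda n: n.start)
    tuples = []
    for _, group in groupby(notes, key=lambda n: round(n.start, 3)):
        group = list(group)                                    # notes sharing an onset form a chord
        beats = (group[0].end - group[0].start) / seconds_per_beat
        duration = Fraction(beats).limit_denominator(48)       # rational duration keeps triplets etc.
        pitch = " ".join(pretty_midi.note_number_to_name(n.pitch) for n in group)  # e.g. "C4 E4 G4"
        tuples.append((duration, pitch))
    return tuples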
Exemplarily, S12 specifically includes:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence and the future information is a reverse sequence with respect to the initial tuple list sequence.
And inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
A bidirectional structure is used in generating the first melody sequence. Its basic idea is to present each sequence to two independent recurrent hidden layers, one in forward order and one in reverse order, and finally connect both hidden layers to the same output layer. The bidirectional structure provides complete, symmetric past and future information for each generated note, which allows more accurate prediction and generation of melody sequences.
Referring to FIG. 4, the tuple list L is the input layer, the melody list M is the output layer, and H is the hidden layer. After the input music information file is obtained, it is sampled and encoded into an initial tuple list sequence L = (l_1, …, l_t, …, l_T), where T is the length of the original tuple list sequence. Then, past information and future information are extracted from L as input to the N-layer deep bidirectional LSTM network model. Finally, the network transmits the output melody sequence M = (m_1, …, m_k, …, m_K) to the ending generation module, where K is the length of the melody sequence.
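A minimal PyTorch sketch of such a deep bidirectional LSTM is shown below. It maps a sequence of tuple tokens to per-step predictions of melody tokens; the vocabulary size, embedding size, hidden size and number of layers are illustrative assumptions, not values stated here, and the tokenization of tuples is assumed to happen elsewhere.

# Minimal sketch of an N-layer deep bidirectional LSTM melody model (assumed hyperparameters).
import torch
import torch.nn as nn

class MelodyGenerator(nn.Module):
    def __init__(self, vocab_size=512, embed_dim=128, hidden_dim=256, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)       # forward and backward states concatenated

    def forward(self, token_ids):                              # token_ids: (batch, T)
        hidden, _ = self.lstm(self.embed(token_ids))           # hidden: (batch, T, 2 * hidden_dim)
        return self.out(hidden)                                # logits over melody-token classes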
In practice, the deep bi-directional LSTM network can be viewed as optimizing a differentiable error function:
E(w) = Σ_{j=1}^{S_train} E_j(w)
where S_train is the total number of sequences in the training data and w denotes the weights between the network nodes. The training goal of the model is to minimize the cross-entropy loss, i.e. the difference between the melody sequence M = (m_1, …, m_k, …, m_K) generated by the model and the real data M̂ = (m̂_1, …, m̂_k, …, m̂_K). For a particular input sequence j, the error function can be expressed as:
E_j(w) = -(1/K_j) · Σ_{k=1}^{K_j} Σ_{c=1}^{C_num} m̂_{k,c} · log m_{k,c}
where K_j is the length of sequence j and C_num is the number of classes. In each iteration of this step, the model updates the weight w and the bias b by the following formulas:
S_dw(r) = β·S_dw(r-1) + (1-β)·dw²(r-1),
S_db(r) = β·S_db(r-1) + (1-β)·db²(r-1),
w(r) = w(r-1) - α·dw(r-1)/√(S_dw(r) + ε),
b(r) = b(r-1) - α·db(r-1)/√(S_db(r) + ε)
in the above formula, Sdw(r) and Sdb(r) is the exponentially weighted average of dw and db in the r-th iteration, respectively, β is the momentum value, α is the learning rate, and w (r) and b (r) are the updated values of the weight w and bias value b, respectively, after the r-th iteration. ε is a decimal number to provide stability of the value. If the validation error does not change significantly after the R-th iteration, the model considers it to have converged.
Illustratively, the context information-based neural network model is a deep unidirectional LSTM network-based model.
To predict the ending accurately even though no information exists after it, the neural network model uses a deep unidirectional LSTM network to generate the ending part. As shown in fig. 5, the input to the neural network model is the tail sound sequence C = (c_1, …, c_i, …, c_I), where I is the length of the tail sound sequence. The tail sound sequence C is extracted from the end of the first melody sequence M. The output of this deep LSTM network is the ending sequence E, a set of notes of fixed duration. The neural network model connects the first melody sequence M and the first ending sequence E into a new tuple list sequence L*. If generation has not ended, L* is used as the initial tuple list sequence for the next round of melody generation; otherwise, L* is decoded, converted and output as a new music information file.
During generation, the neural network model continuously adjusts the connection between the melody sequence M and the ending sequence E so that the transition sounds more natural; meanwhile, the ending generator constantly re-evaluates whether the new tuple list sequence L* needs to be updated with a new ending E*. It is worth mentioning that this non-linear composing process also corresponds to the way human composers write scores.
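A minimal PyTorch sketch of such an ending generator is given below: a deep unidirectional LSTM reads the tail sound sequence C and greedily decodes an ending sequence E of fixed length, which is then appended to the melody. The layer sizes, the ending length and the greedy decoding loop are illustrative assumptions, not details stated here.

# Minimal sketch of a deep unidirectional LSTM ending generator (assumed hyperparameters).
import torch
import torch.nn as nn

class EndingGenerator(nn.Module):
    def __init__(self, vocab_size=512, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tail_tokens, ending_length=8):           # tail_tokens: (batch, I)
        _, state = self.lstm(self.embed(tail_tokens))          # summarize the tail sound sequence C
        token = tail_tokens[:, -1:]                            # start decoding from the last tail token
        ending = []
        for _ in range(ending_length):                         # greedy decoding of the ending E
            hidden, state = self.lstm(self.embed(token), state)
            token = self.out(hidden).argmax(dim=-1)
            ending.append(token)
        return torch.cat(ending, dim=1)                        # ending sequence E: (batch, ending_length)

The concatenation of the melody tokens and the decoded ending tokens then plays the role of the new tuple list sequence L* described above.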
Illustratively, the music information file is a MIDI file.
Referring to fig. 2, an embodiment of the present invention provides a music generating apparatus based on context information, including: a music acquisition module 20, a data encoding module 21, a melody generation module 22, an end generation module 23, and a music decoding module 24.
A music obtaining module 20, configured to obtain a music information file; the music information file comprises a plurality of musical notes and sequencing information among the musical notes;
a data encoding module 21, configured to convert a plurality of notes in the music information file into a first tuple list sequence, and use the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for a corresponding note;
a melody generating module 22, configured to generate a first melody sequence according to the initial tuple list sequence;
an ending generating module 23, configured to extract the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1;
the ending generating module 23 is further configured to input the first tail sequence into a preset neural network model based on context information, so as to obtain a first ending sequence;
the ending generating module 23 is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
and the music decoding module 24 is configured to decode the final tuple list sequence to obtain a new music information file.
Fig. 2 shows the music generating apparatus framework of the invention. The apparatus consists of a music acquisition module 20, a data encoding module 21, a melody generation module 22, an ending generation module 23 and a music decoding module 24. The data encoding module 21 is responsible for converting an input music information file (MIDI) into a tuple list sequence; the melody generation module 22 generates a new melody sequence based on the existing tuple list sequence; the ending generation module 23 extracts the tail sound sequence from the melody sequence, outputs a matching ending sequence based on it, and combines the melody sequence and the ending sequence. If the iteration is not finished, the combined sequence is used as the input of the melody generation module 22; otherwise, it is sent to the music decoding module 24 for decoding and output as a new music information file in the MIDI file format.
A third aspect of the embodiments of the present application provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the music generation method based on context information according to the foregoing embodiments.
Computer readable storage media for embodiments of the present invention may be computer readable signal media or computer readable storage media or any combination of the two. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (8)

1. A method for generating music based on context information, comprising:
acquiring a music information file; the music information file comprises a plurality of musical notes and sequencing information among the musical notes;
converting a plurality of notes in the music information file into a first tuple list sequence and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for a corresponding note;
a melody generating step: generating a first melody sequence according to the initial tuple list sequence;
extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1;
inputting the first tail sound sequence into a preset neural network model based on context information to obtain a first ending sequence;
melody connection step: connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence;
repeating the steps from the melody generating step to the melody connecting step N times, where N ≥ 1; taking the second tuple list sequence obtained by the Nth melody connecting step as a final tuple list sequence;
and decoding the final tuple list sequence to obtain a new music information file.
2. The method of contextual information based music generation as recited in claim 1, further comprising, after said converting a plurality of notes in said music information file into a first tuple list sequence and treating said first tuple list sequence as an initial tuple list sequence:
traversing all tonalities of the initial tuple list sequence by semitone displacement, changing the absolute pitch of the notes corresponding to the initial tuple list sequence, and obtaining an initial tuple list sequence with an expanded amount of data.
3. The method for generating music based on contextual information according to claim 1, wherein said converting a plurality of notes in said music information file into a first tuple list sequence and using said first tuple list sequence as an initial tuple list sequence specifically comprises:
each note in the music information file is represented as a corresponding tuple, and each tuple comprises duration information and pitch information of the corresponding note; the duration information is expressed by rational numbers, and the pitch information is expressed by character strings.
4. The method of claim 1, wherein the generating the first melody sequence according to the initial tuple list sequence comprises:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence, the future information is a reverse sequence with respect to the initial tuple list sequence;
and inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
5. The contextual information based music generation method of claim 1, wherein said contextual information based neural network model is a deep unidirectional LSTM network based model.
6. The method of generating music based on contextual information according to claim 1, wherein the music information files are MIDI files.
7. An apparatus for generating music based on context information, comprising:
the music acquisition module is used for acquiring a music information file; the music information file comprises a plurality of musical notes and sequencing information among the musical notes;
a data encoding module for converting a plurality of notes in the music information file into a first tuple list sequence and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple comprises duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information for a corresponding note;
the melody generating module is used for generating a first melody sequence according to the initial tuple list sequence;
the ending generation module is used for extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M ≥ 1;
the ending generation module is further configured to input the first tail tone sequence into a preset neural network model based on context information to obtain a first ending sequence;
the ending generation module is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
and the music decoding module is used for decoding the final tuple list sequence to obtain a new music information file.
8. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the method for music generation based on contextual information according to any one of claims 1 to 6.
CN202110107935.XA 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium Active CN112863465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110107935.XA CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110107935.XA CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Publications (2)

Publication Number Publication Date
CN112863465A true CN112863465A (en) 2021-05-28
CN112863465B CN112863465B (en) 2023-05-23

Family

ID=76009447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110107935.XA Active CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112863465B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6297439B1 (en) * 1998-08-26 2001-10-02 Canon Kabushiki Kaisha System and method for automatic music generation using a neural network architecture
JP2001337674A (en) * 2000-05-25 2001-12-07 Yamaha Corp Portable communication terminal device
US20150186359A1 (en) * 2013-12-30 2015-07-02 Google Inc. Multilingual prosody generation
JP2016099446A (en) * 2014-11-20 2016-05-30 カシオ計算機株式会社 Automatic music composition device, method, and program
KR101939001B1 (en) * 2017-12-06 2019-01-15 한국과학기술원 Method and System for Audio and Score Alignment of Music Using Neural Network-Based Automatic Music Transcription
CN109036355A (en) * 2018-06-29 2018-12-18 平安科技(深圳)有限公司 Automatic composing method, device, computer equipment and storage medium
CN109448697A (en) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Poem melody generation method, electronic device and computer readable storage medium
CN109448683A (en) * 2018-11-12 2019-03-08 平安科技(深圳)有限公司 Music generating method and device neural network based

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113797541A (en) * 2021-09-06 2021-12-17 武汉指娱互动信息技术有限公司 Music game level generating method, device, equipment and storage medium
CN113797541B (en) * 2021-09-06 2024-04-09 武汉指娱互动信息技术有限公司 Music game level generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112863465B (en) 2023-05-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant