CN113035161A - Chord-based song melody generation method, device, equipment and storage medium - Google Patents
- Publication number
- CN113035161A (application CN202110285841.1A)
- Authority
- CN
- China
- Prior art keywords
- vector
- target
- chord
- pitch
- lyric
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/105—Composing aid, e.g. for supporting creation, edition or modification of a piece of music
Abstract
The invention relates to the field of artificial intelligence and discloses a chord-based song melody generation method, device, equipment and storage medium, which are applied to the field of intelligent education and are used for improving the audibility of song melodies and the efficiency of song melody generation. The method comprises the following steps: acquiring target lyrics input by a user and a target chord selected by the user in advance; generating a target tone vector according to the target lyrics and a target chord vector according to the target chord; combining the target tone vector and the target chord vector to generate a target tone chord vector; inputting the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector; and generating the target song melody according to the target lyric pitch vector and the target lyric duration vector.
Description
Technical Field
The invention relates to the field of audio conversion, in particular to a chord-based song melody generation method, device, equipment and storage medium.
Background
Ordinarily, song creation is a difficult task that must be adjusted according to the overall effect of the song and the inspiration of its creators. With the popularization of artificial intelligence technology, it has become possible to generate songs directly using artificial intelligence. Song creation is a comprehensive art: beyond lyric writing, it proceeds step by step through further stages such as composition and singing.
Song generation needs to take different musical characteristics into account, and those characteristics take many forms of expression. Melody is the most important factor influencing a song, yet the melodies generated by existing schemes have low audibility and low quality.
Disclosure of Invention
The invention provides a chord-based song melody generation method, device and equipment and a storage medium, which are used for improving the quality of song melodies, improving the audibility of the song melodies and improving the generation efficiency of the song melodies.
A first aspect of an embodiment of the present invention provides a method for generating a song melody based on a chord, including: acquiring target lyrics input by a user and a target chord preselected by the user; generating a target tone vector according to the target lyrics and a target chord vector according to the target chord; combining the target tone vector and the target chord vector to generate a target tone chord vector; inputting the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector; and generating a target song melody according to the target lyric pitch vector and the target lyric duration vector.
Optionally, in a first implementation manner of the first aspect of the embodiment of the present invention, before the obtaining the target lyric input by the user and the target chord pre-selected by the user, the method for generating the song melody based on the chord further includes: acquiring preset training data, wherein the preset training data comprises a digital music score of a plurality of songs; and training a preset initial model by using the preset training data to obtain a Transformer model.
Optionally, in a second implementation manner of the first aspect of the embodiment of the present invention, the training a preset initial model by using the preset training data to obtain a Transformer model includes: obtaining a plurality of digital music scores from preset training data, wherein each digital music score is used for indicating the lyrics, the tone, the chord, the pitch and the duration of a song; extracting tone information, chord information, pitch information and duration information of the songs from the digital music score corresponding to each song; sequentially generating a tone vector T, a chord vector C, a pitch vector P and a duration vector D of the song according to the tone information, the chord information, the pitch information and the duration information, and respectively combining the tone vector T, the chord vector C, the pitch vector P and the duration vector D of a plurality of songs to obtain a tone vector sequence T, a chord vector sequence C, a pitch vector sequence P and a duration vector sequence D; and training the preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P and the duration vector sequence D to obtain the Transformer model.
Optionally, in a third implementation manner of the first aspect of the embodiment of the present invention, the training the preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P and the duration vector sequence D to obtain the Transformer model includes: combining the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] to obtain a first high-dimensional vector Input, where Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]]; combining the pitch vector sequence P = [p1, p2, p3, …, pn] and the duration vector sequence D = [d1, d2, d3, …, dn] to obtain a second high-dimensional vector Output, where Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]]; and taking the first high-dimensional vector Input as an input vector and the second high-dimensional vector Output as an output vector, and training the preset initial model to obtain the Transformer model.
Optionally, in a fourth implementation manner of the first aspect of the embodiment of the present invention, the extracting tone information, chord information, pitch information and duration information of the song from the digital music score includes: converting a plurality of digital music scores into an XML format, wherein each digital music score corresponds to one song; reading lyric information from the digital music scores in the XML format to obtain the lyric Chinese characters in each digital music score; determining the tone information and pitch information corresponding to each character in each song according to the lyric Chinese characters in each digital music score; determining the corresponding chord information and duration information according to each digital music score; and generating the tone information, chord information, pitch information and duration information corresponding to the plurality of songs.
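As a minimal illustration of the XML-reading step described above, the sketch below parses lyric characters and pitch names from a MusicXML-like fragment. The element names, the tiny inline score, and the helper function are illustrative assumptions; the embodiment does not fix an exact schema.

```python
# Hedged sketch: extract lyric characters and pitches from an XML score.
# The <note>/<pitch>/<lyric> layout follows MusicXML conventions, but is
# only an assumption here, not the patent's prescribed format.
import xml.etree.ElementTree as ET

SCORE_XML = """
<score>
  <note><pitch><step>D</step><octave>5</octave></pitch>
        <lyric><text>好</text></lyric></note>
  <note><pitch><step>E</step><octave>5</octave></pitch>
        <lyric><text>好</text></lyric></note>
</score>
"""

def read_lyrics_and_pitches(xml_text):
    """Return the lyric character and pitch name carried by each note."""
    root = ET.fromstring(xml_text)
    chars, pitches = [], []
    for note in root.iter("note"):
        chars.append(note.find("lyric/text").text)
        step = note.find("pitch/step").text
        octave = note.find("pitch/octave").text
        pitches.append(step + octave)
    return chars, pitches

chars, pitches = read_lyrics_and_pitches(SCORE_XML)
# chars -> ['好', '好'], pitches -> ['D5', 'E5']
```

Tone information per character and chord/duration information would be derived from these extracted characters and score annotations in the same pass.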
Optionally, in a fifth implementation manner of the first aspect of the embodiment of the present invention, the training the preset initial model to obtain the Transformer model by using the first high-dimensional vector Input as an input vector and the second high-dimensional vector Output as an output vector includes: inputting the first high-dimensional vector Input into the encoder at the head end of an encoding assembly in the preset initial model, wherein the encoding assembly comprises a plurality of encoders connected in sequence, and each encoder comprises a multi-head attention layer and a feedforward network layer; inputting the result output by the encoder at the tail end of the encoding assembly into each multi-head attention layer in a decoding assembly, wherein the decoding assembly comprises a plurality of decoders connected in sequence, and each decoder comprises a masked multi-head attention layer, a multi-head attention layer and a feedforward network layer; inputting the second high-dimensional vector Output into the masked multi-head attention layer of the decoder at the head end of the decoding assembly in the preset initial model; and inputting the output result of the decoder at the tail end of the decoding assembly into a linear network to obtain the Transformer model.
Optionally, in a sixth implementation manner of the first aspect of the embodiment of the present invention, the generating a target song melody according to the target lyric pitch vector and the target lyric duration vector includes: aligning the target lyric pitch vector with the target lyric duration vector; generating a melody line of the target lyrics based on the target lyric pitch vector; generating the beat of the target lyrics according to the target lyric duration vector; and generating a target song melody according to the melody line of the target lyrics and the beat of the target lyrics.
A second aspect of an embodiment of the present invention provides a chord-based song melody generating apparatus, including: an obtaining module for obtaining target lyrics input by a user and a target chord selected by the user in advance; a first generation module for generating a target tone vector according to the target lyrics and a target chord vector according to the target chord; a combination module for combining the target tone vector and the target chord vector to generate a target tone chord vector; an input module for inputting the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, the output characteristic vector comprising a target lyric pitch vector and a target lyric duration vector; and a second generation module for generating the target song melody according to the target lyric pitch vector and the target lyric duration vector.
Optionally, in a first implementation manner of the second aspect of the embodiment of the present invention, the chord-based song melody generating apparatus further includes: the data acquisition module is used for acquiring preset training data, and the preset training data comprises a digital music score of a plurality of songs; and the training module is used for training a preset initial model by using the preset training data to obtain a Transformer model.
Optionally, in a second implementation manner of the second aspect of the embodiment of the present invention, the training module includes: a score obtaining unit for obtaining a plurality of digital scores from preset training data, wherein each digital score is used for indicating lyrics, tone, chord, pitch and duration of a song; an information acquisition unit for extracting tone information, chord information, pitch information and duration information of a song from a digital music score; the sequence generating unit is used for sequentially generating a tone vector T, a chord vector C, a pitch vector P and a duration vector D of the song according to the tone information, the chord information, the pitch information and the duration information, and respectively combining the tone vector T, the chord vector C, the pitch vector P and the duration vector D of a plurality of songs to obtain a tone vector sequence T, a chord vector sequence C, a pitch vector sequence P and a duration vector sequence D; and the model training unit is used for training the preset initial model by utilizing the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P and the duration vector sequence D to obtain the Transformer model.
Optionally, in a third implementation manner of the second aspect of the embodiment of the present invention, the model training unit specifically includes: a first combining subunit for combining the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] to obtain a first high-dimensional vector Input, where Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]]; a second combining subunit for combining the pitch vector sequence P = [p1, p2, p3, …, pn] and the duration vector sequence D = [d1, d2, d3, …, dn] to obtain a second high-dimensional vector Output, where Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]]; and a training subunit for training the preset initial model with the first high-dimensional vector Input as the input vector and the second high-dimensional vector Output as the output vector to obtain the Transformer model.
Optionally, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the information obtaining unit is specifically configured to: convert a plurality of digital music scores into an XML format, wherein each digital music score corresponds to one song; read lyric information from the digital music scores in the XML format to obtain the lyric Chinese characters in each digital music score; determine the tone information and pitch information corresponding to each character in each song according to the lyric Chinese characters in each digital music score; determine the corresponding chord information and duration information according to each digital music score; and generate the tone information, chord information, pitch information and duration information corresponding to the plurality of songs.
Optionally, in a fifth implementation manner of the second aspect of the embodiment of the present invention, the training subunit is specifically configured to: input the first high-dimensional vector Input into the encoder at the head end of an encoding assembly in the preset initial model, wherein the encoding assembly comprises a plurality of encoders connected in sequence, and each encoder comprises a multi-head attention layer and a feedforward network layer; input the result output by the encoder at the tail end of the encoding assembly into each multi-head attention layer in a decoding assembly, wherein the decoding assembly comprises a plurality of decoders connected in sequence, and each decoder comprises a masked multi-head attention layer, a multi-head attention layer and a feedforward network layer; input the second high-dimensional vector Output into the masked multi-head attention layer of the decoder at the head end of the decoding assembly in the preset initial model; and input the output result of the decoder at the tail end of the decoding assembly into a linear network to obtain the Transformer model.
Optionally, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the second generation module is specifically configured to: align the target lyric pitch vector with the target lyric duration vector; generate a melody line of the target lyrics based on the target lyric pitch vector; generate the beat of the target lyrics according to the target lyric duration vector; and generate the target song melody according to the melody line of the target lyrics and the beat of the target lyrics.
A third aspect of the embodiments of the present invention provides a chord-based song melody generating device, including a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the chord-based song melody generating device to perform the chord-based song melody generation method described above.
A fourth aspect of an embodiment of the present invention provides a computer-readable storage medium storing instructions that, when executed by a processor, implement the steps of the chord-based song melody generation method according to any one of the above-described embodiments.
According to the technical scheme provided by the embodiment of the invention, target lyrics input by a user and a target chord selected by the user in advance are obtained; a target tone vector is generated according to the target lyrics and a target chord vector according to the target chord; the target tone vector and the target chord vector are combined to generate a target tone chord vector; the target tone chord vector is input into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector; and the target song melody is generated according to the target lyric pitch vector and the target lyric duration vector. In the embodiment of the invention, the initial Transformer model is trained on chords, pitches, tones and durations, and the trained Transformer model is then used to generate the target song melody, so that the quality and audibility of the song melody are improved and the efficiency of melody generation is further improved.
Drawings
FIG. 1 is a diagram of an embodiment of a chord-based song melody generation method according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a chord-based song melody generation method according to an embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of a chord-based song melody generating apparatus according to an embodiment of the present invention;
FIG. 4 is a diagram of another embodiment of the chord-based song melody generating apparatus according to an embodiment of the present invention;
FIG. 5 is a diagram of an embodiment of chord-based song melody generating equipment according to an embodiment of the present invention.
Detailed Description
The invention provides a chord-based song melody generation method, device and equipment and a storage medium, which are used for improving the quality of song melodies and the audibility of the song melodies and further improving the generation efficiency of the song melodies.
To enable those skilled in the art to better understand the solution of the present invention, the embodiments of the present invention are described below in conjunction with the accompanying drawings.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a flowchart of a method for generating a song melody based on a chord according to an embodiment of the present invention specifically includes:
101. Acquire target lyrics input by a user and a target chord selected by the user in advance.
The server obtains the target lyrics input by a user and a target chord selected by the user in advance, wherein the target lyrics are a combination of characters input by the user, and the target chord is a chord the user selects from preset chords. When the user inputs a combination of characters of his own devising, that combination serves as the lyrics of the song to be composed; for example, the target lyrics may be "love yourself well and someone will love you, this optimistic saying". The preset chords include triads, seventh chords, tonic six-four chords, tonic sixth chords, secondary sixth chords and the like, which are not described in detail here.
It is to be understood that the executing body of the present invention may be the chord-based song melody generating apparatus, or may be the server, and is not limited thereto. The embodiment of the present invention is described by taking a server as an execution subject.
102. Generate a target tone vector according to the target lyrics and a target chord vector according to the target chord.
The server generates a target tone vector according to the target lyrics and a target chord vector according to the target chord.
The length of the target lyrics is not limited, and lyrics of different songs can differ considerably in length. For example, the lyrics of the song "Unforgettable Tonight" contain 129 Chinese characters, while the lyrics of the song "Sweet Honey" contain 179 Chinese characters.
It should be noted that different songs have lyrics of different lengths and different numbers of Chinese characters, so the number of generated tones also differs; moreover, one chord may correspond to several Chinese characters (that is, a chord may repeat). The target tone vector and the target chord vector are therefore given the same length, which facilitates alignment; details are not repeated here.
103. Combine the target tone vector and the target chord vector to generate a target tone chord vector.
The server combines the target tone vector and the target chord vector to generate a target tone chord vector.
Specifically, the tones and chords of the lyrics are combined into a higher-dimensional vector. For example, the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] are combined to obtain the high-dimensional vector Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]].
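The pairwise combination described above can be sketched in a few lines. The concrete numeric values below are illustrative placeholders, not data from the patent: each character's tone is paired with the chord active under that character, and likewise pitch with duration for the training target.

```python
# Hedged sketch of the vector-combination step; all values are illustrative.
def combine(seq_a, seq_b):
    """Zip two equal-length per-character sequences into one
    higher-dimensional sequence of [a_i, b_i] pairs."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned per character"
    return [[a, b] for a, b in zip(seq_a, seq_b)]

T = [3, 4, 3, 2]             # tone label of each lyric character (0-4)
C = [0, 0, 1, 1]             # chord index active under each character
P = [62, 64, 62, 57]         # numeric pitch per character (assumed encoding)
D = [0.25, 0.5, 0.25, 0.5]   # duration (in beats) per character

Input = combine(T, C)   # [[3, 0], [4, 0], [3, 1], [2, 1]]
Output = combine(P, D)  # [[62, 0.25], [64, 0.5], [62, 0.25], [57, 0.5]]
```

The same `combine` step serves both the model input (tone, chord) and the training target (pitch, duration), which is why the two vector pairs must share one length per song.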
104. Input the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector.
The server inputs the target tone chord vector into the trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector.
For example, if the input vector is Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]], the trained Transformer model produces the output characteristic vector Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]], which contains the pitch vector [p1, p2, p3, …, pn] and the duration vector [d1, d2, d3, …, dn].
105. Generate the target song melody according to the target lyric pitch vector and the target lyric duration vector.
Specifically, the server aligns the target lyric pitch vector and the target lyric duration vector; generates a melody line of the target lyrics based on the target lyric pitch vector; generates the beat of the target lyrics according to the target lyric duration vector; and generates the target song melody according to the melody line and the beat of the target lyrics.
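The alignment-and-assembly step performed here can be sketched as follows. The note-event dictionary format and the interpretation of the beat as cumulative onsets are illustrative assumptions, not the patent's prescribed representation.

```python
# Hedged sketch of step 105: pair each pitch with its duration, and let
# cumulative durations give each note's onset beat (the rhythm of the lyric).
def build_melody(pitches, durations):
    assert len(pitches) == len(durations), "vectors must be aligned"
    melody, onset = [], 0.0
    for p, d in zip(pitches, durations):
        melody.append({"pitch": p, "onset": onset, "duration": d})
        onset += d  # next note starts when this one ends
    return melody

notes = build_melody(["D5", "E5", "D5"], [0.25, 0.5, 0.25])
# onsets: 0.0, 0.25, 0.75
```

The melody line is the ordered pitch sequence; the onset/duration fields carry the beat, and together the events form the target song melody.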
In the embodiment of the invention, the initial Transformer model is trained on chords, pitches, tones and durations, and the trained Transformer model is then used to generate the target song melody, thereby improving the quality and audibility of the song melody and further improving the efficiency of melody generation. This scheme can also be applied in the field of smart education to promote the construction of smart cities.
Referring to fig. 2, another flowchart of the method for generating a song melody based on a chord according to the embodiment of the present invention specifically includes:
201. Acquire preset training data, the preset training data including digital music scores of a plurality of songs.
The server obtains preset training data, wherein the preset training data comprises a digital music score of a plurality of songs.
It should be noted that, in this embodiment, the digital music score includes at least lyrics, tones, chords, pitches and durations, and may further include other musical features, such as rests, final bar lines and accent marks, which are not limited here.
It is to be understood that the executing body of the present invention may be the chord-based song melody generating apparatus, or may be the server, and is not limited thereto. The embodiment of the present invention is described by taking a server as an execution subject.
For example, if the song "Keyword" is used as training data, the digital music score of the song must be determined, including its lyrics, tones, chords, pitches and durations. Taking the first line of the lyrics, "love yourself well and someone will love you, this optimistic saying", as an example, the corresponding tones are "3, 4, 3, 2, 4, 3, 4, 1, 0, 1, 2", where the digits 0, 1, 2, 3 and 4 respectively denote the neutral, first, second, third and fourth tones of the Chinese characters; the corresponding pitches are "D5, E5, D5, E5, A4, E5, D5, E5, D5, E5, A4, E5, D5, E5, D5, E5, G4"; and the corresponding durations are values such as "0.25, 0.5, 0.25, 0.5, 0.25", with the corresponding chords likewise read from the score. The remaining lyric lines are processed in the same way and are not described here in detail.
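Before training, the symbolic score data above must become numeric vectors. The sketch below shows one plausible encoding: Mandarin tone digits are kept as-is, and scientific pitch names such as "D5" are mapped to MIDI note numbers. The patent does not prescribe this particular mapping; it is an assumption for illustration.

```python
# Hedged sketch of encoding score data numerically (assumed MIDI mapping,
# natural notes only for brevity; sharps/flats are omitted).
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_midi(name):
    """Map a scientific pitch name like 'D5' to its MIDI note number."""
    step, octave = name[0], int(name[-1])
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step]

tones = [3, 4, 3, 2, 4, 3, 4, 1, 0, 1, 2]  # per-character tone labels, already numeric
pitches = ["D5", "E5", "D5", "E5", "A4"]   # first few pitches from the score
midi = [pitch_to_midi(p) for p in pitches]
# D5 -> 74, E5 -> 76, A4 -> 69
```

With tones, chord indices, MIDI pitches and beat durations all numeric, the four vector sequences T, C, P and D of a song can be assembled directly.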
202. And training the preset initial model by using preset training data to obtain a Transformer model.
Specifically, the server acquires a plurality of digital music scores from preset training data, wherein each digital music score is used for indicating the lyrics, the tone, the chord, the pitch and the duration of a song; the server extracts tone information, chord information, pitch information and duration information of the songs from the digital music score; the server sequentially generates a tone vector T, a chord vector C, a pitch vector P and a duration vector D corresponding to each song according to the tone information, the chord information, the pitch information and the duration information, and combines the tone vector T, the chord vector C, the pitch vector P and the duration vector D of a plurality of songs respectively to obtain a tone vector sequence T, a chord vector sequence C, a pitch vector sequence P and a duration vector sequence D; and the server trains a preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P and the duration vector sequence D to obtain a Transformer model.
Optionally, the training, by the server, of the preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P, and the duration vector sequence D to obtain the Transformer model specifically includes: the server combines the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] to obtain a first high-dimensional vector Input, where Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]]; the server combines the pitch vector sequence P = [p1, p2, p3, …, pn] and the duration vector sequence D = [d1, d2, d3, …, dn] to obtain a second high-dimensional vector Output, where Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]]; and the server takes the first high-dimensional vector Input as the input vector and the second high-dimensional vector Output as the output vector, and trains the preset initial model to obtain the Transformer model.
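The pairing step described above — zipping tones with chords into Input and pitches with durations into Output — can be sketched as follows (the integer ids are illustrative):

```python
def pair_sequences(a, b):
    """Element-wise pairing of two equal-length sequences:
    [a1,...,an], [b1,...,bn] -> [[a1,b1],...,[an,bn]]."""
    assert len(a) == len(b), "sequences must be aligned"
    return [[x, y] for x, y in zip(a, b)]

Input = pair_sequences([3, 4, 3], [0, 0, 1])    # tones paired with chords
Output = pair_sequences([0, 1, 0], [0, 1, 0])   # pitches paired with durations
```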
Optionally, the extracting, by the server, of the tone information, chord information, pitch information and duration information of the songs from the digital music scores specifically includes: the server converts a plurality of digital music scores into an XML format, wherein each digital music score corresponds to one song; the server reads lyric information from the digital music scores in the XML format to obtain the lyric Chinese characters in each digital music score; the server determines the tone information and pitch information corresponding to each character of each song according to the lyric Chinese characters in each digital music score; the server determines the corresponding chord information and duration information according to each digital music score; and the server generates the tone information, chord information, pitch information and duration information corresponding to the plurality of songs.
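A hedged sketch of this extraction step, assuming a MusicXML-style score: the element names (`note`, `pitch`, `step`, `octave`, `lyric`, `text`) follow the MusicXML convention, and real digital scores may differ.

```python
import xml.etree.ElementTree as ET

# A two-note MusicXML-style fragment used purely for illustration.
SCORE_XML = """
<part><measure>
  <note><pitch><step>D</step><octave>5</octave></pitch>
        <duration>1</duration><lyric><text>好</text></lyric></note>
  <note><pitch><step>E</step><octave>5</octave></pitch>
        <duration>2</duration><lyric><text>想</text></lyric></note>
</measure></part>
"""

def extract_notes(xml_text):
    """Return (lyric character, pitch name, duration) triples from the score."""
    root = ET.fromstring(xml_text)
    out = []
    for note in root.iter("note"):
        char = note.findtext("lyric/text")
        pitch = note.findtext("pitch/step") + note.findtext("pitch/octave")
        dur = int(note.findtext("duration"))
        out.append((char, pitch, dur))
    return out

notes = extract_notes(SCORE_XML)
```

Tone information would then be looked up per character from a pinyin-tone dictionary, which is outside this sketch.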
Optionally, the training, by the server, of the preset initial model with the first high-dimensional vector Input as the input vector and the second high-dimensional vector Output as the output vector to obtain the Transformer model specifically includes: the server inputs the first high-dimensional vector Input into the encoder at the head end of the encoding assembly in the preset initial model, wherein the encoding assembly comprises a plurality of sequentially connected encoders, and each encoder comprises a multi-head attention layer and a feedforward network layer; the server inputs the output result of the encoder at the tail end of the encoding assembly into each multi-head attention layer in the decoding assembly, wherein the decoding assembly comprises a plurality of sequentially connected decoders, and each decoder comprises a mask multi-head attention layer, a multi-head attention layer and a feedforward network layer; the server inputs the second high-dimensional vector Output into the mask multi-head attention layer of the decoder at the head end of the decoding assembly in the preset initial model; and the server inputs the output result of the decoder at the tail end of the decoding assembly into a linear network to obtain the Transformer model.
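For illustration, the multi-head attention layers named above are built from scaled dot-product attention; a single-head, plain-Python version (a conceptual sketch, not the patent's implementation) looks like this:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """For each query, return a weighted average of the values, with weights
    given by softmax of the scaled dot products between query and keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# One query strongly aligned with the first key attends mostly to value 1.
result = attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [0.0, 1.0]])
```

A multi-head layer runs several such attentions in parallel on learned projections and concatenates the results; the mask variant in the decoders additionally zeroes out attention to future positions.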
203. And acquiring the target lyrics input by the user and the target chord selected by the user in advance.
The server obtains the target lyrics input by the user and the target chord selected in advance by the user, wherein the target lyrics are a character combination input by the user, and the target chord is selected by the user from preset chords. When the user inputs a character combination of his own devising, that combination is used as the lyrics of the song to be composed; for example, the target lyrics may be "be good to yourself, someone will love you, these optimistic words". The preset chords include triads, seventh chords, main fourth chords, main sixth chords, secondary sixth chords and the like, which are not described in detail herein.
204. And generating a target tone vector according to the target lyrics and generating a target chord vector according to the target chord.
The server generates a target tone vector according to the target lyrics and generates a target chord vector according to the target chord.
The length of the target lyrics is not limited: for example, the lyrics of the song "Forget Tonight" contain 129 Chinese characters, while the lyrics of the song "Sweet Honey" contain 179 Chinese characters.
It should be noted that, for different songs, the lengths of the corresponding lyrics may differ, and so may the numbers of Chinese characters and of generated tones; one chord may correspond to a plurality of Chinese characters (i.e., the chord is repeated), so that the target tone vector and the target chord vector have the same length, which facilitates alignment. Details are not repeated here.
205. And combining the target tone vector and the target chord vector to generate the target tone chord vector.
The server combines the target key vector and the target chord vector to generate a target key chord vector.
Specifically, the tones and chords of the lyrics are combined into a higher-dimensional vector. For example, the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] are combined to obtain the high-dimensional vector Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]].
206. And inputting the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric time value vector.
And the server inputs the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric time value vector.
For example, if the input vector is Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]], the trained Transformer model yields the output feature vector Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]], which comprises the pitch vector [p1, p2, p3, …, pn] and the duration vector [d1, d2, d3, …, dn].
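Unpacking the model's Output back into the separate pitch vector and duration vector can be sketched as follows (integer ids are illustrative):

```python
def split_output(output):
    """[[p1,d1],...,[pn,dn]] -> ([p1,...,pn], [d1,...,dn])."""
    pitches = [pair[0] for pair in output]
    durations = [pair[1] for pair in output]
    return pitches, durations

P, D = split_output([[0, 0], [1, 1], [0, 0]])
```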
207. And generating the target song melody according to the target lyric pitch vector and the target lyric time value vector.
Specifically, the server aligns a target lyric pitch vector and a target lyric duration vector; the server generates a melody line of the target lyrics based on the pitch vector of the target lyrics; the server generates the beat of the target lyrics according to the time value vector of the target lyrics; the server generates a target song melody according to the melody line of the target lyric and the beat of the target lyric.
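A minimal sketch of this step, under the assumption that duration values are expressed in beats (the patent does not fix a unit): each decoded pitch is paired with its duration, and cumulative onsets lay the notes on a timeline that determines the beat.

```python
def assemble_melody(pitches, durations):
    """Align pitch and duration vectors into note events with onset times."""
    assert len(pitches) == len(durations), "vectors must align"
    melody, onset = [], 0.0
    for p, d in zip(pitches, durations):
        melody.append({"pitch": p, "duration": d, "onset": onset})
        onset += d  # next note starts when this one ends
    return melody

melody = assemble_melody(["D5", "E5", "D5"], [0.25, 0.5, 0.25])
```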
According to the embodiment of the invention, the initial Transformer model is trained on chords, pitches, tones and duration values, and the trained Transformer model is then used to generate the target song melody, which improves the quality, audibility and generation efficiency of the song melody. The scheme can also be applied to the smart education field to promote the construction of smart cities.
The chord-based song melody generating method according to the embodiment of the present invention is described above. A chord-based song melody generating device according to the embodiment of the present invention is described below with reference to fig. 3. One embodiment of the chord-based song melody generating device includes:
an obtaining module 301, configured to obtain a target lyric input by a user and a target chord pre-selected by the user;
a first generating module 302, configured to generate a target pitch vector according to the target lyric and generate a target chord vector according to the target chord;
a combining module 303, configured to combine the target tone vector and the target chord vector to generate a target tone chord vector;
an input module 304, configured to input the target tone chord vector into a trained Transformer model to obtain an output feature vector, where the output feature vector includes a target lyric pitch vector and a target lyric duration vector;
a second generating module 305, configured to generate a target song melody according to the target lyric pitch vector and the target lyric duration vector.
According to the embodiment of the invention, the initial Transformer model is trained on chords, pitches, tones and duration values, and the trained Transformer model is then used to generate the target song melody, which improves the quality, audibility and generation efficiency of the song melody. The scheme can also be applied to the smart education field to promote the construction of smart cities.
Referring to fig. 4, another embodiment of the chord-based song melody generating apparatus according to the embodiment of the present invention includes:
an obtaining module 301, configured to obtain a target lyric input by a user and a target chord pre-selected by the user;
a first generating module 302, configured to generate a target pitch vector according to the target lyric and generate a target chord vector according to the target chord;
a combining module 303, configured to combine the target tone vector and the target chord vector to generate a target tone chord vector;
an input module 304, configured to input the target tone chord vector into a trained Transformer model to obtain an output feature vector, where the output feature vector includes a target lyric pitch vector and a target lyric duration vector;
a second generating module 305, configured to generate a target song melody according to the target lyric pitch vector and the target lyric duration vector.
Optionally, the chord-based song melody generating apparatus further includes:
a data obtaining module 306, configured to obtain preset training data, where the preset training data includes a digital music score of a plurality of songs;
and the training module 307 is configured to train a preset initial model by using the preset training data to obtain a Transformer model.
Optionally, the training module 307 includes:
a score obtaining unit 3071 for obtaining a plurality of digital scores from preset training data, wherein each digital score is used for indicating lyrics, tone, chord, pitch and duration of a song;
an information obtaining unit 3072 for extracting key information, chord information, pitch information and duration information of the song from the digital music score;
a sequence generating unit 3073, configured to sequentially generate a tone vector T, a chord vector C, a pitch vector P, and a duration vector D of a song according to the tone information, the chord information, the pitch information, and the duration information, and combine the tone vector T, the chord vector C, the pitch vector P, and the duration vector D of a plurality of songs respectively to obtain a tone vector sequence T, a chord vector sequence C, a pitch vector sequence P, and a duration vector sequence D;
and the model training unit 3074 is configured to train the preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P, and the duration vector sequence D to obtain the Transformer model.
Optionally, the model training unit 3074 includes:
a first combining subunit 30741, configured to combine the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] to obtain a first high-dimensional vector Input, where Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]];
a second combining subunit 30742, configured to combine the pitch vector sequence P = [p1, p2, p3, …, pn] and the duration vector sequence D = [d1, d2, d3, …, dn] to obtain a second high-dimensional vector Output, where Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]];
The training subunit 30743 is configured to train the preset initial model to obtain the Transformer model, where the first high-dimensional vector Input is used as an Input vector, and the second high-dimensional vector Output is used as an Output vector.
Optionally, the information obtaining unit 3072 is specifically configured to:
converting a plurality of digital music scores into an XML format, wherein each digital music score corresponds to one song; reading lyric information from the digital music score in the XML format to obtain song Chinese characters in each digital music score; determining tone information and pitch information corresponding to each word in each song according to the lyric Chinese characters in each digital music score; determining corresponding chord information and time value information according to each digital music score; tone information, chord information, pitch information, and duration information corresponding to the plurality of songs are generated.
Optionally, the training subunit 30743 is specifically configured to:
inputting the first high-dimensional vector Input into the encoder at the head end of the encoding assembly in the preset initial model, wherein the encoding assembly comprises a plurality of sequentially connected encoders, and each encoder comprises a multi-head attention layer and a feedforward network layer; inputting the result output by the encoder at the tail end of the encoding assembly into each multi-head attention layer in the decoding assembly, wherein the decoding assembly comprises a plurality of sequentially connected decoders, and each decoder comprises a mask multi-head attention layer, a multi-head attention layer and a feedforward network layer; inputting the second high-dimensional vector Output into the mask multi-head attention layer of the decoder at the head end of the decoding assembly in the preset initial model; and inputting the output result of the decoder at the tail end of the decoding assembly into a linear network to obtain the Transformer model.
Optionally, the second generating module 305 is specifically configured to:
aligning the target lyric pitch vector with the target lyric time value vector; generating a melody line of the target lyric based on the target lyric pitch vector; generating the beat of the target lyric according to the target lyric time value vector; and generating a target song melody according to the melody line of the target lyric and the beat of the target lyric.
According to the embodiment of the invention, the initial Transformer model is trained on chords, pitches, tones and duration values, and the trained Transformer model is then used to generate the target song melody, which improves the quality, audibility and generation efficiency of the song melody. The scheme can also be applied to the smart education field to promote the construction of smart cities.
Figs. 3 to 4 above describe the chord-based song melody generating device in the embodiment of the present invention in detail from the perspective of modular functional entities; the chord-based song melody generating apparatus in the embodiment of the present invention is described below in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a chord-based song melody generating apparatus 500 according to an embodiment of the present invention, which may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 510 (e.g., one or more processors) and a memory 520, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. Memory 520 and storage media 530 may be, among other things, transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in the chord-based song melody generating apparatus 500. Still further, the processor 510 may be configured to communicate with the storage medium 530 and execute a series of instruction operations in the storage medium 530 on the chord-based song melody generating device 500.
The chord-based song melody generating device 500 may further include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the configuration shown in fig. 5 does not constitute a limitation of the chord-based song melody generating device, which may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the chord-based song melody generating method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A chord-based song melody generating method, comprising:
acquiring target lyrics input by a user and target chords preselected by the user;
generating a target tone vector according to the target lyrics and generating a target chord vector according to the target chord;
combining the target key vector and the target chord vector to generate a target key chord vector;
inputting the target tone chord vector into a trained Transformer model to obtain an output characteristic vector, wherein the output characteristic vector comprises a target lyric pitch vector and a target lyric duration vector;
and generating a target song melody according to the target lyric pitch vector and the target lyric time value vector.
2. The chord-based song melody generating method of claim 1, wherein before the obtaining of the target lyrics inputted by the user and the target chord pre-selected by the user, the chord-based song melody generating method further comprises:
acquiring preset training data, wherein the preset training data comprises a digital music score of a plurality of songs;
and training a preset initial model by using the preset training data to obtain a Transformer model.
3. The method of claim 2, wherein the training a preset initial model with the preset training data to obtain a Transformer model comprises:
obtaining a plurality of digital music scores from preset training data, wherein each digital music score is used for indicating the lyrics, the tone, the chord, the pitch and the duration of a song;
extracting tone information, chord information, pitch information and duration information of the songs from the digital music score;
sequentially generating a tone vector T, a chord vector C, a pitch vector P and a duration vector D of the song according to the tone information, the chord information, the pitch information and the duration information, and respectively combining the tone vector T, the chord vector C, the pitch vector P and the duration vector D of a plurality of songs to obtain a tone vector sequence T, a chord vector sequence C, a pitch vector sequence P and a duration vector sequence D;
and training the preset initial model by using the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P and the duration vector sequence D to obtain the Transformer model.
4. The method of claim 3, wherein the training the preset initial model with the tone vector sequence T, the chord vector sequence C, the pitch vector sequence P, and the duration vector sequence D to obtain the Transformer model comprises:
combining the tone vector sequence T = [t1, t2, t3, …, tn] and the chord vector sequence C = [c1, c2, c3, …, cn] to obtain a first high-dimensional vector Input, where Input = [[t1, c1], [t2, c2], [t3, c3], …, [tn, cn]];
combining the pitch vector sequence P = [p1, p2, p3, …, pn] and the duration vector sequence D = [d1, d2, d3, …, dn] to obtain a second high-dimensional vector Output, where Output = [[p1, d1], [p2, d2], [p3, d3], …, [pn, dn]];
And taking the first high-dimensional vector Input as an Input vector and the second high-dimensional vector Output as an Output vector, and training the preset initial model to obtain the Transformer model.
5. The chord-based song melody generating method of claim 3, wherein the extracting of the tone information, chord information, pitch information, and duration information of the song from the digital music score comprises:
converting a plurality of digital music scores into an XML format, wherein each digital music score corresponds to one song;
reading lyric information from the digital music score in the XML format to obtain song Chinese characters in each digital music score;
determining tone information and pitch information corresponding to each word in each song according to the lyric Chinese characters in each digital music score;
determining corresponding chord information and time value information according to each digital music score;
tone information, chord information, pitch information, and duration information corresponding to the plurality of songs are generated.
6. The method of claim 4, wherein the training the preset initial model to obtain the Transformer model by using the first high-dimensional vector Input as an input vector and the second high-dimensional vector Output as an output vector comprises:
inputting the first high-dimensional vector Input into an encoder at the head end of an encoding assembly in a preset initial model, wherein the encoding assembly comprises a plurality of encoders which are connected in sequence, and each encoder comprises a multi-head attention layer and a feedforward network layer;
inputting a result output by an encoder at the tail end in an encoding assembly into each multi-head attention layer in a decoding assembly, wherein the decoding assembly comprises a plurality of decoders which are connected in sequence, and each decoder comprises a mask multi-head attention layer, a multi-head attention layer and a feedforward network layer;
inputting the second high-dimensional vector Output into the mask multi-head attention layer of the decoder at the head end of the decoding assembly in the preset initial model;
and inputting the output result of the decoder at the tail end of the decoding assembly into a linear network to obtain the Transformer model.
7. The chord-based song melody generating method of any one of claims 1 to 6, wherein the generating a target song melody according to the target lyric pitch vector and the target lyric duration vector comprises:
aligning the target lyric pitch vector with the target lyric time value vector;
generating a melody line of the target lyric based on the target lyric pitch vector;
generating the beat of the target lyric according to the target lyric time value vector;
and generating a target song melody according to the melody line of the target lyric and the beat of the target lyric.
8. A chord-based song melody generating apparatus, comprising:
the obtaining module is used for obtaining target lyrics input by a user and target chords selected by the user in advance;
the first generation module is used for generating a target tone vector according to the target lyrics and generating a target chord vector according to the target chord;
a combination module for combining the target tone vector and the target chord vector to generate a target tone chord vector;
the input module is used for inputting the target tone chord vector into a trained Transformer model to obtain an output feature vector, and the output feature vector comprises a target lyric pitch vector and a target lyric duration vector;
and the second generation module is used for generating the target song melody according to the target lyric pitch vector and the target lyric duration vector.
9. A chord-based song melody generating apparatus, characterized in that the chord-based song melody generating apparatus comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the chord-based song melody generating device to perform the chord-based song melody generating method of any of claims 1-7.
10. A computer-readable storage medium storing instructions which, when executed by a processor, implement the chord-based song melody generating method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110285841.1A CN113035161B (en) | 2021-03-17 | 2021-03-17 | Song melody generation method, device and equipment based on chord and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113035161A true CN113035161A (en) | 2021-06-25 |
CN113035161B CN113035161B (en) | 2024-08-20 |
Family
ID=76471283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110285841.1A Active CN113035161B (en) | 2021-03-17 | 2021-03-17 | Song melody generation method, device and equipment based on chord and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113035161B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920969A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN116343723A (en) * | 2023-03-17 | 2023-06-27 | 广州趣研网络科技有限公司 | Melody generation method and device, storage medium and computer equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004302318A (en) * | 2003-03-31 | 2004-10-28 | Doshisha | System, apparatus, and method for music data generation |
CN103902642A (en) * | 2012-12-21 | 2014-07-02 | 香港科技大学 | Music composition system using correlation between melody and lyrics |
JP2014170146A (en) * | 2013-03-05 | 2014-09-18 | Univ Of Tokyo | Method and device for automatically composing chorus from japanese lyrics |
KR101554662B1 (en) * | 2014-04-29 | 2015-09-21 | 김명구 | Method for providing chord for digital audio data and an user terminal thereof |
CN109166564A (en) * | 2018-07-19 | 2019-01-08 | 平安科技(深圳)有限公司 | For the method, apparatus and computer readable storage medium of lyrics text generation melody |
CN109859739A (en) * | 2019-01-04 | 2019-06-07 | 平安科技(深圳)有限公司 | Melody generation method, device and terminal device based on speech synthesis |
CN112309353A (en) * | 2020-10-30 | 2021-02-02 | 北京有竹居网络技术有限公司 | Composing method and device, electronic equipment and storage medium |
CN112489606A (en) * | 2020-11-26 | 2021-03-12 | 北京有竹居网络技术有限公司 | Melody generation method, device, readable medium and electronic equipment |
Non-Patent Citations (1)
Title |
---|
KYOYUN CHOI et al.: "Chord Conditioned Melody Generation With Transformer Based Decoders", IEEE Access, vol. 9, pages 42071, XP011844616, DOI: 10.1109/ACCESS.2021.3065831 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920969A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN116343723A (en) * | 2023-03-17 | 2023-06-27 | 广州趣研网络科技有限公司 | Melody generation method and device, storage medium and computer equipment |
CN116343723B (en) * | 2023-03-17 | 2024-02-06 | 广州趣研网络科技有限公司 | Melody generation method and device, storage medium and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113035161B (en) | 2024-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Petermann | The musical novel: Imitation of musical structure, performance, and reception in contemporary fiction | |
US7696426B2 (en) | Recombinant music composition algorithm and method of using the same | |
CN111554255B (en) | MIDI playing style automatic conversion system based on recurrent neural network | |
KR102367772B1 (en) | Method and Apparatus for Generating Music Based on Deep Learning | |
CN113035161B (en) | Song melody generation method, device and equipment based on chord and storage medium | |
Swain | Harmonic rhythm: Analysis and interpretation | |
CN113010730A (en) | Music file generation method, device, equipment and storage medium | |
CN108922505B (en) | Information processing method and device | |
Wu et al. | MelodyGLM: multi-task pre-training for symbolic melody generation | |
CN106898341B (en) | Personalized music generation method and device based on common semantic space | |
Fifield | The German symphony between Beethoven and Brahms: the fall and rise of a genre | |
Dubnov et al. | Deep and shallow: Machine learning in music and audio | |
CN116052621A (en) | Music creation auxiliary method based on language model | |
Jensen | Evolutionary music composition: A quantitative approach | |
Madhumani et al. | Automatic neural lyrics and melody composition | |
CN111627410B (en) | MIDI multi-track sequence representation method and application | |
Wu et al. | MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing | |
KR102227415B1 (en) | System, device, and method to generate polyphonic music | |
CN113096624A (en) | Method, device, equipment and storage medium for automatically creating symphony music | |
Oura et al. | Parsing and memorizing tonal and modal melodies | |
Wentink | Creating and evaluating a lyrics generator specialized in rap lyrics with a high rhyme density | |
JP3571925B2 (en) | Voice information processing device | |
CN113053355B (en) | Human voice synthesizing method, device, equipment and storage medium for Buddha music | |
Sela | Giovanni Bassano's divisions: a computational approach to analyzing the gap between theory and practice | |
Dai et al. | An Efficient AI Music Generation mobile platform Based on Machine Learning and ANN Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |