WO2020015153A1 - Method and device for generating music for lyrics text, and computer-readable storage medium - Google Patents


Info

Publication number
WO2020015153A1
WO2020015153A1 (PCT/CN2018/106267)
Authority
WO
WIPO (PCT)
Prior art keywords
lyrics
feature
text
melody
rhythm
Prior art date
Application number
PCT/CN2018/106267
Other languages
French (fr)
Chinese (zh)
Inventor
刘奡智
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020015153A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0008 - Associated control or indicating means
    • G10H1/0025 - Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/36 - Accompaniment arrangements
    • G10H1/40 - Rhythm
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111 - Automatic composing, i.e. using predefined musical rules
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/341 - Rhythm pattern selection, synthesis or composition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 - Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 - Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 - Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 - Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 - Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 - Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process

Definitions

  • the present disclosure relates to the field of Internet technologies, and in particular, to a method, an apparatus, and a computer-readable storage medium for generating music for lyrics text.
  • an object of the present application is to provide a method, an apparatus, and a computer-readable storage medium for generating music for lyrics text.
  • a method for generating music for lyrics text based on a random forest includes: obtaining lyrics text, the lyrics text being a sequence of several words; performing feature extraction on the lyrics text to obtain the text features mapped from the sequence; performing feature matching between the text features and the lyrics features in a corpus to obtain the lyrics features corresponding to the text features; and predicting, through a trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, so as to generate music data adapted to the lyrics text.
  • an apparatus for generating music for lyrics text based on a random forest includes: an acquisition module configured to obtain lyrics text, the lyrics text being a sequence of several words; a text feature extraction module configured to perform feature extraction on the lyrics text to obtain the text features mapped from the sequence; a feature matching module configured to perform feature matching between the text features and the lyrics features in a corpus to obtain the lyrics features corresponding to the text features; and a music data generation module configured to predict, through a trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, so as to generate music data adapted to the lyrics text.
  • an apparatus for generating music for lyrics text based on a random forest includes a processor and a memory for storing processor-executable instructions, wherein the processor is configured to execute the method for generating music for lyrics text described above.
  • a computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the method for generating a musical composition for lyrics text as described above.
  • the music data corresponding to the lyrics text is automatically generated from the obtained lyrics text, so a user who has not mastered professional music knowledge can still compose music according to the lyrics text; the general public can thus use this disclosure to automatically generate music from lyrics text.
  • Fig. 1 is a schematic diagram of an implementation environment, according to an exemplary embodiment of the present disclosure.
  • Fig. 2 is a block diagram of a device 200 according to an exemplary embodiment.
  • Fig. 3 is a flow chart showing a method for generating music for lyrics text according to an exemplary embodiment.
  • FIG. 4 is a flowchart of steps before step S110 of the embodiment shown in FIG. 3 in an exemplary embodiment.
  • FIG. 5 is a flowchart of an exemplary embodiment of step S170 of the embodiment shown in FIG. 3.
  • FIG. 6 is a flowchart of step S175 in the embodiment shown in FIG. 5 in an exemplary embodiment.
  • FIG. 7 is a flowchart of step S303 in the embodiment shown in FIG. 6 in an exemplary embodiment.
  • Fig. 8 is a block diagram of a device for generating music for lyrics text according to an exemplary embodiment.
  • Fig. 9 is a block diagram of a device for generating music for lyrics text according to another exemplary embodiment.
  • FIG. 10 is a block diagram of a module 170 of the embodiment shown in FIG. 8 in an exemplary embodiment.
  • FIG. 11 is a block diagram of a musical piece data generating unit 175 of the embodiment shown in FIG. 10 in an exemplary embodiment.
  • FIG. 12 is a block diagram of the note information combining unit 303 of the embodiment shown in FIG. 11 in an exemplary embodiment.
  • Fig. 1 is a schematic diagram of an implementation environment, according to an exemplary embodiment of the present disclosure.
  • the implementation environment comprises a terminal 100 and a server 200 that establish a network communication connection, where the server 200 serves as the back end that generates music for lyrics text according to the present disclosure.
  • the terminal 100 may be a computer, a smart phone, or another communication device that runs a client for generating music for lyrics text and has a network connection function, which is not limited herein.
  • the terminal 100 can initiate a request to generate music data and provide the lyrics text, so that the server 200 receives the request initiated by the terminal 100, generates music data for the lyrics text provided by the terminal 100, and then outputs the generated music data to the terminal 100.
  • the server 200 may be a web server or an APP server.
  • Fig. 2 is a block diagram showing a hardware structure of a server according to an exemplary embodiment.
  • the server can be used to generate music data for lyrics text and deployed in the implementation environment shown in FIG. 1.
  • the server 200 is only an example adapted to the present disclosure and cannot be considered to limit the scope of use of the present disclosure in any way.
  • nor should the server 200 be interpreted as needing to rely on, or necessarily include, one or more of the components shown in FIG. 2.
  • the server 200 includes: a power supply 210, an interface 230, at least one memory 250, and at least one processor (CPU) 270.
  • the power supply 210 is used to provide working voltages for each hardware device on the server 200.
  • the interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input-output interface 235, and at least one USB interface 237, etc., for communicating with external devices.
  • for example, communication with the terminal 100 in the implementation environment of FIG. 1 may be carried out through the wireless network interface 231.
  • the memory 250 serves as a carrier for resource storage, and may be a read-only memory, a random access memory, a magnetic disk, or an optical disk.
  • the resources stored on the memory 250 include an operating system 251, an application program 253, and data 255.
  • the storage method may be temporary storage or permanent storage.
  • the operating system 251 is used to manage and control various hardware devices and application programs 253 on the server 200 to implement the calculation and processing of massive data 255 by the processor 270, which may be Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the application program 253 is a computer program that completes at least one specific task based on the operating system 251. It may include at least one module (not shown in FIG. 2), and each module may contain a series of computer-readable instructions for the server 200.
  • the data 255 may be photos, pictures, and the like stored on the disk.
  • the processor 270 may include one or more processors, and is configured to communicate with the memory 250 through a bus, for calculating and processing the massive data 255 in the memory 250. As described in detail above, the server 200 to which the present disclosure is applied will complete the generation of music for the lyrics text by the processor 270 reading a series of computer-readable instructions stored in the memory 250.
  • the present disclosure can also be implemented by a hardware circuit, or by a hardware circuit in combination with software. Therefore, the implementation of the present disclosure is not limited to any specific hardware circuit, software, or combination of the two.
  • Fig. 3 is a flowchart illustrating a method for generating music for lyrics text according to an exemplary embodiment. As shown in Fig. 3, the method in this embodiment includes:
  • step S110 the lyrics text is obtained, and the lyrics text is a sequence composed of several words.
  • the word is the smallest unit in the lyrics text.
  • the language of the lyrics text is not limited: in Chinese text the characters serve as the words of the lyrics text, and in English text the words of the text serve as the words of the lyrics text, which is not limited here.
  • the obtained lyrics text may be the lyrics text input by the user in the interactive interface.
  • the length of the lyrics text is not limited, and it can be a sentence or a paragraph of text.
  • the interactive interface may also display lyrics text recommended by the server. The user may select the recommended lyrics text on the interactive interface, and the terminal inputs the selected lyrics text to the server according to the user's selection operation.
  • the server also provides an option for adjusting the number of words in the lyrics text, so that the user can enter text and adjust the number of words in the text.
  • Step S130 Perform feature extraction on the lyrics text to obtain the text features of the sequence map.
  • Feature extraction of the lyrics text means obtaining the syllable information corresponding to the lyrics text from the lyrics text, that is, reflecting the syllable information of the lyrics text with the text features, wherein each word in the lyrics text corresponds to a syllable.
  • the text features reflecting the syllable information of the lyrics text include, but are not limited to, the syllable type, the number of syllables, the word frequency, and the word rareness of each word in the lyrics text.
  • the syllable type refers to the classification of the syllable corresponding to the word in the lyrics, which can be: single syllable, start syllable, central syllable, end syllable, and the like.
  • the number of syllables refers to the number of syllables contained in a word.
  • the word frequency refers to the frequency with which a word occurs in the lyrics text; the word rareness is a function of the word frequency.
  • the lyrics text uniquely corresponds to the text feature, that is, the extracted text feature is a sequence mapped text feature, so that the lyrics text can be described by the syllable information reflected by the text feature.
  • text feature extraction can be performed by a Python program, in which the methods for extracting text features such as the syllable type, word frequency, and word rareness of each word in the lyrics text are set in a pre-written program.
  • text feature extraction by deep neural networks is also applicable to the present disclosure, which is not limited herein.
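As a rough illustration of the extraction step, the sketch below computes per-word features for an English lyrics line in plain Python. The vowel-run syllable counter and the negative-log rareness function are assumptions for illustration; the disclosure only states that rareness is some function of word frequency.

```python
from collections import Counter
import math

def extract_text_features(lyrics):
    """Sketch of per-word feature extraction for an English lyrics line."""
    words = lyrics.lower().split()
    counts = Counter(words)
    total = len(words)
    features = []
    for w in words:
        # Crude syllable count: number of runs of vowels in the word (assumption).
        syllables = sum(1 for i, c in enumerate(w)
                        if c in "aeiou" and (i == 0 or w[i - 1] not in "aeiou"))
        word_freq = counts[w] / total
        features.append({
            "word": w,
            "num_syllables": max(1, syllables),
            "word_freq": word_freq,
            "rareness": -math.log(word_freq),  # assumed rareness function
        })
    return features

feats = extract_text_features("twinkle twinkle little star")
```

Since "twinkle" appears twice among four words, its word frequency is 0.5 and its rareness is correspondingly low.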
  • Step S150 Perform feature matching between the text features and the lyrics features in the corpus to obtain the lyrics features corresponding to the text features.
  • the corpus is composed of music files of several songs.
  • the music files include the lyrics features, rhythm features and melody features extracted for each song.
  • the music file exists in XML format in the corpus. It is worth noting that the corpus has been constructed before text feature matching. The detailed construction of the corpus is described below.
  • the lyrics features similar to the text features, that is, the lyrics features corresponding to the text features, can be matched from the corpus, so that in the subsequent steps the rhythm and melody are predicted according to the matched lyrics features.
  • the matching between the text features and the lyrics features in the corpus includes matching on the syllable type, number of syllables, word frequency, and word rareness, so that lyrics features similar to those of the lyrics text in these respects can be obtained from the corpus.
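The matching step can be sketched as a nearest-neighbour search over the corpus. The specific distance function below (same syllable type required, Euclidean distance over the numeric features) is an assumption; the text only states which features are matched.

```python
def match_lyrics_feature(text_feature, corpus_features):
    """Match one word's text feature against lyrics features in the corpus."""
    def distance(a, b):
        if a["syllable_type"] != b["syllable_type"]:
            return float("inf")  # different syllable types never match (assumption)
        keys = ("num_syllables", "word_freq", "rareness")
        return sum((a[k] - b[k]) ** 2 for k in keys) ** 0.5
    # Return the corpus entry closest to the query feature.
    return min(corpus_features, key=lambda c: distance(text_feature, c))

corpus = [
    {"syllable_type": "single", "num_syllables": 1, "word_freq": 0.2, "rareness": 1.6},
    {"syllable_type": "single", "num_syllables": 2, "word_freq": 0.5, "rareness": 0.7},
]
query = {"syllable_type": "single", "num_syllables": 2, "word_freq": 0.4, "rareness": 0.9}
best = match_lyrics_feature(query, corpus)
```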
  • step S170 the obtained lyrics feature is used to predict the melody and rhythm corresponding to the words in the sequence through the trained random forest classifier to generate music data adapted to the lyrics text.
  • the duration of the notes in the composition of the music is the rhythm of the composition.
  • Rhythm characteristics are used to reflect the duration information of the notes, such as the start of the note and the duration.
  • the rhythm features may include: a beat signature, an offset, a measurement offset, a time value, and the like.
  • the beat signature refers to the time signature of the notes in the music corresponding to the lyrics;
  • the offset refers to the number of beats before the start of the music;
  • the measurement offset refers to the number of beats before the start of the measure;
  • the time value refers to the duration of the note corresponding to a word in the music.
  • the pitch of each note in the composition constitutes the melody of the composition.
  • Melody features are used to reflect the pitch information of a note.
  • the melody features may include: pitch symbols, tone levels, temporary marks, weak beats, and the like. Among them, the pitch symbol (tonality symbol) refers to the key symbol corresponding to a word in the lyrics; the tone level refers to the pitch level corresponding to a word in the lyrics; the temporary mark refers to the accidental placed directly before a note; the weak beat refers to a beat unit without a strong accent.
  • Lyric characteristics may include: syllable type, number of syllables, word frequency, and word rareness of each word in the lyrics text.
  • rhythm feature and melody feature may further include features other than the features listed above, or a combination of some features listed above and other features not listed above.
  • the lyrics features in the corpus correspond to rhythm features and melody features. Therefore, after the text features of the lyrics text are matched with the lyrics features in the corpus, the corresponding melody and rhythm can be predicted from the lyrics features obtained for the text features: the rhythm and melody features are predicted from those lyrics features, the rhythm and melody are generated from the predicted features, and the music data is then generated.
  • the random forest classifier is a combination of multiple decision trees constructed from features, where each feature constitutes a node of a decision tree. The random forest classifier can therefore combine features non-linearly and does not require a large amount of sample data for training, so the classifier can be trained without a large amount of sample data while the quality of the generated music is ensured.
  • the prediction is performed according to the characteristics of each node of the random forest classifier.
  • taking rhythm prediction from the lyrics features as an example: the input of the random forest classifier is the lyrics features, and the output is the rhythm features.
  • the decision trees used for rhythm prediction are constructed from the lyrics features in the corpus; that is, the lyrics features in the corpus determine the feature and the judgment condition at each node of the random forest classifier. For example, the number of syllables, syllable type, word frequency, and word rareness may be set from top to bottom as the nodes of a decision tree of the random forest classifier.
  • the output is then determined according to the judgment condition at each node; for example, at the number-of-syllables node: if the number of syllables ≤ 3, the corresponding time value is output; if the number of syllables > 3, the output time value is judged according to the syllable type.
  • judgment conditions and condition-dependent outputs are likewise set on the syllable type, word frequency, and word rareness, so that a time value can be output from the lyrics features.
  • other rhythm characteristics can be predicted by similar decision trees.
  • the rhythm feature corresponding to the lyrics feature can be obtained through the lyrics feature corresponding to the text feature and the random forest classifier prediction, and then the rhythm corresponding to the lyrics text can be obtained.
  • a similar method can be used to predict the melody features corresponding to the lyrics features that were matched from the text features, thereby achieving melody prediction. The rhythm and melody corresponding to the lyrics text are then formed from the predicted rhythm features and melody features, and the music data corresponding to the lyrics text is generated.
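The node-by-node judgment described above can be sketched as a single hand-written decision path; in the actual classifier such paths are learned from the corpus, and the concrete time values below are illustrative assumptions.

```python
def predict_time_value(lyrics_feature):
    """Hand-built sketch of one decision path: the number-of-syllables node
    outputs a time value directly when the count is <= 3, and otherwise
    defers to the syllable-type node."""
    if lyrics_feature["num_syllables"] <= 3:
        return 1.0   # e.g. one beat (assumed value)
    if lyrics_feature["syllable_type"] == "start":
        return 0.5   # assumed: a shorter note for a start syllable
    return 0.25

short_word = {"num_syllables": 2, "syllable_type": "single"}
long_word = {"num_syllables": 5, "syllable_type": "start"}
```

A full rhythm classifier would aggregate many such trees, each voting on the output feature.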
  • the generated music data may be in an XML format or a MIDI format, and is not specifically limited herein.
  • in this way, the music data corresponding to the lyrics text is automatically generated from the obtained lyrics text, and music composition is performed according to the lyrics text without the user needing to master professional music knowledge, so the general public can also compose music according to the present disclosure.
  • FIG. 4 is a flowchart of steps, in an exemplary embodiment, performed before step S110 in the embodiment shown in FIG. 3. As shown in FIG. 4, before step S110, the method in this embodiment further includes:
  • Step S210: extract lyrics features from the lyrics sample text in the sample data, and extract rhythm features and melody features from the music data corresponding to the lyrics sample text.
  • the sample data consists of several collected songs used to train the random forest classifier, where each collected song includes lyrics and music.
  • the lyrics sample text is the lyrics of the song in the sample data.
  • the lyrics feature is used to describe the lyrics of the sample data
  • the rhythm feature and melody feature are used to describe the rhythm and melody of the sample data, respectively.
  • the lyrics features reflect the syllable information of each word in the lyrics sample text, the rhythm features reflect the time-value information of the notes in the music, and the melody features reflect the pitch information of the notes in the music.
  • each syllable corresponds to a note; that is, each word in the lyrics corresponds to a note.
  • the sample data used is a song with a single track and a single instrument, so that in the sample data, each word of the lyrics uniquely corresponds to a note.
  • the extracted lyrics features may include: syllable type, number of syllables, word frequency, word rareness, and the like.
  • the extracted rhythm features may include features such as beat signatures, offsets, measurement offsets, and time values.
  • the extracted melody features may include: pitch symbols, tone levels, temporary marks, weak beats, and the like.
  • the specific categories of lyrics features, rhythm features, and melody features shown above are only examples suitable for the present disclosure and cannot be considered to limit the scope of use of the present disclosure in any way; nor can they be interpreted as requiring that all, or only, the specific lyrics features, rhythm features, and melody features in the above examples be extracted to implement the present disclosure.
  • features that are more or less than the lyrics features, rhythm features, and melody features of the specific categories listed above may be extracted to implement the present disclosure.
  • the more fully the extracted lyrics features, rhythm features, and melody features describe the song lyrics and the corresponding music, the better the accuracy of the random forest classifier, and the higher the accuracy of the music data automatically generated for the lyrics text.
  • the way of extracting lyrics features, rhythm features, and melody features may be to extract each feature (lyric features, rhythm features, and melody features) through a deep neural network.
  • each specific lyrics feature, rhythm feature, and melody feature may be extracted by a Python programming method. The method for extracting features is not limited herein.
  • step S230 a corpus is constructed from the lyrics features, rhythm features and melody features.
  • the extracted lyrics features, rhythm features, and melody features are used as a corpus of the random forest classifier, so that the random forest classifier is trained according to the corpus, and the rhythm and melody corresponding to the lyrics text are predicted after the training is completed.
  • the corpus may also include the sample text of the lyrics and the corresponding music data.
  • a corpus is constructed by extracting the lyrics features, rhythm features, and melody features of 24 single-track, single-instrument popular songs.
  • the corpus includes 59 features and has 12,358 observations.
  • the observed value refers to a value corresponding to a specific feature, for example, a feature of a syllable type, and the observed value for the feature may be a single syllable, a start syllable, a central syllable, and an end syllable.
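Corpus construction as described, with one observation (row) per word joining that word's lyrics, rhythm, and melody features, might be sketched as follows. The feature names are illustrative, and joining by index relies on the stated one-word-one-note property of single-track, single-instrument songs.

```python
def build_corpus(songs):
    """One observation per word: merge its lyrics, rhythm, and melody features."""
    corpus = []
    for song in songs:
        for lyr, rhy, mel in zip(song["lyrics_features"],
                                 song["rhythm_features"],
                                 song["melody_features"]):
            row = {}
            row.update(lyr)   # e.g. syllable and frequency features
            row.update(rhy)   # e.g. time value, offset
            row.update(mel)   # e.g. pitch, weak-beat flag
            corpus.append(row)
    return corpus

songs = [{
    "lyrics_features": [{"num_syllables": 1, "word_freq": 0.5}],
    "rhythm_features": [{"time_value": 1.0, "offset": 0}],
    "melody_features": [{"pitch": "C4", "weak_beat": False}],
}]
corpus_rows = build_corpus(songs)
```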
  • Step S250: iterative training of the random forest classifier is performed using the lyrics features, rhythm features, and melody features until the trained random forest classifier predicts the melody and rhythm of known song texts with a specified accuracy, at which point the iterative training of the random forest classifier is stopped.
  • in training, the rhythm features and melody features that the random forest classifier predicts from the lyrics features are compared with the actual rhythm features and melody features corresponding to those lyrics features. If they differ, the parameters of the random forest classifier are adjusted, the lyrics features are input again into the parameter-adjusted classifier, and the newly output rhythm and melody features are compared with the actual ones; these steps are repeated until they match, after which the next set of lyrics features in the corpus is used to train the classifier. This process is the iterative training of the random forest classifier; that is, the classifier is trained iteratively so that the trained classifier can predict melody and rhythm.
  • after the random forest classifier has been trained for a period of time, it is evaluated; that is, the prediction accuracy of the random forest classifier is assessed.
  • specifically, after the random forest classifier has been trained with the lyrics features, rhythm features, and melody features of several sample data, the classifier is evaluated.
  • the evaluation process is: the lyrics features of a song whose music already exists are input, and the random forest classifier predicts and outputs the corresponding rhythm features and melody features.
  • the rhythm features and melody features obtained through the random forest are compared with the actual rhythm features and melody features of the song to calculate the accuracy of the random forest classifier.
  • the accuracies of the random forest classifier over the evaluated songs are averaged to obtain the overall accuracy of the random forest classifier. If the calculated accuracy reaches the specified accuracy, the training of the random forest classifier is complete; if it does not, the random forest classifier continues to be trained with the lyrics features, rhythm features, and melody features of the sample data.
  • the sample data used to evaluate the random forest classifier is different from the sample data used during training. For example, if the random forest classifier was trained with the lyrics features, rhythm features, and melody features of the song "Hurried That Year" in the sample data, then the features corresponding to that song cannot be used to evaluate the random forest classifier.
  • the random forest classifier includes a rhythm classifier and a melody classifier, and the evaluation can assess the rhythm classifier and the melody classifier separately, so that after evaluation the accuracies of the rhythm classifier and the melody classifier are obtained respectively.
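The per-song evaluation and averaging described above can be sketched as follows; the toy predictor standing in for a trained rhythm or melody classifier is an assumption for illustration.

```python
def evaluate_classifier(predict, held_out_songs, target_key):
    """Average per-song accuracy of `predict` against the actual features."""
    song_accuracies = []
    for song in held_out_songs:
        actual = song[target_key]
        correct = sum(1 for lyr, act in zip(song["lyrics_features"], actual)
                      if predict(lyr) == act)
        song_accuracies.append(correct / len(actual))
    # Average the per-song accuracies into one overall figure.
    return sum(song_accuracies) / len(song_accuracies)

# Toy stand-in predictor: one time value per syllable-count bucket.
toy_predict = lambda lyr: 1.0 if lyr["num_syllables"] <= 3 else 0.5
held_out = [{
    "lyrics_features": [{"num_syllables": 2}, {"num_syllables": 4}],
    "rhythm_features": [1.0, 0.25],
}]
acc = evaluate_classifier(toy_predict, held_out, "rhythm_features")
```

Here the predictor gets one of the two notes right, so the (single-song) average accuracy is 0.5; training would continue until the average reaches the specified accuracy.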
  • step S170 includes:
  • Step S171: the lyrics features are input to the rhythm classifier to predict the rhythm features corresponding to the lyrics features.
  • the rhythm classifier is a model that predicts rhythm characteristics by combining several decision trees.
  • the rhythm characteristics are predicted by the lyrics characteristics.
  • each node of the decision trees of the rhythm classifier is constructed from the lyrics features of the sample data. Because each lyrics feature in the corpus of the random forest classifier has a corresponding rhythm feature, the rhythm classifier can predict the rhythm features corresponding to the lyrics features; the obtained rhythm features may be the beat signature, offset, measurement offset, time value, and other features corresponding to the lyrics features, or combinations thereof.
  • step S173 the lyrics feature and rhythm feature are input to the melody classifier to predict and obtain the melody feature corresponding to the lyrics feature.
  • the melody classifier is a model that predicts melody characteristics by combining several decision trees.
  • the melody feature is predicted by the lyrics feature and the rhythm feature.
  • each node of the decision tree of the melody classifier is constructed from the lyrics feature and the rhythm feature of the sample data.
• The rhythm features and lyrics features are input into the melody classifier to predict the melody features corresponding to the lyrics features, such as pitch degree, weak beats, temporary marks (accidentals), or combinations thereof.
• Step S175: the obtained rhythm features and melody features are combined to generate music data adapted to the lyrics text.
• The duration and pitch corresponding to the syllable information of each word in the lyrics text are obtained, so that each word corresponds to a note; the notes corresponding to the words are then combined in the order of the words in the lyrics text, thereby obtaining the music data for the lyrics text.
• The generated music data may be in XML format or in MIDI format.
• In this embodiment, a rhythm feature is first obtained through the rhythm classifier, then a melody feature is obtained through the melody classifier, and finally the rhythm feature and the melody feature are combined to generate the music data for the lyrics text.
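The two-stage prediction chain described above (lyrics features to rhythm features, then lyrics plus rhythm features to melody features) might be sketched as follows, with both predictors replaced by toy stand-in rules:

```python
# Sketch of the chained prediction: lyrics -> rhythm, then
# (lyrics + rhythm) -> melody. Both `predict_*` functions are
# hypothetical stand-ins for the trained classifiers.

def predict_rhythm(lyrics_feat):
    # duration in quarter-note units, keyed off syllable count (assumed rule)
    return {"duration": 0.5 if lyrics_feat["syllables"] <= 1 else 1.0}

def predict_melody(lyrics_feat, rhythm_feat):
    # toy rule: longer notes get a higher pitch
    return {"pitch": "G4" if rhythm_feat["duration"] >= 1.0 else "E4"}

def word_to_note(lyrics_feat):
    """Combine the two predicted feature sets into one note for a word."""
    rhythm = predict_rhythm(lyrics_feat)
    melody = predict_melody(lyrics_feat, rhythm)
    return {**rhythm, **melody}

print(word_to_note({"syllables": 2}))  # → {'duration': 1.0, 'pitch': 'G4'}
```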
  • This method of generating music data is only an exemplary embodiment of step S170.
• Alternatively, the melody feature may be obtained first by inputting the lyrics feature into the melody classifier, after which the lyrics feature and the melody feature are input into the rhythm classifier to obtain the rhythm feature.
  • each node of the decision tree of the melody classifier in this embodiment is constructed from the lyrics features of the sample data
  • each node of the decision tree of the rhythm classifier is constructed from the lyrics features and melody features of the sample data.
• Whether to obtain the rhythm feature or the melody feature first can be determined according to the accuracy of the random forest classifier (including the rhythm classifier and the melody classifier) obtained in step S250. That is, if, after training, the accuracy of the rhythm classifier is higher than that of the melody classifier, the rhythm feature can be obtained first through the rhythm classifier and the melody feature then obtained through the melody classifier; if the accuracy of the rhythm classifier is lower than that of the melody classifier, the melody feature can be obtained first, followed by the rhythm feature, to generate the song data. In this way, predicting first with the classifier of higher accuracy can improve the accuracy of the overall prediction result. Of course, whether to obtain the rhythm feature or the melody feature first can also be decided from other angles.
  • FIG. 6 is a flowchart of an exemplary embodiment of step S175 in the embodiment shown in FIG. 5. As shown in FIG. 6, step S175 includes:
• Step S301: generate the note information corresponding to the words in the sequence from the obtained rhythm features and melody features.
• The obtained rhythm features and melody features are combined to obtain the duration information and pitch information for the words in the lyrics text, so that each word yields a note; that is, the note information corresponding to the words in the sequence is generated.
• When generating subsequent note information, the features of the previously generated notes are taken into account. For example, when generating one piece of note information, the features of the preceding 5 notes (such as duration and pitch) may be referenced, so that each word in the sequence generates its corresponding note information.
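A minimal sketch of conditioning each new note on the features of the preceding 5 notes follows; the averaging rule is an assumption standing in for the classifier's actual prediction:

```python
# Sketch: each new note sees a sliding window of the previous 5 notes.
# `make_note` is a hypothetical stand-in, not the trained predictor.

WINDOW = 5

def make_note(word_feat, history):
    recent = history[-WINDOW:]  # features of up to 5 preceding notes
    if recent:
        avg = sum(n["duration"] for n in recent) / len(recent)
    else:
        avg = 1.0  # default duration for the first note (assumed)
    # toy rule: multi-syllable words halve the smoothed duration
    duration = avg / 2 if word_feat["syllables"] > 1 else avg
    return {"duration": duration}

notes = []
for feat in [{"syllables": 1}, {"syllables": 1}, {"syllables": 2}]:
    notes.append(make_note(feat, notes))

print([n["duration"] for n in notes])  # → [1.0, 1.0, 0.5]
```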
  • step S303 the note information corresponding to the words in the sequence is combined to generate music data of the lyrics text.
  • the note information corresponding to the words in the lyrics text is combined to obtain music data of the lyrics text.
• FIG. 7 is a flowchart of an exemplary embodiment of step S303 in the embodiment shown in FIG. 6. As shown in FIG. 7, step S303 includes:
• Step S3031: combine the note information corresponding to the words in the sequence, in the order of the words in the sequence, to generate a note sequence corresponding to the lyrics text.
• The words in the lyrics text form an ordered sequence. After the note information corresponding to the words is generated according to the rhythm and melody features, the note information is combined in the order of the words in the lyrics text to obtain the note sequence corresponding to the lyrics text.
  • Step S3033 Filter the note sequence according to the set note threshold.
• Filtering a note sequence means removing certain notes from it, and the set note threshold may be a specific note value or a range of note values. For example, to ignore shorter notes (such as 1/16 notes), 1/16 notes can be set as the threshold; all notes other than 1/16 notes are then kept, yielding a new note sequence.
  • the note threshold can be adjusted according to actual needs.
  • the note threshold set for different lyrics texts can be different.
• For one piece of lyrics text the note threshold may be set to 1/64 notes, so that the 1/64 notes in its note sequence are removed, while for another piece of lyrics text the threshold may be set so that the 1/32 notes in its note sequence are filtered out.
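A minimal sketch of the threshold filter, assuming durations are stored in whole-note units and notes at or below the threshold are dropped:

```python
# Sketch: remove notes whose duration does not exceed the set threshold.
# The note dictionaries and the "duration" key are illustrative assumptions.

def filter_notes(note_sequence, threshold):
    """Keep notes whose duration (in whole-note units) exceeds the threshold."""
    return [n for n in note_sequence if n["duration"] > threshold]

seq = [
    {"pitch": "C4", "duration": 1 / 4},
    {"pitch": "D4", "duration": 1 / 16},
    {"pitch": "E4", "duration": 1 / 8},
]

filtered = filter_notes(seq, threshold=1 / 16)
print([n["pitch"] for n in filtered])  # → ['C4', 'E4']
```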
  • step S3035 the music data of the lyrics text is generated by the filtered note sequence.
• The following is a device embodiment of the present disclosure, which can be used to execute the embodiments of the method for generating music for lyrics text performed by the server 200 of the present disclosure.
• For details not disclosed in the device embodiment of the present disclosure, please refer to the embodiments of the method for generating music for lyrics text according to the present disclosure.
  • Fig. 8 is a block diagram of a device for generating music for lyrics text according to an exemplary embodiment.
• The apparatus for generating music for lyrics text can be used in the server 200 shown in FIG. 2 to perform all or part of the steps of the method for generating music for lyrics text shown in the method embodiments above.
  • the device includes, but is not limited to, an acquisition module 110, a text feature extraction module 130, a feature matching module 150, and a music data generation module 170.
• The acquisition module 110 is configured to acquire lyrics text, the lyrics text being a sequence of several words in order.
  • the text feature extraction module 130 is connected to the acquisition module 110 and is configured to perform feature extraction on the lyrics text to obtain the sequence-mapped text features.
  • a feature matching module 150 which is connected to the text feature extraction module 130 and is configured to perform feature matching between text features and lyrics features in the corpus to obtain lyrics features corresponding to the text features.
• The music data generation module 170, which is connected to the feature matching module 150, is configured to predict, through the trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, so as to generate music data adapted to the lyrics text.
  • Fig. 9 is a block diagram showing a device for generating music for lyrics text according to another exemplary embodiment.
• The device in this embodiment further includes a feature extraction module 210 configured to extract lyrics features from the lyrics sample text in the sample data, and to extract rhythm features and melody features from the music data corresponding to the lyrics sample text in the sample data.
  • a corpus construction module 230 which is connected to the feature extraction module 210, is configured to construct the corpus from the lyrics features, rhythm features, and melody features.
• A training module 250, which is connected to the corpus construction module 230, is configured to iteratively train the random forest classifier using the lyrics features, rhythm features, and melody features until the trained random forest classifier's prediction of melody and rhythm reaches a specified accuracy, at which point the iterative training of the random forest classifier is stopped.
  • FIG. 10 is a block diagram of a module 170 of the embodiment shown in FIG. 8 in an exemplary embodiment.
  • the random forest classifier includes a rhythm classifier and a melody classifier.
• The music data generation module 170 includes a rhythm feature obtaining unit 171 configured to obtain, through prediction by the rhythm classifier, the rhythm feature corresponding to the lyrics feature.
  • the melody feature obtaining unit 173 is connected to the rhythm feature obtaining unit 171 and is configured to input lyrics features and rhythm features to the melody classifier to predict and obtain melody features corresponding to the lyrics features.
• A music data generating unit 175, which is connected to the melody feature obtaining unit 173, is configured to combine the obtained rhythm features and melody features to generate music data adapted to the lyrics text.
  • FIG. 11 is an exemplary block diagram of the music data generating unit 175 of the embodiment shown in FIG. 10.
• The music data generating unit 175 includes a note information generating unit 301 configured to generate the note information corresponding to the words in the sequence from the obtained rhythm features and melody features.
  • the note information combining unit 303 is connected to the note information generating unit 301 and is configured to combine the note information corresponding to the words in the sequence to generate music data of the lyrics text.
  • FIG. 12 is an exemplary block diagram of the note information combining unit 303 shown in FIG. 11.
• The note information combining unit 303 includes a note sequence generating unit 3031 configured to combine the note information corresponding to the words in the sequence, in the order of the words in the sequence, to generate a note sequence corresponding to the lyrics text.
  • the filtering unit 3033 is connected to the note sequence generating unit 3031 and is configured to filter the note sequence according to a set note threshold.
  • a music data generating unit 3035 is connected to the filtering unit 3033 and is configured to generate music data of the lyrics text through the filtered note sequence.
  • modules or units can be implemented by hardware, software, or a combination of both.
  • these modules or units may be implemented as one or more hardware modules, such as one or more application specific integrated circuits.
  • these modules or units may be implemented as one or more computer programs executing on one or more processors.
  • the present disclosure also provides a device for generating music for lyrics text.
  • the device may be used for the server 200 described in FIG. 2.
  • the device includes: a processor; and a memory for storing processor-executable instructions.
  • the processor is configured to execute the method for generating music for lyrics text in the embodiment shown in any one of FIG. 3 to FIG. 7.
  • the present disclosure also provides a computer-readable storage medium.
• The computer-readable storage medium may be a memory 250 storing a computer program, which may be executed by the processor 270 of the server 200 to complete the foregoing method for generating music for lyrics text.

Abstract

A method and a device for generating music for a lyrics text on the basis of a random forest, and a computer-readable storage medium, relating to the field of artificial intelligence technology. The method comprises: acquiring a lyrics text, the lyrics text being a sequence composed of several words in order (S110); performing feature extraction on the lyrics text to obtain text features mapped to the sequence (S130); performing feature matching between the text features and lyrics features in a corpus to obtain lyrics features corresponding to the text features (S150); and predicting, by means of a trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, to generate music data adapted to the lyrics text (S170). The method automatically generates, by means of a random forest model, the music data corresponding to a lyrics text; a user can compose music from a lyrics text without mastering professional musical knowledge, so that ordinary people can use the method to automatically generate music from a lyrics text.

Description

Method and device for generating music for lyrics text, and computer-readable storage medium

Technical Field
This application claims priority to Chinese patent application CN201810798036.7, filed on July 19, 2018 and entitled "Method and device for generating music for lyrics text, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of Internet technologies, and in particular, to a method, a device, and a computer-readable storage medium for generating music for lyrics text.
Background
Composing music from lyrics demands a high degree of expertise and generally requires applying a large amount of related musical knowledge, such as basic music theory, harmony, polyphony, orchestration, and musical form. Music is therefore usually composed by people with rich theoretical knowledge of music; for the general public, who lack such knowledge, composing music from lyrics is essentially impossible.
Technical Problem
Therefore, a method is needed that can automatically compose music from lyrics, so that the general public can also participate in music composition.
Technical Solutions
In order to solve the above technical problem, an object of the present application is to provide a method, a device, and a computer-readable storage medium for generating music for lyrics text.

The technical solutions adopted by the present application are as follows:
In one aspect, a method for generating music for lyrics text based on a random forest includes: obtaining lyrics text, the lyrics text being a sequence composed of several words in order; performing feature extraction on the lyrics text to obtain text features mapped to the sequence; performing feature matching between the text features and lyrics features in a corpus to obtain the lyrics features corresponding to the text features; and predicting, through a trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, so as to generate music data adapted to the lyrics text.
In another aspect, a device for generating music for lyrics text based on a random forest includes: an acquisition module configured to obtain lyrics text, the lyrics text being a sequence composed of several words in order; a text feature extraction module configured to perform feature extraction on the lyrics text to obtain text features mapped to the sequence; a feature matching module configured to perform feature matching between the text features and lyrics features in a corpus to obtain the lyrics features corresponding to the text features; and a music data generation module configured to predict, through a trained random forest classifier, the melody and rhythm corresponding to the words in the sequence from the obtained lyrics features, so as to generate music data adapted to the lyrics text.
In another aspect, a device for generating music for lyrics text based on a random forest includes a processor and a memory for storing processor-executable instructions, wherein the processor is configured to execute the method for generating music for lyrics text described above.

In another aspect, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the method for generating music for lyrics text described above.
Beneficial Effects
In the above technical solution, feature extraction and feature matching are performed on the lyrics text, and rhythm and melody are predicted through a random forest classifier, so that the music data corresponding to the acquired lyrics text is generated automatically. A user can compose music from lyrics text without mastering professional musical knowledge, so the general public can use the present disclosure to automatically generate music from lyrics text.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings
The drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application, and together with the description serve to explain the principles of the application.
Fig. 1 is a schematic diagram of an implementation environment involved in the present disclosure, according to an exemplary embodiment.

Fig. 2 is a block diagram of a device 200 according to an exemplary embodiment.

Fig. 3 is a flowchart of a method for generating music for lyrics text according to an exemplary embodiment.

Fig. 4 is a flowchart of the steps before step S110 of the embodiment shown in Fig. 3, in an exemplary embodiment.

Fig. 5 is a flowchart of step S170 of the embodiment shown in Fig. 3, in an exemplary embodiment.

Fig. 6 is a flowchart of step S175 of the embodiment shown in Fig. 5, in an exemplary embodiment.

Fig. 7 is a flowchart of step S303 of the embodiment shown in Fig. 6, in an exemplary embodiment.

Fig. 8 is a block diagram of a device for generating music for lyrics text according to an exemplary embodiment.

Fig. 9 is a block diagram of a device for generating music for lyrics text according to another exemplary embodiment.

Fig. 10 is a block diagram of the module 170 of the embodiment shown in Fig. 8, in an exemplary embodiment.

Fig. 11 is a block diagram of the music data generating unit 175 of the embodiment shown in Fig. 10, in an exemplary embodiment.

Fig. 12 is a block diagram of the note information combining unit 303 of the embodiment shown in Fig. 11, in an exemplary embodiment.
The above drawings show specific embodiments of the present application, which are described in more detail hereinafter. These drawings and textual descriptions are not intended to limit the scope of the concept of the present application in any way, but to explain the concept of the application to those skilled in the art by reference to specific embodiments.
Embodiments of the Invention

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.
Fig. 1 is a schematic diagram of an implementation environment involved in the present disclosure, according to an exemplary embodiment. The implementation environment consists of a terminal 100 and a server 200 that establish a network communication connection, where the server 200 serves as the back end with which the present disclosure generates music for lyrics text. The terminal 100 may be a computer, a smartphone, or any other communication device that can run a client for generating music for lyrics text and has a network connection function; no limitation is made here. The terminal 100 can initiate a request to generate music data and provide the lyrics text; the server 200 receives the request initiated by the terminal 100, generates music data for the lyrics text provided by the terminal 100, and then outputs the generated music data to the terminal 100. In an exemplary embodiment, the server 200 may be a web server or an APP server.
Fig. 2 is a block diagram of the hardware structure of a server according to an exemplary embodiment. The server may be deployed in the implementation environment shown in Fig. 1 to generate music data for lyrics text. It should be noted that this server 200 is only an example adapted to the present disclosure and cannot be considered to limit the scope of use of the present disclosure in any way; nor can the server 200 be interpreted as depending on, or necessarily having, one or more of the components shown in Fig. 2.
The hardware structure of the server may vary greatly with configuration or performance. As shown in Fig. 2, the server 200 includes a power supply 210, an interface 230, at least one memory 250, and at least one processor (CPU, Central Processing Unit) 270. The power supply 210 provides the working voltage for each hardware device on the server 200.

The interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and so on, for communicating with external devices. In an exemplary embodiment, the server communicates with the terminal 100 in the implementation environment of Fig. 1 through the wireless network interface.

The memory 250, as a carrier of resource storage, may be a read-only memory, a random access memory, a magnetic disk, or an optical disc; the resources stored on it include an operating system 251, application programs 253, and data 255, and the storage may be transient or persistent. The operating system 251 manages and controls the hardware devices and application programs 253 on the server 200 so that the processor 270 can compute and process the massive data 255; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like. An application program 253 is a computer program that completes at least one specific task on top of the operating system 251; it may include at least one module (not shown in Fig. 2), each of which may contain a series of computer-readable instructions for the server 200. The data 255 may be photos, pictures, and the like stored on a disk.

The processor 270 may include one or more processors and is configured to communicate with the memory 250 through a bus, for computing and processing the massive data 255 in the memory 250. As described in detail above, the server 200 to which the present disclosure applies generates music for lyrics text by the processor 270 reading a series of computer-readable instructions stored in the memory 250.
In addition, the present disclosure can equally be implemented by a hardware circuit or by a hardware circuit combined with software; therefore, implementing the present disclosure is not limited to any specific hardware circuit, software, or combination of the two.
Fig. 3 is a flowchart of a method for generating music for lyrics text according to an exemplary embodiment. As shown in Fig. 3, the method of this embodiment includes:

Step S110: obtain the lyrics text, the lyrics text being a sequence composed of several words in order.
A word is the smallest unit of the lyrics text. For example, in the lyrics line "还没好好的感受，雪花绽放的气候" (roughly, "before having fully felt it, the season when snowflakes bloom"), each character of the text is a word of the lyrics text. The language of the lyrics text is not limited: for a Chinese text, the characters are the words of the lyrics text; for an English text, the words themselves are the words of the lyrics text. No limitation is made here.
In one embodiment, the obtained lyrics text may be the lyrics text entered by the user in an interactive interface. The length of the lyrics text is not limited; it may be a sentence or a passage of text. In another embodiment, the interactive interface may also display lyrics text recommended by the server; the user can select recommended lyrics text on the interactive interface, and the selected lyrics text is input to the server according to the user's selection. In another embodiment, the server also provides an option for adjusting the number of words in the lyrics text, so that the user can enter text and adjust its word count.
Step S130: perform feature extraction on the lyrics text to obtain the text features mapped to the sequence.

Performing feature extraction on the lyrics text means obtaining from it the syllable information corresponding to the lyrics text, that is, using text features to reflect the syllable information of the lyrics text, where each word in the lyrics text corresponds to one syllable. The text features reflecting the syllable information of the lyrics text include, but are not limited to: the syllable type, the number of syllables, the word frequency, and the word rarity of each word in the lyrics text. The syllable type is the classification of the syllable corresponding to the word in the lyrics, and may be: single syllable, initial syllable, middle syllable, final syllable, and so on. The number of syllables is the count of syllables; the word frequency is the frequency with which a given word appears in the lyrics text; and the word rarity is a function of the word frequency, where:
[Equation: word rarity as a function of word frequency; the original formula image is not recoverable from the source.]
The lyrics text uniquely corresponds to its text features; that is, the extracted text features are the text features mapped to the sequence, so the lyrics text can be described by the syllable information reflected by those text features.

In an exemplary embodiment, the text features can be extracted by Python programming, with the methods for extracting text features such as the syllable type, word frequency, and word rarity of each word in the lyrics text set out in a pre-written program. In other embodiments, text feature extraction through a deep neural network is also applicable to the present disclosure, and no limitation is made here.
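A minimal Python sketch of such per-word feature extraction follows. The rarity formula `1 - frequency` is an assumption of this sketch, since the original formula is not reproduced here:

```python
from collections import Counter

# Sketch: extract per-word text features (frequency and a frequency-derived
# rarity score) from a tokenized lyrics text. Names are illustrative.

def extract_text_features(words):
    counts = Counter(words)
    total = len(words)
    return [
        {"word": w, "freq": counts[w] / total, "rarity": 1.0 - counts[w] / total}
        for w in words
    ]

feats = extract_text_features(["the", "snow", "the", "climate"])
print(feats[0])  # → {'word': 'the', 'freq': 0.5, 'rarity': 0.5}
```

Syllable type and syllable count would be added analogously, e.g. from a pronunciation lookup.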
Step S150: perform feature matching between the text features and the lyrics features in the corpus to obtain the lyrics features corresponding to the text features.

The corpus is composed of the music files of a number of songs, where the music files include the lyrics features, rhythm features, and melody features extracted for each song. In one embodiment, the music files exist in the corpus in XML format. It should be noted that the corpus has already been constructed before the text feature matching is performed; the construction of the corpus is described in detail below. By matching the text features against the lyrics features in the corpus, lyrics features similar to the text features, that is, the lyrics features corresponding to the text features, can be found in the corpus, so that in the subsequent steps the rhythm and melody are predicted from the matched lyrics features.

Taking the extracted text features of syllable type, number of syllables, word frequency, and word rarity as an example, the matching between the text features and the lyrics features in the corpus covers syllable type, number of syllables, word frequency, and word rarity, so that lyrics features similar to these features of the lyrics text can be obtained from the corpus.
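Such feature matching might be sketched as a nearest-neighbor search over corpus entries; the corpus layout, feature keys, and Euclidean distance below are illustrative assumptions, not a format defined by this disclosure:

```python
# Sketch: match a text feature vector against corpus lyrics features
# by smallest distance over shared feature keys.

def distance(a, b):
    keys = ("syllables", "freq", "rarity")
    return sum((a[k] - b[k]) ** 2 for k in keys) ** 0.5

def match_lyrics_feature(text_feat, corpus):
    """Return the corpus entry whose lyrics feature is closest to text_feat."""
    return min(corpus, key=lambda entry: distance(text_feat, entry["feature"]))

corpus = [
    {"id": "song-1", "feature": {"syllables": 1, "freq": 0.2, "rarity": 0.8}},
    {"id": "song-2", "feature": {"syllables": 3, "freq": 0.05, "rarity": 0.95}},
]

query = {"syllables": 1, "freq": 0.25, "rarity": 0.75}
print(match_lyrics_feature(query, corpus)["id"])  # → song-1
```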
Step S170: use the trained random forest classifier to predict, from the obtained lyric features, the melody and rhythm corresponding to the words in the sequence, and generate music data adapted to the lyrics text.
The durations of the notes in a piece of music combine to form its rhythm. Rhythm features reflect note-duration information such as note onsets and time values, and may include: the time signature, the offset, the measure offset, and the time value. The time signature is the time signature of the notes in the music corresponding to the lyrics; the offset is the number of beats before the music begins; the measure offset is the number of beats before the measure begins; the time value is the duration of the note corresponding to a given word in the music.
The pitches of the individual notes combine to form the melody of a piece of music. Melody features reflect note-pitch information and may include: the key signature, the pitch, accidentals, and weak beats. The key signature is the key signature of the note corresponding to a word in the lyrics; the pitch is the pitch of the note corresponding to a word in the lyrics; an accidental is a chromatic sign placed directly before a note; a weak beat is an unaccented unit beat. Lyric features may include the syllable type, syllable count, word frequency, and word rarity of each word in the lyrics text. It should be noted that the rhythm and melody features listed above are merely examples adapted to the present disclosure and are not to be taken as limiting its use. In other embodiments, the rhythm and melody features may include features beyond those listed, or combinations of some of the listed features with unlisted ones.
For every song, since the lyrics and the music correspond to each other, the lyric features in the corpus correspond to the rhythm and melody features. Therefore, after the text features of the lyrics text are matched with the lyric features in the corpus, the corresponding melody and rhythm can be predicted from the matched lyric features. Predicting the rhythm and melody means predicting rhythm features and melody features from the lyric features corresponding to the text features, generating the rhythm and melody from the predicted rhythm and melody features, and then generating the music data.
A random forest classifier is a combination of multiple decision trees built from features, where each feature forms a node of a decision tree. The random forest classifier can thus combine the features non-linearly, and it does not require a large amount of sample data to train, so the quality of the generated music can be assured without a large training set.
When predicting the melody and rhythm, the prediction is made according to the features at the nodes of the random forest classifier. Take rhythm prediction from lyric features as an example, where the input of the random forest classifier is the lyric features and the output is the rhythm features. When the classifier is constructed and trained, the rhythm-predicting random forest is built from the lyric features in the corpus; that is, the lyric features in the corpus determine the feature and the decision condition at each node. For example, syllable count, syllable type, word frequency, and word rarity may be placed from top to bottom on the nodes of a decision tree, and each node applies its decision condition. At the syllable-count node, for instance: if the syllable count is at most 3, output the corresponding time value; if it is greater than 3, decide the output time value by syllable type. Likewise, the syllable-type, word-frequency, and word-rarity nodes have their own decision conditions and condition-dependent outputs, so a time value can be output from the lyric features; the other rhythm features can be predicted by similar decision trees. The rhythm features corresponding to the lyric features can thus be predicted by the random forest classifier from the lyric features corresponding to the text features, and the rhythm of the lyrics text obtained from them. Melody features corresponding to the lyric features can be predicted by a similar method, realizing the melody prediction. The rhythm and melody corresponding to the lyrics text are then formed from the predicted rhythm and melody features, and the music data corresponding to the lyrics text is generated. The generated music data may be in XML format or MIDI format, which is not specifically limited here.
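The single decision path just described (split on syllable count, then on syllable type) can be written out directly. This is only an illustration of what one tree node encodes: a trained random forest would learn many such thresholds from the corpus and vote across its trees rather than hard-coding them, and the concrete durations below are placeholders.

```python
def predict_time_value(syllable_count, syllable_type):
    """One hand-written decision path of the kind a tree node encodes:
    split on syllable count first, then on syllable type. The output
    durations (in quarter-note units) are illustrative placeholders,
    not values taken from the patent.
    """
    if syllable_count <= 3:
        return 1.0          # e.g. a quarter note
    if syllable_type in ("single", "begin"):
        return 0.5          # e.g. an eighth note
    return 0.25             # e.g. a sixteenth note
```

A forest of such trees, each trained on a bootstrap sample of the corpus, combines the lyric features non-linearly as described above.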
By performing feature extraction and feature matching on the lyrics text and predicting rhythm and melody with a random forest classifier, the music data corresponding to the lyrics text is generated automatically from the acquired lyrics text. A user can compose music from lyrics without mastering professional musical knowledge, so the general public can also compose music according to the present disclosure.
FIG. 4 is a flowchart, in an exemplary embodiment, of steps preceding step S110 of the embodiment shown in FIG. 3. As shown in FIG. 4, before step S110 the method of this embodiment further includes:
Step S210: extract lyric features from the lyric sample texts in the sample data, and extract rhythm features and melody features from the music data corresponding to the lyric sample texts in the sample data.
The sample data consists of a number of songs collected for training the random forest classifier, each collected song including both lyrics and music. The lyric sample text is the lyrics of a song in the sample data. The lyric features describe the lyrics of the sample data, and the rhythm and melody features describe its rhythm and melody, respectively. In other words, the lyric features capture the syllable information of each word in the lyric sample text, the rhythm features capture the time-value information of the notes in the music, and the melody features capture the pitch information of the notes. In each song the syllables correspond to the notes, that is, each word in the lyrics corresponds to a note. To ensure the prediction performance of the random forest classifier, in an exemplary embodiment the sample data consists of single-track, single-instrument songs, so that each word of the lyrics corresponds to exactly one note. In an exemplary embodiment, the extracted lyric features may include syllable type, syllable count, word frequency, and word rarity; the extracted rhythm features may include the time signature, offset, measure offset, and time value; and the extracted melody features may include the key signature, pitch, accidentals, and weak beats.
It should be noted that the specific categories of lyric, rhythm, and melody features shown above are only examples adapted to the present disclosure and do not limit its scope of use in any way, nor should they be read as requiring that all, or only, the specific features in the above examples be extracted to implement the present disclosure. In other embodiments, more or fewer features than the specific categories listed above may be extracted. Naturally, the more fully the extracted lyric, rhythm, and melody features describe the song's lyrics and music, the better the accuracy of the random forest classifier, and the higher the accuracy when the music data corresponding to a lyrics text is generated automatically.
In an exemplary embodiment, the lyric, rhythm, and melody features may each be extracted by a deep neural network. In another exemplary embodiment, each specific lyric, rhythm, and melody feature may be extracted by a Python program. The extraction method is not limited here.
Step S230: construct a corpus from the lyric features, rhythm features, and melody features.
The extracted lyric, rhythm, and melody features serve as the corpus of the random forest classifier; the classifier is trained on this corpus, and after training the rhythm and melody corresponding to a lyrics text are predicted with it. The corpus may also include the lyric sample texts of the sample data and the corresponding music data. In an exemplary embodiment, the corpus is constructed by extracting the lyric, rhythm, and melody features of 24 single-track, single-instrument pop songs; it contains 59 features and 12,358 observations. An observation is a value taken by a specific feature: for the syllable-type feature, for example, the observations may be single syllable, starting syllable, central syllable, and ending syllable.
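Since the embodiment stores music files in XML, a corpus entry might be parsed along the following lines. The tag and attribute names are invented for illustration; the disclosure does not specify the actual schema.

```python
import xml.etree.ElementTree as ET

def load_corpus_entry(xml_text):
    """Parse one corpus entry into its three feature groups.

    The <song>/<lyric_features>/<feature name=...> layout is a
    hypothetical schema assumed for this sketch.
    """
    root = ET.fromstring(xml_text)
    entry = {}
    for group in ("lyric_features", "rhythm_features", "melody_features"):
        node = root.find(group)
        entry[group] = {f.get("name"): f.text for f in node.findall("feature")}
    return entry
```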
Step S250: iteratively train the random forest classifier on the lyric, rhythm, and melody features, and stop the iterative training once the trained classifier predicts the melody and rhythm of known song texts to a specified accuracy.
In one embodiment, lyric features are input into the random forest classifier, and the rhythm and melody features it predicts for those lyric features are compared with the rhythm and melody features actually corresponding to them. If they differ, the parameters of the random forest classifier are adjusted, the same lyric features are fed into the adjusted classifier again, and the comparison is repeated; once the predicted rhythm and melody features match the actual ones, training continues with the next set of lyric features in the corpus. This process is the iterative training of the random forest classifier: the classifier is trained iteratively in a deep-learning fashion, so that after training it can predict melody and rhythm.
After a period of training, the random forest classifier is evaluated, that is, its accuracy is assessed.
After the random forest classifier has been trained with the lyric, rhythm, and melody features of a number of sample songs, it is evaluated as follows: the lyric features of a song with an existing melody are input, the random forest predicts the corresponding rhythm and melody features, and the predicted rhythm and melody features are compared with those of the music actually corresponding to the lyrics to compute the classifier's accuracy.
In one embodiment, if the lyric, rhythm, and melody features of several songs are used to evaluate the random forest classifier, the accuracies computed for the individual songs are averaged to obtain the classifier's accuracy. If the computed accuracy reaches the specified accuracy, training of the random forest classifier is complete; otherwise, training continues on the lyric, rhythm, and melody features of the sample data.
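The stopping rule just described (per-song accuracies averaged over held-out songs, compared against a target) is simple enough to state in a few lines. The tuple layout for per-song results is an assumption made for the sketch.

```python
def evaluate_classifier(per_song_results, target_accuracy):
    """Average per-song accuracies and decide whether training may stop.

    Each entry of `per_song_results` is assumed to be a
    (correctly_predicted_features, total_features) pair for one
    held-out song that was not used in training.
    """
    accuracies = [correct / total for correct, total in per_song_results]
    mean_accuracy = sum(accuracies) / len(accuracies)
    return mean_accuracy, mean_accuracy >= target_accuracy
```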
It should be noted that the sample data used to evaluate the random forest classifier differs from the sample data used in training. For example, if the lyric, rhythm, and melody features corresponding to the song 匆匆那年 in the sample data were used to train the random forest classifier, the features corresponding to that song cannot be used to evaluate it.
In an exemplary embodiment, the random forest classifier includes a rhythm classifier and a melody classifier; the evaluation then assesses the rhythm classifier and the melody classifier separately, yielding a separate accuracy for each.
FIG. 5 is a flowchart of step S170 of the embodiment shown in FIG. 3 in an exemplary embodiment. In this embodiment, the random forest classifier includes a rhythm classifier and a melody classifier. As shown in FIG. 5, step S170 includes:
Step S171: predict the rhythm features corresponding to the lyric features with the rhythm classifier.
The rhythm classifier is a model for predicting rhythm features formed by combining several decision trees. In this embodiment the rhythm features are predicted from the lyric features, so, correspondingly, the nodes of the rhythm classifier's decision trees are built from the lyric features of the sample data. Because the lyric features in the corpus of the random forest classifier have corresponding rhythm features, the rhythm classifier can predict the rhythm features corresponding to the lyric features; the obtained rhythm features may be the time signature, offset, measure offset, time value, and the like, or combinations thereof.
Step S173: input the lyric features and rhythm features into the melody classifier to predict the melody features corresponding to the lyric features.
The melody classifier is a model for predicting melody features formed by combining several decision trees. In this embodiment the melody features are predicted from the lyric and rhythm features, so, correspondingly, the nodes of the melody classifier's decision trees are built from the lyric and rhythm features of the sample data. The rhythm and lyric features are input into the melody classifier, which predicts the melody features corresponding to the lyric features, such as the pitch, weak beats, and accidentals, or combinations thereof.
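Steps S171 and S173 chain the two classifiers, with the melody classifier conditioned on the rhythm prediction. A minimal sketch of that pipeline follows, treating each classifier as any object with a `predict` method; the interface and the dict-based feature records are assumptions of the sketch.

```python
def predict_song_features(lyric_features, rhythm_clf, melody_clf):
    """Two-stage prediction: rhythm from lyric features (step S171),
    then melody from lyric + rhythm features (step S173). The
    classifiers are stand-ins for the trained random forests."""
    rhythm_features = [rhythm_clf.predict(f) for f in lyric_features]
    melody_features = [
        melody_clf.predict({**f, "rhythm": r})
        for f, r in zip(lyric_features, rhythm_features)
    ]
    return rhythm_features, melody_features
```

Swapping the two stages, as the alternative embodiment below describes, only requires exchanging which classifier runs first and which receives the other's output as an extra input.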
Step S175: combine the obtained rhythm features and melody features to generate music data adapted to the lyrics text.
Combining the obtained rhythm and melody features yields the time value and pitch corresponding to the syllable information of each word in the lyrics text, so that each word corresponds to one note. The notes corresponding to the words are combined in the order of the words in the lyrics text, yielding the music data of the acquired lyrics text.
In an exemplary embodiment, the generated music data may be in XML format or in MIDI format.
It should be noted that, in this embodiment, the rhythm features are obtained first by the rhythm classifier, the melody features are then obtained by the melody classifier, and finally the rhythm and melody features are combined to generate the music data of the acquired lyrics text. This way of generating music data is only one exemplary embodiment of step S170.
In other embodiments, the melody features may instead be obtained by inputting the lyric features into the melody classifier, after which the lyric and melody features are input into the rhythm classifier to obtain the rhythm features. Correspondingly, in that embodiment the nodes of the melody classifier's decision trees are built from the lyric features of the sample data, and the nodes of the rhythm classifier's decision trees are built from the lyric and melody features of the sample data. Finally, the obtained melody and rhythm features are combined to generate the music data corresponding to the acquired lyrics text.
Whether the rhythm features or the melody features are obtained first can be decided by the accuracies of the random forest classifier (that is, of the rhythm classifier and the melody classifier) obtained in step S250: if, after training, the rhythm classifier is more accurate than the melody classifier, the rhythm features can be predicted first and the melody features second; if the rhythm classifier is less accurate than the melody classifier, the music data can be generated with the melody features predicted first and the rhythm features second. Predicting first with the more accurate classifier improves the accuracy of the overall prediction. Of course, the order may also be decided from other considerations.
FIG. 6 is a flowchart of step S175 of the embodiment shown in FIG. 5 in an exemplary embodiment. As shown in FIG. 6, step S175 includes:
Step S301: generate, from the obtained rhythm features and melody features, the note information corresponding to the words in the sequence.
The obtained rhythm and melody features are combined to obtain the time-value and pitch information corresponding to the words in the lyrics text, so that each word yields one note; that is, the note information corresponding to the words in the sequence is generated. In an exemplary embodiment, when the note information corresponding to a word in the sequence is generated, subsequent note information is generated with reference to the features of the notes already generated; for example, a note may be generated with reference to the features (such as time value and pitch) of the preceding five notes, which ensures that corresponding note information is generated for every word in the sequence.
Step S303: combine the note information corresponding to the words in the sequence to generate the music data of the lyrics text. The note information corresponding to the words in the lyrics text is combined to obtain the music data of the lyrics text.
FIG. 7 is a flowchart of step S303 of the embodiment shown in FIG. 6 in an exemplary embodiment. As shown in FIG. 7, step S303 includes:
Step S3031: combine the note information corresponding to the words in the sequence in the order of the words, generating the note sequence corresponding to the lyrics text. The words of the lyrics text form an ordered sequence; after the note information corresponding to the words is generated from the rhythm and melody features, it is combined in the order of the words in the lyrics text to obtain the note sequence corresponding to the lyrics text.
Step S3033: filter the note sequence according to a set note threshold.
Filtering the note sequence means removing certain notes from it; the set note threshold may be a specific note or a range of notes. For example, to ignore shorter notes (such as 1/16 notes), the 1/16 note can be set as the threshold, so that all notes other than 1/16 notes are kept, producing a new note sequence. In an exemplary embodiment, the note threshold can be adjusted to actual needs, and different note thresholds may be set for different lyrics texts; for example, for one passage of lyrics the threshold may be set to the 1/64 note, removing the 1/64 notes from the note sequence, while for another passage a threshold is set that filters out the 1/32 notes from the note sequence.
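Reading the threshold as a minimum kept duration, the filtering step can be sketched as follows. Two points are interpretive: the (word, duration, pitch) tuple layout is assumed, and the source states only that 1/16 notes are removed when the 1/16 note is the threshold, so this sketch drops notes at or below the threshold duration.

```python
def filter_note_sequence(notes, threshold):
    """Drop notes whose duration (in whole-note fractions) is at or
    below `threshold`; e.g. threshold=1/16 removes 1/16 notes and any
    shorter ones. Each note is assumed to be a (word, duration, pitch)
    tuple.
    """
    return [note for note in notes if note[1] > threshold]
```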
Step S3035: generate the music data of the lyrics text from the filtered note sequence.
The following are device embodiments of the present disclosure, which can be used to execute the embodiments of the method for generating music for lyrics text executed by the above server 200 of the present disclosure. For details not disclosed in the device embodiments of the present disclosure, refer to the embodiments of the method for generating music for lyrics text of the present disclosure.
FIG. 8 is a block diagram of a device for generating music for lyrics text according to an exemplary embodiment. The device can be used in the server 200 shown in FIG. 2 to perform all or part of the steps of the method for generating music for lyrics text shown in the method embodiments above. As shown in FIG. 8, the device includes, but is not limited to: an acquisition module 110, a text feature extraction module 130, a feature matching module 150, and a music data generation module 170. The acquisition module 110 is configured to acquire a lyrics text, the lyrics text being a sequence of words. The text feature extraction module 130, connected to the acquisition module 110, is configured to perform feature extraction on the lyrics text to obtain the text features mapped from the sequence. The feature matching module 150, connected to the text feature extraction module 130, is configured to perform feature matching between the text features and the lyric features in the corpus to obtain the lyric features corresponding to the text features. The music data generation module 170, connected to the feature matching module 150, is configured to use the trained random forest classifier to predict, from the obtained lyric features, the melody and rhythm corresponding to the words in the sequence and to generate music data adapted to the lyrics text.
FIG. 9 is a block diagram of a device for generating music for lyrics text according to another exemplary embodiment. As shown in FIG. 9, in addition to the modules shown in FIG. 8, the device of this embodiment further includes: a feature extraction module 210, configured to extract lyric features from the lyric sample texts in the sample data and to extract rhythm features and melody features from the music data corresponding to the lyric sample texts in the sample data; a corpus construction module 230, connected to the feature extraction module 210 and configured to construct the corpus from the lyric features, rhythm features, and melody features; and a training module 250, connected to the corpus construction module 230 and configured to iteratively train the random forest classifier on the lyric, rhythm, and melody features, stopping the iterative training once the trained random forest classifier predicts the melody and rhythm of known song texts to a specified accuracy.
FIG. 10 is a block diagram of the module 170 of the embodiment shown in FIG. 8 in an exemplary embodiment. In this embodiment, the random forest classifier includes a rhythm classifier and a melody classifier. As shown in FIG. 10, the music data generation module 170 includes: a rhythm feature obtaining unit 171, configured to predict the rhythm features corresponding to the lyric features with the rhythm classifier; a melody feature obtaining unit 173, connected to the rhythm feature obtaining unit 171 and configured to input the lyric and rhythm features into the melody classifier to predict the melody features corresponding to the lyric features; and a music data generation unit 175, connected to the melody feature obtaining unit 173 and configured to combine the obtained rhythm and melody features to generate music data adapted to the lyrics text.
FIG. 11 is an exemplary block diagram of the music data generation unit 175 of the embodiment shown in FIG. 10. In this embodiment, the music data generation unit 175 includes: a note information generation unit 301, configured to generate, from the obtained rhythm and melody features, the note information corresponding to the words in the sequence; and a note information combination unit 303, connected to the note information generation unit 301 and configured to combine the note information corresponding to the words in the sequence to generate the music data of the lyrics text.
Fig. 12 is an exemplary block diagram of the note information combination unit 303 shown in Fig. 11. In this embodiment, the note information combination unit 303 includes: a note sequence generation unit 3031, configured to combine the note information corresponding to the words in the sequence in word order, generating the note sequence corresponding to the lyrics text; a filtering unit 3033, connected to the note sequence generation unit 3031 and configured to filter the note sequence according to a set note threshold; and a music data generation unit 3035, connected to the filtering unit 3033 and configured to generate the music data of the lyrics text from the filtered note sequence.
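Units 3031 to 3035 amount to ordering, thresholding, and emitting notes. A stdlib-only sketch, with made-up words, made-up (MIDI pitch, beats) pairs, and an assumed pitch-range reading of the "note threshold" (the patent does not define the threshold's exact form):

```python
# Hypothetical sketch of the Fig. 12 pipeline: combine per-word note
# info in word order, then drop notes outside a configured pitch range.
words = ["hold", "me", "close", "tonight"]          # sequence of words
notes = {"hold": (62, 0.5), "me": (64, 0.25),       # (MIDI pitch, beats),
         "close": (96, 0.5), "tonight": (60, 1.0)}  # all values made up

note_sequence = [notes[w] for w in words]           # unit 3031: word order
LOW, HIGH = 48, 84                                  # unit 3033: threshold
filtered = [(p, d) for p, d in note_sequence if LOW <= p <= HIGH]
print(filtered)  # → [(62, 0.5), (64, 0.25), (60, 1.0)]
```

The out-of-range pitch 96 is removed before unit 3035 would render the remaining notes into music data, which keeps the generated line singable within a plausible vocal range.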
For the implementation of the functions of the modules in the above device, see the implementation of the corresponding steps in the method for generating music for lyrics text described above; the details are not repeated here. It can be understood that these modules or units may be implemented in hardware, in software, or in a combination of both. When implemented in hardware, they may be realized as one or more hardware modules, such as one or more application-specific integrated circuits. When implemented in software, they may be realized as one or more computer programs executing on one or more processors.
Optionally, the present disclosure further provides a device for generating music for lyrics text. The device may be used in the server 200 described in Fig. 2 and includes: a processor; and a memory for storing processor-executable instructions. The processor is configured to execute the method for generating music for lyrics text of any of the embodiments shown in Figs. 3 to 7.
The specific manner in which the processor of the device in this embodiment performs operations has been described in detail in the embodiments of the method for generating music for lyrics text, and is not elaborated here.
In an exemplary embodiment, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium may be the memory 250 storing a computer program, which may be executed by the processor 270 of the server 200 to complete the above method for generating music for lyrics text.
The above content is only a preferred exemplary embodiment of the present application and is not intended to limit its implementation. A person of ordinary skill in the art can readily make corresponding variations or modifications according to the main idea and spirit of the present application, so the protection scope of the present application shall be subject to the scope claimed in the claims.

Claims (20)

  1. A method for generating music for lyrics text based on a random forest, wherein the method comprises:
    obtaining lyrics text, the lyrics text being a sequence of words in order;
    performing feature extraction on the lyrics text to obtain the text feature mapped from the sequence;
    performing feature matching between the text feature and lyrics features in a corpus to obtain the lyrics feature corresponding to the text feature;
    predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence, to generate music data adapted to the lyrics text.
  2. The method according to claim 1, wherein before the obtaining of the lyrics text, the method further comprises:
    extracting lyrics features from the lyrics sample text in sample data, and extracting rhythm features and melody features from the music data in the sample data corresponding to the lyrics sample text;
    constructing the corpus from the lyrics features, rhythm features, and melody features;
    performing iterative training of the random forest classifier with the lyrics features, rhythm features, and melody features, and stopping the iterative training once the trained random forest classifier predicts the melody and rhythm of known song texts with a specified accuracy.
  3. The method according to claim 1 or 2, wherein the random forest classifier comprises a rhythm classifier and a melody classifier, and the predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence to generate music data adapted to the lyrics text comprises:
    obtaining, through prediction by the rhythm classifier, the rhythm feature corresponding to the lyrics feature;
    inputting the lyrics feature and the rhythm feature into the melody classifier to predict the melody feature corresponding to the lyrics feature;
    combining the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text.
  4. The method according to claim 3, wherein the combining of the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text comprises:
    generating, from the obtained rhythm feature and melody feature, the note information corresponding to the words in the sequence;
    combining the note information corresponding to the words in the sequence to generate the music data of the lyrics text.
  5. The method according to claim 4, wherein the combining of the note information corresponding to the words in the sequence to generate the music data of the lyrics text comprises:
    combining the note information corresponding to the words in the sequence in the order of the words, to generate the note sequence corresponding to the lyrics text;
    filtering the note sequence according to a set note threshold;
    generating the music data of the lyrics text from the filtered note sequence.
  6. A device for generating music for lyrics text based on a random forest, wherein the device comprises:
    an obtaining module, configured to obtain lyrics text, the lyrics text being a sequence of words in order;
    a text feature extraction module, configured to perform feature extraction on the lyrics text to obtain the text feature mapped from the sequence;
    a feature matching module, configured to perform feature matching between the text feature and lyrics features in a corpus to obtain the lyrics feature corresponding to the text feature;
    a music data generation module, configured to predict, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence, to generate music data adapted to the lyrics text.
  7. The device according to claim 6, wherein the device further comprises:
    a feature extraction module, configured to extract lyrics features from the lyrics sample text in sample data and to extract rhythm features and melody features from the music data in the sample data corresponding to the lyrics sample text;
    a corpus construction module, configured to construct the corpus from the lyrics features, rhythm features, and melody features;
    a training module, configured to perform iterative training of the random forest classifier with the lyrics features, rhythm features, and melody features, and to stop the iterative training once the trained random forest classifier predicts the melody and rhythm of known song texts with a specified accuracy.
  8. The device according to claim 7, wherein the random forest classifier comprises a rhythm classifier and a melody classifier, and the music data generation module comprises:
    a rhythm feature obtaining unit, configured to obtain, through prediction by the rhythm classifier, the rhythm feature corresponding to the lyrics feature;
    a melody feature obtaining unit, configured to input the lyrics feature and the rhythm feature into the melody classifier to predict the melody feature corresponding to the lyrics feature;
    a music data generation unit, configured to combine the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text.
  9. The device according to claim 8, wherein the music data generation unit comprises:
    a note information generation unit, configured to generate, from the obtained rhythm feature and melody feature, the note information corresponding to the words in the sequence;
    a note information combination unit, configured to combine the note information corresponding to the words in the sequence to generate the music data of the lyrics text.
  10. The device according to claim 9, wherein the note information combination unit comprises:
    a note sequence generation unit, configured to combine the note information corresponding to the words in the sequence in the order of the words, to generate the note sequence corresponding to the lyrics text;
    a filtering unit, configured to filter the note sequence according to a set note threshold;
    a music data generation unit, configured to generate the music data of the lyrics text from the filtered note sequence.
  11. A device for generating music for lyrics text based on a random forest, the device comprising:
    a processor; and a memory for storing processor-executable instructions;
    wherein the processor is configured to perform the following steps:
    obtaining lyrics text, the lyrics text being a sequence of words in order;
    performing feature extraction on the lyrics text to obtain the text feature mapped from the sequence;
    performing feature matching between the text feature and lyrics features in a corpus to obtain the lyrics feature corresponding to the text feature;
    predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence, to generate music data adapted to the lyrics text.
  12. The device according to claim 11, wherein before the step of obtaining the lyrics text, the processor performs the following steps:
    extracting lyrics features from the lyrics sample text in sample data, and extracting rhythm features and melody features from the music data in the sample data corresponding to the lyrics sample text;
    constructing the corpus from the lyrics features, rhythm features, and melody features;
    performing iterative training of the random forest classifier with the lyrics features, rhythm features, and melody features, and stopping the iterative training once the trained random forest classifier predicts the melody and rhythm of known song texts with a specified accuracy.
  13. The device according to claim 11 or 12, wherein the random forest classifier comprises a rhythm classifier and a melody classifier, and in the step of predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence to generate music data adapted to the lyrics text, the processor performs the following steps:
    obtaining, through prediction by the rhythm classifier, the rhythm feature corresponding to the lyrics feature;
    inputting the lyrics feature and the rhythm feature into the melody classifier to predict the melody feature corresponding to the lyrics feature;
    combining the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text.
  14. The device according to claim 13, wherein in the step of combining the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text, the processor performs the following steps:
    generating, from the obtained rhythm feature and melody feature, the note information corresponding to the words in the sequence;
    combining the note information corresponding to the words in the sequence to generate the music data of the lyrics text.
  15. The device according to claim 14, wherein in the step of combining the note information corresponding to the words in the sequence to generate the music data of the lyrics text, the processor performs the following steps:
    combining the note information corresponding to the words in the sequence in the order of the words, to generate the note sequence corresponding to the lyrics text;
    filtering the note sequence according to a set note threshold;
    generating the music data of the lyrics text from the filtered note sequence.
  16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the following steps:
    obtaining lyrics text, the lyrics text being a sequence of words in order;
    performing feature extraction on the lyrics text to obtain the text feature mapped from the sequence;
    performing feature matching between the text feature and lyrics features in a corpus to obtain the lyrics feature corresponding to the text feature;
    predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence, to generate music data adapted to the lyrics text.
  17. The computer-readable storage medium according to claim 16, wherein before the step of obtaining the lyrics text, the processor performs the following steps:
    extracting lyrics features from the lyrics sample text in sample data, and extracting rhythm features and melody features from the music data in the sample data corresponding to the lyrics sample text;
    constructing the corpus from the lyrics features, rhythm features, and melody features;
    performing iterative training of the random forest classifier with the lyrics features, rhythm features, and melody features, and stopping the iterative training once the trained random forest classifier predicts the melody and rhythm of known song texts with a specified accuracy.
  18. The computer-readable storage medium according to claim 16 or 17, wherein the random forest classifier comprises a rhythm classifier and a melody classifier, and in the step of predicting, with the trained random forest classifier and from the obtained lyrics feature, the melody and rhythm corresponding to the words in the sequence to generate music data adapted to the lyrics text, the processor performs the following steps:
    obtaining, through prediction by the rhythm classifier, the rhythm feature corresponding to the lyrics feature;
    inputting the lyrics feature and the rhythm feature into the melody classifier to predict the melody feature corresponding to the lyrics feature;
    combining the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text.
  19. The computer-readable storage medium according to claim 18, wherein in the step of combining the obtained rhythm feature and melody feature to generate music data adapted to the lyrics text, the processor performs the following steps:
    generating, from the obtained rhythm feature and melody feature, the note information corresponding to the words in the sequence;
    combining the note information corresponding to the words in the sequence to generate the music data of the lyrics text.
  20. The computer-readable storage medium according to claim 18, wherein in the step of combining the note information corresponding to the words in the sequence to generate the music data of the lyrics text, the processor performs the following steps:
    combining the note information corresponding to the words in the sequence in the order of the words, to generate the note sequence corresponding to the lyrics text;
    filtering the note sequence according to a set note threshold;
    generating the music data of the lyrics text from the filtered note sequence.
PCT/CN2018/106267 2018-07-19 2018-09-18 Method and device for generating music for lyrics text, and computer-readable storage medium WO2020015153A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810798036.7 2018-07-19
CN201810798036.7A CN109166564B (en) 2018-07-19 2018-07-19 Method, apparatus and computer readable storage medium for generating a musical composition for a lyric text

Publications (1)

Publication Number Publication Date
WO2020015153A1 true WO2020015153A1 (en) 2020-01-23

Family

ID=64897874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106267 WO2020015153A1 (en) 2018-07-19 2018-09-18 Method and device for generating music for lyrics text, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN109166564B (en)
WO (1) WO2020015153A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815493B (en) * 2019-01-09 2020-10-27 厦门大学 Modeling method for intelligent hip-hop music lyric generation
CN109584905B (en) * 2019-01-22 2021-09-28 腾讯音乐娱乐科技(深圳)有限公司 Method, terminal and computer readable medium for measuring music speed
CN109920397B (en) * 2019-01-31 2021-06-01 李奕君 System and method for making audio function in physics
CN110148115A (en) * 2019-04-04 2019-08-20 中国科学院深圳先进技术研究院 A kind of screening technique, device and the storage medium of metastasis of cancer prediction image feature
CN110222226B (en) * 2019-04-17 2024-03-12 平安科技(深圳)有限公司 Method, device and storage medium for generating rhythm by words based on neural network
CN110516110B (en) * 2019-07-22 2023-06-23 平安科技(深圳)有限公司 Song generation method, song generation device, computer equipment and storage medium
CN110516103B (en) * 2019-08-02 2022-10-14 平安科技(深圳)有限公司 Song rhythm generation method, device, storage medium and apparatus based on classifier
CN110517656B (en) * 2019-08-02 2024-04-26 平安科技(深圳)有限公司 Lyric rhythm generation method, device, storage medium and apparatus
CN112309353A (en) * 2020-10-30 2021-02-02 北京有竹居网络技术有限公司 Composing method and device, electronic equipment and storage medium
CN112489606B (en) * 2020-11-26 2022-09-27 北京有竹居网络技术有限公司 Melody generation method, device, readable medium and electronic equipment
CN113066456B (en) * 2021-03-17 2023-09-29 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating melody based on Berlin noise
CN113035161A (en) * 2021-03-17 2021-06-25 平安科技(深圳)有限公司 Chord-based song melody generation method, device, equipment and storage medium
CN113920968A (en) * 2021-10-09 2022-01-11 北京灵动音科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116645957B (en) * 2023-07-27 2023-10-03 腾讯科技(深圳)有限公司 Music generation method, device, terminal, storage medium and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902642A (en) * 2012-12-21 2014-07-02 香港科技大学 Music composition system using correlation between melody and lyrics
CN105513607A (en) * 2015-11-25 2016-04-20 网易传媒科技(北京)有限公司 Method and apparatus for music composition and lyric writing
CN106652984A (en) * 2016-10-11 2017-05-10 张文铂 Automatic song creation method via computer
CN106991993A (en) * 2017-05-27 2017-07-28 佳木斯大学 A kind of mobile communication terminal and its composing method with music composing function

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3309766B2 (en) * 1996-05-27 2002-07-29 ヤマハ株式会社 Automatic melody generator and recording medium
JP3932258B2 (en) * 2002-01-09 2007-06-20 株式会社ナカムラ Emergency escape ladder
WO2016029217A1 (en) * 2014-08-22 2016-02-25 Zya, Inc. System and method for automatically converting textual messages to musical compositions
CN104391980B (en) * 2014-12-08 2019-03-08 百度在线网络技术(北京)有限公司 The method and apparatus for generating song
CN106652997B (en) * 2016-12-29 2020-07-28 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal
CN108268530B (en) * 2016-12-30 2022-04-29 阿里巴巴集团控股有限公司 Lyric score generation method and related device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339350A (en) * 2020-03-27 2020-06-26 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN111339350B (en) * 2020-03-27 2023-11-28 腾讯音乐娱乐科技(深圳)有限公司 Data processing method and device, storage medium and electronic equipment
CN111754962A (en) * 2020-05-06 2020-10-09 华南理工大学 Folk song intelligent auxiliary composition system and method based on up-down sampling
CN111754962B (en) * 2020-05-06 2023-08-22 华南理工大学 Intelligent auxiliary music composing system and method based on lifting sampling
CN112309435A (en) * 2020-10-30 2021-02-02 北京有竹居网络技术有限公司 Method and device for generating main melody, electronic equipment and storage medium
CN112951187A (en) * 2021-03-24 2021-06-11 平安科技(深圳)有限公司 Van-music generation method, device, equipment and storage medium
CN112951187B (en) * 2021-03-24 2023-11-03 平安科技(深圳)有限公司 Var-bei music generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109166564B (en) 2023-06-06
CN109166564A (en) 2019-01-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18927008

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18927008

Country of ref document: EP

Kind code of ref document: A1