CN110660375B - Method, device and equipment for generating music - Google Patents

Method, device and equipment for generating music

Info

Publication number
CN110660375B
Authority
CN
China
Prior art keywords
music
target
melody information
learning model
melody
Prior art date
Legal status
Active
Application number
CN201810689003.9A
Other languages
Chinese (zh)
Other versions
CN110660375A (en)
Inventor
周伟浩
关键
张喜梅
张亚鹏
肖彬
夏丁胤
余浩
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201810689003.9A
Publication of CN110660375A
Application granted
Publication of CN110660375B
Status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G10H1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a method for generating music, which comprises the following steps: acquiring first melody information in input audio; and generating target music according to the first melody information and a music learning model. With the method provided by the embodiments of the invention, when a user wants to create a piece of music on the fly, the user only needs to create part of its melody, and the complete melody can be predicted based on the partial melody input by the user. The user is thus provided with a piece of music he or she participated in creating and can express his or her inner feelings. In addition, the invention also discloses a device and equipment for generating music.

Description

Method, device and equipment for generating music
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for generating music.
Background
Currently, music applications are widely used. A typical music application can find and provide the music a user desires. Specifically, for existing music applications, the server collects in advance the audio of ready-made, already-completed music, and the user obtains the desired music audio from this collection through a client.
As users' demands for music become more and more personalized, a user may sometimes want to create a piece of music on the fly. However, the user often cannot create the complete melody of a piece because of insufficient expertise, insufficient inspiration, limited creation time, or the lack of an instrument to assist in composing. In this case, existing music applications cannot provide the user with a piece of music created by the user on the spot.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, a device and equipment for generating music, so as to predict the complete melody of a piece of music based on a partial melody input by a user, thereby providing music audio created with the user's participation.
In a first aspect, an embodiment of the present invention provides a method of generating a musical composition, the method including:
Acquiring first melody information in input audio;
And generating target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
Optionally, the method further comprises:
Acquiring first historical melody information and second historical melody information matched with the first historical melody information;
And training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
judging whether the second note string meets a preset condition or not;
and if the second note string meets the preset condition, determining target music based on the second note string.
Optionally, the preset condition is that the duration of the melody corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
searching a plurality of preset primer melodies for a target primer melody matched with the first melody information;
inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance based on the correspondence between primer melodies and the complete melody information in historical music.
Optionally, the music learning model includes a plurality of music learning models, and the training of the music learning model according to the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
And training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Determining a first target music learning model from the music learning models in response to a selection operation by a user;
And generating the target music according to the first melody information and the first target music learning model.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
Determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matched with the second music type from the music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring the first melody information in the input audio includes:
acquiring input audio;
If the format of the input audio is not midi format, converting the input audio into midi format to obtain the input audio in midi format;
the first melody information is extracted from the midi format input audio.
Optionally, the method further comprises:
responding to the editing operation of a user on the target music, and editing the target music according to the editing operation;
or
And editing the target music according to preset editing rules.
Optionally, the method further comprises:
and responding to the operation of the user on the target music, and taking the target music as a historical music piece.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a musical composition, the apparatus including: an acquisition unit and a generation unit;
the acquisition unit is used for acquiring first melody information in the input audio;
the generating unit is used for generating target music according to the first melody information and the music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
Optionally, the apparatus further includes: a training unit;
the acquisition unit is further used for acquiring first historical melody information and second historical melody information matched with the first historical melody information;
the training unit is further configured to train the music learning model according to a correspondence between the first historical melody information and the second historical melody information.
Optionally, the generating unit is further configured to obtain a first note string corresponding to the first melody information, and input the first note string to the music learning model to obtain a second note string generated by the music learning model; judging whether the second note string meets a preset condition or not; and if the second note string meets the preset condition, determining target music based on the second note string.
Optionally, the preset condition is that the duration of the melody corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating unit is further configured to search a plurality of preset primer melodies for a target primer melody that matches the first melody information; and input the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance based on the correspondence between primer melodies and the complete melody information in historical music.
Optionally, the music learning model includes a plurality of music learning models, and the training unit is further configured to acquire the first historical melody information and the second historical melody information of different music types; and train the music learning models of the different music types according to the correspondence among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating unit is further configured to determine a first target music learning model from the music learning models in response to a selection operation of a user; and generating the target music according to the first melody information and the first target music learning model.
Optionally, the generating unit is further configured to acquire historical behavior data of a user, and extract historical music information from the historical behavior data; determine a second music type corresponding to the historical music information, and acquire a second target music learning model matched with the second music type from the music learning models; and generate the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring unit is further configured to acquire input audio; if the format of the input audio is not midi format, converting the input audio into midi format to obtain the input audio in midi format; the first melody information is extracted from the midi format input audio.
Optionally, the apparatus further includes: an editing unit;
the editing unit is used for responding to the editing operation of a user on the target music, and editing the target music according to the editing operation; or editing the target music according to preset editing rules.
Optionally, the apparatus further includes: a determination unit;
The determining unit is configured to respond to an operation of a user on the target music and take the target music as a historical music piece.
In a third aspect, an embodiment of the present invention further provides a device for generating music, which includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the method of generating music according to any one of the first aspects.
In a fourth aspect, embodiments of the present invention also provide a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of generating music according to any one of the first aspects.
Compared with the prior art, the embodiment of the invention has the following advantages:
According to the method provided by the embodiments of the invention, the first melody information in the input audio can be acquired first, and the target music can then be generated according to the first melody information and the music learning model. Therefore, in the embodiments of the invention, when a user wants to create a piece of music on the fly, the user only needs to create part of its melody, and the complete melody can be predicted based on the partial melody input by the user, so that the user can participate in creating the music and express his or her inner feelings.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an exemplary application scenario in an embodiment of the present invention;
FIG. 2 is a flow chart of a method for generating a musical composition according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for determining target melody information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target music learning model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an apparatus for generating music according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a device for generating music according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The inventors found that, as users' demands for music become more and more personalized, a user may sometimes want to create a piece of music on the fly to express his or her inner feelings. However, because the user often lacks expertise, inspiration or creation time, or has no instrument at hand to assist in composing, the user often cannot create the complete melody of a piece. In this case, existing music applications cannot provide the user with music audio created by the user on the spot.
To solve the above problems, in the embodiments of the present invention, when a user wants to create a piece of music on the fly, first melody information input by the user may be acquired first, and target music may then be generated based on the first melody information and a music learning model. For example, if the user inputs first melody information containing four notes, the music learning model may generate target music that includes those four notes. Thus, the user only needs to create part of the melody, and the complete melody can be predicted from that part, so that the user is provided with music he or she participated in creating and can express his or her inner feelings.
By way of example, the embodiments of the present invention may be applied to the scenario shown in fig. 1. In this scenario, the terminal device 102 may be any device capable of capturing audio input (e.g., equipped with a microphone), such as a mobile phone, an iPad or a desktop computer. First, the user 101 may input audio using the microphone of the terminal device 102 and may operate on the input audio on the terminal device 102 to trigger it to generate the target music based on the first melody information in the input audio. In response to the operation of the user 101, the terminal device 102 may transmit the first melody information in the input audio to the server 103. The server 103 may then acquire the first melody information, and the music learning model in the server 103 may generate target music from it. Finally, the server 103 may transmit the target music to the terminal device 102 so that the terminal device 102 can play it to the user 101.
It will be appreciated that, although the actions of the embodiments of the present invention are described above as being performed by the server 103, these actions may also be performed partly by the terminal device 102 and partly by the server 103, or entirely by the terminal device 102. The present invention is not limited in terms of the execution subject, provided that the operations disclosed in the embodiments of the present invention are performed.
It should be noted that the above application scenario is only shown for the convenience of understanding the present invention, and embodiments of the present invention are not limited in this respect. Rather, embodiments of the invention may be applied to any scenario where applicable.
Various non-limiting embodiments of the present invention are described in detail below with reference to the attached drawing figures.
Exemplary method
Referring to fig. 2, a method of generating a musical composition in an embodiment of the present invention is shown. In this embodiment, the method may include, for example, the steps of:
s201: first melody information in the input audio is acquired.
In this embodiment, a target user may input audio to provide the first melody information. The input audio may be audio input by the target user through a terminal device, for example, audio of the user humming recorded by the terminal device's microphone, audio synthesized from notes entered on a keyboard of the terminal device, or a piece of audio prestored on the target user's terminal device. The input audio may be an audio file in MP3 format, WAV format, etc.
Note that melody information may be information including a note string, and the first melody information may be information including a note string in the input audio. For example, the first melody information may be a TFRecord-format file extracted from the input audio: the input audio may be converted into a TFRecord-format file, data representing the melody and data representing the polyphony may be extracted from that file, and the two kinds of data may then be assembled into a TFRecord-format melody file and a TFRecord-format polyphony file respectively, so that these files may serve as the first melody information.
For example, after a user hums the melody "3 (mi) 4 (fa) 5 (sol) 1 (do)" and a recording (i.e., the input audio) is generated, the recording may be converted into a TFRecord-format file in response to the user's triggering operation on it, and the first melody information including the note string "3451" may then be extracted from that file. The extraction flow can be sketched as follows.
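The following is a minimal sketch in Python of this extraction step, assuming the open-source note_seq library (whose NoteSequence data can be serialized to TFRecord files, matching the description above); the file name and the mapping to numbered notation are illustrative assumptions, not part of the patent.

    import note_seq

    # Map a MIDI pitch class to numbered notation in C major; chromatic
    # pitches are skipped in this illustrative version.
    DEGREE = {0: "1", 2: "2", 4: "3", 5: "4", 7: "5", 9: "6", 11: "7"}

    def extract_note_string(midi_path: str) -> str:
        """Return e.g. "3451" for a hummed mi-fa-sol-do melody."""
        sequence = note_seq.midi_file_to_note_sequence(midi_path)
        ordered = sorted(sequence.notes, key=lambda n: n.start_time)
        return "".join(DEGREE.get(n.pitch % 12, "") for n in ordered)

    print(extract_note_string("humming.mid"))  # hypothetical recording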
S202: and generating target music according to the first melody information and the music learning model.
In this embodiment, after the first melody information in the input audio is obtained, target music may be generated from it through the music learning model; the target music may be a music score or an audio file. The target music may include the first melody information and second melody information matching it. For example, the target music may include the first melody information and subsequent melody information following it, i.e., the second melody information may be the melody that continues after the first melody information; or the target music may include the first melody information and melody information that embellishes it, i.e., the second melody information may embellish the first melody information (for example, chord information added to the first melody information). The music learning model may be a deep learning model capable of generating music from melody information, for example an RNN model.
It should be noted that this embodiment provides several ways of obtaining the target music from the first melody information and the music learning model. Two implementations are described next.
In one possible implementation, the first melody information may be input into the music learning model to obtain the target music generated by the model. The music learning model may be trained in advance based on the correspondence between first historical melody information and second historical melody information. Specifically, in this embodiment, the first historical melody information and the second historical melody information matching it may be obtained first, and the music learning model may then be trained according to the correspondence between the two. It should be noted that, in the complete melody information of a historical piece of music, the second historical melody information may follow the first historical melody information; or the second historical melody information may include the first historical melody information together with the subsequent historical melody information following it; or the second historical melody information may be historical melody information that embellishes the first historical melody information (for example, historical chord information added to the first historical melody information). A historical piece of music may be one the user has operated on, for example, music the user has listened to on a terminal device, music the user has recorded with singing software such as Changba, or music the user has shared with friends through chat software (e.g., WeChat). Thus, inputting the first melody information into the music learning model yields the target music corresponding to it.
Specifically, after the input first melody information is acquired, the first melody information may be input into a music learning model, which may generate a target music corresponding to the first melody information for the first melody information. It should be emphasized that the specific manner of generating the target musical composition in this embodiment will be described in detail later.
For example, assuming the acquired first melody information includes the note string "3451", the first melody information may be input into the music learning model. The model may generate target music corresponding to "3451" and including the note string "34515234515", i.e., the complete melody information of the target music includes the note string "3451" from the first melody information followed by the note string "5234515" of the subsequent melody (that is, the second melody information matching "3451"). Alternatively, the model may generate target music corresponding to "3451" and including the note string "34515234", i.e., the complete melody information of the target music includes the note string "3451" from the first melody information and the note string "5234" of the chord information added to it (that is, the second melody information embellishing "3451").
In another possible implementation, a target primer melody matching the first melody information may be searched for among a plurality of preset primer melodies, and the target primer melody may be input into the music learning model to obtain the target music it generates. The primer melody matching the first melody information may be referred to as the target primer melody. It should be noted that the music learning model may be trained in advance based on the correspondence between primer melodies and the complete melody information of historical music; the complete melody information of a historical piece may include the primer melody and the historical subsequent melody information following it. Thus, by inputting the target primer melody matching the first melody information into the music learning model, the target music corresponding to the target primer melody can be obtained.
In this way, a plurality of primer melodies may be preset; when the input first melody information matches one of them, this indicates that the music learning model can generate target music corresponding to the first melody information. Therefore, when the input first melody information is acquired, it may be matched against the plurality of primer melodies in turn, and if one of the preset primer melodies matches the first melody information, that primer melody may be input into the music learning model to obtain the target music corresponding to the first melody information.
For example, assume the plurality of preset primer melodies includes a primer melody A whose note string is "3451". When the acquired first melody information includes the note string "3451", a primer melody matching "3451" may be searched for among the preset primer melodies according to that note string. Since "3451" matches primer melody A, primer melody A can be taken as the target primer melody corresponding to "3451" and input into the music learning model. The model may then generate target music corresponding to primer melody A and including the note string "34515234515", i.e., the complete melody information of the target music includes primer melody A and the note string "5234515" of the subsequent melody (that is, the second melody information matching primer melody A). A sketch of the lookup follows.
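A minimal sketch of the primer-melody lookup, under the assumption that "matching" means exact note-string equality (the description does not pin down the matching rule); the preset table and names are illustrative.

    # Hypothetical table of preset primer melodies, keyed by note string.
    PRESET_PRIMERS = {"3451": "primer A", "1155": "primer B"}

    def find_target_primer(first_melody: str) -> str | None:
        """Return the name of the primer melody matching the input, if any."""
        return PRESET_PRIMERS.get(first_melody)

    primer = find_target_primer("3451")  # -> "primer A"
    if primer is None:
        print("no preset primer matches the input melody")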
Next, the implementation of "generating a target musical composition from the first melody information and the musical composition learning model" mentioned in the description about S202 will be specifically described.
As shown in fig. 3, the implementation of "generating a target musical composition from the first melody information and the musical composition learning model" may include the steps of:
S202a: and acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model.
In this embodiment, for convenience of description, a note string to be currently processed may be referred to as a first note string, and a note string obtained by the music learning model from the input first note string may be referred to as a second note string.
Specifically, after the first melody information is obtained, the first note string corresponding to it may be determined; the notes in the first note string may then be input into the music learning model, which may generate a second note string from them. The second note string may include the first note string and a predicted note string appended after it, the predicted note string being the note string the music learning model derives from the first note string; of course, the second note string may also include only the predicted note string.
Note that the number of notes in the predicted note string may be preset. For example, the number of notes in the predicted note string may be fixed, for example, assuming that the first note string is "2265" and the number of notes of the predicted note string is set to two, the music learning model may generate a second note string "226574" from the first note string "2265", wherein the note string "74" in the second note string may be the predicted note string corresponding to the first note string "2265". For another example, the number of notes in the predicted note string may be determined according to the number of notes of the first note string, for example, assuming that the first note string is "2265" and the number of notes of the predicted note string may be the same as the number of notes in the first note string, then the music learning model may generate a second note string "22657487" according to the first note string "2265", wherein the note string "7487" in the second note string may be the predicted note string corresponding to the first note string "2265".
Next, S202a is explained assuming the predicted note string in the second note string is set to two notes. After the first melody information is acquired, since it includes the note string "345345", that note string may be obtained from the first melody information and taken as the first note string. The first note string "345345" may then be input into the music learning model; as shown in fig. 4, the music learning model may be an RNN model whose units all have a Long Short-Term Memory (LSTM) structure. The model may then generate a second note string "34534576" from the first note string "345345", which includes the first note string "345345" and the predicted notes "76" appended after it. It should be emphasized that in an LSTM structure all input notes influence the prediction of the predicted notes, i.e., if the input notes change, the notes predicted by the LSTM-structured RNN model will change accordingly.
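As a concrete reading of fig. 4, the following is a minimal sketch of an LSTM-structured RNN next-note model, written with Keras for definiteness (the patent names no framework); the vocabulary, layer sizes and the omission of training are illustrative assumptions.

    import numpy as np
    import tensorflow as tf

    VOCAB = 8  # numbered-notation degrees 1-7 plus a padding symbol 0

    # Embedding -> LSTM -> softmax over the next note; the LSTM state lets
    # every input note influence the prediction, as noted above.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, 16),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(VOCAB, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    # (Training on historical melody pairs is omitted in this sketch.)

    def predict_next_note(note_string: str) -> str:
        """Predict one note to append after e.g. "345345"."""
        x = np.array([[int(c) for c in note_string]])
        probs = model.predict(x, verbose=0)[0]
        return str(int(np.argmax(probs[1:])) + 1)  # ignore the padding symbol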
S202b: judging whether the second note string meets a preset condition or not; if yes, S202c is executed, and if no, S202d is executed.
In this embodiment, the preset condition may be that the duration of the melody corresponding to the second note string reaches the preset duration of the target music. The duration of the melody may be determined by the number of notes in the note string, for example, the duration of the melody corresponding to the note string including eight notes is typically 1 second, and the duration of the melody corresponding to the note string including sixteen notes is typically 2 seconds.
Thus, after the second note string is determined, the number of notes in the second note string may be determined first. Then, the melody duration of the second note string may be determined according to the number of notes. Then, it may be judged whether or not the melody duration of the second note string satisfies the preset duration of the target musical composition.
It should be noted that, the preset duration of the target music may be preset by the system. Of course, the preset duration of the target music may also be preset by the user according to the actual requirement, for example, when the user wants to create a piece of music for a recitation program with a duration of three minutes, the preset duration of the target music may be set to three minutes, and for example, when the user wants to create a piece of music for a middle rest of a basketball game, the preset duration of the target music may be set to one minute.
Continuing with the second note string "34534576" from S202a: after the music learning model generates it from the first note string, the number of notes in "34534576" may first be determined to be eight; the melody duration of the second note string may then be determined to be 0.8 seconds; and it may then be judged whether this 0.8-second melody duration satisfies the preset duration of the target music.
S202c: a target musical composition is determined based on the second note string.
In this embodiment, if the second note string meets the preset condition, it is indicated that the melody duration of the second note string meets the preset duration of the target music. Thus, the target musical composition can be determined based on the second note string.
Continuing with the second note string "34534576" from S202b: if the preset duration of the target music is 0.8 seconds, the 0.8-second melody duration of the second note string "34534576" satisfies it, so the target music corresponding to the first note string "345345" may be generated based on the second note string "34534576".
S202d: taking the second note string as the first note string, and returning to the steps from S202a to S202d until the second note string meets the preset condition.
In this embodiment, if the second note string does not satisfy the preset condition, it is indicated that the melody duration of the second note string does not satisfy the preset duration of the target melody information. Accordingly, the steps in S202a to S202d may be continued for the second note string until the second note string satisfies a preset condition. Continuing with the example of "second note string '34534576'" in S202b, if the preset duration of the target musical composition is 3 minutes, the melody duration of the second note string "34534576" is 0.8 seconds, which does not satisfy the preset duration of the target musical composition. Thus, the second note string "34534576" can be taken as a first note string, and the first note string "4534576" can be input into the music learning model, and the subsequent steps can be continued.
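Steps S202a to S202d can be summarized as the following loop, which reuses the hypothetical predict_next_note() sketched above; the notes-per-second rate is taken from the duration examples in S202b and is an assumption, not a fixed rule of the patent.

    NOTES_PER_SECOND = 8  # e.g., eight notes correspond to about one second

    def melody_duration(note_string: str) -> float:
        return len(note_string) / NOTES_PER_SECOND

    def generate_target_notes(first_notes: str, target_seconds: float) -> str:
        notes = first_notes
        while melody_duration(notes) < target_seconds:  # S202b/S202d: check
            notes += predict_next_note(notes)           # S202a: extend
        return notes                                    # S202c: target piece

    print(generate_target_notes("345345", 1.0))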
It can be seen that, in the embodiment of the present invention, the first melody information in the input audio may be acquired first, and the target music may then be generated according to the first melody information and the music learning model. Therefore, when a user wants to create a piece of music on the fly, the user only needs to create part of its melody, and the complete melody can be predicted based on the partial melody input by the user, so that the user can participate in creating the music and express his or her inner feelings.
When the target user's input audio is an audio file in a format such as MP3, WMA or WAV, a large amount of running memory is needed to process it, because files in these formats occupy considerable storage space. Moreover, since such files require multiple rounds of data processing before they can be converted into TFRecord format so that the first melody information can be extracted, extracting the first melody information from MP3, WMA or WAV audio involves many processing steps and is therefore complex.
To simplify the process of extracting the first melody information and reduce the running memory it requires, in one implementation of the embodiment of the present invention, S201 may include the following steps:
S201a: the input audio is extracted.
S201b: and if the format of the input audio is not midi format, converting the input audio into midi format to obtain the input audio in midi format.
S201c: the first melody information is extracted from the midi format input audio.
It should be noted that, since input audio in midi format occupies little storage space, processing it does not require a large amount of running memory. Furthermore, only simple processing is needed to extract the first melody information from midi-format input audio: for example, the midi-format input audio can be converted directly into a TFRecord-format file, from which the first melody information can then be extracted. The extraction process is thus also simplified.
As an example, after acquiring the input audio of the target user 101, the terminal device 102 may determine whether the input audio is in midi format. If so, the terminal device 102 may send the input audio to the server 103, and the server 103 may extract the first melody information from the midi-format input audio. If not, the terminal device 102 may first convert the input audio into midi format and then send the midi-format audio to the server 103, which extracts the first melody information from it. In this way, the extraction of the first melody information is simpler and faster and uses less running memory, which shortens data processing and hence the time needed to generate the target music. A sketch of this flow follows.
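A minimal sketch of steps S201a to S201c; the transcription call is a labeled stand-in, since converting recorded humming into midi requires a pitch-tracking step the patent leaves unspecified, and extract_note_string() is the hypothetical helper sketched earlier.

    from pathlib import Path

    def transcribe_to_midi(audio_path: str, midi_path: str) -> None:
        """Stand-in for an audio-to-midi transcription tool; the patent only
        states that the conversion happens, not how it is performed."""
        raise NotImplementedError("plug in a transcription tool here")

    def ensure_midi(audio_path: str) -> str:
        """Return a path to midi data for the input audio, converting if needed."""
        if Path(audio_path).suffix.lower() in (".mid", ".midi"):
            return audio_path                      # S201b: already midi
        midi_path = str(Path(audio_path).with_suffix(".mid"))
        transcribe_to_midi(audio_path, midi_path)  # S201b: convert to midi
        return midi_path

    # S201c: extract the first melody information from the midi-format audio.
    first_melody = extract_note_string(ensure_midi("humming.mid"))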
Further, after the music learning model generates the target music, the user may, for example, wish to use the chorus of the target music as a ringtone, i.e., the user is interested only in the chorus. The target music then needs to be edited so that only the chorus is kept and the other parts are deleted.
Therefore, after the target music generated by the music learning model is obtained in S202, it can be edited according to the user's requirements or preset requirements. This embodiment provides several ways of editing the target music; two are described below.
In one implementation, the target music may be edited in response to, and in accordance with, an editing operation by the user. The editing operation may include clipping the target music, modifying its melody, and the like. For example, when the user is dissatisfied with a passage of the target music, after the passage is rearranged, the corresponding passage in the target music may be modified according to the user's operation and the modified target music output.
In another implementation, the target music may be edited according to a preset editing rule. For example, assume the preset editing rule is to clip the target music and keep only the portion corresponding to its first minute; after the target music is generated, its first minute may be clipped out and stored according to the preset editing rule, i.e., output as the edited target music. One concrete reading of this rule is sketched below.
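As one concrete reading of the preset-rule case, the following sketch keeps only the first minute of the generated audio; pydub is an assumed choice of library, not one named by the patent.

    from pydub import AudioSegment

    def keep_first_minute(in_path: str, out_path: str) -> None:
        audio = AudioSegment.from_file(in_path)
        audio[:60_000].export(out_path, format="mp3")  # pydub slices in ms

    keep_first_minute("target_piece.mp3", "target_piece_edited.mp3")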
It can be seen that, in this embodiment, after the target music generated by the music learning model is obtained, the part of it the user is interested in can be edited according to the user's requirements or preset requirements, so that the edited target music meets the user's expectations. The user can thus participate in the created music and express his or her inner feelings.
Further, in order to make the generated target music better fit the target user's requirements, the music learning model may, for example, be trained with historical music the target user participated in creating. Accordingly, in one implementation of this embodiment, the following step may further be included after S202:
In response to the target user saving or sharing the target music, the target music is taken as historical music, so that the target music can be used to train the music learning model.
In this embodiment, after the target music is generated, if the target user saves or shares it, the target music may be used as historical music for training the music learning model. For example, the target user may save the target music locally on the terminal device, share it with friends via instant-messaging software, or share it to a circle of friends in the instant-messaging software. In this way, the target melody information determined from the output of the music learning model better matches the target user's creative requirements, so that the target music generated from it better expresses the target user's inner feelings.
Further, music can be classified into various types, for example classical music and popular music. The type of target music a user wants to generate may therefore differ between scenes: for example, when the user is enjoying the scenery of an ancient town with a long history, the desired target music may be classical, while at a birthday party it may be rock.
Therefore, in order to generate different types of music according to different needs of the user, in one implementation manner of the present embodiment, the music learning model may include a plurality of music learning models, where each music learning model may correspond to one music type, and each music learning model may be obtained by training according to a correspondence between the first historical melody information and the second historical melody information. In one possible implementation, the training manner of the plurality of music learning models may be: the first historical melody information and the second historical melody information of different music types are obtained, and the music learning models of the different music types can be obtained through training according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
The music type may be a classification along different dimensions. For example, when the music type is determined by the instrument playing the music, it may include at least piano, guitar, erhu, etc.; when the music type is determined by the playing environment, it may include at least indoor, outdoor, KTV, etc.; and when the music type is determined by the manner of performance, it may include at least chord, mixing, etc.
Specifically, in this embodiment, after the first historical melody information and the matching second historical melody information are obtained, the music type of the first and second historical melody information may be determined first. A music learning model corresponding to that music type may then be determined from the plurality of music learning models, and that model may be trained according to the correspondence between the first historical melody information and the second historical melody information. In this way, music learning models for different music types can be trained from the first and second historical melody information of those types. That is, training on the first and second historical melody information under different music types yields different models, each music learning model corresponding to one music type.
For example, training the music learning model on first and second historical melody information of the simple-chord type yields a model corresponding to the simple-chord type; training it on first and second historical melody information of the piano type yields a model corresponding to the piano type. Thus, inputting the same first melody information into models corresponding to different music types produces target music of entirely different types. The arrangement can be sketched as follows.
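A minimal sketch of the per-type training arrangement; the training pairs are illustrative, and build_model() and train() are labeled stand-ins for the model construction and fitting steps sketched earlier.

    from collections import defaultdict

    def build_model():
        """Stand-in for constructing one next-note model (see the LSTM sketch)."""
        return object()

    def train(model, pairs):
        """Stand-in for fitting a model on (first, second) melody pairs."""
        pass

    # Hypothetical historical data: (music type, first melody, matching second).
    history = [
        ("piano", "3451", "5234515"),
        ("simple_chord", "2265", "7487"),
    ]

    pairs_by_type = defaultdict(list)
    for music_type, first, second in history:
        pairs_by_type[music_type].append((first, second))

    # One music learning model per music type.
    models = {music_type: build_model() for music_type in pairs_by_type}
    for music_type, pairs in pairs_by_type.items():
        train(models[music_type], pairs)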
It should be noted that, when the music learning model includes a plurality of music learning models and different models correspond to different music types, the embodiments of the present application further provide several ways of generating the target music according to the first melody information and the music learning model (i.e., several implementations of S202), each using the corresponding model to generate target music of a given type. Two of these ways are described next:
The first way is: determining a first target music learning model from the music learning models in response to a selection operation by a user; and generating the target music according to the first melody information and the first target music learning model.
In this way, the music type selected by the user may be referred to as the first music type; for example, the first music type may be the simple-chord type or the piano type. The music learning model matching the first music type may be referred to as the first target music learning model.
Specifically, after inputting the first melody information, the user may select one of the plurality of music types as the first music type as needed. In response to the user's selection, a first target music learning model may be determined from the plurality of music learning models according to the first music type. The first melody information may then be input into the first target music learning model, which may generate target music belonging to the first music type based on it. The user can thus obtain target music whose type is the first music type.
For example, assume the user inputs a piece of self-hummed audio through a mobile phone and selects the piano type from several music types such as the simple-chord type and the piano type; the piano type is then the first music type. In response to the user's selection, the music learning model corresponding to the piano type may be determined from the plurality of music learning models and taken as the first target music learning model. The audio hummed by the user is then input into the first target music learning model, which may generate target music of the piano type. The user thus obtains target music whose first music type is the piano type.
Therefore, in this way, after the user selects the first music type of the target music, the model corresponding to the first music type can be determined and used as the first target music learning model, which can then generate target music of the type the user selected. The needs of different users for generating different types of target music can thus be met, as the following sketch shows.
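A sketch of the first way; models is the hypothetical per-type dictionary from the previous sketch, and generate() is a labeled stand-in for the S202a-S202d generation loop.

    def generate(model, first_melody: str) -> str:
        """Stand-in for running the generation loop with the given model."""
        return first_melody  # placeholder output

    def generate_for_selected_type(first_melody: str, selected_type: str) -> str:
        first_target_model = models[selected_type]  # e.g., "piano"
        return generate(first_target_model, first_melody)

    piece = generate_for_selected_type("3451", "piano")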
The second way is: acquiring historical behavior data of the user and extracting historical music information from it; determining the second music type corresponding to the historical music information and acquiring from the plurality of music learning models a second target music learning model matching the second music type; and generating the target music according to the first melody information and the second target music learning model.
In this way, the user's historical behavior data may be a record of the user's past operations on devices such as a mobile phone, an iPad or a computer, and may include historical music information on which the user has operated: music the user has listened to on portable devices such as a mobile phone or an iPad, music the user has downloaded on a mobile phone or a computer, music the user has collected in music software, music the user has recorded with singing software such as Changba, music the user has shared with friends through chat software (e.g., WeChat), and so on.
The music type corresponding to the historical music information extracted from the user's historical behavior data may be referred to as the second music type; for example, it may be the simple-chord type or the piano type. The music learning model matching the second music type may be referred to as the second target music learning model.
Specifically, after the user inputs the first melody information, the user's historical behavior data may first be acquired and the historical music information extracted from it. The second music type corresponding to that information may then be determined, and a second target music learning model acquired from the plurality of music learning models according to the second music type. Next, the first melody information may be input into the second target music learning model, which may generate target music belonging to the second music type based on it. The user can thus obtain target music whose type is the second music type. It should be noted that, in one possible implementation, the historical music used to train the second target music learning model may be the historical music information extracted from the user's historical behavior data.
For example, after a user inputs a piece of hummed audio through a mobile phone, the user's historical behavior data in a music player may first be acquired and the historical music information extracted from it, for example: Piano Concerto No. 21, Piano Concerto No. 1 and Piano Concerto No. 2. It may then be determined that the music type corresponding to this historical music information is the piano type, which is taken as the second music type. A second target music learning model corresponding to the piano type may then be acquired from the plurality of music learning models, the audio hummed by the user may be input into it, and the second target music learning model may generate target music belonging to the piano type.
Therefore, after the user's input audio is acquired, the user's historical behavior data can be acquired, the historical music information extracted from it, the corresponding second music type determined, and the matching second target music learning model acquired from the plurality of music learning models. The target music can then be generated using the second target music learning model and the first melody information. Determining the second target music learning model from the user's historical music information and generating the target music with it makes the generated target music more targeted and personalized, as the following sketch illustrates.
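A sketch of the second way, reusing the stand-ins above; the majority-vote rule for determining the second music type is an assumption, since the patent only states that the type is determined from the history.

    from collections import Counter

    def infer_second_type(titles_to_types: dict) -> str:
        """Pick the most frequent type among the user's historical pieces."""
        return Counter(titles_to_types.values()).most_common(1)[0][0]

    user_history = {
        "Piano Concerto No. 21": "piano",
        "Piano Concerto No. 1": "piano",
        "Piano Concerto No. 2": "piano",
    }
    second_type = infer_second_type(user_history)  # -> "piano"
    piece = generate(models[second_type], "3451")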
Exemplary apparatus
Referring to fig. 5, there is shown an apparatus for generating a musical composition according to an embodiment of the present invention, the apparatus including: an acquisition unit 501 and a generation unit 502;
the acquiring unit 501 is configured to acquire first melody information in an input audio;
the generating unit 502 is configured to generate a target music according to the first melody information and a music learning model, where the target music includes the first melody information and second melody information matched with the first melody information.
Optionally, the apparatus further includes: a training unit;
The acquiring unit 501 is further configured to acquire first historical melody information and second historical melody information that matches the first historical melody information;
the training unit is further configured to train the music learning model according to a correspondence between the first historical melody information and the second historical melody information.
Optionally, the generating unit 502 is further configured to obtain a first note string corresponding to the first melody information and input it to the music learning model to obtain a second note string generated by the model; determine whether the second note string meets a preset condition; and, if it does, determine the target music based on the second note string.
Optionally, the preset condition is that the duration of the melody corresponding to the second note string reaches the preset duration of the target music.
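As an illustration of this generation loop and the duration-based preset condition, the following sketch extends the note string until the melody reaches the target duration. The predict_next interface and the (pitch, duration) note representation are assumptions of the sketch, not part of the embodiment.

```python
def generate_note_string(model, first_notes, target_seconds):
    """Extend the first note string until the preset duration is reached."""
    notes = list(first_notes)                           # (midi_pitch, duration_s) pairs
    while sum(d for _, d in notes) < target_seconds:    # the preset condition
        notes.append(model.predict_next(notes))         # the second note string grows
    return notes

class EchoModel:
    """Stand-in for a trained music learning model; it just repeats the last note."""
    def predict_next(self, notes):
        return notes[-1]

melody = generate_note_string(EchoModel(), [(60, 0.5), (62, 0.5)], target_seconds=3.0)
print(melody)  # the 1.0 s seed grows until it covers at least 3.0 s
```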
Optionally, the generating unit 502 is further configured to search a plurality of preset starter melodies for a target starter melody that matches the first melody information, and to input the target starter melody into the music learning model to obtain the target music generated by the model;
wherein the music learning model is trained in advance on the correspondence between starter melodies and the complete melody information of historical music.
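For illustration, the sketch below matches the first melody information against the preset starter melodies using a naive pitch-distance measure; the distance function is an assumption of the sketch, and a production system would likely use contour- and rhythm-aware matching.

```python
def match_starter_melody(first_pitches, starter_melodies):
    """Return the preset starter melody closest to the input pitch sequence."""
    def distance(a, b):
        # element-wise pitch difference plus a penalty for length mismatch
        return sum(abs(x - y) for x, y in zip(a, b)) + 12 * abs(len(a) - len(b))
    return min(starter_melodies, key=lambda s: distance(first_pitches, s))

starters = [[60, 62, 64, 65], [67, 65, 64, 62], [60, 64, 67, 72]]
print(match_starter_melody([60, 62, 64, 67], starters))  # -> [60, 62, 64, 65]
```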
Optionally, the music learning model includes a plurality of music learning models, and the training unit is further configured to obtain first historical melody information and second historical melody information of different music types, and to train the music learning models of the different music types according to the correspondence among the music type, the first historical melody information, and the second historical melody information.
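A minimal sketch of this per-type training: samples are grouped by music type and one model is trained per group. The train_sequence_model stand-in is an assumption; the embodiment does not fix a model architecture.

```python
def train_sequence_model(pairs):
    # Stand-in trainer: memorizes continuations keyed by the first melody.
    # A real implementation would fit a sequence model (e.g., an RNN) on the pairs.
    return {tuple(first): second for first, second in pairs}

def train_models_by_type(samples):
    """samples: iterable of (music_type, first_melody, second_melody) triples."""
    grouped = {}
    for music_type, first, second in samples:
        grouped.setdefault(music_type, []).append((first, second))
    return {mtype: train_sequence_model(pairs) for mtype, pairs in grouped.items()}

samples = [
    ("piano", [60, 62], [64, 65, 67]),
    ("pop",   [55, 57], [59, 60, 62]),
]
models_by_type = train_models_by_type(samples)  # one model per music type
```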
Optionally, the generating unit 502 is further configured to determine a first target music learning model from the plurality of music learning models in response to a selection operation by the user, and to generate the target music according to the first melody information and the first target music learning model.
Optionally, the generating unit 502 is further configured to obtain historical behavior data of the user and extract historical music information from it; determine a second music type corresponding to the historical music information and acquire a second target music learning model matching the second music type from the plurality of music learning models; and generate the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring unit 501 is further configured to acquire the input audio; if the input audio is not in MIDI format, convert it into MIDI format to obtain MIDI-format input audio; and extract the first melody information from the MIDI-format input audio.
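For illustration, the following sketch extracts a pitch sequence either directly from a MIDI file or, for other audio, via pitch tracking. It assumes the third-party packages pretty_midi and librosa are installed; the patent itself does not prescribe these tools, and error handling is omitted.

```python
import numpy as np
import librosa
import pretty_midi

def extract_first_melody(path):
    """Return the first melody information as a list of MIDI pitch numbers."""
    if path.lower().endswith((".mid", ".midi")):
        midi = pretty_midi.PrettyMIDI(path)
        notes = [n for inst in midi.instruments for n in inst.notes]
        return [n.pitch for n in sorted(notes, key=lambda n: n.start)]
    # Non-MIDI input: estimate the fundamental frequency, then quantize to MIDI.
    y, sr = librosa.load(path, sr=None, mono=True)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"), sr=sr)
    keep = voiced & ~np.isnan(f0)
    return [int(round(p)) for p in librosa.hz_to_midi(f0[keep])]
```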
Optionally, the apparatus further includes: an editing unit;
the editing unit is configured to edit the target music in response to an editing operation performed by the user on it, or to edit the target music according to preset editing rules.
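As an illustration of editing by preset rules, the sketch below applies two assumed rules, folding out-of-range notes back by octaves and trimming the piece to a maximum length; the actual editing rules are left open by the embodiment.

```python
def edit_by_preset_rules(pitches, max_notes=64, lowest=48, highest=84):
    """Apply assumed preset editing rules to the generated target music."""
    edited = []
    for pitch in pitches[:max_notes]:        # rule 1: trim to a maximum length
        while pitch < lowest:                # rule 2: fold outliers back into the
            pitch += 12                      # playable range, octave by octave
        while pitch > highest:
            pitch -= 12
        edited.append(pitch)
    return edited

print(edit_by_preset_rules([30, 60, 100]))  # -> [54, 60, 76]
```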
Optionally, the apparatus further includes: a determination unit;
the determining unit is configured to, in response to an operation performed by the user on the target music, take the target music as historical music.
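For illustration, the following sketch records the target music as historical music once the user operates on it, appending it to a corpus that later training rounds could reuse; the JSON-lines storage format is an assumption of the sketch.

```python
import json

def record_as_history(target_notes, user_id, path="history_music.jsonl"):
    """Append the target music to the user's historical music corpus."""
    entry = {"user": user_id, "notes": target_notes}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

record_as_history([60, 62, 64, 65, 67], user_id="u123")
```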
Referring to fig. 6, an apparatus 600 for generating a musical composition may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communications, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 602 may include one or more modules that facilitate interaction with other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the apparatus 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power component 606 provides power to the various components of the apparatus 600 and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the panel; the touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front-facing camera and/or a rear-facing camera, which may receive external multimedia data when the apparatus 600 is in an operational mode such as a shooting mode or a video mode. Each of the front and rear cameras may have a fixed optical lens system or focus and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the on/off state of the apparatus 600 and the relative positioning of components such as its display and keypad; it may also detect a change in the position of the apparatus 600 or of one of its components, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and changes in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
The embodiment of the invention provides a music generating device. The device includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Acquiring first melody information in input audio;
And generating target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
Acquiring first historical melody information and second historical melody information matched with the first historical melody information;
And training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
Acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
determining whether the second note string meets a preset condition;
if the second note string meets the preset condition, determining the target music based on the second note string;
wherein the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
searching a plurality of preset starter melodies for a target starter melody that matches the first melody information;
inputting the target starter melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance on the correspondence between starter melodies and the complete melody information of historical music.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
The music learning model includes a plurality of music learning models, and the training of the music learning model according to the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
And training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
The generating target music according to the first melody information and the music learning model includes:
Determining a first target music learning model from the music learning models in response to a selection operation by a user;
And generating the target music according to the first melody information and the first target music learning model.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
The generating target music according to the first melody information and the music learning model includes:
Acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
Determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
acquiring input audio;
if the input audio is not in MIDI format, converting it into MIDI format to obtain MIDI-format input audio;
extracting the first melody information from the MIDI-format input audio.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
responding to the editing operation of a user on the target music, and editing the target music according to the editing operation;
or
And editing the target music according to preset editing rules.
Optionally, the processor is further configured to execute the one or more programs to include instructions for:
in response to an operation performed by the user on the target music, taking the target music as historical music.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, such as the memory 604, comprising instructions executable by the processor 620 of the apparatus 600 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
A non-transitory computer-readable storage medium stores instructions that, when executed by a processor of an electronic device, cause the electronic device to perform a method of generating a musical composition, the method comprising:
Acquiring first melody information in input audio;
And generating target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
Optionally, the method further comprises:
Acquiring first historical melody information and second historical melody information matched with the first historical melody information;
And training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
determining whether the second note string meets a preset condition;
and if the second note string meets the preset condition, determining target music based on the second note string.
Optionally, the preset condition is that the duration of the melody corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
searching a plurality of preset starter melodies for a target starter melody that matches the first melody information;
inputting the target starter melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance on the correspondence between starter melodies and the complete melody information of historical music.
Optionally, the music learning model includes a plurality of music learning models, and the training the music learning model according to the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
And training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Determining a first target music learning model from the music learning models in response to a selection operation by a user;
And generating the target music according to the first melody information and the first target music learning model.
Optionally, the generating the target music according to the first melody information and the music learning model includes:
Acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
Determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring the first melody information in the input audio includes:
acquiring input audio;
if the input audio is not in MIDI format, converting it into MIDI format to obtain MIDI-format input audio;
extracting the first melody information from the MIDI-format input audio.
Optionally, the method further comprises:
responding to the editing operation of a user on the target music, and editing the target music according to the editing operation;
or
And editing the target music according to preset editing rules.
Optionally, the method further comprises:
in response to an operation performed by the user on the target music, taking the target music as historical music.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 700 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. The memory 732 and the storage media 730 may provide transient or persistent storage. A program stored on a storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the CPU 722 may be configured to communicate with the storage media 730 and execute, on the server 700, the series of instruction operations stored in them.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741 such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings and described above, and that various modifications and changes may be effected without departing from its scope. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (18)

1. A method of generating a musical composition, comprising:
Acquiring first melody information in input audio;
Generating target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information;
wherein the music learning model comprises a plurality of music learning models, and the plurality of music learning models are trained in the following manner:
Acquiring first historical melody information of different music types and second historical melody information matched with the first historical melody information;
Training to obtain music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information;
Wherein the generating the target music according to the first melody information and the music learning model includes:
Acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
Determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models;
and generating the target music according to the first melody information and the second target music learning model.
2. The method of claim 1, wherein the generating a target musical composition from the first melody information and a musical composition learning model includes:
Acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
determining whether the second note string meets a preset condition;
and if the second note string meets the preset condition, determining target music based on the second note string.
3. The method according to claim 2, wherein the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target musical composition.
4. The method of claim 1, wherein the generating a target musical composition from the first melody information and a musical composition learning model includes:
searching a plurality of preset starter melodies for a target starter melody that matches the first melody information;
inputting the target starter melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance on the correspondence between starter melodies and the complete melody information of historical music.
5. The method of claim 1, wherein the generating a target musical composition from the first melody information and a musical composition learning model includes:
Determining a first target music learning model from the music learning models in response to a selection operation by a user;
And generating the target music according to the first melody information and the first target music learning model.
6. The method according to any one of claims 1 to 5, wherein the acquiring the first melody information in the input audio includes:
acquiring input audio;
if the input audio is not in MIDI format, converting it into MIDI format to obtain MIDI-format input audio;
extracting the first melody information from the MIDI-format input audio.
7. The method according to any one of claims 1 to 5, further comprising:
responding to the editing operation of a user on the target music, and editing the target music according to the editing operation;
or
And editing the target music according to preset editing rules.
8. The method according to any one of claims 1 to 5, further comprising:
in response to an operation performed by the user on the target music, taking the target music as historical music.
9. An apparatus for generating a musical composition, comprising: an acquisition unit and a generation unit;
the acquisition unit is used for acquiring first melody information in the input audio;
The generating unit is used for generating target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information;
wherein the music learning model comprises a plurality of music learning models, and the plurality of music learning models are trained in the following manner:
Acquiring first historical melody information of different music types and second historical melody information matched with the first historical melody information;
Training to obtain music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information;
The generating unit is specifically configured to:
Acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
Determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models;
and generating the target music according to the first melody information and the second target music learning model.
10. The apparatus of claim 9, wherein the generating unit is further configured to obtain a first note string corresponding to the first melody information, input the first note string to the music learning model to obtain a second note string generated by the music learning model, determine whether the second note string meets a preset condition, and, if the second note string meets the preset condition, determine the target music based on the second note string.
11. The apparatus of claim 10, wherein the preset condition is that a melody duration corresponding to the second note string reaches a preset duration of the target musical composition.
12. The apparatus of claim 9, wherein the generating unit is further configured to search a plurality of preset starter melodies for a target starter melody that matches the first melody information, and to input the target starter melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music learning model is trained in advance on the correspondence between starter melodies and the complete melody information of historical music.
13. The apparatus according to claim 9, wherein the generating unit is further configured to determine a first target music learning model from the music learning models in response to a selection operation by a user; and generating the target music according to the first melody information and the first target music learning model.
14. The apparatus according to any one of claims 9 to 13, wherein the acquiring unit is further configured to acquire the input audio; if the input audio is not in MIDI format, convert it into MIDI format to obtain MIDI-format input audio; and extract the first melody information from the MIDI-format input audio.
15. The apparatus according to any one of claims 9 to 13, further comprising: an editing unit;
the editing unit is used for responding to the editing operation of a user on the target music, and editing the target music according to the editing operation; or editing the target music according to preset editing rules.
16. The apparatus according to any one of claims 9 to 13, further comprising: a determination unit;
the determining unit is configured to, in response to an operation performed by the user on the target music, take the target music as historical music.
17. A music generating device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method of generating a musical composition according to any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method of generating a musical composition according to any one of claims 1 to 8.
CN201810689003.9A 2018-06-28 2018-06-28 Method, device and equipment for generating music Active CN110660375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810689003.9A CN110660375B (en) 2018-06-28 2018-06-28 Method, device and equipment for generating music


Publications (2)

Publication Number Publication Date
CN110660375A CN110660375A (en) 2020-01-07
CN110660375B (en) 2024-06-04

Family

ID=69027455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810689003.9A Active CN110660375B (en) 2018-06-28 2018-06-28 Method, device and equipment for generating music

Country Status (1)

Country Link
CN (1) CN110660375B (en)



Similar Documents

Publication Publication Date Title
CN110634483B (en) Man-machine interaction method and device, electronic equipment and storage medium
CN110660375B (en) Method, device and equipment for generating music
CN107644646B (en) Voice processing method and device for voice processing
CN113409764B (en) Speech synthesis method and device for speech synthesis
CN110309327B (en) Audio generation method and device and generation device for audio
CN112445395B (en) Music piece selection method, device, equipment and storage medium
CN107994879A (en) Volume control method and device
CN113707113B (en) User singing voice repairing method and device and electronic equipment
US20210375246A1 (en) Method, device, and storage medium for generating vocal file
CN111696538A (en) Voice processing method, apparatus and medium
CN112241397A (en) Multimedia file sharing method and device, electronic equipment and readable storage medium
CN112068711A (en) Information recommendation method and device of input method and electronic equipment
CN115273831A (en) Voice conversion model training method, voice conversion method and device
CN111739535A (en) Voice recognition method and device and electronic equipment
CN112445906A (en) Method and device for generating reply message
CN112130839A (en) Method for constructing database, method for voice programming and related device
CN111831132A (en) Information recommendation method and device and electronic equipment
CN112988956B (en) Method and device for automatically generating dialogue, and method and device for detecting information recommendation effect
CN113420553A (en) Text generation method and device, storage medium and electronic equipment
KR20150020740A (en) Operating Method For Conversation based on a Message and Device supporting the same
CN111696536A (en) Voice processing method, apparatus and medium
CN111506767A (en) Song word filling processing method and device, electronic equipment and storage medium
CN113409765B (en) Speech synthesis method and device for speech synthesis
CN114356068B (en) Data processing method and device and electronic equipment
TW202224385A (en) Method and system for generating emotion based multimedia content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment