CN110660375A - Method, device and equipment for generating music


Info

Publication number
CN110660375A
Authority
CN
China
Prior art keywords
music
melody information
learning model
target
melody
Prior art date
Legal status
Pending
Application number
CN201810689003.9A
Other languages
Chinese (zh)
Inventor
周伟浩
关键
张喜梅
张亚鹏
肖彬
夏丁胤
余浩
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201810689003.9A
Publication of CN110660375A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a method for generating a music piece, which includes: acquiring first melody information from input audio; and generating a target music piece according to the first melody information and a music piece learning model. With the method provided by the embodiments of the invention, when a user wants to compose a piece of music on the spot, the user only needs to create part of its melody; the complete melody can then be predicted from that partial melody, so that the user is provided with a music piece the user helped create and can use to express his or her feelings. The invention also discloses an apparatus and a device for generating a music piece.

Description

Method, device and equipment for generating music
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for generating a musical composition.
Background
At present, users commonly use a variety of music applications. A typical music application can find and provide the music a user wants. Specifically, in an existing music application, the server collects the audio of ready-made music pieces that were composed in advance, and the user obtains the desired music audio from this pre-collected audio through the client.
As users' demands for music become more personalized, a user may sometimes want to compose a piece of music on the fly. However, users often lack composing expertise, inspiration, creation time, or an instrument to assist creation, so they usually cannot create the complete melody of a piece. In this case, existing music applications cannot provide the user with a piece of music composed by the user on the spot.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, an apparatus and a device for generating a music piece, which predict the complete melody of the music piece from a partial melody input by a user, so as to provide music audio that the user participated in creating.
In a first aspect, an embodiment of the present invention provides a method for generating a music piece, where the method includes:
acquiring first melody information in input audio;
and generating a target music piece according to the first melody information and a music piece learning model, wherein the target music piece comprises the first melody information and second melody information matched with the first melody information.
Optionally, the method further includes:
acquiring first historical melody information and second historical melody information matched with the first historical melody information;
and training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
judging whether the second note string meets a preset condition or not;
and if the second note string meets a preset condition, determining the target music based on the second note string.
Optionally, the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
searching a target primer melody matched with the first melody information in a plurality of preset primer melodies;
inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
Optionally, the music piece learning model includes a plurality of models, and the training of the music piece learning model according to the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
determining a first target music learning model from the music learning models in response to a selection operation by a user;
generating the target music piece according to the first melody information and the first target music piece learning model.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matched with the second music type from the music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring the first melody information in the input audio includes:
acquiring an input audio;
if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio;
extracting the first melody information from the midi format input audio.
Optionally, the method further includes:
responding to the editing operation of the user on the target music, and editing the target music according to the editing operation;
alternatively,
and editing the target music according to a preset editing rule.
Optionally, the method further includes:
the target music piece is treated as a history music piece in response to an operation of the target music piece by the user.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a musical composition, the apparatus including: an acquisition unit and a generation unit;
the acquisition unit is used for acquiring first melody information in input audio;
the generation unit is used for generating a target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
Optionally, the apparatus further comprises: a training unit;
the acquisition unit is further used for acquiring first historical melody information and second historical melody information matched with the first historical melody information;
the training unit is further configured to train the music piece learning model according to a corresponding relationship between the first historical melody information and the second historical melody information.
Optionally, the generating unit is further configured to obtain a first note string corresponding to the first melody information, input the first note string to the music learning model, and obtain a second note string generated by the music learning model; judging whether the second note string meets a preset condition or not; and if the second note string meets a preset condition, determining the target music based on the second note string.
Optionally, the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating unit is further configured to search a plurality of preset primer melodies for a target primer melody matched with the first melody information; inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
Optionally, the music piece learning model includes a plurality of models, and the training unit is further configured to obtain the first historical melody information and the second historical melody information of different music piece types; and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating unit is further configured to determine a first target music learning model from the music learning models in response to a selection operation by a user; generating the target music piece according to the first melody information and the first target music piece learning model.
Optionally, the generating unit is further configured to acquire historical behavior data of a user, and extract historical music information from the historical behavior data; determine a second music type corresponding to the historical music information, and acquire a second target music learning model matched with the second music type from the music learning models; and generate the target music according to the first melody information and the second target music learning model.
Optionally, the obtaining unit is further configured to obtain an input audio; if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio; extracting the first melody information from the midi format input audio.
Optionally, the apparatus further comprises: an editing unit;
the editing unit is used for responding to the editing operation of the user on the target music and editing the target music according to the editing operation; or, editing the target music according to a preset editing rule.
Optionally, the apparatus further comprises: a determination unit;
the determination unit is configured to take the target music piece as a history music piece in response to an operation of the target music piece by a user.
In a third aspect, an embodiment of the present invention further provides a device for generating a music piece, which includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the method for generating a music piece according to any one of the first aspect.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for generating music in any one of the above first aspects.
Compared with the prior art, the embodiment of the invention has the following advantages:
according to the method provided by the embodiment of the invention, the first melody information in the input audio can be acquired, and then the target music can be generated according to the first melody information and the music learning model. Therefore, in the embodiment of the invention, when the user wants to compose the music in real time, the user only needs to create a part of melody of the music, and can predict all the melodies of the music based on the part of melody input by the user, so that the music participated in the composition by the user is provided for the user, and the user can participated in the composition by the user to express the feeling of mind of the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of an exemplary application scenario in an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method of generating a musical composition according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for determining target melody information according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a target music learning model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an apparatus for generating music according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for generating a musical composition according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor has found that, as users' demands for music become more personalized, a user may want to compose a piece of music instantly to express his or her feelings. However, because users often lack composing expertise, inspiration, creation time, or an instrument to assist creation, they usually cannot create the complete melody of a piece. In this case, existing music applications cannot provide the user with music audio that the user composed instantly.
To solve the above problem, in the embodiments of the present invention, when a user wants to compose a music piece in real time, first melody information input by the user may be acquired, and a target music piece may then be generated based on the first melody information and a music piece learning model. For example, if the first melody information input by the user includes four notes, the music piece learning model may generate a target music piece that contains those four notes. Thus, the user only needs to create part of the melody; the complete melody of the music piece can be predicted from that partial melody, so that the user is provided with a music piece the user participated in creating and can use it to express his or her feelings.
For example, embodiments of the present invention may be applied to the scenario shown in FIG. 1. In this scenario, the terminal device 102 may be any device with an audio input function (such as a device with a microphone) such as a mobile phone, an ipad, a desktop computer, and the like, and first, the user 101 may input audio by using the microphone in the terminal device 102 and may operate on the terminal device 102 to trigger the terminal device 102 to generate the target music piece based on the first melody information in the input audio. The terminal apparatus 102 may transmit the first melody information in the input audio to the server 103 in response to the operation of the user 101. Then, the server 103 may acquire the first melody information, and a music learning model in the server 103 may generate the target music from the first melody information. Finally, the server 103 may transmit the target music to the terminal device 102 so that the terminal device 102 may play the target music to the user 101.
It is to be appreciated that, in the application scenarios described above, while the actions of the embodiments of the present invention are described as being performed by the server 103, the actions may also be performed partially by the terminal device 102, partially by the server 103, or completely by the terminal device 102. The invention is not limited in its implementation to the details of execution, provided that the acts disclosed in the embodiments of the invention are performed.
It should be noted that the above application scenarios are only presented to facilitate understanding of the present invention, and the embodiments of the present invention are not limited in any way in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Various non-limiting embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Exemplary method
Referring to fig. 2, a method of generating a musical composition in an embodiment of the present invention is shown. In this embodiment, the method may include, for example, the steps of:
s201: first melody information in input audio is acquired.
In this embodiment, the target user may input audio to input the first melody information. The input audio of the target user may be audio input by the terminal device of the target user, for example, the audio may be a recorded audio hummed by a microphone in the terminal device of the target user, or an audio synthesized by inputting notes by a keyboard in the terminal device of the target user, or an audio segment pre-stored in the terminal device of the target user. The input audio may be an audio file in MP3 format, wav format, or the like.
Note that the melody information may be information including a note string, and the first melody information may be information including a note string in the input audio. For example, the first melody information may be a TFRecord format file extracted from the input audio, for example, the input audio may be converted into a TFRecord format file, then the data representing the melody and the data representing the polyphonic tone are extracted from the TFRecord format file, and then the data representing the melody and the data representing the polyphonic tone may be formed into a TFRecord format melody file and a TFRecord format polyphonic tone file, respectively, so that the TFRecord format melody file and the TFRecord format polyphonic tone file may be used as the first melody information.
For example, after the user hums a melody "3 (mi) 4 (fa) 5 (sol) 1 (do)" through a microphone in a mobile phone to generate a recorded audio (i.e., an input audio), the recorded audio may be converted into a TFRecord format file in response to the user's trigger operation on the recorded audio, and first melody information including the note string "3451" is then extracted from the TFRecord format file.
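Purely as an illustration (this is not part of the patent text), the sketch below shows one way the first melody information could be reduced to a note string once the input audio is available as a MIDI file; it uses the third-party pretty_midi library and does not reproduce the TFRecord pipeline described above. File names and pitch values are assumptions.

```python
# Illustrative sketch only: reduce a MIDI file to a note string
# ("first melody information") using the third-party pretty_midi library.
import pretty_midi


def extract_note_string(midi_path: str):
    """Return the melody as a list of MIDI pitch numbers, ordered by onset time."""
    midi = pretty_midi.PrettyMIDI(midi_path)
    notes = []
    for instrument in midi.instruments:
        if instrument.is_drum:           # skip percussion tracks
            continue
        notes.extend(instrument.notes)
    notes.sort(key=lambda n: n.start)    # order notes by when they begin
    return [n.pitch for n in notes]


# Example: a hummed "3 4 5 1" (mi fa sol do) line might come back as
# MIDI pitches such as [64, 65, 67, 60].
print(extract_note_string("hummed_input.mid"))  # hypothetical file name
```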
S202: and generating the target music according to the first melody information and the music learning model.
In this embodiment, after the first melody information in the input audio is acquired, the target music piece may be generated through a music piece learning model according to the first melody information, where the target music piece may be a music score or an audio file. The target music piece may include the first melody information and second melody information matching the first melody information. For example, the target music piece may include the first melody information and subsequent melody information that follows it, that is, the second melody information may be the melody information that continues after the first melody information; alternatively, the target music piece may include the first melody information and melody information that embellishes it, that is, the second melody information may be melody information that enriches the first melody information (for example, chord information added to the first melody information). The music piece learning model may be a deep learning model capable of generating a music piece from melody information, for example an RNN model.
It should be noted that the present embodiment provides various ways of obtaining the target music piece according to the first melody information and the music piece learning model. Next, an explanation will be given on two implementations of how the target music is obtained from the first melody information and the music learning model.
In one possible implementation, the first melody information may be input into the music piece learning model to obtain the target music piece generated by the model. The music piece learning model may be trained in advance based on the correspondence between first historical melody information and second historical melody information. Specifically, in this embodiment, first historical melody information and second historical melody information matching it may be obtained, and the music piece learning model may then be trained according to the correspondence between them. It should be noted that, in the complete melody information of a historical music piece, the second historical melody information may be concatenated after the first historical melody information; or the second historical melody information may include the first historical melody information and the historical subsequent melody information concatenated after it; or the second historical melody information may be historical melody information that embellishes the first historical melody information (for example, historical chord information added to the first historical melody information). A historical music piece may be a music piece the user has operated on, for example music the user listened to on a terminal device, music the user recorded with karaoke software (such as Changba), or music the user shared with friends through chat software (e.g., WeChat). In this way, the first melody information can be input into the music piece learning model, and the target music piece corresponding to the first melody information can be obtained.
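The patent does not fix a concrete architecture beyond an RNN-style model. As a minimal sketch, assuming notes are encoded as MIDI pitch numbers, the following trains a Keras LSTM to predict the next note from everything heard so far, which is one way the correspondence between first and second historical melody information could be learned. All names and the sample data are illustrative assumptions, not the patented implementation.

```python
# Minimal training sketch. Assumption: an LSTM next-note predictor stands in
# for the "music piece learning model"; the patent only says it may be an RNN.
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 128  # notes encoded as MIDI pitch numbers 0-127


def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 64),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model


def make_training_pairs(first_hist, second_hist):
    """Turn (first, second) historical melody pairs into (context, next-note) samples."""
    xs, ys = [], []
    for first, second in zip(first_hist, second_hist):
        full = list(first) + list(second)   # second follows first in the historical piece
        for i in range(len(first), len(full)):
            xs.append(full[:i])             # everything heard so far
            ys.append(full[i])              # the note that actually came next
    xs = tf.keras.preprocessing.sequence.pad_sequences(xs)
    return xs, np.array(ys)


# Hypothetical historical data: each pair is (first melody, matching second melody).
first_hist = [[64, 65, 67, 60], [62, 62, 64, 67]]
second_hist = [[67, 62, 64, 67, 60], [69, 65, 62, 60]]
x, y = make_training_pairs(first_hist, second_hist)
model = build_model()
model.fit(x, y, epochs=5, verbose=0)
```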
Specifically, after the input first melody information is acquired, the first melody information may be input into a music learning model, and the music learning model may generate a target music corresponding to the first melody information for the first melody information. It should be emphasized that the specific manner of generating the target music in this implementation will be described in detail later.
For example, assuming that the acquired first melody information includes the note string "3451", the first melody information may be input into the music piece learning model. The music piece learning model may generate a target music piece corresponding to the note string "3451" whose complete melody information includes the note string "34515234515", that is, the note string "3451" from the first melody information followed by the subsequent note string "5234515" (the second melody information matching the note string "3451"). Alternatively, the music piece learning model may generate a target music piece corresponding to the note string "3451" whose complete melody information includes the note string "34515234", that is, the note string "3451" from the first melody information together with the note string "5234" of chord information added to it (the second melody information that embellishes the note string "3451").
In another possible implementation, a target primer melody matching the first melody information may be searched for among a plurality of preset primer melodies, and the target primer melody may be input into the music piece learning model to obtain the target music piece generated by the model. The preset primer melody matching the first melody information is referred to as the target primer melody. It should be noted that the music piece learning model may be trained in advance based on the correspondence between primer melodies and the complete melody information of historical music pieces; the complete melody information of a historical music piece may include the primer melody and the historical subsequent melody information concatenated after it. In this way, the target music piece corresponding to the target primer melody can be obtained by inputting the target primer melody matching the first melody information into the music piece learning model.
In this embodiment, a plurality of primer melodies may be preset, and when the input first melody information matches one of them, the music piece learning model may generate the target music piece corresponding to the first melody information. Therefore, when the input first melody information is obtained, it can be matched against the preset primer melodies in turn; if one of the preset primer melodies matches the first melody information, that primer melody can be input into the music piece learning model to obtain the target music piece corresponding to the first melody information.
For example, assume that the preset primer melodies include a primer melody A with the note string "3451". When the acquired first melody information includes the note string "3451", a primer melody matching the note string "3451" can be searched for among the preset primer melodies. Since the note string "3451" matches primer melody A, primer melody A can be taken as the target primer melody corresponding to the note string "3451" and input into the music piece learning model. The music piece learning model may then generate the target music piece corresponding to primer melody A and including the note string "34515234515"; that is, the complete melody information of the target music piece includes primer melody A ("3451") and the subsequent note string "5234515" (the second melody information matching primer melody A).
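The matching of the input melody to the preset primer melodies is not specified further; the following sketch assumes a simple exact prefix match and is purely illustrative, with hypothetical primer data.

```python
# Illustrative sketch: look up the preset primer melody that matches the input
# note string. The matching rule (an exact prefix match here) is an assumption;
# the patent does not define how matching is performed.
PRESET_PRIMERS = {
    "A": [3, 4, 5, 1],        # primer melody A, note string "3451"
    "B": [1, 1, 5, 5, 6, 6],  # another hypothetical preset primer
}


def find_target_primer(first_melody):
    for name, primer in PRESET_PRIMERS.items():
        if list(first_melody[: len(primer)]) == primer:
            return name, primer
    return None, None


name, primer = find_target_primer([3, 4, 5, 1])
print(name)  # -> "A"; this primer melody is then fed to the music piece learning model
```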
Next, a specific description will be given of an implementation of "generating a target music piece from the first melody information and the music piece learning model" mentioned in the description about S202.
As shown in fig. 3, an implementation of "generating a target music piece from the first melody information and the music learning model" may include the steps of:
s202 a: and acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model.
In the present embodiment, for convenience of description, a note string to be currently processed may be referred to as a first note string, and a note string obtained by the music learning model from the input first note string may be referred to as a second note string.
Specifically, after the first melody information is acquired, a first note string corresponding to the first melody information may be determined first; then, the notes in the first note string may be first input into a music learning model; then, the music learning model may generate a second note string from the input notes. Wherein the second note string may include a first note string and a predicted note string connected after the first note string, the predicted note string may be understood as a note string obtained by a music learning model according to the first note string; of course, the second string of notes may also include only the predicted string of notes.
It should be noted that the number of notes in the predicted note string may be preset. For example, the number of notes in the predicted note string may be fixed, for example, assuming that the first note string is "2265" and the number of notes in the predicted note string is set to two, the music learning model may generate a second note string "226574" according to the first note string "2265", wherein the note string "74" in the second note string may be the predicted note string corresponding to the first note string "2265". For another example, the number of notes in the predicted note string may be determined according to the number of notes in the first note string, for example, assuming that the first note string is "2265" and the number of notes in the predicted note string may be the same as the number of notes in the first note string, then the music learning model may generate a second note string "22657487" according to the first note string "2265", wherein the note string "7487" in the second note string may be the predicted note string corresponding to the first note string "2265".
Next, S202a will be illustrated, assuming that the number of notes in the predicted note string of the second note string is set to two. After the first melody information is obtained, since it includes the note string "345345", the note string "345345" can be taken as the first note string. The first note string "345345" can then be input into the music piece learning model. As shown in fig. 4, the music piece learning model may be an RNN model with a Long Short-Term Memory (LSTM) structure. The music piece learning model may then generate a second note string "34534576" from the first note string "345345", which includes the first note string "345345" and the predicted note string "76" connected after it. It should be emphasized that, in the LSTM structure, every note in the input note string affects the prediction; that is, if a note in the input note string is changed, the predicted notes produced by the LSTM-structured RNN model will also change.
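For illustration only, the next sketch shows what step S202a could look like with the LSTM model from the training sketch above; the greedy decoding is an assumption, since the patent does not state how notes are chosen from the model output.

```python
# Sketch of step S202a, reusing the model from the training sketch above
# (an assumption; the patent only requires an RNN-style music piece learning model).
import numpy as np


def predict_next_notes(model, first_note_string, num_predicted=2):
    """Return a second note string = the first note string plus num_predicted new notes."""
    notes = list(first_note_string)
    for _ in range(num_predicted):
        context = np.array([notes])                  # batch containing one sequence
        probs = model.predict(context, verbose=0)[0]
        notes.append(int(np.argmax(probs)))          # greedy choice; sampling is also possible
    return notes


# e.g. predict_next_notes(model, [3, 4, 5, 3, 4, 5]) extends "345345" by two
# predicted notes, analogous to "345345" becoming "34534576" in the text.
```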
S202 b: judging whether the second note string meets a preset condition or not; if so, go to S202c, otherwise, go to S202 d.
In this embodiment, the preset condition may be that the melody duration corresponding to the second note string reaches the preset duration of the target music. The melody duration can be determined by the number of notes in the note string, for example, a note string including eight notes typically has a melody duration of 1 second, and a note string including sixteen notes typically has a melody duration of 2 seconds.
Therefore, after the second note string is determined, the number of notes in the second note string can be determined. Then, the melody duration of the second note string can be determined according to the number of the notes. Then, it can be determined whether the melody duration of the second note string satisfies the preset duration of the target music.
It should be noted that the preset duration of the target music may be preset by the system. Of course, the preset duration of the target music may also be preset by the user according to actual needs, for example, when the user wants to compose a music for a recited program with a duration of three minutes, the preset duration of the target music may be set to three minutes, and for example, when the user wants to compose a music for a half break of a basketball game, the preset duration of the target music may be set to one minute.
Continuing with the example of the second note string "34534576" in S202a, after the music piece learning model generates the second note string "34534576" from the first note string, the number of notes in the second note string "34534576" may be determined to be eight; the melody duration of the second note string may then be determined to be 0.8 seconds; and it can then be judged whether the 0.8-second melody duration of the second note string "34534576" satisfies the preset duration of the target music piece.
S202 c: a target musical composition is determined based on the second note string.
In this embodiment, if the second note string satisfies the predetermined condition, it indicates that the melody duration of the second note string satisfies the predetermined duration of the target music piece. Accordingly, a target musical piece can be determined based on the second note string.
Continuing with the example of the second note string "34534576" in S202b, if the preset duration of the target music piece is 0.8 seconds, the 0.8-second melody duration of the second note string "34534576" satisfies the preset duration of the target music piece, so the target music piece corresponding to the first note string "345345" can be generated based on the second note string "34534576".
S202 d: the second note string is taken as the first note string, and the steps in S202a to S202d are returned to be performed until the second note string satisfies a preset condition.
In this embodiment, if the second note string does not satisfy the preset condition, the melody duration of the second note string does not reach the preset duration of the target music piece. Therefore, the steps in S202a to S202d may continue to be performed on the second note string until it satisfies the preset condition. Continuing with the example of the second note string "34534576" in S202b, if the preset duration of the target music piece is 3 minutes, the 0.8-second melody duration of the second note string "34534576" does not satisfy the preset duration of the target music piece. Thus, the second note string "34534576" can be taken as the new first note string and input into the music piece learning model, and the subsequent steps can be continued.
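Putting S202a to S202d together, a minimal sketch of the generation loop might look as follows, assuming a fixed per-note duration of 0.1 seconds (chosen to match the eight-notes, 0.8-second example) and reusing predict_next_notes from the sketch above.

```python
# Sketch of the S202a-S202d loop: keep extending the note string until its
# melody duration reaches the preset duration of the target music piece.
# The per-note duration is an assumption matching the "eight notes, 0.8 s" example.
SECONDS_PER_NOTE = 0.1


def generate_target_notes(model, first_note_string, target_seconds, step=2):
    notes = list(first_note_string)
    while len(notes) * SECONDS_PER_NOTE < target_seconds:             # S202b check
        notes = predict_next_notes(model, notes, num_predicted=step)  # S202a, with S202d feedback
    return notes  # S202c: the target music piece is determined from this note string


# generate_target_notes(model, [3, 4, 5, 3, 4, 5], target_seconds=0.8)
# stops once eight notes (0.8 s of melody) have been produced.
```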
In the embodiments of the present invention, the first melody information in the input audio may be obtained, and the target music piece may then be generated according to the first melody information and the music piece learning model. Therefore, when a user wants to compose a piece of music in real time, the user only needs to create part of its melody; the complete melody can be predicted based on that partial melody, so that the user is provided with a music piece the user participated in creating and can use to express his or her feelings.
When the input audio of the target user is an audio file in a format such as MP3, WMA or WAV, the file occupies a relatively large amount of storage, so a large amount of operating memory is required to process it. Moreover, audio files in MP3, WMA, WAV and similar formats must undergo multiple data-processing steps before they can be converted into a TFRecord format file from which the first melody information can be extracted. As a result, extracting the first melody information from such audio files requires more processing steps, and the extraction process is complicated.
In order to simplify the process of extracting the first melody information and reduce the operating memory for extracting the first melody information, in an implementation manner of the embodiment of the present invention, the step S201 may include the following steps:
s201 a: the input audio is extracted.
S201 b: and if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio.
S201 c: extracting the first melody information from the midi format input audio.
It should be noted that, because the memory space occupied by the midi-formatted input audio is small, a large running memory is not required to process the midi-formatted input audio. Moreover, since only simple processing needs to be performed on the midi-formatted input audio, the first melody information in the midi-formatted input audio can be extracted, for example, the midi-formatted input audio can be directly converted into a TFRecord-formatted file, and then, the first melody information can be extracted from the TFRecord-formatted file; thus, the process of extracting the first melody information is also simplified.
As an example, after acquiring the input audio of the target user 101, the terminal device 102 may determine whether the input audio is in a midi format; if yes, the terminal device 102 may send the input audio to the server 103, and the server 103 may extract the first melody information from the midi format input audio; if not, the terminal device 102 may first convert the input audio into the midi format to obtain the midi format input audio, and then the terminal device 102 sends the midi format input audio to the server 103, and the server 103 may extract the first melody information from the midi format input audio. Therefore, the process of extracting the first melody information is simpler and quicker, and the operation memory for extracting the first melody information is reduced, so that the data processing time is reduced, and the time for generating the target music is shortened.
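As a hedged sketch of S201a to S201c: the format check and the hand-off to melody extraction are straightforward, whereas converting MP3/WAV/WMA audio into MIDI is a transcription problem the patent does not detail, so it appears below only as a hypothetical placeholder; extract_note_string is reused from the earlier extraction sketch.

```python
# Sketch of S201a-S201c. Converting MP3/WAV/WMA audio to MIDI requires a pitch
# transcription step that the patent does not specify; transcribe_to_midi below
# is a hypothetical placeholder for that step.
import os


def acquire_first_melody_information(input_path: str):
    if os.path.splitext(input_path)[1].lower() not in (".mid", ".midi"):
        midi_path = transcribe_to_midi(input_path)   # hypothetical: hummed audio -> MIDI
    else:
        midi_path = input_path
    return extract_note_string(midi_path)            # S201c: extract the note string


def transcribe_to_midi(audio_path: str) -> str:
    """Placeholder for an audio-to-MIDI transcription step (not detailed in the patent)."""
    raise NotImplementedError("plug a melody transcription tool in here")
```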
Further, after the music piece learning model generates the target music piece, the user may wish to use only the refrain of the target music piece, for example as a mobile phone ringtone; that is, the user is interested only in the refrain. In that case the target music piece needs to be edited so that only the refrain is kept and the other parts are deleted.
Therefore, after the target music piece generated by the music piece learning model is obtained in S202, the target music piece can be edited according to the user's needs or preset requirements. This embodiment provides multiple ways of editing the target music piece; two of them are described below.
In one implementation, the target music piece may be edited in response to an editing operation performed by the user on the target music piece. The editing operation may include clipping the target music piece, modifying a melody in the target music piece, and the like. For example, when the user is dissatisfied with a melody in the target music piece and rearranges it, the melody of the target music piece may be modified accordingly in response to the user's operation, and the modified target music piece may then be output.
In another implementation, the target music piece may be edited according to a preset editing rule. For example, if the preset editing rule is to clip the target music piece and keep only its first minute, then after the target music piece is generated, its first minute may be clipped out and stored according to the preset editing rule; that is, the first minute of the target music piece is output as the edited target music piece.
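As an illustration of this preset-rule example (keeping only the first minute), the following sketch assumes the target music piece has already been rendered to an audio file and uses the third-party pydub library; none of this is mandated by the patent.

```python
# Sketch of the preset-rule editing example (keep only the first minute),
# using the third-party pydub library (exporting MP3 requires ffmpeg).
from pydub import AudioSegment


def keep_first_minute(in_path: str, out_path: str) -> None:
    audio = AudioSegment.from_file(in_path)
    edited = audio[:60_000]              # pydub slices by milliseconds
    edited.export(out_path, format="mp3")


# keep_first_minute("target_piece.wav", "target_piece_first_minute.mp3")
```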
Therefore, in this embodiment, after the target music piece generated by the music piece learning model is obtained, the part of the target music piece that the user is interested in can be edited according to the user's needs or preset requirements, so that the edited target music piece meets the user's requirements and the user can use the music piece the user participated in creating to express his or her feelings.
Further, in order to make the generated target music piece better suited to the target user, the music piece learning model may, for example, be trained using historical music pieces that the target user participated in creating. Therefore, in one implementation of this embodiment, the following step may further be included after S202:
in response to an operation of the target user saving or sharing the target music piece, taking the target music piece as a historical music piece so that it can be used to train the music piece learning model.
In this embodiment, after the target music piece is generated, if the target user saves or shares the target music piece, the target music piece may be used as a historical music piece, and the music piece learning model may be trained with it. For example, the target user may store the target music piece on the local terminal device, share it with friends through instant messaging software, or share it to a friend circle in the instant messaging software. In this way, the target melody information determined based on the output of the music piece learning model better matches the target user's creative needs, and the target music piece generated from it better expresses the target user's feelings.
Further, music pieces can be divided into a plurality of types, for example classical music pieces, popular music pieces, and the like. The type of target music piece the user wishes to generate may therefore differ between scenes: for example, when the user is enjoying the scenery of an ancient town with a long history, the desired target music piece may be of a classical type, while when the user is celebrating a birthday, the desired target music piece may be of a rock type.
Therefore, in order to generate different types of music pieces according to users' different requirements, in one implementation of this embodiment, the music piece learning model may include a plurality of models, each corresponding to one music type and each obtained by training according to the correspondence between first historical melody information and second historical melody information. In one possible implementation, the plurality of music piece learning models may be trained by obtaining first historical melody information and second historical melody information of different music types, and training the music piece learning models of the different music types according to the music types and the correspondence between the first and second historical melody information.
The music type may be a classification along different dimensions. For example, when the music type is determined by the instrument playing the music, the music types may at least include piano, guitar, erhu, and the like; when the music type is determined by the playback environment, the music types may at least include indoor, outdoor, KTV, church, and the like; and when the music type is determined by the manner of performance, the music types may at least include chord, mixing, and the like.
Specifically, in this embodiment, after the first historical melody information and the second historical melody information matching it are acquired, their music type may be determined. A music piece learning model corresponding to that music type may then be determined from among the plurality of music piece learning models, and that model may be trained according to the correspondence between the first historical melody information and the second historical melody information. In this way, music piece learning models for different music types can be obtained by training with first and second historical melody information of different music types; that is, training with historical melody information of different music types yields different prediction models, with each music piece learning model corresponding to one music type.
For example, the music learning model is trained based on the first historical melody information and the second historical melody information of the simple chord type, and the obtained music learning model can be a music learning model corresponding to the simple chord type; for another example, the music learning model is trained based on the first historical melody information and the second historical melody information of the church type, and the obtained music learning model may be a music learning model corresponding to the church type; for example, the music learning model is trained based on the first and second historical melody information of the piano type, and the resulting music learning model may be a music learning model corresponding to the piano type. In this way, the same first melody information is input to the music learning models corresponding to different music types, respectively, and the music types of the generated target music are completely different.
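A minimal sketch of per-type training follows, reusing the helpers from the earlier training sketch; the type labels and data layout are assumptions for illustration only.

```python
# Sketch: train one music piece learning model per music type, reusing
# build_model() and make_training_pairs() from the training sketch above.
def train_models_by_type(samples_by_type):
    """samples_by_type maps a music type to its (first_hist, second_hist) lists."""
    models = {}
    for music_type, (first_hist, second_hist) in samples_by_type.items():
        x, y = make_training_pairs(first_hist, second_hist)
        model = build_model()
        model.fit(x, y, epochs=5, verbose=0)
        models[music_type] = model       # e.g. "piano", "church", "simple chord"
    return models
```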
In the case where there are a plurality of music piece learning models and different models correspond to different music types, the embodiments of the present invention further provide two ways of generating the target music piece from the first melody information and a music piece learning model (i.e., two implementations of S202), in which a music piece learning model of a given music type is used to generate a target music piece of that type. These two ways are described below.
the first mode is as follows: determining a first target music learning model from the music learning models in response to a selection operation by a user; generating the target music piece according to the first melody information and the first target music piece learning model.
In this mode, the type of music selected by the user may be referred to as a first type of music, and the first type of music may include a simple chord type, may also include a church type, and may also include a piano type, for example. Also, a music learning model matching the first music type may be referred to as a first music learning model.
Specifically, the user may select one music piece type as the first music piece type from among a plurality of music piece types as desired after inputting the first melody information. Then, a first target music learning model may be determined from the plurality of music learning models in accordance with the first music type in response to a selection operation by the user. The first melody information can then be input into the first target learning model, which can generate the target music piece belonging to the first music piece type based on the first melody information. Thus, the user can obtain a target music having a music type of the first music type.
For example, assume that a user inputs a piece of hummed audio through a mobile phone and selects the piano type from a plurality of music types such as the simple chord type, the church type, and the piano type, i.e., the piano type is taken as the first music type. In response to the user's selection, the music piece learning model corresponding to the piano type may be determined from among the plurality of music piece learning models and taken as the first target music piece learning model. The audio hummed by the user is then input into the first target music piece learning model, which may generate a piano-type target music piece. Thus, the user obtains a target music piece whose music type is the piano type.
Therefore, in this way, after the user selects the first music type of the target music piece, the prediction model corresponding to the first music type can be determined and used as the first target music piece learning model. The target music piece corresponding to the first music type selected by the user can then be generated with the first target music piece learning model, so that different users' needs for different types of target music pieces can be met.
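As an illustrative sketch of this first way, assuming the per-type models from the previous sketch, selecting the first target music piece learning model can be a simple lookup keyed by the user's chosen type; generate_target_notes is reused from the loop sketch above.

```python
# Sketch of the first way: the music type selected by the user picks the
# first target music piece learning model from the per-type models.
def generate_for_selected_type(models, first_music_type, first_melody, target_seconds):
    first_target_model = models[first_music_type]    # e.g. models["piano"]
    return generate_target_notes(first_target_model, first_melody, target_seconds)
```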
The second way is as follows: acquiring historical behavior data of the user, and extracting historical music information from the historical behavior data; determining a second music type corresponding to the historical music information, and acquiring a second target music piece learning model matched with the second music type from the music piece learning models; and generating the target music piece according to the first melody information and the second target music piece learning model.
In this way, the historical behavior data of the user may be the user's historical operation records on a mobile phone, an iPad, a computer, and the like, and may include historical music information on which the user has operated. The historical music information may include music the user listened to on a mobile phone, an iPad, and the like, music the user downloaded on the mobile phone or computer, music the user collected in music software, music the user recorded with karaoke software (such as Changba), and music the user shared with friends through chat software (e.g., WeChat).
Also, the music type corresponding to the history music information extracted from the history behavior data of the user may be referred to as a second music type, and for example, the second music type may include a simple chord type, a church type, or a piano type. Also, a music learning model matching the second music type may be referred to as a second music learning model.
Specifically, after the user inputs the first melody information, the historical behavior data of the user may be acquired first, and the historical music piece information may be extracted from the historical behavior data. Then, a second music type corresponding to the history music information may be determined. Next, a second target music learning model may be acquired from the plurality of music learning models according to the second music type. Then, the first melody information may be inputted into the second target music learning model, and the second target learning model may generate the target music belonging to the second music type based on the first melody information. Thus, the user can obtain a target music having a music type of the second music type. It should be noted that, in one possible implementation, the historical music used for training the second target music learning model may be historical music information extracted from historical behavior data of the user.
For example, after a user inputs a humming audio through a mobile phone, historical behavior data of the user in a music player may be obtained, and historical music information may be extracted from the historical behavior data, where the historical music information includes: the twenty-first piano concerto, the first piano concerto and the second piano concerto. Then, it may be determined that the music type corresponding to the history music information is the piano type, that is, the piano type may be the second music type. Next, a second target music learning model corresponding to the piano type may be acquired from the plurality of music learning models according to the piano type. Then, the audio hummed by the user himself may be inputted into the second target music learning model, and the second target learning model may generate a target music of piano type based on the audio hummed by the user himself.
Therefore, after the input audio of the user is acquired, the historical behavior data of the user can be acquired, the historical music information is extracted from the historical behavior data, the second music type corresponding to the historical music information is determined, and the second target music learning model matched with the music type is acquired from the music learning model. Next, a target music may be generated using the second target music learning model and the first melody information. Therefore, the second target music learning model can be determined from the plurality of music learning models by utilizing the historical music information corresponding to the user, and the target music meeting the personalized requirements of the user is generated by utilizing the second target music learning model, so that the generated target music has pertinence and personalization.
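As an illustrative sketch of this second way, assuming each historical piece in the behavior data carries a music type label, the second music type can be taken as the most frequent label and used to pick the second target music piece learning model; generate_target_notes is reused from the loop sketch above.

```python
# Sketch of the second way: infer the second music type from the historical
# music information and generate with the matching per-type model.
from collections import Counter


def generate_from_history(models, history, first_melody, target_seconds):
    """history: list of (piece_title, music_type) tuples from the user's behavior data."""
    second_music_type = Counter(t for _, t in history).most_common(1)[0][0]
    second_target_model = models[second_music_type]
    return generate_target_notes(second_target_model, first_melody, target_seconds)


# history = [("Piano Concerto No. 21", "piano"), ("Piano Concerto No. 1", "piano")]
# -> second_music_type == "piano", so the piano-type model generates the target piece.
```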
Exemplary device
Referring to fig. 5, there is shown an apparatus for generating a musical composition according to an embodiment of the present invention, the apparatus including: an acquisition unit 501 and a generation unit 502;
the acquiring unit 501 is configured to acquire first melody information in an input audio;
the generating unit 502 is configured to generate a target music piece including the first melody information and the second melody information matching the first melody information according to the first melody information and a music piece learning model.
Optionally, the apparatus further comprises: a training unit;
the obtaining unit 501 is further configured to obtain first historical melody information and second historical melody information matched with the first historical melody information;
the training unit is further configured to train the music piece learning model according to a corresponding relationship between the first historical melody information and the second historical melody information.
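As a rough illustration of training on the correspondence between first and second historical melody information, the following sketch counts note transitions across concatenated melody pairs. The bigram-counting "model" is an assumption made for illustration only; the embodiments do not prescribe a particular learning algorithm.

```python
# Minimal training sketch over (first historical melody, second historical melody) pairs.
from collections import defaultdict
from typing import Dict, List, Tuple


def train_music_model(pairs: List[Tuple[List[int], List[int]]]
                      ) -> Dict[int, Dict[int, int]]:
    """Count which note tends to follow which across first->second melody pairs."""
    transitions: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for first_melody, second_melody in pairs:
        full_melody = first_melody + second_melody   # the second melody continues the first
        for prev_note, next_note in zip(full_melody, full_melody[1:]):
            transitions[prev_note][next_note] += 1
    return {k: dict(v) for k, v in transitions.items()}


# Example: two historical pieces, each split into an opening phrase and its continuation.
training_pairs = [([60, 62, 64], [65, 67, 69]), ([60, 64, 67], [72, 67, 64])]
model = train_music_model(training_pairs)
print(model[60])   # notes observed to follow MIDI note 60, with counts
```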
Optionally, the generating unit 502 is further configured to obtain a first note string corresponding to the first melody information, input the first note string to the music learning model, and obtain a second note string generated by the music learning model; judging whether the second note string meets a preset condition or not; and if the second note string meets a preset condition, determining the target music based on the second note string.
Optionally, the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
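The note-string loop and its preset-condition check can be sketched as follows; the step() callable standing in for the music learning model and the fixed per-note durations are assumptions made for illustration.

```python
# Sketch: keep generating notes until the melody duration reaches the preset duration.
from typing import Callable, List, Tuple

Note = Tuple[int, float]   # (MIDI pitch, duration in seconds)


def generate_until_duration(first_notes: List[Note],
                            step: Callable[[List[Note]], List[Note]],
                            preset_duration: float) -> List[Note]:
    notes = list(first_notes)
    while sum(d for _, d in notes) < preset_duration:   # preset condition check
        notes.extend(step(notes))                        # the second note string grows
    return notes


def toy_step(notes: List[Note]) -> List[Note]:
    """Toy stand-in for the model: repeat the last note a major third higher."""
    return [(notes[-1][0] + 4, 0.5)]


target = generate_until_duration([(60, 0.5), (62, 0.5)], toy_step, preset_duration=5.0)
print(len(target), sum(d for _, d in target))
```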
Optionally, the generating unit 502 is further configured to search a plurality of preset primer melodies for a target primer melody matching the first melody information; inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
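One hypothetical way to search the preset primer melodies for a target primer melody matching the first melody information is a nearest-neighbor lookup over pitch sequences; the edit-distance metric below is an assumption, since the embodiments only require that the primer melody match the input.

```python
# Sketch: choose the preset primer melody closest to the first melody information.
from typing import Dict, List


def edit_distance(a: List[int], b: List[int]) -> int:
    """Classic edit distance between two pitch sequences."""
    dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return dp[-1][-1]


def find_target_primer(first_melody: List[int],
                       primers: Dict[str, List[int]]) -> str:
    """Return the name of the preset primer melody closest to the input melody."""
    return min(primers, key=lambda name: edit_distance(first_melody, primers[name]))


presets = {"primer_a": [60, 62, 64, 65], "primer_b": [67, 65, 64, 62]}
print(find_target_primer([60, 62, 64], presets))   # -> "primer_a"
```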
Optionally, the music piece learning model includes a plurality of models, and the training unit is further configured to obtain the first historical melody information and the second historical melody information of different music piece types; and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating unit 502 is further configured to determine a first target music learning model from the music learning models in response to a selection operation by a user; generating the target music piece according to the first melody information and the first target music piece learning model.
Optionally, the generating unit 502 is further configured to obtain historical behavior data of the user, and extract historical music information from the historical behavior data; determine a second music type corresponding to the historical music information, and acquire a second target music learning model matching the second music type from the plurality of music learning models; and generate the target music according to the first melody information and the second target music learning model.
Optionally, the obtaining unit 501 is further configured to obtain an input audio; if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio; extracting the first melody information from the midi format input audio.
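A rough sketch of the audio-format handling is given below: if the input audio is not in MIDI format, a pitch track is estimated and written out as MIDI, and the first melody information is then read back from the MIDI file. The use of librosa and pretty_midi, and the one-note-per-frame conversion, are assumptions for illustration; the embodiments do not name a specific conversion method.

```python
# Hedged sketch of converting input audio to MIDI and extracting melody information.
import numpy as np
import librosa
import pretty_midi


def audio_to_midi(audio_path: str, midi_path: str) -> None:
    y, sr = librosa.load(audio_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    hop_time = 512 / sr                               # pyin's default hop length
    midi = pretty_midi.PrettyMIDI()
    inst = pretty_midi.Instrument(program=0)
    for i, pitch in enumerate(f0):
        if voiced[i] and not np.isnan(pitch):         # keep only voiced frames
            note = int(round(librosa.hz_to_midi(pitch)))
            inst.notes.append(pretty_midi.Note(velocity=80, pitch=note,
                                               start=i * hop_time,
                                               end=(i + 1) * hop_time))
    midi.instruments.append(inst)
    midi.write(midi_path)


def extract_first_melody(midi_path: str) -> list:
    """Read back the MIDI file and return its pitch sequence as melody information."""
    midi = pretty_midi.PrettyMIDI(midi_path)
    return [note.pitch for inst in midi.instruments for note in inst.notes]
```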
Optionally, the apparatus further comprises: an editing unit;
the editing unit is used for responding to the editing operation of the user on the target music and editing the target music according to the editing operation; or, editing the target music according to a preset editing rule.
Optionally, the apparatus further comprises: a determination unit;
the determination unit is configured to take the target music piece as a history music piece in response to an operation of the target music piece by a user.
Referring to fig. 6, the apparatus 600 for generating a musical composition may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 606 provides power to the various components of the device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the open/closed status of the device 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor component 614 may also detect a change in the position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in the temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
An embodiment of the present invention provides an apparatus for generating a musical composition. The apparatus includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring first melody information in input audio;
and generating a target music piece according to the first melody information and a music piece learning model, wherein the target music piece comprises the first melody information and second melody information matched with the first melody information.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
acquiring first historical melody information and second historical melody information matched with the first historical melody information;
and training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
judging whether the second note string meets a preset condition or not;
if the second note string meets a preset condition, determining a target music piece based on the second note string;
the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
searching a target primer melody matched with the first melody information in a plurality of preset primer melodies;
inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
the music learning model comprises a plurality of models, and the training of the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information comprises the following steps:
acquiring the first historical melody information and the second historical melody information of different music types;
and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
the generating of the target music according to the first melody information and the music learning model includes:
determining a first target music learning model from the music learning models in response to a selection operation by a user;
generating the target music piece according to the first melody information and the first target music piece learning model.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
the generating of the target music according to the first melody information and the music learning model includes:
acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
acquiring an input audio;
if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio;
extracting the first melody information from the midi format input audio.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
responding to the editing operation of the user on the target music, and editing the target music according to the editing operation;
or,
and editing the target music according to a preset editing rule.
Optionally, the processor is further configured to execute the one or more programs including instructions for:
the target music piece is treated as a history music piece in response to an operation of the target music piece by the user.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, such as the memory 604, comprising instructions executable by the processor 620 of the device 600 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a method of generating a musical composition, the method comprising:
acquiring first melody information in input audio;
and generating a target music piece according to the first melody information and a music piece learning model, wherein the target music piece comprises the first melody information and second melody information matched with the first melody information.
Optionally, the method further includes:
acquiring first historical melody information and second historical melody information matched with the first historical melody information;
and training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
judging whether the second note string meets a preset condition or not;
and if the second note string meets a preset condition, determining the target music based on the second note string.
Optionally, the preset condition is that the melody duration corresponding to the second note string reaches the preset duration of the target music.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
searching a target primer melody matched with the first melody information in a plurality of preset primer melodies;
inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
Optionally, the music piece learning model includes a plurality of models, and the training of the music piece learning model according to the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
determining a first target music learning model from the music learning models in response to a selection operation by a user;
generating the target music piece according to the first melody information and the first target music piece learning model.
Optionally, the generating a target music piece according to the first melody information and a music piece learning model includes:
acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models; and generating the target music according to the first melody information and the second target music learning model.
Optionally, the acquiring the first melody information in the input audio includes:
acquiring an input audio;
if the format of the input audio is not the midi format, converting the input audio into the midi format to obtain the midi format input audio;
extracting the first melody information from the midi format input audio.
Optionally, the method further includes:
responding to the editing operation of the user on the target music, and editing the target music according to the editing operation;
or,
and editing the target music according to a preset editing rule.
Optionally, the method further includes:
the target music piece is treated as a history music piece in response to an operation of the target music piece by the user.
Fig. 7 is a schematic structural diagram of a server in an embodiment of the present invention. The server 700 may vary significantly depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors) and memory 732, one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. Memory 732 and storage medium 730 may be, among other things, transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 722 may be configured to communicate with the storage medium 730, and execute a series of instruction operations in the storage medium 730 on the server 700.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input-output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method of generating a musical composition, comprising:
acquiring first melody information in input audio;
and generating a target music piece according to the first melody information and a music piece learning model, wherein the target music piece comprises the first melody information and second melody information matched with the first melody information.
2. The method of claim 1, further comprising:
acquiring first historical melody information and second historical melody information matched with the first historical melody information;
and training the music learning model according to the corresponding relation between the first historical melody information and the second historical melody information.
3. The method according to claim 1, wherein the generating a target music piece according to the first melody information and a music piece learning model comprises:
acquiring a first note string corresponding to the first melody information, and inputting the first note string into the music learning model to obtain a second note string generated by the music learning model;
judging whether the second note string meets a preset condition or not;
and if the second note string meets a preset condition, determining the target music based on the second note string.
4. The method according to claim 1, wherein the generating a target music piece according to the first melody information and a music piece learning model comprises:
searching a target primer melody matched with the first melody information in a plurality of preset primer melodies;
inputting the target primer melody into the music learning model to obtain the target music generated by the music learning model;
wherein the music piece learning model is trained in advance based on the corresponding relation between the primer melody and the complete melody information in the historical music pieces.
5. The method according to claim 2, wherein the music piece learning model includes a plurality of pieces, and the training of the music piece learning model based on the correspondence between the first historical melody information and the second historical melody information includes:
acquiring the first historical melody information and the second historical melody information of different music types;
and training to obtain the music learning models of different music types according to the corresponding relation among the music types, the first historical melody information and the second historical melody information.
6. The method according to claim 5, wherein the generating a target music piece according to the first melody information and a music piece learning model comprises:
determining a first target music learning model from the music learning models in response to a selection operation by a user;
generating the target music piece according to the first melody information and the first target music piece learning model.
7. The method according to claim 5, wherein the generating a target music piece according to the first melody information and a music piece learning model comprises:
acquiring historical behavior data of a user, and extracting historical music information from the historical behavior data;
determining a second music type corresponding to the historical music information, and acquiring a second target music learning model matching the second music type from the plurality of music learning models;
and generating the target music according to the first melody information and the second target music learning model.
8. An apparatus for generating a musical composition, comprising: an acquisition unit and a generation unit;
the acquisition unit is used for acquiring first melody information in input audio;
the generation unit is used for generating a target music according to the first melody information and a music learning model, wherein the target music comprises the first melody information and second melody information matched with the first melody information.
9. An apparatus for generating a musical composition, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method of generating a musical composition as claimed in any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform a method of generating a musical composition as claimed in any one of claims 1 to 7.
CN201810689003.9A 2018-06-28 2018-06-28 Method, device and equipment for generating music Pending CN110660375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810689003.9A CN110660375A (en) 2018-06-28 2018-06-28 Method, device and equipment for generating music

Publications (1)

Publication Number Publication Date
CN110660375A (en) 2020-01-07

Family

ID=69027455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810689003.9A Pending CN110660375A (en) 2018-06-28 2018-06-28 Method, device and equipment for generating music

Country Status (1)

Country Link
CN (1) CN110660375A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100162879A1 (en) * 2008-12-29 2010-07-01 International Business Machines Corporation Automated generation of a song for process learning
CN105183787A (en) * 2015-03-24 2015-12-23 北京搜狗科技发展有限公司 Information input method and apparatus
US20170103740A1 (en) * 2015-10-12 2017-04-13 International Business Machines Corporation Cognitive music engine using unsupervised learning
US9799312B1 (en) * 2016-06-10 2017-10-24 International Business Machines Corporation Composing music using foresight and planning
CN106652984A (en) * 2016-10-11 2017-05-10 张文铂 Automatic song creation method via computer
US9792889B1 (en) * 2016-11-03 2017-10-17 International Business Machines Corporation Music modeling
CN107045867A (en) * 2017-03-22 2017-08-15 科大讯飞股份有限公司 Automatic composing method, device and terminal device
CN107123415A (en) * 2017-05-04 2017-09-01 吴振国 A kind of automatic music method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763910A (en) * 2020-11-25 2021-12-07 北京沃东天骏信息技术有限公司 Music generation method and device
CN112951183A (en) * 2021-02-25 2021-06-11 西华大学 Music automatic generation and evaluation method based on deep learning
CN113096624A (en) * 2021-03-24 2021-07-09 平安科技(深圳)有限公司 Method, device, equipment and storage medium for automatically creating symphony music
CN113096624B (en) * 2021-03-24 2023-07-25 平安科技(深圳)有限公司 Automatic creation method, device, equipment and storage medium for symphony music
CN116160459A (en) * 2022-12-30 2023-05-26 广州市第二中学 Music robot control method and system based on machine learning algorithm
CN116160459B (en) * 2022-12-30 2023-09-29 广州市第二中学 Music robot control method and system based on machine learning algorithm

Similar Documents

Publication Publication Date Title
CN107705783B (en) Voice synthesis method and device
CN110634483B (en) Man-machine interaction method and device, electronic equipment and storage medium
CN110660375A (en) Method, device and equipment for generating music
CN107644646B (en) Voice processing method and device for voice processing
US20110126103A1 (en) Method and system for a "karaoke collage"
WO2020042827A1 (en) Network work sharing method and apparatus, and server and storage medium
CN105335414B (en) Music recommendation method and device and terminal
CN112445395B (en) Music piece selection method, device, equipment and storage medium
US20140358566A1 (en) Methods and devices for audio processing
CN110430326B (en) Ring editing method and device, mobile terminal and storage medium
CN112241397B (en) Sharing method and device of multimedia files, electronic equipment and readable storage medium
CN113409764B (en) Speech synthesis method and device for speech synthesis
CN107994879A (en) Volume control method and device
CN110718239A (en) Audio processing method and device, electronic equipment and storage medium
WO2023051246A1 (en) Video recording method and apparatus, device, and storage medium
CN113707113B (en) User singing voice repairing method and device and electronic equipment
CN112988956A (en) Method and device for automatically generating conversation and method and device for detecting information recommendation effect
CN115065840A (en) Information processing method and device, electronic equipment and storage medium
CN114356068B (en) Data processing method and device and electronic equipment
TW202224385A (en) Method and system for generating emotion based multimedia content
CN113792178A (en) Song generation method and device, electronic equipment and storage medium
CN112699269A (en) Lyric display method, device, electronic equipment and computer readable storage medium
CN113674731A (en) Speech synthesis processing method, apparatus and medium
CN113407275A (en) Audio editing method, device, equipment and readable storage medium
CN111950266A (en) Data processing method and device and data processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination