WO2024075422A1

WO2024075422A1 - Musical composition creation method and program

Info

Publication number: WO2024075422A1
Application number: PCT/JP2023/030524
Authority: WO
Inventors: 大樹下薗; 亮佑石浦; 拓上田; 俊亮沼野; 美咲上原
Original assignee: ヤマハ株式会社 (Yamaha Corporation)
Priority date: 2022-10-06
Filing date: 2023-08-24
Publication date: 2024-04-11
Also published as: JP2024055146A

Abstract

According to one embodiment, provided is a musical composition creation method that comprises: extracting, from input content, feature information included in the content; determining at least one attribute corresponding to the extracted feature information from among a plurality of attributes; determining an accompaniment pattern and a chord progression pattern corresponding to the determined attribute; and creating a musical composition based on the determined accompaniment pattern and chord progression pattern.

Description

Music composition generating method and program

The present invention relates to a music generation method and program.

In recent years, various methods have been proposed for automatically generating music. For example, there is growing demand for a method of automatically generating background music to accompany content, such as images, when that content is played on social media.

Patent Document 1: JP 2016-161774 A

Patent Document 1 discloses a method for generating music data for multiple different songs in response to instructions from a user. However, with the technology disclosed in Patent Document 1, music data is generated based solely on the user's subjective opinion of the content, so music that matches the content may not be generated.

One of the objectives of the present invention is to provide a music generation method that can automatically generate music that matches content.

According to one embodiment of the present invention, a method for generating music is provided, which includes extracting feature information contained in input content from the content, determining at least one attribute from among a plurality of attributes that corresponds to the extracted feature information, determining an accompaniment pattern and a chord progression pattern that correspond to the determined attribute, and generating a music piece based on the determined accompaniment pattern and chord progression pattern.

The present invention provides a music generation method that can automatically generate music that matches content.

FIG. 1 is a diagram illustrating a music production system according to an embodiment.
FIG. 2 is a block diagram showing the configuration of a communication terminal according to an embodiment.
FIG. 3 is a block diagram showing the configuration of a storage unit of a server according to an embodiment.
FIG. 4 is a block diagram showing the functional configuration of a control unit of the server according to an embodiment.
FIG. 5 is an example of a table, output from a trained model, listing the first attributes that constitute a first attribute group.
FIG. 6 is an example of a table, output from the trained model, listing the second attributes that constitute a second attribute group.
FIG. 7 is a table showing an example of genre information, style information, and score information corresponding to a given song.
FIG. 8 is a table showing the score information total value for each genre for the given song.
FIG. 9 is a table showing an example of image labels corresponding to the given song.
FIG. 10 is a table illustrating an example of a situation table according to an embodiment.
FIG. 11 is a schematic diagram illustrating an example of a user interface according to an embodiment.
FIG. 12 is a schematic diagram illustrating another example of a user interface according to an embodiment.
FIG. 13 is a flowchart illustrating a music generation process according to an embodiment.

Below, one embodiment of the present invention will be described in detail with reference to the drawings. The embodiments described below are merely examples, and the present invention should not be interpreted as being limited to these embodiments. In the drawings referred to in this embodiment, identical parts or parts having similar functions are given the same or similar reference signs (a number alone, or a number followed by a letter such as A or B), and repeated explanations may be omitted.

[Music Generation System]
FIG. 1 is a diagram showing a music production system according to one embodiment. The music production system 1000 includes one or more communication terminals 1 and a server 2 connected to a network NW such as the Internet. Each communication terminal 1 is, for example, a smartphone, a tablet computer, a laptop computer, or a desktop computer, and connects to the network NW to exchange data with other devices.

The server 2 receives content from the communication terminal 1 via the network NW, generates music according to the content, and provides it to the communication terminal 1. The communication terminal 1 specifies information for generating music to be played along with the content. The communication terminal 1 can also play the music generated by the server 2 along with the content. The server 2 analyzes the content using a trained model obtained by machine learning, and generates music based on the analysis results. The communication terminal 1 and the server 2 are described below.

[Communication terminal]
FIG. 2 is a block diagram showing the configuration of the communication terminal 1. The communication terminal 1 includes a control unit 11, a storage unit 12, a communication unit 13, a display unit 14, an operation unit 15, and a speaker 16. These components are connected via a bus 17.

The control unit 11 includes an arithmetic processing circuit such as a CPU (processor). The control unit 11 uses the CPU to execute a program stored in the storage unit 12, thereby realizing functions such as music selection processing and music playback processing. Some or all of these functions may be implemented in hardware rather than by software executing a program. In addition to these processing functions, the control unit 11 also controls each part of the communication terminal 1.

The storage unit 12 is a storage device such as a non-volatile memory or a hard disk. The storage unit 12 includes a storage area for storing application programs for implementing various functions, such as the programs described above, and a storage area for storing information used for each process executed by the communication terminal 1, such as a music selection process and a music playback process. The program may be provided in a state stored in a computer-readable recording medium, such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory, as long as it is executable by a computer. In this case, the communication terminal 1 may be provided with a device for reading the recording medium. The program may also be downloaded via a network.

The storage unit 12 also includes a storage area for storing content. The content includes at least one of an image and text (a character string). The image may be a video or a still image, and may be captured with a camera (not shown) of the communication terminal 1 or downloaded from outside via the communication unit 13. The text may be input via the operation unit 15 of the communication terminal 1 or downloaded from outside via the communication unit 13.

The communication unit 13 connects to the network NW shown in FIG. 1 and transmits and receives information to and from the external server 2 under the control of the control unit 11.

The display unit 14 is a display device such as a liquid crystal display or an organic EL display, and displays images (moving or still images) based on the control of the control unit 11. The operation unit 15 outputs to the control unit 11 a signal corresponding to an operation input by the user via a touch panel, operation buttons, etc. displayed on the display unit 14. The operation buttons may be any operator that accepts user instructions, including, for example, a power switch or cursor keys. The speaker 16 plays music data obtained from the server 2 via the network NW shown in FIG. 1.

The communication terminal 1 transmits content to the server 2 from the communication unit 13 via the network NW. The content transmitted to the server 2 is content to which the user wishes to add BGM (Background Music). The BGM corresponds to a piece of music that the user wishes to play together with the content.

[Server]
The configuration of the server 2 will be described with reference to FIG. 1. The server 2 includes a control unit 21, a storage unit 23, and a communication unit 25.

The control unit 21 includes an arithmetic processing circuit such as a CPU (processor). The control unit 21 uses the CPU to execute a program stored in the storage unit 23, thereby realizing a function for performing music generation processing. Some or all of this function may be implemented in hardware rather than by software executing a program.

The storage unit 23 includes a storage device such as a non-volatile memory. FIG. 3 is a block diagram showing the configuration of the storage unit 23. The storage unit 23 stores a program 231, a trained model 233, a music database 235, and a situation table 237.

The program 231 includes a program used for each process executed by the server 2, such as a music generation process. The program 231 may be provided to the server 2 in a state stored in a computer-readable recording medium, such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory, as long as it is executable by a computer. In this case, the server 2 may be provided with a device for reading the recording medium. The program 231 may be downloaded via the communication unit 25.

The trained model 233 is generated by machine learning and provided to the server 2. The trained model 233 determines the attributes of content provided from the communication terminal 1 through computation using a neural network. Specifically, the trained model 233 has a neural network that was trained in advance on a computer such as an external server, using training data to learn the correlation between content feature information and attributes. In this embodiment, the trained model 233 uses Word2vec, which converts words into N-dimensional vectors. The trained model 233 may instead be stored in another external device connected via the network NW shown in FIG. 1, in which case the server 2 accesses the trained model 233 via the network NW. The feature information and attributes of the content are described later.

The song database 235 stores song information about multiple songs. The song information includes genre information, style information, score information, image labels, and chord progression data that correspond to multiple songs and are associated with each other. The song database 235 will be described in detail later.

The situation table 237 is a table that associates content attributes with music genres. Details of the situation table 237 will be described later.

Returning to FIG. 1, the communication unit 25 of the server 2 includes a communication module, is connected to the network NW, and transmits and receives various data to and from external devices such as the communication terminal 1.

[Music generation processing]
Next, a description will be given of a music generating process executed by the control unit 21 of the server 2. The music generating process is started in response to a request from the communication terminal 1, for example.

FIG. 4 is a block diagram showing the functional configuration of the control unit 21 of the server 2. The control unit 21 includes a feature information extraction unit 211, an attribute determination unit 213, a music determination unit 215, a music provision unit 217, and a music generation unit 219.

The feature information extraction unit 211 acquires content from the communication terminal 1 via the communication unit 25 and extracts feature information from the acquired content. If the content is a video, the feature information extraction unit 211 converts the video into a predetermined number of still images and extracts feature information from those still images.
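The document does not say how the video is reduced to "a predetermined number of still images". A minimal sketch, assuming the stills are taken at evenly spaced positions (an assumption, not something the text specifies), picks the centre frame of each of k equal-length segments; `sample_frame_indices` is a hypothetical helper.

```python
def sample_frame_indices(total_frames, num_stills):
    """Return up to num_stills frame indices spread evenly over a video.

    Each index is the centre frame of one of num_stills equal-length segments.
    Evenly spaced sampling is an assumption made for illustration.
    """
    if total_frames <= 0 or num_stills <= 0:
        return []
    # Never ask for more stills than the video has frames.
    num_stills = min(num_stills, total_frames)
    step = total_frames / num_stills
    return [int(step * i + step / 2) for i in range(num_stills)]


print(sample_frame_indices(300, 5))  # [30, 90, 150, 210, 270]
```

With OpenCV, each returned index could be passed to a `cv2.VideoCapture` object via `set(cv2.CAP_PROP_POS_FRAMES, index)` before calling `read()` to grab the still, although seeking precision depends on the codec.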

Feature information represents features contained in the content. When the content is a sentence, the features are one or more words in the sentence, such as nouns, adjectives, and verbs, and the feature information extraction unit 211 extracts them by morphological analysis of the sentence. For example, from the sentence "A bright balcony by the seaside", the morphemes "seaside", "balcony", and "bright" may be extracted as feature information. When the content is an image, the feature information extraction unit 211 extracts the feature information through image processing and image analysis using a known technique (for example, one based on OpenCV). For example, from an image of a beach on a sunny day, objects such as "sea", "sky", and "sand" may be extracted. The feature information extraction unit 211 provides the feature information of the content to the attribute determination unit 213.
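Morphological analysis of Japanese text normally relies on a dedicated analyzer (MeCab is one widely used example). As a rough English-only stand-in, the sketch below lowercases a sentence, splits it into words, and drops a small hypothetical stopword list, keeping content words in order. It does no part-of-speech tagging, so it approximates the text-extraction step described above rather than implementing it.

```python
import re

# Hypothetical stopword list; a morphological analyzer would instead keep
# words by part of speech (nouns, adjectives, verbs).
STOPWORDS = {"a", "an", "the", "by", "of", "on", "in", "at", "to", "and", "is", "with"}


def extract_keywords(sentence):
    """Return the sentence's non-stopword words, lowercased, in order, without duplicates."""
    keywords = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


print(extract_keywords("A bright balcony by the seaside"))  # ['bright', 'balcony', 'seaside']
```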

The attribute determination unit 213 acquires feature information of the content from the feature information extraction unit 211, and determines the attributes of the acquired feature information using the trained model 233. The attributes include a first attribute group and a second attribute group. The first attribute group is composed of attributes (first attributes) classified by impression, and the second attribute group is composed of attributes (second attributes) classified by situation.

FIG. 5 is an example of a table (hereinafter, the impression list) listing the first attributes that constitute the first attribute group, output from the trained model 233. FIG. 6 is an example of a table (hereinafter, the situation list) listing the second attributes that constitute the second attribute group, output from the trained model 233. The impression list shown in FIG. 5 includes 29 attributes (first attributes) classified by impression, such as "soft," "elegant," "solemn," and "calm." The situation list shown in FIG. 6 includes 24 attributes (second attributes) classified by situation, such as "watching sports," "clear weather," "station," "southern country," and "movie theater." Neither the number of attributes nor the specific attributes in either list are limited to those shown in FIGS. 5 and 6.

The attribute determination unit 213 inputs the feature information to the input layer of the trained model 233. Through computation in its intermediate layers, the trained model 233 outputs from its output layer one attribute (first attribute) selected from the impression list shown in FIG. 5 and one attribute (second attribute) selected from the situation list shown in FIG. 6. The attribute determination unit 213 obtains the first and second attributes output from the trained model 233, determines them as the attributes of the content, and provides them to the music determination unit 215.
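The trained model 233 is a neural network trained on pairs of feature information and attributes, and its architecture and weights are not given. To illustrate only the input/output contract (feature words in, one impression attribute and one situation attribute out), the sketch below substitutes a different and much simpler technique: it averages the feature words' vectors and picks, from each list, the attribute whose vector is most cosine-similar. All vectors are made-up two-dimensional toy values rather than Word2vec output, and `determine_attributes` is a hypothetical name.

```python
import math

# Toy 2-dimensional vectors standing in for N-dimensional word embeddings.
WORD_VECTORS = {"seaside": (1.0, 0.0), "balcony": (0.0, 1.0), "bright": (1.0, 0.0)}
IMPRESSION_VECTORS = {"calm": (0.0, 1.0), "elegant": (1.0, 0.0)}
SITUATION_VECTORS = {"living room": (0.2, 1.0), "Tropical": (1.0, 0.3)}


def mean_vector(vectors):
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def nearest(query, candidates):
    """Name of the candidate whose vector is most cosine-similar to query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


def determine_attributes(words):
    """Return (first attribute, second attribute), or None if no word is known."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return None
    query = mean_vector(known)
    return nearest(query, IMPRESSION_VECTORS), nearest(query, SITUATION_VECTORS)


print(determine_attributes(["seaside", "balcony", "bright"]))  # ('elegant', 'Tropical')
```

A nearest-neighbour lookup like this needs no training, which is why it suits a toy illustration; the learned model described in the text can capture correlations that plain vector similarity cannot.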

The music determination unit 215 determines music candidates that correspond to the acquired attributes. The music determination unit 215 determines music candidates based on the acquired attributes by referring to the music database 235 and the situation table 237.

The song database 235 will be described below. Song information relating to multiple songs prepared in advance is registered in the song database 235. The song information includes genre information, style information, score information, image label, and chord progression data that are associated with each other. The song information is registered in association with each of the multiple songs. For example, if there are 500 songs registered in the song database 235, song information corresponding to each of the 500 songs is registered in association with the corresponding song. Note that the number of songs registered in the song database 235 is not limited to 500. The song information will be described below.

Chord progression data indicates a chord progression pattern in which the chords that make up a song are arranged in chronological order, for example "CM7-Dm7-Em7-...". The chords may be arranged per predetermined unit period (for example, one measure or one beat), or simply in order without regard to the unit period. For example, if chords are arranged per measure and the first chord in the above example lasts two measures, the chord progression data is written as "CM7-CM7-Dm7-...". If the number of measures is disregarded, it is written as "CM7-Dm7-..." as in the above example. The chord progression data may correspond to only a section of each song that includes the chorus; however, the present invention is not limited to this, and the chord progression data may cover the entire song.
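The two notations for chord progression data differ only in whether repeated chords are written out per unit period. A minimal sketch, assuming chords are separated by "-" as in the examples above and that no chord name itself contains "-" (notations such as "Cm7-5" would need a different delimiter), converts the per-measure form into the form that ignores unit periods by dropping consecutive repeats. The function names are hypothetical.

```python
def parse_progression(text):
    """Split chord progression data such as "CM7-Dm7-Em7" into chord names."""
    return [chord.strip() for chord in text.split("-") if chord.strip()]


def collapse_repeats(chords):
    """Drop consecutive repeats, turning per-unit-period data into the unit-free form."""
    collapsed = []
    for chord in chords:
        if not collapsed or collapsed[-1] != chord:
            collapsed.append(chord)
    return collapsed


per_measure = parse_progression("CM7-CM7-Dm7-Em7")
print("-".join(collapse_repeats(per_measure)))  # CM7-Dm7-Em7
```

The conversion is one-way: once repeats are collapsed, the durations they encoded are lost, so a system that needs timing would keep the per-unit form.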

Genre information is information indicating the genre of a song, for example, "pop", "rock", "Latin", etc. Style information is information indicating a more detailed classification within the genre, that is, the style (accompaniment pattern). For example, if the genre information of a song is "pop" indicating pop music, the style information of the song includes, for example, "80s pop", "easy pop", "ballad", etc. Also, if the genre information of a song is "Latin" indicating Latin music, the style information of the song includes, for example, "reggae", "bossa nova", "tango", "samba", etc. Score information is information indicating the probability that the song is of a certain genre and style. The types of genre information and style information are predetermined. The genre information, style information, and score information for each song can be obtained by analyzing the song using known music analysis technology.

FIG. 7 is a table showing an example of genre information, style information, and score information obtained by analyzing a specific song (hereinafter, song A). Referring to FIG. 7, the genre and style information of song A are listed from 1st to 10th in descending order of score. For example, the score information for genre "Pop" and style "Easy Pop" is "0.218", meaning that the analysis of song A gives a probability of 0.218 that its genre is "pop" and its style is "easy pop". The score information for genre "Pop" and style "80' Pop" is "0.195", meaning a probability of 0.195 that song A is "pop" in the "80s pop" style. The score information for genre "Rock" and style "80' Pop Rock" is "0.102", meaning a probability of 0.102 that song A is "rock" in the "80s pop rock" style.

As shown in FIG. 7, the largest score for song A, 0.218, is ranked 1st, indicating that "pop" in the "easy pop" style is the most probable genre and style for song A. The 2nd-ranked score, 0.195, corresponds to "pop" in the "80s pop" style, the second most probable combination. Although FIG. 7 lists ranks 1 to 10 for song A, the number of genre, style, and score entries registered for each song is not limited to 10. For example, score information may be calculated and registered for each song for all predetermined types of genre information and style information.

In addition, the song database 235 registers the sum of score information for each genre information for each song. FIG. 8 is a table showing the sum of score information for each genre information of song A shown in FIG. 7. In FIG. 8, the sum of score information is a value obtained by summing up the score information from 1st place to a specified score ranking for each predetermined genre. As an example, FIG. 8 shows the sum of score information when the score information for 1st place to 10th place is summed up for each genre.

Referring to FIG. 8, when the genre is "Pop", the score information total value is 0.413. This is the total value of the score information of the "Easy Pop" style and the "80' Pop" style corresponding to the "Pop" genre in FIG. 7 (i.e., 0.218+0.195=0.413). As described above, such a score information total value is calculated for each predetermined genre. For example, in the case of song A, as shown in FIG. 8, the score information total value corresponding to the "Rock" genre may be 0.102, and the score information total value corresponding to the "Latin" genre may be 0. When the score information total value is 0, it indicates that the "Latin" genre is not included in the score rankings 1 to 10 shown in FIG. 7 for song A. In other words, it means that there is a low possibility that song A falls into the "Latin" genre. On the other hand, the genre with the largest total score information value indicates that song A is most likely to belong to that genre. This concludes the explanation of genre information, style information, and score information.
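The per-genre totals in FIG. 8 are sums of the ranked scores in FIG. 7, grouped by genre. The sketch below computes them from (genre, style, score) rows. Only three of song A's ten rows are quoted in the text, so it uses just those three and its totals are illustrative; the `genre_totals` name and its `top_k` parameter are assumptions.

```python
from collections import defaultdict


def genre_totals(ranked, top_k=10):
    """Sum score information per genre over the top_k highest-scoring rows.

    ranked: iterable of (genre, style, score) tuples.
    Genres absent from the top_k rows are missing from the result,
    which corresponds to a total of 0.
    """
    top_rows = sorted(ranked, key=lambda row: row[2], reverse=True)[:top_k]
    totals = defaultdict(float)
    for genre, _style, score in top_rows:
        totals[genre] += score
    return dict(totals)


# The three rows of FIG. 7 quoted in the text (the other seven are not given).
SONG_A_ROWS = [
    ("Pop", "Easy Pop", 0.218),
    ("Pop", "80' Pop", 0.195),
    ("Rock", "80' Pop Rock", 0.102),
]

totals = genre_totals(SONG_A_ROWS)
print(round(totals["Pop"], 3))   # 0.413
print(totals.get("Latin", 0.0))  # 0.0
```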

The image labels included in the song information are explained below. Image labels are a numerical representation of the impression of a song. The image label for each song is determined in advance by two or more experts and registered in association with the corresponding song.

FIG. 9 is a table showing an example of the image labels corresponding to song A. In FIG. 9, the number of labels assigned is the value given to song A by the experts who listened to it and evaluated it for each of the 29 attributes in the impression list shown in FIG. 5. The experts may evaluate the song on a multi-level scale for each attribute. For example, for a given attribute, the value "2" may be assigned if the impression is particularly strong, "0" if it is particularly weak, and "1" if it is intermediate (neither particularly strong nor particularly weak). Each value may be the average of the values assigned by two or more experts for that attribute, rounded to the nearest integer.

Referring to Figure 9, the number of labels given corresponding to the attribute "soft" is "1". This indicates that when the experts listened to song A, they did not get a strong "soft" impression of song A, but they did not get a particularly weak impression either. Also, referring to Figure 9, the number of labels given corresponding to the attribute "elegant" is "2". This indicates that when the experts listened to song A, they got a strong "elegant" impression of song A. Also, referring to Figure 9, the number of labels given corresponding to the attribute "solemn" is "0". This indicates that when the experts listened to song A, they got a particularly weak "solemn" impression of song A. In other words, this indicates that the experts did not get a "solemn" impression from song A. This concludes the explanation of image labels.
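The averaging step for image labels can be sketched as follows, assuming "rounded" means conventional round-half-up. Python's built-in `round` rounds halves to the nearest even integer (`round(0.5)` is 0), which is why it is not used here. `image_label` is a hypothetical helper.

```python
import math


def image_label(ratings):
    """Average expert ratings (each 0, 1, or 2) and round half up to an integer."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    mean = sum(ratings) / len(ratings)
    # math.floor(x + 0.5) rounds halves upward; adequate for non-negative means.
    return math.floor(mean + 0.5)


print(image_label([2, 2, 1]))  # 2
print(image_label([0, 1]))     # 1
print(image_label([0, 0, 1]))  # 0
```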

Returning to FIG. 4, the explanation will be continued. As described above, the music determination unit 215 refers to the music database 235 and the situation table 237 to determine candidate music corresponding to the attributes of the acquired content. The music determination unit 215 acquires the attributes of the feature information extracted from the content from the attribute determination unit 213. As described above, the attributes include a first attribute and a second attribute.

The song determination unit 215 determines a score corresponding to the acquired attributes for each of the multiple songs registered in the song database 235. The score is calculated using the following procedure.

First, the music determination unit 215 refers to the situation table 237 to determine the genre of music that corresponds to the second attribute included in the acquired attributes. FIG. 10 is a table showing an example of the situation table 237. The situations registered in the situation table 237 correspond to the 24 attributes shown in the situation list shown in FIG. 6. In the situation table 237, each situation is associated with a predetermined music genre.

Referring to FIG. 10, for example, when the situation is "living room", the genre of the associated music is "Pop". Also, when the situation is "Tropical", the genre of the associated music is "Latin". Below, an example will be described in which the second attribute included in the acquired attributes is "living room" and the first attribute is "elegant". The music determination unit 215 refers to the situation table 237 and determines that the genre of the music corresponding to the situation "living room" is "Pop".

Next, the song determination unit 215 refers to the song database 235 and obtains the score information total value corresponding to the determined genre. If the determined genre is "Pop", the song determination unit 215 obtains, for each song registered in the song database 235, the score information total value corresponding to the "Pop" genre. For example, for song A, the score information total value corresponding to the "Pop" genre is "0.413" (see FIG. 8).

Next, the song determination unit 215 refers to the song database 235 to obtain the number of labels assigned corresponding to the obtained first attribute. If the obtained first attribute is "elegant", the song determination unit 215 refers to the image label associated with each song to obtain the number of labels assigned corresponding to the first attribute of each song. For example, in the case of song A, the number of labels assigned corresponding to "elegant" is "2" (see Figure 9). The song determination unit 215 obtains the number of labels assigned for each song registered in the song database 235.

Next, the song determination unit 215 calculates a score corresponding to the acquired attribute for each of the multiple songs registered in the song database 235 by multiplying the acquired score information total value by the acquired number of assigned labels. For example, if the second attribute included in the acquired attributes is "living room" and the first attribute is "elegant", the score of song A is 0.413 x 2 ("total score information value" x "number of assigned labels") = 0.826. The song determination unit 215 calculates a score corresponding to the acquired attribute for all songs registered in the song database 235. If there are 500 songs registered in the song database 235, scores corresponding to the acquired attributes are calculated for all 500 songs.
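The scoring rule above is a lookup followed by a product: map the second attribute to a genre through the situation table, then multiply each song's score information total for that genre by its label count for the first attribute. The sketch below applies that rule to a toy database. Only song A's figures ("Pop" total 0.413, "elegant" label 2) come from the text; the second song and the two-row situation table excerpt are hypothetical.

```python
# Excerpt of the situation table (FIG. 10); only these two rows are quoted in the text.
SITUATION_TO_GENRE = {"living room": "Pop", "Tropical": "Latin"}

# Toy song database: per-genre score totals and image labels per song.
SONG_DB = {
    "song A": {
        "genre_totals": {"Pop": 0.413, "Rock": 0.102},
        "image_labels": {"soft": 1, "elegant": 2, "solemn": 0},
    },
    "song B": {  # hypothetical second song
        "genre_totals": {"Pop": 0.6, "Latin": 0.3},
        "image_labels": {"soft": 2, "elegant": 1, "solemn": 0},
    },
}


def score_songs(db, situation, impression):
    """Score every song as (genre score total) x (labels assigned for the impression)."""
    genre = SITUATION_TO_GENRE[situation]
    return {
        name: info["genre_totals"].get(genre, 0.0) * info["image_labels"].get(impression, 0)
        for name, info in db.items()
    }


scores = score_songs(SONG_DB, "living room", "elegant")
print(round(scores["song A"], 3))  # 0.826
print(round(scores["song B"], 3))  # 0.6
```

Treating a genre missing from a song's totals as 0 mirrors FIG. 8, where a total of 0 means the genre did not appear in the song's top-ranked entries.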

The song determination unit 215 selects song candidates corresponding to the content attribute from among multiple songs registered in the song database 235 based on the calculated score value. Specifically, the song determination unit 215 may determine the first to nth songs in descending order of the calculated score value as song candidates corresponding to the content attribute. Here, n is an arbitrary integer between 1 and 500, and may be, for example, 20. The song determination unit 215 provides song information corresponding to each of the songs determined as song candidates corresponding to the content attribute to the song provision unit 217. The song information provided to the song provision unit 217 includes at least one of chord progression data and style information.
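Selecting candidates amounts to sorting by descending score and keeping the first n. A minimal sketch; breaking ties by song name is an assumption, since the text does not say how equal scores are ordered, and `top_candidates` is a hypothetical name.

```python
def top_candidates(scores, n=20):
    """Return up to n song names in descending order of score.

    Ties are broken alphabetically by name so the result is deterministic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:n]


example = {"song A": 0.826, "song B": 0.6, "song C": 0.9, "song D": 0.0}
print(top_candidates(example, n=2))  # ['song C', 'song A']
```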

Specifically, the music determination unit 215 refers to the music database 235 and the situation table 237 to obtain style information corresponding to each music piece. As described above, the music determination unit 215 refers to the situation table 237 to obtain a genre corresponding to the second attribute included in the attributes of the content. The music determination unit 215 refers to the music database 235 and determines, among the style information corresponding to the obtained genres, the style information having the largest score information value as the style information to be provided to the music provision unit 217. The music determination unit 215 determines the style information to be provided to the music provision unit 217 for each music piece determined as a candidate for a music piece corresponding to the attributes of the content. For example, a case will be described in which a song A is included in the candidates for a music piece corresponding to the attributes of the content. As described with reference to FIG. 10, when the genre corresponding to the second attribute included in the attributes of the content is "Pop", the music determination unit 215 refers to the style information corresponding to the "Pop" genre of song A in the music database 235 and determines the style information having the largest score information value as the style information of song A.

For example, referring to FIG. 7, the styles corresponding to the "Pop" genre of song A include "Easy Pop" and "80' Pop." Of these, "Easy Pop," which has the largest score information value, is determined as the style information corresponding to song A.

Furthermore, for example, if the genre of the song corresponding to the second attribute included in the attributes of the content is "Rock", then among the styles corresponding to the "Rock" genre of song A, the style with the largest score information value is determined as the style information of song A. For example, if the score information value corresponding to the "80' Pop Rock" style corresponding to the "Rock" genre of song A is the largest, "80' Pop Rock" is determined as the style information corresponding to song A.
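The style rule in the preceding paragraphs filters a song's (genre, style, score) rows to the genre obtained from the situation table and keeps the highest-scoring style. A minimal sketch using the three FIG. 7 rows quoted in the text; `select_style` is a hypothetical name, and returning `None` when the song has no row for the genre is an assumption the text does not address.

```python
def select_style(rows, genre):
    """Return the highest-scoring style of `genre` among (genre, style, score) rows."""
    in_genre = [row for row in rows if row[0] == genre]
    if not in_genre:
        return None  # assumption: no style to offer for this genre
    return max(in_genre, key=lambda row: row[2])[1]


SONG_A_ROWS = [
    ("Pop", "Easy Pop", 0.218),
    ("Pop", "80' Pop", 0.195),
    ("Rock", "80' Pop Rock", 0.102),
]

print(select_style(SONG_A_ROWS, "Pop"))   # Easy Pop
print(select_style(SONG_A_ROWS, "Rock"))  # 80' Pop Rock
```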

The music providing unit 217 provides the communication terminal 1 with music information corresponding to each of the n pieces of music determined as candidates for music corresponding to the attributes of the content, obtained from the music determining unit 215. The music providing unit 217 provides the communication terminal 1 via the network NW shown in FIG. 1. The communication terminal 1 may be the communication terminal 1 that provided the content to the server 2. As described above, the music information provided to the communication terminal 1 includes at least one of chord progression data and style information.

Based on the acquired music information for the n songs, the control unit 11 of the communication terminal 1 provides the display unit 14 with a user interface with which the user determines the background music for the content, and executes the music selection process. In the music selection process, the user selects, from the music information of the n songs, at least one of the style and the chord progression pattern desired for the content's background music.

The control unit 11 may provide the display unit 14 with a user interface for the user to select the style of the background music for the content. FIG. 11 is an example of a user interface provided to the display unit 14 by the control unit 11 for the user to select the style of the background music for the content.

As shown in FIG. 11, a plurality of icons 1101 indicating the style information of each of the n pieces of music may be displayed on the display unit 14 of the communication terminal 1. The user can select the style of the content's BGM by tapping the icon 1101 indicating the desired style information. Audio data for playing the style corresponding to each piece of style information may also be provided from the server 2 together with the music information of the n pieces of music. In this case, a play button 1103 for playing the corresponding style may be displayed together with each icon 1101. By tapping the play button 1103 corresponding to the desired style information, the user can listen to and check that style, and may then select the style desired for the content's BGM.

The control unit 11 may also provide the display unit 14 with a user interface that allows the user to select a chord progression pattern for the content's background music. FIG. 12 shows an example of a user interface that is provided to the display unit 14 by the control unit 11 and allows the user to select a chord progression pattern for the content's background music.

FIG. 12 shows an example in which icons 1201 showing four chord progression patterns (pattern A, pattern B, pattern C, pattern D) are displayed on the user interface. Each icon 1201 may be displayed together with information showing the impression of that chord progression pattern, and this information may differ from pattern to pattern. From it, the user can imagine the impression of music using the chord progression pattern. The user can select the chord progression pattern desired for the content's BGM by tapping the corresponding icon 1201. Audio data for playing each chord progression pattern may also be provided from the server 2 together with the music information of the n songs. In this case, a play button 1203 for playing the chord progression pattern may be displayed together with each icon 1201. By tapping the play button 1203 corresponding to the desired chord progression pattern, the user can listen to and check that pattern, and may then select the chord progression pattern desired for the content's BGM.

As the user interface for determining the content's background music, the control unit 11 provides at least one of a user interface for selecting the style of the background music and a user interface for selecting its chord progression pattern.

The control unit 11 may provide both the user interface for selecting the style of the content's background music and the user interface for selecting its chord progression pattern. The same style information or chord progression pattern may be shared by several of the n songs, in which case a single selection matches more than one song. By letting the user select both the style and the chord progression pattern, the control unit 11 narrows the choice, so that the song ultimately generated is closer to the song the user wants.
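The narrowing effect of selecting both a style and a chord progression pattern can be illustrated with a small filter. The `MusicInfo` record, its field names and the three candidate songs below are hypothetical; only the idea that one selection may match several of the n songs comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MusicInfo:
    """Music information for one candidate song (hypothetical field names)."""
    song: str
    style: str
    chord_pattern: str


def matching_songs(candidates: list[MusicInfo],
                   style: str | None = None,
                   chord_pattern: str | None = None) -> list[str]:
    """Return the candidate songs consistent with whatever the user selected."""
    return [
        info.song
        for info in candidates
        if (style is None or info.style == style)
        and (chord_pattern is None or info.chord_pattern == chord_pattern)
    ]


candidates = [
    MusicInfo("song A", "Easy Pop", "pattern A"),
    MusicInfo("song B", "Easy Pop", "pattern B"),
    MusicInfo("song C", "80' Pop Rock", "pattern A"),
]

print(matching_songs(candidates, style="Easy Pop"))  # ['song A', 'song B']
print(matching_songs(candidates, style="Easy Pop", chord_pattern="pattern A"))  # ['song A']
```

Selecting the style alone leaves two songs; adding the chord progression pattern leaves one.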

The control unit 11 transmits at least one of the style information and the chord progression pattern selected by the user from the communication unit 13 to the server 2 via the network NW shown in FIG. 1.

Returning to FIG. 4, the description of the music generation process by the server 2 continues. At least one of the style information and the chord progression pattern selected by the user is provided to the music generation unit 219 of the control unit 21 of the server 2. Based on whichever of the style information and the chord progression pattern it has acquired, the music generation unit 219 generates music data for the music to be added to the content using known music generation technology. The music generation unit 219 then uses the generated music data to generate content playback data for playing the music and the corresponding content in sync.
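The text leaves the generation step to "known music generation technology", so the following is only a toy stand-in, not the method of the music generation unit 219. It assumes that a chord progression pattern is a list of chord symbols with one chord per bar (major or minor triads on natural roots only) and that an accompaniment pattern reduces to a sorted list of onset positions, in beats within a bar, at which each chord is struck. The output is a list of note events (start beat, MIDI pitch, duration in beats), using the common MIDI convention that C4 is note 60.

```python
from __future__ import annotations

# Pitch classes of the natural note names (sharps and flats omitted for brevity).
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def triad(symbol: str, octave: int = 4) -> list[int]:
    """MIDI pitches of a major triad ('C') or minor triad ('Am') in the given octave."""
    root = 12 * (octave + 1) + NOTE_TO_PC[symbol[0]]
    third = 3 if symbol.endswith("m") else 4  # minor third vs major third
    return [root, root + third, root + 7]


def render(progression: list[str], rhythm: list[float],
           beats_per_bar: int = 4) -> list[tuple[float, int, float]]:
    """Place one chord per bar and strike it at each onset of the rhythm.

    Each note lasts until the next onset, or until the end of the bar.
    """
    events = []
    onsets = rhythm + [beats_per_bar]  # sentinel marks the end of the bar
    for bar, symbol in enumerate(progression):
        for onset, next_onset in zip(onsets, onsets[1:]):
            for pitch in triad(symbol):
                events.append((bar * beats_per_bar + onset, pitch, next_onset - onset))
    return events


events = render(["C", "G", "Am", "F"], rhythm=[0, 2])
print(len(events))  # 24 (4 bars x 2 onsets x 3 notes)
print(events[0])    # (0, 60, 2)
```

A real system would add voicing, bass lines and the rhythmic detail that a style implies; the sketch only shows how the two selected patterns can be combined into note data.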

The server 2 provides the content playback data generated by the music generation unit 219 to the communication terminal 1 from the communication unit 25 via the network NW. The content playback data may be provided in response to a request from the communication terminal 1.

[Music generation process flow]
FIG. 13 is a flowchart showing the music generation process according to an embodiment of the present invention. As described above, the music generation process is executed by the control unit 21 of the server 2.

The control unit 21 waits until content is received from the communication terminal 1 (S1301; NO). When the user operates the communication terminal 1 to instruct transmission of content, the communication terminal 1 transmits the content to the server 2. When the server 2 receives the content (S1301; YES), the control unit 21 extracts feature information from the received content (S1303) using known image analysis and morphological analysis techniques.

The control unit 21 provides the extracted feature information to the trained model 233 (S1305). The control unit 21 executes calculation processing by the trained model 233 to obtain the attributes of the content from the trained model 233 (S1307). The attributes of the content include one attribute (first attribute) selected from the impression list shown in FIG. 5 and one attribute (second attribute) selected from the situation list shown in FIG. 6.
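Because the attributes fall into two groups (impressions and situations), one simple way to turn the output of the trained model 233 into a first and a second attribute is to take the highest-scoring attribute within each group. That argmax rule, the attribute names and the scores below are assumptions made for illustration; the lists in FIG. 5 and FIG. 6 are not reproduced here, and the model itself is replaced by a fixed dictionary of scores.

```python
from __future__ import annotations

# Hypothetical attribute lists standing in for FIG. 5 and FIG. 6.
IMPRESSIONS = ["bright", "calm", "sad"]    # first attribute group
SITUATIONS = ["party", "drive", "travel"]  # second attribute group


def pick_attributes(scores: dict[str, float]) -> tuple[str, str]:
    """Choose the highest-scoring attribute from each group (assumed argmax rule)."""
    missing = float("-inf")  # attributes the model did not score never win
    first = max(IMPRESSIONS, key=lambda a: scores.get(a, missing))
    second = max(SITUATIONS, key=lambda a: scores.get(a, missing))
    return first, second


# Stand-in for the output of trained model 233 for one piece of content.
model_output = {"bright": 0.7, "calm": 0.2, "sad": 0.1,
                "party": 0.1, "drive": 0.3, "travel": 0.6}
print(pick_attributes(model_output))  # ('bright', 'travel')
```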

The control unit 21 determines song candidates corresponding to the content based on the acquired attributes of the content (S1309), referring to the music database 235 and the situation table 237. The control unit 21 then provides the communication terminal 1 with the song information corresponding to the determined song candidates (S1311). Here, the song information includes at least one of the style information and the chord progression pattern associated with each song determined as a song candidate.

Having acquired the song information for the candidate songs, the communication terminal 1 provides the user interface described above. Through this user interface, the user selects the song information of the song to be added to the content as background music, and the communication terminal 1 transmits it to the server 2.

The control unit 21 acquires the music information selected by the user (S1313). The acquired music information includes at least one of the style information and the chord progression pattern selected by the user.

The control unit 21 generates music data based on the acquired music information (S1315). The control unit 21 generates the music data using known music generation technology. The control unit 21 uses the generated music data to generate content playback data for playing the music and the content corresponding to the music in synchronization (S1317).

The control unit 21 may provide the generated content playback data to the communication terminal 1 in response to a request from the communication terminal 1. This concludes the flow of the music generation process executed by the control unit 21.
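Steps S1303 to S1317 can be summarized as a single function in which each stage is passed in as a callable. This is a hypothetical sketch of the control flow only: waiting for content (S1301) and all network communication are left out, and the stage functions in the usage lines are placeholders, not the analysis, model or generation techniques of the embodiment.

```python
from __future__ import annotations

from typing import Any, Callable


def music_generation_process(
    content: Any,
    extract_features: Callable[[Any], list[str]],              # S1303
    infer_attributes: Callable[[list[str]], tuple[str, str]],  # S1305-S1307
    find_candidates: Callable[[tuple[str, str]], list[dict]],  # S1309
    ask_user: Callable[[list[dict]], dict],                    # S1311-S1313
    generate_music: Callable[[dict], bytes],                   # S1315
) -> dict:
    """Run the stages in flowchart order and return hypothetical content playback data."""
    features = extract_features(content)
    attributes = infer_attributes(features)
    candidates = find_candidates(attributes)
    selection = ask_user(candidates)
    music = generate_music(selection)
    # S1317: pair the music with its content so a terminal can play them in sync
    return {"content": content, "music": music}


# Placeholder stages, for illustration only.
playback = music_generation_process(
    "beach.jpg",
    extract_features=lambda c: ["sea", "sky"],
    infer_attributes=lambda f: ("bright", "travel"),
    find_candidates=lambda a: [{"style": "Easy Pop", "chord_pattern": "pattern A"}],
    ask_user=lambda cands: cands[0],
    generate_music=lambda sel: b"music-data",
)
print(playback)  # {'content': 'beach.jpg', 'music': b'music-data'}
```

Passing the stages in as callables keeps the flow itself independent of whichever feature extractor, model or generator is plugged in.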

[Music playback processing]
In response to an instruction from the user, the communication terminal 1 can acquire content playback data from the server 2 via the network NW and execute the music playback process. When the user inputs an instruction to play the content via the operation unit 15 of the communication terminal 1, the communication terminal 1 plays the music together with the content. The content played may be the images (videos or still images) that were provided to the server 2. When the content provided to the server 2 is text (a character string), the content played on the communication terminal 1 may be an image containing that text. The music played along with the content is output from the speaker 16 of the communication terminal 1. This concludes the music playback process executed by the communication terminal 1.

In this way, by extracting feature information from the content provided by the communication terminal 1 and inputting the extracted feature information into the trained model 233, the server 2 can obtain the attributes of the content automatically. By generating music based on the obtained attributes, the server 2 can generate and provide music that matches the content. For the user, simply inputting the content to which they wish to add background music yields automatically generated music suited to that content, together with content playback data for playing the two in sync.

[Modification]
The present disclosure is not limited to the above-described embodiment, and includes various other modified examples. For example, the above-described embodiment has been described in detail to clearly explain the present disclosure, and is not necessarily limited to those having all of the configurations described. Other configurations may be added, deleted, or replaced with respect to a part of the configuration of the embodiment. Some modified examples will be described below.

(1) In the above-described embodiment, the music determination unit 215 of the server 2 determined the songs ranked 1st to nth in descending order of calculated score value (n is any integer from 1 to 500 inclusive, for example 20) as the candidates for music corresponding to the attributes of the content. Alternatively, the music determination unit 215 may, based on a predetermined algorithm, extract the songs ranked 1st to mth in descending order of calculated score value (m is any integer from 1 to 500 inclusive with m > n, for example 40), randomly select n songs (for example, 20) from among them, and determine the selected songs as the candidates for music corresponding to the attributes of the content.
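The difference between the embodiment and modification (1) can be sketched as two small functions, assuming the calculated score values are available as a dictionary from song to score and that the m songs in modification (1) are the m highest-scoring ones. Python's `random.Random.sample` stands in for the unspecified random selection; because the draw is random, two runs for the same attributes can yield different sets of n candidates.

```python
from __future__ import annotations

import random


def top_n(scores: dict[str, float], n: int) -> list[str]:
    """Embodiment: the songs ranked 1st to nth in descending order of score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def sampled_n(scores: dict[str, float], n: int, m: int,
              rng: random.Random) -> list[str]:
    """Modification (1): take the top m songs, then pick n of them at random."""
    if not 1 <= n < m <= 500:
        raise ValueError("expected 1 <= n < m <= 500")
    return rng.sample(top_n(scores, m), n)


# Invented scores for 100 songs: song99 scores highest.
scores = {f"song{i}": float(i) for i in range(100)}
print(top_n(scores, 3))  # ['song99', 'song98', 'song97']
print(sampled_n(scores, 20, 40, random.Random(1)))  # 20 distinct songs from song60 to song99
```

Seeding the generator makes a run reproducible for testing; in service, an unseeded generator would let repeated requests surface different candidates.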

(2) In the above-described embodiment, the user selected at least one of the style information and the chord progression pattern of the music to be added as BGM to the content via a user interface provided to the communication terminal 1. The control unit 11 of the communication terminal 1 may provide a user interface to the display unit 14 of the communication terminal 1 for allowing the user to set further additional information in addition to the style information and the chord progression pattern.

The additional information may include, for example, the tempo of the music to be added as background music for the content, the playback time of the music, the intonation of the music, the melody, and the lyrics. The communication terminal 1 may provide the server 2 with the additional information set by the user, and the control unit 21 of the server 2 can reflect that additional information in the music data.
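As one concrete way a tempo and a playback time could be reflected in the music data, the number of bars needed follows from beats = seconds × BPM / 60. The sketch below assumes 4/4 time and that a chord progression pattern (one chord per bar) is simply looped and truncated to fill the required bars; neither assumption is stated in the text, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def bars_needed(duration_s: float, bpm: float, beats_per_bar: int = 4) -> int:
    """Bars required to cover `duration_s` seconds at `bpm` beats per minute."""
    beats = duration_s * bpm / 60.0
    return math.ceil(beats / beats_per_bar)  # round up to whole bars


def fit_progression(progression: list[str], bars: int) -> list[str]:
    """Loop a one-chord-per-bar progression until it fills `bars` bars."""
    return [progression[i % len(progression)] for i in range(bars)]


print(bars_needed(30, 120))  # 15 (60 beats of 4/4)
print(bars_needed(15, 100))  # 7 (25 beats, rounded up to whole bars)
print(fit_progression(["C", "G", "Am", "F"], bars_needed(15, 100)))
# ['C', 'G', 'Am', 'F', 'C', 'G', 'Am']
```

Rounding up means the generated music may run slightly past the requested playback time; a real system might instead fade out or adjust the tempo.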

(3) In the above-described embodiment, the user can select the chord progression pattern of the music to be added to the content as BGM via a user interface provided on the communication terminal 1. However, the user may also edit the chord progression pattern of the music to be added to the content as BGM via a user interface provided on the communication terminal 1. Furthermore, the user may be able to preview the edited chord progression pattern. The user can preview the edited chord progression pattern and further edit the chord progression pattern.

The communication terminal 1 may provide the server 2 with a chord progression pattern set by the user. The control unit 21 of the server 2 may generate music data based on the chord progression pattern set by the user, which is obtained from the communication terminal 1.

(4) The content playback data generated by the server 2 can be obtained not only by the communication terminal 1 that provided the content to the server 2, but also by other communication terminals 1 that can connect to the server 2 via the network NW.

This concludes the description of the modified examples.

As described above, according to one embodiment of the present invention, a music generation method is provided that includes extracting feature information contained in input content from the content, determining at least one attribute from among a plurality of attributes that corresponds to the extracted feature information, determining an accompaniment pattern and a chord progression pattern that correspond to the determined attribute, and generating a music piece based on the determined accompaniment pattern and chord progression pattern.

The content may include an image, and the feature information may include objects extracted from the image.

The content may include sentences, and the feature information may include morphemes extracted from the sentences.

The multiple attributes may be divided into multiple groups including a first attribute group and a second attribute group, and the determined attributes may include an attribute included in the first attribute group and an attribute included in the second attribute group.

The at least one attribute may be determined by inputting the extracted feature information into a trained model that has learned the relationship between feature information and attributes, based on the information obtained from the trained model.

Determining the accompaniment pattern and the chord progression pattern may include identifying a plurality of accompaniment patterns corresponding to the determined attributes, providing a user interface for allowing a user to select at least one of the identified plurality of accompaniment patterns, and determining the selected accompaniment pattern as the accompaniment pattern corresponding to the determined attributes.

Identifying the plurality of accompaniment patterns corresponding to the determined attribute may include identifying the plurality of accompaniment patterns in accordance with a predetermined algorithm, and the plurality of accompaniment patterns identified for a first attribute by the predetermined algorithm may include a first combination and a second combination.

Determining the accompaniment pattern and the chord progression pattern may include identifying a plurality of chord progression patterns corresponding to the determined attribute, providing a user interface for allowing a user to select at least one of the identified chord progression patterns, and determining the selected chord progression pattern as the chord progression pattern corresponding to the determined attribute.

Identifying the plurality of chord progression patterns corresponding to the determined attribute may include identifying the plurality of chord progression patterns in accordance with a predetermined algorithm, and the plurality of chord progression patterns identified for a first attribute by the predetermined algorithm may include a first combination and a second combination.

The method may further include providing a user interface for allowing a user to set additional information for the song, and the song may be generated based on the set additional information.

The method may further include outputting data for playing the generated music in sync with the input content.

According to one embodiment of the present invention, a program may be provided to cause a computer to execute the following operations: extracting feature information contained in input content from the content; determining at least one attribute from among a plurality of attributes that corresponds to the extracted feature information; determining an accompaniment pattern and a chord progression pattern that correspond to the determined attribute; and generating a piece of music based on the determined accompaniment pattern and chord progression pattern.

In addition, the program (program product) according to one embodiment may be provided on a computer-readable recording medium, or may be provided in a form distributed via a network, such as from an external server.

According to one embodiment of the present invention, a music generating device (server) may be provided that includes a communication unit that receives content from a communication terminal, and a control unit that generates music based on the received content. The control unit executes a music generating process. The control unit extracts feature information contained in the received content from the content, and determines at least one attribute from among a plurality of attributes that corresponds to the extracted feature information. The control unit further determines an accompaniment pattern and a chord progression pattern that correspond to the determined attribute, and generates music based on the determined accompaniment pattern and chord progression pattern.

According to one embodiment of the present invention, a music composition system may be provided that includes one or more communication terminals and a server. The one or more communication terminals and the server are connected via a network such as the Internet. Each communication terminal transmits content to the server via the network. The content is content to which a user wishes to add background music. The server includes a communication unit that receives the content from the communication terminal, and a control unit that generates music based on the received content. The control unit extracts feature information contained in the received content from the content, and determines at least one attribute, out of multiple attributes, that corresponds to the extracted feature information. The control unit further determines an accompaniment pattern and a chord progression pattern that correspond to the determined attribute, and generates music based on the determined accompaniment pattern and chord progression pattern.

1: Communication terminal, 2: Server, 11: Control unit, 12: Memory unit, 13: Communication unit, 14: Display unit, 15: Operation unit, 16: Speaker, 17: Bus, 21: Control unit, 23: Memory unit, 25: Communication unit, 211: Feature extraction unit, 213: Attribute determination unit, 215: Music determination unit, 217: Music provision unit, 219: Music generation unit, 231: Program, 233: Trained model, 235: Music database, 237: Situation table, 1000: Music generation system

Claims

1. A music generation method comprising:
extracting, from input content, feature information contained in the input content;
determining at least one attribute, among a plurality of attributes, corresponding to the extracted feature information;
determining an accompaniment pattern and a chord progression pattern corresponding to the determined attribute; and
generating a piece of music based on the determined accompaniment pattern and chord progression pattern.
2. The music generation method according to claim 1, wherein
the content includes an image, and
the feature information includes objects extracted from the image.
3. The music generation method according to claim 1, wherein
the content includes text, and
the feature information includes morphemes extracted from the text.
4. The music generation method according to claim 1, wherein
the plurality of attributes are divided into a plurality of groups including a first attribute group and a second attribute group, and
the determined attributes include an attribute included in the first attribute group and an attribute included in the second attribute group.
5. The music generation method according to claim 1, wherein the determined attribute is determined by inputting the extracted feature information into a trained model that has learned the relationship between feature information and attributes, based on information obtained from the trained model.
6. The music generation method according to claim 1, wherein determining the accompaniment pattern and the chord progression pattern includes:
identifying a plurality of accompaniment patterns corresponding to the determined attribute;
providing a user interface for allowing a user to select at least one of the identified accompaniment patterns; and
determining the selected accompaniment pattern as the accompaniment pattern corresponding to the determined attribute.
7. The music generation method according to claim 6, wherein
identifying the plurality of accompaniment patterns corresponding to the determined attribute includes identifying the plurality of accompaniment patterns in accordance with a predetermined algorithm, and
the plurality of accompaniment patterns identified for a first attribute by the predetermined algorithm include a first combination and a second combination.
8. The music generation method according to claim 1, wherein determining the accompaniment pattern and the chord progression pattern includes:
identifying a plurality of chord progression patterns corresponding to the determined attribute;
providing a user interface for allowing a user to select at least one of the identified chord progression patterns; and
determining the selected chord progression pattern as the chord progression pattern corresponding to the determined attribute.
9. The music generation method according to claim 8, wherein
identifying the plurality of chord progression patterns corresponding to the determined attribute includes identifying the plurality of chord progression patterns in accordance with a predetermined algorithm, and
the plurality of chord progression patterns identified for a first attribute by the predetermined algorithm include a first combination and a second combination.
10. The music generation method according to claim 1, further comprising providing a user interface for allowing a user to set additional information of the music piece, wherein the music piece is generated further based on the set additional information.
11. The music generation method according to claim 1, further comprising outputting data for playing the generated music in synchronization with the input content.
12. A program for causing a computer to execute:
extracting, from input content, feature information contained in the input content;
determining at least one attribute, among a plurality of attributes, corresponding to the extracted feature information;
determining an accompaniment pattern and a chord progression pattern corresponding to the determined attribute; and
generating a piece of music based on the determined accompaniment pattern and chord progression pattern.