TWI784434B - System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm - Google Patents


Info

Publication number
TWI784434B
Authority
TW
Taiwan
Prior art keywords
music
note
learning algorithm
deep
algorithm
Prior art date
Application number
TW110108455A
Other languages
Chinese (zh)
Other versions
TW202236173A (en)
Inventor
蘇豐文
威爾 澤
Original Assignee
國立清華大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立清華大學 filed Critical 國立清華大學
Priority to TW110108455A priority Critical patent/TWI784434B/en
Publication of TW202236173A publication Critical patent/TW202236173A/en
Application granted granted Critical
Publication of TWI784434B publication Critical patent/TWI784434B/en

Landscapes

  • Machine Translation (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The present invention discloses a system and a method for automatically composing music using a generative adversarial network and an adversarial inverse reinforcement learning algorithm, comprising: a music feature extracting unit, a first operation module, a second operation module and a third operation module. The music feature extracting unit performs a feature extracting process on each piece of reference music data stored in a reference music database, thereby extracting a plurality of music features. In particular, the first operation module performs a long short-term memory (LSTM) neural network algorithm with a bi-axial architecture. The second operation module uses an adversarial inverse reinforcement learning (AIRL) algorithm to perform a second operation on the plurality of music features, thereby obtaining at least one reward function. Owing to the foregoing features, the music generated by the system and method of the present invention is melodic and resembles music composed by humans.

Description

Automatic music composition system and method using a generative adversarial network and inverse reinforcement learning

The present invention relates to the technical field of automatic music composition systems, and in particular to an automatic composition system and method that use a generative adversarial network and adversarial inverse reinforcement learning.

Music is an extremely important part of human life. It is not only a way to relax in everyday life; ancient Roman sources already describe music as having curative effects on the human body and the power to improve one's mood. According to one special report, music offers people seven major benefits: improved learning efficiency, stress relief, pain relief, enhanced memory, relief of insomnia, improved exercise performance, and a happier mood.

With traditional methods of composition, however, a composer must spend many years learning instrumental technique and music theory, and still needs many days to complete a single piece. To let people compose distinctive music efficiently, without being limited by their knowledge of music theory or their instrumental background, many automatic composition systems have been proposed. One such system uses a supervised deep learning algorithm as its model, but this kind of model reuses the same melodies excessively, so the music it produces has the drawback of being unpleasant to listen to.

From the above it is clear that existing automatic composition systems need to be improved and redesigned so that they can produce music that is more melodious and more widely enjoyed. In view of this, the inventors of the present case devoted great effort to research and development, and finally completed the automatic composition system and method using a generative adversarial network and inverse reinforcement learning of the present invention.

The main purpose of the present invention is to provide an automatic composition system and method using a generative adversarial network and inverse reinforcement learning. The automatic composition system is applied in an electronic device so that the electronic device generates a piece of music from a plurality of reference music data, and it comprises: a music database, a music feature extraction unit, a first operation module, a second operation module and a third operation module. The music database stores the plurality of reference music data. In particular, the music feature extraction unit performs a feature extraction process on each of the reference music data, thereby extracting a plurality of music features. The first operation module then performs a first operation on the music features using a deep learning algorithm, so as to obtain at least one music probability feature and at least one pre-trained weight parameter. The second operation module performs a second operation on the plurality of music features using a reinforcement learning algorithm, so as to obtain at least one note reward function. Finally, the third operation module uses a deep reinforcement learning algorithm that is initialized with the pre-trained weight parameters, and performs a third operation on a plurality of sets of music theory data stored in a music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data; this polyphonic music data constitutes pleasant, well-liked music.
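Before turning to the embodiments, the division of labor among these components can be summarized in a short sketch. The following Python skeleton is purely illustrative: every function name and return value is a placeholder assumed here for clarity, not an identifier taken from the present invention; only the data flow between the four components follows the description above.

```python
import numpy as np

# Hypothetical stand-ins for the four components named above; only the
# data flow between them follows the description in the text.

def extract_features(midi_piece):
    # Music feature extraction unit: MIDI piece -> feature matrix.
    # Placeholder shape: beats x MIDI pitches x (play flag, articulation).
    return np.zeros((8, 128, 2))

def pretrain_biaxial_lstm(features):
    # First operation module: returns (music probability features,
    # pre-trained weight parameters). Values here are placeholders.
    return {"note_prob": 0.5, "articulation_prob": 0.5}, {"weights": np.zeros(4)}

def learn_note_reward_airl(features):
    # Second operation module: returns a note reward function r(s, a).
    return lambda state, action: 0.0

def compose_with_dqn(theory_db, reward_fn, pretrained):
    # Third operation module: a DQN initialized from the pre-trained
    # weights; here it simply returns a placeholder result.
    return ["generated polyphonic piece"]

reference_db = [None]                                  # reference music database
features = [extract_features(p) for p in reference_db]
probs, weights = pretrain_biaxial_lstm(features)
reward_fn = learn_note_reward_airl(features)
pieces = compose_with_dqn(theory_db=[], reward_fn=reward_fn, pretrained=weights)
```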

In order to achieve the above main purpose of the present invention, the inventors provide an embodiment of the automatic composition system using a generative adversarial network and inverse reinforcement learning. It is applied in an electronic device so that the electronic device generates a piece of music from a plurality of reference music data, and the automatic composition system comprises:
a music database for storing the plurality of reference music data;
a music feature extraction unit for performing a feature extraction process on each of the reference music data, thereby extracting a plurality of music features;
a first operation module for performing a first operation on the plurality of music features using a deep reinforcement learning algorithm, so as to obtain at least one music probability feature and at least one pre-trained weight parameter;
a second operation module for performing a second operation on the plurality of music features using a reinforcement learning algorithm, so as to obtain at least one note reward function; and
a third operation module for performing an initialization setting with the at least one pre-trained weight parameter using a deep reinforcement learning algorithm, and for performing a third operation on a plurality of sets of music theory data stored in a music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data.

Moreover, in order to achieve the above main purpose of the present invention, the inventors also provide an embodiment of the automatic composition method using a generative adversarial network and inverse reinforcement learning. It is applied in an electronic device and implemented by a processor of the electronic device, and it comprises the following steps:
(1) providing a music feature extraction unit for performing a feature extraction process on each of a plurality of music data stored in a music database of the electronic device, thereby extracting a plurality of music features;
(2) providing a first operation module for performing a first operation on the plurality of music features using a deep learning algorithm, so as to obtain at least one music probability feature and at least one pre-trained weight parameter;
(3) providing a second operation module for performing a second operation on the plurality of music features using a reinforcement learning algorithm, so as to obtain at least one note reward function;
(4) providing a third operation module for performing an initialization setting on the at least one pre-trained weight parameter using a deep reinforcement learning algorithm, and for performing a third operation on a plurality of sets of music theory data stored in a music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data.

In order to describe more clearly the automatic composition system and method using a generative adversarial network and inverse reinforcement learning proposed by the present invention, preferred embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows a perspective view of an electronic device to which an automatic composition system using a generative adversarial network and inverse reinforcement learning of the present invention is applied, and FIG. 2 shows a functional block diagram of that system. As shown in FIG. 1, the automatic composition system 1 of the present invention is applied in an electronic device 2. In one embodiment, the system 1 is installed in an operating system (OS) of the electronic device 2, so that a processor of the electronic device 2, by executing the system 1, can generate a piece of music from a plurality of reference music data. As shown in FIG. 2, the automatic composition system 1 mainly comprises: a music database 11, a music feature extraction unit 12, a first operation module 13, a second operation module 14, a third operation module 15 and a music theory database 16. The music database 11 stores a plurality of music data, and the music theory database 16 stores a plurality of music theory data. The music feature extraction unit 12 performs a feature extraction process on each of the reference music data, thereby extracting a plurality of music features. More specifically, a music feature records the pitches of the piece as MIDI (Musical Instrument Digital Interface) values. FIG. 3 shows a schematic diagram of how the pitches of a piece are recorded. In the matrix on the right of FIG. 3, each horizontal row represents the chord played at one beat, and each vertical column corresponds to a pitch recorded as a MIDI value; FIG. 3 covers the chords of 8 beats, and from the matrix it can be seen that the first note is played for two and a half beats. In other words, this recording scheme captures the polyphonic (harmony) content of a piece. The schematic matrix produced by the music feature extraction unit 12 takes the form shown in (1):

X = [ x_(t,n) ]_(T×N),  where x_(t,n) = ( p_(t,n), a_(t,n) ) .....(1)

In formula (1), N indexes the MIDI value of each pitch, T indexes the beat, p records whether the note is played, and a is the corresponding articulation. The first operation module 13 performs a first operation on the plurality of music features using a deep learning algorithm, so as to obtain at least one music probability feature and at least one pre-trained weight parameter. FIG. 4 shows a functional block diagram of the first operation module. As shown in FIG. 4, the first operation module 13 comprises: a first operation unit 131, a second operation unit 132, a third operation unit 133 and a fourth operation unit 134. More specifically, the first operation unit 131 converts each music feature into a note vector feature; the note vector feature has a length of 79 and is composed of a current tone (pitch-class) vector feature, a current pitch vector feature, a pitch vector feature of the preceding note, a pitch vector feature of the following note, and a beat vector feature. The second operation unit 132 then uses a deep learning algorithm to perform a fourth operation on the note vector features along the time axis, so as to obtain at least one time parameter, and the third operation unit 133 uses a deep learning algorithm to perform a fifth operation on the note vector features along the note axis, so as to obtain at least one note parameter. It is worth noting that this deep learning algorithm is a long short-term memory (LSTM) neural network algorithm. Because the first operation module 13 runs the LSTM algorithm along the time axis in the second operation unit 132 and along the note axis in the third operation unit 133, the first operation module 13 of the present invention adopts an LSTM neural network with a bi-axial architecture.
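As a concrete illustration of this note-vector conversion, the sketch below assembles a 79-dimensional vector from one-hot sub-vectors. The description fixes only the total length at 79 and the five constituent features; the individual component sizes used here are assumptions chosen so that they sum to 79.

```python
import numpy as np

# Assumed per-component sizes; only the total of 79 is given in the text.
PITCH_CLASS = 12   # current tone (pitch class)
CUR_PITCH   = 24   # current pitch
PREV_PITCH  = 24   # pitch of the preceding note
NEXT_PITCH  = 15   # pitch of the following note
BEAT        = 4    # position within the bar
assert PITCH_CLASS + CUR_PITCH + PREV_PITCH + NEXT_PITCH + BEAT == 79

def note_vector(pitch_class, cur, prev, nxt, beat):
    """Concatenate one-hot sub-vectors into the 79-dimensional feature."""
    def one_hot(i, n):
        v = np.zeros(n)
        v[i % n] = 1.0      # wrap indices into range for this toy example
        return v
    return np.concatenate([
        one_hot(pitch_class, PITCH_CLASS),
        one_hot(cur, CUR_PITCH),
        one_hot(prev, PREV_PITCH),
        one_hot(nxt, NEXT_PITCH),
        one_hot(beat, BEAT),
    ])

v = note_vector(pitch_class=0, cur=60, prev=59, nxt=62, beat=0)
assert v.shape == (79,)
```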

Continuing from the above, the fourth operation unit 134 uses a deep learning algorithm (a non-recurrent linear algorithm) to perform a fifth operation on the time parameters and note parameters, so as to obtain at least one training weight parameter together with the music probability feature, which is composed of at least one note probability feature and at least one articulation probability feature. It should be added that, to keep the LSTM neural network from over-fitting, the first operation module 13 of the present invention applies a dropout algorithm in both the fourth operation and the fifth operation.
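A minimal sketch of this bi-axial arrangement, written in PyTorch, is given below. The hidden sizes, dropout rate and sigmoid outputs are assumptions made for illustration; only the overall structure — an LSTM along the time axis, an LSTM along the note axis, dropout between stages, and a final non-recurrent layer emitting play and articulation probabilities — follows the description above.

```python
import torch
import torch.nn as nn

class BiAxialLSTM(nn.Module):
    """Illustrative bi-axial LSTM; hidden sizes and layer counts are
    assumptions, not values taken from the patent."""
    def __init__(self, in_dim=79, time_hidden=300, note_hidden=100, p_drop=0.5):
        super().__init__()
        self.time_lstm = nn.LSTM(in_dim, time_hidden, batch_first=True)
        self.note_lstm = nn.LSTM(time_hidden, note_hidden, batch_first=True)
        self.drop = nn.Dropout(p_drop)          # dropout against over-fitting
        self.head = nn.Linear(note_hidden, 2)   # non-recurrent output layer

    def forward(self, x):
        # x: (batch, time, notes, 79) note-vector features
        b, t, n, d = x.shape
        # Fourth operation: recur along the time axis, independently per note.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        ht, _ = self.time_lstm(xt)
        ht = self.drop(ht)
        # Fifth operation: recur along the note axis, independently per step.
        hn = ht.reshape(b, n, t, -1).permute(0, 2, 1, 3).reshape(b * t, n, -1)
        hn, _ = self.note_lstm(hn)
        hn = self.drop(hn)
        # Play and articulation probabilities for every (time, note) cell.
        return torch.sigmoid(self.head(hn)).reshape(b, t, n, 2)

probs = BiAxialLSTM()(torch.zeros(1, 16, 88, 79))   # shape: (1, 16, 88, 2)
```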

Continuing from the above, the second operation module 14 performs a second operation on the plurality of music features using a reinforcement learning algorithm, so as to obtain at least one note reward function. More specifically, the reinforcement learning algorithm is an adversarial inverse reinforcement learning (AIRL) algorithm. The AIRL algorithm has an architecture similar to that of the known generative adversarial guided cost learning (GAN-GCL) method; in other words, the inverse reinforcement learning algorithm used by the second operation module 14 maximizes the reward it receives and thereby obtains the note reward function. In doing so it trains a generator and a discriminator simultaneously, and the discriminator can be derived from the following formulas (2), (3) and (4):

r_t(s, a) = c · r_mt(s, a) + (1 − c) · r_airl(s, a) .....(2)

L(θ) = E[ ( r(s, a) + γ · max_(a') Q(s', a'; θ⁻) − Q(s, a; θ) )² ] .....(3)

D_θ(τ) = p_θ(τ) / ( p_θ(τ) + q(τ) ) .....(4)

In formulas (2) and (3), c is a constant, s is the current state, a is the current action, s' is the next state, a' is the next action, r_mt is the music theory reward function, r_airl is the adversarial inverse reinforcement learning reward function, θ⁻ denotes the weights of the target Q-network, and γ is the discount factor applied to future rewards. In formula (4), θ denotes the weights of the Q-network and q(τ) is the generator density; the actual distribution p_θ(τ) ∝ exp(r_θ(τ)) is represented by a Boltzmann distribution, whose reward function acts as an energy function. Continuing with the technique of the present invention, the third operation module 15 uses a deep reinforcement learning algorithm that is initialized with the at least one pre-trained weight parameter, and performs a third operation on the plurality of sets of music theory data stored in the music theory database 16, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data. In this embodiment, the deep reinforcement learning algorithm used by the third operation module 15 is a deep Q-network (DQN) learning algorithm combined with the bi-axial LSTM neural network algorithm. In more detail, the inventors ran simulations and experiments with different values of the constant c of formula (2) (the reward discount factor), with the results shown in Table (1):

Value of c            0.25     0.50     0.75
Music theory reward   32.55    34.27    34.15
AIRL reward           25.30    29.71    22.46
Average               28.93    31.99    28.31

Table (1)
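The interplay between the mixed reward of formula (2) and the loss of formula (3) can be illustrated with a short numerical sketch. The Q-networks below are stand-in lookup tables rather than the bi-axial LSTM of the actual system, and the two toy reward functions are assumptions introduced purely so that the example runs.

```python
import numpy as np

# Sketch of the reward mixing of Eq. (2) and the DQN loss of Eq. (3).
rng = np.random.default_rng(0)
n_states, n_actions = 10, 5
q_net = rng.normal(size=(n_states, n_actions))   # Q(s, a; theta)
target_q = q_net.copy()                          # Q(s, a; theta^-)

def r_mt(s, a):   return 1.0 if (s + a) % 2 == 0 else 0.0  # toy music-theory reward
def r_airl(s, a): return 0.1 * a                            # toy AIRL reward

def mixed_reward(s, a, c=0.5):                   # Eq. (2)
    return c * r_mt(s, a) + (1.0 - c) * r_airl(s, a)

def dqn_loss(s, a, s_next, c=0.5, gamma=0.99):   # Eq. (3)
    target = mixed_reward(s, a, c) + gamma * target_q[s_next].max()
    return (target - q_net[s, a]) ** 2

print(dqn_loss(s=3, a=1, s_next=4))
```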

Next, the inventors compared the polyphonic music data produced by the present invention with the music produced by other algorithms. The comparison measures the difference between the music data generated by each algorithm and existing human music (i.e., the reference music data, or master music data); the results are shown in Table (2):

                                  Bi-axial LSTM   Music theory   AIRL       Music theory + AIRL
Ratio polyphonicity               0.108861        0.084123       0.040719   0.020159
Ratio steps in scale              0.301088        0.294629       0.092172   0.171911
Ratio empty bars                  0.002749        0.000134       0.051174   0.020323
Ratio unique chords               0.540401        0.192479       0.026132   0.078249
Ratio chords repeated             0.157354        0.138524       0.085684   0.031908
Ratio tonality                    0.022615        0.085325       0.058328   0.006202
Ratio chords in repeated motif    0.275816        0.222622       0.270690   0.264218

Table (2)

As Table (2) shows, the combination of music theory and the AIRL algorithm adopted in this embodiment yields the lowest difference value on three of the metrics. It should be noted that a lower value means the generated music is more similar to music written by humans; in other words, the smaller the difference value, the closer the music produced by the automatic composition system 1 of the present invention is to human-composed music. The inventors further conducted a user preference survey covering the different algorithm architectures and human music, as shown in Table (3):

                         Bi-axial LSTM   Music theory   AIRL   Music theory + AIRL   Human composition
Total times preferred    20              66             30     69                    75
Percentage preferred     19%             63%            29%    66%                   72%
Standard deviation       4.02            4.91           4.62   4.82                  4.57

Table (3)

As Table (3) shows, the preference score obtained by the music theory plus AIRL architecture adopted in this embodiment is only slightly below that of human-composed music. Both the objective analysis of Table (2) and the subjective analysis of Table (3) therefore indicate that the music produced by the automatic composition system of the present invention, using a generative adversarial network and inverse reinforcement learning, comes closest to music made by humans. In other words, the pieces produced by the present invention have the advantage of being melodious and well liked.
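For readers who wish to reproduce the flavor of Table (2), the sketch below computes one candidate difference metric. The precise metric definitions are not given above, so this is an assumption: polyphonicity is taken here to be the fraction of time steps with more than one simultaneous note, and the reported value is the absolute difference between a generated piece and a human piece.

```python
import numpy as np

def polyphonicity(piano_roll):
    # piano_roll: (time, pitches) binary matrix of played notes.
    return float(np.mean(piano_roll.sum(axis=1) > 1))

rng = np.random.default_rng(1)
generated = (rng.random((64, 88)) < 0.03).astype(int)   # stand-in generated piece
human = (rng.random((64, 88)) < 0.05).astype(int)       # stand-in human piece

ratio_diff = abs(polyphonicity(generated) - polyphonicity(human))
print(f"Ratio polyphonicity difference: {ratio_diff:.6f}")
```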

The above completes the description of the automatic composition system using a generative adversarial network and inverse reinforcement learning of the present invention; the corresponding automatic composition method is described next. Referring again to FIG. 1 and FIG. 2, and also to FIG. 5 and FIG. 6, which show a first flowchart and a second flowchart of the automatic composition method using a generative adversarial network and inverse reinforcement learning of the present invention: the method is applied in an electronic device 2 so that, by executing the method, the processor of the electronic device 2 can generate a piece of music from a plurality of reference music data. As shown in FIG. 5 and FIG. 6, the automatic composition method of the present invention comprises several steps. First, in step S1, the music feature extraction unit 12 performs a feature extraction process on each of the plurality of reference music data stored in the music database 11, thereby extracting a plurality of music features.

As shown in FIG. 5 and FIG. 6, the method then executes step S2: the first operation module performs a first operation on the plurality of music features using a deep learning algorithm, so as to obtain at least one music probability feature and at least one pre-trained weight parameter. The method next executes step S3: the second operation module (AIRL) performs a second operation on the plurality of music features using a reinforcement learning algorithm, so as to obtain at least one note reward function. Further, in step S4, the third operation module uses a deep reinforcement learning algorithm that is initialized with the at least one pre-trained weight parameter, and performs a third operation on the plurality of sets of music theory data stored in the music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data.

In more detail, step S2 comprises the following steps. In step S21, the first operation unit 131 of the first operation module 13 converts each music feature into a note vector feature. In step S22, the second operation unit 132 of the first operation module 13 uses a deep learning algorithm to perform a fourth operation on the note vector features along the time axis, so as to obtain at least one time parameter. As shown in FIG. 5, the method then executes step S23, in which the third operation unit 133 of the first operation module 13 uses a deep learning algorithm to perform a fifth operation on the note vector features along the note axis, so as to obtain at least one note parameter. Finally, in step S24, the fourth operation unit 134 of the first operation module 13 uses a deep learning algorithm (a non-recurrent linear algorithm) to perform a fifth operation on the time parameters and note parameters, so as to obtain at least one training weight parameter together with the music probability feature composed of at least one note probability feature and at least one articulation probability feature.

The above has completely and clearly described the automatic composition system and method using a generative adversarial network and inverse reinforcement learning of the present invention. From the above it can be seen that the present invention has the following advantages:

(1) The automatic composition system 1 using a generative adversarial network and inverse reinforcement learning of the present invention is composed chiefly of a music database 11, a music feature extraction unit 12, a first operation module 13, a second operation module 14 and a third operation module 15. In particular, the music feature extraction unit 12 performs a feature extraction process on each of the reference music data, thereby extracting a plurality of music features, in which the harmony of a piece is recorded in matrix form. The second operation module 14 performs a second operation on the plurality of music features using an inverse reinforcement learning algorithm to obtain at least one note reward function, and the third operation module 15 uses a deep reinforcement learning algorithm initialized with at least one pre-trained weight parameter to perform a third operation on the sets of music theory data, the note reward function values and the pre-trained weight parameters, so as to obtain a plurality of polyphonic music data. The objective and subjective analyses above show that the music produced by this system comes closest to music made by humans, and that it is both melodious and well liked.

It must be emphasized that the detailed description above concerns a feasible embodiment of the present invention; this embodiment is not intended to limit the patent scope of the present invention, and any equivalent implementation or modification that does not depart from the technical spirit of the present invention shall fall within the patent scope of this case.

<The present invention>
1: automatic composition system
2: electronic device
11: music database
12: music feature extraction unit
13: first operation module
131: first operation unit
132: second operation unit
133: third operation unit
134: fourth operation unit
14: second operation module
15: third operation module
16: music theory database
S1~S4: steps
S21~S24: steps

<Prior art>
None

FIG. 1 shows a perspective view of an electronic device to which the automatic composition system using a generative adversarial network and inverse reinforcement learning of the present invention is applied;
FIG. 2 shows a functional block diagram of the automatic composition system using a generative adversarial network and inverse reinforcement learning of the present invention;
FIG. 3 shows a schematic diagram of how the pitches of a piece of music are recorded;
FIG. 4 shows a functional block diagram of the first operation module;
FIG. 5 shows a first flowchart of the automatic composition method using a generative adversarial network and inverse reinforcement learning of the present invention; and
FIG. 6 shows a second flowchart of the automatic composition method using a generative adversarial network and inverse reinforcement learning of the present invention.

1: automatic composition system using a generative adversarial network and inverse reinforcement learning
11: music database
12: music feature extraction unit
13: first operation module
14: second operation module
15: third operation module
16: music theory database

Claims (17)

1. An automatic composition system, applied in an electronic device so that the electronic device generates a piece of music according to a plurality of reference music data, the automatic composition system comprising: a music database for storing the plurality of reference music data; a music feature extraction unit for performing a feature extraction process on each of the reference music data, thereby extracting a plurality of music features; a first operation module for performing a first operation on the plurality of music features by using a deep reinforcement learning algorithm, so as to obtain a music feature probability representable by a Boltzmann distribution and at least one pre-trained weight parameter, wherein the plurality of music features store the pitches of a plurality of notes based on MIDI values, and the music feature probability comprises at least one selected from the group consisting of a note probability and an articulation probability; a second operation module for performing a second operation on the plurality of music features by using a reinforcement learning algorithm, so as to obtain at least one note reward function; and a third operation module for performing an initialization setting with the at least one pre-trained weight parameter by using a deep reinforcement learning algorithm, and performing a third operation on a plurality of sets of music theory data stored in a music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data.

2. The automatic composition system of claim 1, wherein the first operation module comprises: a first operation unit for converting each of the music features into a note vector feature; a second operation unit for performing, by using a deep learning algorithm, a fourth operation on the note vector features along the time axis, so as to obtain at least one time parameter; a third operation unit for performing, by using a deep learning algorithm, a fifth operation on the note vector features along the note axis, so as to obtain at least one note parameter; and a fourth operation unit for performing, by using a non-recursive deep learning algorithm, a fifth operation on the time parameter and the note parameter, so as to obtain at least one training weight parameter, the note probability and the articulation probability.

3. The automatic composition system of claim 2, wherein the first operation module applies a dropout algorithm in the fourth operation and the fifth operation, thereby preventing the deep learning algorithm from over-fitting.

4. The automatic composition system of claim 2, wherein the deep learning algorithm is a long short-term memory (LSTM) neural network algorithm.

5. The automatic composition system of claim 1, wherein the deep reinforcement learning algorithm is a deep Q-network learning algorithm.

6. The automatic composition system of claim 4, wherein the deep reinforcement learning algorithm is a deep Q-network learning algorithm combined with the LSTM neural network algorithm, and the LSTM neural network algorithm is a bi-axial LSTM neural network algorithm.

7. The automatic composition system of claim 1, wherein the reinforcement learning algorithm is an adversarial inverse reinforcement learning algorithm.

8. The automatic composition system of claim 2, wherein the note vector feature has a length of 79 and is composed of a current tone vector feature, a current pitch vector feature, a pitch vector feature of the preceding note, a pitch vector feature of the following note, and a beat vector feature.

9. An automatic composition method, applied in an electronic device and implemented by a processor of the electronic device, comprising the following steps: (1) providing a music feature extraction unit for performing a feature extraction process on each of a plurality of music data stored in a music database of the electronic device, thereby extracting a plurality of music features; (2) providing a first operation module for performing a first operation on the plurality of music features by using a deep learning algorithm, so as to obtain a music feature probability representable by a Boltzmann distribution and at least one pre-trained weight parameter, wherein the plurality of music features store the pitches of a plurality of notes based on MIDI values, and the music feature probability comprises at least one selected from the group consisting of a note probability and an articulation probability; (3) providing a second operation module for performing a second operation on the plurality of music features by using a reinforcement learning algorithm, so as to obtain at least one note reward function; and (4) providing a third operation module for performing an initialization setting on the at least one pre-trained weight parameter by using a deep reinforcement learning algorithm, and performing a third operation on a plurality of sets of music theory data stored in a music theory database, the at least one note reward function and the at least one pre-trained weight parameter, so as to obtain a plurality of polyphonic music data.

10. The automatic composition method of claim 9, wherein the first operation module comprises a first operation unit, a second operation unit, a third operation unit and a fourth operation unit, and step (2) further comprises the following steps: (21) the first operation unit converts each of the music features into a note vector feature; (22) the second operation unit performs, by using a deep learning algorithm, a fourth operation on the note vector features along the time axis, so as to obtain at least one time parameter; (23) the third operation unit performs, by using a deep learning algorithm, a fifth operation on the note vector features along the note axis, so as to obtain at least one note parameter; (24) the fourth operation unit performs, by using a non-recursive deep learning algorithm, a fifth operation on the time parameter and the note parameter, so as to obtain at least one training weight parameter, the note probability and the articulation probability.

11. The automatic composition method of claim 10, wherein, when the first operation is performed, the first operation module applies a dropout algorithm to prevent an operation result of the deep learning algorithm from over-fitting.

12. The automatic composition method of claim 9, wherein the deep learning algorithm is a long short-term memory neural network.

13. The automatic composition method of claim 9, wherein the deep reinforcement learning algorithm is a deep Q-network learning algorithm (Deep Q Learning network, DQN).

14. The automatic composition method of claim 13, wherein the deep reinforcement learning algorithm is a deep Q-network learning algorithm (DQN) combined with the long short-term memory neural network algorithm, and the long short-term memory neural network algorithm is a bi-axial long short-term memory neural network algorithm.

15. The automatic composition method of claim 9, wherein the reinforcement learning algorithm is an inverse reinforcement learning algorithm.

16. The automatic composition method of claim 10, wherein the note vector feature has a length of 79 and is composed of a current tone vector feature, a current pitch vector feature, a pitch vector feature of the preceding note, a pitch vector feature of the following note, and a beat vector feature.

17. The automatic composition method of claim 9, wherein the plurality of music features store the pitches of a plurality of notes based on MIDI values.
TW110108455A 2021-03-10 2021-03-10 System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm TWI784434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110108455A TWI784434B (en) 2021-03-10 2021-03-10 System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110108455A TWI784434B (en) 2021-03-10 2021-03-10 System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm

Publications (2)

Publication Number Publication Date
TW202236173A TW202236173A (en) 2022-09-16
TWI784434B true TWI784434B (en) 2022-11-21

Family

ID=84957187

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110108455A TWI784434B (en) 2021-03-10 2021-03-10 System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm

Country Status (1)

Country Link
TW (1) TWI784434B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI285880B (en) * 2005-08-16 2007-08-21 Univ Nat Chiao Tung Expert system for automatic composition
US20170358285A1 (en) * 2016-06-10 2017-12-14 International Business Machines Corporation Composing Music Using Foresight and Planning
TW201824249A (en) * 2016-12-30 2018-07-01 香港商阿里巴巴集團服務有限公司 Method for generating music to accompany lyrics and related apparatus
US20200168196A1 (en) * 2015-09-29 2020-05-28 Amper Music, Inc. Method of scoring digital media objects using musical experience descriptors to indicate what, where and when musical events should appear in pieces of digital music automatically composed and generated by an automated music composition and generation system


Also Published As

Publication number Publication date
TW202236173A (en) 2022-09-16

Similar Documents

Publication Publication Date Title
Crowe Music and soulmaking: Toward a new theory of music therapy
McAngus Todd The dynamics of dynamics: A model of musical expression
McClellan The healing forces of music: History, theory and practice
Carterette et al. Comparative music perception and cognition
Guernsey The role of consonance and dissonance in music
Blacking Tonal organization in the music of two Venda initiation schools
Kassler Inner music: Hobbes, hooke, and north on internal character
Kramer Soul music as exemplified in nineteenth-century German psychiatry
Sacks Tales of Music and the Brain
Hill Markov melody generator
TWI784434B (en) System and method for automatically composing music using approaches of generative adversarial network and adversarial inverse reinforcement learning algorithm
Johnson 'Disease Is Unrhythmical': Jazz, Health, and Disability in 1920s America
Tenzer Chasing the Phantom: Features of a Supracultural New Music
Sessions Problems and issues facing the composer today
JP2019109357A5 (en)
Tanner The power of performance as an alternative analytical discourse: The Liszt Sonata in B minor
Liang et al. Research on generating xi'an drum music based on generative adversarial network
JP3730139B2 (en) Method and system for automatically generating music based on amino acid sequence or character sequence, and storage medium
Ockelford Imagination feeds memory: exploring evidence from a musical savant using zygonic theory
Vasudha et al. Application of computer-aided music composition in music therapy
McGraw The perception and cognition of time in Balinese music
Zhang et al. Evolving expressive music performance through interaction of artificial agent performers
Neill Schopenhauer
Spiller 10 Sundanese Penca Silat and Dance Drumming in West Java
Lopez-Real et al. Musical artistry and identity in balance