CN110085203A - Music style fusion method based on dual generative adversarial networks - Google Patents

Music style fusion method based on dual generative adversarial networks

Info

Publication number
CN110085203A
Authority
CN
China
Prior art keywords
gan
adversarial network
data
style
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910312288.9A
Other languages
English (en)
Inventor
周武能
徐亦捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201910312288.9A priority Critical patent/CN110085203A/zh
Publication of CN110085203A publication Critical patent/CN110085203A/zh
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0091 Means for obtaining special acoustic effects
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H 2210/036 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
    • G10H 2210/086 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format
    • G10H 2210/101 Music composition or musical creation; Tools or processes therefor
    • G10H 2210/131 Morphing, i.e. transformation of a musical piece into a new different one, e.g. remix
    • G10H 2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/011 Files or data streams containing coded musical information, e.g. for transmission
    • G10H 2240/046 File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H 2240/071 Wave, i.e. Waveform Audio File Format, coding, e.g. uncompressed PCM audio according to the RIFF bitstream format method
    • G10H 2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H 2250/541 Details of musical waveform synthesis, i.e. audio waveshape processing from individual wavetable samples, independently of their origin or of the sound they represent
    • G10H 2250/571 Waveform compression, adapted for music synthesisers, sound banks or wavetables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a music style fusion method based on dual generative adversarial networks, comprising: converting audio files into waveform files for training and, drawing on the idea of dual learning, building three mutually coupled generative adversarial networks to fuse music sequences of two different styles. The novelty of the invention lies in its ability to effectively fuse music of two different genres into a new sequence; compared with existing methods in the field of music style fusion, it proposes the new idea of using waveform files for music generation.

Description

Music style fusion method based on dual generative adversarial networks
Technical Field
The present invention relates to a music style fusion method based on dual generative adversarial networks.
Background Art
Artificial intelligence has already brought change to many fields and holds great potential for artistic creation. Within AI-generated art, generating music is more challenging than generating images or text: first, music is an art of time; second, music is usually performed by multiple tracks/instruments in parallel, which unfold in relation to one another as time passes.
Style fusion is similar to style transfer and was first applied to images, where an ordinary photo can be artistically processed into a master-style artwork; in essence, both style fusion and style transfer are techniques for transforming the style of a sample. In music, style fusion is formally known as "fusion", a concept that originated in the late 1960s as a subgenre of jazz, combining several musical styles such as funk, rock and blues with the harmony and improvisation of jazz.
Using AI for music style fusion can save considerable time and money on soundtracks for advertisements, games and other videos. This research will not replace human composers; rather, it will assist human composition and give composers new inspiration. It can also be used for entertainment features in music player software.
A generative adversarial network (GAN) is a deep learning model and one of the most promising approaches in recent years to unsupervised learning on complex distributions. The framework usually contains two modules: a generative model and a discriminative model. The generative model learns the distribution of the real data so that the samples it generates become more realistic and can fool the discriminative model, while the discriminative model judges whether generated data are real or fake. Through this mutual game of learning, data realistic enough to pass for genuine can be generated.
Summary of the Invention
The object of the present invention is to provide a method for generating music that fuses two different genres.
To achieve the above object, the technical solution of the present invention provides a music style fusion method based on dual generative adversarial networks, characterized by comprising the following steps:
(1) Obtain an audio training dataset, the obtained audio training dataset being manually divided into two different genres;
(2) Convert the audio files in the audio training dataset into waveform files;
(3) Build a dual generative adversarial network model consisting of three independent generative adversarial networks, GAN_A, GAN_B and GAN_F, whose discriminative and generative models are all convolutional neural networks, where GAN_A learns the data distribution of the first-style dataset, GAN_B learns the data distribution of the second-style dataset, and GAN_F fuses the styles of the two kinds of data;
(4) Feed the audio files of the two different genres in the training dataset obtained in step (2) into GAN_A and GAN_B respectively and train GAN_A and GAN_B; then train GAN_F, whose discriminative model D_F learns from the discriminative models D_A and D_B of GAN_A and GAN_B and is updated iteratively, while the generative model G_F of GAN_F also receives feedback from D_A and D_B and tries to stay equidistant from both;
(5) Use the trained GAN_F to generate a waveform file fusing the two different genres, and convert the waveform file back to audio to obtain the final musical passage.
Preferably, the objective of the generative adversarial network is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]$$

where V(D,G) is the objective function of the GAN optimization problem; P_data is the distribution of the real data; P_z is the distribution of the noise signal; x is a real input sample, and when x ~ P_data the first term is maximized at D(x) = 1, D(x) being the probability that x is real and E_{x~P_data} the expectation over the real data; z is random noise, and when z ~ P_z the second term is maximized at D(G(z)) = 0, G(z) being the output of the generative model and E_{z~P_z} the expectation over the noise.
The discriminative model of the generative adversarial network maximizes V(D,G), while the generative model minimizes V(D,G).
Preferably, in step (4), when training the discriminative model of GAN_A, D_A(A) > D_A(F) > D_A(B); when training the discriminative model of GAN_B, D_B(B) > D_B(F) > D_B(A).
Preferably, in step (4), the Wasserstein distance is used during training to measure the distance between two distributions:

$$W(P_1, P_2) = \inf_{\gamma \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \gamma}[\|x - y\|]$$

where W(P_1, P_2) is the Wasserstein distance between distributions P_1 and P_2, Π(P_1, P_2) is the set of joint distributions whose marginals are P_1 and P_2, and E_{(x,y)~γ}[‖x − y‖] is the expected distance between sample pairs (x, y) drawn from the joint distribution γ.
The present invention can effectively fuse music of two different genres to generate a new sequence; compared with existing methods in the field of music style fusion, it proposes the new idea of using waveform files for music generation.
Brief Description of the Drawings
Fig. 1 shows the algorithm flow of an implementation of the present invention;
Fig. 2 is a diagram of the dual generative adversarial network model of the method.
Detailed Description of the Embodiments
The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the invention and not to limit its scope. It should further be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
The present invention provides a music style fusion method based on dual generative adversarial networks. In the following embodiment the invention is further illustrated by fusing the two genres "folk" and "jazz", comprising the following steps:
(1) Obtain an audio training dataset. Musical fusion, formally "fusion", is mainly a subgenre of jazz that emerged in the late 1960s, combining several musical styles such as funk, rock and blues with the harmony and improvisation of jazz. The audio training dataset obtained here is manually divided into the two genres "folk" and "jazz".
(2) Convert the audio files in the training dataset into waveform (wav format) files. Because of its layered and sequential structure, style fusion in music is harder and more challenging than style fusion in images. Training on waveforms lets the generated samples sound closer to real samples in timbre, and also makes it possible to borrow existing models from the field of image style fusion.
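As a minimal sketch of this conversion step (the patent names no tools; librosa and soundfile, a 16 kHz sample rate and 4-second clips are assumptions here), the preprocessing could look like this:

```python
# Sketch of step (2): converting a training set of audio files into
# fixed-length mono WAV clips. Tool choice, sample rate and clip length
# are illustrative assumptions; mp3 decoding relies on ffmpeg being installed.
import os
import librosa
import soundfile as sf

def audio_to_wav_clips(src_dir, dst_dir, sr=16000, clip_seconds=4):
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        y, _ = librosa.load(os.path.join(src_dir, name), sr=sr, mono=True)
        step = sr * clip_seconds
        for i in range(0, len(y) - step + 1, step):  # split into equal clips
            out = os.path.join(dst_dir, f"{os.path.splitext(name)[0]}_{i // step}.wav")
            sf.write(out, y[i:i + step], sr)

audio_to_wav_clips("data/folk_mp3", "data/folk_wav")   # hypothetical paths
audio_to_wav_clips("data/jazz_mp3", "data/jazz_wav")
```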
(3) Build the dual generative adversarial network model, consisting of three independent generative adversarial networks (GANs): GAN_A, GAN_B and GAN_F. GAN_A learns the data distribution of the first-style dataset, GAN_B learns the data distribution of the second-style dataset, and GAN_F fuses the styles of the two kinds of data.
The GAN objective is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]$$

where V(D,G) is the objective function of the GAN optimization problem; P_data is the distribution of the real data; P_z is the distribution of the noise signal; x is a real input sample, and when x ~ P_data the first term is maximized at D(x) = 1, D(x) being the probability that x is real and E_{x~P_data} the expectation over the real data; z is random noise, and when z ~ P_z the second term is maximized at D(G(z)) = 0, G(z) being the output of the generative model and E_{z~P_z} the expectation over the noise.
The best discriminative model maximizes V(D,G), while the best generative model minimizes V(D,G). In essence, a GAN learns a data distribution: it is a zero-sum game that ends with two identical data distributions.
In the present invention, both the discriminative and generative models of the GANs are convolutional neural networks, which train faster on image-like data and are easy to parallelize.
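For illustration only, one plausible realization of such a CNN generator/discriminator pair, assuming 1-D convolutions over raw waveform clips (the patent only states that both models are CNNs; all layer sizes and the latent dimension are assumptions):

```python
# Sketch of the generator/discriminator used for each of GAN_A, GAN_B, GAN_F.
# Each transposed-conv layer upsamples 4x, so a latent sequence of length 1000
# becomes a 64000-sample clip (4 s at 16 kHz, matching the sketch above).
import torch
import torch.nn as nn

LATENT_DIM = 100

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(LATENT_DIM, 256, 25, 4, 11, 1), nn.ReLU(),
            nn.ConvTranspose1d(256, 128, 25, 4, 11, 1), nn.ReLU(),
            nn.ConvTranspose1d(128, 1, 25, 4, 11, 1),
            nn.Tanh(),                       # waveform samples in [-1, 1]
        )

    def forward(self, z):                    # z: (batch, LATENT_DIM, length)
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, 25, 4, 11), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, 25, 4, 11), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, 1),
            nn.Sigmoid(),                    # probability the clip is real
        )

    def forward(self, x):                    # x: (batch, 1, samples)
        return self.net(x)
```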
(4) First train GAN_A and GAN_B, which are fed the two music-genre datasets respectively.
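A minimal sketch of this first training stage, shown for GAN_A with the standard adversarial objective above (optimizer settings are assumptions; GAN_B trains identically on the other genre's data, reusing the Generator and Discriminator classes from the previous sketch):

```python
# One training step of GAN_A. 'real' is a batch of real style-A waveform
# clips of shape (batch, 1, 64000) from a DataLoader (not shown).
bce = nn.BCELoss()
G_A, D_A = Generator(), Discriminator()
opt_G = torch.optim.Adam(G_A.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D_A.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step_A(real):
    batch = real.size(0)
    fake = G_A(torch.randn(batch, LATENT_DIM, 1000))

    # Discriminator: maximize V(D, G) -> label real 1, fake 0
    opt_D.zero_grad()
    loss_D = bce(D_A(real), torch.ones(batch, 1)) + \
             bce(D_A(fake.detach()), torch.zeros(batch, 1))
    loss_D.backward()
    opt_D.step()

    # Generator: minimize V(D, G) -> make D call the fakes real
    opt_G.zero_grad()
    loss_G = bce(D_A(fake), torch.ones(batch, 1))
    loss_G.backward()
    opt_G.step()
```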
The discriminative model D_F of GAN_F, which is responsible for the style fusion, learns from the discriminative models D_A and D_B of GAN_A and GAN_B and is updated iteratively, while the generative model G_F also receives feedback from D_A and D_B and tries to stay equidistant from both.
On top of the three trained GANs, constraints are imposed to promote a half-and-half mix: for example, if the distribution of samples generated by GAN_F is equally distant from those of GAN_A and GAN_B, it is a perfect mix; otherwise a penalty is added.
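A sketch of how such a penalty on G_F might be written, reusing the components above (taking the absolute gap between the two discriminator scores as the penalty is an assumption; the patent states only that equal distance to both styles means a perfect mix):

```python
# Generator loss for GAN_F: fool both style critics, and pay a penalty
# whenever D_A and D_B disagree about the fused sample (i.e. it drifts
# closer to one style than the other).
def fusion_generator_loss(G_F, D_A, D_B, z, lam=1.0):
    fake = G_F(z)
    score_A = D_A(fake)                  # how "style A" the sample looks
    score_B = D_B(fake)                  # how "style B" the sample looks
    adv = bce(score_A, torch.ones_like(score_A)) + \
          bce(score_B, torch.ones_like(score_B))
    balance = (score_A - score_B).abs().mean()   # penalty if not half-half
    return adv + lam * balance
```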
When training the discriminative model of GAN_A, D_A(A) > D_A(F) > D_A(B); a similar constraint applies when training GAN_B. One differentiable form of this ordering is sketched below.
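The hinge mechanism and margin here are assumptions; the patent states only the inequality itself. The symmetric constraint D_B(B) > D_B(F) > D_B(A) would apply when training D_B.

```python
# Ranking penalty for D_A: real style-A data should score highest, fused
# samples in the middle, style-B data lowest, i.e. D_A(A) > D_A(F) > D_A(B).
def ordering_penalty(D_A, real_A, fused, real_B, margin=0.0):
    sA, sF, sB = D_A(real_A).mean(), D_A(fused).mean(), D_A(real_B).mean()
    return torch.clamp(sF - sA + margin, min=0) + \
           torch.clamp(sB - sF + margin, min=0)
```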
During training, the Wasserstein distance is used to measure the distance between two distributions:

$$W(P_1, P_2) = \inf_{\gamma \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \gamma}[\|x - y\|]$$

where W(P_1, P_2) is the Wasserstein distance between distributions P_1 and P_2, Π(P_1, P_2) is the set of joint distributions whose marginals are P_1 and P_2, and E_{(x,y)~γ}[‖x − y‖] is the expected distance between sample pairs drawn from the joint distribution γ. Intuitively, W(P_1, P_2) can be understood as the cost of moving the earth pile P_1 into the earth pile P_2 under a given transport plan; the Wasserstein distance is the minimum cost under the optimal plan, which is why it is also called the Earth-Mover distance.
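For intuition, the quantity can be computed directly on two empirical 1-D sample sets with SciPy (a stand-alone illustration, not part of the training loop, where the distance between network distributions would be estimated rather than computed exactly):

```python
# Earth-mover distance between two empirical distributions.
import numpy as np
from scipy.stats import wasserstein_distance

p1 = np.random.normal(0.0, 1.0, size=10_000)   # stand-in for distribution P1
p2 = np.random.normal(0.5, 1.0, size=10_000)   # stand-in for distribution P2
print(wasserstein_distance(p1, p2))            # ~0.5: cost of shifting the "earth"
```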
(5) Use the trained GAN_F to generate a waveform file fusing the "folk" and "jazz" genres; changing the type of data initially fed into GAN_A and GAN_B for training yields fusions of other styles. Converting the waveform file back to audio gives the final musical passage.
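A closing sketch of this generation step, continuing the assumptions and definitions of the earlier snippets (a trained G_F of the Generator class above, 16 kHz sample rate):

```python
# Step (5): sample the trained fusion generator and write the result to audio.
import soundfile as sf
import torch

with torch.no_grad():
    z = torch.randn(1, LATENT_DIM, 1000)
    wave = G_F(z).squeeze().cpu().numpy()        # fused waveform in [-1, 1]
sf.write("fusion_folk_jazz.wav", wave, 16000)    # the final musical passage
```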

Claims (4)

1. A music style fusion method based on dual generative adversarial networks, characterized by comprising the following steps:
(1) obtaining an audio training dataset, the obtained audio training dataset being manually divided into two different genres;
(2) converting the audio files in the audio training dataset into waveform files;
(3) building a dual generative adversarial network model consisting of three independent generative adversarial networks, GAN_A, GAN_B and GAN_F, the discriminative and generative models of GAN_A, GAN_B and GAN_F all being convolutional neural networks, GAN_A learning the data distribution of the first-style dataset, GAN_B learning the data distribution of the second-style dataset, and GAN_F fusing the styles of the two kinds of data;
(4) feeding the audio files of the two different genres in the training dataset obtained in step (2) into GAN_A and GAN_B respectively and training GAN_A and GAN_B, then training GAN_F, wherein the discriminative model D_F of GAN_F learns from the discriminative models D_A and D_B of GAN_A and GAN_B and is updated iteratively, while the generative model G_F of GAN_F also receives feedback from D_A and D_B and tries to stay equidistant from both;
(5) generating, with the trained GAN_F, a waveform file fusing the two different genres, and converting the waveform file into audio to obtain the final musical passage.
2. The music style fusion method based on dual generative adversarial networks according to claim 1, characterized in that the objective of the generative adversarial network is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]$$

where V(D,G) is the objective function of the GAN optimization problem; P_data is the distribution of the real data; P_z is the distribution of the noise signal; x is a real input sample, and when x ~ P_data the first term is maximized at D(x) = 1, D(x) being the probability that x is real and E_{x~P_data} the expectation over the real data; z is random noise, and when z ~ P_z the second term is maximized at D(G(z)) = 0, G(z) being the output of the generative model and E_{z~P_z} the expectation over the noise;
the discriminative model of the generative adversarial network maximizes V(D,G), while the generative model minimizes V(D,G).
3. The music style fusion method based on dual generative adversarial networks according to claim 1, characterized in that, in step (4), when training the discriminative model of GAN_A, D_A(A) > D_A(F) > D_A(B); when training the discriminative model of GAN_B, D_B(B) > D_B(F) > D_B(A).
4. The music style fusion method based on dual generative adversarial networks according to claim 1, characterized in that, in step (4), the Wasserstein distance is used during training to measure the distance between two distributions:

$$W(P_1, P_2) = \inf_{\gamma \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \gamma}[\|x - y\|]$$

where W(P_1, P_2) is the Wasserstein distance between distributions P_1 and P_2, Π(P_1, P_2) is the set of joint distributions whose marginals are P_1 and P_2, and E_{(x,y)~γ}[‖x − y‖] is the expected distance between sample pairs drawn from the joint distribution γ.
CN201910312288.9A 2019-04-18 2019-04-18 Music style fusion method based on dual generative adversarial networks Pending CN110085203A (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910312288.9A CN110085203A (zh) 2019-04-18 2019-04-18 Music style fusion method based on dual generative adversarial networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910312288.9A CN110085203A (zh) 2019-04-18 2019-04-18 Music style fusion method based on dual generative adversarial networks

Publications (1)

Publication Number Publication Date
CN110085203A true CN110085203A (zh) 2019-08-02

Family

ID=67415549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312288.9A Pending CN110085203A (zh) 2019-04-18 2019-04-18 Music style fusion method based on dual generative adversarial networks

Country Status (1)

Country Link
CN (1) CN110085203A (zh)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180314716A1 (en) * 2017-04-27 2018-11-01 Sk Telecom Co., Ltd. Method for learning cross-domain relations based on generative adversarial networks
CN107293289A (zh) * 2017-06-13 2017-10-24 Nanjing Medical University Speech generation method based on a deep convolutional generative adversarial network
CN109065021A (zh) * 2018-10-18 2018-12-21 Jiangsu Normal University End-to-end dialect identification method based on a conditional deep convolutional generative adversarial network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhiqian Chen et al., "Learning to Fuse Music Genres with Generative Adversarial Dual Learning", arXiv:1712.01456v1 *
Yang Weihua and Wu Maonian (eds.), "Artificial Intelligence in Ophthalmology" (《眼科人工智能》), Hubei Science and Technology Press, 28 February 2018 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853605A (zh) * 2019-11-15 2020-02-28 Communication University of China Music generation method and apparatus, and electronic device
CN113408576A (zh) * 2021-05-12 2021-09-17 Shanghai Normal University Learning style recognition method based on fused labels and a stacked machine-learning model

Similar Documents

Publication Publication Date Title
Cook Music as creative practice
Frith ‘The magic that can set you free’: The ideology of folk and the myth of the rock community
Anderton A many-headed beast: progressive rock as European meta-genre
Anderson Soul in Seoul: African American popular music and K-pop
Bowler et al. Bigger, better, louder: the prosperity gospel's impact on contemporary Christian worship
Stone The value of popular music: An approach from post-Kantian aesthetics
CN110085263A Music emotion classification and machine composition method
CN110085203A Music style fusion method based on dual generative adversarial networks
Rooksby How to Write Songs on Guitar: A Guitar-playing and Songwriting Course
Fairchild “Alternative”; music and the politics of cultural autonomy: The case of Fugazi and the DC Scene
CN103425901A Original sound data organizer
CN105931625A Automatic rap music generation method based on text input
Titon Authenticity and authentication: Mike Seeger, the New Lost City Ramblers, and the old-time music revival
Wente Magical mechanics: the player piano in the age of digital reproduction
Eigenfeldt Generating structure–towards large-scale formal generation
Wang Music composition and emotion recognition using big data technology and neural network algorithm
Wijaya et al. Song Similarity Analysis With Clustering Method On Korean Pop Song
Guocheng et al. Xinyang Folk Songs, Development and Transmission Process in Henan Province of China.
Mitrano et al. Using recurrent neural networks to judge fitness in musical genetic algorithms
Račiūnaitė-Vyčinienė The revival of Lithuanian polyphonic Sutartinės songs in the late 20th and early 21st century
Funk et al. Aesthetics and design for group music improvisation
Liu et al. Huangmei Opera in Anqing City, Anhui Province, China
Huo et al. An LSTM-based Framework for the Synthesis of Original Soundtracks
Dong A Study of the Relationship between the Expressions of Folk Dance Language Art and Culture
West Caught between jazz and pop: The contested origins, criticism, performance practice, and reception of smooth jazz

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190802