CN111785236A - Automatic composition method based on motif extraction model and neural network

Automatic composition method based on motif extraction model and neural network

Info

Publication number
CN111785236A
CN111785236A
Authority
CN
China
Prior art keywords
neural network
motif
extraction model
automatic composition
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910259941.XA
Other languages
Chinese (zh)
Inventor
陈德龙 (Chen Delong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910259941.XA
Publication of CN111785236A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks

Abstract

The embodiment of the invention provides an automatic composition method based on a motif extraction model and a neural network, relating to the field of music information retrieval in computer music and the field of computer composition. The method comprises the following steps: constructing a motif extraction model, in which a novelty function is obtained by convolving a self-similarity matrix with a Gaussian kernel, segmentation points are obtained from the first-order difference of the novelty function, the resulting candidate motif segments are Gaussian-blurred and convolved with one another, and the maxima of the convolution results form a motif similarity matrix from which frequently recurring musical motifs are selected; constructing a main-melody recognition model; constructing a melody generation model; and constructing a deep convolutional recurrent neural network with long short-term memory (LSTM) for automatic composition. The main-melody recognition model and the motif extraction model are used to prepare training samples for the composition network. After training, the user either inputs a melody fragment or samples the melody generation network to obtain one, and feeds it into the composition network to obtain the automatically composed output of the model.

Description

Automatic composition method based on motif extraction model and neural network
Technical Field
The invention relates to the field of music information retrieval in computer music and the field of computer composition, in particular to an automatic composition method based on a motif extraction model and a neural network.
Background
Automatic composition based on neural networks has long been a research hotspot in the field of artificial intelligence. Owing to structural limitations of plain neural networks, existing neural-network-based automatic composition methods struggle to generate polyphonic music of high audibility without manual post-processing. In addition, existing automatic composition methods are insufficiently grounded in music theory, and their algorithm design is not well motivated musically, which further limits their effectiveness.
Disclosure of Invention
Aiming at the defects and shortcomings of existing automatic composition methods, the invention provides an automatic composition method based on a motif extraction model and a neural network. The motif extraction model comprises several steps: calculating a self-similarity matrix for each independent voice part of the polyphonic music; convolving the self-similarity matrix along its main diagonal with a Gaussian kernel to obtain a novelty function; dividing the music sample into segments according to the novelty function; calculating the similarity between the segments; and selecting the k motifs with the greatest similarity as the extracted musical motifs.
The automatic composition method based on the motif extraction model and the neural network further comprises a neural-network composition pipeline consisting of a main-melody recognition model, a melody generation model, and a deep convolutional recurrent neural network with long short-term memory (LSTM) for automatic composition.
The main-melody recognition model is a multilayer neural network that takes several feature values of one voice part of a polyphonic piece as input and outputs a Boolean value indicating whether that part carries the main melody. Its training samples are manually labeled; once trained, the network can judge from the feature values of a single voice part whether that part contains the main melody. Within the automatic composition method, the model is applied to every voice part of the polyphonic samples, and all single-part samples for which it outputs a true value are used as training samples for the melody generation network.
The neural network for melody generation is a bidirectional long short-term memory (LSTM) recurrent neural network, trained on the samples prepared by the main-melody recognition model.
The training inputs of the deep convolutional recurrent neural network with LSTM for automatic composition are obtained by feeding all MIDI samples into the motif extraction model to obtain a motif set; the corresponding training outputs are the complete polyphonic pieces.
The automatic composition method further comprises an application stage: the user either inputs a melody fragment or samples the melody generation network to obtain one, and feeds it into the deep convolutional recurrent neural network with LSTM for automatic composition to obtain the output of the model.
Compared with the prior art, the method produces output music of high audibility that conforms to music theory; these effects follow directly from the use of the motif extraction method and the architecture of the neural network. In experiments, the automatic composition method based on the motif extraction model and the neural network learned composition techniques widely used in classical writing, such as motif reuse and modulation.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described here represent only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the music generation method of the present invention;
Fig. 2 is a flowchart of the main-melody recognition model of the present invention;
Fig. 3 is a flowchart of the motif extraction model of the present invention.
Detailed Description
Fig. 1 is a schematic diagram of an automatic composition system according to an example of the present invention, in which s1, s2 and s3 denote the neural network training phases and s4 denotes the system application phase.
In step s1, a neural network for main-melody recognition is constructed and trained, comprising steps s101, s102, s103 and s104.
In step s101, a MIDI music data set is first prepared; each sample (i.e., each musical piece) in the data set should contain multiple voice parts. Each voice part of each sample is manually labeled for the main melody: if the part contains the main melody, its label is "1", otherwise "0". This yields a voice-part sample set.
In step s102, the voice-part sample set obtained in step s101 is processed and several musical attributes are extracted from the music data, including note density, average note duration, maximum interval, pitch distribution variance, the integral of the novelty function (step s302), and so on. Combined with the labels obtained in step s101, these form the training set of the neural network for main-melody recognition.
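The following is a minimal sketch of the feature extraction of step s102, assuming each voice part is given as a list of (pitch, start, end) tuples in seconds; the helper name extract_voice_features and the note representation are illustrative, not part of the patent text.

```python
import numpy as np

def extract_voice_features(notes, piece_duration, novelty):
    """notes: list of (midi_pitch, start_sec, end_sec) for one voice part;
    novelty: the novelty function of this part from step s302."""
    pitches = np.array([p for p, _, _ in notes], dtype=float)
    durations = np.array([e - s for _, s, e in notes], dtype=float)
    note_density = len(notes) / piece_duration          # notes per second
    mean_duration = float(durations.mean())             # average note duration
    max_interval = (float(np.abs(np.diff(pitches)).max())
                    if len(pitches) > 1 else 0.0)       # largest melodic leap
    pitch_variance = float(pitches.var())               # pitch distribution variance
    novelty_integral = float(np.trapz(novelty))         # integral of novelty (s302)
    return np.array([note_density, mean_duration, max_interval,
                     pitch_variance, novelty_integral])
```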
In step s103, the training set obtained in step s102 is fed into the neural network for training, with the normalized musical attributes as input and the main-melody label as target output.
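A sketch of the recognition network of step s103, assuming a small fully connected architecture; the patent does not fix layer sizes, so the widths below are illustrative.

```python
import torch
import torch.nn as nn

# Five normalized attributes in, probability of "carries the main melody" out.
model = nn.Sequential(
    nn.Linear(5, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)
loss_fn = nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, y):
    """x: (batch, 5) normalized attributes; y: (batch, 1) 0/1 labels."""
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```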
In step s2, a recurrent neural network with long short-term memory for melody generation is built and trained. First, all music samples are passed through the main-melody recognition network of step s103 to obtain a main-melody set. A bidirectional LSTM recurrent neural network is then established, and the melody samples of the obtained main-melody set are fed into it for training.
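A sketch of the melody generation network of step s2, assuming the melodies are encoded as sequences of note tokens; the vocabulary size and hidden width are assumptions. The bidirectional structure follows the patent text.

```python
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    def __init__(self, vocab=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # Bidirectional LSTM doubles the feature width.
        self.head = nn.Linear(2 * hidden, vocab)

    def forward(self, tokens):                 # tokens: (batch, time)
        h, _ = self.lstm(self.embed(tokens))
        return self.head(h)                    # (batch, time, vocab) note logits
```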
Step s3 implements the extraction of musical motifs and comprises several sub-steps.
In step s301, the self-similarity matrix S of each individual voice-part MIDI file is computed: the element S(i, j) is the similarity between the i-th frame and the j-th frame of the MIDI file, and the ranges of i and j depend on the sampling rate of the MIDI file.
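A minimal sketch of step s301, assuming the MIDI file has already been scanned into a piano-roll matrix (frames by 128 pitches) and that cosine similarity between frames is used; the patent does not fix the frame-similarity measure, so that choice is an assumption.

```python
import numpy as np

def self_similarity(piano_roll):
    """piano_roll: (n_frames, 128) binary or velocity matrix."""
    pr = piano_roll.astype(float)
    norms = np.linalg.norm(pr, axis=1, keepdims=True)
    unit = pr / np.maximum(norms, 1e-8)     # guard against silent frames
    return unit @ unit.T                    # S[i, j] = cos(frame_i, frame_j)
```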
In step s302, the matrix S is convolved along its main diagonal with a checkerboard Gaussian kernel to obtain the novelty function N. Using such a Gaussian kernel, self-similarity (the upper-left and lower-right quadrants of the kernel) and cross-similarity (the upper-right and lower-left quadrants) can be captured simultaneously.
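A sketch of step s302 in the style of Foote's checkerboard-kernel novelty: a Gaussian-tapered kernel, positive on the self-similarity quadrants and negative on the cross-similarity quadrants, is correlated along the main diagonal of S. The kernel half-width L is an assumption.

```python
import numpy as np

def checkerboard_kernel(L):
    g = np.exp(-0.5 * (np.arange(-L, L + 1) / (L / 2.0)) ** 2)
    taper = np.outer(g, g)                  # Gaussian taper
    sign = np.sign(np.arange(-L, L + 1))
    return taper * np.outer(sign, sign)     # +1 on-diagonal, -1 off-diagonal quadrants

def novelty(S, L=16):
    K = checkerboard_kernel(L)
    n = len(S)
    N = np.zeros(n)
    for t in range(L, n - L):               # slide the kernel along the diagonal
        N[t] = np.sum(K * S[t - L:t + L + 1, t - L:t + L + 1])
    return N
```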
In step s303, the local maxima of the novelty function N are found with a first-order difference method; where zero points of the first-order difference lie closer together than a minimum distance, only the point with the larger value of the novelty function N is kept.
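A sketch of the peak picking of step s303; the minimum distance min_dist between kept peaks is an assumption, as the patent does not fix it.

```python
import numpy as np

def pick_boundaries(N, min_dist=8):
    d = np.diff(N)                              # first-order difference of N
    peaks = [t for t in range(1, len(N) - 1)
             if d[t - 1] > 0 and d[t] <= 0]     # rising then falling: local maximum
    kept = []
    for t in sorted(peaks, key=lambda t: N[t], reverse=True):
        if all(abs(t - k) >= min_dist for k in kept):
            kept.append(t)                      # the stronger peak wins a conflict
    return sorted(kept)
```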
In step s304, the obtained maximum points are taken as segmentation points and the MIDI file is split, yielding a number of candidate motif segments.
In step s305, Gaussian blur is applied to the piano-roll matrix corresponding to each candidate motif segment's MIDI, so that small variations of a motif can be tolerated in the subsequent steps.
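A sketch of step s305. Claim 3 specifies a blur radius of 2 (about two semitones); blurring only along the pitch axis, with the radius rendered as the Gaussian sigma, is our reading, since the stated goal is tolerance of small pitch variations.

```python
from scipy.ndimage import gaussian_filter1d

def blur_motif(piano_roll):
    """piano_roll: (n_frames, 128); blur across the pitch axis only.
    sigma=2.0 stands in for the radius-2 blur of claim 3 (an assumption)."""
    return gaussian_filter1d(piano_roll.astype(float), sigma=2.0, axis=1)
```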
In step s306, the motif similarity matrix S' is computed. For the i-th and j-th candidate motif segments, the conv_sim() function is defined as follows: a full convolution of the i-th (blurred) candidate matrix is performed with the j-th candidate matrix as the convolution kernel, and the maximum value of the resulting matrix is taken as the value of the corresponding conv_sim() entry. The convolution operation both tolerates errors of the segmentation step (horizontal movement of the convolution kernel), increasing robustness, and gives the model the ability to recognize transposed motifs (vertical movement of the convolution kernel).
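A sketch of the conv_sim() function of step s306 under a literal reading of the text: a full two-dimensional convolution of the blurred i-th candidate with the j-th candidate as kernel, keeping the maximum response. Note that convolution flips the kernel; if an unflipped comparison is intended, scipy.signal.correlate2d can be substituted.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_sim(blurred_i, candidate_j):
    response = convolve2d(blurred_i, candidate_j, mode='full')  # "full" padding
    return float(response.max())            # best alignment over all shifts

def motif_similarity_matrix(blurred, candidates):
    k = len(candidates)
    S_prime = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            S_prime[i, j] = conv_sim(blurred[i], candidates[j])
    return S_prime
```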
In step s307, the 2-4 motif pairs with the highest similarity in the obtained motif similarity matrix S' are retained. Within a piece of music, a motif recurs in identical or similar forms, so this step both completes the screening for recurring motifs and filters out other noise, improving the quality of the subsequent neural network training set.
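A sketch of the screening of step s307; excluding the diagonal of S' (a segment is trivially similar to itself) is our assumption, consistent with the goal of finding motifs that recur across different segments.

```python
import numpy as np

def top_motif_pairs(S_prime, n_pairs=3):    # 2-4 pairs per the text
    k = S_prime.shape[0]
    scored = [(S_prime[i, j], i, j)
              for i in range(k) for j in range(i + 1, k)]   # off-diagonal only
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:n_pairs]]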
In step s308, finally, the obtained 2k motif segments are used as inputs of the neural network and the corresponding full polyphonic music as target outputs, and the deep convolutional recurrent neural network with long short-term memory for automatic composition is trained. Networks of this kind are also widely used for machine translation; here the task can be regarded as translating a musical motif into complete polyphonic music.
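A sketch of one plausible shape for the composition network of step s308, with convolutional layers summarizing the motif piano roll and an LSTM decoding it into a full polyphonic piano roll; the depth and widths are not given in the patent and are assumptions.

```python
import torch
import torch.nn as nn

class MotifToMusic(nn.Module):
    def __init__(self, pitches=128, hidden=512):
        super().__init__()
        self.conv = nn.Sequential(              # input: (batch, 1, time, 128)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64 * pitches, hidden, num_layers=2,
                            batch_first=True)
        self.head = nn.Linear(hidden, pitches)  # per-frame polyphonic output

    def forward(self, motif_roll):              # motif_roll: (batch, time, 128)
        x = self.conv(motif_roll.unsqueeze(1))  # (batch, 64, time, 128)
        x = x.permute(0, 2, 1, 3).flatten(2)    # (batch, time, 64*128)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h))      # note-on probabilities per frame
```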
Step s4 is the application stage: the user either inputs a melody fragment or samples the melody generation network to obtain one, and feeds it into the deep convolutional recurrent neural network with long short-term memory for automatic composition to obtain the output of the model.

Claims (4)

1. An automatic composition method based on a motif extraction model and a neural network, characterized by comprising a musical motif extraction model, a main-melody recognition model, a melody generation model, and a deep convolutional recurrent neural network with long short-term memory for automatic composition.
2. The automatic composition method based on the motif extraction model and the neural network according to claim 1, characterized in that the motif extraction model is used to generate the training samples of the deep convolutional recurrent neural network with long short-term memory for automatic composition, and the full polyphonic music corresponding to each motif is used as the output of that network.
3. The motif extraction model of claim 2, characterized in that: the MIDI sample is scanned at 8-16 frames per second to obtain a piano-roll matrix; the self-similarity matrix of the sample is computed and convolved with a Gaussian kernel to obtain a novelty function; the extreme points of this function, found by a first-order difference method, serve as segmentation points; the resulting candidate motif segments are blurred with a Gaussian of radius 2 (corresponding to two semitones) and convolved with one another to obtain a motif similarity matrix; and the motifs corresponding to the coordinates of the 2-4 largest elements of this matrix are selected as the output of the musical motif extraction model.
4. The motif extraction model according to claim 2, wherein, after segmentation at the extreme points and Gaussian blurring of the candidate motif segments, the motif similarity matrix is computed by a method in which the conv_sim() function is defined as: a fully padded convolution of the i-th (blurred) candidate motif matrix with the j-th candidate motif matrix as the convolution kernel, taking the maximum value of the convolution result as the value of the corresponding conv_sim() entry.
CN201910259941.XA 2019-04-02 2019-04-02 Automatic composition method based on motif extraction model and neural network Pending CN111785236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910259941.XA CN111785236A (en) 2019-04-02 2019-04-02 Automatic composition method based on motif extraction model and neural network

Publications (1)

Publication Number Publication Date
CN111785236A (en) 2020-10-16

Family

ID=72754737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910259941.XA Pending CN111785236A (en) Automatic composition method based on motif extraction model and neural network

Country Status (1)

Country Link
CN (1) CN111785236A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435642A (en) * 2020-11-12 2021-03-02 浙江大学 Melody MIDI accompaniment generation method based on deep neural network
CN112435642B (en) * 2020-11-12 2022-08-26 浙江大学 Melody MIDI accompaniment generation method based on deep neural network
CN116160459A (en) * 2022-12-30 2023-05-26 广州市第二中学 Music robot control method and system based on machine learning algorithm
CN116160459B (en) * 2022-12-30 2023-09-29 广州市第二中学 Music robot control method and system based on machine learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination