CN111785236A - Automatic composition method based on a motif extraction model and a neural network - Google Patents
- Publication number
- CN111785236A CN111785236A CN201910259941.XA CN201910259941A CN111785236A CN 111785236 A CN111785236 A CN 111785236A CN 201910259941 A CN201910259941 A CN 201910259941A CN 111785236 A CN111785236 A CN 111785236A
- Authority
- CN
- China
- Prior art keywords
- neural network
- motive
- extraction model
- automatic composition
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/04—Segmentation; Word boundary detection
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Abstract
The embodiment of the invention provides an automatic composition method based on a motif extraction model and a neural network, relating to the field of music information retrieval in computer music and the field of computer composition. The method comprises the following steps: constructing a motif extraction model, in which a novelty function is obtained by convolving a self-similarity matrix with a Gaussian kernel, segmentation points are obtained from the first-order difference of the novelty function, the resulting candidate motif segments are Gaussian-blurred and convolved with one another, and the maximum of each convolution result yields a motif similarity matrix from which frequently recurring musical motifs are selected; constructing a main melody recognition model; constructing a melody generation model; and constructing a deep convolutional recurrent neural network with long short-term memory (LSTM) for automatic composition. The main melody recognition model and the motif extraction model are used to prepare the training samples of the composition network. After training, the user either inputs a melody fragment or samples one from the melody generation network, and feeds it to the composition network to obtain the automatically composed output of the model.
Description
Technical Field
The invention relates to the field of music information retrieval in computer music and the field of computer composition, and in particular to an automatic composition method based on a motif extraction model and a neural network.
Background
Automatic composition based on neural networks has long been a research hotspot in artificial intelligence. Owing to structural limitations of plain neural networks, existing neural-network composition methods struggle to generate polyphonic music of high audibility without manual post-processing. In addition, existing methods are insufficiently grounded in music theory, and their algorithm design is not well motivated in musical terms, which further limits their effectiveness.
Disclosure of Invention
Aiming at the defects and shortcomings of existing automatic composition methods, the invention provides an automatic composition method based on a motif extraction model and a neural network. The method comprises a motif extraction model with the following steps: calculate a self-similarity matrix for each independent voice of the polyphonic music; convolve the self-similarity matrix along its main diagonal with a Gaussian kernel to obtain a novelty function; divide the music sample into segments according to the novelty function; calculate the similarity between segments; and select the k segments of maximum similarity as the extracted musical motifs.
The automatic composition method based on the motif extraction model and the neural network further comprises a neural-network composition pipeline consisting of a main melody recognition model, a melody generation model, and a deep convolutional recurrent neural network with long short-term memory (LSTM) for automatic composition.
The main melody recognition model is a multilayer neural network that takes as input several feature values of one voice in a polyphonic piece and outputs a Boolean value indicating whether that voice carries the main melody. Its training samples are manually labeled; once trained, the network judges from the input features whether a single voice contains the main melody. Within the composition method, the model serves to classify every voice of the polyphonic music samples, and all single-voice samples for which the model outputs a true value become training samples of the neural network for melody generation.
The neural network for melody generation is a bidirectional LSTM recurrent neural network whose training samples are those prepared by the main melody recognition model.
The training input of the deep convolutional LSTM recurrent neural network for automatic composition is the motif set obtained by feeding all MIDI samples into the motif extraction model; the corresponding training output is the complete polyphonic music.
The automatic composition method further comprises an application step: the user inputs a melody fragment, or samples one from the neural network for melody generation, and feeds it to the deep convolutional LSTM recurrent neural network for automatic composition to obtain the output of the model.
Compared with the prior art, the technical effect of the method is output music of high audibility that conforms to music theory, an effect brought directly by the motif extraction method and the architecture of the neural network. In experiments, the automatic composition method based on the motif extraction model and the neural network learned composition techniques widely used in classical music writing, such as motif reuse and modulation.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. The drawings described are only some embodiments of the invention; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of the music generation method according to the present invention;
FIG. 2 is a flowchart of the main melody recognition model of the present invention;
FIG. 3 is a flowchart of the motif extraction model of the present invention.
Detailed Description
Fig. 1 is a schematic diagram of an automatic composition system according to an example of the present invention, in which s1, s2 and s3 denote the neural network training phases and s4 denotes the system application phase.
At step s1, the neural network for main melody recognition is constructed and trained, comprising steps s101, s102, s103 and s104.
In step s101, a MIDI music data set is prepared; each sample (i.e. each musical piece) in the data set should contain multiple voices. Each voice of each sample is manually labeled: '1' if the voice contains the main melody, '0' otherwise, yielding a voice sample set.
In step s102, the voice sample set obtained in step s101 is processed and several musical attributes are extracted from the music data, including note density, average duration, maximum interval, pitch distribution variance, the integral of the novelty function (see s302), and so on; combined with the labels from step s101, these form the training set of the neural network for main melody recognition.
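The per-voice features of step s102 can be sketched as follows. This is a minimal illustration, assuming a hypothetical note representation of (MIDI pitch, duration in beats) tuples; the novelty-function integral is omitted for brevity, and all names are illustrative rather than the patent's own.

```python
import numpy as np

def voice_features(notes, total_beats):
    """Feature vector for one voice: note density, average duration,
    maximum melodic interval, pitch distribution variance (step s102).
    `notes` is a list of (midi_pitch, duration_beats) tuples."""
    pitches = np.array([p for p, _ in notes], dtype=float)
    durs = np.array([d for _, d in notes], dtype=float)
    density = len(notes) / total_beats                     # notes per beat
    mean_dur = float(durs.mean())                          # average duration
    # largest absolute interval between consecutive notes, in semitones
    max_interval = float(np.max(np.abs(np.diff(pitches)))) if len(pitches) > 1 else 0.0
    pitch_var = float(pitches.var())                       # pitch distribution variance
    return np.array([density, mean_dur, max_interval, pitch_var])
```

These values would then be normalized and paired with the manual '1'/'0' main-melody labels to form the classifier's training set.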
In step s103, the training set obtained in step s102 is fed into the neural network for training, with the normalized musical attributes as input and the main melody label as target output.
In step s2, the bidirectional LSTM recurrent neural network for melody generation is built and trained. First, all music samples are passed through the main melody recognition network of step s103 to obtain a main melody set; the melody samples in this set are then fed into the bidirectional LSTM recurrent network for training.
Step s3 implements the extraction of musical motifs and comprises several sub-steps.
In step s301, the self-similarity matrix S of a single-voice MIDI file is calculated. The entry S(i, j) compares the ith and jth frames of the MIDI file; the range of i and j depends on the sampling rate applied to the file.
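A minimal sketch of step s301, assuming the MIDI frames are represented as columns of a piano-roll matrix and frame similarity is cosine similarity (the patent does not fix the similarity measure; names are illustrative):

```python
import numpy as np

def self_similarity(piano_roll: np.ndarray) -> np.ndarray:
    """Cosine self-similarity between frames of a piano-roll matrix.

    piano_roll: (num_pitches, num_frames) matrix sampled from the MIDI
    file; the result S satisfies S[i, j] = similarity of frames i and j.
    """
    frames = piano_roll.T.astype(float)        # (num_frames, num_pitches)
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                    # silent frames: avoid division by zero
    unit = frames / norms
    return unit @ unit.T                       # symmetric, diagonal = 1 on sounding frames
```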
In step s302, the matrix S is convolved along its main diagonal with a Gaussian checkerboard kernel to obtain a novelty function N. The Gaussian kernel captures self-similarity (its diagonal quadrants) and cross-similarity (its off-diagonal quadrants) simultaneously.
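Step s302 matches Foote-style audio novelty detection. A minimal sketch under assumed parameters (an 8x8 kernel, zero padding at the borders; function names are illustrative):

```python
import numpy as np

def checkerboard_kernel(size: int) -> np.ndarray:
    """Gaussian-tapered checkerboard kernel: diagonal quadrants positive
    (within-segment self-similarity), off-diagonal quadrants negative
    (cross-similarity between adjacent segments)."""
    half = size // 2
    ax = np.arange(size) - half + 0.5
    gauss = np.exp(-(ax ** 2) / (2 * (half / 2) ** 2))
    taper = np.outer(gauss, gauss)
    sign = np.ones((size, size))
    sign[:half, half:] = -1
    sign[half:, :half] = -1
    return taper * sign

def novelty(S: np.ndarray, size: int = 8) -> np.ndarray:
    """Slide the kernel along the main diagonal of the self-similarity
    matrix S; peaks of the result mark candidate segment boundaries."""
    K = checkerboard_kernel(size)
    half = size // 2
    n = S.shape[0]
    Spad = np.pad(S, half, mode="constant")    # zero-pad so edges are defined
    return np.array([np.sum(Spad[i:i + size, i:i + size] * K) for i in range(n)])
```

Inside a homogeneous section the positive and negative quadrants cancel, so N is near zero; at a boundary between two dissimilar blocks the cross terms vanish and N peaks.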
In step s303, the maxima of the novelty function N are found by differencing: the zero points of the first-order difference are candidate maxima, and when two zero points lie closer than a minimum distance, the one with the larger value of N is kept.
In step s304, the MIDI file is split at the obtained maximum points, yielding a number of candidate motif segments.
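Steps s303 and s304 can be sketched as follows. The minimum-distance threshold `min_gap` is an assumed parameter (the patent names the tie-break but not the distance), and names are illustrative:

```python
import numpy as np

def boundaries(novelty: np.ndarray, min_gap: int = 4) -> list:
    """Local maxima of the novelty curve: sign changes (+ to -) of the
    first-order difference. When two candidate peaks are closer than
    min_gap frames, keep the one with the larger novelty value."""
    d = np.diff(novelty)
    peaks = [i for i in range(1, len(d)) if d[i - 1] > 0 and d[i] <= 0]
    kept = []
    for p in peaks:
        if kept and p - kept[-1] < min_gap:
            if novelty[p] > novelty[kept[-1]]:
                kept[-1] = p                   # replace the weaker nearby peak
        else:
            kept.append(p)
    return kept
```

The returned frame indices are then used as cut points to split the piano roll into candidate motif segments.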
In step s305, Gaussian blur is applied to the piano-roll matrix corresponding to each candidate motif segment, so that small variations of a motif are tolerated in the subsequent step.
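Claim 3 specifies a blur radius of 2, i.e. two semitones, which suggests blurring along the pitch axis only. A minimal sketch under that reading (sigma is an assumed parameter):

```python
import numpy as np

def blur_pitch(roll: np.ndarray, radius: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Gaussian blur along the pitch axis of a piano-roll matrix, so a
    motif displaced by a semitone or two still scores as similar."""
    ax = np.arange(-radius, radius + 1)
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k /= k.sum()                               # normalized 1-D kernel
    out = np.zeros_like(roll, dtype=float)
    padded = np.pad(roll.astype(float), ((radius, radius), (0, 0)))
    for r in range(roll.shape[0]):
        # weighted sum of the rows around pitch r, all time frames at once
        out[r] = np.tensordot(k, padded[r:r + 2 * radius + 1, :], axes=(0, 0))
    return out
```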
In step s306, the motif similarity matrix S' is calculated. For the ith and jth candidate motif segments, the conv_sim() function is defined as a full convolution of the ith (blurred) candidate matrix with the jth candidate matrix as the convolution kernel, taking the maximum of the resulting matrix as the value of conv_sim(). The convolution both tolerates errors of the segmentation step (horizontal movement of the kernel), increasing robustness, and gives the model the ability to recognize transposed motifs (vertical movement of the kernel).
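A minimal sketch of conv_sim(). The patent's "full convolution" is read here as a sliding dot product over a zero-padded matrix (cross-correlation), the usual choice for similarity scoring; a flipped-kernel convolution would differ only by a 180-degree rotation of the kernel. Names and this reading are assumptions:

```python
import numpy as np

def conv_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Slide candidate matrix `b` over every full-mode offset of the
    (blurred) candidate matrix `a` and return the best dot product.
    Horizontal shifts absorb segmentation error; vertical shifts match
    transposed (pitch-shifted) motifs."""
    kh, kw = b.shape
    # zero-pad so the kernel can overhang every border ("full" mode)
    pad = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    best = -np.inf
    for i in range(pad.shape[0] - kh + 1):
        for j in range(pad.shape[1] - kw + 1):
            best = max(best, float(np.sum(pad[i:i + kh, j:j + kw] * b)))
    return best
```

The score of a pattern against itself equals its squared norm, and the same maximum is reached even when the pattern sits shifted inside a larger segment, which is exactly the robustness the step describes.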
In step s307, the 2-4 motif pairs with the highest similarity in the obtained motif similarity matrix S' are retained. Within a piece of music a motif recurs in identical or similar forms, so this step both screens for recurring motifs and filters out other noise, improving the quality of the subsequent neural network training set.
In step s308, finally, the obtained 2k motif segments are used as input to the neural network and the corresponding full polyphonic music as target output, and the deep convolutional LSTM recurrent neural network for automatic composition is trained. Such networks are also widely used for machine translation; the task here can be seen as translating a musical motif into complete polyphonic music.
Step s4 is the application stage: the user inputs a melody fragment, or samples one from the neural network for melody generation, and feeds it to the deep convolutional LSTM recurrent neural network for automatic composition to obtain the output of the model.
Claims (4)
1. An automatic composition method based on a motif extraction model and a neural network, characterized by comprising a musical motif extraction model, a main melody recognition model, a melody generation model and a deep convolutional recurrent neural network with long short-term memory for automatic composition.
2. The automatic composition method based on a motif extraction model and a neural network according to claim 1, wherein the motif extraction model is used to generate the training samples of the deep convolutional LSTM recurrent neural network for automatic composition, and the full polyphonic music corresponding to each motif is used as the target output of that network.
3. The motif extraction model of claim 2, wherein: each MIDI sample is scanned at 8-16 frames per second to obtain a piano-roll matrix; the self-similarity matrix of the sample is calculated and convolved with a Gaussian kernel to obtain a novelty function; the extreme points of this function, found by first-order differencing, serve as segmentation points; the resulting candidate motif segments are Gaussian-blurred with radius 2 (corresponding to two semitones) and convolved with one another pairwise to obtain a motif similarity matrix; and the motifs at the coordinates of the 2-4 largest elements of the matrix are selected as the output of the musical motif extraction model.
4. The motif extraction model according to claim 2, wherein, after segmentation at the extreme points and Gaussian blurring of the candidate motif segments, the motif similarity matrix is calculated by a method in which the conv_sim() function is defined as: a zero-padded full convolution of the ith (blurred) candidate motif matrix with the jth candidate motif matrix as the convolution kernel, taking the maximum of the resulting convolution as the value of the corresponding conv_sim() function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910259941.XA CN111785236A (en) | 2019-04-02 | 2019-04-02 | Automatic composition method based on a motif extraction model and a neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN111785236A true CN111785236A (en) | 2020-10-16 |
Family
ID=72754737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910259941.XA Pending CN111785236A (en) | 2019-04-02 | 2019-04-02 | Automatic composition method based on a motif extraction model and a neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111785236A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112435642A (en) * | 2020-11-12 | 2021-03-02 | 浙江大学 | Melody MIDI accompaniment generation method based on deep neural network |
CN112435642B (en) * | 2020-11-12 | 2022-08-26 | 浙江大学 | Melody MIDI accompaniment generation method based on deep neural network |
CN116160459A (en) * | 2022-12-30 | 2023-05-26 | 广州市第二中学 | Music robot control method and system based on machine learning algorithm |
CN116160459B (en) * | 2022-12-30 | 2023-09-29 | 广州市第二中学 | Music robot control method and system based on machine learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||