CN110264987A - Deep-learning-based chord progression generation method - Google Patents

Deep-learning-based chord progression generation method

Info

Publication number
CN110264987A
CN110264987A (application CN201910527315.4A)
Authority
CN
China
Prior art keywords
melody
chord
sequence
deep learning
progression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910527315.4A
Other languages
Chinese (zh)
Inventor
王子豪
魏东来
王文玉
赵梓良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910527315.4A
Publication of CN110264987A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/38 Chord
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/571 Chords; Chord sequences
    • G10H2210/586 Natural chords, i.e. adjustment of individual note pitches in order to generate just intonation chords


Abstract

The present invention provides a deep-learning-based chord progression generation method, the method comprising: S1, building a data set using computer vision techniques, the data set comprising melody data and chord data; S2, training an attention-based melody-to-chord machine translation model on the data set; S3, detecting a melody sequence and inputting it into the attention-based melody-to-chord machine translation model to obtain the corresponding chord sequence. By combining an attention model with the relevant music theory, the invention trains a melody-to-chord machine translation model, realizes automatic translation of melodies into chords, reduces the training loss of the translation model, and improves translation accuracy, so that a suitable, pleasant-sounding chord progression can be arranged for a melody.

Description

Deep-learning-based chord progression generation method
Technical field
The present invention relates to the technical field of musical harmony, and more particularly to a deep-learning-based chord progression generation method.
Background art
A chord (from the Greek word for string) is, in music theory, a combination of two or more different pitches sounded together. In European classical music and the styles influenced by it, the term more often refers to combinations of three or more pitches, while a combination of two pitches is described as an interval. The notes of a chord may be played separately or simultaneously.
Chords divide into those stacked in thirds and those not; chords in traditional Western harmony are built on the principle of stacking thirds, so a chord can be expressed as a set of notes. According to modern harmony theory, when two sounds of different pitch are played simultaneously, the interval between them determines the consonance of the harmony, and this consonance can be classified according to the interval.
Existing automatic accompaniment technology requires the user to supply the chords, from which an arpeggiator generates the final accompaniment. The principle of an arpeggiator is to permute and combine the notes of the supplied chord with a finite state machine (DFA), producing a monophonic sequence made up of the chord tones.
Several automatic chord generation techniques have been disclosed in the prior art. For example, Chinese patent application 201410582178.1 discloses a harmonic accompaniment generation method based on a genetic algorithm, which generates chords for a monophonic sequence, but the accuracy of the chords generated with the genetic algorithm is still low.
Therefore, in view of the above technical problems, it is necessary to provide a deep-learning-based chord progression generation method.
Summary of the invention
In view of this, an object of the present invention is to provide a deep-learning-based chord progression generation method.
To achieve the above object, an embodiment of the invention provides the following technical solution:
A deep-learning-based chord progression generation method, the method comprising:
S1, building a data set using computer vision techniques, the data set comprising melody data and chord data;
S2, training an attention-based melody-to-chord machine translation model on the data set;
S3, detecting a melody sequence and inputting it into the attention-based melody-to-chord machine translation model to obtain the corresponding chord sequence.
In one embodiment, step S1 specifically comprises:
S11, obtaining a number of score images;
S12, segmenting each score image to obtain melody-region images and corresponding chord-region images;
S13, converting the melody-region images and chord-region images into melody text data and chord text data, the melody text data comprising pitch and duration parameters and the chord text data comprising letters;
S14, processing the melody text data and chord text data to obtain the data set.
In one embodiment, step S12 specifically comprises:
applying Gaussian filtering to the score image for noise reduction, then projecting the image horizontally to obtain peaks of accumulated pixel counts;
tracking the peaks to locate each staff, and cutting out the chord sequence above it and the melody sequence below it;
projecting each row of the score image (a staff together with its melody and chord annotations) vertically, and tracking the barline features to cut the score image into melody-region images and chord-region images.
In one embodiment, step S14 specifically comprises:
performing dimensionality reduction on the melody text data and chord text data by removing the duration parameter from the melody text data.
In one embodiment, the melody-to-chord machine translation model is a machine translation model from melody sequences to chord sequences, comprising an encoder and a decoder, both of which use LSTM (long short-term memory) recurrent neural networks. The encoder receives the melody sequence and encodes it into an intermediate semantic vector C, and the decoder converts the intermediate semantic vector C into the corresponding chord sequence.
In one embodiment, in the attention-based melody-to-chord machine translation model, the encoder receives the melody sequence and encodes it into intermediate semantic vectors C_i, and the decoder converts each C_i into the corresponding chord symbol. The intermediate semantic vector C_i is:

C_i = \sum_{j=1}^{L_x} a_{ij} h_j

where L_x is the length of the melody sequence, a_{ij} is the attention coefficient of the j-th local melody segment when the i-th chord of the sequence is output, and h_j is the semantic encoding of the j-th local melody segment.
In one embodiment, a_{ij} in the intermediate semantic vector C_i is computed as follows:
the decoder hidden state H_{i-1} at the moment the previous chord symbol was output is compared, one by one, with the encoder hidden state h_j corresponding to each local melody segment of the input, through an alignment function F(h_j, H_{i-1}); that is, the decoder hidden state produced by the chord sequence generated so far is compared one by one with the encoder hidden states recorded for the sequentially input local melody segments;
the alignment function F(h_j, H_{i-1}) yields a set of scores, which are uniformly output to a SoftMax function and normalized into attention distribution probability values that sum to one.
In one embodiment, the alignment function F(h_j, H_{i-1}) is a weighted sum.
In one embodiment, "detecting a melody sequence" in step S3 comprises:
S31, determining the tempo (BPM);
S32, adjusting the microphone sensitivity threshold;
S33, detecting the melody with a microphone to obtain the melody sequence.
In one embodiment, step S33 specifically comprises:
at a fixed frequency of BPM/60*4 times per second, calling a detection function to obtain the current external pitch value and storing it in an array;
storing the melody with a Java inner class containing two member variables, pitch and duration; when detection stops, adjacent identical pitch values in the array are merged, and the result is dumped as the melody sequence.
The invention has the following beneficial effects:
by combining an attention model with the relevant music theory, the invention trains a melody-to-chord machine translation model, realizes automatic translation of melodies into chords, reduces the training loss of the translation model, and improves translation accuracy, so that a suitable, pleasant-sounding chord progression can be arranged for a melody.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a flow diagram of the deep-learning-based chord progression generation method of the invention;
Fig. 2 is a block diagram of the melody-to-chord machine translation model of the invention;
Fig. 3 is a schematic diagram of the LSTM (long short-term memory) recurrent neural network used in the invention;
Fig. 4 is a schematic diagram of the attention model used in the invention;
Fig. 5 is a diagram of a guitar score in a specific embodiment of the invention.
Specific embodiments
To enable those skilled in the art to better understand the technical solution of the invention, the technical solution in the embodiments of the invention is described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, the invention discloses a deep-learning-based chord progression generation method comprising:
S1, building a data set using computer vision techniques, the data set comprising melody data and chord data;
S2, training an attention-based melody-to-chord machine translation model on the data set;
S3, detecting a melody sequence and inputting it into the attention-based melody-to-chord machine translation model to obtain the corresponding chord sequence.
Step S1 specifically comprises:
S11, obtaining a number of score images;
S12, segmenting each score image to obtain melody-region images and corresponding chord-region images;
S13, converting the melody-region images and chord-region images into melody text data and chord text data, the melody text data comprising pitch and duration parameters and the chord text data comprising letters;
S14, processing the melody text data and chord text data to obtain the data set.
Further, step S12 specifically comprises:
applying Gaussian filtering to the score image for noise reduction, then projecting the image horizontally to obtain peaks of accumulated pixel counts;
tracking the peaks to locate each staff, and cutting out the chord sequence above it and the melody sequence below it;
projecting each row of the score image (a staff together with its melody and chord annotations) vertically, and tracking the barline features to cut the score image into melody-region images and chord-region images.
Further, step S14 specifically comprises:
performing dimensionality reduction on the melody text data and chord text data by removing the duration parameter from the melody text data.
Referring to Fig. 2, the melody-to-chord machine translation model of the invention is a machine translation model from melody sequences to chord sequences, comprising an encoder and a decoder, both of which use LSTM (long short-term memory) recurrent neural networks. The encoder receives the melody sequence and encodes it into an intermediate semantic vector C, and the decoder converts the intermediate semantic vector C into the corresponding chord sequence.
Referring to Fig. 3, the LSTM recurrent neural network improves on the basic RNN: while keeping the recurrence of inputs and outputs, it adds three gated neuron components (a forget gate, an input gate, and an output gate), which substantially strengthen the network's ability to filter and remember sequence signals.
An attention mechanism is added to the melody-to-chord machine translation model of the invention. Referring to Fig. 4, the encoder receives the melody sequence and encodes it into intermediate semantic vectors C_i, and the decoder converts each C_i into the corresponding chord symbol. The intermediate semantic vector C_i is:

C_i = \sum_{j=1}^{L_x} a_{ij} h_j

where L_x is the length of the melody sequence, a_{ij} is the attention coefficient of the j-th local melody segment when the i-th chord of the sequence is output, and h_j is the semantic encoding of the j-th local melody segment.
Specifically, a_{ij} is computed as follows: the decoder hidden state H_{i-1} at the moment the previous chord symbol was output is compared, one by one, with the encoder hidden state h_j corresponding to each local melody segment of the input, through an alignment function F(h_j, H_{i-1}); that is, the decoder hidden state produced by the chord sequence generated so far is compared one by one with the encoder hidden states recorded for the sequentially input local melody segments. The alignment function yields a set of scores, which are output to a SoftMax function and normalized into attention distribution probability values that sum to one.
Preferably, the alignment function F(h_j, H_{i-1}) is a weighted sum.
An ordinary machine translation model has no notion of emphasis when translating a melody into chords, because the intermediate semantics C used by the network decoder when generating every chord symbol is the same. In other words, when any one chord is generated, every local segment of the melody sequence has the same influence on it, and there is no focus.
The attention model of the invention, through its matching mechanism and self-learning, assigns a probability to each local melody segment as chord symbols are generated in sequence; the size of each probability value reflects how strongly the corresponding melody segment influences the chord currently being translated. Repeated experiments show that the quality of the generated chords is clearly improved after the attention mechanism is added.
The "detecting a melody sequence" of step S3 comprises:
S31, determining the tempo (BPM);
S32, adjusting the microphone sensitivity threshold;
S33, detecting the melody with a microphone to obtain the melody sequence.
Step S33 specifically comprises:
at a fixed frequency of BPM/60*4 times per second, calling a detection function to obtain the current external pitch value and storing it in an array;
storing the melody with a Java inner class containing two member variables, pitch and duration. When detection stops, adjacent identical pitch values in the array are merged, and the result is dumped as the melody sequence.
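This sampling-and-merging step can be sketched as follows, in Python rather than the Java inner class the embodiment describes; the function names are illustrative, not from the patent:

```python
def detection_rate_hz(bpm):
    # Fixed detection frequency from the description: BPM/60*4 calls per
    # second, i.e. four pitch samples per beat (one per sixteenth note).
    return bpm / 60 * 4

def merge_pitches(samples):
    # Merge runs of identical adjacent pitch values into (pitch, duration)
    # notes, duration counted in sixteenth-note units.
    notes = []
    for pitch in samples:
        if notes and notes[-1][0] == pitch:
            notes[-1] = (pitch, notes[-1][1] + 1)
        else:
            notes.append((pitch, 1))
    return notes

# At 120 BPM the detector fires 8 times per second; once detection stops,
# the sample array collapses into notes.
rate = detection_rate_hz(120)
notes = merge_pitches([41, 41, 39, 39, 39, 42])
```
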
The chord progression generation method of the invention comprises three main steps: data set construction, training of the attention-based melody-to-chord machine translation model, and translation of melody sequences into chord sequences. The invention is further described below with reference to a specific embodiment.
1. Data set construction
a) A Python crawler running on a cloud server crawled a large number of pop-music guitar scores from domestic and foreign websites; about 18,000 scores were crawled.
b) The 18,000-odd crawled scores were screened in three rounds.
The first round removed guitar scores lacking numbered-notation melody marks or chord marks; the second round removed scores not in 4/4 time or transposed with a capo; the third round removed scores with poor notation quality or poor chord arrangement. More than 10,000 usable scores remained; the score format is shown in Fig. 5.
c) Matlab was used to segment the score images, cutting out the melody region and the chord-symbol region as the valid data.
In a guitar score, the melody sequence is indicated by the numbered musical notation below the six-line tablature staff, and the harmony by the chord labels above the staff.
Each score image is first denoised with a Gaussian filter and then projected horizontally. Because the six lines of each tablature staff are the densest rows of pixels, each staff yields a peak of six accumulated pixel rows as its feature. Tracking this feature automatically locates each staff row, after which the chord sequence above it and the corresponding melody sequence below it are cut out.
Each row of the score image (the staff combined with its numbered notation and chord labels) is then projected vertically. Since every measure of the staff carries a vertical barline mark, tracking these pixel-accumulation peaks cuts the score into per-measure images.
These two cutting passes yield one file per measure, from the first measure to the last, each containing that measure's chord and melody.
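The two projection passes can be sketched as follows; this is a toy illustration of the peak-tracking idea on a binary image, not the Matlab pipeline itself, and the threshold and image are made up:

```python
def horizontal_projection(img):
    # Row-wise accumulation of ink pixels; the six lines of a staff
    # produce the sharpest peaks.
    return [sum(row) for row in img]

def vertical_projection(img):
    # Column-wise accumulation; barlines produce per-measure peaks.
    return [sum(col) for col in zip(*img)]

def find_peaks(projection, threshold):
    # Indices whose accumulated count reaches the threshold.
    return [i for i, v in enumerate(projection) if v >= threshold]

# Toy "score": rows 2-7 are fully inked staff lines, the rest is sparse.
img = [[1] + [0] * 9 if r not in range(2, 8) else [1] * 10 for r in range(10)]
staff_rows = find_peaks(horizontal_projection(img), 8)
```
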
d) A basic deep learning neural network is built to recognize the correspondence between the two image regions: the segmented image data are fed through the model, which outputs the corresponding text data. The problem is thus reduced to symbol recognition: the chord labels above the staff are recognized as letters and stored, and the numbered notation below is recognized as a melody sequence expressed by pitch and duration.
e) Pitch values are integers from 1 to 72. Middle C (numbered-notation '1' in the central octave) has pitch = 37; each semitone up adds 1, and each semitone down subtracts 1. Duration values are integers from 1 to 16. This embodiment collects and analyses only 4/4 scores, so the duration of each measure is defined as 16 and each beat has duration 4, in units of sixteenth notes.
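Under the encoding just described, the mapping from semitone offsets to pitch values can be sketched as follows (the helper name is illustrative):

```python
MIDDLE_C = 37  # numbered-notation "1" in the central octave

def pitch_value(semitones_from_middle_c):
    # +1 per semitone up, -1 per semitone down; valid values are 1-72.
    p = MIDDLE_C + semitones_from_middle_c
    if not 1 <= p <= 72:
        raise ValueError("pitch outside the 1-72 range")
    return p

# One octave above middle C is 12 semitones up.
octave_up = pitch_value(12)
```
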
f) The resulting data are then processed into the data set.
Dimensionality reduction is applied first: the duration parameter is removed. The melody is sampled four times per beat and the chord once per beat (note that a measure in 4/4 has four beats), so the chord sequence and the melody sequence are stored separately yet remain aligned. The four melody values of each beat are concatenated into one "word", and the chord of each beat is one "word", adapting the data to a language model. To preserve the mutual influence between measures as far as possible, every two measures form one sample: four measures ABCD become AB, BC, and CD.
For example, a data set sample (measures A, B, and C) in this embodiment is:
Melody data:
A:41 39 39 39 39 39 42 42 41 41 41 41 44 44 49 49
B:49 49 49 49 41 41 46 46 46 46 44 44 44 44 51 51
C:51 51 49 51 51 51 41 42 42 42 44 44 44 44 53 53.
Chord data:
A:G G C C
B:Am Am F F
C:G G C C.
The data set finally obtained after processing and integration is:
G G C C Am Am F F@41393939 39394242 41414141 44444949 49494949 41414646 46464444 44445151
Am Am F F G G C C@49494949 41414646 46464444 44445151 51514951 51514142 42424444 44445353
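The beat-wise word building and two-measure pairing can be sketched as follows; `make_samples` is an illustrative name, and the sketch assumes 4/4 bars of 16 melody values and 4 chords each. Run on the sample measures A, B, C, it reproduces the two data set lines:

```python
def make_samples(melody_bars, chord_bars):
    # Bars ABCD yield overlapping pairs AB, BC, CD. Each beat's four
    # melody values concatenate into one "word"; each chord is one word.
    lines = []
    for a in range(len(melody_bars) - 1):
        chords = chord_bars[a] + chord_bars[a + 1]
        melody = melody_bars[a] + melody_bars[a + 1]
        words = ["".join(str(v) for v in melody[i:i + 4])
                 for i in range(0, len(melody), 4)]
        lines.append(" ".join(chords) + "@" + " ".join(words))
    return lines

melody = [
    [41, 39, 39, 39, 39, 39, 42, 42, 41, 41, 41, 41, 44, 44, 49, 49],  # A
    [49, 49, 49, 49, 41, 41, 46, 46, 46, 46, 44, 44, 44, 44, 51, 51],  # B
    [51, 51, 49, 51, 51, 51, 41, 42, 42, 42, 44, 44, 44, 44, 53, 53],  # C
]
chords = [["G", "G", "C", "C"], ["Am", "Am", "F", "F"], ["G", "G", "C", "C"]]
samples = make_samples(melody, chords)  # two lines: AB and BC
```
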
2. Training the attention-based melody-to-chord machine translation model
The core of the invention is to use a machine translation model (a sequence-to-sequence network) to realize the conversion of melody sequences into chord sequences.
The melody-to-chord machine translation model processes the two time series with two basic recurrent neural networks, an encoder and a decoder. The encoder receives the melody sequence and encodes it into an intermediate semantic vector C, and the decoder converts the intermediate semantic vector C into the corresponding chord sequence.
These two processes are the encoding and decoding of the music sequence. Because both the input and the output are temporal models similar to natural language sequences, the encoder and decoder both use LSTM (long short-term memory) recurrent neural networks.
The encoder first encodes the sequentially input melody sequence (expressed as numeric values), transforming the fixed-length input sequence into the last hidden state output by the recurrent network. After learning, this hidden state concentrates the whole melody sequence (it is the intermediate semantic vector C) and is passed to the decoder.
The initial hidden state of the decoder's recurrent network is the intermediate semantic vector C produced by the encoder, from which the decoder generates the corresponding chord symbols. A custom start token is fed as the first input, a chord symbol is generated from C as the output, and that output is fed back into the decoder as the next input to generate the following chord symbol. Each subsequent chord symbol is selected from the chord-symbol vocabulary under the joint effect of the intermediate semantic vector C and the previous chord symbol, until an end token is output and the loop stops, completing the translation from melody sequence to chord sequence.
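The decoding loop just described can be sketched as follows; the `step` callback stands in for the trained LSTM decoder, and the token names are made up:

```python
START, END = "<s>", "</s>"

def greedy_decode(context, step, max_len=16):
    # Feed the start token, emit a chord symbol, feed that symbol back
    # as the next input, and stop when the end token appears.
    chords, prev = [], START
    while len(chords) < max_len:
        symbol = step(context, prev)
        if symbol == END:
            break
        chords.append(symbol)
        prev = symbol
    return chords

# A scripted stand-in decoder: G follows the start token, C follows G,
# then the sequence ends.
table = {START: "G", "G": "C", "C": END}
progression = greedy_decode(None, lambda ctx, prev: table[prev])
```
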
On this basis, the invention adds an attention mechanism to the recurrent network model. The attention mechanism is essentially similar to the selective visual attention of humans; its core goal is to select, from a mass of information, the pieces most critical to the current task. Music has the same regularity and theoretical organization as natural language: a local passage of melody corresponds mainly to the chord symbol at the corresponding position, and the other chords in the sequence are influenced by that passage only through the musical context before and after it, far less than the chord at the corresponding position is. Therefore, when the machine translation model generates a chord symbol, it should focus on the corresponding part of the intermediate semantics C, and the weight of that part in translating the chord should be higher than that of the other parts, which accords with convention and rule (equivalent to a focus moving along the melody sequence as the chord symbols are generated one by one).
Attention mechanism is added in melody-chord Machine Translation Model, is preferably to complete function in the present invention to improve standard The crucial place of true rate.It implements principle are as follows: attention model can be to every in melody sequence when generating chord symbol A part melody distributes a probability value, this local melody when the size of probability value is characterized for translating some chord symbol Influence degree size.Thus the chord sequence in training set should all learn its attention for corresponding to local melody section in melody sequence Power allocation probability information, then original fixed intermediate semanteme C can be changed to basis and be currently generated chord symbol and constantly change Ci.I.e. each intermediate vector C needs to be recalculated according to the current input of decoder, calculation formula are as follows:
Here Lx is the length of the melody sequence, aij is the attention coefficient assigned to the j-th local melody segment when the i-th chord symbol is output, and hj is the semantic encoding of the j-th local melody segment.
The attention coefficient aij of each local melody segment is calculated as follows:
The hidden state H_{i-1} of the decoder at the moment the (i-1)-th chord symbol of the output chord sequence was produced is compared, one by one, with the neural network hidden state hj corresponding to each local segment of the input melody sequence, through an alignment function F(hj, H_{i-1}). In other words, the current decoder hidden state produced by the chord symbols generated so far is compared against the multiple encoder hidden states recorded as the encoder read the local melody segments in order; the form of the comparison function F used here is a weighted sum.
The comparison function yields one score per segment; these scores are fed together into a SoftMax function and normalized into attention-allocation probabilities lying in the valid probability range. The resulting attention coefficients aij weight the semantic encodings hj of the local melody segments, and the weighted encodings jointly produce the attention-based intermediate semantic vector Ci.
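The attention computation described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the patent's actual implementation: the names `score`, `softmax` and `context_vector` are hypothetical, and the weighted-sum alignment function F(hj, H_{i-1}) is approximated here by a single learned weight vector `w`.

```python
import math

def softmax(scores):
    # Normalize comparison scores into a probability distribution
    # (the attention coefficients a_ij).
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score(h_j, H_prev, w):
    # Hypothetical weighted-sum form of F(h_j, H_{i-1}): a weight
    # vector w combines the encoder state h_j with the previous
    # decoder hidden state H_{i-1}.
    return sum(wk * (hk + Hk) for wk, hk, Hk in zip(w, h_j, H_prev))

def context_vector(encoder_states, H_prev, w):
    # C_i = sum_j a_ij * h_j over the Lx local melody segments.
    scores = [score(h_j, H_prev, w) for h_j in encoder_states]
    a = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(a[j] * encoder_states[j][k] for j in range(len(a)))
            for k in range(dim)]
```

With two equally scored encoder states, the sketch attends to both equally and returns their mean, matching the formula for Ci.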
The above is the essential content of the attention-based melody-chord machine translation model. After the model is built, it is trained in batches on the data set obtained by the web crawler. Since the set of common chord symbols is far smaller than the vocabulary of ordinary written language, chord symbols are represented by high-dimensional word vectors. (A word vector is a high-dimensional vector in which each dimension is a real number reflecting some attribute of the chord symbol; more dimensions capture more features and better distinguish different chord symbols, but also slightly dilute the relationships between them. After experimentation, each chord in this model is represented by a 200-dimensional feature vector.) After batch training and testing of the network, the loss on the training set drops to 0.7121, showing that the network model predicts quite accurately on the training data. The parameters of the encoder and decoder at this point are saved to a file; on this basis, the data values obtained from processing an input audio file can be fed in, and the output is a chord sequence assigned on the basis of deep learning over a large amount of data.
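The 200-dimensional chord representation can be illustrated with a toy embedding table. This is a minimal sketch under stated assumptions: in the actual model the vectors are trained jointly with the translation network, whereas here they are merely randomly initialized, and the function name `build_chord_embeddings` is hypothetical.

```python
import random

EMBED_DIM = 200  # the description settles on 200-dimensional chord vectors

def build_chord_embeddings(vocab, dim=EMBED_DIM, seed=0):
    # Map each chord symbol to a dense dim-dimensional real vector.
    # Random initialization stands in for the learned embeddings.
    rng = random.Random(seed)
    return {chord: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for chord in vocab}
```

Because the chord vocabulary is small, a dense high-dimensional embedding per symbol remains cheap to store and look up.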
Three, translation of the melody sequence into a chord sequence
A) First select a suitable tempo BPM (beats per minute); this value determines the call frequency of the detection function in the next step, according to the frequency formula BPM/60*4 (calls/second).
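The call-rate formula above can be expressed directly; the helper name `detection_frequency` is illustrative only. It assumes, as the text implies, four detection calls per beat (sixteenth-note resolution).

```python
def detection_frequency(bpm, subdivisions_per_beat=4):
    # BPM/60 gives beats per second; sampling 4 times per beat yields
    # the pitch-detection call rate, in calls per second.
    return bpm / 60.0 * subdivisions_per_beat
```

For example, at 120 BPM the detect function would be called 8 times per second.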
B) Then adjust the microphone sensitivity threshold: during detection, all sound signals whose loudness falls below this value are ignored, thereby controlling the degree of interference from ambient noise.
C) Detect and save the melody using the phone's built-in microphone, generating the melody curve synchronously.
The basic principle of this step is that a process calls the detect function of the C-language layer at the fixed frequency, obtains the current external pitch frequency value, and stores it into an array.
The melody is stored in a JAVA inner class containing two member variables: pitch (Pitch) and duration (Duration). After recording stops, consecutive entries in the array with the same pitch are merged, and the result is dumped as the melody sequence.
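The merge-and-dump step can be sketched as follows. The original uses a JAVA inner class; this Python sketch mirrors the same two-field record, with the names `Note` and `merge_pitches` being illustrative rather than taken from the original code.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int     # detected pitch value (0 = silence)
    duration: int  # length in detection ticks

def merge_pitches(raw_pitches):
    # Collapse consecutive identical pitch readings into Note records,
    # mirroring the merge performed after recording stops.
    notes = []
    for p in raw_pitches:
        if notes and notes[-1].pitch == p:
            notes[-1].duration += 1
        else:
            notes.append(Note(pitch=p, duration=1))
    return notes
```

For instance, the raw readings 44 44 42 42 42 0 collapse into three notes with durations 2, 3 and 1.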
Every four calls of the detect function (one beat completed), the process plays in turn the crash (cymbal) and kick (bass drum) audio previously stored in the SoundPool to prompt the user, making melody entry more accurate.
After the melody sequence is detected, it is input into the above attention-based melody-chord machine translation model, which translates it into the corresponding chord sequence.
For example, for the input melody sequence:
0 0 0 0 44 44 44 44 44 44 44 42 42 42 42 44;
41 41 41 41 41 41 41 41 0 0 39 41 39 36 36 37;
37 37 37 37 37 37 37 37 0 0 39 39 37 39 39 39。
the translated chord sequence is:
C C G G;
Am Am Em Em;
F F G G。
As can be seen from the above technical solutions, the invention has the following advantages that
The present invention combines the attention mechanism model with relevant music theory to train a melody-chord machine translation model, realizing automatic translation from melody to chord, reducing the training loss of the translation model, improving translation accuracy, and providing a musical melody with a suitable, pleasant-sounding chord progression.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention may be realized in other specific forms without departing from the spirit or essential attributes of the invention. Therefore, from whichever point of view, the present embodiments are to be considered as illustrative and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and it is intended that all variations falling within the meaning and scope of equivalents of the claims be included within the present invention. Any reference signs in the claims should not be construed as limiting the claims involved.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for the sake of clarity; those skilled in the art should consider the specification as a whole, and the technical solutions in the various embodiments may also be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (10)

1. A chord progression generation method based on deep learning, characterized in that the method comprises:
S1. producing a data set based on computer vision technology, the data set comprising melody data and chord data;
S2. training an attention-based melody-chord machine translation model with the data set;
S3. detecting a melody sequence, and inputting the melody sequence into the attention-based melody-chord machine translation model to obtain the corresponding chord sequence.
2. The chord progression generation method based on deep learning according to claim 1, characterized in that step S1 specifically comprises:
S11. obtaining a number of musical score pictures;
S12. segmenting the score pictures to obtain melody-region pictures and corresponding chord-region pictures;
S13. converting the melody-region pictures and chord-region pictures into melody text data and chord text data respectively, the melody text data comprising pitch and duration parameters and the chord text data comprising letters;
S14. processing the melody text data and the chord text data to obtain the data set.
3. The chord progression generation method based on deep learning according to claim 2, characterized in that step S12 specifically comprises:
performing Gaussian-filtering noise reduction on the score picture, then projecting the score picture horizontally to obtain the peak features of the accumulated pixels;
tracking the peak features to locate the position of each staff, and cutting out the melody sequence above and the chord sequence below;
performing a vertical projection on the single-line score picture combining the staff, melody-sequence and chord-sequence marks, tracking the bar-line features, and cutting the score picture into melody-region pictures and chord-region pictures.
4. The chord progression generation method based on deep learning according to claim 2, characterized in that step S14 specifically comprises:
performing dimensionality reduction on the melody text data and the chord text data, removing the duration parameter from the melody text data.
5. The chord progression generation method based on deep learning according to claim 1, characterized in that the melody-chord machine translation model is a machine translation model from a melody sequence to a chord sequence, comprising an encoder and a decoder; the encoder and the decoder use LSTM long short-term memory recurrent neural networks; the encoder receives the melody sequence and processes it into an intermediate semantic vector C, and the decoder converts the intermediate semantic vector C into the corresponding chord sequence.
6. The chord progression generation method based on deep learning according to claim 5, characterized in that in the attention-based melody-chord machine translation model, the encoder receives the melody sequence and processes it into intermediate semantic vectors Ci, and the decoder converts the intermediate semantic vectors Ci into the corresponding chord sequence, the intermediate semantic vector Ci being:

Ci = Σ_{j=1}^{Lx} aij · hj

where Lx is the length of the melody sequence, aij is the attention coefficient of the j-th local melody segment when the i-th chord of the sequence is output, and hj is the semantic encoding of the j-th local melody segment.
7. The chord progression generation method based on deep learning according to claim 6, characterized in that in the intermediate semantic vector Ci, aij is calculated as follows:
the hidden state H_{i-1} at moment i-1 of the output chord sequence is compared, one by one, with the neural network hidden state hj corresponding to each local melody segment of the input melody sequence, through an alignment function F(hj, H_{i-1}); that is, the decoder hidden state produced by the chord sequence generated so far is compared with the multiple encoder hidden states recorded as the encoder read the local melody segments in order;
the multiple values obtained through the alignment function F(hj, H_{i-1}) are output together into a SoftMax function and normalized to obtain attention-allocation probability values lying in the valid probability range.
8. The chord progression generation method based on deep learning according to claim 7, characterized in that the alignment function F(hj, H_{i-1}) is a weighted-sum function.
9. The chord progression generation method based on deep learning according to claim 1, characterized in that "detecting a melody sequence" in step S3 comprises:
S31. determining the tempo BPM;
S32. adjusting the microphone sensitivity threshold;
S33. detecting the melody with the microphone to obtain the melody sequence.
10. The chord progression generation method based on deep learning according to claim 9, characterized in that step S33 specifically comprises:
calling the detect function at a fixed frequency of BPM/60*4 times per second to obtain the current external pitch frequency values and store them into an array;
storing the melody in a JAVA inner class containing two member variables, pitch and duration; after recording stops, consecutive entries in the array with the same pitch are merged, and the result is dumped as the melody sequence.
CN201910527315.4A 2019-06-18 2019-06-18 Chord based on deep learning carries out generation method Pending CN110264987A (en)


Publications (1)

Publication Number Publication Date
CN110264987A true CN110264987A (en) 2019-09-20

Family

ID=67919060


Country Status (1)

Country Link
CN (1) CN110264987A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079093A (en) * 2019-12-11 2020-04-28 北京阿尔山区块链联盟科技有限公司 Music score processing method and device and electronic equipment
CN112133270A (en) * 2020-08-31 2020-12-25 广东工业大学 Automatic playing method of stringed instrument
CN112133264A (en) * 2020-08-31 2020-12-25 广东工业大学 Music score recognition method and device
CN112435642A (en) * 2020-11-12 2021-03-02 浙江大学 Melody MIDI accompaniment generation method based on deep neural network
CN112749569A (en) * 2019-10-29 2021-05-04 阿里巴巴集团控股有限公司 Text translation method and device
CN113012665A (en) * 2021-02-19 2021-06-22 腾讯音乐娱乐科技(深圳)有限公司 Music generation method and training method of music generation model
CN114970651A (en) * 2021-02-26 2022-08-30 北京达佳互联信息技术有限公司 Training method of chord generation model, chord generation method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093748A (en) * 2013-01-31 2013-05-08 成都玉禾鼎数字娱乐有限公司 Method of automatically matching chord for known melody
CN109767752A (en) * 2019-02-27 2019-05-17 平安科技(深圳)有限公司 A kind of phoneme synthesizing method and device based on attention mechanism


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
姚佳宁: "基于深度注意力机制的音乐流派分类方法研究", 《中国优秀硕士学位论文全文数据库(电子期刊)》, 15 February 2019 (2019-02-15), pages 086 - 30 *
张严: "基于循环神经网络的音乐要素生成", 《中国优秀硕士学位论文全文数据库(电子期刊)》, 15 June 2019 (2019-06-15), pages 136 - 348 *
王林泉,章文怡,郑刚: "乐谱识别的预处理和环境参数测定", 计算机工程, no. 01, pages 47 - 49 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190920