CN111739491B - Method for automatically editing and allocating accompaniment chord - Google Patents

Method for automatically editing and allocating accompaniment chord

Info

Publication number
CN111739491B
Authority
CN
China
Prior art keywords
chord
main melody
pcp
frequency
track
Prior art date
Legal status
Active
Application number
CN202010370928.4A
Other languages
Chinese (zh)
Other versions
CN111739491A (en)
Inventor
韦岗 (Wei Gang)
刘俊伟 (Liu Junwei)
曹燕 (Cao Yan)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202010370928.4A
Publication of CN111739491A
Application granted
Publication of CN111739491B
Status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/38 Chord
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music composition or musical creation; tools or processes therefor
    • G10H2210/111 Automatic composing, i.e. using predefined musical rules

Abstract

The invention discloses a method for automatically editing and assigning accompaniment chords, comprising feature extraction, model training and chord prediction. Feature extraction operates on music audio data: to address the low chord recognition rate caused by harmonic interference and uneven timbre across different instruments, improved pitch class profile (PCP) features are adopted, with Gaussian windowing and logarithmic compression algorithms introduced to eliminate the negative effects of harmonics and mixed timbres. Model training requires label calibration and data training: chord information is extracted from the symbolic data of the accompaniment tracks to obtain the corresponding chord sequences, the chord sequences are built into chord label files, and these are input, together with the corresponding enhanced PCP main melody feature vectors, into hidden Markov models for parameter training. For chord prediction, the enhanced PCP main melody feature vectors to be recognized are input into the trained models, and a chord sequence is finally generated.

Description

Method for automatically editing and allocating accompaniment chord
Technical Field
The invention relates to the technical field of data processing, in particular to a method for automatically editing and allocating accompaniment chords.
Background
The rapid development of the modern economy has driven a continuous rise in consumer demand. The shift from material needs to mental and cultural needs reflects the opening of new markets and room for development. Entertainment, too, has diversified: the music market in particular is growing steadily, and more and more people are studying music and entering the music industry.
Creation is one of the indispensable elements of music. Good music requires its creator to have solid musical literacy, supported by rich music theory, which takes long-term accumulation and study to acquire. One important area of creation is the assignment of accompaniment chords, which generally demands deep musical talent and rich theoretical knowledge; it is still mostly done by hand, so the threshold is high.
Today, computer techniques have been developed to address this problem, automating the burdensome manual assignment of chords through algorithmic models such as hidden Markov models, stochastic processes, genetic algorithms and deep networks. Many existing approaches are based on hidden Markov models, which require a large amount of training data, and model performance depends heavily on the quality of that data. In most audio data, however, when many instruments are present there are often problems of harmonic interference and uneven timbre, which greatly affect the extraction of audio features, degrade model performance and reduce the chord recognition rate.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method for automatically editing and assigning accompaniment chords.
The purpose of the invention can be achieved by adopting the following technical scheme:
a method for automatically assigning accompaniment chords, said method comprising the steps of:
s1, preprocessing a MIDI file, deleting the percussion instrument tracks in the MIDI file, and fusing the same instrument tracks to obtain a new track MIDI file;
s2, extracting a main melody track and an accompaniment track from the MIDI file respectively, performing C-tone normalization on the two groups of track sequences, converting the format of the main melody track into main melody audio data through format conversion, and keeping the format of the accompaniment track unchanged;
s3, carrying out Fourier transform on the main melody audio data to obtain frequency spectrum characteristics, and expanding each component in a frequency domain into twelve frequency bands according to twelve equal temperaments in the music theory; adding the components corresponding to the same tone level frequency band aiming at twelve frequency bands obtained by all the components to obtain twelve-dimensional PCP tone level contour characteristics of the whole frequency domain, and reducing the influence of high and low frequency weights through Gaussian windowing to obtain a PCP main melody characteristic vector after filtering; according to a logarithmic compression algorithm, reducing the redundancy of the characteristic space in a certain compression ratio to obtain an enhanced PCP main melody characteristic vector;
s4, extracting tempo, duration, pitch, rhythm and key signature of the accompaniment music track, and calculating the tempo and the rhythm to obtain the minor-bar duration of the accompaniment music track so as to divide the accompaniment music track into a plurality of music minor bars; performing harmony transformation on each music measure to obtain chord root and interval relations, wherein the interval relations comprise major keys and minor keys, and a chord sequence of the music measure is formed according to the key number, the chord root and the interval relations so as to construct and obtain a chord sequence of the whole accompaniment track; the chord sequence of the accompaniment music track is saved into a chord label file in an XML data format;
s5, 36 hidden Markov models are constructed, wherein the 36 hidden Markov models correspond to 36 chords respectively, the 36 chords comprise a triad chord, a quintuple chord, an eleven chord, a thirteen chord and respective deformed chords, the number of states of each model is six, and the states are four active states, a starting state and a stopping state respectively, wherein an observation function of the active state is formed by a single Gaussian observation function with a diagonal matrix; then inputting the enhanced PCP main melody feature vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models together for parameter training;
and S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
Further, the step S2 process is as follows:
s21, extracting a main melody sound track from the MIDI file by using a high pitch contour skyline algorithm;
s22, extracting accompaniment tracks from the MIDI file by using a bass contour landline algorithm;
s23, respectively carrying out C-tone normalization processing on the main melody track and the accompaniment track sequence to ensure uniform tone;
s24, WAV format audio conversion is carried out on the main melody track for extracting the characteristics of the enhanced PCP main melody;
and S25, keeping the format of the accompaniment track symbol data unchanged, and constructing a chord label file of the model.
Further, the step S3 process is as follows:
s31, repeatedly framing the main melody audio data, adopting a Hamming window function, overlapping two adjacent windows by half frame length, and performing sliding sampling with the number of sampling points N =4096 on each window, so as to obtain an energy spectrum X (k) of the main melody audio data through Fourier transform;
s32, according to twelve equal temperaments in the musical theory, neglecting the influence of high octaves or low octaves, only considering the frequency values of twelve scales of the lowest scale group in the music, correspondingly dividing each component in a frequency domain by the frequency value of the lowest scale respectively to obtain twelve frequency ratios, and accordingly expanding the components into twelve frequency bands; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to further obtain a twelve-dimensional PCP melody characteristic vector of the whole frequency domain, wherein the formula is as follows:
p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level. All components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels;

S33, performing Gaussian window filtering by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector;

S34, dividing each dimension of the filtered PCP main melody feature vector by the sum of all octave frequency component values of the corresponding tone level, multiplying by a certain compression coefficient and then performing logarithmic transformation, i.e. logarithmic compression of the features, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector (an illustrative code sketch of steps S31-S34 follows).
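As an illustration of steps S31-S34, the following minimal Python sketch computes the enhanced PCP vector for a single frame. Only N = 4096, f_c = 261.6 Hz and η = 1000 are given above; the reference frequency F_REL (C1, about 32.70 Hz) and the Gaussian width SIGMA are assumptions, and applying the Gaussian weight per spectral component is one plausible reading of the stated attenuation of the high and low bands. This is a sketch, not the claimed implementation.

    import numpy as np

    N = 4096        # sampling points per window (step S31)
    F_REL = 32.70   # reference frequency of C1, lowest scale group (Hz); assumed
    F_C = 261.6     # center frequency corresponding to C4 (Hz)
    SIGMA = 200.0   # Gaussian window width in Hz; assumed, not given in the text
    ETA = 1000.0    # compression coefficient (step S34)

    def enhanced_pcp(frame, fsr):
        """Enhanced 12-dim PCP of one frame of length N sampled at fsr Hz."""
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
        energy = np.abs(spectrum) ** 2          # |X[k]|^2 as in formula (2)
        k = np.arange(1, len(energy))           # component indices, skip DC
        freq = fsr * k / N                      # frequency of component k
        p = np.round(12 * np.log2(freq / F_REL)).astype(int) % 12  # formula (1)

        gauss = np.exp(-((freq - F_C) ** 2) / (2 * SIGMA ** 2))    # formula (3)
        pcp = np.zeros(12)                      # Gaussian-weighted tone levels
        pcp_sum = np.zeros(12)                  # unweighted octave sums
        np.add.at(pcp, p, energy[k] * gauss)    # formula (2) with weighting
        np.add.at(pcp_sum, p, energy[k])
        return np.log1p(ETA * pcp / np.maximum(pcp_sum, 1e-12))    # formula (4)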
Further, the step S5 process is as follows:
s51, constructing 36 hidden Markov models, wherein the 36 hidden Markov models correspond to 36 chords respectively;
s52, inputting the enhanced PCP main melody characteristic vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models;
s53, assuming that the features are not related to each other, traversing all feature vectors, carrying out state transition according to a Markov property, and counting chord state transition times and chord occurrence times;
and S54, calculating an initial probability matrix, a state transition probability matrix, and an average vector and a covariance matrix of each state observation function to obtain parameter estimation and finish training.
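For illustration, a simplified Python sketch of the counting-based estimation in S51-S54, under the assumption that each chord is modeled by a single state rather than the six-state topology described above; features is a list of enhanced-PCP frame sequences and labels the aligned per-frame chord indices (0-35) from the chord label files. All names are illustrative.

    import numpy as np

    N_CHORDS = 36

    def estimate_hmm_parameters(features, labels):
        """Count-based initial, transition and single-Gaussian state parameters."""
        init = np.zeros(N_CHORDS)
        trans = np.zeros((N_CHORDS, N_CHORDS))
        frames_per_chord = [[] for _ in range(N_CHORDS)]

        for feats, chords in zip(features, labels):
            init[chords[0]] += 1                      # first chord of the piece
            for t in range(1, len(chords)):
                trans[chords[t - 1], chords[t]] += 1  # chord transition counts
            for x, c in zip(feats, chords):
                frames_per_chord[c].append(x)         # observations per chord

        init /= max(init.sum(), 1)                    # initial probability matrix
        trans /= np.maximum(trans.sum(axis=1, keepdims=True), 1)
        means = [np.mean(f, axis=0) if f else np.zeros(12)
                 for f in frames_per_chord]           # mean vector per state
        variances = [np.var(f, axis=0) + 1e-6 if f else np.ones(12)
                     for f in frames_per_chord]       # diagonal covariance only
        return init, trans, means, variances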
Further, the step S6 process is as follows:
s61, extracting the audio data of the main melody to be identified to obtain the feature vector of the enhanced PCP main melody to be identified;
s62, inputting the enhanced PCP main melody feature vector to be identified into the trained 36 hidden Markov models, and searching an optimal path under the maximum likelihood criterion through a Viterbi algorithm to obtain an optimal chord sequence.
Further, the step S23 process is as follows:
For the main melody track and accompaniment track sequences, the notes within an octave are divided equally into 12 semitones according to twelve-tone equal temperament, namely [C, #C, D, #D, E, F, #F, G, #G, A, #A, B]. The pitch is recorded as p, with p taking values 0-127; the number of semitones to transpose is n, positive for transposing up and negative for transposing down; the pitch value before transposition is t and the pitch value after transposition is T. According to the music theory principle of modulation, a modulo-12 calculation is performed on the pitch number, and the offset corresponding to the semitone count is then determined by the transposition amount; after shifting by the corresponding interval, a cyclic shift with period 12 yields the minimum pitch value, i.e. the corresponding C pitch name. Similarly, the octave in which a pitch value lies is obtained by dividing the pitch p by 12, rounding down and subtracting 1 (a short code sketch of this normalization follows).
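A short sketch of the normalization just described, assuming standard MIDI pitch numbers 0-127; the function names are illustrative. For example, pitch_name(60) yields 'C4', matching the octave rule stated above.

    NOTE_NAMES = ['C', '#C', 'D', '#D', 'E', 'F', '#F', 'G', '#G', 'A', '#A', 'B']

    def transpose(pitch, n):
        """Shift a pitch by n semitones (positive = up), clamped to MIDI range."""
        return min(127, max(0, pitch + n))

    def pitch_name(pitch):
        """Pitch class via modulo 12, octave via pitch // 12 - 1, per the rule above."""
        return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"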
Further, in step S1, tracks of the same instrument are fused by merging their events and arranging them in ascending order of start time, as sketched below.
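A minimal sketch of this fusion rule; Event is an assumed record, since real MIDI events carry more fields (channel, program and so on).

    from dataclasses import dataclass

    @dataclass
    class Event:
        start: float    # start time in ticks or seconds
        pitch: int
        velocity: int

    def fuse_tracks(tracks):
        """Merge same-instrument tracks into one, events in ascending start time."""
        merged = [event for track in tracks for event in track]
        merged.sort(key=lambda event: event.start)
        return merged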
Compared with the prior art, the invention has the following advantages and effects:
1) In terms of data features, the enhanced PCP main melody feature extraction combines musical characteristics with audio signal characteristics and performs better in the music recognition field;
2) C-tone normalization of the melody reduces the difficulty of data processing, ensures a uniform key, and improves model accuracy;
3) Compared with training directly on raw audio data, the method improves model performance and shortens training time;
4) Automatic chord labeling is completed from the symbolic data set, avoiding manual labeling and improving efficiency;
5) For music whose texture is complicated by the interweaving of many instruments, a logarithmic compression algorithm is introduced to reduce complexity and accelerate model training.
Drawings
FIG. 1 is a flow chart of a method for automatically assigning accompaniment chords in accordance with an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the method for extracting the main melody track of the music file according to the contour line (skyline) algorithm in the embodiment of the present invention;
FIG. 3 is a flow chart of note extraction enhanced PCP feature vectors in an embodiment of the present invention;
FIG. 4 is a spectrogram of a conventional PCP feature vector extraction process in an embodiment of the present invention;
fig. 5 is a spectrogram of the enhanced PCP feature vector extraction process in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, a flowchart of a method of automatically assigning accompaniment chords in the present embodiment is disclosed.
The method comprises the following specific steps:
S1, preprocessing the MIDI audio data: deleting the percussion tracks and fusing tracks of the same instrument, the fusion arranging the events of identical tracks in ascending order of start time, to obtain a new MIDI file.
S2, distinguishing the main melody tracks and accompaniment tracks of the MIDI file: using a known track clustering algorithm, the main melody tracks are extracted with the treble contour (skyline) algorithm and the accompaniment tracks with the bass contour (landline) algorithm, giving a main melody track set and an accompaniment track set; C-tone normalization is performed on the two groups of track sequences to ensure a uniform key; the main melody tracks are then converted to WAV-format audio to obtain the main melody audio data, in preparation for extracting the enhanced PCP main melody feature vectors; the format of the accompaniment tracks remains unchanged, in preparation for the chord labeling of the training labels.
S3, framing the main melody audio data with a Hamming window function, adjacent windows overlapping by half a frame, performing sliding sampling, and obtaining the energy spectrum features through Fourier transform; then, according to twelve-tone equal temperament, ignoring the influence of higher or lower octaves and considering only the frequency values of the twelve scales of the lowest scale group in music, dividing each component in the frequency domain by the frequency value of the lowest scale to obtain twelve frequency ratios, thereby expanding the components into twelve frequency bands; adding the components falling in the same tone level band over all components to obtain the twelve-dimensional PCP main melody feature vector of the whole frequency domain; reducing the influence of the high- and low-frequency weights by Gaussian windowing to obtain the filtered PCP main melody feature vector; and, according to the logarithmic compression algorithm, multiplying the ratio of each component to the frequency value sum of the corresponding tone level by a certain compression coefficient to reduce the redundancy of the feature space, obtaining the enhanced PCP main melody feature vector.
S4, extracting the tempo, duration, pitch, rhythm and key signature of the accompaniment tracks of step S2, and calculating the bar duration of each accompaniment track from the tempo and time signature so as to divide it into a plurality of musical bars; performing harmonic analysis on each bar to obtain the chord root and the interval relation (major or minor), and forming the chord sequence of the bar from the key signature, chord root and interval relation, so as to obtain the chord sequence of the whole accompaniment track; the chord sequence of the accompaniment track is saved as a chord label file in XML data format. For example, at 120 BPM in 4/4 time a bar lasts 4 × 60/120 = 2 seconds (a sketch of the bar segmentation and label writing follows).
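A sketch of the bar segmentation and label writing described here; the XML element and attribute names are illustrative assumptions, since the text only specifies that the chord sequence is saved in XML data format.

    import xml.etree.ElementTree as ET

    def bar_duration(bpm, beats_per_bar):
        """Seconds per bar: 120 BPM in 4/4 gives 4 * 60 / 120 = 2.0 s."""
        return beats_per_bar * 60.0 / bpm

    def write_chord_labels(chords, bpm, beats_per_bar, path):
        """Write one <chord> entry per bar with its start and end time in seconds."""
        root = ET.Element('chords')
        bar_len = bar_duration(bpm, beats_per_bar)
        for i, chord in enumerate(chords):      # one chord per musical bar
            elem = ET.SubElement(root, 'chord',
                                 start=f'{i * bar_len:.3f}',
                                 end=f'{(i + 1) * bar_len:.3f}')
            elem.text = chord
        ET.ElementTree(root).write(path, encoding='utf-8')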
S5, constructing 36 hidden Markov models corresponding respectively to 36 chords (comprising triads, quintuple chords, ninth chords, eleventh chords, thirteenth chords and their respective variant chords); each model has six states, namely four active states, a start state and an end state, wherein the observation function of each active state is a single Gaussian with a diagonal covariance matrix; the enhanced PCP main melody feature vectors and the chord label files of the corresponding accompaniment tracks are then input into the 36 hidden Markov models for parameter training.
And S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
As shown in fig. 2, the main melody note vector group is extracted using the treble contour (skyline) algorithm; the specific process is as follows:
S21, the tracks of the melody track set are fused into a single track, the events of which are arranged in ascending order of start time, and the track is converted into the note vector group to be processed.
S22, the note vector group to be processed is traversed; if several note vectors share the same start time, the note vector with the highest pitch is kept and the rest are deleted, giving the treble note vector group.
S23, the note end times of the treble note vector group are corrected to eliminate polyphonic overlap between adjacent notes: if, of two adjacent note vectors, the earlier one has a lower pitch but an end time later than that of the following note, its end time is adjusted so that the two notes end at the same time (see the sketch below).
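A compact sketch of S21-S23, assuming simple note records with start, end and pitch; the tie-breaking sort keeps the highest pitch at each onset.

    from dataclasses import dataclass

    @dataclass
    class Note:
        start: float
        end: float
        pitch: int

    def skyline(notes):
        """Treble contour: highest note per onset, overlaps trimmed as in S23."""
        ordered = sorted(notes, key=lambda n: (n.start, -n.pitch))
        melody = []
        for note in ordered:
            if melody and note.start == melody[-1].start:
                continue                  # same onset: higher pitch already kept
            melody.append(note)
        for prev, nxt in zip(melody, melody[1:]):
            if prev.pitch < nxt.pitch and prev.end > nxt.end:
                prev.end = nxt.end        # make the two notes end together
        return melody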
As shown in fig. 3, the specific process of extracting the enhanced PCP feature vector is as follows:
S31, framing the main melody audio data with overlap, using a Hamming window function with adjacent windows overlapping by half a frame length, and sampling N = 4096 points per window by sliding sampling, so as to obtain the energy spectrum X(k) through Fourier transform.
S32, from the obtained spectrum X(k), according to twelve-tone equal temperament in music theory, ignoring the influence of higher or lower octaves and considering only the frequency values of the twelve scales of the lowest scale group in music, each component in the frequency domain is divided by the frequency value of the lowest scale to obtain twelve frequency ratios, so that the components are expanded into twelve frequency bands; the components falling in the same tone level band are added over all components to obtain the twelve-dimensional PCP main melody feature vector of the whole frequency domain, wherein the formula is as follows:

p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level. All components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels.
S33, Gaussian window filtering is performed by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector.
S34, each dimension of the filtered PCP main melody feature vector is divided by the sum of all octave frequency component values of the corresponding tone level, multiplied by a certain compression coefficient and then logarithmically transformed, i.e. the features are logarithmically compressed, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector.
Comparing the spectrogram of the conventional PCP feature extraction shown in fig. 4 with that of the enhanced PCP feature extraction shown in fig. 5, the enhanced features prove more robust: although the compression reduces the distinction of irrelevant features, the coherence of the effective features is strengthened and the influence of interfering tones and overtones is better eliminated. As can be seen from fig. 5, the improved PCP features give a better representation of the Cm chord (C, E, G), the Gm chord (F, G, #A), the Fm chord (C, F, #G) and the #A chord (D, F, #A).
In summary, for the signal features the invention uses a spectral feature specific to music signals, the pitch class profile (PCP) feature, and improves it with Gaussian windowing and logarithmic compression algorithms borrowed from audio signal processing, effectively overcoming the problems of harmonic interference and uneven timbre. The PCP feature uses twelve-tone equal temperament to spread the audio signal into a twelve-dimensional vector, which makes it particularly effective for music data processing. Gaussian windowing filters out the signal weight of unnecessary frequency bands, raising the weight of the middle band around C4 (pitch = 60) and lowering the high- and low-band weights (the high band is mainly harmonics, the low band mainly noise, drums and the like), so the influence of harmonics is effectively filtered out. Meanwhile, a melody produced by the interweaving of many instruments contains different pitches, intensities and rhythms, so the extracted features become too complex, which degrades model performance and makes chord recognition harder; the logarithmic compression algorithm is therefore introduced to reduce feature redundancy, lower complexity and accelerate training.
In the feature extraction, considering the defects of overtone interference and uneven timbre that may exist in audio, methods from audio signal processing are introduced to solve these problems: 1) lowering the weight of certain frequency bands by Gaussian windowing, which effectively avoids the influence that high-band overtones and low-band noise may have on the musical features; 2) logarithmically compressing the feature spectrum, which reduces feature complexity, limits the dynamic range of the features, lowers redundancy, and mitigates the timbre unevenness that playing by many instruments may cause.
Meanwhile, among existing model training methods there are input types in which both the melody and the accompaniment are spectral features of audio, but the chord labels must then be marked manually for training, which is cumbersome. There are also input types in which both the melody and the accompaniment are symbolic data (such as the MIDI format), but these offer no good way to overcome the feature problems described above. The invention therefore chooses a data format combining the two: the main melody is spectral feature data of audio, and the accompaniment is symbolic data, which increases chord labeling efficiency and effectively overcomes harmonic interference and uneven timbre. The extracted musical features are thus better, the trained model is more robust, and chord recognition is more accurate.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A method for automatically assigning accompaniment chords, said method comprising the steps of:
s1, preprocessing MIDI audio data, deleting the percussion instrument tracks, and fusing the same instrument tracks to obtain new track MIDI files;
s2, extracting a main melody track and an accompaniment track from the MIDI file respectively, performing C-tone normalization on the two groups of track sequences, converting the format of the main melody track into main melody audio data through format conversion, and keeping the format of the accompaniment track unchanged;
s3, carrying out Fourier transform on the main melody audio data to obtain frequency spectrum characteristics, and expanding each component in a frequency domain into twelve frequency bands according to twelve equal temperaments in the music theory; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to obtain twelve-dimensional PCP tone level contour characteristics of the whole frequency domain, and reducing the influence of high and low frequency weights through Gauss windowing to obtain a PCP main melody characteristic vector after filtering; according to a logarithmic compression algorithm, reducing the redundancy of the characteristic space in a certain compression ratio to obtain an enhanced PCP main melody characteristic vector;
the step S3 comprises the following steps:
s31, repeatedly framing the main melody audio data, adopting a Hamming window function, overlapping two adjacent windows by half frame length, and performing sliding sampling with the number of sampling points N =4096 on each window, so as to obtain an energy spectrum X (k) of the main melody audio data through Fourier transform;
s32, according to twelve equal temperaments in the musical theory, neglecting the influence of high octaves or low octaves, only considering the frequency values of twelve scales of the lowest scale group in the music, correspondingly dividing each component in a frequency domain by the frequency value of the lowest scale respectively to obtain twelve frequency ratios, and accordingly expanding the components into twelve frequency bands; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to further obtain a twelve-dimensional PCP main melody feature vector of the whole frequency domain, wherein the formula is as follows:
p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level; all components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels;

S33, performing Gaussian window filtering by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector;

S34, dividing each dimension of the filtered PCP main melody feature vector by the sum of all octave frequency component values of the corresponding tone level, multiplying by a certain compression coefficient and then performing logarithmic transformation, i.e. logarithmic compression of the features, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector;
s4, extracting the tempo, the duration, the pitch, the rhythm and the key signature of the accompaniment music track, and calculating the tempo and the rhythm to obtain the time length of a bar of the accompaniment music track so as to divide the accompaniment music track into a plurality of music bars; performing harmony transformation on each music measure to obtain chord root and interval relations, wherein the interval relations comprise major keys and minor keys, and a chord sequence of the music measure is formed according to the key number, the chord root and the interval relations so as to construct and obtain a chord sequence of the whole accompaniment track; the chord sequence of the accompaniment music track is saved into a chord label file in an XML data format;
s5, 36 hidden Markov models are constructed, wherein the 36 hidden Markov models correspond to 36 chords respectively, the 36 chords comprise a triad chord, a quintuple chord, an eleven chord, a thirteen chord and respective deformed chords, the number of states of each model is six, and the states are four active states, a starting state and a stopping state respectively, wherein an observation function of the active state is formed by a single Gaussian observation function with a diagonal matrix; then inputting the enhanced PCP main melody feature vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models together for parameter training;
the step S5 comprises the following processes:
s51, constructing 36 hidden Markov models, wherein the 36 hidden Markov models correspond to 36 chords respectively;
s52, inputting the enhanced PCP main melody characteristic vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models;
s53, assuming that the features are not related to each other, traversing all feature vectors, carrying out state transition according to a Markov property, and counting chord state transition times and chord occurrence times;
s54, calculating an initial probability matrix, a state transition probability matrix, and an average vector and a covariance matrix of each state observation function to obtain parameter estimation to complete training;
and S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
2. A method for automatically assigning accompaniment chords according to claim 1, wherein said step S2 comprises the steps of:
s21, extracting a main melody sound track from the MIDI file by using a high pitch contour skyline algorithm;
s22, extracting accompaniment tracks from the MIDI file by using a bass contour landline algorithm;
s23, respectively carrying out C-tone normalization processing on the main melody track and the accompaniment track sequence to ensure uniform tone;
s24, performing WAV format audio conversion on the main melody sound track for extracting the characteristics of the enhanced PCP main melody;
and S25, keeping the format of the accompaniment track symbol data unchanged, and constructing a chord label file of the model.
3. A method for automatically assigning accompaniment chords according to claim 1, wherein said step S6 comprises the steps of:
s61, extracting the audio data of the main melody to be identified to obtain the feature vector of the enhanced PCP main melody to be identified;
s62, inputting the enhanced PCP main melody feature vector to be identified into the trained 36 hidden Markov models, and searching an optimal path under the maximum likelihood criterion through a Viterbi algorithm to obtain an optimal chord sequence.
4. A method for automatically assigning accompaniment chords according to claim 2, wherein said step S23 comprises the steps of:
for the main melody track and accompaniment track sequences, the notes within an octave are divided equally into 12 semitones according to twelve-tone equal temperament, namely [C, #C, D, #D, E, F, #F, G, #G, A, #A, B]; the pitch is recorded as p, with p taking values 0-127; the number of semitones to transpose is n, positive for transposing up and negative for transposing down; the pitch value before transposition is t and the pitch value after transposition is T; according to the music theory principle of modulation, a modulo-12 calculation is performed on the pitch number, and the offset corresponding to the semitone count is then determined by the transposition amount; after shifting by the corresponding interval, a cyclic shift with period 12 yields the minimum pitch value, i.e. the corresponding C pitch name; similarly, the octave in which a pitch value lies is obtained by dividing the pitch p by 12, rounding down and subtracting 1.
5. The method of claim 1, wherein in step S1, tracks of the same instrument are fused by arranging the events of identical tracks in ascending order of start time.
CN202010370928.4A 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord Active CN111739491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370928.4A CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010370928.4A CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Publications (2)

Publication Number Publication Date
CN111739491A CN111739491A (en) 2020-10-02
CN111739491B (en) 2023-03-21

Family

ID=72646996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370928.4A Active CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Country Status (1)

Country Link
CN (1) CN111739491B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927667A (en) * 2021-03-26 2021-06-08 平安科技(深圳)有限公司 Chord identification method, apparatus, device and storage medium
CN112951184A (en) * 2021-03-26 2021-06-11 平安科技(深圳)有限公司 Song generation method, device, equipment and storage medium
CN116189636B (en) * 2023-04-24 2023-07-07 深圳视感文化科技有限公司 Accompaniment generation method, device, equipment and storage medium based on electronic musical instrument
CN117251717B (en) * 2023-11-17 2024-02-09 成都立思方信息技术有限公司 Method, device, equipment and medium for extracting synchronous channelized multiple different signals

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03203789A (en) * 1989-12-30 1991-09-05 Casio Comput Co Ltd Music part generating device
JP2016057389A (en) * 2014-09-08 2016-04-21 ヤマハ株式会社 Chord determination device and chord determination program
CN106097280A (en) * 2016-06-23 2016-11-09 浙江工业大学之江学院 Based on normal state against the medical ultrasound image denoising method of Gauss model
CN106847248A (en) * 2017-01-05 2017-06-13 天津大学 Chord recognition methods based on robustness scale contour feature and vector machine
CN110134823A (en) * 2019-04-08 2019-08-16 华南理工大学 The MIDI musical genre classification method of Markov model is shown based on normalization note

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020065649A1 (en) * 2000-08-25 2002-05-30 Yoon Kim Mel-frequency linear prediction speech recognition apparatus and method
US7705231B2 (en) * 2007-09-07 2010-04-27 Microsoft Corporation Automatic accompaniment for vocal melodies


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Taemin Cho et al., "On the Relative Importance of Individual Components of Chord Recognition Systems", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 2, pp. 97-100, 23 December 2013 *
Wang Feng (王峰), "Research and Discussion on Chord Recognition Based on PCP Features" (基于PCP特征的和弦识别研究与探讨), Mathematics in Practice and Theory (数学的实践与认识), vol. 40, no. 5, pp. 1-16, 31 March 2010 *
Cai Sifan (蔡斯凡), "Automatic Accompaniment System Based on Hidden Markov Models" (基于隐马尔可夫模型的自动化伴奏系统), China Masters' Theses Full-text Database, Basic Science and Technology, no. 12, pp. 4, 17-32, 15 December 2018 *

Also Published As

Publication number Publication date
CN111739491A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111739491B (en) Method for automatically editing and allocating accompaniment chord
Lee et al. Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio
JP5463655B2 (en) Information processing apparatus, voice analysis method, and program
JP4465626B2 (en) Information processing apparatus and method, and program
JP5625235B2 (en) Information processing apparatus, voice analysis method, and program
Ikemiya et al. Singing voice analysis and editing based on mutually dependent F0 estimation and source separation
Papadopoulos et al. Joint estimation of chords and downbeats from an audio signal
JP5593608B2 (en) Information processing apparatus, melody line extraction method, baseline extraction method, and program
US7288710B2 (en) Music searching apparatus and method
US7179981B2 (en) Music structure detection apparatus and method
US20100131086A1 (en) Sound source separation system, sound source separation method, and computer program for sound source separation
CN112382257B (en) Audio processing method, device, equipment and medium
CN109979488B (en) System for converting human voice into music score based on stress analysis
WO2023040332A1 (en) Method for generating musical score, electronic device, and readable storage medium
Benetos et al. Automatic transcription of Turkish microtonal music
WO2010043258A1 (en) Method for analyzing a digital music audio signal
Lerch Software-based extraction of objective parameters from music performances
Bosch et al. Melody extraction based on a source-filter model using pitch contour selection
Vatolkin Evolutionary approximation of instrumental texture in polyphonic audio recordings
Kumar et al. Melody extraction from music: A comprehensive study
CN112634841B (en) Guitar music automatic generation method based on voice recognition
CN111696500B (en) MIDI sequence chord identification method and device
Noland et al. Influences of signal processing, tone profiles, and chord progressions on a model for estimating the musical key from audio
Wang et al. Automatic music transcription dedicated to Chinese traditional plucked string instrument pipa using multi-string probabilistic latent component analysis models
Devaney An empirical study of the influence of musical context on intonation practices in solo singers and SATB ensembles

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant