CN111739491B - Method for automatically editing and allocating accompaniment chord - Google Patents

Method for automatically editing and allocating accompaniment chord

Info

Publication number
CN111739491B
Authority
CN
China
Prior art keywords
chord
main melody
pcp
frequency
track
Prior art date
Legal status
Active
Application number
CN202010370928.4A
Other languages
Chinese (zh)
Other versions
CN111739491A (en)
Inventor
韦岗 (Wei Gang)
刘俊伟 (Liu Junwei)
曹燕 (Cao Yan)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202010370928.4A
Publication of CN111739491A
Application granted
Publication of CN111739491B
Status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/38 Chord
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music composition or musical creation; tools or processes therefor
    • G10H2210/111 Automatic composing, i.e. using predefined musical rules

Abstract

The invention discloses a method for automatically editing and assigning accompaniment chords, comprising feature extraction, model training and chord prediction. Feature extraction operates on music audio data: to address the low chord recognition rate caused by harmonic interference and uneven timbre across different instruments, improved pitch class profile (PCP) features are adopted, with Gaussian windowing and logarithmic compression algorithms introduced to eliminate the negative effects of harmonics and mixed timbres. Model training requires label calibration and data training: chord information is extracted from the symbolic data of the accompaniment tracks to obtain the corresponding chord sequences, the chord sequences are built into chord label files, and these are input, together with the corresponding enhanced PCP main melody feature vectors, into hidden Markov models for parameter training. For chord prediction, the enhanced PCP main melody feature vectors to be recognized are input into the trained models, and a chord sequence is finally generated.

Description

Method for automatically editing and allocating accompaniment chord
Technical Field
The invention relates to the technical field of data processing, in particular to a method for automatically editing and allocating accompaniment chords.
Background
The rapid development of the modern economy has driven a continuous rise in consumer demand. The shift from material needs to mental and cultural needs reflects the opening of new markets and room for development. Entertainment, too, has diversified: the music market in particular is growing steadily, and more and more people are studying music and entering the music industry.
Creation is one of the indispensable elements of music. Good music requires its creator to have solid musical literacy, supported by rich music theory, which takes long-term accumulation and study to acquire. One important area of creation is the assignment of accompaniment chords, which generally demands deep musical talent and rich theoretical knowledge; it is still mostly done by hand, so the threshold is high.
Today, computer techniques have been developed to address this problem, automating the burdensome manual assignment of chords through algorithmic models such as hidden Markov models, stochastic processes, genetic algorithms and deep networks. Many existing approaches are based on hidden Markov models, which require a large amount of training data, and model performance depends heavily on the quality of that data. In most audio data, however, when many instruments are present there are often problems of harmonic interference and uneven timbre, which greatly affect the extraction of audio features, degrade model performance and reduce the chord recognition rate.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method for automatically editing and assigning accompaniment chords.
The purpose of the invention can be achieved by adopting the following technical scheme:
a method for automatically assigning accompaniment chords, said method comprising the steps of:
s1, preprocessing a MIDI file, deleting the percussion instrument tracks in the MIDI file, and fusing the same instrument tracks to obtain a new track MIDI file;
s2, extracting a main melody track and an accompaniment track from the MIDI file respectively, performing C-tone normalization on the two groups of track sequences, converting the format of the main melody track into main melody audio data through format conversion, and keeping the format of the accompaniment track unchanged;
s3, carrying out Fourier transform on the main melody audio data to obtain frequency spectrum characteristics, and expanding each component in a frequency domain into twelve frequency bands according to twelve equal temperaments in the music theory; adding the components corresponding to the same tone level frequency band aiming at twelve frequency bands obtained by all the components to obtain twelve-dimensional PCP tone level contour characteristics of the whole frequency domain, and reducing the influence of high and low frequency weights through Gaussian windowing to obtain a PCP main melody characteristic vector after filtering; according to a logarithmic compression algorithm, reducing the redundancy of the characteristic space in a certain compression ratio to obtain an enhanced PCP main melody characteristic vector;
s4, extracting tempo, duration, pitch, rhythm and key signature of the accompaniment music track, and calculating the tempo and the rhythm to obtain the minor-bar duration of the accompaniment music track so as to divide the accompaniment music track into a plurality of music minor bars; performing harmony transformation on each music measure to obtain chord root and interval relations, wherein the interval relations comprise major keys and minor keys, and a chord sequence of the music measure is formed according to the key number, the chord root and the interval relations so as to construct and obtain a chord sequence of the whole accompaniment track; the chord sequence of the accompaniment music track is saved into a chord label file in an XML data format;
s5, 36 hidden Markov models are constructed, wherein the 36 hidden Markov models correspond to 36 chords respectively, the 36 chords comprise a triad chord, a quintuple chord, an eleven chord, a thirteen chord and respective deformed chords, the number of states of each model is six, and the states are four active states, a starting state and a stopping state respectively, wherein an observation function of the active state is formed by a single Gaussian observation function with a diagonal matrix; then inputting the enhanced PCP main melody feature vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models together for parameter training;
and S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
Further, the step S2 process is as follows:
s21, extracting a main melody sound track from the MIDI file by using a high pitch contour skyline algorithm;
s22, extracting accompaniment tracks from the MIDI file by using a bass contour landline algorithm;
s23, respectively carrying out C-tone normalization processing on the main melody track and the accompaniment track sequence to ensure uniform tone;
s24, WAV format audio conversion is carried out on the main melody track for extracting the characteristics of the enhanced PCP main melody;
and S25, keeping the format of the accompaniment track symbol data unchanged, and constructing a chord label file of the model.
Further, the step S3 process is as follows:
s31, repeatedly framing the main melody audio data, adopting a Hamming window function, overlapping two adjacent windows by half frame length, and performing sliding sampling with the number of sampling points N =4096 on each window, so as to obtain an energy spectrum X (k) of the main melody audio data through Fourier transform;
s32, according to twelve equal temperaments in the musical theory, neglecting the influence of high octaves or low octaves, only considering the frequency values of twelve scales of the lowest scale group in the music, correspondingly dividing each component in a frequency domain by the frequency value of the lowest scale respectively to obtain twelve frequency ratios, and accordingly expanding the components into twelve frequency bands; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to further obtain a twelve-dimensional PCP melody characteristic vector of the whole frequency domain, wherein the formula is as follows:
p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level. All components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels;

S33, performing Gaussian window filtering by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector;

S34, dividing each dimension of the filtered PCP main melody feature vector by the sum of all octave frequency component values of the corresponding tone level, multiplying by a certain compression coefficient and then performing logarithmic transformation, i.e. logarithmic compression of the features, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector (an illustrative code sketch of steps S31-S34 follows).
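As an illustration of steps S31-S34, the following minimal Python sketch computes the enhanced PCP vector for a single frame. Only N = 4096, f_c = 261.6 Hz and η = 1000 are given above; the reference frequency F_REL (C1, about 32.70 Hz) and the Gaussian width SIGMA are assumptions, and applying the Gaussian weight per spectral component is one plausible reading of the stated attenuation of the high and low bands. This is a sketch, not the claimed implementation.

    import numpy as np

    N = 4096        # sampling points per window (step S31)
    F_REL = 32.70   # reference frequency of C1, lowest scale group (Hz); assumed
    F_C = 261.6     # center frequency corresponding to C4 (Hz)
    SIGMA = 200.0   # Gaussian window width in Hz; assumed, not given in the text
    ETA = 1000.0    # compression coefficient (step S34)

    def enhanced_pcp(frame, fsr):
        """Enhanced 12-dim PCP of one frame of length N sampled at fsr Hz."""
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
        energy = np.abs(spectrum) ** 2          # |X[k]|^2 as in formula (2)
        k = np.arange(1, len(energy))           # component indices, skip DC
        freq = fsr * k / N                      # frequency of component k
        p = np.round(12 * np.log2(freq / F_REL)).astype(int) % 12  # formula (1)

        gauss = np.exp(-((freq - F_C) ** 2) / (2 * SIGMA ** 2))    # formula (3)
        pcp = np.zeros(12)                      # Gaussian-weighted tone levels
        pcp_sum = np.zeros(12)                  # unweighted octave sums
        np.add.at(pcp, p, energy[k] * gauss)    # formula (2) with weighting
        np.add.at(pcp_sum, p, energy[k])
        return np.log1p(ETA * pcp / np.maximum(pcp_sum, 1e-12))    # formula (4)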
Further, the step S5 process is as follows:
s51, constructing 36 hidden Markov models, wherein the 36 hidden Markov models correspond to 36 chords respectively;
s52, inputting the enhanced PCP main melody characteristic vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models;
s53, assuming that the features are not related to each other, traversing all feature vectors, carrying out state transition according to a Markov property, and counting chord state transition times and chord occurrence times;
and S54, calculating an initial probability matrix, a state transition probability matrix, and an average vector and a covariance matrix of each state observation function to obtain parameter estimation and finish training.
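For illustration, a simplified Python sketch of the counting-based estimation in S51-S54, under the assumption that each chord is modeled by a single state rather than the six-state topology described above; features is a list of enhanced-PCP frame sequences and labels the aligned per-frame chord indices (0-35) from the chord label files. All names are illustrative.

    import numpy as np

    N_CHORDS = 36

    def estimate_hmm_parameters(features, labels):
        """Count-based initial, transition and single-Gaussian state parameters."""
        init = np.zeros(N_CHORDS)
        trans = np.zeros((N_CHORDS, N_CHORDS))
        frames_per_chord = [[] for _ in range(N_CHORDS)]

        for feats, chords in zip(features, labels):
            init[chords[0]] += 1                      # first chord of the piece
            for t in range(1, len(chords)):
                trans[chords[t - 1], chords[t]] += 1  # chord transition counts
            for x, c in zip(feats, chords):
                frames_per_chord[c].append(x)         # observations per chord

        init /= max(init.sum(), 1)                    # initial probability matrix
        trans /= np.maximum(trans.sum(axis=1, keepdims=True), 1)
        means = [np.mean(f, axis=0) if f else np.zeros(12)
                 for f in frames_per_chord]           # mean vector per state
        variances = [np.var(f, axis=0) + 1e-6 if f else np.ones(12)
                     for f in frames_per_chord]       # diagonal covariance only
        return init, trans, means, variances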
Further, the step S6 process is as follows:
s61, extracting the audio data of the main melody to be identified to obtain the feature vector of the enhanced PCP main melody to be identified;
s62, inputting the enhanced PCP main melody feature vector to be identified into the trained 36 hidden Markov models, and searching an optimal path under the maximum likelihood criterion through a Viterbi algorithm to obtain an optimal chord sequence.
Further, the step S23 process is as follows:
For the main melody track and accompaniment track sequences, the notes within an octave are divided equally into 12 semitones according to twelve-tone equal temperament, namely [C, #C, D, #D, E, F, #F, G, #G, A, #A, B]. The pitch is recorded as p, with p taking values 0-127; the number of semitones to transpose is n, positive for transposing up and negative for transposing down; the pitch value before transposition is t and the pitch value after transposition is T. According to the music theory principle of modulation, a modulo-12 calculation is performed on the pitch number, and the offset corresponding to the semitone count is then determined by the transposition amount; after shifting by the corresponding interval, a cyclic shift with period 12 yields the minimum pitch value, i.e. the corresponding C pitch name. Similarly, the octave in which a pitch value lies is obtained by dividing the pitch p by 12, rounding down and subtracting 1 (a short code sketch of this normalization follows).
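A short sketch of the normalization just described, assuming standard MIDI pitch numbers 0-127; the function names are illustrative. For example, pitch_name(60) yields 'C4', matching the octave rule stated above.

    NOTE_NAMES = ['C', '#C', 'D', '#D', 'E', 'F', '#F', 'G', '#G', 'A', '#A', 'B']

    def transpose(pitch, n):
        """Shift a pitch by n semitones (positive = up), clamped to MIDI range."""
        return min(127, max(0, pitch + n))

    def pitch_name(pitch):
        """Pitch class via modulo 12, octave via pitch // 12 - 1, per the rule above."""
        return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"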
Further, in step S1, tracks of the same instrument are fused by merging their events and arranging them in ascending order of start time, as sketched below.
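A minimal sketch of this fusion rule; Event is an assumed record, since real MIDI events carry more fields (channel, program and so on).

    from dataclasses import dataclass

    @dataclass
    class Event:
        start: float    # start time in ticks or seconds
        pitch: int
        velocity: int

    def fuse_tracks(tracks):
        """Merge same-instrument tracks into one, events in ascending start time."""
        merged = [event for track in tracks for event in track]
        merged.sort(key=lambda event: event.start)
        return merged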
Compared with the prior art, the invention has the following advantages and effects:
1) In terms of data features, the enhanced PCP main melody feature extraction combines musical characteristics with audio signal characteristics and performs better in the music recognition field;
2) C-tone normalization of the melody reduces the difficulty of data processing, ensures a uniform key, and improves model accuracy;
3) Compared with training directly on raw audio data, the method improves model performance and shortens training time;
4) Automatic chord labeling is completed from the symbolic data set, avoiding manual labeling and improving efficiency;
5) For music whose texture is complicated by the interweaving of many instruments, a logarithmic compression algorithm is introduced to reduce complexity and accelerate model training.
Drawings
FIG. 1 is a flow chart of a method for automatically assigning accompaniment chords in accordance with an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the method for extracting the main melody track of the music file according to the contour line (skyline) algorithm in the embodiment of the present invention;
FIG. 3 is a flow chart of note extraction enhanced PCP feature vectors in an embodiment of the present invention;
FIG. 4 is a spectrogram of a conventional PCP feature vector extraction process in an embodiment of the present invention;
fig. 5 is a spectrogram of the enhanced PCP feature vector extraction process in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, a flowchart of a method of automatically assigning accompaniment chords in the present embodiment is disclosed.
The method comprises the following specific steps:
S1, preprocessing the MIDI audio data: deleting the percussion tracks and fusing tracks of the same instrument, the fusion arranging the events of identical tracks in ascending order of start time, to obtain a new MIDI file.
S2, distinguishing the main melody tracks and accompaniment tracks of the MIDI file: using a known track clustering algorithm, the main melody tracks are extracted with the treble contour (skyline) algorithm and the accompaniment tracks with the bass contour (landline) algorithm, giving a main melody track set and an accompaniment track set; C-tone normalization is performed on the two groups of track sequences to ensure a uniform key; the main melody tracks are then converted to WAV-format audio to obtain the main melody audio data, in preparation for extracting the enhanced PCP main melody feature vectors; the format of the accompaniment tracks remains unchanged, in preparation for the chord labeling of the training labels.
S3, framing the main melody audio data with a Hamming window function, adjacent windows overlapping by half a frame, performing sliding sampling, and obtaining the energy spectrum features through Fourier transform; then, according to twelve-tone equal temperament, ignoring the influence of higher or lower octaves and considering only the frequency values of the twelve scales of the lowest scale group in music, dividing each component in the frequency domain by the frequency value of the lowest scale to obtain twelve frequency ratios, thereby expanding the components into twelve frequency bands; adding the components falling in the same tone level band over all components to obtain the twelve-dimensional PCP main melody feature vector of the whole frequency domain; reducing the influence of the high- and low-frequency weights by Gaussian windowing to obtain the filtered PCP main melody feature vector; and, according to the logarithmic compression algorithm, multiplying the ratio of each component to the frequency value sum of the corresponding tone level by a certain compression coefficient to reduce the redundancy of the feature space, obtaining the enhanced PCP main melody feature vector.
S4, extracting the tempo, duration, pitch, rhythm and key signature of the accompaniment tracks of step S2, and calculating the bar duration of each accompaniment track from the tempo and time signature so as to divide it into a plurality of musical bars; performing harmonic analysis on each bar to obtain the chord root and the interval relation (major or minor), and forming the chord sequence of the bar from the key signature, chord root and interval relation, so as to obtain the chord sequence of the whole accompaniment track; the chord sequence of the accompaniment track is saved as a chord label file in XML data format. For example, at 120 BPM in 4/4 time a bar lasts 4 × 60/120 = 2 seconds (a sketch of the bar segmentation and label writing follows).
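A sketch of the bar segmentation and label writing described here; the XML element and attribute names are illustrative assumptions, since the text only specifies that the chord sequence is saved in XML data format.

    import xml.etree.ElementTree as ET

    def bar_duration(bpm, beats_per_bar):
        """Seconds per bar: 120 BPM in 4/4 gives 4 * 60 / 120 = 2.0 s."""
        return beats_per_bar * 60.0 / bpm

    def write_chord_labels(chords, bpm, beats_per_bar, path):
        """Write one <chord> entry per bar with its start and end time in seconds."""
        root = ET.Element('chords')
        bar_len = bar_duration(bpm, beats_per_bar)
        for i, chord in enumerate(chords):      # one chord per musical bar
            elem = ET.SubElement(root, 'chord',
                                 start=f'{i * bar_len:.3f}',
                                 end=f'{(i + 1) * bar_len:.3f}')
            elem.text = chord
        ET.ElementTree(root).write(path, encoding='utf-8')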
S5, constructing 36 hidden Markov models corresponding respectively to 36 chords (comprising triads, quintuple chords, ninth chords, eleventh chords, thirteenth chords and their respective variant chords); each model has six states, namely four active states, a start state and an end state, wherein the observation function of each active state is a single Gaussian with a diagonal covariance matrix; the enhanced PCP main melody feature vectors and the chord label files of the corresponding accompaniment tracks are then input into the 36 hidden Markov models for parameter training.
And S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
As shown in fig. 2, the main melody note vector group is extracted using the treble contour (skyline) algorithm; the specific process is as follows:
S21, the tracks of the melody track set are fused into a single track, the events of which are arranged in ascending order of start time, and the track is converted into the note vector group to be processed.
S22, the note vector group to be processed is traversed; if several note vectors share the same start time, the note vector with the highest pitch is kept and the rest are deleted, giving the treble note vector group.
S23, the note end times of the treble note vector group are corrected to eliminate polyphonic overlap between adjacent notes: if, of two adjacent note vectors, the earlier one has a lower pitch but an end time later than that of the following note, its end time is adjusted so that the two notes end at the same time (see the sketch below).
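A compact sketch of S21-S23, assuming simple note records with start, end and pitch; the tie-breaking sort keeps the highest pitch at each onset.

    from dataclasses import dataclass

    @dataclass
    class Note:
        start: float
        end: float
        pitch: int

    def skyline(notes):
        """Treble contour: highest note per onset, overlaps trimmed as in S23."""
        ordered = sorted(notes, key=lambda n: (n.start, -n.pitch))
        melody = []
        for note in ordered:
            if melody and note.start == melody[-1].start:
                continue                  # same onset: higher pitch already kept
            melody.append(note)
        for prev, nxt in zip(melody, melody[1:]):
            if prev.pitch < nxt.pitch and prev.end > nxt.end:
                prev.end = nxt.end        # make the two notes end together
        return melody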
As shown in fig. 3, the specific process of extracting the enhanced PCP feature vector is as follows:
S31, framing the main melody audio data with overlap, using a Hamming window function with adjacent windows overlapping by half a frame length, and sampling N = 4096 points per window by sliding sampling, so as to obtain the energy spectrum X(k) through Fourier transform.
S32, from the obtained spectrum X(k), according to twelve-tone equal temperament in music theory, ignoring the influence of higher or lower octaves and considering only the frequency values of the twelve scales of the lowest scale group in music, each component in the frequency domain is divided by the frequency value of the lowest scale to obtain twelve frequency ratios, so that the components are expanded into twelve frequency bands; the components falling in the same tone level band are added over all components to obtain the twelve-dimensional PCP main melody feature vector of the whole frequency domain, wherein the formula is as follows:

p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level. All components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels.
S33, Gaussian window filtering is performed by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector.
S34, each dimension of the filtered PCP main melody feature vector is divided by the sum of all octave frequency component values of the corresponding tone level, multiplied by a certain compression coefficient and then logarithmically transformed, i.e. the features are logarithmically compressed, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector.
Comparing the spectrogram of the conventional PCP feature extraction shown in fig. 4 with that of the enhanced PCP feature extraction shown in fig. 5, the enhanced features prove more robust: although the compression reduces the distinction of irrelevant features, the coherence of the effective features is strengthened and the influence of interfering tones and overtones is better eliminated. As can be seen from fig. 5, the improved PCP features give a better representation of the Cm chord (C, E, G), the Gm chord (F, G, #A), the Fm chord (C, F, #G) and the #A chord (D, F, #A).
In summary, for the signal features the invention uses a spectral feature specific to music signals, the pitch class profile (PCP) feature, and improves it with Gaussian windowing and logarithmic compression algorithms borrowed from audio signal processing, effectively overcoming the problems of harmonic interference and uneven timbre. The PCP feature uses twelve-tone equal temperament to spread the audio signal into a twelve-dimensional vector, which makes it particularly effective for music data processing. Gaussian windowing filters out the signal weight of unnecessary frequency bands, raising the weight of the middle band around C4 (pitch = 60) and lowering the high- and low-band weights (the high band is mainly harmonics, the low band mainly noise, drums and the like), so the influence of harmonics is effectively filtered out. Meanwhile, a melody produced by the interweaving of many instruments contains different pitches, intensities and rhythms, so the extracted features become too complex, which degrades model performance and makes chord recognition harder; the logarithmic compression algorithm is therefore introduced to reduce feature redundancy, lower complexity and accelerate training.
In the feature extraction, considering the defects of overtone interference and uneven timbre that may exist in audio, methods from audio signal processing are introduced to solve these problems: 1) lowering the weight of certain frequency bands by Gaussian windowing, which effectively avoids the influence that high-band overtones and low-band noise may have on the musical features; 2) logarithmically compressing the feature spectrum, which reduces feature complexity, limits the dynamic range of the features, lowers redundancy, and mitigates the timbre unevenness that playing by many instruments may cause.
Meanwhile, among existing model training methods there are input types in which both the melody and the accompaniment are spectral features of audio, but the chord labels must then be marked manually for training, which is cumbersome. There are also input types in which both the melody and the accompaniment are symbolic data (such as the MIDI format), but these offer no good way to overcome the feature problems described above. The invention therefore chooses a data format combining the two: the main melody is spectral feature data of audio, and the accompaniment is symbolic data, which increases chord labeling efficiency and effectively overcomes harmonic interference and uneven timbre. The extracted musical features are thus better, the trained model is more robust, and chord recognition is more accurate.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A method for automatically assigning accompaniment chords, said method comprising the steps of:
s1, preprocessing MIDI audio data, deleting the percussion instrument tracks, and fusing the same instrument tracks to obtain new track MIDI files;
s2, extracting a main melody track and an accompaniment track from the MIDI file respectively, performing C-tone normalization on the two groups of track sequences, converting the format of the main melody track into main melody audio data through format conversion, and keeping the format of the accompaniment track unchanged;
s3, carrying out Fourier transform on the main melody audio data to obtain frequency spectrum characteristics, and expanding each component in a frequency domain into twelve frequency bands according to twelve equal temperaments in the music theory; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to obtain twelve-dimensional PCP tone level contour characteristics of the whole frequency domain, and reducing the influence of high and low frequency weights through Gauss windowing to obtain a PCP main melody characteristic vector after filtering; according to a logarithmic compression algorithm, reducing the redundancy of the characteristic space in a certain compression ratio to obtain an enhanced PCP main melody characteristic vector;
the step S3 comprises the following steps:
s31, repeatedly framing the main melody audio data, adopting a Hamming window function, overlapping two adjacent windows by half frame length, and performing sliding sampling with the number of sampling points N =4096 on each window, so as to obtain an energy spectrum X (k) of the main melody audio data through Fourier transform;
s32, according to twelve equal temperaments in the musical theory, neglecting the influence of high octaves or low octaves, only considering the frequency values of twelve scales of the lowest scale group in the music, correspondingly dividing each component in a frequency domain by the frequency value of the lowest scale respectively to obtain twelve frequency ratios, and accordingly expanding the components into twelve frequency bands; adding the components corresponding to the same tone level frequency band aiming at the twelve frequency bands obtained by all the components to further obtain a twelve-dimensional PCP main melody feature vector of the whole frequency domain, wherein the formula is as follows:
p(k) = round(12 · log2((f_sr · k / N) / f_rel)) mod 12    formula (1)

wherein f_rel is the reference frequency value of the lowest scale group, the lowest scale group comprising the scales C1, D1, E1, F1, G1, A1 and B1; f_sr is the sampling frequency; N is the number of sampling points; f_sr/N is the frequency resolution of the Fourier transform, so that f_sr·k/N is the frequency of the k-th component in the frequency domain and (f_sr·k/N)/f_rel is the frequency ratio of that component to the reference tone level; all components mapped by formula (1) to the same tone level are added to obtain the twelve-dimensional PCP main melody feature vector:

PCP[p] = Σ_{k: p(k)=p} |X[k]|²,  p = 1, 2, …, 12    formula (2)

wherein X(k) is the energy spectrum obtained by Fourier transform of the main melody audio data, k is the component index of the Fourier transform, and p is the serial number of the twelve tone levels;

S33, performing Gaussian window filtering by taking the frequency value f_c = 261.6 Hz corresponding to the pitch C4 as the center frequency, wherein the formula is as follows:

PCP_filt[p] = PCP[p] · exp(-(f_p - f_c)² / (2σ²))    formula (3)

wherein PCP[p] is the twelve-dimensional PCP feature vector obtained by formula (2), f_p is the frequency corresponding to tone level p, and σ is the standard deviation of the Gaussian window; the difference from the center frequency is squared and exponentially transformed, which reduces the weight of the high- and low-frequency bands and yields the filtered PCP main melody feature vector;

S34, dividing each dimension of the filtered PCP main melody feature vector by the sum of all octave frequency component values of the corresponding tone level, multiplying by a certain compression coefficient and then performing logarithmic transformation, i.e. logarithmic compression of the features, wherein the formula is as follows:

PCP_enh[p] = log(1 + η · PCP_filt[p] / PCP[p]_sum)    formula (4)

wherein PCP_filt[p] is the filtered PCP main melody feature vector obtained by formula (3), PCP[p]_sum is the sum of all octave frequency component values of the corresponding tone level, and η = 1000 is the compression coefficient; the ratio is multiplied by η, 1 is added, and the logarithm is taken, which compresses the features, reduces redundancy, and yields the enhanced PCP main melody feature vector;
s4, extracting the tempo, the duration, the pitch, the rhythm and the key signature of the accompaniment music track, and calculating the tempo and the rhythm to obtain the time length of a bar of the accompaniment music track so as to divide the accompaniment music track into a plurality of music bars; performing harmony transformation on each music measure to obtain chord root and interval relations, wherein the interval relations comprise major keys and minor keys, and a chord sequence of the music measure is formed according to the key number, the chord root and the interval relations so as to construct and obtain a chord sequence of the whole accompaniment track; the chord sequence of the accompaniment music track is saved into a chord label file in an XML data format;
s5, 36 hidden Markov models are constructed, wherein the 36 hidden Markov models correspond to 36 chords respectively, the 36 chords comprise a triad chord, a quintuple chord, an eleven chord, a thirteen chord and respective deformed chords, the number of states of each model is six, and the states are four active states, a starting state and a stopping state respectively, wherein an observation function of the active state is formed by a single Gaussian observation function with a diagonal matrix; then inputting the enhanced PCP main melody feature vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models together for parameter training;
the step S5 comprises the following processes:
s51, constructing 36 hidden Markov models, wherein the 36 hidden Markov models correspond to 36 chords respectively;
s52, inputting the enhanced PCP main melody characteristic vector and the chord label file of the corresponding accompaniment track into 36 hidden Markov models;
s53, assuming that the features are not related to each other, traversing all feature vectors, carrying out state transition according to a Markov property, and counting chord state transition times and chord occurrence times;
s54, calculating an initial probability matrix, a state transition probability matrix, and an average vector and a covariance matrix of each state observation function to obtain parameter estimation to complete training;
and S6, extracting the audio data of the main melody to be recognized to obtain the characteristic vector of the enhanced PCP main melody to be recognized, inputting the characteristic vector of the enhanced PCP main melody to be recognized into the trained hidden Markov model, and predicting to generate the chord sequence.
2. A method for automatically assigning accompaniment chords according to claim 1, wherein said step S2 comprises the steps of:
s21, extracting a main melody sound track from the MIDI file by using a high pitch contour skyline algorithm;
s22, extracting accompaniment tracks from the MIDI file by using a bass contour landline algorithm;
s23, respectively carrying out C-tone normalization processing on the main melody track and the accompaniment track sequence to ensure uniform tone;
s24, performing WAV format audio conversion on the main melody sound track for extracting the characteristics of the enhanced PCP main melody;
and S25, keeping the format of the accompaniment track symbol data unchanged, and constructing a chord label file of the model.
3. A method for automatically assigning accompaniment chords according to claim 1, wherein said step S6 comprises the steps of:
s61, extracting the audio data of the main melody to be identified to obtain the feature vector of the enhanced PCP main melody to be identified;
s62, inputting the enhanced PCP main melody feature vector to be identified into the trained 36 hidden Markov models, and searching an optimal path under the maximum likelihood criterion through a Viterbi algorithm to obtain an optimal chord sequence.
4. A method for automatically assigning accompaniment chords according to claim 2, wherein said step S23 comprises the steps of:
for the main melody track and accompaniment track sequences, the notes within an octave are divided equally into 12 semitones according to twelve-tone equal temperament, namely [C, #C, D, #D, E, F, #F, G, #G, A, #A, B]; the pitch is recorded as p, with p taking values 0-127; the number of semitones to transpose is n, positive for transposing up and negative for transposing down; the pitch value before transposition is t and the pitch value after transposition is T; according to the music theory principle of modulation, a modulo-12 calculation is performed on the pitch number, and the offset corresponding to the semitone count is then determined by the transposition amount; after shifting by the corresponding interval, a cyclic shift with period 12 yields the minimum pitch value, i.e. the corresponding C pitch name; similarly, the octave in which a pitch value lies is obtained by dividing the pitch p by 12, rounding down and subtracting 1.
5. The method of claim 1, wherein in step S1, tracks of the same instrument are fused by arranging the events of identical tracks in ascending order of start time.
CN202010370928.4A 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord Active CN111739491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370928.4A CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010370928.4A CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Publications (2)

Publication Number Publication Date
CN111739491A CN111739491A (en) 2020-10-02
CN111739491B (en) 2023-03-21

Family

ID=72646996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370928.4A Active CN111739491B (en) 2020-05-06 2020-05-06 Method for automatically editing and allocating accompaniment chord

Country Status (1)

Country Link
CN (1) CN111739491B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927667A (en) * 2021-03-26 2021-06-08 平安科技(深圳)有限公司 Chord identification method, apparatus, device and storage medium
CN112951184A (en) * 2021-03-26 2021-06-11 平安科技(深圳)有限公司 Song generation method, device, equipment and storage medium
CN116189636B (en) * 2023-04-24 2023-07-07 深圳视感文化科技有限公司 Accompaniment generation method, device, equipment and storage medium based on electronic musical instrument
CN117251717B (en) * 2023-11-17 2024-02-09 成都立思方信息技术有限公司 Method, device, equipment and medium for extracting synchronous channelized multiple different signals

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03203789A (en) * 1989-12-30 1991-09-05 Casio Comput Co Ltd Music part generating device
JP2016057389A (en) * 2014-09-08 2016-04-21 ヤマハ株式会社 Chord determination device and chord determination program
CN106097280A (en) * 2016-06-23 2016-11-09 浙江工业大学之江学院 Based on normal state against the medical ultrasound image denoising method of Gauss model
CN106847248A (en) * 2017-01-05 2017-06-13 天津大学 Chord recognition methods based on robustness scale contour feature and vector machine
CN110134823A (en) * 2019-04-08 2019-08-16 华南理工大学 The MIDI musical genre classification method of Markov model is shown based on normalization note

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020065649A1 (en) * 2000-08-25 2002-05-30 Yoon Kim Mel-frequency linear prediction speech recognition apparatus and method
US7705231B2 (en) * 2007-09-07 2010-04-27 Microsoft Corporation Automatic accompaniment for vocal melodies


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Taemin Cho et al., "On the Relative Importance of Individual Components of Chord Recognition Systems", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 2, pp. 97-100, 23 December 2013 *
Wang Feng (王峰), "Research and Discussion on Chord Recognition Based on PCP Features" (基于PCP特征的和弦识别研究与探讨), Mathematics in Practice and Theory (数学的实践与认识), vol. 40, no. 5, pp. 1-16, 31 March 2010 *
Cai Sifan (蔡斯凡), "Automatic Accompaniment System Based on Hidden Markov Models" (基于隐马尔可夫模型的自动化伴奏系统), China Masters' Theses Full-text Database, Basic Science and Technology, no. 12, pp. 4, 17-32, 15 December 2018 *

Also Published As

Publication number Publication date
CN111739491A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111739491B (en) Method for automatically editing and allocating accompaniment chord
Lee et al. Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio
JP5463655B2 (en) Information processing apparatus, voice analysis method, and program
JP4465626B2 (en) Information processing apparatus and method, and program
JP5625235B2 (en) Information processing apparatus, voice analysis method, and program
Ikemiya et al. Singing voice analysis and editing based on mutually dependent F0 estimation and source separation
Papadopoulos et al. Joint estimation of chords and downbeats from an audio signal
JP5593608B2 (en) Information processing apparatus, melody line extraction method, baseline extraction method, and program
US7288710B2 (en) Music searching apparatus and method
US7179981B2 (en) Music structure detection apparatus and method
US20100131086A1 (en) Sound source separation system, sound source separation method, and computer program for sound source separation
CN112382257B (en) Audio processing method, device, equipment and medium
CN109979488B (en) System for converting human voice into music score based on stress analysis
WO2023040332A1 (en) Method for generating musical score, electronic device, and readable storage medium
Benetos et al. Automatic transcription of Turkish microtonal music
WO2010043258A1 (en) Method for analyzing a digital music audio signal
Lerch Software-based extraction of objective parameters from music performances
Bosch et al. Melody extraction based on a source-filter model using pitch contour selection
Vatolkin Evolutionary approximation of instrumental texture in polyphonic audio recordings
Kumar et al. Melody extraction from music: A comprehensive study
CN112634841B (en) Guitar music automatic generation method based on voice recognition
CN111696500B (en) MIDI sequence chord identification method and device
Noland et al. Influences of signal processing, tone profiles, and chord progressions on a model for estimating the musical key from audio
Wang et al. Automatic music transcription dedicated to Chinese traditional plucked string instrument pipa using multi-string probabilistic latent component analysis models
Devaney An empirical study of the influence of musical context on intonation practices in solo singers and SATB ensembles

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant