CN104992712B

CN104992712B - Method for automatically recognizing music and generating a score

Info

Publication number: CN104992712B
Application number: CN201510389632.6A
Authority: CN
Inventors: 刘德文; 茄振中; 陈洪波; 阮广璇
Original assignee: MEET STUDIO Co Ltd
Current assignee: MEET STUDIO Co Ltd
Priority date: 2015-07-06
Filing date: 2015-07-06
Publication date: 2019-02-12
Anticipated expiration: 2035-07-06
Also published as: CN104992712A

Abstract

The invention discloses a method for automatically recognizing music and generating a musical score, comprising the steps of: (1) recognizing the audio by tracking changes in the overall spectrum and checking in real time whether there is a sounding trend; (2) tracking the spectral change of each pitch and checking in real time which pitches are sounding; (3) continuing to track the spectral change of each sounding pitch and checking whether the earlier sounding decision was a misjudgment; (4) estimating the tempo, mode and note types of the score from the sounding-pitch data and sounding-time data obtained in the preceding steps, and generating the score. The invention provides a technique for working backward from pitch recognition results to the original score, yielding a complete method that automatically recognizes audio music and produces a score. The method is simple to operate, efficient and reliable, robust to the number of simultaneous notes, and compatible with various instruments. It can be used in mobile phone software and embedded devices, and applies to scenarios such as automatic score generation for composition and checking of instrument practice.

Description

Method for automatically recognizing music and generating a score

Technical field

The present invention relates to a method for automatically recognizing music and generating a score, and belongs to technical fields such as musical pitch recognition, polyphonic recognition and automatic score generation.

Background technique

Pitch recognition technology is widely used, for example in tuners, melody recognition and audio file conversion. Instrument tuners mainly apply single-note pitch recognition: the harmonic peak method identifies the fundamental frequency of a sound to check whether the instrument's pitch has drifted. This technique can only recognize a single note, however; if several pitches sound at the same time, they cannot be identified. In melody recognition, pitch recognition can roughly identify a melody hummed by the user, which then serves as the basis for searching for and matching related songs. Digital audio files generally fall into waveform audio files (including mp3, wav, etc.) and MIDI (Musical Instrument Digital Interface) files. A waveform audio file records only the waveform of the audio, whereas a MIDI file records the score information of the music. At present there is no effective method for converting a waveform audio file into a MIDI file; the original score can only be written out again by ear by a professional familiar with music.

The techniques currently applied to pitch recognition mainly include the harmonic peak method, the parallel processing method and the wavelet analysis method. The harmonic peak method analyzes harmonics in the frequency domain. Its advantages are that it is intuitive, easy to implement and computationally light. However, because it relies on the harmonic with the greatest energy as its analysis point, it cannot handle polyphonic recognition, and because the harmonics of instrument sounds are often shifted, the fundamental frequency it finds is often inaccurate. The parallel processing method analyzes the audio in the time domain: based on the principle that the fundamental and its harmonics superpose periodically along the time axis, it estimates the distance between two wave crests and from that calculates the period of the audio. Its advantage is that it is simple to compute and implement; its disadvantage is that the results are unstable. The wavelet analysis method uses the wavelet transform to analyze frequency-domain features in depth and extract the fundamental. Its advantage is higher accuracy; its disadvantage is a heavy computational load.

Polyphonic estimation has always been a difficult problem in music recognition. Research began in the 1970s with the Stanford University scholar Moore and continued through the "sensory information extraction from music signals" project at Osaka University in the 1980s, the Hawley and Martin projects at the Massachusetts Institute of Technology, the Kashino project at the University of Tokyo, the Sterian project at the University of Michigan, and the Douglas Nunn project at Durham University, each of which achieved research breakthroughs. Even so, there has been no method that is simple to operate, efficient and reliable, robust to the number of simultaneous notes, compatible with various instruments, and usable in software on small computing devices outside the laboratory, such as smartphones, tablets and embedded devices.

Summary of the invention

The purpose of the present invention is to provide a method for automatically recognizing music and generating a score. The invention provides a technique for working backward from pitch recognition results to the original score, yielding a complete method that automatically recognizes audio music and produces a score. The technique can be applied to scenarios such as automatic score generation for composition and checking of instrument practice.

To achieve the above purpose, the technical solution adopted by the invention is as follows:

A method for automatically recognizing music and generating a score includes the following steps:

(1) Perform a spectrum analysis of the environmental noise, with energy values in decibels (dB). Sample the environmental noise for a period of time, and record the average spectral energy distribution A of the noise over that period and the standard deviation V of each frequency band's energy over time. From these two distributions, the sounding energy threshold TTL of each frequency band can be set for use in the later steps. The principle is: the higher the average noise energy in a frequency band, the higher the sounding energy threshold; and the higher the standard deviation of the noise energy, the higher the sounding energy threshold. The threshold can generally be set as TTL = A × P + V × Q, where P and Q are fixed values chosen empirically that can be adjusted to the environment; P = 3 and Q = 1 are typical.
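The threshold rule in step (1) can be illustrated with a minimal sketch. The function name `sounding_thresholds`, the list-of-frames input layout and the use of the population standard deviation are assumptions made for illustration; the text only fixes the relation TTL = A × P + V × Q and the typical values P = 3, Q = 1.

```python
from statistics import mean, pstdev


def sounding_thresholds(noise_frames, p=3.0, q=1.0):
    """Return one threshold per frequency band, TTL = A * P + V * Q.

    noise_frames: list of noise frames, each a list of band energies in dB
    (hypothetical layout; all frames must have the same number of bands).
    """
    bands = list(zip(*noise_frames))      # regroup the energies band by band
    avg = [mean(b) for b in bands]        # A: average energy of each band
    dev = [pstdev(b) for b in bands]      # V: standard deviation over time
                                          # (population form is an assumption)
    return [a * p + v * q for a, v in zip(avg, dev)]


# Three noise frames with two frequency bands each.
frames = [[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]]
ttl = sounding_thresholds(frames)
```

A band whose noise never varies, like the second band here, gets a threshold of exactly three times its average energy, while a fluctuating band is raised further by its standard deviation.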

(2) Track changes in the overall spectrum and check in real time whether there is a sounding trend. First set the effective sounding frequency range. The fundamental frequencies of a standard piano span about 27.5 Hz to 4186 Hz; allowing for fundamental-frequency offset and leaving room for harmonics at 4 to 5 times the fundamental, the effective sounding frequency range is generally set to 20 Hz to 20000 Hz. Within this range, calculate the area Area by which the current spectrum exceeds the sounding energy threshold. If this area is greater than a preset area threshold Attl, the audio is considered to show a sounding trend at this moment; otherwise the audio is considered to be still silent.
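One way to read the area test in step (2) is sketched below. It assumes the area is the sum, over frequency bins inside the effective range, of the energy above the threshold multiplied by the bin width; that reading, the function names and the bin layout (bin k centred at k × bin_hz) are illustrative assumptions, not taken from the text.

```python
def excess_area(spectrum, thresholds, bin_hz, f_lo=20.0, f_hi=20000.0):
    """Area by which the spectrum exceeds the per-bin threshold (assumed form)."""
    area = 0.0
    for k, (energy, ttl) in enumerate(zip(spectrum, thresholds)):
        freq = k * bin_hz                 # assumed centre frequency of bin k
        if f_lo <= freq <= f_hi and energy > ttl:
            area += (energy - ttl) * bin_hz
    return area


def has_sounding_trend(spectrum, thresholds, bin_hz, area_threshold):
    """True when the excess area is greater than the area threshold Attl."""
    return excess_area(spectrum, thresholds, bin_hz) > area_threshold


# Bins at 0 Hz and 10 Hz fall outside the 20 Hz to 20000 Hz range and are ignored.
spectrum = [50.0, 50.0, 10.0, 40.0]
thresholds = [30.0, 30.0, 30.0, 30.0]
area = excess_area(spectrum, thresholds, bin_hz=10.0)
```

Only the bin at 30 Hz contributes here, because the loud bins below 20 Hz lie outside the effective sounding range.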

(3) Track the spectral change of each pitch and check in real time which pitches are sounding. For each potentially sounding pitch, analyze whether an interval energy peak exists at the pitch's fundamental frequency and at each of its harmonics; a peak is included in the calculation only if it exceeds the sounding energy threshold. From the presence and size of these peaks, calculate the confidence L that each pitch is sounding. Then check whether the pitch with the greatest confidence satisfies two conditions: 1) the pitch's confidence value L has just changed from a steady or falling state to a sharply rising state, that is, the current confidence value L is greater than a certain multiple of the previous frame's confidence value, and the previous frame's confidence value lies within a certain proportion of the mean confidence over the short period before it (the previous frame's confidence was steady or falling); 2) the ratio of the maximum confidence to the sum of the confidence values of all pitches is greater than a threshold. If both conditions are satisfied, the pitch is considered to be sounding at this moment. The influence of that pitch's fundamental and harmonic peaks on other pitches is then eliminated, the confidence values of the other pitches are recalculated, and the two conditions are tested again on the remaining pitch with the greatest confidence. The search for sounding pitches continues in this way and terminates when the two conditions can no longer be satisfied.
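The two conditions in step (3) can be sketched as a single test on the confidence history of the candidate pitch. The confidence formula itself is not given in this text, so the sketch takes confidence values as input. The parameter names and default values (`rise`, `steady`, `share`) are hypothetical, and "within a certain proportion of the mean" is read here as "not more than `steady` times the mean", which is an assumption.

```python
from statistics import mean


def is_onset(history, current, all_current, rise=2.0, steady=1.2, share=0.3):
    """Check the two sounding conditions for the highest-confidence pitch.

    history:     earlier confidence values of this pitch, oldest first
    current:     confidence of this pitch in the current frame
    all_current: current-frame confidence values of all candidate pitches
    """
    if len(history) < 2 or sum(all_current) <= 0:
        return False
    previous, earlier = history[-1], history[:-1]
    sharp_rise = current > rise * previous            # condition 1, first part
    was_steady = previous <= steady * mean(earlier)   # condition 1, second part
    dominant = current / sum(all_current) > share     # condition 2
    return sharp_rise and was_steady and dominant


# A pitch whose confidence was flat near 1.0 and then jumps to 5.0.
onset = is_onset([1.0, 1.0, 1.1], 5.0, [5.0, 1.0, 1.0])
```

In a full implementation this test would sit inside the loop described above: after a pitch passes, its fundamental and harmonic peaks are removed and the test is repeated on the next-highest candidate.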

(4) Continue to track the spectral change of each sounding pitch and check whether the earlier sounding decision was a misjudgment. In instrument sounds, a pitch generally persists for a short time after it starts sounding. Set a time length t as the window for checking misjudgment; if the confidence value of the sounding pitch decays too quickly within this window, the pitch sounding is considered a misjudgment. The judgment condition [formula not reproduced in the source text] is expressed in terms of the confidence value at the moment the pitch sounded, the time elapsed since sounding, and a fixed value; the decay threshold can be adjusted according to factors such as the sampling interval of each frame, the pitch and the environment. If the analysis is not real-time but is performed on a complete audio file, the misjudgment check can be inserted into step (3) after each decision that a pitch is sounding.

(5) From the sounding-pitch data and sounding-time data obtained in the preceding steps, estimate the tempo, mode and note types of the score.

The principle of tempo estimation is to make the actual duration of each note as close as possible to its estimated note duration. The method is as follows:

1) Preset the tempo range; the tempo of a typical piece ranges from 30 to 240 quarter-note beats per minute. 2) For each tempo value, estimate the duration type of each note from the time interval between pitch onsets. The duration types are limited to whole note, half note, quarter note, eighth note and sixteenth note, and a reasonable time span is defined for each type; if an interval is longer than a whole note, the excess is filled with whole rests. In this way every note can be assigned to one of the above duration types. 3) Calculate the deviation between each note's actual duration and the standard duration of its type at that tempo. 4) Compare the sums of the deviation values across all tempos and take the tempo with the smallest sum as the estimated tempo.
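The tempo search above can be sketched as a brute-force loop over candidate tempos. This is a simplified illustration under stated assumptions: each interval is matched to the nearest of the five duration types, the deviation is the absolute difference in seconds, and the rest-filling rule for intervals longer than a whole note is omitted. The function names are hypothetical.

```python
# Duration types in quarter-note beats: whole, half, quarter, eighth, sixteenth.
NOTE_BEATS = (4.0, 2.0, 1.0, 0.5, 0.25)


def tempo_deviation(intervals, bpm):
    """Sum of deviations between actual and nearest standard durations."""
    beat = 60.0 / bpm                     # seconds per quarter-note beat
    total = 0.0
    for seconds in intervals:
        total += min(abs(seconds - beats * beat) for beats in NOTE_BEATS)
    return total


def estimate_tempo(intervals, bpm_range=range(30, 241)):
    """Return the tempo (quarter-note beats per minute) with the least deviation."""
    return min(bpm_range, key=lambda bpm: tempo_deviation(intervals, bpm))


# Onset intervals in seconds: one note of each duration type at 120 bpm.
intervals = [2.0, 1.0, 0.5, 0.25, 0.125]
tempo = estimate_tempo(intervals)
```

When the intervals do not span the full range of duration types, several tempos related by factors of two can fit equally well; `min` then returns the slowest of them, which is a tie-breaking choice of this sketch rather than something the text specifies.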

The principle of mode estimation is, first, to have as few altered notes (notes outside the mode's scale) as possible; next, to make the proportion of the five common pentatonic tones (do, re, mi, so, la) in the mode as large as possible; and finally, to have as few sharps or flats in the key signature as possible. The method is as follows:

1) For each of the 12 major/minor keys (since every major key is equivalent to a relative minor key, the result is simply reported as a major key), determine the number n of altered notes occurring under that mode, the number m of pentatonic tones, and the number d of sharps or flats in the mode's key signature (a positive number denotes sharps, a negative number flats). 2) Select the modes with the smallest number of altered notes; if two or more modes qualify, continue screening. 3) Among those, select the modes with the largest number of pentatonic tones; if two or more modes still qualify, continue screening. 4) Among those, select the mode with the fewest sharps or flats; if two modes with the same number of sharps or flats remain, select the sharp key.
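The four screening stages above amount to a lexicographic comparison, which can be sketched as follows. The representation of notes as pitch classes (0 = C through 11 = B), the choice of F sharp major over G flat major for tonic 6, and the function name `estimate_key` are assumptions made for illustration.

```python
MAJOR = {0, 2, 4, 5, 7, 9, 11}       # scale degrees of a major key, in semitones
PENTATONIC = {0, 2, 4, 7, 9}         # do, re, mi, so, la
# Key-signature count d per tonic pitch class: positive = sharps, negative = flats.
SIGNATURE = [0, -5, 2, -3, 4, -1, 6, 1, -4, 3, -2, 5]


def estimate_key(pitch_classes):
    """Return the tonic pitch class of the best-fitting major key."""
    def rank(tonic):
        degrees = [(pc - tonic) % 12 for pc in pitch_classes]
        n = sum(deg not in MAJOR for deg in degrees)    # altered notes
        m = sum(deg in PENTATONIC for deg in degrees)   # pentatonic tones
        d = SIGNATURE[tonic]
        # Fewest altered notes, then most pentatonic tones, then fewest
        # sharps or flats, then prefer the sharp key on a tie.
        return (n, -m, abs(d), -d)
    return min(range(12), key=rank)


# C D E G A C E G fits C, G and F major without altered notes;
# C major wins because every note is one of its pentatonic tones.
key = estimate_key([0, 2, 4, 7, 9, 0, 4, 7])
```

Encoding the stages as a tuple keeps the priority order explicit: a later criterion only matters when all earlier ones are tied.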

The principle of note type estimation is to record note durations faithfully while keeping the score tidy, with as few notes crossing bar lines as possible. The method is as follows:

1) The duration types are limited to whole note, dotted half note, half note, dotted quarter note, quarter note, dotted eighth note, eighth note and sixteenth note. A true duration range is defined for each type, and every note is assigned to one of these types. 2) Following the note order, group an appropriate number of notes into each measure. If a note's duration exceeds a whole note, change it to the duration type that reaches the end of the measure and fill the remaining duration with rests. If a note crosses a bar line and ends within the first eighth-note beat of the next measure, shorten the note so that it ends at the end of the previous measure. If a note ends within the last eighth-note beat of a measure and the next note's duration is greater than or equal to a quarter note, let the next note begin directly at the start of the next measure.
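Item 1) of the note type method can be sketched as a nearest-match lookup over the eight duration types. The text says each type has a defined duration range but does not give the ranges, so this sketch assumes the simplest choice, nearest standard duration; the measure-grouping rules of item 2) are not covered. The function name is hypothetical.

```python
# The eight duration types, in quarter-note beats.
DURATION_TYPES = [
    ("whole", 4.0), ("dotted half", 3.0), ("half", 2.0),
    ("dotted quarter", 1.5), ("quarter", 1.0),
    ("dotted eighth", 0.75), ("eighth", 0.5), ("sixteenth", 0.25),
]


def classify_duration(seconds, bpm):
    """Return the name of the duration type nearest to the actual duration."""
    beats = seconds * bpm / 60.0          # actual duration in quarter-note beats
    return min(DURATION_TYPES, key=lambda t: abs(beats - t[1]))[0]


# 0.75 s at 120 quarter-note beats per minute is 1.5 beats.
name = classify_duration(0.75, 120)
```

Adding the dotted types matters here: with only the five plain types used for tempo estimation, a 1.5-beat note would have to be forced to a quarter or half note.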

With the above steps, the pitches can be recognized, the tempo, mode and note information of the score can be estimated, and the score can be generated.

Compared with prior art, the invention has the following advantages:

(1) The invention uses pitch recognition with a new spectrum analysis method that meets the requirements of polyphonic recognition and can be used in mobile phone software and embedded devices.

(2) The invention provides a technique for working backward from pitch recognition results to the original score, yielding a complete method that automatically recognizes audio music and produces a score. The technique can be applied to scenarios such as automatic score generation for composition and checking of instrument practice.

(3) The invention can analyze both real-time audio streams and complete audio files. It is simple to compute and implement, its results are efficient and reliable, and it is compatible with a variety of instruments; it has prominent substantive features and represents notable progress.

Specific embodiment

The invention is further described below with reference to an embodiment. Embodiments of the invention include, but are not limited to, the following embodiment.

Embodiment

The method for automatically recognizing music and generating a score can analyze both real-time audio streams and complete audio files. This embodiment is explained using real-time audio stream analysis. The sample rate is 44100 Hz, and samples are analyzed in groups of 2048 sample points, so each sample analysis covers about 0.04644 seconds (2048 sample points × the per-sample interval, where the per-sample interval = 1 second ÷ sample rate).
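The timing figures quoted in this embodiment follow directly from the sample rate and the group size, as the short calculation below shows. The variable names are chosen for illustration only.

```python
SAMPLE_RATE = 44100     # Hz
FRAME_SIZE = 2048       # sample points per sample analysis

frame_seconds = FRAME_SIZE / SAMPLE_RATE             # duration of one analysis
noise_frames = int(0.5 / frame_seconds)              # whole analyses in 0.5 s of noise
check_window = 3 * frame_seconds                     # 3 analyses for the misjudgment check
frequency_accuracy = SAMPLE_RATE / 2 / FRAME_SIZE    # L, as defined in the embodiment

print(round(frame_seconds, 5), noise_frames,
      round(check_window, 2), round(frequency_accuracy, 2))
```

These values correspond to the roughly 0.04644-second analysis interval, the roughly 10 noise analyses, the roughly 0.14-second misjudgment window and the frequency accuracy L of about 10.77 used in the embodiment.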

(1) Before recognition, the environmental noise is first sampled for 0.5 seconds, giving about 10 sample analyses. The specific method is as follows:

Each sample analysis yields the spectral energy distribution for that sampling period (the X axis is frequency and the Y axis is energy). From these, the mean u(x) and the standard deviation over the 10 samples can be calculated [formulas not reproduced in the source text], where N = 10, the per-sample term is the energy value at frequency x in the i-th sample analysis, and u(x) is the mean of the energy values at frequency x over the 10 sample analyses. The distribution of the sounding energy threshold TTL can then be calculated, taking P = 3 and Q = 1.

(2) Real-time music recognition then begins, checking in real time whether there is a sounding trend. The specific method is as follows:

In each subsequent sample analysis, the spectral area S is calculated for its spectrum [formula not reproduced in the source text], where M is the number of sample points in each sample analysis, i.e. M = 2048, and L is the frequency accuracy of the sample analysis: frequency accuracy = sample rate ÷ 2 ÷ number of sample points, so L ≈ 10.77. The spectral area S of the real-time spectrum is compared with the spectral area of the sounding energy threshold; if the comparison condition is met [condition not reproduced in the source text], a sounding trend is considered to exist at this moment. Attl is an adjustable empirical value.

(3) Each pitch value that might be sounding is then checked. Note that it is not necessary to check every standard pitch value: if the sounding instrument is known to be unable to produce certain pitch values, those values need not be checked. The specific method is as follows:

In the confidence calculation [formula not reproduced in the source text], K is the maximum harmonic multiple, generally K = 10, and the per-harmonic term is the interval energy peak found near that harmonic. The interval of a harmonic is generally defined in terms of the fundamental frequency and p [interval bounds not reproduced in the source text], where p is the percentage by which the harmonic is allowed to deviate from the fundamental; in general, the higher the harmonic multiple, the further the harmonic shifts to the right. Whether each potential pitch is sounding is then judged by two tests: the current confidence value L must be greater than a certain multiple of the previous frame's confidence value, and the previous frame's confidence value must lie within a certain proportion of the mean confidence over the short period before it (the previous frame's confidence was steady or falling). If these conditions are met, the pitch is considered to be sounding at this moment. The parameters of these tests are empirical values [values not reproduced in the source text], one of which is chosen according to the number of potential pitches; for example, if the sounding instrument is an 88-key piano, the number of potential pitches is 88. The judgment is executed in a loop until all sounding pitches that satisfy the conditions have been found.

(4) If a pitch is found to be sounding, the 3 sample analyses following the onset are used to check for misjudgment, i.e. the misjudgment check window is about 0.14 seconds. The judgment condition reduces to a discrete form [formula not reproduced in the source text] with i = 1, 2, 3, for which the values 0.5, 0.6 and 0.7 are taken respectively.

(5) Finally, from the recognized sounding-pitch data and sounding-time data, estimate the tempo, mode and note types of the score and generate the score. The set of valid note types to be estimated can be expanded or reduced as needed.

The principle of tempo estimation is to make the actual duration of each note as close as possible to its estimated note duration. The method is as follows:

1) Preset the tempo range; the tempo of a typical piece ranges from 30 to 240 quarter-note beats per minute;

2) For each tempo value, estimate the duration type of each note from the time interval between pitch onsets. The duration types are limited to whole note, half note, quarter note, eighth note and sixteenth note, and a reasonable time span is defined for each type; if an interval is longer than a whole note, the excess is filled with whole rests. In this way every note can be assigned to one of the above duration types;

3) Calculate the deviation between each note's actual duration and the standard duration of its type at that tempo [formula not reproduced in the source text], defined in terms of the note's actual duration and the standard duration;

4) Compare the sums of the deviation values across all tempos and take the tempo with the smallest sum as the estimated tempo.

The principle of mode estimation is, first, to have as few altered notes (notes outside the mode's scale) as possible; next, to make the proportion of the five common pentatonic tones (do, re, mi, so, la) in the mode as large as possible; and finally, to have as few sharps or flats in the key signature as possible. The method is as follows:

1) For each of the 12 major/minor keys (since every major key is equivalent to a relative minor key, the result is simply reported as a major key), determine the number n of altered notes occurring under that mode, the number m of pentatonic tones, and the number d of sharps or flats in the mode's key signature (a positive number denotes sharps, a negative number flats);

2) Select the modes with the smallest number of altered notes; if two or more modes qualify, continue screening;

3) Among those, select the modes with the largest number of pentatonic tones; if two or more modes still qualify, continue screening;

4) Among those, select the mode with the fewest sharps or flats; if two modes with the same number of sharps or flats remain, select the sharp key.

The principle of note type estimation is to record note durations faithfully while keeping the score tidy, with as few notes crossing bar lines as possible. The method is as follows:

1) The duration types are limited to whole note, dotted half note, half note, dotted quarter note, quarter note, dotted eighth note, eighth note and sixteenth note. A true duration range is defined for each type, and every note is assigned to one of these types;

2) Following the note order, group an appropriate number of notes into each measure. If a note's duration exceeds a whole note, change it to the duration type that reaches the end of the measure and fill the remaining duration with rests. If a note crosses a bar line and ends within the first eighth-note beat of the next measure, shorten the note so that it ends at the end of the previous measure. If a note ends within the last eighth-note beat of a measure and the next note's duration is greater than or equal to a quarter note, let the next note begin directly at the start of the next measure.

The invention can be implemented well according to the above embodiment. It is worth noting that, on the premise of the above design, any technical solution that solves the same technical problem with only non-substantive changes or refinements to the invention is in essence the same as the invention and therefore also falls within its scope of protection.

Claims

1. A method for automatically recognizing music and generating a score, characterized by comprising the following steps:

(1) recognizing the audio by tracking changes in the overall spectrum and checking in real time whether there is a sounding trend;

(2) tracking the spectral change of each pitch and checking in real time which pitches are sounding;

(3) continuing to track the spectral change of each sounding pitch and checking whether the earlier sounding decision was a misjudgment;

(4) estimating the tempo, mode and note types of the score from the sounding-pitch data and sounding-time data obtained in the preceding steps, and generating the score.

2. The method for automatically recognizing music and generating a score according to claim 1, characterized in that, before step (1), a spectrum analysis of the environmental noise is also performed, specifically:

(L1) with energy values in decibels, sampling the environmental noise for a period of time, and recording the average spectral energy distribution A of the noise over that period and the standard deviation V of each frequency band's energy over time;

(L2) setting the sounding energy threshold TTL of each frequency band from the average spectral energy distribution A and the standard deviation V, where TTL = A × P + V × Q and P and Q are fixed values.

3. The method for automatically recognizing music and generating a score according to claim 2, characterized in that, in step (1), the sounding trend is checked as follows:

(11) setting the effective sounding frequency range to 20 Hz to 20000 Hz;

(12) calculating the area Area by which the current spectrum exceeds the sounding energy threshold within this range;

(13) if the area Area is greater than a preset area threshold Attl, considering the audio to show a sounding trend at this moment; otherwise, considering the audio to be still silent.

4. The method for automatically recognizing music and generating a score according to claim 3, characterized in that, in step (2), pitch sounding is checked as follows:

(21) for each potentially sounding pitch, analyzing whether an interval energy peak exists at the pitch's fundamental frequency and at each of its harmonics, a peak being included in the calculation only if it exceeds the sounding energy threshold;

(22) calculating, from the presence and size of the peaks, the confidence L that each pitch is sounding;

(23) checking whether the pitch with the greatest confidence meets the conditions and, if it does, considering the pitch to be sounding at this moment and eliminating the influence of that pitch's fundamental and harmonic peaks on other pitches;

(24) calculating the confidence L of the other pitches, testing the above conditions again on the remaining pitch with the greatest confidence, and continuing to search for sounding pitches until the conditions can no longer be met.

5. The method for automatically recognizing music and generating a score according to claim 4, characterized in that, in step (23), the conditions that the pitch with the greatest confidence must satisfy simultaneously are:

A. the pitch's confidence value L has just changed from a steady or falling state to a sharply rising state: the current confidence value L is greater than a certain multiple of the previous frame's confidence value, and the previous frame's confidence value lies within a certain proportion of the mean confidence over the preceding period, i.e. the previous frame's confidence was steady or falling;

B. the ratio of the maximum confidence to the sum of the confidence values of all pitches is greater than a threshold.

6. The method for automatically recognizing music and generating a score according to claim 5, characterized in that, in step (3), misjudgment is checked as follows:

(31) setting a time length t as the window for checking misjudgment;

(32) if the confidence value of the sounding pitch decays too quickly within the time t, considering the pitch sounding to be a misjudgment.

7. The method for automatically recognizing music and generating a score according to claim 6, characterized in that, in step (4), the tempo of the score is estimated as follows:

(411) presetting the tempo range as 30 to 240 quarter-note beats per minute;

(412) for each tempo value, estimating the duration type of each note from the time interval between pitch onsets, the duration types being limited to whole note, half note, quarter note, eighth note and sixteenth note, with a reasonable time span defined for each type; if an interval is longer than a whole note, the excess is filled with whole rests, so that every note can be assigned to one of the above duration types;

(413) calculating the deviation between each note's actual duration and the standard duration of its type at that tempo;

(414) comparing the sums of the deviation values across all tempos and taking the tempo with the smallest sum as the estimated tempo;

the mode of the score is estimated as follows:

(421) for each of the 12 major/minor keys, determining the number n of altered notes occurring under that mode, the number m of pentatonic tones, and the number d of sharps or flats in the mode's key signature, a positive number denoting sharps and a negative number flats;

(422) selecting the modes with the smallest number of altered notes and, if two or more modes qualify, continuing to screen;

(423) among those, selecting the modes with the largest number of pentatonic tones and, if two or more modes still qualify, continuing to screen;

(424) among those, selecting the mode with the fewest sharps or flats and, if two modes with the same number of sharps or flats remain, selecting the sharp key;

the note types of the score are estimated as follows:

(431) limiting the duration types to whole note, dotted half note, half note, dotted quarter note, quarter note, dotted eighth note, eighth note and sixteenth note, defining a true duration range for each type, and assigning every note to one of these types;

(432) following the note order, grouping an appropriate number of notes into each measure; if a note's duration exceeds a whole note, changing it to the duration type that reaches the end of the measure and filling the remaining duration with rests; if a note crosses a bar line and ends within the first eighth-note beat of the next measure, shortening the note so that it ends at the end of the previous measure; if a note ends within the last eighth-note beat of a measure and the next note's duration is greater than or equal to a quarter note, letting the next note begin directly at the start of the next measure.