CN105227763B - Real-time musical instrument audio segmentation method implemented on a smart mobile device

Info

Publication number: CN105227763B
Application number: CN201510549631.3A
Authority: CN
Inventors: 冷娇娇; 赵彤洲; 方晖; 李翔; 李碧; 翟畅
Original assignee: Wuhan Institute of Technology
Current assignee: Wuhan Institute of Technology
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2018-03-20
Anticipated expiration: 2035-08-31
Also published as: CN105227763A

Abstract

The invention discloses a real-time musical instrument audio segmentation method implemented on a smart mobile device. The method performs recording and segmentation simultaneously on the mobile device. To achieve real-time behavior, after fast preprocessing the method computes the short-time energy of the signal in fixed-length windows and looks for abrupt changes in the average energy to determine whether a true jump point exists, which makes it possible to process the signal while it is being acquired. For musical instrument audio signals, the invention can segment the acquired audio signal while the user is playing the instrument and display the result quickly, preparing the user for deeper processing such as audio recognition, obtaining semantic structured information, and audio retrieval.

Description

Real-time musical instrument audio segmentation method implemented on a smart mobile device

Technical field

The present invention relates to a real-time musical instrument audio segmentation method implemented on a smart mobile device.

Background art

Audio segmentation is a necessary step for audio recognition, obtaining semantic structured information, and audio retrieval. Traditional audio segmentation methods start segmenting only after the entire audio has been obtained, that is, they do not perform recording and segmentation simultaneously. Conventional audio segmentation is computed on a PC; because a PC is not portable, this limits the use and adoption of applications on smart mobile communication devices that need audio segmentation.

Summary of the invention

The technical problem to be solved by the present invention is to address the defects of the prior art by providing a real-time musical instrument audio segmentation method implemented on a smart mobile device.

The technical solution adopted by the present invention is a real-time musical instrument audio segmentation method implemented on a smart mobile device, comprising the following steps:

1) Acquire audio data: obtain the audio data of the musical instrument in real time through the recording device of the mobile communication device;

2) Front-end silence removal: remove the silent data at the beginning of the acquired audio data;

3) Process the audio data in real time after the silent data has been removed, specifically including:

3.1) Pre-emphasis and digital filtering: apply pre-emphasis and digital filtering to the audio signal after front-end silence removal;

3.2) Framing: use overlapping framing so that consecutive frames transition smoothly; if the frame length is wlen and the step is inc, the overlap between two adjacent frames is wlen - inc;

3.3) Compute short-time energy: for each frame after framing, compute the short-time energy $E = \sum_{m} x(m)^2$, where the sum runs over the n samples of the frame, n is the window length, and x(m) is the signal sequence;

3.4) Background sound removal: if the energy of a frame is less than 0.1 times the maximum energy of the audio signal, the frame is background sound and is deleted;

Here the maximum energy is that of the audio signal received so far, and it is updated in real time;

3.5) Find segmentation points: compare the average energy H of each group of 3 adjacent frames; if there is a large gap, an abrupt change point is considered to exist, and the abrupt change point is a segmentation point, that is, the condition H₁ > H₂ × Th is satisfied, where Th is the threshold coefficient for comparing the 3 adjacent frames before and after;

4) Segment the instrument audio in real time according to the positions of the segmentation points.

According to the above scheme, in step 2) signals at the front end of the original audio signal whose amplitude is less than 10^-4 are treated as silence.
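As an illustration only, the front-end silence removal of step 2) could be sketched in Java as follows. The class and method names are hypothetical, and the sketch assumes samples normalized to the range [-1, 1] so that the 10^-4 threshold applies directly; the text does not state the sample scale.

```java
import java.util.Arrays;

/** Hypothetical helper: drops the leading samples whose magnitude is below a threshold. */
final class SilenceTrimmer {
    /** Threshold taken from the text; assumes samples are normalized to [-1, 1]. */
    static final double SILENCE_THRESHOLD = 1e-4;

    /** Returns a copy of the signal starting at the first sample that is not silent. */
    static double[] trimLeadingSilence(double[] samples) {
        int start = 0;
        while (start < samples.length && Math.abs(samples[start]) < SILENCE_THRESHOLD) {
            start++;
        }
        // Only the front end is trimmed; quiet samples later in the signal are kept.
        return Arrays.copyOfRange(samples, start, samples.length);
    }
}
```

Only the beginning of the recording is trimmed here; quiet passages later in the signal are left for the background sound removal of step 3.4).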

According to the above scheme, in step 3.1) pre-emphasis is applied to the samples at 6 dB/oct, and the digital filter uses the formula y(n) = x(n) - a*x(n-1), where the parameter a takes a value between 0.9 and 1, x(n) is the sampled signal before filtering, and y(n) is the filtered signal.
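A minimal Java sketch of the filter y(n) = x(n) - a*x(n-1) follows. The class name is hypothetical, and treating the sample before the first one as zero is an assumption that the text does not specify.

```java
/** Hypothetical helper implementing the first-order pre-emphasis filter y(n) = x(n) - a*x(n-1). */
final class PreEmphasis {
    /**
     * @param x sampled signal before filtering
     * @param a pre-emphasis coefficient; the text gives a range of 0.9 to 1
     * @return the filtered signal y, same length as x
     */
    static double[] apply(double[] x, double a) {
        double[] y = new double[x.length];
        if (x.length == 0) {
            return y;
        }
        y[0] = x[0]; // assumption: x(-1) is taken as 0
        for (int n = 1; n < x.length; n++) {
            y[n] = x[n] - a * x[n - 1];
        }
        return y;
    }
}
```

A constant input is almost cancelled by this filter when a is close to 1, which is the low-frequency suppression the pre-emphasis step relies on.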

According to the above scheme, in step 3.2) the framing strategy is as follows: the frame length is set to L = m × 256, where m is any integer from 1 to 16, and the frame step is 0.3 to 0.7 times the frame length L.
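The framing strategy can be sketched in Java roughly as below. The class name is hypothetical, and two details are assumptions not taken from the text: the step is rounded to the nearest whole sample, and a trailing partial frame is dropped.

```java
import java.util.Arrays;

/** Hypothetical helper: splits a signal into overlapping frames of length m * 256. */
final class Framer {
    /**
     * @param signal pre-processed samples
     * @param m      frame length coefficient, an integer from 1 to 16
     * @param c      frame step coefficient, from 0.3 to 0.7 of the frame length
     */
    static double[][] frame(double[] signal, int m, double c) {
        if (m < 1 || m > 16) {
            throw new IllegalArgumentException("m must be in 1..16");
        }
        if (c < 0.3 || c > 0.7) {
            throw new IllegalArgumentException("c must be in 0.3..0.7");
        }
        int wlen = m * 256;                    // frame length L
        int inc = (int) Math.round(c * wlen);  // frame step; overlap is wlen - inc
        if (signal.length < wlen) {
            return new double[0][];
        }
        int count = (signal.length - wlen) / inc + 1; // assumption: partial last frame is dropped
        double[][] frames = new double[count][];
        for (int i = 0; i < count; i++) {
            frames[i] = Arrays.copyOfRange(signal, i * inc, i * inc + wlen);
        }
        return frames;
    }
}
```

With m = 1 and c = 0.5, for example, each frame holds 256 samples and shares 128 of them with the next frame.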

According to the above scheme, in step 3.5) Th takes a value between 3 and 7.

The beneficial effects of the present invention are as follows: the method targets real-time musical instrument audio segmentation on mobile terminals, is suited to on-device processing, is fast and consumes few resources, and can segment the audio source while it is being recorded, which makes it well suited to real-time audio segmentation applications.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments. In the drawings:

Fig. 1 is a flow chart of the method of an embodiment of the present invention;

Fig. 2 is a flow chart of the short-time energy computation in an embodiment of the present invention;

Fig. 3 is a flow chart of the segmentation point search in an embodiment of the present invention.

Detailed description of the embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

As shown in Fig. 1, a real-time musical instrument audio segmentation method implemented on a smart mobile device comprises the following steps:

1) Acquire audio data: the audio data of the instrument performance is obtained in real time through the recording device of the mobile communication device. The sampling frequency is 44.1 kHz or 22.1 kHz, and the audio file format is mp3.

2) Front-end silence removal: remove the silent data at the beginning of the acquired audio data. During sampling, preparing to record takes some time, so the front end of the sampled data contains a period with no instrument sound, namely the preparation stage before playing. The background sound acquired in this period, whose amplitude is close to 0, is the silent data. This silent data should be removed, since removing the redundancy improves computation speed. The system treats signals at the front end of the original audio whose amplitude is less than 10^-4 as silence. In the user environment the invention requires, such silent data can be regarded as useless interference and should all be deleted.

3.1) Pre-emphasis and digital filtering: apply digital filtering to the audio signal after front-end silence removal. Because the audio signal is attenuated during transmission, pre-emphasis is applied to the samples at 6 dB/oct so that the output levels do not deviate greatly from one another, and low-frequency interference is also removed. The formula is y(n) = x(n) - a*x(n-1), where the parameter a takes a value between 0.9 and 1.

3.2) Framing: the properties of audio samples change fairly smoothly over a short time, and the audio features extracted within such a period of smooth change remain essentially stable. The invention therefore uses overlapping framing so that consecutive frames transition smoothly and continuity is preserved. The frame length is set to L = m × 256, where m is any integer from 1 to 16, and the frame step is 0.3 to 0.7 times the frame length L, so every two adjacent frames share 70% to 30% of their data. This reduces the miss rate and false detection rate for segmentation points.

3.3) Compute short-time energy: for each frame after framing, compute the short-time energy $E = \sum_{m} x(m)^2$, where the sum runs over the n samples of the frame and n is the window length.

3.4) Background sound removal: the energy of background sound is small compared with that of the music being played. In a stable environment, a frame whose energy is less than 0.1 times the maximum energy of the audio signal is considered background sound, and this background sound is deleted to improve computation speed.

3.5) Find segmentation points: compare the average energy H of each group of 3 adjacent frames; if there is a large gap, an abrupt change point (segmentation point) is considered to exist, that is, the condition H₁ > H₂ × Th is satisfied, where Th is the threshold coefficient for comparing the 3 adjacent frames before and after, generally taken between 3 and 7.
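Steps 3.3) to 3.5) can be sketched together in Java as below. This is one possible reading, not the patented implementation: the class and method names are hypothetical, and the sketch assumes that H₂ is the mean energy of the 3 frames before a candidate frame, that H₁ is the mean energy of the 3 frames starting at the candidate, that the maximum energy is a running maximum over the frames seen so far, and that background frames are skipped rather than physically removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for short-time energy and segmentation point search. */
final class CutPointFinder {
    /** Short-time energy of one frame: the sum of squares of its samples. */
    static double shortTimeEnergy(double[] frame) {
        double e = 0.0;
        for (double v : frame) {
            e += v * v;
        }
        return e;
    }

    /**
     * @param energy per-frame short-time energy
     * @param th     threshold coefficient Th (the text suggests 3 to 7)
     * @param bgTh   background coefficient (the text suggests 0.1)
     * @return indices of frames where the averaged energy jumps
     */
    static List<Integer> findCutPoints(double[] energy, double th, double bgTh) {
        List<Integer> pos = new ArrayList<>();
        double maxEnergy = 0.0; // running maximum, updated as frames arrive
        for (int i = 0; i < energy.length; i++) {
            maxEnergy = Math.max(maxEnergy, energy[i]);
            if (i < 3 || i + 3 > energy.length) {
                continue; // need 3 frames before and 3 frames from i onward
            }
            if (energy[i] < bgTh * maxEnergy) {
                continue; // background frame: no segmentation point is searched here
            }
            double before = (energy[i - 3] + energy[i - 2] + energy[i - 1]) / 3.0; // assumed H2
            double after = (energy[i] + energy[i + 1] + energy[i + 2]) / 3.0;      // assumed H1
            if (after > before * th) {
                pos.add(i);
            }
        }
        return pos;
    }
}
```

For the energy sequence 1, 1, 1, 10, 10, 10, 10, 10, 10 with Th = 3, only frame 3 satisfies the condition, because from frame 4 onward the preceding average already contains the louder frames.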

The invention can be implemented in Java, and the program is divided mainly into two modules, recording and segmentation. To ensure that audio signals are processed in real time on the mobile communication device, the design starts two background threads, one for recording and one for segmentation.
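The two-thread arrangement can be illustrated with a producer and consumer connected by a queue. The sketch below uses only java.util.concurrent; the class name is hypothetical, the recorder thread replays buffers passed in by the caller instead of reading from a real recording API, and the segmenter merely counts samples where the actual processing would run.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one thread records, another segments, linked by a queue. */
final class RecordAndSegment {
    private static final short[] END = new short[0]; // sentinel marking the end of recording

    /** Returns the number of samples the segmenter thread received. */
    static long run(List<short[]> capturedBuffers) {
        BlockingQueue<short[]> queue = new LinkedBlockingQueue<>();
        AtomicLong processed = new AtomicLong();

        // Recording thread: on a device this would read full buffers from the microphone.
        Thread recorder = new Thread(() -> {
            for (short[] buf : capturedBuffers) {
                queue.add(buf);
            }
            queue.add(END);
        }, "recorder");

        // Segmentation thread: takes each buffer as soon as it is available.
        Thread segmenter = new Thread(() -> {
            try {
                for (short[] buf = queue.take(); buf != END; buf = queue.take()) {
                    processed.addAndGet(buf.length); // placeholder for filtering, framing, search
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "segmenter");

        recorder.start();
        segmenter.start();
        try {
            recorder.join();
            segmenter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

Because the segmenter blocks on the queue, it starts working on the first buffer while later buffers are still being recorded, which is the simultaneous operation the method aims for.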

1) The InitNotifications() function serves as the notification thread that initializes recording. When the recording buffer is full, the data is stored in the allData object for subsequent preprocessing, segmentation, recognition, and final storage of the audio data.

2) The delSilence() function deletes the front-end silent data, that is, signals at the front end of the original audio whose amplitude is less than 10^-4.

3) The filter(short[] data, int z) function filters the signal after silence deletion. The filter function is filter(), with a parameter value of 0.97 or 0.98. It is convolved with the signal after silence deletion, and the result is stored in the preProcess() function.

4) When an instrument is played, the amplitude of the audio decays gradually after a note is struck. To prevent high-frequency notes from being lost, the audio signal is framed with overlap. The processing function is frame(int m, double c), where m is the frame length coefficient and c is the frame shift coefficient, usually between 0.3 and 0.7. The framing result is stored in the frameSignal() function.

5) Short-time energy is computed from the framing result frameSignal(). The short-time energy is the sum of squares of the elements of each frame in frameSignal(), that is, $E = \sum_{m} x(m)^2$. The result is passed on by the shortEnergy() function. The detailed flow is shown in Fig. 2.

6) To speed up the search for segmentation points, outliers must be deleted before the search begins. Computing the short-time energy amplifies the noise caused by some outliers, so these outliers need to be rejected. The energy of noise is far smaller than that of the music being played, so after many experiments this method treats frames whose energy is less than 0.1 times the maximum energy of the audio signal as noise and deletes them.

7) After smoothing and denoising, the trend of the audio is clear. This method uses whether the average of each group of 3 adjacent frames changes greatly as the basis for identifying segmentation points: if adjacent frames change markedly, a new note is considered to have appeared. This process is implemented by the function findSegment(double[] shortEnergy, int BgTh), where BgTh is the threshold coefficient for background sound, generally 0.1. If the energy of a frame is less than BgTh times the maximum energy, the frame is considered to contain no segmentation point and is not searched. Frames that satisfy the segmentation point condition are segmentation points, and their positions are recorded in pos[].

The detailed flow is shown in Fig. 3.

8) Real-time instrument audio segmentation is then performed quickly according to the positions of the segmentation points; the set pos[] is the set of segmentation point positions.
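To display or cut the audio at the detected positions, the entries of pos[] have to be mapped back to time. The Java sketch below shows one way to do this under stated assumptions: pos[] is taken to hold frame indices into the overlapping frame sequence, inc is the frame step in samples, and trimmedSamples is the number of samples removed by front-end silence removal. The class, method, and parameter names are all hypothetical.

```java
/** Hypothetical helper: converts segmentation point frame indices to times in seconds. */
final class CutPointTimes {
    /**
     * @param pos            frame indices of the segmentation points (assumed meaning of pos[])
     * @param inc            frame step in samples
     * @param sampleRate     sampling frequency in Hz
     * @param trimmedSamples samples removed at the front end before framing
     */
    static double[] toSeconds(int[] pos, int inc, int sampleRate, int trimmedSamples) {
        double[] seconds = new double[pos.length];
        for (int i = 0; i < pos.length; i++) {
            long sampleOffset = trimmedSamples + (long) pos[i] * inc; // start of frame pos[i]
            seconds[i] = sampleOffset / (double) sampleRate;
        }
        return seconds;
    }
}
```

With a step of 100 samples at 44100 Hz, for instance, frame index 441 starts exactly 1 second after the first retained sample.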

It should be understood that those of ordinary skill in the art can make improvements or variations based on the above description, and all such improvements and variations fall within the scope of protection of the appended claims of the present invention.

Claims

1. A real-time musical instrument audio segmentation method implemented on a smart mobile device, characterized by comprising the following steps:

2) Front-end silence removal: remove the silent data at the beginning of the acquired audio data; the silent data is the data at the front end of the original audio data whose amplitude is less than 10^-4;

3.1) Pre-emphasis and digital filtering: apply pre-emphasis and digital filtering to the audio signal after front-end silence removal;

3.2) Framing: use overlapping framing so that consecutive frames transition smoothly;

3.3) Compute short-time energy: for each frame after framing, compute the short-time energy $E = \sum_{m} x(m)^2$, where the sum runs over the n samples of the frame, n is the window length, and x(m) is the signal sequence;

3.5) Find segmentation points: compare the average energy H of each group of 3 adjacent frames; if there is a large gap, an abrupt change point is considered to exist, and the abrupt change point is a segmentation point, that is, the condition H₁ > H₂ × Th is satisfied, where Th is the threshold coefficient for comparing the 3 adjacent frames before and after;

2. The real-time musical instrument audio segmentation method according to claim 1, characterized in that in step 3.1) pre-emphasis is applied to the samples at 6 dB/oct, and the digital filter uses the formula y(n) = x(n) - a*x(n-1), where the parameter a takes a value between 0.9 and 1, x(n) is the sampled signal before filtering, and y(n) is the filtered signal.

3. The real-time musical instrument audio segmentation method according to claim 1, characterized in that in step 3.2) the framing strategy is as follows: the frame length is set to L = m × 256, where m is any integer from 1 to 16, and the frame step is 0.3 to 0.7 times the frame length L.

4. The real-time musical instrument audio segmentation method according to claim 1, characterized in that in step 3.5) Th takes a value between 3 and 7.