CN107424628A - Method for searching for the speech endpoints of a specific target in a noisy environment - Google Patents
Method for searching for the speech endpoints of a specific target in a noisy environment
- Publication number
- CN107424628A CN107424628A CN201710670308.0A CN201710670308A CN107424628A CN 107424628 A CN107424628 A CN 107424628A CN 201710670308 A CN201710670308 A CN 201710670308A CN 107424628 A CN107424628 A CN 107424628A
- Authority
- CN
- China
- Prior art keywords
- frame
- sentence
- voice
- energy value
- independent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Abstract
The present invention relates to a method for searching for the speech endpoints of a specific target in a noisy environment. The method includes: recording multiple continuous voice slices; computing, from the energy values, the average speech energy of each voice slice and the average energy of all voice slices in the sample speech; selecting from the framed segments those whose energy exceeds both the slice average and the overall average; treating each such segment as a sentence-middle frame and probing outward from it, merging a preceding or following frame with the sentence-middle frame in frame order into an independent sentence when that frame's energy threshold is below the set average speech energy; judging whether the frame length of the independent sentence falls within the set short-sentence range; and taking the independent sentences obtained from each framed segment that are not identified as noise sentences as the punctuation of the audio. Sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and judgments are made from the energy results.
Description
Technical field:
The present invention relates to the field of speech processing, and in particular to a method for searching for the speech endpoints of a specific target in a noisy environment.
Background art:
With the emergence and growing maturity of speech recognition technology, a sound sample of a specific target can be recorded in advance, the target person's unique voice features extracted and stored in a database, and the sound to be verified matched against the features in the database during use, so as to determine the identity of the sought target. However, conditions in a noisy environment differ from those in a quiet one: judgments are often inaccurate, useful speech information cannot be correctly extracted, and accuracy may even fall well below the minimum required by various speech recognition applications, rendering them unusable.
Summary of the invention:
To overcome the drawbacks described above, the present invention provides a method for searching for the speech endpoints of a specific target in a noisy environment. Sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and thereby detects the speech endpoints adaptively.
The technical solution adopted by the present invention is a method for searching for the speech endpoints of a specific target in a noisy environment, comprising:
Step 1: record multiple continuous voice slices as sample speech and obtain multiple framed segments;
Step 2: from the energy value of each framed segment, compute the average speech energy of each voice slice and the average energy of all voice slices in the sample speech;
Step 3: obtain from the framed segments those whose energy exceeds both the slice average energy and the overall average energy; then, treating each such segment as a sentence-middle frame, scan its preceding and following frames; if the energy threshold of a preceding or following frame is below the set average speech energy, merge that frame with the sentence-middle frame in frame order to form an independent sentence;
Step 4: judge whether the frame length of the independent sentence falls within the set short-sentence range; if so, compare the short independent-sentence samples stored in history with the current independent sentence, and if the degree of match is below a set value, identify the independent sentence as a noise sentence;
Step 5: take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
Further preferably, step 3 also includes: if the frame length of the independent sentence exceeds the set independent frame length, compute the spectral entropy ratio of each frame of the independent sentence and split it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
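The preferred refinement above can be sketched as follows. The patent does not define how the "spectral entropy ratio" is computed, so plain per-frame spectral entropy of the power spectrum is used here as a stand-in, and `max_frames` is an illustrative threshold rather than a value from the patent.

```python
import numpy as np

def spectral_entropy(frame):
    """Shannon entropy of the frame's normalized power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / (power.sum() + 1e-12)
    return float(-np.sum(p * np.log(p + 1e-12)))

def split_long_sentence(frames, max_frames):
    """Split an over-long independent sentence at the frame whose spectral
    entropy is lowest (a stand-in for the patent's "spectral entropy ratio").
    The two endpoint frames are excluded so neither half comes out empty."""
    if len(frames) <= max_frames:
        return [frames]
    entropies = [spectral_entropy(f) for f in frames[1:-1]]
    cut = 1 + int(np.argmin(entropies))
    return [frames[:cut], frames[cut:]]

rng = np.random.default_rng(0)
frames = rng.standard_normal((9, 64))
frames[4] = 0.0                       # a near-silent frame inside the sentence
parts = split_long_sentence(frames, max_frames=5)
print(len(parts[0]), len(parts[1]))   # 4 5
```

The near-silent frame has the lowest spectral entropy, so the nine-frame sentence is cut there into halves of four and five frames.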
The beneficial effects of the invention are as follows: sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and detects the speech endpoints adaptively.
Detailed description of the embodiments:
The technical scheme of the present invention is described clearly and completely below. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
The method of the present invention for searching for the speech endpoints of a specific target in a noisy environment includes:
Step 1: record multiple continuous voice slices as sample speech and obtain multiple framed segments.
The present invention may be installed on a server, a personal computer, or a mobile computing device, any of which serves as the computing terminal. First, an audio-video file is uploaded to the server, or opened on the personal computer or mobile computing device. The computing device then extracts the audio stream from the audio-video file and normalizes it to a fixed sampling frequency and single-channel data. Finally, the data is divided into frames using preset framing parameters.
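The framing step above can be sketched minimally as follows, assuming 16 kHz mono input. The 25 ms frame length and 10 ms hop are illustrative framing parameters, not values given in the patent.

```python
import numpy as np

def frame_signal(samples, frame_len=400, hop=160):
    """Split a mono sample array into overlapping frames.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms
    step at 16 kHz: illustrative values, not taken from the patent.
    """
    n = max(0, 1 + (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len] for i in range(n)])

signal = np.random.randn(16000)       # one second of stand-in 16 kHz audio
frames = frame_signal(signal)
print(frames.shape)                   # (98, 400)
```

The resampling to a fixed rate and single channel would normally be done beforehand by an audio decoder; only the framing itself is shown here.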
Step 2: from the energy value of each framed segment, compute the average speech energy of each voice slice and the average energy of all voice slices in the sample speech.
Energy-based detection of speech endpoints first requires computing the average speech energy of each individual voice slice and the average energy of all voice slices (the per-slice averages summed and then divided by the number of voice slices).
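The two averages of step 2 can be computed as follows. Frame energy is taken here as the mean of squared samples, one common convention that the patent does not pin down.

```python
import numpy as np

def slice_energy_stats(slices):
    """Per-slice mean energy and the overall mean across slices.

    Each slice is a 2-D array of frames; the overall value is the sum of
    per-slice averages divided by the slice count, as described above.
    """
    per_slice = [float(np.mean(s ** 2)) for s in slices]
    overall = sum(per_slice) / len(per_slice)
    return per_slice, overall

slices = [np.ones((10, 4)) * 2.0, np.zeros((10, 4))]   # two toy voice slices
per_slice, overall = slice_energy_stats(slices)
print(per_slice, overall)                              # [4.0, 0.0] 2.0
```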
Step 3: obtain from the framed segments those whose energy exceeds both the slice average energy and the overall average energy; then, treating each such segment as a sentence-middle frame, scan its preceding and following frames; if the energy threshold of a preceding or following frame is below the set average speech energy, merge that frame with the sentence-middle frame in frame order to form an independent sentence. If the frame length of the independent sentence exceeds the set independent frame length, compute the spectral entropy ratio of each frame of the independent sentence and split it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
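The neighbour-scanning part of step 3 can be sketched as follows. The patent's wording leaves the stop condition ambiguous, so this sketch adopts one plausible reading: the sentence grows outward from the high-energy seed frame and stops once a neighbour's energy drops below the average-energy threshold. All values are illustrative.

```python
def grow_sentence(energies, seed, threshold):
    """Expand outward from a high-energy seed frame.

    One plausible reading of step 3: neighbouring frames are merged into
    the sentence until their energy falls below the threshold derived
    from the average speech energy.
    """
    lo = seed
    while lo > 0 and energies[lo - 1] >= threshold:
        lo -= 1
    hi = seed
    while hi < len(energies) - 1 and energies[hi + 1] >= threshold:
        hi += 1
    return lo, hi   # inclusive frame range of the independent sentence

energies = [0.1, 0.2, 0.9, 1.4, 1.1, 0.3, 0.1]
print(grow_sentence(energies, seed=3, threshold=0.8))   # (2, 4)
```

Frames 2 through 4 stay above the 0.8 threshold around the seed at index 3, so they are merged into one independent sentence while the low-energy frames on either side are left out.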
Step 4: judge whether the frame length of the independent sentence falls within the set short-sentence range; if so, compare the short independent-sentence samples stored in history with the current independent sentence, and if the degree of match is below a set value, identify the independent sentence as a noise sentence.
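Step 4 reduces to a small predicate once a matching score against the stored short-sentence samples is available. `short_range` and `min_match` are illustrative thresholds assumed here, and `similarity` stands in for whatever matching score the comparison against historical samples produces.

```python
def is_noise_sentence(sentence_len, similarity, short_range=(1, 5), min_match=0.6):
    """Flag a sentence as noise when it is short AND matches the stored
    short-sentence samples poorly, per step 4.

    short_range and min_match are illustrative values, not from the patent.
    """
    lo, hi = short_range
    return lo <= sentence_len <= hi and similarity < min_match
```

A short sentence that matches the history well is kept; a long sentence is never flagged by this rule regardless of its score.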
Step 5: take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
In summary, through the cooperation of the above units, sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and detects the speech endpoints adaptively. At the same time, the background-noise energy value is corrected dynamically so that it reflects the true environment of the terminal device, making the judgment more accurate.
The foregoing is only a preferred embodiment of the present invention. These embodiments are all different implementations under the general idea of the present invention, and the scope of protection of the present invention is not limited to them: any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be defined by the appended claims.
Claims (2)
1. A method for searching for the speech endpoints of a specific target in a noisy environment, characterized by comprising:
Step 1: recording multiple continuous voice slices as sample speech and obtaining multiple framed segments;
Step 2: computing, from the energy value of each framed segment, the average speech energy of each voice slice and the average energy of all voice slices in the sample speech;
Step 3: obtaining from the framed segments those whose energy exceeds both the slice average energy and the overall average energy, then, treating each such segment as a sentence-middle frame, scanning its preceding and following frames, and, if the energy threshold of a preceding or following frame is below the set average speech energy, merging that frame with the sentence-middle frame in frame order to form an independent sentence;
Step 4: judging whether the frame length of the independent sentence falls within the set short-sentence range, and if so, comparing the short independent-sentence samples stored in history with the current independent sentence, and, if the degree of match is below a set value, identifying the independent sentence as a noise sentence;
Step 5: taking the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
2. The method for searching for the speech endpoints of a specific target in a noisy environment according to claim 1, characterized in that step 3 also includes: if the frame length of the independent sentence exceeds the set independent frame length, computing the spectral entropy ratio of each frame of the independent sentence and splitting it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710670308.0A CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710670308.0A CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107424628A true CN107424628A (en) | 2017-12-01 |
Family
ID=60437492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710670308.0A Pending CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107424628A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
CN110232933A (en) * | 2019-06-03 | 2019-09-13 | Oppo广东移动通信有限公司 | Audio-frequency detection, device, storage medium and electronic equipment |
CN112863496A (en) * | 2019-11-27 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103578470A (en) * | 2012-08-09 | 2014-02-12 | 安徽科大讯飞信息科技股份有限公司 | Telephone recording data processing method and system |
CN105070287A (en) * | 2015-07-03 | 2015-11-18 | 广东小天才科技有限公司 | Method and device of detecting voice end points in a self-adaptive noisy environment |
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
2017
- 2017-08-08 CN CN201710670308.0A patent/CN107424628A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103578470A (en) * | 2012-08-09 | 2014-02-12 | 安徽科大讯飞信息科技股份有限公司 | Telephone recording data processing method and system |
CN105070287A (en) * | 2015-07-03 | 2015-11-18 | 广东小天才科技有限公司 | Method and device of detecting voice end points in a self-adaptive noisy environment |
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
CN106157951B (en) * | 2016-08-31 | 2019-04-23 | 北京华科飞扬科技股份公司 | Carry out the automatic method for splitting and system of audio punctuate |
CN106373592B (en) * | 2016-08-31 | 2019-04-23 | 北京华科飞扬科技股份公司 | Audio holds processing method and the system of making pauses in reading unpunctuated ancient writings of making an uproar |
CN110232933A (en) * | 2019-06-03 | 2019-09-13 | Oppo广东移动通信有限公司 | Audio-frequency detection, device, storage medium and electronic equipment |
CN110232933B (en) * | 2019-06-03 | 2022-02-22 | Oppo广东移动通信有限公司 | Audio detection method and device, storage medium and electronic equipment |
CN112863496A (en) * | 2019-11-27 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
CN112863496B (en) * | 2019-11-27 | 2024-04-02 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xiao et al. | Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge. | |
Shang et al. | Score normalization in playback attack detection | |
CN111816218B (en) | Voice endpoint detection method, device, equipment and storage medium | |
CN107424628A (en) | Method for searching for the speech endpoints of a specific target in a noisy environment | |
CN107945790B (en) | Emotion recognition method and emotion recognition system | |
TWI473080B (en) | The use of phonological emotions or excitement to assist in resolving the gender or age of speech signals | |
CN105374352B (en) | Voice activation method and system | |
CN105938716A (en) | Multi-precision-fitting-based automatic detection method for copied sample voice | |
Zhu et al. | Online speaker diarization using adapted i-vector transforms | |
CN111429935B (en) | Voice caller separation method and device | |
CN101887722A (en) | Rapid voiceprint authentication method | |
CN104103272B (en) | Audio recognition method, device and bluetooth earphone | |
US20180308501A1 (en) | Multi speaker attribution using personal grammar detection | |
US7050973B2 (en) | Speaker recognition using dynamic time warp template spotting | |
CN108091340B (en) | Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium | |
US11611581B2 (en) | Methods and devices for detecting a spoofing attack | |
CN113744742B (en) | Role identification method, device and system under dialogue scene | |
Krikke et al. | Detection of nonverbal vocalizations using gaussian mixture models: looking for fillers and laughter in conversational speech | |
Pao et al. | Combining acoustic features for improved emotion recognition in mandarin speech | |
Partila et al. | Fundamental frequency extraction method using central clipping and its importance for the classification of emotional state | |
TWI299855B (en) | Detection method for voice activity endpoint | |
Varela et al. | Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector | |
Harriero et al. | Analysis of the utility of classical and novel speech quality measures for speaker verification | |
Su et al. | A multitask learning framework for speaker change detection with content information from unsupervised speech decomposition | |
Yali et al. | A speech endpoint detection algorithm based on wavelet transforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20171201 |