CN107424628A - Method for searching for the speech endpoints of a specific target in a noisy environment - Google Patents
Method for searching for the speech endpoints of a specific target in a noisy environment
- Publication number
- CN107424628A CN107424628A CN201710670308.0A CN201710670308A CN107424628A CN 107424628 A CN107424628 A CN 107424628A CN 201710670308 A CN201710670308 A CN 201710670308A CN 107424628 A CN107424628 A CN 107424628A
- Authority
- CN
- China
- Prior art keywords
- frame
- sentence
- voice
- energy value
- independent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Abstract
The present invention relates to a method for searching for the speech endpoints of a specific target in a noisy environment. The method includes: recording multiple continuous voice slices; computing, from the energy values, the average speech energy of each voice slice and the average energy of all voice slices in the sample speech; selecting from the framed segments those whose energy exceeds both the slice average and the overall average; treating each such segment as a sentence-middle frame and probing outward from it, merging a preceding or following frame with the sentence-middle frame in frame order into an independent sentence when that frame's energy threshold is below the set average speech energy; judging whether the frame length of the independent sentence falls within the set short-sentence range; and taking the independent sentences obtained from each framed segment that are not identified as noise sentences as the punctuation of the audio. Sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and judgments are made from the energy results.
Description
Technical field:
The present invention relates to the field of speech processing, and in particular to a method for searching for the speech endpoints of a specific target in a noisy environment.
Background art:
With the emergence and growing maturity of speech recognition technology, a sound sample of a specific target can be recorded in advance, the target person's unique voice features extracted and stored in a database, and the sound to be verified matched against the features in the database during use, so as to determine the identity of the sought target. However, conditions in a noisy environment differ from those in a quiet one: judgments are often inaccurate, useful speech information cannot be correctly extracted, and accuracy may even fall well below the minimum required by various speech recognition applications, rendering them unusable.
Summary of the invention:
To overcome the drawbacks described above, the present invention provides a method for searching for the speech endpoints of a specific target in a noisy environment. Sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and thereby detects the speech endpoints adaptively.
The technical solution adopted by the present invention is a method for searching for the speech endpoints of a specific target in a noisy environment, comprising:
Step 1: record multiple continuous voice slices as sample speech and obtain multiple framed segments;
Step 2: from the energy value of each framed segment, compute the average speech energy of each voice slice and the average energy of all voice slices in the sample speech;
Step 3: obtain from the framed segments those whose energy exceeds both the slice average energy and the overall average energy; then, treating each such segment as a sentence-middle frame, scan its preceding and following frames; if the energy threshold of a preceding or following frame is below the set average speech energy, merge that frame with the sentence-middle frame in frame order to form an independent sentence;
Step 4: judge whether the frame length of the independent sentence falls within the set short-sentence range; if so, compare the short independent-sentence samples stored in history with the current independent sentence, and if the degree of match is below a set value, identify the independent sentence as a noise sentence;
Step 5: take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
Further preferably, step 3 also includes: if the frame length of the independent sentence exceeds the set independent frame length, compute the spectral entropy ratio of each frame of the independent sentence and split it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
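The preferred refinement above can be sketched as follows. The patent does not define how the "spectral entropy ratio" is computed, so plain per-frame spectral entropy of the power spectrum is used here as a stand-in, and `max_frames` is an illustrative threshold rather than a value from the patent.

```python
import numpy as np

def spectral_entropy(frame):
    """Shannon entropy of the frame's normalized power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / (power.sum() + 1e-12)
    return float(-np.sum(p * np.log(p + 1e-12)))

def split_long_sentence(frames, max_frames):
    """Split an over-long independent sentence at the frame whose spectral
    entropy is lowest (a stand-in for the patent's "spectral entropy ratio").
    The two endpoint frames are excluded so neither half comes out empty."""
    if len(frames) <= max_frames:
        return [frames]
    entropies = [spectral_entropy(f) for f in frames[1:-1]]
    cut = 1 + int(np.argmin(entropies))
    return [frames[:cut], frames[cut:]]

rng = np.random.default_rng(0)
frames = rng.standard_normal((9, 64))
frames[4] = 0.0                       # a near-silent frame inside the sentence
parts = split_long_sentence(frames, max_frames=5)
print(len(parts[0]), len(parts[1]))   # 4 5
```

The near-silent frame has the lowest spectral entropy, so the nine-frame sentence is cut there into halves of four and five frames.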
The beneficial effects of the invention are as follows: sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and detects the speech endpoints adaptively.
Detailed description of the embodiments:
The technical scheme of the present invention is described clearly and completely below. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
The method of the present invention for searching for the speech endpoints of a specific target in a noisy environment includes:
Step 1: record multiple continuous voice slices as sample speech and obtain multiple framed segments.
The present invention may be installed on a server, a personal computer, or a mobile computing device, any of which serves as the computing terminal. First, an audio-video file is uploaded to the server, or opened on the personal computer or mobile computing device. The computing device then extracts the audio stream from the audio-video file and normalizes it to a fixed sampling frequency and single-channel data. Finally, the data is divided into frames using preset framing parameters.
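The framing step above can be sketched minimally as follows, assuming 16 kHz mono input. The 25 ms frame length and 10 ms hop are illustrative framing parameters, not values given in the patent.

```python
import numpy as np

def frame_signal(samples, frame_len=400, hop=160):
    """Split a mono sample array into overlapping frames.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms
    step at 16 kHz: illustrative values, not taken from the patent.
    """
    n = max(0, 1 + (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len] for i in range(n)])

signal = np.random.randn(16000)       # one second of stand-in 16 kHz audio
frames = frame_signal(signal)
print(frames.shape)                   # (98, 400)
```

The resampling to a fixed rate and single channel would normally be done beforehand by an audio decoder; only the framing itself is shown here.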
Step 2: from the energy value of each framed segment, compute the average speech energy of each voice slice and the average energy of all voice slices in the sample speech.
Energy-based detection of speech endpoints first requires computing the average speech energy of each individual voice slice and the average energy of all voice slices (the per-slice averages summed and then divided by the number of voice slices).
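The two averages of step 2 can be computed as follows. Frame energy is taken here as the mean of squared samples, one common convention that the patent does not pin down.

```python
import numpy as np

def slice_energy_stats(slices):
    """Per-slice mean energy and the overall mean across slices.

    Each slice is a 2-D array of frames; the overall value is the sum of
    per-slice averages divided by the slice count, as described above.
    """
    per_slice = [float(np.mean(s ** 2)) for s in slices]
    overall = sum(per_slice) / len(per_slice)
    return per_slice, overall

slices = [np.ones((10, 4)) * 2.0, np.zeros((10, 4))]   # two toy voice slices
per_slice, overall = slice_energy_stats(slices)
print(per_slice, overall)                              # [4.0, 0.0] 2.0
```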
Step 3: obtain from the framed segments those whose energy exceeds both the slice average energy and the overall average energy; then, treating each such segment as a sentence-middle frame, scan its preceding and following frames; if the energy threshold of a preceding or following frame is below the set average speech energy, merge that frame with the sentence-middle frame in frame order to form an independent sentence. If the frame length of the independent sentence exceeds the set independent frame length, compute the spectral entropy ratio of each frame of the independent sentence and split it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
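The neighbour-scanning part of step 3 can be sketched as follows. The patent's wording leaves the stop condition ambiguous, so this sketch adopts one plausible reading: the sentence grows outward from the high-energy seed frame and stops once a neighbour's energy drops below the average-energy threshold. All values are illustrative.

```python
def grow_sentence(energies, seed, threshold):
    """Expand outward from a high-energy seed frame.

    One plausible reading of step 3: neighbouring frames are merged into
    the sentence until their energy falls below the threshold derived
    from the average speech energy.
    """
    lo = seed
    while lo > 0 and energies[lo - 1] >= threshold:
        lo -= 1
    hi = seed
    while hi < len(energies) - 1 and energies[hi + 1] >= threshold:
        hi += 1
    return lo, hi   # inclusive frame range of the independent sentence

energies = [0.1, 0.2, 0.9, 1.4, 1.1, 0.3, 0.1]
print(grow_sentence(energies, seed=3, threshold=0.8))   # (2, 4)
```

Frames 2 through 4 stay above the 0.8 threshold around the seed at index 3, so they are merged into one independent sentence while the low-energy frames on either side are left out.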
Step 4: judge whether the frame length of the independent sentence falls within the set short-sentence range; if so, compare the short independent-sentence samples stored in history with the current independent sentence, and if the degree of match is below a set value, identify the independent sentence as a noise sentence.
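Step 4 reduces to a small predicate once a matching score against the stored short-sentence samples is available. `short_range` and `min_match` are illustrative thresholds assumed here, and `similarity` stands in for whatever matching score the comparison against historical samples produces.

```python
def is_noise_sentence(sentence_len, similarity, short_range=(1, 5), min_match=0.6):
    """Flag a sentence as noise when it is short AND matches the stored
    short-sentence samples poorly, per step 4.

    short_range and min_match are illustrative values, not from the patent.
    """
    lo, hi = short_range
    return lo <= sentence_len <= hi and similarity < min_match
```

A short sentence that matches the history well is kept; a long sentence is never flagged by this rule regardless of its score.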
Step 5: take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
In summary, through the cooperation of the above units, sound is recorded in the form of voice slices, the initial time slices are sampled and their energy computed, and the start and end of speech are judged from the energy results, so that the method adapts to the different parameter test criteria of noisy and quiet environments and detects the speech endpoints adaptively. At the same time, the background-noise energy value is corrected dynamically so that it reflects the true environment of the terminal device, making the judgment more accurate.
The foregoing is only a preferred embodiment of the present invention. These embodiments are all different implementations under the general idea of the present invention, and the scope of protection of the present invention is not limited to them: any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be defined by the appended claims.
Claims (2)
1. A method for searching for the speech endpoints of a specific target in a noisy environment, characterized by comprising:
Step 1: recording multiple continuous voice slices as sample speech and obtaining multiple framed segments;
Step 2: computing, from the energy value of each framed segment, the average speech energy of each voice slice and the average energy of all voice slices in the sample speech;
Step 3: obtaining from the framed segments those whose energy exceeds both the slice average energy and the overall average energy, then, treating each such segment as a sentence-middle frame, scanning its preceding and following frames, and, if the energy threshold of a preceding or following frame is below the set average speech energy, merging that frame with the sentence-middle frame in frame order to form an independent sentence;
Step 4: judging whether the frame length of the independent sentence falls within the set short-sentence range, and if so, comparing the short independent-sentence samples stored in history with the current independent sentence, and, if the degree of match is below a set value, identifying the independent sentence as a noise sentence;
Step 5: taking the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the punctuation of the audio.
2. The method for searching for the speech endpoints of a specific target in a noisy environment according to claim 1, characterized in that step 3 also includes: if the frame length of the independent sentence exceeds the set independent frame length, computing the spectral entropy ratio of each frame of the independent sentence and splitting it, at the frame with the lowest spectral entropy ratio, into two independent sentences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710670308.0A CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710670308.0A CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107424628A true CN107424628A (en) | 2017-12-01 |
Family
ID=60437492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710670308.0A Pending CN107424628A (en) | 2017-08-08 | 2017-08-08 | Method for searching for the speech endpoints of a specific target in a noisy environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107424628A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
CN110232933A (en) * | 2019-06-03 | 2019-09-13 | Oppo广东移动通信有限公司 | Audio-frequency detection, device, storage medium and electronic equipment |
CN112863496A (en) * | 2019-11-27 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103578470A (en) * | 2012-08-09 | 2014-02-12 | 安徽科大讯飞信息科技股份有限公司 | Telephone recording data processing method and system |
CN105070287A (en) * | 2015-07-03 | 2015-11-18 | 广东小天才科技有限公司 | Method and device of detecting voice end points in a self-adaptive noisy environment |
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
2017
- 2017-08-08 CN CN201710670308.0A patent/CN107424628A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103578470A (en) * | 2012-08-09 | 2014-02-12 | 安徽科大讯飞信息科技股份有限公司 | Telephone recording data processing method and system |
CN105070287A (en) * | 2015-07-03 | 2015-11-18 | 广东小天才科技有限公司 | Method and device of detecting voice end points in a self-adaptive noisy environment |
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106157951A (en) * | 2016-08-31 | 2016-11-23 | 北京华科飞扬科技股份公司 | Carry out automatic method for splitting and the system of audio frequency punctuate |
CN106373592A (en) * | 2016-08-31 | 2017-02-01 | 北京华科飞扬科技股份公司 | Audio noise tolerance punctuation processing method and system |
CN106157951B (en) * | 2016-08-31 | 2019-04-23 | 北京华科飞扬科技股份公司 | Carry out the automatic method for splitting and system of audio punctuate |
CN106373592B (en) * | 2016-08-31 | 2019-04-23 | 北京华科飞扬科技股份公司 | Audio holds processing method and the system of making pauses in reading unpunctuated ancient writings of making an uproar |
CN110232933A (en) * | 2019-06-03 | 2019-09-13 | Oppo广东移动通信有限公司 | Audio-frequency detection, device, storage medium and electronic equipment |
CN110232933B (en) * | 2019-06-03 | 2022-02-22 | Oppo广东移动通信有限公司 | Audio detection method and device, storage medium and electronic equipment |
CN112863496A (en) * | 2019-11-27 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
CN112863496B (en) * | 2019-11-27 | 2024-04-02 | 阿里巴巴集团控股有限公司 | Voice endpoint detection method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xiao et al. | Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge. | |
Shang et al. | Score normalization in playback attack detection | |
CN111816218B (en) | Voice endpoint detection method, device, equipment and storage medium | |
CN107424628A (en) | Method for searching for the speech endpoints of a specific target in a noisy environment | |
CN107945790B (en) | Emotion recognition method and emotion recognition system | |
TWI473080B (en) | The use of phonological emotions or excitement to assist in resolving the gender or age of speech signals | |
CN105374352B (en) | Voice activation method and system | |
CN105938716A (en) | Multi-precision-fitting-based automatic detection method for copied sample voice | |
Zhu et al. | Online speaker diarization using adapted i-vector transforms | |
CN111429935B (en) | Voice caller separation method and device | |
CN101887722A (en) | Rapid voiceprint authentication method | |
CN104103272B (en) | Audio recognition method, device and bluetooth earphone | |
US20180308501A1 (en) | Multi speaker attribution using personal grammar detection | |
US7050973B2 (en) | Speaker recognition using dynamic time warp template spotting | |
CN108091340B (en) | Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium | |
US11611581B2 (en) | Methods and devices for detecting a spoofing attack | |
CN113744742B (en) | Role identification method, device and system under dialogue scene | |
Krikke et al. | Detection of nonverbal vocalizations using gaussian mixture models: looking for fillers and laughter in conversational speech | |
Pao et al. | Combining acoustic features for improved emotion recognition in mandarin speech | |
Partila et al. | Fundamental frequency extraction method using central clipping and its importance for the classification of emotional state | |
TWI299855B (en) | Detection method for voice activity endpoint | |
Varela et al. | Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector | |
Harriero et al. | Analysis of the utility of classical and novel speech quality measures for speaker verification | |
Su et al. | A multitask learning framework for speaker change detection with content information from unsupervised speech decomposition | |
Yali et al. | A speech endpoint detection algorithm based on wavelet transforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20171201 |