CN107424628A - Method for searching for the speech endpoints of a specific target in a noisy environment - Google Patents

Method for searching for the speech endpoints of a specific target in a noisy environment

Info

Publication number
CN107424628A
CN107424628A CN201710670308.0A CN201710670308A CN 107424628 A
Authority
CN
China
Prior art keywords
frame
sentence
voice
energy value
independent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710670308.0A
Other languages
Chinese (zh)
Inventor
王贺
杨兆鹏
李莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN201710670308.0A priority Critical patent/CN107424628A/en
Publication of CN107424628A publication Critical patent/CN107424628A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/84 - Detection of presence or absence of voice signals for discriminating voice from noise
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/87 - Detection of discrete points within a voice signal
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L2025/783 - Detection of presence or absence of voice signals based on threshold decision
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for searching for the speech endpoints of a specific target in a noisy environment. The method includes: recording multiple continuous voice slices; calculating, from the energy values, the average energy value of each voice slice in the sample speech and the average energy value of all voice slices; selecting from the framed segments those whose energy value exceeds both the voice-slice average energy value and the overall average energy value; taking such a segment as the sentence middle frame and scanning outward from it, and if the energy value of a preceding or following frame is below the set speech average energy value, merging that frame with the sentence middle frame in frame-start order to form an independent sentence; judging whether the frame length of the independent sentence falls within the set short-sentence frame-length range; and taking the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the segmentation points of the audio. The sound is recorded as voice slices, the initial time slices are sampled and their energy calculated, and the judgment is made from the energy calculation results.

Description

Method for searching for the speech endpoints of a specific target in a noisy environment
Technical field:
The present invention relates to the field of speech processing, and in particular to a method for searching for the speech endpoints of a specific target in a noisy environment.
Background technology:
With the emergence and growing maturity of speech recognition technology, a sound sample of a specific target can be recorded in advance, the target person's unique voice features extracted and stored in a database, and, during application, the sound to be verified matched against the features in the database to determine the identity of the sought target. However, because conditions in a noisy environment differ from those in a quiet environment, judgments are often inaccurate and the useful speech information cannot be correctly extracted; accuracy can even fall far below the minimum required by various speech recognition applications, making them unusable.
Summary of the invention:
To overcome the above drawbacks, the present invention provides a method for searching for the speech endpoints of a specific target in a noisy environment. The sound is recorded as voice slices, the initial time slices are sampled and their energy calculated, and the beginning and end of the speech are judged from the energy calculation results, so that the method adapts to the different parameter detection criteria of noisy and quiet environments and detects the speech endpoints adaptively to the environment.
The technical solution adopted by the present invention is a method for searching for the speech endpoints of a specific target in a noisy environment, including:
Step 1: Record multiple continuous voice slices as sample speech and obtain multiple framed segments;
Step 2: Calculate, from the energy value of each framed segment, the average energy value of each voice slice in the sample speech and the average energy value of all voice slices;
Step 3: Select from the framed segments those whose energy value exceeds both the voice-slice average energy value and the overall average energy value, then take such a segment as the sentence middle frame and scan its preceding and following frames; if the energy value of a preceding or following frame is below the set speech average energy value, merge that frame with the sentence middle frame in frame-start order to form an independent sentence;
Step 4: Judge whether the frame length of the independent sentence falls within the set short-sentence frame-length range; if so, compare the stored historical short independent-sentence samples with the current independent sentence, and if the matching degree is below the set value, identify the independent sentence as a noise sentence;
Step 5: Take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the segmentation points of the audio.
Further preferably, Step 3 also includes: if the frame length of the independent sentence exceeds the set independent-sentence frame length, calculate the spectral entropy ratio of each frame of the independent sentence, and split the independent sentence into two independent sentences at the frame with the lowest spectral entropy ratio.
The beneficial effects of the invention are as follows: the sound is recorded as voice slices, the initial time slices are sampled and their energy calculated, and the beginning and end of the speech are judged from the energy calculation results, so that the method adapts to the different parameter detection criteria of noisy and quiet environments and detects the speech endpoints adaptively to the environment.
Embodiments:
The technical solution of the present invention is described below clearly and completely. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
The method of the present invention for searching for the speech endpoints of a specific target in a noisy environment includes:
Step 1: Record multiple continuous voice slices as sample speech and obtain multiple framed segments.
The present invention may be installed on a server, a personal computer, or a mobile computing device; the computing terminal referred to here may be any of these. First, an audio/video file is uploaded to the server, or opened on the personal computer or mobile computing device. The computing device then extracts the audio stream from the audio/video file and resamples it uniformly to a fixed sampling frequency and single-channel data. Finally, the data is framed using framing parameters set in advance.
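The framing step above can be sketched as follows. The function name `frame_signal` and the 25 ms / 10 ms frame geometry at 16 kHz are illustrative assumptions; the text only refers to "framing parameters set in advance".

```python
import numpy as np

def frame_signal(samples, frame_len=400, hop=160):
    """Split a mono signal into overlapping fixed-length frames.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms
    step at 16 kHz; these are common choices, not values from the patent.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(samples) - frame_len) // hop
    # Build an index matrix so that each row selects one frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return samples[idx]

# One second of a synthetic 220 Hz tone at 16 kHz.
sig = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
frames = frame_signal(sig)
print(frames.shape)  # (98, 400)
```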
Step 2: Calculate, from the energy value of each framed segment, the average energy value of each voice slice in the sample speech and the average energy value of all voice slices.
To realize energy-based detection of the speech endpoints, the average energy value of each individual voice slice and the average energy value of all voice slices (the sum of the voice-slice averages divided by the number of voice slices) must first be calculated.
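A minimal sketch of this calculation, assuming short-time energy is the sum of squared samples per frame (the text does not fix the energy definition, so that choice and the function names are assumptions):

```python
import numpy as np

def frame_energy(frames):
    # Short-time energy of each frame: sum of squared samples.
    return np.sum(np.asarray(frames, dtype=np.float64) ** 2, axis=1)

def average_energies(slices):
    """Per-slice average energy and the overall average.

    `slices` is a list of 2-D arrays, one per recorded voice slice
    (rows are frames). The overall average is the sum of the per-slice
    averages divided by the number of slices, as the text specifies.
    """
    per_slice = [float(np.mean(frame_energy(s))) for s in slices]
    overall = sum(per_slice) / len(per_slice)
    return per_slice, overall

rng = np.random.default_rng(0)
# Three synthetic voice slices with increasing loudness.
slices = [rng.normal(0.0, amp, size=(10, 400)) for amp in (0.1, 0.5, 1.0)]
per_slice, overall = average_energies(slices)
print(len(per_slice), overall > per_slice[0])
```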
Step 3: Select from the framed segments those whose energy value exceeds both the voice-slice average energy value and the overall average energy value, then take such a segment as the sentence middle frame and scan its preceding and following frames; if the energy value of a preceding or following frame is below the set speech average energy value, merge that frame with the sentence middle frame in frame-start order to form an independent sentence. If the frame length of the independent sentence exceeds the set independent-sentence frame length, calculate the spectral entropy ratio of each frame of the independent sentence and split the independent sentence into two independent sentences at the frame with the lowest spectral entropy ratio.
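One possible reading of Step 3 in code. `grow_sentence` merges neighbouring frames into the sentence until a frame's energy falls below the threshold, and `spectral_entropy` stands in for the per-frame "spectral entropy ratio" used to pick a split point in over-long sentences. The function names, the exact stopping rule, and the substitution of plain spectral entropy for the ratio are all assumptions for illustration.

```python
import numpy as np

def grow_sentence(energies, peak, floor):
    """Expand outward from a high-energy 'sentence middle frame'.

    Preceding and following frames are merged while their energy stays
    at or above `floor` (the set average-energy threshold); the scan
    stops at the first frame below it.
    """
    start = peak
    while start > 0 and energies[start - 1] >= floor:
        start -= 1
    end = peak
    while end < len(energies) - 1 and energies[end + 1] >= floor:
        end += 1
    return start, end  # inclusive frame range of the independent sentence

def spectral_entropy(frame, n_bands=32):
    """Spectral entropy of one frame: low when energy is concentrated
    in a few bands (tonal content), high when it is spread out
    (flat, noise-like content)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    spec = spec[:len(spec) // n_bands * n_bands].reshape(n_bands, -1).sum(axis=1)
    p = spec / (spec.sum() + 1e-12) + 1e-12
    return float(-np.sum(p * np.log(p)))

energies = np.array([0.1, 0.2, 0.9, 1.5, 1.2, 0.8, 0.1, 0.05])
start, end = grow_sentence(energies, peak=int(np.argmax(energies)), floor=0.5)
print(start, end)  # frames 2..5 form the independent sentence
```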
Step 4: Judge whether the frame length of the independent sentence falls within the set short-sentence frame-length range; if so, compare the stored historical short independent-sentence samples with the current independent sentence, and if the matching degree is below the set value, identify the independent sentence as a noise sentence.
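A sketch of the Step 4 comparison. Cosine similarity between feature vectors is used as the "matching degree"; the feature representation, the similarity measure, and the 0.6 threshold are illustrative assumptions, since the text specifies none of them.

```python
import numpy as np

def best_match(features, stored):
    """Best cosine similarity between a candidate short sentence's
    feature vector and the stored historical short-sentence samples."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(cos(features, s) for s in stored)

def classify_short_sentence(features, stored, threshold=0.6):
    # Matching degree below the set value => identified as a noise sentence.
    return "noise" if best_match(features, stored) < threshold else "speech"

# Toy 3-dimensional feature vectors for stored short speech samples.
stored = [np.array([1.0, 0.9, 0.1]), np.array([0.8, 1.0, 0.2])]
speech_like = np.array([0.95, 0.92, 0.15])
noise_like = np.array([0.05, 0.1, 1.0])
print(classify_short_sentence(speech_like, stored),
      classify_short_sentence(noise_like, stored))  # speech noise
```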
Step 5: Take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the segmentation points of the audio.
In summary, with the above units working together, the sound is recorded as voice slices, the initial time slices are sampled and their energy calculated, and the beginning and end of the speech are judged from the energy calculation results, so that the method adapts to the different parameter detection criteria of noisy and quiet environments and detects the speech endpoints adaptively to the environment. Meanwhile, the background noise energy value is dynamically corrected so that it reflects the real environment in which the terminal device is located, making the judgment more accurate.
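The dynamic correction of the background noise energy value is not given a formula in the text; one common realization is an exponential moving average updated only on non-speech frames, sketched here under that assumption (the update rule and the alpha value are not from the patent):

```python
def update_noise_floor(noise_floor, frame_energy, is_speech, alpha=0.95):
    """Exponentially smoothed background-noise energy estimate.

    The estimate is frozen while speech is present and tracks the
    frame energy otherwise; alpha controls how fast it adapts.
    """
    if is_speech:
        return noise_floor
    return alpha * noise_floor + (1 - alpha) * frame_energy

# The estimate decays from a stale value of 1.0 toward the true
# ambient level of 0.5 over 50 quiet frames.
floor = 1.0
for energy in [0.5] * 50:
    floor = update_noise_floor(floor, energy, is_speech=False)
print(round(floor, 3))
```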
The foregoing are only preferred embodiments of the present invention, all based on different implementations under the general idea of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the claims.

Claims (2)

1. A method for searching for the speech endpoints of a specific target in a noisy environment, characterized by including:
Step 1: Record multiple continuous voice slices as sample speech and obtain multiple framed segments;
Step 2: Calculate, from the energy value of each framed segment, the average energy value of each voice slice in the sample speech and the average energy value of all voice slices;
Step 3: Select from the framed segments those whose energy value exceeds both the voice-slice average energy value and the overall average energy value, then take such a segment as the sentence middle frame and scan its preceding and following frames; if the energy value of a preceding or following frame is below the set speech average energy value, merge that frame with the sentence middle frame in frame-start order to form an independent sentence;
Step 4: Judge whether the frame length of the independent sentence falls within the set short-sentence frame-length range; if so, compare the stored historical short independent-sentence samples with the current independent sentence, and if the matching degree is below the set value, identify the independent sentence as a noise sentence;
Step 5: Take the independent sentences obtained from each framed segment of the audio that are not identified as noise sentences as the segmentation points of the audio.
2. The method for searching for the speech endpoints of a specific target in a noisy environment according to claim 1, characterized in that Step 3 further includes: if the frame length of the independent sentence exceeds the set independent-sentence frame length, calculating the spectral entropy ratio of each frame of the independent sentence, and splitting the independent sentence into two independent sentences at the frame with the lowest spectral entropy ratio.
CN201710670308.0A 2017-08-08 2017-08-08 Method for searching for the speech endpoints of a specific target in a noisy environment Pending CN107424628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710670308.0A CN107424628A (en) 2017-08-08 2017-08-08 Method for searching for the speech endpoints of a specific target in a noisy environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710670308.0A CN107424628A (en) 2017-08-08 2017-08-08 Method for searching for the speech endpoints of a specific target in a noisy environment

Publications (1)

Publication Number Publication Date
CN107424628A 2017-12-01

Family

ID=60437492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710670308.0A Pending CN107424628A (en) Method for searching for the speech endpoints of a specific target in a noisy environment

Country Status (1)

Country Link
CN (1) CN107424628A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308653A (en) * 2008-07-17 2008-11-19 安徽科大讯飞信息科技股份有限公司 Endpoint detection method applied to a speech recognition system
CN103578470A (en) * 2012-08-09 2014-02-12 安徽科大讯飞信息科技股份有限公司 Telephone recording data processing method and system
CN105070287A (en) * 2015-07-03 2015-11-18 广东小天才科技有限公司 Method and device for adaptively detecting voice endpoints in a noisy environment
CN106157951A (en) * 2016-08-31 2016-11-23 北京华科飞扬科技股份公司 Automatic splitting method and system for audio punctuation
CN106373592A (en) * 2016-08-31 2017-02-01 北京华科飞扬科技股份公司 Noise-tolerant audio punctuation processing method and system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157951A (en) * 2016-08-31 2016-11-23 北京华科飞扬科技股份公司 Automatic splitting method and system for audio punctuation
CN106373592A (en) * 2016-08-31 2017-02-01 北京华科飞扬科技股份公司 Noise-tolerant audio punctuation processing method and system
CN106157951B (en) * 2016-08-31 2019-04-23 北京华科飞扬科技股份公司 Automatic splitting method and system for audio punctuation
CN106373592B (en) * 2016-08-31 2019-04-23 北京华科飞扬科技股份公司 Noise-tolerant audio punctuation processing method and system
CN110232933A (en) * 2019-06-03 2019-09-13 Oppo广东移动通信有限公司 Audio detection method, device, storage medium and electronic equipment
CN110232933B (en) * 2019-06-03 2022-02-22 Oppo广东移动通信有限公司 Audio detection method and device, storage medium and electronic equipment
CN112863496A (en) * 2019-11-27 2021-05-28 阿里巴巴集团控股有限公司 Voice endpoint detection method and device
CN112863496B (en) * 2019-11-27 2024-04-02 阿里巴巴集团控股有限公司 Voice endpoint detection method and device

Similar Documents

Publication Publication Date Title
Xiao et al. Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge.
Shang et al. Score normalization in playback attack detection
CN111816218B (en) Voice endpoint detection method, device, equipment and storage medium
CN107424628A Method for searching for the speech endpoints of a specific target in a noisy environment
CN107945790B (en) Emotion recognition method and emotion recognition system
TWI473080B (en) The use of phonological emotions or excitement to assist in resolving the gender or age of speech signals
CN105374352B (en) A kind of voice activated method and system
CN105938716A (en) Multi-precision-fitting-based automatic detection method for copied sample voice
Zhu et al. Online speaker diarization using adapted i-vector transforms
CN111429935B (en) Voice caller separation method and device
CN101887722A (en) Rapid voiceprint authentication method
CN104103272B (en) Audio recognition method, device and bluetooth earphone
US20180308501A1 (en) Multi speaker attribution using personal grammar detection
US7050973B2 (en) Speaker recognition using dynamic time warp template spotting
CN108091340B (en) Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium
US11611581B2 (en) Methods and devices for detecting a spoofing attack
CN113744742B (en) Role identification method, device and system under dialogue scene
Krikke et al. Detection of nonverbal vocalizations using gaussian mixture models: looking for fillers and laughter in conversational speech
Pao et al. Combining acoustic features for improved emotion recognition in mandarin speech
Partila et al. Fundamental frequency extraction method using central clipping and its importance for the classification of emotional state
TWI299855B (en) Detection method for voice activity endpoint
Varela et al. Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector
Harriero et al. Analysis of the utility of classical and novel speech quality measures for speaker verification
Su et al. A multitask learning framework for speaker change detection with content information from unsupervised speech decomposition
Yali et al. A speech endpoint detection algorithm based on wavelet transforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171201