CN108538310A - Voice endpoint detection method based on long-time signal power spectrum change - Google Patents
- Publication number
- CN108538310A
- Authority
- CN
- China
- Prior art keywords
- frame
- power spectrum
- long
- spectrum signal
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
A voice endpoint detection method based on long-time signal power spectrum change, comprising: framing and windowing the input signal; computing the power spectrum of the framed and windowed signal; computing the long-time signal power spectrum change value; making a threshold decision using the long-time signal power spectrum change value; updating the threshold adaptively using the threshold decision results of the past 80 frames; and making a voting decision, in which the current target frame is the m-th frame and the long-time signal power spectrum change value L_x(m) is jointly determined by the current frame and the R-1 frames preceding it, so that the target frame takes part in R threshold decisions in total, whose results are labeled D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold decisions indicate that speech is present, the current target frame is judged to be a speech frame, and otherwise a noise frame. The above process is repeated until the input signal ends. The invention can markedly improve detection accuracy in babble and machine gun noise environments.
Description
Technical field
The present invention relates to a voice endpoint detection method, and more particularly to a voice endpoint detection method based on long-time signal power spectrum change.
Background technology
Voice endpoint detection refers to distinguishing speech segments from non-speech segments in a noisy environment. It is a key technology in speech signal processing fields such as speech coding, speech enhancement, and speech recognition.
At present, voice endpoint detection methods fall into two main classes: feature-based methods [1] and methods based on machine learning and pattern recognition. Feature-based methods have been widely studied and applied because they are simple and fast.
1. Voice endpoint detection based on short-time speech features
Early features used for voice endpoint detection mainly include short-time energy and average zero-crossing rate, spectral entropy, and cepstral distance. These methods work well at high signal-to-noise ratios, but their detection performance drops sharply when the signal-to-noise ratio is low. To improve the noise immunity and robustness of the algorithms, researchers have proposed a series of new methods, such as voice endpoint detection based on noise suppression and voice endpoint detection that combines the Fisher linear discriminant with Mel-frequency cepstral coefficients.
2. Voice endpoint detection based on long-time speech features
Most of the above methods rely on short-time features of speech and do not fully consider its long-time variation. To make better use of the long-time characteristics of speech, Ghosh et al. proposed a detection method based on the long-term signal variability (LTSV) feature. This method adapts well to noise and can still effectively separate speech segments from non-speech segments at a very low signal-to-noise ratio (-10 dB). Ma Y et al. proposed voice endpoint detection based on the long-term spectral flatness measure (LSFM) feature, which distinguishes speech from noise by estimating the long-term spectral flatness of speech in different frequency bands. It improves accuracy under non-stationary noise such as babble and machine gun noise, as well as robustness across noise environments. Although both methods are fairly robust to different noises, their detection performance at low signal-to-noise ratios still leaves room for improvement, especially for babble and machine gun noise, the two non-stationary noises on which they perform slightly worse.
Summary of the invention
The technical problem to be solved by the invention is to provide a voice endpoint detection method based on long-time signal power spectrum change that improves the robustness of long-time-feature-based voice endpoint detection algorithms across noise environments and improves detection performance in noise environments such as babble and machine gun noise.
The technical solution adopted by the invention is a voice endpoint detection method based on long-time signal power spectrum change, comprising the following steps:
1) frame and window the input signal;
2) compute the power spectrum of the framed and windowed signal;
3) compute the long-time signal power spectrum change value;
4) make a threshold decision using the long-time signal power spectrum change value;
5) update the threshold, that is, adaptively update the threshold using the threshold decision results of the past 80 frames;
6) make a voting decision: the current target frame is the m-th frame, and the long-time signal power spectrum change value L_x(m) is jointly determined by the current frame and the R-1 frames preceding it, so the target frame takes part in R threshold decisions in total, whose results are labeled D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold decisions indicate that speech is present, the current target frame is judged to be a speech frame, and otherwise a noise frame;
7) repeat steps 1) to 6) until the input signal ends.
In step 2), the classical periodogram method is applied to each frame of the input signal x(n): the short-time discrete Fourier transform of the signal is computed to obtain the power spectrum of the frame at frequency ω_k. The power spectrum of the i-th frame at frequency ω_k is denoted S_x(i, ω_k), where N_W is the frame length, N_SH is the frame shift, and h(l) is a window function of length N_W.
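The framing, windowing, and periodogram computation of steps 1) and 2) can be sketched as follows. This is a minimal illustration rather than the patent's reference implementation: the 1/N_W normalization, the 0-based frame index, the restriction to non-negative frequency bins, and the function names `hamming`, `frame_power_spectrum`, and `num_frames` are all assumptions made for the example. A direct DFT from the standard library keeps the sketch self-contained; a practical implementation would likely use an FFT routine instead.

```python
import cmath
import math


def hamming(n_w):
    """Symmetric Hamming window h(l), l = 0 .. n_w - 1."""
    if n_w < 2:
        return [1.0] * n_w
    return [0.54 - 0.46 * math.cos(2 * math.pi * l / (n_w - 1)) for l in range(n_w)]


def num_frames(n_samples, n_w, n_sh):
    """Number of complete frames of length n_w with shift n_sh."""
    return 0 if n_samples < n_w else 1 + (n_samples - n_w) // n_sh


def frame_power_spectrum(x, i, n_w, n_sh, window):
    """Periodogram of frame i (0-based) at the bins w_k = 2*pi*k/n_w.

    ASSUMED form (not taken verbatim from the source):
        S_x(i, w_k) = (1/n_w) * |sum_l h(l) * x(i*n_sh + l) * exp(-j*w_k*l)|^2
    """
    start = i * n_sh
    frame = [window[l] * x[start + l] for l in range(n_w)]
    spectrum = []
    for k in range(n_w // 2 + 1):  # non-negative frequencies only
        acc = sum(frame[l] * cmath.exp(-2j * math.pi * k * l / n_w)
                  for l in range(n_w))
        spectrum.append(abs(acc) ** 2 / n_w)
    return spectrum
```

With the settings of the worked example in this document (512-sample frames, a 256-sample shift, and a Hamming window), the call would be `frame_power_spectrum(x, i, 512, 256, hamming(512))`.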
The calculation in step 3) uses the following quantities. L_x(m) denotes the long-time signal power spectrum change value of the m-th frame, and N_FFT denotes the number of Fourier transform points. A per-bin variation term indicates the degree of power spectrum variation of the past R frames at the k-th frequency bin; it is obtained by averaging the power spectrum change at the k-th frequency bin over every pair of frames among the past R frames. S_x(j, ω_k) and S_x(i, ω_k) denote the power spectra of the j-th and i-th frames at the k-th frequency bin, respectively.
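The structure of step 3) can be sketched as follows: a per-bin variation is averaged over all frame pairs among R frames, and the per-bin values are then averaged over the frequency bins. The pairwise change measure is not recoverable from this text, so the sketch assumes the absolute difference of log power, |log S_x(j, ω_k) - log S_x(i, ω_k)|. That choice, the small constant `eps`, averaging over the bins supplied rather than over N_FFT, and the function names are all illustrative assumptions.

```python
import math
from itertools import combinations


def band_variation(spectra, k, eps=1e-12):
    """Mean pairwise power spectrum change at bin k over the given frames.

    ASSUMPTION: the change between two frames is the absolute difference
    of their log power at bin k; the source does not state the measure.
    """
    logs = [math.log(s[k] + eps) for s in spectra]
    pairs = list(combinations(logs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def long_time_variation(spectra, m, r):
    """L_x(m): per-bin variation averaged over bins, using frames m-r+1 .. m."""
    if r < 2 or m - r + 1 < 0 or m >= len(spectra):
        raise ValueError("need r >= 2 frames ending at an existing frame m")
    recent = spectra[m - r + 1 : m + 1]
    n_bins = len(recent[0])
    return sum(band_variation(recent, k) for k in range(n_bins)) / n_bins
```

Under this assumed measure, a stationary spectrum gives a value of zero, and the value grows as the spectrum fluctuates more from frame to frame.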
In step 4), the long-time signal power spectrum change value L_x(m) is used to decide whether the current R frames contain a speech frame. If L_x(m) exceeds the set threshold, the frames contain speech and the flag D_m is set to 1; otherwise they contain no speech and D_m is set to 0.
In step 5), two buffers B_N(m) and B_{S+N}(m) are designed to store the long-time signal power spectrum change values of the frames judged to be noise frames and speech frames, respectively, among the past 80 frames. The adaptive threshold update formula is:
T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))
where α is a weight parameter.
The first 50 frames are taken as the initial background noise, and the initial threshold is computed from them:
T_init = μ_N + p·σ_N
where μ_N and σ_N are the mean and standard deviation, respectively, of the long-time signal power spectrum change values of the 50 background-noise frames, and p is a weighting coefficient.
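The threshold logic of steps 4) and 5) can be sketched as follows. The two formulas are taken from the text; several details are assumptions made for the example: the population standard deviation is used for σ_N, a single 80-frame history is split into B_N and B_{S+N} by decision, the threshold is left unchanged while either buffer is empty, and the names `initial_threshold` and `AdaptiveThreshold` are hypothetical. The defaults α = 0.3 and p = 3 are the values reported as best in the simulation experiments of this document.

```python
import statistics
from collections import deque


def initial_threshold(noise_values, p=3.0):
    """T_init = mu_N + p * sigma_N over the leading noise-only frames."""
    mu = statistics.mean(noise_values)
    sigma = statistics.pstdev(noise_values)  # ASSUMPTION: population std dev
    return mu + p * sigma


class AdaptiveThreshold:
    """Threshold decision with an update from the last `history` decisions."""

    def __init__(self, t_init, alpha=0.3, history=80):
        self.threshold = t_init
        self.alpha = alpha
        self.recent = deque(maxlen=history)  # (L_x value, decision) pairs

    def decide(self, l_value):
        """Return D_m for this L_x(m), then update the threshold."""
        d = 1 if l_value > self.threshold else 0
        self.recent.append((l_value, d))
        b_n = [v for v, flag in self.recent if flag == 0]   # noise buffer
        b_sn = [v for v, flag in self.recent if flag == 1]  # speech buffer
        if b_n and b_sn:  # ASSUMPTION: keep the old threshold otherwise
            self.threshold = (self.alpha * min(b_sn)
                              + (1 - self.alpha) * max(b_n))
        return d
```

Because `deque(maxlen=80)` discards the oldest entry automatically, the buffers always reflect only the most recent 80 decisions.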
The voice endpoint detection method based on long-time signal power spectrum change of the present invention markedly improves detection accuracy in babble and machine gun noise environments. By updating the threshold adaptively, it overcomes the poor environmental adaptability of a traditional fixed threshold. Experimental tests show that the overall accuracy of the invention is better than that of the LTSV and LSFM voice endpoint detection methods. In the machine gun noise environment, the voice endpoint detection accuracy of the invention is clearly better than that of the LTSV and LSFM methods, with the average detection accuracy improved by more than 10%.
Description of the drawings
Fig. 1 is a flowchart of the voice endpoint detection method based on long-time signal power spectrum change of the present invention;
Fig. 2 is a schematic diagram of the voting decision in the present invention;
Fig. 3 shows the VAD results in different noise environments.
Detailed description
The voice endpoint detection method based on long-time signal power spectrum change of the present invention is described in detail below with reference to an embodiment and the drawings.
The method comprises the following steps:
1) Frame and window the input signal. A speech signal is a typical non-stationary signal, but the articulatory organs move very slowly compared with the speed of sound-wave vibration, so a speech signal is generally considered stationary over a period of 10 ms to 30 ms. The signal under test is therefore truncated into frames;
2) Compute the power spectrum of the framed and windowed signal. Specifically, the classical periodogram method is applied to each frame of the input signal x(n): the short-time discrete Fourier transform of the signal is computed to obtain the power spectrum of the frame at frequency ω_k. The power spectrum of the i-th frame at frequency ω_k is denoted S_x(i, ω_k), where N_W is the frame length, N_SH is the frame shift, and h(l) is a window function of length N_W.
3) Compute the long-time signal power spectrum change value. The long-time signal power spectrum change parameter is jointly determined by the power spectra of the current frame of the input signal x(n) and the R-1 frames preceding it, and reflects the non-stationarity of the signal's power spectrum over the past R frames. The calculation uses the following quantities. L_x(m) denotes the long-time signal power spectrum change value of the m-th frame, and N_FFT denotes the number of Fourier transform points. A per-bin variation term indicates the degree of power spectrum variation of the past R frames at the k-th frequency bin; it is obtained by averaging the power spectrum change at the k-th frequency bin over every pair of frames among the past R frames. S_x(j, ω_k) and S_x(i, ω_k) denote the power spectra of the j-th and i-th frames at the k-th frequency bin, respectively.
4) Make a threshold decision using the long-time signal power spectrum change value. The value L_x(m) is used to decide whether the current R frames contain a speech frame. If L_x(m) exceeds the set threshold, the frames contain speech and the flag D_m is set to 1; otherwise they contain no speech and D_m is set to 0.
5) Update the threshold, that is, adaptively update the threshold using the threshold decision results of the past 80 frames. Specifically, two buffers B_N(m) and B_{S+N}(m) are designed to store the long-time signal power spectrum change values of the frames judged to be noise frames and speech frames, respectively, among the past 80 frames. The adaptive threshold update formula is:
T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))
where α is a weight parameter; α = 0.3 gave the best results in the simulation experiments.
The first 50 frames are taken as the initial background noise, and the initial threshold is computed from them:
T_init = μ_N + p·σ_N
where μ_N and σ_N are the mean and standard deviation, respectively, of the long-time signal power spectrum change values of the 50 background-noise frames, and p is a weighting coefficient; p = 3 gave the best results in the simulation experiments.
6) Make a voting decision. Because long-time features of the signal are accumulated, the information of the preceding and following frames must be considered when making the endpoint decision. The voting decision is illustrated in Fig. 2. The current target frame is the m-th frame, and the long-time signal power spectrum change value L_x(m) is jointly determined by the current frame and the R-1 frames preceding it, so the target frame takes part in R threshold decisions in total, whose results are labeled D_m, D_{m+1}, ..., D_{m+R-1}. If more than 80% of these R threshold decisions indicate that speech is present, the current target frame is judged to be a speech frame, and otherwise a noise frame;
7) Repeat steps 1) to 6) until the input signal ends.
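The voting decision of step 6) can be sketched as follows. The target frame m contributes to the R change values L_x(m) through L_x(m+R-1), so the sketch counts the corresponding flags D_m through D_{m+R-1} and applies the "more than 80%" rule. The function name `vote` is hypothetical. The text does not say how the last R-1 frames of a signal are handled, where fewer than R decisions exist; raising an error in that case is an assumption of this example.

```python
def vote(decisions, m, r, ratio=0.8):
    """Return 1 (speech) if more than `ratio` of D_m .. D_{m+r-1} equal 1.

    `decisions` is the list of threshold flags D_0, D_1, ... (0 or 1).
    """
    if m < 0 or m + r > len(decisions):
        # ASSUMPTION: the source does not define behaviour at the signal end.
        raise ValueError("frame m needs r threshold decisions D_m .. D_{m+r-1}")
    votes = decisions[m : m + r]
    return 1 if sum(votes) > ratio * r else 0


flags = [0, 1, 1, 1, 1, 1, 0, 0]
labels = [vote(flags, m, 5) for m in range(len(flags) - 5 + 1)]
```

Note that the rule is strict: with R = 5, four positive decisions out of five are exactly 80% and therefore do not mark the target frame as speech.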
A specific example is given below.
Following the flowchart in Fig. 1, the voice endpoint detection method based on long-time signal power spectrum change of the present invention is analyzed on an example. The speech signals come from 20 speakers in the TIMIT corpus, 10 male and 10 female, with 10 sentences per speaker, and the endpoints of each sentence are labeled manually (0 for noise segments, 1 for speech segments). Because the TIMIT sentences are short (about 3.5 seconds) and consist mostly of speech, a 1-second silent segment is added before each sentence in the tests so that the noise feature parameters can be estimated and the decision threshold initialized. The noise is taken from the NOISEX-92 noise database; four noise types are used here: white, pink, babble, and machine gun. The algorithm is tested at signal-to-noise ratios of -5, 0, 5, and 10 dB. Detection accuracy is used as the performance metric. In its definition, the number of error frames includes both speech frames misjudged as noise frames and noise frames misjudged as speech frames.
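A frame-level accuracy metric of this kind can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes the usual form: the share of frames whose predicted label matches the manual label, expressed as a percentage. The function name `detection_accuracy` is hypothetical.

```python
def detection_accuracy(reference, predicted):
    """Percentage of frames labeled correctly (0 = noise, 1 = speech).

    ASSUMED definition: 100 * (total frames - error frames) / total frames,
    where error frames are speech frames labeled as noise plus noise
    frames labeled as speech.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("label sequences must be non-empty and equally long")
    errors = sum(1 for ref, hyp in zip(reference, predicted) if ref != hyp)
    return 100.0 * (len(reference) - errors) / len(reference)


accuracy = detection_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
```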
The example proceeds as follows:
1. Read the speech signal and apply framing and windowing: 512 samples per frame, a 512-point Hamming window, and a frame shift of 256 samples.
2. Apply a 512-point Fourier transform to each windowed frame and compute the power spectrum S_x(i, ω_k) of each frame.
3. From the signal power spectrum S_x(i, ω_k), compute the long-time signal power spectrum change value L_x(m) of each frame, and compute the initial threshold T_init from the background-noise information of the initial stage.
4. Use L_x(m) to make the threshold decision on whether the current R frames contain a speech frame. If L_x(m) exceeds the set threshold, speech is present and D_m is set to 1; otherwise no speech is present and D_m is set to 0.
5. Adaptively update the decision threshold using the threshold decision results of the past 80 frames.
6. Use the D_m values to make the voting decision for the current target frame. As shown in Fig. 2, among the R threshold decisions that include the target frame, if more than 80% indicate that speech is present, the target frame is judged to be a speech frame; otherwise it is a noise frame.
Two speech segments are selected at random from the TIMIT corpus; the VAD results in 0 dB noise environments are shown in Fig. 3. Panels a1, b1, c1, and d1 show the speech waveforms after adding white, pink, babble, and machine gun noise at 0 dB, respectively, and panels a2, b2, c2, and d2 show the corresponding VAD results.
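Adding noise at a target signal-to-noise ratio, as in the 0 dB test signals above, can be sketched as follows. The text does not describe how the noise was scaled, so this is only one common approach: scale the noise so that the ratio of average speech power to average noise power matches the requested value. Measuring power over the whole segment (including any silence) and the function name `mix_at_snr` are assumptions of this example.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise scaled to the requested SNR in dB.

    ASSUMPTION: powers are averaged over the full length of `speech`.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    n = len(speech)
    p_speech = sum(s * s for s in speech) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


mixed = mix_at_snr([1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0], 0)
```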
Under noise environments with different signal-to-noise ratios, the voice endpoint detection accuracy of the LTSV-based method, the LSFM-based method, and the method based on the long-time signal power spectrum change value was measured, as shown in Table 1. The table shows that in white, pink, and babble noise the three methods perform similarly, with the method based on the long-time signal power spectrum change value slightly more accurate than the other two. In machine gun noise, however, the method based on the long-time signal power spectrum change value is clearly more accurate than the other two.
Table 1. Result statistics
Claims (5)
1. A voice endpoint detection method based on long-time signal power spectrum change, characterized by comprising the following steps:
1) frame and window the input signal;
2) compute the power spectrum of the framed and windowed signal;
3) compute the long-time signal power spectrum change value;
4) make a threshold decision using the long-time signal power spectrum change value;
5) update the threshold, that is, adaptively update the threshold using the threshold decision results of the past 80 frames;
6) make a voting decision: the current target frame is the m-th frame, and the long-time signal power spectrum change value L_x(m) is jointly determined by the current frame and the R-1 frames preceding it, so the target frame takes part in R threshold decisions in total, whose results are labeled D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold decisions indicate that speech is present, the current target frame is judged to be a speech frame, and otherwise a noise frame;
7) repeat steps 1) to 6) until the input signal ends.
2. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterized in that in step 2) the classical periodogram method is applied to each frame of the input signal x(n), the short-time discrete Fourier transform of the signal being computed to obtain the power spectrum of the frame at frequency ω_k; the power spectrum of the i-th frame at frequency ω_k is denoted S_x(i, ω_k), where N_W is the frame length, N_SH is the frame shift, and h(l) is a window function of length N_W.
3. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterized in that the calculation in step 3) uses the following quantities: L_x(m) denotes the long-time signal power spectrum change value of the m-th frame; N_FFT denotes the number of Fourier transform points; a per-bin variation term indicates the degree of power spectrum variation of the past R frames at the k-th frequency bin and is obtained by averaging the power spectrum change at the k-th frequency bin over every pair of frames among the past R frames; and S_x(j, ω_k) and S_x(i, ω_k) denote the power spectra of the j-th and i-th frames at the k-th frequency bin, respectively.
4. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterized in that in step 4) the long-time signal power spectrum change value L_x(m) is used to decide whether the current R frames contain a speech frame; if L_x(m) exceeds the set threshold, the frames contain speech and the flag D_m is set to 1; otherwise they contain no speech and D_m is set to 0.
5. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterized in that in step 5) two buffers B_N(m) and B_{S+N}(m) are designed to store the long-time signal power spectrum change values of the frames judged to be noise frames and speech frames, respectively, among the past 80 frames, and the adaptive threshold update formula is:
T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))
where α is a weight parameter.
The first 50 frames are taken as the initial background noise, and the initial threshold is computed from them:
T_init = μ_N + p·σ_N
where μ_N and σ_N are the mean and standard deviation, respectively, of the long-time signal power spectrum change values of the 50 background-noise frames, and p is a weighting coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810266002.3A CN108538310B (en) | 2018-03-28 | 2018-03-28 | Voice endpoint detection method based on long-time signal power spectrum change |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810266002.3A CN108538310B (en) | 2018-03-28 | 2018-03-28 | Voice endpoint detection method based on long-time signal power spectrum change |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108538310A true CN108538310A (en) | 2018-09-14 |
CN108538310B CN108538310B (en) | 2021-06-25 |
Family
ID=63481488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810266002.3A Active CN108538310B (en) | 2018-03-28 | 2018-03-28 | Voice endpoint detection method based on long-time signal power spectrum change |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108538310B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109346062A (en) * | 2018-12-25 | 2019-02-15 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and device |
CN109545188A (en) * | 2018-12-07 | 2019-03-29 | 深圳市友杰智新科技有限公司 | Real-time voice endpoint detection method and device |
CN110047470A (en) * | 2019-04-11 | 2019-07-23 | 深圳市壹鸽科技有限公司 | Voice endpoint detection method |
CN110085264A (en) * | 2019-04-30 | 2019-08-02 | 北京儒博科技有限公司 | Voice signal detection method, device, equipment and storage medium |
CN110827858A (en) * | 2019-11-26 | 2020-02-21 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and system |
CN110890104A (en) * | 2019-11-26 | 2020-03-17 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and system |
CN111179966A (en) * | 2019-11-25 | 2020-05-19 | 泰康保险集团股份有限公司 | Voice analysis method and device, electronic equipment and storage medium |
CN111613250A (en) * | 2020-07-06 | 2020-09-01 | 泰康保险集团股份有限公司 | Long voice endpoint detection method and device, storage medium and electronic equipment |
CN112735482A (en) * | 2020-12-04 | 2021-04-30 | 珠海亿智电子科技有限公司 | Endpoint detection method and system based on combined deep neural network |
CN112967738A (en) * | 2021-02-01 | 2021-06-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice detection method and device, electronic equipment and computer readable storage medium |
CN113205823A (en) * | 2021-04-12 | 2021-08-03 | 广东技术师范大学 | Lung sound signal endpoint detection method, system and storage medium |
CN113345423A (en) * | 2021-06-24 | 2021-09-03 | 科大讯飞股份有限公司 | Voice endpoint detection method and device, electronic equipment and storage medium |
WO2022242479A1 (en) * | 2021-05-17 | 2022-11-24 | 华为技术有限公司 | Three-dimensional audio signal encoding method and apparatus, and encoder |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1632816A (en) * | 2003-12-23 | 2005-06-29 | 广州可夫医疗科技有限公司 | Method for analyzing EEG fluctuation signal and equipment thereof |
US20090222258A1 (en) * | 2008-02-29 | 2009-09-03 | Takashi Fukuda | Voice activity detection system, method, and program product |
CN101814290A (en) * | 2009-02-25 | 2010-08-25 | 三星电子株式会社 | Method for enhancing robustness of voice recognition system |
CN102982811A (en) * | 2012-11-24 | 2013-03-20 | 安徽科大讯飞信息科技股份有限公司 | Voice endpoint detection method based on real-time decoding |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
CN106558316A (en) * | 2016-11-09 | 2017-04-05 | 天津大学 | Howling detection method based on the rate of change of specific frequency bands of a long-time signal |
CN107371116A (en) * | 2017-07-21 | 2017-11-21 | 天津大学 | Howling detection method based on inter-frame spectral flatness deviation |
CN107393555A (en) * | 2017-07-14 | 2017-11-24 | 西安交通大学 | Detection system and detection method for low signal-to-noise ratio abnormal sound signals |
-
2018
- 2018-03-28 CN CN201810266002.3A patent/CN108538310B/en active Active
Non-Patent Citations (3)
Title |
---|
GHOSH, PRASANTA KUMAR ET AL.: "Robust voice activity detection using long-term signal variability", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 * |
冯璐: "基于长时特征的语音端点检测方法研究", 《万方学位论文》 * |
张君昌 等: "融合Burg谱估计与信号变化率测度的语音端点检测", 《西安电子科技大学学报(自然科学版)》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545188A (en) * | 2018-12-07 | 2019-03-29 | 深圳市友杰智新科技有限公司 | Real-time voice endpoint detection method and device |
CN109545188B (en) * | 2018-12-07 | 2021-07-09 | 深圳市友杰智新科技有限公司 | Real-time voice endpoint detection method and device |
CN109346062A (en) * | 2018-12-25 | 2019-02-15 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and device |
CN109346062B (en) * | 2018-12-25 | 2021-05-28 | 思必驰科技股份有限公司 | Voice endpoint detection method and device |
CN110047470A (en) * | 2019-04-11 | 2019-07-23 | 深圳市壹鸽科技有限公司 | Voice endpoint detection method |
CN110085264B (en) * | 2019-04-30 | 2021-10-15 | 北京如布科技有限公司 | Voice signal detection method, device, equipment and storage medium |
CN110085264A (en) * | 2019-04-30 | 2019-08-02 | 北京儒博科技有限公司 | Voice signal detection method, device, equipment and storage medium |
CN111179966A (en) * | 2019-11-25 | 2020-05-19 | 泰康保险集团股份有限公司 | Voice analysis method and device, electronic equipment and storage medium |
CN110827858A (en) * | 2019-11-26 | 2020-02-21 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and system |
CN110890104A (en) * | 2019-11-26 | 2020-03-17 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and system |
CN110890104B (en) * | 2019-11-26 | 2022-05-03 | 思必驰科技股份有限公司 | Voice endpoint detection method and system |
CN111613250A (en) * | 2020-07-06 | 2020-09-01 | 泰康保险集团股份有限公司 | Long voice endpoint detection method and device, storage medium and electronic equipment |
CN112735482A (en) * | 2020-12-04 | 2021-04-30 | 珠海亿智电子科技有限公司 | Endpoint detection method and system based on combined deep neural network |
CN112735482B (en) * | 2020-12-04 | 2024-02-13 | 珠海亿智电子科技有限公司 | Endpoint detection method and system based on joint deep neural network |
CN112967738A (en) * | 2021-02-01 | 2021-06-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice detection method and device, electronic equipment and computer readable storage medium |
CN113205823A (en) * | 2021-04-12 | 2021-08-03 | 广东技术师范大学 | Lung sound signal endpoint detection method, system and storage medium |
WO2022242479A1 (en) * | 2021-05-17 | 2022-11-24 | 华为技术有限公司 | Three-dimensional audio signal encoding method and apparatus, and encoder |
CN113345423A (en) * | 2021-06-24 | 2021-09-03 | 科大讯飞股份有限公司 | Voice endpoint detection method and device, electronic equipment and storage medium |
CN113345423B (en) * | 2021-06-24 | 2024-02-13 | 中国科学技术大学 | Voice endpoint detection method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108538310B (en) | 2021-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108538310A (en) | Voice endpoint detection method based on long-term power spectrum signal variability | |
CN108922541B (en) | Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models | |
EP2083417B1 (en) | Sound processing device and program | |
CN108305639B (en) | Speech emotion recognition method, computer-readable storage medium and terminal | |
CN105825852A (en) | Oral English reading test scoring method | |
CN104078039A (en) | Voice recognition system of domestic service robot on basis of hidden Markov model | |
CN108682432B (en) | Speech emotion recognition device | |
CN103366759A (en) | Speech data evaluation method and speech data evaluation device | |
Archana et al. | Gender identification and performance analysis of speech signals | |
Jaafar et al. | Automatic syllables segmentation for frog identification system | |
CN103366735A (en) | A voice data mapping method and apparatus | |
Eringis et al. | Improving speech recognition rate through analysis parameters | |
Yutai et al. | Speaker recognition based on dynamic MFCC parameters | |
Zhao et al. | Speech recognition system based on integrating feature and HMM | |
Pohjalainen et al. | Automatic detection of anger in telephone speech with robust autoregressive modulation filtering | |
CN202758611U (en) | Speech data evaluation device | |
Varela et al. | Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector | |
Yavuz et al. | A Phoneme-Based Approach for Eliminating Out-of-vocabulary Problem Turkish Speech Recognition Using Hidden Markov Model. | |
Slaney et al. | Pitch-gesture modeling using subband autocorrelation change detection. | |
CN111091816B (en) | Data processing system and method based on voice evaluation | |
Heese et al. | Speech-codebook based soft voice activity detection | |
Singh et al. | A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters | |
Jijomon et al. | An offline signal processing technique for accurate localisation of stop release bursts in vowel-consonant-vowel utterances | |
RU2174714C2 (en) | Method for separating the basic tone | |
Joseph et al. | Indian accent detection using dynamic time warping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||