TW202226225A - Apparatus and method for improved voice activity detection using zero crossing detection - Google Patents
- Publication number
- TW202226225A (application TW110139243A)
- Authority
- TW
- Taiwan
- Prior art keywords
- signal
- zero
- voice activity
- detection
- activity detection
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/09—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
Description
The present disclosure relates to an apparatus and method for improving voice activity detection, and in particular to an apparatus and method that improve voice activity detection using zero-crossing detection.
Smart speakers and other voice-controlled devices interpret human speech as commands and perform the corresponding actions. In many cases, a device listens for a keyword (such as "Alexa", "OK Google", or "OK Siri") and, once the keyword is detected, listens for the command that follows. To provide this function, the device must continuously spend some power listening for commands. One way to reduce power usage is voice activity detection (VAD), which distinguishes human speech from noise. With VAD, the audio signal is evaluated for a spoken keyword only when human speech is detected.
In summary, the prior art has long had the problem that the audio signal is evaluated for a spoken keyword only when human speech is detected. Technical means that improve the implementation of voice activity detection are therefore needed to solve this problem.
In view of the problem in the prior art that the audio signal is evaluated for a spoken keyword only when human speech is detected, the present invention discloses an apparatus and method for improving voice activity detection using zero-crossing detection, in which:
The apparatus for improving voice activity detection using zero-crossing detection disclosed herein comprises at least a processing device programmed to: receive an audio signal; generate a pulse stream based on zero crossings detected in the audio signal; generate a pulse density stream based on the frequency of pulses occurring over time in the pulse stream; evaluate the pulse density stream against a threshold condition; and identify the speech portion of the audio signal that corresponds to the portion of the pulse density stream meeting the threshold condition.
The method for improving voice activity detection using zero-crossing detection disclosed herein comprises at least the following steps: a processing device receives an audio signal; the processing device generates a pulse stream based on zero crossings detected in the audio signal; the processing device generates a pulse density stream based on the frequency of pulses occurring over time in the pulse stream; the processing device evaluates the pulse density stream against a threshold condition; and the processing device identifies the speech portion of the audio signal that corresponds to the portion of the pulse density stream meeting the threshold condition.
The apparatus and method disclosed herein are as described above. They differ from the prior art in that the processing device generates a pulse stream based on zero crossings detected in the audio signal, generates a pulse density stream based on the frequency of pulses occurring over time in the pulse stream, evaluates the pulse density stream against a threshold condition, and identifies the speech portion of the audio signal that corresponds to the portion of the pulse density stream meeting the threshold condition. This solves the problem in the prior art and achieves the technical effect of identifying potential speech with lower power and higher accuracy.
The features and implementations of the present invention are described in detail below with reference to the drawings and embodiments, in enough detail that any person skilled in the relevant art can readily understand the technical means the invention applies to solve the technical problem, understand its advantages, and put it into practice to achieve its effects. It is to be understood that the drawings serve only to describe and explain specifics and details of embodiments of the invention and should not be construed as limiting the invention.
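The invention is first described with reference to FIG. 1. As shown in FIG. 1, a voice activity detection system 100 (also referred to herein as system 100) performs voice activity detection on an input signal. The components of system 100 may be embodied as executable code executed by a processor, as distinct hardware components, or as other implementations. System 100 may serve as a first device that wakes a second device in response to speech detected in input signal 102. For example, the second device may be a general-purpose processor capable of performing speech-to-text conversion, network communication, or other processing functions performed by a smart speaker or other voice-controlled device.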
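Input signal 102 may be received from a microphone. It may be a raw digital audio signal sampled from the microphone output, or the result of applying one or more pre-processing steps to the raw digital audio signal, such as low-pass filtering, scaling, downsampling, upsampling, or other pre-processing steps.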
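System 100 may include a band-pass filter 104. Band-pass filter 104 may have a passband corresponding to speech, e.g., a 3 dB passband. In general, the passband may lie between 0.3 and 20,000 hertz (Hz). In other embodiments, a passband between 1 and 2 kilohertz may be used. Band-pass filter 104 may remove any direct current (DC) component of input signal 102 and remove noise that does not correspond to speech.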
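Band-pass filter 104 may output a first filtered signal that is input to an adder 106. Adder 106 may sum the first filtered signal with a high-frequency signal 108 to produce a sum signal. High-frequency signal 108 has a frequency and an amplitude. In some embodiments, the frequency is chosen so that a zero crossing occurs between each pair of consecutive samples of high-frequency signal 108. The frequency of high-frequency signal 108 may therefore equal one half of the sampling rate of input signal 102.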
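The amplitude of high-frequency signal 108 may be calibrated to the properties of the microphone that produces input signal 102 and to the properties of the ambient noise that system 100 is expected to encounter. For example, an audio signal may be captured in the expected environment (e.g., a recording of a real-world environment) with no speech present. While system 100 processes that audio signal as described below, the amplitude of high-frequency signal 108 may be raised until system 100 no longer detects speech. The amplitude of high-frequency signal 108 may be dynamic. For example, if feedback from a speech-to-text component indicates that portions of the input signal judged to contain speech do not in fact contain speech, the amplitude of high-frequency signal 108 may be increased to reduce false positives. In this specification, a "portion" of a signal refers to a series of consecutive samples of the signal.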
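The calibration procedure above can be sketched as a simple upward search. This is a minimal sketch under stated assumptions, not the patent's implementation: `detects_speech` is a hypothetical callback that runs the detector over a speech-free recording at a given amplitude and reports whether anything was flagged, and the start, step, and limit values are illustrative.

```python
def calibrate_amplitude(detects_speech, start, step, max_amplitude):
    """Return the smallest tested amplitude at which no speech is flagged.

    detects_speech(amplitude) is a hypothetical callback: it should run the
    detector on a noise-only recording and return True if any sample was
    flagged as speech. Returns None if no tested amplitude is quiet enough.
    """
    amplitude = start
    while amplitude <= max_amplitude:
        if not detects_speech(amplitude):
            return amplitude
        amplitude += step  # raise the high-frequency amplitude and retry
    return None
```

The same loop could be rerun at run time when downstream feedback reports false positives, which corresponds to the dynamic amplitude described above.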
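The sum signal produced by adder 106 may be input to a zero-crossing detector 110. The output of zero-crossing detector 110 is a pulse stream. For example, for each zero crossing, zero-crossing detector 110 may output a first value, e.g., a binary 1. If there is no change of sign between a sample of the sum signal and the preceding sample, zero-crossing detector 110 may output a second value, e.g., a binary 0. In some embodiments, only crossings from positive to negative are detected as zero crossings. In other embodiments, only crossings from negative to positive are detected as zero crossings. In still other embodiments, crossings in either direction are detected as zero crossings.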
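The adder and zero-crossing detector described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, the high-frequency signal is modeled as an alternating +A, -A sequence (a tone at half the sampling rate), crossings in both directions are counted, and a sample equal to zero is treated as non-negative.

```python
def add_high_frequency(signal, amplitude):
    """Add a tone at half the sampling rate: +A, -A, +A, ... (assumed form)."""
    return [s + (amplitude if i % 2 == 0 else -amplitude)
            for i, s in enumerate(signal)]


def zero_crossing_pulses(summed):
    """Return 1 where the sign differs from the previous sample, else 0."""
    if not summed:
        return []
    pulses = [0]  # the first sample has no predecessor to compare against
    for prev, cur in zip(summed, summed[1:]):
        pulses.append(1 if (prev < 0) != (cur < 0) else 0)
    return pulses
```

With a silent input the added tone crosses zero on every sample, while a signal whose amplitude exceeds that of the tone suppresses most crossings, which is the effect the detector relies on.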
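The pulse stream may be input to a pulse density detector 112. Pulse density detector 112 produces a density stream: for each sample of the pulse stream, it outputs one sample of the density stream, equal to the number of pulses (first values) within a window of the N pulse-stream samples preceding that sample. N is greater than 1, preferably greater than 10, and more preferably greater than 100.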
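A sliding-window count is one way to realize the density stream described above. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the window is taken to include the current sample plus the N-1 samples before it, which the text does not specify.

```python
from collections import deque


def pulse_density(pulses, n):
    """Count the pulses in a sliding window of the last n pulse-stream samples."""
    window = deque(maxlen=n)
    count = 0
    density = []
    for p in pulses:
        if len(window) == n:
            count -= window[0]  # this sample is about to leave the window
        window.append(p)        # deque drops the oldest entry automatically
        count += p
        density.append(count)
    return density
```

Keeping a running count means each new sample costs one addition and one subtraction regardless of N.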
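The density stream may be input to a low-pass filter 114 that outputs a second filtered signal. The cutoff frequency, e.g., the 3 dB cutoff frequency, may be selected to achieve a desired degree of smoothing or averaging of the second filtered signal relative to the density stream. In some embodiments, low-pass filter 114 may itself act as the pulse density detector, since the output of a low-pass filter is generally a signal that rises as pulse density rises and falls as pulse density falls, although the correspondence may be imperfect. In such embodiments, pulse density detector 112 may be omitted.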
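The second filtered signal may be input to a comparator 116. Comparator 116 may evaluate the second filtered signal against a speech threshold 118 and may output a speech decision 120 for each sample of the second filtered signal. Speech decision 120 may be a binary value, so that for each input sample of input signal 102 processed by system 100, a corresponding speech decision 120 is output indicating whether that input sample likely corresponds to speech. Input samples of input signal 102 identified as speech may be passed to a subsequent stage to confirm that the samples do contain speech, to perform speech-to-text synthesis, to be stored for later use, or for other purposes. Alternatively, the samples of the first filtered signal that correspond in time to the samples of the second filtered signal may be passed to the subsequent stage, to take advantage of the filtering by band-pass filter 104.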
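Samples with values below speech threshold 118 may be judged to correspond to speech. In particular, the low-frequency, high-amplitude modulation caused by speech may raise the amplitude of the sum signal above the amplitude of high-frequency signal 108, so that zero crossings become less frequent and the pulse density falls accordingly.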
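The smoothing and comparison steps can be sketched as below. This is a minimal sketch, assuming a one-pole smoother as the low-pass filter (the text does not specify a filter type); the function names, the default `alpha`, and the choice to initialize the filter state to the first input are all assumptions.

```python
def smooth(density, alpha=0.9):
    """One-pole low-pass filter: y = alpha * y + (1 - alpha) * x (assumed form)."""
    if not density:
        return []
    y = float(density[0])  # start at the first value to avoid a start-up ramp
    out = []
    for d in density:
        y = alpha * y + (1.0 - alpha) * d
        out.append(y)
    return out


def speech_decisions(smoothed, threshold):
    """Flag a sample as speech (1) when the smoothed density is BELOW threshold."""
    return [1 if v < threshold else 0 for v in smoothed]
```

Note the direction of the comparison: a low pulse density marks speech, because speech suppresses the zero crossings contributed by the high-frequency signal.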
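Speech threshold 118 may be adjusted by a statistical analyzer 122. Statistical analyzer 122 receives input signal 102 and/or the first filtered signal and generates, over time, statistics that characterize one or both signals. These statistics may include the mean, standard deviation, maximum, minimum, root mean square (RMS), a percentile of the absolute values of the input samples (e.g., the 90th percentile), or other statistics.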
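For example, statistical analyzer 122 may compute the RMS of a segment of multiple samples of the input signal and scale speech threshold 118 accordingly, e.g., increasing speech threshold 118 as the RMS increases and decreasing it as the RMS decreases. In another example, statistical analyzer 122 may compute the RMS of a segment of multiple samples of the input signal and use it to scale the amplitude of high-frequency signal 108, e.g., increasing the amplitude as the RMS increases and decreasing it as the RMS decreases. Either approach can dynamically reduce false positives in response to increases and decreases in the amplitude of ambient noise.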
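The RMS-based adjustment can be illustrated with a short sketch. The text only says the threshold rises and falls with the RMS; the linear scaling used here, the `reference_rms` parameter, and the function names are assumptions made for illustration.

```python
import math


def rms(segment):
    """Root mean square of a non-empty segment of samples."""
    return math.sqrt(sum(s * s for s in segment) / len(segment))


def scaled_threshold(base_threshold, segment, reference_rms):
    """Scale the threshold in proportion to the segment RMS (assumed linear law).

    reference_rms is a hypothetical calibration value: the RMS at which
    base_threshold is used unchanged.
    """
    return base_threshold * rms(segment) / reference_rms
```

The same scale factor could instead be applied to the amplitude of the high-frequency signal, which is the second option the paragraph above describes.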
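FIG. 2 illustrates signals used and generated by system 100. The upper plot 200a shows a speech signal 202 as amplitude over a series of samples, e.g., a raw speech signal or the filtered signal output by band-pass filter 104. Curve 204 shows speech decision 120 for the samples of plot 200a, where high values mark samples identified as speech and low values mark non-speech. As can be seen, the higher-amplitude portions with an envelope corresponding to speech are correctly identified as speech, whereas low-amplitude noise is not.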
Note that some portions identified as non-speech may correspond to particular parts of speech, for example unvoiced fricatives such as /s/, /sh/, and /f/, which are hard to distinguish from noise. These portions are short, however, and can be captured by extending the speech portions to include portions of limited duration (e.g., less than 200 milliseconds) that lie between portions identified as speech or at the beginning or end of a portion identified as speech.
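The extension of speech portions across short gaps can be sketched as a post-processing pass over the per-sample decisions. This is an illustrative sketch with hypothetical names; it only bridges gaps between two speech portions and does not extend the beginning or end of a portion. The gap limit is given in samples, so 200 ms corresponds to 3200 samples at an assumed 16 kHz sampling rate.

```python
def fill_short_gaps(decisions, max_gap):
    """Relabel non-speech runs of at most max_gap samples between speech as speech."""
    out = list(decisions)
    last_speech = None  # index of the most recent sample flagged as speech
    for i, d in enumerate(decisions):
        if d:
            if last_speech is not None and 0 < i - last_speech - 1 <= max_gap:
                for j in range(last_speech + 1, i):
                    out[j] = 1
            last_speech = i
    return out
```

For example, with `max_gap=2` a two-sample gap between speech portions is bridged while a four-sample gap is left untouched.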
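Plot 200b shows a curve 206 of the amplitude of the low-pass filtered signal over a series of samples, e.g., the output of low-pass filter 114. Curve 208 represents the threshold used by comparator 116. In this embodiment, samples of input signal 102 that correspond to samples of the low-pass filtered signal below the threshold are identified as speech.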
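The invention is next described with reference to FIG. 3. As shown in FIG. 3, a voice activity detection system 300 (also referred to herein as system 300) implements another approach to voice activity detection. System 300 is more sophisticated than system 100, at the cost of more computation. Nevertheless, system 300 remains very computationally efficient and can be implemented using only addition, multiplication, and subtraction, with low storage requirements for the state of the voice activity detection algorithm held between processed samples. System 300 may therefore be used to perform voice activity detection that triggers the waking of a processing device capable of more complex operations than system 300, such as a general-purpose processor.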
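In some embodiments, the approach of FIGS. 3 and 4 is applied after voice activity detection has been performed using system 100, i.e., portions of the signal identified as speech by system 100 may be processed by system 300 to confirm that the signal does contain speech. System 300 may be embodied as executable code executed by a processor, as distinct hardware components, or as other implementations. Systems 100 and 300 may be implemented by different executable code on the same hardware device or on different hardware devices.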
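System 300 may receive an input signal 302, which is a raw audio signal or a filtered version of an audio signal. Input signal 302 may be a portion of input signal 102 that system 100 identified as speech. Alternatively, the corresponding portion of the band-pass filtered signal (the output of band-pass filter 104) may be used as input signal 302.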
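Where input signal 302 has not been band-pass filtered, it may be processed by a speech band-pass filter 304 to obtain a first filtered signal. Speech band-pass filter 304 may be configured as described above for band-pass filter 104.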
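System 300 may further include a Teager energy calculator 306. Teager energy calculator 306 outputs a Teager energy signal T of the signal input to it (input signal 302 or the first filtered signal). For example, for a given input signal s, the Teager energy T[n] of an individual sample s[n] may be computed according to formula (1), in which k is a time offset, e.g., a value from 1 to 5. The value of k may be a function of the sampling rate and may be larger for higher sampling rates.
T[n] = (s[n]*s[n]) - (s[n-k]*s[n+k]) ...... formula (1)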
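Formula (1) translates directly into code. The sketch below is illustrative: the function name is hypothetical, and setting the first and last k outputs to zero, where s[n-k] or s[n+k] does not exist, is an assumption the text does not address.

```python
def teager_energy(s, k=1):
    """Teager energy T[n] = s[n]*s[n] - s[n-k]*s[n+k], per formula (1).

    Edge samples without both neighbours are set to 0.0 (an assumption).
    """
    t = [0.0] * len(s)
    for n in range(k, len(s) - k):
        t[n] = s[n] * s[n] - s[n - k] * s[n + k]
    return t
```

For a pure sinusoid A*sin(w*n), the identity sin(a-b)*sin(a+b) = sin(a)^2 - sin(b)^2 gives a constant T[n] = A^2 * sin(w*k)^2, so the output tracks both the amplitude and the frequency of the input.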
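System 300 may include a rectifier 308. Rectifier 308 outputs the absolute value of the signal input to it (input signal 302, the first filtered signal, or the Teager energy signal).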
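System 300 may further include a first low-pass filter 310. For example, an input signal designated x (the output of rectifier 308) may be low-pass filtered to obtain a first low-pass signal. The first low-pass signal may be input to a selective sampling stage 312, which selects samples of x with reference to the first low-pass signal. Selective sampling stage 312 may select those samples whose amplitude is a statistically significant outlier relative to the first low-pass signal. Because the selection is based on the characteristics of x, it may be non-uniform, i.e., occur at uneven intervals that depend on variations in the amplitude of x.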
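The samples of x selected at selective sampling stage 312 may be input to a second low-pass filter 314 to obtain a second low-pass signal. The samples of x selected at stage 312 may then be selectively sampled again at a selective sampling stage 316, further reducing the number of samples. Selective sampling stage 316 may select, from among the samples selected at stage 312, those whose amplitude is a statistically significant outlier relative to the second low-pass signal.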
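The samples of x selected at selective sampling stage 316 may be low-pass filtered again at a low-pass filter 318 to obtain a third low-pass signal. The third low-pass signal may be processed further, e.g., by a scale-down stage 320 to obtain a scaled-down signal. In some implementations, this may include multiplying the third low-pass signal by a scale-down factor less than 1. The function of scale-down stage 320 may be to compensate, at least in part, for the fact that the third low-pass signal is obtained from the higher-amplitude samples of x that remain after selective sampling stages 312 and 316. The scale-down factor may be chosen experimentally for a given scenario, e.g., by gradually reducing it from 1 until the number of false positives reaches a desired value, such as 0.1% of samples.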
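A difference stage 322 may compute the difference between the scaled-down signal and the second low-pass signal to obtain a difference signal. For example, for a sample of the scaled-down signal, the sample of the second low-pass signal that has the same index, or the same position in the series of samples, may be identified and subtracted from the scaled-down sample.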
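The samples of the difference signal may be interpreted to obtain a speech decision 324. In some implementations, the scale-down factor may be chosen so that difference values greater than zero are likely to be speech samples with acceptable confidence. This occurs when the second low-pass signal is less than the scaled-down signal. When a difference value is judged to correspond to speech, the sample of input signal 302 with the same index may be judged to correspond to speech and may be passed to another device or another processing stage for speech-to-text analysis or other processing.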
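Various modifications may be made to system 300. For example, a single low-pass filter 310 and selective sampling stage 312 may be used, followed by low-pass filter 318, with low-pass filter 314 and selective sampling stage 316 omitted. Alternatively, one or more additional combinations, each comprising a low-pass filter followed by a selective sampling stage, may be inserted between selective sampling stage 316 and low-pass filter 318.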
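Difference stage 322 may take as input any signal from any preceding stage, e.g., the signals output by some or all of low-pass filters 310, 314, and 318. The difference function may therefore be a function that scales, adds, or subtracts these signals to achieve the accuracy required for speech identification.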
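System 300 may implement Algorithm 1 below. Algorithm 1 may be executed in order for each sample s[n] of input signal 302 (n is an index from 0 to N-1, where N is the total number of samples). Although "s[n]" is used, it should be understood that samples of a signal derived from s[n] may be used instead, such as a band-pass filtered version of input signal 302 or the Teager energy T[n] computed, as described above, for input signal 302 or its band-pass filtered version. Algorithm 2 may be used as an alternative to Algorithm 1; it applies decay to account for periods of zero amplitude or non-speech.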
Algorithm 1: Voice activity detection by selective sampling with reference to low-pass filtered signals
x = Abs(s[n]); //absolute value of s[n]
f1 = alpha * f1 + (1-alpha) * x;
if (x > m * f1) {
f3 = alpha * f3 + (1-alpha) * x;
}
if (x > m * f3) {
f5 = alpha * f5 + (1-alpha) * x;
}
d = (f5 * mult) - f3;
if (d > 0) {
speech = 1;
}
Algorithm 2: Voice activity detection by selective sampling with reference to low-pass filtered signals, with decay of filter values
x = Abs(s[n]); //absolute value of s[n]
f1 = alpha * f1 + (1-alpha) * x;
if (x + offset3 > m * f1) {
f3 = alpha * f3 + (1-alpha) * x;
} else {
f3 = beta * f3;
f5 = beta * f5;
}
if (x + offset5 > m * f3) {
f5 = alpha * f5 + (1-alpha) * x;
} else {
f5 = beta * f5;
}
d = (f5 * mult) - f3;
if (d > 0) {
speech = 1;
}
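The computations of f1, f3, and f5 in Algorithms 1 and 2 implement low-pass filtering (low-pass filters 310, 314, and 318, respectively). alpha is a low-pass filter coefficient and may be a value between 0.98 and 0.9999. For example, a value of 0.99 has been found to be effective. The same alpha value or different alpha values may be used to compute f1, f3, f5, or any other low-pass filtering step.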
The "if" statements in Algorithms 1 and 2 may correspond to selective sampling stages 312 and 316. The value of m may be selected through a tuning process. In some implementations, m may be a value between 1.3 and 1.7. For example, a value of 1.5 has been found to be acceptable. The same m value or different m values may be used in the "if" statements that evaluate low-pass filtered signals f1 and f3, or however many other low-pass filtered signals are computed.
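Multiplying f5 by "mult" in Algorithms 1 and 2 implements the scale-down factor of scale-down stage 320. Accordingly, mult may be a value less than one, selected as described above for the scale-down factor so as to achieve an acceptable number of false positives.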
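Difference stage 322 corresponds to the computation of d. When d is greater than zero, the sample s[n] from which x was computed may be deemed to correspond to speech according to Algorithms 1 and 2. Note that in an embodiment that performs only one instance of filtering and selective sampling, the equation "d = (f5 * mult) - f3" may be replaced with "d = (f3 * mult) - f1". Similarly, when more than two instances of filtering and selective sampling are performed, d may be computed as "d = (fx * mult) - fy", where fx is the filtering result of the last instance and fy is the filtering result of a preceding instance, e.g., the second-to-last.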
The value of beta in Algorithm 2 may be a decay factor less than 1, e.g., between 0.999 and 0.9999. The decay produced by multiplying by beta may be very slow, which accounts for the fact that speech may go undetected for many samples, e.g., hundreds or thousands. Without the decay factor, f1, f3, and/or f5 may change abruptly and cause unnecessary false positives. In Algorithm 1, decay by beta is omitted, and the possibility of false positives is either handled by a subsequent stage or simply accepted.
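To put the quoted beta range in perspective, the number of consecutive decay steps needed to shrink a filter value by a given fraction follows from beta^n <= fraction. The helper below is an illustrative calculation, not part of the described algorithms; its name and the default fraction of one half are assumptions.

```python
import math


def samples_to_decay(beta, fraction=0.5):
    """Smallest n such that beta ** n <= fraction, for 0 < beta < 1."""
    return math.ceil(math.log(fraction) / math.log(beta))
```

For beta = 0.999 a value halves after roughly 700 decay steps, and for beta = 0.9999 after roughly 7000, which is consistent with the "hundreds or thousands" of samples mentioned above.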
In some implementations, a run of zero-valued input samples may be handled using offset3 and offset5 as shown in Algorithm 2. The values of offset3 and offset5 may be the same or different. They may be on the order of the smallest value representable with the number of bits and format of x. For example, supposing x is a 12-bit unsigned number (x is an absolute value and is therefore always positive), offset3 and offset5 may equal 2^(-11). Alternatively, offset3 and offset5 may equal some multiple (e.g., 2 to 10) of the smallest representable value. As Algorithm 2 shows, when there is a run of zero-valued samples, the low-pass filter value f1 will eventually reach zero as well. Adding offset3 or offset5 to a zero-valued x means the condition of the "if" statement is still satisfied, which avoids a discontinuity and ensures that f3 and f5 also decay to zero in response to a run of zero-valued samples. The use of offset3 and offset5 shown in Algorithm 2 may replace the decay using beta or may be combined with it. Likewise, the decay using beta may be used without offset3 and offset5 in the "if" statements.
As can be seen, Algorithms 1 and 2 require only multiplication, addition, and subtraction. The values carried across iterations are only alpha, m, mult, f1, f3, and f5 (plus beta where decay is implemented). The computation and memory needed to implement Algorithm 1 are therefore very low, and Algorithm 1 provides a low-power, high-accuracy way to identify potential speech.
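The per-sample update of Algorithm 2 can be sketched as a small stateful class. This is an illustrative sketch, not the patented implementation: the class name is hypothetical, the default `mult` of 0.9 is an assumed value (the text only says it is less than one and chosen experimentally), and the decision is returned per sample rather than stored in a `speech` variable. With `beta=1.0` and both offsets at zero, the update reduces to Algorithm 1.

```python
class SelectiveSamplingVAD:
    """Per-sample detector following the structure of Algorithm 2."""

    def __init__(self, alpha=0.99, m=1.5, mult=0.9,
                 beta=1.0, offset3=0.0, offset5=0.0):
        self.alpha, self.m, self.mult = alpha, m, mult
        self.beta, self.offset3, self.offset5 = beta, offset3, offset5
        self.f1 = self.f3 = self.f5 = 0.0  # low-pass filter states

    def step(self, sample):
        """Process one sample; return 1 if it is judged to be speech, else 0."""
        a = self.alpha
        x = abs(sample)                           # rectifier
        self.f1 = a * self.f1 + (1.0 - a) * x     # first low-pass filter
        if x + self.offset3 > self.m * self.f1:   # first selective sampling
            self.f3 = a * self.f3 + (1.0 - a) * x
        else:
            self.f3 = self.beta * self.f3         # slow decay
            self.f5 = self.beta * self.f5
        if x + self.offset5 > self.m * self.f3:   # second selective sampling
            self.f5 = a * self.f5 + (1.0 - a) * x
        else:
            self.f5 = self.beta * self.f5
        d = self.f5 * self.mult - self.f3         # scale-down and difference
        return 1 if d > 0 else 0
```

A caller would create one instance per audio stream and call `step` once per sample; the only state kept between calls is the three filter values.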
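FIG. 4 shows plots of various signals that may be present during operation of system 300. Plot 400a includes a curve 402 of a signal that contains speech and recurring periods of noise. Curve 404 shows the speech decision for the samples of the signal represented by curve 402 (high values indicate speech, low values indicate non-speech).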
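Plot 400b shows curves of the internal signals of system 300, including a curve 406 of f1, a curve 408 of f3, and a curve 410 of f5. As can be seen, each signal is smoothed relative to the one computed before it (f3 is smoother than f1, and f5 is smoother than f3). It is also apparent that the noise periods of the original signal (curve 402) that are not identified as speech lie below f1, f3, and f5, which are additionally scaled up by m before the comparison.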
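FIG. 5 is a block diagram of a system 500 that may combine voice activity detection system 100 and voice activity detection system 300 described above. System 500 may include a microphone 502, which may be a single microphone or a microphone array. The output of microphone 502 may be pre-processed by low-pass filtering, band-pass filtering, or other types of processing to condition it for subsequent processing.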
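The output of microphone 502 may be input to voice activity detection system 100. System 100 identifies first portions of the microphone output that likely correspond to speech, using the approach described above with reference to FIGS. 1 and 2. The first portions may be input to voice activity detection system 300. For example, system 100 may wake system 300 to process a first portion, system 300 otherwise being powered off or in a sleep mode that uses less power than when system 300 is awake. System 300 may process the first portions using the approach described above with reference to FIGS. 3 and 4 and identify second portions that likely correspond to speech. It is expected that some portions identified as speech by system 100 will not be identified as speech by system 300.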
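The two-stage arrangement can be sketched as a gating loop in which the second stage runs only on portions the first stage flags. This is a schematic sketch: `stage1` and `stage2` are hypothetical callables standing in for systems 100 and 300, and the run counter is only there to show how often the second stage would have to be woken.

```python
def cascade(portions, stage1, stage2):
    """Return the portions confirmed by both stages and the stage-2 run count.

    stage1 and stage2 are hypothetical callables that take a portion (a list
    of samples) and return True if it likely contains speech.
    """
    confirmed = []
    stage2_runs = 0
    for portion in portions:
        if not stage1(portion):
            continue            # stage 2 stays asleep for this portion
        stage2_runs += 1        # stage 2 is woken only here
        if stage2(portion):
            confirmed.append(portion)
    return confirmed, stage2_runs
```

Portions confirmed by the second stage would then be handed to the downstream speech processing system.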
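The second portions identified by voice activity detection system 300 may be input to another speech processing system 504. Speech processing system 504 may perform any speech processing function known in the art, such as speech-to-text conversion, voice authentication, or the like.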
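Each of the components in FIG. 5 (system 100, system 300, speech processing system 504) may be a separate hardware device, such as a separate semiconductor chip, a separate circuit board, or a separate, independently operating computing device. Alternatively, any two or more of these components may be different executable modules executing on the same hardware device.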
FIG. 6 is a block diagram of a computing device 600. Computing device 600 may be used to perform various procedures, such as those discussed herein.
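Computing device 600 includes one or more processors 602, one or more storage devices 604, one or more interfaces 606, one or more mass storage devices 608, one or more input/output (I/O) devices 610, and a display device 630, all of which are coupled to a bus 612. Processor 602 includes one or more processors or controllers that execute instructions stored in storage device 604 and/or mass storage device 608. Processor 602 may also include various types of computer-readable media, such as cache memory.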
Storage device 604 includes various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 614) and/or non-volatile memory (e.g., read-only memory (ROM) 616). Storage device 604 may also include rewritable memory, such as flash memory.
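Mass storage device 608 includes various computer-readable media, such as magnetic tapes, magnetic disks, optical discs, solid-state memory (e.g., flash memory), and so forth. As shown in FIG. 6, a particular mass storage device is a hard disk drive 624. Mass storage device 608 may also include various drives that enable reading from and/or writing to the various computer-readable media. Mass storage device 608 includes removable media 626 and/or non-removable media.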
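I/O devices 610 include various devices that allow data and/or other information to be input to or retrieved from computing device 600. Examples of I/O devices 610 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, cameras or charge-coupled devices (CCDs), and the like.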
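Display device 630 includes any type of device capable of displaying information to one or more users of computing device 600. Examples of display device 630 include a monitor, a display terminal, an image projection device, and the like.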
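Interfaces 606 include various interfaces that allow computing device 600 to interact with other systems, devices, or computing environments. Examples of interfaces 606 include any number of different network interfaces 620, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet. Other interfaces include a user interface 618 and a peripheral device interface 622. Interfaces 606 may also include one or more peripheral interfaces, such as interfaces for printers, pointing devices (mice, trackpads, etc.), keyboards, and the like.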
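Bus 612 allows processor 602, storage device 604, interfaces 606, mass storage device 608, I/O devices 610, and display device 630 to communicate with one another and with other devices or components coupled to bus 612. Bus 612 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.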
For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 600 and are executed by processor 602. Alternatively, the systems and procedures described herein may be implemented in hardware, or in a combination of hardware, software, and/or firmware. For example, one or more application-specific integrated circuits (ASICs) may be programmed to carry out one or more of the systems and procedures described herein.
In the above disclosure, reference has been made to the accompanying drawings, which form a part of the disclosure and which show, by way of illustration, specific implementations in which the invention may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the invention. An embodiment described in the specification may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes that feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.
Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special-purpose or general-purpose computer including computer hardware, such as one or more processors and system memory, as discussed herein. Implementations within the scope of this disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example and not limitation, implementations of the invention can comprise at least two distinct kinds of computer-readable media: computer storage media (devices) and transmission media.
Computer storage media (devices) include RAM, ROM, EEPROM, CD-ROM, solid-state drives (SSDs) (e.g., based on RAM), flash memory, phase-change memory (PCM), other types of memory, other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of computer-executable instructions or data structures and that can be accessed by a general-purpose or special-purpose computer.
Implementations of the devices, systems, and methods disclosed herein may communicate over a computer network. A "network" is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided to a computer over a network or another communications connection (wired, wireless, or a combination of wired and wireless), the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links that can be used to carry desired program code in the form of computer-executable instructions or data structures and that can be accessed by a general-purpose or special-purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including in-dash vehicle computers, personal computers, desktop computers, laptop computers, message processors, handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile phones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked through a network (by hardwired data links, wireless data links, or a combination of hardwired and wireless data links), both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Further, where appropriate, the functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application-specific integrated circuits can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the description and claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name but not in function.
It should be noted that the sensor embodiments discussed above may comprise computer hardware, software, firmware, or any combination thereof to perform at least a portion of their functions. For example, a sensor may include program code configured to be executed in one or more processors and may include hardware logic or circuitry controlled by the program code. These example devices are provided herein for purposes of illustration and are not intended to be limiting. Embodiments of the present disclosure may be implemented in other types of devices, as would be known to persons skilled in the relevant art.
At least some embodiments of the disclosure are directed to computer program products comprising logic (e.g., in the form of software) stored on any computer-usable medium. Such software, when executed in one or more data processing devices, causes a device to operate as described herein.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described embodiments, but should be defined only in accordance with the claims and their equivalents. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternative implementations may be used in any combination desired to form additional hybrid implementations of the invention.
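100: Voice activity detection system
102: Input signal
104: Band-pass filter
106: Adder
108: High-frequency signal
110: Zero-crossing detector
112: Pulse density detector
114: Low-pass filter
116: Comparator
118: Speech threshold
120: Speech decision
122: Statistical analyzer
200a: Plot
200b: Plot
202: Speech signal
204: Curve
206: Curve
208: Curve
300: Voice activity detection system
302: Input signal
304: Speech band-pass filter
306: Teager energy calculator
308: Rectifier
310: Low-pass filter
312: Selective sampling stage
314: Low-pass filter
316: Selective sampling stage
318: Low-pass filter
320: Scale-down stage
322: Difference stage
324: Speech decision
400a: Plot
400b: Plot
402: Curve
404: Curve
406-410: Curves
500: System
502: Microphone
504: Speech processing system
600: Computing device
602: Processor
604: Storage device
606: Interface
608: Mass storage device
610: Input/output device
612: Bus
614: Random access memory
616: Read-only memory
618: User interface
620: Network interface
622: Peripheral device interface
630: Display device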
FIG. 1 is a block diagram of components for performing voice activity detection based on zero-crossing detection according to an embodiment of the present invention.
FIG. 2 shows plots of an audio signal and of signals derived from it for voice activity detection based on zero-crossing detection according to an embodiment of the present invention.
FIG. 3 is a block diagram of components for performing voice activity detection based on statistical properties of samples of an audio signal according to an embodiment of the present invention.
FIG. 4 shows plots of an audio signal and of signals derived from it for voice activity detection based on statistical properties of samples of the audio signal according to an embodiment of the present invention.
FIG. 5 is a block diagram of a speech processing system according to an embodiment of the present invention.
FIG. 6 is a block diagram of a computing device according to an embodiment of the present invention.
Claims (20)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/081,640 (US20220130405A1) | 2020-10-27 | 2020-10-27 | Low Complexity Voice Activity Detection Algorithm |
US17/081,378 (US11790931B2) | 2020-10-27 | 2020-10-27 | Voice activity detection using zero crossing detection |
Publications (1)
Publication Number | Publication Date |
---|---|
TW202226225A | 2022-07-01 |
Family
ID=81383171
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110139244A (TW202226226A) | 2020-10-27 | 2021-10-22 | Apparatus and method with low complexity voice activity detection algorithm |
TW110139243A (TW202226225A) | 2020-10-27 | 2021-10-22 | Apparatus and method for improved voice activity detection using zero crossing detection |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110139244A (TW202226226A) | 2020-10-27 | 2021-10-22 | Apparatus and method with low complexity voice activity detection algorithm |
Country Status (2)
Country | Link |
---|---|
TW (2) | TW202226226A (en) |
WO (2) | WO2022093705A1 (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6240386B1 (en) * | 1998-08-24 | 2001-05-29 | Conexant Systems, Inc. | Speech codec employing noise classification for noise compensation |
US7072833B2 (en) * | 2000-06-02 | 2006-07-04 | Canon Kabushiki Kaisha | Speech processing system |
US20020039425A1 (en) * | 2000-07-19 | 2002-04-04 | Burnett Gregory C. | Method and apparatus for removing noise from electronic signals |
US7167568B2 (en) * | 2002-05-02 | 2007-01-23 | Microsoft Corporation | Microphone array signal enhancement |
EP1443498B1 (en) * | 2003-01-24 | 2008-03-19 | Sony Ericsson Mobile Communications AB | Noise reduction and audio-visual speech activity detection |
WO2008106036A2 (en) * | 2007-02-26 | 2008-09-04 | Dolby Laboratories Licensing Corporation | Speech enhancement in entertainment audio |
US9053697B2 (en) * | 2010-06-01 | 2015-06-09 | Qualcomm Incorporated | Systems, methods, devices, apparatus, and computer program products for audio equalization |
US9269368B2 (en) * | 2013-03-15 | 2016-02-23 | Broadcom Corporation | Speaker-identification-assisted uplink speech processing systems and methods |
WO2014189931A1 (en) * | 2013-05-23 | 2014-11-27 | Knowles Electronics, Llc | Vad detection microphone and method of operating the same |
US10360926B2 (en) * | 2014-07-10 | 2019-07-23 | Analog Devices Global Unlimited Company | Low-complexity voice activity detection |
-
2021
- 2021-10-22 TW TW110139244A patent/TW202226226A/en unknown
- 2021-10-22 TW TW110139243A patent/TW202226225A/en unknown
- 2021-10-25 WO PCT/US2021/056479 patent/WO2022093705A1/en active Application Filing
- 2021-10-25 WO PCT/US2021/056473 patent/WO2022093702A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
TW202226226A (en) | 2022-07-01 |
WO2022093705A1 (en) | 2022-05-05 |
WO2022093702A1 (en) | 2022-05-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP5905608B2 (en) | Voice activity detection in the presence of background noise | |
KR101437830B1 (en) | Method and apparatus for detecting voice activity | |
JP3963850B2 (en) | Voice segment detection device | |
JP4863713B2 (en) | Noise suppression device, noise suppression method, and computer program | |
JP4842583B2 (en) | Method and apparatus for multisensory speech enhancement | |
KR100930060B1 (en) | Recording medium on which a signal detecting method, apparatus and program for executing the method are recorded | |
US8046215B2 (en) | Method and apparatus to detect voice activity by adding a random signal | |
US20140067388A1 (en) | Robust voice activity detection in adverse environments | |
JP2005227782A (en) | Apparatus and method for detecting voiced sound and unvoiced sound | |
CN107358964B (en) | Method for detecting an alert signal in a changing environment | |
CN112309414B (en) | Active noise reduction method based on audio encoding and decoding, earphone and electronic equipment | |
CN108053834B (en) | Audio data processing method, device, terminal and system | |
KR100930061B1 (en) | Signal detection method and apparatus | |
EP1548703A1 (en) | Apparatus and method for voice activity detection | |
WO2017128910A1 (en) | Method, apparatus and electronic device for determining speech presence probability | |
TW202226225A (en) | Apparatus and method for improved voice activity detection using zero crossing detection | |
CN113314134B (en) | Bone conduction signal compensation method and device | |
US11790931B2 (en) | Voice activity detection using zero crossing detection | |
JP7152112B2 (en) | Signal processing device, signal processing method and signal processing program | |
US20220130405A1 (en) | Low Complexity Voice Activity Detection Algorithm | |
CN111370033A (en) | Keyboard sound processing method and device, terminal equipment and storage medium | |
US9413323B2 (en) | System and method of filtering an audio signal prior to conversion to an MU-LAW format | |
CN113470621B (en) | Voice detection method, device, medium and electronic equipment | |
CN112312258B (en) | Intelligent earphone with hearing protection and hearing compensation | |
KR102044962B1 (en) | Environment classification hearing aid and environment classification method using thereof |