TWI839132B - Voice activity detection system - Google Patents
- Publication number
- TWI839132B (application TW112106990A)
- Authority
- TW
- Taiwan
- Prior art keywords
- voice
- activity detection
- detection system
- critical value
- voice activity
Description
The present invention relates to voice activity detection (VAD), and more particularly to a voice activity detection system with an adaptive threshold.
Voice activity detection detects or recognizes human speech and is used mainly in speech processing. It can be used to activate voice-based applications. During non-speech periods, voice activity detection shuts down part of the processing to avoid unnecessary transmission, thereby reducing communication bandwidth and power consumption.
Conventional voice activity detection systems are error-prone and unreliable, particularly in noisy environments. A novel mechanism is therefore needed to overcome the shortcomings of conventional voice activity detection systems.
In view of the foregoing, one object of the embodiments of the present invention is to provide a voice activity detection system with an adaptive threshold that adapts to environmental changes and overcomes noise, thereby outputting reliable and accurate detection results.
According to an embodiment of the present invention, a voice activity detection system includes a voice frame detector and a voice detector. The voice frame detector detects voice frames in which the voice signal is not static. The voice detector detects human speech according to the voice frames.
In one embodiment, the voice activity detection system further includes a threshold update unit that, according to the result of the voice detector's detection of human speech, updates the corresponding threshold used to detect human speech.
100: Voice activity detection system
100A: Voice activity detection system
100B: Voice activity detection system
11: Transducer
12: Voice frame detector
13: Voice detector
14: Threshold update unit
15: Controller
16: Image sensor
17: Artificial intelligence engine
18: Speech recognition unit
19: Face recognition unit
200: Voice activity detection method
21: Convert sound into a voice signal
22: Detect voice frames in which the voice signal is not static
23: Is human speech detected?
24: Update threshold
TH_B: First threshold
TH_C: Second threshold
FIG. 1 shows a block diagram of a voice activity detection system according to an embodiment of the present invention.
FIG. 2 shows a flow chart of a voice activity detection method according to an embodiment of the present invention.
FIG. 3A illustrates a waveform of a voice signal and its end-points.
FIG. 3B illustrates the volume and the high-order difference of the voice signal.
FIG. 3C illustrates the voice frames.
FIG. 4A illustrates a waveform of a voice signal and its end-points.
FIG. 4B illustrates the autocorrelation and the corresponding first threshold TH_B.
FIG. 4C illustrates the normalized squared difference value and the corresponding second threshold TH_C.
FIG. 5A illustrates the autocorrelation and how the updated first threshold is obtained.
FIG. 5B illustrates the normalized squared difference and how the updated second threshold is obtained.
FIG. 6 shows a block diagram of a voice activity detection system according to a first exemplary embodiment of the present invention.
FIG. 7 shows a block diagram of a voice activity detection system according to a second exemplary embodiment of the present invention.
FIG. 1 shows a block diagram of a voice activity detection (VAD) system 100 according to an embodiment of the present invention, and FIG. 2 shows a flow chart of a voice activity detection method 200 according to an embodiment of the present invention.
The voice activity detection system 100 of this embodiment may include a transducer 11, such as a microphone, that converts sound into an (electronic) voice signal (step 21).
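The voice activity detection system 100 may include a voice frame detector 12 that receives the voice signal and detects voice frames in which the voice signal is not static (step 22). In one embodiment, the voice frame detector 12 uses end-point detection (EPD) to determine the end-points of the voice signal, between which the voice signal is not static. In one embodiment, an end-point is determined where the amplitude of the voice signal (which represents volume) exceeds a preset threshold. In another embodiment, an end-point is determined where the high-order difference (HOD) of the voice signal (which represents slope) exceeds a preset threshold. FIG. 3A illustrates a waveform of a voice signal and its end-points, FIG. 3B illustrates the volume and the high-order difference of the voice signal, and FIG. 3C illustrates the voice frames.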
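The volume-based end-point detection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the fixed frame size, and the definition of volume as the sum of absolute amplitudes per frame are all hypothetical choices. A helper for a high-order difference is included for the slope-based variant.

```python
def frame_volume(frame):
    """Volume of one frame, assumed here to be the sum of absolute amplitudes."""
    return sum(abs(x) for x in frame)


def high_order_difference(values, order=1):
    """Apply the first difference `order` times and return absolute values."""
    diff = list(values)
    for _ in range(order):
        diff = [b - a for a, b in zip(diff, diff[1:])]
    return [abs(d) for d in diff]


def find_endpoints(signal, frame_size, volume_threshold):
    """Return (start, end) sample indices of runs of frames above the threshold."""
    segments = []
    start = None
    n_frames = len(signal) // frame_size
    for i in range(n_frames):
        frame = signal[i * frame_size:(i + 1) * frame_size]
        active = frame_volume(frame) > volume_threshold
        if active and start is None:
            start = i * frame_size                    # opening end-point
        elif not active and start is not None:
            segments.append((start, i * frame_size))  # closing end-point
            start = None
    if start is not None:                             # signal ends while active
        segments.append((start, n_frames * frame_size))
    return segments
```

Each returned pair brackets a stretch in which the signal is treated as not static; those stretches would then be passed on as voice frames.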
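The voice activity detection system 100 of this embodiment may include a voice detector 13 that detects human speech according to the voice frames (step 23).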
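In this embodiment, human speech is detected (by the voice detector 13) when a value of similarity or correlation between voice frames is greater than a corresponding threshold. Specifically, an autocorrelation (function) is performed on the voice frame to determine an autocorrelation value, which represents the similarity between the voice frame and a (delayed) voice frame with a delay time (or detects pitch). The autocorrelation function (ACF) may be expressed as an equation, which is not reproduced in this text.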
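A common textbook form of the autocorrelation can be sketched as follows. It is an assumption, not necessarily the patent's exact expression, which may differ in normalization or summation limits; the function name and the `max_lag` parameter are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return ACF(lag) for lag = 0..max_lag.

    Assumed definition: ACF(lag) = sum over i of frame[i] * frame[i + lag],
    where the sum runs over the overlapping part of the frame and its
    delayed copy. Requires max_lag < len(frame).
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]
```

A periodic frame produces peaks at lags equal to multiples of its period, which is why the autocorrelation value can indicate pitch.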
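In this embodiment, a normalized squared difference (function) is further performed on voice frames (for example, the voice frame and the voice frame with a delay time) to determine a normalized squared difference value. The normalized squared difference function (NSDF) may be expressed as an equation, which is not reproduced in this text.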
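One widely used normalization, known from the McLeod pitch method, divides twice the autocorrelation by the summed energy of the frame and its delayed copy. The sketch below uses that form as an assumption; the patent's own NSDF expression may differ, and the function name and `max_lag` parameter are hypothetical.

```python
def nsdf(frame, max_lag):
    """Return NSDF(lag) for lag = 0..max_lag, each value in [-1, 1].

    Assumed definition: NSDF(lag) = 2 * r(lag) / m(lag), where
      r(lag) = sum of frame[i] * frame[i + lag]
      m(lag) = sum of frame[i]**2 + frame[i + lag]**2
    over the overlapping samples. Requires max_lag < len(frame).
    """
    n = len(frame)
    values = []
    for lag in range(max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        m = sum(frame[i] ** 2 + frame[i + lag] ** 2 for i in range(n - lag))
        values.append(2.0 * r / m if m else 0.0)  # silent frame: define as 0
    return values
```

Because the result is normalized, a threshold on it is less sensitive to overall loudness than a threshold on the raw autocorrelation.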
In this embodiment, human speech is detected when the autocorrelation value is greater than a first threshold and the normalized squared difference value is greater than a second threshold. FIG. 4A illustrates a waveform of a voice signal and its end-points, FIG. 4B illustrates the autocorrelation and the corresponding first threshold TH_B, and FIG. 4C illustrates the normalized squared difference value and the corresponding second threshold TH_C.
Returning to FIG. 2, if human speech is detected, another voice frame is detected. If human speech is not detected (indicating that noise is detected), the threshold corresponding to the similarity between voice frames is updated (or adjusted) in step 24 before another voice frame is detected. In this way, the voice activity detection system 100 and the voice activity detection method 200 determine the threshold adaptively according to the speech detection result and thus adapt to the current environment, unlike conventional voice activity detection systems and methods, which use a fixed threshold.
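The loop formed by steps 23 and 24 of FIG. 2 can be sketched as follows. This is an illustration under assumptions: `measure` and `update_thresholds` are hypothetical callables standing in for the voice detector's measurements and the threshold update, and the source does not specify how they are implemented at this level.

```python
def run_vad(frames, measure, update_thresholds, th_b, th_c):
    """Process frames in order; return per-frame decisions and final thresholds.

    measure(frame) -> (acf_value, nsdf_value)          (hypothetical callable)
    update_thresholds(frame) -> (new_th_b, new_th_c)   (hypothetical callable)
    """
    decisions = []
    for frame in frames:
        acf_value, nsdf_value = measure(frame)
        # Step 23: speech only if both values exceed their thresholds.
        speech = acf_value > th_b and nsdf_value > th_c
        decisions.append(speech)
        if not speech:
            # Step 24: the frame is treated as noise, so adapt the thresholds
            # before the next frame is examined.
            th_b, th_c = update_thresholds(frame)
    return decisions, (th_b, th_c)
```

Thresholds change only on frames classified as noise, so they track the background rather than the speech itself.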
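The voice activity detection system 100 of this embodiment may include a threshold update unit 14, which is activated (when human speech is not detected) by an activation signal (issued by the voice detector 13) to determine an updated (first/second) threshold. The activation signal becomes active when human speech is not detected.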
FIG. 5A illustrates the autocorrelation and how the updated first threshold is obtained. In this embodiment, the updated first threshold is obtained by subtracting the maximum autocorrelation value within a specific range (for example, max(ACF(62:188))) from the autocorrelation value without delay (that is, ACF(0)).
FIG. 5B illustrates the normalized squared difference and how the updated second threshold is obtained. In this embodiment, the updated second threshold equals the maximum autocorrelation value within a specific range (for example, max(ACF(62:188))).
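The two update rules can be written out directly. In the sketch below the function name is hypothetical, and the range 62:188 is assumed to mean lags 62 through 188 inclusive; the source does not state whether the upper bound is included.

```python
def updated_thresholds(acf, lo=62, hi=188):
    """Return (updated first threshold, updated second threshold).

    acf is a list indexed by lag, so acf[0] is the value without delay.
    The lag range lo..hi is assumed inclusive.
    """
    peak = max(acf[lo:hi + 1])   # max(ACF(62:188))
    th_b = acf[0] - peak         # first threshold: ACF(0) minus that peak
    th_c = peak                  # second threshold: the peak itself, as stated
    return th_b, th_c
```

For example, with ACF(0) = 100 and a largest in-range value of 30, the updated first threshold is 70 and the updated second threshold is 30.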
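According to the embodiments described above, because the threshold used in detecting human speech is determined adaptively, the voice activity detection system 100 and the voice activity detection method 200 can adapt to environmental changes and overcome noise, thereby outputting reliable and accurate detection results.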
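FIG. 6 shows a block diagram of a voice activity detection system 100A according to a first exemplary embodiment of the present invention. In this embodiment, (only) when human speech is detected, the voice detector 13 issues a voice trigger signal to a controller 15, which issues an image trigger signal to wake up an image sensor 16 (for example, a contact image sensor (CIS)) to capture an image. Notably, the image sensor 16 normally stays in a low-power mode or sleep mode until the image trigger signal becomes active. Power consumption and communication bandwidth are thereby substantially reduced.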
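The trigger chain can be sketched as a small state holder. This is only an illustration: the class and method names are hypothetical, the capture is a placeholder string, and the source does not say when the sensor returns to sleep, so the sketch simply leaves it awake once triggered.

```python
class Controller:
    """Wakes a sleeping image sensor only when a voice trigger arrives."""

    def __init__(self):
        self.sensor_awake = False   # sensor starts in low-power or sleep mode
        self.captured = []

    def on_frame(self, speech_detected):
        # The voice trigger is issued only when human speech is detected.
        if speech_detected:
            self.sensor_awake = True        # image trigger becomes active
            self.captured.append("image")   # placeholder for a real capture
        return self.sensor_awake
```

Frames classified as noise never reach the sensor, which is where the power and bandwidth saving comes from.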
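In this embodiment, the voice activity detection system 100A may include an artificial intelligence (AI) engine 17, such as an artificial neural network, that analyzes the images captured by the image sensor 16 and sends the analysis results to the controller 15, which performs a specific function or application according to the analysis results.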
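FIG. 7 shows a block diagram of a voice activity detection system 100B according to a second exemplary embodiment of the present invention. The voice activity detection system 100B of FIG. 7 is similar to the voice activity detection system 100A of FIG. 6; the differences are described below.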
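In this embodiment, the voice activity detection system 100B may further include a speech recognition unit 18 that, according to the voice frames (from the voice frame detector 12), recognizes spoken language and may even translate it into text, identifies the speaker, or does both.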
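The voice activity detection system 100B of this embodiment may further include a face recognition unit 19 that recognizes faces in the images captured by the image sensor 16. The face recognition unit 19 is activated only when the image trigger signal (of the controller 15) becomes active.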
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the claims; any equivalent change or modification made without departing from the spirit disclosed by the invention shall fall within the scope of the claims set out below.
Claims (11)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/839,962 (US20230402057A1) | 2022-06-14 | 2022-06-14 | Voice activity detection system |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202349378A | 2023-12-16 |
TWI839132B | 2024-04-11 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150112689A1 (en) | 2013-10-18 | 2015-04-23 | Knowles Electronics Llc | Acoustic Activity Detection Apparatus And Method |