TW201627990A

TW201627990A - Time domain based voice event detection method and related device

Info

Publication number: TW201627990A
Application number: TW104102008A
Authority: TW
Inventors: 劉宜文; 林伯亮
Original assignee: 宇智網通股份有限公司
Priority date: 2015-01-21
Filing date: 2015-01-21
Publication date: 2016-08-01
Also published as: TWI559300B

Abstract

A time domain based voice event detection method is provided. The method includes: converting an analog audio signal into an audio digital signal according to a sample rate and defining a plurality of sample points as a frame, wherein the sample value of each sample point of the audio digital signal is in decibels; for each sample point of the frame, calculating average differences of a plurality of different sample point groups corresponding to the sample point to generate a plurality of average difference values; determining whether the sample point is a candidate sample point of a voice event according to the plurality of average difference values, and determining whether a voice event occurs in the frame according to the ratio of the number of candidate sample points of the frame to the total number of sample points of the frame, to generate a determination result; and outputting a control signal for controlling an application function according to the determination result.

Description

Sound event detecting method based on time domain operation and related device

The present invention relates to a sound event detection method and a related device, and more particularly to a sound event detection method and related device that operate in the time domain and do not require a complicated fast Fourier transform (FFT) operation.

Sound detection is one of the important processing steps in speech recognition applications. Conventional approaches, however, usually require a large number of complex conversion calculations, such as a fast Fourier transform (FFT) that converts the audio signal from the time domain to the frequency domain before the relevant operations can be performed. This consumes excessive system resources and conversion time, which makes such approaches unsuitable for real-time applications. The prior art therefore needs improvement.

To solve the above problem, the present invention provides a sound event detection method based on time domain operation, and a related device, that effectively reduce the amount of system computation and are suitable for real-time applications.

The invention discloses a sound event detection method based on time domain operation, comprising: converting an audio analog signal into an audio digital signal according to a sampling rate, wherein the sample value of each sampling point is expressed in decibels (dB); defining a certain number of sampling points as a frame and, for each sampling point in the frame, calculating the average differences of a plurality of different sampling groups corresponding to that sampling point to generate a plurality of average difference values; determining, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and determining whether the sound event occurs in the frame according to the ratio of the number of sampling points determined to be candidate sampling points in the frame to the total number of sampling points in the frame, to generate a determination result; and outputting a control signal according to the determination result to control an application function.

The invention further discloses a sound event detection device based on time domain operation, comprising: an analog-to-digital conversion unit for converting an audio analog signal into an audio digital signal according to a sampling rate and defining a plurality of sampling points of the audio digital signal as a frame, wherein the sample value of each sampling point is expressed in decibels (dB); a calculating unit for calculating, for each sampling point in the frame, the average differences of a plurality of different sampling groups corresponding to that sampling point to generate a plurality of average difference values; a determining unit for determining, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and for determining whether the sound event occurs in the frame according to the ratio of the number of sampling points determined to be candidate sampling points in the frame to the total number of sampling points in the frame, to generate a determination result; and a control unit for outputting a control signal according to the determination result to control an application function.

10‧‧‧Sound event detection device

102‧‧‧Analog-to-digital conversion unit

104‧‧‧Calculating unit

106‧‧‧Determining unit

108‧‧‧Control unit

20‧‧‧Process

202, 204, 206, 208‧‧‧Steps

AD1, AD2, AD3‧‧‧Average difference values

SP1, SP2, SP3, SP4, SP5, SP6, SP7, SP8‧‧‧Sampling points

TH1, TH2‧‧‧Thresholds

V1, V2, V3, V4, V5, V6, V7, V8‧‧‧Sample values

FIG. 1 is a schematic diagram of a sound event detection device according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of a sound event detection method according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a frame of an audio digital signal according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of the average difference values corresponding to the sampling points and the associated statistical results according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of candidate sampling points in a frame according to an embodiment of the present invention.

Please refer to FIG. 1, which is a schematic diagram of a sound event detection device 10 according to an embodiment of the present invention. The sound event detection device 10 includes an analog-to-digital conversion unit 102, a calculating unit 104, a determining unit 106, and a control unit 108. The analog-to-digital conversion unit 102 converts an audio analog signal into an audio digital signal according to a sampling rate and defines a plurality of sampling points of the audio digital signal as a frame, where the sample value of each sampling point is expressed in decibels (dB). For each sampling point in each frame of the audio digital signal, the calculating unit 104 calculates the average differences of a plurality of different sampling groups corresponding to that sampling point to generate a plurality of average difference values. The determining unit 106 determines, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and determines whether a sound event occurs in the frame according to the ratio of the number of candidate sampling points in the frame to the total number of sampling points in the frame. The control unit 108 outputs a control signal according to the determination result to control a specific application function.

In short, the sound event detection device 10 operates on the sample values of the audio digital signal in the time domain to determine whether a sound event occurs, and outputs a control signal accordingly to control the relevant application function. The invention can therefore detect a sound event with time domain operations alone, without a complicated FFT operation, which effectively reduces the amount of system computation and suits real-time applications such as real-time embedded systems.

For the operation of the sound event detection device 10, please refer to FIG. 2, which is a schematic diagram of a sound event detection process 20 according to an embodiment of the present invention. The sound event detection process 20 includes the following steps:

Step 202: Convert the audio analog signal into the audio digital signal according to the sampling rate.

Step 204: For each sampling point in the frame, calculate the average differences of a plurality of different sampling groups corresponding to that sampling point to generate a plurality of average difference values.

Step 206: Determine, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and determine whether a sound event occurs in the frame according to the ratio of the number of sampling points determined to be candidate sampling points in the frame to the total number of sampling points in the frame.

Step 208: Output the control signal according to the determination result to control the application function.

These steps are described in detail below. In step 202, after the audio analog signal is received, the analog-to-digital conversion unit 102 converts it into the audio digital signal according to a sampling rate. The audio digital signal can be divided into frames for subsequent processing; that is, a plurality of sampling points of the audio digital signal are defined as one frame. The sample value of each sampling point is expressed in decibels (dB), and the number of sampling points in each frame can be set as required.
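The text does not say how sample values come to be expressed in decibels. One common convention, shown here purely as an assumption, is decibels relative to full scale for signed 16-bit PCM samples. The function name `to_decibels`, the `full_scale` value, and the `floor_db` clamp for silent samples are all hypothetical choices, not taken from the patent.

```python
import math


def to_decibels(sample, full_scale=32768.0, floor_db=-96.0):
    """Convert a linear PCM sample to dB relative to full scale (assumed convention)."""
    magnitude = abs(sample) / full_scale
    if magnitude <= 0.0:
        # log10(0) is undefined, so silent samples are clamped to the floor.
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude))
```

The floor keeps silent samples from producing negative infinity, which would otherwise dominate any later difference calculation.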

In step 204, for each sampling point in each frame of the audio digital signal, the calculating unit 104 calculates the average differences of a plurality of different sampling groups corresponding to that sampling point to generate a plurality of average difference values.

Further, in one embodiment, for each sampling point in each frame, the calculating unit 104 calculates the average differences of the plurality of different sampling groups corresponding to the sampling point according to a plurality of reference sampling rates to generate the plurality of average difference values. Preferably, the reference sampling rates are less than the sampling rate of the audio digital signal.

More specifically, for each of the sampling groups corresponding to a sampling point, the calculating unit 104 uses the corresponding reference sampling rate to calculate the absolute difference between the sample value of the sampling point and that of the sampling point located one corresponding sampling point interval away, and likewise the absolute difference between the sample value of at least one sampling point after the sampling point and that of the sampling point located one corresponding interval away from it. The absolute differences are then averaged to produce the corresponding average difference value. The length of the corresponding sampling point interval, counted in sampling points, equals the ratio of the sampling rate of the audio digital signal to the corresponding reference sampling rate. For example, please refer to FIG. 3, which is a schematic diagram of a frame F of an audio digital signal according to an embodiment of the present invention. For ease of explanation, the frame F shown in FIG. 3 is 8 sampling points long, but the invention is not limited to this; the frame F may be 256 sampling points, 512 sampling points, and so on, depending on system requirements. Assume that the sampling rate of the audio digital signal is 8 kHz (that is, 8000 sampling points per second) and that the reference sampling rates are 8 kHz, 4 kHz, and 2.67 kHz.
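The calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `average_differences`, the `group_size` parameter (the number of consecutive sampling points compared), rounding the interval to the nearest integer, and skipping pairs that would run past the end of the data are all hypothetical choices.

```python
def average_differences(values, start, sampling_rate, reference_rates, group_size=4):
    """Return one average difference value per reference sampling rate.

    values: sample values, assumed to be already expressed in dB.
    start: index of the sampling point being evaluated.
    """
    results = []
    for ref_rate in reference_rates:
        # Interval length = sampling rate / reference sampling rate,
        # rounded because 8000 / 2670 is only approximately 3.
        interval = round(sampling_rate / ref_rate)
        diffs = [
            abs(values[i] - values[i + interval])
            for i in range(start, start + group_size)
            if i + interval < len(values)  # assumption: drop out-of-range pairs
        ]
        results.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return results
```

For an 8 kHz signal with reference rates of 8 kHz, 4 kHz, and 2.67 kHz, the intervals come out as 1, 2, and 3 sampling points. How the method treats sampling points near the end of a frame is not stated in the text, so the boundary handling here is only one possible choice.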

As shown in FIG. 3, for the sampling point SP1, the sampling point SP1 and the sampling points SP2 to SP4 that follow it are first selected from the frame F. If the reference sampling rate is 8 kHz, the corresponding sampling point interval is 1 sampling point long (8 kHz / 8 kHz = 1). The calculating unit 104 then calculates the differences between the sample values of the sampling points SP1 to SP4 and the sample values of the sampling points located 1 interval away from each of them to generate the average difference value AD1. Specifically, it calculates the absolute difference between the sample values of SP1 and SP2, of SP2 and SP3, of SP3 and SP4, and of SP4 and SP5. These absolute differences are then averaged to generate the average difference value AD1, as given by formula (1).
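Formula (1) is not reproduced in this text. Based on the description above, and taking V1 to V5 to be the sample values of the sampling points SP1 to SP5, it can plausibly be reconstructed as follows (a reconstruction from the prose, not the original typeset formula):

```latex
\[
AD_1 = \frac{|V_1 - V_2| + |V_2 - V_3| + |V_3 - V_4| + |V_4 - V_5|}{4} \tag{1}
\]
```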

For the sampling point SP1, if the reference sampling rate is 4 kHz, the corresponding sampling point interval is 2 sampling points long (8 kHz / 4 kHz = 2). The calculating unit 104 calculates the differences between the sample values of the sampling points SP1 to SP4 and the sample values of the sampling points located 2 sampling points away from each of them to generate the average difference value AD2. Specifically, it calculates the absolute difference between the sample values of SP1 and SP3, of SP2 and SP4, of SP3 and SP5, and of SP4 and SP6. These absolute differences are then averaged to generate the average difference value AD2, as given by formula (2).
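Formula (2) is likewise missing from this text. Taking V1 to V6 to be the sample values of SP1 to SP6, the description above suggests the following reconstruction (inferred from the prose, not the original typeset formula):

```latex
\[
AD_2 = \frac{|V_1 - V_3| + |V_2 - V_4| + |V_3 - V_5| + |V_4 - V_6|}{4} \tag{2}
\]
```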

Similarly, for the sampling point SP1, if the reference sampling rate is 2.67 kHz, the corresponding sampling point interval is 3 sampling points long (8 kHz / 2.67 kHz ≈ 3). The calculating unit 104 calculates the differences between the sample values of the sampling points SP1 to SP4 and the sample values of the sampling points located 3 sampling points away from each of them to generate the average difference value AD3. Specifically, it calculates the absolute difference between the sample values of SP1 and SP4, of SP2 and SP5, of SP3 and SP6, and of SP4 and SP7. These absolute differences are then averaged to generate the average difference value AD3, as given by formula (3).
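Formula (3) is also absent from this text. Taking V1 to V7 to be the sample values of SP1 to SP7, the description above suggests the following reconstruction (inferred from the prose, not the original typeset formula):

```latex
\[
AD_3 = \frac{|V_1 - V_4| + |V_2 - V_5| + |V_3 - V_6| + |V_4 - V_7|}{4} \tag{3}
\]
```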

Next, in step 206, the determining unit 106 uses the average difference values calculated by the calculating unit 104 to evaluate whether each sampling point is a candidate sampling point of a sound event. That is, after the average difference values corresponding to the reference sampling rates have been calculated for a sampling point, the determining unit 106 compares each of them with a threshold TH1 and counts how many exceed TH1, as shown in FIG. 4. An average difference value greater than TH1 indicates that a sound event may be present in the signal, so TH1 is used to identify such cases. Moreover, for a given sampling point, the more of the average difference values calculated for the different reference sampling rates exceed TH1, the more likely it is that a sound event is present. Accordingly, when the number of average difference values greater than TH1 is itself greater than a threshold TH2, the determining unit 106 determines that the sampling point is a candidate sampling point. For example, suppose TH2 is 1 and, as shown in FIG. 4, the average difference values AD1, AD2, and AD3 calculated for the sampling point SP1 from the three reference sampling rates are all greater than TH1. The number of average difference values greater than TH1 is then 3, which is greater than TH2 (3 > TH2 = 1), so the sampling point SP1 is classified as a candidate sampling point, as shown in FIG. 5 (solid circles are candidate points; hollow circles are non-candidate points).
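The two-threshold test for a single sampling point can be sketched as below. The function name `is_candidate` and the numeric values in the usage note are hypothetical; the text gives no concrete value for TH1.

```python
def is_candidate(average_diffs, th1, th2):
    """Return True when more than th2 of the average difference values exceed th1."""
    # Count the average difference values that are greater than TH1.
    count = sum(1 for ad in average_diffs if ad > th1)
    # The sampling point is a candidate only if that count is greater than TH2.
    return count > th2
```

With TH2 = 1, a point whose three average difference values all exceed TH1 is a candidate (3 > 1), whereas a point with only one value above TH1 is not (1 is not greater than 1), because both comparisons are strict.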

Next, referring to FIG. 5, in step 206 the determining unit 106 determines whether a sound event occurs in the frame according to the ratio of the number of candidate sampling points in the frame to the total number of sampling points in the frame, and generates a determination result. The more sampling points in a frame are determined to be candidate sampling points, the higher the probability that a sound event has occurred. When the ratio of the number of candidate sampling points in a frame to the total number of sampling points in that frame is greater than a third threshold TH3, the determining unit 106 determines that a sound event occurs in the frame and generates the determination result accordingly. In other words, through the above steps, the invention can accurately determine whether a sound event occurs simply by operating on the sample values of the audio digital signal in the time domain.

For example, referring to FIG. 5, suppose the threshold TH3 is 0.6 and, among the sampling points SP1 to SP8 of the frame F, the sampling points SP1, SP2, SP3, SP5, SP6, SP7, and SP8 are classified as candidate sampling points in step 206. The number of candidate sampling points in the frame F is then 7 and the total number of sampling points is 8, so the ratio is 7/8 = 0.875, which is greater than the threshold TH3 (0.875 > 0.6). The determining unit 106 therefore determines that a sound event has occurred in the frame F.
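The frame-level decision in this example can be checked with a short sketch. The function name `frame_has_event` and the boolean-list representation of candidate flags are assumptions made for illustration.

```python
def frame_has_event(candidate_flags, th3):
    """Return (event_detected, ratio) for one frame of candidate flags."""
    # Ratio of candidate sampling points to all sampling points in the frame.
    ratio = sum(candidate_flags) / len(candidate_flags)
    return ratio > th3, ratio


# SP4 is the only non-candidate point in frame F of the example.
flags = [True, True, True, False, True, True, True, True]
detected, ratio = frame_has_event(flags, th3=0.6)
```

Here `ratio` is 7/8 = 0.875 and `detected` is true, matching the worked example.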

In short, the processing in step 206 identifies which sampling points in a given frame of the audio digital signal can be regarded as candidate sampling points, and evaluates whether a sound event occurs in the frame from the proportion of sampling points determined to be candidates.

In step 208, the control unit 108 outputs a control signal according to the determination result to control the relevant application function. That is, through steps 202 to 206, the sound event detection device 10 continuously performs sound event detection on the incoming audio signal. When the determination result indicates that a sound event occurs in a frame of the audio digital signal, the control unit 108 outputs a control signal to control the relevant application function. For example, suppose the sound event detection device 10 is used in a baby cry detection system (or an anti-theft system). If a baby cries (or another sound occurs) in the environment, the determination result output by the determining unit 106 indicates that a sound event occurs in a frame of the audio digital signal, and the control unit 108 outputs a control signal that triggers the baby cry detection system (or anti-theft system) to transmit the audio digital signal to a remote communication device. Alternatively, the control unit 108 outputs a control signal that triggers the baby cry detection system (or anti-theft system) to send a text message to the remote communication device, so that the user holding the communication device learns that the sound event has occurred.
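As a rough software analogy for step 208, the control signal can be modeled as a callback invoked for each frame in which a sound event was detected. The names `control_unit` and `on_event` are hypothetical, and the callback stands in for whatever the application does (for example, sending a message to a remote device); the patent does not prescribe this structure.

```python
def control_unit(frame_results, on_event):
    """Invoke on_event(frame_index) for every frame whose determination result is True."""
    triggered = []
    for index, has_event in enumerate(frame_results):
        if has_event:
            on_event(index)  # stands in for outputting the control signal
            triggered.append(index)
    return triggered


notifications = []
fired = control_unit([False, True, True], notifications.append)
```

In this run the callback fires for frames 1 and 2 only, since frame 0 had no detected event.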

Note that the sound event detection device 10 of FIG. 1 is only one example of the invention, and those of ordinary skill in the art may modify or vary it within the spirit of the invention. For example, the sampling rate may be 8 kHz, 11.025 kHz, 22.05 kHz, 16 kHz, 37.8 kHz, 44.1 kHz, or 48 kHz, but is not limited to these values. The number of sampling points in a frame can be set as required. The detection sensitivity of the sound event detection device 10 can also be adjusted by changing the thresholds TH1, TH2, and TH3. In addition, in step 204, adjacent frames of the audio digital signal overlap each other, which preserves the relationship between the data of neighboring frames.
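Overlapping frames can be produced by advancing the frame start by fewer sampling points than the frame length. The sketch below assumes a fixed hop; the function name `split_into_frames` and the `hop_length` parameter are hypothetical, and the text does not state how much adjacent frames overlap.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split samples into frames; hop_length < frame_length yields overlapping frames."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    last_start = len(samples) - frame_length
    # Trailing samples that do not fill a whole frame are dropped (an assumption).
    return [
        samples[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]
```

With a frame length of 4 and a hop of 2, each frame shares its last two sampling points with the start of the next frame.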

In summary, the sound event detection device 10 of the present invention operates on the sample values of the audio digital signal in the time domain to determine whether a sound event occurs, and outputs a control signal accordingly to control the relevant application function. The invention can therefore detect a sound event with time domain operations alone, without a complicated FFT operation, which effectively reduces the amount of system computation and suits real-time applications.

The above are only preferred embodiments of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention fall within the scope of the present invention.

Claims

A sound event detecting method based on time domain operation, comprising: converting an audio analog signal into an audio digital signal according to a sampling rate, and defining a plurality of sampling points of the audio digital signal as a frame, wherein the sample value of each sampling point is expressed in decibels (dB); for each sampling point in the frame, calculating average differences of a plurality of different sampling groups corresponding to the sampling point to generate a plurality of average difference values; determining, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and determining whether the sound event occurs in the frame according to the ratio of the number of sampling points determined to be candidate sampling points in the frame to the total number of sampling points of the frame, to generate a determination result; and outputting a control signal according to the determination result to control an application function.

The sound event detecting method of claim 1, wherein the step of calculating the average differences of the plurality of different sampling groups corresponding to the sampling point to generate the plurality of average difference values comprises: calculating, according to a plurality of reference sampling rates, the average differences of the plurality of different sampling groups corresponding to the sampling point to generate the plurality of average difference values.

The method for detecting a sound event according to claim 2, wherein the step of calculating, according to the plurality of reference sampling rates, the average difference of the plurality of different sampling groups corresponding to the sampling point to generate the plurality of average difference values comprises: for a first sampling group of the plurality of different sampling groups, calculating, according to a first reference sampling rate of the plurality of reference sampling rates, an absolute difference between the sampling values of the sampling point and an adjacent sampling point separated from it by a first sampling point interval, and an absolute difference between the sampling values of at least one sampling point located after the sampling point and an adjacent sampling point separated from it by the first sampling point interval, to generate a plurality of first absolute differences, wherein a length of the first sampling point interval is equal to a ratio of the sampling rate of the audio digital signal to the first reference sampling rate; calculating an average value of the plurality of first absolute differences as a first average difference value; for a second sampling group of the plurality of different sampling groups, calculating, according to a second reference sampling rate of the plurality of reference sampling rates, an absolute difference between the sampling values of the sampling point and an adjacent sampling point separated from it by a second sampling point interval, and an absolute difference between the sampling values of at least one sampling point located after the sampling point and an adjacent sampling point separated from it by the second sampling point interval, to generate a plurality of second absolute differences, wherein a length of the second sampling point interval is equal to a ratio of the sampling rate of the audio digital signal to the second reference sampling rate; and calculating an average value of the plurality of second absolute differences as a second average difference value.
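As a non-limiting illustration, the average difference for one sampling group might be sketched in Python as follows. The function name `average_difference`, the parameter `num_pairs`, the choice to take the later sampling points one interval apart, and the use of integer division when the rate ratio is not a whole number are all assumptions made for this sketch; the claim itself does not fix them.

```python
def average_difference(samples, n, sample_rate, reference_rate, num_pairs=4):
    """Mean absolute difference for sampling point n at one reference rate.

    Hypothetical helper: pairs are (n + k*interval, n + (k+1)*interval),
    which is one possible reading of the claim, not the only one.
    """
    # Interval length = signal sampling rate / reference sampling rate.
    # Integer division is an assumption for non-integer ratios.
    interval = sample_rate // reference_rate
    diffs = []
    for k in range(num_pairs):
        a = n + k * interval
        b = a + interval
        if b >= len(samples):
            break  # assumption: pairs running past the end are skipped
        diffs.append(abs(samples[a] - samples[b]))
    # Assumption: a point with no usable pair contributes 0.0.
    return sum(diffs) / len(diffs) if diffs else 0.0


# Sample values (in dB) chosen only for illustration.
values = [0, 1, 4, 9, 16, 25, 36, 49, 64]
first_avg = average_difference(values, 0, 48000, 24000, num_pairs=2)   # interval 2
second_avg = average_difference(values, 0, 48000, 16000, num_pairs=2)  # interval 3
```

Calling the helper once per reference sampling rate yields one average difference value per sampling group, here `first_avg` and `second_avg`.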

The sound event detecting method of claim 2, wherein the plurality of reference sampling rates are smaller than the sampling rate of the audio digital signal.

The sound event detecting method of claim 1, wherein the step of determining whether the sampling point is a candidate sampling point of the sound event according to the plurality of average difference values comprises: comparing the plurality of average difference values with a first threshold value, and counting the number of the plurality of average difference values that are greater than the first threshold value; and determining that the sampling point is the candidate sampling point when the number of the plurality of average difference values that are greater than the first threshold value is greater than a second threshold value.
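By way of example only, the two-threshold test recited above may be sketched as below. The function name `is_candidate` and the numeric threshold values are hypothetical and chosen purely for illustration.

```python
def is_candidate(average_differences, first_threshold, second_threshold):
    """Return True if the sampling point qualifies as a candidate sampling point."""
    # Count the average difference values that exceed the first threshold.
    count = sum(1 for value in average_differences if value > first_threshold)
    # The point is a candidate only if that count exceeds the second threshold.
    return count > second_threshold


# Illustrative values only: two of three averages exceed 5.0, and 2 > 1.
candidate = is_candidate([8.0, 18.0, 2.0], first_threshold=5.0, second_threshold=1)
```

Both comparisons are strict ("greater than"), following the wording of the claim.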

The method for detecting a sound event according to claim 1, wherein the step of determining, according to the ratio of the number of sampling points determined to be candidate sampling points in the sound frame to the total number of sampling points of the sound frame, whether the sound event occurs in the sound frame to generate the determination result comprises: determining that the sound event occurs in the sound frame, to generate the determination result, when the ratio of the number of sampling points determined to be candidate sampling points in the sound frame to the total number of sampling points of the sound frame is greater than a third threshold value.
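The frame-level decision above can be illustrated with a minimal sketch. The function name `frame_has_sound_event`, the boolean-list representation of candidate sampling points, and the treatment of an empty frame are assumptions of this example.

```python
def frame_has_sound_event(candidate_flags, third_threshold):
    """Decide whether a sound event occurs in one frame.

    candidate_flags holds one boolean per sampling point in the frame,
    True where the point was judged a candidate sampling point.
    """
    if not candidate_flags:
        return False  # assumption: an empty frame never contains an event
    # Ratio of candidate sampling points to all sampling points in the frame.
    ratio = sum(candidate_flags) / len(candidate_flags)
    return ratio > third_threshold


# Illustrative values only: 3 of 4 points are candidates, and 0.75 > 0.5.
event = frame_has_sound_event([True, True, False, True], third_threshold=0.5)
```

A ratio exactly equal to the third threshold does not count as an event under this reading, because the claim recites "greater than".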

The sound event detecting method of claim 1, wherein the step of outputting a control signal to control an application function according to the determination result comprises: outputting the control signal to control the application function when the determination result indicates that the sound event occurs in the sound frame.

The method for detecting a sound event according to claim 1, wherein adjacent sound frames in the audio digital signal overlap each other.
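One way to obtain overlapping adjacent sound frames is to advance the frame start by fewer sampling points than the frame length. The sketch below assumes a fixed frame length and hop length; the function name `split_into_frames` and both parameters are hypothetical, and the claim does not specify how much the frames overlap.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split samples into frames; frames overlap when hop_length < frame_length."""
    if not 0 < hop_length <= frame_length:
        raise ValueError("hop_length must be in the range 1..frame_length")
    # Assumption: a trailing partial frame is dropped rather than padded.
    last_start = len(samples) - frame_length
    return [samples[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]


# Illustrative values only: 4-point frames advanced by 2 points (50% overlap).
frames = split_into_frames(list(range(10)), frame_length=4, hop_length=2)
```

With these values each frame shares its last two sampling points with the first two of the next frame.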

A sound event detecting device based on time domain operation, comprising: an analog-to-digital conversion unit for converting an audio analog signal into an audio digital signal according to a sampling rate and defining a plurality of sampling points of the audio digital signal as a sound frame, wherein the sampling value of each sampling point is in decibels (dB); a calculating unit for calculating, for each sampling point in the sound frame, the average difference of a plurality of different sampling groups corresponding to the sampling point to generate a plurality of average difference values; a determining unit for determining, according to the plurality of average difference values, whether the sampling point is a candidate sampling point of a sound event, and for determining, according to the ratio of the number of sampling points determined to be candidate sampling points in the sound frame to the total number of sampling points of the sound frame, whether the sound event occurs in the sound frame to generate a determination result; and a control unit for outputting a control signal to control an application function according to the determination result.

The sound event detecting device of claim 9, wherein the calculating unit calculates, according to a plurality of reference sampling rates, the average difference of the plurality of different sampling groups corresponding to the sampling point to generate the plurality of average difference values; the determining unit compares the plurality of average difference values with a first threshold value and counts the number of the plurality of average difference values that are greater than the first threshold value; and the determining unit determines the sampling point to be the candidate sampling point when the number of the plurality of average difference values that are greater than the first threshold value is greater than a second threshold value.

The sound event detecting device of claim 10, wherein, for a first sampling group of the plurality of different sampling groups, the calculating unit calculates, according to a first reference sampling rate of the plurality of reference sampling rates, an absolute difference between the sampling values of the sampling point and an adjacent sampling point separated from it by a first sampling point interval, and an absolute difference between the sampling values of at least one sampling point located after the sampling point and an adjacent sampling point separated from it by the first sampling point interval, to generate a plurality of first absolute differences, and the calculating unit calculates an average value of the plurality of first absolute differences as a first average difference value; for a second sampling group of the plurality of different sampling groups, the calculating unit calculates, according to a second reference sampling rate of the plurality of reference sampling rates, an absolute difference between the sampling values of the sampling point and an adjacent sampling point separated from it by a second sampling point interval, and an absolute difference between the sampling values of at least one sampling point located after the sampling point and an adjacent sampling point separated from it by the second sampling point interval, to generate a plurality of second absolute differences, and the calculating unit calculates an average value of the plurality of second absolute differences as a second average difference value; wherein a length of the first sampling point interval is equal to a ratio of the sampling rate of the audio digital signal to the first reference sampling rate, and a length of the second sampling point interval is equal to a ratio of the sampling rate of the audio digital signal to the second reference sampling rate.

The sound event detecting device of claim 10, wherein the plurality of reference sampling rates are less than the sampling rate of the audio digital signal.

The sound event detecting device of claim 9, wherein the determining unit determines that the sound event occurs in the sound frame when the ratio of the number of sampling points determined to be candidate sampling points in the sound frame to the total number of sampling points of the sound frame is greater than a third threshold value.

The sound event detecting device of claim 9, wherein the control unit outputs the control signal to control the application function when the determination result indicates that the sound event occurs in the sound frame.

The sound event detecting device of claim 9, wherein the adjacent sound frames in the audio digital signal overlap each other.