TW202042217A - Method for detecting baby cry

Info

Publication number: TW202042217A
Application number: TW108116218A
Authority: TW
Inventors: 林至善
Original assignee: 佑華微電子股份有限公司
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2020-11-16
Also published as: TWI687920B

Abstract

The disclosure provides a method for detecting a baby's cry. The method comprises extracting at least one feature set from a sound signal to be tested, which further comprises: framing the sound signal to be tested to obtain at least one sound frame; obtaining a fundamental frequency of each sound frame; performing DC removal on each sound frame; and calculating a signal strength and a signal zero-crossing rate of each DC-removed sound frame, the fundamental frequency, signal strength, and signal zero-crossing rate constituting a feature set. The method further comprises determining whether the sound signal to be tested includes a baby's cry to obtain a detection result, which further comprises: detecting a sound frame attribute of each feature set and, according to the sound frame attribute, determining whether the sound signal to be tested includes a baby's cry to obtain the detection result.

Description

Baby crying detection method

The present invention relates to a method for detecting a baby's cry.

With the growing popularity of smart electronic products, the walkie-talkie-style baby monitors that were common in the past are no longer adequate, and additional functions such as automatic detection of a baby's cry are favored by many parents. Among the known methods for detecting a baby's cry, a common approach uses the zero-crossing rate of the sound as a feature value, together with thresholds and specific decision rules, to judge whether the received sound contains a baby's cry. However, the zero-crossing rate is strongly affected by interference from sounds other than a baby's cry, which can easily degrade the accuracy of the judgment. Another common approach uses cepstral coefficients as feature values together with a machine learning or pattern recognition algorithm; its drawbacks are that a large number of labeled samples must be collected for training and that the computational load is high.

An embodiment of the present invention discloses a method for detecting a baby's cry, comprising the following steps: a feature extraction step, in which a sound signal to be tested is input in time order and at least one feature set of the sound signal to be tested is extracted; and a feature judgment step, in which the feature sets are input in time order and it is determined, according to the feature sets, whether the sound signal to be tested includes a baby's cry, so as to obtain a detection result. The feature extraction step further includes: framing the sound signal to be tested to generate at least one framed sound signal; calculating the fundamental frequency of each framed sound signal to obtain a frame fundamental frequency; performing a DC removal operation on the framed sound signal to generate a DC-removed framed sound signal; and calculating the signal strength and the signal zero-crossing rate of the DC-removed framed sound signal to obtain a frame strength and a frame zero-crossing rate, respectively, the frame strength, the frame zero-crossing rate, and the frame fundamental frequency constituting a feature set. The feature judgment step further includes: detecting a frame attribute of the feature set, and then determining, according to the frame attribute, whether the sound signal to be tested includes a baby's cry, so as to obtain the detection result.

In a preferred embodiment, calculating the signal strength of the sound signal to be tested further includes the following steps: performing a time-domain energy calculation on the DC-removed framed sound signal to generate a frame energy; and performing an energy smoothing operation on the frame energy to obtain the frame strength.

In a preferred embodiment, calculating the signal zero-crossing rate of the sound signal to be tested further includes the following steps: counting the zero crossings of the DC-removed framed sound signal to generate a frame zero-crossing count; and performing a zero-crossing count smoothing operation on the frame zero-crossing count to obtain the frame zero-crossing rate.

In a preferred embodiment, calculating the signal fundamental frequency of the sound signal to be tested further includes the following steps: generating an energy spectrum from the framed sound signal; generating a fundamental frequency estimate from the energy spectrum; and performing a fundamental frequency estimate smoothing operation on the fundamental frequency estimate to obtain the frame fundamental frequency.

In a preferred embodiment, the step of generating an energy spectrum includes: windowing the framed sound signal to generate a windowed framed sound signal; performing a time-frequency transform on the windowed framed sound signal to generate a spectrum; and applying a spectral energy calculation to the spectrum to generate the energy spectrum.

In a preferred embodiment, the step of generating a fundamental frequency estimate further includes the following. First, a regional peak group is generated from the energy spectrum: a frequency bin on the energy spectrum is selected as a candidate peak, and a regional energy comparison is performed with the candidate peak as the reference point. If the candidate peak wins the regional energy comparison, it is marked as a regional peak; otherwise it is marked as other. This continues until all frequency bins on the energy spectrum have been marked, and the set of all regional peaks is the regional peak group. In the regional energy comparison, the candidate peak wins if its energy is greater than the energy of every other frequency bin within a frequency range centered on the candidate peak. Next, peak intervals are calculated: if the number of regional peaks in the regional peak group is higher than a regional peak count threshold, the intervals between adjacent peaks in the regional peak group are calculated to generate a peak interval group; otherwise, the fundamental frequency estimation result is determined to be unstable. Finally, the fundamental frequency is calculated from the peak interval group to generate a fundamental frequency estimation result, which further includes: excluding abnormal intervals, in which abnormal extreme values are removed from the peak interval group to obtain a normal peak interval group; checking the peak interval variability, in which the difference between the extreme values of the normal peak interval group is calculated and, if the difference is less than a difference threshold, the average peak interval is calculated, whereas otherwise the fundamental frequency estimation result is determined to be unstable; calculating the average peak interval, in which the mean of the normal peak interval group is calculated to obtain an average peak interval; searching for the fundamental frequency peak, in which the fundamental frequency peak is searched for on the energy spectrum at the position of the average peak interval; and fundamental frequency weighted averaging, in which a weighted average is taken of the fundamental frequency peak and whichever of its upper and lower neighboring frequency bins has the higher energy, yielding the fundamental frequency estimate.

In a preferred embodiment, the step of detecting a frame attribute of the feature set further includes: performing strong frame detection on the frame, in which the frame is determined to have the strong frame attribute if the frame strength is greater than a strength threshold, and the weak frame attribute otherwise; and, if the frame has the strong frame attribute, further performing cry frame detection on the frame, in which the frame is determined to have the cry frame attribute if the frame zero-crossing rate falls between a lower and an upper zero-crossing rate bound, or the frame fundamental frequency falls between a lower and an upper fundamental frequency bound.

In a preferred embodiment, the step of determining, according to the frame attribute, whether a baby's cry is included further includes: counting the strong frames and the cry frames; if the attributes of two adjacent frames are, in order, strong then weak, performing sound length detection; and, if the sound length detection is passed, performing cry level detection, whereas otherwise the detection result is determined to be non-cry and the counts of all frame attributes are reset to zero.

In a preferred embodiment, the cry level detection is as follows: if the cry frame count exceeds a cry frame count threshold, the detection result is determined to be a cry and the counts of all frame attributes are reset to zero; if the ratio of the cry frame count to the strong frame count is higher than a cry ratio threshold, the sound signal to be tested is determined to be a cry-like sound; if the number of cry-like sounds exceeds a cry-like count threshold, the detection result is determined to be a cry and the counts of all frame attributes are reset to zero; and if the interval between two adjacent cry-like sounds is greater than a cry-like interval threshold, the cry-like count is reset to zero.

The following specific embodiments illustrate how the present invention may be implemented. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention may also be implemented or applied through other specific examples, and the details in this specification may be modified and changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention.

The structures, proportions, sizes, and the like shown in the drawings of this specification serve only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art. They are not intended to limit the conditions under which the present invention may be implemented and therefore carry no substantive technical meaning. Any modification of structure, change of proportional relationship, or adjustment of size that does not affect the effects the present invention can produce or the objectives it can achieve shall fall within the scope covered by the technical content disclosed herein.

As shown in FIG. 1, an embodiment of the present invention discloses a method for detecting a baby's cry, comprising: step 100, a feature extraction step, in which a sound signal to be tested is input in time order and at least one feature set of the sound signal to be tested is extracted; and step 200, a feature judgment step, in which the feature sets are input in time order and it is determined, according to the feature sets, whether the sound signal to be tested includes a baby's cry, so as to obtain a detection result. The feature extraction step further includes: step 110, framing the sound signal to be tested to generate at least one framed sound signal; step 120, calculating the fundamental frequency of each framed sound signal to obtain a frame fundamental frequency; step 130, performing a DC removal operation on the framed sound signal to generate a DC-removed framed sound signal; and step 140, calculating the signal strength and the signal zero-crossing rate of the DC-removed framed sound signal to obtain a frame strength and a frame zero-crossing rate, respectively, the frame strength, the frame zero-crossing rate, and the frame fundamental frequency constituting a feature set. The feature judgment step further includes: step 210, detecting a frame attribute of the feature set; and step 220, determining, according to the frame attribute, whether the sound signal to be tested includes a baby's cry, so as to obtain the detection result.

A frame groups N sampling points into one observation unit. N is typically 256 or 512, covering roughly 20 to 30 ms. To avoid excessive change between two adjacent frames, adjacent frames share an overlapping region of M sampling points, where M is typically about one half or one third of N. Accordingly, the frames generated during framing (step 110) partially overlap. It should be noted that the above values of N and M, the time span covered, and whether the frames overlap are merely conventional choices used to illustrate embodiments of the present invention; practical applications are not limited to them.
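The framing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_frames` is hypothetical, and dropping trailing samples that do not fill a whole frame is a choice made here, not something the specification states.

```python
def split_into_frames(samples, frame_len=256, overlap=128):
    """Split a sequence of samples into overlapping frames.

    frame_len is N and overlap is M in the text. Trailing samples that do
    not fill a whole frame are dropped (an assumption of this sketch).
    """
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    hop = frame_len - overlap  # number of new samples contributed by each frame
    return [
        list(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# Tiny example with N = 4 and M = 2 so the overlap is easy to see.
print(split_into_frames(list(range(10)), frame_len=4, overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With N = 256 and M = 128, each new frame advances by 128 samples, so every sample other than those at the very start and end of the signal appears in two frames.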

It should be noted that the main purpose of step 130 is to extract, from the sound signal to be tested, a set of feature values that discriminate a baby's cry. In the present invention, this feature set includes at least a signal strength, a signal zero-crossing rate, and a signal fundamental frequency. The frame strength, the frame zero-crossing rate, and the frame fundamental frequency can each be calculated separately, in no particular order.

FIG. 2 is a schematic flowchart of calculating the signal strength of the sound signal to be tested in the method for detecting a baby's cry of the present invention; FIG. 3 is a schematic flowchart of calculating the signal zero-crossing rate of the sound signal to be tested in the method; and FIG. 4 is a schematic flowchart of calculating the signal fundamental frequency of the sound signal to be tested in the method.

Specifically, as shown in FIG. 2, calculating the signal strength of the sound signal to be tested further includes: step 1301, performing a time-domain energy calculation on the DC-removed framed sound signal to generate a frame energy; and step 1302, performing an energy smoothing operation on the frame energy to obtain the frame strength. In a preferred embodiment, the time-domain energy is calculated as the mean of the absolute values of all sampling points in the frame, although the calculation is not limited to this. Likewise, in a preferred embodiment, the energy smoothing operation takes a weighted average of the current frame energy and the previous frame energy, although it is not limited to this either.
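A minimal Python sketch of steps 130, 1301, and 1302 follows. Several points are assumptions rather than part of the specification: DC removal is done by subtracting the frame mean (the specification does not say how), the smoothing weight of 0.5 is a placeholder, and the function names are hypothetical.

```python
def remove_dc(frame):
    """Subtract the frame mean; one common way to remove a DC offset (assumed)."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def frame_energy(frame):
    """Step 1301: mean of the absolute values of all samples in the frame."""
    return sum(abs(x) for x in frame) / len(frame)


def smooth(current, previous, weight=0.5):
    """Step 1302: weighted average of the current and previous frame values.

    weight is a hypothetical tuning parameter; the source gives no value.
    """
    if previous is None:  # first frame: nothing to smooth against
        return current
    return weight * current + (1.0 - weight) * previous


energy = frame_energy(remove_dc([1, 3, 1, 3]))
print(energy)               # 1.0
print(smooth(energy, 3.0))  # 2.0
```

Whether the "previous frame energy" is the raw or the already smoothed value is not specified; the caller of `smooth` decides which one to pass in.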

Similarly, as shown in FIG. 3, calculating the signal zero-crossing rate of the sound signal to be tested further includes: step 1303, counting the zero crossings of the DC-removed framed sound signal to generate a frame zero-crossing count; and step 1304, performing a zero-crossing count smoothing operation on the frame zero-crossing count to obtain the frame zero-crossing rate. In a preferred embodiment, the zero-crossing count smoothing operation takes a weighted average of the current frame zero-crossing count and the previous frame zero-crossing count, although it is not limited to this.
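Steps 1303 and 1304 might be sketched as follows. The treatment of an exact zero sample as non-negative, the weight of 0.5, and the function names are assumptions made for illustration only.

```python
def count_zero_crossings(frame):
    """Step 1303: count sign changes between consecutive samples.

    An exact zero is treated as non-negative (an assumption of this sketch).
    """
    crossings = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings


def smooth_count(current, previous, weight=0.5):
    """Step 1304: weighted average of the current and previous counts.

    weight is a hypothetical value; the source does not give one.
    """
    if previous is None:  # first frame
        return float(current)
    return weight * current + (1.0 - weight) * previous


print(count_zero_crossings([1.0, -1.0, 1.0, -1.0]))  # 3
print(smooth_count(3, 1.0))                          # 2.0
```

Counting on the DC-removed frame matters: a constant offset larger than the signal amplitude would otherwise suppress every crossing.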

Likewise, as shown in FIG. 4, calculating the signal fundamental frequency of the sound signal to be tested further includes: step 1305, generating an energy spectrum from the framed sound signal; step 1306, generating a fundamental frequency estimate from the energy spectrum; and step 1307, performing a fundamental frequency estimate smoothing operation on the fundamental frequency estimate to obtain the frame fundamental frequency.

FIG. 5 is a schematic flowchart of generating the energy spectrum in the method for detecting a baby's cry of the present invention; FIG. 6 is a schematic flowchart of generating the fundamental frequency estimate in the method.

As described above and shown in FIG. 5, the step of generating an energy spectrum (step 1305) includes: step 1305a, windowing the framed sound signal to generate a windowed framed sound signal; step 1305b, performing a time-frequency transform on the windowed framed sound signal to generate a spectrum; and step 1305c, applying a spectral energy calculation to the spectrum to generate the energy spectrum. Windowing means multiplying each frame by a window function, for example a Hamming window, to improve the continuity at the left and right ends of the frame, although the window is not limited to this. In a preferred embodiment, the time-frequency transform is the fast Fourier transform, although it is not limited to this. Likewise, in a preferred embodiment, the function used in the spectral energy calculation is the absolute value function, although it is not limited to this either.
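Steps 1305a to 1305c can be illustrated with the Python standard library alone. This sketch makes a few assumptions: it uses the Hamming window named in the text, it replaces the fast Fourier transform with a direct O(N^2) discrete Fourier transform (which yields the same values, only more slowly), and it returns only bins 0 to N/2 because the spectrum of a real signal is symmetric. The function names are hypothetical.

```python
import cmath
import math


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def energy_spectrum(frame):
    """Window the frame (1305a), transform it (1305b), take magnitudes (1305c)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; an FFT would compute the same value.
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


# A sine that completes 4 cycles in a 32-sample frame peaks at bin 4.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spec = energy_spectrum(tone)
print(spec.index(max(spec)))
```

In practice a library FFT would be used for frames of 256 or 512 samples; the direct transform is kept here only so the sketch has no external dependencies.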

As shown in FIG. 6, the step of generating a fundamental frequency estimate further includes: step 1306a, generating a regional peak group from the energy spectrum; step 1306b, calculating the peak intervals; and step 1306c, calculating the fundamental frequency. Each is detailed below.

In step 1306a, a regional peak group is generated from the energy spectrum. A frequency bin on the energy spectrum is first selected as a candidate peak, and a regional energy comparison is then performed with the candidate peak as the reference point. If the candidate peak wins the regional energy comparison, it is marked as a regional peak; otherwise it is marked as other. This continues until all frequency bins on the energy spectrum have been marked. The set of all regional peaks is the regional peak group. In the regional energy comparison, the candidate peak wins if its energy is greater than the energy of every other frequency bin within a frequency range centered on the candidate peak.
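The regional peak search of step 1306a amounts to a sliding local-maximum test. In the sketch below, `half_width` is a hypothetical parameter standing in for the unspecified "frequency range centered on the candidate peak", and the function name is likewise an assumption.

```python
def find_regional_peaks(spectrum, half_width=2):
    """Return indices of bins whose energy exceeds every other bin within
    half_width bins on either side (step 1306a).

    The comparison is strict, so a flat region produces no peaks.
    """
    peaks = []
    for i, energy in enumerate(spectrum):
        lo = max(0, i - half_width)
        hi = min(len(spectrum), i + half_width + 1)
        if all(energy > spectrum[j] for j in range(lo, hi) if j != i):
            peaks.append(i)  # candidate wins the regional comparison
    return peaks


# Harmonics at bins 2, 6 and 10 are found; their side lobes are not.
print(find_regional_peaks([0, 1, 5, 1, 0, 1, 6, 1, 0, 2, 7, 2, 0]))
# [2, 6, 10]
```

A wider `half_width` rejects more side lobes but can also merge closely spaced harmonics, so in practice it would be chosen relative to the lowest fundamental frequency of interest.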

Step 1306b, calculating the peak intervals, further includes: if the number of regional peaks in the regional peak group is higher than a regional peak count threshold, calculating the intervals between adjacent peaks in the regional peak group to generate a peak interval group; otherwise, determining that the fundamental frequency estimation result is unstable.
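Step 1306b reduces to a guarded difference of neighboring peak indices. In this sketch the threshold value of 3 is a placeholder, the function name is hypothetical, and returning `None` is simply one way to signal the "unstable" outcome.

```python
def peak_intervals(peaks, min_peaks=3):
    """Step 1306b: spacing between adjacent regional peaks.

    Returns None (unstable) unless the number of peaks is strictly higher
    than min_peaks, a hypothetical threshold value.
    """
    if len(peaks) <= min_peaks:
        return None
    return [b - a for a, b in zip(peaks, peaks[1:])]


print(peak_intervals([2, 6, 10, 14]))  # [4, 4, 4]
print(peak_intervals([2, 6, 10]))      # None
```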

Step 1306c, calculating the fundamental frequency, calculates the fundamental frequency from the peak interval group to generate a fundamental frequency estimation result, and further includes: excluding abnormal intervals, in which abnormal extreme values are removed from the peak interval group to obtain a normal peak interval group; checking the peak interval variability, in which the difference between the extreme values of the normal peak interval group is calculated and, if the difference is less than a difference threshold, the average peak interval is calculated, whereas otherwise the fundamental frequency estimation result is determined to be unstable; calculating the average peak interval, in which the mean of the normal peak interval group is calculated to obtain an average peak interval; searching for the fundamental frequency peak, in which the fundamental frequency peak is searched for on the energy spectrum at the position of the average peak interval; and fundamental frequency weighted averaging, in which a weighted average is taken of the fundamental frequency peak and whichever of its upper and lower neighboring frequency bins has the higher energy, yielding the fundamental frequency estimate. In a preferred embodiment, the fundamental frequency estimate smoothing operation works as follows: if the current fundamental frequency estimate is stable, the frame fundamental frequency is the current estimate; otherwise, the frame fundamental frequency is that of the previous frame.
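The sub-steps of 1306c and the smoothing of step 1307 could look roughly like the Python sketch below. The specification leaves several details open, so the following are assumptions made only for illustration: abnormal extremes are removed by dropping the single smallest and largest interval, the search covers `search_radius` bins around the rounded average interval, the weighted average is an energy-weighted mean of bin indices, and `max_spread`, `search_radius`, and all function names are hypothetical.

```python
def estimate_f0_bin(spectrum, intervals, max_spread=2, search_radius=1):
    """Return the fundamental as a fractional bin index, or None if unstable."""
    if not intervals:
        return None
    kept = sorted(intervals)
    if len(kept) > 2:                      # assumed outlier rule: drop min and max
        kept = kept[1:-1]
    if kept[-1] - kept[0] >= max_spread:   # variability check failed
        return None
    center = round(sum(kept) / len(kept))  # average peak interval, in bins
    lo = max(1, center - search_radius)
    hi = min(len(spectrum) - 2, center + search_radius)
    if lo > hi:
        return None
    peak = max(range(lo, hi + 1), key=lambda i: spectrum[i])
    # Pick whichever neighboring bin carries more energy.
    neighbor = peak - 1 if spectrum[peak - 1] > spectrum[peak + 1] else peak + 1
    total = spectrum[peak] + spectrum[neighbor]
    if total == 0:
        return None
    return (peak * spectrum[peak] + neighbor * spectrum[neighbor]) / total


def smooth_f0(current, previous):
    """Step 1307: keep the current estimate if stable, else reuse the previous one."""
    return current if current is not None else previous


def bin_to_hz(bin_index, sample_rate, frame_len):
    """Convert a (fractional) bin index to a frequency in Hz."""
    return bin_index * sample_rate / frame_len


spectrum = [0, 0, 0, 1, 8, 4, 0, 0, 9, 0, 0, 0, 7, 0]
f0_bin = estimate_f0_bin(spectrum, [4, 4, 4, 5])
print(f0_bin)                        # about 4.33, pulled toward bin 5
print(bin_to_hz(4, 8000, 256))       # 125.0
print(smooth_f0(None, 4.0))          # 4.0
```

The search is centered on the average interval because the harmonics of a voiced sound are spaced one fundamental apart, so the spacing between adjacent peaks is itself an estimate of the fundamental's bin position.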

FIG. 7 is a schematic flowchart of detecting the frame attribute of the feature set in the method for detecting a baby's cry of the present invention. As shown in FIG. 7, the step of detecting a frame attribute of the feature set further includes: step 2101, performing strong frame detection on the frame, in which the frame is determined to have the strong frame attribute if the frame strength is greater than a strength threshold, and the weak frame attribute otherwise; and step 2102, if the frame has the strong frame attribute, further performing cry frame detection on the frame, in which the frame is determined to have the cry frame attribute if the frame zero-crossing rate falls between a lower and an upper zero-crossing rate bound, or the frame fundamental frequency falls between a lower and an upper fundamental frequency bound.
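Steps 2101 and 2102 translate directly into two comparisons. In the sketch below, the function name, the keyword parameters, and every numeric threshold in the example call are hypothetical; the specification gives no concrete values, and treating the bounds as inclusive is also an assumption.

```python
def classify_frame(strength, zcr, f0, *, strength_min, zcr_range, f0_range):
    """Return (is_strong, is_cry) for one feature set.

    A frame is strong if its strength exceeds strength_min (step 2101).
    A strong frame is a cry frame if its zero-crossing rate OR its
    fundamental frequency lies within the given bounds (step 2102).
    """
    is_strong = strength > strength_min
    is_cry = is_strong and (
        zcr_range[0] <= zcr <= zcr_range[1]
        or f0_range[0] <= f0 <= f0_range[1]
    )
    return is_strong, is_cry


# Placeholder thresholds, chosen only to exercise each branch.
limits = dict(strength_min=0.1, zcr_range=(20, 60), f0_range=(300, 600))
print(classify_frame(0.5, 30, 450, **limits))   # (True, True)
print(classify_frame(0.5, 5, 100, **limits))    # (True, False)
print(classify_frame(0.05, 30, 450, **limits))  # (False, False)
```

Because the two feature tests are joined by "or", a strong frame whose zero-crossing rate is disturbed by background noise can still qualify through its fundamental frequency, and vice versa.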

FIG. 8 is a schematic flowchart of determining, according to the frame attribute, whether the sound signal to be tested includes a baby's cry in the method for detecting a baby's cry of the present invention; FIG. 9 is a schematic flowchart of the cry level detection in the method. As shown in FIG. 8, the step of determining, according to the frame attribute, whether the sound signal to be tested includes a baby's cry further includes: step 2201, counting the strong frames and the cry frames; step 2202, if the attributes of two adjacent frames are, in order, strong then weak, performing sound length detection; and step 2203, if the sound length detection is passed, performing cry level detection, whereas otherwise the detection result is determined to be non-cry and the counts of all frame attributes are reset to zero. The sound length detection of step 2202 is considered passed if the strong frame count is lower than a strong frame count threshold.

As shown in FIG. 9, the cry level detection in step 2203 further includes: step 2203a, if the cry frame count exceeds a cry frame count threshold, determining that the detection result is a cry and resetting the counts of all frame attributes to zero; step 2203b, if the ratio of the cry frame count to the strong frame count is higher than a cry ratio threshold, determining that the sound signal to be tested is a cry-like sound; step 2203c, if the number of cry-like sounds exceeds a cry-like count threshold, determining that the detection result is a cry and resetting the counts of all frame attributes to zero; and step 2203d, if the interval between two adjacent cry-like sounds is greater than a cry-like interval threshold, resetting the cry-like count to zero.
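The decision logic of steps 2201 to 2203d can be read as a small state machine driven by one frame at a time. The Python sketch below is one possible reading, not the patented implementation. The class name, all threshold defaults, measuring the cry-like interval in frames, clearing the frame counts after every finished sound, and clearing the cry-like count after a cry decision are assumptions introduced here.

```python
class CryDetector:
    """Sketch of steps 2201-2203d; every threshold value is hypothetical."""

    def __init__(self, max_strong=200, min_cry=30, min_ratio=0.5,
                 min_cry_like=2, max_gap=100):
        self.max_strong = max_strong      # sound length test: strong count must be below this
        self.min_cry = min_cry            # 2203a: cry frame count threshold
        self.min_ratio = min_ratio        # 2203b: cry/strong ratio threshold
        self.min_cry_like = min_cry_like  # 2203c: cry-like count threshold
        self.max_gap = max_gap            # 2203d: max frames between cry-like sounds
        self.strong = self.cry = self.cry_like = 0
        self.prev_strong = False
        self.frame_index = 0
        self.last_cry_like = None

    def update(self, is_strong, is_cry):
        """Feed one frame; return 'cry', 'non-cry', or None (no decision yet)."""
        self.frame_index += 1
        result = None
        if is_strong:                     # step 2201: count attributes
            self.strong += 1
            self.cry += int(is_cry)
        elif self.prev_strong:            # step 2202: strong -> weak transition
            result = self._end_of_sound()
        self.prev_strong = is_strong
        return result

    def _end_of_sound(self):
        result = None
        if self.strong >= self.max_strong:            # sound length test failed
            result = "non-cry"
        elif self.cry > self.min_cry:                 # step 2203a
            result = "cry"
        elif self.cry / self.strong > self.min_ratio:  # step 2203b: cry-like
            if (self.last_cry_like is not None
                    and self.frame_index - self.last_cry_like > self.max_gap):
                self.cry_like = 0                     # step 2203d
            self.cry_like += 1
            self.last_cry_like = self.frame_index
            if self.cry_like > self.min_cry_like:     # step 2203c
                result = "cry"
        if result == "cry":
            self.cry_like = 0             # assumed: start over after a cry decision
        self.strong = self.cry = 0        # frame counts restart for the next sound
        return result


detector = CryDetector(min_cry=3)
outputs = [detector.update(True, True) for _ in range(5)]
outputs.append(detector.update(False, False))
print(outputs[-1])  # cry
```

A single long burst of cry frames is accepted through step 2203a, while a series of shorter bursts that are mostly cry frames is accepted through the cry-like counter of steps 2203b and 2203c, provided the bursts are not too far apart.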

Although the implementation has been described with reference to a number of illustrative embodiments of the present application, it should be understood that those skilled in the art can devise many other changes and embodiments that fall within the spirit and scope of the principles of the present disclosure. In particular, within the scope of the present disclosure, the drawings, and the appended claims, various changes and modifications can be made to the component parts and/or arrangements of the subject combination arrangement. In addition to changes and modifications to the component parts and/or arrangements, alternative uses will be apparent to those skilled in the art.

100: Extract feature values
110: Frame the sound signal to be tested
120: Calculate the frame fundamental frequency of each framed sound signal
130: Perform a DC removal operation on the framed sound signal
140: Calculate the frame strength and the frame zero-crossing rate of the DC-removed framed sound signal; the frame strength, the frame zero-crossing rate, and the frame fundamental frequency constitute a feature set
200: Feature judgment
210: Detect the frame attribute of the feature set
220: Determine, according to the frame attribute, whether a baby's cry is included
1301: Perform the time-domain energy calculation
1302: Perform the energy smoothing operation
1303: Count the zero crossings
1304: Perform the zero-crossing count smoothing operation
1305: Generate the energy spectrum
1306: Generate the fundamental frequency estimate
1307: Perform the fundamental frequency estimate smoothing operation
1305a: Window the frame
1305b: Perform the time-frequency transform
1305c: Perform the spectral energy calculation
1306a: Generate the regional peak group from the energy spectrum
1306b: Calculate the peak intervals
1306c: Calculate the fundamental frequency
2101: Perform strong frame detection to determine whether the frame has the strong frame attribute
2102: If the frame has the strong frame attribute, perform cry frame detection
2201: Count the strong frames and the cry frames
2202: If the attributes of two adjacent frames are, in order, strong then weak, perform sound length detection
2203: If the sound length detection is passed, perform cry level detection
2203a: If the cry frame count exceeds a cry frame count threshold, determine that the detection result is a cry and reset the counts of all frame attributes to zero
2203b: If the ratio of the cry frame count to the strong frame count is higher than a cry ratio threshold, determine that the sound signal to be tested is a cry-like sound
2203c: If the number of cry-like sounds exceeds a cry-like count threshold, determine that the detection result is a cry and reset the counts of all frame attributes to zero
2203d: If the interval between two adjacent cry-like sounds is greater than a cry-like interval threshold, reset the cry-like count to zero

FIG. 1 is a schematic flow chart of a method for detecting baby crying according to the present invention;
FIG. 2 is a schematic flow chart of calculating the signal strength of the sound signal to be tested in the method;
FIG. 3 is a schematic flow chart of calculating the signal zero-crossing rate of the sound signal to be tested in the method;
FIG. 4 is a schematic flow chart of calculating the signal fundamental frequency of the sound signal to be tested in the method;
FIG. 5 is a schematic flow chart of generating the energy spectrum in the method;
FIG. 6 is a schematic flow chart of generating the fundamental frequency estimated value in the method;
FIG. 7 is a schematic flow chart of detecting the sound frame attribute of the feature value group in the method;
FIG. 8 is a schematic flow chart of determining from the sound frame attribute whether the sound signal to be tested includes baby crying in the method;
FIG. 9 is a schematic flow chart of the crying degree detection in the method.


Claims

A method for detecting baby crying, comprising the following steps: a feature extraction step of receiving a sound signal to be tested in time sequence and extracting at least one feature value group of the sound signal to be tested; and a feature value judgment step of receiving the feature value group in time sequence and determining, according to the feature value group, whether the sound signal to be tested includes baby crying, so as to obtain a detection result; wherein the feature extraction step further comprises: framing the sound signal to be tested to generate at least one framed sound signal; calculating the signal fundamental frequency of each framed sound signal to obtain a sound frame fundamental frequency; performing a DC removal operation on the framed sound signal to generate a DC-removed framed sound signal; and calculating the signal strength and the signal zero-crossing rate of the DC-removed framed sound signal to obtain a sound frame intensity and a sound frame zero-crossing rate, respectively, the sound frame intensity, the sound frame zero-crossing rate, and the sound frame fundamental frequency constituting a feature value group; and wherein the feature value judgment step further comprises: detecting a sound frame attribute of the feature value group, and then determining, according to the sound frame attribute, whether the sound signal to be tested includes baby crying, so as to obtain the detection result.

The method for detecting baby crying according to claim 1, wherein, when the framing is performed, the generated sound frames partially overlap.
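The framing and DC removal recited above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `split_into_frames`, `remove_dc`, `frame_len`, and `hop_len` are hypothetical, the source does not give frame or hop sizes, and subtracting the frame mean is just one common way to remove a DC offset, not necessarily the one intended by the applicant.

```python
def split_into_frames(samples, frame_len, hop_len):
    """Split a sample sequence into frames of frame_len samples.

    Each frame starts hop_len samples after the previous one, so choosing
    hop_len < frame_len makes neighbouring frames partially overlap.
    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


def remove_dc(frame):
    """Remove the DC component by subtracting the frame mean (assumption)."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


# Ten samples, frames of 4 with a hop of 2: each frame shares 2 samples
# with its neighbour.
frames = split_into_frames(list(range(10)), frame_len=4, hop_len=2)
dc_removed = [remove_dc(frame) for frame in frames]
```

With a hop equal to half the frame length, as in the usage lines, every sample except those at the ends falls into two frames.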

The method for detecting baby crying according to claim 1, wherein calculating the signal strength of the sound signal to be tested further comprises the following steps: performing a time-domain energy calculation on the DC-removed framed sound signal to generate a sound frame energy; and performing an energy smoothing operation on the sound frame energy to obtain the sound frame intensity.

The method for detecting baby crying according to claim 3, wherein the time-domain energy calculation takes the average of the absolute values of all sampling points in the sound frame.

The method for detecting baby crying according to claim 3, wherein the energy smoothing operation takes a weighted average of the energy of the current sound frame and the energy of the previous sound frame.
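A minimal Python sketch of the sound frame intensity calculation: mean absolute value per frame, followed by a weighted average with the previous frame. The function names and the default `weight` of 0.5 are hypothetical; the source does not state the weights, nor whether "the energy of the previous sound frame" means the raw or the already smoothed value. The sketch reads it literally as the raw value.

```python
def frame_energy(frame):
    """Time-domain energy: mean of the absolute values of all samples."""
    return sum(abs(x) for x in frame) / len(frame)


def frame_intensities(frames, weight=0.5):
    """Smooth each frame energy with the previous frame's raw energy.

    weight is the share given to the current frame (hypothetical value).
    The first frame has no predecessor, so it is passed through unchanged
    (assumption).
    """
    intensities = []
    previous_energy = None
    for frame in frames:
        energy = frame_energy(frame)
        if previous_energy is None:
            intensities.append(energy)
        else:
            intensities.append(weight * energy + (1.0 - weight) * previous_energy)
        previous_energy = energy
    return intensities


example = frame_intensities([[1, -1], [3, -3]], weight=0.5)
```

A recursive variant that feeds the smoothed value back in would smooth more strongly; which one the applicant intends is not stated.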

The method for detecting baby crying according to claim 1, wherein calculating the signal zero-crossing rate of the sound signal to be tested further comprises the following steps: counting the zero crossings of the DC-removed framed sound signal to generate a sound frame zero-crossing count; and performing a zero-crossing count smoothing operation on the sound frame zero-crossing count to obtain the sound frame zero-crossing rate.

The method for detecting baby crying according to claim 6, wherein the zero-crossing count smoothing operation takes a weighted average of the zero-crossing count of the current sound frame and the zero-crossing count of the previous sound frame.
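The zero-crossing count and its smoothing might look like the following Python sketch. The names are hypothetical, the treatment of samples that are exactly zero (counted as non-negative) is an assumption, and so is the default `weight`, since the source gives no weights.

```python
def count_zero_crossings(frame):
    """Count sign changes between consecutive samples of a DC-removed frame.

    A sample equal to zero is treated as non-negative (assumption).
    """
    crossings = 0
    for previous, current in zip(frame, frame[1:]):
        if (previous >= 0) != (current >= 0):
            crossings += 1
    return crossings


def smooth_zero_crossings(counts, weight=0.5):
    """Weighted average of each frame's count with the previous frame's count.

    weight is the share given to the current frame (hypothetical value);
    the first frame is passed through unchanged (assumption).
    """
    smoothed = []
    for index, count in enumerate(counts):
        if index == 0:
            smoothed.append(float(count))
        else:
            smoothed.append(weight * count + (1.0 - weight) * counts[index - 1])
    return smoothed


counts = [count_zero_crossings(f) for f in ([1, -1, 1, -1], [1, 2, -3, -4])]
rates = smooth_zero_crossings(counts)
```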

The method for detecting baby crying according to claim 1, wherein calculating the signal fundamental frequency of the sound signal to be tested further comprises the following steps: generating an energy spectrum from the framed sound signal; generating a fundamental frequency estimated value from the energy spectrum; and performing a fundamental frequency estimated value smoothing operation on the fundamental frequency estimated value to obtain the sound frame fundamental frequency.

The method for detecting baby crying according to claim 8, wherein the step of generating an energy spectrum comprises: windowing the framed sound signal to generate a windowed framed sound signal; performing a time-frequency conversion on the windowed framed sound signal to generate a frequency spectrum; and performing a spectral energy calculation on the frequency spectrum to generate the energy spectrum.

The method for detecting baby crying according to claim 8, wherein the function used in the windowing is a Hanning window.

The method for detecting baby crying according to claim 8, wherein the conversion used in the time-frequency conversion is a fast Fourier transform.

The method for detecting baby crying according to claim 8, wherein the function used in the spectral energy calculation is an absolute value function.
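The three stages of the energy spectrum (Hanning window, fast Fourier transform, absolute value) can be sketched in pure Python as below. The function names are hypothetical. The recursive radix-2 FFT is a textbook variant chosen here only to keep the example free of third-party libraries, so it assumes a power-of-two frame length; the symmetric form of the Hanning window is likewise an assumption.

```python
import cmath
import math


def hann_window(n):
    """Symmetric Hanning window of n points."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(values):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def energy_spectrum(frame):
    """Window the frame, transform it, and take the magnitude of each bin.

    Only the non-negative frequency bins (0 .. n/2) are returned.
    """
    window = hann_window(len(frame))
    windowed = [x * w for x, w in zip(frame, window)]
    spectrum = fft(windowed)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]


# A pure tone placed on bin 4 of a 32-sample frame.
tone = [math.sin(2.0 * math.pi * 4 * i / 32) for i in range(32)]
spectrum = energy_spectrum(tone)
strongest_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

For a frame sampled at rate `fs`, bin `k` of an `n`-point transform corresponds to the frequency `k * fs / n`.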

The method for detecting baby crying according to claim 1, wherein the step of generating a fundamental frequency estimated value further comprises: generating a regional peak group from the energy spectrum, in which a frequency point on the energy spectrum is first selected as a candidate peak and a regional energy comparison is then performed with the candidate peak as the reference point; if the candidate peak wins the regional energy comparison, the candidate peak is marked as a regional peak, and otherwise it is marked as other; once all frequency points on the energy spectrum have been marked, the set of all regional peaks is the regional peak group, wherein the regional energy comparison judges the candidate peak to be the winner if its energy is greater than the energy of every other frequency point within a frequency range centered on the candidate peak; then calculating peak intervals, in which, if the number of regional peaks in the regional peak group is higher than a regional peak count threshold, the intervals between adjacent peaks in the regional peak group are calculated to generate a peak interval group, and otherwise the fundamental frequency estimation result is judged to be unstable; and calculating the fundamental frequency from the peak interval group to generate a fundamental frequency estimation result, comprising: excluding abnormal intervals, in which abnormal extreme values in the peak interval group are excluded to obtain a normal peak interval group; detecting the peak interval variability, in which the difference between the extreme values in the normal peak interval group is calculated, and if the difference is less than a difference threshold, the peak average interval calculation is performed, and otherwise the fundamental frequency estimation result is judged to be unstable; calculating the peak average interval, in which the average value of the normal peak interval group is calculated to obtain a peak average interval; searching for the fundamental frequency peak, in which the fundamental frequency peak is searched for at the peak average interval on the energy spectrum; and performing a fundamental frequency weighted average, in which a weighted average is taken over the energies of the fundamental frequency peak and of the frequency points above and below the peak average interval, to obtain the fundamental frequency estimated value.
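The peak-interval estimate of the fundamental frequency can be sketched as follows. This is one possible reading, not the applicant's implementation. All names and default thresholds are hypothetical. Three details are assumptions because the claim leaves them open: "abnormal" intervals are taken to be those outside 50% to 150% of the median interval, the fundamental peak is searched within one bin of the average interval, and the final weighted average is an energy-weighted mean of the peak bin and its two neighbours. An unstable result is returned as `None`.

```python
import statistics


def regional_peaks(energy, half_width):
    """Indices whose energy exceeds every other bin within +/- half_width."""
    peaks = []
    for k, value in enumerate(energy):
        lo = max(0, k - half_width)
        hi = min(len(energy), k + half_width + 1)
        if all(value > energy[j] for j in range(lo, hi) if j != k):
            peaks.append(k)
    return peaks


def estimate_f0(energy, bin_hz, half_width=2, min_peaks=3, max_spread=2):
    """Return the fundamental frequency in Hz, or None if unstable."""
    peaks = regional_peaks(energy, half_width)
    if len(peaks) <= min_peaks:
        return None  # too few regional peaks: unstable
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    # Exclude abnormal intervals (assumed rule: far from the median).
    median = statistics.median(intervals)
    normal = [d for d in intervals if 0.5 * median <= d <= 1.5 * median]
    # Variability check: spread between the extreme normal intervals.
    if not normal or max(normal) - min(normal) >= max_spread:
        return None
    mean_interval = sum(normal) / len(normal)
    # Harmonics are spaced by the fundamental, so look for the fundamental
    # peak near the bin equal to the average interval (assumed search range).
    centre = round(mean_interval)
    lo = max(1, centre - 1)
    hi = min(len(energy) - 2, centre + 1)
    if lo > hi:
        return None
    peak = max(range(lo, hi + 1), key=lambda k: energy[k])
    # Energy-weighted average of the peak bin and its two neighbours.
    neighbours = (peak - 1, peak, peak + 1)
    total = sum(energy[k] for k in neighbours)
    if total == 0:
        return None
    return bin_hz * sum(k * energy[k] for k in neighbours) / total


# Synthetic spectrum with harmonics every 8 bins, 50 Hz per bin.
synthetic = [0.0] * 40
for harmonic in (8, 16, 24, 32):
    synthetic[harmonic] = 10.0
    synthetic[harmonic - 1] = synthetic[harmonic + 1] = 5.0
f0 = estimate_f0(synthetic, bin_hz=50.0)
```

The weighted average in the last step lets the estimate fall between bins, which gives finer resolution than the bin spacing alone.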

The method for detecting baby crying according to claim 13, wherein the fundamental frequency estimated value smoothing operation is: if the current fundamental frequency estimated value is stable, the sound frame fundamental frequency is the current fundamental frequency estimated value; otherwise, the sound frame fundamental frequency is the fundamental frequency of the previous sound frame.
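This hold-last-value smoothing is short enough to sketch directly. The function name is hypothetical, an unstable estimate is represented as `None`, and the `initial` value used before any stable estimate has been seen is an assumption the source does not address.

```python
def smooth_f0(estimates, initial=0.0):
    """Keep stable estimates; replace unstable ones (None) with the
    fundamental frequency of the previous sound frame."""
    smoothed = []
    previous = initial
    for estimate in estimates:
        if estimate is not None:
            previous = estimate
        smoothed.append(previous)
    return smoothed


held = smooth_f0([None, 400.0, None, 450.0])
```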

The method for detecting baby crying according to claim 1, wherein the step of detecting a sound frame attribute of the feature value group further comprises: performing strong sound frame detection on the sound frame, in which, if the sound frame intensity is greater than an intensity threshold, the sound frame is determined to have the strong sound frame attribute, and otherwise the sound frame is determined to have the weak sound frame attribute; and, if the sound frame has the strong sound frame attribute, performing crying sound frame detection on the sound frame, in which, if the sound frame zero-crossing rate falls between an upper bound and a lower bound of the zero-crossing rate, or the sound frame fundamental frequency falls between an upper bound and a lower bound of the fundamental frequency, the sound frame is determined to have the crying sound frame attribute.
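The two-stage attribute test can be sketched in Python as below. The enum, the function name, and every numeric bound in the usage lines are hypothetical; the source gives no threshold values. A crying sound frame is by construction also a strong sound frame, which the sketch expresses by returning `CRY` only after the intensity test has passed.

```python
from enum import Enum


class FrameAttribute(Enum):
    WEAK = "weak"
    STRONG = "strong"   # strong, but not classified as crying
    CRY = "cry"         # strong and classified as crying


def classify_frame(intensity, zcr, f0, *, intensity_threshold,
                   zcr_bounds, f0_bounds):
    """Classify one feature value group (intensity, zero-crossing rate, f0).

    zcr_bounds and f0_bounds are (lower, upper) pairs. Treating the bounds
    as inclusive is an assumption.
    """
    if not intensity > intensity_threshold:
        return FrameAttribute.WEAK
    zcr_in_range = zcr_bounds[0] <= zcr <= zcr_bounds[1]
    f0_in_range = f0_bounds[0] <= f0 <= f0_bounds[1]
    # Either feature falling inside its range is enough.
    if zcr_in_range or f0_in_range:
        return FrameAttribute.CRY
    return FrameAttribute.STRONG


# Hypothetical thresholds, for illustration only.
limits = dict(intensity_threshold=0.2, zcr_bounds=(20, 80), f0_bounds=(300, 600))
quiet = classify_frame(0.1, 50, 400, **limits)
loud_other = classify_frame(0.5, 5, 100, **limits)
loud_cry = classify_frame(0.5, 5, 400, **limits)
```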

The method for detecting baby crying according to claim 1, wherein the step of determining from the sound frame attribute whether baby crying is included further comprises: counting the strong sound frames and the crying sound frames; if the attributes of two adjacent sound frames are, in order, strong then weak, performing sound length detection; and, if the sound length detection is passed, performing crying degree detection, and otherwise determining that the detection result is non-crying and resetting the sound frame count of each attribute to zero.

The method for detecting baby crying according to claim 16, wherein the sound length detection is deemed to be passed if the strong sound frame count is lower than a strong sound frame count threshold.

The method for detecting baby crying according to claim 16, wherein the crying degree detection comprises: if the crying sound frame count exceeds a crying sound frame count threshold, determining that the detection result is crying and resetting the sound frame count of each attribute to zero; if the ratio of the crying sound frame count to the strong sound frame count is higher than a crying ratio threshold, determining that the sound signal to be tested is a cry-like sound; if the number of cry-like sounds exceeds a cry-like sound count threshold, determining that the detection result is crying and resetting the sound frame count of each attribute to zero; and, if the interval between two adjacent cry-like sounds is greater than a cry-like sound interval threshold, resetting the cry-like sound count to zero.
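The counting and decision rules of the last three claims can be read as a small state machine, sketched here in Python. The class, its parameter names, and all threshold values are hypothetical. Several points are assumptions because the claims do not settle them: the interval between cry-like sounds is measured in frames, the per-sound frame counts are also cleared when a sound ends without a decision, and the cry-like sound count is left untouched when crying is reported.

```python
class CryDetector:
    """Feed one frame at a time; update() returns True (crying),
    False (non-crying) or None (no decision yet)."""

    def __init__(self, strong_frame_limit, cry_frame_threshold,
                 cry_ratio_threshold, cry_like_count_threshold,
                 cry_like_gap_threshold):
        self.strong_frame_limit = strong_frame_limit
        self.cry_frame_threshold = cry_frame_threshold
        self.cry_ratio_threshold = cry_ratio_threshold
        self.cry_like_count_threshold = cry_like_count_threshold
        self.cry_like_gap_threshold = cry_like_gap_threshold
        self.strong_count = 0
        self.cry_count = 0
        self.cry_like_count = 0
        self.frames_since_cry_like = None
        self.previous_strong = False

    def update(self, is_strong, is_cry):
        result = None
        if self.frames_since_cry_like is not None:
            self.frames_since_cry_like += 1
        if is_strong:
            self.strong_count += 1
            if is_cry:
                self.cry_count += 1
        elif self.previous_strong:
            # Strong frame followed by a weak frame: a sound has ended.
            result = self._end_of_sound()
        self.previous_strong = is_strong
        return result

    def _end_of_sound(self):
        strong, cry = self.strong_count, self.cry_count
        self.strong_count = self.cry_count = 0  # reset per-sound counts
        # Sound length detection: passes when the strong frame count is
        # below the limit, as the claim is translated.
        if not strong < self.strong_frame_limit:
            return False
        if cry > self.cry_frame_threshold:
            return True
        if cry / strong > self.cry_ratio_threshold:
            # Cry-like sound: forget older ones if the gap was too long.
            gap = self.frames_since_cry_like
            if gap is not None and gap > self.cry_like_gap_threshold:
                self.cry_like_count = 0
            self.cry_like_count += 1
            self.frames_since_cry_like = 0
            if self.cry_like_count > self.cry_like_count_threshold:
                return True
        return None


detector = CryDetector(strong_frame_limit=100, cry_frame_threshold=5,
                       cry_ratio_threshold=0.5, cry_like_count_threshold=1,
                       cry_like_gap_threshold=50)
# Six crying frames followed by one weak frame.
results = [detector.update(True, True) for _ in range(6)]
results.append(detector.update(False, False))
```

In the usage lines the decision arrives only on the weak frame that closes the sound, because the claims trigger the checks on a strong-then-weak transition.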