TWI798111B - Cough identification method and system thereof - Google Patents

Info

Publication number
TWI798111B
Authority
TW
Taiwan
Prior art keywords
audio
cough
training
personal
input
Prior art date
Application number
TW111122515A
Other languages
Chinese (zh)
Other versions
TW202401456A (en)
Inventor
盧沛怡
洪淑惠
芮嘉勇
郭漢彬
洪宗杰
Original Assignee
財團法人國家實驗研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 財團法人國家實驗研究院
Priority to TW111122515A
Application granted
Publication of TWI798111B
Publication of TW202401456A

Abstract

The present invention relates to a cough identification method and a system thereof. The identification method comprises the following steps: inputting a plurality of training audios and storing them in a storage device; converting the plurality of training audios into a plurality of audio signals and inputting them into a personal audio feature extraction module, which performs convolutional neural network operations to establish a personal audio feature model; and inputting the personal audio features obtained from the personal audio feature model, together with the corresponding cough audios, into a cough audio analysis and identification module, which performs convolutional neural network operations to establish a cough audio analysis and identification model. To identify a disease, a person's audio is first input into the personal audio feature model to obtain that person's personal audio features; the real-time cough audio and the personal audio features are then input into the cough audio analysis and identification model to identify the corresponding respiratory disease.

Description

Cough identification method and system thereof

The present invention relates to a cough identification method and a system thereof, and in particular to an identification method and system that use convolutional neural network (CNN) operations to analyze an individual's voice and cough sounds, and thereby correctly identify the respiratory disease to which the cough corresponds.

Coughing is a common respiratory symptom caused by inflammation, foreign bodies, or physical or chemical irritation of the trachea, bronchial mucosa, or pleura. Coughing is a physiological manifestation of a variety of cough-related diseases, and different diseases exhibit different cough characteristics.

In medicine, an experienced doctor can diagnose a cough-related disease from the characteristics of the patient's cough sound. Common cough-related diseases and their characteristics include:

1. Purely dry or purely wet cough - postnasal drip syndrome.

2. Dry cough ending with a wheeze - asthma.

3. High-pitched, croup-like cough - acute laryngitis.

4. Cough with a clicking sound - chronic obstructive pulmonary disease (COPD).

5. Weak but rapid dry cough - pneumonia.

6. Spasmodic dry cough - pertussis (whooping cough).

7. Single cough - upper respiratory tract inflammation.

Because judging the type of respiratory disease from cough audio requires accumulated experience, generally only experienced doctors can make an accurate judgment; people with little or no experience cannot determine the corresponding respiratory disease from cough audio.

In view of this, establishing a technique that requires no manual identification and can correctly identify the corresponding respiratory disease directly from the cough sound is a goal that the medical industry hopes to achieve. The inventors of the present invention therefore conceived and designed a cough identification method and a system thereof to remedy the deficiencies of the prior art and thereby promote industrial implementation and use.

In view of the above problems of the prior art, the object of the present invention is to provide a cough identification method and a system thereof, so as to solve the problems that conventional manual interpretation is insufficiently accurate and difficult to automate.

According to one object of the present invention, a cough identification method is provided, which comprises the following steps:
Step S1: input a plurality of training audios and the corresponding plurality of training cough audios through an input device, and store them in a storage device;
Step S2: access the storage device with a processor and convert the plurality of training audios into a plurality of audio signals;
Step S3: with the processor, input the plurality of audio signals into a personal audio feature extraction module to perform convolutional neural network operations, so as to establish a personal audio feature model and obtain a plurality of personal audio features;
Step S4: with the processor, input the plurality of personal audio features together with the corresponding plurality of training cough audios into a cough audio analysis and identification module to perform convolutional neural network operations, so as to establish a cough audio analysis and identification model;
Step S5: input a personal audio to be identified and the corresponding cough audio to be identified through the input device, and have the processor perform an interpretation procedure that determines the corresponding respiratory disease type according to the personal audio feature model and the cough audio analysis and identification model;
Step S6: access the storage device through an output device and output the interpreted and analyzed respiratory disease type.

According to another object of the present invention, a cough identification system is provided, which comprises an input device, a storage device, a processor, and an output device. The input device is used to input a plurality of training audios and the corresponding plurality of training cough audios, as well as a personal audio to be identified and the corresponding cough audio to be identified. The storage device is connected to the input device and the output device and stores the plurality of training audios and the corresponding plurality of training cough audios, the personal audio to be identified, and the cough audio to be identified. The output device is connected to the storage device and outputs the interpreted and analyzed respiratory disease type. The processor is connected to the storage device and executes a plurality of instructions to perform the following steps: converting the plurality of training audios into a plurality of audio signals and inputting the plurality of audio signals into a personal audio feature extraction module to perform convolutional neural network operations, so as to establish a personal audio feature model and obtain a plurality of personal audio features; inputting the plurality of personal audio features together with the corresponding plurality of training cough audios (which are likewise converted into a plurality of training cough audio signals) into a cough audio analysis and identification module to perform convolutional neural network operations, so as to establish a cough audio analysis and identification model; and interpreting the personal audio to be identified and the corresponding cough audio according to the personal audio feature model and the cough audio analysis and identification model, so as to determine the corresponding respiratory disease type.

Preferably, the plurality of audio signals and the plurality of training cough audio signals may be Mel-frequency cepstral coefficients (MFCCs). MFCCs are the set of key coefficients that make up the mel-frequency cepstrum: from a segment of a sound signal, a cepstrum sufficient to represent that signal can be obtained, and the MFCCs are the coefficients derived from that cepstrum. Unlike an ordinary cepstrum, the mel-frequency cepstrum has frequency bands that are evenly spaced on the mel scale; compared with the linearly spaced bands of the ordinary cepstrum, these bands more closely approximate the nonlinear human auditory system. For example, the mel-frequency cepstrum is often used in audio compression.
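For reference, the mel scale and the cepstral coefficients described above are often written as follows. These formulas are a widely used convention and are an assumption here: the patent does not state which mel-scale definition, how many bands K, or how many coefficients are used. Here f is frequency in hertz, m is the mel value, E_k is the energy in the k-th mel band, and c_n is the n-th coefficient.

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f(m) = 700\left(10^{\,m/2595} - 1\right)

c_n = \sum_{k=1}^{K} \log(E_k)\,
      \cos\!\left[\frac{\pi n}{K}\left(k - \tfrac{1}{2}\right)\right],
\qquad n = 0, 1, \dots, N-1
```

Because the K band centers are equally spaced in m rather than in f, the bands are narrow at low frequencies and wide at high frequencies, which is the nonlinear spacing the paragraph refers to.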

As described above, the cough identification method and system of the present invention make it possible to determine quickly and conveniently the type of respiratory disease a patient is suffering from. Identifying the type of respiratory disease in this way can assist clinical interpretation and improve the accuracy of subsequent diagnoses.

1: Personal audio to be identified
2: Cough audio to be identified
3: Personal audio feature model
4: Conversion into the cough audio signal to be identified
5: Cough audio analysis and identification model
6: Respiratory disease type
7: Training audio
8: Training cough audio
9: Conversion into the training cough audio signal
10: Plurality of convolutional layers
11: Plurality of long short-term memory (LSTM) layers
12: Fully connected layer
13: Plurality of fully connected layers
20: Cough identification system
21: Input device
22: Storage device
23: Processor
24: Output device
A: Audio input
B: Feature extraction and analysis/identification
C: Cough audio analysis and identification module
S1~S6: Steps

To make the technical features, content, and advantages of the present invention and the effects it can achieve more apparent, the present invention is described in detail below by way of embodiments with reference to the accompanying drawings: FIG. 1 is a flowchart of the steps of the cough identification method according to an embodiment of the present invention; FIG. 2 is a block diagram of the cough identification method according to an embodiment of the present invention; FIG. 3 is a schematic diagram of the personal audio feature extraction module training the personal audio feature model according to an embodiment of the present invention; FIG. 4 is a schematic diagram of the cough audio analysis and identification module training the cough audio analysis and identification model according to an embodiment of the present invention; FIG. 5 is a schematic diagram of the cough identification system according to an embodiment of the present invention.

To help the examiner understand the technical features, content, and advantages of the present invention and the effects it can achieve, the present invention is described in detail below by way of embodiments with reference to the accompanying drawings. The drawings are intended only for illustration and to support the specification; they do not necessarily reflect the true proportions or precise configuration of the invention as implemented. The proportions and arrangement shown in the drawings should therefore not be used to interpret or limit the scope of the invention in actual implementation.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art and of the present invention, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Please refer to FIG. 1, FIG. 2, and FIG. 4 together. FIG. 1 is a flowchart of the steps of the cough identification method according to an embodiment of the present invention; FIG. 2 is a block diagram of the cough identification method according to an embodiment of the present invention; and FIG. 4 is a schematic diagram of the cough audio analysis and identification module training the cough audio analysis and identification model according to an embodiment of the present invention. As shown in FIG. 1, the cough identification method comprises the following steps (S1~S6):

Step S1: Input a plurality of training audios 7 and the corresponding plurality of training cough audios 8 through an input device, and store them in a storage device.

The plurality of training audios 7 and the corresponding plurality of training cough audios 8 are input through the input device into the storage device of the system. The input device is an audio capture device such as a microphone, or an electronic device with an audio capture function such as a smartphone, tablet computer, notebook computer, or camera, but is not limited to these; any device that can capture audio may serve as the input device.

Step S2: Access the storage device with the processor and convert the plurality of training audios 7 into a plurality of audio signals.

This step converts the training audios 7 into a specific form of audio signal, preferably Mel-frequency cepstral coefficients. Because the mel-frequency cepstrum more closely approximates the nonlinear human auditory system, using it as the input for subsequently establishing the personal audio feature model gives better results.
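The conversion in step S2 can be sketched as follows for a single audio frame. This is a minimal, standard-library-only illustration under stated assumptions, not the patented implementation: the function names, the Hamming window, the 20 mel filters, the 13 coefficients, and the mel-scale formula are all conventional choices that the patent does not specify.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert hertz to mel (common 2595*log10 convention; an assumption)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coefficients = mfcc(frame, 8000)
```

In practice a recording would be split into overlapping frames and each frame converted this way, giving a time-by-coefficient matrix that a convolutional network can consume; a library routine would normally replace the naive DFT used here.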

Step S3: With the processor, input the plurality of audio signals into the personal audio feature extraction module to perform convolutional neural network operations, so as to establish the personal audio feature model 3 and obtain a plurality of personal audio features.

The personal audio feature extraction module comprises a plurality of convolutional layers, a plurality of long short-term memory (LSTM) layers, and a plurality of fully connected layers, and each layer includes an activation function.

The personal audio feature extraction module takes the audio of different people as input and performs convolutional neural network operations so that the audio signals are mapped into a high-dimensional continuous feature space (latent space). This latent space is the personal audio feature model 3; it contains a plurality of high-dimensional vectors (latent vectors), which are the personal audio features of the different people. When the model is trained through convolutional neural network operations, a distance measure such as, but not limited to, the Euclidean distance may be used to minimize the distance between the latent vectors obtained from different audios of the same person while maximizing the distance between the audio features of different people.

Please also refer to FIG. 3, which is a schematic diagram of the personal audio feature extraction module training the personal audio feature model 3 according to an embodiment of the present invention. As shown in the figure, the audios A1 to An produced by person A are n independent audios whose content need not be the same (for example, different sentences or sounds); the personal audio feature extraction module must map A1 to An to nearby regions of the latent space. Similarly, the m audios B1 to Bm provided by another person B must also be mapped to regions close to one another. In addition, during training of the personal audio feature extraction module, an additional loss function is used to maximize the differences between the audio of different people, so the module has higher discrimination than conventional identification methods. The model is trained with audio from a plurality of different people, each of whom provides a plurality of audios covering different content.
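The training objective described for the personal audio feature model can be illustrated with a pairwise loss over latent vectors. The sketch below is an assumption made for illustration: the patent says only that a distance such as the Euclidean distance is minimized within a person and maximized between people, and does not name a specific loss. The margin-based contrastive form, the function names, and the `margin` parameter are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(embeddings, person_ids, margin=1.0):
    """Mean pairwise loss over all pairs of latent vectors.

    Same-person pairs are penalized by their squared distance (pull together).
    Different-person pairs are penalized only when closer than `margin`
    (push apart). The margin form is an assumed, commonly used variant.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = euclidean(embeddings[i], embeddings[j])
            if person_ids[i] == person_ids[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Two audios each from persons A and B, already mapped to 2-D latent vectors.
vectors = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
well_clustered = contrastive_loss(vectors, ["A", "A", "B", "B"])
badly_clustered = contrastive_loss(vectors, ["A", "B", "A", "B"])
```

With the first labeling each person's vectors sit close together and far from the other person's, so the loss is small; swapping the labels makes same-person vectors distant and different-person vectors adjacent, so the loss grows. A network trained to reduce such a loss is pushed toward the clustering shown in FIG. 3.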

Step S4: With the processor, input the plurality of personal audio features together with the corresponding plurality of training cough audios 8 into the cough audio analysis and identification module C to perform convolutional neural network operations, so as to establish the cough audio analysis and identification model 5.

The cough audio analysis and identification module C comprises a plurality of convolutional layers 10, a plurality of LSTM layers 11, and a plurality of fully connected layers 13 (one of which stands apart as a single fully connected layer 12), and each layer likewise includes an activation function.

Referring again to FIG. 4, the cough audio analysis and identification module C takes two kinds of input data. One is the plurality of personal audio features obtained from the personal audio feature model 3 established in step S3. The other is the plurality of training cough audios 8 corresponding to those personal audio features; the training cough audios 8 are likewise converted into a plurality of training cough audio signals 9, preferably Mel-frequency cepstral coefficients. In other words, the personal audio feature of person A is paired with person A's cough audio signal, the personal audio feature of person B with person B's cough audio signal, and so on. The plurality of training cough audio signals 9 pass through the plurality of convolutional layers 10, the plurality of LSTM layers 11, and the single fully connected layer 12, and are then passed together with the plurality of personal audio features through the remaining plurality of fully connected layers 13 for training, finally establishing the cough audio analysis and identification model 5.
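The point at which the two inputs meet can be sketched as a concatenation followed by fully connected layers. This is a toy illustration under stated assumptions, not the patented network: `cough_embedding` stands in for the output of the convolutional layers 10, LSTM layers 11, and single fully connected layer 12 (which are not implemented here), the weights are random rather than trained, and the layer sizes, ReLU activation, softmax output, and seven output classes are hypothetical choices.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise ReLU activation (an assumed choice of activation)."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax, turning scores into probabilities."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(cough_embedding, personal_feature, hidden_layer, output_layer):
    """Fuse the cough branch output with the personal feature, then classify."""
    fused = list(cough_embedding) + list(personal_feature)  # concatenation
    hidden = relu(dense(fused, *hidden_layer))
    return softmax(dense(hidden, *output_layer))


rng = random.Random(0)
hidden_layer = random_layer(8, 6, rng)   # 4 cough dims + 2 personal dims -> 8
output_layer = random_layer(7, 8, rng)   # 8 -> 7 hypothetical disease classes
probabilities = classify([0.2, -0.1, 0.4, 0.3], [0.5, -0.5],
                         hidden_layer, output_layer)
```

Joining the personal feature after the cough branch has produced its own representation lets the final layers condition the decision on who is coughing, which matches the description of the personal feature as an additional input to the remaining fully connected layers.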

Step S5: Input the personal audio 1 to be identified and the corresponding cough audio 2 to be identified through the input device, and have the processor perform an interpretation procedure that determines the corresponding respiratory disease type 6 according to the personal audio feature model 3 and the cough audio analysis and identification model 5.

The personal audio 1 to be identified is input into the personal audio feature model 3 to obtain a personal audio feature. The cough audio 2 to be identified (the cough audio of the same person, either recorded in real time or pre-recorded) is converted into the cough audio signal 4 to be identified, and the personal audio feature and the cough audio signal 4 are input together into the cough audio analysis and identification model 5 for interpretation to obtain the corresponding respiratory disease type 6. Using the personal audio feature as personalized calibration information for the subsequent identification of the respiratory disease type can greatly improve the accuracy of cough disease identification.

Step S6: Access the storage device through the output device and output the interpreted and analyzed respiratory disease type.

The respiratory disease type identification result 6 interpreted by the cough audio analysis and identification model 5 is read from the storage device by the output device, which displays the corresponding respiratory disease type. The output device may include various display interfaces, such as a computer screen, a monitor, or the display of a handheld device.
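The identification flow of steps S5 and S6 can be summarized as a small pipeline. The sketch below only fixes the order of operations; every name in it (`CoughIdentifier`, `to_signal`, `personal_model`, `cough_model`, `labels`) is hypothetical, the models are passed in as plain callables, and converting the personal audio with the same `to_signal` function before it reaches the personal audio feature model is an assumption carried over from the training steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CoughIdentifier:
    """Wires trained models together for identification (steps S5 and S6)."""
    to_signal: Callable[[Sequence[float]], List[float]]       # e.g. MFCC
    personal_model: Callable[[List[float]], List[float]]      # model 3
    cough_model: Callable[[List[float], List[float]], List[float]]  # model 5
    labels: Sequence[str]                                     # disease types

    def identify(self, personal_audio, cough_audio):
        """Return the disease label with the highest score."""
        # Personal audio 1 -> personal audio feature via model 3.
        personal_feature = self.personal_model(self.to_signal(personal_audio))
        # Cough audio 2 -> cough audio signal 4.
        cough_signal = self.to_signal(cough_audio)
        # Signal 4 and the personal feature go into model 5 together.
        scores = self.cough_model(cough_signal, personal_feature)
        best = max(range(len(scores)), key=lambda i: scores[i])
        return self.labels[best]


# Stand-in callables so the flow can be exercised without trained models.
identifier = CoughIdentifier(
    to_signal=lambda audio: list(audio),
    personal_model=lambda signal: [sum(signal) / len(signal)],
    cough_model=lambda signal, feature: [sum(signal), sum(feature)],
    labels=["asthma", "pneumonia"],
)
result = identifier.identify([1.0, 1.0], [0.1, 0.2])
```

Keeping the personal model and the cough model as separate components mirrors the description: the personal feature is computed first and then supplied to the cough model as calibration information alongside the cough signal.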

Please refer to FIG. 5, which is a schematic diagram of the cough identification system according to an embodiment of the present invention. As shown in the figure, the cough identification system 20 may comprise an input device 21, a storage device 22, a processor 23, and an output device 24. The input device 21 may include various audio capture devices, such as a microphone, or electronic devices with an audio capture function, such as a smartphone, tablet computer, notebook computer, or camera. It transfers the plurality of training audios and the corresponding plurality of training cough audios, the personal audio to be identified, and the cough audio to be identified as files to the memory of the storage device 22. The memory may include read-only memory, flash memory, a magnetic disk, or a cloud database.

Next, the cough identification system 20 accesses the storage device 22 through the processor 23. The processor 23 may include a central processing unit, a graphics processing unit, or a microprocessor in a computer or server, and may comprise a multi-core processing unit or a combination of multiple processing units. The processor 23 executes instructions to access the plurality of training audios 7 and the corresponding plurality of training cough audios 8 in the storage device 22 and performs convolutional neural network operations to obtain the personal audio feature model 3 and the cough audio analysis and identification model 5. The processor 23 then executes instructions to access the personal audio 1 to be identified and the cough audio 2 to be identified in the storage device 22, and uses the personal audio feature model 3 and the cough audio analysis and identification model 5 to perform the interpretation procedure on them, obtaining a corresponding respiratory disease type identification result 6. Finally, the output device 24 accesses the storage device 22 and outputs the respiratory disease type identification result 6. The output device 24 may include various display interfaces, such as a computer screen, a monitor, or the display of a handheld device, but is not limited to these.

In summary, the cough identification method and system of the present invention make it possible to determine quickly and conveniently the type of respiratory disease a patient is suffering from. Identifying the type of respiratory disease in this way can assist clinical interpretation and improve the accuracy of subsequent diagnoses.

The above description is illustrative only and not restrictive. Any equivalent modification or change made without departing from the spirit and scope of the present invention shall be included in the scope of the appended claims.

Claims (10)

一種咳聲辨識方法,其下列步驟:步驟S1:通過一輸入裝置輸入複數個訓練音頻及其對應的複數個訓練咳聲音頻,儲存於該儲存裝置;步驟S2:藉由一處理器存取該儲存裝置,將該複數個訓練音頻轉換為複數個音頻訊號;步驟S3:藉由該處理器將該複數個音頻訊號輸入至一個人音頻特徵擷取模組進行卷積神經網路運算,以建立一個人音頻特徵模型,取得複數個個人音頻特徵;步驟S4:藉由該處理器將該複數個個人音頻特徵及該複數個訓練咳聲音頻一起輸入至一咳聲音頻分析辨識模組進行卷積神經網路運算,以建立一咳聲音頻分析辨識模型;步驟S5:通過該輸入裝置輸入一待辨識個人音頻及一待辨識咳聲音頻,藉由該處理器進行判讀程序,依據該個人音頻特徵模型及該咳聲音頻分析辨識模型判讀對應之一呼吸道疾病種類;以及步驟S6:通過一輸出裝置存取該儲存裝置,將經判讀分析之該呼吸道疾病種類輸出,其中步驟S3中所述的該個人音頻特徵擷取模組,包含複數個卷積網路層(convolutional layers)、複數個長短期記憶層(long short-term memory)以及複數個全連接層(fully-connected layers),且每層包含一觸發函數,藉由該個人音頻特徵擷取模組對該複數個音頻訊號進行卷積神經網路運算,使該複數個音頻訊號被映射(mapping)至一高維度連續特徵空間(latent space),該高維度連續特徵空間即為該個人音頻特徵模型,並具有複數個高維度向量(latent vector),該複數個 高維度向量即為該複數個個人音頻特徵。 A cough recognition method, the following steps: Step S1: input a plurality of training audios and corresponding training cough audios through an input device, and store them in the storage device; Step S2: use a processor to access the storage device, convert the plurality of training audio into a plurality of audio signals; step S3: input the plurality of audio signals to a person audio feature extraction module through the processor to perform convolutional neural network calculations to create a person The audio feature model obtains a plurality of personal audio features; step S4: input the plurality of personal audio features and the plurality of training cough audio to a cough audio analysis and identification module through the processor to perform convolutional neural network to establish a cough audio analysis and identification model; step S5: input a personal audio to be identified and a cough audio to be identified through the input device, and perform an interpretation program by the processor, according to the personal audio feature model and The cough audio analysis and recognition model interprets a type of respiratory disease corresponding to it; and step S6: access the storage device through an output device, and output the type of respiratory disease that has been interpreted and analyzed, wherein the 
personal audio frequency described in step S3 The feature extraction module includes a plurality of convolutional layers, a plurality of long short-term memory layers and a plurality of fully-connected layers, and each layer includes a The trigger function is to perform a convolutional neural network operation on the plurality of audio signals by the personal audio feature extraction module, so that the plurality of audio signals are mapped to a high-dimensional continuous feature space (latent space), The high-dimensional continuous feature space is the personal audio feature model, and has a plurality of high-dimensional vectors (latent vector), the plurality of The high-dimensional vectors are the plurality of personal audio features. 如請求項1所述之咳聲辨識方法,其中步驟S4中所述的該複數個訓練咳聲音頻係轉換為複數個訓練咳聲音頻訊號後,再與該複數個個人音頻特徵一起輸入至該咳聲音頻分析辨識模組進行卷積神經網路運算,以建立該咳聲音頻分析辨識模型。 The cough recognition method as described in Claim 1, wherein the plurality of training cough audio systems described in step S4 are converted into a plurality of training cough audio signals, and then input together with the plurality of personal audio features into the The cough audio analysis and identification module performs convolutional neural network operations to establish the cough audio analysis and identification model. 如請求項1所述之咳聲辨識方法,其中步驟S2中所述的該複數個音頻訊號為梅爾倒頻譜係數。 The cough recognition method according to claim 1, wherein the plurality of audio signals in step S2 are Mel cepstral coefficients. 如請求項2所述之咳聲辨識方法,其中步驟S4中所述的該複數個訓練咳聲音頻訊號為梅爾倒頻譜係數。 The cough recognition method according to claim 2, wherein the plurality of training cough audio signals in step S4 are Mel cepstrum coefficients. 如請求項1至請求項4中任一項所述之咳聲辨識方法,其中步驟S5中的該待辨識咳聲音頻為即時錄製或預先錄製。 The cough recognition method according to any one of claim 1 to claim 4, wherein the cough audio to be recognized in step S5 is real-time recording or pre-recording. 
A cough identification system, comprising: an input device for inputting a plurality of training audios and a corresponding plurality of training cough audios, a personal audio to be identified, and a cough audio to be identified; a storage device, connected to the input device, for storing the plurality of training audios and the corresponding plurality of training cough audios, the personal audio to be identified, and the cough audio to be identified; an output device, connected to the storage device, for outputting the type of respiratory disease determined by interpretation and analysis; and a processor, connected to the storage device, for executing a plurality of instructions to perform the following steps: converting the plurality of training audios into a plurality of audio signals, and inputting the plurality of audio signals into a personal audio feature extraction module for convolutional neural network operations, so as to establish a personal audio feature model and obtain a plurality of personal audio features; inputting the plurality of personal audio features together with the plurality of training cough audios into a cough audio analysis and identification module for convolutional neural network operations, so as to establish a cough audio analysis and identification model; and interpreting, according to the personal audio feature model and the cough audio analysis and identification model, the personal audio to be identified and the cough audio to be identified, so as to determine the corresponding type of respiratory disease; wherein the personal audio feature extraction module comprises a plurality of convolutional layers, a plurality of long short-term memory (LSTM) layers, and a plurality of fully-connected layers, each layer including an activation function; the personal audio feature extraction module performs convolutional neural network operations on the plurality of audio signals so that the plurality of audio signals are mapped to a high-dimensional continuous feature space (latent space); the high-dimensional continuous feature space is the personal audio feature model and has a plurality of high-dimensional vectors (latent vectors); and the plurality of high-dimensional vectors are the plurality of personal audio features.

The cough identification system as described in claim 6, wherein the plurality of training cough audios are converted into a plurality of training cough audio signals and then input, together with the plurality of personal audio features, into the cough audio analysis and identification module for convolutional neural network operations, so as to establish the cough audio analysis and identification model.

The cough identification system as described in claim 6, wherein the plurality of audio signals are Mel-frequency cepstral coefficients.

The cough identification system as described in claim 7, wherein the plurality of training cough audio signals are Mel-frequency cepstral coefficients.

The cough identification system as described in any one of claims 6 to 9, wherein the cough audio to be identified is recorded in real time or pre-recorded.
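The dependent claims above state that the audio signals fed to the two modules are Mel cepstral coefficients, commonly called Mel-frequency cepstral coefficients (MFCCs). The claims do not give any extraction parameters, so the following is only a minimal sketch of how one audio frame might be converted to MFCCs. The Hamming window, the 20 triangular filters, the 13 coefficients, and all function names are assumptions made for illustration, not details taken from the patent.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum.

    A naive O(n^2) DFT keeps the sketch dependency-free; a real system
    would use an FFT library.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2)
    # Edge and center frequencies (Hz) of n_filters + 2 equally spaced mel points
    points = [mel_to_hz(top_mel * i / (n_filters + 1))
              for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = points[m - 1], points[m], points[m + 1]
        row = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Return n_coeffs MFCCs for a single frame of audio samples."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log of the energy in each mel band (small floor avoids log(0))
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    # DCT-II of the log energies gives the cepstral coefficients
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]
```

Calling `mfcc(frame, 8000)` on a 256-sample frame returns a list of 13 coefficients. Stacking such vectors over successive frames gives the kind of two-dimensional input that convolutional layers can operate on.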
TW111122515A 2022-06-16 2022-06-16 Cough identification method and system thereof TWI798111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111122515A TWI798111B (en) 2022-06-16 2022-06-16 Cough identification method and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW111122515A TWI798111B (en) 2022-06-16 2022-06-16 Cough identification method and system thereof

Publications (2)

Publication Number Publication Date
TWI798111B (en) 2023-04-01
TW202401456A TW202401456A (en) 2024-01-01

Family

ID=86945174

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111122515A TWI798111B (en) 2022-06-16 2022-06-16 Cough identification method and system thereof

Country Status (1)

Country Link
TW (1) TWI798111B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112105297A (en) * 2018-05-08 2020-12-18 思睿逻辑国际半导体有限公司 Health-related information generation and storage

Also Published As

Publication number Publication date
TW202401456A (en) 2024-01-01

Similar Documents

Publication Publication Date Title
Aykanat et al. Classification of lung sounds using convolutional neural networks
CN109431507A (en) Cough disease identification method and device based on deep learning
WO2019023879A1 (en) Cough sound recognition method and device, and storage medium
CN105512348A (en) Method and device for processing videos and related audios and retrieving method and device
JP2015514456A (en) Method and apparatus for processing patient sounds
CN113436726B (en) Automatic lung pathological sound analysis method based on multi-task classification
US11741986B2 (en) System and method for passive subject specific monitoring
Xia et al. Exploring machine learning for audio-based respiratory condition screening: A concise review of databases, methods, and open issues
Niu et al. A time-frequency channel attention and vectorization network for automatic depression level prediction
Lin et al. Contactless sleep apnea detection in snoring signals using hybrid deep neural networks targeted for embedded hardware platform with real-time applications
TWI798111B (en) Cough identification method and system thereof
Tian et al. Classification of phonocardiogram based on multi-view deep network
CN110074759B (en) Voice data auxiliary diagnosis method, device, computer equipment and storage medium
CN111489824A (en) OSAHS prediction system based on Internet of things
Abhishek et al. ESP8266-based Real-time Auscultation Sound Classification
Rahman et al. Efficient online cough detection with a minimal feature set using smartphones for automated assessment of pulmonary patients
CN112071388A (en) Intelligent medicine dispensing and preparing method based on deep learning
Melms et al. Training one model to detect heart and lung sound events from single point auscultations
Karataş et al. Mobile application that detects COVID-19 from cough and image using smartphone recordings and machine learning
Kim et al. Non-invasive way to diagnose dysphagia by training deep learning model with voice spectrograms
CN117393156B (en) Multi-dimensional remote auscultation and diagnosis intelligent system based on cloud computing
Ahmed et al. DeepLung: Smartphone Convolutional Neural Network-Based Inference of Lung Anomalies for Pulmonary Patients.
Abhishek et al. The Auscultation Sound Classification Era of the Future
Nallanthighal et al. Detection of Coughing and Respiratory Sensing in Conversational Speech.
CN114708972B (en) VTE risk early warning system