CN110946554A - Cough type identification method, device and system

Publication number: CN110946554A
Application number: CN201911188230.4A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: signal, cough, target, template, slope
Inventors: 刘洪涛, 柳丝, 张翔, 王伟
Original assignee (applicant): Shenzhen H & T Home Online Network Technology Co., Ltd.
Current assignee: Shenzhen Shuliantianxia Intelligent Technology Co., Ltd.
Legal status: Pending


Classifications

    • A61B5/00: Measuring for diagnostic purposes; identification of persons
    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/66: Speech or voice analysis techniques specially adapted for comparison or discrimination, for extracting parameters related to health condition

Abstract

An embodiment of the invention provides a cough type identification method and a cough type identification device. The method comprises: acquiring a first signal and a second signal collected when a user coughs, where the first signal is obtained by collecting, in a contact manner, non-acoustic signals generated when the user coughs, and the second signal is an audio signal obtained by collecting the sound generated when the user coughs; obtaining a third signal according to the first signal and the second signal; extracting a first signal feature according to the third signal; and determining the cough type of the user's cough according to the first signal feature. With this embodiment, noise can be effectively filtered out, and signal feature extraction and cough type identification are performed only on the cough sound of the target user, so that cough types such as dry cough and wet cough can be distinguished more accurately.

Description

Cough type identification method, device and system
Technical Field
The present application relates to the field of sound processing technologies, and in particular, to a method, an apparatus, and a system for identifying a cough type.
Background
The first clinical manifestation of respiratory disease is usually cough, which can be classified as dry or wet according to whether phlegm is produced. A dry cough produces no phlegm or only a little phlegm and is commonly seen in chronic laryngitis, tracheitis or a foreign body in the trachea; a wet cough produces phlegm and is usually seen in chronic bronchitis, pneumonia or bronchiectasis.
In real life, patients visiting a clinic often cannot describe their cough symptoms accurately, let alone reproduce them completely. Doctors therefore find it difficult to make effective use of cough-related information, and it is even harder to locate the cause accurately from such limited information within a short consultation, which can lead to a prolonged course of disease or even deterioration. Judging a cough by listening depends mainly on the doctor's experience; the judgment criteria are not uniform and the probability of misdiagnosis is high.
Disclosure of Invention
The embodiment of the invention discloses a cough type identification method, device and system, which can effectively filter out noise and perform signal feature extraction and cough type identification only on the cough sound of the target user, so that the cough type, such as dry cough or wet cough, to which the cough sound belongs can be distinguished more accurately.
In a first aspect, an embodiment of the present invention provides a method for identifying a cough type, including: acquiring a first signal and a second signal collected when a user coughs, where the first signal is obtained by collecting, in a contact manner, non-acoustic signals generated when the user coughs, and the second signal is an audio signal obtained by collecting the sound generated when the user coughs; obtaining a third signal according to the first signal and the second signal; extracting a first signal feature according to the third signal; and determining the cough type of the user's cough according to the first signal feature.
In the above method, the first signal is a non-acoustic signal generated when the target user coughs and collected through contact with the target user, so it belongs only to the target user. The second signal is an audio signal that captures the sound generated when the target user coughs, but it may also contain noise from non-target sources. By combining the first signal and the second signal, the embodiment of the invention can effectively remove environmental and other noise and perform first-signal-feature extraction and cough type identification only on the third signal emitted by the target user, so that the cough type, such as dry cough or wet cough, to which the cough sound belongs can be distinguished more accurately.
In an alternative of the first aspect, the first signal and the second signal each comprise a plurality of signal frames, each signal frame corresponding to a different acquisition period.
Obtaining the third signal according to the first signal and the second signal includes: if the energy of a first signal frame is greater than or equal to a preset energy threshold, determining that a second signal frame in the third signal is a third signal frame in the second signal, where the second signal frame, the third signal frame and the first signal frame all correspond to the same acquisition period and the first signal frame is any signal frame of the first signal; and if the energy of the first signal frame is less than the preset energy threshold, determining that the second signal frame is the first signal frame.
In the above method, although the first signal belongs only to the target user, the frequency band of a signal acquired in a contact manner is narrow and cannot fully characterize the target user's cough. The second signal has a wide frequency band but includes noise other than the target user's voice. In the embodiment of the invention, the first signal is used to remove the noise in the second signal (such as the speech, coughs or body movements of people other than the target user), yielding a third signal that belongs only to the target user, covers a complete frequency band and can accurately represent the target user's cough sound. Performing first-signal-feature extraction and cough type identification on the third signal not only filters out noise but also avoids missing parts of the frequency band emitted by the target user, making the identification of cough types such as dry cough and wet cough more accurate and comprehensive.
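For ease of understanding, the following is a minimal sketch of how the third signal could be assembled frame by frame according to the rule above. The function and variable names, the use of NumPy and the squared-sample energy definition are illustrative assumptions, not the reference implementation of this embodiment.

```python
import numpy as np

def fuse_signals(first_frames, second_frames, energy_threshold):
    """Assemble the third signal frame by frame (illustrative sketch).

    first_frames / second_frames: sequences of 1-D sample arrays covering the
    same acquisition periods (per-frame lengths may differ, since the two
    acquisition modules can use different sampling rates).
    """
    third_frames = []
    for contact_frame, audio_frame in zip(first_frames, second_frames):
        # Short-time energy of the contact (e.g. bone conduction) frame.
        energy = float(np.sum(np.asarray(contact_frame, dtype=float) ** 2))
        if energy >= energy_threshold:
            # The target user is actually vocalizing: keep the richer audio frame.
            third_frames.append(np.asarray(audio_frame, dtype=float))
        else:
            # The contact signal is quiet, so the audio frame is treated as noise;
            # the low-energy contact frame is carried over and will be marked as
            # an invalid frame in later processing.
            third_frames.append(np.asarray(contact_frame, dtype=float))
    return third_frames
```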
In yet another alternative of the first aspect, the first signal feature comprises an average short-time energy of a short-time energy spectrum of the third signal, a first slope, a second slope, and one or more mel-frequency cepstral coefficient feature vectors; the first slope of the third signal is the slope of a signal frame corresponding to the signal start point in the third signal in the short-time energy spectrum of the third signal, and the second slope of the third signal is the slope of a signal frame corresponding to the signal end point in the third signal in the short-time energy spectrum of the third signal.
In the above method, the first signal features used for characterizing the third signal include a sound energy feature that can be used for distinguishing cough types such as dry cough or wet cough, and a mel-frequency cepstrum coefficient MFCC feature vector that is commonly used for characterizing human voice. According to the embodiment of the invention, three energy characteristics (average short-time energy, a first slope and a second slope of a short-time energy spectrum) are added on the basis of the MFCC feature vector, so that the accuracy of cough classification is improved, and the identification of cough types such as dry cough or wet cough is more accurate and comprehensive.
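As a rough illustration of the energy part of this feature set, the sketch below computes the average short-time energy and the two end-point slopes from the framed third signal. Reading the "slope" as a simple finite difference between adjacent frames of the short-time energy spectrum is an assumption made here for illustration; the MFCC feature vectors would be computed separately per frame with a standard pipeline (pre-emphasis, windowing, FFT, mel filter bank, logarithm, DCT), for example via librosa.feature.mfcc.

```python
import numpy as np

def energy_features(valid_frames):
    """Average short-time energy and start/end slopes (illustrative sketch).

    valid_frames: list of 1-D sample arrays, the valid frames of the third
    signal in time order (at least two frames assumed). The finite-difference
    slope is an assumption, not the authoritative definition.
    """
    e = np.array([np.sum(np.asarray(f, dtype=float) ** 2) for f in valid_frames])
    avg_energy = float(e.mean())           # average short-time energy
    first_slope = float(e[1] - e[0])       # slope at the frame holding the signal start point
    second_slope = float(e[-1] - e[-2])    # slope at the frame holding the signal end point
    return avg_energy, first_slope, second_slope
```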
In yet another alternative of the first aspect, the determining the cough type of the user's cough based on the first signal characteristic includes: inputting the first signal characteristic into a first model to obtain an output result of the first model; the output result is used for representing the cough type of the user cough, and the first model is a cough type recognition model which is obtained through pre-training and is based on a support vector data description algorithm.
In the above method, the cough type of the user's cough is determined by a cough type recognition model based on the support vector data description (SVDD) algorithm. The SVDD algorithm is widely applicable, has low algorithmic complexity and a small computational load, so the hardware requirements are low and the manufacturing cost of the product is reduced.
In yet another alternative of the first aspect, the first model includes a plurality of cough templates, one cough template corresponding to one cough type, each cough template corresponding to one average short-time energy sub-model, one start-point slope sub-model, one end-point slope sub-model, and one vector sub-model; the center of the average short-time energy submodel is a first center, and the radius of the average short-time energy submodel is a first radius; the center of the starting point slope submodel is a second center, and the radius of the starting point slope submodel is a second radius; the center of the terminal slope submodel is a third center, and the radius of the terminal slope submodel is a third radius; the center of the vector sub-model is the fourth center, and the radius of the vector sub-model is the fourth radius.
The above inputting the first signal characteristic into the first model to obtain the output result of the first model includes: and calculating the distance between the coordinate point corresponding to the average short-time energy of the third signal and the coordinate point corresponding to the first center corresponding to each cough template so as to determine the first distance corresponding to each cough template.
And calculating the distance between the coordinate point corresponding to the first slope of the third signal and the coordinate point corresponding to the second center corresponding to each cough template so as to determine the second distance corresponding to each cough template.
And calculating the distance between the coordinate point corresponding to the second slope of the third signal and the coordinate point corresponding to the third center corresponding to each cough template so as to determine the third distance corresponding to each cough template.
And calculating one or more distances between the coordinate point corresponding to each of the one or more mel-frequency cepstral coefficient feature vectors of the third signal and the coordinate point corresponding to the fourth center corresponding to each cough template so as to determine one or more fourth distances corresponding to each cough template.
Under the condition that a first distance corresponding to the target cough template is smaller than a first radius corresponding to the target cough template, a second distance corresponding to the target cough template is smaller than a second radius corresponding to the target cough template, a third distance corresponding to the target cough template is smaller than a third radius corresponding to the target cough template, and one or more fourth distances corresponding to the target cough template are smaller than a fourth radius corresponding to the target cough template, determining that the cough type corresponding to the target cough template is an output result of the first model; the target cough template is one of a plurality of cough templates.
In the above method, the first model includes a plurality of cough templates, and the different cough templates correspond to different cough types. Each cough template corresponds to an average short-time energy sub-model, a start-point slope sub-model, an end-point slope sub-model and a vector sub-model. And under the condition that each feature of the third signal corresponds to all the submodels corresponding to the matched target cough template, determining that the cough type corresponding to the target cough template is the cough type of the user cough. According to the embodiment of the invention, the three energy characteristics for distinguishing the cough types such as dry cough, wet cough and the like, the Mel cepstrum coefficient MFCC characteristic vector and the SVDD are combined to identify the cough types, so that the accuracy rate of identifying the cough types such as dry cough, wet cough and the like can be obviously improved.
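The decision rule described above can be pictured with the following sketch, which checks one extracted feature set against every cough template. The dictionary field names and the use of absolute/Euclidean distance are assumptions chosen for illustration rather than the exact data structures of this embodiment.

```python
import numpy as np

def matches_template(features, template):
    """True if every sub-model of this cough template encloses the features."""
    in_energy = abs(features['avg_energy'] - template['energy_center']) < template['energy_radius']
    in_start = abs(features['first_slope'] - template['start_slope_center']) < template['start_slope_radius']
    in_end = abs(features['second_slope'] - template['end_slope_center']) < template['end_slope_radius']
    in_vectors = all(
        np.linalg.norm(np.asarray(v) - np.asarray(template['vector_center'])) < template['vector_radius']
        for v in features['mfcc_vectors']
    )
    return in_energy and in_start and in_end and in_vectors

def identify_cough_type(features, templates):
    """Return the cough type of the first template whose four sub-models all match."""
    for cough_type, template in templates.items():
        if matches_template(features, template):
            return cough_type
    return None  # no template matched the first signal feature
```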
In yet another alternative of the first aspect, before determining the cough type of the cough of the user according to the first signal characteristic, the method further comprises: acquiring signal characteristics of a preset number of sample signals of the target cough types; the target cough type is a preset and known cough type; a first model is derived based on signal characteristics of a preset number of sample signals of a target cough type.
In yet another alternative of the first aspect, the signal characteristic of the sample signal includes an average short-time energy of a short-time energy spectrum of the sample signal, a first slope of the sample signal, a second slope of the sample signal, and one or more mel-frequency cepstral coefficient feature vectors of the sample signal.
The obtaining a first model according to the signal characteristics of the sample signals of the preset number of target cough types includes: and obtaining an average short-time energy submodel corresponding to the target cough type according to the average short-time energy of the sample signals of the preset number of target cough types.
And obtaining a starting point slope submodel corresponding to the target cough type according to the first slopes of the sample signals of the preset number of target cough types.
And obtaining a terminal slope submodel corresponding to the target cough type according to the second slopes of the sample signals of the preset number of target cough types.
And obtaining a vector sub-model corresponding to the target cough type according to one or more Mel cepstrum coefficient feature vectors of the sample signals of the preset number of target cough types.
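For completeness, the sketch below shows one simplified way such per-type sub-models could be built from labelled sample signals, using the same assumed template fields as the matching sketch above. A real SVDD fit finds the minimum enclosing hypersphere by solving a quadratic programme with slack variables; the mean-centre/quantile-radius shortcut used here is only a stand-in to illustrate the template structure.

```python
import numpy as np

def fit_sphere(points, coverage=0.95):
    """Crude enclosing sphere: sample mean as centre, distance quantile as radius.

    Stand-in for a proper SVDD fit, purely for illustration.
    """
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    center = pts.mean(axis=0)
    radius = float(np.quantile(np.linalg.norm(pts - center, axis=1), coverage))
    return center, radius

def train_template(sample_features):
    """Build the four sub-models of one cough template from labelled samples."""
    template = {}
    for key, prefix in (('avg_energy', 'energy'),
                        ('first_slope', 'start_slope'),
                        ('second_slope', 'end_slope')):
        c, r = fit_sphere([[f[key]] for f in sample_features])
        template[f'{prefix}_center'], template[f'{prefix}_radius'] = float(c[0]), r
    vectors = [v for f in sample_features for v in f['mfcc_vectors']]
    c, r = fit_sphere(vectors)
    template['vector_center'], template['vector_radius'] = c, r
    return template
```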
In a second aspect, an embodiment of the present invention provides a cough type identification device, including: the first acquisition module is used for acquiring a first signal and a second signal acquired when a user coughs; the first signal is a signal obtained by collecting other signals which are not sound and are generated when the user coughs in a contact mode, and the second signal is an audio signal obtained by collecting the sound generated when the user coughs; the second acquisition module is used for obtaining a third signal according to the first signal and the second signal; the extraction module is used for extracting a first signal characteristic according to the third signal; and the determining module is used for determining the cough type of the cough of the user according to the first signal characteristic.
In an alternative of the second aspect, the first signal and the second signal each comprise a plurality of signal frames, each signal frame corresponding to a different acquisition period.
The second obtaining module is specifically configured to: if the energy of a first signal frame is greater than or equal to a preset energy threshold, determine that a second signal frame in the third signal is a third signal frame in the second signal, where the second signal frame, the third signal frame and the first signal frame all correspond to the same acquisition period and the first signal frame is any signal frame of the first signal; and if the energy of the first signal frame is less than the preset energy threshold, determine that the second signal frame is the first signal frame.
In yet another alternative of the second aspect, the first signal feature comprises an average short-time energy of a short-time energy spectrum of the third signal, a first slope, a second slope, and one or more mel-frequency cepstral coefficient feature vectors; the first slope of the third signal is the slope of a signal frame corresponding to the signal start point in the third signal in the short-time energy spectrum of the third signal, and the second slope of the third signal is the slope of a signal frame corresponding to the signal end point in the third signal in the short-time energy spectrum of the third signal.
In yet another alternative of the second aspect, the determining module is specifically configured to input the first signal characteristic into the first model, and obtain an output result of the first model; the output result is used for representing the cough type of the user cough, and the first model is a cough type recognition model which is obtained through pre-training and is based on a support vector data description algorithm.
In yet another alternative of the second aspect, the first model includes a plurality of cough templates, one cough template corresponding to one cough type, each cough template corresponding to one average short-time energy sub-model, one start-point slope sub-model, one end-point slope sub-model and one vector sub-model; the center of the average short-time energy submodel is a first center, and the radius of the average short-time energy submodel is a first radius; the center of the starting point slope submodel is a second center, and the radius of the starting point slope submodel is a second radius; the center of the terminal slope submodel is a third center, and the radius of the terminal slope submodel is a third radius; the center of the vector sub-model is the fourth center, and the radius of the vector sub-model is the fourth radius.
The determining module includes: and the first calculating unit is used for calculating the distance between the coordinate point corresponding to the average short-time energy of the third signal and the coordinate point corresponding to the first center corresponding to each cough template so as to determine the first distance corresponding to each cough template.
And the second calculating unit is used for calculating the distance between the coordinate point corresponding to the first slope of the third signal and the coordinate point corresponding to the second center corresponding to each cough template so as to determine the second distance corresponding to each cough template.
And the third calculating unit is used for calculating the distance between the coordinate point corresponding to the second slope of the third signal and the coordinate point corresponding to the third center corresponding to each cough template so as to determine the third distance corresponding to each cough template.
And the fourth calculating unit is used for calculating one or more distances between the coordinate point corresponding to each of the one or more mel cepstrum coefficient feature vectors of the third signal and the coordinate point corresponding to the fourth center corresponding to each cough template so as to determine one or more fourth distances corresponding to each cough template.
The output determining unit is used for determining that the cough type corresponding to the target cough template is the output result of the first model under the condition that the first distance corresponding to the target cough template is smaller than the first radius corresponding to the target cough template, the second distance corresponding to the target cough template is smaller than the second radius corresponding to the target cough template, the third distance corresponding to the target cough template is smaller than the third radius corresponding to the target cough template, and one or more fourth distances corresponding to the target cough template are smaller than the fourth radius corresponding to the target cough template; the target cough template is one of a plurality of cough templates.
In yet another alternative of the second aspect, the apparatus further comprises a third obtaining module, the third obtaining module comprising: the acquisition characteristic unit is used for acquiring signal characteristics of sample signals of a preset number of target cough types; the target cough type is a preset and known cough type.
And the model obtaining unit is used for obtaining a first model according to the signal characteristics of the sample signals of the preset number of target cough types.
In yet another alternative of the second aspect, the signal characteristic of the sample signal includes an average short-time energy of a short-time energy spectrum of the sample signal, a first slope of the sample signal, a second slope of the sample signal, and one or more mel-frequency cepstral coefficient feature vectors of the sample signal.
The model obtaining unit includes:
the first obtaining subunit is configured to obtain an average short-time energy submodel corresponding to the target cough type according to the average short-time energy of the sample signals of the preset number of target cough types.
And the second obtaining subunit is used for obtaining a starting point slope submodel corresponding to the target cough type according to the first slopes of the sample signals of the preset number of target cough types.
And the third obtaining subunit is configured to obtain an end point slope sub-model corresponding to the target cough type according to the second slopes of the sample signals of the preset number of target cough types.
And the fourth obtaining subunit is configured to obtain a vector sub-model corresponding to the target cough type according to one or more mel cepstrum coefficient feature vectors of the sample signals of the preset number of target cough types.
In a third aspect, an embodiment of the present invention provides a cough type identification device, including a processor and a memory; the memory is configured to store a computer program, and the processor is configured to invoke the computer program to execute the cough type identification method provided by the first aspect of the embodiment of the present invention or any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a cough type identification device, which includes a first acquisition module, a second acquisition module, a digital signal processor DSP, a micro control unit MCU, a communication module, and a power module. The MCU controls the first acquisition module, the second acquisition module and the DSP to execute the cough type identification method provided by the first aspect of the embodiments of the present invention or any implementation manner of the first aspect. The communication module is used for establishing a data connection relation with the server. The power module is used for providing electric energy for the device.
In a fifth aspect, an embodiment of the present invention provides a cough type identification system, which includes a cough type identification device, a server, and a user terminal. The cough type recognition device establishes a data connection with the server, and the server establishes a data connection with the user terminal. The cough type identification device is the one provided in the second aspect or any implementation manner of the second aspect of the embodiment of the present invention, or in the third aspect or any implementation manner of the third aspect, or in the fourth aspect or any implementation manner of the fourth aspect. The server is a single server or a server cluster capable of providing multiple services. The user terminal is any terminal provided with a display panel, such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device or a wearable device.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium, which includes a computer program and, when the computer program runs on a cough type recognition apparatus, causes the cough type recognition apparatus to execute the cough type recognition method provided in the first aspect of the embodiment or any implementation manner of the first aspect of the present invention.
In a seventh aspect, an embodiment of the present invention provides a computer program product, where the computer program product includes: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code being executable by one or more processors for performing the cough type identification method provided by the first aspect of the embodiments or any one implementation manner of the first aspect of the embodiments.
It is to be understood that the cough type identification device provided in the second aspect, the cough type identification device provided in the third aspect, the cough type identification device provided in the fourth aspect, the computer-readable storage medium provided in the sixth aspect, and the computer program product provided in the seventh aspect are all configured to execute the cough type identification method provided in the first aspect, and therefore, the beneficial effects achieved by the method can refer to the beneficial effects in the cough type identification method provided in the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments of the present invention or the background art will be briefly described below.
Fig. 1 is a schematic structural diagram of a cough type identification system according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for identifying a cough type according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a time domain spectrum of a signal;
fig. 4 is a schematic flow chart of another cough type identification method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of obtaining a third signal according to the present invention;
FIG. 6 is a schematic diagram of a short time energy spectrum and a short time zero crossing rate spectrum;
fig. 7 is a flowchart illustrating a cough type recognition method according to another embodiment of the present invention;
FIG. 8 is a schematic of a short-time energy spectrum;
FIG. 9 is a schematic flow chart of a method of extracting mel-frequency cepstral coefficients;
fig. 10 is a flowchart illustrating a method for identifying a cough type according to another embodiment of the present invention;
fig. 11 is a schematic structural diagram of a cough type identification device according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of another cough type identification device according to an embodiment of the present invention;
fig. 13A is a side view of a cough type recognition device provided by an embodiment of the present invention;
fig. 13B is a front view of a cough type recognition device provided by an embodiment of the present invention;
fig. 14 is a schematic structural diagram of another cough type identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a cough type identification system according to an embodiment of the present invention.
As shown in fig. 1, the cough type recognition system may include a target user 10, a cough type recognition device 20, a server 30, and a user terminal 40. The target user 10 wears the cough type recognition device 20, and the cough type recognition device 20 is used for collecting signals generated when the target user 10 coughs, and extracting signal characteristics and recognizing cough types of the signals. Meanwhile, the cough type recognition device 20 may transmit the processed cough data to the server 30. The server 30 may analyze and count the received cough data, and then send the analysis and counting result to the user terminal 40 for display. Alternatively, the server 30 may directly send the cough data to the user terminal 40 for display. The target user 10 can view his or her own cough data through the user terminal 40 and provide his or her own cough data to be analyzed by the doctor at the time of a visit or a consultation, so that the doctor can quickly locate the cause.
The cough type recognition device 20 collects the first signal through direct contact with the body of the target user 10, collects the second signal through a medium such as air, and extracts signal features and recognizes the cough type by combining the first signal and the second signal. The cough type recognition device 20 may, but is not limited to, establish a data connection with the server 30 via a wireless link.
The server 30 may be a server capable of providing various services, and the server 30 may establish a data connection relationship with the cough type recognition apparatus 20 and the user terminal 40 through a wireless link, but is not limited thereto. The server 30 may be, but is not limited to, a hardware server, a virtual server, a cloud server, and the like.
The wireless link may be, but is not limited to, a wireless personal area network (WPAN, e.g. Bluetooth), a wireless local area network (WLAN, e.g. Wi-Fi), or a mobile network.
The user terminal 40 may perform bidirectional communication with the server 30, the server 30 may send the cough data to the user terminal 40 for display, and the user terminal 40 may also upload the related information recorded by the target user 10 at the user terminal 40 to the server 30. The related information recorded by the target user 10 at the user terminal 40 may include, but is not limited to: basic information such as sex, age, eating habits and sleep, and cough information such as duration of cough, subjective high-incidence time and whether sputum exists.
The user terminal 40 may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a mobile internet device, a wearable device, and other terminals having a display panel.
As an alternative embodiment, the user terminal 40 may, but is not limited to, establish a data connection with the cough type identifying device 20 through the above-mentioned wireless link. The cough type recognition device 20 may send the cough data collected and processed by itself to the user terminal 40, and the user terminal 40 may install an application program capable of analyzing the cough data, and analyze and display the cough data through the application program.
As an alternative, the user terminal 40 may install an application program whose backend is the server 30, that is, an application program configured to work with the server 30, and display the cough data sent by the server 30 through that application program.
In a specific implementation, the user terminal 40 may also parse and display the cough data sent by the cough type identifying device 20 or the server 30 through an application that does not need to be installed, for example, but not limited to, a WeChat mini program, which is not limited in this embodiment of the present invention.
It is to be understood that the number of servers 30 and user terminals 40 in the cough type recognition system shown in fig. 1 is merely an example. In a specific implementation, the network architecture of the cough type identification system may include any number of servers and user terminals, which is not limited in this embodiment of the present invention. For example, but not limiting of, server 30 may be a server cluster of multiple servers.
Based on the cough type recognition system shown in fig. 1, a method for recognizing a cough according to an embodiment of the present invention will be described.
Referring to fig. 2, fig. 2 is a schematic flowchart of a cough type identification method according to an embodiment of the present invention, which can be implemented by the cough type identification device 20 shown in fig. 1. The method includes, but is not limited to, the steps of:
step S210: the first signal and the second signal collected when the user coughs are acquired.
Specifically, the first signal is a signal obtained by collecting other signals which are not sound and are generated when the user coughs in a contact mode, and the second signal is an audio signal obtained by collecting sound generated when the user coughs.
As an alternative embodiment, the first signal may be acquired by a first acquisition module, which may be, but is not limited to, a bone conduction sensor, a piezoelectric sensor, or a piezoresistive sensor. The second signal may be acquired by a second acquisition module, which may be, but is not limited to, a micro-electro-mechanical system (MEMS) microphone, a moving coil microphone, a capacitive microphone, or an electret condenser microphone.
For convenience of description, in the embodiment of the present invention, the first acquisition module is a bone conduction sensor, the sampling rate of the first acquisition module is 2.4kHz, the second acquisition module is a micro-electromechanical system microphone, and the sampling rate of the second acquisition module is 8 kHz.
Specifically, bone conduction means that sound waves are transmitted to the inner ear through vibration of the skull, the jaw bone and so on, without passing through the outer ear and the middle ear and without spreading through the air to disturb others. Technologies based on the bone conduction principle are mainly found in bone conduction hearing aids, bone conduction microphones, bone conduction earphones, bone conduction mobile phones and the like.
The first acquisition module in the embodiment of the invention, the bone conduction sensor, can collect the slight vibration of the head and neck bones produced when a person speaks and convert it into an electrical signal (this vibration signal can be used to characterize the sound made by the user). Unlike a conventional microphone, which collects or propagates an audio signal through the air, a bone conduction sensor is not affected by other sounds propagating through the air; even in a very noisy environment, the vibration signal of the target user in contact with the bone conduction sensor can be collected, converted into sound audible to the human ear and propagated clearly. The bone conduction sensor therefore has good noise immunity and anti-interference performance.
Since the frequency band of the signals that existing bone conduction sensors can acquire or propagate is typically in the range of 0-2.4 kilohertz (kHz), while human cough sounds sometimes exceed this range, a second acquisition module, the MEMS microphone, is needed to collect the cough audio signal as well, so that audio covering the human voice band is captured more completely. A MEMS microphone ("micro microphone" for short) integrates mechanical elements, sensors and electronic circuits on a silicon chip using micro-electro-mechanical system (MEMS) micro-processing techniques, making the microphone more compact while improving audio quality.
In particular, the first signal and the second signal may be signals acquired during the same time period (e.g. a period during which the target user coughs), but they are acquired differently: the first signal is acquired by the bone conduction sensor through contact with the target user, and the second signal is an audio signal acquired by the MEMS microphone through the air medium. Their frequency bands may differ, e.g. the frequency band of the first signal is in the range 0-2.4 kHz while that of the second signal is 2-6 kHz, and their sampling rates may differ, e.g. 2.4 kHz for the first signal and 8 kHz for the second signal.
Step S220: and obtaining a third signal according to the first signal and the second signal.
Specifically, although the first signal only belongs to the signal sent by the target user, the frequency band range of the first signal acquired by the contact type method is small, and the cough of the target user cannot be comprehensively represented. The second signal has a wide frequency band, but includes noise other than the voice of the target user. According to the embodiment of the invention, the third signal is obtained by combining the first signal and the second signal, the environmental noise and other noises (such as speaking sound, cough sound, body activity sound and the like of other people except the target user) in the second signal are removed, and the third signal which only belongs to the cough sound of the target user and has a comprehensive frequency band range can be obtained. The extraction of the first signal characteristics and the identification of the cough type are carried out on the third signal, so that not only can noise be filtered, but also the cough signals belonging to partial frequency bands of the target user can not be omitted, and the identification of the cough types such as dry cough, wet cough and the like is more accurate and comprehensive.
As an alternative embodiment, the analysis and processing of an audio signal typically needs to be done on a short-time basis, i.e. using short-time analysis. Short-time analysis divides the audio signal into a number of segments, each called a frame; in practice the frame length is generally 10-30 milliseconds (ms).
In a specific implementation, the signal may be framed, but not limited to, using a continuous or overlapping segmentation method. Referring next to fig. 3, one way to frame a signal is illustrated.
Referring to fig. 3, fig. 3 is a schematic diagram of a time domain spectrum of a signal. The signal may be an analog signal or a digital signal, which is not limited in the embodiment of the present invention.
As shown in fig. 3, the continuous curve in fig. 3 is the time domain spectrum of the signal. The time domain spectrum of the signal is divided into a first frame, a second frame and a third frame according to the frame length N and the frame shift M. The frame shift is an overlapping portion of the previous and subsequent frames, and the ratio M/N of the frame shift to the frame length may be, but is not limited to, 0-1/2.
In practical applications, the signal to be processed is usually framed. For convenience of explanation and practical consideration, in the embodiment of the present invention, the first signal, the second signal, and the third signal are all signals that are framed and include a plurality of signal frames, and the frame length is 30ms and the frame shift is 15 ms.
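Under the conventions just stated (30 ms frames, 15 ms frame shift), framing might be sketched as follows; the function name and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def frame_signal(samples, sample_rate, frame_ms=30, shift_ms=15):
    """Split a 1-D signal into frames of frame_ms with a shift of shift_ms.

    Consecutive frames start shift_ms apart, so a 30 ms frame with a 15 ms
    shift overlaps its neighbour by 15 ms.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * shift_ms / 1000)
    x = np.asarray(samples, dtype=float)
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop_len)]
```

With the sampling rates agreed above, this yields 72-sample frames for the 2.4 kHz bone conduction signal and 240-sample frames for the 8 kHz microphone signal.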
In particular, the first signal and the second signal each comprise a plurality of signal frames, each signal frame corresponding to a different acquisition period.
The acquisition periods of the first signal and the second signal may be identical, and the first signal may include the same number of signal frames as the second signal. Each signal frame of the first signal corresponds to a signal frame of the second signal one to one according to the acquisition period. For example, but not limiting of, the acquisition periods of the first signal frame of the first signal and the first signal frame of the second signal are the same.
Next, the flow of step S220, obtaining the third signal from the first signal and the second signal, is described with reference to fig. 4.
Referring to fig. 4, fig. 4 is a flowchart illustrating another cough type identification method according to an embodiment of the present invention. The method may be implemented by the cough type identifying device 20 shown in fig. 1. The method includes, but is not limited to, the steps of:
step S221: and judging whether the energy of the first signal frame of the first signal is smaller than a preset energy threshold value.
Specifically, the first signal frame is any one of a plurality of signal frames included in the first signal.
Specifically, the energy may be, but is not limited to, short-term energy, short-term zero-crossing rate, and the like. The preset energy threshold may be, but is not limited to, a threshold obtained by means of statistics, processing experience, model calculation, and the like, and the preset energy thresholds of different signals may be different. The embodiment of the present invention is not limited thereto.
It is understood that in practical applications, due to factors such as, but not limited to, the transmission medium, the acquisition method and the environment, an acquired signal may contain data yet not actually carry the required information. Therefore, to facilitate subsequent processing and reduce unnecessary computation, the collected signals usually need to be examined to determine whether the required data are genuinely present, i.e. whether the signal is a valid signal. A common approach is to judge by the energy characteristics of the signal, such as, but not limited to, short-time energy analysis.
For convenience of explanation, the embodiment of the present invention is described by taking energy as short-time energy and a short-time energy analysis method as an example.
Specifically, short-time energy is a common characteristic parameter in energy analysis and represents the intensity of the signal at different moments. Short-time energy can be used to distinguish boundaries between consonants and vowels, between unvoiced and voiced sounds, and so on in speech. The short-time energy E_n of the n-th signal frame is expressed as follows:
E_n = Σ_{m=0}^{N-1} x_n^2(m)
where x_n(m) is the m-th sample of the n-th signal frame and N is the frame length of the signal frame.
Performing short-time energy analysis on the first signal includes, but is not limited to, the following operations:
first, the short-time energy E of each signal frame of the first signal is calculatednWhere N is the frame length of each signal frame, and for digital signals is the number of sampling points per frame. Based on the first sampling module (bone conduction sensor) and the corresponding sampling rate (2.4kHz) agreed by the embodiment of the present invention, and the agreed frame length (30ms), the number of sampling points of any one signal frame of the first signal is N ═ 2.4 × 103×30×10-3=72。
Then, but not limited to, a preset energy threshold, i.e., a short-time energy threshold G, corresponding to the first signal may be determined through statistics, processing experience, model calculation, and the like. For example, but not limited to, the average short-time energy of the first signal is used as the short-time energy threshold G.
Finally, the short-time energy E_n of each signal frame of the first signal is compared with the short-time energy threshold G. For each signal frame, if its short-time energy E_n is greater than the short-time energy threshold G, the signal frame is marked as a valid signal frame; otherwise it is marked as an invalid signal frame.
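The three operations above can be sketched as follows. Using the signal's own average short-time energy as the threshold G is one of the options mentioned; the names and the use of NumPy are illustrative.

```python
import numpy as np

def short_time_energy(frames):
    """E_n: sum of squared samples of each frame."""
    return np.array([np.sum(np.asarray(f, dtype=float) ** 2) for f in frames])

def mark_valid_frames(frames, threshold=None):
    """Return a boolean mask: True where a frame's E_n exceeds the threshold G."""
    energy = short_time_energy(frames)
    if threshold is None:
        threshold = float(energy.mean())  # average short-time energy as threshold G
    return energy > threshold
```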
In a specific implementation, the short-time energy and the short-time zero crossing rate may be combined to determine whether the signal is valid, which is not limited in the embodiment of the present invention.
In particular, each signal frame comprised by the first signal and the second signal may be traversed sequentially in the order of the acquisition periods. If the energy of the first signal frame is less than the preset energy threshold, the first signal frame is considered to be an invalid signal frame, and step S222 is executed. In the case that the energy of the first signal frame is greater than or equal to the preset energy threshold, the first signal frame is considered to be a valid signal frame, and step S223 is executed.
Step S222: the second signal frame of the third signal is determined to be the first signal frame.
Specifically, the second signal frame corresponds to the same acquisition period as the first signal frame.
When the energy of a first signal frame of the first signal is less than the preset energy threshold, the signal frame of the second signal in the same acquisition period, even if it carries sound, does not belong to the audio signal emitted by the target user. The second signal frame of the third signal is therefore set to the first signal frame of the first signal, i.e. an invalid signal frame. Invalid signal frames are not processed in subsequent steps, which reduces unnecessary computation.
Step S223: determining the second signal frame of the third signal as a third signal frame of the second signal.
Specifically, the third signal frame corresponds to the same acquisition period as the first signal frame.
And under the condition that the energy of a first signal frame of the first signal is equal to or greater than a preset energy threshold, a signal frame, which is the same as the acquisition time period of the first signal frame, in the second signal belongs to the audio signal sent by the target user. The second signal frame of the third signal is thus set as the third signal frame of the second signal.
Step S224: a third signal is obtained.
Specifically, the above-described steps S221, S222, and S223 complete the identification process of one signal frame of the first signal and the second signal, resulting in a signal frame of the third signal of the corresponding period.
Specifically, the above steps S221, S222, and S223 are sequentially performed on all signal frames included in the first signal and the second signal according to the sequence of the acquisition time period, so as to obtain a complete third signal.
As an optional implementation manner, after step S223, the method may further include:
and judging whether the first signal frame is the last signal frame of the first signal.
For example, but not limiting of, the number of all signal frames in the first signal that have been processed is counted. If the number is smaller than the number of all signal frames contained in the first signal, the current first signal frame is not the last audio frame of the first signal. If the number is equal to the number of all signal frames included in the first signal, the current first signal frame is the last signal frame of the first signal.
In the case where the first signal frame is not the last signal frame of the first signal, the explanation has not completed the identification processing of all the signal frames of the first signal and the second signal. The determination of the next signal frame is continued, so the first signal frame may be shifted backward by one frame length, that is, the first signal frame is defined as the next signal frame of the last signal frame that has completed steps S221, S222 and S223. And executing the steps S221, S222 and S223 on the next signal frame, and circulating the steps until the identification processing of all the signal frames is completed.
When the first signal frame is the last signal frame of the first signal, it indicates that the identification processing has been completed for all signal frames of the first signal and the second signal, and a plurality of second signal frames of the third signal have been obtained. These second signal frames can be combined in time order to obtain the complete third signal. The third signal contains some or all of the third signal frames, and the signal frames in the third signal other than the third signal frames are invalid signal frames.
As an alternative, after obtaining the third signal, the third signal may be subjected to, but not limited to, a short-time energy analysis to determine whether each signal frame in the third signal is an invalid signal frame or a valid signal frame. And invalid signal frames are not processed any more during subsequent processing, so that unnecessary computation is reduced, and the processing efficiency is improved.
For ease of understanding, please refer to fig. 5. Fig. 5 is a schematic diagram illustrating an embodiment of obtaining a third signal from the first signal and the second signal by the method shown in fig. 4 according to an embodiment of the present invention.
As shown in fig. 5, this embodiment includes a histogram of a first signal 51, a second signal 52, and a third signal 53, where the horizontal axis of the histogram represents signal frames and the vertical axis represents a corresponding signal spectrum for each signal frame. It should be noted that the signal exemplarily shown in fig. 5 only includes 10 signal frames, and in a specific implementation, the signal may include less than or greater than 10 signal frames, and the number of signal frames is not limited in the embodiment of the present invention.
In the histograms of the second signal 52 and the third signal 53 shown in fig. 5, the different heights of the vertical axis signals are only used to indicate that the signals differ from one another. For example, but not limited to, a greater height of the vertical axis signal corresponding to a signal frame may indicate that the loudness of that signal frame is stronger, or that its short-time energy is larger, which is not limited by the embodiment of the present invention. The color of the vertical axis signal indicates whether the corresponding signal frame is invalid or valid: black indicates a valid signal frame and gray indicates an invalid signal frame.
It should be noted that the first signal 51 is used only to exclude the signal frames of non-target-user sound from the second signal 52, so as to obtain a third signal 53 containing only the target user's sound. In the process of obtaining the third signal, the differences in signal characteristics such as loudness and short-time energy between the signal frames of the first signal 51 need not be considered; it is only necessary to distinguish whether each signal frame is valid or invalid. For ease of understanding, all vertical axis signals in the histogram of the first signal 51 shown in fig. 5 therefore have the same height and differ only in color, where the color convention is consistent with the second signal 52 and the third signal 53 described above. In a specific implementation, however, the signal characteristics of the signal frames of the first signal 51 may differ, which is not limited by the embodiment of the present invention.
With reference to the method shown in fig. 4, take the 1st signal frame of the first signal 51 as the first signal frame currently to be processed. The second signal frame corresponding to the acquisition period of this first signal frame is the 1st signal frame of the third signal 53, and the third signal frame corresponding to the acquisition period of this first signal frame is the 1st signal frame of the second signal 52.
As shown in fig. 5, the color of the vertical axis signal of the 1st signal frame of the first signal 51 is gray, so the current first signal frame is an invalid signal frame, that is, its energy is less than the preset energy threshold. Step S222 is therefore executed according to the method shown in fig. 4: the second signal frame of the third signal is determined to be the first signal frame. Accordingly, the 1st signal frame of the third signal 53 shown in fig. 5 is determined to be an invalid signal frame, and the color of its vertical axis signal is gray.
Then, as shown in fig. 5, the number of signal frames of the first signal 51 that have been processed is 1, which is smaller than 10, the total number of signal frames contained in the first signal 51. It can therefore be confirmed that the current first signal frame is not the last signal frame of the first signal 51. According to the method shown in fig. 4, the first signal frame is shifted backward by one frame length, i.e. the 2nd signal frame of the first signal 51 becomes the new first signal frame, and the processing and determination continue.
The second signal frame corresponding to the acquisition period of the current first signal frame is now the 2nd signal frame of the third signal 53, and the third signal frame corresponding to the acquisition period of the current first signal frame is the 2nd signal frame of the second signal 52.
Finally, as shown in fig. 5, the color of the vertical axis signal of the redefined first signal frame (i.e. the 2nd signal frame of the first signal 51) is black, so this first signal frame is a valid signal frame, that is, its energy is greater than or equal to the preset energy threshold. Step S223 is therefore performed according to the method shown in fig. 4: the second signal frame of the third signal is determined to be the third signal frame of the second signal. Accordingly, the vertical axis signal of the 2nd signal frame of the third signal 53 shown in fig. 5 coincides with the vertical axis signal of the 2nd signal frame of the second signal 52. By analogy, after all signal frames of the first signal 51 have been traversed according to the method shown in fig. 4, the third signal 53 containing 10 signal frames shown in fig. 5 is obtained.
It should be understood that the embodiment shown in fig. 5 is only intended to assist in understanding the method shown in fig. 4. In a specific implementation, the first signal, the second signal and the third signal are more complex and the actual processing procedure is correspondingly more involved; the detailed processing procedure and result are not limited by the embodiment of the present invention.
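For intuition only, the following is a minimal sketch (not the patent's reference implementation) of how the third signal could be assembled from the framed first and second signals using the per-frame energy check described for fig. 4 and fig. 5; the frame length, the energy threshold and the helper names are illustrative assumptions.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int) -> np.ndarray:
    """Split a 1-D signal into non-overlapping frames, zero-padding the tail."""
    n_frames = int(np.ceil(len(x) / frame_len))
    padded = np.zeros(n_frames * frame_len)
    padded[:len(x)] = x
    return padded.reshape(n_frames, frame_len)

def build_third_signal(first: np.ndarray, second: np.ndarray,
                       frame_len: int = 256, energy_threshold: float = 1e-3) -> np.ndarray:
    """Keep the audio (second-signal) frame for an acquisition period only when the
    contact (first-signal) frame carries enough energy, i.e. the target user is
    actually producing sound; otherwise keep the low-energy first-signal frame."""
    f1 = frame_signal(first, frame_len)
    f2 = frame_signal(second, frame_len)
    out = []
    for i in range(min(len(f1), len(f2))):
        energy = float(np.sum(f1[i] ** 2))  # short-time energy of the first signal frame
        out.append(f2[i] if energy >= energy_threshold else f1[i])
    return np.concatenate(out)
```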
Step S230: a first signal feature is extracted from the third signal.
Specifically, the third signal obtained in step S220 contains only the sound of the target user 10 wearing the cough type recognition device 20, with the interference of other sounds removed, and the frequency band of the third signal conforms to the human voice frequency band.
As an alternative, the third signal contains invalid signal frames and valid signal frames, and the extraction of the first signal feature and the identification of the cough type may be performed only on valid signal frames in the third signal. Thereby facilitating subsequent processing and reducing unnecessary calculation amount.
As an alternative implementation, before extracting the first signal feature according to the third signal, the endpoint detection may be performed on the third signal to find the starting point and the ending point of the sound, so as to obtain a more effective audio signal, thereby reducing the data amount, the operation amount, and the processing time of the subsequent processing.
Referring to fig. 6, an embodiment of the present invention will now describe how to perform endpoint detection on an audio signal by a two-stage decision method. The two-stage decision method of the embodiment of the present invention needs to involve two characteristic parameters: short-term average zero-crossing rate and short-term energy.
Referring to fig. 6, fig. 6 is a schematic diagram of a short-term energy spectrum and a short-term zero-crossing rate spectrum of an audio signal.
As shown in fig. 6, this embodiment includes a short-time energy spectrum 61 and a short-time zero-crossing rate spectrum 62 of the audio signal. The horizontal axes of the short-time energy spectrum 61 and the short-time zero-crossing rate spectrum 62 represent the audio frames of the audio signal, the vertical axis of the short-time energy spectrum 61 represents the short-time energy E_n corresponding to each audio frame, and the vertical axis of the short-time zero-crossing rate spectrum 62 represents the short-time zero-crossing rate Z_n corresponding to each audio frame.
The description of the short-time energy can be seen in the description of step S221 and is not repeated here. The zero-crossing rate is the number of times a signal crosses zero in a unit time, and the short-time zero-crossing rate of an audio signal can be used to roughly characterize the spectral characteristics of the audio signal and to distinguish between voiced and unvoiced sounds. The zero-crossing rate Z_n of the nth signal frame is expressed as follows:

Z_n = (1/2) * Σ_{m=0}^{N-1} | sgn[x_n(m)] − sgn[x_n(m−1)] |
wherein N is the frame length of the nth audio frame, x_n(m) is the mth sample of the nth audio frame, and sgn[ ] is the sign function, expressed as follows:

sgn[x] = 1,  x ≥ 0
sgn[x] = −1, x < 0
Understandably, as can be seen from the above expression of the zero-crossing rate, when two adjacent sample values have the same sign, no zero crossing occurs; when two adjacent sample values have opposite signs, a zero crossing occurs.
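As a small illustration of the formula above, the following sketch computes the short-time zero-crossing rate of one frame; treating sgn[0] as +1 is an assumption made here for simplicity.

```python
import numpy as np

def short_time_zcr(frame: np.ndarray) -> float:
    """Zero-crossing count of one frame: 0.5 * sum |sgn[x(m)] - sgn[x(m-1)]|."""
    signs = np.where(frame >= 0, 1.0, -1.0)  # sgn[x], with sgn[0] taken as +1
    return 0.5 * float(np.sum(np.abs(np.diff(signs))))
```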
Specifically, the two-stage decision method first makes a determination using the short-time energy E_n, and then makes a second determination using the short-time zero-crossing rate Z_n. The method combines the advantages of the short-time energy and the short-time zero-crossing rate, so the detection result is more accurate. The specific flow of the two-stage decision method will be described with reference to fig. 6.
As shown in fig. 6, the two-stage decision method includes, but is not limited to, the following steps:
The first discrimination can adopt a double-threshold comparison method: first, a first threshold M_1 can be selected according to the short-time energy E_n, so as to obtain the intersections of M_1 with the E_n envelope: point A and point B. The start point and end point of the sound are located outside points A and B. A second threshold M_2 can then be chosen based on the average energy of the background noise, so as to obtain the first intersections of M_2 with the E_n envelope: points C and D, i.e. the possible start point and end point of the sound.
The second discrimination: a third threshold M_3 can be selected according to the average zero-crossing rate of the background noise; combined with points C and D obtained by the first discrimination, the first intersections of M_3 with the Z_n curve are obtained: point O and point P, namely the start point and the end point of the sound obtained by the current endpoint detection. The signal frame corresponding to point O is the start point signal frame of the audio signal, and the signal frame corresponding to point P is the end point signal frame of the audio signal.
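The following is a minimal sketch of the two-stage (double-threshold) decision described above, operating on per-frame short-time energy and zero-crossing rate; the thresholds M_1, M_2 and M_3 are assumed to have been chosen elsewhere, and the search logic is only one plausible realization.

```python
import numpy as np

def two_stage_endpoints(energy: np.ndarray, zcr: np.ndarray,
                        m1: float, m2: float, m3: float) -> tuple[int, int]:
    """energy / zcr: per-frame short-time energy and short-time zero-crossing rate."""
    above = np.where(energy > m1)[0]
    if len(above) == 0:
        raise ValueError("no frame exceeds the first threshold M1")
    a, b = int(above[0]), int(above[-1])      # points A and B

    c, d = a, b                               # widen with the lower energy threshold M2
    while c > 0 and energy[c - 1] > m2:
        c -= 1
    while d < len(energy) - 1 and energy[d + 1] > m2:
        d += 1                                # points C and D

    o, p = c, d                               # refine with the zero-crossing threshold M3
    while o > 0 and zcr[o - 1] > m3:
        o -= 1
    while p < len(zcr) - 1 and zcr[p + 1] > m3:
        p += 1                                # points O and P: start / end point frames
    return o, p
```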
Not limited to the above, the endpoint detection method may also be as follows: first, a fourth threshold and a fifth threshold are set. Then, a rough judgment is made on the short-time energy: if the short-time energy of a signal frame is greater than the fourth threshold, the signal frame is marked as a possible sound start frame. Finally, a judgment is made on the short-time zero-crossing rate: if the short-time zero-crossing rates of three consecutive marked possible sound start frames are all greater than the fifth threshold, the first of these possible sound start frames is marked as the start point signal frame; the signal frames following the start point signal frame are then examined, and when the short-time zero-crossing rate of a signal frame falls below the fifth threshold, that signal frame is marked as the end point signal frame. The method for determining the start point and end point of the sound is not limited in the embodiment of the present invention.
The two-stage decision method for endpoint detection is not limited to the above cases; for example, the first discrimination may be made using the short-time average amplitude instead of the short-time energy, which is not limited in the embodiment of the present invention.
Without being limited to the above-mentioned examples, the selection manner of the first threshold, the second threshold, the third threshold, the fourth threshold and the fifth threshold needs to be determined according to the actually processed audio signal, which is not limited in the embodiment of the present invention.
For practical considerations and ease of understanding, the third signal in embodiments of the present invention is an audio signal that has undergone endpoint detection. The sound starting point of the third signal is the signal starting point of the third signal, and the signal frame corresponding to the signal starting point is called a starting point signal frame. The sound end of the third signal is the signal end of the third signal, and the signal frame corresponding to the signal end is called an end signal frame.
In particular, the first signal feature includes an average short-time energy of a short-time energy spectrum of the third signal, a first slope, a second slope, and one or more mel-frequency cepstral coefficient feature vectors. The first slope is the slope of a signal frame (i.e. a starting point signal frame) corresponding to a signal starting point in the third signal in the short-time energy spectrum, and the second slope is the slope of a signal frame (i.e. an end point signal frame) corresponding to a signal end point in the third signal in the short-time energy spectrum.
As an alternative, the first signal feature may include a first sub-signal feature and a second sub-signal feature, and step S230 may be to extract the first sub-signal feature and the second sub-signal feature according to the third signal. Next, a method of extracting a first sub-signal feature and a second sub-signal feature from a third signal is described with reference to fig. 7.
Referring to fig. 7, fig. 7 is a flowchart illustrating a cough type identification method according to an embodiment of the present invention. Step S230 of fig. 2 described above may include the method, which may be implemented by the cough type identifying device 20 shown in fig. 1. The method includes, but is not limited to, the steps of:
step S231: acquiring a short-time energy spectrum of the third signal, and obtaining average short-time energy, a first slope and a second slope according to the short-time energy spectrum; the average short-time energy, the first slope and the second slope are first sub-signal features.
Specifically, the method for calculating the short-time energy may refer to the description of step S221, and is not described herein again. Each signal frame of the third signal has a corresponding curve slope in the curve of the short-time energy spectrum, and the curve slope calculation method may be, but is not limited to, calculating a slope of a tangent line or calculating a tangent value of an included angle between the tangent line and the horizontal axis, which is not limited in the embodiment of the present invention. The average short-time energy may be calculated by, but not limited to, summing the short-time energies of all signal frames of the third signal and dividing by the total number of signal frames.
For ease of understanding, fig. 8 illustrates a schematic diagram of a short-time energy spectrum of a third signal. The average short-time energy, the first slope and the second slope obtained from the short-time energy spectrum of the third signal will be described with reference to fig. 8.
As shown in fig. 8, the horizontal axis of the short-time energy spectrum 81 represents the signal frames, and the vertical axis represents the short-time energy corresponding to each signal frame. From the short-time energy spectrum 81 shown in fig. 8, the expression of the tangent to the short-time energy curve at the start point signal frame of the third signal can be calculated as y = ax + b, where a is the first slope, and the expression of the tangent to the short-time energy curve at the end point signal frame of the third signal can be calculated as y = cx + d, where c is the second slope. The short-time energy values of the envelope of the short-time energy spectrum 81 may be summed and divided by the total number of signal frames to obtain the average short-time energy. The average short-time energy, the first slope and the second slope together form the first sub-signal feature.
The short-time energy spectrum 81 and the manner of acquiring the first sub-signal feature shown in fig. 8 are not limited, and in a specific implementation, other manners of acquiring may also be possible, which is not limited in this embodiment of the present invention.
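The following sketch extracts the first sub-signal feature from the short-time energy spectrum of the (endpoint-detected) third signal; estimating the tangent slope with a finite difference at the start and end frames is an assumption, being just one of the slope-calculation options mentioned above.

```python
import numpy as np

def first_sub_signal_features(short_time_energy: np.ndarray) -> tuple[float, float, float]:
    """short_time_energy: E_n for each signal frame of the third signal."""
    e = np.asarray(short_time_energy, dtype=float)
    avg_energy = float(e.mean())          # average short-time energy
    first_slope = float(e[1] - e[0])      # slope at the start point signal frame
    second_slope = float(e[-1] - e[-2])   # slope at the end point signal frame
    return avg_energy, first_slope, second_slope
```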
Step S232: acquiring one or more Mel-scale frequency cepstral coefficients (MFCC) feature vectors of the third signal; the one or more mel-frequency cepstral coefficient feature vectors are second sub-signal features.
Specifically, each signal frame of the third signal corresponds to one MFCC feature vector, and one MFCC feature vector comprises a plurality of MFCC coefficients. MFCC coefficients are speech features widely used in practical applications such as speech recognition and speaker recognition. The MFCC coefficients take human auditory characteristics into account: the linear spectrum is first mapped onto the Mel nonlinear spectrum based on auditory perception, and then converted to the cepstrum. The expression for converting an ordinary frequency to a Mel frequency is as follows:

Mel(f) = 2595 * lg(1 + f/700)
where f is frequency in Hz.
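For reference, the conversion above can be written directly as the small helper below (2595 and 700 being the constants of the standard Mel formula assumed here).

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map an ordinary frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```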
Specifically, the extraction process of the MFCC coefficients includes a series of steps, and a method of extracting the MFCC coefficients of the third signal is described with reference to fig. 9.
Referring to fig. 9, fig. 9 is a flowchart illustrating a method for extracting MFCC coefficients, which may be implemented by the cough type identification apparatus 20 shown in fig. 1. The method includes, but is not limited to, the steps of:
step S901: and carrying out pre-emphasis, framing and windowing on the third signal to obtain a fourth signal.
Specifically, in order to boost the high frequency part, flatten the spectrum of the signal, and highlight the formants of the high frequency, the third signal may be pre-emphasized. Pre-emphasis is performed by passing the signal through a high-pass filter whose expression is as follows:
H(z) = 1 − μ·z^(−1)
wherein, the value range of μ is 0.9-1.0, and μ is usually 0.97 in practical application, which is not limited in the embodiment of the present invention.
The description of the framing can refer to the description of step S220 in fig. 2 and the description of the embodiment shown in fig. 3, which are not repeated herein.
At the same time, the third signal may be windowed in order to reduce the effect of truncation of the signal frames. The window function may be, but is not limited to, a rectangular window, a triangular window, a Hanning window (Hanning), a Hamming window (Hamming), and a Blackman window (Blackman), among others.
Step S902: and performing discrete Fourier transform on the fourth signal to obtain a corresponding first frequency spectrum.
In particular, the discrete fourier transform may be, but is not limited to, a Fast Fourier Transform (FFT).
Step S903: and passing the first frequency spectrum through a triangular filter bank to obtain a second frequency spectrum.
Specifically, in order to smooth the spectrum and eliminate the effect of harmonics, so as to highlight the formants of the original sound, the first spectrum may be passed through a Mel filter bank to obtain the Mel spectrum (i.e., the second spectrum). The Mel filter bank may be a set of Mel-scale triangular filters comprising M triangular filters, where the frequency response of the mth triangular filter is expressed as follows:

H_m(k) = 0,                                  k < f(m−1)
H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)),     f(m−1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)),     f(m) ≤ k ≤ f(m+1)
H_m(k) = 0,                                  k > f(m+1)

wherein f(m) is the center frequency of the mth triangular filter, and the expression satisfies

Σ_{m=0}^{M−1} H_m(k) = 1
Step S904: the second spectrum is subjected to Discrete Cosine Transform (DCT).
Specifically, in order to improve the recognition performance, cepstrum analysis may be performed on the second spectrum to obtain the cepstrum vector of each signal frame of the third signal, that is, an MFCC feature vector containing a plurality of MFCC coefficients. The cepstrum can be understood as transforming a frequency-domain signal back into a time-domain representation. Cepstrum analysis may include, but is not limited to: calculating the logarithmic energy of the second spectrum and applying a DCT to the logarithmic energy to obtain the cepstrum vector. Here the logarithmic energy is obtained by taking the sum of the squares of the signal in a frame, taking the base-10 logarithm of that sum, and multiplying the result by 10.
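Putting steps S901-S904 together, the following is a minimal sketch of one possible MFCC extraction pipeline (pre-emphasis, framing with a Hamming window, FFT, a Mel triangular filter bank, log energy and DCT); the sampling rate, frame sizes, number of filters and number of retained coefficients are illustrative assumptions rather than values fixed by this embodiment.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal: np.ndarray, sr: int = 16000, frame_len: int = 400,
         hop: int = 160, n_filters: int = 26, n_mfcc: int = 13) -> np.ndarray:
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # H(z) = 1 - 0.97 z^-1
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)                             # windowing

    n_fft = 512
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft           # first spectrum

    # Mel triangular filter bank -> second spectrum
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    hz_pts = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_filters + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    log_energy = 10.0 * np.log10(np.maximum(power @ fbank.T, 1e-10))    # log Mel energies
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_mfcc]    # F x L MFCC matrix
```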
As an alternative embodiment, after the above four steps are completed, an MFCC matrix of order F × L is obtained, which contains a plurality of MFCC feature vectors, where F is the number of signal frames contained in the third signal and L is the length of each MFCC feature vector. Since the MFCC matrix has a high dimension, its dimension may be reduced in order to further extract effective signal features from it. The dimension reduction method may be, but is not limited to, reducing the dimension of the MFCC matrix by algorithms such as Dynamic Time Warping (DTW) or Principal Component Analysis (PCA).
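As one concrete (and assumed, not prescribed) way of doing the dimension reduction mentioned above, the sketch below applies PCA to the F × L MFCC matrix via an SVD; DTW would be a different choice and is not shown.

```python
import numpy as np

def pca_reduce(mfcc_matrix: np.ndarray, n_components: int = 4) -> np.ndarray:
    """Project the F x L MFCC matrix onto its first n_components principal axes."""
    centered = mfcc_matrix - mfcc_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T   # F x n_components
```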
According to the embodiment of the invention, the first signal characteristics comprising the first sub-signal characteristics (the average short-time energy, the first slope and the second slope of the short-time energy spectrum) and the second sub-signal characteristics (one or more MFCC characteristic vectors) are used for representing the cough sound of the target user, so that the classification accuracy is improved, and the cough types such as dry cough or wet cough can be more accurately distinguished.
The first signal feature may further include other sub-signal features in a specific implementation, which is not limited to the above-mentioned list.
Step S240: the cough type of the user's cough is determined from the first signal characteristic.
Specifically, the cough type may be a dry cough, a wet cough, or the like. Without being limited thereto, in a specific implementation, the cough type may further include a cough type corresponding to a cough disease, such as a cough of chronic laryngitis, a cough of tracheitis, a cough of pneumonia, and the like, which is not limited by the embodiment of the present invention.
As an alternative embodiment, step S240 may be inputting the first signal feature into the first model and obtaining the output result of the first model. The output result is used for representing the cough type of the user's cough, and the first model is a pre-trained cough type recognition model based on the Support Vector Data Description (SVDD) algorithm.
For convenience of explanation, the embodiment of the present invention is described by taking the first model as an example of the SVDD-based cough type recognition model. Next, a method for determining the cough type of the user's cough from the first signal characteristics by the first model will be described with reference to fig. 10.
Referring to fig. 10, fig. 10 is a flowchart illustrating another cough type identification method according to an embodiment of the present invention. The flow illustrates a manner of training a plurality of sub-models corresponding to one cough template in the first model. Step S240 of fig. 2 described above may include the method, which may be implemented by the cough type recognition apparatus 20 shown in fig. 1. The method includes, but is not limited to, the steps of:
step S241: signal characteristics of a preset number of sample signals of a target cough type are acquired.
The target cough type is a preset and known cough type, such as the aforementioned dry cough, wet cough, cough of chronic laryngitis, cough of tracheitis, cough of pneumonia, and the like.
In particular, the sample signal may be, but is not limited to, the first signal, the second signal, or the third signal described above. The signal characteristic may be, but is not limited to, the first signal characteristic described above.
For convenience of explanation and practical consideration, the signal characteristics of the sample signal according to the embodiment of the present invention are described by taking the first signal characteristic as an example. I.e. the signal features of the sample signal comprise an average short-time energy of a short-time energy spectrum of the sample signal, a first slope of the sample signal, a second slope of the sample signal, and one or more mel-cepstral coefficient feature vectors of the sample signal. The manner of obtaining the signal characteristics of the sample signal can refer to the description of steps S220 and S230, and is not repeated here.
Step S242: a first model is derived based on signal characteristics of a preset number of sample signals of a target cough type.
Specifically, an SVDD model is trained by taking the signal characteristics of the preset number of sample signals of the target cough type as input and the cough type as output, so as to obtain the center and the radius of the SVDD-based cough type recognition model, thereby obtaining the first model.
Specifically, the central idea of SVDD is to train a minimum hypersphere (a hypersphere refers to a sphere in a space with more than three dimensions) such that the data contained in the hypersphere all belong to the same class. When new data is input into the SVDD model, it is checked whether the data falls within the hypersphere. If the data falls within the hypersphere, the data is considered to belong to the category of the data contained in the hypersphere; if not, the data is considered not to belong to that category.
Specifically, the optimization goal of SVDD is to find a minimum sphere with a center a and a radius R:
min_{R, a, ξ}  F(R, a, ξ) = R² + C Σ_i ξ_i
such that this sphere satisfies the following formula (1):
‖x_i − a‖² ≤ R² + ξ_i,  ξ_i ≥ 0,  for all i    (1)
note that, for data x of 3 dimensions or moreiThe spherical surface is a hypersphere. The hypersphere refers to a spherical surface in a space of 3 dimensions or more. The sphere corresponds to a curve in 2-dimensional space and to a sphere in 3-dimensional space.
A sphere satisfying the above condition means that the data points in the input data set used for training are all contained within the sphere, where x_i represents the input data used for training; in the embodiment of the present invention, x_i is the signal characteristic of one of the preset number of sample signals of the target cough type.
Now that the objective and the constraints to be solved are available, the Lagrange multiplier method can be, but is not limited to being, used to solve the problem:
L(R, a, ξ_i; α_i, γ_i) = R² + C Σ_i ξ_i − Σ_i α_i (R² + ξ_i − ‖x_i − a‖²) − Σ_i γ_i ξ_i

wherein α_i ≥ 0 and γ_i ≥ 0. Taking the partial derivatives of L with respect to the parameters R, a and ξ_i and setting the derivatives equal to 0, the following formula (2), formula (3) and formula (4) are obtained:

Σ_i α_i = 1    (2)

a = Σ_i α_i x_i    (3)

C − α_i − γ_i = 0    (4)

Substituting the above equations (2), (3) and (4) back into the Lagrangian yields the dual problem, as shown in the following formula:

max_α  Σ_i α_i (x_i · x_i) − Σ_i Σ_j α_i α_j (x_i · x_j)

wherein the above formula satisfies 0 ≤ α_i ≤ C and Σ_i α_i = 1.

The above vector inner products can be replaced by a kernel function K, which results in the following formula:

max_α  Σ_i α_i K(x_i, x_i) − Σ_i Σ_j α_i α_j K(x_i, x_j)
through the calculation process, the value with the center as a and the radius as R can be obtained, and an SVDD-based model is obtained.
For convenience of explanation, the center of the target sphere is denoted by a, but a is not a single numerical value. For a circle in 2-dimensional space, the center a is a coordinate point in a 2-dimensional coordinate system, such as but not limited to (x, y). For a sphere in 3-dimensional space, the center a is a coordinate point in a 3-dimensional coordinate system, such as but not limited to (x, y, z), and so on.
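The following is a minimal sketch, not the patent's implementation, of fitting an SVDD sphere by solving the dual above with an RBF kernel and a generic constrained optimizer (SLSQP); the penalty C, the kernel parameter gamma and the support-vector tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * d2)

def svdd_fit(x: np.ndarray, C: float = 1.0, gamma: float = 0.5):
    """x: (n_samples, n_features). Returns (alpha, R_squared, gamma)."""
    n = len(x)
    K = rbf_kernel(x, x, gamma)
    dual = lambda a: -(a @ np.diag(K) - a @ K @ a)    # maximize dual => minimize its negative
    cons = {"type": "eq", "fun": lambda a: np.sum(a) - 1.0}
    res = minimize(dual, np.full(n, 1.0 / n), bounds=[(0.0, C)] * n, constraints=cons)
    alpha = res.x
    # radius: distance from the center to a boundary support vector (0 < alpha < C)
    dist2 = np.diag(K) - 2.0 * K @ alpha + alpha @ K @ alpha
    sv = (alpha > 1e-6) & (alpha < C - 1e-6)
    r2 = float(dist2[sv].mean()) if np.any(sv) else float(dist2.max())
    return alpha, r2, gamma

def svdd_distance2(z: np.ndarray, x: np.ndarray, alpha: np.ndarray, gamma: float) -> np.ndarray:
    """Squared distance of new points z to the kernel-space center a = sum_i alpha_i x_i."""
    Kzz = np.ones(len(z))                             # K(z, z) = 1 for the RBF kernel
    Kzx = rbf_kernel(z, x, gamma)
    Kxx = rbf_kernel(x, x, gamma)
    return Kzz - 2.0 * Kzx @ alpha + alpha @ Kxx @ alpha
```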
As an alternative embodiment, the first model includes a plurality of cough templates, one cough template corresponding to each cough type, and each cough template corresponding to one of the average short-time energy submodel, one of the start-point slope submodel, one of the end-point slope submodel, and one of the vector submodel.
The signal characteristic of the sample signal is the first signal characteristic according to the convention described above. Thus, step S242 may include, but is not limited to, the following operations:
and obtaining an average short-time energy submodel corresponding to the target cough type according to the average short-time energy of the sample signals of the preset number of target cough types.
Specifically, the average short-time energy of the sample signals of the preset number of target cough types is used as input, the target cough types are used as output, the SVDD model is trained, and the average short-time energy submodel corresponding to the target cough types is obtained.
And obtaining a starting point slope submodel corresponding to the target cough type according to the first slopes of the sample signals of the preset number of target cough types.
Specifically, a starting point slope sub-model corresponding to a target cough type is obtained by training a SVDD model with a preset number of first slopes of sample signals of the target cough type as input and the target cough type as output.
And obtaining a terminal slope submodel corresponding to the target cough type according to the second slopes of the sample signals of the preset number of target cough types.
Specifically, the second slopes of the preset number of sample signals of the target cough types are used as input, the target cough types are used as output, the SVDD model is trained, and the endpoint slope submodel corresponding to the target cough types is obtained.
And obtaining a vector sub-model corresponding to the target cough type according to one or more Mel cepstrum coefficient feature vectors of the sample signals of the preset number of target cough types.
Specifically, one or more Mel cepstrum coefficient feature vectors of sample signals of preset number of target cough types are used as input, the target cough types are used as output, the SVDD model is trained, and the vector submodel corresponding to the target cough types is obtained.
It can be understood that, in the training process, on the one hand, the size and range of the hypersphere need to be controlled so that the hypersphere contains as much of the input data x_i as possible; on the other hand, the radius of the hypersphere also needs to be minimized, so as to achieve the best classification effect.
It should be understood that, when a plurality of cough types corresponding to the cough templates are to be trained, the first model may be trained in turn in the above manner, so as to obtain sub-models (referred to as an average short-time energy sub-model, a starting point slope sub-model, an ending point slope sub-model, and a vector sub-model) corresponding to each cough template in the first model.
In the embodiment of the invention, each of the four submodels, namely the average short-time energy submodel, the start point slope submodel, the end point slope submodel and the vector submodel, corresponds to one hypersphere. On the premise of containing the input data (namely the signal characteristics of the preset number of sample signals of the target cough type), the boundary of the hypersphere is optimized and its radius is minimized, finally yielding the SVDD-based cough type recognition model that best meets the requirements and has the best classification effect, thereby improving the accuracy of identifying cough types such as dry cough and wet cough.
Without being limited to the four submodels listed above, in particular implementations, the inputs to train an SVDD model may be two signal features, such as a first slope and a second slope, or the inputs may be other features such as a short-time zero-crossing rate. The embodiment of the present invention is not limited thereto.
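To make the correspondence concrete, the sketch below shows how the four sub-models of one cough template could be trained from the sample-signal features, with the SVDD trainer (for example the svdd_fit sketch given earlier) passed in as a parameter; the dict layout and feature shapes are illustrative assumptions.

```python
import numpy as np

def train_cough_template(avg_energy, first_slope, second_slope, mfcc_vectors, fit_fn):
    """Each argument holds the corresponding feature of the preset number of sample
    signals of one target cough type; fit_fn is an SVDD trainer such as the svdd_fit
    sketch above, returning (alpha, R_squared, gamma)."""
    return {
        "energy":      fit_fn(np.asarray(avg_energy, float).reshape(-1, 1)),
        "start_slope": fit_fn(np.asarray(first_slope, float).reshape(-1, 1)),
        "end_slope":   fit_fn(np.asarray(second_slope, float).reshape(-1, 1)),
        "mfcc":        fit_fn(np.asarray(mfcc_vectors, float)),
    }
```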
Step S243: inputting the first signal characteristic into a first model to obtain an output result of the first model; the output is used to characterize the type of cough that the user coughs.
Specifically, the first model is a pre-trained SVDD-based cough type recognition model. The first model may be, but is not limited to being, trained in the manner of step S242.
As an alternative embodiment, consistent with the above description, the first model includes a plurality of cough templates, one cough template corresponding to one average short-time energy sub-model, one start-point slope sub-model, one end-point slope sub-model and one vector sub-model.
The center of the average short-time energy submodel may be referred to as a first center and the radius of the average short-time energy submodel may be referred to as a first radius. The center of the origin slope sub-model may be referred to as a second center and the radius of the origin slope sub-model may be referred to as a second radius. The center of the endpoint slope sub-model may be referred to as a third center and the radius of the endpoint slope sub-model may be referred to as a third radius. The center of the vector sub-model may be referred to as a fourth center and the radius of the vector sub-model may be referred to as a fourth radius.
Thus, step S243 may include, but is not limited to, the following operations:
and calculating the distance between the coordinate point corresponding to the average short-time energy of the third signal and the coordinate point corresponding to the first center corresponding to each cough template so as to determine the first distance corresponding to each cough template.
And calculating the distance between the coordinate point corresponding to the first slope of the third signal and the coordinate point corresponding to the second center corresponding to each cough template so as to determine the second distance corresponding to each cough template.
And calculating the distance between the coordinate point corresponding to the second slope of the third signal and the coordinate point corresponding to the third center corresponding to each cough template so as to determine the third distance corresponding to each cough template.
And calculating one or more distances between the coordinate point corresponding to each of the one or more mel-frequency cepstral coefficient feature vectors of the third signal and the coordinate point corresponding to the fourth center corresponding to each cough template so as to determine one or more fourth distances corresponding to each cough template.
And under the condition that the first distance corresponding to the target cough template is smaller than the first radius corresponding to the target cough template, the second distance corresponding to the target cough template is smaller than the second radius corresponding to the target cough template, the third distance corresponding to the target cough template is smaller than the third radius corresponding to the target cough template, and one or more fourth distances corresponding to the target cough template are smaller than the fourth radius corresponding to the target cough template, determining that the cough type corresponding to the target cough template is the output result of the first model.
Wherein the target cough template is one of a plurality of cough templates included in the first model.
It is understood that the distance between the coordinate point corresponding to the signal feature and the coordinate point corresponding to the center of the model is smaller than the radius of the model, indicating that the coordinate point corresponding to the signal feature is within the hypersphere corresponding to the model. Therefore, the cough type corresponding to the signal characteristic is the cough type corresponding to the hypersphere, and meets the optimization target and the related description of the SVDD.
In a specific implementation, without being limited to the above-mentioned cases, when any N distances among the distances corresponding to the target cough template are smaller than N radii corresponding to the target cough template, the cough type corresponding to the target cough template may be determined as the output result of the first model, and is not limited to this. N is a preset threshold, and the preset mode may be any mode, which is not limited in the embodiment of the present invention. Each of the N arbitrary distances is in one-to-one correspondence with a radius of the N radii. For example, but not limited to, if N is 2, when the first distance corresponding to the target cough template is smaller than the first radius corresponding to the target cough template, and the third distance corresponding to the target cough template is smaller than the third radius corresponding to the target cough template, it is determined that the cough type corresponding to the target cough template is the output result of the first model.
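The matching step can then be summarized by the sketch below, where a template is accepted when at least N of its sub-model distances fall inside the corresponding radii (N = 4 reproduces the strict case described first, and flattening the one-or-more MFCC distances into a single value is an assumption made for brevity).

```python
def match_cough_type(distances: dict, radii: dict, n_required: int = 4):
    """distances[template] and radii[template] each hold the four sub-model values
    (average energy, start slope, end slope, MFCC) for that cough template."""
    for template, d in distances.items():
        hits = sum(1 for di, ri in zip(d, radii[template]) if di < ri)
        if hits >= n_required:
            return template        # cough type of the matched template
    return None                    # no cough template matched
```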
In a specific implementation, one cough template may correspond to a plurality of average short-time energy submodels, and the correspondence between the cough template and the submodels is not limited in the embodiment of the present invention.
Not limited to the SVDD-based cough type recognition models listed above, in a particular implementation, the first model may also be a deep learning-based cough type recognition model. The embodiment of the present invention is not limited thereto.
In the embodiment of the present invention, the first signal is a signal collected by contact with the target user and generated when the target user coughs, which is not a sound emitted by the target user, and therefore the first signal only belongs to the signal emitted by the target user. According to the embodiment of the invention, the first signal and the second signal are combined, so that the environmental noise and other noises can be effectively removed, and the first signal characteristic is extracted only from the cough sound of the target user. For the first signal feature for characterizing the cough sound, three signal features of the average short-time energy of the short-time energy spectrum, the slope of the starting signal frame (first slope) and the slope of the ending signal frame (second slope) are added on the basis of the MFCC feature vector, so that the classification accuracy is improved. In addition, the embodiment of the invention also combines a cough type identification model based on SVDD to analyze and identify the cough type of the first signal characteristic, and can more accurately distinguish the cough types such as dry cough or wet cough.
While the method of the embodiments of the present invention has been described in detail above, to facilitate a better understanding of the above-described aspects of the embodiments of the present invention, the following provides a corresponding apparatus of the embodiments of the present invention.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a cough type identification device according to an embodiment of the present invention, where the cough type identification device 11 may include a first obtaining module 111, a second obtaining module 112, an extracting module 113, and a determining module 114. Wherein, the detailed description of each unit is as follows:
the first obtaining module 111 is configured to obtain a first signal and a second signal that are collected when the user coughs.
Specifically, the first signal is a signal obtained by collecting other signals which are not sound and are generated when the user coughs in a contact mode, and the second signal is an audio signal obtained by collecting sound generated when the user coughs.
As an alternative embodiment, the first acquisition module 111 may comprise a first acquisition unit for acquiring the first signal and a second acquisition unit for acquiring the second signal. The first acquisition unit may be, but is not limited to, a bone conduction sensor, a piezoelectric sensor, or a piezoresistive sensor. The second acquisition unit may be, but is not limited to, a micro-electro-mechanical system microphone, a moving coil microphone, a condenser microphone, or an electret condenser microphone.
The second obtaining module 112 is configured to obtain a third signal according to the first signal and the second signal.
And an extracting module 113, configured to extract the first signal feature according to the third signal.
A determining module 114 for determining a cough type of the cough of the user according to the first signal characteristic.
As an optional implementation manner, the first signal and the second signal each include a plurality of signal frames, and each signal frame corresponds to a different acquisition period.
The second obtaining module 112 is specifically configured to:
and if the energy of the first signal frame is greater than or equal to the preset energy threshold value, determining that the second signal frame in the third signal is the first signal frame. The second signal frame and the first signal frame correspond to the same acquisition time period, and the first signal frame is any one of the first signals.
And if the energy of the first signal frame is less than the preset energy threshold value, determining the second signal frame as a third signal frame in the second signal. And the third signal frame and the first signal frame correspond to the same acquisition time interval.
As an alternative embodiment, the first signal feature includes an average short-time energy of a short-time energy spectrum of the third signal, a first slope, a second slope, and one or more mel-frequency cepstrum coefficient feature vectors; the first slope of the third signal is the slope of a signal frame corresponding to the signal start point in the third signal in the short-time energy spectrum of the third signal, and the second slope of the third signal is the slope of a signal frame corresponding to the signal end point in the third signal in the short-time energy spectrum of the third signal.
As an alternative embodiment, the determining module 114 is specifically configured to input the first signal characteristic into the first model, and obtain an output result of the first model. The output result is used for representing the cough type of the user cough, and the first model is a cough type recognition model which is obtained through pre-training and is based on a support vector data description algorithm.
As an alternative embodiment, the first model includes a plurality of cough templates, one cough template corresponding to one cough type, and each cough template corresponding to one average short-time energy sub-model, one start-point slope sub-model, one end-point slope sub-model and one vector sub-model. The center of the average short-time energy submodel is a first center, and the radius of the average short-time energy submodel is a first radius. The center of the starting point slope submodel is a second center, and the radius of the starting point slope submodel is a second radius. The center of the endpoint slope sub-model is a third center, and the radius of the endpoint slope sub-model is a third radius. The center of the vector sub-model is the fourth center, and the radius of the vector sub-model is the fourth radius.
The determination module 114 may include:
and the first calculating unit is used for calculating the distance between the coordinate point corresponding to the average short-time energy of the third signal and the coordinate point corresponding to the first center corresponding to each cough template so as to determine the first distance corresponding to each cough template.
And the second calculating unit is used for calculating the distance between the coordinate point corresponding to the first slope of the third signal and the coordinate point corresponding to the second center corresponding to each cough template so as to determine the second distance corresponding to each cough template.
And the third calculating unit is used for calculating the distance between the coordinate point corresponding to the second slope of the third signal and the coordinate point corresponding to the third center corresponding to each cough template so as to determine the third distance corresponding to each cough template.
And the fourth calculating unit is used for calculating one or more distances between the coordinate point corresponding to each of the one or more mel cepstrum coefficient feature vectors of the third signal and the coordinate point corresponding to the fourth center corresponding to each cough template so as to determine one or more fourth distances corresponding to each cough template.
The output determining unit is used for determining that the cough type corresponding to the target cough template is the output result of the first model under the condition that the first distance corresponding to the target cough template is smaller than the first radius corresponding to the target cough template, the second distance corresponding to the target cough template is smaller than the second radius corresponding to the target cough template, the third distance corresponding to the target cough template is smaller than the third radius corresponding to the target cough template, and one or more fourth distances corresponding to the target cough template are smaller than the fourth radius corresponding to the target cough template; the target cough template is one of a plurality of cough templates.
As an optional implementation manner, the apparatus 11 may further include a third obtaining module, where the third obtaining module includes:
an acquisition feature unit for acquiring signal features of a preset number of sample signals of the target cough type. Wherein the target cough type is a preset and known cough type.
And the model obtaining unit is used for obtaining a first model according to the signal characteristics of the sample signals of the preset number of target cough types.
As an alternative embodiment, the signal characteristics of the sample signal include an average short-time energy of a short-time energy spectrum of the sample signal, a first slope of the sample signal, a second slope of the sample signal, and one or more mel-frequency cepstral coefficient feature vectors of the sample signal.
The model obtaining unit includes:
the first obtaining subunit is configured to obtain an average short-time energy submodel corresponding to the target cough type according to the average short-time energy of the sample signals of the preset number of target cough types.
And the second obtaining subunit is used for obtaining a starting point slope submodel corresponding to the target cough type according to the first slopes of the sample signals of the preset number of target cough types.
And the third obtaining subunit is configured to obtain an end point slope sub-model corresponding to the target cough type according to the second slopes of the sample signals of the preset number of target cough types.
And the fourth obtaining subunit is configured to obtain a vector sub-model corresponding to the target cough type according to one or more mel cepstrum coefficient feature vectors of the sample signals of the preset number of target cough types.
It should be noted that, in the embodiment of the present invention, the specific implementation of each unit may also correspond to the corresponding description of the method embodiments shown in fig. 2, fig. 4, fig. 7, and fig. 10.
Referring to fig. 12, fig. 12 is a schematic structural diagram of another cough type identification apparatus according to an embodiment of the present invention, where the cough type identification apparatus 12 may include a first acquisition module 121, a second acquisition module 122, a Digital Signal Processor (DSP) 123, a Micro Control Unit (MCU) 124, a communication module 125, and a power module 126.
Specifically, the first acquisition module 121 may acquire the first signal through contact with a target user wearing the cough type recognition device 12, and the second acquisition module 122 may acquire the second signal through an air medium.
The first acquisition module 121 may be, but is not limited to, a bone conduction sensor, a piezoelectric sensor, or a piezoresistive sensor. The second acquisition module 122 may be, but is not limited to, a mems microphone, a moving coil microphone, a condenser microphone, or an electret condenser microphone.
Specifically, the MCU 124 is a control module configured to control the operation of the first acquisition module 121, the second acquisition module 122, the DSP 123 and the communication module 125.
After the first acquisition module 121 and the second acquisition module 122 acquire the signals, the signals may be transmitted to the DSP 123. The MCU 124 controls the DSP 123 to perform the cough type identification method shown in fig. 2, 4, 7 or 10. The DSP 123 then transmits the processed cough data to the MCU 124, which transmits the cough data to the communication module 125 and controls the communication module 125 to send the cough data to the server 30 of fig. 1 described above. The server 30 may send the cough data to the user terminal 40 for display, so that the target user 10 can conveniently view the cough data through the user terminal 40.
As an optional implementation manner, the control module shown in fig. 12 is not limited to the MCU124, and in a specific implementation, the control module may also be a single chip or a Central Processing Unit (CPU). The embodiment of the present invention is not limited thereto.
In particular, a power module 126 for providing electrical energy to the cough type identification means 12.
Next, a further cough type identification device provided by an embodiment of the present invention will be described with reference to fig. 13A and 13B, and fig. 13A and 13B are a side view and a front view of the cough type identification device 13, respectively.
Referring to fig. 13A, fig. 13A is a side view of a cough type recognition device according to an embodiment of the invention.
As shown in fig. 13A, the cough type identifying device 13 may include a fixing pin and button 131, a key switch 132, a network connection indicator 133, a signal acquisition indicator 134, a microphone sound receiving hole 135, a bone conduction sensor interface 136, a bone conduction sensor 137, a device body 138, and a connection wire 139.
Specifically, the fixing pin and button 131, the key switch 132, the network connection indicator 133, the signal acquisition indicator 134, the microphone sound receiving hole 135, and the bone conduction sensor interface 136 may all be disposed on a housing of the device body 138.
Wherein the fixing pin and the button 131 are disposed on the back of the housing of the device body 138. The key switches 132 are disposed at the side of the casing of the device main body 138. The network connection indicator light 133, the signal acquisition indicator light 134, the miniature microphone sound receiving hole 135 and the bone conduction sensor interface 136 are arranged on the top of the shell of the device main body 138. The bone conduction sensor 137 is connected to the bone conduction sensor interface 136 by a connection 139.
Specifically, when the target user 10 uses the cough type recognition device 13, the cough type recognition device 13 may be fixed to the clothing of the target user 10 by the fixing pin and button 131, and the bone conduction sensor 137 may be fixed to the body of the target user 10, which is convenient for detection and carrying. For example, but not limited to, the bone conduction sensor 137 is attached near the vocal cords on the neck of the target user 10 by a disposable adhesive tape, and the cough type recognition device 13 is worn on the collar of the target user 10 by the fixing pin and button 131, as shown by the example of the target user 10 wearing the cough type recognition device 20 in fig. 1.
Specifically, the bone conduction sensor 137 may collect a first signal through contact with the body of the target user 10, and the micro-microphone sound receiving hole 135 may collect a second signal through air.
The bone conduction sensor 137 may transmit the first signal to the bone conduction sensor interface 136 via a connection 139, and the bone conduction sensor interface 136 transmits the first signal to the device body 138. The miniature microphone sound receiving aperture 135 transmits the second signal to the device body 138. The first signal and the second signal are subjected to signal feature extraction and cough type identification by a processing module inside the device main body 138, so that the cough type of the cough sound of the target user is confirmed. The specific implementation of the processing module inside the device main body 138 may refer to the cough type identification method shown in fig. 2, 4, 7, and 10.
As an alternative embodiment, the bone conduction sensor 137 may be sized as shown in fig. 13A: the largest cross-section is 8 millimeters (mm) in diameter and the largest longitudinal section is 4.5mm in height.
Specifically, the target user 10 may perform any one of the following operations through the key switch 132: starting up, shutting down, starting up network connection, closing network connection, starting to collect signals, suspending to collect signals, ending to collect signals and the like.
Specifically, after the network connection is turned on by the key switch 132, the network connection indicator lamp 133 may be turned on if the network connection is successful. After the start of the signal acquisition operation by the key switch 132, the signal acquisition indicator lamp 134 may be turned on.
As an alternative embodiment, the network connection indicator lamp 133 may display the first state after the network connection opening operation is performed through the key switch 132. After the network connection is opened, the network connection indicator lamp 133 may display the second state and the third state in case of successful network connection and unsuccessful connection, respectively. Wherein the first state, the second state and the third state are different from each other.
As an alternative embodiment, after the operation of starting to collect signals is performed by the key switch 132, the signal collection indicator lamp 134 may display a fourth state indicating that signals are currently being collected. The signal collection indicator lamp 134 may display the fifth state by performing the signal collection suspending operation through the key switch 132. Wherein the fourth state and the fifth state are different.
The first state, the second state, the third state, the fourth state and the fifth state may be, but not limited to, any one of the following cases: long brightness, twinkling and shining, and shining with different colors.
Referring to fig. 13B, fig. 13B is a front view of a cough type recognition device according to an embodiment of the present invention.
As shown in fig. 13B, the cough type recognition device 13 may include modules identical to those shown in fig. 13A, and the detailed description may refer to the description of fig. 13A, which is not repeated herein.
Note that, since fig. 13B is a front view of the cough type recognition device 13 and the fixing pin and the button 131 are located on the back surface of the device main body 138, fig. 13B does not show the fixing pin and the button 131.
As an alternative embodiment, the device body 138 may have the dimensions shown in fig. 13B: the largest cross section has a diameter of 30mm and the largest longitudinal section has a height of 40 mm.
In practical use, with reference to fig. 13A and 13B, the flow of the target user 10 using the cough type recognition device 13 may include, but is not limited to, the following steps:
First, the target user 10 fixes the cough type recognition device 13 to the collar through the fixing pin and button 131, and attaches the bone conduction sensor 137 to the sound-producing area of the neck with a disposable adhesive tape.
Second, after wearing the device, the target user 10 can turn on the cough type identification device 13 by clicking the key switch 132 once. Then, the key switch 132 may be pressed for a long time to turn on the network connection; the network connection indicator lamp 133 flashes while the network connection has not yet succeeded, and stays lit once the network connection succeeds. When the network connection is successful, the cough type identification device 13 may establish a data connection with a server or a user terminal as shown in fig. 1.
Third, the target user 10 may register an account with the user terminal 40 and perform entry of relevant information. The related information may include, but is not limited to: basic information of the user (such as gender, age, sleep condition, eating habits and the like) and cough information (the number of coughs, the duration of coughs, the subjective high-incidence period of the user, whether sputum exists or not and the like).
Fourth, the target user 10 may double-click the key switch 132 to turn on the non-sensory monitoring, i.e., the cough type identifying device 13 starts to collect a signal. At this time, the signal collection indicator 134 may be illuminated to indicate that it is currently in a state of collecting signals. When monitoring needs to be suspended, the target user 10 may double-click the key switch 132 again, and at this time, the signal collection indicator lamp 134 may flash to indicate that the signal collection is currently in the suspended state.
Fifth, the target user 10 may press the key switch 132 for a long time to end the monitoring, and the signal collection indicator 134 may not be turned on at this time, indicating that the signal collection has been finished. The target user 10 can view his or her own cough data during the monitoring period on the user terminal 40, and provide the cough data to the doctor for reference in diagnosis at the time of a visit or a follow-up visit.
In a specific implementation, the key switch 132 may be pressed for a long time to enter a signal collection state, the network connection indicator light 133 may flash to indicate that the network connection is successful, and the signal collection indicator light 134 may flash to indicate that a signal is being collected. The embodiment of the present invention is not limited thereto.
It should be noted that, in the embodiment of the present invention, the specific implementation of each unit may also correspond to the corresponding description of the method embodiments shown in fig. 2, fig. 4, fig. 7, and fig. 10.
The embodiment of the invention adopts a bone conduction sensor with excellent anti-noise and anti-interference performance, collects signals in combination with a miniature microphone, and extracts the signal features that can characterize the cough type for matching and recognition. Thus, environmental noise and other noises (for example, but not limited to, the cough sounds, speech and body movement sounds of users not wearing the cough type recognition device) can be effectively filtered out, only the cough sound of the target user is recognized, and cough types such as dry cough and wet cough can be identified more accurately. Meanwhile, the cough type recognition device is worn like a brooch, has a compact structure, is convenient for the user to use and carry daily, and can be used for a long time.
Referring to fig. 14, fig. 14 is a schematic structural diagram of another cough type identification device according to an embodiment of the present invention, where the cough type identification device 14 may include: at least one processor 141, e.g., CPU, MCU, at least one network interface 144, memory 142, at least one communication bus 143. Wherein a communication bus 143 is used to enable the connection communication between these components. The network interface 144 may optionally include a standard wired interface, a wireless interface (e.g., WIFI interface, bluetooth interface), and a communication connection may be established with the server or the terminal through the network interface 144. The memory 142 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). As shown in fig. 14, the memory 142, which is a type of computer storage medium, may include therein an operating system, a network communication module, and program instructions.
It should be noted that the network interface 144 may be connected to a collector, a transmitter, or another communication module, where the other communication module may include, but is not limited to, a WiFi module, a Bluetooth module, and the like. It can be understood that, in the embodiment of the present invention, the cough type identification device 14 may also include a collector, a transmitter, another communication module, and the like.
Processor 141 may be configured to call program instructions stored in memory 142 and may perform the methods provided by the embodiments shown in fig. 2, 4, 7, and 10.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium; when the program is executed, the processes of the above method embodiments can be included. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (10)

1. A method for identifying a type of cough, comprising:
acquiring a first signal and a second signal acquired when a user coughs; the first signal is a signal obtained by collecting other signals which are not sound and are generated when the user coughs in a contact mode, and the second signal is an audio signal obtained by collecting the sound generated when the user coughs;
obtaining a third signal according to the first signal and the second signal;
extracting a first signal feature from the third signal;
determining a cough type of the user's cough from the first signal characteristic.
2. The method of claim 1, wherein the first signal and the second signal each comprise a plurality of signal frames, each signal frame corresponding to a different acquisition period;
the obtaining a third signal according to the first signal and the second signal includes:
if the energy of the first signal frame is greater than or equal to a preset energy threshold, determining that a second signal frame in the third signal is the first signal frame; wherein the second signal frame and the first signal frame correspond to the same acquisition period, and the first signal frame is any signal frame of the first signal;
if the energy of the first signal frame is smaller than the preset energy threshold, determining that the second signal frame is a third signal frame in the second signal; wherein the third signal frame and the first signal frame correspond to the same acquisition period.
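As an editorial illustration of the frame-selection rule in claim 2 above, the sketch below assumes both signals have already been split into equal-length frames aligned by acquisition period; the frame representation and the example energy threshold are assumptions, not values from the disclosure.

```python
# Sketch of building the third signal from aligned frames of the first
# (contact) signal and the second (audio) signal, per claim 2.
import numpy as np


def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared samples of one signal frame."""
    return float(np.sum(frame.astype(np.float64) ** 2))


def fuse_signals(first_frames: list[np.ndarray],
                 second_frames: list[np.ndarray],
                 energy_threshold: float = 1e-3) -> list[np.ndarray]:
    """For each acquisition period, keep the frame of the first signal when its
    energy reaches the threshold, otherwise take the frame of the second
    signal for the same period."""
    third_frames = []
    for first_frame, second_frame in zip(first_frames, second_frames):
        if short_time_energy(first_frame) >= energy_threshold:
            third_frames.append(first_frame)
        else:
            third_frames.append(second_frame)
    return third_frames
```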
3. The method of claim 1, wherein the first signal features comprise an average short-time energy of a short-time energy spectrum of the third signal, a first slope, a second slope, and one or more Mel-frequency cepstral coefficient feature vectors; the first slope of the third signal is the slope, in the short-time energy spectrum of the third signal, of the signal frame corresponding to the signal start point of the third signal, and the second slope of the third signal is the slope, in the short-time energy spectrum of the third signal, of the signal frame corresponding to the signal end point of the third signal.
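To illustrate the feature set named in claim 3 above, here is a hedged sketch. Reading the "slope" as the first difference of the short-time energy curve at the start and end frames, the frame parameters, and the use of librosa.feature.mfcc for the Mel-frequency cepstral coefficient vectors are editorial assumptions rather than the claimed computation.

```python
# Sketch of extracting the claim-3 features from the fused third signal.
import numpy as np
import librosa  # assumed available for MFCC extraction


def short_time_energy_spectrum(signal: np.ndarray, frame_len: int = 400,
                               hop: int = 200) -> np.ndarray:
    """Short-time energy per frame (energy curve over time)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(np.float64) ** 2) for f in frames])


def extract_features(third_signal: np.ndarray, sr: int = 16000) -> dict:
    energies = short_time_energy_spectrum(third_signal)
    return {
        "average_short_time_energy": float(np.mean(energies)),
        # slope at the frame containing the signal start point
        "first_slope": float(energies[1] - energies[0]),
        # slope at the frame containing the signal end point
        "second_slope": float(energies[-1] - energies[-2]),
        # one MFCC feature vector per analysis frame
        "mfcc_vectors": librosa.feature.mfcc(y=third_signal.astype(np.float32),
                                             sr=sr, n_mfcc=13).T,
    }
```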
4. The method of claim 3, wherein said determining a cough type of said user's cough from said first signal characteristic comprises:
inputting the first signal characteristic into a first model to obtain an output result of the first model; the output result is used for representing the cough type of the user cough, and the first model is a cough type recognition model which is obtained through pre-training and is based on a support vector data description algorithm.
5. The method of claim 4, wherein the first model comprises a plurality of cough templates, one cough template for each cough type, and each cough template comprises an average short-time energy submodel, a starting point slope submodel, a terminal slope submodel, and a vector submodel; the center of the average short-time energy submodel is a first center, and the radius of the average short-time energy submodel is a first radius; the center of the starting point slope submodel is a second center, and the radius of the starting point slope submodel is a second radius; the center of the terminal slope submodel is a third center, and the radius of the terminal slope submodel is a third radius; the center of the vector submodel is a fourth center, and the radius of the vector submodel is a fourth radius;
the inputting the first signal characteristic into a first model to obtain an output result of the first model includes:
calculating the distance between the coordinate point corresponding to the average short-time energy of the third signal and the coordinate point corresponding to the first center corresponding to each cough template so as to determine the first distance corresponding to each cough template;
calculating a distance between a coordinate point corresponding to the first slope of the third signal and a coordinate point corresponding to the second center corresponding to each cough template to determine a second distance corresponding to each cough template;
calculating a distance between a coordinate point corresponding to the second slope of the third signal and a coordinate point corresponding to a third center corresponding to each cough template to determine a third distance corresponding to each cough template;
calculating one or more distances between a coordinate point corresponding to each of the one or more mel-frequency cepstrum coefficient feature vectors of the third signal and a coordinate point corresponding to a fourth center corresponding to each cough template to determine one or more fourth distances corresponding to each cough template;
determining that the cough type corresponding to the target cough template is the output result of the first model under the condition that a first distance corresponding to the target cough template is smaller than a first radius corresponding to the target cough template, a second distance corresponding to the target cough template is smaller than a second radius corresponding to the target cough template, a third distance corresponding to the target cough template is smaller than a third radius corresponding to the target cough template, and one or more fourth distances corresponding to the target cough template are smaller than a fourth radius corresponding to the target cough template; the target cough template is one of the plurality of cough templates.
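The following is a minimal sketch of the per-template acceptance test in claim 5 above, assuming each cough template stores one (center, radius) pair per submodel in the spirit of support vector data description, and that the input features use the dictionary layout of the claim-3 sketch. The data layout, field names, and Euclidean distance are illustrative assumptions.

```python
# Sketch of matching extracted features against trained cough templates.
import numpy as np
from dataclasses import dataclass


@dataclass
class CoughTemplate:
    cough_type: str                 # e.g. "dry" or "wet"
    energy_center: np.ndarray
    energy_radius: float
    start_slope_center: np.ndarray
    start_slope_radius: float
    end_slope_center: np.ndarray
    end_slope_radius: float
    vector_center: np.ndarray       # center in MFCC feature space
    vector_radius: float


def match_cough_templates(features: dict, templates: list[CoughTemplate]):
    """Return the cough type of the first template whose every submodel
    distance is smaller than the corresponding radius, or None if no
    template accepts the features."""
    for t in templates:
        d1 = np.linalg.norm(np.atleast_1d(features["average_short_time_energy"]) - t.energy_center)
        d2 = np.linalg.norm(np.atleast_1d(features["first_slope"]) - t.start_slope_center)
        d3 = np.linalg.norm(np.atleast_1d(features["second_slope"]) - t.end_slope_center)
        d4 = np.linalg.norm(features["mfcc_vectors"] - t.vector_center, axis=1)
        if (d1 < t.energy_radius and d2 < t.start_slope_radius
                and d3 < t.end_slope_radius and np.all(d4 < t.vector_radius)):
            return t.cough_type
    return None
```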
6. The method of claim 4 or 5, wherein prior to determining the cough type of the user's cough from the first signal characteristic, the method further comprises:
acquiring signal features of a preset number of sample signals of a target cough type; wherein the target cough type is a preset and known cough type;
and obtaining the first model according to the signal features of the preset number of sample signals of the target cough type.
7. The method of claim 6, wherein the signal features of a sample signal comprise an average short-time energy of a short-time energy spectrum of the sample signal, a first slope of the sample signal, a second slope of the sample signal, and one or more Mel-frequency cepstral coefficient feature vectors of the sample signal;
the obtaining the first model according to the signal features of the preset number of sample signals of the target cough type comprises:
obtaining an average short-time energy submodel corresponding to the target cough type according to the average short-time energies of the preset number of sample signals of the target cough type;
obtaining a starting point slope submodel corresponding to the target cough type according to the first slopes of the preset number of sample signals of the target cough type;
obtaining a terminal slope submodel corresponding to the target cough type according to the second slopes of the preset number of sample signals of the target cough type;
and obtaining a vector submodel corresponding to the target cough type according to the one or more Mel-frequency cepstral coefficient feature vectors of the preset number of sample signals of the target cough type.
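To illustrate claims 6 and 7 above, the sketch below builds the four submodels of one cough template from labelled sample features. A full support vector data description fit solves a quadratic program; as a simplified, clearly labelled stand-in, the sample mean is used as the center and the largest sample distance as the radius.

```python
# Simplified sketch of building the four submodels of one cough template from
# the signal features of sample signals with a known target cough type.
import numpy as np


def fit_sub_model(points: np.ndarray) -> tuple[np.ndarray, float]:
    """Return a (center, radius) pair enclosing all given feature points."""
    points = np.atleast_2d(np.asarray(points, dtype=np.float64))
    center = points.mean(axis=0)
    radius = float(np.max(np.linalg.norm(points - center, axis=1)))
    return center, radius


def build_cough_template(cough_type, avg_energies, first_slopes, second_slopes,
                         mfcc_vectors):
    """avg_energies, first_slopes, second_slopes: one scalar per sample signal;
    mfcc_vectors: one (n_frames, n_mfcc) array per sample signal."""
    return {
        "cough_type": cough_type,   # e.g. "dry" or "wet"
        "energy": fit_sub_model(np.asarray(avg_energies).reshape(-1, 1)),
        "start_slope": fit_sub_model(np.asarray(first_slopes).reshape(-1, 1)),
        "end_slope": fit_sub_model(np.asarray(second_slopes).reshape(-1, 1)),
        "vector": fit_sub_model(np.vstack(mfcc_vectors)),
    }
```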
8. A cough type identification device, comprising:
the first acquisition module is used for acquiring a first signal and a second signal acquired when a user coughs; the first signal is a signal obtained by collecting other signals which are not sound and are generated when the user coughs in a contact mode, and the second signal is an audio signal obtained by collecting the sound generated when the user coughs;
the second acquisition module is used for acquiring a third signal according to the first signal and the second signal;
the extraction module is used for extracting a first signal characteristic according to the third signal;
a determination module for determining a cough type of the user's cough from the first signal characteristic.
9. A cough type identification device, comprising: a processor and a memory for storing a computer program, the processor for invoking the computer program to perform the method of any of claims 1-7.
10. A computer-readable storage medium, comprising a computer program which, when run on a cough type identification device, causes the cough type identification device to perform the method of any one of claims 1-7.
CN201911188230.4A 2019-11-27 2019-11-27 Cough type identification method, device and system Pending CN110946554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911188230.4A CN110946554A (en) 2019-11-27 2019-11-27 Cough type identification method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911188230.4A CN110946554A (en) 2019-11-27 2019-11-27 Cough type identification method, device and system

Publications (1)

Publication Number Publication Date
CN110946554A true CN110946554A (en) 2020-04-03

Family

ID=69978740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911188230.4A Pending CN110946554A (en) 2019-11-27 2019-11-27 Cough type identification method, device and system

Country Status (1)

Country Link
CN (1) CN110946554A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112472065A (en) * 2020-11-18 2021-03-12 天机医用机器人技术(清远)有限公司 Disease detection method based on cough sound recognition and related equipment thereof
CN113409825A (en) * 2021-08-19 2021-09-17 南京裕隆生物医学发展有限公司 Intelligent health detection method and device, electronic equipment and readable storage medium
US11741986B2 (en) 2019-11-05 2023-08-29 Samsung Electronics Co., Ltd. System and method for passive subject specific monitoring
CN117064330A (en) * 2022-12-13 2023-11-17 上海市肺科医院 Sound signal processing method and device
CN117064330B (en) * 2022-12-13 2024-04-19 上海市肺科医院 Sound signal processing method and device

Similar Documents

Publication Publication Date Title
CN107799126B (en) Voice endpoint detection method and device based on supervised machine learning
Lu et al. Speakersense: Energy efficient unobtrusive speaker identification on mobile phones
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
CN101510905B (en) Method and apparatus for multi-sensory speech enhancement on a mobile device
US11587563B2 (en) Determining input for speech processing engine
CN101023469B (en) Digital filtering method, digital filtering equipment
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
CN107657964A (en) Depression aided detection method and grader based on acoustic feature and sparse mathematics
JP2003255993A (en) System, method, and program for speech recognition, and system, method, and program for speech synthesis
WO2019023877A1 (en) Specific sound recognition method and device, and storage medium
CN110946554A (en) Cough type identification method, device and system
WO2019023879A1 (en) Cough sound recognition method and device, and storage medium
WO2021169742A1 (en) Method and device for predicting operating state of transportation means, and terminal and storage medium
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN109063624A (en) Information processing method, system, electronic equipment and computer readable storage medium
CN112016367A (en) Emotion recognition system and method and electronic equipment
Gao et al. Wearable audio monitoring: Content-based processing methodology and implementation
TW202117683A (en) Method for monitoring phonation and system thereof
CN110728993A (en) Voice change identification method and electronic equipment
Sengupta et al. Optimization of cepstral features for robust lung sound classification
JP4381404B2 (en) Speech synthesis system, speech synthesis method, speech synthesis program
Xu et al. Voiceprint recognition of Parkinson patients based on deep learning
CN116964669A (en) System and method for generating an audio signal
Cao et al. Comparing the performance of individual articulatory flesh points for articulation-to-speech synthesis
CN111508503B (en) Method and device for identifying same speaker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200413
Address after: 1706, Fangda building, No. 011, Keji South 12th Road, high tech Zone, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province
Applicant after: Shenzhen Shuliantianxia Intelligent Technology Co., Ltd.
Address before: 518000, building 10, building D, Shenzhen Institute of Aerospace Science and technology, 6 hi tech Southern District, Nanshan District, Shenzhen, Guangdong 1003, China
Applicant before: SHENZHEN H & T HOME ONLINE NETWORK TECHNOLOGY Co., Ltd.