CN107898432A - Sound of snoring detection method, device, equipment and medium - Google Patents
- Publication number: CN107898432A (application CN201710977496.1A)
- Authority: CN (China)
- Prior art keywords: audio data, target, snore, target audio, matrix
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4806—Sleep evaluation
- A61B5/4818—Sleep apnoea
Abstract
An embodiment of the invention discloses a snore detection method, device, equipment and medium. The snore detection method includes: obtaining a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description (SVDD) model to obtain a discrimination result, where the target feature vector is obtained by performing feature extraction on the Mel-frequency cepstral coefficients (MFCC) of the target audio data; and determining that the target audio data is snore when the period of the target audio data meets a preset period and the discrimination result meets a preset result. The embodiment allows snore to be detected more simply and accurately and improves the efficiency of the device.
Description
Technical Field
The invention relates to the technical field of audio signal processing, in particular to a snore detecting method and device.
Background
Snoring is a sleep disorder that people often encounter in daily life. When the loudness of the snore exceeds 60 dB after the person falls asleep and hinders gas exchange during normal breathing, the condition is called snoring disorder; about 5% of patients with snoring disorder also hold their breath to varying degrees during sleep, which is called obstructive sleep apnea-hypopnea syndrome (OSAHS). The airway of a snorer is usually narrower than that of a normal person. When the snorer is awake during the day, the airway is kept open, without blockage, by compensatory contraction of the throat muscles. When sleeping at night, however, nerve excitability is reduced, the muscles relax, the pharyngeal tissue blocks the airway, and the upper airway collapses; when airflow passes through the narrowed part, it produces vortices and vibration, so that snore occurs. Most people do not consider snoring a health hazard, but long-term or severe snoring is often accompanied by sleep apnea syndrome: apnea occurs throughout the sleep process, the oxygen in the blood is reduced, and a short-term state of hypoxia forms. This can contribute to chronic diseases such as hypertension, myocardial infarction, angina pectoris and cerebrovascular accident, and in severe cases sudden death may occur if a nighttime apnea lasts more than 120 seconds.
At present, snoring is diagnosed and evaluated mainly by polysomnography (PSG), which comprehensively analyzes the sleep condition of the tested person and the severity of the snoring by recording various signals during sleep, such as brain waves, electromyograms, electrocardiograms, oral and nasal airflow, respiratory movement of the chest and abdomen, and sound. However, PSG must be operated by professional personnel in professional facilities, and the sensors are strapped to the face, nasal cavity, chest and abdomen of the monitored person during monitoring, which greatly disturbs the monitored person and makes monitoring expensive. Another kind of device, suitable for home use, exploits the relation between the periodicity of snore and respiratory motion, but its accuracy is low when judging intermittent snore in a complex environment.
Therefore, making snore detection convenient, simple and accurate is especially important.
Disclosure of Invention
The technical problem to be solved by the embodiment of the invention is to provide a snore detecting method, a snore detecting device, snore detecting equipment and a medium that realize snore detection simply and accurately and improve the efficiency of the device.
In a first aspect, an embodiment of the present invention provides a snore detecting method, including:
acquiring a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description algorithm (SVDD) model to obtain a judgment result; the target feature vector is obtained by performing feature extraction on a Mel Frequency Cepstrum Coefficient (MFCC) of the target audio data;
and determining the target audio data as snore under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result.
In an optional implementation manner, before the obtaining of the target feature vector corresponding to the target audio data, the method further includes:
after original audio data are obtained, the target audio data are determined, and the target audio data are effective audio data corresponding to the original audio data.
In an optional implementation, the determining the target audio data includes:
and detecting effective audio data in the original audio data by using a voice endpoint detection method VAD to obtain the target audio data.
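The text does not say which VAD algorithm is used. As a minimal sketch only, assuming a simple short-time-energy threshold (the function name `energy_vad`, the frame length and the threshold ratio are all hypothetical choices, not taken from the patent), the effective segment of the original audio data could be located like this:

```python
import numpy as np

def energy_vad(signal, frame_len=256, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active region, or None.

    Assumed rule: a frame is active when its short-time energy exceeds
    threshold_ratio times the largest frame energy in the recording.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return None
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    if energy.max() == 0:
        return None  # pure silence, nothing to detect
    active = np.flatnonzero(energy > threshold_ratio * energy.max())
    # Span from the first active frame to the end of the last active frame.
    return int(active[0] * frame_len), int((active[-1] + 1) * frame_len)
```

The returned span would then play the role of the target audio data; a production VAD would normally add smoothing and noise-floor estimation.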
In an optional implementation manner, in a case that the period of the target audio data satisfies a preset period and the determination result satisfies a preset result, the method further includes:
and saving the original audio data.
In an optional implementation manner, the obtaining a target feature vector corresponding to target audio data includes:
calculating the MFCC of the target audio data to obtain an X × Y coefficient matrix, wherein X is the frame number of the target audio data, and Y is the MFCC coefficient length;
decomposing the X × Y coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a first target matrix;
and after the dimension reduction processing is carried out on the first target matrix, the target feature vector is obtained, and the target feature vector comprises the overall trend feature of the target audio data.
In an optional implementation manner, the period of the target audio data satisfying the preset period includes:
the time domain period of the target audio data meets a preset time domain period, and/or the frequency domain period of the target audio data meets a preset frequency domain period.
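How the time-domain period is measured is not stated at this point. One plausible sketch, assuming the period is estimated from the autocorrelation of a short-time energy envelope (the helper names, the 2 to 8 second search range and the tolerance are hypothetical), is:

```python
import numpy as np

def estimate_period(envelope, rate_hz, min_s=2.0, max_s=8.0):
    """Estimate the dominant period (seconds) of an energy envelope.

    Assumption: the period is the autocorrelation peak whose lag lies
    in [min_s, max_s]; these bounds are illustrative, not from the source.
    """
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(min_s * rate_hz)
    hi = min(int(max_s * rate_hz), len(acf) - 1)
    if hi <= lo:
        return None  # envelope too short for the requested range
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return lag / rate_hz

def period_ok(period_s, preset_s, tolerance_s=0.5):
    """True when the estimated period is close to the preset period."""
    return period_s is not None and abs(period_s - preset_s) <= tolerance_s
```

For example, an envelope sampled at 10 Hz that repeats every 4 s would be accepted by `period_ok(estimate_period(env, 10), preset_s=4.0)`.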
In an alternative implementation, the duration of the target audio data is greater than a preset time.
In an optional implementation, before the inputting the target audio data into the SVDD model, the method further includes:
and obtaining the SVDD model.
In an optional implementation manner, the acquiring the SVDD model includes:
acquiring an audio data sample and calculating the MFCC of the audio data sample to obtain an N x L coefficient matrix, wherein N is the frame number of the audio data sample, and L is the MFCC coefficient length;
decomposing the N x L coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a second target matrix;
and after the second target matrix is subjected to dimension reduction processing, obtaining a preset number of feature vectors, training the preset number of feature vectors by using an SVDD algorithm, and obtaining the SVDD model, wherein the preset number of feature vectors comprise the overall trend features of the audio data sample.
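SVDD encloses the training vectors in a minimum-volume hypersphere found by solving a kernelized optimization problem. The sketch below is deliberately not SVDD proper: it is a simplified stand-in, assuming a sphere centered on the sample mean with a quantile radius, meant only to show the train, test and combine-with-period flow. All function names and the 0.95 quantile are hypothetical.

```python
import numpy as np

def fit_hypersphere(train_vectors, quantile=0.95):
    """Fit a mean-centered sphere (a stand-in for a trained SVDD model)."""
    x = np.asarray(train_vectors, dtype=float)
    center = x.mean(axis=0)
    dists = np.linalg.norm(x - center, axis=1)
    # Radius covering the chosen fraction of the training samples.
    radius = float(np.quantile(dists, quantile))
    return center, radius

def inside(center, radius, vector):
    """Discrimination result: True when the vector lies within the sphere."""
    v = np.asarray(vector, dtype=float)
    return bool(np.linalg.norm(v - center) <= radius)

def is_snore(period_matches, model_accepts):
    """Final rule from the method: both conditions must hold."""
    return bool(period_matches and model_accepts)
```

A real implementation would replace `fit_hypersphere` with an actual SVDD solver; the decision rule in `is_snore` mirrors the condition stated in the first aspect.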
In a second aspect, an embodiment of the present invention provides a snore detecting device, including:
a first acquisition unit configured to acquire a target feature vector corresponding to target audio data;
an input obtaining unit, configured to input the target feature vector into a support vector data description algorithm SVDD model, so as to obtain a discrimination result, where the target feature vector is a feature vector obtained by performing feature extraction on a mel-frequency cepstrum coefficient MFCC of the target audio data;
and the determining unit is used for determining that the target audio data is snore under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result.
In an optional implementation manner, the first obtaining unit is further configured to obtain original audio data;
the determining unit is further configured to determine the target audio data, where the target audio data is valid audio data corresponding to the original audio data.
In an optional implementation manner, the determining unit is specifically configured to detect valid audio data in the original audio data by using a voice endpoint detection method VAD to obtain the target audio data.
In an optional implementation manner, the first obtaining unit includes:
the first calculating subunit is used for calculating the MFCC of the target audio data to obtain an X × Y coefficient matrix, wherein X is the frame number of the target audio data, and Y is the MFCC coefficient length;
the first decomposition subunit is used for decomposing the X × Y coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a first target matrix;
and the first dimension reduction processing subunit is configured to perform dimension reduction processing on the first target matrix to obtain the target feature vector, where the target feature vector includes an overall trend feature of the target audio data.
In an optional implementation manner, the period of the target audio data satisfying the preset period includes:
the time domain period of the target audio data meets a preset time domain period, and/or the frequency domain period of the target audio data meets a preset frequency domain period.
In an alternative implementation, the duration of the target audio data is greater than a preset time.
In an optional implementation, the apparatus further comprises:
and the second acquisition unit is used for acquiring the SVDD model.
In an optional implementation manner, the second obtaining unit includes:
the second calculating subunit is used for calculating the MFCC of the audio data sample to obtain an N x L coefficient matrix, wherein N is the frame number of the audio data sample, and L is the MFCC coefficient length;
the second decomposition subunit is used for decomposing the N x L coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a second target matrix;
the second dimension reduction processing subunit is configured to perform dimension reduction processing on the second target matrix, and then obtain a preset number of feature vectors, where the preset number of feature vectors includes an overall trend feature of the audio data sample;
and the training subunit is used for training the feature vectors of the preset number by using an SVDD algorithm to obtain the SVDD model.
In a third aspect, an embodiment of the present invention provides a snore detecting device, including a processor, an input device, an output device, and a memory, where: the processor, the input device, the output device and the memory are interconnected, the memory is used for storing a computer program, the computer program comprises program instructions, and the processor is configured to call the program instructions to execute the method as described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, which, when executed by a processor of a snore detecting device, cause the processor to perform the method described in the first aspect or any one of the possible implementation manners of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product containing program instructions, which when run on a computer, causes the computer to perform the method described in the first aspect or any one of the possible implementations.
By implementing the embodiment of the invention, the acquired target feature vector corresponding to the target audio data is input into a support vector data description (SVDD) model to obtain the judgment result, and the target audio data is determined to be snore when its period meets the preset period and the judgment result meets the preset result. Snore detection is thereby realized conveniently, quickly and accurately, and the efficiency of the snore detecting device is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present invention, the drawings required to be used in the embodiments or the background art of the present invention will be described below.
FIG. 1 is a schematic flow chart of a snore detecting method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the relationship between f(m) and m according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a periodicity of snore signals according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a time-domain snore signal according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a snore signal power spectrum according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an energy curve of a 1 min snore signal according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of another snore detecting method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an SVDD classification case according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a snore detecting device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a first obtaining unit according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another snore detecting device according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a second obtaining unit according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of another snore detecting device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. It should be noted that the detailed description set forth in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The apparatus embodiments and method embodiments described herein are described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, units, components, circuits, steps, processes, algorithms, etc. (collectively referred to as "elements"). These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The terms first, second, etc. in the description and claims of the present invention and in the drawings of the specification, if used in describing various aspects, are used for distinguishing between different objects and not for describing a particular order.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be noted that, unless otherwise specified, the technical features in the embodiments of the present invention may be regarded as capable of being combined or coupled with each other, as long as the combination or coupling is not technically impossible to implement. Although certain exemplary, optional, or preferred features may be described in combination with other features in various embodiments for a more complete description of the invention, such combination is not required, and it is to be understood that the exemplary, optional, or preferred features and the other features may be separated from each other, provided that such separation is not technically impractical. Some functional descriptions of technical features in method embodiments may be understood as performing the function, method, or step, and some functional descriptions of technical features in apparatus embodiments may be understood as using the apparatus to perform the function, method, or step.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a snore detecting method according to an embodiment of the present invention. The method is applicable to a snore detecting device, which may be any of various devices such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), or an intelligent wearable device (such as a smart watch or a smart bracelet), or may be a professional sound detection device; the embodiment of the present invention is not limited in this respect. As shown in FIG. 1, the snore detecting method may include:
101. acquiring a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description algorithm (SVDD) model to obtain a judgment result;
the target feature vector is a feature vector obtained by extracting features of Mel-frequency cepstrum coefficients (MFCCs) of the target audio data.
The SVDD model refers to a pre-configured or trained model capable of identifying snore signals based on target feature vectors.
Specifically, an embodiment of the present invention further provides a method for obtaining a target feature vector, where obtaining the target feature vector corresponding to the target audio data includes:
calculating the MFCC of the target audio data to obtain an X × Y coefficient matrix, wherein X is the frame number of the target audio data, and Y is the MFCC coefficient length;
decomposing the X × Y coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a first target matrix;
and after the dimension reduction processing is carried out on the first target matrix, the target feature vector is obtained.
The Mel frequency scale is derived from the auditory characteristics of the human ear and has a nonlinear correspondence with frequency in Hz. MFCCs are spectral features computed from the Hz spectrum by using this relationship; they are used to extract features from the audio data and to reduce the dimensionality of the computation. For example, for a frame of 512 dimensions (sampling points), the MFCC computation can extract the most important 40 dimensions (a typical value), which also achieves dimension reduction. The steps may include:
1) Pre-emphasis
Pre-emphasis boosts the high-frequency part so that the spectrum of the signal becomes flatter and stays so over the whole band from low to high frequency, allowing the spectrum to be computed with the same signal-to-noise ratio throughout. It also removes the effect of the vocal cords and lips during sound production, compensating the high-frequency part of the voice signal that is suppressed by the vocal system and highlighting the high-frequency formants. This is typically achieved by passing the sampled signal s(n) through a first-order finite impulse response (FIR) high-pass digital filter whose transfer function is:
$H(z) = 1 - a z^{-1}$
where a is the pre-emphasis coefficient, generally a constant in the range 0.9 to 1.0, and the input in the time domain is the snore sample signal s(n). Equivalently, in the time domain the filter output is y(n) = s(n) - a·s(n-1).
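As an illustration of the pre-emphasis filter, here is a minimal NumPy sketch. The function name `pre_emphasize` and the default a = 0.97 are assumptions chosen from the stated 0.9 to 1.0 range; the first sample is passed through unchanged because it has no predecessor.

```python
import numpy as np

def pre_emphasize(s, a=0.97):
    """Apply y(n) = s(n) - a * s(n-1), the FIR filter H(z) = 1 - a z^-1."""
    s = np.asarray(s, dtype=float)
    if s.size == 0:
        return s
    # The first output sample has no previous input, so it is kept as is.
    return np.concatenate(([s[0]], s[1:] - a * s[:-1]))
```

A constant (purely low-frequency) input is strongly attenuated, which is the intended high-pass behavior.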
2) Framing
Every N sampling points of the snore sample signal s(n) are grouped into one observation unit, called a frame. N is typically 256 or 512, covering about 20 to 30 ms. To avoid excessive change between two adjacent frames, adjacent frames overlap; the overlap region contains M sampling points, where M is usually about 1/2 or 1/3 of N. The sampling frequency of speech signals used for speech recognition is typically 8 kHz or 16 kHz; at 8 kHz, a frame length of N = 256 samples corresponds to a duration of 256/8000 × 1000 = 32 ms.
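The framing step can be sketched as follows. The function name `frame_signal` is hypothetical, and dropping the trailing samples that do not fill a whole frame is an assumption; the defaults use N = 256 with an overlap of M = N/2.

```python
import numpy as np

def frame_signal(s, frame_len=256, overlap=128):
    """Split s into overlapping frames; returns an array (n_frames, frame_len)."""
    s = np.asarray(s, dtype=float)
    hop = frame_len - overlap  # step between the starts of adjacent frames
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_len")
    if len(s) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(s) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return s[idx]
```

The number of rows of the result is the frame count that later becomes X, the row count of the MFCC coefficient matrix.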
3) Windowing
Each frame is multiplied by a Hamming window to increase the continuity at the left and right ends of the frame. Assuming the framed signal is s(n), n = 0, 1, ..., N-1, where N is the frame size, the windowed signal is s'(n) = s(n) × W(n), where W(n) is the Hamming window, whose standard form is
$W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{L-1}\right), \quad 0 \le n \le L-1$
where L represents the window length.
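A minimal sketch of the windowing step, assuming the frames are stored as rows of a 2-D array (the function name `apply_hamming` is hypothetical). NumPy's `np.hamming(L)` returns the standard Hamming window of length L.

```python
import numpy as np

def apply_hamming(frames):
    """Multiply every frame (row) by a Hamming window of the frame length."""
    frames = np.asarray(frames, dtype=float)
    window = np.hamming(frames.shape[1])  # 0.54 - 0.46*cos(2*pi*n/(L-1))
    return frames * window
```

The window tapers each frame toward 0.08 at both ends, which reduces the discontinuity between the end of one frame and the start of the next.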
4) Fast Fourier Transform (FFT)
Because the characteristics of a signal are usually hard to see from its variation in the time domain, the signal is usually converted into an energy distribution in the frequency domain for observation; different energy distributions represent the characteristics of different sounds. After multiplication by the Hamming window, each frame therefore undergoes a fast Fourier transform to obtain its frequency spectrum, and the power spectrum of the signal is obtained by taking the squared modulus of that spectrum.
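The FFT step can be sketched with NumPy's real-input FFT. The function name `power_spectrum` and the FFT size of 512 are assumptions; frames shorter than `n_fft` are zero-padded by `np.fft.rfft`.

```python
import numpy as np

def power_spectrum(frames, n_fft=512):
    """Squared modulus of the per-frame spectrum, shape (n_frames, n_fft//2 + 1)."""
    spectrum = np.fft.rfft(np.asarray(frames, dtype=float), n=n_fft, axis=1)
    return np.abs(spectrum) ** 2
```

With a sampling rate of 8 kHz and `n_fft = 512`, bin k corresponds to a frequency of k × 8000 / 512 Hz, so a 1000 Hz tone peaks at bin 64.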
5) Triangular band-pass filter
The energy spectrum is passed through a bank of Mel-scale triangular filters. Define a filter bank with M filters (the number of filters is close to the number of critical bands); each filter is a triangular filter with center frequency f(m), m = 1, 2, ..., M. M is usually 22 to 26. The spacing between adjacent f(m) narrows as m decreases and widens as m increases, as shown in FIG. 2.
Wherein the frequency response of the triangular filter is defined as:
wherein,
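The frequency-response formula and its accompanying condition are missing from the text above. One commonly used definition of a triangular Mel filter is given here as an assumption, not as the patent's own equation; the symbol $H_m(k)$ for the response of filter m at frequency bin k is introduced only for illustration, and some formulations add a normalization factor so that each filter has unit area:

```latex
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}
```

In this form each filter rises linearly from the previous center frequency, peaks at 1 at its own center f(m), and falls linearly to 0 at the next center frequency.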
6) Discrete cosine transform (DCT)
The logarithmic energy of each filter bank output is calculated as:
The MFCC coefficients are obtained by applying the DCT to the logarithmic energy s(m):
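The two formulas referred to here are missing from the text. The usual textbook forms are sketched below as an assumption; $X(k)$ (the frame spectrum), $H_m(k)$ (the response of filter m), $K$ (the number of spectral bins) and $C(n)$ (the n-th cepstral coefficient) are illustrative symbols, while M is the number of filters and Y is the MFCC coefficient length as defined in the text:

```latex
s(m) = \ln\!\left( \sum_{k=0}^{K-1} \lvert X(k) \rvert^{2} \, H_m(k) \right), \qquad 1 \le m \le M

C(n) = \sum_{m=1}^{M} s(m) \cos\!\left( \frac{\pi n \,(m - 0.5)}{M} \right), \qquad n = 1, 2, \ldots, Y
```

Stacking the Y coefficients of each of the X frames row by row gives the coefficient matrix described next.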
The MFCC can be obtained through the above steps, giving an X × Y coefficient matrix, where X is the frame number of the target audio data and Y is the MFCC coefficient length.
The MFCC coefficient matrix has high dimensionality, and because snore signals differ in length, the number of rows X differs from signal to signal. The MFCC matrix therefore cannot be fed directly to the SVDD model and must be processed further first.
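Steps 5) and 6) can be sketched in NumPy as below. This is a sketch under assumptions: the Hz-to-Mel mapping 2595·log10(1 + f/700) is the common convention rather than something stated in the text, and the function names, the 24 filters, the 8 kHz rate and the 13 coefficients are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=512, sample_rate=8000):
    """Triangular filters with centers equally spaced on the Mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # peak (value 1) and falling edge
            bank[m - 1, k] = (right - k) / (right - center)
    return bank

def mfcc_from_power(power, bank, n_coeffs=13):
    """Log filter-bank energies followed by a DCT; returns an (X, Y) matrix."""
    energies = np.asarray(power, dtype=float) @ bank.T   # (X, M)
    log_e = np.log(np.maximum(energies, 1e-10))          # floor avoids log(0)
    n_filters = bank.shape[0]
    m = np.arange(1, n_filters + 1)
    n = np.arange(1, n_coeffs + 1)[:, None]
    basis = np.cos(np.pi * n * (m - 0.5) / n_filters)    # (Y, M)
    return log_e @ basis.T
```

Given a power spectrum of shape (X, 257), `mfcc_from_power(power, mel_filterbank())` returns an X × 13 matrix, so here Y = 13.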
Referring to fig. 3, fig. 3 is a schematic diagram of a snore signal cycle provided by an embodiment of the invention, wherein a horizontal line represents time(s) and a vertical axis represents signal amplitude, and as can be seen from fig. 3, most of the snore signals (about 90%) have substantially consistent trend and have the following characteristics: the time intervals of the snores are approximately equal, the amplitudes are basically consistent, and the snores are closely related to the breathing rhythm of the human body; secondly, the pulse width of the signal showing snore is more regular than that of the voice signal and is periodic; in addition, the short-term energy of the snore signal is obviously higher than that of the mandarin speech signal, and the energy distribution of the snore in the whole sleeping process is uniform, so that in order to identify the snore signal, the overall trend characteristic capable of representing the characteristics can be extracted from the MFCC matrix, the overall trend characteristic is used as input, and an SVDD model is established to identify target audio data. The overall trend characteristic refers to a characteristic capable of reflecting the signal transformation trend.
In order to extract the overall trend feature from the MFCC matrix, the dimension of the MFCC matrix may be reduced. For example, an SVDD model of the snore signal may be trained after the dimension of the MFCC matrix is reduced by methods such as Dynamic Time Warping (DTW) or Principal Component Analysis (PCA). In practice, however, it is found that such an SVDD model discriminates poorly between snore and noise; for example, the SVDD model trained after dimension reduction by PCA has a snore recognition rate of about 85% and a noise misrecognition rate of 35%.
Therefore, valid snore signal features may be lost if the dimension reduction method for the MFCC matrix is not properly selected, and the overall trend feature needs to be extracted by a method selected according to the time-domain and frequency-domain characteristics of the snore. Further referring to fig. 4 and fig. 5, fig. 4 is a schematic diagram of a time-domain snore signal provided by an embodiment of the present invention, and fig. 5 is a schematic diagram of a snore signal power spectrum provided by an embodiment of the present invention. As can be seen from fig. 4, the snore signal is periodic in the time domain and has a certain energy distribution. As can be seen from fig. 5, in the frequency domain the snore signal is mainly distributed in a low frequency band and has characteristics such as a specific bandwidth and specific formants. In order to retain the time-domain and frequency-domain characteristics of the snore signal as far as possible and thereby improve the signal recognition rate, before dimension reduction is performed on the MFCC matrix, a Singular Value Decomposition (SVD) method is first adopted to obtain the main part of the MFCC coefficient matrix, that is, a first target matrix, which includes the time-domain valid features and frequency-domain valid features of the target audio data. Dimension reduction is then performed on the first target matrix to obtain a target feature vector, which includes the overall trend feature of the target audio data.
In the embodiment of the present invention, the PCA may be adopted to perform dimension reduction processing on the first target matrix.
More specifically, since the features of the snore signal within one period are mainly reflected in the first several eigenvalues of the first target matrix, these leading eigenvalues can be extracted after the first target matrix is obtained, and the dimension reduction processing is then performed on them. For example, if MFCC feature matrices are obtained for 1000 snore signals and decomposed by SVD to obtain eigenvalues and eigenvectors, then, since 95% of the signal features are concentrated on the first 25 eigenvectors, the PCA algorithm only needs to reduce the dimensions of the first 25 eigenvectors, and a 25-dimensional overall snore feature vector is finally obtained.
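The choice of how many leading components to keep can be illustrated with a minimal sketch. It assumes that the singular values of the MFCC matrix have already been computed by an SVD routine, and it measures the share of "signal features" as the share of squared singular values, which is an assumption; the function name and the example values are hypothetical.

```python
def components_for_energy(singular_values, fraction=0.95):
    """Smallest k whose leading singular values carry `fraction` of the energy.

    Energy is taken as the sum of squared singular values (an assumption).
    """
    squared = [s * s for s in sorted(singular_values, reverse=True)]
    total = sum(squared)
    if total == 0:
        return 0
    running = 0.0
    for k, value in enumerate(squared, start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(squared)


# Made-up singular values, for illustration only.
print(components_for_energy([10.0, 5.0, 1.0, 0.5]))  # prints 2
```

With real snore data the same rule would be applied to the measured singular values; the figure of 25 components quoted above comes from the embodiment, not from this sketch.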
It can be understood that the method can make the lengths of the first target matrixes consistent (for example, make the number of rows of the first target matrixes the same), extract the overall trend characteristics of the target audio data, avoid the loss of snore signals, improve the accuracy of snore identification, and improve the efficiency of the snore detecting device.
Optionally, an embodiment of the present invention further provides a method for acquiring target audio data, which is as follows:
after original audio data are obtained, determining the target audio data, wherein the target audio data are effective audio data corresponding to the original audio data;
in this embodiment, the original audio data may be snore or other sounds. The snore detecting device may comprise a microphone, which is used to obtain the original audio data. The original audio data may also be acquired through an externally connected microphone or another external audio acquisition device. It is to be understood that the method for acquiring the original audio data is not limited by the embodiment of the present invention.
In this embodiment, the target audio data may be determined by extracting the valid audio data, that is, the valid signal, from the original audio data with a relevant detection method.
Optionally, this embodiment provides a method for determining target audio data, where determining the target audio data includes:
and detecting effective audio data in the original audio data by using a voice activity detection (VAD) method to obtain the target audio data.
Voice Activity Detection (VAD), also called voice endpoint detection, refers to detecting whether voice is present in a noisy environment. It is generally used in voice processing systems such as voice coding and voice enhancement, where it reduces the voice coding rate, saves communication bandwidth, reduces device energy consumption, and improves the recognition rate.
For example, the valid audio data may be extracted by a double-threshold detection method, a detection method based on the autocorrelation maximum, a detection method based on the wavelet transform, or the like. It is to be understood that the specific implementation is not limited in this embodiment.
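As an illustration of the double-threshold idea, the following minimal sketch marks active segments on a sequence of short-time energies. It is one common form of double-threshold detection and not necessarily the one used by the embodiment; the function name, the two thresholds and the energy values are assumptions.

```python
def double_threshold_segments(energies, high, low):
    """Return (start, end) frame index pairs, end exclusive, of active segments.

    A segment is seeded where the short-time energy exceeds `high` and is
    extended in both directions while the energy stays above `low`.
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i + 1
            while end < n and energies[end] > low:
                end += 1
            segments.append((start, end))
            i = end
        else:
            i += 1
    return segments


# Made-up short-time energies, one value per frame.
energy = [0.1, 0.3, 0.9, 1.2, 0.4, 0.1, 0.1, 0.5, 1.5, 0.2]
print(double_threshold_segments(energy, high=1.0, low=0.25))  # [(1, 5), (7, 9)]
```

The frames inside the returned segments would form the valid audio data; frames outside them are discarded as silence or background.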
By implementing the embodiment, effective audio data can be extracted, the snore detecting efficiency of the snore detecting device is improved, and the interference of other signals is avoided.
102. Under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result, determining that the target audio data is snore.
In this embodiment, if the target audio data is successfully matched with the SVDD model, the determination result may be considered to conform to the preset result. It can be understood that the form of the determination result depends on the setting of the SVDD model; for example, the determination result may be "yes" or other prompt information. The specific form of the determination result is not limited in this embodiment.
Optionally, in the case that the period of the target audio data satisfies a preset period and the determination result meets a preset result, the method further includes:
and storing the original audio data and/or storing the target audio data.
By storing the original audio data and/or the target audio data, the embodiment can also conveniently count the snore records of the user, such as the snore times of the user per hour or information such as snore decibels, and the like, so as to provide a comparison basis for the physical health condition of the user.
Specifically, the duration of the target audio data is greater than a preset time.
For a given snore signal, the time intervals between successive snores are substantially equal, the amplitudes are substantially consistent, and the snores are closely related to the breathing rhythm of the human body, so that the snore period is substantially consistent with the breathing rate. That is, when a sound that repeats at a rate within the range of 10-60 times/minute appears in a 1 min audio signal, the sound can be preliminarily determined to be a snore signal. As shown in fig. 6, fig. 6 is a schematic diagram of an energy curve of a 1 min snore signal provided by the embodiment of the present invention, wherein the horizontal axis represents time (s) and the vertical axis represents the amplitude of the energy curve. Therefore, in this embodiment, the preset time may be 1 minute. It is to be understood that the above-described audio signal is the target audio data referred to in the present embodiment.
Specifically, the step of satisfying the preset period by the period of the target audio data includes:
the time domain period of the target audio data satisfies a preset time domain period, and/or the frequency domain period of the target audio data satisfies a preset frequency domain period.
Specifically, whether the period of the target audio data satisfies the preset period may be determined by the following steps.
1) Obtain the energy curve of the 1 min audio signal (i.e. the candidate snore): for example, take the absolute value of the audio signal, sum the energy of the audio signal over every 0.05 s to obtain an audio energy curve with a sampling rate of 20 Hz, and filter out local noise such as spikes in the energy curve with a smoothing filter.
2) Find the time domain period of the audio signal by the time domain method: search for signal peaks on the audio energy curve obtained in step 1), set a peak interval threshold and an amplitude threshold to exclude peaks that do not meet the conditions, and calculate the peak interval, which is the time domain period. When the peak intervals within 1 min show a certain periodicity, judge whether the time domain period is within the snore period range; when the peak intervals within 1 min are not periodic, the audio signal is considered to be noise.
3) Find the frequency domain period of the audio signal by the frequency domain method: perform a Fourier transform on the audio energy curve to obtain its frequency spectrum, and examine the formant information and fundamental frequency information of the spectrum. When they match the characteristics of a snore signal, the audio signal is determined to be snore; otherwise, the audio signal is determined to be noise.
It can be understood that, in a specific implementation, step 2) and step 3) can both be executed, which improves the accuracy with which the snore detection device detects snore; alternatively, only one of them can be executed, which reduces the amount of computation and the running memory required by the snore detecting device.
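Steps 1) to 3) can be sketched as follows. This is a minimal illustration under stated assumptions and not the embodiment's implementation: the moving-average smoother, the peak thresholds (half of the maximum, at least 1 s apart), the 20% regularity tolerance, and the use of the strongest low-frequency DFT bin in place of formant and fundamental-frequency analysis are all assumptions, as are the function names and the synthetic test signal.

```python
import math


def energy_curve(samples, sample_rate, window_s=0.05):
    """Sum of absolute amplitudes per window; 0.05 s windows give a 20 Hz curve."""
    step = int(sample_rate * window_s)
    return [sum(abs(x) for x in samples[i:i + step])
            for i in range(0, len(samples) - step + 1, step)]


def smooth(curve, width=5):
    """Moving-average filter that suppresses spikes in the energy curve."""
    half = width // 2
    out = []
    for i in range(len(curve)):
        window = curve[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def find_peaks(curve, min_height, min_gap):
    """Indices of local maxima above min_height and at least min_gap apart."""
    peaks = []
    for i in range(1, len(curve) - 1):
        if curve[i] >= min_height and curve[i - 1] < curve[i] >= curve[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def time_domain_period(curve, curve_rate=20.0, tolerance=0.2):
    """Mean peak spacing in seconds, or None if the spacing is not regular."""
    if not curve or max(curve) <= 0:
        return None
    peaks = find_peaks(curve, 0.5 * max(curve), int(curve_rate))
    if len(peaks) < 3:
        return None
    gaps = [(b - a) / curve_rate for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if any(abs(g - mean_gap) > tolerance * mean_gap for g in gaps):
        return None
    return mean_gap


def in_snore_range(period_s, low_per_min=10, high_per_min=60):
    """True if the period corresponds to 10-60 events per minute."""
    return (period_s is not None
            and 60.0 / high_per_min <= period_s <= 60.0 / low_per_min)


def dominant_frequency(curve, curve_rate=20.0, max_hz=2.0):
    """Frequency (Hz) of the strongest non-DC DFT bin of the curve up to max_hz."""
    n = len(curve)
    mean = sum(curve) / n
    best_bin, best_power = 0, 0.0
    for k in range(1, int(max_hz * n / curve_rate) + 1):
        re = sum((x - mean) * math.cos(2 * math.pi * k * i / n)
                 for i, x in enumerate(curve))
        im = sum((x - mean) * math.sin(2 * math.pi * k * i / n)
                 for i, x in enumerate(curve))
        if re * re + im * im > best_power:
            best_bin, best_power = k, re * re + im * im
    return best_bin * curve_rate / n


# Synthetic 1 min signal: one burst every 4 s (15 per minute), sampled at 1 kHz.
rate = 1000
samples = [max(0.0, math.sin(2 * math.pi * 0.25 * n / rate))
           for n in range(60 * rate)]
curve = smooth(energy_curve(samples, rate))
period = time_domain_period(curve)
print(period, in_snore_range(period), dominant_frequency(curve))
```

For the synthetic signal, the time domain check and the frequency domain check agree on a repetition rate of about 15 events per minute, which lies inside the 10-60 times/minute range given above.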
In this embodiment, identifying snore by combining the SVDD model with period detection can significantly improve the accuracy of snore identification, allows sleep to be monitored with a simple snore detection device, reduces the manufacturing cost of the product, and improves the user experience. Meanwhile, interference from other noise signals is excluded as far as possible, which ensures the accuracy of snore detection.
By implementing the embodiment of the invention, the snore detection can be conveniently, quickly and accurately realized, and the efficiency of the snore detection device is improved.
It will be appreciated that, on the basis of the method described in fig. 1, the SVDD model may also be obtained before the target audio data is input into the support vector data description (SVDD) model. The SVDD model may be a trained SVDD model, which may be pre-stored in the snore detecting device or trained by the snore detecting device before the method described in fig. 1 is performed. The SVDD model may be trained by the method described in fig. 7 or by other methods, and the embodiments of the present invention are not uniquely limited in this respect. Further, if the accuracy with which the SVDD model identifies snore becomes unacceptable due to a scene change or other reasons, the SVDD model can be reconfigured or retrained. Referring to fig. 7, fig. 7 is a schematic flow chart of another snore detecting method according to an embodiment of the present invention. The snore detecting method is applicable to a snore detecting device and, as shown in fig. 7, may include:
701. acquiring an audio data sample;
in this embodiment, generally, the audio data samples may be snore samples of people of different genders, or snore samples of people of different ages, and the like, and it is understood that the number of the audio data samples is not limited in the embodiment of the present invention.
702. Training the audio data sample to obtain an SVDD model;
specifically, an embodiment of the present invention further provides a method for training an SVDD model, where the training of the audio data sample to obtain the SVDD model includes:
calculating the MFCC of the audio data sample to obtain an N × L coefficient matrix, wherein N is the frame number of the audio data sample, and L is the MFCC coefficient length;
decomposing the N × L coefficient matrix by using a Singular Value Decomposition (SVD) method to obtain a second target matrix;
and after the second target matrix is subjected to dimension reduction processing, obtaining a preset number of feature vectors, training the preset number of feature vectors by using an SVDD algorithm, and obtaining the SVDD model, wherein the preset number of feature vectors contain the overall trend features of the audio data sample.
It is understood that after the audio data samples are obtained, valid audio samples in the audio data samples may be detected by the VAD detection method, so as to calculate the MFCC of the audio data samples.
In this embodiment, the method for calculating the MFCC of the audio data sample may refer to the method in the foregoing embodiment. For example, the MFCC coefficients of the audio data sample are obtained through the steps of pre-emphasis, framing, windowing, FFT, triangular band-pass filtering, and DCT; the specific steps may refer to the foregoing embodiment and are not described herein again. After the MFCC coefficients of the audio data sample, such as a snore signal, are calculated, an N × L coefficient matrix is obtained, wherein N is the number of frames of the snore signal, and L is the length of the MFCC coefficient. Because the MFCC coefficient matrix has a high dimension, and the number of rows N differs because the lengths of the snore signals are inconsistent, the coefficient matrix is first decomposed by SVD to obtain the second target matrix, and, in order to further extract the effective features of the snore signals, PCA can then be used to perform dimension reduction processing on the second target matrix. For example, if MFCC feature matrices are obtained for 1000 snore signals and decomposed by SVD to obtain eigenvalues and eigenvectors, then, since 95% of the signal features are concentrated on the first 25 eigenvectors, the PCA algorithm only needs to reduce the dimensions of the first 25 eigenvectors, and a 25-dimensional overall snore feature vector is finally obtained. By this method, the lengths of the second target matrices can be made consistent and the overall trend features of the audio data sample can be extracted, which improves the effectiveness of training. It is understood that, in the present embodiment, the N × L coefficient matrix may be the same as or different from the X × Y coefficient matrix, depending on the specific situation, and the present embodiment is not uniquely limited in this respect.
A Support Vector Machine (SVM) is a general-purpose learning method proposed by Vapnik et al. on the basis of statistical learning theory. It is built on the structural risk minimization principle and has very good classification performance and generalization ability on nonlinear, high-dimensional, small-sample problems. The SVDD algorithm is similar in principle to the SVM; it retains the advantages of the standard SVM, has its own unique advantages, and is better suited to the recognition of specific sounds.
The basic principle of SVDD is to compute a spherical decision boundary for the input samples, dividing the whole space into two parts: the space inside the boundary, which is regarded as the accepted part, and the space outside the boundary, which is regarded as the rejected part. This gives SVDD the characteristics of a one-class classifier. In the training process, the size and range of the hypersphere are controlled so that it contains as many sample points as possible while its radius is kept as small as possible, so that the hypersphere achieves the best classification effect. The role of the hypersphere is therefore not merely to separate two classes but to enclose the samples inside it as tightly and as purely as possible and to reject samples of other classes.
The basic idea of identifying snore signals by using SVDD is as follows: an SVDD (support vector data description) model is trained on the overall features extracted from the snore signals, yielding a 25-dimensional hypersphere (a hypersphere is the sphere of a space with more than 3 dimensions; its counterpart is a curve in 2-dimensional space and a spherical surface in 3-dimensional space). Under ideal conditions, all snore signals are contained in the hypersphere and non-snore signals are excluded from it. In practical situations, however, in order to improve the accuracy of snore detection and reduce the false recognition rate, the snore samples can only be selected as comprehensively as possible, and, on the premise that the hypersphere already contains all snore training signals, the boundary of the hypersphere is narrowed as much as possible so that its radius is minimized.
When the SVDD model is used to identify a new audio signal, the signal is preliminarily judged to be a snore signal if its overall features fall inside the hypersphere, and is judged not to be a snore signal otherwise. Fig. 8 shows the result of classifying 2-dimensional data by SVDD, where most of the samples are contained within the oval boundary. (The dimension of the SVDD is determined by the selected features; 2-dimensional data is used here for ease of presentation, and the feature dimension may be much higher in practical applications.)
The specific training process of the SVDD model will be described in detail below:
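the optimization goal of SVDD is to find a minimum sphere with a center a and a radius R:
$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_i \xi_i$$
wherein C and $\xi_i$ are the model control parameters: C is the penalty factor that weighs the volume of the sphere against the training errors, and $\xi_i \ge 0$ are slack variables that allow individual samples to lie outside the sphere.
So that this sphere satisfies:
$$(x_i - a)^T (x_i - a) \le R^2 + \xi_i, \qquad \xi_i \ge 0$$
Satisfying this condition means that the data points in the training data set are all wrapped in the sphere (for data $x_i$ of more than 3 dimensions, the sphere is a hypersphere; a hypersphere is the sphere of a space with more than 3 dimensions, and its counterpart is a curve in 2-dimensional space and a spherical surface in 3-dimensional space).
Now, with the target to be solved and the constraint, the Lagrange multiplier method is used as follows:
$$L(R, a, \xi, \alpha, \gamma) = R^2 + C\sum_i \xi_i - \sum_i \alpha_i\left[R^2 + \xi_i - (x_i - a)^T (x_i - a)\right] - \sum_i \gamma_i \xi_i \tag{1}$$
wherein $\alpha_i \ge 0$ and $\gamma_i \ge 0$. Taking the partial derivatives with respect to the parameters R, a and $\xi_i$ and setting them equal to 0 yields:
$$\sum_i \alpha_i = 1 \tag{2}$$
$$a = \sum_i \alpha_i x_i \tag{3}$$
$$C - \alpha_i - \gamma_i = 0 \tag{4}$$
Substituting (2), (3) and (4) into formula (1) gives the dual problem:
$$\max_{\alpha}\; \sum_i \alpha_i\, x_i^T x_i - \sum_i \sum_j \alpha_i \alpha_j\, x_i^T x_j$$
wherein,
$$0 \le \alpha_i \le C, \qquad \sum_i \alpha_i = 1$$
As in the SVM, the above vector inner products can also be computed with a kernel function K:
$$\max_{\alpha}\; \sum_i \alpha_i K(x_i, x_i) - \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$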
after solving the center a and the radius R of the SVDD model, the training process is completed, and the snore signal can be identified by using the model: firstly, extracting the integral feature of the snore signal, then calculating the distance D from the integral feature to the center a of the SVDD model, and preliminarily judging that the sample is snore when D is less than R.
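The decision rule D < R can be sketched as follows, assuming training has already produced the support vectors and their weights. The sketch uses the standard kernel expansion of the squared distance to the center, with the center taken as the weighted sum of the support vectors; the function names, the two-point toy model and the Gaussian kernel option are assumptions for illustration only.

```python
import math


def linear_kernel(u, v):
    """Plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(sigma):
    """Build a Gaussian (RBF) kernel with width sigma (optional alternative)."""
    def k(u, v):
        dist2 = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-dist2 / (2 * sigma ** 2))
    return k


def squared_distance_to_center(z, support, alphas, kernel=linear_kernel):
    """Kernel form of ||z - a||^2, where a is the alpha-weighted sum of support."""
    cross = sum(a * kernel(z, x) for a, x in zip(alphas, support))
    center = sum(ai * aj * kernel(xi, xj)
                 for ai, xi in zip(alphas, support)
                 for aj, xj in zip(alphas, support))
    return kernel(z, z) - 2 * cross + center


def is_inside(z, support, alphas, radius, kernel=linear_kernel):
    """Preliminary snore decision: accept when the distance D is below R."""
    d2 = max(0.0, squared_distance_to_center(z, support, alphas, kernel))
    return math.sqrt(d2) < radius


# Toy model: two support vectors with equal weights, so the center is (1, 0).
support = [(0.0, 0.0), (2.0, 0.0)]
alphas = [0.5, 0.5]
# The radius is the distance from the center to a support vector on the boundary.
radius = math.sqrt(squared_distance_to_center(support[0], support, alphas))
print(radius,
      is_inside((1.0, 0.5), support, alphas, radius),
      is_inside((3.0, 0.0), support, alphas, radius))
```

In a real model the feature vectors would be the 25-dimensional overall features, and the weights and radius would come from solving the training problem rather than being set by hand.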
703. After original audio data are obtained, target audio data are determined, wherein the target audio data are effective audio data corresponding to the original audio data;
704. acquiring a target feature vector corresponding to the target audio data, and inputting the target feature vector into the SVDD model to obtain a judgment result;
in an embodiment of the present invention, the target feature vector is a feature vector obtained by performing feature extraction on a mel-frequency cepstrum coefficient MFCC of the target audio data;
705. and under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result, determining that the target audio data is snore.
It is understood that the specific implementation of steps 703-705 may refer to the method described in fig. 1, which is not described herein again.
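The flow of steps 703 to 705 can be summarised in a short sketch. The four callables are hypothetical stand-ins for the VAD, the feature extraction, the SVDD decision and the period check; they are passed in as parameters because their concrete form is not fixed here, and the toy lambdas below are purely illustrative.

```python
def detect_snore(raw_audio, vad, extract_features, svdd_accepts, period_ok):
    """Return True when both the period check and the SVDD decision pass."""
    target_audio = vad(raw_audio)                  # step 703: keep valid audio
    if not target_audio:
        return False
    features = extract_features(target_audio)     # step 704: feature vector
    # Step 705: both conditions must hold.
    return period_ok(target_audio) and svdd_accepts(features)


# Toy stand-ins, for illustration only.
result = detect_snore(
    raw_audio=[0.0, 0.8, 0.9, 0.0],
    vad=lambda audio: [x for x in audio if x > 0.1],
    extract_features=lambda audio: [sum(audio) / len(audio)],
    svdd_accepts=lambda feats: abs(feats[0] - 0.85) < 0.2,
    period_ok=lambda audio: len(audio) >= 2,
)
print(result)
```

Requiring both conditions is what lets the period check reject periodic noise that the SVDD model accepts, and lets the SVDD model reject non-snore sounds that happen to be periodic.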
The invention combines SVDD model with period detection for snore identification. At present, most of household devices for detecting snore are based on simple voice parameters, such as zero-crossing rate, energy spectrum, power spectrum and the like. Some medical devices employ complex models. In comparison, the algorithm based on simple parameters has low accuracy and is difficult to meet the requirements; the complex model has high complexity, large computation amount and high cost, and is difficult to realize on household equipment. Therefore, by implementing the embodiment of the invention, the SVDD model is combined with the periodic detection and applied to the snore detection device such as a home or a portable device, so that the snore detection is convenient and quick, the snore detection can be more accurately realized, and beneficial information is provided for the disease diagnosis of a user.
The method of embodiments of the present invention is set forth above in detail and the apparatus of embodiments of the present invention is provided below.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a snore detecting device according to an embodiment of the present invention, where the snore detecting device may be used to execute a snore detecting method according to an embodiment of the present invention, and as shown in fig. 9, the snore detecting device may include:
a first obtaining unit 901 configured to obtain a target feature vector corresponding to target audio data;
an input obtaining unit 902, configured to input the target feature vector into a SVDD model, to obtain a determination result, where the target feature vector is a feature vector obtained by performing feature extraction on a mel-frequency cepstrum coefficient MFCC of the target audio data;
a determining unit 903, configured to determine that the target audio data is snore when a period of the target audio data meets a preset period and the determination result meets a preset result.
By implementing the embodiment of the invention, the snore detection can be conveniently, quickly and accurately realized, and the efficiency of the snore detection device is improved.
Optionally, the first obtaining unit 901 is further configured to obtain original audio data;
the determining unit 903 is further configured to determine target audio data, where the target audio data is valid audio data corresponding to the original audio data;
specifically, the determining unit 903 is specifically configured to detect valid audio data in the original audio data by using a voice endpoint detection method VAD to obtain the target audio data.
By implementing the embodiment, the snore recognition rate can be improved, and other interferences can be eliminated.
Optionally, the snore detecting device may further include:
and the storage unit is used for storing the original audio data and/or the target audio data.
Specifically, as shown in fig. 10, the first acquiring unit 901 includes:
a first calculating subunit 9011, configured to calculate an MFCC of the target audio data, to obtain an X × Y coefficient matrix, where X is a frame number of the target audio data, and Y is an MFCC coefficient length;
a first decomposition subunit 9012, configured to decompose the X × Y coefficient matrix by using a singular value decomposition SVD method, to obtain a first target matrix;
a first dimension reduction processing subunit 9013, configured to perform dimension reduction processing on the first target matrix to obtain the target feature vector, where the target feature vector includes an overall trend feature of the target audio data.
By implementing the embodiment, the loss of snore signals can be avoided, the accuracy of snore identification can be improved, and the efficiency of the snore detection device is improved.
Specifically, the step of satisfying the preset period by the period of the target audio data includes:
the time domain period of the target audio data satisfies a preset time domain period, and/or the frequency domain period of the target audio data satisfies a preset frequency domain period.
Specifically, the duration of the target audio data is greater than a preset time.
Specifically, as shown in fig. 11, the apparatus further includes:
a second obtaining unit 904, configured to obtain the SVDD model.
Specifically, as shown in fig. 12, the second obtaining unit 904 includes:
a second calculating subunit 9041, configured to calculate an MFCC of the audio data sample, to obtain an N × L coefficient matrix, where N is a frame number of the audio data sample, and L is an MFCC coefficient length;
a second decomposition subunit 9042, configured to decompose the N × L coefficient matrix by using a singular value decomposition SVD method, to obtain a second target matrix;
a second dimension reduction processing subunit 9043, configured to perform dimension reduction processing on the second target matrix, and obtain a preset number of feature vectors, where the preset number of feature vectors includes an overall trend feature of the audio data sample;
and the training subunit 9044 is configured to train the feature vectors of the preset number by using an SVDD algorithm to obtain the SVDD model.
In the embodiment, the SVDD model is combined with the periodic detection and applied to the snore detecting device such as a home or a portable device, so that the snore detecting device is convenient and fast, the snore detection can be more accurately realized, and beneficial information is provided for the disease diagnosis of a user.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiments shown in fig. 1 and fig. 7.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a snore detecting device according to another embodiment of the present invention. The snore detecting device in the present embodiment shown in fig. 13 may include: one or more processors 1301, one or more input devices 1302, one or more output devices 1303, and a memory 1304. The processor 1301, the input device 1302, the output device 1303, and the memory 1304 are connected by a bus 1305. The memory 1304 is used to store instructions, and the processor 1301 is used to execute the instructions stored in the memory 1304. Wherein, the processor 1301 is configured to perform:
acquiring a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description algorithm (SVDD) model to obtain a judgment result; the target feature vector is obtained by performing feature extraction on a Mel Frequency Cepstrum Coefficient (MFCC) of the target audio data;
and under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result, determining that the target audio data is snore.
It should be understood that, in the embodiment of the present invention, the processor 1301 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The input device 1302 may include a touch pad, a microphone, and the like; for example, the original audio data may be obtained through the input device.
The output device 1303 may include a display (an LCD or the like), a speaker, and the like; for example, the determination result may be output through the output device.
The memory 1304, which may include both read-only memory and random-access memory, provides instructions and data to the processor 1301. A portion of the memory 1304 may also include non-volatile random access memory. For example, the memory 1304 may also store device type information.
In a specific implementation, the processor 1301, the input device 1302, and the output device 1303 described in the embodiment of the present invention may execute the implementation manners described in the first embodiment and the second embodiment of the snore detecting method provided in the embodiment of the present invention, and may also execute the implementation manner of the snore detecting device described in the embodiment of the present invention, which is not described herein again.
In another embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program that when executed by a processor implements:
acquiring a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description algorithm (SVDD) model to obtain a judgment result; the target feature vector is obtained by performing feature extraction on a Mel Frequency Cepstrum Coefficient (MFCC) of the target audio data;
and under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result, determining that the target audio data is snore.
The computer readable storage medium may be an internal storage unit of the snore detecting device according to any of the embodiments, such as a hard disk or a memory of the snore detecting device. The computer readable storage medium may also be an external storage device of the snore detecting device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the snore detecting device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the snore detecting device. The computer readable storage medium is used for storing the computer program and other programs and data required by the snore detecting device. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Claims (12)
1. A snore detecting method is characterized by comprising the following steps:
acquiring a target feature vector corresponding to target audio data, and inputting the target feature vector into a support vector data description algorithm (SVDD) model to obtain a judgment result; the target feature vector is obtained by performing feature extraction on a Mel Frequency Cepstrum Coefficient (MFCC) of the target audio data;
and determining the target audio data as snore under the condition that the period of the target audio data meets a preset period and the judgment result meets a preset result.
2. The method of claim 1, wherein obtaining the target feature vector corresponding to the target audio data comprises:
calculating the MFCC of the target audio data to obtain an X × Y coefficient matrix, wherein X is the number of frames of the target audio data, and Y is the MFCC coefficient length;
decomposing the X × Y coefficient matrix by singular value decomposition (SVD) to obtain a first target matrix;
and performing dimension reduction on the first target matrix to obtain the target feature vector, wherein the target feature vector comprises the overall trend feature of the target audio data.
3. The method according to claim 1 or 2, wherein the period of the target audio data satisfying a preset period comprises:
at least one of the following conditions being met: the time-domain period of the target audio data meets a preset time-domain period; and the frequency-domain period of the target audio data meets a preset frequency-domain period.
4. The method of claim 1, wherein before the target audio data is input into the support vector data description (SVDD) model, the method further comprises:
obtaining the SVDD model.
5. The method of claim 4, wherein said obtaining the SVDD model comprises:
acquiring an audio data sample and calculating the MFCC of the audio data sample to obtain an N × L coefficient matrix, wherein N is the number of frames of the audio data sample, and L is the MFCC coefficient length;
decomposing the N × L coefficient matrix by singular value decomposition (SVD) to obtain a second target matrix;
and performing dimension reduction on the second target matrix to obtain a preset number of feature vectors, and training on the preset number of feature vectors using the SVDD algorithm to obtain the SVDD model, wherein the preset number of feature vectors comprise the overall trend features of the audio data sample.
6. A snore detecting device, comprising:
a first acquisition unit, configured to acquire a target feature vector corresponding to target audio data;
an input obtaining unit, configured to input the target feature vector into a support vector data description (SVDD) model to obtain a discrimination result, wherein the target feature vector is obtained by performing feature extraction on the Mel-frequency cepstral coefficients (MFCC) of the target audio data;
and a determining unit, configured to determine that the target audio data is snore when the period of the target audio data meets a preset period and the discrimination result meets a preset result.
7. The device of claim 6, wherein the first acquisition unit comprises:
a first calculating subunit, configured to calculate the MFCC of the target audio data to obtain an X × Y coefficient matrix, wherein X is the number of frames of the target audio data, and Y is the MFCC coefficient length;
a first decomposition subunit, configured to decompose the X × Y coefficient matrix by singular value decomposition (SVD) to obtain a first target matrix;
and a first dimension reduction subunit, configured to perform dimension reduction on the first target matrix to obtain the target feature vector, wherein the target feature vector comprises the overall trend feature of the target audio data.
8. The device according to claim 6 or 7, wherein the period of the target audio data satisfying a preset period comprises:
at least one of the following conditions being met: the time-domain period of the target audio data meets a preset time-domain period; and the frequency-domain period of the target audio data meets a preset frequency-domain period.
9. The device of claim 6, further comprising:
a second acquisition unit, configured to obtain the SVDD model.
10. The device of claim 9, wherein the second acquisition unit comprises:
a second calculating subunit, configured to calculate the MFCC of an audio data sample to obtain an N × L coefficient matrix, wherein N is the number of frames of the audio data sample, and L is the MFCC coefficient length;
a second decomposition subunit, configured to decompose the N × L coefficient matrix by singular value decomposition (SVD) to obtain a second target matrix;
a second dimension reduction subunit, configured to perform dimension reduction on the second target matrix to obtain a preset number of feature vectors, wherein the preset number of feature vectors comprise the overall trend features of the audio data sample;
and a training subunit, configured to train on the preset number of feature vectors using the SVDD algorithm to obtain the SVDD model.
11. A snore detecting device comprising a processor, an input device, an output device, and a memory, wherein:
the processor, the input device, the output device and the memory are interconnected, the memory being configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 5.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 5.
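The decision rule of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Real SVDD training solves an optimization problem for the enclosing hypersphere; here, purely as a stand-in, the sphere centre is the mean of the training feature vectors and the radius is a quantile of their distances to that centre. The names `fit_sphere`, `inside_sphere`, `is_snore` and the period range `PRESET_PERIOD_RANGE_S` are hypothetical, and the sketch assumes the feature vectors (MFCC, SVD, dimension reduction) and the period estimate have already been computed elsewhere.

```python
import math

# Hypothetical preset period range in seconds (an assumption, not from the claims).
PRESET_PERIOD_RANGE_S = (2.0, 6.0)


def fit_sphere(vectors, quantile=0.95):
    """Simplified stand-in for SVDD training.

    Centre = mean of the training vectors; radius = the given quantile of
    the distances to the centre. True SVDD solves a quadratic program.
    """
    dim = len(vectors[0])
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    dists = sorted(math.dist(v, center) for v in vectors)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]


def inside_sphere(model, vector):
    """Discrimination result: True if the vector lies within the sphere."""
    center, radius = model
    return math.dist(vector, center) <= radius


def is_snore(feature, period_s, model, period_range=PRESET_PERIOD_RANGE_S):
    """Claim 1 logic: snore only if BOTH the period check and the
    model's discrimination result are satisfied."""
    low, high = period_range
    period_ok = low <= period_s <= high
    return period_ok and inside_sphere(model, feature)


# Toy training set of 2-D feature vectors (illustrative values only).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
model = fit_sphere(training)

print(is_snore((0.5, 0.5), 3.5, model))   # True: inside sphere, period in range
print(is_snore((3.0, 3.0), 3.5, model))   # False: outside sphere
print(is_snore((0.5, 0.5), 10.0, model))  # False: period out of range
```

Requiring both conditions means that a sound whose spectrum resembles snoring but which does not recur at a plausible period, or a periodic sound with a different spectral shape, is rejected.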
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710977496.1A CN107898432B (en) | 2017-10-17 | 2017-10-17 | Snore detection method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107898432A (en) | 2018-04-13 |
CN107898432B (en) | 2021-02-26 |
Family
ID=61841592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710977496.1A Active CN107898432B (en) | 2017-10-17 | 2017-10-17 | Snore detection method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107898432B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109044279A (en) * | 2018-08-20 | 2018-12-21 | 深圳和而泰数据资源与云技术有限公司 | A kind of sound of snoring detection method and relevant device |
CN109982137A (en) * | 2019-02-22 | 2019-07-05 | 北京奇艺世纪科技有限公司 | Model generating method, video marker method, apparatus, terminal and storage medium |
CN111110189A (en) * | 2019-11-13 | 2020-05-08 | 吉林大学 | Anti-snoring device and method based on DSP sound and image recognition technology |
CN111938588A (en) * | 2020-07-24 | 2020-11-17 | 深圳数联天下智能科技有限公司 | Method for detecting sleep state, sleep monitor and storage medium |
CN112272019A (en) * | 2020-10-22 | 2021-01-26 | 广东美的制冷设备有限公司 | Control method, device, household appliance and storage medium of voice-activated knock switch |
CN114515137A (en) * | 2020-11-19 | 2022-05-20 | 纬创资通股份有限公司 | Physiological symptom identification method and physiological symptom sensing system |
CN114631782A (en) * | 2022-03-14 | 2022-06-17 | 南京理工大学 | A method for detecting sleep posture based on microphone array |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120004749A1 (en) * | 2008-12-10 | 2012-01-05 | The University Of Queensland | Multi-parametric analysis of snore sounds for the community screening of sleep apnea with non-gaussianity index |
CN103251388A (en) * | 2013-04-25 | 2013-08-21 | 北京大学深圳研究生院 | Method and system of snoring monitoring and prevention and treatment based on smart phone platform |
CN104688229A (en) * | 2015-01-28 | 2015-06-10 | 中国人民解放军理工大学 | Method for monitoring sleep respiration based on snore signals |
CN108369813A (en) * | 2017-07-31 | 2018-08-03 | 深圳和而泰智能家居科技有限公司 | Specific sound recognition methods, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Meng Zemin et al.: "A home sleep apnea syndrome monitoring system based on neural networks", Computer Systems & Applications * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109044279A (en) * | 2018-08-20 | 2018-12-21 | 深圳和而泰数据资源与云技术有限公司 | A kind of sound of snoring detection method and relevant device |
CN109982137A (en) * | 2019-02-22 | 2019-07-05 | 北京奇艺世纪科技有限公司 | Model generating method, video marker method, apparatus, terminal and storage medium |
CN111110189A (en) * | 2019-11-13 | 2020-05-08 | 吉林大学 | Anti-snoring device and method based on DSP sound and image recognition technology |
CN111110189B (en) * | 2019-11-13 | 2021-11-09 | 吉林大学 | Anti-snoring device and method based on DSP sound and image recognition technology |
CN111938588A (en) * | 2020-07-24 | 2020-11-17 | 深圳数联天下智能科技有限公司 | Method for detecting sleep state, sleep monitor and storage medium |
CN112272019A (en) * | 2020-10-22 | 2021-01-26 | 广东美的制冷设备有限公司 | Control method, device, household appliance and storage medium of voice-activated knock switch |
CN112272019B (en) * | 2020-10-22 | 2024-12-03 | 广东美的制冷设备有限公司 | Control method, device, household appliance and storage medium of voice-controlled knock switch |
CN114515137A (en) * | 2020-11-19 | 2022-05-20 | 纬创资通股份有限公司 | Physiological symptom identification method and physiological symptom sensing system |
CN114515137B (en) * | 2020-11-19 | 2024-04-19 | 纬创资通股份有限公司 | Gastrointestinal disease symptom identification method and gastrointestinal disease symptom sensing system |
CN114631782A (en) * | 2022-03-14 | 2022-06-17 | 南京理工大学 | A method for detecting sleep posture based on microphone array |
Also Published As
Publication number | Publication date |
---|---|
CN107898432B (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107898432B (en) | Snore detection method, device, equipment and medium | |
CN107910020B (en) | Snore detection method, device, equipment and storage medium | |
Ma et al. | Lungbrn: A smart digital stethoscope for detecting respiratory disease using bi-resnet deep learning algorithm | |
CN103251388B (en) | Method and system of snoring monitoring and prevention and treatment based on smart phone platform | |
Lin et al. | Automatic wheezing detection using speech recognition technique | |
Nallanthighal et al. | Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings | |
CN110123367B (en) | Computer device, heart sound recognition method, model training device, and storage medium | |
Lim et al. | Classification of snoring sound based on a recurrent neural network | |
WO2019023879A1 (en) | Cough sound recognition method and device, and storage medium | |
WO2019127088A1 (en) | Snore recognition method and snore-stopping device | |
CN113974607B (en) | Sleep snore detecting system based on pulse neural network | |
CN110942784A (en) | Snore classification system based on support vector machine | |
US12207903B2 (en) | Diagnosis of medical conditions using voice recordings and auscultation | |
He et al. | A novel snore detection and suppression method for a flexible patch with MEMS microphone and accelerometer | |
Luo et al. | Design of embedded real-time system for snoring and OSA detection based on machine learning | |
Farzam et al. | The diagnosis of heart diseases based on PCG signals using MFCC coefficients and SVM classifier | |
Fang et al. | Monitoring of Sleep Breathing States Based on Audio Sensor Utilizing Mel‐Scale Features in Home Healthcare | |
Casaseca-de-la-Higuera et al. | Effect of downsampling and compressive sensing on audio-based continuous cough monitoring | |
Wang et al. | Feature extraction based on mel-scaled wavelet transform for heart sound analysis | |
CN114027801B (en) | A method and system for sleep snoring recognition and snoring suppression | |
Albornoz et al. | Snore recognition using a reduced set of spectral features | |
Zhao et al. | Analysis of sleeping respiratory signal utilizing frequency energy features | |
Kalantarian et al. | A smartwatch-based system for audio-based monitoring of dietary habits | |
Le et al. | Cough sound recognition based on Hilbert marginal spectrum | |
Vaishnavi et al. | An automatic approach to extract features from the infant’s cry signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||