CN116312647A - Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition - Google Patents

Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition

Info

Publication number
CN116312647A
CN116312647A (application CN202310091119.3A)
Authority
CN
China
Prior art keywords
mean
var
time
feature
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310091119.3A
Other languages
Chinese (zh)
Inventor
周伟 (Zhou Wei)
刘骁 (Liu Xiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202310091119.3A priority Critical patent/CN116312647A/en
Publication of CN116312647A publication Critical patent/CN116312647A/en
Pending legal-status Critical Current

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/45For evaluating or diagnosing the musculoskeletal system or teeth
    • A61B5/4538Evaluating a particular part of the muscoloskeletal system or a particular medical condition
    • A61B5/4542Evaluating the mouth, e.g. the jaw
    • A61B5/4557Evaluating bruxism
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the technical field of computer-aided diagnosis, and specifically relates to a real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition. The method comprises: collecting audio in the sleep environment; converting the audio into short-time stationary signals by framing; extracting multiple time-domain and frequency-domain features from each audio sample and summarizing, over the per-frame feature arrays, feature groups of their variance and mean distributions; and selecting the best predictive model for bruxism diagnosis from several machine learning models. The invention can distinguish bruxism episodes from other sleep states with high accuracy in a complex sleep environment, requires no costly preprocessing such as denoising or filtering of the audio signal, achieves non-contact monitoring and diagnosis of bruxism with a simple workflow, avoids discomfort to patients, and provides a practical reference and theoretical basis for subsequent bruxism intervention.

Description

Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition
Technical Field
The invention belongs to the technical field of computer-aided diagnosis, and specifically relates to a real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition.
Background
According to the 2018 international consensus on the assessment of bruxism, sleep bruxism is a masticatory muscle activity during sleep characterized as rhythmic (phasic) or non-rhythmic (tonic); in otherwise healthy individuals it is not itself a movement disorder or a sleep disorder, but it increases the risk of other problems such as tooth wear, masticatory muscle hypertrophy, temporomandibular joint dysfunction and facial muscle soreness, and can even lead to headaches. Reliable studies report that bruxism is widely distributed in the population, with a prevalence of about 8% among adolescents. Accurate diagnosis and treatment of bruxism, which reduces its negative impact both on patients and on those around them, is therefore of great significance for life and health.
The current diagnostic methods for bruxism are mainly questionnaires, clinical evaluation, and portable-device diagnosis. Questionnaires and clinical evaluation have low specificity for bruxism and are, in essence, assessments made only after bruxism has already caused damage, so they cannot prevent its occurrence. Portable devices based on the surface electromyographic (sEMG) signal have been widely studied, and real-time diagnosis of bruxism and grading of its severity from EMG signals have improved to some extent. However, existing sEMG-based diagnosis algorithms, both domestic and international, depend heavily on the specific device used to collect the EMG signal: EMG recordings of bruxism lack an authoritative, unified standard, and an identification algorithm is typically applicable only to the acquisition device developed by its own authors rather than to night-time bruxism signals collected by other EMG equipment. As a result, publicly available sleep bruxism data sets and mature sEMG-based portable diagnostic devices are almost nonexistent. Most importantly, the majority of people who brux are otherwise healthy individuals (i.e., the bruxism poses no pathological hazard to them) and show no serious pathological behavior, the main complaint being the disturbance caused to a bed partner or to rest; for these healthy bruxers, wearing contact devices such as EMG sensors or pressure sensors causes sleep discomfort in most cases.
Therefore, a more refined non-contact diagnostic method is needed to detect and diagnose the occurrence of bruxism in real time. Such a method would contribute substantially, at the source, to reducing the harm of bruxism and would meet the needs of the large population of healthy bruxers, while also monitoring other sleep states such as sneezing, yawning, sighing, moaning, and throat clearing.
Disclosure of Invention
To improve existing methods for diagnosing bruxism and overcome the shortcomings of current clinical diagnosis, namely its low specificity for bruxism and its inability to perform non-contact real-time diagnosis, the invention provides a portable, non-contact real-time diagnosis method and system for sleep bruxism (including related sleep states, hereinafter the same) based on non-verbal audio feature recognition.
The real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition can rapidly recognize original lossless audio while maintaining stable robustness and accuracy. The invention can monitor the occurrence of bruxism in real time, adds a new, broadly applicable algorithm for portable bruxism diagnosis devices, and provides an important basis for subsequent research on the early prevention and treatment of bruxism.
The invention provides a real-time sleep bruxism diagnosis method based on non-verbal audio feature recognition, which comprises the following specific steps:
step S1, audio acquisition: using an audio acquisition module to acquire audio data in the sleep environment;
step S2, framing and downsampling: converting the collected audio data into short-time stationary audio signals by framing in the time domain, and halving the sampling frequency;
step S3, feature extraction: extracting time-domain features and frequency-domain features from the framed audio samples;
step S4, feature group summarization: summarizing, from the per-frame feature arrays, feature groups of the variance and mean distributions of the various time-domain features and of the various frequency-domain features;
step S5, machine learning model selection: based on the above time-domain, frequency-domain and time-frequency-domain feature groups, inputting them into several machine learning models and selecting the optimal model for the diagnosis and evaluation of bruxism (and its other sleep states).
Further:
In step S1, using an audio acquisition module to acquire audio data in the sleep environment means that, in the natural sleep environment, a microphone is used to collect sound; the microphone is fixed in the central region of the bed, 30-50 cm vertically above the head of the sleep bruxism patient. Single-channel data are obtained with a sampling frequency of 44100 Hz, a sampling depth of 16 bits and the lossless WAV audio format, and the length of each captured audio data sample is 1 s.
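By way of illustration only, the acquisition step can be sketched in Python; the sounddevice and soundfile packages stand in for the microphone front-end described above, and the package choice, function name and file path are assumptions rather than part of the invention:

```python
import sounddevice as sd
import soundfile as sf

SR = 44100        # sampling frequency specified in step S1 (Hz)
DURATION = 1.0    # each captured audio sample is 1 s long

def capture_sample(path="sample.wav"):
    # Record 1 s of single-channel 16-bit audio from the default microphone.
    audio = sd.rec(int(SR * DURATION), samplerate=SR, channels=1, dtype="int16")
    sd.wait()                                    # block until the recording finishes
    sf.write(path, audio, SR, subtype="PCM_16")  # store as lossless WAV
    return audio.squeeze()                       # 1-D int16 array of 44100 samples
```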
In step S2, the collected audio data are converted into short-time stationary audio signals by framing in the time domain, and the sampling frequency is halved. Specifically: the collected audio data are downsampled; the non-stationary audio signal is then converted into short-time stationary audio signals by framing in the time domain, with a frame size of 1024 sampling points and a hop (frame shift) of 512 points.
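A possible implementation of this step, sketched with the librosa library (the choice of librosa and the halved target rate of 22050 Hz are illustrative assumptions consistent with the description above):

```python
import numpy as np
import librosa

FRAME_LENGTH = 1024   # sampling points per frame (step S2)
HOP_LENGTH = 512      # frame shift of 512 points

def frame_signal(y, sr=44100):
    """Downsample by half and split the signal into short-time frames."""
    y = np.asarray(y, dtype=np.float32)
    y_ds = librosa.resample(y, orig_sr=sr, target_sr=sr // 2)   # 44100 Hz -> 22050 Hz
    frames = librosa.util.frame(y_ds, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH)
    return y_ds, frames.T    # frames.T has shape (number of frames, 1024)
```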
In step S3, extracting the time-domain features and frequency-domain features from the framed audio samples comprises the following sub-steps:
step S3-1, extracting time-domain features, namely AE (amplitude envelope), RMS (root-mean-square energy) and ZCR (zero-crossing rate), calculated specifically as follows:
the AE formula is:
$$AE_t = \max_{k = tK}^{(t+1)K - 1} |s(k)|, \tag{1}$$
the RMS calculation formula is:
$$RMS_t = \sqrt{\frac{1}{K} \sum_{k = tK}^{(t+1)K - 1} s(k)^2}, \tag{2}$$
the ZCR calculation formula is:
$$ZCR_t = \frac{1}{2} \sum_{k = tK}^{(t+1)K - 1} \left| \operatorname{sgn}\big(s(k)\big) - \operatorname{sgn}\big(s(k+1)\big) \right|, \tag{3}$$
In the above formulas, s(k) denotes the audio signal and k the index of its sampling points; K denotes the number of sampling points in one frame, and t denotes the t-th frame, so that tK is the position of the first sampling point of the t-th frame and (t+1)K-1 is the position of its last sampling point; sgn is the sign function;
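The three time-domain features can be computed per frame directly from the framed signal of step S2; a minimal numpy sketch following equations (1)-(3) (the function name is illustrative only):

```python
import numpy as np

def time_domain_features(frames):
    """frames: array of shape (number of frames, K) from step S2."""
    ae = np.max(np.abs(frames), axis=1)            # amplitude envelope, eq. (1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))    # root-mean-square energy, eq. (2)
    # zero-crossing rate, eq. (3): half the number of sign changes within each frame
    zcr = 0.5 * np.sum(np.abs(np.diff(np.sign(frames), axis=1)), axis=1)
    return ae, rms, zcr
```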
step S3-2, extracting frequency domain features, namely BER (Band Energy Ratio), SC (Spectral centroid), BW (Bandwidth) and SR (Spectral Roll-off), wherein the frequency domain features are calculated as follows:
BER is calculated as:
$$BER_t = \frac{\sum_{n=1}^{F-1} m_t(n)^2}{\sum_{n=F}^{N} m_t(n)^2}, \tag{4}$$
the SC calculation formula is:
$$SC_t = \frac{\sum_{n=1}^{N} n \, m_t(n)}{\sum_{n=1}^{N} m_t(n)}, \tag{5}$$
BW is calculated as:
$$BW_t = \frac{\sum_{n=1}^{N} |n - SC_t| \, m_t(n)}{\sum_{n=1}^{N} m_t(n)}, \tag{6}$$
the SR calculation formula is:
$$\sum_{i=1}^{f_c} m_i \geq \eta \sum_{i=1}^{N} m_i, \tag{7}$$
In the above formulas, t denotes the t-th frame and m_t(n) the magnitude of the n-th spectral point of frame t; F is the split frequency (bin) chosen for the band-energy ratio, N is the number of spectral points per frame, f_c is the roll-off point satisfying the inequality, η is the chosen roll-off proportion (commonly 0.85), and m_i denotes the spectral energy of point i.
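A sketch of these frequency-domain features computed from the short-time magnitude spectrum; librosa's built-in spectral descriptors are used where available (they return values in Hz rather than bin index, which differs from equations (5)-(6) only by a constant scale), and the band-energy ratio follows equation (4) directly. The 2000 Hz split frequency and the 0.85 roll-off proportion are illustrative assumptions:

```python
import numpy as np
import librosa

def frequency_domain_features(y, sr=22050, n_fft=1024, hop=512, split_hz=2000):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))       # magnitude spectrum per frame
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    split = np.searchsorted(freqs, split_hz)                        # bin index F in eq. (4)
    ber = np.sum(S[:split] ** 2, axis=0) / (np.sum(S[split:] ** 2, axis=0) + 1e-12)  # eq. (4)
    sc = librosa.feature.spectral_centroid(S=S, sr=sr)[0]           # eq. (5), in Hz
    bw = librosa.feature.spectral_bandwidth(S=S, sr=sr)[0]          # close to eq. (6), in Hz
    roll = librosa.feature.spectral_rolloff(S=S, sr=sr, roll_percent=0.85)[0]  # eq. (7)
    return ber, sc, bw, roll
```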
In terms of frequency-domain features, the complementary harmonic and perceptual (Harmonics and Perceptual) features are calculated as:
$$H(t) = A \sin(2\pi f t + \varphi), \tag{8}$$
$$S = K \cdot 10^{L/10}, \tag{9}$$
where H(t) is the amplitude function of the harmonic, representing the amplitude of the harmonic at time t; A is the amplitude of the harmonic, representing its maximum amplitude; f is the frequency of the harmonic in hertz; t is time in seconds; φ is the phase of the harmonic, representing its phase difference relative to the fundamental frequency; S is the sound pressure level, L is the sound intensity level, and K is a constant.
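The description does not tie the harmonic and perceptual features to a specific toolkit; one plausible reading, sketched below purely as an assumption, takes the per-frame energy of the harmonic component (from harmonic-percussive separation) as the "harmonics" feature and a perceptually weighted power spectrum as the "perceptual" feature:

```python
import numpy as np
import librosa

def harmonic_perceptual_features(y, sr=22050, n_fft=1024, hop=512):
    # Assumed interpretation of "harmonics": RMS energy of the harmonic component per frame.
    y_harm = librosa.effects.harmonic(y)
    harmonics = librosa.feature.rms(y=y_harm, frame_length=n_fft, hop_length=hop)[0]
    # Assumed interpretation of "perceptual": A-weighted power spectrum, averaged per frame (dB).
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    perceptual = librosa.perceptual_weighting(S, freqs).mean(axis=0)
    return harmonics, perceptual
```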
Step S3-3, extracting the time-frequency-domain features, namely the mel-frequency cepstral coefficients (MFCC). An FFT (fast Fourier transform) is applied to each frame from step S2 to obtain the spectrum and then the magnitude spectrum; a mel filter bank is applied to the magnitude spectrum, the logarithm of the filter-bank outputs is taken, and finally a discrete cosine transform (DCT) yields the MFCC, expressed as:
$$MFCC_i = \sum_{l=1}^{L} m(l) \cos\!\left[\frac{\pi i}{L}\left(l - \frac{1}{2}\right)\right], \tag{10}$$
where i indexes the i-th MFCC coefficient, N is the number of sampling points per frame, l indexes the l-th mel filter, m(l) is the (log) output of the l-th filter, and L is the total number of filters.
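This chain (framing, FFT, mel filter bank, logarithm, DCT) is what librosa.feature.mfcc implements; a sketch producing 20 coefficients per frame, the count implied by the mfcc1-mfcc20 names in step S4, with parameter values carried over from steps S1-S2:

```python
import librosa

def mfcc_features(y, sr=22050, n_fft=1024, hop=512, n_mfcc=20):
    # FFT -> magnitude spectrum -> mel filter bank -> log -> DCT, per frame (eq. (10)).
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)   # shape (20, number of frames)
```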
In step S4, feature groups of the time-domain variance and mean distributions and of the frequency-domain variance and mean distributions are summarized from the per-frame feature arrays. Specifically, the features of each frame are computed with the formulas of step S3 to obtain a feature array per feature; for example, for the RMS feature, a feature array denoted RMS-1, RMS-2, RMS-3, RMS-4, ..., RMS-i is obtained, where i is the number of frames. From each feature array, the variance-distribution and mean-distribution features (e.g. RMS_mean, RMS_var) are obtained, giving the following set of 56 feature values in total (an aggregation sketch follows the list below), specifically:
'envelope_mean','envelope_var','RMS_mean','RMS_var','zero_mean','zero_var',
'centroids_mean','centroids_var','bandwith_mean','bandwidth_var','rolloff_mean',
'rolloff_var','harmonics_mean','harmonics_var','perceptrual_mean','perceptrual_var',
'mfcc1_mean','mfcc2_mean','mfcc3_mean','mfcc4_mean','mfcc5_mean','mfcc6_mean',
'mfcc7_mean','mfcc8_mean','mfcc9_mean','mfcc10_mean','mfcc11_mean','mfcc12_mean',
'mfcc13_mean','mfcc14_mean','mfcc15_mean','mfcc16_mean','mfcc17_mean','mfcc18_mean','mfcc19_mean','mfcc20_mean',
'mfcc1_var','mfcc2_var','mfcc3_var','mfcc4_var','mfcc5_var','mfcc6_var','mfcc7_var',
'mfcc8_var','mfcc9_var','mfcc10_var','mfcc11_var','mfcc12_var','mfcc13_var',
'mfcc14_var','mfcc15_var','mfcc16_var','mfcc17_var','mfcc18_var','mfcc19_var','mfcc20_var';
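A sketch of the mean/variance aggregation that turns the per-frame feature arrays into the 56-dimensional feature vector listed above; the dictionary keys follow the naming convention of the list, and the helper structure is illustrative only:

```python
import numpy as np

def aggregate_features(named_arrays):
    """named_arrays: dict such as {'envelope': ae, 'RMS': rms, 'zero': zcr, 'mfcc1': mfcc[0], ...},
    each value being the per-frame feature array of one 1 s audio sample."""
    feats = {}
    for name, arr in named_arrays.items():
        feats[f"{name}_mean"] = float(np.mean(arr))   # mean over all frames
        feats[f"{name}_var"] = float(np.var(arr))     # variance over all frames
    return feats    # 8 scalar features x 2 + 20 MFCCs x 2 = 56 values
```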
In step S5, a machine learning method is used to construct the optimal model for predicting sleep bruxism and the other sleep states, and the model is used for the diagnosis of bruxism and the evaluation of the other sleep states:
After a correlation analysis of the feature values, the feature values from step S4 are input into typical machine learning network models, including but not limited to: Naive Bayes, K-nearest neighbor (KNN), stochastic gradient descent (Stochastic Gradient Descent), decision tree (Decision Tree), random forest (Random Forest), support vector machine (Support Vector Machine), logistic regression (Logistic Regression), neural network (Neural Net), Catboost, Cross Gradient Booster, and Cross Gradient Booster (Random Forest). These 11 models are compared, and the best model for the application of real-time diagnosis of bruxism and its other sleep states from non-verbal audio features is selected.
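A comparison sketch using scikit-learn implementations of eight of the eleven listed models (Catboost and the two gradient-boosting variants would be added analogously through their own packages); the hyper-parameters, train/test split and macro averaging of F1 are assumptions, not prescriptions of the invention:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def compare_models(X, y):
    """X: (n_samples, 56) feature matrix from step S4; y: sleep-state labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    models = {
        "Naive Bayes": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "Stochastic Gradient Descent": SGDClassifier(),
        "Decision Tree": DecisionTreeClassifier(),
        "Random Forest": RandomForestClassifier(),
        "Support Vector Machine": SVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Neural Net": MLPClassifier(max_iter=1000),
    }
    scores = {}
    for name, model in models.items():
        clf = make_pipeline(StandardScaler(), model)   # standardize the 56 features
        clf.fit(X_tr, y_tr)
        # F1 balances precision and recall across the 16 sleep-state classes
        scores[name] = f1_score(y_te, clf.predict(X_te), average="macro")
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))
```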
The criterion for selecting the optimal model among the different models is the value of the F-score, a combined trade-off between precision (Precision) and recall (Recall). The F-score is introduced as a comprehensive index, i.e. to balance the influence of precision and recall and evaluate a classifier more comprehensively; it is the harmonic mean of precision and recall.
The calculation formula for the F-score (F1) is:
$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN},$$
$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},$$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively: a true positive (TP) is a positive sample predicted by the model as the positive class, a true negative (TN) is a negative sample predicted as the negative class, a false positive (FP) is a negative sample predicted as the positive class, and a false negative (FN) is a positive sample predicted as the negative class.
In the invention, the workflow of machine learning network model training is as follows: the audio samples are augmented to obtain a data set, which is divided into a training set and a test set; the training set is downsampled and framed to facilitate the subsequent algorithm; after framing, the feature array of each sample is obtained using the feature formulas above; after a correlation analysis of the feature values, they are input into the different network models for training, and the strengths and weaknesses of the different machine learning network models are verified on the test set; the optimal network model is selected according to the F1 index, and the prediction results for bruxism and the other sleep states are analyzed.
The workflow of the invention is as follows: in the sleep environment, an audio acquisition module placed near the patient collects the sounds made during sleep; the module captures 1 s of audio data at a time; the MCU downsamples and frames the audio data; the signal processing unit performs feature analysis on the processed audio; and the feature data are fed into the trained optimal model, which outputs a result in real time, thereby realizing the diagnosis and prediction of sleep bruxism and the other sleep states.
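Putting the pieces together, a minimal real-time loop following the above workflow; every helper function is one of the illustrative sketches given earlier, best_model is the classifier selected in step S5, and the feature ordering must match the one used during training (the band-energy ratio is computed in step S3-2 but is not among the 56 aggregated values of step S4, so it is omitted here):

```python
import numpy as np

def diagnose_once(best_model):
    """Capture one 1 s sample, extract the 56 features, and return the predicted sleep state."""
    y = capture_sample().astype(np.float32) / 32768.0     # 16-bit PCM -> [-1, 1]
    y_ds, frames = frame_signal(y)                         # step S2: downsample + frame
    ae, rms, zcr = time_domain_features(frames)            # step S3-1
    _ber, sc, bw, roll = frequency_domain_features(y_ds)   # step S3-2 (BER not aggregated)
    harm, perc = harmonic_perceptual_features(y_ds)        # step S3-2 (complementary)
    mfcc = mfcc_features(y_ds)                             # step S3-3
    named = {"envelope": ae, "RMS": rms, "zero": zcr, "centroids": sc,
             "bandwidth": bw, "rolloff": roll, "harmonics": harm, "perceptrual": perc}
    named.update({f"mfcc{i + 1}": mfcc[i] for i in range(mfcc.shape[0])})
    feats = aggregate_features(named)                      # step S4: 56-dimensional vector
    x = np.array([list(feats.values())])                   # key order must match training
    return best_model.predict(x)[0]                        # e.g. "teeth-grinding"
```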
The invention also provides a real-time sleep bruxism diagnosis system based on the above real-time diagnosis method. The system comprises the following modules: an audio acquisition module, a framing and downsampling module, a feature extraction module, a feature group summarization module, and a machine learning model selection module; these 5 modules in turn perform the 5 steps of the real-time diagnosis method. The audio acquisition module performs the audio acquisition of step S1; the framing and downsampling module performs the framing and downsampling of step S2; the feature extraction module performs the feature extraction of step S3; the feature group summarization module performs the feature group summarization of step S4; and the machine learning model selection module performs the machine learning model selection of step S5.
The real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition provided by the invention have the following advantages:
1. by rapidly recognizing the audio signal, the invention can diagnose the occurrence of bruxism and other sleep states in real time; the audio acquisition system has no physical contact with the patient and, unlike the various attached portable devices and intra-oral devices, causes no discomfort during sleep, thereby meeting the needs of patients without pathological bruxism;
2. by rapidly recognizing the audio signal, the invention can diagnose bruxism the moment it occurs and issue a diagnostic indication promptly, avoiding the previous practice of summarizing a diagnosis from whole-night data only after the fact, which does nothing to mitigate the harm bruxism does to the body;
3. the invention uses audio in the standard lossless WAV format as the input signal, ensuring the accuracy and reliability of the data and the reproducibility of the experiments;
4. the data set comes from an open-source database, so the experimental data are rich and broadly applicable;
5. the algorithm fully exploits machine learning features, adopts several commonly used machine learning models, and compares them;
6. the algorithm has high specificity and robustness: in a natural sleep environment it can distinguish, with high accuracy, the various noises related to teeth-grinding sounds and the other sleep states, namely 16 sleep states in total: sneezing, screaming, moaning, crying, yawning, tongue-clicking, laughing, lip-popping, throat-clearing, lip-smacking, nose-blowing, coughing, sighing, teeth-grinding, teeth-chattering and panting;
7. the invention can monitor the onset of bruxism in real time and can provide key information such as the count of bruxism episodes and their energy ratio, which is of great significance for bruxism diagnosis reports and subsequent treatment guidance; it can also monitor the other sleep states.
Drawings
FIG. 1 is a block diagram of the real-time sleep bruxism diagnosis framework based on non-verbal audio feature recognition according to the present invention.
FIG. 2 is a comparative analysis of the network model selection according to the present invention.
FIG. 3 is the confusion matrix of the optimal network for recognizing teeth-grinding sounds (including the other sleep states).
FIG. 4 is a signal flow diagram for identifying teeth-grinding and the other sleep states.
FIG. 5 is a schematic diagram of the hardware in an example of identifying teeth-grinding and its other sleep states.
Detailed Description
The technical solutions of the embodiments of the present invention will be described fully and clearly below with reference to the accompanying drawings. It is apparent that the described embodiments are only representative embodiments of the present invention, not all of them. Based on the embodiments of the present invention, other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Embodiments of the present invention comprise two parts: a network model training portion and an implementation instance portion. The process of training the network model comprises the following steps:
step S1, an audio acquisition module is used for acquiring audio data in a sleep environment;
step S2, converting the acquired audio data into short-time stationary audio signals by framing in the time domain, and halving the sampling frequency;
step S3, extracting time-domain features, frequency-domain features and time-frequency-domain features from the framed audio samples;
step S4, summarizing, from the per-frame feature arrays, feature groups of the variance and mean distributions of the various time-domain features and of the various frequency-domain features;
step S5, based on the time-domain, frequency-domain and time-frequency-domain feature groups, constructing the optimal prediction model for sleep bruxism and its other sleep states by a machine learning method, for the diagnosis of bruxism and the evaluation of the other sleep states.
In step S1, an audio acquisition module is used to acquire audio data in the sleep environment: in the natural sleep environment, a microphone fixed in the central region of the bed, 30-50 cm vertically above the head of the sleep bruxism patient, collects the sound. Single-channel data are obtained with a sampling frequency of 44100 Hz, a sampling depth of 16 bits and the lossless WAV audio format, and the length of each captured audio data sample is 1 s.
Step S2 comprises the following two sub-steps: in step S2-1, the collected audio data are downsampled; in step S2-2, the non-stationary audio signal is converted into short-time stationary audio signals by framing in the time domain, with a frame size of 1024 sampling points and a hop (frame shift) of 512 points.
The feature extraction part of step S3 includes the following sub-steps:
Step S3-1 extracts the time-domain features after step S2; the time-domain features AE, RMS and ZCR are calculated as follows:
the AE (amplitude envelope) is calculated as:
$$AE_t = \max_{k = tK}^{(t+1)K - 1} |s(k)|,$$
the RMS (root-mean-square energy) is calculated as:
$$RMS_t = \sqrt{\frac{1}{K} \sum_{k = tK}^{(t+1)K - 1} s(k)^2},$$
the ZCR (zero-crossing rate) is calculated as:
$$ZCR_t = \frac{1}{2} \sum_{k = tK}^{(t+1)K - 1} \left| \operatorname{sgn}\big(s(k)\big) - \operatorname{sgn}\big(s(k+1)\big) \right|,$$
In the above formulas, s(k) denotes the audio signal and k the index of its sampling points; K denotes the number of sampling points in one frame, and t denotes the t-th frame, so that tK is the position of the first sampling point of the t-th frame and (t+1)K-1 is the position of its last sampling point; sgn is the sign function.
Step S3-2 extracts the frequency-domain features BER, SC, BW and SR after step S2, based on the following calculations:
the BER (band energy ratio) is calculated as:
$$BER_t = \frac{\sum_{n=1}^{F-1} m_t(n)^2}{\sum_{n=F}^{N} m_t(n)^2},$$
the SC (spectral centroid) is calculated as:
$$SC_t = \frac{\sum_{n=1}^{N} n \, m_t(n)}{\sum_{n=1}^{N} m_t(n)},$$
the BW (bandwidth) is calculated as:
$$BW_t = \frac{\sum_{n=1}^{N} |n - SC_t| \, m_t(n)}{\sum_{n=1}^{N} m_t(n)},$$
the SR (spectral roll-off) is the point f_c satisfying:
$$\sum_{i=1}^{f_c} m_i \geq \eta \sum_{i=1}^{N} m_i,$$
In the above formulas, t denotes the t-th frame and m_t(n) the magnitude of the n-th spectral point of frame t; F is the split frequency (bin) chosen for the band-energy ratio, N is the number of spectral points per frame, f_c is the roll-off point satisfying the inequality, η is the chosen roll-off proportion (commonly 0.85), and m_i denotes the spectral energy of point i.
In terms of frequency-domain features, the complementary harmonic and perceptual (Harmonics and Perceptual) features are calculated as:
$$H(t) = A \sin(2\pi f t + \varphi),$$
$$S = K \cdot 10^{L/10},$$
where H(t) is the amplitude function of the harmonic, representing the amplitude of the harmonic at time t; A is the amplitude of the harmonic, representing its maximum amplitude; f is the frequency of the harmonic in hertz; t is time in seconds; φ is the phase of the harmonic, representing its phase difference relative to the fundamental frequency; S is the sound pressure level, L is the sound intensity level, and K is a constant.
Step S3-3, extracting the time-frequency-domain features, namely the mel-frequency cepstral coefficients (MFCC). An FFT (fast Fourier transform) is applied to each frame from step S2 to obtain the spectrum and then the magnitude spectrum; a mel filter bank is applied to the magnitude spectrum, the logarithm of the filter-bank outputs is taken, and finally a discrete cosine transform (DCT) yields the MFCC, expressed as:
$$MFCC_i = \sum_{l=1}^{L} m(l) \cos\!\left[\frac{\pi i}{L}\left(l - \frac{1}{2}\right)\right],$$
where i indexes the i-th MFCC coefficient, N is the number of sampling points per frame, l indexes the l-th mel filter, m(l) is the (log) output of the l-th filter, and L is the total number of filters.
In step S4, according to the feature calculation formulas of step S3, the features of each frame (e.g. the RMS feature) are computed to obtain a feature array (e.g. [RMS-1, RMS-2, RMS-3, RMS-4, ..., RMS-i], where i is the number of frames), and from each feature array the variance-distribution and mean-distribution features (e.g. RMS_mean, RMS_var) are obtained, giving 56 features in total:
'envelope_mean','envelope_var','RMS_mean','RMS_var','zero_mean','zero_var',
'centroids_mean','centroids_var','bandwith_mean','bandwidth_var','rolloff_mean',
'rolloff_var','harmonics_mean','harmonics_var','perceptrual_mean','perceptrual_var',
'mfcc1_mean','mfcc2_mean','mfcc3_mean','mfcc4_mean','mfcc5_mean','mfcc6_mean',
'mfcc7_mean','mfcc8_mean','mfcc9_mean','mfcc10_mean','mfcc11_mean','mfcc12_mean',
'mfcc13_mean','mfcc14_mean','mfcc15_mean','mfcc16_mean','mfcc17_mean','mfcc18_mean','mfcc19_mean','mfcc20_mean',
'mfcc1_var','mfcc2_var','mfcc3_var','mfcc4_var','mfcc5_var','mfcc6_var','mfcc7_var',
'mfcc8_var','mfcc9_var','mfcc10_var','mfcc11_var','mfcc12_var','mfcc13_var',
'mfcc14_var','mfcc15_var','mfcc16_var','mfcc17_var','mfcc18_var','mfcc19_var','mfcc20_var';
The summary features are as follows:
Feature class | Count
envelope 2
RMS 2
zero 2
centroids 2
bandwidth 2
rolloff 2
harmonics 2
perceptual 2
MFCC 40
Total 56
In step S5, after the correlation analysis of the feature values, the feature values from step S4 are input into typical machine learning network models, including but not limited to: Naive Bayes, K-nearest neighbor (KNN), stochastic gradient descent (Stochastic Gradient Descent), decision tree (Decision Tree), random forest (Random Forest), support vector machine (Support Vector Machine), logistic regression (Logistic Regression), neural network (Neural Net), Catboost, Cross Gradient Booster, and Cross Gradient Booster (Random Forest). These 11 models are compared, and the analysis yields the best model for the application of real-time diagnosis of sleep bruxism and its other sleep states from non-verbal audio features.
In step S5, the criterion for selecting the optimal model among the different models is the value of the F-score, a combined trade-off between precision (Precision) and recall (Recall). The F-score is introduced as a comprehensive index, i.e. to balance the influence of precision and recall and evaluate a classifier more comprehensively; it is the harmonic mean of precision and recall.
F1 (i.e. the F-score) is calculated as:
$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN},$$
$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},$$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively: TP is a positive sample predicted by the model as the positive class, TN is a negative sample predicted as the negative class, FP is a negative sample predicted as the positive class, and FN is a positive sample predicted as the negative class.
Applying the F1-score selection criterion to the above network models, the results are summarized as follows:
Model Score
0 Cat-boost 0.88827
1 Cross Gradient Booster 0.84358
2 Random Forest 0.83240
3 Support Vector Machine 0.76536
4 Logistic Regression 0.74302
5 Cross Gradient Booster(Random Forest) 0.72905
6 Neural Nets 0.71229
7 KNN 0.62011
8 Naive Bayes 0.60335
9 Stochastic Gradient Descent 0.58380
10 Decision trees 0.58380
Among these 11 commonly used machine learning network models, Catboost scores highest. The precision and recall on the test set for the bruxism-related sounds (teeth-grinding and teeth-chattering) and the 14 other sleep states are summarized in the following table:
Sleep state Precision Recall F1-score
coughing 0.85 0.96 0.90
crying 1.00 0.84 0.91
laughing 0.77 0.59 0.67
lip-popping 0.90 0.96 0.93
lip-smacking 0.79 0.83 0.81
moaning 0.85 0.88 0.87
nose-blowing 0.88 0.84 0.86
panting 0.92 0.92 0.92
screaming 1.00 0.97 0.98
sighing 1.00 0.80 0.89
sneezing 0.93 0.81 0.87
teeth-chattering 0.94 0.91 0.92
teeth-grinding 0.83 0.86 0.84
throat-clearing 0.71 0.95 0.82
tongue-clicking 0.96 0.96 0.96
yawning 0.94 0.89 0.91
Embodiments of the present invention comprise two parts: a network model training part and an implementation example part. The network model training part is shown in FIG. 1, and the implementation can be built as the following simple system in the manner of FIG. 5:
the invention is based on the method of the non-language audio frequency characteristic recognition and the real-time diagnosis of the night-time tooth and other sleep states, adopting an audio acquisition module, an MCU microcontroller, a power module, a signal processing unit, a storage module and an OLCD display screen; the MCU drives an audio acquisition module to acquire 1s of unit audio signal each time, the format is lossless WAV standard format audio, and the sampling frequency is 44100HZ; after the audio module acquires the unit audio, the signal processing unit calculates the characteristic group of the unit audio and stores the data in the memory; inputting the characteristic group into the trained neural network, outputting specific indication parameters corresponding to the specific sleep molar state and reserving the indication parameters. The OLCD display module is used for realizing the identification instruction of the occurrence of the molar. The MCU identifies the occurrence of tooth grinding through the algorithm processing of fig. 4, and the identification result instruction is displayed on a display screen, so that the OLCD can carry out decision identification on each input signal under the control of the MCU and display the signals in real time; the storage module is used for storing the total count of the occurrence times of the whole late teeth grinding and storing the characteristic parameters. Providing reliable data basis for doctor's evaluation and subsequent treatment.

Claims (8)

1. A real-time sleep bruxism diagnosis method based on non-verbal audio feature recognition, characterized by comprising the following specific steps:
step S1, audio acquisition: an audio acquisition module is used for acquiring audio data in a sleep environment;
step S2, framing and downsampling: converting the collected audio data into short-time stationary audio signals by framing in the time domain, and halving the sampling frequency;
step S3, feature extraction: extracting time-domain features and frequency-domain features from the framed audio samples;
step S4, feature group summarization: summarizing, from the per-frame feature arrays, feature groups of the variance and mean distributions of the various time-domain features and of the various frequency-domain features;
step S5, machine learning model selection: based on the time-domain, frequency-domain and time-frequency-domain feature groups, inputting them into several machine learning models and selecting the optimal model for the diagnosis and evaluation of bruxism.
2. The method according to claim 1, characterized in that in step S1, using an audio acquisition module to acquire audio data in the sleep environment means that, in the natural sleep environment, a microphone fixed in the central region of the bed, 30-50 cm vertically above the head of the sleep bruxism patient, collects the sound; single-channel data are obtained with a sampling frequency of 44100 Hz, a sampling depth of 16 bits and the lossless WAV audio format, and the length of each captured audio data sample is 1 s.
3. The method according to claim 2, characterized in that in step S2, the collected audio data are converted into short-time stationary audio signals by framing in the time domain, and the sampling frequency is halved; specifically: the collected audio data are downsampled, and the non-stationary audio signal is converted into short-time stationary audio signals by framing in the time domain, with a frame size of 1024 sampling points and a hop (frame shift) of 512 points.
4. The method according to claim 3, characterized in that step S3 of extracting time-domain features and frequency-domain features from the framed audio samples comprises the following sub-steps:
step S3-1, extracting time-domain features, namely AE, RMS and ZCR, calculated specifically as follows:
the AE formula is:
$$AE_t = \max_{k = tK}^{(t+1)K - 1} |s(k)|, \tag{1}$$
the RMS calculation formula is:
$$RMS_t = \sqrt{\frac{1}{K} \sum_{k = tK}^{(t+1)K - 1} s(k)^2}, \tag{2}$$
the ZCR calculation formula is:
$$ZCR_t = \frac{1}{2} \sum_{k = tK}^{(t+1)K - 1} \left| \operatorname{sgn}\big(s(k)\big) - \operatorname{sgn}\big(s(k+1)\big) \right|, \tag{3}$$
wherein s(k) denotes the audio signal and k the index of its sampling points; K denotes the number of sampling points in one frame, and t denotes the t-th frame, so that tK is the position of the first sampling point of the t-th frame and (t+1)K-1 is the position of its last sampling point; sgn is the sign function;
step S3-2, extracting frequency domain features, wherein the frequency domain features are BER, SC, BW, SR, and the calculation mode is as follows:
BER is calculated as:
$$BER_t = \frac{\sum_{n=1}^{F-1} m_t(n)^2}{\sum_{n=F}^{N} m_t(n)^2}, \tag{4}$$
the SC calculation formula is:
$$SC_t = \frac{\sum_{n=1}^{N} n \, m_t(n)}{\sum_{n=1}^{N} m_t(n)}, \tag{5}$$
BW is calculated as:
$$BW_t = \frac{\sum_{n=1}^{N} |n - SC_t| \, m_t(n)}{\sum_{n=1}^{N} m_t(n)}, \tag{6}$$
the SR calculation formula is:
$$\sum_{i=1}^{f_c} m_i \geq \eta \sum_{i=1}^{N} m_i, \tag{7}$$
in the above formulas, t denotes the t-th frame and m_t(n) the magnitude of the n-th spectral point of frame t; F is the split frequency (bin) chosen for the band-energy ratio, N is the number of spectral points per frame, f_c is the roll-off point satisfying the inequality, η is the chosen roll-off proportion, and m_i denotes the spectral energy of point i;
in terms of frequency-domain features, the complementary harmonic and perceptual (Harmonics and Perceptual) features are calculated as:
$$H(t) = A \sin(2\pi f t + \varphi), \tag{8}$$
$$S = K \cdot 10^{L/10}, \tag{9}$$
wherein H (t) is the amplitude function of the harmonic, representing the amplitude of the harmonic at time t; a is the amplitude of the harmonic, representing the maximum amplitude of the harmonic; f is the frequency of the harmonic in hertz; t is time in seconds; phi is the phase of the harmonic, representing the phase difference of the harmonic relative to the fundamental frequency; s is sound pressure level, L is sound intensity level, K is constant;
step S3-3, extracting the time-frequency-domain features, namely the mel-frequency cepstral coefficients (MFCC); an FFT (fast Fourier transform) is applied to each frame from step S2 to obtain the spectrum and then the magnitude spectrum, a mel filter bank is applied to the magnitude spectrum, the logarithm of the filter-bank outputs is taken, and finally a discrete cosine transform (DCT) yields the MFCC, expressed as:
$$MFCC_i = \sum_{l=1}^{L} m(l) \cos\!\left[\frac{\pi i}{L}\left(l - \frac{1}{2}\right)\right], \tag{10}$$
where i indexes the i-th MFCC coefficient, N is the number of sampling points per frame, l indexes the l-th mel filter, m(l) is the (log) output of the l-th filter, and L is the total number of filters.
5. The method according to claim 4, characterized in that in step S4, the feature groups of the time-domain variance and mean distributions and of the frequency-domain variance and mean distributions are summarized from the per-frame feature arrays; specifically, the features of each frame are computed with the feature formulas of step S3 to obtain a feature array per feature, and from each feature array the variance-distribution and mean-distribution features are obtained, giving the following 56 feature values in total, specifically:
'envelope_mean','envelope_var','RMS_mean','RMS_var','zero_mean','zero_var',
'centroids_mean','centroids_var','bandwith_mean','bandwidth_var','rolloff_mean',
'rolloff_var','harmonics_mean','harmonics_var','perceptrual_mean','perceptrual_var',
'mfcc1_mean','mfcc2_mean','mfcc3_mean','mfcc4_mean','mfcc5_mean','mfcc6_mean',
'mfcc7_mean','mfcc8_mean','mfcc9_mean','mfcc10_mean','mfcc11_mean','mfcc12_mean',
'mfcc13_mean','mfcc14_mean','mfcc15_mean','mfcc16_mean','mfcc17_mean','mfcc18_mean','mfcc19_mean','mfcc20_mean',
'mfcc1_var','mfcc2_var','mfcc3_var','mfcc4_var','mfcc5_var','mfcc6_var','mfcc7_var',
'mfcc8_var','mfcc9_var','mfcc10_var','mfcc11_var','mfcc12_var','mfcc13_var',
'mfcc14_var','mfcc15_var','mfcc16_var','mfcc17_var','mfcc18_var','mfcc19_var','mfcc20_var'.
6. The method according to claim 5, characterized in that in step S5, a machine learning method is used to construct the optimal prediction model for sleep bruxism and its other sleep states, for the diagnosis of bruxism and the evaluation of the other sleep states, specifically:
the feature values from step S4 are input into several machine learning network models; the models are compared, and the optimal model for real-time bruxism diagnosis is selected; the criterion for selecting the optimal model among the different models is the value of the F-score; the F-score is a combined trade-off between precision (Precision) and recall (Recall), and is the harmonic mean of precision and recall:
$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN},$$
$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},$$
wherein TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives respectively; f1 is F-score.
7. The method of claim 6, wherein the machine learning network models comprise Naive Bayes, K-nearest neighbor, stochastic gradient descent, decision tree, random forest, support vector machine, logistic regression, neural network, Catboost, Cross Gradient Booster, and Cross Gradient Booster (Random Forest).
8. A real-time sleep bruxism diagnosis system based on non-verbal audio feature recognition according to the real-time diagnosis method of one of claims 1 to 7, characterized by comprising the following modules: an audio acquisition module, a framing and downsampling module, a feature extraction module, a feature group summarization module, and a machine learning model selection module; these 5 modules in turn perform the 5 steps of the real-time diagnosis method.
CN202310091119.3A 2023-02-09 2023-02-09 Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition Pending CN116312647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310091119.3A CN116312647A (en) 2023-02-09 2023-02-09 Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310091119.3A CN116312647A (en) 2023-02-09 2023-02-09 Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition

Publications (1)

Publication Number Publication Date
CN116312647A true CN116312647A (en) 2023-06-23

Family

ID=86798732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310091119.3A Pending CN116312647A (en) 2023-02-09 2023-02-09 Real-time sleep bruxism diagnosis method and system based on non-verbal audio feature recognition

Country Status (1)

Country Link
CN (1) CN116312647A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination