WO2024008045A1 - A noise-induced hearing loss prediction system based on asymmetric convolution - Google Patents

A noise-induced hearing loss prediction system based on asymmetric convolution

Info

Publication number
WO2024008045A1
WO2024008045A1 (PCT/CN2023/105569, CN2023105569W)
Authority
WO
WIPO (PCT)
Prior art keywords
noise
features
hearing loss
module
induced hearing
Prior art date
Application number
PCT/CN2023/105569
Other languages
English (en)
French (fr)
Inventor
田雨
周天舒
李劲松
赵浩淇
Original Assignee
浙江大学
Priority date
Filing date
Publication date
Application filed by 浙江大学 filed Critical 浙江大学
Publication of WO2024008045A1

Classifications

    • G06F 18/2415 - Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06F 18/213 - Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F 18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/253 - Fusion techniques of extracted features
    • G06N 3/045 - Neural networks; combinations of networks
    • G06N 3/047 - Probabilistic or stochastic networks
    • G06N 3/048 - Activation functions
    • G06N 3/084 - Learning methods; backpropagation, e.g. using gradient descent
    • G10L 25/30 - Speech or voice analysis characterised by the analysis technique using neural networks
    • G10L 25/66 - Speech or voice analysis specially adapted for extracting parameters related to health condition
    • A61B 5/121 - Audiometering; evaluating hearing capacity
    • A61B 5/7267 - Classification of physiological signals or data, e.g. using neural networks, involving training the classification device
    • A61B 5/7275 - Determining trends in physiological measurement data; predicting development of a medical condition based on physiological measurements

Definitions

  • the invention relates to the field of medical information technology, and in particular to a noise-induced hearing loss prediction system based on asymmetric convolution.
  • Hearing loss is a major public health problem facing the world. Hearing damage can lead to long-term deficits in language cognition, comprehension, and social adaptability. Occupational exposure to complex noise is one of the main causes of hearing loss, and prolonged exposure to dangerous levels of noise can cause permanent hearing damage.
  • Noise is divided into steady-state noise and non-steady-state noise.
  • the current standard for assessing noise-induced hearing loss is the International Noise Exposure Standard (ISO-1999), which was established based on steady-state noise data from the 1950s to 1960s. It is therefore insensitive to the type of noise exposure and will underestimate the hearing loss caused by complex noise.
  • for assessing hearing loss caused by steady-state noise, the A-weighted equivalent sound pressure level L Aeq is the only recognized indicator.
  • the A-weighted equivalent sound pressure level is based on the assumption of equal energy and mainly evaluates the noise properties from an energy perspective. It is believed that the same A-weighted equivalent sound pressure level, that is, the same energy of noise, will cause the same loss to hearing.
  • using the A-weighted equivalent sound pressure level together with working-hours surveys to evaluate the biological effects of steady-state noise has been widely recognized by the academic community. However, everyday occupational-exposure noise is mostly non-steady complex noise, which is generally impulsive or impact-like. For complex noise the equal-energy assumption does not hold: compared with steady-state noise of the same energy, complex noise tends to cause greater hearing loss, especially complex non-steady-state noise with high peak values and high energy, for which the existing assessment standards often underestimate hearing loss. Building an effective noise-induced hearing loss prediction system is therefore of great significance for hearing-health protection.
  • the kurtosis value of Gaussian (steady-state) noise is equal to 3, and the kurtosis value of complex noise is greater than 3.
  • kurtosis reduces time-domain metrics such as pulse duration and pulse interval to a single parameter that is easy to calculate.
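As an aside from the patent text, the kurtosis criterion above can be sketched in a few lines (the simulated signals and sample sizes here are illustrative assumptions, not the study's data):

```python
import numpy as np

def pearson_kurtosis(x: np.ndarray) -> float:
    """Pearson kurtosis E[(x - mu)^4] / sigma^4; equals 3 for Gaussian noise."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()
    return float(np.mean((x - mu) ** 4) / sigma ** 4)

rng = np.random.default_rng(0)
gaussian = rng.normal(size=200_000)      # stand-in for steady-state (Gaussian) noise
impulses = gaussian.copy()
impulses[::1000] += 20.0                 # sparse high-amplitude impulses -> "complex" noise

print(pearson_kurtosis(gaussian))        # close to 3
print(pearson_kurtosis(impulses))        # well above 3
```

Note that many libraries (e.g. `scipy.stats.kurtosis`) report excess kurtosis by default, which is 0 rather than 3 for Gaussian noise.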
  • Zhao et al. [2] conducted a more in-depth study on this basis. Using the hearing test data of 163 textile-factory workers and 32 steel-factory workers, together with noise data from their working environments, they established dose-response curves between hearing damage rate and cumulative noise exposure. After adjusting the cumulative noise exposure index with kurtosis, they found that the dose-response curves of Gaussian noise and complex noise are nearly coincident, which means that the kurtosis-corrected cumulative noise exposure index assesses complex noise and Gaussian noise consistently and has the potential to be an effective indicator for assessing noise-induced hearing loss.
  • Zhao Yanxia used the support vector machine (SVM), the multi-layer perceptron (MLP) neural network, random forest, and the AdaBoost algorithm as alternative evaluation schemes for assessing hearing loss caused by complex noise. Thirty-nine effective noise feature parameters, such as the A-weighted equivalent sound pressure level, the C-weighted equivalent sound pressure level, and the kurtosis value, were selected, and three person-related features extracted from questionnaire data were combined with them to construct the final feature vector input to the models.
  • the purpose of the present invention is to propose a noise-induced hearing loss prediction system based on asymmetric convolution, in view of the shortcomings of the existing technology.
  • the system uses two asymmetric rectangular convolution kernels to extract energy features and time-domain change features respectively, and fuses the energy features with the time-domain change features and workers' personal features to predict noise-induced hearing loss.
  • a noise-induced hearing loss prediction system based on asymmetric convolution, which includes a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module.
  • the data collection module is used to collect noise data of workers' occupational exposure and workers' personal information
  • the data preprocessing module is used to standardize the worker's personal information data and input it into the feature fusion and noise-induced hearing loss prediction module, and to convert the one-dimensional noise data into a two-dimensional time-frequency spectrogram that is input into the feature extraction module;
  • the feature extraction module is used to extract energy features and time-domain change features from the time-frequency spectrogram using asymmetric convolution kernels of different shapes, and then inputs them into the feature fusion and noise-induced hearing loss prediction module;
  • the feature fusion and noise-induced hearing loss prediction module introduces an attention mechanism module to selectively enhance features with a large amount of information and suppress invalid features; it then fuses the energy features obtained by the feature extraction module with the time-domain change features, reduces their dimensionality, combines them with the worker's personal information to obtain the final features, and outputs, through a fully connected layer and a Softmax output layer, the prediction of whether the worker suffers from noise-induced hearing loss.
  • the worker's personal information includes age, length of service and hearing threshold information at different frequencies.
  • the data preprocessing module arranges the noise data into a matrix to form the original data set, and obtains the time-frequency spectrogram of the noise data through the discrete short-time Fourier transform.
  • the data preprocessing module standardizes workers' personal information as follows:

    d′_1 = (d_1 − μ_1) / σ_1,  d′_2 = (d_2 − μ_2) / σ_2

    where d_1 is the worker's age feature, d_2 is the worker's length-of-service feature, d′_1 and d′_2 are the corresponding standardized features, μ_1 and σ_1 are the mean and standard deviation of d_1, and μ_2 and σ_2 are the mean and standard deviation of d_2.
  • the feature extraction module uses asymmetric convolution kernels to extract energy features and time domain change features respectively;
  • the horizontal rectangular convolution kernel is more sensitive to amplitude changes of the same frequency component at adjacent times, and is used to extract features that represent time-domain changes;
  • the vertical rectangular convolution kernel is more sensitive to the amplitudes of adjacent frequency components at the same time, and is used to extract features that represent energy.
  • the feature extraction module uses horizontal and vertical convolution kernels to extract features from the input time-frequency spectrogram. After two asymmetric convolutions, three ordinary convolutions, and five pooling operations, the resulting time-domain change features and energy features are input into the feature fusion and noise-induced hearing loss prediction module.
  • the feature fusion and noise-induced hearing loss prediction module uses the attention mechanism module to model the correlation of each channel.
  • global average pooling and compression are performed on each channel of the energy features and time domain change features.
  • the global spatial information is used as the channel descriptor, and then the energy feature channel descriptor and the time domain change feature channel descriptor are concatenated in series, and then two fully connected layers are connected.
  • the Sigmoid function then outputs the weight of each channel feature; the network adjusts these weights according to the input data, thereby selectively enhancing features with greater information content and suppressing invalid features.
  • the energy features and time-domain change features processed by the attention mechanism module are flattened into two one-dimensional vectors through two Flatten layers; the two vectors are then concatenated, and two fully connected layers are connected to reduce the dimensionality of the features.
  • the present invention can accurately model complex problems through convolutional neural networks, thereby improving the accuracy of prediction of noise-induced hearing loss.
  • the present invention uses asymmetric convolution kernels to extract features from the time-frequency spectrogram: according to the characteristics of the spectrogram, two asymmetric rectangular convolution kernels extract energy features and time-domain change features respectively, and the feature fusion module combines the energy features with the time-domain change features and the workers' personal features.
  • the model performance does not depend on the manually selected noise-induced hearing loss-related feature parameters, and it also makes better use of the original noise data.
  • Figure 1 is a schematic structural diagram of a noise-induced hearing loss prediction system based on asymmetric convolution provided by the present invention.
  • Figure 2 is a schematic structural diagram of the feature extraction module of the present invention.
  • Figure 3 is a schematic structural diagram of the feature fusion and noise-induced hearing loss prediction module of the present invention.
  • Figure 4 is a schematic diagram of a feature extraction module provided in an embodiment of the present invention.
  • the present invention provides a noise-induced hearing loss prediction system based on asymmetric convolution, including a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module.
  • the data collection module is mainly used to collect noise data of workers' occupational exposure and workers' personal information including age, length of service, and hearing thresholds at different frequencies;
  • the data preprocessing module is used to standardize workers' personal information data and input it into the feature fusion and noise-induced hearing loss prediction module, and to apply the short-time Fourier transform to the occupational-exposure noise data, converting the original one-dimensional noise data into a two-dimensional time-frequency spectrogram that is input into the feature extraction module;
  • the feature extraction module is used to extract energy features and time-domain change features from the time-frequency spectrogram using asymmetric convolution kernels of different shapes, and then inputs them into the feature fusion and noise-induced hearing loss prediction module;
  • the feature fusion and noise-induced hearing loss prediction module is used to fuse the energy features and the time-domain change features.
  • an attention mechanism module is introduced so that the network selectively enhances features with a large amount of information and suppresses invalid features; the energy features obtained by the feature extraction module are then fused with the time-domain change features and reduced in dimension through two fully connected layers; finally, the workers' personal information data are combined to obtain the final features, and the output is obtained through two further fully connected layers and a Softmax output layer.
  • Each module is described in detail below.
  • the data collection module is used to collect workers' occupational noise-exposure data during working hours, and to collect workers' personal information, specifically: age, gender, length of service, factory, type of work, and the hearing thresholds of both ears.
  • the time-frequency spectrogram obtained from the noise data by short-time Fourier transform can also be replaced by a spectrogram obtained with other time-frequency analysis techniques, such as the Wigner-Ville distribution, the smoothed pseudo-Wigner-Ville distribution, or the Choi-Williams distribution.
  • the present invention is explained using only the spectrogram obtained from the noise data through the discrete-time short-time Fourier transform.
  • STFT Short-Time Fourier Transform
  • the discrete short-time Fourier transform is defined as:

    DSTFT[k, q] = Σ_{r=0}^{N−1} C[r + k] · g[r] · e^(−j2πqr/N)

    where DSTFT[k, q] is the discrete short-time Fourier transform of the one-dimensional noise data C[r], k is the sampling point on the time axis, q is the sampling point on the frequency axis, j² = −1, g[·] is the window function, and N is the length of the window function.
  • the spectrogram of noise data represents the relationship between frequency distribution and window function delay. It is a visual representation of the time-related spectral information of the original noise data.
  • the obtained noise data time spectrum diagram is used as the input of the feature extraction module.
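The preprocessing step above can be sketched with SciPy's `stft` (a minimal sketch, not the patent's code: the sampling rate and the stand-in recording are assumptions; the 0.5 s window and 0.25 s overlap follow the embodiment described later in this document):

```python
import numpy as np
from scipy.signal import stft

fs = 4000                                # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)             # 10 s of signal
noise = np.random.default_rng(1).normal(size=t.size)   # stand-in noise recording

nperseg = int(0.5 * fs)                  # 0.5 s analysis window
noverlap = int(0.25 * fs)                # adjacent windows overlap by 0.25 s
f, tau, Z = stft(noise, fs=fs, nperseg=nperseg, noverlap=noverlap)

spectrogram = np.abs(Z)                  # SPEC[k, q]: magnitude spectrogram
print(spectrogram.shape)                 # (frequency bins, time frames)
```

The resulting 2-D array is the time-frequency spectrogram that the feature extraction module consumes: rows index frequency bins, columns index window positions in time.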
  • the data preprocessing module standardizes workers' personal information data and uses it as input to the feature fusion and noise-induced hearing loss prediction module.
  • the standardization is:

    d′_1 = (d_1 − μ_1) / σ_1,  d′_2 = (d_2 − μ_2) / σ_2

    where d′_1 is the standardized age feature of the worker, μ_1 and σ_1 are the mean and standard deviation of d_1; d′_2 is the standardized length-of-service feature, and μ_2 and σ_2 are the mean and standard deviation of d_2.
  • the feature extraction module is used to extract energy features and time-domain change features; existing research has shown that the damage noise causes to human hearing is related not only to the energy of the noise but also to its time-frequency characteristics: compared with steady-state noise of the same energy, complex noise causes greater damage to human hearing.
  • the spectrogram SPEC[k, q] obtained by the data preprocessing module expresses the relationship between the frequency distribution and the window delay. Along the horizontal time dimension, adjacent points reflect changes in the energy of the same frequency component; along the vertical frequency dimension, adjacent points reflect the distribution of energy across different frequency components within the same time window.
  • the present invention uses an asymmetric convolution kernel to extract energy characteristics and time domain change characteristics respectively.
  • the horizontal rectangular convolution kernel is more sensitive to amplitude changes of the same frequency component at adjacent times, and can more effectively extract features that represent time-domain changes;
  • the vertical rectangular convolution kernel is more sensitive to the amplitudes of adjacent frequency components at the same time, and can more effectively extract features that represent energy.
  • the feature extraction module uses horizontal and vertical convolution kernels to extract features from the input time-frequency spectrogram. After two asymmetric convolutions, three ordinary convolutions, and five pooling operations, the output time-domain change features and energy features are input into the feature fusion and noise-induced hearing loss prediction module.
  • the specific convolutional neural network structure is:
  • the feature fusion and noise-induced hearing loss prediction module is used to fuse energy features and time domain change features.
  • the attention mechanism module is used to enhance features with large amount of information and suppress invalid features.
  • the energy features obtained by the feature extraction module are fused with the time-domain change features and reduced in dimension; finally they are fused with the worker's personal information data and input into the classifier to obtain a prediction of whether the worker has hearing loss.
  • the channel attention mechanism is introduced in the feature fusion and noise-induced hearing loss prediction module to build a CNN neural network model for feature fusion and noise-induced hearing loss prediction:
  • the channel attention mechanism is first introduced to model the correlation between channels: global average pooling is performed on each channel of the energy features and time-domain change features, compressing the global spatial information into channel descriptors; the energy-feature channel descriptors and time-domain-change channel descriptors are then concatenated, followed by two fully connected layers; finally, a Sigmoid function outputs the weight of each channel feature.
  • the network will adjust the weight of each channel feature according to the input data, thereby selectively enhancing features with a large amount of information and suppressing invalid features.
  • the two sets of output features will be flattened into two one-dimensional vectors through two Flatten layers, and then the two one-dimensional vectors will be concatenated in series, and then two fully connected layers will be connected to reduce the dimensionality of the features;
  • the output features are concatenated with the worker's age, length of service and gender characteristics obtained by the data preprocessing module, and finally the prediction of whether the worker suffers from hearing loss is obtained through two fully connected layers and a Softmax output layer.
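The channel-attention step described above can be sketched in NumPy (channel counts, layer sizes, and the random weights below are illustrative assumptions, not the patent's trained parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
C = 32                                      # assumed channels per feature branch
energy_feat = rng.normal(size=(C, 8, 8))    # energy features (C, H, W)
temporal_feat = rng.normal(size=(C, 8, 8))  # time-domain change features (C, H, W)

# 1) global average pooling compresses each channel to one descriptor
desc = np.concatenate([energy_feat.mean(axis=(1, 2)),
                       temporal_feat.mean(axis=(1, 2))])          # shape (2C,)

# 2) two fully connected layers, then Sigmoid -> per-channel weights in (0, 1)
W1 = rng.normal(size=(2 * C // 4, 2 * C))   # bottleneck layer (size assumed)
W2 = rng.normal(size=(2 * C, 2 * C // 4))
weights = sigmoid(W2 @ np.maximum(W1 @ desc, 0.0))                # shape (2C,)

# 3) reweight each channel: informative channels enhanced, others suppressed
energy_out = energy_feat * weights[:C, None, None]
temporal_out = temporal_feat * weights[C:, None, None]
```

This mirrors the squeeze-and-excitation pattern: pool, two dense layers, Sigmoid gate, channel-wise rescale.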
  • the training set is D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where x_i represents the time-frequency spectrogram data of the i-th sample, y_i represents the label of the i-th sample, and m represents the number of samples.
  • the CNN neural network model has L layers; for convolutional layers the padding size is P and the stride is S; for pooling layers the size of the pooling region is u; the learning rate is α, the maximum number of iterations is Max, the stopping threshold is ε, the weight matrix is W, and the bias term is b.
  • if layer l is a convolutional or fully connected layer, the forward propagation is:

    a^{i,l} = σ(W^l ∗ a^{i,l−1} + b^l)

    where a^{i,l} is the tensor corresponding to x_i at layer l, W^l is the weight matrix of layer l, σ(·) is the activation function, b^l is the bias term of layer l, and ∗ represents matrix multiplication (the convolution operation for convolutional layers).
  • if layer l is a pooling layer:

    a^{i,l} = pool(a^{i,l−1})

    where a^{i,l} is the tensor corresponding to x_i at layer l and pool(·) is the max-pooling function.
  • for the output layer L:

    a^{i,L} = softmax(W^L a^{i,L−1} + b^L)

    where a^{i,L} is the tensor corresponding to x_i at layer L, W^L is the weight matrix of layer L, and b^L is the bias term of layer L.
  • the gradient at the output layer is:

    δ^{i,L} = (a^{i,L} − y_i) ⊙ σ′(z^{i,L})

    where δ^{i,L} is the gradient corresponding to x_i at layer L, a^{i,L} is the tensor corresponding to x_i at layer L, z^{i,L} is the input corresponding to x_i at layer L, and σ′(·) is the first derivative of the activation function.
  • if layer l+1 is a convolutional layer, the gradient back-propagates as:

    δ^{i,l} = δ^{i,l+1} ∗ rot180(W^{l+1}) ⊙ σ′(z^{i,l})

    where δ^{i,l} is the gradient corresponding to x_i at layer l, W^{l+1} is the weight matrix of layer l+1, rot180(·) rotates the matrix by 180 degrees, z^{i,l} is the input corresponding to x_i at layer l, σ′(·) is the first derivative of the activation function, and ⊙ represents the element-wise (matrix dot) product.
  • if layer l+1 is a pooling layer:

    δ^{i,l} = upsample(δ^{i,l+1}) ⊙ σ′(z^{i,l})

    where δ^{i,l} is the gradient corresponding to x_i at layer l, upsample(·) is the upsampling function, z^{i,l} is the input corresponding to x_i at layer l, and σ′(·) is the first derivative of the activation function.
  • if layer l+1 is a fully connected layer:

    δ^{i,l} = (W^{l+1})^T δ^{i,l+1} ⊙ σ′(z^{i,l})

    where δ^{i,l} is the gradient corresponding to x_i at layer l and σ′(·) is the first derivative of the activation function.
  • the weight update process is: if layer l is a fully connected layer, then:

    W^l ← W^l − (α/m) Σ_{i=1}^{m} δ^{i,l} (a^{i,l−1})^T
    b^l ← b^l − (α/m) Σ_{i=1}^{m} δ^{i,l}

    where W^l is the weight matrix of layer l, α is the learning rate, δ^{i,l} is the gradient corresponding to x_i at layer l, a^{i,l−1} is the tensor corresponding to x_i at layer l−1, b^l is the bias term of layer l, and m is the total number of samples.
  • if layer l is a convolutional layer:

    W^l ← W^l − (α/m) Σ_{i=1}^{m} δ^{i,l} ∗ rot180(a^{i,l−1})
    b^l ← b^l − (α/m) Σ_{i=1}^{m} Σ_{u,v} (δ^{i,l})_{u,v}

    where W^l is the weight matrix of layer l, α is the learning rate, δ^{i,l} is the gradient corresponding to x_i at layer l, a^{i,l−1} is the tensor corresponding to x_i at layer l−1, b^l is the bias term of layer l, m is the total number of samples, and u and v represent the positions of elements in the gradient matrix.
  • during training, the noise time-frequency spectrogram obtained by the data preprocessing module is used as the input of the feature extraction module, the sample's personal information features are used as the input of the feature fusion and noise-induced hearing loss prediction module, and the label is whether the sample has hearing loss;
  • dropout regularization and early stop methods are adopted to avoid data overfitting.
  • the worker's features obtained through the data preprocessing module are input into the trained model to obtain a prediction of whether the worker will suffer from hearing loss.
  • the data acquisition module of this system is used to collect data from each person through a noise digital recorder.
  • the personal information data includes age, length of service, and hearing thresholds at different frequencies (500 Hz, 1 kHz, 2 kHz, 3 kHz, 4 kHz, 6 kHz, 8 kHz).
  • the data preprocessing module of this system arranges the noise data of each sample into a matrix to form the original data set.
  • the time window length of the short-time Fourier transform is set to 0.5 seconds, and the two adjacent windows overlap by 0.25s.
  • the image dimension is 1207*1207.
  • the workers' personal information data are standardized and used as input to the feature fusion and noise-induced hearing loss prediction module. Whether the average hearing threshold of both ears at 1 kHz, 2 kHz, 3 kHz, and 4 kHz exceeds 25 dB is used as the measure of hearing loss: samples without hearing loss are labeled as positive examples, and samples with hearing loss are labeled as negative examples.
  • the data preprocessing module also standardizes the workers’ personal information data obtained by the data collection module and inputs it into the feature fusion and noise-induced hearing loss prediction module;
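The labeling rule above can be sketched as a small helper (a sketch only; the dictionary field names are hypothetical, introduced here for illustration):

```python
import numpy as np

# Labeling rule from the embodiment: hearing loss if the average of both ears'
# thresholds at 1, 2, 3 and 4 kHz exceeds 25 dB.
KEY_FREQS = ("1k", "2k", "3k", "4k")   # assumed field names, not from the patent

def label(thresholds_left: dict, thresholds_right: dict) -> int:
    """Return 1 for a positive example (no hearing loss), 0 for a negative one."""
    vals = [thresholds_left[f] for f in KEY_FREQS] + \
           [thresholds_right[f] for f in KEY_FREQS]
    mean_threshold = float(np.mean(vals))
    return 1 if mean_threshold <= 25.0 else 0

normal = {"1k": 10, "2k": 15, "3k": 20, "4k": 20}      # mean 16.25 dB
impaired = {"1k": 30, "2k": 35, "3k": 40, "4k": 45}    # mean 37.5 dB

print(label(normal, normal))      # 1: positive example (no loss)
print(label(impaired, impaired))  # 0: negative example (loss)
```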
  • the feature extraction module of this system uses an asymmetric convolution kernel to extract energy features and time domain change features respectively.
  • the specific structure of the network is:
  • Input image - convolution layer 1 (1×11 convolution kernel) - pooling layer 1 - convolution layer 2 (1×9 convolution kernel) - pooling layer 2 - convolution layer 3 (3×3 convolution kernel) - pooling layer 3 - convolution layer 4 (3×3 convolution kernel) - pooling layer 4 - convolution layer 5 (3×3 convolution kernel) - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
  • Input image - convolution layer 1 (11×1 convolution kernel) - pooling layer 1 - convolution layer 2 (9×1 convolution kernel) - pooling layer 2 - convolution layer 3 (3×3 convolution kernel) - pooling layer 3 - convolution layer 4 (3×3 convolution kernel) - pooling layer 4 - convolution layer 5 (3×3 convolution kernel) - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
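The effect of the two asymmetric first-layer kernels can be illustrated with random placeholder weights (a sketch only: the patent does not publish trained parameters, and the orientation convention, rows as frequency and columns as time, is an assumption):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 64))         # toy time-frequency spectrogram
                                         # rows: frequency bins, columns: time frames

k_horizontal = rng.normal(size=(1, 11))  # 1x11: spans adjacent time steps at one
                                         # frequency -> time-domain change branch
k_vertical = rng.normal(size=(11, 1))    # 11x1: spans adjacent frequencies at one
                                         # time step -> energy branch

temporal_map = convolve2d(spec, k_horizontal, mode="valid")   # (64, 54)
energy_map = convolve2d(spec, k_vertical, mode="valid")       # (54, 64)
```

Each branch shrinks the spectrogram only along the axis its kernel spans, which is what makes the horizontal kernel selective for temporal structure and the vertical kernel selective for spectral (energy) structure.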
  • the training process of the network model in the feature fusion and noise-induced hearing loss prediction module is as follows:
  • the time domain change features and energy features extracted by the feature extraction module are used as the input of the network model of the feature fusion and noise-induced hearing loss prediction modules.
  • the personal information features obtained by the data preprocessing module are used as the input of the deep layer of the network.
  • the label is whether the sample suffers from hearing loss; dropout regularization and early stopping are used during the training process to avoid overfitting.
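The early-stopping strategy mentioned above can be sketched as follows (the patience value and the mocked validation-loss sequence are assumptions; the real system would monitor the CNN's validation loss):

```python
def train_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, best_loss); stop after `patience` epochs without improvement."""
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # improvement: reset counter
        else:
            waited += 1
            if waited >= patience:                      # plateau: stop training
                break
    return best_epoch, best

# Validation loss improves, then plateaus -> training stops at the minimum.
epoch, loss = train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.8])
print(epoch, loss)  # 2 0.6
```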
  • the prediction results of whether the workers will suffer from hearing loss can be obtained.
  • the AUC (Area Under the Curve) predicted by this system can reach more than 0.82, allowing for more accurate hearing loss prediction.


Abstract

The present invention discloses a noise-induced hearing loss prediction system based on asymmetric convolution. The system comprises a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module. The data acquisition module collects workers' occupational noise exposure data and personal information. The data preprocessing module standardizes the personal information and converts the noise data into two-dimensional time-frequency spectrograms. The feature extraction module uses convolution kernels of different shapes to extract energy features and temporal-variation features. The feature fusion and noise-induced hearing loss prediction module fuses the energy and temporal-variation features, reduces their dimensionality, and combines them with the workers' personal information to output a prediction of whether a worker suffers from noise-induced hearing loss. By extracting features from the spectrograms with asymmetric convolution kernels, the invention improves the accuracy of noise-induced hearing loss prediction.

Description

A noise-induced hearing loss prediction system based on asymmetric convolution
Technical Field
The present invention relates to the field of medical information technology, and in particular to a noise-induced hearing loss prediction system based on asymmetric convolution.
Background Art
Hearing loss is a major public health problem worldwide; hearing impairment leads to long-term deficits in language cognition, comprehension, and social adaptability. Occupational exposure to complex noise is one of the main causes of hearing loss, and prolonged exposure to noise at hazardous levels causes permanent hearing damage.
Noise is divided into steady-state and non-steady-state noise. The current standard for assessing noise-induced hearing loss is the international noise exposure standard ISO 1999. Because that standard was built on steady-state noise data from the 1950s and 1960s, it is insensitive to the type of noise exposure and underestimates the hearing loss caused by complex noise. For assessing hearing loss caused by steady-state noise, the A-weighted equivalent sound pressure level (LAeq) is the only generally accepted metric. LAeq rests on the equal-energy hypothesis: it evaluates noise-induced hearing loss mainly from the energy perspective, assuming that noises with the same LAeq, i.e. the same energy, cause the same hearing loss; using LAeq together with working-hour surveys to evaluate the biological effects of steady-state noise is widely accepted in academia. Everyday occupational noise, however, is mostly non-steady-state complex noise with impulsive or impact components. For complex noise the equal-energy hypothesis does not hold: compared with steady-state noise of equal energy, complex noise usually causes greater hearing loss, especially when both the peak level and the energy are high, so existing assessment standards tend to underestimate such loss. Building an effective noise-induced hearing loss prediction system is therefore of great significance for hearing health protection.
Because occupational noise exposure is widespread, much research has addressed the tendency of existing assessment standards to underestimate hearing loss caused by complex noise. The technical solutions closest to the present invention are as follows:
(1) Kurtosis-adjusted cumulative noise exposure. Starting from the finding that complex noises with different temporal structures cause different amounts of hearing loss, researchers have proposed many metrics for evaluating the temporal structure of complex noise, including pulse peak level, pulse duration, and the interval between pulses; but such descriptions are impractical and hard to operate in real exposure environments. Qiu et al. [1] adopted kurtosis as the assessment metric for complex-noise-induced hearing damage. Kurtosis, the ratio of the fourth-order to the second-order central moment of a signal, measures how impulsive a random process is relative to a Gaussian distribution: Gaussian noise has a kurtosis of 3, complex noise has a kurtosis above 3, and a larger kurtosis indicates stronger impulsiveness. Using kurtosis as the assessment parameter condenses time-domain indicators such as pulse peak, duration, and interval into a single, easily computed parameter.
Zhao et al. [2] went further: by analyzing hearing test data from 163 textile-mill workers and 32 steel-mill workers together with noise data from their environments, they established dose-response curves between the rate of hearing impairment and cumulative noise exposure. After adjusting cumulative noise exposure with kurtosis, the dose-response curves for Gaussian and complex noise nearly coincided, meaning that kurtosis-adjusted cumulative noise exposure evaluates complex and Gaussian noise consistently and has the potential to serve as an effective metric for assessing noise-induced hearing damage.
(2) Machine-learning-based prediction models. Zhao Yanxia [3] approached the problem from a machine-learning perspective, taking support vector machines (SVM), multilayer perceptron (MLP) neural networks, random forests, and AdaBoost as candidate schemes for assessing hearing loss caused by complex noise. Features were screened with t-test univariate selection, yielding 39 effective noise feature parameters including the A-weighted equivalent sound pressure level, the C-weighted equivalent sound pressure level, and kurtosis; three person-related features extracted from questionnaire data completed the input feature vector. Comparison showed that the SVM model performed best on the noise-induced hearing loss prediction task, achieving better predictive performance than the ISO 1999 standard and offering a new way to accurately assess hearing damage caused by various complex noises.
Techniques similar to (1) are based on distribution statistics, fitting parameters to particular data sets. When the sample size is large relative to model complexity this works well and yields good assessment results, but if the model complexity is chosen wrongly, or the system under study is too complex to describe with a simple formula, large errors arise. Moreover, kurtosis is highly sensitive to four factors: background noise amplitude; pulse peak level; pulse duration and occurrence frequency; and the kurtosis computation window; this degrades its accuracy in assessing noise-induced hearing loss.
Techniques similar to (2) rely on prior knowledge and manually selected feature parameters related to noise-induced hearing loss to build the input feature vector; their accuracy depends on that manual selection, and they make limited use of the raw noise audio data.
References
[1] Qiu W, Hamernik R P, Davis B. The kurtosis metric as an adjunct to energy in the prediction of trauma from continuous, non-Gaussian noise exposures. Journal of the Acoustical Society of America, 2006, 120(6): 3901.
[2] Zhao Y M, Qiu W, Zeng L, et al. Application of the kurtosis statistic to the evaluation of the risk of hearing loss in workers exposed to high-level complex noise. Ear Hear, 2010, 31(4): 527-532.
[3] Zhao Yanxia. Research on machine-learning-based prediction models for hearing loss caused by complex noise [D]. (in Chinese)
Summary of the Invention
The object of the present invention is to address the shortcomings of the prior art by providing a noise-induced hearing loss prediction system based on asymmetric convolution, which uses two asymmetric rectangular convolution kernels to extract energy features and temporal-variation features respectively, fuses the energy features with the temporal-variation features and the worker-related personal features, and predicts noise-induced hearing loss.
This object is achieved by the following technical solution: a noise-induced hearing loss prediction system based on asymmetric convolution, comprising a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module;
the data acquisition module collects workers' occupational noise exposure data and worker personal information;
the data preprocessing module standardizes the worker personal information before feeding it to the feature fusion and noise-induced hearing loss prediction module, and converts the noise data into two-dimensional noise time-frequency spectrograms fed to the feature extraction module;
the feature extraction module uses asymmetric convolution kernels of different shapes to extract the energy features and temporal-variation features of the spectrograms, which are fed to the feature fusion and noise-induced hearing loss prediction module;
the feature fusion and noise-induced hearing loss prediction module introduces an attention mechanism that selectively strengthens informative features and suppresses ineffective ones, fuses the energy and temporal-variation features from the feature extraction module and reduces their dimensionality, and combines the result with the worker personal information into final features, from which fully connected layers and a Softmax output layer produce the prediction of whether a worker suffers from noise-induced hearing loss.
Further, the worker personal information includes age, years of service, and hearing thresholds at different frequencies.
Further, the data preprocessing module arranges the noise data into matrices as the raw data set and obtains the noise time-frequency spectrograms via the discrete-time short-time Fourier transform.
Further, the data preprocessing module standardizes the worker personal information as follows:

d′1=(d1-λ1)/σ1, d′2=(d2-λ2)/σ2

where d1 is the worker's age feature, d2 the years-of-service feature, d′1 the standardized age feature, λ1 the mean of d1, σ1 the standard deviation of d1, d′2 the standardized years-of-service feature, λ2 the mean of d2, and σ2 the standard deviation of d2.
Further, based on how noise damages hearing and on the characteristics of noise spectrogram images, the feature extraction module uses asymmetric convolution kernels to extract the energy features and the temporal-variation features separately: the horizontal rectangular kernel, more sensitive to amplitude changes at adjacent time instants of the same frequency, extracts the features characterizing temporal variation; the vertical rectangular kernel, more sensitive to amplitude levels at adjacent frequencies at the same instant, extracts the features characterizing energy.
Further, the feature extraction module applies horizontal and vertical convolution kernels to the input spectrogram images; after two asymmetric convolutions, three ordinary convolutions, and five pooling operations in each branch, the resulting temporal-variation and energy features are fed to the feature fusion and noise-induced hearing loss prediction module.
Further, the feature fusion and noise-induced hearing loss prediction module uses the attention mechanism to model the correlation between channels: global average pooling is first applied to each channel of the energy features and the temporal-variation features, compressing global spatial information into channel descriptors; the energy-feature channel descriptors are then concatenated with the temporal-variation-feature channel descriptors and followed by two fully connected layers; finally, through a Sigmoid function, the weight of each channel's features is adjusted according to the input data, selectively strengthening informative features and suppressing ineffective ones.
Further, the energy features and temporal-variation features processed by the attention mechanism are each flattened into one-dimensional vectors by two Flatten layers; the two vectors are concatenated and passed through two fully connected layers for dimensionality reduction; the reduced output features are concatenated with the worker personal information from the data preprocessing module; finally, two fully connected layers and a Softmax output layer produce the prediction of whether the worker suffers from noise-induced hearing loss.
Beneficial effects of the present invention:
1. By modeling complex problems accurately with a convolutional neural network, the invention improves the accuracy of noise-induced hearing loss prediction.
2. The invention extracts spectrogram features with asymmetric convolution kernels tailored to the spectrogram's characteristics: two asymmetric rectangular kernels extract the energy features and temporal-variation features respectively, and a feature fusion module combines them with the worker-related personal features. The model's performance therefore does not depend on manually selected feature parameters related to noise-induced hearing loss, and it makes fuller use of the raw noise data.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the structure of the noise-induced hearing loss prediction system based on asymmetric convolution provided by the present invention.
Fig. 2 is a schematic diagram of the feature extraction module of the present invention.
Fig. 3 is a schematic diagram of the feature fusion and noise-induced hearing loss prediction module of the present invention.
Fig. 4 is a schematic diagram of the feature extraction module in an embodiment of the present invention.
Detailed Description
Specific embodiments of the present invention are described in further detail below with reference to the drawings.
As shown in Fig. 1, the noise-induced hearing loss prediction system based on asymmetric convolution provided by the present invention comprises a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module. The data acquisition module collects workers' occupational noise exposure data and personal information including age, years of service, and hearing thresholds at different frequencies. The data preprocessing module standardizes the personal information before feeding it to the feature fusion and noise-induced hearing loss prediction module, and uses the short-time Fourier transform to convert the raw one-dimensional noise data into two-dimensional noise time-frequency spectrograms fed to the feature extraction module. The feature extraction module uses asymmetric convolution kernels of different shapes to extract the energy features and temporal-variation features of the spectrograms, which are fed to the feature fusion and noise-induced hearing loss prediction module. That module first introduces an attention mechanism so the network selectively strengthens informative features and suppresses ineffective ones, then fuses the energy and temporal-variation features from the feature extraction module and reduces their dimensionality through two fully connected layers, and finally combines them with the personal information data into the final features, producing the output through two fully connected layers and a Softmax output layer. Each module is described in detail below.
The data acquisition module collects workers' occupational noise exposure data during working hours and the workers' personal information, specifically: age, sex, years of service, factory, job type, and binaural hearing thresholds.
The data preprocessing module arranges each sample's noise data collected by the acquisition module into one-dimensional noise data C[r]=[c1,c2,...,ch,...,cO], where O is the total number of noise data points, r=1,2,...,O is the index, and ch is the h-th data point recorded by the noise recorder; the noise time-frequency spectrogram is then obtained via the discrete-time short-time Fourier transform. The STFT spectrogram can also be replaced by spectrograms obtained with other time-frequency analysis techniques such as the Wigner-Ville distribution, the smoothed pseudo Wigner-Ville distribution, or the Choi-Williams distribution; the invention is described here using only the discrete-time short-time Fourier transform.
The basic idea of the short-time Fourier transform (STFT) is to intercept the original signal with a sliding window function, split it into segments, apply Fourier analysis to each segment, and finally obtain the relation between the signal spectrum and the window delay, i.e. the joint two-dimensional time-frequency distribution of the signal.
For the collected noise data, the discrete short-time Fourier transform is defined as:

DSTFT[k,q]=Σr C[r]g[r-k]e^(-j2πqr/N)

where DSTFT[k,q] is the discrete short-time Fourier transform of the one-dimensional noise data C[r], q is the sample point on the frequency axis, k the sample point on the time axis, j²=-1, g[·] the window function, and N the window length. In the data preprocessing module, the raw noise data are further converted into the time-frequency spectrogram SPEC[k,q]:

SPEC[k,q]=|DSTFT[k,q]|²
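The spectrogram computation just defined (windowing, DFT, squared magnitude) can be sketched with NumPy alone. The Hann window, the toy signal, and the frame and hop values below are illustrative assumptions, not the embodiment's fixed parameters:

```python
import numpy as np

def spectrogram(c, win_len, hop):
    """SPEC[k, q] = |DSTFT[k, q]|**2 for a 1-D noise recording C[r].

    win_len is the window length N, hop the shift between adjacent
    windows; a Hann window stands in for the unspecified g[.].
    """
    g = np.hanning(win_len)                       # window function g[.]
    n_frames = 1 + (len(c) - win_len) // hop
    frames = np.stack([c[k * hop:k * hop + win_len] * g
                       for k in range(n_frames)])
    dstft = np.fft.rfft(frames, axis=1)           # DSTFT[k, q], q = 0..N/2
    return np.abs(dstft) ** 2                     # SPEC[k, q]

# toy example: 1 s of a 50 Hz tone sampled at 1 kHz
fs = 1000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 50 * t), win_len=128, hop=64)
print(spec.shape)   # (14, 65): 14 time frames, 65 frequency bins
```

With the embodiment's 0.5 s window and 0.25 s overlap, win_len and hop would correspond to half a second and a quarter second of samples respectively.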
The noise spectrogram expresses the relation between the frequency distribution and the window delay; it is a visual representation of the time-dependent spectral information of the raw noise data. The resulting spectrogram is used as the input of the feature extraction module.
Meanwhile, the data preprocessing module standardizes the workers' personal information as input to the feature fusion and noise-induced hearing loss prediction module.
Based on the worker personal information D=[d1,d2,d3] obtained by the acquisition module, where d1 is the worker's age feature, d2 the years-of-service feature, and d3 the worker's sex (0 for male, 1 for female), the data preprocessing module standardizes the data as follows:

d′1=(d1-λ1)/σ1

where d′1 is the standardized age feature, λ1 the mean of d1, and σ1 the standard deviation of d1;

d′2=(d2-λ2)/σ2

where d′2 is the standardized years-of-service feature, λ2 the mean of d2, and σ2 the standard deviation of d2.
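A minimal sketch of this z-score standardization, assuming the [d1, d2, d3] column layout described above (the example values are made up):

```python
import numpy as np

def standardize_personal_info(D):
    """Z-score age (d1) and years of service (d2); pass sex (d3) through."""
    D = np.asarray(D, dtype=float)
    out = D.copy()
    for j in (0, 1):                        # columns for d1 and d2
        lam = D[:, j].mean()                # λ_j, the feature mean
        sig = D[:, j].std()                 # σ_j, the feature std
        out[:, j] = (D[:, j] - lam) / sig   # d'_j = (d_j - λ_j) / σ_j
    return out

workers = [[30, 5, 0], [40, 15, 1], [50, 25, 0]]
print(standardize_personal_info(workers))
```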
The feature extraction module extracts the energy features and the temporal-variation features. Existing research has shown that the hearing loss caused by noise depends not only on the noise energy but also on the time-frequency characteristics of the noise itself: complex noise damages hearing more than steady-state noise of the same energy. The spectrogram SPEC[k,q] from the preprocessing module expresses the relation between frequency distribution and window delay: along the horizontal time dimension, adjacent points reflect the energy variation of the same frequency component; along the vertical energy dimension, adjacent points reflect the energy distribution of different frequency components within the same time window.
Based on these characteristics of noise-induced hearing damage and of noise spectrogram images, the invention uses asymmetric convolution kernels to extract the energy features and the temporal-variation features separately. The horizontal rectangular kernel, being more sensitive to amplitude changes at adjacent time instants of the same frequency, extracts the features characterizing temporal variation more effectively; the vertical rectangular kernel, being more sensitive to amplitude levels at adjacent frequencies at the same instant, extracts the features characterizing energy more effectively.
The feature extraction module applies horizontal and vertical convolution kernels to the input spectrogram images; after two asymmetric convolutions, three ordinary convolutions, and five pooling operations in each branch, the output temporal-variation and energy features are fed to the feature fusion and noise-induced hearing loss prediction module.
As shown in Fig. 2, the specific convolutional network structure is:
Input image - convolution layer 1 - pooling layer 1 - convolution layer 2 - pooling layer 2 - convolution layer 3 - pooling layer 3 - convolution layer 4 - pooling layer 4 - convolution layer 5 - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
Input image - convolution layer 1 - pooling layer 1 - convolution layer 2 - pooling layer 2 - convolution layer 3 - pooling layer 3 - convolution layer 4 - pooling layer 4 - convolution layer 5 - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
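How the two branches differ can be seen from a bare valid-mode 2-D cross-correlation: a 1x11 kernel slides along the time axis within one frequency row, while an 11x1 kernel slides along the frequency axis within one time column. The kernel shapes follow the embodiment; the loop implementation and random data are only a sketch:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Valid-mode 2-D cross-correlation, no padding, stride 1."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

spec = np.random.rand(64, 64)     # stand-in spectrogram patch (freq x time)
horiz = np.ones((1, 11)) / 11     # 1x11: spans adjacent time steps
vert = np.ones((11, 1)) / 11      # 11x1: spans adjacent frequencies

print(conv2d_valid(spec, horiz).shape)   # (64, 54)
print(conv2d_valid(spec, vert).shape)    # (54, 64)
```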
The feature fusion and noise-induced hearing loss prediction module fuses the energy features with the temporal-variation features: it first uses the attention mechanism to strengthen informative features and suppress ineffective ones, then fuses and reduces the dimensionality of the energy and temporal-variation features from the feature extraction module, and finally fuses the result with the worker personal information and feeds it into the classifier to predict whether the worker suffers from hearing loss.
The feature fusion and noise-induced hearing loss prediction module introduces a channel attention mechanism to build the CNN model for feature fusion and noise-induced hearing loss prediction:
As shown in Fig. 3, a channel attention mechanism is first introduced to model the correlation between channels. Global average pooling is applied to each channel of the energy features and the temporal-variation features, compressing global spatial information into channel descriptors; the energy-feature channel descriptors are concatenated with the temporal-variation channel descriptors and followed by two fully connected layers; a final Sigmoid function outputs the weight of each channel's features, which the network adjusts according to the input data, selectively strengthening informative features and suppressing ineffective ones. The two resulting groups of output features are each flattened into one-dimensional vectors by two Flatten layers; the two vectors are concatenated and passed through two fully connected layers for dimensionality reduction; the reduced output features are concatenated with the age, years-of-service, and sex features from the data preprocessing module; finally, two fully connected layers and a Softmax output layer produce the prediction of whether the worker suffers from hearing loss.
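The channel-attention step just described (global average pooling to channel descriptors, concatenation, two fully connected layers, Sigmoid reweighting) resembles a squeeze-and-excitation block and can be sketched as follows. The ReLU between the two fully connected layers and every layer size here are assumptions, since the text fixes neither:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(energy, temporal, W1, b1, W2, b2):
    """Reweight the channels of both branches by learned attention weights.

    energy, temporal: (C, H, W) feature maps; W1/b1 and W2/b2 are the two
    fully connected layers (hypothetical shapes (hidden, 2C) and (2C, hidden)).
    """
    feats = np.concatenate([energy, temporal], axis=0)  # (2C, H, W)
    desc = feats.mean(axis=(1, 2))                      # global average pooling
    hidden = np.maximum(0.0, W1 @ desc + b1)            # assumed ReLU
    w = sigmoid(W2 @ hidden + b2)                       # channel weights in (0, 1)
    scaled = feats * w[:, None, None]                   # selective strengthening
    C = energy.shape[0]
    return scaled[:C], scaled[C:]

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
energy = rng.standard_normal((C, H, W))
temporal = rng.standard_normal((C, H, W))
W1, b1 = rng.standard_normal((4, 2 * C)), np.zeros(4)
W2, b2 = rng.standard_normal((2 * C, 4)), np.zeros(2 * C)
e_out, t_out = channel_attention(energy, temporal, W1, b1, W2, b2)
print(e_out.shape, t_out.shape)   # (4, 8, 8) (4, 8, 8)
```

Because every Sigmoid weight lies strictly between 0 and 1, each channel is attenuated in proportion to how uninformative the network judges it to be.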
For the training sample set D={(x1,y1),...,(xi,yi),...,(xm,ym)}, where xi is the spectrogram data of the i-th sample, yi the label of the i-th sample, and m the number of samples: let the CNN model have L layers; for convolution layers let the padding be P and the stride S; for pooling layers let the pooling region size be u; let the learning rate be α, the maximum number of iterations Max, the stopping threshold ε, the weight matrices W, and the bias terms b. The CNN is then constructed as follows:
1. Initialize the weight matrices W and bias terms b of every hidden layer and the output layer to random values.
2. Forward propagation: for l=2,...,L-1, if layer l is a convolution layer or a fully connected layer, the forward propagation is:
ai,l=σ(Wl*ai,l-1+bl)
where ai,l is the tensor of xi at layer l, Wl the weight matrix of layer l, σ the activation function, bl the bias of layer l, and * denotes matrix multiplication.
If layer l is a pooling layer, the forward propagation is:
ai,l=pool(ai,l-1)
where ai,l is the tensor of xi at layer l and pool(·) is the max-pooling function.
The output of the final layer L is:
ai,L=softmax(WL*ai,L-1+bL)
where ai,L is the tensor of xi at layer L, WL the weight matrix of layer L, and bL the bias of layer L.
3. Gradient computation: the gradient of the output layer is computed through the loss function:
δi,L=∂J(W,b)/∂ai,L·σ′(zi,L)
where δi,L is the gradient of xi at layer L, J(W,b) the cost (loss) function, ai,L the tensor of xi at layer L, zi,L the input of xi at layer L, and σ′(·) the first derivative of the activation function.
For l=2,...,L-1, if layer l is a convolution layer, the backpropagation is:
δi,l=δi,l+1*rot180(Wl+1)·σ′(zi,l)
where δi,l is the gradient of xi at layer l, Wl+1 the weight matrix of layer l+1, rot180 denotes rotating a matrix by 180 degrees, zi,l the input of xi at layer l, σ′(·) the first derivative of the activation function, and · denotes element-wise matrix multiplication.
If layer l is a pooling layer, the backpropagation is:
δi,l=upsample(δi,l+1)·σ′(zi,l)
where δi,l is the gradient of xi at layer l, upsample(·) is the upsampling function, zi,l the input of xi at layer l, and σ′(·) the first derivative of the activation function.
If layer l is a fully connected layer, the backpropagation is:
δi,l=(Wl+1)Tδi,l+1·σ′(zi,l)
where δi,l is the gradient of xi at layer l, Wl+1 the weight matrix of layer l+1, T denotes transposition, zi,l the input of xi at layer l, and σ′(·) the first derivative of the activation function.
4. Weight update: if layer l is a fully connected layer, then:

Wl=Wl-(α/m)Σi=1..m δi,l(ai,l-1)T
bl=bl-(α/m)Σi=1..m δi,l

where Wl is the weight matrix of layer l, α the learning rate, δi,l the gradient of xi at layer l, ai,l-1 the tensor of xi at layer l-1, bl the bias of layer l, and m the total number of samples.
If layer l is a convolution layer, then for each convolution kernel:

Wl(u,v)=Wl(u,v)-(α/m)Σi=1..m(ai,l-1*δi,l)(u,v)
bl=bl-(α/m)Σi=1..m Σu,v(δi,l)u,v

where Wl is the weight matrix of layer l, α the learning rate, δi,l the gradient of xi at layer l, ai,l-1 the tensor of xi at layer l-1, bl the bias of layer l, m the total number of samples, and u, v index the position of an element in the gradient matrix.
5. Exit the iteration loop once every change in W and b is smaller than the threshold ε; otherwise iterate up to the maximum number of iterations Max.
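Steps 1-5 amount to ordinary gradient descent with an ε threshold on the parameter changes. The toy below replaces the full CNN with a single logistic unit just to show the update and stopping logic; the data and hyperparameters are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.5, eps=1e-6, max_iter=10000):
    """Gradient descent; stop once every |dW| and |db| change is below eps,
    otherwise run until the maximum iteration count."""
    m, n = X.shape
    W, b = np.zeros(n), 0.0
    for _ in range(max_iter):
        p = sigmoid(X @ W + b)
        dW = X.T @ (p - y) / m          # gradient averaged over m samples
        db = float(np.mean(p - y))
        W -= alpha * dW
        b -= alpha * db
        if np.all(np.abs(alpha * dW) < eps) and abs(alpha * db) < eps:
            break                        # all parameter changes below ε
    return W, b

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
W, b = train(X, y)
preds = (sigmoid(X @ W + b) > 0.5).astype(int)
print(preds)   # the separable toy set is classified correctly: [0 0 1 1]
```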
For a single sample in the training set, the noise spectrogram obtained by the data preprocessing module is used as the input of the feature extraction module, the sample's personal information features as the input of the feature fusion and noise-induced hearing loss prediction module, and the label is whether the sample suffers from hearing loss; dropout regularization and early stopping are used during training to avoid overfitting.
After training, feeding a worker's preprocessed features into the trained model yields the prediction of whether the worker will suffer from hearing loss.
A concrete application scenario is given below to further illustrate the invention. To predict whether a batch of workers in an industrial noise exposure environment is at risk of hearing loss, the system's data acquisition module uses a digital noise recorder to collect about 8 hours of noise data from each exposed worker, together with personal information including age, years of service, and hearing thresholds at different frequencies (500 Hz, 1 kHz, 2 kHz, 3 kHz, 4 kHz, 6 kHz, 8 kHz).
The system's data preprocessing module arranges each sample's noise data into the raw data set; the STFT window length is set to 0.5 s with an overlap of 0.25 s between adjacent windows, and the noise spectrograms are obtained via the discrete-time STFT, with an image size of 1207*1207. The standardized personal information is used as input to the feature fusion and noise-induced hearing loss prediction module. Whether the binaural average hearing threshold at 1 kHz, 2 kHz, 3 kHz, and 4 kHz exceeds 25 dB is used as the criterion for hearing loss: samples without hearing loss are given positive labels and samples with hearing loss negative labels. The data preprocessing module also standardizes the worker personal information obtained by the data acquisition module and feeds it into the feature fusion and noise-induced hearing loss prediction module.
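A sketch of this labeling rule (positive label when the binaural average threshold at 1, 2, 3, and 4 kHz stays at or below 25 dB). The dictionary layout of the audiogram is a hypothetical convention, not something the patent specifies:

```python
import numpy as np

def hearing_loss_label(audiogram):
    """Return 1 (positive: no hearing loss) if the mean of the left and
    right thresholds at 1/2/3/4 kHz is <= 25 dB HL, else 0 (negative).

    audiogram maps frequency in Hz -> (left dB HL, right dB HL).
    """
    key_freqs = (1000, 2000, 3000, 4000)
    vals = [db for f in key_freqs for db in audiogram[f]]
    return 1 if np.mean(vals) <= 25.0 else 0

normal = {1000: (10, 12), 2000: (15, 14), 3000: (20, 18), 4000: (22, 25)}
impaired = {1000: (30, 35), 2000: (40, 38), 3000: (45, 50), 4000: (55, 60)}
print(hearing_loss_label(normal), hearing_loss_label(impaired))   # 1 0
```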
The system's feature extraction module uses asymmetric convolution kernels to extract the energy features and the temporal-variation features separately. The specific network structure is:
Input image - convolution layer 1 (1*11 kernel) - pooling layer 1 - convolution layer 2 (1*9 kernel) - pooling layer 2 - convolution layer 3 (3*3 kernel) - pooling layer 3 - convolution layer 4 (3*3 kernel) - pooling layer 4 - convolution layer 5 (3*3 kernel) - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
Input image - convolution layer 1 (11*1 kernel) - pooling layer 1 - convolution layer 2 (9*1 kernel) - pooling layer 2 - convolution layer 3 (3*3 kernel) - pooling layer 3 - convolution layer 4 (3*3 kernel) - pooling layer 4 - convolution layer 5 (3*3 kernel) - pooling layer 5 - feature fusion and noise-induced hearing loss prediction module;
The changes in image dimensions are shown in Fig. 4.
The system's feature fusion and noise-induced hearing loss prediction module strengthens informative features and suppresses ineffective ones, fuses and reduces the dimensionality of the energy and temporal-variation features from the feature extraction module, then fuses them with the worker personal information and feeds the result into the classifier to predict whether the worker suffers from hearing loss.
The network model in the feature fusion and noise-induced hearing loss prediction module is trained as follows:
The temporal-variation and energy features extracted by the feature extraction module are used as input to the network model of the feature fusion and noise-induced hearing loss prediction module, the personal information features from the data preprocessing module as input to the deep layers of the network, and the label is whether the sample suffers from hearing loss; dropout regularization and early stopping are used during training to avoid overfitting.
Feeding the batch of workers' preprocessed features into the trained model yields predictions of whether each worker will suffer from hearing loss. The AUC (Area Under the Curve) of the system's predictions reaches 0.82 or more, enabling fairly accurate hearing loss prediction.
The above embodiments are intended to explain the present invention, not to limit it; any modification or change made to the invention within its spirit and within the scope of protection of the claims falls within the scope of protection of the invention.

Claims (8)

  1. A noise-induced hearing loss prediction system based on asymmetric convolution, characterized in that the system comprises a data acquisition module, a data preprocessing module, a feature extraction module, and a feature fusion and noise-induced hearing loss prediction module;
    the data acquisition module is used to collect workers' occupational noise exposure data and worker personal information;
    the data preprocessing module is used to standardize the worker personal information before feeding it to the feature fusion and noise-induced hearing loss prediction module, and to convert the noise data into two-dimensional noise time-frequency spectrograms fed to the feature extraction module;
    the feature extraction module is used to extract, with asymmetric convolution kernels of different shapes, the energy features and temporal-variation features of the spectrograms, which are fed to the feature fusion and noise-induced hearing loss prediction module;
    the feature fusion and noise-induced hearing loss prediction module is used to introduce an attention mechanism that selectively strengthens informative features and suppresses ineffective ones, to fuse the energy and temporal-variation features from the feature extraction module and reduce their dimensionality, and to combine the result with the worker personal information into final features, from which fully connected layers and a Softmax output layer produce the prediction of whether a worker suffers from noise-induced hearing loss.
  2. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that the worker personal information includes age, years of service, and hearing threshold information at different frequencies.
  3. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that the data preprocessing module arranges the noise data into matrices as the raw data set and obtains the noise time-frequency spectrograms via the discrete-time short-time Fourier transform.
  4. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that the data preprocessing module standardizes the worker personal information as follows:

    d′1=(d1-λ1)/σ1, d′2=(d2-λ2)/σ2

    where d1 is the worker's age feature, d2 the years-of-service feature, d′1 the standardized age feature, λ1 the mean of d1, σ1 the standard deviation of d1, d′2 the standardized years-of-service feature, λ2 the mean of d2, and σ2 the standard deviation of d2.
  5. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that, based on how noise damages hearing and on the characteristics of noise spectrogram images, the feature extraction module uses asymmetric convolution kernels to extract the energy features and the temporal-variation features separately; the horizontal rectangular kernel, more sensitive to amplitude changes at adjacent time instants of the same frequency, extracts the features characterizing temporal variation; the vertical rectangular kernel, more sensitive to amplitude levels at adjacent frequencies at the same instant, extracts the features characterizing energy.
  6. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that the feature extraction module applies horizontal and vertical convolution kernels to the input spectrogram images; after two asymmetric convolutions, three ordinary convolutions, and five pooling operations in each branch, the resulting temporal-variation and energy features are fed to the feature fusion and noise-induced hearing loss prediction module.
  7. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 1, characterized in that the feature fusion and noise-induced hearing loss prediction module uses the attention mechanism to model the correlation between channels: global average pooling is first applied to each channel of the energy features and the temporal-variation features, compressing global spatial information into channel descriptors; the energy-feature channel descriptors are then concatenated with the temporal-variation-feature channel descriptors and followed by two fully connected layers; finally, through a Sigmoid function, the weight of each channel's features is adjusted according to the input data, selectively strengthening informative features and suppressing ineffective ones.
  8. The noise-induced hearing loss prediction system based on asymmetric convolution according to claim 7, characterized in that the energy features and temporal-variation features processed by the attention mechanism are each flattened into one-dimensional vectors by two Flatten layers; the two vectors are concatenated and passed through two fully connected layers for dimensionality reduction; the reduced output features are concatenated with the worker personal information from the data preprocessing module; finally, two fully connected layers and a Softmax output layer produce the prediction of whether the worker suffers from noise-induced hearing loss.
PCT/CN2023/105569 2022-07-04 2023-07-03 Noise-induced hearing loss prediction system based on asymmetric convolution WO2024008045A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210777572.5 2022-07-04
CN202210777572.5A CN114861835B (zh) 2022-07-04 2022-07-04 Noise-induced hearing loss prediction system based on asymmetric convolution

Publications (1)

Publication Number Publication Date
WO2024008045A1 true WO2024008045A1 (zh) 2024-01-11

Family

ID=82626044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105569 WO2024008045A1 (zh) 2022-07-04 2023-07-03 Noise-induced hearing loss prediction system based on asymmetric convolution

Country Status (2)

Country Link
CN (1) CN114861835B (zh)
WO (1) WO2024008045A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114861835B (zh) * 2022-07-04 2022-09-27 浙江大学 一种基于非对称卷积的噪声性听力损失预测系统
CN116320042B (zh) * 2023-05-16 2023-08-04 陕西思极科技有限公司 边缘计算的物联终端监测控制系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180160984A1 (en) * 2016-12-13 2018-06-14 Stefan Jozef Mauger Speech production and the management/prediction of hearing loss
CN111223564A (zh) * 2020-01-14 2020-06-02 浙江大学 Noise-induced hearing loss prediction system based on convolutional neural network
CN111584065A (zh) * 2020-04-07 2020-08-25 上海交通大学医学院附属第九人民医院 Noise-induced hearing loss prediction and susceptible-population screening method, apparatus, terminal and medium
CN114861835A (zh) * 2022-07-04 2022-08-05 浙江大学 Noise-induced hearing loss prediction system based on asymmetric convolution

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109473120A (zh) * 2018-11-14 2019-03-15 辽宁工程技术大学 一种基于卷积神经网络的异常声音信号识别方法
CN109637545B (zh) * 2019-01-17 2023-05-30 哈尔滨工程大学 基于一维卷积非对称双向长短时记忆网络的声纹识别方法
CN109767785A (zh) * 2019-03-06 2019-05-17 河北工业大学 基于卷积神经网络的环境噪声识别分类方法
EP4085656A1 (en) * 2019-12-31 2022-11-09 Starkey Laboratories, Inc. Hearing assistance device model prediction
CN111625763A (zh) * 2020-05-27 2020-09-04 郑州航空工业管理学院 一种基于数学模型的运行风险预测方法和预测系统
CN112866694B (zh) * 2020-12-31 2023-07-14 杭州电子科技大学 联合非对称卷积块和条件上下文的智能图像压缩优化方法
CN112971776A (zh) * 2021-04-19 2021-06-18 中国人民解放军总医院第六医学中心 一种确定听力检测波形中特征波形位置的方法及装置
CN114445299A (zh) * 2022-01-28 2022-05-06 南京邮电大学 一种基于注意力分配机制的双残差去噪方法

Also Published As

Publication number Publication date
CN114861835B (zh) 2022-09-27
CN114861835A (zh) 2022-08-05

Similar Documents

Publication Publication Date Title
WO2024008045A1 (zh) 一种基于非对称卷积的噪声性听力损失预测系统
WO2019218725A1 (zh) 基于骨传导振动与机器学习的智能输入方法及系统
Su et al. Bandwidth extension is all you need
CN109285551B (zh) 基于wmfcc和dnn的帕金森患者声纹识别方法
CN114469124B (zh) 一种运动过程中异常心电信号的识别方法
CN112587153A (zh) 一种基于vPPG信号的端到端的非接触房颤自动检测系统和方法
CN114241599A (zh) 一种基于多模态特征的抑郁倾向测评系统和方法
CN115346561B (zh) 基于语音特征的抑郁情绪评估预测方法及系统
CN111223564A (zh) 一种基于卷积神经网络的噪声性听力损失预测系统
CN115862684A (zh) 一种基于音频的双模式融合型神经网络的抑郁状态辅助检测的方法
Sharan et al. Cough sound analysis for diagnosing croup in pediatric patients using biologically inspired features
CN112820279A (zh) 基于语音上下文动态特征的帕金森病检测方法
CN115376526A (zh) 一种基于声纹识别的电力设备故障检测方法及系统
CN113674767A (zh) 一种基于多模态融合的抑郁状态识别方法
Casaseca-de-la-Higuera et al. Effect of downsampling and compressive sensing on audio-based continuous cough monitoring
CN115910097A (zh) 一种高压断路器潜伏性故障可听声信号识别方法及系统
Bouserhal et al. Classification of nonverbal human produced audio events: a pilot study
Ankışhan A new approach for detection of pathological voice disorders with reduced parameters
Porieva et al. Investigation of lung sounds features for detection of bronchitis and COPD using machine learning methods
CN112329819A (zh) 基于多网络融合的水下目标识别方法
Villanueva et al. Respiratory Sound Classification Using Long-Short Term Memory
Raju et al. AUTOMATIC SPEECH RECOGNITION SYSTEM USING MFCC-BASED LPC APPROACH WITH BACK PROPAGATED ARTIFICIAL NEURAL NETWORKS.
CN114077851B (zh) 基于fsvc的球磨机工况识别方法
Khanum et al. Speech based gender identification using feed forward neural networks
CN116230017A (zh) 语音评估方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23834812

Country of ref document: EP

Kind code of ref document: A1