CN116687438A - Method and device for identifying borborygmus - Google Patents

Method and device for identifying borborygmus

Info

Publication number
CN116687438A
CN116687438A (application number CN202310627776.5A)
Authority
CN
China
Prior art keywords
data
physiological sound
borborygmus
sound
physiological
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310627776.5A
Other languages
Chinese (zh)
Inventor
于延锁
张明武
安翔
谢振年
杨延坤
刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Petrochemical Technology
Original Assignee
Beijing Institute of Petrochemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Petrochemical Technology filed Critical Beijing Institute of Petrochemical Technology
Priority to CN202310627776.5A priority Critical patent/CN116687438A/en
Publication of CN116687438A publication Critical patent/CN116687438A/en
Pending legal-status Critical Current

Classifications

    • A61B 7/00 Instruments for auscultation
    • A61B 7/02 Stethoscopes
    • A61B 7/04 Electric stethoscopes
    • A61B 7/003 Detecting lung or respiration noise
    • G06F 18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
    • Y02T 90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention relates to the technical field of medical treatment, in particular to a method and a device for identifying borborygmus. The method comprises the following steps: collecting physiological sound data and borborygmus data at a preset sampling rate, wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs; extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively; constructing a physiological sound recognition model; inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model; freezing all parameters in the physiological sound prediction model and adding a new feature classification block to replace the original feature classification block, generating a borborygmus recognition model; and inputting the acoustic features of the borborygmus data into the borborygmus recognition model to generate a borborygmus prediction model, so as to solve the problem of low borborygmus recognition accuracy in the prior art.

Description

Method and device for identifying borborygmus
Technical Field
The invention relates to the technical field of medical treatment, in particular to a method and a device for identifying borborygmus.
Background
Borborygmus refers to the gurgling sound produced intermittently as gas and liquid flow within the intestinal lumen during intestinal peristalsis. Borborygmus reflects the motion state of the human small intestine and is an important index for detecting intestinal diseases, and auscultation of borborygmus is a commonly used means of diagnosing intestinal diseases at present. With the development of deep learning technology, researchers have gradually adopted deep learning methods to diagnose from borborygmus. Current borborygmus recognition methods adopt deep neural networks for model training to improve model performance. However, deep neural networks have many parameters and require a large amount of data to train an effective model.
Because borborygmus recordings are difficult to acquire, the amount of available data is small, i.e., the setting is low-resource, which poses challenges for model training. In addition, the borborygmus signal itself is weak and easily corrupted by noise, which also degrades model performance, so the accuracy of borborygmus recognition is low. Therefore, when designing a borborygmus recognition model, factors such as the amount of data, data quality, and model complexity need to be considered together.
Disclosure of Invention
Accordingly, the present invention is directed to a method and apparatus for recognizing borborygmus, which solve the problem of low borborygmus recognition accuracy in the prior art.
According to a first aspect of an embodiment of the present invention, a method for identifying borborygmus includes:
collecting physiological sound data and borborygmus data at a preset sampling rate; wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs;
extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model;
inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
freezing all parameters in the physiological sound prediction model, and adding a new feature classification block to replace the original feature classification block to generate a borborygmus recognition model;
and inputting the acoustic features of the borborygmus data into the borborygmus recognition model and training it to obtain a borborygmus prediction model.
Further, before the corresponding acoustic features are extracted from the physiological sound data and the borborygmus data respectively, the method further includes performing data preprocessing on the physiological sound data and the borborygmus data, comprising:
performing signal filtering on the physiological sound data and the borborygmus data to generate a physiological sound signal and a borborygmus signal;
and performing framing processing and windowing processing on the physiological sound signal and the borborygmus signal to generate the frequency domain of each frame of the physiological sound signal and each frame of the borborygmus signal.
Further, the extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively includes:
extracting the corresponding acoustic features from the frequency domains of each frame of the physiological sound signal and each frame of the borborygmus signal respectively.
Further, before the physiological sound recognition model is constructed, the method further comprises:
performing data mean normalization on the acoustic features of the physiological sound data.
Further, the constructing a physiological sound recognition model includes:
combining and processing the different feature information and inputting it into the feature classification block, constructing and generating the physiological sound recognition model;
wherein the feature classification block comprises a fully connected layer, and the fully connected layer is activated using a preset function.
Further, the training process of the physiological sound recognition model comprises the following steps:
acquiring a preset first number of data samples as first training samples; wherein each data sample comprises: the acoustic features of the target physiological sound data and the category of the target physiological sound data;
and training the physiological sound recognition model based on the data samples to obtain the physiological sound prediction model.
Further, the method further comprises:
acquiring a preset second number of data samples as first verification data; wherein each data sample comprises: the acoustic features of the target verification physiological sound data and the category of the target verification physiological sound data;
inputting the verification data into the physiological sound prediction model to obtain a prediction result;
calculating the degree of fit between the prediction result and the probability value of the category in which the corresponding physiological sound data actually lies;
and if the degree of fit is lower than a preset value, retraining the physiological sound prediction model.
Further, the training process of the borborygmus prediction model comprises the following steps:
acquiring a preset third number of data samples as second training samples; wherein each data sample comprises: the acoustic features of the target borborygmus data and the category of the target borborygmus data;
training the borborygmus recognition model based on the third number of data samples to obtain the borborygmus prediction model.
Further, the method further comprises:
acquiring a preset fourth number of data samples as second verification data; wherein each data sample comprises: the acoustic features of the target verification borborygmus data and the category of the target verification borborygmus data;
inputting the verification data into the borborygmus prediction model to obtain a borborygmus prediction result;
calculating the degree of fit between the borborygmus prediction result and the probability value of the category in which the borborygmus data actually lies;
and if the degree of fit is lower than a preset value, retraining the borborygmus prediction model.
According to a second aspect of an embodiment of the present invention, an apparatus for recognizing borborygmus includes:
an acquisition module, used for collecting physiological sound data and borborygmus data at a preset sampling rate; wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs;
a processing module, used for extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
a construction module, used for constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model;
a training module, used for inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
an adjusting module, used for freezing all parameters in the physiological sound prediction model and adding a new feature classification block to replace the original feature classification block, constructing and obtaining a borborygmus recognition model;
and a generation module, used for inputting the acoustic features of the borborygmus data into the borborygmus recognition model and training it to obtain a borborygmus prediction model.
The technical scheme provided by the embodiment of the invention can comprise the following beneficial effects:
according to the invention, the physiological sound prediction model is obtained by training physiological sound data on the physiological sound recognition model, wherein the physiological sound prediction model comprises a feature extraction block and a feature classification block, the acoustic features of the learned physiological sound data are reserved by freezing all parameters in the physiological sound prediction model, the feature classification block of the physiological sound recognition model is adjusted, the feature classification block is replaced by a new feature classification block, the acoustic features of the bowel sound data are extracted, the bowel sound recognition model is constructed, the fine tuning from the physiological sound recognition model to the bowel sound recognition model is realized, finally, the acoustic features of the bowel sound data are input into the constructed bowel sound recognition model and trained, and finally, the obtained bowel sound recognition model can improve the accuracy of bowel sound data recognition, so that the problem that a large amount of bowel sound data are required to be trained to generate the bowel sound recognition model with higher precision is avoided, and the bowel sound recognition method with low resources is realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart illustrating a method of identifying bowel sounds according to an exemplary embodiment;
fig. 2 is a schematic block diagram illustrating an apparatus for recognizing borborygmus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for identifying borborygmus according to an exemplary embodiment, and as shown in fig. 1, the method may specifically include the following steps:
s11, step: physiological sound data and borborygmus data with preset sampling rate are collected; wherein the physiological sound data includes cough sound, breath sound, heart sound, and murmur; borborygmus data includes borborygmus and borygmus;
by way of example, the physiological sound data includes a Coswara dataset comprising breath sound samples and cough sound samples, each breath sound sample having a duration of 30 seconds and each cough sound sample having a duration of 20 seconds, totaling about 75 hours; the icbhi_2017 dataset contained breath sound samples and non-breath sound samples, each sample varying in duration from 30 seconds to 35 seconds, summarized at about 9.5 hours; the coughid dataset comprises cough sound samples and non-cough sound samples, each sample varying in duration from 4 seconds to 20 seconds, totaling about 130 hours; the MIMIC-III Waveform dataset includes heart sound samples and non-heart sound samples, each of which varies in duration from a few hours to a few days, totaling more than 3000 hours. The Bowel Sound dataset contained Bowel Sound samples and Bowel murmur samples for a total of about 5 hours. The physiological sound data set contains 4 classes, namely cough, breath, heart and murmurs, wherein non-cough, non-breath and murmurs are combined into murmur classes. The borborygmus dataset contains 2 classes, namely borygmus and borygmus.
S12, step: extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
Preferably, before the corresponding acoustic features are extracted from the physiological sound data and the borborygmus data, the method further includes performing data preprocessing on the physiological sound data and the borborygmus data, comprising:
performing signal filtering on the physiological sound data and the borborygmus data to generate a physiological sound signal and a borborygmus signal;
and performing framing processing and windowing processing on the physiological sound signal and the borborygmus signal to generate the frequency domain of each frame of the physiological sound signal and each frame of the borborygmus signal.
The data preprocessing process specifically comprises the following steps:
Step 1. Signal filtering
For the physiological sound dataset, the sampling rate $f_s$ is adjusted to 22050 Hz, and the signal is then filtered with a third-order Butterworth band-pass filter, whose squared magnitude response is
$$|H(j\omega)|^2 = \frac{1}{1 + (\omega/\omega_c)^6},$$
so that frequency components within the range 50 Hz to 3000 Hz are selectively retained and other frequency components are filtered out. Here $\omega_c$ is the cut-off frequency and $s$ is the Laplace-transform variable; replacing $s$ with the discrete-time variable $z^{-1}$ yields the difference equation of the third-order Butterworth digital filter in the discrete time domain:
$$a_0\, y(n) = b_0\, x(n) + b_1\, x(n-1) + b_2\, x(n-2) + b_3\, x(n-3) - a_1\, y(n-1) - a_2\, y(n-2) - a_3\, y(n-3),$$
where $x(n)$ is the input signal, i.e., the physiological sound signal that needs to be filtered; $y(n)$ is the output signal, i.e., the physiological sound signal restricted to the 50 Hz to 3000 Hz range; and $a_0$, $a_1$, $a_2$, $a_3$, $b_0$, $b_1$, $b_2$, $b_3$ are the coefficients of the third-order Butterworth digital filter.
For the borborygmus dataset, the sampling rate $f_s$ is likewise adjusted to 22050 Hz, a third-order Butterworth band-pass filter is applied, frequency components within the range 60 Hz to 1200 Hz are selectively retained, and other frequency components are filtered out.
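For illustration, the filtering of step 1 can be sketched in Python with SciPy; the helper name `bandpass` and the ten-second noise signals are assumptions for demonstration, and the filter coefficients are derived by `scipy.signal.butter` from the stated order and pass band rather than taken from the patent.

```python
# Minimal band-pass filtering sketch for step 1.
import numpy as np
from scipy.signal import butter, lfilter

FS = 22050  # preset sampling rate from the patent

def bandpass(signal: np.ndarray, low_hz: float, high_hz: float, fs: int = FS) -> np.ndarray:
    """Third-order Butterworth band-pass filter."""
    b, a = butter(N=3, Wn=[low_hz, high_hz], btype="bandpass", fs=fs)  # a_i, b_i coefficients
    return lfilter(b, a, signal)  # apply the difference equation above

# physiological sounds retain 50-3000 Hz; borborygmus retains 60-1200 Hz
physio_filtered = bandpass(np.random.randn(FS * 10), 50.0, 3000.0)
bowel_filtered = bandpass(np.random.randn(FS * 10), 60.0, 1200.0)
```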
Step 2. Framing and windowing
The physiological sound data and borborygmus data obtained after the signal filtering of step 1 are each segmented into pieces of duration T seconds. When a data segment shorter than T seconds appears, it is compared against $\frac{T}{2}$ seconds. If the data segment is longer than $\frac{T}{2}$ seconds, zero values are appended directly after the data segment to pad its duration to T seconds. If the data segment is no longer than $\frac{T}{2}$ seconds, the data segment is copied and spliced after itself, filling its duration to T seconds. After the segmented physiological sound data and borborygmus data are obtained, a short-time Fourier transform is applied to each segment of physiological sound data and borborygmus data:
$$X_n(k) = \sum_{m=0}^{N_s - 1} d(m)\, e^{-j 2\pi k m / N_s}, \qquad N_s = f_s \times T,$$
obtaining the frequency-domain representation $X_n(k)$ of each segment of physiological sound data and borborygmus data. Here $d(n)$ is the time-domain representation of each segment of physiological sound data or borborygmus data, $k$ ranges over the different frequency values in each segment, $n$ is the index value of each segment, and $N_s$ is the number of samples in a T-second segment of physiological sound data or borborygmus data.
After the frequency-domain representation of each segment of physiological sound data and borborygmus data has been obtained, each segment is subjected to framing and windowing with a Hamming window:
$$X_{nf}(k_f) = \sum_{m=0}^{N_f - 1} x_{nf}(m)\, w(m)\, e^{-j 2\pi k_f m / N_f},$$
where $X_{nf}(k_f)$ denotes each frame of physiological sound data or borborygmus data obtained by the framing and windowing operation, i.e., the frequency-domain representation of frame $nf$ of a segment at frequency $k_f$; $x_{nf}(m)$ denotes the samples of frame $nf$; $k_f$ ranges over the different frequency values in the frequency domain of each frame; and $w(m)$ is the Hamming window function, $w(m) = 0.54 - 0.46 \cos\left(\frac{2\pi m}{N_f - 1}\right)$.
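For illustration, the segmentation and framing of step 2 can be sketched as follows; the segment duration T, frame length $N_f$, and frame shift d are not fixed by the patent, so the values below are placeholder assumptions.

```python
# Minimal segmentation and Hamming-window framing sketch for step 2.
import numpy as np

def segment(signal: np.ndarray, fs: int, T: float) -> list:
    """Cut a recording into T-second segments, padding the tail as in step 2."""
    n_s = int(fs * T)  # N_s = f_s * T samples per segment
    pieces = [signal[i:i + n_s] for i in range(0, len(signal), n_s)]
    tail = pieces[-1]
    if len(tail) < n_s:
        if len(tail) > n_s // 2:  # longer than T/2 s: pad with zeros
            pieces[-1] = np.pad(tail, (0, n_s - len(tail)))
        else:                     # at most T/2 s: copy and splice the segment
            reps = int(np.ceil(n_s / len(tail)))
            pieces[-1] = np.tile(tail, reps)[:n_s]
    return pieces

def frame_spectra(piece: np.ndarray, n_f: int = 1024, d: int = 512) -> np.ndarray:
    """Hamming-windowed frames -> per-frame spectra X_nf(k_f)."""
    w = np.hamming(n_f)  # w(m) = 0.54 - 0.46 cos(2 pi m / (N_f - 1))
    starts = range(0, len(piece) - n_f + 1, d)
    return np.stack([np.fft.rfft(piece[s:s + n_f] * w) for s in starts])
```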
In step S12, the corresponding acoustic features are extracted from the physiological sound data and the borborygmus data respectively. The feature extraction process specifically comprises:
Step 3:
Step A: Based on the frequency-domain representation of each frame of physiological sound data and borborygmus data obtained in step 2, calculate the power spectral density value of each frame, so as to describe the energy distribution of the physiological sound data and borborygmus data at different frequency points:
$$P_{nf}(k_f) = \frac{1}{N_f}\,\left|X_{nf}(k_f)\right|^2,$$
where $|X_{nf}(k_f)|$ denotes taking the modulus of the value and $P_{nf}(k_f)$ denotes the power spectral density value of frame $nf$ of a segment of physiological sound data or borborygmus data at frequency $k_f$.
Step B: Convert the power spectral density values of each frame of physiological sound data and borborygmus data into decibel units, and arrange the per-frame power spectral density values of each segment of physiological sound data into a two-dimensional matrix in time order, obtaining the logarithmic spectrogram $X_{phs}$ of a segment of physiological sound data, of dimension $(N_p, N_c)$; likewise, arrange the per-frame power spectral density values of each segment of borborygmus data into a two-dimensional matrix in time order, obtaining the logarithmic spectrogram $X_{bs}$ of a segment of borborygmus data, of dimension $(N_p, N_c)$. The number of frames in a T-second segment of physiological sound data or borborygmus data is
$$N_c = \left\lfloor \frac{N_s - N_f}{d} \right\rfloor + 1,$$
where $\lfloor\cdot\rfloor$ denotes rounding the value down, $N_s$ denotes the number of sampling points in a T-second segment of physiological sound data or borborygmus data, $N_f$ denotes the frame length, i.e., the number of sampling points in each frame of physiological sound data or borborygmus data, $d$ denotes the frame shift, i.e., the number of sampling points the Hamming window advances between frames, and $N_p$ denotes the number of different frequencies of a segment of physiological sound data or borborygmus data.
Step C: Perform data mean normalization on the logarithmic spectrograms of all the physiological sound data and borborygmus data respectively. For example, assuming there are N logarithmic spectrograms of physiological sound data or borborygmus data, calculate the mean $\mu$ and standard deviation $\sigma$ of the logarithmic spectrograms of all the physiological sound data or borborygmus data,
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu\right)^2},$$
and normalize each logarithmic spectrogram as
$$X' = \frac{X - \mu}{\sigma},$$
where $X$ is a logarithmic spectrogram before mean normalization and $X'$ is the logarithmic spectrogram after mean normalization.
It should be noted that the logarithmic spectrogram is the acoustic feature mentioned in step S12.
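For illustration, step 3 can be sketched as follows; the small epsilon guarding the logarithm is an implementation assumption rather than part of the patent.

```python
# Minimal log-spectrogram and mean-normalization sketch for step 3.
import numpy as np

def log_spectrogram(frames: np.ndarray, n_f: int = 1024) -> np.ndarray:
    """frames: complex per-frame spectra X_nf(k_f) from step 2, shape (N_c, N_p)."""
    psd = (np.abs(frames) ** 2) / n_f        # power spectral density P_nf(k_f)
    log_spec = 10.0 * np.log10(psd + 1e-10)  # convert to decibel units
    return log_spec.T                        # two-dimensional matrix of shape (N_p, N_c)

def mean_normalize(spectrograms: np.ndarray) -> np.ndarray:
    """Normalize N spectrograms by their global mean and standard deviation."""
    mu, sigma = spectrograms.mean(), spectrograms.std()
    return (spectrograms - mu) / sigma       # X' = (X - mu) / sigma
```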
S13, step: constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model.
It is worth noting that the constructed physiological sound recognition model adopts a two-branch structure. The left branch adopts VGGNet and mainly consists of two kinds of blocks. The first is the convolutional block: several repeated convolutional blocks are used, and a pooling layer is added after each convolutional block to reduce the size of the feature map, where the depth and pooling size of different convolutional blocks may differ. The second is the fully connected block, after which several fully connected layers follow. The output of the final layer of this branch is converted, through a global average pooling layer, into a one-dimensional vector of length $L_V$. The right branch adopts ResNet and mainly consists of two parts. The first is the residual block, the core component of ResNet: each residual block comprises two convolutional layers and a residual connection, and the residual connection adds the input data directly to the output data. The second is the residual network: ResNet consists of multiple residual blocks, and a deep model is constructed by stacking them. The output of the final layer of this branch is likewise converted, through a global average pooling layer, into a one-dimensional vector of length $L_R$. The left branch and the right branch together are called the feature extraction block. The vectors finally output by the two branches are spliced along the last dimension through a concatenate layer to form a one-dimensional vector of length $L_V + L_R$, realizing the fusion of the different feature information extracted by the two branches. Finally, the vector of length $L_V + L_R$ obtained through the concatenate layer is input into the feature classification block,
wherein the feature classification block comprises a fully connected layer activated with a softmax function. The physiological sound recognition model is obtained through the above construction.
In some specific embodiments, the physiological sound recognition model adopts a multi-branch structure: one branch uses VGGNet, which preserves more of the feature information in the physiological sound signal, while the other branch uses ResNet, which effectively extracts feature information, possibly multi-scale features, from the physiological sound signal.
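For illustration, the two-branch structure described above can be rendered as a minimal PyTorch sketch. The channel counts, block depths, and the resulting vector lengths $L_V$ and $L_R$ below are illustrative assumptions, since the patent does not fix them, and the class name `TwoBranchNet` is hypothetical.

```python
# Minimal two-branch (VGG-style + ResNet-style) model sketch.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolutional layers plus a residual connection, as described above."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # residual connection: add the input directly to the output
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class TwoBranchNet(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        # left branch: VGG-style convolutional blocks, each followed by pooling
        self.left = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # global average pooling -> L_V = 32
        # right branch: ResNet-style stacked residual blocks
        self.right = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            ResidualBlock(32), ResidualBlock(32),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # global average pooling -> L_R = 32
        # feature classification block: fully connected layer with softmax activation
        self.classifier = nn.Sequential(nn.Linear(32 + 32, n_classes), nn.Softmax(dim=1))

    def forward(self, x):  # x: (batch, 1, N_p, N_c) log-spectrograms
        fused = torch.cat([self.left(x), self.right(x)], dim=1)  # concatenate layer
        return self.classifier(fused)
```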
S14, step: inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
It will be appreciated that step S14 is the training of the physiological sound recognition model.
Preferably, the training process of the physiological sound recognition model includes:
acquiring a preset first number of data samples as first training samples; wherein each data sample comprises: the acoustic features of the target physiological sound data and the category of the target physiological sound data;
and training the physiological sound recognition model based on the data samples to obtain the physiological sound prediction model.
Acquiring a preset second number of data samples as first verification data; wherein each data sample comprises: the acoustic features of the target verification physiological sound data and the category of the target verification physiological sound data;
inputting the verification data into the physiological sound prediction model to obtain a prediction result;
calculating the degree of fit between the prediction result and the probability value of the category in which the corresponding physiological sound data actually lies;
and if the degree of fit is lower than a preset value, retraining the physiological sound prediction model.
In some specific embodiments, it is assumed that there are $T_d$ samples as the input of the physiological sound recognition model, where a sample refers to the mean-normalized logarithmic spectrogram of physiological sound data, i.e., the first number of data samples comprises $T_d$ logarithmic spectrograms of physiological sound data.
Specifically, let the i-th sample $x^{(i)}$, i.e., the target physiological sound data, have input dimension $(N_p, N_c)$, and let the real label corresponding to the i-th sample be $y^{(i)} \in \{0, 1, 2, \dots, K-1\}$, where $K$ denotes the number of categories. The real label indicates which category a piece of physiological sound data belongs to; for example, set heart sounds in the physiological sound data to category 0, breath sounds to category 1, cough sounds to category 2, and murmurs to category 3, for 4 categories in total.
The recognition result output by the model, computed by forward propagation, is $\hat{y}^{(i)}$, i.e., the probability distribution of the i-th sample over the 4 categories, where the probability distribution refers to the probabilities of the possible values over the 4 categories, with the probabilities summing to 1.
A multi-class cross-entropy loss function is used:
$$\mathcal{L}(\theta) = -\frac{1}{N_L}\sum_{i=1}^{N_L}\sum_{j=0}^{K-1} y_j^{(i)}\,\log \hat{y}_j^{(i)},$$
where $\mathcal{L}(\theta)$ denotes the multi-class cross-entropy loss under the physiological sound recognition model parameters $\theta$, used to measure the degree of difference between the prediction results of the physiological sound recognition model and the real labels; $\theta$ denotes the physiological sound recognition model parameters; and $N_L$ denotes the number of logarithmic spectrograms of physiological sound data. $y_j^{(i)}$ denotes the value of the j-th category in the real label of sample i, which may be 0 or 1, indicating that the sample belongs to the j-th category (1) or does not belong to the j-th category (0); $\hat{y}_j^{(i)}$ denotes the probability value predicted by the physiological sound recognition model for sample i in the j-th category, i.e., the probability that the model predicts the sample belongs to the j-th category. It can be understood that measuring the degree of difference between the prediction results of the physiological sound recognition model and the real labels with the multi-class cross-entropy loss function constitutes the verification part of the physiological sound recognition model training.
In each iteration, a stochastic gradient descent method is used to update the physiological sound recognition model parameters $\theta$:
$$\theta \leftarrow \theta - \alpha\,\nabla_\theta \mathcal{L}(\theta),$$
where $\alpha$ denotes the learning rate and $\nabla_\theta \mathcal{L}(\theta)$ denotes the gradient of the loss function with respect to the physiological sound recognition model parameters $\theta$. For a physiological sound sample $(x^{(i)}, y^{(i)})$, the corresponding loss-function gradient at the output layer takes the form
$$\nabla_\theta \ell^{(i)} = \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)\top},$$
where $\hat{y}^{(i)} - y^{(i)}$ denotes the error between the prediction result of the physiological sound recognition model and the real label, and $x^{(i)}$ denotes the input physiological sound sample.
After each iteration, the loss values of all samples are recalculated with the new parameters, and this is repeated for the specified number of iterations until the model converges, yielding the trained physiological sound recognition model.
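For illustration, the training procedure above can be rendered as a minimal PyTorch sketch; the batch size, learning rate, and epoch count are assumptions, not values fixed by the patent. Since the model's final layer already applies softmax, the sketch pairs `NLLLoss` with the logarithm of the output probabilities, which together realize the multi-class cross-entropy loss.

```python
# Minimal training-loop sketch (hyperparameters are illustrative assumptions).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          epochs: int = 50, lr: float = 1e-3) -> nn.Module:
    # x: normalized log-spectrograms (T_d, 1, N_p, N_c); y: class indices 0..K-1
    loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # stochastic gradient descent
    criterion = nn.NLLLoss()  # on log-probabilities this is the cross-entropy loss
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            probs = model(xb)                               # softmax probabilities
            loss = criterion(torch.log(probs + 1e-10), yb)  # multi-class cross entropy
            loss.backward()                                 # back-propagate gradients
            optimizer.step()                                # theta <- theta - alpha * grad
    return model
```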
It can be understood that, by using a large amount of data for the four types of physiological sounds, namely cough sounds, heart sounds, breath sounds, and murmurs, and the acoustic features extracted from them, an accurate physiological sound recognition model is constructed and then trained on these acoustic features to obtain the physiological sound prediction model.
S15, step: freezing all parameters in the physiological sound prediction model, and adding a new feature classification block to replace the original feature classification block, constructing and obtaining the borborygmus recognition model;
Specifically, the borborygmus recognition model likewise comprises two parts, a feature extraction block and a feature classification block. The feature extraction block of the borborygmus recognition model adopts the same model structure as the feature extraction block of the physiological sound recognition model; that is, the VGGNet branch, the ResNet branch, and the concatenate layer after the two branches in the physiological sound recognition model are retained.
A new feature classification block is added, which contains two fully connected layers and is mapped to 2 categories using a softmax activation function; the model finally yields a probability distribution over the 2 categories, where the probability distribution refers to the probabilities of the possible values over the 2 categories, with the probabilities summing to 1.
The feature classification block of the physiological sound recognition model is replaced with the newly constructed feature classification block. Through the above construction, the borborygmus recognition model is obtained.
It can be understood that the obtained physiological sound prediction model is fine-tuned by exploiting the similarity of acoustic features between physiological sounds and borborygmus to obtain the required borborygmus recognition model, so that a borborygmus prediction model with high recognition accuracy can be obtained without a large amount of borborygmus data.
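For illustration, this freeze-and-replace step can be sketched as follows, reusing the hypothetical `TwoBranchNet` from the sketch above: every pretrained parameter is frozen, and the feature classification block is replaced by a new block of two fully connected layers mapped to 2 categories with softmax.

```python
# Minimal transfer-step sketch: freeze the prediction model, swap in a new head.
import torch.nn as nn

def to_borborygmus_model(pretrained: nn.Module) -> nn.Module:
    for p in pretrained.parameters():
        p.requires_grad = False                 # freeze all learned parameters
    pretrained.classifier = nn.Sequential(      # new feature classification block
        nn.Linear(64, 32), nn.ReLU(),           # first fully connected layer
        nn.Linear(32, 2), nn.Softmax(dim=1))    # second layer -> 2 categories
    return pretrained                           # only the new head is trainable
```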
S16, step: inputting the acoustic features of the borborygmus data into the constructed borborygmus recognition model, and training it to obtain the borborygmus prediction model.
Preferably, the training process of the borborygmus prediction model includes:
acquiring a preset third number of data samples as second training samples; wherein each data sample comprises: the acoustic features of the target borborygmus data and the category of the target borborygmus data;
training the borborygmus recognition model based on the third number of data samples to obtain the borborygmus prediction model.
Preferably, the method further comprises:
acquiring a preset fourth number of data samples as second verification data; wherein each data sample comprises: the acoustic features of the target verification borborygmus data and the category of the target verification borborygmus data;
inputting the verification data into the borborygmus prediction model to obtain a borborygmus prediction result;
calculating the degree of fit between the borborygmus prediction result and the probability value of the category in which the borborygmus data actually lies;
and if the degree of fit is lower than a preset value, retraining the borborygmus prediction model.
The method specifically comprises the following steps:
Step a. Suppose the third number of data samples is $T_t$ training samples, and take the $T_t$ training samples as the input of the borborygmus recognition model, where a training sample refers to the mean-normalized logarithmic spectrogram of the target borborygmus data.
The i-th sample $x_b^{(i)}$ has input dimension $(N_p, N_c)$, and the corresponding real label is $y_b^{(i)} \in \{0, 1\}$. The real label indicates which category a piece of borborygmus data belongs to; for example, bowel sounds are set to category 0 and bowel murmurs to category 1.
Step b. The prediction result output by the model, computed by forward propagation, is $\hat{y}_b^{(i)}$, i.e., the probability distribution of the i-th sample over the 2 categories, where the probability distribution refers to the probabilities of the possible values over the 2 categories, with the probabilities summing to 1. A binary cross-entropy loss function is used:
$$\mathcal{L}(\theta_b) = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_b^{(i)}\,\log \hat{y}_b^{(i)} + \left(1 - y_b^{(i)}\right)\log\left(1 - \hat{y}_b^{(i)}\right)\right],$$
where $\mathcal{L}(\theta_b)$ denotes the binary cross-entropy loss under the borborygmus recognition model parameters $\theta_b$, used to measure the degree of difference between the prediction results of the borborygmus recognition model and the real labels; $\theta_b$ denotes the borborygmus recognition model parameters; $N$ denotes the number of logarithmic spectrograms of borborygmus data; $y_b^{(i)}$ denotes the real label of the i-th borborygmus sample, whose value may be 0 or 1; and $\hat{y}_b^{(i)}$ denotes the probability value of the prediction output of the borborygmus recognition model for sample i.
Step c. A back-propagation algorithm is used to compute the gradient $\nabla_{\theta_b}\mathcal{L}(\theta_b)$ of the loss function with respect to the borborygmus recognition model parameters $\theta_b$. In each iteration, a stochastic gradient descent method is used to update the borborygmus recognition model parameters $\theta_b$:
$$\theta_b \leftarrow \theta_b - \alpha\,\nabla_{\theta_b}\mathcal{L}(\theta_b),$$
where $\alpha$ denotes the learning rate.
Steps a to c are repeated until the model converges, finally yielding the trained borborygmus recognition model.
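For illustration, steps a to c can be sketched as a short fine-tuning loop; since only the new feature classification block has `requires_grad=True`, stochastic gradient descent updates nothing else. The full-batch update and the hyperparameters are illustrative assumptions.

```python
# Minimal fine-tuning sketch with the binary cross-entropy loss of step b.
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
              epochs: int = 30, lr: float = 1e-3) -> nn.Module:
    # y: real labels in {0, 1} (0 = bowel sound, 1 = bowel murmur)
    trainable = [p for p in model.parameters() if p.requires_grad]  # new head only
    optimizer = torch.optim.SGD(trainable, lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        probs = model(x)[:, 1]  # predicted probability of category 1 from the softmax pair
        loss = nn.functional.binary_cross_entropy(probs, y.float())  # binary cross entropy
        loss.backward()         # back-propagation computes the gradient w.r.t. theta_b
        optimizer.step()        # theta_b <- theta_b - alpha * gradient
    return model
```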
It can be understood that the physiological sound prediction model is obtained by training the physiological sound recognition model on physiological sound data, wherein the model comprises a feature extraction block and a feature classification block. By freezing all parameters in the physiological sound prediction model, the acoustic features learned from the physiological sound data are retained; the feature classification block is then adjusted and replaced with a new feature classification block better suited to extracting the acoustic features of borborygmus data, constructing the borborygmus recognition model and realizing fine-tuning from the physiological sound recognition model to the borborygmus recognition model. Finally, the acoustic features of the borborygmus data are input into the constructed borborygmus recognition model and trained, and the resulting model improves the accuracy of borborygmus recognition. This avoids the need for a large amount of borborygmus data to train a high-accuracy recognition model, realizing a low-resource borborygmus recognition method.
Referring to fig. 2, fig. 2 is a schematic block diagram of an apparatus for recognizing borborygmus. As shown in fig. 2, the apparatus includes:
the acquisition module 1, used for collecting physiological sound data and borborygmus data at a preset sampling rate; wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs;
the processing module 2, used for extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
the construction module 3, used for constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model;
the training module 4, used for inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
the adjusting module 5, used for freezing all parameters in the physiological sound prediction model and adding a new feature classification block to replace the original feature classification block, constructing and obtaining a borborygmus recognition model;
and the generation module 6, used for inputting the acoustic features of the borborygmus data into the borborygmus recognition model and training it to obtain a borborygmus prediction model.
Specifically, for the apparatus for recognizing borborygmus, reference may be made to the specific implementation of the method for recognizing borborygmus described in any of the above embodiments, which is not repeated here.
It can be understood that the physiological sound prediction model is obtained by training the physiological sound recognition model on physiological sound data, wherein the model comprises a feature extraction block and a feature classification block. By freezing all parameters in the physiological sound prediction model, the acoustic features learned from the physiological sound data are retained; the feature classification block is then adjusted and replaced with a new feature classification block better suited to extracting the acoustic features of borborygmus data, constructing the borborygmus recognition model and realizing fine-tuning from the physiological sound recognition model to the borborygmus recognition model. Finally, the acoustic features of the borborygmus data are input into the constructed borborygmus recognition model and trained, and the resulting model improves the accuracy of borborygmus recognition. This avoids the need for a large amount of borborygmus data to train a high-accuracy recognition model, realizing a low-resource borborygmus recognition method.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits with logic gates for implementing logic functions on digital signals, application-specific integrated circuits with suitably combined logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (10)

1. A method for recognizing borborygmus, comprising:
collecting physiological sound data and borborygmus data at a preset sampling rate; wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs;
extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model;
inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
freezing all parameters in the physiological sound prediction model, and adding a new feature classification block to replace the original feature classification block to generate a borborygmus recognition model;
and inputting the acoustic features of the borborygmus data into the borborygmus recognition model and training it to obtain a borborygmus prediction model.
2. The method of claim 1, further comprising, before the corresponding acoustic features are extracted from the physiological sound data and the borborygmus data respectively, performing data preprocessing on the physiological sound data and the borborygmus data, comprising:
performing signal filtering on the physiological sound data and the borborygmus data to generate a physiological sound signal and a borborygmus signal;
and performing framing processing and windowing processing on the physiological sound signal and the borborygmus signal to generate the frequency domain of each frame of the physiological sound signal and each frame of the borborygmus signal.
3. The method of claim 1, wherein the extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively comprises:
extracting the corresponding acoustic features from the frequency domains of each frame of the physiological sound signal and each frame of the borborygmus signal respectively.
4. The method of claim 1, further comprising, before constructing the physiological sound recognition model:
performing data mean normalization on the acoustic features of the physiological sound data.
5. The method of claim 1, wherein said constructing a physiological sound recognition model comprises:
combining and processing the different feature information and inputting it into the feature classification block, constructing and generating the physiological sound recognition model;
wherein the feature classification block comprises a fully connected layer, and the fully connected layer is activated using a preset function.
6. The method of claim 4, wherein the training process of the physiological sound recognition model comprises:
acquiring a preset first number of data samples as first training samples; wherein each data sample comprises: the acoustic features of the target physiological sound data and the category of the target physiological sound data;
and training the physiological sound recognition model based on the data samples to obtain the physiological sound prediction model.
7. The method according to claim 6, further comprising:
acquiring a preset second number of data samples as first verification data; wherein each data sample comprises: the acoustic features of the target verification physiological sound data and the category of the target verification physiological sound data;
inputting the verification data into the physiological sound prediction model to obtain a prediction result;
calculating the degree of fit between the prediction result and the probability value of the category in which the corresponding physiological sound data actually lies;
and if the degree of fit is lower than a preset value, retraining the physiological sound prediction model.
8. The method of claim 1, wherein the training process of the borborygmus prediction model comprises:
acquiring a preset third number of data samples as second training samples; wherein each data sample comprises: the acoustic features of the target borborygmus data and the category of the target borborygmus data;
training the borborygmus recognition model based on the third number of data samples to obtain the borborygmus prediction model.
9. The method according to claim 8, further comprising:
acquiring a preset fourth number of data samples as second verification data; wherein each data sample comprises: the acoustic features of the target verification borborygmus data and the category of the target verification borborygmus data;
inputting the verification data into the borborygmus prediction model to obtain a borborygmus prediction result;
calculating the degree of fit between the borborygmus prediction result and the probability value of the category in which the borborygmus data actually lies;
and if the degree of fit is lower than a preset value, retraining the borborygmus prediction model.
10. An apparatus for recognizing borborygmus, comprising:
an acquisition module, used for collecting physiological sound data and borborygmus data at a preset sampling rate; wherein the physiological sound data include cough sounds, breath sounds, heart sounds, and murmurs, and the borborygmus data include bowel sounds and bowel murmurs;
a processing module, used for extracting the corresponding acoustic features from the physiological sound data and the borborygmus data respectively;
a construction module, used for constructing a physiological sound recognition model; wherein the physiological sound recognition model comprises a feature extraction block and a feature classification block; the feature extraction block adopts a left-branch and right-branch structure to extract different feature information from the acoustic features, and the different feature information is used for training the physiological sound recognition model;
a training module, used for inputting the acoustic features of the physiological sound data into the physiological sound recognition model and training it to obtain a physiological sound prediction model;
an adjusting module, used for freezing all parameters in the physiological sound prediction model and adding a new feature classification block to replace the original feature classification block, constructing and obtaining a borborygmus recognition model;
and a generation module, used for inputting the acoustic features of the borborygmus data into the borborygmus recognition model and training it to obtain a borborygmus prediction model.
CN202310627776.5A 2023-05-30 2023-05-30 Method and device for identifying borborygmus Pending CN116687438A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310627776.5A | 2023-05-30 | 2023-05-30 | Method and device for identifying borborygmus

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310627776.5A | 2023-05-30 | 2023-05-30 | Method and device for identifying borborygmus

Publications (1)

Publication Number Publication Date
CN116687438A (en) | 2023-09-05

Family

ID=87842634

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310627776.5A | Method and device for identifying borborygmus | 2023-05-30 | 2023-05-30

Country Status (1)

Country Link
CN (1) CN116687438A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117238299A (en) * 2023-11-14 2023-12-15 国网山东省电力公司电力科学研究院 Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line
CN117238299B (en) * 2023-11-14 2024-01-30 国网山东省电力公司电力科学研究院 Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line

Similar Documents

Publication Publication Date Title
CN106782501B (en) Speech feature extraction method and device based on artificial intelligence
Beckmann et al. Speech-vgg: A deep feature extractor for speech processing
CN116687438A (en) Method and device for identifying borborygmus
CN114023412A (en) ICD code prediction method and system based on joint learning and denoising mechanism
KR20170064960A (en) Disease diagnosis apparatus and method using a wave signal
Turan et al. Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture.
CN108806725A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN114391827A (en) Pre-hospital emphysema diagnosis device based on convolutional neural network
CN117611601A (en) Text-assisted semi-supervised 3D medical image segmentation method
CN113129310A (en) Medical image segmentation system based on attention routing
Ahmed et al. Musical genre classification on the marsyas audio data using convolution NN
CN116778158A (en) Multi-tissue composition image segmentation method and system based on improved U-shaped network
CN116898451A (en) Method for realizing atrial fibrillation prediction by using neural network with multi-scale attention mechanism
CN116570284A (en) Depression recognition method and system based on voice characterization
CN116310770A (en) Underwater sound target identification method and system based on mel cepstrum and attention residual error network
Cai et al. The best input feature when using convolutional neural network for cough recognition
CN115206347A (en) Method and device for identifying bowel sounds, storage medium and computer equipment
CN111312215A (en) Natural speech emotion recognition method based on convolutional neural network and binaural representation
CN117476034A (en) Method, device and storage medium for constructing borborygmus signal feature recognition model
CN111354372A (en) Audio scene classification method and system based on front-end and back-end joint training
Fayyazi et al. Analyzing the Use of Auditory Filter Models for Making Interpretable Convolutional Neural Networks for Speaker Identification
CN116421152B (en) Sleep stage result determining method, device, equipment and medium
CN114863939B (en) Panda attribute identification method and system based on sound
CN117373492B (en) Deep learning-based schizophrenia voice detection method and system
CN115831356B (en) Auxiliary prediction diagnosis method based on artificial intelligence algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination