CN116052725A - Fine granularity borborygmus recognition method and device based on deep neural network - Google Patents
- Publication number
- CN116052725A (application CN202310335591.7A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- module
- borborygmus
- deep neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B7/00—Instruments for auscultation
- A61B7/02—Stethoscopes
- A61B7/04—Electric stethoscopes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a method and a device for fine-grained borborygmus (bowel sound) recognition based on a deep neural network, mainly addressing two problems of the prior art: a great amount of relevant information is lost when 1-dimensional audio data are converted into 2-dimensional feature maps, and the exact moment at which a borborygmus event occurs cannot be accurately located because the signal features depend heavily on manual extraction. The fine-grained borborygmus recognition method based on a deep neural network collects and labels abdominal auscultation recording data; constructs a deep neural network model based on the Transformer structure and trains it with the labeled abdominal auscultation recordings to obtain a final model; and loads the final model to obtain the corresponding borborygmus recognition result for an input abdominal sound signal. Through this scheme, accurate fine-grained identification of borborygmus events can be achieved by end-to-end training alone, without manual feature extraction.
Description
Technical Field
The invention relates to the technical field of deep neural networks, and in particular to a method and a device for fine-grained borborygmus recognition based on a deep neural network.
Background
Human Bowel Sounds (BS), or borborygmi, are sounds produced by the peristaltic movement of the intestines pushing food, liquid and gas. Many studies have shown that bowel sounds are closely related to the functional status of the gastrointestinal tract, and that auscultation of bowel sounds helps diagnose functional bowel diseases such as irritable bowel syndrome and functional constipation; abdominal auscultation is therefore an important examination item in clinical work.
The prior art schemes fall into two main categories: methods based on conventional signal processing and methods based on deep learning. Their disadvantages are: (1) Most existing methods rely on manually extracted signal features, but because of the complexity of bowel sounds the manually extracted features are often suboptimal, which limits the recognition performance of the subsequent classification algorithm. (2) The current mainstream algorithms are based on Convolutional Neural Networks (CNNs), but CNNs are designed for 2-dimensional image data, whereas bowel sounds are 1-dimensional sequence data whose characteristics differ greatly from those of images; the 1-dimensional signal must therefore be converted into a 2-dimensional feature map, and a large amount of relevant information is lost in this conversion. (3) The prior art only achieves coarse-grained recognition of borborygmus events, i.e., it only judges whether a long abdominal sound clip contains a borborygmus event; it can neither accurately locate the specific moment at which an event occurs nor determine related parameters such as the number of occurrences and the frequency of events, which are critical for diagnosing intestinal diseases.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a fine-grained borborygmus recognition method based on a deep neural network, which trains a novel Transformer-based borborygmus recognition neural network model on collected bowel sound data and achieves accurate fine-grained recognition of borborygmus events without manual extraction of audio features.
The invention provides the following technical scheme:
in one aspect, a method for fine-grained borborygmus recognition based on a deep neural network includes:
collecting and labeling abdominal auscultation recording data;
constructing a deep neural network model based on the Transformer structure, and training the deep neural network with the labeled abdominal auscultation recordings to obtain a final model;
and loading the final model and inputting the abdominal sound signal to be detected to obtain the corresponding borborygmus recognition result.
In a preferred embodiment, collecting and annotating the abdominal auscultation recording data includes: collecting the abdominal auscultation audio stream and splitting the auscultation recording into equal-length audio clips.
In a preferred embodiment, collecting and annotating the abdominal auscultation recording data includes: screening out from these clips the audio clips containing borborygmus events and those containing other sounds, including Gaussian noise, heartbeat sounds, breathing sounds, speech and stethoscope friction sounds, which respectively form data sets.
In a preferred embodiment, collecting and annotating the abdominal auscultation recording data includes: marking in detail the start and end times of the borborygmus event in each clip of the borborygmus event data set, with labeling precision down to a set number of milliseconds.
In a preferred embodiment, collecting and annotating the abdominal auscultation recording data includes: dividing the labeled borborygmus event data set into a training set, a validation set and a test set.
On the other hand, the deep neural network model is a fine-grained borborygmus recognition model built from a frame embedding module, a position coding module, a plurality of stacked Transformer encoders and a linear classifier;
the frame embedding module consists of a 1-dimensional convolution layer with multiple output channels;
the position coding module is responsible for automatically encoding the position information of the audio frame sequence and adding the encoded position information to the original input sequence through an addition operation;
each Transformer encoder consists of a layer normalization module, a multi-head self-attention module and a multi-layer perceptron module;
the linear classifier module consists of one layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
In a preferred embodiment, the multi-head self-attention module and the multi-layer perceptron module are each provided with a residual connection.
In a preferred embodiment, constructing the Transformer-based deep neural network model includes the steps of:
S301, the position coding module adds position information; the data then enter the layer normalization module, which normalizes the training set data so that the normalized data follow a normal distribution with mean 0 and standard deviation 1;
S302, the normalized data of step S301 enter the multi-head self-attention module, which outputs more abstract sequence features; the multi-head attention module consists of several parallel self-attention modules;
S303, the sequence features output in step S302 are layer-normalized again and enter the multi-layer perceptron module, which outputs more abstract classification features; the multi-layer perceptron module consists of two layers of linear neurons connected by a nonlinear activation function; the classification features pass through a residual module and high-level features are output;
S304, the high-level features enter the linear classifier module and are classified into borborygmus events and non-borborygmus events.
In a preferred embodiment, obtaining the corresponding borborygmus recognition result from the input abdominal sound signal includes: accurately identifying borborygmus events in the clinic with the trained deep neural network model, and calculating the number of borborygmus event occurrences from the model output using a non-maximum suppression algorithm.
In a third aspect, a fine-grained borborygmus recognition device based on a deep neural network includes a memory for storing executable instructions, and a processor for executing the instructions stored in the memory so as to implement the fine-grained borborygmus recognition method based on a deep neural network.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention automatically learns features of the audio signal: compared with other existing methods, the proposed neural network module can automatically learn how to extract optimal, highly abstract audio signal features from a large amount of abdominal auscultation data, thereby achieving a better recognition effect.
(2) After training, the proposed neural network model can directly process the raw audio stream to obtain a refined borborygmus recognition result; neural network models proposed by other methods require low-level audio features, such as frequency-domain histograms or Mel-frequency cepstral coefficients, to be manually extracted from the raw audio signal before the model can compute a coarse-grained borborygmus recognition result from those features.
(3) The proposed deep neural network is not based on the traditional CNN structure but on the Transformer structure designed specifically for sequence data; the proposed network structure can extract not only the local features of the borborygmus signal but also long-range dependency information between different sound signals, thereby better modeling the context of the sequence signal.
(4) The current mainstream techniques only achieve coarse-grained recognition of borborygmus events and cannot accurately locate the specific moment at which an event occurs, making them difficult to apply at scale in the real world; the proposed neural network model achieves fine-grained modeling of the abdominal auscultation signal through the Transformer structure, thereby enabling refined recognition of borborygmus events, which greatly assists clinical bowel sound auscultation.
Drawings
For a clearer description of the embodiments of the invention or of the prior art, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the invention, and that a person skilled in the art may obtain further drawings from them without inventive effort:
FIG. 1 is a flow chart of the present invention.
Fig. 2 is an overall structure of the neural network model.
Fig. 3 is a specific structure of a frame embedding module.
Fig. 4 is a specific structure of the position coding module.
Fig. 5 is the Transformer encoder module.
Fig. 6 is a specific structure of the linear classifier.
Description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to Figs. 1 to 6. The described embodiments should not be construed as limiting the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the invention.
Aiming at the defects of the prior art, the invention provides a fine-grained borborygmus recognition method based on a deep neural network, which is based on the Transformer structure, requires no manual feature extraction, and achieves accurate fine-grained recognition of borborygmus events through end-to-end training alone.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a borborygmus fine granularity identification method based on a deep neural network comprises the following steps:
(1) And collecting and marking data, collecting the desensitized belly auscultation record, and marking the starting and ending positions of the borborygmus event in detail.
(2) Creating a deep neural network based on a transducer structure, training a neural network model using the collected abdominal auscultatory recordings.
(3) Loading model to obtain corresponding bowel sound identification result according to inputted abdominal voice signal
The overall flow is as shown in FIG. 1:
further, in step (1), the abdominal auscultation recording data for training the deep neural network model are collected and labeled as follows:
(11) Record abdominal auscultation audio streams, divide the auscultation recordings into equal-length audio clips, and store them in WAV format to facilitate subsequent training.
(12) Screen out audio clips containing bowel sound events and clips containing other sounds (Gaussian noise, heartbeat sounds, breathing sounds, speech, and other noise such as stethoscope friction) to form the data sets.
(13) Mark in detail the start and end times of the borborygmus event in each sound clip; the labeling accuracy is 10 milliseconds, and noise that is not borborygmus is left unmarked.
(14) Divide the labeled data set into a training set, a validation set and a test set.
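The split of step (14) can be sketched in Python. The 80/10/10 ratio and the fixed seed are assumptions for illustration; the patent does not state the split proportions.

```python
import random

def split_dataset(clips, train_frac=0.8, val_frac=0.1, seed=42):
    """Split labeled audio clips into train/validation/test subsets.

    `clips` is any list of (audio, label) items; the fractions here are
    illustrative -- the patent does not specify the split ratio.
    """
    rng = random.Random(seed)
    shuffled = clips[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```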
Further, in step (2), a Transformer-based deep neural network is created and the collected abdominal auscultation recordings are used to train the neural network model, specifically as follows:
(21) First, in order to achieve fine-grained division of borborygmus events, the input audio data are divided into several equal-sized "audio frames" in 10-millisecond steps; this division granularity is the same as the granularity of the data labeling.
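The framing of step (21) can be sketched as follows. The 10 ms step matches the text, while the 8 kHz sample rate is an assumption (the patent does not state one).

```python
def frame_audio(samples, sample_rate=8000, frame_ms=10):
    """Split a 1-D audio signal into equal, non-overlapping frames.

    frame_ms matches the 10 ms labeling granularity described in the text;
    the sample rate is an assumption. Trailing samples that do not fill a
    whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

frames = frame_audio([0.0] * 8000)   # 1 s of silence at 8 kHz
```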
(22) A deep neural network model is then constructed: a fine-grained borborygmus recognition model is built from a frame embedding module, a position encoding module, a plurality of stacked Transformer encoders and a linear classifier, as shown in fig. 2. Each module is described in detail as follows:
(i) The frame embedding module consists of a 1-dimensional convolution layer with multiple output channels. As shown in fig. 3, the module takes a single audio frame as input, outputs several feature vectors of the audio frame after the convolution calculation, and concatenates the computed feature vectors into one one-dimensional feature vector. Specifically, assume the input data are $X \in \mathbb{R}^{N \times C \times F}$, where $N$, $C$ and $F$ respectively denote the number of audio frames, the number of channels per audio frame and the number of features. The audio features $E$ output by the convolution layer can be expressed as:

$$E = \mathrm{Conv1D}(X), \qquad E \in \mathbb{R}^{N \times K \times F'}$$

where $\mathrm{Conv1D}$ denotes a one-dimensional multichannel convolution operation, and $K$ and $F'$ respectively denote the number of convolution kernels in the convolution layer and the number of features in each vector obtained after convolution. The $K$ feature vectors of each audio frame are then concatenated into a single one-dimensional feature vector, giving the final frame embedding $E' \in \mathbb{R}^{N \times (K \cdot F')}$. In this module, the 1-dimensional convolution layer can be regarded as a set of learnable filters, so the module replaces the feature extraction step of the prior art and automatically learns to extract shallow features of the audio frames during end-to-end training.
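A minimal NumPy sketch of the frame-embedding idea: a 1-D multichannel convolution whose per-frame outputs are concatenated into one vector. The kernel count and width are illustrative, not the patent's values; a real implementation would use a deep-learning framework's learnable convolution layer.

```python
import numpy as np

def frame_embedding(x, kernels):
    """Frame embedding via 1-D multichannel convolution (a sketch).

    x       : (N, C, F)  -- N frames, C channels, F samples per frame
    kernels : (K, C, W)  -- K 1-D filters of width W (learnable in training)
    Returns an (N, K * F') matrix: the per-frame convolution outputs,
    concatenated into one vector per frame, as the module describes.
    """
    n, c, f = x.shape
    k, _, w = kernels.shape
    f_out = f - w + 1                          # "valid" convolution length
    out = np.zeros((n, k, f_out))
    for i in range(n):
        for j in range(k):
            for t in range(f_out):
                out[i, j, t] = np.sum(x[i, :, t:t + w] * kernels[j])
    return out.reshape(n, k * f_out)           # concatenate feature vectors

rng = np.random.default_rng(0)
emb = frame_embedding(rng.standard_normal((4, 1, 80)),
                      rng.standard_normal((8, 1, 16)))
```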
(ii) The position coding module is responsible for automatically encoding the position information of the audio frame sequence and adding the encoded position information to the original input sequence through an addition operation. As shown in fig. 4, the position coding module consists of a set of learnable parameters with the same dimensions as the input audio frame sequence, so that the optimal position encoding can be learned automatically during end-to-end training. Specifically, assume the input data are $X \in \mathbb{R}^{N \times D}$, where $D$ denotes the number of features of the input data; the corresponding position code is $P \in \mathbb{R}^{N \times D}$, and the output after position encoding is $X' = X + P$.
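The addition operation above can be sketched directly; in training, `pos_embed` would be a learnable parameter updated by gradient descent, and the plain array here merely stands in for those parameters.

```python
import numpy as np

def add_position_encoding(x, pos_embed):
    """Add a learnable position code to the frame sequence.

    Both arrays have shape (N, D), matching the description that the
    parameter dimensions equal those of the input audio frame sequence.
    """
    assert x.shape == pos_embed.shape
    return x + pos_embed

pos = np.arange(12.0).reshape(3, 4)    # stand-in for learned parameters
out = add_position_encoding(np.zeros((3, 4)), pos)
```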
(iii) As shown in fig. 5, the Transformer encoder consists of a layer normalization module, a multi-head self-attention module, a residual module and a multi-layer perceptron module. To further extract high-level features of borborygmus events and improve recognition accuracy, the invention stacks three Transformer encoders. Each sub-module is described in detail below:
first, the input data is submitted to a layer normalization module. The layer normalization module is responsible for normalizing the input data, so that normalized data meets normal distribution with a mean value of 0 and a standard deviation of 1.
The normalized data are then input into the multi-head self-attention module, which is made up of several parallel self-attention modules. The self-attention module imitates the attentive behavior of a person interacting with the outside world: during its operation a large amount of irrelevant interference information is removed from the sequence, yielding a better recognition effect and higher training efficiency. Compared with the convolutional neural network models heavily used in the prior art, the self-attention module is a neural network module designed specifically for sequence data and can better extract features from it, especially long-distance dependencies, so the invention can achieve better recognition results than the prior art.
Specifically, for input data $X$, the self-attention module uses linear transformation matrices $W^Q$, $W^K$ and $W^V$ to map $X$ to the corresponding query vectors $Q$, key vectors $K$ and value vectors $V$, whose feature numbers are $D_q$, $D_k$ and $D_v$ respectively; in the present invention $D_q = D_k = D_v = 64$. The self-attention module then computes the similarity between the query vectors $Q$ and the key vectors $K$ to obtain the relevance between different audio frames, and finally takes a weighted sum of all value vectors $V$ according to the relevance scores to obtain a new feature representation. The similarity calculation used in the invention is scaled dot-product attention based on the softmax function, with the formula:

$$Z_i = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{D_k}}\right)V$$

where $Z_i$ denotes the feature vector computed by the $i$-th self-attention module; $Q$, $K$ and $V$ denote the query, key and value vectors respectively; $D_k$ denotes the dimension of the key vectors; and softmax denotes the normalized exponential function. The multi-head self-attention module then concatenates the audio features computed by the individual self-attention modules along the feature direction to obtain a new feature vector $Z = [Z_1; Z_2; \dots; Z_h]$, where $h$ is the number of heads, and maps $Z$ through a linear transformation matrix $W^O$ to the final output feature vector $Y$, i.e. $Y = Z W^O$.
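A NumPy sketch of the scaled dot-product multi-head self-attention described above; the head count and per-head dimension below are illustrative (the text fixes the per-head feature number at 64).

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable normalized exponential function."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo):
    """Scaled dot-product multi-head self-attention (sketch).

    x          : (N, D) frame sequence
    wq, wk, wv : (H, D, Dk) per-head projection matrices
    wo         : (H * Dk, D) output projection
    """
    heads = []
    dk = wk.shape[-1]
    for h in range(wq.shape[0]):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        scores = softmax(q @ k.T / np.sqrt(dk))   # (N, N) frame relevance
        heads.append(scores @ v)                  # weighted sum of values
    z = np.concatenate(heads, axis=-1)            # concat along features
    return z @ wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
y = multi_head_self_attention(
    x,
    rng.standard_normal((2, 16, 8)),
    rng.standard_normal((2, 16, 8)),
    rng.standard_normal((2, 16, 8)),
    rng.standard_normal((16, 16)),
)
```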
Then, in order to further extract abstract features of the borborygmus event, the invention layer-normalizes the features computed by the multi-head self-attention module once more and passes the normalized data to the multi-layer perceptron module. The multi-layer perceptron module consists of two layers of linear neurons connected by a GeLU nonlinearity. As shown in the figure, the module can be expressed as:

$$\mathrm{MLP}(Z') = \mathrm{GeLU}\big(\mathrm{LayerNorm}(Z')\,W_0 + b_0\big)\,W_1 + b_1$$

where $Z'$ is the abstract feature computed by the multi-head self-attention module, serving as the input vector of the multi-layer perceptron; LayerNorm denotes the layer normalization module; and GeLU denotes the nonlinear activation function, with calculation formula

$$\mathrm{GeLU}(x) = x\,\Phi(x) \approx 0.5x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(x + 0.044715x^{3}\big)\right)\right),$$

where $\Phi$ is the standard normal cumulative distribution function. $W_0$ and $W_1$ denote the linear mapping matrices in the first- and second-layer linear neurons respectively, and $b_0$ and $b_1$ denote the bias terms in the first- and second-layer linear neurons respectively.
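The two-layer perceptron with GeLU can be sketched as follows, using the common tanh approximation of GeLU; all shapes are illustrative.

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GeLU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

def mlp(x, w0, b0, w1, b1):
    """Two linear layers joined by a GeLU nonlinearity, as described."""
    return gelu(x @ w0 + b0) @ w1 + b1

out = mlp(np.ones((2, 4)), np.full((4, 8), 0.1), np.zeros(8),
          np.full((8, 3), 0.1), np.zeros(3))
```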
Finally, to cope with the vanishing gradient problem of deep neural networks during training, residual connections are introduced around the multi-head self-attention module and the multi-layer perceptron module respectively.
To summarize, assume the input of the $l$-th Transformer encoder is $X^{(l-1)}$; the module can be represented as:

$$Y^{(l)} = X^{(l-1)} + \mathrm{MSA}\big(\mathrm{LayerNorm}(X^{(l-1)})\big)$$
$$X^{(l)} = Y^{(l)} + \mathrm{MLP}\big(\mathrm{LayerNorm}(Y^{(l)})\big)$$

where LayerNorm denotes the layer normalization module applied to the module input. In the multi-head self-attention (MSA), $W^Q$, $W^K$ and $W^V$ denote the linear mapping matrices of the multi-head self-attention mechanism, responsible for linearly transforming the input vector into the corresponding query, key and value vectors; $D_k$ denotes the dimension of the key vectors; softmax denotes the normalized exponential function; and $W^O$ is the further linear mapping matrix in the multi-head self-attention module, responsible for mapping the feature vectors computed by the self-attention mechanism into the output domain. In the MLP, GeLU denotes the nonlinear activation function, $W_0$ and $W_1$ denote the linear mapping matrices in the first- and second-layer linear neurons of the multi-layer perceptron respectively, and $b_0$ and $b_1$ denote their bias terms. $X^{(l)}$ denotes the final output of the $l$-th Transformer encoder and has the same dimensions as the input.
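The encoder composition described above can be sketched as one pre-norm block with both residual connections; it is single-head and unbatched for brevity, and all shapes are illustrative rather than the patent's values.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit standard deviation."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def encoder_block(x, wq, wk, wv, wo, w0, b0, w1, b1):
    """One pre-norm Transformer encoder step with residual connections."""
    h = layer_norm(x)
    q, k, v = h @ wq, h @ wk, h @ wv
    a = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(a - a.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)           # softmax over frames
    y = x + (a @ v) @ wo                            # residual connection 1
    g = layer_norm(y) @ w0 + b0
    g = 0.5 * g * (1 + np.tanh(np.sqrt(2 / np.pi)   # GeLU (tanh approx.)
                               * (g + 0.044715 * g ** 3)))
    return y + g @ w1 + b1                          # residual connection 2

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 8))
wq, wk, wv, wo = (rng.standard_normal((8, 8)) for _ in range(4))
out = encoder_block(x, wq, wk, wv, wo,
                    rng.standard_normal((8, 16)), np.zeros(16),
                    rng.standard_normal((16, 8)), np.zeros(8))
```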
(iv) Finally, as shown in fig. 6, the linear classifier module consists of one layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
As can be seen from the above, because the Transformer encoder used in the invention can effectively compute the correlation between each audio frame and the other frames, and thereby extract high-level features of each audio frame, the proposed deep neural network can identify each audio frame fully and carefully, achieving fine-grained segmentation of borborygmus events and precisely locating the boundary of each borborygmus event.
(23) The deep neural network model constructed in the previous step is then trained using the collected abdominal auscultation data. The present invention trains with stochastic gradient descent (SGD) and uses the cross-entropy function as the training objective. Specifically, let θ denote all parameters of the deep neural network model constructed in the present invention; at the start, all parameters of the model are randomly initialized. The forward computation stage then begins: 512 abdominal auscultation segments are drawn at random from the training data set without replacement, the neural network model to be trained predicts these 512 samples, and the objective function computes the loss between the model predictions and the correct labels. The gradient ∇_θ L of the loss with respect to the model parameters θ is then computed by back-propagation, and the parameters are updated by the following formula:

θ_new = θ_old − η · ∇_θ L

wherein η denotes the learning rate, θ_old the parameters before the update, and θ_new the updated parameters. This completes one training iteration. Iterations are repeated until all data in the training set have been drawn. At that point all data in the validation set are taken, the partially trained model predicts on the validation set, and metrics such as the prediction loss and prediction accuracy of the model are computed to evaluate its quality. If the model does not meet the requirements, the above training steps are repeated until its performance on the validation set reaches the expected level, and the model parameters are then saved. In the present invention, the learning rate used during training is 0.1, the momentum is 0.9, and the weight decay is 0.02.
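A single parameter update with the hyper-parameters quoted above (learning rate 0.1, momentum 0.9, weight decay 0.02) can be sketched in pure Python. The momentum/weight-decay formulation below follows the common deep-learning convention and is an assumption on our part; the patent itself states only the plain step θ_new = θ_old − η·∇L.

```python
def sgd_step(theta, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.02):
    """One SGD-with-momentum update for a flat list of parameters.

    Weight decay is folded into the gradient (g + weight_decay * t), and the
    velocity accumulates a momentum-weighted running gradient — the usual
    convention, assumed here since the patent only gives the plain step.
    """
    new_velocity = [momentum * v + g + weight_decay * t
                    for v, g, t in zip(velocity, grad, theta)]
    new_theta = [t - lr * v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity

# One update for a single scalar parameter, starting from zero velocity.
theta, vel = sgd_step([1.0], [0.5], [0.0])
print(round(theta[0], 6))  # 0.948, i.e. 1.0 - 0.1 * (0.5 + 0.02 * 1.0)
```

One epoch then consists of repeating this step over 512-sample batches drawn without replacement until the training set is exhausted, followed by an evaluation pass over the validation set.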
(24) The trained neural network model is used to accurately identify borborygmus events in clinical practice; related parameters such as the number of occurrences and the frequency of borborygmus events are then computed from the identification results to assist physicians in diagnosing disease.
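Step (24) turns frame-level decisions into event-level statistics. The sketch below assumes binary per-frame labels from the classifier; merging runs of consecutive positive frames into events is a simplified stand-in for the patent's non-maximum-suppression post-processing, and the function name is illustrative.

```python
def frames_to_events(frame_labels):
    """Merge runs of consecutive positive frames into (start, end) index pairs.

    frame_labels: per-frame 0/1 decisions from the linear classifier.
    Returns a list of half-open (start_frame, end_frame) event boundaries;
    multiply by the frame hop duration to obtain times in seconds.
    """
    events, start = [], None
    for i, label in enumerate(frame_labels):
        if label == 1 and start is None:
            start = i                          # an event opens
        elif label == 0 and start is not None:
            events.append((start, i))          # the event closes
            start = None
    if start is not None:                      # event runs to the clip end
        events.append((start, len(frame_labels)))
    return events

labels = [0, 1, 1, 1, 0, 0, 1, 1, 0]
events = frames_to_events(labels)
print(events)       # [(1, 4), (6, 8)]
print(len(events))  # 2 borborygmus events in this clip
```

Event counts per unit of recording time then give the occurrence frequency referenced in the description.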
A device for identifying fine-grained borborygmus based on a deep neural network comprises a memory for storing executable instructions and a processor for executing the executable instructions stored in the memory, thereby implementing the fine-grained borborygmus identification method based on the deep neural network.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for identifying fine-grained borborygmus based on a deep neural network, characterized in that the method comprises:
collecting and labeling abdominal auscultation recording data;
constructing a deep neural network model based on a Transformer structure, and training the deep neural network with the labeled abdominal auscultation recording data to obtain a final model; and
loading the final model and inputting the abdominal sound signal to be detected, to obtain the corresponding borborygmus identification result.
2. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 1, wherein collecting and labeling the abdominal auscultation recording data comprises: collecting the abdominal auscultation audio stream and splitting the auscultation recording into audio clips of equal length.
3. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 2, wherein collecting and labeling the abdominal auscultation recording data comprises: screening from the audio clips those containing borborygmus events and those containing other sounds, including Gaussian noise, heartbeat sounds, breathing sounds, speech and stethoscope friction sounds, and forming respective data sets.
4. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 3, wherein collecting and labeling the abdominal auscultation recording data comprises: marking in detail the start and end times of the borborygmus event in each sound clip of the borborygmus event data set, with labeling precision at the set millisecond level.
5. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 4, wherein collecting and labeling the abdominal auscultation recording data comprises: dividing the labeled borborygmus event data set into three parts: a training set, a validation set and a test set.
6. The method for identifying fine-grained borborygmus based on a deep neural network according to any one of claims 1 to 5, wherein the deep neural network model is a fine-grained borborygmus identification model constructed from a frame embedding module, a position encoding module, a plurality of stacked Transformer encoders and a linear classifier;
the frame embedding module consists of a 1-dimensional convolution layer with multiple output channels;
the position encoding module is responsible for automatically encoding the position information of the audio frame sequence and adding the encoded position information to the original input sequence through an addition operation;
the Transformer encoder consists of a layer normalization module, a multi-head self-attention module and a multi-layer perceptron module;
the linear classifier module consists of a single layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
7. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 6, wherein the multi-head self-attention module and the multi-layer perceptron module are provided with residual modules.
8. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 7, wherein constructing the deep neural network model based on a Transformer structure comprises the following steps:
S301, adding position information through the position encoding module, then entering the layer normalization module and performing a normalization operation on the training set data, so that the normalized data follow a normal distribution with mean 0 and standard deviation 1;
S302, feeding the data normalized in step S301 into the multi-head self-attention module, which outputs sequence features; the multi-head attention module consists of a plurality of parallel self-attention modules;
S303, applying layer normalization again to the sequence features output in step S302, then entering the multi-layer perceptron module, which outputs classification features; the multi-layer perceptron module consists of two layers of linear neurons connected by a nonlinear activation function; the classification features pass through a residual module, after which high-level features are output;
S304, feeding the high-level features into the linear classifier module, where they are classified into borborygmus events and non-borborygmus events.
9. The method for identifying fine-grained borborygmus based on a deep neural network according to claim 1, wherein inputting the abdominal sound signal to obtain the corresponding borborygmus identification result comprises: accurately identifying borborygmus events in clinical practice using the trained deep neural network model, and calculating the occurrence frequency of borborygmus events using a non-maximum suppression algorithm.
10. A device for identifying fine-grained borborygmus based on a deep neural network, characterized by comprising
A memory: for storing executable instructions;
a processor: for executing executable instructions stored in said memory, implementing a deep neural network based fine-grained borborygmus recognition method according to any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310335591.7A CN116052725B (en) | 2023-03-31 | 2023-03-31 | Fine granularity borborygmus recognition method and device based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116052725A true CN116052725A (en) | 2023-05-02 |
CN116052725B CN116052725B (en) | 2023-06-23 |
Family
ID=86127645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310335591.7A Active CN116052725B (en) | 2023-03-31 | 2023-03-31 | Fine granularity borborygmus recognition method and device based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116052725B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1952852A (en) * | 2006-10-13 | 2007-04-25 | 杨益民 | Voice pick device of circumstance around foetus |
CN104305961A (en) * | 2014-10-20 | 2015-01-28 | 清华大学 | Bowel sounds monitoring and recognizing system |
CN106021948A (en) * | 2016-05-30 | 2016-10-12 | 清华大学 | Signal processing method for borborygmus signal monitoring system |
CN106328150A (en) * | 2016-08-18 | 2017-01-11 | 北京易迈医疗科技有限公司 | Bowel sound detection method, device and system under noisy environment |
CN109620154A (en) * | 2018-12-21 | 2019-04-16 | 平安科技(深圳)有限公司 | Borborygmus voice recognition method and relevant apparatus based on deep learning |
CN110432924A (en) * | 2019-08-06 | 2019-11-12 | 杭州智团信息技术有限公司 | Borborygmus sound detection device, method and electronic equipment |
CN113674734A (en) * | 2021-08-24 | 2021-11-19 | 中国铁道科学研究院集团有限公司电子计算技术研究所 | Information query method, system, equipment and storage medium based on voice recognition |
CN114023316A (en) * | 2021-11-04 | 2022-02-08 | 匀熵科技(无锡)有限公司 | TCN-Transformer-CTC-based end-to-end Chinese voice recognition method |
CN114283791A (en) * | 2021-11-30 | 2022-04-05 | 广东电力信息科技有限公司 | Speech recognition method based on high-dimensional acoustic features and model training method |
CN114305484A (en) * | 2021-12-15 | 2022-04-12 | 浙江大学医学院附属儿童医院 | Heart disease heart sound intelligent classification method, device and medium based on deep learning |
CN115206347A (en) * | 2021-04-13 | 2022-10-18 | 浙江荷清柔性电子技术有限公司 | Method and device for identifying bowel sounds, storage medium and computer equipment |
Non-Patent Citations (2)
Title |
---|
ASHISH VASWANI 等: ""Attention is all you need"", 《HTTPS://ARXIV.ORG/ABS/1706.03762》, pages 1 - 15 * |
Y. HUANG: ""PhysioVec: IoT Biosignal Based Search Engine for Gastrointestinal Health"", 《2022 7TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (ICCIA)》, vol. 1, pages 230 - 236 * |
Also Published As
Publication number | Publication date |
---|---|
CN116052725B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110457432B (en) | Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium | |
CN111161715B (en) | Specific sound event retrieval and positioning method based on sequence classification | |
CN111429943B (en) | Joint detection method for music and relative loudness of music in audio | |
CN113806609A (en) | Multi-modal emotion analysis method based on MIT and FSM | |
CN114783418B (en) | End-to-end voice recognition method and system based on sparse self-attention mechanism | |
CN111986699A (en) | Sound event detection method based on full convolution network | |
CN115393968A (en) | Audio-visual event positioning method fusing self-supervision multi-mode features | |
Kohlsdorf et al. | An auto encoder for audio dolphin communication | |
Lu et al. | Temporal Attentive Pooling for Acoustic Event Detection. | |
CN112735466A (en) | Audio detection method and device | |
CN116189671B (en) | Data mining method and system for language teaching | |
CN116052725B (en) | Fine granularity borborygmus recognition method and device based on deep neural network | |
CN112052880A (en) | Underwater sound target identification method based on weight updating support vector machine | |
CN115376547B (en) | Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium | |
CN113488069B (en) | Speech high-dimensional characteristic rapid extraction method and device based on generation type countermeasure network | |
CN115472182A (en) | Attention feature fusion-based voice emotion recognition method and device of multi-channel self-encoder | |
Sharma et al. | Comparative analysis of various feature extraction techniques for classification of speech disfluencies | |
Anindya et al. | Development of Indonesian speech recognition with deep neural network for robotic command | |
Wilkinghoff et al. | TACos: Learning temporally structured embeddings for few-shot keyword spotting with dynamic time warping | |
CN112951270B (en) | Voice fluency detection method and device and electronic equipment | |
CN114298019A (en) | Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product | |
CN113936663A (en) | Method for detecting difficult airway, electronic device and storage medium thereof | |
CN113488027A (en) | Hierarchical classification generated audio tracing method, storage medium and computer equipment | |
CN109190556B (en) | Method for identifying notarization will authenticity | |
JP2017134321A (en) | Signal processing method, signal processing device, and signal processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||