CN116052725A - Fine granularity borborygmus recognition method and device based on deep neural network - Google Patents

Fine granularity borborygmus recognition method and device based on deep neural network

Info

Publication number
CN116052725A
CN116052725A (application CN202310335591.7A)
Authority
CN
China
Prior art keywords
neural network
module
borborygmus
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310335591.7A
Other languages
Chinese (zh)
Other versions
CN116052725B (en)
Inventor
胡兵
刘瑞德
黄凯得
冯亦龙
袁湘蕾
刘伟
林怡秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University
Priority to CN202310335591.7A
Publication of CN116052725A
Application granted
Publication of CN116052725B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/48 Other medical applications
    • A61B5/4803 Speech analysis specially adapted for diagnostic purposes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/02 Stethoscopes
    • A61B7/04 Electric stethoscopes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Signal Processing (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Epidemiology (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for fine-grained borborygmus recognition based on a deep neural network, mainly solving two problems of the prior art: a great amount of relevant information is lost when 1-dimensional audio data are converted into a 2-dimensional feature map, and the specific moment at which borborygmus occurs cannot be accurately located because the signal features depend heavily on manual extraction. The fine-grained borborygmus recognition method based on the deep neural network collects and labels abdominal auscultation recording data; constructs a deep neural network model based on the Transformer structure and trains the deep neural network with the labeled abdominal auscultation recording data to obtain a final model; and loads the final model so that the input abdominal sound signal yields the corresponding borborygmus recognition result. With this scheme, accurate fine-grained recognition of borborygmus events can be achieved through end-to-end training alone, without manual feature extraction.

Description

Fine granularity borborygmus recognition method and device based on deep neural network
Technical Field
The invention relates to the technical field of deep neural networks, and in particular to a method and a device for fine-grained borborygmus recognition based on a deep neural network.
Background
Human bowel sounds (BS) are sounds produced by peristaltic movement of the intestines pushing food, liquids and gases. Many studies have shown that borborygmus is closely related to the functional status of the gastrointestinal tract, and auscultation of borborygmus is helpful in diagnosing functional bowel diseases such as irritable bowel syndrome and functional constipation; abdominal borborygmus auscultation is therefore an important examination item in clinical work.
Prior art schemes fall into two main categories: methods based on conventional signal processing and methods based on deep learning. Their disadvantages are as follows. (1) Most existing methods rely on manually extracted signal features, but because of the complexity of borborygmus these hand-crafted features are often suboptimal, which limits the recognition performance of the subsequent classification algorithm. (2) The current mainstream algorithms are based on Convolutional Neural Networks (CNNs), which are designed for 2-dimensional image data; sound signals such as bowel sounds are 1-dimensional sequence data whose characteristics differ greatly from those of images, so the bowel sounds must first be converted into 2-dimensional feature maps, and a large amount of relevant information is lost in this conversion. (3) The prior art only achieves coarse-grained recognition of borborygmus events: it judges whether a long abdominal sound segment contains a borborygmus event, but cannot accurately locate the specific moment at which an event occurs, nor determine related parameters such as the number and frequency of occurrences, which are critical to the diagnosis of intestinal diseases.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a fine-grained borborygmus recognition method based on a deep neural network, which trains a novel Transformer-based borborygmus recognition neural network model on collected borborygmus data and achieves fine-grained, accurate recognition of borborygmus events without manual extraction of audio features.
The invention provides the following technical scheme:
In one aspect, a method for fine-grained borborygmus recognition based on a deep neural network includes:
collecting and labeling abdominal auscultation recording data;
constructing a deep neural network model based on the Transformer structure, and training the deep neural network with the labeled abdominal auscultation recording data to obtain a final model;
and loading the final model and inputting the abdominal sound signal to be detected to obtain the corresponding borborygmus recognition result.
In a preferred embodiment, collecting and labeling the abdominal auscultation recording data includes: collecting the abdominal auscultation audio stream and splitting the auscultation recording into audio pieces of equal length.
In a preferred embodiment, collecting and labeling the abdominal auscultation recording data includes: screening from the audio clips those containing borborygmus events and those containing other sounds, including Gaussian noise, heart beat sounds, breathing sounds, speaking sounds and stethoscope friction sounds, to form separate data sets.
In a preferred embodiment, collecting and labeling the abdominal auscultation recording data includes: marking in detail the starting and ending times of the borborygmus event in each sound clip of the borborygmus event data set, with labeling precision at the set millisecond level.
In a preferred embodiment, collecting and labeling the abdominal auscultation recording data includes: dividing the labeled borborygmus event data set into a training set, a verification set and a test set.
On the other hand, the deep neural network model is a fine-grained borborygmus recognition model constructed from a frame embedding module, a position encoding module, a plurality of stacked Transformer encoders and a linear classifier;
the frame embedding module is composed of a 1-dimensional convolution layer with multiple output channels;
the position coding module is responsible for automatically coding the position information of the audio frame sequence and adding the coded position information into the original input sequence through addition operation;
the Transformer encoder is composed of a layer normalization module, a multi-head self-attention module and a multi-layer perceptron module;
the linear classifier module is composed of one layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
In a preferred embodiment, the multi-head self-attention module and the multi-layer perceptron module are provided with residual modules.
In a preferred embodiment, constructing the Transformer-based deep neural network model includes the following steps:
S301, the position encoding module adds position information; the data then enter a layer normalization module, which performs a normalization operation on the training set data so that the normalized data follow a normal distribution with mean 0 and standard deviation 1;
S302, the normalized data from step S301 enter a multi-head self-attention module, which outputs more abstract sequence features; the multi-head attention module is composed of a plurality of parallel self-attention modules;
S303, the sequence features output in step S302 are layer-normalized again and enter a multi-layer perceptron module, which outputs more abstract classification features; the multi-layer perceptron module consists of two layers of linear neurons connected in the middle by a nonlinear activation function; the classification features pass through a residual module, after which advanced features are output;
S304, the advanced features enter a linear classifier module and are classified into borborygmus events and non-borborygmus events.
In a preferred embodiment, obtaining the corresponding borborygmus recognition result from the input abdominal sound signal includes: accurately recognizing borborygmus events in the clinic using the trained deep neural network model, and calculating the occurrence frequency of the borborygmus events from the output of the deep neural network model using a non-maximum suppression algorithm.
In a third aspect, a device for fine-grained borborygmus recognition based on a deep neural network includes a memory for storing executable instructions and a processor for executing the executable instructions stored in the memory, thereby implementing the fine-grained borborygmus recognition method based on the deep neural network.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention can automatically learn the characteristics of the audio signal. Compared with other existing methods, the neural network module provided by the invention automatically learns how to extract optimal, highly abstract audio signal features from a large amount of abdominal auscultation data, and thus achieves a better recognition effect.
(2) After training, the neural network model provided by the invention can directly process the original audio stream to obtain a refined borborygmus recognition result. The neural network models proposed by other methods require low-level audio features, such as frequency-domain histograms or Mel-frequency cepstral coefficients, to be extracted manually from the original audio signal before the model can compute a recognition result, and that result is only coarse-grained.
(3) The deep neural network provided by the invention is not based on the traditional CNN structure but on the Transformer structure, which is designed specifically for sequence data. The proposed network structure not only extracts the local response characteristics of the borborygmus signal but also effectively extracts long-range dependency information between different sound signals, so the context of the sequence signal is modeled better.
(4) The current mainstream technology only achieves coarse-grained recognition of borborygmus events and cannot accurately locate the specific moment at which an event occurs, which makes large-scale real-world application difficult. The neural network model provided by the invention achieves fine-grained modeling of the abdominal auscultation signal through the Transformer structure, and thereby refined recognition of borborygmus events, which greatly helps clinical borborygmus auscultation.
Drawings
For a clearer description of the embodiments of the invention or of the prior art, the drawings needed for that description are briefly introduced below. The drawings described below are only some embodiments of the invention; for a person skilled in the art, further drawings may be obtained from them without inventive effort.
FIG. 1 is a flow chart of the present invention.
Fig. 2 is an overall structure of the neural network model.
Fig. 3 is a specific structure of a frame embedding module.
Fig. 4 is a specific structure of the position coding module.
Fig. 5 is a specific structure of the Transformer encoder module.
Fig. 6 is a specific structure of the linear classifier.
Description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below with reference to figs. 1 to 6. The described embodiments should not be construed as limiting the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the invention.
Aiming at the defects of the prior art, the invention provides a fine-grained borborygmus recognition method based on a deep neural network. It is based on the Transformer structure, requires no manual feature extraction, and achieves fine-grained, accurate recognition of borborygmus events through end-to-end training alone.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a borborygmus fine granularity identification method based on a deep neural network comprises the following steps:
(1) Data collection and labeling: collect desensitized abdominal auscultation recordings and mark the starting and ending positions of each borborygmus event in detail.
(2) Create a deep neural network based on the Transformer structure and train the neural network model using the collected abdominal auscultation recordings.
(3) Load the model to obtain the corresponding bowel sound recognition result from the input abdominal sound signal.
The overall flow is shown in FIG. 1:
Further, in step (1), the abdominal auscultation recording data for training the deep neural network model are collected and labeled by the following steps:
(11) Record and collect abdominal auscultation audio streams, divide the auscultation recordings into audio clips of equal length, and store them in wav format to facilitate subsequent training.
(12) Screen out the audio clips containing bowel sound events and those containing other sounds (Gaussian noise, heart beat sounds, breathing sounds, talking sounds, and other noise such as stethoscope friction sounds) to form the data sets.
(13) Mark in detail the beginning and ending times of the borborygmus event in each sound clip, with a labeling accuracy of 10 milliseconds; other, non-borborygmus noise is not marked.
(14) Divide the labeled data set into a training set, a verification set and a test set. A sketch of these preprocessing steps follows.
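For illustration, steps (11) and (14) could look as follows in Python. This is a minimal sketch, not the patent's own code: the 60-second clip length, the file naming and the 8:1:1 split ratio are assumptions, and the soundfile package is used for wav I/O.

import random
import soundfile as sf

def split_recording(wav_path, clip_seconds=60, out_prefix="clip"):
    # step (11): split one auscultation recording into equal-length wav clips
    audio, sr = sf.read(wav_path)
    samples_per_clip = int(clip_seconds * sr)
    n_clips = len(audio) // samples_per_clip
    for i in range(n_clips):
        piece = audio[i * samples_per_clip:(i + 1) * samples_per_clip]
        sf.write(f"{out_prefix}_{i:04d}.wav", piece, sr)
    return n_clips

def split_dataset(clip_paths, ratios=(0.8, 0.1, 0.1), seed=0):
    # step (14): random split into training, verification and test sets
    clips = list(clip_paths)
    random.Random(seed).shuffle(clips)
    n_train = int(ratios[0] * len(clips))
    n_val = int(ratios[1] * len(clips))
    return clips[:n_train], clips[n_train:n_train + n_val], clips[n_train + n_val:]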
Further, in the step (2), a deep neural network based on a transducer structure is created, and the collected belly auscultation recording is used for training a neural network model, which specifically comprises the following steps:
(21) First, in order to achieve fine-grained division of borborygmus events, the input audio data are divided into a number of equal-sized "audio frames" in 10-millisecond steps, the same fineness as that of the data labeling, as sketched below.
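A sketch of this framing step, assuming a mono signal array audio with sampling rate sr and non-overlapping frames (the names are illustrative):

import numpy as np

def frame_audio(audio, sr, frame_ms=10):
    # cut a 1-D signal into consecutive 10 ms frames, matching the labeling fineness
    hop = int(sr * frame_ms / 1000)            # samples per 10 ms frame
    n_frames = len(audio) // hop
    return np.asarray(audio[:n_frames * hop]).reshape(n_frames, hop)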
(22) Subsequently, the deep neural network model is constructed: a fine-grained borborygmus recognition model is built from a frame embedding module, a position encoding module, a plurality of stacked Transformer encoders and a linear classifier, as shown in fig. 2. The specific description of each module is as follows:
(i) The frame embedding module is formed by a 1-dimensional convolution layer with multiple output channels. As shown in fig. 3, the module takes a single audio frame as input, outputs several feature vectors for that frame after the convolution calculation, and splices the calculated vectors into one one-dimensional feature vector. Specifically, assume the input data is $X \in \mathbb{R}^{N \times C \times L}$, where $N$, $C$ and $L$ respectively denote the number of audio frames, the number of channels per audio frame and the number of features per frame. The audio features output by the convolution layer, $F$, can be expressed as:

$$F = \mathrm{Conv1D}(X)$$

where $\mathrm{Conv1D}$ denotes a one-dimensional multichannel convolution operation, $F \in \mathbb{R}^{N \times K \times D}$, and $K$ and $D$ respectively denote the number of convolution operators in the convolution layer and the feature number of each vector obtained after convolution. The $K$ feature vectors of each audio frame are then spliced into a single one-dimensional vector, giving the final frame embedding result $E \in \mathbb{R}^{N \times (K \cdot D)}$. In this module, the 1-dimensional convolution layer can be regarded as a set of learnable filters, so the module replaces the feature extraction step of the prior art and automatically learns to extract shallow features of an audio frame during end-to-end training.
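A minimal PyTorch sketch of such a frame embedding module is given below. The filter count K, kernel size and stride are illustrative assumptions, since the concrete values of the invention appear only in its figures.

import torch
import torch.nn as nn

class FrameEmbedding(nn.Module):
    # a single 1-D multichannel convolution layer acting as K learnable filters
    def __init__(self, in_channels=1, k_filters=8, kernel_size=16, stride=8):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, k_filters, kernel_size, stride=stride)

    def forward(self, x):                 # x: (n_frames, channels, frame_length)
        f = self.conv(x)                  # f: (n_frames, K, D)
        return f.flatten(start_dim=1)     # splice the K vectors: (n_frames, K*D)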
(ii) The position encoding module is responsible for automatically encoding the position information of the audio frame sequence and adding the encoded position information to the original input sequence through an addition operation. As shown in fig. 4, the position encoding module is composed of a set of learnable parameters whose dimensions are the same as those of the input audio frame sequence, so that the optimal position encoding is learned automatically during end-to-end training. Specifically, assume the input data is $E \in \mathbb{R}^{N \times D_E}$, where $D_E$ denotes the feature number of the input data. Its corresponding position code can be expressed as $P \in \mathbb{R}^{N \times D_E}$, and the output data after position encoding is:

$$Z = E + P$$
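A sketch of this learnable position encoding (the shapes are assumptions; torch and nn are imported as in the previous sketch):

class LearnablePositionalEncoding(nn.Module):
    # one learnable parameter per (frame, feature) position, trained end to end
    def __init__(self, n_frames, dim):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(n_frames, dim))

    def forward(self, e):                 # e: (batch, n_frames, dim)
        return e + self.pos               # Z = E + P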
(iii) As shown in fig. 5, the Transformer encoder is composed of a layer normalization module, a multi-head self-attention module, a residual module and a multi-layer perceptron module. Meanwhile, in order to further extract high-level features of borborygmus events and improve the accuracy of borborygmus recognition, the invention stacks 3 Transformer encoders. The sub-modules are described in detail below:
First, the input data are passed to a layer normalization module, which normalizes the input data so that the normalized data follow a normal distribution with a mean of 0 and a standard deviation of 1.
The normalized data are then fed into a multi-head self-attention module, which is made up of a number of parallel self-attention modules. The self-attention module imitates the attention behavior of a person interacting with the outside world: during its operation a large amount of irrelevant interference information is removed from the sequence, which yields a better recognition effect and higher training efficiency. Compared with the convolutional neural network models used extensively in the prior art, the self-attention module is a neural network module designed specifically for sequence data; it extracts features from sequence data better, especially long-range dependency features, so the invention achieves better recognition results than the prior art.
Specifically, for input data $Z$, the self-attention module uses linear transformation matrices $W_Q$, $W_K$ and $W_V$ to map the input $Z$ to the corresponding query vectors $Q$, keyword (key) vectors $K$ and value vectors $V$, where $d_q$, $d_k$ and $d_v$ respectively denote the feature numbers of the query, key and value vectors; in the present invention, $d_q = d_k = d_v = 64$. The self-attention module then calculates the similarity between the query vectors $Q$ and the key vectors $K$ to obtain the relevance between different audio frames, and finally performs a weighted sum over all value vectors $V$ according to the relevance scores to obtain a new feature representation. The similarity calculation used in the present invention is the scaled vector dot-product method based on the softmax function, with the following formula:

$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{D_K}}\right) V$$

where $\mathrm{head}_i$ denotes the feature vector calculated by the $i$-th self-attention module; $Q$, $K$ and $V$ represent the query, key and value vectors respectively; $D_K$ represents the dimension of the key vectors; and softmax represents the normalized exponential function. The multi-head self-attention module then splices the audio features calculated by the parallel self-attention modules along the feature direction to obtain a new feature vector $H$, and maps $H$ through a linear transformation matrix $W_O$ to the final output feature vector $Z'$, i.e.:

$$Z' = H\,W_O$$
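A single self-attention head implementing the formula above could be sketched as follows ($d_q = d_k = d_v = 64$ as stated; the input width dim is an assumption):

class SelfAttentionHead(nn.Module):
    # W_Q, W_K, W_V map the input to Q, K, V; output = softmax(QK^T / sqrt(D_K)) V
    def __init__(self, dim, d_head=64):
        super().__init__()
        self.w_q = nn.Linear(dim, d_head, bias=False)
        self.w_k = nn.Linear(dim, d_head, bias=False)
        self.w_v = nn.Linear(dim, d_head, bias=False)

    def forward(self, z):                 # z: (batch, n_frames, dim)
        q, k, v = self.w_q(z), self.w_k(z), self.w_v(z)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v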
Then, in order to further extract abstract features of the borborygmus event, the invention applies layer normalization once more to the features calculated by the multi-head self-attention module and passes the normalized data to the multi-layer perceptron module. The multi-layer perceptron module in the invention is composed of two layers of linear neurons connected in the middle by a GeLU nonlinear activation function. As shown in the figure, the specific structure of the module can be expressed as:

$$\mathrm{MLP}(Z') = \mathrm{GeLU}\big(\mathrm{LayerNorm}(Z')\,W_0 + b_0\big)\,W_1 + b_1$$

where $Z'$ is the abstract feature calculated by the multi-head self-attention module, serving as the input vector of the multi-layer perceptron; LayerNorm represents the layer normalization module; GeLU represents the nonlinear activation function; $W_0$ and $W_1$ represent the linear mapping matrices in the first- and second-layer linear neurons, respectively; and $b_0$ and $b_1$ represent the bias terms of the first- and second-layer linear neurons, respectively.
Finally, in order to cope with the vanishing-gradient problem of deep neural networks during training, residual modules are introduced on both the multi-head self-attention module and the multi-layer perceptron module.
To summarize, assume the input of the $\ell$-th Transformer encoder is $Z^{(\ell-1)}$; the module can then be represented as:

$$\hat{Z} = \mathrm{LayerNorm}\big(Z^{(\ell-1)}\big)$$

$$A = \mathrm{softmax}\!\left(\frac{\big(\hat{Z}W_Q\big)\big(\hat{Z}W_K\big)^T}{\sqrt{D_K}}\right)\big(\hat{Z}W_V\big)\,W_O + Z^{(\ell-1)}$$

$$\hat{A} = \mathrm{LayerNorm}(A)$$

$$Z^{(\ell)} = \mathrm{GeLU}\big(\hat{A}\,W_0 + b_0\big)\,W_1 + b_1 + A$$

where LayerNorm represents the layer normalization module and $Z^{(\ell-1)}$ is the input vector of the first layer normalization module; $W_Q$, $W_K$ and $W_V$ represent the linear mapping matrices in the multi-head self-attention mechanism, responsible for linearly transforming the input vector into the corresponding query, key and value vectors; $D_K$ represents the dimension of the key vectors; softmax represents the normalized exponential function; $W_O$ is another linear mapping matrix in the multi-head self-attention module, responsible for mapping the feature vector calculated by the self-attention mechanism into the output domain; GeLU represents the nonlinear activation function; $W_0$ and $W_1$ represent the linear mapping matrices in the first- and second-layer linear neurons of the multi-layer perceptron, respectively, and $b_0$ and $b_1$ the corresponding bias terms; $Z^{(\ell)}$ denotes the final output of the $\ell$-th Transformer encoder.
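These four formulas describe a pre-norm encoder block; a compact PyTorch sketch follows (the head count and the hidden width of the perceptron are assumptions, as the text does not state them):

class TransformerEncoderBlock(nn.Module):
    # LayerNorm -> multi-head self-attention -> residual; LayerNorm -> MLP -> residual
    def __init__(self, dim, n_heads=4, mlp_dim=None):
        super().__init__()
        mlp_dim = mlp_dim or 4 * dim
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),      # W_0, b_0
            nn.GELU(),
            nn.Linear(mlp_dim, dim),      # W_1, b_1
        )

    def forward(self, z):
        h = self.ln1(z)
        a = self.msa(h, h, h, need_weights=False)[0] + z   # first residual connection
        return self.mlp(self.ln2(a)) + a                   # second residual connection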
(iv) Finally, as shown in fig. 6, the linear classifier module is composed of one layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
As can be seen from the above, the Transformer encoder used in the invention can effectively calculate the correlation between each audio frame and all other frames and thus extract high-level features for every audio frame; the deep neural network provided by the invention can therefore recognize each audio frame fully and carefully, achieving fine-grained segmentation of borborygmus events and precisely locating the boundary of each event.
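Assembling the sketches above into the whole network (the sizes are illustrative, and it is assumed that the frame-embedding output width K*D equals the encoder width embed_dim):

class BowelSoundTransformer(nn.Module):
    # frame embedding -> position encoding -> 3 stacked encoders -> per-frame classifier
    def __init__(self, n_frames, embed_dim, depth=3, n_classes=2):
        super().__init__()
        self.embed = FrameEmbedding()
        self.pos = LearnablePositionalEncoding(n_frames, embed_dim)
        self.encoders = nn.Sequential(*(TransformerEncoderBlock(embed_dim) for _ in range(depth)))
        self.classifier = nn.Linear(embed_dim, n_classes)  # borborygmus vs. non-borborygmus

    def forward(self, frames):            # frames: (batch, n_frames, channels, frame_length)
        b, n = frames.shape[:2]
        e = self.embed(frames.flatten(0, 1)).view(b, n, -1)
        z = self.encoders(self.pos(e))
        return self.classifier(z)         # (batch, n_frames, 2) per-frame logits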
(23) The deep neural network model constructed in the previous step is then trained with the acquired abdominal auscultation data. The invention trains with stochastic gradient descent (SGD) and uses the cross-entropy function as the training objective. Specifically, let all parameters of the constructed deep neural network model be $\theta$; at the beginning, all parameters of the model are randomly initialized. The forward computation stage then begins: 512 abdominal auscultation fragments are randomly selected from the training data set without replacement, the neural network model being trained predicts these 512 samples, and the objective function computes the loss between the model predictions and the correct labels. The gradient $\nabla_\theta$ of the model parameters $\theta$ is then calculated by back-propagation, and the parameters are updated by the following formula:

$$\theta_{new} = \theta_{old} - \eta \nabla_\theta$$

where $\eta$ denotes the learning rate, $\theta_{old}$ the parameters before the update and $\theta_{new}$ the updated parameters. This completes one training iteration. Iterations are repeated until all data in the training set have been drawn. At that point all data in the verification set are taken, the partially trained model predicts them, and indexes such as the prediction loss and prediction accuracy are calculated to evaluate the quality of the model. If the model does not meet the requirement, the training steps are repeated until its performance on the verification set reaches the expected value, after which the model parameters are saved. In the invention, the learning rate used during training is 0.1, the momentum is 0.9 and the weight decay rate is 0.02.
(24) The trained neural network model is used to accurately recognize borborygmus events in the clinic; related parameters such as the number and frequency of occurrences of borborygmus events are then calculated from the recognition result, assisting doctors in diagnosing diseases.
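The patent does not spell out its non-maximum suppression step. A simple stand-in, which thresholds the per-frame probabilities and merges detections closer than a minimum gap into one event, could look like this (the threshold and gap are illustrative; with 10 ms frames, min_gap=5 merges detections less than 50 ms apart):

def count_events(frame_probs, threshold=0.5, min_gap=5):
    # merge above-threshold frames closer than min_gap frames into one event, then count
    events, last_hit = 0, None
    for i, p in enumerate(frame_probs):
        if p >= threshold:
            if last_hit is None or i - last_hit > min_gap:
                events += 1               # a new borborygmus event starts here
            last_hit = i
    return events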
A device for fine-grained borborygmus recognition based on a deep neural network comprises a memory for storing executable instructions and a processor for executing the executable instructions stored in the memory, implementing the fine-grained borborygmus recognition method based on the deep neural network.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures show the architecture, functionality and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the invention. Each block in a flowchart or block diagram may represent a module, segment or portion of code comprising one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for fine-grained borborygmus recognition based on a deep neural network, characterized by comprising:
collecting and labeling abdominal auscultation recording data;
constructing a deep neural network model based on the Transformer structure, and training the deep neural network with the labeled abdominal auscultation recording data to obtain a final model;
and loading the final model and inputting the abdominal sound signal to be detected to obtain the corresponding borborygmus recognition result.
2. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 1, wherein collecting and labeling the abdominal auscultation recording data comprises: collecting the abdominal auscultation audio stream and splitting the auscultation recording into audio pieces of equal length.
3. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 2, wherein collecting and labeling the abdominal auscultation recording data comprises: screening from the audio clips those containing borborygmus events and those containing other sounds, including Gaussian noise, heart beat sounds, breathing sounds, speaking sounds and stethoscope friction sounds, to form separate data sets.
4. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 3, wherein collecting and labeling the abdominal auscultation recording data comprises: marking in detail the starting and ending times of the borborygmus event in each sound clip of the borborygmus event data set, with labeling precision at the set millisecond level.
5. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 4, wherein collecting and labeling the abdominal auscultation recording data comprises: dividing the labeled borborygmus event data set into a training set, a verification set and a test set.
6. The method for fine-grained borborygmus recognition based on a deep neural network according to any one of claims 1 to 5, wherein the deep neural network model is a fine-grained borborygmus recognition model constructed from a frame embedding module, a position encoding module, a plurality of stacked Transformer encoders and a linear classifier;
the frame embedding module is composed of a 1-dimensional convolution layer with multiple output channels;
the position coding module is responsible for automatically coding the position information of the audio frame sequence and adding the coded position information into the original input sequence through addition operation;
the Transformer encoder is composed of a layer normalization module, a multi-head self-attention module and a multi-layer perceptron module;
the linear classifier module is composed of one layer of linear neurons and is responsible for classifying the audio features extracted by the Transformer encoders, dividing each audio frame into two classes: borborygmus events and non-borborygmus events.
7. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 6, wherein the multi-head self-attention module and the multi-layer perceptron module are provided with residual modules.
8. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 7, wherein constructing the Transformer-based deep neural network model comprises the following steps:
S301, the position encoding module adds position information; the data then enter a layer normalization module, which performs a normalization operation on the training set data so that the normalized data follow a normal distribution with mean 0 and standard deviation 1;
S302, the normalized data from step S301 enter a multi-head self-attention module, which outputs sequence features; the multi-head attention module is composed of a plurality of parallel self-attention modules;
S303, the sequence features output in step S302 are layer-normalized again and enter a multi-layer perceptron module, which outputs classification features; the multi-layer perceptron module consists of two layers of linear neurons connected in the middle by a nonlinear activation function; the classification features pass through a residual module, after which advanced features are output;
S304, the advanced features enter a linear classifier module and are classified into borborygmus events and non-borborygmus events.
9. The method for fine-grained borborygmus recognition based on a deep neural network according to claim 1, wherein obtaining the corresponding borborygmus recognition result from the input abdominal sound signal comprises: accurately recognizing borborygmus events in the clinic using the trained deep neural network model, and calculating the occurrence frequency of the borborygmus events from the model output using a non-maximum suppression algorithm.
10. A device for fine-grained borborygmus recognition based on a deep neural network, characterized by comprising
A memory: for storing executable instructions;
a processor: for executing executable instructions stored in said memory, implementing a deep neural network based fine-grained borborygmus recognition method according to any of claims 1-9.
CN202310335591.7A 2023-03-31 2023-03-31 Fine granularity borborygmus recognition method and device based on deep neural network Active CN116052725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310335591.7A CN116052725B (en) 2023-03-31 2023-03-31 Fine granularity borborygmus recognition method and device based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310335591.7A CN116052725B (en) 2023-03-31 2023-03-31 Fine granularity borborygmus recognition method and device based on deep neural network

Publications (2)

Publication Number Publication Date
CN116052725A 2023-05-02
CN116052725B (en) 2023-06-23

Family

ID=86127645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310335591.7A Active CN116052725B (en) 2023-03-31 2023-03-31 Fine granularity borborygmus recognition method and device based on deep neural network

Country Status (1)

Country Link
CN (1) CN116052725B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1952852A (en) * 2006-10-13 2007-04-25 杨益民 Voice pick device of circumstance around foetus
CN104305961A (en) * 2014-10-20 2015-01-28 清华大学 Bowel sounds monitoring and recognizing system
CN106021948A (en) * 2016-05-30 2016-10-12 清华大学 Signal processing method for borborygmus signal monitoring system
CN106328150A (en) * 2016-08-18 2017-01-11 北京易迈医疗科技有限公司 Bowel sound detection method, device and system under noisy environment
CN109620154A (en) * 2018-12-21 2019-04-16 平安科技(深圳)有限公司 Borborygmus voice recognition method and relevant apparatus based on deep learning
CN110432924A (en) * 2019-08-06 2019-11-12 杭州智团信息技术有限公司 Borborygmus sound detection device, method and electronic equipment
CN115206347A (en) * 2021-04-13 2022-10-18 浙江荷清柔性电子技术有限公司 Method and device for identifying bowel sounds, storage medium and computer equipment
CN113674734A (en) * 2021-08-24 2021-11-19 中国铁道科学研究院集团有限公司电子计算技术研究所 Information query method, system, equipment and storage medium based on voice recognition
CN114023316A (en) * 2021-11-04 2022-02-08 匀熵科技(无锡)有限公司 TCN-Transformer-CTC-based end-to-end Chinese voice recognition method
CN114283791A (en) * 2021-11-30 2022-04-05 广东电力信息科技有限公司 Speech recognition method based on high-dimensional acoustic features and model training method
CN114305484A (en) * 2021-12-15 2022-04-12 浙江大学医学院附属儿童医院 Heart disease heart sound intelligent classification method, device and medium based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHISH VASWANI et al.: "Attention is all you need", https://arxiv.org/abs/1706.03762, pages 1-15 *
Y. HUANG: "PhysioVec: IoT Biosignal Based Search Engine for Gastrointestinal Health", 2022 7th International Conference on Computational Intelligence and Applications (ICCIA), vol. 1, pages 230-236 *

Also Published As

Publication number Publication date
CN116052725B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN111161715B (en) Specific sound event retrieval and positioning method based on sequence classification
CN113806609A (en) Multi-modal emotion analysis method based on MIT and FSM
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN111986699A (en) Sound event detection method based on full convolution network
CN111341294A (en) Method for converting text into voice with specified style
KR102406512B1 (en) Method and apparatus for voice recognition
CN115393968A (en) Audio-visual event positioning method fusing self-supervision multi-mode features
Kohlsdorf et al. An auto encoder for audio dolphin communication
Bu et al. A Monte Carlo search-based triplet sampling method for learning disentangled representation of impulsive noise on steering gear
Lu et al. Temporal Attentive Pooling for Acoustic Event Detection.
CN112735466A (en) Audio detection method and device
CN117310668A (en) Underwater sound target identification method integrating attention mechanism and depth residual error shrinkage network
CN116052725B (en) Fine granularity borborygmus recognition method and device based on deep neural network
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
CN113488069B (en) Speech high-dimensional characteristic rapid extraction method and device based on generation type countermeasure network
CN115472182A (en) Attention feature fusion-based voice emotion recognition method and device of multi-channel self-encoder
Anindya et al. Development of Indonesian speech recognition with deep neural network for robotic command
CN112951270B (en) Voice fluency detection method and device and electronic equipment
CN113488027A (en) Hierarchical classification generated audio tracing method, storage medium and computer equipment
JP2017134321A (en) Signal processing method, signal processing device, and signal processing program
Wilkinghoff et al. TACos: Learning temporally structured embeddings for few-shot keyword spotting with dynamic time warping
Sharma et al. Comparative analysis of various feature extraction techniques for classification of speech disfluencies
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
Frost Deep learning based methods for tuberculosis cough classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant