WO2021107941A1 - Method and system for separation of sounds from different sources - Google Patents

Method and system for separation of sounds from different sources

Info

Publication number
WO2021107941A1
Authority
WO
WIPO (PCT)
Prior art keywords: clustered, sound, coded matrix, coded, lung
Prior art date
Application number
PCT/US2019/063480
Other languages
English (en)
Inventor
Wei-Chien Wang
Original Assignee
Vitalchains Corporation
Priority date
Filing date
Publication date
Application filed by Vitalchains Corporation filed Critical Vitalchains Corporation
Priority to PCT/US2019/063480 (WO2021107941A1)
Priority to TW109141909A (TW202121400A)
Publication of WO2021107941A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/43 Detecting, measuring or recording for evaluating the reproductive systems
    • A61B5/4306 Detecting, measuring or recording for evaluating the reproductive systems for evaluating the female reproductive systems, e.g. gynaecological evaluations
    • A61B5/4343 Pregnancy and labour monitoring, e.g. for labour onset detection
    • A61B5/4362 Assessing foetal parameters
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/003 Detecting lung or respiration noise
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/02 Stethoscopes
    • A61B7/04 Electric stethoscopes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7203 Signal processing specially adapted for physiological signals or for diagnostic purposes for noise prevention, reduction or removal
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7253 Details of waveform analysis characterised by using transforms
    • A61B5/7257 Details of waveform analysis characterised by using transforms using Fourier transforms
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • the present disclosure is generally related to a method, module, and system for separation of sounds from different sources. More particularly, the present disclosure is directed to a method, module, and system for analysis of heart sounds and lung sounds.
  • Auscultation is an important tool for analyzing and monitoring heart, lung, bowel, and vascular disorders in a human body.
  • a stethoscope is often used by a physician to perform auscultation.
  • the physician may use chest auscultation to analyze, monitor, or diagnose various disorders in the cardiovascular system and respiratory system.
  • the lung, the heart, and the thoracic aorta are located in the thoracic cavity.
  • the lung sound is generated from exhalation and inhalation, and the heart sound is generated from diastole and systole in a cardiac cycle. Due to the proximity of the lung and the heart, the auscultation sound from a chest auscultation is often a mixture of lung sound and heart sound.
  • the heart sound and lung sound may be noises or interferences in the chest auscultation.
  • when performing auscultation of the heart, the presence of lung sound is a noise to the physician; when performing auscultation of the lung, the presence of heart sound is also a noise.
  • These noises or interferences could be amplified by an earpiece of the stethoscope when performing auscultation.
  • auscultation of body parts other than the chest could also be hindered by another source in the human body that is not the subject of auscultation.
  • For example, auscultation of the fetal heart could be interfered with by the heart sound or the lung sound of the mother, and auscultation of the maternal heart could be interfered with by the heart sound of the fetus.
  • Improved stethoscope technology is another approach to manage interferences in auscultation.
  • Digital stethoscopes including analog-digital signal converters or computers may reduce or filter the interferences in auscultation.
  • US20180317876A1 discloses a digital stethoscope that includes a noise reduction system for removing heart sound in lung auscultation.
  • both heart sound and lung sound contain important information for the diagnosis, analysis, or monitoring of the heart and the lung. If the heart sound is removed by a noise reduction system and a heart auscultation is later required on the same patient, the physician would need to conduct another chest auscultation to retrieve the previously discarded heart sounds. This would be a burden for both the physician and the patient.
  • US20180125444A1 discloses an auscultation device for detecting a cardiac and/or respiratory disease, wherein accelerometer signals are used to monitor the inhalation-exhalation cycle of the lung to identify the lung sound.
  • However, additional accelerometers increase the weight of said device, making it difficult for the physician to carry.
  • the interferences can be the lung sound or the heart sound in chest auscultation, the heart sound or the lung sound of the mother in fetal heart auscultation, the heart sound of the fetus in heart auscultation of the mother, or other interferences from a source that is not the subject of auscultation.
  • An embodiment of the present disclosure provides a system for sound separation.
  • the system comprises a sound collector for collecting a physiological signal and a sound separation device.
  • the sound separation device comprises: a transformation unit, configured to receive the physiological signal from the sound collector, for transforming the physiological signal into a spectrum; an encoder, configured to receive the spectrum from the transformation unit, for generating a coded matrix from the spectrum; a Fourier transformer, configured to receive the coded matrix from the encoder, for generating a periodicity coded matrix according to the coded matrix; a latent space cluster, configured to receive the coded matrix from the encoder and the periodicity coded matrix from the Fourier transformer, for grouping a plurality of clustered coded matrices according to the coded matrix and the periodicity coded matrix; a decoder, configured to receive the clustered coded matrices from the latent space cluster, for generating a plurality of clustered spectrums from the plurality of the clustered coded matrices; and an inverse Fourier transformer, configured to receive the clustered spectrums from the decoder, for reconstructing a plurality of clustered sound signals from a plurality of the clustered spectrums.
  • the encoder and the decoder are configured to form a deep autoencoder.
  • the deep autoencoder is implemented with a loss function that minimizes the mean squared error (MSE).
  • the latent space cluster is a K-means latent space cluster.
  • the sound collector is a stethoscope.
  • the clustered sound signals originate from different sources at an auscultation site.
  • the clustered sound signals originate from different sources at different auscultation sites.
  • the physiological signal comprises a heart sound and a lung sound.
  • the plurality of clustered coded matrices comprises a heart sound clustered coded matrix corresponding to the heart sound and a lung sound clustered coded matrix corresponding to the lung sound.
  • the clustered sound signals comprise a clustered heart sound and a clustered lung sound.
  • each of the clustered sound signals is reconstructed from one of the clustered spectrums.
  • An embodiment of the present disclosure provides a sound separation device.
  • the sound separation device comprises: a transformation unit for transforming a physiological signal into a spectrum; an encoder, configured to receive the spectrum from the transformation unit, for generating a coded matrix from the spectrum; a Fourier transformer, configured to receive the coded matrix from the encoder, for generating a periodicity coded matrix according to the coded matrix; a latent space cluster, configured to receive the coded matrix from the encoder and the periodicity coded matrix from the Fourier transformer, for grouping a plurality of clustered coded matrices according to the coded matrix and the periodicity coded matrix; a decoder, configured to receive the clustered coded matrices from the latent space cluster, for generating a plurality of clustered spectrums from the plurality of the clustered coded matrices; and an inverse Fourier transformer, configured to receive the clustered spectrums from the decoder, for reconstructing a plurality of clustered sound signals from a plurality of the clustered spectrums.
  • An embodiment of the present disclosure provides a method for sound separation.
  • the method comprises steps of: receiving a physiological signal by a sound collector; transforming the physiological signal received by the sound collector into a spectrum by a transforming unit; generating a coded matrix by an encoder, from the spectrum transformed by the transforming unit; generating a periodicity coded matrix by a Fourier transformer, according to the coded matrix generated by the encoder; grouping a plurality of clustered coded matrices by a latent space cluster, according to the coded matrix generated by the encoder and the periodicity coded matrix generated by the Fourier transformer; generating a plurality of clustered spectrums by a decoder, from the plurality of the clustered coded matrices grouped by the latent space cluster; and reconstructing a plurality of clustered sound signals by an inverse Fourier transformer, from a plurality of the clustered spectrums generated by the decoder.
  • the physiological signal is collected by a stethoscope.
  • An embodiment of the present disclosure provides another method for sound separation. The method comprises steps of: generating a coded matrix by an encoder, from a spectrum of a physiological signal; generating a periodicity coded matrix by a Fourier transformer, according to the coded matrix generated by the encoder; grouping a plurality of clustered coded matrices by a latent space cluster, according to the coded matrix generated by the encoder and the periodicity coded matrix generated by the Fourier transformer; generating a plurality of clustered spectrums by a decoder, from the plurality of the clustered coded matrices grouped by the latent space cluster; and reconstructing a plurality of clustered sound signals by an inverse Fourier transformer, from a plurality of the clustered spectrums generated by the decoder.
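  • The steps above can be summarized in a short sketch. This is a minimal outline, not an implementation prescribed by the disclosure: the encoder, decoder, and cluster callables are placeholders for the components described below, and the STFT parameters follow the experimental settings given later in this disclosure.

```python
import numpy as np
from scipy.signal import stft, istft

def separate(signal, fs, encoder, decoder, cluster):
    # Step S2: transform the physiological signal into a log power spectrum.
    _, _, Z = stft(signal, fs=fs, nperseg=2048, noverlap=2048 - 128)
    lps = np.log(np.abs(Z) ** 2 + 1e-12)
    # Step S3: encode the spectrum into a coded matrix (M neurons x N frames).
    L = encoder(lps)
    # Step S4: generate the periodicity coded matrix via DFT along time.
    P = np.abs(np.fft.rfft(L, axis=1))
    # Step S5: group rows of L into clustered coded matrices by periodicity.
    clustered = cluster(L, P)
    # Steps S6 and S7: decode each cluster and reconstruct a time-domain
    # clustered sound signal; the mixture phase is reused (an assumption).
    signals = []
    for Lc in clustered:
        clustered_lps = decoder(Lc)
        mag = np.sqrt(np.exp(clustered_lps))
        _, x = istft(mag * np.exp(1j * np.angle(Z)), fs=fs,
                     nperseg=2048, noverlap=2048 - 128)
        signals.append(x)
    return signals
```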
  • FIG. 1 is a schematic illustration of a system for sound separation, in accordance with an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a sound collector and a sound separation device, in accordance with an embodiment of the present disclosure.
  • FIG. 3 is an architecture of a deep autoencoder, in accordance with an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a sound separation process within the sound separation device, in accordance with an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of some of the steps in FIG. 4, in accordance with an embodiment of the present disclosure.
  • FIG. 6A is a schematic representation of a periodicity coded matrix
  • FIG. 6B and 6C are schematic representations of coded matrixes, in accordance with an embodiment of the present disclosure.
  • FIG. 7A is a representation of a clustered spectrum
  • FIG. 7B is a representation of another clustered spectrum, in accordance with an embodiment of the present disclosure.
  • FIG. 8A is a representation of a clustered sound signal
  • FIG. 8B is a representation of another clustered sound signal, in accordance with an embodiment of the present disclosure.
  • FIG. 9 is a schematic illustration of a system for sound separation, in accordance with an embodiment of the present disclosure.
  • FIG. 10 is a schematic illustration of a system for sound separation, in accordance with an embodiment of the present disclosure.
  • FIG. 11 is a schematic illustration of a system for sound separation, in accordance with an embodiment of the present disclosure.
  • “Coupled” is defined as connected, whether directly or indirectly through intervening components, and is not necessarily limited to physical connections.
  • the connection can be such that the objects are permanently connected or releasably connected.
  • “Comprising” means “including, but not necessarily limited to”; it specifically indicates open-ended inclusion or membership in the so-described combination, group, series, and the like.
  • FIG. 1 is a schematic illustration of a system in accordance with an embodiment of the present disclosure.
  • a system 10 comprises a sound collector 11 and a sound separation device 12.
  • the sound collector 11 collects sound signals, or physiological sounds in a human body, and can be a conventional stethoscope or a digital stethoscope.
  • the sound collector 11 can be placed directly on the chest, the abdomen, or other auscultation sites to perform an auscultation.
  • the purposes of the auscultation can be analyzing, monitoring, or diagnosing various disorders in the cardiovascular system or the respiratory system.
  • the purposes of the auscultation can be analyzing or monitoring the cardiovascular system of a fetus.
  • the sound collector 11 transmits the sound signals to the sound separation device 12 by wireless communication or cable.
  • the sound separation device 12 can receive the sound signals from the sound collector 11.
  • Each of the sound signals from the sound collector 11 may comprise at least two sound signals originated from different sources.
  • the sources of the sound signals can be a lung, a heart, or a thoracic aorta when the sound collector 11 is placed on the chest.
  • the sources of the sound signals can be the heart of the fetus, the heart of the mother, or the lung of the mother when the sound collector 11 is placed on the abdomen of the pregnant woman.
  • FIG. 2 is a schematic diagram of the sound collector 11 and the sound separation device 12, in accordance with an embodiment of the present disclosure.
  • the sound separation device 12 may comprise at least one processor 127 to analyze or separate the sound signals from different sources.
  • the processor can be a CPU, an MPU, or other components that perform computation.
  • the sound separation device 12 can be a mobile device, a personal computer, or a server.
  • the processor of the sound separation device 12 comprises or is coupled to a computer-readable medium.
  • a non-transitory computer program product is embodied in the computer-readable medium.
  • the non-transitory computer program product embodied in the computer-readable medium can be executed by the processor 127 of the sound separation device 12, to separate the sound signals from the different sources.
  • the sound separation device 12 comprises a transformation unit 121, an encoder 122, a Fourier transformer 123, a latent space cluster 124, a decoder 125, and an inverse Fourier transformer 126.
  • Each of the above elements can be constituted as a part of the processor, or can be embedded into the non-transitory computer program product.
  • the transformation unit 121 of the sound separation device 12 receives the sound signals from the sound collector 11 and transforms the signals into spectrums, and the spectrums are received by the encoder 122. The mechanism and process required for the sound separation are described below.
  • the encoder 122 and the decoder 125 of the sound separation device 12 are configured to form a deep autoencoder (DAE).
  • the encoder 122 encodes an input x_n into a latent space, and the decoder 125 then attempts to reconstruct the input by decoding from the latent space.
  • the reconstructed output y_n is aimed to approximate x_n by minimizing the mean squared error (MSE). The encoding and decoding can be written as l_n = σ(W_E · x_n + b_E) and y_n = σ(W_D · l_n + b_D), wherein W_E and W_D are encoding and decoding matrices, b_E and b_D are vectors of biases, and l_n ∈ R^(M×1), wherein M is the number of neurons in the latent space.
  • the DAE can be constructed with different architectures, including fully connected layer, convolutional layer, and de-convolutional layer.
  • An objective of the autoencoder is to learn a representation (encoding) for a set of data.
  • An embodiment of the present disclosure analyzes the periodicity of the latent representation, then classifies and groups the latent representation corresponding to the original set of data. Therefore, different DAE architectures, which have different expressiveness for modeling sound signals, could be implemented by various embodiments of the present disclosure for different applications.
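  • As a concrete illustration of this objective, the following is a minimal sketch of a fully connected DAE, assuming PyTorch (the disclosure does not mandate a framework) and using the 1024-512-256-128-256-512-1024 hidden-layer sizes, ReLU/tanh activations, MSE loss, and Adam optimizer reported in the experimental section below.

```python
import torch
import torch.nn as nn

class DAE(nn.Module):
    def __init__(self, n_bins=1025):          # 1025 bins for a 2048-point STFT (assumed)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bins, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),   # latent space l_n, M = 128 neurons
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 256), nn.Tanh(),
            nn.Linear(256, 512), nn.Tanh(),
            nn.Linear(512, 1024), nn.Tanh(),
            nn.Linear(1024, n_bins),          # reconstruction y_n approximates x_n
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DAE()
loss_fn = nn.MSELoss()                        # the MSE objective described above
optimizer = torch.optim.Adam(model.parameters())
```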
  • In an embodiment, a deep convolutional autoencoder (DCAE) is implemented.
  • FIG. 3 is an architecture of the DAE, in accordance with an embodiment of the present disclosure.
  • the DCAE is constructed with several convolutional layers and deconvolutional layers.
  • the convolutional layers extract features of the input and the deconvolutional layers reconstruct approximate sound signals from the feature represented by the convolutional layers.
  • the convolutional layers connect multiple input activations within a filter to form a single activation.
  • layer 128a can be a Conv2D(4x1) 32 Unit
  • layer 128b can be a Conv2D(4x1) 16 Unit
  • layer 128c can be a Conv2D(3x1) 8 Unit
  • layer 128d can be a latent space.
  • the deconvolutional layer is the converse of the convolutional layer: the deconvolutional layer associates a single input activation with multiple outputs.
  • layer 128e can be a DeConv2D(3x1) 8 Unit
  • layer 128f can be a DeConv2D(3x1) 16 Unit
  • layer 128g can be a DeConv2D(4x1) 32 Unit.
  • the types of autoencoder, the structure of layers, the number of units in each layer, the number of neurons, and other details in the configuration of FIG. 3 are illustrative only. One of ordinary skill in the art would understand how to vary the autoencoder configuration without degrading signal separation, classification, or purification, while preventing under-fitting or over-fitting.
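  • A corresponding sketch of the DCAE of FIG. 3 is given below, assuming the kernel sizes listed in the experimental section (1x4, 1x3, 1x3) and adding a 1x1 output projection back to a single channel, a detail the figure leaves unspecified.

```python
import torch
import torch.nn as nn

class DCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(1, 4)), nn.ReLU(),   # layer 128a
            nn.Conv2d(32, 16, kernel_size=(1, 3)), nn.ReLU(),  # layer 128b
            nn.Conv2d(16, 8, kernel_size=(1, 3)), nn.ReLU(),   # layer 128c -> latent 128d
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 8, kernel_size=(1, 3)), nn.Tanh(),    # layer 128e
            nn.ConvTranspose2d(8, 16, kernel_size=(1, 3)), nn.Tanh(),   # layer 128f
            nn.ConvTranspose2d(16, 32, kernel_size=(1, 4)), nn.Tanh(),  # layer 128g
            nn.Conv2d(32, 1, kernel_size=1),   # assumed projection to one channel
        )

    def forward(self, x):                      # x: (batch, 1, freq_bins, frames)
        return self.decoder(self.encoder(x))

model = DCAE()
out = model(torch.randn(2, 1, 64, 100))        # input shape is restored exactly
```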
  • in the convolutional layers, the convolution operation can be expressed as y = w * x, wherein w is defined as a kernel and x is defined as the input.
  • FIG. 4 is a schematic diagram of a sound separation process within the sound separation device 12, in accordance with an embodiment of the present disclosure.
  • the proposed periodicity-coded deep auto-encoder (PC-DAE) in the sound separation device 12 is able to solve blind source separation (BSS) problems.
  • the BSS problem is the separation of signals from different sources within a mixed signal, without any prior knowledge of the sources or how the signals are mixed.
  • the BSS problem can be the separation of a heart sound and a lung sound from the sound signal of a chest auscultation, because the lung sound and the heart sound are collected at one auscultation site and overlap in the frequency range of 50 Hz to 150 Hz.
  • the BSS problem of the present disclosure can also be the separation of a fetal heart sound and a maternal heart sound/lung sound from the sound signal of a fetal auscultation.
  • the physiological signal is a mixture of at least two sound signals, each of the sound signals originating from a different source.
  • the physiological signal can be in a digital format, and the digital format may be converted from an analog sound signal.
  • each of the physiological signals is collected from the auscultation site and originated from at least two sources.
  • the source of the physiological signals can be a lung or a heart when the physiological signals are collected by the sound collector 11 from the chest, therefore the physiological signal from this auscultation site at least comprises the lung sound and the heart sound.
  • the source of the physiological signals can also be the heart of the fetus, the heart of the mother, or the lung of the mother when the physiological signals are collected by the sound collector 11 from the abdomen of the pregnant woman, therefore the physiological signal from this auscultation site at least comprises the fetal heart sound and the maternal heart/lung sound.
  • the physiological signals may also be stored in the database, wherein the location of sound collection is already identified within each of the physiological signals.
  • the physiological signals may obtain separately from different auscultation sites.
  • Various auscultation site localizing methods can be implemented with an embodiment of the present disclosure, such as the auscultation site locating methods described in U.S. Patent Publication No. US20160143512A1.
  • the sound separation device 12 receives the physiological signals.
  • the physiological signals can be from the sound collector 11 or the database.
  • the physiological signal includes 2 sound signals originating from 2 sources, namely a 1st sound signal originating from a 1st source and a 2nd sound signal originating from a 2nd source.
  • the physiological signal may include more than 2 sound signals originating from more than 2 sources.
  • In Step S2 of FIG. 4, the transformation unit 121 of the sound separation device 12 transforms the physiological signals into a spectrum by Short Time Fourier Transformation (STFT), Fast Fourier Transformation (FFT), or Discrete Fourier Transformation (DFT).
  • the spectrum can be logarithmized into a log power spectrum (LPS).
  • the LPS is modeled by the DAE, and the DAE decreases the mean squared error of the reconstruction loss by using back-propagation algorithms.
  • the transformation unit 121 is in communication with the sound collector 11 and configured to receive the sound signals collected by the sound collector 11.
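  • A minimal sketch of Step S2 with SciPy is shown below, using the 8 kHz sampling rate, 2048-sample frame length, and 128-sample frame shift reported in the experimental section; the disclosure does not prescribe a specific library.

```python
import numpy as np
from scipy.signal import stft

fs = 8000                                  # sampling rate used in the experiments
x = np.random.randn(fs * 10)               # stand-in for a 10-second auscultation recording
f, t, Z = stft(x, fs=fs, nperseg=2048, noverlap=2048 - 128)
lps = np.log(np.abs(Z) ** 2 + 1e-12)       # log power spectrum; epsilon avoids log(0)
```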
  • In Step S3 of FIG. 4, a coded matrix is generated from the spectrum of Step S2 by the encoder 122 in the sound separation device 12.
  • the coded matrix exhibits significant expressiveness on temporal information.
  • the encoder 122 is in communication with the transformation unit 121 and configured to receive the spectrum generated in Step S2 by the transformation unit 121.
  • the encoder 122 is configured to correspond to the decoder 125 according to the general autoencoder structure described in FIG. 3.
  • In Step S4 of FIG. 4, the coded matrix generated in Step S3 is further transformed into a periodicity coded matrix via DFT for analyzing the temporal information.
  • the transformation into the periodicity coded matrix is performed by the Fourier transformer 123, which is in communication with the encoder 122 and configured to receive the coded matrix generated in Step S3 by the encoder 122.
  • In Step S5 of FIG. 4, a latent space cluster 124 is used for grouping a plurality of clustered coded matrices.
  • the latent space cluster 124 is in communication with the encoder 122 and the Fourier transformer 123, and is configured to receive the coded matrix generated in Step S3 by the encoder 122 and the periodicity coded matrix generated in Step S4 by the Fourier transformer 123.
  • the technical principles of signal processing in Step S4 and Step S5 are described below.
  • when the physiological signal includes 2 different sound signals, different neurons of the periodicity analysis algorithm are activated by the 1st sound signal or the 2nd sound signal in the latent space.
  • the log power spectrum sequence is inputted to the encoder 122 to obtain latent representations at each time step.
  • the latent representations (latent space sequences) are then concatenated into a coded matrix L, wherein L ∈ R^(M×N).
  • the coded matrix L is represented as Z_mix comprising the latent representations, as described below: Z_mix = [z_1^mix; ...; z_M^mix], wherein each coded matrix element z_j^mix is a row of L and serves as a time factor, z_j^mix = [l_j1 ... l_jN], 1 ≤ j ≤ M, z_j^mix ∈ R^(1×N), M is the number of neurons in the latent space, and N is the duration.
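  • In code, forming the coded matrix amounts to stacking the per-frame latent vectors column-wise; the random latents below are stand-ins for actual encoder outputs.

```python
import numpy as np

M, N = 128, 500                                   # latent neurons, time frames
latents = [np.random.rand(M) for _ in range(N)]   # stand-ins for l_n at each time step
L = np.stack(latents, axis=1)                     # coded matrix, shape (M, N)
z_1_mix = L[0, :]                                 # row z_j^mix: one neuron's activation over time
```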
  • In Step S4 of FIG. 4, the periodicity coded matrix is generated by the Fourier transformer 123 in the sound separation device 12 for further temporal information analysis.
  • the sound signals could be separated from each other by identifying their different periodicity characteristics, thus solving blind source separation (BSS) problems.
  • a sound collector may collect a physiological signal with lung sound, heart sound, thermal noise, and other noises generated during the chest auscultation. Because the lung sound and the heart sound exhibit different periodicity characteristics, they could be identified in Step S5 of FIG. 4 by clustering methods. Therefore, to facilitate periodicity analysis, the periodicity coded matrix is generated by the Fourier transformer 123 via Discrete Fourier Transformation (DFT).
  • In Step S5 of FIG. 4, the latent space cluster 124 is used for grouping the plurality of clustered coded matrices.
  • the latent space cluster 124 is implemented with a sparse NMF clustering method for analyzing the periodicity coded matrix.
  • the sparse NMF clustering method can be described as minimizing ||P - BH||_2^2 + λ||H||_1 over non-negative factors, wherein H = [h_1 ... h_M] provides a cluster membership, h_j ∈ R^(k×1), k is set to the number of cluster bases, and 1 ≤ j ≤ M.
  • the grouped cluster of p_j is determined by the index of the largest element of h_j; λ represents a sparsity penalty factor; ||·||_1 represents the L1-norm; ||·||_2 represents a distance measurement.
  • the clustering result of the periodicity coded matrix P is assigned to the coded matrix.
  • the rows of the coded matrix that share similar periodicity are assigned as clustered coded matrices.
  • Periodicity Analysis Algorithm: periodicity-coded analysis for coded matrix separation.

    Cluster P by sparse NMF, and get the labels H_p corresponding to Z_mix
    set sources to {1st, 2nd}
    foreach source do
        set Z_source to Z_mix
        foreach row z_j^source not labeled as the current source do
            set z_j^source to min_element(z_j^source)
        end
    end
    return Z_source
  • the clustering algorithm may include K-means clustering, NMF clustering, or pre-trained supervised clustering.
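  • The sketch below renders the periodicity analysis algorithm in Python, assuming scikit-learn (version 1.0 or later for the alpha_W parameter) as the NMF implementation; the L1 penalty on the membership factor stands in for the sparsity term λ||H||_1.

```python
import numpy as np
from sklearn.decomposition import NMF

def periodicity_separate(L, k=2):
    """Split the coded matrix L (M x N) into k clustered coded matrices."""
    # Step S4: periodicity coded matrix, one DFT magnitude spectrum per row of L.
    P = np.abs(np.fft.rfft(L, axis=1))
    # Step S5: sparse NMF on P; row memberships live in the factor W (M x k).
    nmf = NMF(n_components=k, init='nndsvda', max_iter=500,
              alpha_W=0.1, alpha_H=0.0, l1_ratio=1.0)
    W = nmf.fit_transform(P)
    labels = np.argmax(W, axis=1)          # cluster of p_j = index of largest membership
    clustered = []
    for source in range(k):
        Z_source = L.copy()
        for j in np.where(labels != source)[0]:
            Z_source[j, :] = L[j, :].min() # "set z_j to its min element", per the pseudocode
        clustered.append(Z_source)
    return clustered
```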
  • a mask is constructed for masking interferences of each source.
  • one embodiment of the present disclosure could be generating specific clustered spectrums while masking other sounds. These “other sounds” are regarded as interferences, in view of desired sound sources corresponding to specific clustered spectrums. Therefore, an embodiment of the present disclosure could be generating clustered spectrums of lung sounds while masking heart sounds and other noises, and another embodiment could be generating clustered spectrums of heart sounds while masking lung sounds and other noises.
  • the mask is defined as the following:
  • In Step S6 of FIG. 4, the clustered coded matrices are decoded by the decoder 125 and constructed into a plurality of clustered spectrums.
  • the decoder 125 is in communication with the latent space cluster 124, and configured to receive the clustered coded matrix grouped in Step S5 by the latent space cluster 124.
  • when the physiological signal includes 2 sound signals, two clusters are presented.
  • a first mask M_1st(λ, f) and a second mask M_2nd(λ, f) are constructed after the clustered coded matrices and their corresponding spectrums/LPS are obtained.
  • the first LPS Y_1st and the second LPS Y_2nd are generated from the mixed LPS by applying the masks for the specific sounds.
  • In Step S7 of FIG. 4, the clustered spectrums are transformed into a plurality of clustered sound signals by the inverse Fourier transformer 126 through Inverse Short Time Fourier Transformation (ISTFT) or Inverse Discrete Fourier Transformation (IDFT).
  • the clustered sound signals are reconstructed, and each of the clustered sound signals is reconstructed from one of the clustered spectrums for a specific sound source.
  • the inverse Fourier transformer 126 is in communication with the decoder 125, and configured to receive the clustered spectrums decoded in Step S6 by the decoder 125.
  • Each of the clustered sound signals reconstructed in Step S7 originates from a source. If the physiological signal received in Step S1 is collected from one auscultation site, then the clustered sound signals may originate from different sources at the auscultation site: if the physiological signal is collected from a location on the chest, then one of the clustered sound signals originates from the lung, and another of the clustered sound signals originates from the heart. Additionally, if the physiological signals received in Step S1 are collected from different auscultation sites, then the clustered sound signals may originate from different sources: if the physiological signals are collected separately from 2 or more auscultation sites on the chest, then one of the clustered sound signals originates from the lung, and another of the clustered sound signals originates from the heart.
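  • A sketch of Step S7 with SciPy's inverse STFT follows; reusing the phase of the mixed spectrum is an assumption of this example, since the disclosure does not specify phase handling.

```python
import numpy as np
from scipy.signal import istft

def reconstruct(clustered_lps, mixed_phase, fs=8000):
    """Return a time-domain clustered sound signal from one clustered LPS."""
    mag = np.sqrt(np.exp(clustered_lps))   # log power spectrum -> magnitude
    _, x = istft(mag * np.exp(1j * mixed_phase), fs=fs,
                 nperseg=2048, noverlap=2048 - 128)
    return x
```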
  • FIG. 5 is a schematic diagram of Steps S4, S5, and S6 of FIG. 4, in accordance with an embodiment of the present disclosure.
  • a periodicity coded matrix 41 is generated by the Fourier transformer 123 via DFT, clustered coded matrixes 51 and 52 are grouped by the latent space cluster 124, and clustered spectrums 61 and 62 are generated from the clustered coded matrixes 51 and 52.
  • FIG. 6A is a schematic representation of the periodicity coded matrix 41, in accordance with an embodiment of the present disclosure.
  • the periodicity coded matrix 41 is presented with a y-axis of codes, and an x-axis of DFT frames representing time.
  • the periodicity coded matrix 41 comprises at least 2 periodicities, wherein the features that occur periodically are illustrated with different shades of gray.
  • the features that constitute the periodicity can be the amplitude or the waveform of the sound signal.
  • FIG. 6B and 6C are schematic representations of the coded matrices, in accordance with an embodiment of the present disclosure.
  • a first coded matrix 411 and a second coded matrix 412 are presented, each having a different periodicity.
  • the coded matrixes 411 and 412 are to be taken together with Step S5 of FIG. 5.
  • In Step S5 of FIG. 5, the coded matrices 411 and 412 are grouped by the latent space cluster 124, and the clustered coded matrices 51 and 52 are generated.
  • the clustered coded matrix 51 corresponds to the coded matrix 411, and the clustered coded matrix 52 corresponds to the coded matrix 412.
  • In Step S6 of FIG. 5, the clustered coded matrix 51 is decoded to generate the clustered spectrum 61, and the clustered coded matrix 52 is decoded to generate the clustered spectrum 62.
  • FIG. 7A is a representation of the clustered spectrum 61
  • FIG. 7B is a representation of the clustered spectrum 62, in accordance with an embodiment of the present disclosure.
  • FIG. 8A and 8B are representations of the clustered sound signals, in accordance with an embodiment of the present disclosure.
  • FIG. 8A and 8B are amplitude-versus-time graphs, wherein the y-axis is the amplitude of the signal and the x-axis represents the time stamps of the signal.
  • the heart sound and the lung sound in mixed heart-lung sounds collected by an auscultation system are separated by an embodiment of the present disclosure.
  • the auscultation is performed on a SAM student auscultation manikin (SAM® 3G, Cardionics), wherein the SAM manikin comprises a standard sound library recorded from patients.
  • a sound dataset is constructed from the standard sound library and at least comprises a plurality of mixtures of the heart sounds and the lung sounds, wherein the heart sounds comprise normal heart sounds with 2 beating speeds and the lung sounds comprise normal, wheezing, rhonchi, and stridor sounds.
  • the heart sounds and the lung sounds are mixed at various signal-to-noise ratios (SNRs).
  • the mixed heart-lung sounds are broadcast from the SAM manikin and collected by an electronic stethoscope (iMEDIPLUS). All of the collected sounds are sampled at 8 kHz, with a DFT frame length of 2048 samples and a DFT frame shift of 128 samples.
  • the DAE model used in the embodiment consists of 7 hidden layers, having 1024, 512, 256, 128, 256, 512, and 1024 neurons, respectively.
  • the encoder in the DCAE model used in the embodiment consists of 3 convolutional layers: a 1st layer having 32 filters with a kernel size of 1x4, a 2nd layer having 16 filters with a kernel size of 1x3, and a 3rd layer having 8 filters with a kernel size of 1x3.
  • the decoder in the DCAE model used in the embodiment consists of 3 deconvolutional layers: a 1st layer having 8 deconvolutional filters with a kernel size of 1x3, a 2nd layer having 16 deconvolutional filters with a kernel size of 1x3, and a 3rd layer having 32 deconvolutional filters with a kernel size of 1x4.
  • Both the DAE and the DCAE models set the activation function of the encoder as a rectified linear unit (ReLU) and the activation function of the decoder as a hyperbolic tangent (tanh).
  • the optimizers of both the DAE and the DCAE models are set to the Adam optimizer.
  • An unsupervised NMF-based model is taken as a baseline performance benchmark, and the basis number of the NMF is set to 50.
  • an L2 cost function is chosen for the simulation.
  • the NMF, DAE, and DCAE models are implemented with the periodicity analysis algorithm previously described, and thus the PC-NMF, PC-DAE, and PC-DCAE models are generated from the above combinations.
  • the PC-NMF, PC-DAE, and PC-DCAE models are run on a personal computer with the sound separation process illustrated and described in FIG. 4, with the NMF as a baseline performance benchmark.
  • the signal quality of the separated sound signals is evaluated by a signal distortion ratio (SDR) and a signal to interferences ratio (SIR).
  • SDR indicates similarity (distortion) between the original SAM® signal and the reconstructed signal.
  • SIR indicates signal separation clarity; the SIR value is higher when signals from different sources in the mixed signal do not interfere with each other.
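  • SDR and SIR figures of this kind can be computed with the BSS Eval metrics; the mir_eval package used in the sketch below is an assumption, as the disclosure does not name a toolkit.

```python
import numpy as np
import mir_eval.separation                     # pip install mir_eval

rng = np.random.default_rng(0)
heart_ref, lung_ref = rng.standard_normal((2, 8000))     # stand-in reference signals
heart_est = heart_ref + 0.1 * rng.standard_normal(8000)  # imperfectly separated estimates
lung_est = lung_ref + 0.1 * rng.standard_normal(8000)

ref = np.stack([heart_ref, lung_ref])
est = np.stack([heart_est, lung_est])
sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(ref, est)
print(sdr, sir)   # higher SDR: less distortion; higher SIR: cleaner separation
```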
  • Table 1 and Table 2 illustrate the SDR and SIR evaluations for the NMF, PC-NMF, DCAE, PC-DAE, and PC-DCAE models on the separated heart sounds and the separated lung sounds.
  • Table 1: Evaluation for separated heart sounds
  • the SDRs of the models implemented with the periodicity analysis algorithms are higher, meaning the heart sounds and the lung sounds are better reconstructed in the periodicity coded (PC-) models than in the models without the periodicity analysis algorithms.
  • the SDR has increased from 2.31 in the NMF to 3.91 in the PC-NMF; in 2 dB, the SDR has increased from 2.26 in the NMF to 5.39 in the PC-NMF; in 6 dB, the SDR has increased from 4.57 in the NMF to 7.60 in the PC-NMF. Additionally, a comparison between the DCAE and the PC-DCAE shows the periodicity coded models have significantly improved the signal recovery qualities in the separated heart sounds.
  • the SDR has increased from -0.33 in the DCAE to 3.61 in the PC-DCAE; in -2 dB, the SDR has increased from 4.81 in the DCAE to 7.22 in the PC-DCAE; in 0 dB, the SDR has increased from 5.77 in the DCAE to 9.15 in the PC-DCAE; in 2 dB, the SDR has increased from 6.87 in the DCAE to 10.46 in the PC-DCAE; in 6 dB, the SDR has increased from 9.63 in the DCAE to 16.14 in the PC-DCAE.
  • the results in Table 1 indicate the periodicity analysis algorithms have improved the recovery quality of the separated heart sounds; this suggests the PC-models can be used in harsher auscultation environments. Compared with the DAE algorithm, the DCAE algorithm is more compatible with the PC-model; therefore, the PC-DCAE has better performance than the PC-DAE.
  • the SDR has increased from -0.93 in the DCAE to 4.82 in the PC-DCAE; in -2 dB, the SDR has increased from 2.94 in the DCAE to 7.78 in the PC-DCAE; in 0 dB, the SDR has increased from 5.15 in the DCAE to 9.36 in the PC-DCAE; in 2 dB, the SDR has increased from 7.31 in the DCAE to 10.20 in the PC-DCAE; in 6 dB, the SDR has increased from 9.14 in the DCAE to 11.53 in the PC-DCAE.
  • Table 2 indicates the periodicity analysis algorithms have improved the recovery quality of the separated lung sounds; this suggests the PC-models can be used in harsher auscultation environments.
  • the SDRs and the SIRs in Table 1 demonstrate the periodicity coded models have improved performances in the separated heart sounds.
  • the SDRs and the SIRs in Table 2 demonstrate the PC-DCAE model has improved performances in the separated lung sounds.
  • the embodiment described above separates the heart sounds and the lung sounds in the sound dataset constructed from the sound library, and the separated heart sounds and lung sounds show better signal recovery quality and signal separation clarity.

4. Configurations of sound separation systems
  • FIG. 9 is a schematic illustration of a system 20 in accordance with an embodiment of the present disclosure.
  • the system 20 comprises a sound collector 21, a mobile device 221, and a processor 222.
  • the sound collector 21 can be placed directly on the chest, the abdomen, or other auscultation sites to perform an auscultation.
  • the sound collector 21 transmits the sound signals to the mobile device 221 by wireless communication or cable.
  • the sound collector 21 may be capable of performing auscultation on many different auscultation sites at the same time.
  • the mobile device 221 can receive the sound signals from the sound collector 21. Each of the sound signals from the sound collector 21 may comprise at least two sound signals originating from different sources.
  • the mobile device 221 can be a mobile phone, a personal computer, or other computational device with communication functions and can be carried by an individual.
  • the mobile device 221 may transform the sound signals into spectrums, and wirelessly sends the spectrums to the processor 222 in a cloud infrastructure.
  • the mobile device may also simply transmit the sound signals to the cloud infrastructure, without performing any transformation.
  • the cloud infrastructure may be physically distant from the mobile device 221 or the sound collector 21.
  • the cloud infrastructure comprises the processor 222 for separating sound signals, and the processor 222 comprises an encoder, a Fourier transformer, a latent space cluster, a decoder, and an inverse Fourier transformer.
  • the processor 222 may further comprise a transformation unit if the mobile device 221 does not transform the sound signals into the spectrums. The function of the processor 222 and the procedure of the sound separation are described in FIG. 2 and 4. After the clustered signals are reconstructed by the inverse Fourier transformer in the processor 222, the cloud infrastructure may transmit the clustered signals to the mobile device 221.
  • the mobile device 221 may comprise a user interface so that a physician or the patient may retrieve the clustered signals from the mobile device 221.
  • the physician or the patient may listen to the clustered signals to analyze, monitor, or diagnose various disorders related to the auscultation site.
  • FIG. 10 is a schematic illustration of a device 30 in accordance with an embodiment of the present disclosure.
  • the device 30 is capable of performing auscultation and processing the sound signals.
  • the device 30 can be directly placed on the chest, the abdomen, or other auscultation sites to perform an auscultation, and may be capable of performing auscultation on many different auscultation sites at the same time.
  • the sound signals collected by the device 30 can be processed by a processor inside the device 30, and the sound signals originated from different sources can be separated by the procedure described and illustrated in FIG. 4.
  • the physician may retrieve the clustered sound signals from the device 30 to monitor, analyze, or diagnose various disorders.
  • FIG. 11 is a schematic illustration of a system 40 in accordance with an embodiment of the present disclosure.
  • the system 40 comprises a device 41 and a cloud infrastructure 42.
  • the device 41 can be placed directly on the chest, the abdomen, or other auscultation sites to perform an auscultation.
  • the device 41 may be capable of performing auscultation on many different auscultation sites at the same time.
  • the device 41 may transmit the sound signals to the cloud infrastructure 42 by wireless communication or cable, or the device 41 may transform the sound signals into spectrums and wirelessly send the spectrums to the cloud infrastructure 42.
  • the cloud infrastructure 42 may be physically distant from the device 41.
  • the cloud infrastructure 42 may comprise an encoder, a Fourier transformer, a latent space cluster, a decoder, and an inverse Fourier transformer.
  • the cloud infrastructure 42 may further comprise a transformation unit if the device 41 does not transform the sound signals into the spectrums. The functions of the cloud infrastructure 42 and the procedure of the sound separation are described in FIG. 2 and 4. After the clustered signals are reconstructed by the inverse Fourier transformer, the cloud infrastructure 42 may transmit the clustered signals to the device 41.
  • the device 41 may comprise a user interface so that a physician or the patient may retrieve the clustered signals from the device 41. The physician or the patient may listen to the clustered signals to analyze, monitor, or diagnose various disorders related to the auscultation site.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Biomedical Technology (AREA)
  • Acoustics & Sound (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Pulmonology (AREA)
  • Pediatric Medicine (AREA)
  • Pregnancy & Childbirth (AREA)
  • Gynecology & Obstetrics (AREA)
  • Reproductive Health (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The present disclosure relates to a sound separation system, comprising a sound collector for collecting a physiological signal and a sound separation device. The sound separation device comprises a transformation unit for transforming the physiological signal into a spectrum; an encoder for generating a coded matrix from the spectrum; a Fourier transformer for generating a periodicity coded matrix according to the coded matrix; a latent space cluster for grouping a plurality of clustered coded matrices according to the coded matrix and the periodicity coded matrix; a decoder for generating a plurality of clustered spectrums from the plurality of the clustered coded matrices; and an inverse Fourier transformer for reconstructing a plurality of clustered sound signals from a plurality of the clustered spectrums.
PCT/US2019/063480 2019-11-27 2019-11-27 Method and system for separation of sounds from different sources WO2021107941A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US2019/063480 WO2021107941A1 (fr) 2019-11-27 2019-11-27 Method and system for separation of sounds from different sources
TW109141909A TW202121400A (zh) 2019-11-27 2020-11-27 System and method for separation of sounds from different sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/063480 WO2021107941A1 (fr) 2019-11-27 2019-11-27 Method and system for separation of sounds from different sources

Publications (1)

Publication Number Publication Date
WO2021107941A1 true WO2021107941A1 (fr) 2021-06-03

Family

ID=76128735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/063480 WO2021107941A1 (fr) 2019-11-27 2019-11-27 Method and system for separation of sounds from different sources

Country Status (2)

Country Link
TW (1) TW202121400A (fr)
WO (1) WO2021107941A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249822A1 (en) * 2008-12-15 2011-10-13 France Telecom Advanced encoding of multi-channel digital audio signals
US20170301354A1 (en) * 2014-10-02 2017-10-19 Sony Corporation Method, apparatus and system
US20170270941A1 (en) * 2016-03-15 2017-09-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding apparatus for processing an input signal and decoding apparatus for processing an encoded signal
WO2019079829A1 (fr) * 2017-10-21 2019-04-25 Ausculsciences, Inc. Procédé de prétraitement et de criblage de signaux sonores auscultatoires

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Konstantinos Kamnitsas, Daniel C. Castro, Loic Le Folgoc, Ian Walker, Ryutaro Tanno, Daniel Rueckert, Ben Glocker, Antonio Criminisi: "Semi-Supervised Learning via Compact Latent Space Clustering", arXiv.org, Cornell University Library, 7 June 2018, XP080888166 *
Pourazad M.T., Moussavi Z., Farahmand F., Ward R.K.: "Heart Sounds Separation From Lung Sounds Using Independent Component Analysis", 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005), Shanghai, China, 1-4 September 2005, pages 2736-2739, XP010908368, ISBN: 978-0-7803-8741-6, DOI: 10.1109/IEMBS.2005.1617037 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114305485A (zh) * 2021-12-31 2022-04-12 Heartbeat monitoring method, heartbeat monitoring device, and computer-readable storage medium

Also Published As

Publication number Publication date
TW202121400A (zh) 2021-06-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953770

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953770

Country of ref document: EP

Kind code of ref document: A1