CN109036472B - Improved pathological voice fundamental tone frequency extraction method - Google Patents


Info

Publication number
CN109036472B
CN109036472B (application CN201810797265.7A)
Authority
CN
China
Prior art keywords
frequency
signal
fundamental
decomposition
framing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810797265.7A
Other languages
Chinese (zh)
Other versions
CN109036472A (en)
Inventor
张涛 (Zhang Tao)
武雅琴 (Wu Yaqin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810797265.7A priority Critical patent/CN109036472B/en
Publication of CN109036472A publication Critical patent/CN109036472A/en
Application granted granted Critical
Publication of CN109036472B publication Critical patent/CN109036472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/90 - Pitch determination of speech signals
    • G10L19/0212 - Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, using orthogonal transformation
    • G10L19/0216 - Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, using orthogonal transformation, using wavelet decomposition
    • G10L25/66 - Speech or voice analysis techniques specially adapted for comparison or discrimination, for extracting parameters related to health condition
    • G10L2025/906 - Pitch tracking

Abstract

An improved pathological voice fundamental tone frequency extraction method comprises the following steps: signal preprocessing, including DC removal and framing; wavelet packet decomposition and reconstruction, in which the framed signal is decomposed with the db6 wavelet of the Daubechies family to obtain the signals required for reconstruction, the number of decomposition layers being determined from the sampling frequency and the upper limit of the signal fundamental frequency, and the frame signal being reconstructed according to the correlation between each layer's decomposed signals and the corresponding frame signal before decomposition, together with the signal fundamental-frequency range; applying the HHT (Hilbert-Huang transform) to the reconstructed frame signal to obtain several IMF components, removing the IMF components that do not satisfy the frequency condition, and reconstructing the framed signal from the rest; and extracting the fundamental frequency from the reconstructed framed signal. The method keeps the extracted pathological-voice pitch frequency essentially within the original fundamental-frequency range and ensures the accuracy of voice pitch-frequency extraction.

Description

Improved pathological voice fundamental tone frequency extraction method
Technical Field
The invention relates to methods for extracting the fundamental tone frequency of pathological voice, and in particular to an improved pathological voice fundamental tone frequency extraction method.
Background
The human vocal organs comprise three main parts: the lungs and trachea, the larynx, and the vocal tract. The larynx is an important vocal organ, a complex system of cartilage and muscle containing the vocal cords, whose acoustic function is to provide the main excitation source during phonation. The opening between the two vocal cords is called the glottis; the time for the vocal cords to open and close once is the period of vocal cord vibration, i.e. the pitch period, and the reciprocal of the pitch period is the pitch frequency, called the fundamental frequency for short. The fundamental frequency differs with sex and age, and may differ even for the same tone; the thickness, length and tension of the vocal cords also affect it. In general it is low for elderly men and high for young children and young women. The fundamental frequency ranges from 60 to 500 Hz.
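The pitch period and pitch frequency above are reciprocals of each other. As a quick sketch (assuming the 25 kHz sampling rate used later in this document), the stated 60-500 Hz range maps to autocorrelation lags of roughly 50 to 417 samples:

```python
# Pitch frequency is the reciprocal of the pitch period: f0 = 1 / T0.
fs = 25000                     # sampling frequency (Hz), as used later in the patent
f0_min, f0_max = 60.0, 500.0   # stated fundamental-frequency range (Hz)

lag_min = fs / f0_max          # shortest pitch period, in samples
lag_max = fs / f0_min          # longest pitch period, in samples
print(lag_min, round(lag_max, 1))   # 50.0 416.7
```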
Traditional fundamental-frequency extraction for voice signals was developed for normal voice; the classical detection algorithms include the autocorrelation method, the average magnitude difference method and the linear prediction cepstrum method. However, frequency-doubling and frequency-halving errors often occur when the autocorrelation method is used alone.
When extracting pathological voice, the peaks used by the autocorrelation method and the average magnitude difference method undergo abrupt changes, causing frequency doubling and frequency halving. Another classical extraction method is the cepstrum method: in the cepstral domain, the cepstrum of the excitation pulses carrying the pitch information can be separated from the cepstrum of the vocal tract, so the pitch frequency can be solved; its accuracy on clean voice is high, but the computation is complex. In extracting the fundamental frequency of pathological voice, however, the traditional cepstrum method introduces errors when filtering out the vocal-tract convolution component, again producing frequency-doubling and frequency-halving errors, so the traditional methods fail for pathological voice. Experiments show that, owing to the varying severity of pathological voice, as shown in fig. 1 and figs. 2a-2d, the fundamental frequency of normal voice extracted by the traditional methods is stable, while the extracted pathological-voice fundamental frequency is unstable, broken or even disappears, so the fundamental frequency may be unextractable; this poses a challenge for accurate fundamental-frequency extraction.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an improved pathological voice pitch frequency extraction method capable of improving the accuracy of voice pitch frequency extraction.
The technical scheme adopted by the invention is as follows: an improved pathological voice fundamental tone frequency extraction method comprises the following steps:
1) performing signal preprocessing, including direct current removal processing and framing processing;
2) decomposing and reconstructing with a wavelet packet, wherein the wavelet packet decomposition decomposes the framed signals using the db6 wavelet of the Daubechies family to obtain the signals required for reconstruction, and the number of decomposition layers is determined from the sampling frequency and the upper limit of the signal fundamental frequency; the frame signal is then reconstructed according to the correlation between each layer's decomposed signals and the corresponding frame signal before decomposition, together with the signal fundamental-frequency range;
3) performing the HHT (Hilbert-Huang transform) on the reconstructed frame signal to obtain several IMF components, removing the IMF components that do not satisfy the frequency condition, and then reconstructing the framed signal from the remaining components, the transform comprising empirical mode decomposition and the Hilbert transform;
4) extracting the fundamental frequency from the reconstructed framed signal.
In step 1), each frame in the framing processing is 20 ms long; with a sampling frequency of 25 kHz, the corresponding frame length is 500 samples and the frame shift is 250 samples.
The number of decomposition layers in step 2) is given by the formula

J = int( log2( fs / fmax ) )

where J is the number of wavelet packet decomposition layers, int denotes the rounding operation, fs is the sampling frequency, and fmax is the upper limit of the fundamental frequency.
The reconstruction in step 2) comprises the following steps:
(1) from the sampling frequency and the number of decomposition layers, the low-band formula

[ 0, fs / 2^(n+1) ]

and the high-band formula

[ fs / 2^(n+1), fs / 2^n ]

are used to calculate the frequency ranges of the low-frequency and high-frequency signals at each layer after decomposition, where n is the current layer;
(2) the signal fundamental-frequency range is compared with the calculated low-frequency and high-frequency ranges at each layer, and the decomposed signal of the layer closest to the signal fundamental-frequency range is found.
The step 3) comprises the following steps:
(1) performing empirical mode decomposition on the reconstructed frame signal to obtain several IMF components and a residual term, the reconstructed frame signal being expressed as

x(t) = Σ_{i=1}^{m} C_i(t) + r_m(t)

where m is the number of IMF components, C_i(t) is the i-th IMF component, and r_m(t) is the residual term;
(2) performing the Hilbert transform on each IMF component to obtain its instantaneous frequency and instantaneous amplitude, comprising:
(2.1) obtaining the intermediate variable y_i(t) from

y_i(t) = (1/π) P ∫ C_i(τ) / (t − τ) dτ

where P denotes the Cauchy principal value;
(2.2) combining C_i(t) and y_i(t) into the analytic signal z_i(t):

z_i(t) = C_i(t) + j y_i(t) = a_i(t) e^{j θ_i(t)}

where a_i(t) is the instantaneous amplitude of the i-th IMF component:

a_i(t) = sqrt( C_i^2(t) + y_i^2(t) )

and θ_i(t) is the instantaneous phase of the i-th IMF component:

θ_i(t) = arctan( y_i(t) / C_i(t) )

whose derivative gives the instantaneous frequency, f_i(t) = (1/2π) dθ_i(t)/dt; the HHT thus yields the instantaneous frequency and instantaneous amplitude of each IMF component;
(2.3) after the IMF components that do not fall within the signal fundamental-frequency range are removed, the framed signal is reconstructed from the remaining IMF components.
The step 4) comprises the following steps:
(1) for the reconstructed framed signal, the short-time autocorrelation function

R_l(k) = Σ_{m=0}^{N−1−k} x_l(m) x_l(m+k)

is computed, and its peak spacing is taken as the estimate of the pitch period, where x_l(m) is the l-th frame signal, N is the frame length, and k is the lag;
(2) calculating the pitch frequency from the estimated pitch period;
(3) processing the outliers of the pitch frequency using median filtering and linear smoothing;
(4) calculating the average fundamental frequency over all frames as the final extraction result.
The improved pathological voice pitch frequency extraction method has the following beneficial effects:
1) The method keeps the extracted pathological-voice pitch frequency essentially within the original fundamental-frequency range, ensuring the accuracy of voice pitch-frequency extraction. By using wavelet packet decomposition and reconstruction together with the HHT, the invention highlights the pitch information of the voice, filters out higher harmonics, strengthens the periodicity of the signal, suppresses spurious peaks of the autocorrelation function, emphasizes the peak at the pitch period and improves noise immunity.
2) The invention applies smoothing post-processing to the obtained pitch track, removing the influence of outliers on the result and reducing frequency-doubling, frequency-halving and pseudorandom-point errors; it yields a clear, stable pitch track, improves the accuracy of pathological-voice fundamental-frequency extraction, and effectively overcomes the breaks, instability and even disappearance of the fundamental frequency that prevent extraction when pathological voice is processed with traditional algorithms.
Drawings
FIG. 1 is a fundamental-frequency plot of a normal voice signal extracted with a traditional algorithm;
FIG. 2a is a fundamental-frequency plot of a pathological voice signal, extracted with a traditional algorithm, showing instability and breaks;
FIG. 2b is a fundamental-frequency plot of a pathological voice signal, extracted with a traditional algorithm, showing a varying track;
FIG. 2c is a fundamental-frequency plot of a pathological voice signal, extracted with a traditional algorithm, showing sudden breaks;
FIG. 2d is a fundamental-frequency plot of a pathological voice signal, extracted with a traditional algorithm, showing instability and disappearance;
FIG. 3 is a 5-layer wavelet packet decomposition diagram of an embodiment of the present invention;
FIG. 4 is a signal diagram and fundamental frequency diagram of pathological voice extracted according to the embodiment of the present invention;
FIG. 5 is a comparison graph of fundamental frequency and standard average fundamental frequency of pathological voice signals extracted according to the embodiment of the present invention;
fig. 6 is a flow chart of an improved pathological voice pitch frequency extraction method according to the present invention.
Detailed Description
The following describes an improved pathological voice pitch frequency extraction method according to the present invention in detail with reference to the accompanying drawings and embodiments.
As shown in fig. 6, an improved method for extracting a pitch frequency of a pathological voice of the present invention includes the following steps:
1) Signal preprocessing is performed, including DC removal and framing; each frame is 20 ms long, the sampling frequency is 25 kHz, the corresponding frame length is 500 samples, and the frame shift is 250 samples. The signals used in the present invention are taken from the voice disorders database of the Massachusetts Eye and Ear Infirmary (MEEI), distributed by Kay Elemetrics, USA.
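The preprocessing step can be sketched in Python as follows. This is a minimal illustration with the stated parameters (20 ms frames, 25 kHz, frame shift 250 samples); the helper name `preprocess` is ours, not the patent's:

```python
import numpy as np

def preprocess(x, fs=25000, frame_ms=20, hop=250):
    """Remove the DC offset, then split the signal into overlapping frames
    of 500 samples (20 ms at 25 kHz) with a frame shift of 250 samples."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # DC removal
    N = int(fs * frame_ms / 1000)           # frame length: 500 samples
    n_frames = 1 + (len(x) - N) // hop
    frames = np.stack([x[i * hop : i * hop + N] for i in range(n_frames)])
    return frames

frames = preprocess(np.random.randn(25000) + 3.0)  # 1 s of test signal with a DC offset
print(frames.shape)   # (99, 500)
```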
2) Performing wavelet packet decomposition and reconstruction
The fundamental-frequency range of the signal, 60-500 Hz, lies in the low-frequency part of the signal. The essence of the wavelet packet transform is to split the signal into a high-frequency part and a low-frequency part through a high-pass filter and a low-pass filter; it offers fine frequency subdivision and strong time-frequency localization. The signal of the corresponding frequency band after the transform can therefore be selected as the signal to be processed, which reduces the computation during detection and removes the interference of the high-frequency part with the subsequent fundamental-frequency extraction.
The Daubechies wavelets are compactly supported orthonormal wavelets, with large vanishing moments and good smoothness for a given order and flexible parameter selection. The wavelet packet decomposition of the invention therefore uses db6 of the Daubechies family to decompose the framed signal into the signals required for reconstruction; the number of decomposition layers is determined from the sampling frequency and the upper limit of the signal fundamental frequency. The frame signal is then reconstructed according to the correlation between each layer's decomposed signals and the corresponding frame signal before decomposition, together with the signal fundamental-frequency range.
The number of decomposition layers in the present invention is given by the formula

J = int( log2( fs / fmax ) )

where J is the number of wavelet packet decomposition layers, int denotes the rounding operation, fs is the sampling frequency, and fmax is the upper limit of the fundamental frequency.
The reconstruction of the invention comprises the following steps:
(1) from the sampling frequency and the number of decomposition layers, the low-band formula

[ 0, fs / 2^(n+1) ]

and the high-band formula

[ fs / 2^(n+1), fs / 2^n ]

are used to calculate the frequency ranges of the low-frequency and high-frequency signals at each layer after decomposition, where n is the current layer;
(2) the signal fundamental-frequency range is compared with the calculated low-frequency and high-frequency ranges at each layer, and the decomposed signal of the layer closest to the signal fundamental-frequency range is found.
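The layer-count and band-range computation above can be sketched in Python. The function names are illustrative, the formulas are as reconstructed from the text, and the parameters fs = 25 kHz and fmax = 500 Hz match the embodiment:

```python
import math

def decomposition_layers(fs, f_max):
    """Number of wavelet packet decomposition layers, J = int(log2(fs / fmax))."""
    return int(math.log2(fs / f_max))

def band_ranges(fs, n):
    """Low- and high-band frequency ranges at layer n:
    low = [0, fs / 2**(n+1)], high = [fs / 2**(n+1), fs / 2**n]."""
    edge = fs / 2 ** (n + 1)
    return (0.0, edge), (edge, fs / 2 ** n)

fs, f_max = 25000, 500
J = decomposition_layers(fs, f_max)     # 5 layers
low, high = band_ranges(fs, J)
print(J, low)   # 5 (0.0, 390.625)  -> matches the A5 band of Fig. 3
```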
FIG. 3 shows the 5-layer wavelet packet decomposition of an embodiment of the present invention, where A denotes the low-frequency component of the signal, D the high-frequency component, the index the number of decomposition layers, LP the low-pass filter and HP the high-pass filter. According to this band division, the A5 signal component covers 0-390.625 Hz. The A5 component also has the greatest correlation with the source pathological voice signal of all the components, so the embodiment of the present invention selects the A5 component as the signal from which to extract the fundamental frequency.
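The A5 selection just described can be sketched with PyWavelets (assumed available here). The db6 wavelet and the 5-level depth follow the text, while the boundary mode and the helper name are our illustrative choices:

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def reconstruct_a5(frame, wavelet="db6", level=5):
    """Decompose one frame with a 5-level db6 wavelet packet, then
    reconstruct it from the lowest-frequency node only ('aaaaa' = A5,
    covering 0-390.625 Hz when fs = 25 kHz)."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    sel = pywt.WaveletPacket(data=None, wavelet=wavelet,
                             mode="symmetric", maxlevel=level)
    sel["a" * level] = wp["a" * level].data   # copy only the A5 coefficients
    return sel.reconstruct(update=False)[:len(frame)]

t = np.arange(500) / 25000
frame = np.sin(2 * np.pi * 200 * t)   # a 200 Hz tone lies inside the A5 band
a5 = reconstruct_a5(frame)            # should pass through nearly unchanged
```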
3) The HHT (Hilbert-Huang Transform) is performed on the reconstructed frame signal to obtain several IMF components; the IMF components that do not satisfy the frequency condition are removed, and the framed signal is reconstructed from the remaining components. The transform comprises Empirical Mode Decomposition (EMD) and the Hilbert Transform (HT), as follows:
(1) performing empirical mode decomposition on the reconstructed frame signal to obtain several IMF components and a residual term, the reconstructed frame signal being expressed as

x(t) = Σ_{i=1}^{m} C_i(t) + r_m(t)

where m is the number of IMF components, C_i(t) is the i-th IMF component, and r_m(t) is the residual term;
(2) performing the Hilbert transform on each IMF component to obtain its instantaneous frequency and instantaneous amplitude, comprising:
(2.1) obtaining the intermediate variable y_i(t) from

y_i(t) = (1/π) P ∫ C_i(τ) / (t − τ) dτ

where P denotes the Cauchy principal value;
(2.2) combining C_i(t) and y_i(t) into the analytic signal z_i(t):

z_i(t) = C_i(t) + j y_i(t) = a_i(t) e^{j θ_i(t)}

where a_i(t) is the instantaneous amplitude of the i-th IMF component:

a_i(t) = sqrt( C_i^2(t) + y_i^2(t) )

and θ_i(t) is the instantaneous phase of the i-th IMF component:

θ_i(t) = arctan( y_i(t) / C_i(t) )

whose derivative gives the instantaneous frequency, f_i(t) = (1/2π) dθ_i(t)/dt; the HHT thus yields the instantaneous frequency and instantaneous amplitude of each IMF component;
(2.3) after the IMF components that do not fall within the signal fundamental-frequency range are removed, the framed signal is reconstructed from the remaining IMF components.
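Step 3) can be sketched end to end in Python. The EMD below is a deliberately minimal sifting loop, not the patent's exact algorithm; the 60-500 Hz retention rule follows the fundamental-frequency range stated earlier, and scipy.signal.hilbert supplies the analytic signal C + j·y:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import hilbert

def emd(x, max_imfs=6, n_sifts=8):
    """Minimal EMD sketch: sift out IMFs by subtracting the mean of the
    upper and lower extrema envelopes. By construction the decomposition
    satisfies x == sum(imfs) + residual exactly."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    imfs, res = [], x.copy()

    def maxima(s):
        return np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])) + 1

    def minima(s):
        return np.flatnonzero((s[1:-1] < s[:-2]) & (s[1:-1] < s[2:])) + 1

    for _ in range(max_imfs):
        h = res.copy()
        for _ in range(n_sifts):
            mx, mn = maxima(h), minima(h)
            if len(mx) < 4 or len(mn) < 4:
                break
            upper = CubicSpline(mx, h[mx], bc_type="natural")(t)
            lower = CubicSpline(mn, h[mn], bc_type="natural")(t)
            h = h - (upper + lower) / 2.0      # remove the local envelope mean
        imfs.append(h)
        res = res - h
        if len(maxima(res)) < 4:               # residual is (near-)monotonic
            break
    return imfs, res

fs = 25000
t = np.arange(2500) / fs                       # a 100 ms stretch
x = np.sin(2*np.pi*200*t) + 0.3*np.sin(2*np.pi*900*t)  # in-band 200 Hz + out-of-band 900 Hz

imfs, residual = emd(x)

# Hilbert transform of each IMF -> instantaneous amplitude and frequency,
# then keep only the IMFs whose typical frequency lies in 60-500 Hz.
kept = []
for c in imfs:
    z = hilbert(c)                             # analytic signal C + j*y
    amp = np.abs(z)                            # instantaneous amplitude a(t)
    phase = np.unwrap(np.angle(z))             # instantaneous phase theta(t)
    f_inst = np.diff(phase) * fs / (2*np.pi)   # f(t) = (1/2pi) dtheta/dt
    f_typ = float(np.median(f_inst[200:-200])) # ignore frame-edge effects
    if 60.0 <= f_typ <= 500.0:
        kept.append(c)

reconstructed = np.sum(kept, axis=0) if kept else np.zeros_like(x)
```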
4) The fundamental frequency is extracted from the reconstructed framed signal, as follows:
(1) for the reconstructed framed signal, the short-time autocorrelation function

R_l(k) = Σ_{m=0}^{N−1−k} x_l(m) x_l(m+k)

is computed, and the distance between its two maxima is taken as the estimate of the pitch period, where x_l(m) is the l-th frame signal, N is the frame length, and k is the lag;
(2) calculating the pitch frequency from the estimated pitch period;
(3) processing the outliers of the pitch frequency using median filtering and linear smoothing;
(4) calculating the average fundamental frequency over all frames as the final extraction result.
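The autocorrelation pitch estimate of sub-steps (1)-(2) can be sketched as follows. This is a minimal illustration; the lag search range follows the 60-500 Hz fundamental range stated earlier, and the helper name is ours:

```python
import numpy as np

def pitch_autocorr(frame, fs=25000, f_lo=60, f_hi=500):
    """Estimate the pitch of one frame from the short-time autocorrelation
    R(k) = sum_m x(m) x(m+k), searching only the lags that correspond to
    fundamentals in the 60-500 Hz range."""
    frame = frame - frame.mean()
    N = len(frame)
    k_min, k_max = int(fs / f_hi), int(fs / f_lo)
    R = np.array([np.dot(frame[:N - k], frame[k:]) for k in range(k_max + 1)])
    k_peak = k_min + int(np.argmax(R[k_min:k_max + 1]))  # lag of the main peak
    return fs / k_peak                                   # pitch frequency, Hz

t = np.arange(500) / 25000
frame = np.sin(2*np.pi*200*t) + 0.4*np.sin(2*np.pi*400*t)  # f0 = 200 Hz plus a harmonic
f0 = pitch_autocorr(frame)
print(round(f0, 1))   # 200.0
```

For sub-step (3), scipy.signal.medfilt could serve as the median filter over the per-frame estimates before linear smoothing.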
In the embodiment of the invention, the pathological voice source signals are taken from part of the pathological voice database of the Massachusetts Eye and Ear Infirmary (MEEI), distributed by Kay Elemetrics, USA, which contains recordings of the sustained vowel /a/ with a sampling frequency of 25 kHz and a bit width of 16 bits. The database gives each patient's age, sex and average fundamental frequency. The fundamental frequencies of 20 pathological voices with different disorders are extracted, including pathological voices with vocal cord polyps, vocal cord cysts and vocal cord nodules.
The fundamental frequencies extracted from the 20 pathological voice signals are compared with the average fundamental frequencies given by the database, and the mean-square error (MSE) is calculated as a measure of the deviation between the extracted and average fundamental frequencies, defined as:

MSE(F̂) = E[ (F̂ − F)^2 ]    (8)

where F̂ is the fundamental frequency extracted from the pathological voice signal and F is the average fundamental frequency of the pathological voice signal.
When the fundamental frequency of pathological voice is extracted with the traditional autocorrelation algorithm or the average magnitude difference method alone, breaks, instability and even disappearance often occur. The fundamental frequency extracted by the method of the invention, compared with the average fundamental frequency given by the database, has a mean square error of 2.02052 Hz. The experimental results show that the method extracts the fundamental frequency of pathological voice with improved accuracy.
As can be seen from fig. 4, the method does not suffer from the instability and breaks of the traditional algorithms and presents a flat, stable track over the whole time axis; compared with the standard average fundamental frequency, as shown in fig. 5, the error is essentially within 0-3 Hz, a satisfactory result.

Claims (1)

1. An improved pathological voice fundamental tone frequency extraction method is characterized by comprising the following steps:
1) performing signal preprocessing, including DC removal and framing, wherein each frame is 20 ms long, the sampling frequency is 25 kHz, the corresponding frame length is 500 samples and the frame shift is 250 samples;
the signals are from part of the pathological voice database of the Massachusetts Eye and Ear Infirmary (MEEI), distributed by Kay Elemetrics, USA, with a sampling frequency of 25 kHz and a bit width of 16 bits; 20 pathological voices with different disorders are selected for fundamental-frequency extraction, including pathological voices with vocal cord polyps, vocal cord cysts and vocal cord nodules;
2) decomposing and reconstructing with a wavelet packet, wherein the wavelet packet decomposition decomposes the framed signals using the db6 wavelet of the Daubechies family to obtain the signals required for reconstruction, and the number of decomposition layers is determined from the sampling frequency and the upper limit of the signal fundamental frequency; the frame signal is then reconstructed according to the correlation between each layer's decomposed signals and the corresponding frame signal before decomposition, together with the signal fundamental-frequency range; wherein
the number of decomposition layers is given by the formula

J = int( log2( fs / fmax ) )

where J is the number of wavelet packet decomposition layers, int denotes the rounding operation, fs is the sampling frequency, and fmax is the upper limit of the fundamental frequency;
the reconstruction comprises the following steps:
(1) from the sampling frequency and the number of decomposition layers, the low-band formula

[ 0, fs / 2^(n+1) ]

and the high-band formula

[ fs / 2^(n+1), fs / 2^n ]

are used to calculate the frequency ranges of the low-frequency and high-frequency signals at each layer after decomposition, where n is the current layer;
(2) the signal fundamental-frequency range is compared with the calculated low-frequency and high-frequency ranges at each layer, and the decomposed signal of the layer closest to the signal fundamental-frequency range is found;
3) performing the HHT (Hilbert-Huang transform) on the reconstructed frame signal to obtain several IMF components, removing the IMF components that do not satisfy the frequency condition, and then reconstructing the framed signal from the remaining components, the transform comprising empirical mode decomposition and the Hilbert transform; the method comprises the following steps:
(1) performing empirical mode decomposition on the reconstructed frame signal to obtain several IMF components and a residual term, the reconstructed frame signal being expressed as

x(t) = Σ_{i=1}^{m} C_i(t) + r_m(t)

where m is the number of IMF components, C_i(t) is the i-th IMF component, and r_m(t) is the residual term;
(2) performing the Hilbert transform on each IMF component to obtain its instantaneous frequency and instantaneous amplitude, comprising:
(2.1) obtaining the intermediate variable y_i(t) from

y_i(t) = (1/π) P ∫ C_i(τ) / (t − τ) dτ

where P denotes the Cauchy principal value;
(2.2) combining C_i(t) and y_i(t) into the analytic signal z_i(t):

z_i(t) = C_i(t) + j y_i(t) = a_i(t) e^{j θ_i(t)}

where a_i(t) is the instantaneous amplitude of the i-th IMF component:

a_i(t) = sqrt( C_i^2(t) + y_i^2(t) )

and θ_i(t) is the instantaneous phase of the i-th IMF component:

θ_i(t) = arctan( y_i(t) / C_i(t) )

whose derivative gives the instantaneous frequency, f_i(t) = (1/2π) dθ_i(t)/dt; the HHT thus yields the instantaneous frequency and instantaneous amplitude of each IMF component;
(2.3) after removing the IMF components that do not fall within the signal fundamental-frequency range, reconstructing the framed signal from the remaining IMF components;
4) extracting the fundamental frequency from the reconstructed framed signal, comprising the following steps:
(1) for the reconstructed framed signal, the short-time autocorrelation function

R_l(k) = Σ_{m=0}^{N−1−k} x_l(m) x_l(m+k)

is computed, and its peak spacing is taken as the estimate of the pitch period, where x_l(m) is the l-th frame signal, N is the frame length, and k is the lag;
(2) calculating the pitch frequency from the estimated pitch period;
(3) processing the outliers of the pitch frequency using median filtering and linear smoothing;
(4) calculating the average fundamental frequency over all frames as the final extraction result.
CN201810797265.7A 2018-07-19 2018-07-19 Improved pathological voice fundamental tone frequency extraction method Active CN109036472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810797265.7A CN109036472B (en) 2018-07-19 2018-07-19 Improved pathological voice fundamental tone frequency extraction method

Publications (2)

Publication Number Publication Date
CN109036472A CN109036472A (en) 2018-12-18
CN109036472B true CN109036472B (en) 2022-05-10

Family

ID=64644015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810797265.7A Active CN109036472B (en) 2018-07-19 2018-07-19 Improved pathological voice fundamental tone frequency extraction method

Country Status (1)

Country Link
CN (1) CN109036472B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763930B (en) * 2021-11-05 2022-03-11 深圳市倍轻松科技股份有限公司 Voice analysis method, device, electronic equipment and computer readable storage medium
CN114822567B (en) * 2022-06-22 2022-09-27 天津大学 Pathological voice frequency spectrum reconstruction method based on energy operator

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103117066B (en) * 2013-01-17 2015-04-15 杭州电子科技大学 Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
CN105067101A (en) * 2015-08-05 2015-11-18 北方工业大学 Fundamental tone frequency characteristic extraction method based on vibration signal for vibration source identification
US10249325B2 (en) * 2016-03-31 2019-04-02 OmniSpeech LLC Pitch detection algorithm based on PWVT of Teager Energy Operator
CN107316653B (en) * 2016-04-27 2020-06-26 南京理工大学 Improved empirical wavelet transform-based fundamental frequency detection method

Similar Documents

Publication Publication Date Title
Manfredi Adaptive noise energy estimation in pathological speech signals
KR20140079369A (en) System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
Ebner et al. Audio inpainting with generative adversarial network
Bhowmick et al. Speech enhancement using voiced speech probability based wavelet decomposition
CN109036472B (en) Improved pathological voice fundamental tone frequency extraction method
Jangjit et al. A new wavelet denoising method for noise threshold
Vaz et al. A two-step technique for MRI audio enhancement using dictionary learning and wavelet packet analysis.
Singh et al. Preliminary analysis of cough sounds
Hidayat et al. A Modified MFCC for Improved Wavelet-Based Denoising on Robust Speech Recognition.
Zhang et al. FB-MSTCN: A full-band single-channel speech enhancement method based on multi-scale temporal convolutional network
Kuortti et al. Post-processing speech recordings during MRI
Manfredi et al. A new insight into postsurgical objective voice quality evaluation: application to thyroplastic medialization
González-Rodríguez et al. Robust denoising of phonocardiogram signals using time-frequency analysis and U-Nets
Cai et al. The best input feature when using convolutional neural network for cough recognition
Moussavi et al. Heart sound cancellation based on multiscale products and linear prediction
CN114822567A (en) Pathological voice frequency spectrum reconstruction method based on energy operator
Singh et al. IIIT-S CSSD: A cough speech sounds database
Baishya et al. Speech de-noising using wavelet based methods with focus on classification of speech into voiced, unvoiced and silence regions
JP3841705B2 (en) Occupancy degree extraction device and fundamental frequency extraction device, method thereof, program thereof, and recording medium recording the program
Lin et al. An evaluation study of modulation-domain wavelet denoising method by alleviating different sub-band portions for speech enhancement
Sirichokswad et al. Improvement of esophageal speech using lpc and lf model
Ravi Performance analysis of adaptive wavelet denosing by speech discrimination and thresholding
CN109346106B (en) Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting
Wiśniewski et al. Tonal Index in digital recognition of lung auscultation
Rohith et al. Comparitive Analysis of Speech Enhancement Techniques: A Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant