KR20100044424A - Transfer base voiced measuring mean and system - Google Patents

Transfer base voiced measuring mean and system

Info

Publication number
KR20100044424A
Authority
KR
South Korea
Prior art keywords
harmonic
voiced sound
measuring
noise
peak
Prior art date
Application number
KR1020080103555A
Other languages
Korean (ko)
Inventor
김현수
Original Assignee
Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority to KR1020080103555A
Publication of KR20100044424A

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 - Pitch determination of speech signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 - Discriminating between voiced and unvoiced parts of speech signals

Abstract

The movement-based voiced sound measuring method according to the present invention comprises the steps of measuring the envelope of the harmonic peaks of a speech signal; measuring the envelope of the noise peaks of the speech signal; and measuring the degree of environmental noise from the envelope ratio between the harmonic peaks and the noise peaks, and then shifting the harmonics according to that degree of environmental noise to measure the degree of voicing. By presenting an accurate and practical feature (a feature for the degree-of-voicing measure) based on harmonic-component analysis, the method offers a new way of obtaining voicing information, which is the most important and performance-critical information in every system that processes voice and audio signals.

Description

Movement-based voiced sound measuring method and system {TRANSFER BASE VOICED MEASURING MEAN AND SYSTEM}

The present invention relates to a movement-based voiced sound measuring method and system. By analyzing and using the harmonic region, it quantifies a voiced/unvoiced separation feature that is highly robust to noise, very fast, accurate, and practical, with little computation.

In voice or audio signals, periodic (voiced, periodic, or harmonic) components and aperiodic (unvoiced, non-periodic, or random) components are mixed. Such voiced/unvoiced separation information is the most basic and critical information for all voice and audio signal processing systems (coding, synthesis, recognition, enhancement, etc.).

Using an exact selection method for harmonic peaks, a spectral envelope is obtained, and a method of extracting the degree of voicing from the ratio between the envelope of the harmonic peaks and the envelope of the non-periodic peaks is presented.

As shown in FIG. 1, the degree of voicing governs the voiced/unvoiced switching between a voiced excitation (periodic glottal pulses) and an unvoiced excitation (random excitation).

One approach that uses the separation of voiced and unvoiced sound is the method used in phonetic coding.

This method divides speech into six categories (onset, full-band steady-state voiced, full-band transient voiced, low-pass transient voiced, low-pass steady-state voiced, and unvoiced) for phonetic segmentation.

The features used to separate voiced and unvoiced sound include: 1. low-band speech energy, 2. zero-crossing count, 3. first reflection coefficient, 4. pre-emphasized energy ratio, 5. second reflection coefficient, 6. causal pitch prediction gain, and 7. non-causal pitch prediction gain, which are used in combination in a linear discriminator.
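As a rough illustration of two of these classical features, the sketch below computes a frame-level zero-crossing count and low-band energy; the exact definitions, band edge, and discriminator weights used in the cited coders are not given here, so these are generic textbook versions rather than the formulas of the prior art or of the patent.

```python
import numpy as np

def zero_crossing_count(frame: np.ndarray) -> int:
    """Number of sign changes in the frame (a classic unvoiced indicator)."""
    signs = np.sign(frame)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    return int(np.sum(signs[1:] != signs[:-1]))

def low_band_energy(frame: np.ndarray, fs: float, cutoff_hz: float = 1000.0) -> float:
    """Spectral energy below cutoff_hz (voiced speech is low-band dominant)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return float(np.sum(spec[freqs < cutoff_hz]))
```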

At present there are few methods that separate voiced and unvoiced sound using a single feature, and how several features are combined has a significant impact on performance.

References: S. Wang and A. Gersho, "Phonetically-based vector excitation coding of speech at 3.6 kbits/sec," Proc. IEEE 1989 Int. Conf. Acoust., Speech and Signal Processing, pp. 349-352, 1989; E. Paksoy, K. Srinivasan and A. Gersho, "Variable rate speech coding with phonetic segmentation," Proc. IEEE 1993 Int. Conf. Acoust., Speech and Signal Processing, pp. 155-158, 1993; A. Das and A. Gersho, "Variable dimension spectral coding of speech at 2400 bps and below with phonetic classification," Proc. IEEE 1995 Int. Conf. Acoust., Speech and Signal Processing, pp. 492-495, 1995.

However, although many features are used to extract voiced/unvoiced separation information, each of them on its own carries insufficient information for voiced/unvoiced classification, so the separation of voiced and unvoiced sound has relied on combining several individually unclear features.

In this case, the correlation between the individual features (which makes the choice of feature combination difficult) and the performance degradation in noise are serious problems that must be solved.

In addition, the presence or absence of harmonic components, which is the inherent difference between voiced and unvoiced sounds, is not properly captured by these features, so a practical feature extraction method that can accurately separate voiced and unvoiced sounds by analyzing the harmonic components is required.

In defining the performance of a voicing estimator, the important issues are

1. sensitivity to the mixed voicing of speech,

2. insensitivity to pitch behavior (including high and low pitches, smooth changes in pitch, and the presence or absence of randomness in the pitch period),

3. insensitivity to the spectral envelope,

4. subjective performance, etc.

In practice, the auditory system is not very sensitive to small changes in voicing intensity, so some margin of error in the voicing measure is tolerable; the most important performance criterion, however, is listening, that is, subjective performance.

Accordingly, the present invention is intended to solve the above-mentioned conventional problems. An object of the present invention is to provide a movement-based voiced sound measuring method and system that satisfy the important issues in defining the performance of a voicing estimator, separating voiced and unvoiced sound on the basis of a single feature without requiring a combination of several unclear features.

According to one aspect, a movement-based voiced sound measuring system according to an embodiment of the present invention for achieving the above object includes a harmonic measurement unit for measuring the envelope of the harmonic peaks of a speech signal; a noise measuring unit for measuring the envelope of the noise peaks of the speech signal; and a voiced sound measuring unit for measuring the degree of environmental noise from the envelope ratio of the harmonic peaks and the noise peaks and then measuring the degree of voicing by shifting the harmonics according to the degree of environmental noise.

The voiced sound measuring unit includes: an environmental noise measuring unit for measuring the environmental noise level; a harmonic shift value detector for determining a harmonic shift amount based on the degree of environmental noise measured by the environmental noise measuring unit; a harmonic shifter for shifting the harmonics of the frame to be estimated according to the shift amount determined by the harmonic shift value detector; a voiced sound contact detector for measuring a contact value between the shifted harmonics; and a voiced sound determining unit for determining the voicing level of the voice signal according to whether the contact value between the harmonics measured by the voiced sound contact detector exceeds a preset threshold value.

The environmental noise measurement unit determines whether the envelope ratio of the harmonic peak and the noise peak exceeds a preset threshold ratio.

The voiced sound measuring unit moves the harmonic by applying a weight when the envelope ratio between the harmonic peak and the noise peak exceeds a preset threshold ratio.

The harmonic measurement unit may extract the harmonic region through harmonic-to-noise decomposition or through SEEVOC.

According to one aspect, a movement-based voiced sound measuring method according to an embodiment of the present invention includes measuring the envelope of the harmonic peaks of a speech signal; measuring the envelope of the noise peaks of the speech signal; and measuring the degree of environmental noise from the envelope ratio of the harmonic peaks and the noise peaks and then shifting the harmonics according to the degree of environmental noise to measure the degree of voicing.

Measuring the degree of voicing includes: measuring the level of environmental noise; determining a harmonic shift amount according to the degree of environmental noise; shifting the harmonics of the frame to be estimated according to the shift amount; detecting a contact value from the contact between the harmonics; and determining the voicing level of the corresponding voice signal according to whether the contact value exceeds a preset threshold.

In the measuring of the environmental noise level, the voiced sound measuring unit determines whether the envelope ratio of the harmonic peak and the noise peak exceeds a predetermined threshold ratio.

In the moving of the harmonic of the frame to be estimated according to the voiced sound movement degree, when the envelope ratio of the harmonic peak and the noise peak exceeds a preset threshold ratio, the voiced sound measuring unit applies the weight to move the harmonic.

The extracting of the harmonic region may include extracting the harmonic region through harmonic-to-noise decomposition or through SEEVOC.

As described above, according to the movement-based voiced sound measuring method and system of the present invention, using an accurate and practical feature extraction method based on harmonic-component analysis (a feature for the degree-of-voicing measure) provides a new way to obtain voicing information, which is the most important and performance-critical information in every system that uses voice and audio signals.

Hereinafter, a preferred embodiment of the movement-based voiced sound measuring method and system according to the present invention will be described in detail with reference to the accompanying drawings. Those of ordinary skill in the art will understand that the system configuration described below is cited only to illustrate the present invention and does not limit the present invention to that system.

FIG. 2 shows the configuration of a movement-based voiced sound measuring system according to an embodiment of the present invention. The movement-based voiced sound measuring system according to the present invention includes a signal input unit 100, a harmonic measurement unit 200, a noise measuring unit 300, and a voiced sound measuring unit 400, and the voiced sound measuring unit 400 includes an environmental noise measuring unit 410, a harmonic shift value detector 420, a harmonic shifter 430, a voiced sound contact detector 440, and a voiced sound determining unit 450.

The signal input unit 100 receives a voice signal and samples the same to obtain a frame.
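A minimal sketch of the kind of framing the signal input unit 100 performs; the frame length, hop size, and Hanning window used here are assumptions, since the patent does not specify them.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 1024, hop: int = 256) -> np.ndarray:
    """Split a sampled speech signal into overlapping Hanning-windowed frames."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))   # pad short signals to one full frame
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])
```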

And the harmonic measurement unit 200 measures the envelope of the harmonic peak.

In addition, the noise measuring unit 300 measures the envelope of the noise peak in the same manner as the method of accurately selecting the harmonic peak.

The voiced sound measuring unit 400 measures the degree of voiced sound through the envelope ratio between the harmonic peak and the noise peak.

That is, the environmental noise measuring unit 410 of the voiced sound measuring unit 400 measures the degree of environmental noise. Here, the environmental noise measurement unit 410 determines whether the envelope ratio between the harmonic peak and the noise peak exceeds a preset threshold ratio.

In addition, the harmonic movement value detector 420 of the voiced sound measurement unit 400 measures the harmonic movement degree according to the degree of environmental noise measured by the environmental noise measurement unit 410.

As shown in FIG. 4, the harmonic shifter 430 of the voiced sound measuring unit 400 shifts the harmonics of the frame to be estimated according to the shift amount f0 - Pt determined by the harmonic shift value detector 420. Here, the harmonic shifter 430 applies a weight when shifting the harmonics if the envelope ratio between the harmonic peaks and the noise peaks exceeds a preset threshold ratio.

The voiced sound contact detecting unit 440 of the voiced sound measuring unit 400 measures the contact value Dk between the harmonics moved through the harmonic moving unit 430.

In addition, the voiced sound determination unit 450 of the voiced sound measuring unit 400 determines the voiced sound level of the voice signal according to whether the contact value between the harmonics measured by the voiced sound contact detector 440 exceeds a preset threshold value.
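The patent does not give formulas for the shift amount, the weight, or the inter-harmonic contact value Dk, so the sketch below only mirrors the order of operations of units 410 to 450; every formula, default value, and function name in it is a placeholder assumption rather than the claimed computation.

```python
import numpy as np

def measure_voicing(harm_env: np.ndarray, noise_env: np.ndarray,
                    harmonic_bins: np.ndarray, spectrum: np.ndarray,
                    ratio_threshold: float = 2.0, shift_bins: int = 1,
                    weight: float = 0.8, contact_threshold: float = 0.5):
    """Mirrors the order of operations of units 410-450; all formulas are placeholders."""
    # 410: environmental-noise measure via the harmonic/noise envelope energy ratio
    env_ratio = np.sum(harm_env ** 2) / (np.sum(noise_env ** 2) + 1e-12)
    exceeds = env_ratio > ratio_threshold

    # 420/430: shift the harmonic bins of the frame to be estimated,
    #          applying the weight when the ratio exceeds the threshold (per the text)
    w = weight if exceeds else 1.0
    shifted_bins = np.clip(harmonic_bins + shift_bins, 0, len(spectrum) - 1)

    # 440: "contact" values Dk between the original and shifted harmonic positions
    #      (placeholder: weighted, normalized product of the spectral amplitudes)
    d_k = w * spectrum[harmonic_bins] * spectrum[shifted_bins]
    d_k = d_k / (np.max(spectrum) ** 2 + 1e-12)

    # 450: voicing decision against a preset threshold on the mean contact value
    voicing = float(np.mean(d_k))
    return voicing, voicing > contact_threshold
```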

Meanwhile, the harmonic peak selection method used by the harmonic measurement unit 200 follows SEEVOC (D. B. Paul, "The spectral envelope estimation vocoder," IEEE Trans. Acoust., Speech and Signal Process., vol. ASSP-29, pp. 786-794, 1981), and a separate method applying harmonic-to-noise decomposition is described in C. d'Alessandro, B. Yegnanarayana and V. Darsinos, "Decomposition of speech signals into deterministic and stochastic components," Proc. ICASSP-95, pp. 760-763, 1995.

Therefore, the envelope of the harmonic peaks and the envelope of the noise peaks can be extracted either by the SEEVOC method or by harmonic-to-noise decomposition.

First, the SEEVOC method may frequency-convert a speech signal through a discrete Fourier transform, extract a harmonic peak as shown in FIG. 5A, and then extract an envelope of the harmonic peak as shown in FIG. 5B.

Thereafter, after extracting the noise peak as shown in FIG. 5C, the harmonic envelope S (SEEVOC envelope) and the noise envelope W (non-SEEVOC envelope) can be estimated as shown in FIG. 5D.
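A rough numpy sketch of the envelope extraction of FIGS. 5A to 5D; the real SEEVOC algorithm searches for the largest peak in each pitch-harmonic interval, whereas this sketch only uses generic peak picking and linear interpolation, so it should be read as an approximation of the idea, not the patent's method. In this sketch the two returned envelopes play the roles of the outputs of the harmonic measurement unit 200 and the noise measuring unit 300.

```python
import numpy as np
from scipy.signal import find_peaks

def peak_envelopes(frame: np.ndarray, min_peak_distance: int = 8):
    """Return (harmonic-peak envelope S, remaining-peak envelope W) over the DFT bins."""
    mag = np.abs(np.fft.rfft(frame))
    bins = np.arange(len(mag))

    peaks, _ = find_peaks(mag, distance=min_peak_distance)   # candidate spectral peaks
    if len(peaks) < 4:                                       # degenerate frame: no split possible
        return mag.copy(), mag.copy()

    # crude stand-in for SEEVOC: the stronger half of the peaks play the role of
    # harmonic peaks, the weaker half the role of the noise (non-SEEVOC) peaks
    order = np.argsort(mag[peaks])
    noise_peaks = np.sort(peaks[order[: len(peaks) // 2]])
    harm_peaks = np.sort(peaks[order[len(peaks) // 2:]])

    s_env = np.interp(bins, harm_peaks, mag[harm_peaks])     # SEEVOC-like envelope S (FIG. 5B)
    w_env = np.interp(bins, noise_peaks, mag[noise_peaks])   # non-SEEVOC envelope W (FIG. 5D)
    return s_env, w_env
```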

Here, if the voice is strongly voiced, the SEEVOC envelope S will lie above the non-SEEVOC envelope W (since W is mostly noise).

If the sound is weakly voiced or unvoiced, the SEEVOC envelope will not be larger than the non-SEEVOC peak envelope, so the degree of voicing will not be very clear.

Therefore, a candidate measure of the degree of voicing is based on the energy ratio of the two spectral envelopes.

[Equation 1]

Figure 112008073290643-PAT00001

where the two envelopes are specified at all frequencies fn, n = 0, 1, ..., M-1, and are interpolated where required.
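The equation image above is not reproduced in this text; based on the surrounding description (an energy ratio of the two envelopes sampled at the frequencies fn), one plausible form, offered as an assumption rather than the patent's exact formula, is:

```latex
% Assumed reading of Equation 1: energy ratio of the SEEVOC envelope S(f)
% to the non-SEEVOC (noise) envelope W(f), sampled at f_n, n = 0, ..., M-1.
\[
  R \;=\; \frac{\sum_{n=0}^{M-1} \lvert S(f_n)\rvert^{2}}
               {\sum_{n=0}^{M-1} \lvert W(f_n)\rvert^{2}}
\]
```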

Hereinafter, a harmonic to noise decomposition method will be described.

First, the harmonic to noise decomposition method depends on the characteristics of the peaks in the spectrum.

The proposed spectral cross-correlation approach characterizes the spectrum using information from the spectral peaks, since the spectral peaks are less sensitive to noise than low-amplitude points in the spectrum.

In many examples of strongly voiced and mixed-voiced spectra, the shapes of adjacent peaks in the strongly voiced part of the spectrum, which derive from the pitch of the voice signal, are very similar.

Thus, when the pitch changes within the frame, the shapes of adjacent peaks in the spectrum change together.

In mixed voiced sounds, the peak shape changes from peak to peak more rapidly as the sound becomes more unvoiced.

Such a measurement is therefore more sensitive to mixed voicing than to changes in pitch.

Therefore, the harmonic-to-noise decomposition method uses the cross-correlation of two adjacent peaks in the spectrum as a measure of shape similarity.

Here, the normalized speech spectrum Sn(f) is the amplitude spectrum divided by the SEEVOC spectral envelope.

This normalized spectrum maintains the structure of the amplitude spectrum, but all significant peaks are brought to a consistent amplitude.

On the other hand, the envelope is constructed by interpolation between the selected peaks of the amplitude spectrum, and division by this envelope alters the shape of the individual peaks.

In the STFT analysis each peak takes the shape of the transform of the Hanning window, so the normalized spectrum removes the formant effect.

Here, k0 is the frequency bin corresponding to a peak and Sn(k) is the normalized spectrum.

Equation 2 then measures the mutual similarity of the spectral samples around two adjacent peak frequencies.

[Equation 2]

Figure 112008073290643-PAT00002
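The image for Equation 2 is likewise not reproduced; given the description (a cross-correlation of the normalized spectrum Sn(k) around two adjacent peaks, with k0 the bin of a peak), one plausible reading, stated as an assumption, is a normalized cross-correlation over a small window of bins around the m-th and (m+1)-th peaks:

```latex
% Assumed reading of Equation 2: normalized cross-correlation of the normalized
% spectrum around adjacent peak bins k_0^{(m)} and k_0^{(m+1)}; the half-width K
% of the window is not specified in the text and is an assumption here.
\[
  V(m) \;=\;
  \frac{\sum_{k=-K}^{K} S_n\bigl(k_0^{(m)}+k\bigr)\, S_n\bigl(k_0^{(m+1)}+k\bigr)}
       {\sqrt{\sum_{k=-K}^{K} S_n^{2}\bigl(k_0^{(m)}+k\bigr)\;
              \sum_{k=-K}^{K} S_n^{2}\bigl(k_0^{(m+1)}+k\bigr)}}
\]
```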

To determine the overall deviation from periodicity in the spectrum, the variation of V(m) across the full spectrum is examined.

This measure also takes the low-amplitude portions, where random components are evident, into account, since the normalization reduces the influence of amplitude differences.

However, this measure is more sensitive to peak changes within the frame than to mixed voicing.
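A numpy sketch of this similarity measure under the same assumptions as above; the envelope used for normalization, the window half-width, and the peak list are all inputs the caller must supply, and none of the constants are taken from the patent.

```python
import numpy as np

def adjacent_peak_similarity(spectrum: np.ndarray, envelope: np.ndarray,
                             peak_bins: np.ndarray, half_width: int = 4) -> np.ndarray:
    """V(m): normalized cross-correlation of the normalized spectrum around adjacent peaks."""
    s_n = spectrum / (envelope + 1e-12)            # normalized spectrum Sn(k)
    v = []
    for k0, k1 in zip(peak_bins[:-1], peak_bins[1:]):
        lo0, hi0 = k0 - half_width, k0 + half_width + 1
        lo1, hi1 = k1 - half_width, k1 + half_width + 1
        if lo0 < 0 or lo1 < 0 or hi0 > len(s_n) or hi1 > len(s_n):
            continue                               # skip peaks too close to the spectrum edges
        a, b = s_n[lo0:hi0], s_n[lo1:hi1]
        v.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return np.asarray(v)                           # values near 1 indicate a strongly periodic region
```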

Here, the cepstrum is a second-order spectrum obtained by Fourier-transforming the logarithm of the spectrum G(f) of a time-domain function g(t) once more; the term is coined from "spectrum," and the concepts corresponding to frequency, amplitude, and phase in this domain are called quefrency, gamnitude, and saphe, respectively. The concept corresponding to filtering in the time or frequency domain is called liftering. Cepstrum analysis is useful for analyzing signals whose frequency spectrum is periodic, and several different definitions are in use.
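For reference, a minimal real-cepstrum computation matching the definition in the preceding paragraph (a Fourier transform of the log spectrum); the Hanning window and the small flooring constant are assumptions, not part of the patent.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum: inverse Fourier transform of the log magnitude spectrum of a windowed frame."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    return np.real(np.fft.ifft(log_mag))           # indexed by quefrency (in samples)
```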

General functions and detailed operations of the above-described elements will be omitted, and the operation corresponding to the present invention will be described with reference to FIGS. 2 to 5D.

First, the signal input unit 100 receives a voice signal to sample and obtain a frame.

Subsequently, the harmonic measurement unit 200 measures the envelope of the harmonic peak through a method of accurately selecting the harmonic peak.

That is, as shown in FIGS. 5A to 5D, the harmonic measurement unit 200 and the noise measuring unit 300 may obtain the envelope S from the peaks selected quite simply by the SEEVOC method and the envelope W from the remaining peaks.

Here, if the voice is strongly voiced, the SEEVOC envelope S will lie above the non-SEEVOC envelope W (since W is mostly noise).

If the sound is weakly voiced or unvoiced, the SEEVOC envelope will not be larger than the non-SEEVOC peak envelope, so the degree of voicing will not be very clear.

Therefore, a candidate measure of the degree of voicing is based on the energy ratio of the two spectral envelopes.

[Equation 1]

Figure 112008073290643-PAT00003

where the two envelopes are specified at all frequencies fn, n = 0, 1, ..., M-1, and are interpolated where required.

Then, the voiced sound measuring unit 400 measures the degree of voiced sound through the envelope ratio of the measured harmonic peak and the noise peak.

That is, the environmental noise measuring unit 410 measures the degree of environmental noise. Here, the environmental noise measurement unit 410 determines whether the envelope ratio between the harmonic peak and the noise peak exceeds a preset threshold ratio.

Subsequently, the harmonic shift value detector 420 measures the harmonic shift based on the degree of environmental noise measured by the environmental noise measurer 410.

Thereafter, as shown in FIG. 4, the harmonic shifter 430 shifts the harmonics of the frame to be estimated according to the shift amount f0 - Pt determined by the harmonic shift value detector 420. Here, the harmonic shifter 430 applies a weight when shifting the harmonics if the envelope ratio of the harmonic peaks and the noise peaks exceeds a preset threshold ratio.

Then, the voiced sound contact detecting unit 440 measures the contact value Dk between the harmonics moved through the harmonic moving unit 430.

Accordingly, the voiced sound determiner 450 determines the voiced sound level of the voice signal according to whether the contact value between the harmonics measured by the voiced voice contact detector 440 exceeds a preset threshold.

On the other hand, the harmonic measurement unit 200 may measure the pitch of the harmonic through a harmonic to noise decomposition method.

First, the harmonic to noise decomposition method depends on the characteristics of the peaks in the spectrum.

The proposed spectral cross-correlation approach characterizes the spectrum using information from the spectral peaks, since the spectral peaks are less sensitive to noise than low-amplitude points in the spectrum.

In many examples of strongly voiced and mixed-voiced spectra, the shapes of adjacent peaks in the strongly voiced part of the spectrum, which derive from the pitch of the voice signal, are very similar.

Thus, when the pitch changes within the frame, the shapes of adjacent peaks in the spectrum change together.

In mixed voiced sounds, the peak shape changes from peak to peak more rapidly as the sound becomes more unvoiced.

Such a measurement is therefore more sensitive to mixed voicing than to changes in pitch.

Therefore, the harmonic-to-noise decomposition method uses the cross-correlation of two adjacent peaks in the spectrum as a measure of shape similarity.

Here, the normalized speech spectrum Sn(f) is the amplitude spectrum divided by the SEEVOC spectral envelope.

This normalized spectrum maintains the structure of the amplitude spectrum, but all significant peaks are brought to a consistent amplitude.

On the other hand, the envelope is constructed by interpolation between the selected peaks of the amplitude spectrum, and division by this envelope alters the shape of the individual peaks.

In the STFT analysis each peak takes the shape of the transform of the Hanning window, so the normalized spectrum removes the formant effect.

Here, k0 is the frequency bin corresponding to a peak and Sn(k) is the normalized spectrum.

Equation 2 then measures the mutual similarity of the spectral samples around two adjacent peak frequencies.

[Equation 2]

Figure 112008073290643-PAT00004

To determine the overall deviation from periodicity in the spectrum, the variation of V(m) across the full spectrum is examined.

This measure also takes the low-amplitude portions, where random components are evident, into account, since the normalization reduces the influence of amplitude differences.

However, this measure is more sensitive to peak changes within the frame than to mixed voicing.

Next, the movement-based voiced sound measuring method according to an embodiment of the present invention having the above configuration will be described with reference to FIG. 6.

First, the signal input unit 100 receives a voice signal to sample and obtain a frame (S1).

Subsequently, the harmonic measurement unit 200 measures the envelope of the harmonic peak (S2).

In addition, the noise measuring unit 300 measures the envelope of the noise peaks of the speech signal (S3). Here, the step (S2) of extracting the harmonic region and the step (S3) of extracting the noise region may be performed through the SEEVOC method or the harmonic-to-noise decomposition method.

Then, the voiced sound measuring unit 400 measures the environmental noise level from the envelope ratio between the harmonic peaks and the noise peaks, and then shifts the harmonics according to the degree of environmental noise to measure the degree of voicing (S4).

Hereinafter, the detailed steps of the step S4 of measuring the degree of voicing from the envelope ratio of the harmonic peaks and the noise peaks will be described with reference to FIG. 7.

First, the voiced sound measuring unit 400 measures the degree of environmental noise (S41). Here, in the step (S41) of measuring the environmental noise level, the voiced sound measuring unit 400 determines whether the envelope ratio of the harmonic peak and the noise peak exceeds a preset threshold ratio.

Subsequently, the voiced sound measuring unit 400 determines the degree of harmonic movement according to the degree of environmental noise (S42).

Thereafter, the voiced sound measuring unit 400 shifts the harmonics of the frame to be estimated according to the shift amount f0 - Pt as shown in FIG. 4 (S43). Here, in the step S43 of shifting the harmonics of the frame to be estimated, the voiced sound measuring unit 400 applies a weight when shifting the harmonics if the envelope ratio of the harmonic peaks and the noise peaks exceeds a preset threshold ratio.

Then, the voiced sound measuring unit 400 detects the contact value Dk according to the contact between the harmonics (S44).

Thereafter, the voiced sound measuring unit 400 determines the voiced sound level of the corresponding voice signal according to whether the contact value exceeds a preset threshold value (S45).

The voiced sound information extraction method proposed in the present invention can be used for all coding, recognition, enhancement, and synthesis. In particular, because it extracts voicing information with a small amount of computation and accurate harmonic-region detection, it is especially effective in applications with high mobility, limited computational and storage capacity, or a need for fast processing, such as mobile phone terminals, telematics devices, PDAs, and MP3 players, and it can serve as an important source technology for all other voice and audio signal processing systems.

Although the present invention has been described in detail only with respect to the specific embodiments described above, it will be apparent to those skilled in the art that various changes and modifications can be made within the technical scope of the present invention, and such changes and modifications belong to the appended claims.

FIG. 1 is a functional block diagram for separating voiced and unvoiced sounds in general.

FIG. 2 is a functional block diagram showing the configuration of a movement-based voiced sound measuring system according to the present invention.

FIG. 3 is a functional block diagram illustrating the detailed configuration of the voiced sound measuring unit in the movement-based voiced sound measuring system of FIG. 2.

FIG. 4 is a diagram showing how the harmonics are shifted according to the degree of environmental noise measured in the movement-based voiced sound measuring system of FIG. 3.

FIGS. 5A to 5D are diagrams illustrating the SEEVOC method.

FIG. 6 is a flowchart showing a movement-based voiced sound measuring method according to the present invention.

FIG. 7 is a flowchart illustrating the detailed steps of the voiced sound measuring step S4 in the movement-based voiced sound measuring method of FIG. 6.

<Explanation of symbols for the main parts of the drawings>

100: signal input unit 200: harmonic measurement unit

300: noise measuring unit 400: voiced sound measuring unit

410: environmental noise measurement unit 420: harmonic shift value detector

430: harmonic moving unit 440: voiced sound contact detection unit

450: voiced sound determination unit

Claims (12)

1. A movement-based voiced sound measuring method, comprising: measuring the envelope of the harmonic peaks of a speech signal; measuring the envelope of the noise peaks of the speech signal; and measuring the degree of voicing by shifting the harmonics according to the degree of environmental noise after measuring the degree of environmental noise based on the envelope ratio between the harmonic peaks and the noise peaks.

2. The method of claim 1, wherein measuring the degree of voicing comprises: measuring the degree of environmental noise; determining a harmonic shift amount according to the degree of environmental noise; shifting the harmonics of the frame to be estimated according to the shift amount; detecting a contact value from the contact between the harmonics; and determining the voicing level of the corresponding voice signal according to whether the contact value exceeds a preset threshold.

3. The method of claim 2, wherein, in measuring the environmental noise level, the voiced sound measuring unit determines whether the envelope ratio of the harmonic peaks and the noise peaks exceeds a predetermined threshold ratio.

4. The method of claim 3, wherein, in shifting the harmonics of the frame to be estimated according to the shift amount, the voiced sound measuring unit applies a weight when shifting the harmonics if the envelope ratio of the harmonic peaks and the noise peaks exceeds a preset threshold ratio.

5. The method of claim 1, wherein extracting the harmonic region comprises extracting the harmonic region through harmonic-to-noise decomposition.

6. The method of claim 1, wherein extracting the harmonic region comprises extracting the harmonic region through SEEVOC.

7. A movement-based voiced sound measuring system, comprising: a harmonic measurement unit measuring the envelope of the harmonic peaks of a voice signal; a noise measuring unit measuring the envelope of the noise peaks of the voice signal; and a voiced sound measuring unit measuring the degree of voicing by shifting the harmonics according to the degree of environmental noise after measuring the degree of environmental noise through the envelope ratio of the harmonic peaks and the noise peaks.

8. The system of claim 7, wherein the voiced sound measuring unit comprises: an environmental noise measuring unit measuring the degree of environmental noise; a harmonic shift value detector determining a harmonic shift amount based on the degree of environmental noise measured by the environmental noise measuring unit; a harmonic shifter shifting the harmonics of the frame to be estimated according to the shift amount determined by the harmonic shift value detector; a voiced sound contact detector measuring a contact value between the shifted harmonics; and a voiced sound determining unit determining the voicing level of the voice signal according to whether the contact value between the harmonics measured by the voiced sound contact detector exceeds a preset threshold value.

9. The system of claim 8, wherein the environmental noise measuring unit determines whether the envelope ratio of the harmonic peaks and the noise peaks exceeds a predetermined threshold ratio.

10. The system of claim 9, wherein the voiced sound measuring unit shifts the harmonics by applying a weight if the envelope ratio of the harmonic peaks and the noise peaks exceeds a preset threshold ratio.

11. The system of claim 7, wherein the harmonic measurement unit extracts the harmonic region by harmonic-to-noise decomposition.

12. The system of claim 7, wherein the harmonic region is extracted through SEEVOC.
KR1020080103555A 2008-10-22 2008-10-22 Transfer base voiced measuring mean and system KR20100044424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020080103555A KR20100044424A (en) 2008-10-22 2008-10-22 Transfer base voiced measuring mean and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020080103555A KR20100044424A (en) 2008-10-22 2008-10-22 Transfer base voiced measuring mean and system

Publications (1)

Publication Number Publication Date
KR20100044424A true KR20100044424A (en) 2010-04-30

Family

ID=42219123

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020080103555A KR20100044424A (en) 2008-10-22 2008-10-22 Transfer base voiced measuring mean and system

Country Status (1)

Country Link
KR (1) KR20100044424A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013085801A1 (en) * 2011-12-09 2013-06-13 Microsoft Corporation Harmonicity-based single-channel speech quality estimation
US8731911B2 (en) 2011-12-09 2014-05-20 Microsoft Corporation Harmonicity-based single-channel speech quality estimation

Similar Documents

Publication Publication Date Title
Graf et al. Features for voice activity detection: a comparative analysis
KR100744352B1 (en) Method of voiced/unvoiced classification based on harmonic to residual ratio analysis and the apparatus thereof
Sadjadi et al. Unsupervised speech activity detection using voicing measures and perceptual spectral flux
KR101060533B1 (en) Systems, methods and apparatus for detecting signal changes
McAulay et al. Pitch estimation and voicing detection based on a sinusoidal speech model
KR100770839B1 (en) Method and apparatus for estimating harmonic information, spectrum information and degree of voicing information of audio signal
KR100986957B1 (en) Systems, methods, and apparatus for detection of tonal components
Chen et al. Improved voice activity detection algorithm using wavelet and support vector machine
JP6272433B2 (en) Method and apparatus for detecting pitch cycle accuracy
Ba et al. BaNa: A hybrid approach for noise resilient pitch detection
Ding et al. A DCT-based speech enhancement system with pitch synchronous analysis
KR100827153B1 (en) Method and apparatus for extracting degree of voicing in audio signal
Dubuisson et al. On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination
Sadjadi et al. Robust front-end processing for speaker identification over extremely degraded communication channels
Jang et al. Evaluation of performance of several established pitch detection algorithms in pathological voices
KR20100044424A (en) Transfer base voiced measuring mean and system
Vydana et al. Detection of fricatives using S-transform
Graf et al. Improved performance measures for voice activity detection
Stahl et al. Phase-processing for voice activity detection: A statistical approach
Korvel et al. Comparative analysis of spectral and cepstral feature extraction techniques for phoneme modelling
Hermus et al. Estimation of the voicing cut-off frequency contour based on a cumulative harmonicity score
Laleye et al. Automatic boundary detection based on entropy measures for text-independent syllable segmentation
Ananthapadmanabha et al. An interesting property of LPCs for sonorant vs fricative discrimination
US20240013803A1 (en) Method enabling the detection of the speech signal activity regions
Verteletskaya et al. Pitch detection algorithms and voiced/unvoiced classification for noisy speech

Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination