CN110047519A - Voice endpoint detection method, device and equipment - Google Patents

Voice endpoint detection method, device and equipment

Info

Publication number
CN110047519A
Authority
CN
China
Prior art keywords
frame
spectrum
signal
energy
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910311947.7A
Other languages
Chinese (zh)
Other versions
CN110047519B (en)
Inventor
张承云
梁龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN201910311947.7A
Publication of CN110047519A
Application granted
Publication of CN110047519B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Abstract

The invention discloses a voice endpoint detection method, comprising: filtering and framing a received speech signal to obtain a primary signal; calculating the energy and the spectrum of each frame of the primary signal; constructing a weighting factor from the energy and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal; calculating the power spectrum and the total spectral energy of each frame of the secondary signal; calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy; and using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold to classify frames as speech frames or noise frames. The method provided by the invention is applicable to noise types whose power spectrum is relatively concentrated, and improves the accuracy of voice endpoint detection.

Description

Voice endpoint detection method, device and equipment
Technical field
The present invention relates to the technical field of speech recognition, and more particularly to a voice endpoint detection method, device and equipment.
Background art
Voice endpoint detection is a speech front-end processing technique: an endpoint detection algorithm extracts the speech segments from a noisy signal, providing useful information for subsequent algorithms and technologies such as sound source localization, speech enhancement, speech recognition and speech coding. Prior-art voice endpoint detection methods consist of two main steps: speech feature extraction and speech detection. First, features of the speech signal are extracted by one of various algorithms so that speech can be distinguished from noise; then the extracted speech signal is checked by one of various detection methods. Feature extraction is the core of voice endpoint detection and determines the accuracy of the final result.
In terms of processing domain, voice endpoint detection is mainly performed in the frequency domain. Frequency-domain endpoint detection is based on the spectral entropy method, which distinguishes speech from noise by their different spectral entropies and performs endpoint detection by measuring the flatness of the power spectrum; that is, the spectral entropy is calculated from the spectral probability density function (PDF). When the power spectrum of a signal is relatively flat or uniform, it approaches an equiprobable distribution, so the entropy function takes a larger value and its reciprocal a smaller value. Conversely, when the power spectrum is concentrated or uneven, the entropy function takes a smaller value and its reciprocal a larger value. Because speech has a formant structure, its power spectrum is concentrated and uneven, so its spectral entropy is relatively low and the reciprocal is large. The power spectrum of noise (white noise, pink noise and the like) is more dispersed, so its spectral entropy is larger and the reciprocal is small. Speech and noise can thus be told apart. Endpoint detection based on spectral entropy is only weakly affected by the energy of the speech signal and therefore has a certain robustness to noise. In real noisy environments, however, such as a restaurant or subway filled with babble noise or vehicle running noise, both the noise and the speech have a relatively concentrated power spectrum, which makes it difficult for spectral-entropy endpoint detection to produce an accurate estimate.
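The behaviour described above can be illustrated with a small numeric sketch. The three 8-bin power spectra below are hypothetical values chosen only for illustration, and the natural logarithm is an assumption (the text does not state the base). The sketch shows that a flat spectrum gives the maximum entropy, while a speech-like spectrum and a concentrated noise spectrum both give low, similar entropies, which is the failure case the background describes.

```python
import math

def spectral_entropy(power_spectrum):
    """Shannon entropy (natural log) of a power spectrum normalised to sum to 1."""
    total = sum(power_spectrum)
    probs = [p / total for p in power_spectrum]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical 8-bin power spectra, chosen only for illustration.
white_noise = [1.0] * 8                                       # flat
speech_like = [0.1, 6.0, 0.2, 0.1, 3.0, 0.1, 0.1, 0.1]        # formant-like peaks
narrowband_noise = [0.1, 0.1, 5.0, 4.0, 0.1, 0.1, 0.1, 0.1]   # concentrated noise

for name, spectrum in [("white noise", white_noise),
                       ("speech-like", speech_like),
                       ("narrow-band noise", narrowband_noise)]:
    h = spectral_entropy(spectrum)
    print(f"{name}: H = {h:.3f}, 1/H = {1.0 / h:.3f}")
```

With these inputs the white-noise entropy equals log(8), and the two peaked spectra land close to each other, so a threshold on plain spectral entropy cannot separate them.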
Summary of the invention
The present invention provides a voice endpoint detection method to solve the technical problem that prior-art voice endpoint detection methods have difficulty producing accurate estimates under noise with a relatively concentrated power spectrum. The invention is applicable to noise types whose power spectrum is relatively concentrated and improves the accuracy of voice endpoint detection.
To solve the above technical problem, an embodiment of the invention provides a voice endpoint detection method, comprising:
filtering and framing the received speech signal to obtain a primary signal;
calculating the energy and the spectrum of each frame of the primary signal;
constructing a weighting factor from the energy, and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal;
calculating the power spectrum and the total spectral energy of each frame of the secondary signal;
calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy;
using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold, and classifying frames as speech frames or noise frames.
Preferably, using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and classifying frames as speech frames or noise frames specifically comprises:
comparing the detection threshold with the short-time spectral entropy value of each frame of the secondary signal;
when the short-time spectral entropy value is greater than the detection threshold, determining that the signal frame corresponding to that value is a speech frame;
when the short-time spectral entropy value is less than or equal to the detection threshold, determining that the signal frame corresponding to that value is a noise frame.
Preferably, calculating the energy and the spectrum of each frame of the primary signal specifically comprises:
calculating the energy E(n) of each frame of the primary signal, following the energy-based endpoint detection method;
calculating the spectrum X(n, l) of each frame of the primary signal by Fourier transform;
where $E(n) = \sum_{m=1}^{M} x(n,m)^2$, n = 1, 2, 3, ..., N; the primary signal is x(n, m), n = 1, 2, 3, ..., N, m = 1, 2, 3, ..., M; N is the number of frames and M is the frame length;
X(n, l) = fft(x(n, m)), where fft is the fast Fourier transform and l is the frequency index.
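As a minimal sketch of this step, the code below computes a per-frame energy and spectrum. It assumes that the energy is the sum of squared samples, and it uses a naive discrete Fourier transform as a dependency-free stand-in for fft; the function names and the two sample frames are hypothetical.

```python
import cmath

def frame_energy(frame):
    """Short-time energy E(n): sum of squared samples (assumed definition)."""
    return sum(s * s for s in frame)

def dft(frame):
    """Naive O(M^2) discrete Fourier transform, a stand-in for fft."""
    size = len(frame)
    return [
        sum(frame[m] * cmath.exp(-2j * cmath.pi * l * m / size) for m in range(size))
        for l in range(size)
    ]

# Two hypothetical frames of length M = 4.
frames = [[0.0, 1.0, 0.0, -1.0], [0.1, 0.1, 0.1, 0.1]]
energies = [frame_energy(f) for f in frames]   # E(n) for each frame
spectra = [dft(f) for f in frames]             # X(n, l) for each frame
```

The first frame is one period of a sine, so its energy sits in bins 1 and 3; the second is constant, so its energy sits in the DC bin only.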
Preferably, constructing a weighting factor from the energy and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal specifically comprises:
normalizing the energy E(n) of each frame of the primary signal and constructing the weighting factor e(n);
applying spectral weighting with the weighting factor e(n) to the spectrum X(n, l) of each frame of the primary signal to obtain each frame of the secondary signal X_g(n, l);
where e(n) is the weighting factor, e(n) = 1 - E_g(n), E_g(n) = E(n) / max(E(n));
$X_g(n,l) = X(n,l) / |X(n,l)|^{e(n)}$.
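The weighting step can be sketched as follows. The function names are hypothetical, and the small `eps` guard against empty bins is an added safeguard that the text does not mention. An exponent of 0 leaves the spectrum untouched, while an exponent of 1 reduces every bin to unit magnitude, which is full whitening.

```python
def weighting_factors(energies):
    """e(n) = 1 - E(n) / max(E): loud frames get e near 0, quiet frames e near 1."""
    peak = max(energies)
    return [1.0 - energy / peak for energy in energies]

def weight_spectrum(spectrum, exponent, eps=1e-12):
    """X_g(l) = X(l) / |X(l)|**exponent; eps avoids dividing by zero (assumption)."""
    return [x / (max(abs(x), eps) ** exponent) for x in spectrum]

energies = [2.0, 0.5]                        # hypothetical frame energies
factors = weighting_factors(energies)        # exponents for the two frames
spectrum = [4 + 0j, 0 + 1j, 0.25 + 0j]       # hypothetical spectrum of one frame
unchanged = weight_spectrum(spectrum, factors[0])   # exponent 0: no whitening
flattened = weight_spectrum(spectrum, 1.0)          # exponent 1: full whitening
```

Because the phase of each bin is preserved and only the magnitude is compressed, the weighting changes how concentrated the power spectrum is without moving its peaks.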
Preferably, calculating the power spectrum and the total spectral energy of each frame of the secondary signal specifically comprises:
calculating the power spectrum magnitude S(n, l) and the total spectral energy Y(n) of each frame of the secondary signal;
where $S(n,l) = |X_g(n,l) \cdot X_g(n,l)|$, $Y(n) = \sum_{l=1}^{L} S(n,l)$, and L is the length of the Fourier transform.
Preferably, calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy specifically comprises:
calculating the spectral probability density function P(n, l) of each frame of the secondary signal from the power spectrum magnitude S(n, l) and the total spectral energy Y(n);
calculating the short-time spectral entropy H(n) of each frame of the secondary signal from its spectral probability density function P(n, l);
where P(n, l) = S(n, l) / Y(n), and $H(n) = -\sum_{l=1}^{L} P(n,l) \log P(n,l)$.
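A minimal sketch of the entropy computation for one frame is given below. It assumes the natural logarithm and the usual Shannon form of H(n), and it skips bins with zero probability; the function name and the two example frames are hypothetical. A frame with no spectral energy at all would divide by zero and is not handled here.

```python
import math

def short_time_spectral_entropy(weighted_spectrum):
    """Return H(n) for one frame of the secondary signal X_g(n, l)."""
    power = [abs(x * x) for x in weighted_spectrum]   # S(n, l) = |X_g * X_g|
    total = sum(power)                                # Y(n)
    probs = [s / total for s in power]                # P(n, l)
    # Bins with P = 0 are skipped: p * log(p) tends to 0 as p tends to 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

flat_frame = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]   # equal magnitude in all 4 bins
peaked_frame = [2 + 0j, 0j, 0j, 0j]              # all power in a single bin
h_flat = short_time_spectral_entropy(flat_frame)
h_peaked = short_time_spectral_entropy(peaked_frame)
```

The flat frame reaches the maximum entropy for four bins, log(4), and the single-bin frame has zero entropy, so its reciprocal is undefined; a practical implementation would need to guard that case.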
Preferably, using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and classifying frames as speech frames or noise frames specifically comprises:
using the average of the reciprocals of the spectral entropies of the first Z consecutive frames among the N frames as the endpoint detection threshold K;
where $K = \frac{1}{Z}\sum_{n=1}^{Z} J(n)$, Z << N, and J(n) = 1/H(n).
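The threshold and the decision can be sketched as below. The text speaks of comparing the "short-time spectral entropy value" with the threshold; since K is itself an average of reciprocals, this sketch assumes that the quantity compared with K is the reciprocal J(n), and it further assumes that the first Z frames contain only noise. Function names and entropy values are hypothetical.

```python
def detection_threshold(entropies, z):
    """K: mean of J(n) = 1 / H(n) over the first z frames (assumed noise only)."""
    return sum(1.0 / h for h in entropies[:z]) / z

def classify_frames(entropies, threshold):
    """True marks a speech frame; compares J(n) with K (an assumption)."""
    return [(1.0 / h) > threshold for h in entropies]

# Hypothetical entropies: three leading noise frames, two speech frames, one noise frame.
entropies = [2.0, 2.0, 2.0, 0.5, 0.4, 2.0]
k = detection_threshold(entropies, z=3)
flags = classify_frames(entropies, k)
```

Because K is a mean, any leading frame whose reciprocal lies above that mean would also be flagged as speech; how to treat such frames is left open by the text.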
To solve the same technical problem, an embodiment of the invention provides a voice endpoint detection device, comprising:
a preprocessing module, configured to filter and frame the received speech signal to obtain a primary signal;
a first calculation module, configured to calculate the energy and the spectrum of each frame of the primary signal;
a spectral weighting module, configured to construct a weighting factor from the energy and to apply spectral weighting to the spectrum with the weighting factor to obtain a secondary signal;
a second calculation module, configured to calculate the power spectrum and the total spectral energy of each frame of the secondary signal;
a third calculation module, configured to calculate the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy;
a judgment module, configured to use the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and to classify frames as speech frames or noise frames.
To solve the above technical problem, an embodiment of the invention provides voice endpoint detection equipment, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the above voice endpoint detection method when executing the computer program.
Compared with the prior art, the embodiments of the invention have the following beneficial effect. An embodiment of the invention provides a voice endpoint detection method, comprising: filtering and framing the received speech signal to obtain a primary signal; calculating the energy and the spectrum of each frame of the primary signal; constructing a weighting factor from the energy and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal; calculating the power spectrum and the total spectral energy of each frame of the secondary signal; calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy; and using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold to classify frames as speech frames or noise frames.
For noise types whose power spectrum is relatively concentrated, the weighting factor constructed from the energy calculation is applied as a spectral weight to the spectrum of each frame of the primary signal to obtain the secondary signal. This whitens the noise spectrum to a certain degree, making the noise power spectrum flatter and more uniform, which increases the short-time spectral entropy of the noise so that its reciprocal takes a smaller value. At the same time the speech power spectrum is preserved: the short-time spectral entropy of speech stays small and its reciprocal stays large. Because e(n) = 1 - E_g(n), a high-energy frame receives an exponent near 0 and is left almost unchanged, whereas a low-energy frame receives an exponent near 1 and is almost fully whitened. Speech and noise can therefore be distinguished, which improves the accuracy of voice endpoint detection. By merging the energy-based endpoint detection method into the spectral entropy method and applying the energy to the spectral whitening as an exponent, the degree of whitening can be controlled, so that accurate endpoint detection is possible under noise types whose power spectrum is relatively concentrated, effectively improving the accuracy of spectral-entropy voice endpoint detection.
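The way the exponent controls the degree of whitening can be checked numerically. The sketch below applies exponents 0, 0.5 and 1 to one hypothetical set of concentrated noise magnitudes and computes the resulting spectral entropy; the function name, the magnitudes and the use of the natural logarithm are assumptions made for illustration.

```python
import math

def entropy_after_weighting(magnitudes, exponent):
    """Spectral entropy after dividing each magnitude by itself raised to `exponent`."""
    weighted = [m / (m ** exponent) for m in magnitudes]
    power = [w * w for w in weighted]
    total = sum(power)
    return -sum((p / total) * math.log(p / total) for p in power)

concentrated = [8.0, 1.0, 0.5, 0.5]   # hypothetical narrow-band noise magnitudes
entropies = [entropy_after_weighting(concentrated, e) for e in (0.0, 0.5, 1.0)]
```

The entropy rises monotonically with the exponent and reaches log(4), the maximum for four bins, at an exponent of 1. A low-energy noise frame, whose exponent is close to 1, therefore ends up with a high entropy and a small reciprocal.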
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a voice endpoint detection method provided by the invention;
Fig. 2 is a schematic flow diagram of a voice endpoint detection method provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
First embodiment of the invention:
Referring to Fig. 1, the first embodiment of the invention provides a voice endpoint detection method, comprising at least:
S1: filtering and framing the received speech signal to obtain a primary signal;
S2: calculating the energy and the spectrum of each frame of the primary signal;
S3: constructing a weighting factor from the energy, and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal;
By merging the energy-based endpoint detection method into the spectral entropy method and applying the energy to the spectral whitening as an exponent, the degree of whitening can be controlled, so that the spectral entropy method can perform accurate endpoint detection under noise types whose power spectrum is relatively concentrated, effectively improving the accuracy of voice endpoint detection.
S4: calculating the power spectrum and the total spectral energy of each frame of the secondary signal;
S5: calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy;
S6: using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold, and classifying frames as speech frames or noise frames.
In this embodiment, for noise types whose power spectrum is relatively concentrated, the weighting factor constructed from the energy calculation is applied as a spectral weight to the spectrum of each frame of the primary signal to obtain the secondary signal. This whitens the noise spectrum to a certain degree, making the noise power spectrum flatter and more uniform, so the short-time spectral entropy of the noise increases and its reciprocal takes a smaller value. The speech power spectrum is preserved, so the short-time spectral entropy of speech stays small and its reciprocal stays large. Speech and noise can therefore be distinguished, which improves the accuracy of spectral-entropy voice endpoint detection.
In this embodiment of the invention, using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and classifying frames as speech frames or noise frames specifically comprises:
comparing the detection threshold with the short-time spectral entropy value of each frame of the secondary signal;
when the short-time spectral entropy value is greater than the detection threshold, determining that the signal frame corresponding to that value is a speech frame;
when the short-time spectral entropy value is less than or equal to the detection threshold, determining that the signal frame corresponding to that value is a noise frame.
In this embodiment of the invention, calculating the energy and the spectrum of each frame of the primary signal specifically comprises:
calculating the energy E(n) of each frame of the primary signal, following the energy-based endpoint detection method;
calculating the spectrum X(n, l) of each frame of the primary signal by Fourier transform;
where $E(n) = \sum_{m=1}^{M} x(n,m)^2$, n = 1, 2, 3, ..., N; the primary signal is x(n, m), n = 1, 2, 3, ..., N, m = 1, 2, 3, ..., M; N is the number of frames and M is the frame length;
X(n, l) = fft(x(n, m)), where fft is the fast Fourier transform and l is the frequency index.
In this embodiment of the invention, constructing a weighting factor from the energy and applying spectral weighting to the spectrum with the weighting factor to obtain a secondary signal specifically comprises:
normalizing the energy E(n) of each frame of the primary signal and constructing the weighting factor e(n);
applying spectral weighting with the weighting factor e(n) to the spectrum X(n, l) of each frame of the primary signal to obtain each frame of the secondary signal X_g(n, l);
where e(n) is the weighting factor, e(n) = 1 - E_g(n), E_g(n) = E(n) / max(E(n));
$X_g(n,l) = X(n,l) / |X(n,l)|^{e(n)}$.
In this way, by merging the energy-based endpoint detection method into the spectral entropy method and applying the energy to the spectral whitening as an exponent, the degree of whitening can be controlled, so that the spectral entropy method can perform accurate endpoint detection under noise types whose power spectrum is relatively concentrated, improving the accuracy of voice endpoint detection.
In this embodiment of the invention, calculating the power spectrum and the total spectral energy of each frame of the secondary signal specifically comprises:
calculating the power spectrum magnitude S(n, l) and the total spectral energy Y(n) of each frame of the secondary signal;
where $S(n,l) = |X_g(n,l) \cdot X_g(n,l)|$, $Y(n) = \sum_{l=1}^{L} S(n,l)$, and L is the length of the Fourier transform.
In this embodiment of the invention, calculating the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy specifically comprises:
calculating the spectral probability density function P(n, l) of each frame of the secondary signal from the power spectrum magnitude S(n, l) and the total spectral energy Y(n);
calculating the short-time spectral entropy H(n) of each frame of the secondary signal from its spectral probability density function P(n, l);
where P(n, l) = S(n, l) / Y(n), and $H(n) = -\sum_{l=1}^{L} P(n,l) \log P(n,l)$.
In this embodiment of the invention, using the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and classifying frames as speech frames or noise frames specifically comprises:
using the average of the reciprocals of the spectral entropies of the first Z consecutive frames among the N frames as the endpoint detection threshold K;
where $K = \frac{1}{Z}\sum_{n=1}^{Z} J(n)$, Z << N, and J(n) = 1/H(n).
Referring to Fig. 2, one feasible specific implementation of the voice endpoint detection method of the invention proceeds as follows:
1. Receive the speech signal under test with a microphone; the signal is denoted x(t);
2. Filter and frame the received speech signal to obtain the primary signal x(n, m), where n = 1, 2, 3, ..., N, N is the number of frames, m = 1, 2, 3, ..., M, and M is the frame length;
3. Estimate the energy of each frame of the primary signal x(n, m), giving the energy E(n) of each frame:
$E(n) = \sum_{m=1}^{M} x(n,m)^2$;
4. Normalize the energy E(n) of each frame of the primary signal to obtain E_g(n), and construct the weighting factor e(n):
E_g(n) = E(n) / max(E(n)),
e(n) = 1 - E_g(n);
5. Apply the Fourier transform to each frame of the primary signal x(n, m) to obtain the spectrum X(n, l) of each frame:
X(n, l) = fft(x(n, m)),
where fft is the fast Fourier transform and l is the frequency index;
6. Apply spectral weighting to the spectrum X(n, l) with the weighting factor to obtain the secondary signal X_g(n, l):
$X_g(n,l) = X(n,l) / |X(n,l)|^{e(n)}$;
7. Calculate the power spectrum magnitude S(n, l) of each frame of the secondary signal:
$S(n,l) = |X_g(n,l) \cdot X_g(n,l)|$;
8. Calculate the total spectral energy Y(n) of each frame of the secondary signal:
$Y(n) = \sum_{l=1}^{L} S(n,l)$,
where L is the length of the Fourier transform;
9. Calculate the spectral probability density function P(n, l) of each frame of the secondary signal:
P(n, l) = S(n, l) / Y(n);
10. Calculate the short-time spectral entropy H(n) of each frame of the secondary signal:
$H(n) = -\sum_{l=1}^{L} P(n,l) \log P(n,l)$;
11. Calculate the reciprocal J(n) of the short-time spectral entropy of each frame of the secondary signal:
J(n) = 1/H(n);
12. Take the average of the reciprocals of the spectral entropies of the first 20 frames as the detection threshold K:
$K = \frac{1}{20}\sum_{n=1}^{20} J(n)$.
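Steps 2 to 12, followed by the speech/noise decision, can be sketched end to end as below. This is an illustration under several assumptions, not the patented implementation: the filter is omitted because the text does not specify it, frames do not overlap, a naive DFT stands in for fft, the logarithm is natural, an `eps` guard protects empty bins, and the decision compares J(n) with K. All names and the synthetic test signal are hypothetical, and degenerate frames (silence, or power in a single bin) are not guarded.

```python
import cmath
import math

def split_into_frames(samples, frame_len):
    """Step 2 (framing only): non-overlapping frames; filtering is omitted here."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]

def dft(frame):
    """Naive DFT used as a dependency-free stand-in for fft (step 5)."""
    size = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * cmath.pi * l * m / size)
                for m in range(size)) for l in range(size)]

def reciprocal_entropies(frames, eps=1e-12):
    """Steps 3 to 11: return J(n) = 1 / H(n) for every frame."""
    energies = [sum(s * s for s in f) for f in frames]               # E(n)
    peak = max(energies)
    result = []
    for frame, energy in zip(frames, energies):
        exponent = 1.0 - energy / peak                               # e(n)
        weighted = [x / (max(abs(x), eps) ** exponent) for x in dft(frame)]  # X_g
        power = [abs(x * x) for x in weighted]                       # S(n, l)
        total = sum(power)                                           # Y(n)
        probs = [s / total for s in power]                           # P(n, l)
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)    # H(n)
        result.append(1.0 / entropy)
    return result

def detect(samples, frame_len, lead_frames):
    """Step 12 plus the decision: True marks a frame judged to be speech."""
    recips = reciprocal_entropies(split_into_frames(samples, frame_len))
    threshold = sum(recips[:lead_frames]) / lead_frames              # K
    return [j > threshold for j in recips]

# Synthetic signal: 2 noise frames, 2 tone ("speech") frames, 2 quieter noise frames.
M = 8
loud_noise = [0.2 * 0.5 ** m for m in range(M)]     # low-pass shaped, concentrated
quiet_noise = [0.02 * 0.5 ** m for m in range(M)]
tone = [math.cos(2 * math.pi * 2 * m / M) for m in range(M)]
signal = loud_noise * 2 + tone * 2 + quiet_noise * 2
flags = detect(signal, frame_len=M, lead_frames=2)
```

In this toy signal only the two tone frames are flagged as speech. The example uses 2 leading frames rather than the 20 of step 12 purely to keep the signal short.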
Compared with the prior art, the voice endpoint detection method provided by the embodiment of the invention has the following beneficial effects:
(1) For noise types whose power spectrum is relatively concentrated, the weighting factor constructed from the energy calculation is applied as a spectral weight to the spectrum of each frame of the primary signal to obtain the secondary signal. This whitens the noise spectrum to a certain degree, making the noise power spectrum flatter and more uniform, which increases the short-time spectral entropy of the noise so that its reciprocal takes a smaller value. At the same time the speech power spectrum is preserved: the short-time spectral entropy of speech stays small and its reciprocal stays large. Speech and noise can therefore be distinguished, which improves the accuracy of voice endpoint detection.
(2) By merging the energy-based endpoint detection method into the spectral entropy method and applying the energy to the spectral whitening as an exponent, the degree of whitening can be controlled, so that accurate endpoint detection is possible under noise types whose power spectrum is relatively concentrated, effectively improving the accuracy of spectral-entropy voice endpoint detection.
(3) Spectral whitening is applied to the spectrum of the noise portion of the signal to a certain degree, making the noise power spectrum flatter and more uniform and thereby increasing its spectral entropy. The speech power spectrum is preserved and its spectral entropy remains small, so the spectral entropies of speech and noise can be told apart, improving detection accuracy under a variety of noises.
(4) The energy-based endpoint detection method, which has the advantage of being insensitive to noise type, is incorporated into the spectral entropy method: the energy is applied as an exponent in the spectral whitening to control its degree. Combining spectral weighting with this exponential energy weighting allows reasonably accurate endpoint detection under various noise types, improving detection accuracy under a variety of noises.
Second embodiment of the invention:
The second embodiment of the invention provides a voice endpoint detection device, comprising:
a preprocessing module, configured to filter and frame the received speech signal to obtain a primary signal;
a first calculation module, configured to calculate the energy and the spectrum of each frame of the primary signal;
a spectral weighting module, configured to construct a weighting factor from the energy and to apply spectral weighting to the spectrum with the weighting factor to obtain a secondary signal;
a second calculation module, configured to calculate the power spectrum and the total spectral energy of each frame of the secondary signal;
a third calculation module, configured to calculate the short-time spectral entropy of each frame of the secondary signal from the power spectrum and the total spectral energy;
a judgment module, configured to use the average of the reciprocals of the short-time spectral entropies of several frames as the endpoint detection threshold and to classify frames as speech frames or noise frames.
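One way to picture the module structure is as six stages chained in the order listed. The sketch below is only a structural illustration: the class name, field names and the placeholder stages that record the call order are all hypothetical, and real stages would carry the calculations described in the first embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EndpointDetectionDevice:
    """Chains six stages in the order the modules are listed (names are hypothetical)."""
    preprocessing: Callable[[Any], Any]        # filter and frame -> primary signal
    first_calculation: Callable[[Any], Any]    # energy and spectrum per frame
    spectral_weighting: Callable[[Any], Any]   # weighting factor -> secondary signal
    second_calculation: Callable[[Any], Any]   # power spectrum and its sum
    third_calculation: Callable[[Any], Any]    # short-time spectral entropy
    judgment: Callable[[Any], Any]             # threshold and speech/noise decision

    def run(self, speech_signal):
        data = speech_signal
        for stage in (self.preprocessing, self.first_calculation,
                      self.spectral_weighting, self.second_calculation,
                      self.third_calculation, self.judgment):
            data = stage(data)
        return data

# Placeholder stages that only record the call order and pass data through.
trace = []

def make_stage(name):
    def stage(data):
        trace.append(name)
        return data
    return stage

names = ["preprocessing", "first_calculation", "spectral_weighting",
         "second_calculation", "third_calculation", "judgment"]
device = EndpointDetectionDevice(*(make_stage(n) for n in names))
result = device.run([0.0, 0.1, -0.1])
```

Keeping each module behind a plain callable makes it straightforward to swap one stage, for example a different filter in preprocessing, without touching the others.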
In embodiments of the present invention, the judgment module, is also used to:
The detection threshold value is compared with the short-time spectrum entropy of secondary singal described in every frame;
When the short-time spectrum entropy is greater than the detection threshold value, then determine that the signal frame for corresponding to the short-time spectrum entropy is Speech frame;
When the short-time spectrum entropy is less than or equal to the detection threshold value, the signal of the corresponding short-time spectrum entropy is determined Frame is noise frame.
First computing module, is also used to:
End-point detecting method based on energy calculates the ENERGY E (n) of a signal described in every frame;
The frequency spectrum X (n, l) of a signal described in every frame is calculated using Fourier transformation;
Wherein,N=1,2,3 ..., N, a signal be x (n, m), n=1,2,3 ..., N, m=1,2,3 ..., M, N are frame number, and M is frame length;
X (n, l)=fft (x (n, m)), fft are Fast Fourier Transform (FFT), and l is frequency.
The spectrum weighting block, is also used to:
The ENERGY E (n) of a signal described in every frame is normalized, and constructs weighted factor e (n);
Spectrum weighting is carried out using the weighted factor e (n) frequency spectrum X (n, l) of a signal described in every frame, obtains every frame The secondary singal Xg(n,l);
Wherein, e (n) is weighted factor, e (n)=1-Eg(n), Eg(n)=E (n)/max (E (n));
Xg(n, l)=X (n, l)/| X (n, l) |e(n)
Second computing module, is also used to:
Calculate the power spectrum modulus value S (n, l) and spectrum energy summation Y (n) of secondary singal described in every frame;
Wherein, S (n, l)=| Xg(n,l).*Xg(n, l) |,L is the length of Fourier transformation.
The third computing module is further configured to:
Calculate the spectral probability density function P(n, l) of each frame of the secondary signal from the power spectrum modulus S(n, l) and the spectral energy sum Y(n);
Calculate the short-time spectral entropy value H(n) of each frame of the secondary signal from its spectral probability density function P(n, l);
Wherein P(n, l) = S(n, l) / Y(n).
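The following sketch shows one way to go from the power spectrum to an entropy value. The expression for H(n) is not legible in the text; the usual Shannon form, H(n) = -sum over l of P(n, l) * log P(n, l), is assumed here, with the natural logarithm and with zero-probability bins skipped. The names `spectral_pdf` and `spectral_entropy` are hypothetical.

```python
import math


def spectral_pdf(power):
    # P(n, l) = S(n, l) / Y(n), with Y(n) assumed to be the sum of S(n, l) over l.
    total = sum(power)
    return [value / total for value in power]


def spectral_entropy(pdf):
    # Assumed Shannon form; bins with zero probability contribute nothing.
    return -sum(p * math.log(p) for p in pdf if p > 0)


# A flat spectrum and a single-peak spectrum, both hypothetical.
P_flat = spectral_pdf([1.0, 1.0, 1.0, 1.0])
P_peak = spectral_pdf([4.0, 0.0, 0.0, 0.0])
H_flat = spectral_entropy(P_flat)
H_peak = spectral_entropy(P_peak)
```

Under this assumed form, a flat spectrum gives the largest entropy for a given number of bins and a single-peak spectrum gives zero.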
The judgment module is further configured to:
Use the average of the reciprocals of the spectral entropy values of the first Z consecutive frames, out of the N frames of spectral entropy values, as the detection threshold K of the voice endpoint;
Wherein Z << N and J(n) = 1/H(n).
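The threshold described above can be sketched as follows. The expression for K is not legible in the text, so the sketch follows the prose: K is taken as the arithmetic mean of J(n) = 1/H(n) over the first Z frames. The function name `detection_threshold` and the sample entropy values are hypothetical, and every H(n) used is assumed to be strictly positive.

```python
def detection_threshold(entropies, z):
    # J(n) = 1 / H(n); K is taken as the mean of J(n) over the first z frames.
    reciprocals = [1.0 / value for value in entropies[:z]]
    return sum(reciprocals) / z


# Hypothetical spectral entropy values H(n) for N = 5 frames, with Z = 3.
H = [2.0, 2.0, 4.0, 0.5, 0.4]
K = detection_threshold(H, 3)
J = [1.0 / value for value in H]
```

In this made-up example the last two frames have reciprocals above K, so they would be the ones labeled as speech if J(n) is the quantity compared against the threshold.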
Third embodiment of the invention:
The third embodiment of the present invention further provides voice endpoint detection equipment, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, such as an object fixed routine. When executing the computer program, the processor implements the steps of the voice endpoint detection method described above, such as step S1 shown in FIG. 1. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the device embodiments described above, such as the analysis and assessment module.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program in the voice endpoint detection equipment.
The voice endpoint detection equipment may be a computing device such as a desktop computer, a notebook, a palmtop computer or a smart tablet. The voice endpoint detection equipment may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above components are merely an example of the voice endpoint detection equipment and do not limit it; the equipment may include more or fewer components than those listed, combine certain components, or use different components. For example, the voice endpoint detection equipment may further include input/output devices, network access devices, buses and the like.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the voice endpoint detection equipment and connects the various parts of the entire equipment through various interfaces and lines.
The memory may be used to store the computer program and/or modules. The processor implements the various functions of the voice endpoint detection equipment by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created according to the use of a mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the voice endpoint detection equipment are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.

Claims (9)

1. A voice endpoint detection method, characterized by comprising the following steps:
Filtering and framing the received voice signal to obtain a primary signal;
Calculating the energy and the frequency spectrum of each frame of the primary signal;
Constructing a weighting factor from the energy, and applying spectral weighting to the frequency spectrum using the weighting factor to obtain a secondary signal;
Calculating the power spectrum and the spectral energy sum of each frame of the secondary signal;
Calculating the short-time spectral entropy value of each frame of the secondary signal from the power spectrum and the spectral energy sum;
Using the average of the reciprocals of the short-time spectral entropy values of several frames as the detection threshold of the voice endpoint, and judging speech frames and noise frames.
2. The voice endpoint detection method according to claim 1, characterized in that using the average of the reciprocals of the short-time spectral entropy values of several frames as the detection threshold of the voice endpoint and judging speech frames and noise frames specifically comprises:
Comparing the detection threshold with the short-time spectral entropy value of each frame of the secondary signal;
When the short-time spectral entropy value is greater than the detection threshold, determining that the signal frame corresponding to that value is a speech frame;
When the short-time spectral entropy value is less than or equal to the detection threshold, determining that the signal frame corresponding to that value is a noise frame.
3. The voice endpoint detection method according to claim 1, characterized in that calculating the energy and the frequency spectrum of each frame of the primary signal specifically comprises:
Calculating the energy E(n) of each frame of the primary signal, using the energy-based endpoint detection method;
Calculating the frequency spectrum X(n, l) of each frame of the primary signal using the Fourier transform;
Wherein n = 1, 2, 3, ..., N; the primary signal is x(n, m), m = 1, 2, 3, ..., M; N is the number of frames and M is the frame length;
X(n, l) = fft(x(n, m)), where fft is the fast Fourier transform and l is the frequency.
4. The voice endpoint detection method according to claim 3, characterized in that constructing a weighting factor from the energy and applying spectral weighting to the frequency spectrum using the weighting factor to obtain a secondary signal specifically comprises:
Normalizing the energy E(n) of each frame of the primary signal and constructing the weighting factor e(n);
Applying spectral weighting to the frequency spectrum X(n, l) of each frame of the primary signal using the weighting factor e(n), obtaining each frame of the secondary signal Xg(n, l);
Wherein e(n) is the weighting factor, e(n) = 1 - Eg(n), Eg(n) = E(n) / max(E(n));
Xg(n, l) = X(n, l) / |X(n, l)|^e(n).
5. The voice endpoint detection method according to claim 4, characterized in that calculating the power spectrum and the spectral energy sum of each frame of the secondary signal specifically comprises:
Calculating the power spectrum modulus S(n, l) and the spectral energy sum Y(n) of each frame of the secondary signal;
Wherein S(n, l) = |Xg(n, l) .* Xg(n, l)|, and L is the length of the Fourier transform.
6. The voice endpoint detection method according to claim 5, characterized in that calculating the short-time spectral entropy value of each frame of the secondary signal from the power spectrum and the spectral energy sum specifically comprises:
Calculating the spectral probability density function P(n, l) of each frame of the secondary signal from the power spectrum modulus S(n, l) and the spectral energy sum Y(n);
Calculating the short-time spectral entropy value H(n) of each frame of the secondary signal from its spectral probability density function P(n, l);
Wherein P(n, l) = S(n, l) / Y(n).
7. The voice endpoint detection method according to claim 6, characterized in that using the average of the reciprocals of the short-time spectral entropy values of several frames as the detection threshold of the voice endpoint and judging speech frames and noise frames specifically comprises:
Using the average of the reciprocals of the spectral entropy values of the first Z consecutive frames, out of the N frames of spectral entropy values, as the detection threshold K of the voice endpoint;
Wherein Z << N and J(n) = 1/H(n).
8. A voice endpoint detection device, characterized by comprising:
A preprocessing module, configured to filter and frame the received voice signal to obtain a primary signal;
A first computing module, configured to calculate the energy and the frequency spectrum of each frame of the primary signal;
A spectral weighting module, configured to construct a weighting factor from the energy and to apply spectral weighting to the frequency spectrum using the weighting factor to obtain a secondary signal;
A second computing module, configured to calculate the power spectrum and the spectral energy sum of each frame of the secondary signal;
A third computing module, configured to calculate the short-time spectral entropy value of each frame of the secondary signal from the power spectrum and the spectral energy sum;
A judgment module, configured to use the average of the reciprocals of the short-time spectral entropy values of several frames as the detection threshold of the voice endpoint and to judge speech frames and noise frames.
9. Voice endpoint detection equipment, characterized by comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the voice endpoint detection method according to any one of claims 1 to 7.
CN201910311947.7A 2019-04-16 2019-04-16 Voice endpoint detection method, device and equipment Active CN110047519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311947.7A CN110047519B (en) 2019-04-16 2019-04-16 Voice endpoint detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910311947.7A CN110047519B (en) 2019-04-16 2019-04-16 Voice endpoint detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN110047519A true CN110047519A (en) 2019-07-23
CN110047519B CN110047519B (en) 2021-08-24

Family

ID=67277750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311947.7A Active CN110047519B (en) 2019-04-16 2019-04-16 Voice endpoint detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN110047519B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1503467A (en) * 2002-11-25 2004-06-09 Noise matching for echo cancellers
CN1689072A (en) * 2002-08-16 2005-10-26 数字信号处理工厂有限公司 Method and system for processing subband signals using adaptive filters
KR100930061B1 (en) * 2008-01-22 2009-12-08 성균관대학교산학협력단 Signal detection method and apparatus
CN101599269A (en) * 2009-07-02 2009-12-09 中国农业大学 Sound end detecting method and device
CN101777349A (en) * 2009-12-08 2010-07-14 中国科学院自动化研究所 Auditory perception property-based signal subspace microphone array voice enhancement method
CN102044243A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method and device for voice activity detection (VAD) and encoder
US20120232890A1 (en) * 2011-03-11 2012-09-13 Kabushiki Kaisha Toshiba Apparatus and method for discriminating speech, and computer readable medium
US20130267796A1 (en) * 2010-12-01 2013-10-10 Universitat Politecnica De Catalunya System and method for the simultaneous, non-invasive estimation of blood glucose, glucocorticoid level and blood pressure
CN103426440A (en) * 2013-08-22 2013-12-04 厦门大学 Voice endpoint detection device and voice endpoint detection method utilizing energy spectrum entropy spatial information
US9123351B2 (en) * 2011-03-31 2015-09-01 Oki Electric Industry Co., Ltd. Speech segment determination device, and storage medium
CN106536011A (en) * 2014-05-15 2017-03-22 布莱阿姆青年大学 Low-power miniature LED-based UV absorption detector with low detection limits for capillary liquid chromatography
WO2018069719A1 (en) * 2016-10-16 2018-04-19 Sentimoto Limited Voice activity detection method and apparatus
EP3443557A1 (en) * 2016-04-12 2019-02-20 Fraunhofer Gesellschaft zur Förderung der Angewand Audio encoder for encoding an audio signal, method for encoding an audio signal and computer program under consideration of a detected peak spectral region in an upper frequency band

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHAITANYA K, SINHA R.: "Energy and Entropy based Switching Algorithm for Speech Endpoint Detection in Varying SNR Conditions", 《NINTH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION》 *
RENEVEY P, DRYGAJLO A.: "Entropy based voice activity detection in very noisy conditions", 《SEVENTH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY》 *
VLAJ D, KAČIČ Z, KOS M.: "Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria", 《COMPUTERS & ELECTRICAL ENGINEERING》 *
WU B F, WANG K C: "Robust endpoint detection algorithm based on the adaptive band-partitioning spectral entropy in adverse environments", 《IEEE TRANSACTIONS ON SPEECH & AUDIO PROCESSING》 *
徐望: "连续语音识别的稳健性技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
梁龙腾: "基于传声器阵列的声源定位算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
王博,郭英,韩立峰: "基于熵函数的语音端点检测算法研究", 《信号处理》 *
郑秋菊,李强,王岑: "噪声估计和谱熵结合的语音激活检测算法", 《现代电信科技》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110648692A (en) * 2019-09-26 2020-01-03 苏州思必驰信息科技有限公司 Voice endpoint detection method and system
CN110648692B (en) * 2019-09-26 2022-04-12 思必驰科技股份有限公司 Voice endpoint detection method and system
CN110995821A (en) * 2019-11-28 2020-04-10 深圳供电局有限公司 Power distribution network inspection system based on AI and intelligent helmet
CN111540368A (en) * 2020-05-07 2020-08-14 广州大学 Stable bird sound extraction method and device and computer readable storage medium
CN111540368B (en) * 2020-05-07 2023-03-14 广州大学 Stable bird sound extraction method and device and computer readable storage medium
CN111650559A (en) * 2020-06-12 2020-09-11 深圳市裂石影音科技有限公司 Real-time processing two-dimensional sound source positioning method
CN112612008A (en) * 2020-12-08 2021-04-06 中国人民解放军陆军工程大学 Method and device for extracting initial parameters of echo signals of high-speed projectile
CN116665717A (en) * 2023-08-02 2023-08-29 广东技术师范大学 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system
CN116665717B (en) * 2023-08-02 2023-09-29 广东技术师范大学 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system

Also Published As

Publication number Publication date
CN110047519B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN110047519A (en) A kind of sound end detecting method, device and equipment
CN108229555B (en) Sample weights distribution method, model training method, electronic equipment and storage medium
CN105702263B (en) Speech playback detection method and device
CN106202329B (en) Sample data processing, data identification method and device, computer equipment
US6173074B1 (en) Acoustic signature recognition and identification
CN110956966B (en) Voiceprint authentication method, voiceprint authentication device, voiceprint authentication medium and electronic equipment
CN104637489B (en) The method and apparatus of sound signal processing
CN109584884A (en) A kind of speech identity feature extractor, classifier training method and relevant device
CN110261816B (en) Method and device for estimating direction of arrival of voice
CN105590629B (en) A kind of method and device of speech processes
CN115775564A (en) Audio processing method and device, storage medium and intelligent glasses
CN112420079B (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN111144347B (en) Data processing method, device, platform and storage medium
CN113191787A (en) Telecommunication data processing method, device electronic equipment and storage medium
CN108847251A (en) A kind of voice De-weight method, device, server and storage medium
CN110930987B (en) Audio processing method, device and storage medium
CN110275138A (en) A kind of more sound localization methods removed using advantage sound source ingredient
Bui et al. A non-linear GMM KL and GUMI kernel for SVM using GMM-UBM supervector in home acoustic event classification
CN113223552A (en) Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CN113128660A (en) Deep learning model compression method and related equipment
CN110534128A (en) A kind of noise processing method, device, equipment and storage medium
CN115620748B (en) Comprehensive training method and device for speech synthesis and false identification evaluation
CN105989838B (en) Audio recognition method and device
CN115905642B (en) Method, system, terminal and storage medium for enhancing speaking emotion
CN111883183B (en) Voice signal screening method, device, audio equipment and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant