CN116665717B - Cross-subband spectral entropy weighted likelihood ratio voice detection method and system - Google Patents

Publication number
CN116665717B
Authority
CN
China
Prior art keywords
sub
frequency
band
subband
likelihood ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310963463.7A
Other languages
Chinese (zh)
Other versions
CN116665717A (en)
Inventor
何伟俊
符志定
廖学远
何宇欣
林沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202310963463.7A
Publication of CN116665717A
Application granted
Publication of CN116665717B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The application discloses a cross-subband spectral entropy weighted likelihood ratio voice detection method and system. First, the frequency domain is divided into non-uniform, partially overlapping sub-bands, and the spectral entropy of each sub-band is extracted. Next, the likelihood ratio weight of each sub-band is set according to its spectral entropy and the ratio of its energy spectrum to the average sub-band energy spectrum of non-speech frames. Finally, the weighted likelihood ratio is compared with a preset threshold to decide whether a given frame is a speech frame. Because the spectral entropy of speech is robust against background noise, the method uses sub-band spectral entropy to set the weights in the likelihood ratio test and uses the weighted likelihood ratio as one basis for the detection decision. This improves the detection accuracy of likelihood-ratio-test voice detection at low signal-to-noise ratios, and the method is suited to speech signal processing tasks such as speech recognition and speaker recognition.

Description

A cross-subband spectral entropy weighted likelihood ratio speech detection method and system

Technical field

The present invention relates to the technical field of speech detection, and more specifically to a cross-subband spectral entropy weighted likelihood ratio speech detection method and system.

Background

Voice activity detection (VAD) aims to distinguish speech from non-speech in a signal, and it arises in most speech signal processing systems. In a speech coding system, VAD determines whether the current signal contains speech so that different bit allocations or different codecs can be applied, lowering the coding rate without degrading the quality of the synthesized speech. In speech recognition or speaker recognition systems, accurate VAD decisions improve the recognition rate and save processing time. Traditional VAD methods rely on speech features such as short-time energy, zero-crossing rate, spectral entropy, LPC parameters, cepstral features, and higher-order statistics. They perform satisfactorily at high signal-to-noise ratios, but their detection performance drops sharply as the signal-to-noise ratio decreases.

To address VAD at low signal-to-noise ratios, VAD algorithms based on the likelihood ratio test have been proposed. These methods use a Gaussian statistical model to model the Fourier transform coefficients of the signal under two hypotheses, speech and non-speech, and use the likelihood ratio test to evaluate how well each statistical model fits the current observation, from which the VAD decision is made. On the one hand, the spectral entropy of speech is fairly robust: as the signal-to-noise ratio decreases, the shape of the spectral entropy of the speech signal stays roughly unchanged. On the other hand, the spectral entropy is independent of signal amplitude and depends only on the randomness (that is, the distribution) of the signal. The flatter the spectrum, the larger the spectral entropy, so the spectral entropy of speech is usually smaller than that of noise. Over the same time period, the spectral entropy of different frequency bands differs in its ability to indicate the presence of speech, so the spectral entropy of different sub-bands can serve as an auxiliary feature for the likelihood ratio decision. This invention proposes a cross-subband spectral entropy weighted likelihood ratio voice activity detection method. The method divides the spectrum into sub-bands non-contiguously, computes the spectral entropy of each sub-band, sets the likelihood ratio weights of the frequency components in each sub-band according to the sub-band spectral entropy, and uses the weighted likelihood ratio as the basis for the speech detection decision.
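The two properties of spectral entropy cited above (independence from amplitude, and larger values for flatter spectra) can be checked numerically. The following is a minimal sketch, assuming the usual Shannon entropy with a natural logarithm over a power spectrum normalized to sum to 1; the function name and the toy spectra are illustrative and do not come from the patent.

```python
import math

def spectral_entropy(power):
    """Shannon entropy (natural log) of a power spectrum normalized to sum to 1."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs)

# Toy spectra, invented for illustration only
flat = [1.0] * 8                                      # noise-like, flat spectrum
peaky = [8.0, 0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.02]   # speech-like, concentrated energy
scaled = [100.0 * p for p in peaky]                   # same shape, 100 times the power

h_flat = spectral_entropy(flat)
h_peaky = spectral_entropy(peaky)
h_scaled = spectral_entropy(scaled)
```

Scaling the spectrum leaves the normalized distribution unchanged, so `h_scaled` matches `h_peaky` up to rounding, while the flat spectrum reaches the maximum entropy for 8 bins.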

Summary of the invention

To solve the above technical problems, the present invention proposes a cross-subband spectral entropy weighted likelihood ratio speech detection method and system.

A first aspect of the present invention provides a cross-subband spectral entropy weighted likelihood ratio speech detection method, including:

Step S01: Given the sampled signal to be detected, a likelihood ratio decision threshold, a time-region validity decision threshold, and a Fourier transform length, apply windowing and framing to the signal and compute the likelihood ratio test value of each frequency bin of the current frame;

Step S02: Divide the frequency domain into sub-bands that are non-uniform and partially overlapping;

Step S03: Using the upper and lower frequency limits of the sub-bands from step S02, compute the energy spectrum of each sub-band of the current frame, and compute the spectral entropy of each sub-band of the current frame;

Step S04: Compute the average energy spectrum of each sub-band over all non-speech frames;

Step S05: Set the likelihood ratio weight of each sub-band according to its spectral entropy and the ratio of its energy spectrum to its average energy spectrum over the non-speech frames;

Step S06: Compute the weighted sum of the likelihood ratio test values using these weights, take the average, and decide whether the current frame is speech by comparing the result with the likelihood ratio threshold.
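Steps S01 and S06 can be sketched as follows. The text here does not reproduce the patent's own formulas, so two things are assumptions: the per-bin likelihood ratio uses the form commonly associated with complex Gaussian models in statistical VAD (a posteriori SNR `gamma`, a priori SNR `xi`), and the "average" in step S06 is taken over all frequency bins. The function names are hypothetical.

```python
import math

def likelihood_ratio(gamma, xi):
    """Per-bin likelihood ratio under a complex Gaussian model (assumed form).

    gamma: a posteriori SNR of the bin; xi: a priori SNR of the bin.
    Both are assumed to be estimated elsewhere.
    """
    return math.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)

def is_speech(lrs, weights, threshold):
    """Weighted sum of per-bin likelihood ratios, averaged over all bins,
    then compared with the decision threshold (averaging convention assumed)."""
    score = sum(w * lr for w, lr in zip(weights, lrs)) / len(lrs)
    return score > threshold

# Toy values: bins 0 and 2 are weighted in, bins 1 and 3 are weighted out
decision = is_speech([3.0, 0.5, 4.0, 0.2], [1, 0, 1, 0], threshold=0.6)
```

With 0/1 weights, bins in sub-bands that fail the entropy or energy test contribute nothing to the score, which is the effect the weighting is meant to achieve.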

In this solution, step S02 is specifically:

The frequency range covered by the sub-band division is [formula missing], where the sampling rate of the signal is 8000 Hz or 16000 Hz. The whole range is split into a low-frequency band and a high-frequency band; the low-frequency band is set to [formula missing] Hz and the high-frequency band to [formula missing] Hz.

The number of sub-bands in the low-frequency band and in the high-frequency band is determined by the sampling rate, each band is divided into that number of sub-bands, and adjacent sub-bands are then made to partially overlap.

The number of sub-bands is determined from the sampling rate as follows:

When the sampling rate is 8000 Hz, the range is divided into 5 sub-bands: the low-frequency band is divided evenly into 2 sub-bands, each of width [formula missing], and the high-frequency band is divided evenly into 3 sub-bands, each of width [formula missing];

When the sampling rate is 16000 Hz, the range is divided into 10 sub-bands: the low-frequency band is divided evenly into 4 sub-bands, each of width [formula missing], and the high-frequency band is divided evenly into 6 sub-bands, each of width [formula missing].

The sub-bands obtained from this division do not overlap, and the boundary frequencies of each sub-band are taken as its upper and lower frequency limits. The frequency bins corresponding to the upper and lower frequency limits of each sub-band are then computed from the Fourier transform length and these limits.
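The non-overlapping division can be sketched as below. The band ranges are not recoverable from this text, so the ranges used here (0 to 1000 Hz and 1000 to 4000 Hz) are purely hypothetical placeholders, and the frequency-to-bin mapping (nearest bin, rounding half up) is likewise an assumption.

```python
def uniform_edges(f_low, f_high, count):
    """Split [f_low, f_high] into `count` equal-width sub-bands as (lower, upper) pairs."""
    width = (f_high - f_low) / count
    return [(f_low + i * width, f_low + (i + 1) * width) for i in range(count)]

def split_subbands(low_band, high_band, n_low, n_high):
    """Divide the low and high bands separately, giving a non-uniform overall division."""
    return uniform_edges(*low_band, n_low) + uniform_edges(*high_band, n_high)

def freq_to_bin(freq_hz, fft_len, sample_rate):
    """Map a frequency to the nearest DFT bin index (rounding rule is an assumption)."""
    return int(freq_hz * fft_len / sample_rate + 0.5)

# Hypothetical band ranges for an 8000 Hz signal: 2 low sub-bands, 3 high sub-bands
bands = split_subbands((0.0, 1000.0), (1000.0, 4000.0), n_low=2, n_high=3)
bins = [(freq_to_bin(lo, 360, 8000), freq_to_bin(hi, 360, 8000)) for lo, hi in bands]
```

Because the low and high bands are divided independently, the sub-band widths differ between the two bands, which is what makes the overall division non-uniform.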

In this solution, adjacent sub-bands are made to partially overlap as follows:

Given the total number of sub-bands and a frequency shift amount, each of the leading sub-bands [count missing] is shifted backward so that it partially overlaps the next sub-band, which yields the shifted upper frequency limit of that sub-band [formula missing];

The remaining sub-band [index missing] is shifted forward so that it partially overlaps the previous sub-band, which yields the shifted lower frequency limit of that sub-band [formula missing].

The frequency bins corresponding to the upper and lower frequency limits of each sub-band are then computed from the shifted limits.
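One plausible reading of the overlap rule is sketched here. The exact conditions and formulas are missing from this text, so the following is an assumption: every sub-band except the last raises its upper limit by the shift amount, and the last sub-band lowers its lower limit by the same amount. `add_overlap` and `shift_hz` are hypothetical names.

```python
def add_overlap(bands, shift_hz):
    """Make adjacent sub-bands partially overlap.

    bands: list of (lower, upper) limits of non-overlapping sub-bands, in Hz.
    Assumed rule: all but the last sub-band extend their upper limit by shift_hz;
    the last sub-band extends its lower limit downward by shift_hz.
    """
    last = len(bands) - 1
    shifted = []
    for i, (lo, hi) in enumerate(bands):
        if i < last:
            shifted.append((lo, hi + shift_hz))   # overlap with the next sub-band
        else:
            shifted.append((lo - shift_hz, hi))   # overlap with the previous sub-band
    return shifted

overlapped = add_overlap([(0, 500), (500, 1000), (1000, 2000)], shift_hz=100)
```

Under this reading no limit ever moves outside the overall frequency range, since the first lower limit and the last upper limit are left untouched.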

In this solution, step S03 is specifically:

Using the upper and lower frequency limits of each sub-band, compute the energy spectrum of each sub-band of the current frame, defined as the sum of the energy spectra of all frequency bins from the sub-band's lower bin to its upper bin [formula missing];

Compute the normalized probability density function of each frequency bin within each sub-band of the current frame [formula missing];

Compute the spectral entropy of each sub-band of the current frame from the normalized probability density functions of the frequency bins in that sub-band [formula missing].
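The three computations in step S03 can be sketched as follows. The patent's formulas are not reproduced in this text, so this assumes the conventional definitions: each bin's probability is its energy divided by the sub-band energy, and the entropy is the Shannon entropy with a natural logarithm. Treating both boundary bins as part of the sub-band is also an assumption.

```python
import math

def subband_energy(power, k_lo, k_hi):
    """Sum of per-bin energies from bin k_lo to bin k_hi, both inclusive."""
    return sum(power[k_lo:k_hi + 1])

def subband_entropy(power, k_lo, k_hi):
    """Spectral entropy of one sub-band of one frame (natural log assumed)."""
    energy = subband_energy(power, k_lo, k_hi)
    if energy <= 0.0:
        return 0.0  # empty sub-band: entropy defined as 0 here by convention
    entropy = 0.0
    for k in range(k_lo, k_hi + 1):
        p = power[k] / energy          # normalized probability of bin k
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

# Toy frame: a flat sub-band (bins 0-3) and a single-peak sub-band (bins 4-7)
frame_power = [1.0, 1.0, 1.0, 1.0, 10.0, 0.0, 0.0, 0.0]
h_flat_band = subband_entropy(frame_power, 0, 3)
h_peak_band = subband_entropy(frame_power, 4, 7)
```

The flat sub-band reaches the maximum entropy for four bins, while the sub-band with all its energy in one bin has zero entropy, even though it carries more energy.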

In this solution, step S04 is specifically:

Compute the energy spectrum of each sub-band for each non-speech frame [formula missing];

Compute the average energy spectrum of each sub-band over all non-speech frames, that is, the sum of these sub-band energy spectra divided by the total number of non-speech frames [formula missing].
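Step S04 amounts to a per-sub-band mean over the frames currently labeled non-speech. A minimal sketch, assuming each frame is given as a list of per-bin energies and that sub-band limits are inclusive bin indices (both are assumptions of this example):

```python
def mean_noise_energy(noise_frames, k_lo, k_hi):
    """Average sub-band energy over non-speech frames.

    noise_frames: list of per-frame power spectra for frames labeled non-speech.
    k_lo, k_hi: inclusive bin limits of the sub-band.
    """
    energies = [sum(frame[k_lo:k_hi + 1]) for frame in noise_frames]
    return sum(energies) / len(energies)

# Two toy non-speech frames; the sub-band covers bins 0 and 1
avg = mean_noise_energy([[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]], 0, 1)
```

This average serves as the noise floor against which the current frame's sub-band energy is compared in step S05.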

In this solution, step S05 is specifically:

A spectral entropy threshold is preset for each sub-band. If the ratio of a sub-band's energy spectrum in the current frame to the average energy spectrum of that sub-band over the non-speech frames exceeds a preset threshold [formula missing], and the spectral entropy of the sub-band is below its preset threshold [formula missing], the likelihood ratio weights of all frequency bins in that sub-band are set to 1;

Otherwise, the likelihood ratio weights of all frequency bins in that sub-band are set to 0;

The likelihood ratio weight is specifically:

[formula missing];

Here the weight is defined per frequency bin, and each frequency bin belongs to one sub-band. Note that the sub-band in this formula refers to the non-overlapping sub-band delimited by its lower limit and upper limit.
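The weight rule can be sketched as a pair of helper functions. The threshold names are hypothetical, and the per-bin assignment uses half-open bin ranges `[k_lo, k_hi)` for the non-overlapping sub-bands so that no bin is claimed by two sub-bands; that convention is an assumption of this sketch, not something stated in the text.

```python
def subband_weight(energy, noise_energy, entropy, ratio_threshold, entropy_threshold):
    """Return 1 if the sub-band looks speech-like, else 0.

    Speech-like means: energy well above the non-speech average AND low spectral entropy.
    The ratio test is written as a product to avoid dividing by a zero noise energy.
    """
    ratio_ok = energy > ratio_threshold * noise_energy
    entropy_ok = entropy < entropy_threshold
    return 1 if (ratio_ok and entropy_ok) else 0

def bin_weights(n_bins, nonoverlap_bins, subband_weights):
    """Expand sub-band weights to per-bin weights.

    nonoverlap_bins: half-open (k_lo, k_hi) ranges of the non-overlapping sub-bands.
    """
    weights = [0] * n_bins
    for (k_lo, k_hi), w in zip(nonoverlap_bins, subband_weights):
        for k in range(k_lo, k_hi):
            weights[k] = w
    return weights

w0 = subband_weight(10.0, 2.0, 0.5, ratio_threshold=3.0, entropy_threshold=1.0)
w1 = subband_weight(10.0, 2.0, 1.5, ratio_threshold=3.0, entropy_threshold=1.0)
per_bin = bin_weights(6, [(0, 3), (3, 6)], [w0, w1])
```

The entropy and energy tests are evaluated on the overlapping sub-bands, while the resulting 0/1 weight is applied only to the bins of the corresponding non-overlapping sub-band, so each bin receives exactly one weight.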

In this solution, the non-speech frames are all frames detected as "non-speech" within the valid time region closest in time to the current frame, among the time range already processed;

Before the first valid time region appears, all frames in the first 0.5 s of the signal under detection are treated as non-speech frames.

In this solution, a valid time region is defined as follows:

A time region is a region formed by a certain number of signal frames. Given the total number of non-speech frames in the time region and the time-region validity decision threshold, the region is considered valid when the number of non-speech frames satisfies the threshold condition [formula missing]; otherwise, the region is considered invalid.
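Selecting the non-speech frames used for the noise estimate can be sketched as a backward search for the most recent valid region. The comparison operator in the validity test is not recoverable from this text, so "at least the threshold" is an assumption, as are the data layout and the function name.

```python
def latest_noise_frames(regions, min_noise_frames):
    """Return frame indices labeled non-speech in the most recent valid region.

    regions: list of (start_frame, labels) in time order, where labels[i] is True
             if frame start_frame + i was detected as speech.
    A region is treated as valid when it holds at least min_noise_frames
    non-speech frames (the exact comparison is an assumption).
    Returns None when no valid region exists yet, in which case the caller
    falls back to the first 0.5 s of the signal.
    """
    for start, labels in reversed(regions):
        noise = [start + i for i, speech in enumerate(labels) if not speech]
        if len(noise) >= min_noise_frames:
            return noise
    return None

# Two toy regions of three frames each
regions = [(0, [False, False, True]), (3, [True, True, False])]
```

With a threshold of 2 the search skips the newer region (one non-speech frame) and returns frames 0 and 1 from the older one; with a threshold of 1 it returns frame 5.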

A second aspect of the present invention provides a cross-subband spectral entropy weighted likelihood ratio speech detection system. The system includes a memory and a processor; the memory stores a program implementing the cross-subband spectral entropy weighted likelihood ratio speech detection method, and the processor executes the program to carry out the steps of the method.

The invention discloses a cross-subband spectral entropy weighted likelihood ratio speech detection method and system. First, the frequency domain is divided into non-uniform, partially overlapping sub-bands, and the spectral entropy of each sub-band is extracted. Next, the likelihood ratio weight of each sub-band is set according to its spectral entropy and the ratio of its energy spectrum to the average sub-band energy spectrum of non-speech frames. Finally, the weighted likelihood ratio is compared with a preset threshold to decide whether a given frame is a speech frame. Because the spectral entropy of speech is robust against background noise, the invention uses sub-band spectral entropy to set the weights in the likelihood ratio test and uses the weighted likelihood ratio as one basis for the detection decision. This improves the detection accuracy of likelihood-ratio-test speech detection at low signal-to-noise ratios, and the method is suited to speech signal processing tasks such as speech recognition and speaker recognition.

Description of the drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described here show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 shows a flow chart of the cross-subband spectral entropy weighted likelihood ratio speech detection method of the present invention;

Figure 2 shows a flow chart for computing the spectral entropy of each sub-band;

Figure 3 shows a flow chart of the method for setting the likelihood ratio weights of the sub-bands;

Figure 4 shows an example comparison of detection results between the method provided by the present invention and the traditional method;

Figure 5 shows a block diagram of the cross-subband spectral entropy weighted likelihood ratio speech detection system of the present invention.

Detailed description

So that the above objects, features, and advantages of the present invention can be understood more clearly, the invention is described in further detail below with reference to the drawings and specific embodiments. Provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another.

Many specific details are set out in the following description to support a full understanding of the invention. The invention can, however, also be implemented in ways other than those described here, so its scope of protection is not limited by the specific embodiments disclosed below.

Example 1

Figure 1 shows a flow chart of the cross-subband spectral entropy weighted likelihood ratio speech detection method of the present invention.

As shown in Figure 1, the first aspect of the present invention provides a cross-subband spectral entropy weighted likelihood ratio speech detection method, including:

Step S01: Given the sampled signal to be detected, a likelihood ratio decision threshold, a time-region validity decision threshold, and a Fourier transform length, apply windowing and framing to the signal and compute the likelihood ratio test value of each frequency bin of the current frame;

Step S02: Divide the frequency domain into sub-bands that are non-uniform and partially overlapping;

Step S03: Using the upper and lower frequency limits of the sub-bands from step S02, compute the energy spectrum of each sub-band of the current frame, and compute the spectral entropy of each sub-band of the current frame;

Step S04: Compute the average energy spectrum of each sub-band over all non-speech frames;

Step S05: Set the likelihood ratio weight of each sub-band according to its spectral entropy and the ratio of its energy spectrum to its average energy spectrum over the non-speech frames;

Step S06: Compute the weighted sum of the likelihood ratio test values using these weights, take the average, and decide whether the current frame is speech by comparing the result with the likelihood ratio threshold.

Step S02 is specifically as follows. The frequency range covered by the sub-band division is [formula missing], where the sampling rate of the signal is 8000 Hz or 16000 Hz. The whole range is split into a low-frequency band and a high-frequency band; the low-frequency band is set to [formula missing] Hz and the high-frequency band to [formula missing] Hz.

The number of sub-bands in the low-frequency band and in the high-frequency band is determined by the sampling rate, each band is divided into that number of sub-bands, and adjacent sub-bands are then made to partially overlap.

When the sampling rate is 8000 Hz, the range is divided into 5 sub-bands: the low-frequency band is divided evenly into 2 sub-bands, each of width [formula missing], and the high-frequency band is divided evenly into 3 sub-bands, each of width [formula missing]. When the sampling rate is 16000 Hz, the range is divided into 10 sub-bands: the low-frequency band is divided evenly into 4 sub-bands, each of width [formula missing], and the high-frequency band is divided evenly into 6 sub-bands, each of width [formula missing]. The sub-bands obtained from this division do not overlap, and the boundary frequencies of each sub-band are taken as its upper and lower frequency limits. The frequency bins corresponding to the upper and lower frequency limits of each sub-band are then computed from the Fourier transform length and these limits.

Adjacent sub-bands must further be made to partially overlap.

Given the total number of sub-bands and a frequency shift amount, each of the leading sub-bands [count missing] is shifted backward so that it partially overlaps the next sub-band, which yields the shifted upper frequency limit of that sub-band [formula missing];

The remaining sub-band [index missing] is shifted forward so that it partially overlaps the previous sub-band, which yields the shifted lower frequency limit of that sub-band [formula missing]. The frequency bins corresponding to the upper and lower frequency limits of each sub-band are then computed from the shifted limits.

As shown in Figure 2, step S03 is specifically:

S302: Using the upper and lower frequency limits of each sub-band, compute the energy spectrum of each sub-band of the current frame, defined as the sum of the energy spectra of all frequency bins from the sub-band's lower bin to its upper bin [formula missing];

S304: Compute the normalized probability density function of each frequency bin within each sub-band of the current frame [formula missing];

S306: Compute the spectral entropy of each sub-band of the current frame from the normalized probability density functions of the frequency bins in that sub-band [formula missing].

Step S04 is specifically:

Compute the energy spectrum of each sub-band for each non-speech frame [formula missing];

Compute the average energy spectrum of each sub-band over all non-speech frames, that is, the sum of these sub-band energy spectra divided by the total number of non-speech frames [formula missing].

As shown in Figure 3, step S05 is specifically:

S502: A spectral entropy threshold is preset for each sub-band. If the ratio of a sub-band's energy spectrum in the current frame to the average energy spectrum of that sub-band over the non-speech frames exceeds a preset threshold [formula missing], and the spectral entropy of the sub-band is below its preset threshold [formula missing], the likelihood ratio weights of all frequency bins in that sub-band are set to 1;

S504: Otherwise, the likelihood ratio weights of all frequency bins in that sub-band are set to 0;

The likelihood ratio weight is specifically:

[formula missing];

Here the weight is defined per frequency bin, and each frequency bin belongs to one sub-band. Note that the sub-band in this formula refers to the non-overlapping sub-band delimited by its lower limit and upper limit.

The non-speech frames are all frames detected as "non-speech" within the valid time region closest in time to the current frame, among the time range already processed. Before the first valid time region appears, all frames in the first 0.5 s of the signal under detection are treated as non-speech frames.

A time region is a region formed by a certain number of signal frames. Each time region contains a set number of frames, with its starting frame and ending frame determined by the region index [formula missing]. Given the total number of non-speech frames in the time region and the time-region validity decision threshold, the region is considered valid when the number of non-speech frames satisfies the threshold condition [formula missing]; otherwise, the region is considered invalid.

Example 2

In this example, low signal-to-noise ratio signal samples are produced, voice activity detection is performed on them with the provided method, and the results are compared with those of the traditional likelihood ratio voice activity detection method.

The steps of the cross-subband spectral entropy weighted likelihood ratio speech detection method provided by the invention are as follows:

步骤S01:给定待检测采样信号,给定似然比判决阈值/>,给定时间区域有效性判决门限/>,给定傅里叶变换长度/>,对信号/>进行加窗分帧预处理,计算第/>帧信号各频点谱线的似然比检验值/>;Step S01: Given the sampling signal to be detected , given the likelihood ratio decision threshold/> , the validity judgment threshold of a given time zone/> , given the Fourier transform length/> , to signal/> Perform windowing and framing preprocessing to calculate the Likelihood ratio test value of each frequency point spectral line of the frame signal/> ;

在汉语普通话自然口语对话语料库(CADCC)中选择多人对话样本,选择多人对话样本,总时长约为20分37.526秒,共含528个语音段,采样率为8000Hz。对样本进行人工标注以便进行检测准确率统计,标注语音帧(包含元音和辅音)与非语音帧,其中语音帧约占75.03%,非语音帧约占24.97%。采用NOISEX-92噪声数据库作为叠加噪声源,选取噪声样本包括高斯白噪声(平稳噪声)和嘈杂噪声(非平稳噪声);把语音与噪声合成信噪比0dB的低信噪比语音信号样本,将其作为待检测采样信号Multi-person dialogue samples are selected from the Mandarin Chinese Natural Spoken Dialogue Corpus (CADCC). The total duration is about 20 minutes and 37.526 seconds, containing a total of 528 speech segments, and the sampling rate is 8000Hz. The samples were manually labeled for detection accuracy statistics. Speech frames (including vowels and consonants) and non-speech frames were labeled. Speech frames accounted for approximately 75.03% and non-speech frames accounted for approximately 24.97%. The NOISEX-92 noise database is used as the superimposed noise source, and the noise samples are selected to include Gaussian white noise (stationary noise) and noisy noise (non-stationary noise); the speech and noise are synthesized into a low signal-to-noise ratio speech signal sample with a signal-to-noise ratio of 0dB, and As the sampling signal to be detected .
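The text does not say how the 0 dB mixtures were produced. One common way, shown here purely as an assumption, is to scale the noise so that the ratio of average speech power to average noise power matches the target SNR. The function name `mix_at_snr` is hypothetical, and both inputs are assumed to have the same length.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise) or not speech:
        raise ValueError("speech and noise must be non-empty and equally long")
    p_speech = sum(s * s for s in speech) / len(speech)  # average speech power
    p_noise = sum(v * v for v in noise) / len(noise)     # average noise power
    # Choose gain g so that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

At 0 dB the scaled noise carries the same average power as the speech, which is the condition used in this embodiment.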

When detecting the white-noise speech samples, the likelihood ratio decision threshold is set to 0.6; when detecting the babble-noise speech samples, it is set to 20. The time-region validity decision threshold is set to 30 frames. The signal is preprocessed by framing with a Hamming window, with the frame length set to 45 ms (360 samples at 8000 Hz, matching the Fourier transform length) and the frame shift set to 22.5 ms. The likelihood ratio test value of each frequency bin of each frame is then computed. The Fourier transform length is set to 360.
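The framing step can be sketched as below, assuming a 360-sample frame (equal to the Fourier transform length) and a 180-sample hop (22.5 ms at 8000 Hz). The helper name `frame_signal` is hypothetical, the window uses the usual Hamming coefficients 0.54 and 0.46, and trailing samples that do not fill a whole frame are dropped, which is a choice of this sketch.

```python
import math


def frame_signal(x, frame_len=360, hop=180):
    """Split x into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only complete frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + n] * window[n] for n in range(frame_len)])
    return frames
```

With these settings 0.5 s of signal (4000 samples) yields about 22 frame shifts, which agrees with the "about 22 frames" figure used later for the entropy thresholds.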

Step S02: Divide the frequency range into subbands; the subbands are non-uniform and partially overlapping.

(1) Divide the frequency range into subbands

Because the sampling rate is 8000 Hz, the subbands cover the range from 0 to 4000 Hz. The whole range is split into a low-frequency class band, set to 0 to 1000 Hz, and a high-frequency class band, set to 1000 to 4000 Hz.

At a sampling rate of 8000 Hz the range is divided into 5 subbands: the low-frequency class band is divided evenly into 2 subbands of 500 Hz each, and the high-frequency class band is divided evenly into 3 subbands of 1000 Hz each. The subbands obtained this way do not overlap, and their boundary frequencies are taken as the upper and lower frequency limits of each subband. The frequency bins corresponding to these limits are computed from the Fourier transform length (360) and the upper and lower frequency limits. Table 1 lists the upper and lower frequency limits of the non-overlapping subbands and the corresponding frequency bins.

Table 1. Upper and lower frequency limits of the non-overlapping subbands and the corresponding frequency bins

(Table 1 is an image in the original publication; its contents are not available in this text.)
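The band limits stated in the text for the 8000 Hz case can be turned into frequency-bin indices as sketched below. The limits in Hz come from the description; the mapping from frequency to bin (bin index = frequency × transform length / sampling rate, rounded down) is an assumption of this sketch, so the bin values may differ from those in the original Table 1. All function names are hypothetical.

```python
def nonoverlapping_subbands_8k():
    """Band limits in Hz for the 8000 Hz case described in the text."""
    low = [(0, 500), (500, 1000)]                      # two 500 Hz low-class bands
    high = [(1000, 2000), (2000, 3000), (3000, 4000)]  # three 1000 Hz high-class bands
    return low + high


def freq_to_bin(f_hz, n_fft=360, fs=8000):
    """Map a frequency in Hz to a bin index (assumed convention: round down)."""
    return (f_hz * n_fft) // fs


def subband_bins(bands, n_fft=360, fs=8000):
    """Return (lower bin, upper bin) for each (lower Hz, upper Hz) band."""
    return [(freq_to_bin(lo, n_fft, fs), freq_to_bin(hi, n_fft, fs))
            for lo, hi in bands]
```

With a transform length of 360 at 8000 Hz each bin spans about 22.2 Hz, so 500 Hz falls between bins 22 and 23; how that edge is assigned is exactly where the rounding convention matters.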

(2) Make adjacent subbands partially overlap

Adjacent subbands are then made to overlap partially. The total number of subbands is $M = 5$ and the frequency shift is $\Delta f = 500$ Hz. Each of the first 4 subbands is extended by a backward frequency shift so that it partially overlaps the next subband: for $i \le M-1$, $f'_H(i) = f_H(i) + \Delta f$ and $f'_L(i) = f_L(i)$, where $f_H(i)$ and $f_L(i)$ are the upper and lower frequency limits of the $i$-th subband before the shift and $f'_H(i)$ is its upper frequency limit after the shift. The $M$-th subband is extended by a forward frequency shift so that it partially overlaps the previous subband: for $i = M$, $f'_L(M) = f_L(M) - \Delta f$ and $f'_H(M) = f_H(M)$, where $f'_L(M)$ is the lower frequency limit of the $M$-th subband after the shift. The frequency bins corresponding to the shifted limits are computed from $f'_H(i)$ and $f'_L(i)$. Table 2 lists the upper and lower frequency limits of the partially overlapping subbands and the corresponding frequency bins.

Table 2. Upper and lower frequency limits of the partially overlapping subbands and the corresponding frequency bins

(Table 2 is an image in the original publication; its contents are not available in this text.)
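The overlap step can be illustrated with a short sketch. It assumes that the backward shift raises only the upper limit of each of the first subbands and that the forward shift lowers only the lower limit of the last subband; this is one reading of the description, and the resulting values are not confirmed against the original Table 2. The function name `overlap_subbands` is hypothetical.

```python
def overlap_subbands(bands, shift_hz):
    """Extend each band so that neighbouring bands partially overlap.

    bands: list of (lower Hz, upper Hz) tuples in ascending frequency order.
    """
    last = len(bands) - 1
    shifted = []
    for i, (lo, hi) in enumerate(bands):
        if i < last:
            shifted.append((lo, hi + shift_hz))   # backward shift: reach into next band
        else:
            shifted.append((lo - shift_hz, hi))   # forward shift: reach into previous band
    return shifted
```

For the five bands of this embodiment and a 500 Hz shift, this reading gives 0 to 1000, 500 to 1500, 1000 to 2500, 2000 to 3500, and 2500 to 4000 Hz.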

Step S03: Using the subband frequency limits from step S02, compute the energy spectrum of each subband of each frame, and then the spectral entropy of each subband of each frame.

(1) Apply a fast Fourier transform of the configured length to the $m$-th frame, giving $X_m(k)$. Using the frequency bins $k_L(i)$ and $k_H(i)$ that correspond to the lower and upper frequency limits of the $i$-th subband, compute the energy spectrum $E_m(i)$ of the $i$-th subband of the $m$-th frame as the sum of the energy spectra of all frequency bins from $k_L(i)$ to $k_H(i)$: $E_m(i) = \sum_{k=k_L(i)}^{k_H(i)} |X_m(k)|^2$, where $|X_m(k)|^2$ is the energy spectrum of the $k$-th frequency bin of the $m$-th frame.

(2) Compute the spectral entropy of each subband of each frame

Compute the normalized probability density function of the $k$-th frequency bin in the $i$-th subband of the $m$-th frame, $p_m(i,k) = |X_m(k)|^2 / E_m(i)$. Then compute the spectral entropy of the $i$-th subband of the $m$-th frame, $H_m(i) = -\sum_{k=k_L(i)}^{k_H(i)} p_m(i,k) \log p_m(i,k)$.
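A minimal sketch of the subband energy and spectral entropy computation follows. It takes the per-bin energy spectrum of one frame as input, treats both band edges as inclusive, and uses the natural logarithm; the logarithm base and the handling of empty bins are assumptions of this sketch. The function names are hypothetical.

```python
import math


def subband_energy(power, k_lo, k_hi):
    """Sum of the per-bin energy spectrum from bin k_lo to k_hi inclusive."""
    return sum(power[k_lo:k_hi + 1])


def subband_entropy(power, k_lo, k_hi):
    """Spectral entropy of one subband: -sum(p * log p) over its bins."""
    total = subband_energy(power, k_lo, k_hi)
    if total <= 0:
        return 0.0  # silent band: entropy defined as 0 in this sketch
    entropy = 0.0
    for k in range(k_lo, k_hi + 1):
        p = power[k] / total  # normalized probability of bin k
        if p > 0:             # skip empty bins, since p*log(p) -> 0 as p -> 0
            entropy -= p * math.log(p)
    return entropy
```

A flat band gives the maximum entropy (the logarithm of the number of bins), while a band dominated by a single spectral peak gives a value near zero, which is why low entropy is used as a cue for speech.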

Step S04: Compute the average energy spectrum of each subband over all non-speech signal frames.

Compute the energy spectrum $E^{(n)}_j(i)$ of the $i$-th subband of the $j$-th non-speech signal frame, in the same way as the subband energy spectrum in step S03. Then compute the average energy spectrum of the $i$-th subband over all non-speech signal frames, $\bar{E}(i) = \frac{1}{J} \sum_{j=1}^{J} E^{(n)}_j(i)$, where $J$ is the total number of non-speech signal frames.
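The averaging in step S04 amounts to a per-subband mean over the selected non-speech frames. The sketch below assumes the subband energies of each non-speech frame have already been computed; the function name `average_subband_energy` is hypothetical.

```python
def average_subband_energy(noise_frame_energies):
    """Mean energy of each subband over all non-speech frames.

    noise_frame_energies: one list per non-speech frame, each holding
    the energy of every subband of that frame.
    """
    if not noise_frame_energies:
        raise ValueError("at least one non-speech frame is required")
    n_frames = len(noise_frame_energies)
    n_bands = len(noise_frame_energies[0])
    return [sum(frame[i] for frame in noise_frame_energies) / n_frames
            for i in range(n_bands)]
```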

The non-speech signal frames are all signal frames that were detected as "non-speech" in the valid time region closest in time to the current frame, within the time range that has already been detected. Before the first valid time region appears, all signal frames in the first 0.5 s of the signal under test are treated as non-speech signal frames.

A time region is a block formed by a fixed number of signal frames; the number of frames per region is set in advance, and the start frame and end frame of each time region follow from the region index and that number. The total number of non-speech signal frames within a time region is compared with the time-region validity decision threshold, which is set to 30 frames in this embodiment: when the count satisfies the threshold, the time region is considered valid; otherwise, it is considered invalid.

Step S05: Set the likelihood ratio weights of each subband according to its spectral entropy and the ratio of its energy spectrum to its average energy spectrum over non-speech frames.

A preset energy-ratio threshold $\delta$ is set, and the mean spectral entropy of each subband over the first 0.5 s of the signal (about 22 frames) is taken as the preset spectral entropy threshold $T(i)$ of that subband, as listed in Table 3. Let $E_m(i)$ be the energy spectrum of the $i$-th subband of the $m$-th frame, $\bar{E}(i)$ the average energy spectrum of the $i$-th subband over non-speech signal frames, and $H_m(i)$ the spectral entropy of the $i$-th subband of the $m$-th frame. If the ratio of $E_m(i)$ to $\bar{E}(i)$ exceeds the preset threshold, that is, $E_m(i)/\bar{E}(i) > \delta$, and the spectral entropy is below its preset threshold (Table 3), that is, $H_m(i) < T(i)$, the likelihood ratio weights of all frequency bins in the $i$-th subband are set to 1; otherwise they are set to 0. Specifically:

$$w_m(k) = \begin{cases} 1, & E_m(i)/\bar{E}(i) > \delta \ \text{and}\ H_m(i) < T(i) \\ 0, & \text{otherwise} \end{cases}$$

In the formula, $w_m(k)$ is the likelihood ratio weight of the $k$-th frequency bin, which belongs to the $i$-th subband. Note that the subband to which a bin belongs here is the non-overlapping subband defined by the upper and lower frequency limits in Table 1.
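The weighting rule can be sketched as follows. Energies and entropies are assumed to have been computed on the partially overlapping subbands, while the bins that receive each weight come from the non-overlapping bin ranges, as the text specifies. The function name `likelihood_ratio_weights`, the argument layout, and the guard against a zero noise energy are assumptions of this sketch.

```python
def likelihood_ratio_weights(energy, noise_energy, entropy, entropy_thr,
                             ratio_thr, bin_ranges, n_bins):
    """Return one 0/1 weight per frequency bin.

    energy, noise_energy, entropy, entropy_thr: one value per subband.
    bin_ranges: inclusive (lower bin, upper bin) of each non-overlapping subband.
    """
    weights = [0] * n_bins
    for i, (k_lo, k_hi) in enumerate(bin_ranges):
        strong = noise_energy[i] > 0 and energy[i] / noise_energy[i] > ratio_thr
        peaky = entropy[i] < entropy_thr[i]
        if strong and peaky:  # both conditions must hold for weight 1
            for k in range(k_lo, k_hi + 1):
                weights[k] = 1
    return weights
```

Requiring both conditions means a subband contributes to the decision only when it is clearly louder than the noise floor and its spectrum is more structured than the noise-only reference.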

Table 3. Preset spectral entropy threshold of each subband

(Table 3 is an image in the original publication; its contents are not available in this text.)

Step S06: Compute the weighted sum of the likelihood ratio test values using the weights, take the average, and finally decide from the likelihood ratio decision threshold whether the frame is speech.

The frame is judged to be speech when the weighted average of its likelihood ratio test values is above the likelihood ratio decision threshold, and non-speech when it is not. The weighted average is formed from the per-bin likelihood ratio test values of step S01 and the per-bin weights of step S05.

The provided method is compared with the conventional likelihood ratio test speech detection method; example detection results and detection accuracy statistics further illustrate its effectiveness. Figure 4 shows an example comparison of the detection results of the method of this embodiment and the conventional method (frames 22 to 294).
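For reference, the frame decision of step S06 can be sketched as below. The exact normalization of the weighted average is not recoverable from this text, so dividing by the sum of the weights is an assumption, as are the strict "greater than" comparison and the choice to return non-speech when every weight is zero. The function name `is_speech` is hypothetical.

```python
def is_speech(lr_values, weights, threshold):
    """Decide speech or non-speech from weighted likelihood ratio test values."""
    total_weight = sum(weights)
    if total_weight == 0:
        return False  # no subband passed the weighting test (sketch's choice)
    # Weighted average; normalizing by the sum of weights is an assumption.
    score = sum(w * v for w, v in zip(weights, lr_values)) / total_weight
    return score > threshold
```

In the experiments described here the threshold would be 0.6 for the white-noise samples and 20 for the babble-noise samples.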

Table 4 compares detection accuracy. At 0 dB SNR, under both white noise and babble noise, the provided method achieves noticeably higher detection accuracy than the conventional method.

Table 4. Detection accuracy of the provided method compared with the conventional method

(Table 4 is an image in the original publication; its contents are not available in this text.)
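The text does not define how detection accuracy is computed. A plausible frame-level definition, given here only as an assumption, is the fraction of frames whose decision matches the manual annotation. The function name `frame_accuracy` is hypothetical.

```python
def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted label equals the annotated label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and equally long")
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)
```

Because about 75% of the annotated frames are speech, a detector that always answers "speech" would already score about 0.75 under this definition, so results are best read against that baseline.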

Embodiment 3

Figure 5 shows a block diagram of the cross-subband spectral entropy weighted likelihood ratio speech detection system of the invention.

A second aspect of the invention provides a cross-subband spectral entropy weighted likelihood ratio speech detection system 5, comprising a memory 51 and a processor 52. The memory contains a program for the cross-subband spectral entropy weighted likelihood ratio speech detection method which, when executed by the processor, implements the following steps:

given the sampled signal to be detected, the likelihood ratio decision threshold, the time-region validity decision threshold, and the Fourier transform length, applying windowing and framing preprocessing to the signal and computing the likelihood ratio test value of each frequency bin of each frame;

dividing the frequency range into subbands, the subbands being non-uniform and partially overlapping;

computing, from the upper and lower frequency limits of the subbands, the energy spectrum of each subband of each frame and the spectral entropy of each subband of each frame;

computing the average energy spectrum of each subband over all non-speech signal frames;

setting the likelihood ratio weights of each subband according to its spectral entropy and the ratio of its energy spectrum to its average energy spectrum;

computing the weighted sum of the likelihood ratio test values using the weights, taking the average, and deciding from the likelihood ratio decision threshold whether the frame is speech.

A third aspect of the invention provides a computer-readable storage medium containing a program for the cross-subband spectral entropy weighted likelihood ratio speech detection method which, when executed by a processor, implements the steps of the cross-subband spectral entropy weighted likelihood ratio speech detection method described in any of the above.

In the embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as hardware plus software functional units.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Alternatively, if the integrated unit of the invention is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. On this understanding, the technical solution of the embodiments of the invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the invention. The storage medium includes any medium that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk, or an optical disc.

The above are only specific embodiments of the invention, and the scope of protection of the invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention falls within the scope of protection of the invention. The scope of protection of the invention is therefore defined by the claims.

Claims (9)

1. A cross-subband spectral entropy weighted likelihood ratio speech detection method, characterized by comprising the following steps:
Step S01: given a sampled signal to be detected, a likelihood ratio decision threshold, a time-region validity decision threshold, and a Fourier transform length, performing windowing and framing preprocessing on the signal, and computing the likelihood ratio test value of each frequency bin of each frame signal;
Step S02: dividing the frequency range into subbands, the subbands being non-uniform and partially overlapping;
Step S03: according to the upper and lower frequency limits of the subbands divided in step S02, computing the energy spectrum of each subband of each frame signal, and computing the spectral entropy of each subband of each frame;
Step S04: computing the average energy spectrum of each subband over all non-speech signal frames;
Step S05: setting the likelihood ratio weights of each subband according to the spectral entropy of the subband and the ratio of the energy spectrum of the subband to the average energy spectrum of the subband;
Step S06: computing a weighted sum of the likelihood ratio test values according to the weights, computing the average, and deciding according to the likelihood ratio threshold whether the frame signal is speech;
wherein step S05 specifically comprises:
given a preset spectral entropy threshold for each subband, if the ratio of the energy spectrum of a subband of the current frame to the average energy spectrum of that subband over non-speech signal frames is greater than a preset threshold, and the spectral entropy of that subband is less than its preset spectral entropy threshold, setting the likelihood ratio weights of all frequency bins in that subband to 1;
otherwise, setting the likelihood ratio weights of all frequency bins in that subband to 0;
wherein each likelihood ratio weight is the weight of one frequency bin belonging to one subband, and the subband here refers to the non-overlapping subbands divided by the lower and upper frequency limits.
2. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 1, wherein step S02 specifically comprises:
the frequency range of the subband division is determined by the sampling rate of the signal, the sampling rate being 8000 Hz or 16000 Hz; the whole frequency range is divided into a low-frequency class band and a high-frequency class band, each with a set frequency range in Hz;
the numbers of subbands of the low-frequency class band and the high-frequency class band are determined according to the sampling rate, subband division is carried out according to the numbers of subbands in the low-frequency class band and the high-frequency class band, and adjacent subbands are set to partially overlap.
3. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 2, wherein determining the numbers of subbands of the low-frequency class band and the high-frequency class band according to the sampling rate, and carrying out subband division according to those numbers, specifically comprises:
when the sampling rate is 8000 Hz, the frequency range is divided into 5 subbands, the low-frequency class band being evenly divided into 2 subbands and the high-frequency class band being evenly divided into 3 subbands, each subband having its specified width;
when the sampling rate is 16000 Hz, the frequency range is divided into 10 subbands, the low-frequency class band being evenly divided into 4 subbands and the high-frequency class band being evenly divided into 6 subbands, each subband having its specified width;
the subbands obtained by this division do not overlap, and the boundary frequencies of each subband are taken as its upper frequency limit and lower frequency limit; the frequency bins corresponding to the upper and lower frequency limits of each subband are computed from the Fourier transform length and the upper and lower frequency limits.
4. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 2, wherein setting adjacent subbands to partially overlap specifically comprises:
letting $M$ be the total number of subbands and $\Delta f$ the frequency shift, each of the first $M-1$ subbands is made to partially overlap the next subband by a backward frequency shift, that is, for $i \le M-1$, $f'_H(i) = f_H(i) + \Delta f$ and $f'_L(i) = f_L(i)$, where $f'_H(i)$ and $f'_L(i)$ are the upper and lower frequency limits of the $i$-th subband after the shift and $f_H(i)$ and $f_L(i)$ are the upper and lower frequency limits of the $i$-th subband before the shift;
the $M$-th subband is made to partially overlap the previous subband by a forward frequency shift, that is, for $i = M$, $f'_L(M) = f_L(M) - \Delta f$ and $f'_H(M) = f_H(M)$;
the frequency bins corresponding to the upper and lower frequency limits of each subband are computed from $f'_H(i)$ and $f'_L(i)$.
5. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 4, wherein step S03 specifically comprises:
according to the upper and lower frequency limits of each subband, computing the energy spectrum $E_m(i)$ of the $i$-th subband of the $m$-th frame as the sum of the energy spectra of all frequency bins from the lower-limit bin $k_L(i)$ to the upper-limit bin $k_H(i)$, $E_m(i) = \sum_{k=k_L(i)}^{k_H(i)} |X_m(k)|^2$, where $|X_m(k)|^2$ is the energy spectrum of the $k$-th frequency bin of the $m$-th frame;
computing the normalized probability density function of the $k$-th frequency bin of the $i$-th subband of the $m$-th frame, $p_m(i,k) = |X_m(k)|^2 / E_m(i)$;
computing the spectral entropy of the $i$-th subband of the $m$-th frame, $H_m(i) = -\sum_{k=k_L(i)}^{k_H(i)} p_m(i,k) \log p_m(i,k)$, where $p_m(i,k)$ is the normalized probability density function of the $k$-th frequency bin in the $i$-th subband of the $m$-th frame.
6. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 4, wherein step S04 specifically comprises:
computing the energy spectrum of the $i$-th subband of the $j$-th non-speech signal frame as the sum, over the frequency bins of that subband, of the energy spectrum of each frequency bin of the $j$-th non-speech signal frame;
computing the average energy spectrum of the $i$-th subband over all non-speech signal frames as the sum of these subband energy spectra divided by $J$, where $J$ is the total number of non-speech signal frames.
7. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 1, wherein the non-speech signal frames are all signal frames detected as "non-speech" in the valid time region closest in time to the current frame, within the time range already detected; before the first valid time region appears, all signal frames in the first 0.5 s of the signal to be detected are regarded as non-speech signal frames.
8. The cross-subband spectral entropy weighted likelihood ratio speech detection method according to claim 7, wherein the valid time region is specifically defined as follows:
a time region is a region formed by a certain number of signal frames; the total number of non-speech signal frames in the time region is compared with the time-region validity decision threshold; when the total satisfies the threshold, the time region is considered valid; otherwise, the time region is considered invalid.
9. A cross-subband spectral entropy weighted likelihood ratio speech detection system, the system comprising a memory and a processor, the memory containing a program for the cross-subband spectral entropy weighted likelihood ratio speech detection method which, when executed by the processor, implements the steps of the cross-subband spectral entropy weighted likelihood ratio speech detection method according to any one of claims 1 to 8.
CN202310963463.7A 2023-08-02 2023-08-02 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system Active CN116665717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310963463.7A CN116665717B (en) 2023-08-02 2023-08-02 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310963463.7A CN116665717B (en) 2023-08-02 2023-08-02 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system

Publications (2)

Publication Number Publication Date
CN116665717A CN116665717A (en) 2023-08-29
CN116665717B true CN116665717B (en) 2023-09-29

Family

ID=87715797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310963463.7A Active CN116665717B (en) 2023-08-02 2023-08-02 Cross-subband spectral entropy weighted likelihood ratio voice detection method and system

Country Status (1)

Country Link
CN (1) CN116665717B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962285A (en) * 2018-07-20 2018-12-07 浙江万里学院 A kind of sound end detecting method dividing subband based on human ear masking effect
CN110047519A (en) * 2019-04-16 2019-07-23 广州大学 A kind of sound end detecting method, device and equipment
CN113838476A (en) * 2021-09-24 2021-12-24 世邦通信股份有限公司 Noise estimation method and device for noisy speech
WO2022105570A1 (en) * 2020-11-17 2022-05-27 深圳壹账通智能科技有限公司 Speech endpoint detection method, apparatus and device, and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE477572T1 (en) * 2007-10-01 2010-08-15 Harman Becker Automotive Sys EFFICIENT SUB-BAND AUDIO SIGNAL PROCESSING, METHOD, APPARATUS AND ASSOCIATED COMPUTER PROGRAM

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
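An improved robust statistical voice activity detection based on sub-band periodic intensity; Weijun He et al.; 2015 IEEE International Conference on Information and Automation; full text *
Speech endpoint detection method based on combined variance and spectral entropy (基于方差和谱熵结合的语音端点检测方法); Mao Qiang (毛强) et al.; Journal of Changzhou Institute of Technology (常州工学院学报); Vol. 34, No. 2; 36-40+52 *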

Also Published As

Publication number Publication date
CN116665717A (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant