CN114639390A - Voice noise analysis method and system

Voice noise analysis method and system

Info

Publication number
CN114639390A
Authority
CN
China
Prior art keywords
noise
intensity
level
audio
analyzed
Prior art date
Legal status
Pending
Application number
CN202011499230.9A
Other languages
Chinese (zh)
Inventor
Liu Gang (刘刚)
Gong Ke (龚科)
Current Assignee
DMAI Guangzhou Co Ltd
Original Assignee
DMAI Guangzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by DMAI Guangzhou Co Ltd
Priority to CN202011499230.9A
Publication of CN114639390A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a voice noise analysis method and system. The method comprises: acquiring voice data to be analyzed; extracting noise audio segments containing only noise from the voice data to be analyzed; determining the noise intensity level of each noise audio segment based on the noise intensity index of the segment and preset noise intensity classification levels; and determining a noise level evaluation result for the voice data to be analyzed according to the distribution of the noise intensity levels of the noise audio segments. By computing noise intensity indexes only for segments that contain noise alone and analyzing them independently, and then deriving the overall noise level evaluation from the distribution of the per-segment intensity levels, the method avoids interference from normal speech, achieves an objective evaluation of the noise level of the voice data without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.

Description

Voice noise analysis method and system
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a voice noise analysis method and system.
Background
With the rapid development of the mobile internet, communication software is used more and more widely; for example, more and more teachers give online teaching guidance to students through instant messaging software in place of the traditional face-to-face teaching mode. When communication software is used, however, noise can seriously degrade the quality of the communication audio, and some applications place high demands on noise; for example, when a student listens online to an audio lesson recorded by a teacher, the noise in the audio should be as low as possible so as to preserve the teaching effect. Because the volume of online teaching audio is huge, the traditional approach of manually analyzing the noise of every lesson entails an enormous workload and yields highly subjective results.
In the prior art, the indexes used to objectively evaluate a noise condition (such as the signal-to-noise ratio and the segmental signal-to-noise ratio) require, when measuring the noise condition of an audio recording, a reference audio whose speech content is identical to and strictly time-aligned with the recording under test. For teaching scenarios and other situations where such a reference audio cannot be obtained, the existing noise evaluation methods cannot be applied, so achieving an objective evaluation of speech noise without a reference audio is an urgent problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a voice noise analysis method and system to overcome the problem in the prior art that an objective evaluation of speech noise is difficult to achieve without a reference audio.
The embodiment of the invention provides a voice noise analysis method, which comprises the following steps:
acquiring voice data to be analyzed;
extracting a noise audio segment only containing noise from the voice data to be analyzed;
determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
Optionally, the extracting a noise audio segment containing only noise from the voice data to be analyzed includes:
dividing the voice data to be analyzed into a plurality of audio segments based on the total duration of the voice data to be analyzed and a preset extraction duration;
converting each audio segment into a magnitude spectrum;
inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise;
and screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold.
Optionally, the determining, based on the noise intensity index of each noise audio segment and the preset noise intensity classification levels, the noise intensity level corresponding to each noise audio segment includes:
calculating the noise intensity index corresponding to each noise audio segment;
acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels;
determining the current noise intensity index range corresponding to a current noise audio segment according to the noise intensity index corresponding to the current noise audio segment;
and determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment.
Optionally, the determining the noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments includes:
obtaining the ratio of each of the different noise intensity levels among the noise audio segments;
and determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index.
Optionally, the noise intensity level comprises: high intensity noise level, medium intensity noise level, and low intensity noise level.
Optionally, the determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and the preset ratio evaluation index includes:
acquiring the ratio of the high-intensity noise level;
and determining the noise level evaluation result of the voice data to be analyzed according to the relation between the ratio of the high-intensity noise level and the preset high-intensity-level ratio range in the preset ratio evaluation index.
Optionally, the noise level evaluation result includes: a low noise level, a moderate noise level and a high noise level, wherein,
when the ratio of the high-intensity noise level is smaller than the minimum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the low noise level;
when the ratio of the high-intensity noise level lies within the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the moderate noise level;
and when the ratio of the high-intensity noise level is larger than the maximum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the high noise level.
The embodiment of the present invention further provides a speech noise analysis system, including:
the acquisition module is used for acquiring voice data to be analyzed;
the first processing module is used for extracting noise audio segments containing only noise from the voice data to be analyzed;
the second processing module is used for determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and the third processing module is used for determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
An embodiment of the present invention further provides an electronic device, including a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the voice noise analysis method provided by the embodiment of the present invention.
The embodiment of the invention also provides a computer-readable storage medium, which stores computer instructions, and the computer instructions are used for enabling the computer to execute the voice noise analysis method provided by the embodiment of the invention.
The technical scheme of the invention has the following advantages:
the embodiment of the invention provides a voice noise analysis method and a voice noise analysis system, wherein voice data to be analyzed are obtained; extracting a noise audio segment only containing noise from voice data to be analyzed; determining the noise intensity grade corresponding to each noise audio frequency segment based on the noise intensity index of each noise audio frequency segment and the preset noise intensity classification grade; and determining a noise level evaluation result of the voice data to be analyzed according to the distribution condition of the noise intensity level corresponding to each noise audio clip. Therefore, the noise intensity indexes of the noise audio frequency segments only containing noise are calculated to carry out independent analysis on the noise intensity levels of all the noise audio frequency segments, then the noise level evaluation result of the whole voice data to be analyzed is determined according to the distribution condition of the noise intensity levels of all the noise audio frequency segments, the influence of normal voice in the voice data to be analyzed is avoided, the objective evaluation of the noise level of the voice data to be analyzed is realized, the reference audio frequency is not needed, the application range is wider, and the noise conditions under various scenes can be accurately reflected.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a speech noise analysis method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of the process of inputting the magnitude spectrum of each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise in the embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech noise analysis system according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.
With the rapid development of the mobile internet, traditional education is gradually being replaced by online education, and more and more teachers now teach students through instant messaging software, which also makes intelligent classroom analysis more convenient. Noise is one factor that affects the quality of students' classroom learning, so it needs to be detected in order to create a quiet environment for students and ensure the learning effect. However, a massive amount of online teaching audio and video is generated every day; manually analyzing the noise condition of every lesson entails a huge workload and yields highly subjective results, so an objective, intelligent analysis of audio noise is particularly necessary.
At present, the indexes used to objectively evaluate a noise condition (such as the signal-to-noise ratio and the segmental signal-to-noise ratio) are severely limited in application: measuring the noise condition of a teaching audio requires a reference audio whose speech content is exactly the same as, and strictly time-aligned with, that of the teaching audio, and such a reference is extremely difficult to obtain in a teaching scenario. How to realize noise evaluation without a reference therefore urgently needs to be solved.
The embodiment of the invention provides a voice noise analysis method, which can be applied to noise analysis of an online teaching platform, and as shown in fig. 1, the voice noise analysis method mainly comprises the following steps:
step S101: and acquiring voice data to be analyzed. Specifically, the speech data to be analyzed is audio data containing noise, such as: the teaching audio recorded on the online teaching platform or corresponding audio data extracted from the teaching video containing voice data and the like. The obtaining method of the voice data to be analyzed may be directly downloading the audio data or extracting from a preset voice database to be analyzed, and the like, and the present invention is not limited thereto.
Step S102: extracting noise audio segments containing only noise from the voice data to be analyzed. Specifically, voice data containing noise includes both normal speech and noise; to avoid the need for a reference audio, i.e., the normal speech, when evaluating the noise, segments containing only noise are extracted, so that the loudness or energy of those segments can be used directly as an intuitive measure of the noise level.
Step S103: determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels. Specifically, the noise energy or loudness can differ considerably between noise audio segments; grading each segment makes the segments easier to compare intuitively, which in turn facilitates evaluating the noise level of the whole voice data to be analyzed.
step S104: and determining a noise level evaluation result of the voice data to be analyzed according to the distribution condition of the noise intensity level corresponding to each noise audio clip. Specifically, since a plurality of noise audio segments are contained in a complete voice data to be analyzed, in order to improve the accuracy of noise evaluation on the whole voice data to be analyzed, the noise level evaluation result is obtained by considering the distribution of the noise intensity levels of all the noise audio segments, and the objective noise evaluation on the voice data to be analyzed is realized.
Through the above steps S101 to S104, the voice noise analysis method provided in the embodiment of the present invention analyzes the noise intensity level of each noise audio segment independently by calculating the noise intensity indexes of the segments that contain only noise, and then determines the noise level evaluation result of the whole voice data to be analyzed from the distribution of the noise intensity levels of all noise audio segments. This avoids the influence of normal speech in the voice data to be analyzed, achieves an objective evaluation of its noise level without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.
Specifically, in an embodiment, the step S102 specifically includes the following steps:
step S201: and dividing the voice data to be analyzed into a plurality of audio segments based on the total time length of the voice data to be analyzed and a preset extraction time length period. Specifically, the voice data to be analyzed is divided into a plurality of audio clips with equal length and smaller length according to the time axis of the total length, and the preset extraction length period can be flexibly set according to the total length and the accuracy requirement of noise analysis, for example, 1s, 3s, and the like, which is not limited by the invention.
Step S202: converting each audio segment into a magnitude spectrum. Specifically, each audio segment is converted into a magnitude spectrum by framing it, applying the Fourier transform, computing the amplitudes, and normalizing the amplitudes.
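As an illustrative sketch only (not the applicant's code), steps S201 and S202 could be implemented along the following lines in Python with NumPy; the segment duration, frame length, hop size and Hann window are assumed choices rather than values fixed by the embodiment:

import numpy as np

def split_into_segments(samples, sample_rate, segment_seconds=1.0):
    # Step S201: cut the waveform into equal-length segments (1 s assumed here).
    seg_len = int(segment_seconds * sample_rate)
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

def magnitude_spectrum(segment, frame_len=512, hop=256):
    # Step S202: framing, windowed FFT, magnitude, and amplitude normalization.
    starts = range(0, len(segment) - frame_len + 1, hop)
    frames = np.stack([segment[s:s + frame_len] for s in starts])
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return mags / (mags.max() + 1e-8)  # normalize so the largest amplitude is 1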
Step S203: inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise. The preset noise classification model is a classification model built in advance: its input is an audio segment, its output is the predicted probability that the segment contains only noise, and it is trained with a large number of labeled audio segments.
In the embodiment of the invention, as shown in fig. 2, the classification model uses MobileNet-V2 as its backbone network to obtain a number of depth features of the audio, then aggregates the depth features into a dense feature of the audio, and finally feeds the dense feature into a classifier for classification. The backbone MobileNet-V2 replaces conventional convolutions with depthwise separable convolutions and therefore has a higher inference speed; it is widely used in industry and is not described in further detail here. In the feature aggregation stage, a more effective aggregation method, NetVLAD pooling, is adopted. Suppose the backbone network produces the depth features {x_1, x_2, ..., x_T}. The intermediate output of NetVLAD pooling is a K x D matrix V, where K is a predefined number of clusters and D is the dimension of each cluster center, and each row of the matrix V is obtained by the following formula:

V(k, j) = \sum_{t=1}^{T} \frac{e^{w_k^T x_t + b_k}}{\sum_{k'} e^{w_{k'}^T x_t + b_{k'}}} \left( x_t(j) - c_k(j) \right)

where {w_k}, {b_k} and {c_k} are trainable parameters learned together with the classification model. The matrix V is L2-normalized and its rows are concatenated to obtain the NetVLAD-pooled feature, which is then fed into a fully connected layer for binary classification. The whole classification model is trained with a binary cross-entropy loss as the objective.
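For illustration, the NetVLAD pooling stage described by the formula above could be sketched in PyTorch as follows; the class name, parameter initialization and tensor layout are assumptions, not details disclosed by the embodiment:

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADPooling(nn.Module):
    # Aggregates T depth features of dimension D into a K x D matrix V, then flattens it.
    def __init__(self, num_clusters, dim):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)                   # soft-assignment weights w_k and biases b_k
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))  # cluster centers c_k

    def forward(self, x):                               # x: (batch, T, D) depth features from the backbone
        soft = F.softmax(self.assign(x), dim=-1)        # (batch, T, K) soft assignment of each feature to each cluster
        residual = x.unsqueeze(2) - self.centers        # (batch, T, K, D) residuals x_t - c_k
        v = (soft.unsqueeze(-1) * residual).sum(dim=1)  # (batch, K, D) weighted sum over the T features
        v = F.normalize(v, p=2, dim=-1)                 # L2 normalization of each row of V
        return v.flatten(1)                             # concatenate rows into a (batch, K*D) aggregated feature

The flattened K*D feature would then feed a fully connected layer trained with the binary cross-entropy loss, as stated above.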
Step S204: screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold. Specifically, the probability obtained for each audio segment in step S203 is compared with a preset probability threshold; if the probability exceeds the threshold, the segment is considered to contain only noise, otherwise it also contains non-noise content such as human speech. All segments determined to contain only noise, i.e., the noise audio segments, are retained, and the other segments are discarded.
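A minimal sketch of steps S203 and S204, assuming a hypothetical noise_model callable that maps a magnitude spectrum to the probability that the segment contains only noise, and an assumed threshold of 0.5:

def extract_noise_segments(segments, spectra, noise_model, prob_threshold=0.5):
    # Keep only the segments whose predicted noise-only probability exceeds the threshold.
    noise_segments = []
    for segment, spectrum in zip(segments, spectra):
        p_noise = float(noise_model(spectrum))   # probability that this segment contains only noise
        if p_noise > prob_threshold:
            noise_segments.append(segment)       # retained as a noise audio segment
    return noise_segments                        # segments with speech or other non-noise content are discarded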
Specifically, in an embodiment, the step S103 specifically includes the following steps:
step S301: and respectively calculating the noise intensity index corresponding to each noise audio segment. Specifically, the noise intensity index may be an index such as energy or volume of noise, which may reflect a degree of noise, and in the embodiment of the present invention, the noise energy index is adopted, that is, energy of a noise audio segment is calculated. Suppose a noise audio clip is a ═ a1,a2,…,aNExpressing that N is the number of sample points contained in the audio, the energy of the audio is calculated as follows:
Figure BDA0002838532090000091
where energy represents the energy of the audio, t represents the duration of the audio, N is the number of sample points included in the audio, a1,a2,…,aNRepresenting the energy value of each sample point of the audio.
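A sketch of this computation, under the assumption that the energy is the sum of squared sample values divided by the duration t (the squaring and the division by t follow the symbols stated above rather than an explicit formula in the original text):

import numpy as np

def noise_energy(segment, sample_rate):
    # Noise intensity index of a noise audio segment: sum of squared samples over the duration t.
    t = len(segment) / sample_rate            # duration of the segment in seconds
    return float(np.sum(np.square(segment)) / t)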
Step S302: acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels. Specifically, the embodiment of the invention takes as an example preset noise intensity classification levels comprising a low-intensity noise level, a medium-intensity noise level and a high-intensity noise level, with the classification based on noise intensity index ranges defined by two energy thresholds T_l and T_h: energy below the low threshold T_l is low-intensity noise, energy between T_l and T_h is medium-intensity noise, and energy above the high threshold T_h is high-intensity noise.
Step S303: determining the current noise intensity index range corresponding to the current noise audio segment according to the noise intensity index corresponding to that segment. Specifically, the noise energy value calculated in step S301 is compared with the two energy thresholds T_l and T_h from step S302 to determine the noise intensity index range to which the current segment belongs.
Step S304: determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment. Specifically, assume the energy value of the current noise audio segment is A and T_l < A < T_h; the current noise intensity index range to which the segment belongs then corresponds to the medium-intensity noise level, so the noise level of this segment is determined to be the medium-intensity noise level.
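Steps S302 to S304 amount to bucketing each energy value with the two thresholds; a sketch, with T_l and T_h passed in as parameters since their values are not specified here:

def intensity_level(energy, t_low, t_high):
    # Map a noise intensity index to a level using the thresholds T_l < T_h.
    if energy < t_low:
        return "low"        # low-intensity noise level
    elif energy < t_high:
        return "medium"     # medium-intensity noise level
    else:
        return "high"       # high-intensity noise level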
Specifically, in an embodiment, the step S104 specifically includes the following steps:
step S401: and acquiring the ratio of different noise intensity levels in each noise audio clip. Specifically, the proportion of noise audio segments belonging to low-intensity noise levels in all noise audio segments in the whole voice data to be analyzed in the total number of the noise audio segments is calculated, and the proportion corresponding to the noise audio segments with medium-intensity noise levels and the proportion corresponding to the noise audio segments with high-intensity noise levels are calculated.
Step S402: determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index. Specifically, the preset ratio evaluation index may be set according to actual needs; for example, the noise intensity level with the largest ratio may be taken as the noise level evaluation result of the voice data to be analyzed, or weights may be assigned to the different noise intensity levels, the weighted ratios compared, and the level with the largest weighted ratio taken as the result, and so on. The invention is not limited in this respect.
In the embodiment of the invention, where high-intensity noise that disturbs students' learning is of particular concern, the sensitivity of the noise level evaluation result to high-intensity noise is improved as follows: the per-segment noise level results are aggregated to obtain the ratio of the high-intensity noise level, and the noise level evaluation result of the voice data to be analyzed is determined from the relation between this ratio and the preset high-intensity-level ratio range in the preset ratio evaluation index. When the ratio of the high-intensity noise level is smaller than the minimum value of the preset range, the evaluation result is a low noise level; when the ratio lies within the preset range, the evaluation result is a moderate noise level; and when the ratio is larger than the maximum value of the preset range, the evaluation result is a high noise level. In practical application, the preset high-intensity-level ratio range can also be defined by two thresholds T_L and T_H: a ratio below T_L indicates that the overall noise level of the voice data to be analyzed is low, a ratio between T_L and T_H indicates that it is moderate, and a ratio above T_H indicates that it is high.
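The evaluation described in this paragraph could be sketched as follows; the threshold values standing in for T_L and T_H are placeholders, not values given by the embodiment:

def evaluate_noise_level(levels, t_low_ratio=0.1, t_high_ratio=0.4):
    # levels: list of per-segment intensity levels ("low" / "medium" / "high").
    if not levels:
        return "low noise level"                        # no noise-only segments were detected
    high_ratio = levels.count("high") / len(levels)     # ratio of the high-intensity noise level
    if high_ratio < t_low_ratio:
        return "low noise level"
    elif high_ratio <= t_high_ratio:
        return "moderate noise level"
    else:
        return "high noise level"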
By performing the above steps, the voice noise analysis method provided by the embodiment of the invention acquires voice data to be analyzed; extracts noise audio segments containing only noise from the voice data; determines the noise intensity level of each noise audio segment based on its noise intensity index and preset noise intensity classification levels; and determines the noise level evaluation result of the voice data according to the distribution of the noise intensity levels of the segments. By computing noise intensity indexes only for segments that contain noise alone and analyzing their intensity levels independently, and then deriving the overall noise level evaluation result from the distribution of those levels, the method avoids the influence of normal speech in the voice data to be analyzed, achieves an objective evaluation of its noise level without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.
An embodiment of the present invention further provides a speech noise analysis system, as shown in fig. 3, the speech noise analysis system includes:
the obtaining module 101 is configured to obtain voice data to be analyzed. For details, refer to the related description of step S101 in the above method embodiment, and no further description is provided here.
A noise extraction module 102, configured to extract a noise audio segment that only contains noise from the voice data to be analyzed. For details, refer to the related description of step S102 in the above method embodiment, and no further description is provided here.
The noise estimation module 103 is configured to determine a noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and a preset noise intensity classification level. For details, refer to the related description of step S103 in the above method embodiment, and no further description is provided here.
And the noise statistics module 104 is configured to determine a noise level evaluation result of the speech data to be analyzed according to a distribution condition of the noise intensity levels corresponding to the noise audio segments. For details, refer to the related description of step S104 in the above method embodiment, and no further description is provided here.
Through the cooperation of the above components, the voice noise analysis system provided by the embodiment of the invention acquires voice data to be analyzed; extracts noise audio segments containing only noise from the voice data; determines the noise intensity level of each noise audio segment based on its noise intensity index and preset noise intensity classification levels; and determines the noise level evaluation result of the voice data according to the distribution of the noise intensity levels of the segments. As with the method, noise intensity indexes are computed only for segments that contain noise alone and analyzed independently, and the overall evaluation result is derived from the distribution of the per-segment levels, so the influence of normal speech is avoided, the noise level is evaluated objectively without any reference audio, the application range is wide, and the noise conditions in various scenes are accurately reflected.
There is also provided an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device may include a processor 901 and a memory 902, which may be connected by a bus or in another manner; fig. 4 takes connection by a bus as an example.
The processor 901 may be a Central Processing Unit (CPU). The processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902, which when executed by the processor 901 performs the methods in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for speech noise analysis, comprising:
acquiring voice data to be analyzed;
extracting a noise audio segment only containing noise from the voice data to be analyzed;
determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
2. The method according to claim 1, wherein the extracting noise audio segments containing only noise from the voice data to be analyzed comprises:
dividing the voice data to be analyzed into a plurality of audio segments based on the total duration of the voice data to be analyzed and a preset extraction duration;
converting each audio segment into a magnitude spectrum;
inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise;
and screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold.
3. The method of claim 1, wherein the determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels comprises:
calculating the noise intensity index corresponding to each noise audio segment;
acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels;
determining the current noise intensity index range corresponding to a current noise audio segment according to the noise intensity index corresponding to the current noise audio segment;
and determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment.
4. The method according to claim 3, wherein the determining the noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments comprises:
obtaining the ratio of each of the different noise intensity levels among the noise audio segments;
and determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index.
5. The method of claim 4, wherein the noise intensity level comprises: a high-intensity noise level, a medium-intensity noise level, and a low-intensity noise level.
6. The method according to claim 5, wherein the determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and the preset ratio evaluation index comprises:
acquiring the ratio of the high-intensity noise level;
and determining the noise level evaluation result of the voice data to be analyzed according to the relation between the ratio of the high-intensity noise level and the preset high-intensity-level ratio range in the preset ratio evaluation index.
7. The method of claim 6, wherein the noise level evaluation result comprises: a low noise level, a moderate noise level and a high noise level, wherein,
when the ratio of the high-intensity noise level is smaller than the minimum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the low noise level;
when the ratio of the high-intensity noise level lies within the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the moderate noise level;
and when the ratio of the high-intensity noise level is larger than the maximum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the high noise level.
8. A speech noise analysis system, comprising:
the acquisition module is used for acquiring voice data to be analyzed;
the noise extraction module is used for extracting noise audio segments containing only noise from the voice data to be analyzed;
the noise estimation module is used for determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and the noise statistics module is used for determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
9. An electronic device, comprising:
a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202011499230.9A (priority date 2020-12-15, filing date 2020-12-15): Voice noise analysis method and system, status pending, published as CN114639390A

Priority Applications (1)

Application Number: CN202011499230.9A; Priority Date: 2020-12-15; Filing Date: 2020-12-15; Title: Voice noise analysis method and system

Publications (1)

Publication Number: CN114639390A; Publication Date: 2022-06-17

Family

ID=81945389

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117497004A (en) * 2024-01-03 2024-02-02 深圳市九天睿芯科技有限公司 Noise level monitoring device and method based on neural network and electronic equipment
CN117497004B (en) * 2024-01-03 2024-04-26 深圳市九天睿芯科技有限公司 Noise level monitoring device and method based on neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination