WO2023036016A1 - Method and system for voiceprint recognition applied in electric power operations - Google Patents

Method and system for voiceprint recognition applied in electric power operations

Info

Publication number
WO2023036016A1
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
voiceprint information
information
electric power
cnn
Prior art date
Application number
PCT/CN2022/115882
Other languages
English (en)
French (fr)
Inventor
莫梓樱
朱明增
覃秋勤
吕鸣
刘小兰
陈极万
韩竞
李和峰
蒋志儒
黄新华
胡凯博
欧健美
温黎明
周素君
马红康
宋嗣皇
梁维
梁朝聪
罗晨怡
梁豪
奉华
Original Assignee
广西电网有限责任公司贺州供电局
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广西电网有限责任公司贺州供电局 filed Critical 广西电网有限责任公司贺州供电局
Publication of WO2023036016A1 publication Critical patent/WO2023036016A1/zh

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/18Artificial neural networks; Connectionist approaches

Definitions

  • the invention relates to the field of computer technology, in particular to a method and system for voiceprint recognition applied in electric power operations.
  • Identity authentication technology is roughly divided into two branches: one is identity input, such as the account mode of logging in with a user name and password; the other is through identity markers, such as keys and certificates. These two methods have been accepted by the vast majority of people and are widely used. However, after some criminals obtain the identity of others through illegal means, they can pass the test smoothly, pretend to be others, and eventually cause heavy losses to the power grid system.
  • voiceprint is a kind of biological characteristics.
  • voiceprint is a long-term stable characteristic signal, and different individuals can be distinguished through voiceprint identification technology. Because each person's vocal tract and articulatory organs differ, and acquired factors such as physical condition and living environment cause the voices of different people to show differences on the spectrogram, this feature is unique. Using this feature, machine learning and artificial intelligence methods can be applied to identify different speakers.
  • the invention with application number 202011634585.4 discloses a voiceprint recognition method.
  • the method mainly includes the following steps: obtaining an audio file; trimming the file to obtain a valid audio file; encrypting the valid audio file to obtain encrypted audio information; and sending a voiceprint recognition request that includes the encrypted audio information.
  • the invention with application number 201610641491.7 discloses a voiceprint recognition system.
  • the method, device and voiceprint recognition system of that invention collect the channel characteristics of the sound in real time, preferentially select a voiceprint model carrying those channel characteristics for pattern matching, and build a voiceprint model library and voiceprint models, which can effectively mitigate the impact of channel differences on voiceprint recognition performance.
  • the identification technology based on traditional password authentication is not safe: once the information is leaked, it can be stolen. Its convenience is also poor: users need to remember the account password and perform cumbersome manual input, and if the account or password is forgotten they must go through a password-recovery procedure.
  • the existing voiceprint recognition technology uses a traditional probabilistic model or a single machine learning method for feature training, but the recognition effect is not good enough, and the final recognition accuracy does not reach the expected level.
  • the purpose of the present invention is to overcome the deficiencies of the prior art.
  • the present invention provides a method and system for voiceprint recognition applied in electric power operations, so that the final recognition accuracy can be improved.
  • an embodiment of the present invention provides a method for voiceprint recognition applied in electric power operations, the method comprising:
  • the denoised voiceprint information is subjected to non-negative matrix decomposition NMF to extract features;
  • the removing noise and interference information in the voiceprint information includes:
  • the discretized voiceprint information is amplified
  • the transfer function H of the first-order high-frequency digital filter is H(z) = 1 - A*z^(-1)
  • H is the transfer function
  • A is defined as the energy amplification coefficient
  • the value range is 0.9 ⁇ A ⁇ 1
  • z represents the z transformation factor
  • the speech signal after amplification and strengthening is s'(n) = s(n) - A*s(n-1)
  • s(n) and s(n-1) are signals of different time periods before amplification.
  • the signal segmentation processing of the voiceprint information after the signal amplification processing includes:
  • w(n) is the window function used.
  • performing non-negative matrix factorization (NMF) feature extraction on the denoised voiceprint information comprises:
  • Q is the original high-dimensional data matrix
  • W is the non-negative value matrix for constructing the first element
  • H is the non-negative value matrix for constructing the second element
  • obtaining the spectrogram of the voiceprint information based on the NMF-processed voiceprint information includes:
  • STFT short-time Fourier transform
  • the processing of the spectrogram based on the convolutional neural network CNN voiceprint recognition algorithm includes:
  • the spectrogram is used as the input of CNN and processed by the convolutional layer;
  • an embodiment of the present invention also provides a system for voiceprint recognition applied in electric power operations, the system comprising:
  • the collection module is used to collect the voiceprint information of different people in the electric power operation scene
  • a denoising module configured to remove noise and interference information in the voiceprint information
  • the feature extraction module is used to perform non-negative matrix decomposition NMF feature extraction on the voiceprint information after denoising;
  • the spectrogram module is used to obtain the spectrogram of the voiceprint information based on the voiceprint information processed by NMF;
  • the CNN module is used to process the spectrogram based on the CNN voiceprint recognition algorithm of the convolutional neural network
  • the result output module is used to output voiceprint recognition results based on the convolutional neural network training model
  • the denoising module performs signal discretization on the collected voiceprint information; performs signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter; and performs signal segmentation on the amplified voiceprint information.
  • the CNN module takes the spectrogram as the input of CNN, and processes it through a convolutional layer; then performs pooling processing of a convolutional neural network CNN; and finally processes it through a fully connected layer of a convolutional neural network CNN.
  • the collected speech signal is converted into spectrogram form; the spectrogram is then used as input data, a convolutional neural network is used to train the model, and the voiceprint recognition result is further obtained, which improves the overall recognition effect;
  • in the feature extraction stage, NMF is used to extract features and multi-dimensional features are fused, yielding more accurate recognition results.
  • Fig. 1 is a flow chart of a method for voiceprint recognition applied in electric power work in an embodiment of the present invention
  • Fig. 2 is a schematic structural diagram of a voiceprint recognition system applied to electric power work in an embodiment of the present invention.
  • the voiceprint recognition method applied in electric power operations involved in the embodiment of the present invention includes: collecting voiceprint information of different people in the electric power operation scene; removing noise and interference information from the voiceprint information; performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features; obtaining the spectrogram of the voiceprint information based on the NMF-processed voiceprint information; processing the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm; and outputting the voiceprint recognition result based on the trained convolutional neural network model.
  • the collected speech signal is converted into spectrogram form; the spectrogram is then used as input data, a convolutional neural network is used to train the model, and the voiceprint recognition result is further obtained, which improves the overall recognition effect; using NMF to extract features in the feature extraction stage and fusing multi-dimensional features gives more accurate recognition results.
  • FIG. 1 shows a flow chart of a method for voiceprint recognition applied in electric power work in an embodiment of the present invention, including:
  • the voice collection part can obtain the voiceprint information of different people, and then carry out the data preprocessing process.
  • the data preprocessing mainly removes the noise and interference information from the collected voice information, and involves signal discretization, signal amplification, signal segmentation, and so on.
  • the sound signal that the staff directly outputs to the external space through the vocal organs is a one-dimensional time series, that is, an analog signal, but a computer can only process digital signals, so the continuous signal must first be discretized; signal features are then extracted and processed. According to Shannon's sampling theorem, the sampling frequency must be at least twice the highest frequency of the collected voice signal, so that the discretized signal retains as much of the original information as possible.
  • the energy of the voice signal produced by the staff is mainly distributed in the low-frequency band, with less in the high-frequency band.
  • attenuation during signal propagation causes some signal information to be lost. Therefore, the signal processed by the above steps can be input into the first-order high-frequency digital filter, thereby enhancing its energy.
  • the transfer function H of the first-order high-frequency digital filter is:
  • H is the transfer function
  • A is defined as the energy amplification coefficient
  • the value range is 0.9 ⁇ A ⁇ 1
  • z represents the z transformation factor
  • the speech signal is amplified and strengthened as follows:
  • s(n) and s(n-1) are signals of different time periods before amplification.
  • the voice signal is generally a non-stationary signal, which is difficult to process directly.
  • the voice signal can be regarded as a signal composed of multiple frames. After the voice signal is segmented, its expression is as follows:
  • w(n) is the window function used.
  • NMF non-negative matrix factorization
  • NMF non-negative matrix decomposition is used to extract features, and the decomposition process is as follows:
  • Q is the original high-dimensional data matrix
  • W is the non-negative value matrix for constructing the first element
  • H is the non-negative value matrix for constructing the second element
  • the objective function selection is based on the Euclidean distance objective function, as follows:
  • Common voiceprint feature parameters are LPCC, MFCC, PLP, and CQCC, each of which has its own focus.
  • the present invention adopts a multi-feature fusion method to train the model.
  • STFT short-time Fourier transform
  • the preprocessed signal is subjected to the short-time Fourier transform (STFT); the transform can be described by the following mathematical expression, where S_n is the segmented signal:
  • w represents the frequency
  • e^(jw) is the complex exponential
  • n, m, k are the sample counting points
  • N is the speech length.
  • the embodiment of the present invention proposes a CNN voiceprint recognition algorithm for classification and feature matching: the spectrogram is obtained first, and then passed into the CNN voiceprint recognition algorithm.
  • the spectrogram is used as the input of CNN and processed by the convolutional layer.
  • the convolutional layer processing process is as follows:
  • a^[l-1] is the input
  • l represents the l-th layer
  • ψ^[l] is the activation function
  • b^[l] denotes the bias.
  • the output results are processed by the fully connected layer.
  • the processing process of the fully connected layer is as follows:
  • w denotes the weight.
  • the method shown in Figure 1 above converts the collected speech signal into spectrogram form, then uses the spectrogram as input data and a convolutional neural network to train the model, and further obtains the voiceprint recognition result, which improves the overall recognition effect;
  • NMF is used to extract features, and multi-dimensional features are fused to obtain more accurate recognition results.
  • FIG. 2 shows a schematic structural diagram of a system for voiceprint recognition applied in electric power work in an embodiment of the present invention, and the system includes:
  • the collection module is used to collect the voiceprint information of different people in the electric power operation scene
  • a denoising module configured to remove noise and interference information in the voiceprint information
  • the feature extraction module is used to perform non-negative matrix decomposition NMF feature extraction on the voiceprint information after denoising;
  • the spectrogram module is used to obtain the spectrogram of the voiceprint information based on the voiceprint information processed by NMF;
  • the CNN module is used to process the spectrogram based on the CNN voiceprint recognition algorithm of the convolutional neural network
  • the result output module is used to output voiceprint recognition results based on the convolutional neural network training model
  • the denoising module performs signal discretization processing on the collected voiceprint information; performs signal amplification processing on the discretized voiceprint information based on the first-order high-frequency digital filter;
  • the pattern information is processed by signal segmentation.
  • the energy of the voice signal produced by the staff is mainly distributed in the low-frequency band, with less in the high-frequency band, and attenuation during signal propagation causes part of the signal information to be lost. Therefore, the signal processed by the above steps can be input into the first-order high-frequency digital filter, thereby enhancing its energy.
  • the sound signal directly output by the staff to the external space through the vocal organs is a one-dimensional time series, that is, an analog signal, but a computer can only process digital signals, so the continuous signal must first be discretized, after which signal features are extracted and processed.
  • according to Shannon's sampling theorem, the sampling frequency must be at least twice the highest frequency of the collected voice signal, so that the discretized signal retains as much of the original information as possible.
  • the CNN module uses the spectrogram as the input of the CNN, which is processed by the convolutional layer; then it is processed by the pooling of the convolutional neural network CNN; and finally it is processed by the fully connected layer of the convolutional neural network CNN.
  • the spectrogram module performs short-time Fourier transform (STFT) on the voiceprint information processed by NMF; performs discrete Fourier transform (DFT); and calculates the energy spectral density function P.
  • STFT short-time Fourier transform
  • DFT discrete Fourier transform
  • the feature extraction of the feature extraction module is mainly to extract the main feature parameters as much as possible to provide input data for subsequent training and testing.
  • the system shown in Figure 2 above converts the collected speech signal into spectrogram form, then uses the spectrogram as input data and a convolutional neural network to train the model, and further obtains the voiceprint recognition result, which improves the overall recognition effect;
  • NMF is used to extract features, and multi-dimensional features are fused to obtain more accurate recognition results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and system for voiceprint recognition applied in electric power operations. The method includes: collecting voiceprint information of different people in an electric power operation scene; removing noise and interference information from the voiceprint information; performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features; obtaining a spectrogram of the voiceprint information based on the NMF-processed voiceprint information; processing the spectrogram with a convolutional neural network (CNN) voiceprint recognition algorithm; and outputting a voiceprint recognition result based on the trained convolutional neural network model. In the embodiments of the invention, NMF is used to extract features in the feature extraction stage and multi-dimensional features are fused, so that a more accurate recognition result can be obtained.

Description

Method and system for voiceprint recognition applied in electric power operations
Technical field
The present invention relates to the field of computer technology, and in particular to a method and system for voiceprint recognition applied in electric power operations.
Background art
At present, ensuring the safe operation of the power grid places certain requirements on the professional skills of grid staff; only staff with good professional skills can keep the grid running stably and safely, so it must be ensured that professionals carry out operations under their own unique identity. Identity authentication technology is roughly divided into two branches: one is identity input, such as the account mode of logging in with a user name and password; the other is identity tokens, such as keys and certificates. These two methods have been accepted by the vast majority of people and are widely used. However, once criminals obtain another person's identity credentials by illegal means, they can pass the check smoothly and impersonate that person, ultimately causing heavy losses to the power grid system. To address this problem, identity authentication technologies based on biometric recognition and text recognition have emerged. A voiceprint is a kind of biometric feature; for the human body, the voiceprint is a long-term stable characteristic signal, and different individuals can be distinguished by voiceprint identification technology. Because each person's vocal tract and articulatory organs differ, and acquired factors such as physical condition and living environment cause the voices of different people to show differences on the spectrogram, this feature is unique. Using this characteristic, machine learning and artificial intelligence methods can be applied to identify different speakers.
The invention with application number 202011634585.4 discloses a voiceprint recognition method. The method mainly includes the following steps: obtaining an audio file; trimming the file to obtain a valid audio file; encrypting the valid audio file to obtain encrypted audio information; and sending a voiceprint recognition request that includes the encrypted audio information.
The invention with application number 201610641491.7 discloses a voiceprint recognition system. The method, device and voiceprint recognition system of that invention collect the channel characteristics of the sound in real time, preferentially select a voiceprint model carrying those channel characteristics for pattern matching, and build a voiceprint model library and voiceprint models, which can effectively mitigate the impact of channel differences on voiceprint recognition performance.
Identity recognition based on traditional password authentication is insecure: once the information is leaked it can be stolen. Its convenience is also poor: users must remember an account and password and perform cumbersome manual input, and if the account or password is forgotten they must go through a password-recovery procedure. Existing voiceprint recognition technology uses traditional probabilistic models or a single machine learning method to train features; the recognition effect is not good enough, and the final recognition accuracy falls short of expectations.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art. The present invention provides a method and system for voiceprint recognition applied in electric power operations, so that the final recognition accuracy can be improved.
In order to solve the above technical problem, an embodiment of the present invention provides a method for voiceprint recognition applied in electric power operations, the method comprising:
collecting voiceprint information of different people in an electric power operation scene;
removing noise and interference information from the voiceprint information;
performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features;
obtaining a spectrogram of the voiceprint information based on the NMF-processed voiceprint information;
processing the spectrogram with a convolutional neural network (CNN) voiceprint recognition algorithm;
outputting a voiceprint recognition result based on the trained convolutional neural network model.
Removing the noise and interference information from the voiceprint information includes:
performing signal discretization on the collected voiceprint information;
performing signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter;
performing signal segmentation on the amplified voiceprint information.
The transfer function H of the first-order high-frequency digital filter is:
H(z) = 1 - A*z^(-1)
where H is the transfer function, A is defined as the energy amplification coefficient with value range 0.9 < A < 1, and z represents the z-transform variable. After amplification and strengthening, the speech signal becomes:
s'(n) = s(n) - A*s(n-1)
where s'(n) is the amplified signal, and s(n) and s(n-1) are the signals at different time instants before amplification.
Performing signal segmentation on the amplified voiceprint information includes:
after the speech signal is segmented into frames, its expression is as follows:
s_w(n) = s(n)w(n);
where w(n) is the window function used.
Performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features includes:
extracting features by non-negative matrix factorization (NMF), with the decomposition process as follows:
Q = WH + E
where Q is the original high-dimensional data matrix, W is the non-negative matrix constructing the first factor, H is the non-negative matrix constructing the second factor, and E is the decomposition error.
Obtaining the spectrogram of the voiceprint information based on the NMF-processed voiceprint information includes:
performing a short-time Fourier transform (STFT) on the NMF-processed voiceprint information;
performing a discrete Fourier transform (DFT);
computing the energy spectral density function P.
Processing the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm includes:
taking the spectrogram as the input of the CNN and processing it through the convolutional layer;
then performing the pooling processing of the CNN;
finally processing through the fully connected layer of the CNN.
Correspondingly, an embodiment of the present invention further provides a system for voiceprint recognition applied in electric power operations, the system comprising:
a collection module, configured to collect voiceprint information of different people in an electric power operation scene;
a denoising module, configured to remove noise and interference information from the voiceprint information;
a feature extraction module, configured to perform non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features;
a spectrogram module, configured to obtain the spectrogram of the voiceprint information based on the NMF-processed voiceprint information;
a CNN module, configured to process the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm;
a result output module, configured to output the voiceprint recognition result based on the trained convolutional neural network model.
The denoising module performs signal discretization on the collected voiceprint information, performs signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter, and performs signal segmentation on the amplified voiceprint information.
The CNN module takes the spectrogram as the input of the CNN and processes it through the convolutional layer, then performs the pooling processing of the CNN, and finally processes it through the fully connected layer of the CNN.
In the embodiments of the present invention, the collected speech signal is converted into spectrogram form; the spectrogram is then used as input data, a convolutional neural network is used to train the model, and the voiceprint recognition result is further obtained, which improves the overall recognition effect. In the feature extraction stage, NMF is used to extract features and multi-dimensional features are fused, so that a more accurate recognition result can be obtained.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of a method for voiceprint recognition applied in electric power operations in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for voiceprint recognition applied in electric power operations in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The method for voiceprint recognition applied in electric power operations involved in the embodiments of the present invention includes: collecting voiceprint information of different people in an electric power operation scene; removing noise and interference information from the voiceprint information; performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features; obtaining a spectrogram of the voiceprint information based on the NMF-processed voiceprint information; processing the spectrogram with a convolutional neural network (CNN) voiceprint recognition algorithm; and outputting a voiceprint recognition result based on the trained convolutional neural network model. Here, the collected speech signal is converted into spectrogram form; the spectrogram is then used as input data, a convolutional neural network is used to train the model, and the voiceprint recognition result is further obtained, which improves the overall recognition effect. In the feature extraction stage, NMF is used to extract features and multi-dimensional features are fused, so that a more accurate recognition result can be obtained.
Specifically, Fig. 1 shows a flow chart of the method for voiceprint recognition applied in electric power operations in an embodiment of the present invention, including:
S101. Collecting voiceprint information of different people in an electric power operation scene.
The voice collection part can obtain the voiceprint information of different people; data preprocessing then follows. Data preprocessing mainly removes the noise and interference information from the collected voice information, and involves signal discretization, signal amplification, signal segmentation, and so on.
S102. Performing signal discretization on the collected voiceprint information.
In this signal discretization step, the sound signal that a worker outputs directly to the external space through the vocal organs is a one-dimensional time series, that is, an analog signal; however, a computer can only process digital signals, so the continuous signal must first be discretized, after which signal features are extracted and processed. According to Shannon's sampling theorem, the sampling frequency must be at least twice the highest frequency of the collected voice signal, so that the discretized signal retains as much of the original information as possible.
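As a rough illustration of the discretization step S102, the sketch below samples a continuous-time signal at a fixed rate; the 440 Hz test tone and the 16 kHz sampling rate are illustrative values chosen to satisfy the sampling theorem, not values given in the patent.

```python
import math

def sample(signal, duration_s, fs):
    """Discretize a continuous-time signal by sampling it at rate fs (Hz).
    Per the sampling theorem, fs should be at least twice the highest
    frequency present in the signal to avoid aliasing."""
    n_samples = int(duration_s * fs)
    return [signal(n / fs) for n in range(n_samples)]

tone = lambda t: math.sin(2 * math.pi * 440.0 * t)  # hypothetical 440 Hz tone
x = sample(tone, duration_s=0.01, fs=16000)         # 16 kHz >= 2 * 440 Hz
```

The result is a plain list of samples x(n) = s(n / fs) that downstream stages (filtering, framing) can operate on.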
S103. Performing signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter.
In this signal amplification step, the energy of the voice signal produced by a worker is mainly distributed in the low-frequency band, with less in the high-frequency band, and attenuation during signal propagation causes part of the signal information to be lost. Therefore, the signal processed by the above steps can be input into a first-order high-frequency digital filter, thereby enhancing its energy.
The transfer function H of the first-order high-frequency digital filter is:
H(z) = 1 - A*z^(-1)
where H is the transfer function, A is defined as the energy amplification coefficient with value range 0.9 < A < 1, and z represents the z-transform variable. After amplification and strengthening, the speech signal becomes:
s'(n) = s(n) - A*s(n-1)
where s'(n) is the amplified signal, and s(n) and s(n-1) are the signals at different time instants before amplification.
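A minimal sketch of the first-order high-frequency (pre-emphasis) filter above, computing s'(n) = s(n) - A*s(n-1); the choice A = 0.95 is an illustrative value inside the stated range 0.9 < A < 1.

```python
def pre_emphasize(s, A=0.95):
    """First-order high-frequency digital filter H(z) = 1 - A*z^-1,
    i.e. s_hat(n) = s(n) - A * s(n-1); boosts high-frequency energy."""
    if not 0.9 < A < 1:
        raise ValueError("A must satisfy 0.9 < A < 1")
    # The first sample has no predecessor, so it is passed through unchanged.
    return [s[0]] + [s[n] - A * s[n - 1] for n in range(1, len(s))]

y = pre_emphasize([1.0, 1.0, 1.0])
```

On a constant (purely low-frequency) input the output is strongly attenuated after the first sample, which is exactly the high-pass behavior the step relies on.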
S104. Performing signal segmentation on the amplified voiceprint information.
In this signal segmentation step, the speech signal is generally a non-stationary signal, which is difficult to process directly; the speech signal can be regarded as a signal composed of multiple frames. After the speech signal is segmented into frames, its expression is as follows:
s_w(n) = s(n)w(n);
where w(n) is the window function used.
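The framing-and-windowing step s_w(n) = s(n)w(n) can be sketched as below; the Hamming window and the 400-sample frame length with 160-sample hop are common illustrative choices, not values specified in the patent.

```python
import math

def frame_and_window(s, frame_len=400, hop=160):
    """Split a signal into overlapping frames and apply a Hamming
    window to each frame: s_w(n) = s(n) * w(n)."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
         for n in range(frame_len)]
    return [[s[start + n] * w[n] for n in range(frame_len)]
            for start in range(0, len(s) - frame_len + 1, hop)]

frames = frame_and_window([1.0] * 800)
```

Each frame is short enough to be treated as quasi-stationary, which is why the segmentation is done before any spectral analysis.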
S105. Performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features.
The feature extraction here mainly extracts the main feature parameters as much as possible to provide input data for subsequent training and testing. In the embodiment of the present invention, features are extracted by non-negative matrix factorization (NMF), with the decomposition process as follows:
Q = WH + E
where Q is the original high-dimensional data matrix, W is the non-negative matrix constructing the first factor, H is the non-negative matrix constructing the second factor, and E is the decomposition error.
In addition, the objective function is chosen as the Euclidean-distance-based objective function:
D = ||Q - WH||^2 = sum over i,j of (Q_ij - (WH)_ij)^2
Common voiceprint feature parameters are LPCC, MFCC, PLP and CQCC, each with its own emphasis; the present invention adopts a multi-feature fusion method to train the model.
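A small pure-Python sketch of NMF under the Euclidean objective ||Q - WH||^2, using the standard Lee-Seung multiplicative updates; the rank, iteration count, and random initialization are illustrative choices (a library such as scikit-learn would normally be used), and the toy matrix Q is hypothetical data, not from the patent.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(Q, rank, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix Q into W * H (Q = WH + E) by
    minimizing ||Q - WH||^2 with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(Q), len(Q[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, Q), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(Q, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

Q = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # rank-1 toy "feature" matrix
W, H = nmf(Q, rank=1)
R = matmul(W, H)
```

The multiplicative form of the updates keeps every entry of W and H non-negative throughout, which is what makes the factors usable as non-negative features.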
S106. Performing a short-time Fourier transform (STFT) on the NMF-processed voiceprint information.
The preprocessed signal is subjected to a short-time Fourier transform (STFT); the transform can be described by the following mathematical expressions, where S_n is the segmented signal:
S_n(e^(jw)) = sum over m of s(m) w(n-m) e^(-jwm)
S(n,k) = sum over m from 0 to N-1 of s(m) w(n-m) e^(-j2πkm/N)
S107. Performing a discrete Fourier transform (DFT).
A discrete Fourier transform (DFT) is then performed, where w represents the frequency, e^(jw) is the complex exponential, n, m and k are the sample counting points, and N is the speech length.
S108. Computing the energy spectral density function P.
The energy spectral density function P is then computed:
P(n,k) = |S(n,k)|^2 = (S(n,k)) × (conj(S(n,k)))
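The STFT/DFT and energy-spectral-density steps S106-S108 can be sketched as follows; a direct O(N^2) DFT is used for clarity, where a real implementation would use an FFT, and the 8-sample test frame is a hypothetical signal chosen to land exactly in one frequency bin.

```python
import cmath
import math

def frame_dft(frame):
    """DFT of one windowed frame: S(k) = sum_m s(m) * e^(-j*2*pi*k*m/N)."""
    N = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / N)
                for m in range(N)) for k in range(N)]

def spectrogram(frames):
    """Energy spectral density per frame n and frequency bin k:
    P(n, k) = |S(n, k)|^2 = S(n, k) * conj(S(n, k))."""
    return [[abs(c) ** 2 for c in frame_dft(f)] for f in frames]

# One 8-sample frame of a sinusoid that falls exactly in bin k = 2.
frame = [math.sin(2 * math.pi * 2 * m / 8) for m in range(8)]
P = spectrogram([frame])
```

Stacking the rows P(n, k) over all frames n gives the time-frequency energy matrix that is rendered as the spectrogram image fed to the CNN.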
S109. Obtaining the spectrogram of the voiceprint information.
After the spectrogram is generated by the above steps S105-S108, it is used as the input of the CNN. On the basis of the spectrogram, the embodiment of the present invention proposes a CNN voiceprint recognition algorithm for classification and feature matching: the spectrogram is obtained first, and then passed into the CNN voiceprint recognition algorithm.
S110. Taking the spectrogram as the input of the CNN and processing it through the convolutional layer.
After the spectrogram is generated, it is used as the input of the CNN and processed by the convolutional layer. The convolutional layer processing is as follows:
a^[l] = ψ^[l](w^[l] * a^[l-1] + b^[l])
where a^[l-1] is the input, l represents the l-th layer, ψ^[l] is the activation function, and b^[l] denotes the bias.
S111. Then performing the pooling processing of the CNN.
Pooling is then performed, and the process is as follows:
a^[l] = pool(a^[l-1], f^[l])
where pool(·) is the pooling function and f^[l] is the kernel.
S112. Finally processing through the fully connected layer of the CNN.
The output result is then processed by the fully connected layer, and the process is as follows:
a^[l] = ψ^[l](w^[l] a^[l-1] + b^[l])
where w represents the weight.
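An illustrative pure-Python forward pass through the three layer types named in S110-S112 (convolution, pooling, fully connected); the tiny 4x4 input, single 2x2 kernel, ReLU activation, and weight values are hypothetical, chosen only to show the data flow.

```python
def relu(x):
    return max(0.0, x)

def conv2d(a_prev, kernel, bias):
    """Convolutional layer (valid padding, stride 1):
    a[l] = relu(kernel (*) a[l-1] + b)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[relu(sum(a_prev[i + u][j + v] * kernel[u][v]
                      for u in range(kh) for v in range(kw)) + bias)
             for j in range(len(a_prev[0]) - kw + 1)]
            for i in range(len(a_prev) - kh + 1)]

def max_pool2d(a, f=2):
    """Pooling layer: max over each f x f window, stride f."""
    return [[max(a[i + u][j + v] for u in range(f) for v in range(f))
             for j in range(0, len(a[0]) - f + 1, f)]
            for i in range(0, len(a) - f + 1, f)]

def dense(a_flat, weights, biases):
    """Fully connected layer: z_i = w_i . a + b_i."""
    return [sum(w * x for w, x in zip(row, a_flat)) + b
            for row, b in zip(weights, biases)]

a0 = [[1.0] * 4 for _ in range(4)]                          # toy "spectrogram" patch
a1 = conv2d(a0, kernel=[[1.0, 0.0], [0.0, 1.0]], bias=0.0)  # 3x3 feature map
a2 = max_pool2d(a1, f=2)                                    # downsampled map
a3 = dense([v for row in a2 for v in row], weights=[[0.5]], biases=[1.0])
```

In a real recognizer the final dense outputs would be scored (e.g. with a softmax) against enrolled speakers; that classification head is omitted here.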
S113. Outputting the voiceprint recognition result based on the trained convolutional neural network model.
The method shown in Fig. 1 above converts the collected speech signal into spectrogram form, then uses the spectrogram as input data and a convolutional neural network to train the model, and further obtains the voiceprint recognition result, which improves the overall recognition effect. In the feature extraction stage, NMF is used to extract features and multi-dimensional features are fused, so that a more accurate recognition result can be obtained.
Correspondingly, Fig. 2 shows a schematic structural diagram of the system for voiceprint recognition applied in electric power operations in an embodiment of the present invention, the system comprising:
a collection module, configured to collect voiceprint information of different people in an electric power operation scene;
a denoising module, configured to remove noise and interference information from the voiceprint information;
a feature extraction module, configured to perform non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features;
a spectrogram module, configured to obtain the spectrogram of the voiceprint information based on the NMF-processed voiceprint information;
a CNN module, configured to process the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm;
a result output module, configured to output the voiceprint recognition result based on the trained convolutional neural network model.
It should be noted that the denoising module performs signal discretization on the collected voiceprint information, performs signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter, and performs signal segmentation on the amplified voiceprint information.
It should be noted that, in the signal amplification here, the energy of the voice signal produced by a worker is mainly distributed in the low-frequency band, with less in the high-frequency band, and attenuation during signal propagation causes part of the signal information to be lost; therefore, the signal processed by the above steps can be input into a first-order high-frequency digital filter, thereby enhancing its energy.
It should be noted that, in the signal discretization here, the sound signal that a worker outputs directly to the external space through the vocal organs is a one-dimensional time series, that is, an analog signal; however, a computer can only process digital signals, so the continuous signal must first be discretized, after which signal features are extracted and processed. According to Shannon's sampling theorem, the sampling frequency must be at least twice the highest frequency of the collected voice signal, so that the discretized signal retains as much of the original information as possible.
It should be noted that the CNN module takes the spectrogram as the input of the CNN and processes it through the convolutional layer, then performs the pooling processing of the CNN, and finally processes it through the fully connected layer of the CNN.
It should be noted that the spectrogram module performs a short-time Fourier transform (STFT) on the NMF-processed voiceprint information, performs a discrete Fourier transform (DFT), and computes the energy spectral density function P.
It should be noted that the feature extraction of the feature extraction module mainly extracts the main feature parameters as much as possible to provide input data for subsequent training and testing.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A method for voiceprint recognition applied in electric power operations, characterized in that the method comprises:
    collecting voiceprint information of different people in an electric power operation scene;
    removing noise and interference information from the voiceprint information;
    performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features;
    obtaining a spectrogram of the voiceprint information based on the NMF-processed voiceprint information;
    processing the spectrogram with a convolutional neural network (CNN) voiceprint recognition algorithm;
    outputting a voiceprint recognition result based on the trained convolutional neural network model.
  2. The method for voiceprint recognition applied in electric power operations according to claim 1, characterized in that removing the noise and interference information from the voiceprint information comprises:
    performing signal discretization on the collected voiceprint information;
    performing signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter;
    performing signal segmentation on the amplified voiceprint information.
  3. The method for voiceprint recognition applied in electric power operations according to claim 2, characterized in that the transfer function H of the first-order high-frequency digital filter is:
    H(z) = 1 - A*z^(-1)
    where H is the transfer function, A is defined as the energy amplification coefficient with value range 0.9 < A < 1, and z represents the z-transform variable; after amplification and strengthening, the speech signal becomes:
    s'(n) = s(n) - A*s(n-1)
    where s'(n) is the amplified signal, and s(n) and s(n-1) are the signals at different time instants before amplification.
  4. The method for voiceprint recognition applied in electric power operations according to claim 3, characterized in that performing signal segmentation on the amplified voiceprint information comprises:
    after the speech signal is segmented into frames, its expression is as follows:
    s_w(n) = s(n)w(n);
    where w(n) is the window function used.
  5. The method for voiceprint recognition applied in electric power operations according to claim 1, characterized in that performing non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features comprises:
    extracting features by non-negative matrix factorization (NMF), with the decomposition process as follows:
    Q = WH + E
    where Q is the original high-dimensional data matrix, W is the non-negative matrix constructing the first factor, H is the non-negative matrix constructing the second factor, and E is the decomposition error.
  6. The method for voiceprint recognition applied in electric power operations according to claim 1, characterized in that obtaining the spectrogram of the voiceprint information based on the NMF-processed voiceprint information comprises:
    performing a short-time Fourier transform (STFT) on the NMF-processed voiceprint information;
    performing a discrete Fourier transform (DFT);
    computing the energy spectral density function P.
  7. The method for voiceprint recognition applied in electric power operations according to any one of claims 1 to 6, characterized in that processing the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm comprises:
    taking the spectrogram as the input of the CNN and processing it through the convolutional layer;
    then performing the pooling processing of the CNN;
    finally processing through the fully connected layer of the CNN.
  8. A system for voiceprint recognition applied in electric power operations, characterized in that the system comprises:
    a collection module, configured to collect voiceprint information of different people in an electric power operation scene;
    a denoising module, configured to remove noise and interference information from the voiceprint information;
    a feature extraction module, configured to perform non-negative matrix factorization (NMF) on the denoised voiceprint information to extract features;
    a spectrogram module, configured to obtain the spectrogram of the voiceprint information based on the NMF-processed voiceprint information;
    a CNN module, configured to process the spectrogram with the convolutional neural network (CNN) voiceprint recognition algorithm;
    a result output module, configured to output the voiceprint recognition result based on the trained convolutional neural network model.
  9. The system for voiceprint recognition applied in electric power operations according to claim 8, characterized in that the denoising module performs signal discretization on the collected voiceprint information, performs signal amplification on the discretized voiceprint information based on a first-order high-frequency digital filter, and performs signal segmentation on the amplified voiceprint information.
  10. The system for voiceprint recognition applied in electric power operations according to claim 8, characterized in that the CNN module takes the spectrogram as the input of the CNN and processes it through the convolutional layer, then performs the pooling processing of the CNN, and finally processes it through the fully connected layer of the CNN.
PCT/CN2022/115882 2021-09-07 2022-08-30 Method and system for voiceprint recognition applied in electric power operations WO2023036016A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111044307.8 2021-09-07
CN202111044307.8A CN113823291A (zh) 2021-09-07 Method and system for voiceprint recognition applied in electric power operations

Publications (1)

Publication Number Publication Date
WO2023036016A1 true WO2023036016A1 (zh) 2023-03-16

Family

ID=78922041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115882 WO2023036016A1 (zh) 2021-09-07 2022-08-30 Method and system for voiceprint recognition applied in electric power operations

Country Status (2)

Country Link
CN (1) CN113823291A (zh)
WO (1) WO2023036016A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118522288A (zh) * 2024-07-24 2024-08-20 山东第一医科大学附属省立医院(山东省立医院) Otolaryngology patient identity verification method based on voiceprint recognition

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113823291A (zh) * 2021-09-07 2021-12-21 广西电网有限责任公司贺州供电局 Method and system for voiceprint recognition applied in electric power operations

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198576A (zh) * 2018-02-11 2018-06-22 华南理工大学 Preliminary screening method for Alzheimer's disease based on non-negative matrix factorization of speech features
CN109524014A (zh) * 2018-11-29 2019-03-26 辽宁工业大学 Voiceprint recognition analysis method based on a deep convolutional neural network
CN110459225A (zh) * 2019-08-14 2019-11-15 南京邮电大学 Speaker identification system based on CNN fused features
AU2020102038A4 (en) * 2020-08-28 2020-10-08 Jia, Yichen Mr A speaker identification method based on deep learning
CN112053695A (zh) * 2020-09-11 2020-12-08 北京三快在线科技有限公司 Voiceprint recognition method and apparatus, electronic device, and storage medium
CN112735436A (zh) * 2021-01-21 2021-04-30 国网新疆电力有限公司信息通信公司 Voiceprint recognition method and voiceprint recognition system
CN113823291A (zh) * 2021-09-07 2021-12-21 广西电网有限责任公司贺州供电局 Method and system for voiceprint recognition applied in electric power operations

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8015003B2 (en) * 2007-11-19 2011-09-06 Mitsubishi Electric Research Laboratories, Inc. Denoising acoustic signals using constrained non-negative matrix factorization
CN110767244B (zh) * 2018-07-25 2024-03-29 中国科学技术大学 Speech enhancement method
CN110534118B (zh) * 2019-07-29 2021-10-08 安徽继远软件有限公司 Transformer/reactor fault diagnosis method based on voiceprint recognition and neural networks
JP7373358B2 (ja) * 2019-10-30 2023-11-02 株式会社日立製作所 Sound extraction system and sound extraction method
CN111108554A (zh) * 2019-12-24 2020-05-05 广州国音智能科技有限公司 Voiceprint recognition method based on speech noise reduction and related apparatus
CN111312270B (zh) * 2020-02-10 2022-11-22 腾讯科技(深圳)有限公司 Speech enhancement method and apparatus, electronic device, and computer-readable storage medium
CN112053694A (zh) * 2020-07-23 2020-12-08 哈尔滨理工大学 Voiceprint recognition method based on fusion of CNN and GRU networks



Also Published As

Publication number Publication date
CN113823291A (zh) 2021-12-21

Similar Documents

Publication Publication Date Title
WO2023036016A1 (zh) Method and system for voiceprint recognition applied in electric power operations
CN108922541B (zh) Multi-dimensional feature-parameter voiceprint recognition method based on DTW and GMM models
CN106952649A (zh) Speaker recognition method based on a convolutional neural network and spectrograms
CN109215665A (zh) Voiceprint recognition method based on a 3D convolutional neural network
CN110767239A (zh) Voiceprint recognition method, apparatus and device based on deep learning
CN113129897B (zh) Voiceprint recognition method based on an attention-mechanism recurrent neural network
CN104887263B (zh) Identity recognition algorithm and system based on multi-dimensional heart-sound feature extraction
CN102237083A (zh) Portable spoken-language translation system based on the WinCE platform and its language recognition method
Patil et al. Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection.
Permana et al. Implementation of constant-Q transform (CQT) and mel spectrogram to converting bird’s sound
Li et al. A study of voice print recognition technology
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Huang et al. Audio-replay Attacks Spoofing Detection for Automatic Speaker Verification System
Sukor et al. Speaker identification system using MFCC procedure and noise reduction method
Sengupta et al. Optimization of cepstral features for robust lung sound classification
CN109003613A (zh) Anti-counterfeiting method for voiceprint-recognition payment information incorporating spatial information
CN115472168A (zh) Short-duration speech voiceprint recognition method, system and device coupling BGCC and PWPE features
CN107993666A (zh) Speech recognition method and apparatus, computer device and readable storage medium
Balpande et al. Speaker recognition based on mel-frequency cepstral coefficients and vector quantization
Jain et al. Speech features analysis and biometric person identification in multilingual environment
Zhipeng et al. Voiceprint recognition based on BP Neural Network and CNN
Kumar et al. Formant measure of Indian English vowels for speaker identity
Jain et al. Comparative study of voice print Based acoustic features: MFCC and LPCC
CN111652178A (zh) Robust and hard-to-copy heart-sound-feature identity recognition method
Abdulghani et al. Voice Signature Recognition for UAV Pilots Identity Verification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22866482

Country of ref document: EP

Kind code of ref document: A1