WO2021127994A1 - Voiceprint recognition method, apparatus, device and storage medium - Google Patents
Voiceprint recognition method, apparatus, device and storage medium Download PDF Info
- Publication number
- WO2021127994A1 (PCT/CN2019/127967)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- boltzmann machine
- restricted boltzmann
- bias
- data set
- spectrogram
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Definitions
- This application relates to the technical field of voiceprint recognition, and in particular to a voiceprint recognition method, apparatus, device, and storage medium.
- Voiceprint recognition is the process of comprehensively analyzing and comparing the acoustic features of the voice of an unknown or uncertain speaker with those of a known speaker, and concluding whether the two are the same speaker.
- Existing voiceprint recognition methods usually compare the spectrogram of the sample voice with the spectrogram of the questioned voice manually to obtain the voiceprint recognition result. This approach suffers from low efficiency and low recognition accuracy.
- This application provides a voiceprint recognition method, apparatus, device, and storage medium, which are used to solve the technical problems of low recognition efficiency and low accuracy in existing voiceprint recognition methods that recognize spectrograms through manual comparison.
- In view of this, the first aspect of this application provides a voiceprint recognition method, including:
- acquiring a voice to be recognized;
- extracting a first spectrogram of the voice to be recognized;
- inputting the first spectrogram into a preset restricted Boltzmann machine for feature extraction;
- inputting the extracted features into a preset SVM classifier to obtain a recognition result of the voice to be recognized.
- Preferably, before inputting the first spectrogram into the preset restricted Boltzmann machine for feature extraction, the method further includes:
- acquiring a training sample speech data set;
- extracting a second spectrogram of the training sample speech in the training sample speech data set;
- inputting the second spectrogram into a restricted Boltzmann machine and performing optimization training on the restricted Boltzmann machine to obtain target parameters, the target parameters including weight parameters, the biases of the visible units, and the biases of the hidden units;
- optimizing the biases of the hidden units based on a multi-objective optimization algorithm to obtain an optimized restricted Boltzmann machine;
- inputting the second spectrogram into the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features;
- inputting the voiceprint features into an SVM classifier and training the SVM classifier;
- calculating the recognition rate of the SVM classifier on the training sample speech data set;
- when the recognition rate is less than a threshold, returning to the step of inputting the second spectrogram into the restricted Boltzmann machine and performing optimization training to obtain the target parameters;
- when the recognition rate is greater than or equal to the threshold, obtaining the trained restricted Boltzmann machine and the trained SVM classifier, using the trained restricted Boltzmann machine as the preset restricted Boltzmann machine, and using the trained SVM classifier as the preset SVM classifier.
- Preferably, optimizing the biases of the hidden units based on the multi-objective optimization algorithm to obtain the optimized restricted Boltzmann machine includes:
- randomly selecting several bias parameters from the biases of the hidden units to generate a first bias data set;
- optimizing the first bias data set based on the multi-objective optimization algorithm to obtain a second bias data set;
- updating the bias parameters in the first bias data set based on the second bias data set to obtain the optimized restricted Boltzmann machine.
- Preferably, before extracting the first spectrogram of the voice to be recognized, the method further includes:
- preprocessing the voice to be recognized.
- the second aspect of the present application provides a voiceprint recognition device, including:
- the first acquisition module is used to acquire the voice to be recognized
- the first extraction module is used to extract the first spectrogram of the speech to be recognized
- the second extraction module is configured to input the first spectrogram into a preset restricted Boltzmann machine for feature extraction
- the recognition module is used to input the extracted features into the preset SVM classifier to obtain the recognition result of the voice to be recognized.
- Preferably, the voiceprint recognition device further includes:
- the second acquisition module is used to acquire a training sample speech data set
- the third extraction module is used to extract the second spectrogram of the training sample speech in the training sample speech data set;
- the first training module is configured to input the second spectrogram into a restricted Boltzmann machine and perform optimization training on the restricted Boltzmann machine to obtain target parameters, where the target parameters include weight parameters, the biases of the visible units, and the biases of the hidden units;
- the optimization module is used to optimize the biases of the hidden units based on a multi-objective optimization algorithm to obtain the optimized restricted Boltzmann machine;
- the fourth extraction module is configured to input the second spectrogram into the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features;
- the second training module is used to input the voiceprint feature into the SVM classifier and train the SVM classifier;
- a calculation module configured to calculate the recognition rate of the SVM classifier on the training sample speech data set
- a trigger module configured to trigger the first training module when the recognition rate is less than a threshold
- the output module is used to obtain the trained restricted Boltzmann machine and the trained SVM classifier when the recognition rate is greater than or equal to the threshold, use the trained restricted Boltzmann machine as the preset restricted Boltzmann machine, and use the trained SVM classifier as the preset SVM classifier.
- Preferably, the optimization module is specifically used to:
- randomly select several bias parameters from the biases of the hidden units to generate a first bias data set;
- optimize the first bias data set based on the multi-objective optimization algorithm to obtain a second bias data set;
- update the bias parameters in the first bias data set based on the second bias data set to obtain the optimized restricted Boltzmann machine.
- Preferably, the voiceprint recognition device further includes:
- the preprocessing module is used to preprocess the speech to be recognized.
- a third aspect of the present application provides a voiceprint recognition device, the device including a processor and a memory;
- the memory is used to store program code and transmit the program code to the processor
- the processor is configured to execute the voiceprint recognition method of any one of the first aspect according to the instructions in the program code.
- a fourth aspect of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store program code, and the program code is used to execute the voiceprint recognition method of any one of the first aspect.
- It can be seen from the above technical solutions that the present application provides a voiceprint recognition method, including: acquiring a voice to be recognized; extracting a first spectrogram of the voice to be recognized; inputting the first spectrogram into a preset restricted Boltzmann machine for feature extraction; and inputting the extracted features into a preset SVM classifier to obtain a recognition result of the voice to be recognized.
- This application uses a preset restricted Boltzmann machine to perform feature extraction on the first spectrogram extracted from the speech and inputs the extracted features into a preset SVM classifier for classification and recognition, without the need for manual comparison of spectrograms, which solves the technical problems of low recognition efficiency and low accuracy in existing voiceprint recognition methods that recognize spectrograms through manual comparison.
- FIG. 1 is a schematic flowchart of a voiceprint recognition method provided in an embodiment of this application;
- FIG. 2 is another schematic flowchart of a voiceprint recognition method provided in an embodiment of this application.
- FIG. 3 is a schematic structural diagram of a voiceprint recognition device provided in an embodiment of this application.
- For ease of understanding, referring to FIG. 1, an embodiment of a voiceprint recognition method provided in this application includes:
- Step 101 Obtain the voice to be recognized.
- It should be noted that the voice to be recognized can be obtained through a voice recording device.
- Step 102 Extract the first spectrogram of the speech to be recognized.
- the first spectrogram of the speech to be recognized can be obtained through a spectrograph.
- Step 103 Input the first spectrogram into a preset restricted Boltzmann machine for feature extraction.
- the preset restricted Boltzmann machine may be a trained restricted Boltzmann machine.
- Step 104 Input the extracted features into the preset SVM classifier to obtain the recognition result of the voice to be recognized.
- the preset SVM classifier may be a trained SVM classifier.
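Steps 101 to 104 can be sketched in code. This is a minimal illustration rather than the patent's implementation: the function names, the flattening of the spectrogram into the visible layer, and the stand-in classifier object are all assumptions, and the preset restricted Boltzmann machine is reduced to its deterministic hidden-unit activation.

```python
import numpy as np

def rbm_features(spectrogram, W, b_hidden):
    """Hidden-unit activations of a trained RBM serve as voiceprint features.

    spectrogram: the first spectrogram of the voice to be recognized.
    W, b_hidden: weight matrix and hidden biases learned during training.
    """
    v = spectrogram.reshape(-1)                        # visible layer: spectrogram values
    return 1.0 / (1.0 + np.exp(-(v @ W + b_hidden)))   # sigmoid activations

def recognize(spectrogram, W, b_hidden, svm_classifier):
    """Steps 103-104: RBM feature extraction followed by SVM classification."""
    features = rbm_features(spectrogram, W, b_hidden)
    return svm_classifier.predict(features[np.newaxis, :])[0]
```

Any object with a scikit-learn-style `predict` method could serve as the preset SVM classifier here.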
- In the voiceprint recognition method of this embodiment, the preset restricted Boltzmann machine performs feature extraction on the first spectrogram extracted from the speech, and the extracted features are input into the preset SVM classifier for classification and recognition, which solves the technical problems of low recognition efficiency and low accuracy in existing voiceprint recognition methods based on manual comparison of spectrograms.
- For ease of understanding, referring to FIG. 2, another embodiment of a voiceprint recognition method provided in this application includes:
- Step 201 Obtain a training sample speech data set.
- It should be noted that the training sample speech data set can be obtained from a voiceprint recognition database.
- Step 202 Extract a second spectrogram of the training sample speech in the training sample speech data set.
- The second spectrogram of the training sample speech can be obtained through the spectrograph. Before the second spectrogram is extracted, the training sample speech in the training sample speech data set can be denoised in preprocessing to reduce the influence of environmental noise or channel noise on the recognition result.
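The embodiments obtain spectrograms through a spectrograph; in software, the same time-frequency representation is commonly computed as a short-time Fourier transform. A minimal sketch with numpy follows; the frame length, hop size, Hann window, and log compression are illustrative choices, not values from the patent.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram: frames along rows, frequency bins along columns."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # one-sided magnitude spectrum
    return np.log1p(mag)                        # log compression
```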
- Step 203 Input the second spectrogram into the restricted Boltzmann machine, and perform optimization training on the restricted Boltzmann machine to obtain target parameters.
- It should be noted that the target parameters include the weight parameters, the biases of the visible units, and the biases of the hidden units.
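The patent does not name the training algorithm, so as one possibility the optimization training of step 203 can be sketched with one-step contrastive divergence (CD-1), a standard RBM training procedure; it yields exactly the three target parameters listed above: the weight matrix, the biases of the visible units, and the biases of the hidden units. Learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.01, epochs=10, seed=0):
    """One-step contrastive divergence (CD-1) on data scaled to [0, 1].

    data: (n_samples, n_visible) rows, e.g. flattened second spectrograms.
    Returns the target parameters: W, visible bias b_v, hidden bias b_h.
    """
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W + b_h)                     # positive phase
            h_sample = (rng.random(n_hidden) < h0) * 1.0   # sample hidden units
            v1 = sigmoid(W @ h_sample + b_v)               # reconstruction
            h1 = sigmoid(v1 @ W + b_h)                     # negative phase
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
            b_v += lr * (v0 - v1)
            b_h += lr * (h0 - h1)
    return W, b_v, b_h
```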
- Step 204 Optimize the bias of the hidden unit based on the multi-objective optimization algorithm to obtain the optimized restricted Boltzmann machine.
- It should be noted that several bias parameters are randomly selected from the biases of the hidden units to generate a first bias data set; the first bias data set is optimized based on the multi-objective optimization algorithm to obtain a second bias data set (optimizing a data set with a multi-objective optimization algorithm belongs to the prior art, so the specific optimization process is not described in detail here); the bias parameters in the first bias data set are then updated based on the second bias data set. Specifically, the bias parameters in the first bias data set are replaced with the parameters of the second bias data set, yielding the optimized restricted Boltzmann machine.
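Since the multi-objective optimization algorithm itself is cited as prior art and left unspecified, the sketch below only illustrates the surrounding bookkeeping of step 204: randomly selecting bias parameters into a first bias data set, passing them to a pluggable optimizer, and writing the resulting second bias data set back into the hidden-unit biases. The `optimize` callable is an assumption standing in for the unspecified algorithm.

```python
import numpy as np

def optimize_hidden_biases(b_h, n_select, optimize, seed=0):
    """Step 204 bookkeeping: select -> optimize -> write back.

    b_h: hidden-unit biases obtained from RBM training.
    optimize: callable implementing the multi-objective optimization
              (prior art in the patent, so pluggable here).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(b_h), size=n_select, replace=False)  # random selection
    first_set = b_h[idx]                # first bias data set
    second_set = optimize(first_set)    # second bias data set
    b_h = b_h.copy()
    b_h[idx] = second_set               # replace the selected parameters
    return b_h
```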
- Step 205 Input the second spectrogram to the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features.
- It should be noted that the optimized restricted Boltzmann machine is used for feature extraction, and the features it extracts help to improve the recognition rate.
- Step 206 Input the voiceprint feature into the SVM classifier, and train the SVM classifier.
- Step 207 Calculate the recognition rate of the SVM classifier on the training sample speech data set.
- It should be noted that the recognition rate is the ratio of the number of correctly recognized training sample speeches to the total number of training sample speeches in the data set.
- Step 208 When the recognition rate is less than the threshold, return to step 203; when the recognition rate is greater than or equal to the threshold, obtain the trained restricted Boltzmann machine and the trained SVM classifier, use the trained restricted Boltzmann machine as the preset restricted Boltzmann machine, and use the trained SVM classifier as the preset SVM classifier.
- It should be noted that when the recognition rate is less than the threshold, neither the restricted Boltzmann machine nor the SVM classifier has been trained well, so the process returns to step 203 and training continues iteratively; when the recognition rate is greater than or equal to the threshold, the trained restricted Boltzmann machine and the trained SVM classifier are obtained and can be used for voiceprint recognition.
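Steps 203 through 208 form an iterate-until-threshold loop. A schematic version follows; every callable parameter is an assumed interface rather than part of the patent, and the recognition rate is computed as the ratio defined in step 207.

```python
def train_until_threshold(spectrograms, labels, threshold,
                          train_rbm_step, optimize_biases, extract, fit_svm,
                          max_iters=100):
    """Iterate RBM training, bias optimization, and SVM training (steps 203-208)
    until the SVM's recognition rate on the training set reaches the threshold."""
    for _ in range(max_iters):
        params = train_rbm_step(spectrograms)       # step 203: W, b_v, b_h
        params = optimize_biases(params)            # step 204
        features = extract(spectrograms, params)    # step 205
        svm = fit_svm(features, labels)             # step 206
        predictions = svm(features)
        correct = sum(p == y for p, y in zip(predictions, labels))
        rate = correct / len(labels)                # step 207: recognition rate
        if rate >= threshold:                       # step 208: stop or iterate
            return params, svm, rate
    raise RuntimeError("recognition rate never reached the threshold")
```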
- Step 209 Obtain the voice to be recognized.
- It should be noted that the voice to be recognized can be obtained through a voice recording device.
- Step 210 Extract the first spectrogram of the voice to be recognized.
- It should be noted that the first spectrogram of the speech to be recognized can be obtained through the spectrograph. Before the first spectrogram is extracted, the speech to be recognized can be denoised in preprocessing to reduce the influence of noise on the recognition result.
- Step 211 Input the first spectrogram into a preset restricted Boltzmann machine for feature extraction.
- Step 212 Input the extracted features into the preset SVM classifier to obtain the recognition result of the speech to be recognized.
- It should be noted that step 211 and step 212 are consistent with step 103 and step 104, respectively, and will not be repeated here.
- For ease of understanding, referring to FIG. 3, an embodiment of a voiceprint recognition device provided by the present application includes:
- the first acquiring module 301 is used to acquire the voice to be recognized.
- the first extraction module 302 is used to extract the first spectrogram of the speech to be recognized.
- the second extraction module 303 is configured to input the first spectrogram into the preset restricted Boltzmann machine for feature extraction.
- the recognition module 304 is used to input the extracted features into the preset SVM classifier to obtain the recognition result of the voice to be recognized.
- the second acquiring module 305 is used to acquire a training sample speech data set.
- the third extraction module 306 is used to extract the second spectrogram of the training sample speech in the training sample speech data set.
- The first training module 307 is used to input the second spectrogram into the restricted Boltzmann machine, and perform optimization training on the restricted Boltzmann machine to obtain target parameters, where the target parameters include the weight parameters, the biases of the visible units, and the biases of the hidden units.
- the optimization module 308 is configured to optimize the bias of the hidden unit based on a multi-objective optimization algorithm to obtain the optimized restricted Boltzmann machine.
- the fourth extraction module 309 is configured to input the second spectrogram into the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features.
- the second training module 310 is used to input the voiceprint features into the SVM classifier to train the SVM classifier.
- the calculation module 311 is used to calculate the recognition rate of the training sample speech data set by the SVM classifier.
- the trigger module 312 is configured to trigger the first training module when the recognition rate is less than the threshold.
- The output module 313 is used to obtain the trained restricted Boltzmann machine and the trained SVM classifier when the recognition rate is greater than or equal to the threshold, use the trained restricted Boltzmann machine as the preset restricted Boltzmann machine, and use the trained SVM classifier as the preset SVM classifier.
- optimization module 308 is specifically used for:
- the bias parameters in the first bias data set are updated based on the second bias data set to obtain the optimized restricted Boltzmann machine.
- the preprocessing module 314 is used for preprocessing the speech to be recognized.
- This application provides an embodiment of a voiceprint recognition device, the device includes a processor and a memory;
- the memory is used to store the program code and transmit the program code to the processor
- the processor is configured to execute the voiceprint recognition method in the aforementioned voiceprint recognition method embodiment according to the instructions in the program code.
- This application provides an embodiment of a computer-readable storage medium, where the computer-readable storage medium is used to store program code, and the program code is used to execute the voiceprint recognition method in the aforementioned voiceprint recognition method embodiments.
- It should be understood that, in the several embodiments provided in this application, the disclosed device and method may be implemented in other ways.
- The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- The technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Circuit For Audible Band Transducer (AREA)
- Machine Translation (AREA)
Abstract
A voiceprint recognition method, apparatus, device, and storage medium, the method including: acquiring a voice to be recognized (101); extracting a first spectrogram of the voice to be recognized (102); inputting the first spectrogram into a preset restricted Boltzmann machine for feature extraction (103); and inputting the extracted features into a preset SVM classifier to obtain a recognition result of the voice to be recognized (104). The method performs feature extraction on the first spectrogram of the speech through the preset restricted Boltzmann machine and inputs the extracted features into the preset SVM classifier for classification and recognition, which solves the technical problems of low recognition efficiency and low accuracy in existing voiceprint recognition methods that recognize spectrograms through manual comparison.
Claims (10)
- A voiceprint recognition method, characterized by comprising: acquiring a voice to be recognized; extracting a first spectrogram of the voice to be recognized; inputting the first spectrogram into a preset restricted Boltzmann machine for feature extraction; and inputting the extracted features into a preset SVM classifier to obtain a recognition result of the voice to be recognized.
- The voiceprint recognition method according to claim 1, characterized in that before inputting the first spectrogram into the preset restricted Boltzmann machine for feature extraction, the method further comprises: acquiring a training sample speech data set; extracting a second spectrogram of the training sample speech in the training sample speech data set; inputting the second spectrogram into a restricted Boltzmann machine and performing optimization training on the restricted Boltzmann machine to obtain target parameters, the target parameters comprising weight parameters, the biases of the visible units, and the biases of the hidden units; optimizing the biases of the hidden units based on a multi-objective optimization algorithm to obtain an optimized restricted Boltzmann machine; inputting the second spectrogram into the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features; inputting the voiceprint features into an SVM classifier and training the SVM classifier; calculating the recognition rate of the SVM classifier on the training sample speech data set; when the recognition rate is less than a threshold, returning to the step of inputting the second spectrogram into the restricted Boltzmann machine and performing optimization training on the restricted Boltzmann machine to obtain the target parameters; and when the recognition rate is greater than or equal to the threshold, obtaining the trained restricted Boltzmann machine and the trained SVM classifier, using the trained restricted Boltzmann machine as the preset restricted Boltzmann machine, and using the trained SVM classifier as the preset SVM classifier.
- The voiceprint recognition method according to claim 2, characterized in that optimizing the biases of the hidden units based on the multi-objective optimization algorithm to obtain the optimized restricted Boltzmann machine comprises: randomly selecting several bias parameters from the biases of the hidden units to generate a first bias data set; optimizing the first bias data set based on the multi-objective optimization algorithm to obtain a second bias data set; and updating the bias parameters in the first bias data set based on the second bias data set to obtain the optimized restricted Boltzmann machine.
- The voiceprint recognition method according to claim 1, characterized in that before extracting the first spectrogram of the voice to be recognized, the method further comprises: preprocessing the voice to be recognized.
- A voiceprint recognition apparatus, characterized by comprising: a first acquisition module for acquiring a voice to be recognized; a first extraction module for extracting a first spectrogram of the voice to be recognized; a second extraction module for inputting the first spectrogram into a preset restricted Boltzmann machine for feature extraction; and a recognition module for inputting the extracted features into a preset SVM classifier to obtain a recognition result of the voice to be recognized.
- The voiceprint recognition apparatus according to claim 5, characterized by further comprising: a second acquisition module for acquiring a training sample speech data set; a third extraction module for extracting a second spectrogram of the training sample speech in the training sample speech data set; a first training module for inputting the second spectrogram into a restricted Boltzmann machine and performing optimization training on the restricted Boltzmann machine to obtain target parameters, the target parameters comprising weight parameters, the biases of the visible units, and the biases of the hidden units; an optimization module for optimizing the biases of the hidden units based on a multi-objective optimization algorithm to obtain an optimized restricted Boltzmann machine; a fourth extraction module for inputting the second spectrogram into the optimized restricted Boltzmann machine for feature extraction, so that the optimized restricted Boltzmann machine outputs voiceprint features; a second training module for inputting the voiceprint features into an SVM classifier and training the SVM classifier; a calculation module for calculating the recognition rate of the SVM classifier on the training sample speech data set; a trigger module for triggering the first training module when the recognition rate is less than a threshold; and an output module for obtaining the trained restricted Boltzmann machine and the trained SVM classifier when the recognition rate is greater than or equal to the threshold, using the trained restricted Boltzmann machine as the preset restricted Boltzmann machine and the trained SVM classifier as the preset SVM classifier.
- The voiceprint recognition apparatus according to claim 6, characterized in that the optimization module is specifically configured to: randomly select several bias parameters from the biases of the hidden units to generate a first bias data set; optimize the first bias data set based on the multi-objective optimization algorithm to obtain a second bias data set; and update the bias parameters in the first bias data set based on the second bias data set to obtain the optimized restricted Boltzmann machine.
- The voiceprint recognition apparatus according to claim 5, characterized by further comprising: a preprocessing module for preprocessing the voice to be recognized.
- A voiceprint recognition device, characterized in that the device comprises a processor and a memory; the memory is configured to store program code and transmit the program code to the processor; and the processor is configured to execute the voiceprint recognition method according to any one of claims 1-4 according to the instructions in the program code.
- A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, and the program code is used to execute the voiceprint recognition method according to any one of claims 1-4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/127967 WO2021127994A1 (zh) | 2019-12-24 | 2019-12-24 | Voiceprint recognition method, apparatus, device and storage medium |
CN201980003324.1A CN111149154B (zh) | 2019-12-24 | 2019-12-24 | Voiceprint recognition method, apparatus, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/127967 WO2021127994A1 (zh) | 2019-12-24 | 2019-12-24 | Voiceprint recognition method, apparatus, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021127994A1 true WO2021127994A1 (zh) | 2021-07-01 |
Family
ID=70525106
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/127967 WO2021127994A1 (zh) | 2019-12-24 | 2019-12-24 | 一种声纹识别方法、装置、设备和储存介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111149154B (zh) |
WO (1) | WO2021127994A1 (zh) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150279351A1 (en) * | 2012-12-19 | 2015-10-01 | Google Inc. | Keyword detection based on acoustic alignment |
- CN108510979A (zh) * | 2017-02-27 | 2018-09-07 | Yutou Technology (Hangzhou) Co., Ltd. | Training method for a mixed-frequency acoustic recognition model and speech recognition method |
- CN108831486A (zh) * | 2018-05-25 | 2018-11-16 | Nanjing University of Posts and Telecommunications | Speaker recognition method based on DNN and GMM models |
- CN110111797A (zh) * | 2019-04-04 | 2019-08-09 | Hubei University of Technology | Speaker recognition method based on Gaussian supervectors and a deep neural network |
-
2019
- 2019-12-24 CN CN201980003324.1A patent/CN111149154B/zh active Active
- 2019-12-24 WO PCT/CN2019/127967 patent/WO2021127994A1/zh active Application Filing
Non-Patent Citations (4)
Title |
---|
GUO WANGPENG: "Research of Speaker Recognition Technology Based on Deep Learning", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, 1 May 2019 (2019-05-01), XP055825850 * |
JIA YANJIE, CHEN XI;YU JIE-QIONG;WANG LIAN-MING: "Fast Speaker Recognition Based on Characteristic Spectrogram and an Adaptive Clustering Self-organizing Feature Map", SCIENCE TECHNOLOGY AND ENGINEERING, ZHONGGUO JISHU JINGJI YANJIUHUI, CN, vol. 19, no. 15, 1 January 2019 (2019-01-01), CN, pages 211 - 218, XP055825848, ISSN: 1671-1815 * |
W.M. CAMPBELL ; D.E. STURIM ; D.A. REYNOLDS ; A. SOLOMONOFF: "SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2006. ICASSP 2006 PROCEEDINGS . 2006 IEEE INTERNATIONAL CONFERENCE ON TOULOUSE, FRANCE 14-19 MAY 2006, PISCATAWAY, NJ, USA,IEEE, PISCATAWAY, NJ, USA, 14 May 2006 (2006-05-14), Piscataway, NJ, USA, pages I - I, XP031330837, ISBN: 978-1-4244-0469-8 * |
YONG FENG, QINGYU XIONG, WEIREN SHI, JUNHUA CAO: "Speaker feature extraction algorithm based on restricted Boltzmann", YIQI YIBIAO XUEBAO - CHINESE JOURNAL OF SCIENTIFIC INSTRUMENT, ZHONGGUO YIQI YIBIAO XUEHUI, BEIJING, CN, vol. 37, no. 2, 1 February 2016 (2016-02-01), CN, pages 256 - 262, XP055827114, ISSN: 0254-3087, DOI: 10.19650/j.cnki.cjsi.2016.02.003 * |
Also Published As
Publication number | Publication date |
---|---|
CN111149154B (zh) | 2021-08-24 |
CN111149154A (zh) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lavrentyeva et al. | STC antispoofing systems for the ASVspoof2019 challenge | |
- CN107492382B (zh) | Neural-network-based voiceprint information extraction method and apparatus | |
US10957339B2 (en) | Speaker recognition method and apparatus, computer device and computer-readable medium | |
- WO2020073694A1 (zh) | Voiceprint recognition method, model training method, and server | |
- CN108305641B (zh) | Method and apparatus for determining emotion information | |
EP3614377A1 (en) | Object identifying method, computer device and computer readable storage medium | |
- WO2018176894A1 (zh) | Speaker verification method and apparatus | |
- CN109637545B (zh) | Voiceprint recognition method based on a one-dimensional convolutional asymmetric bidirectional long short-term memory network | |
- WO2020155584A1 (zh) | Voiceprint feature fusion method and apparatus, speech recognition method, system, and storage medium | |
- WO2019134247A1 (zh) | Voiceprint registration method based on a voiceprint recognition model, terminal device, and storage medium | |
- CN107229627B (zh) | Text processing method, apparatus, and computing device | |
- WO2019200744A1 (zh) | Self-updating anti-fraud method, apparatus, computer device, and storage medium | |
- WO2021159902A1 (zh) | Age recognition method, apparatus, device, and computer-readable storage medium | |
- CN104167208A (zh) | Speaker recognition method and apparatus | |
- JP2014502375A (ja) | Device and method for passphrase modeling for speaker verification, and speaker verification system | |
- CN104538035B (zh) | Speaker recognition method and system based on Fisher supervectors | |
- CN111524527A (zh) | Speaker separation method, apparatus, electronic device, and storage medium | |
- CN109785846B (zh) | Role recognition method and apparatus for monaural speech data | |
- WO2022134798A1 (zh) | Natural-language-based sentence segmentation method, apparatus, device, and storage medium | |
- CN113450830B (zh) | Speech emotion recognition method using a convolutional recurrent neural network with multiple attention mechanisms | |
- WO2021128003A1 (zh) | Voiceprint identity verification method and related apparatus | |
- CN112652313B (zh) | Voiceprint recognition method, apparatus, device, storage medium, and program product | |
- WO2021127990A1 (zh) | Voiceprint recognition method based on speech noise reduction and related apparatus | |
- CN105280181B (zh) | Training method for a language identification model and language identification method | |
- WO2021127976A1 (zh) | Method and apparatus for selecting phonemes available for comparison |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19958002 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.11.2022) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19958002 Country of ref document: EP Kind code of ref document: A1 |