WO2021127975A1 - Method, device and equipment for detecting the voiceprint of a sound collection object - Google Patents

Info

Publication number
WO2021127975A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
collection object
sound
voiceprint
pcm
Prior art date
Application number
PCT/CN2019/127882
Other languages
English (en)
French (fr)
Inventor
陈昊亮
罗伟航
Original Assignee
广州国音智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州国音智能科技有限公司
Priority to CN201980003351.9A (published as CN111108553A)
Priority to PCT/CN2019/127882 (published as WO2021127975A1)
Publication of WO2021127975A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/16: Hidden Markov models [HMM]

Definitions

  • This application relates to the technical field of audio recognition, and in particular to a method, device and equipment for detecting voiceprints of a sound collection object.
  • A voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by electroacoustic instruments.
  • The vocal organs that different people use when speaking differ considerably in size and shape, so the voiceprint patterns of any two people are different.
  • With voiceprint recognition technology, the sound signal can be converted into an electrical signal and then recognized by a computer, yielding a voiceprint recognition result.
  • Voiceprint recognition requires collecting the sound of the collection object.
  • During sound collection, background sounds or sounds made by objects other than the collection object may be mixed in.
  • The sounds of these non-target objects interfere with the sound detection of the collection object. It is therefore necessary to perform voiceprint recognition on the collected audio to determine whether the audio has waveform distortion and whether it is the normal sound of the collection object.
  • the present application provides a method, device and equipment for detecting the voiceprint of a sound collection object, which are used to detect whether the collected audio is the normal sound of the collection object.
  • In view of this, the first aspect of the present application provides a method for detecting the voiceprint of a sound collection object, including:
  • inputting the frame rate matrix into a hidden Markov model, and determining whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  • Optionally, before the converting of the audio of the collection object into a WAV format file processed by PCM encoding, the method further includes:
  • collecting the audio of the collection object through a microphone.
  • Optionally, the frame rate matrix has 12 rows.
  • Optionally, before the inputting of the frame rate matrix into the hidden Markov model and the determining of whether the audio is normal audio of the collection object according to the output result of the hidden Markov model, the method further includes training the hidden Markov model.
  • a second aspect of the present application provides a voiceprint detection device for a sound collection object, including:
  • a conversion module, configured to convert the audio of the collection object into a WAV format file processed by PCM encoding;
  • a cutting module, configured to remove the silence at the beginning and the end of the WAV format file to obtain the to-be-processed PCM audio stream;
  • a framing module, configured to perform sound framing on the PCM audio stream based on a moving window function;
  • a feature extraction module, configured to perform waveform transformation on the framed PCM audio stream and obtain a frame rate matrix after voiceprint feature extraction;
  • a recognition module, configured to input the frame rate matrix into a hidden Markov model and determine whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  • Optionally, the device further includes:
  • a collection module, configured to collect the audio of the collection object through a microphone.
  • Optionally, the device further includes:
  • a training module, configured to train the hidden Markov model.
  • A third aspect of the present application provides equipment for detecting the voiceprint of a sound collection object; the equipment includes a processor and a memory:
  • the memory is used to store program code and transmit the program code to the processor;
  • the processor is configured to execute, according to the instructions in the program code, any one of the voiceprint detection methods for a sound collection object described in the first aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium used to store program code, where the program code is used to perform any one of the voiceprint detection methods for a sound collection object described in the first aspect.
  • A fifth aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the voiceprint detection methods for a sound collection object described in the first aspect.
  • In the present application, a method for detecting the voiceprint of a sound collection object is provided, including: converting the audio of the collection object into a WAV format file processed by PCM encoding; removing the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream; performing sound framing on the PCM audio stream based on a moving window function; performing waveform transformation on the framed PCM audio stream and obtaining a frame rate matrix after voiceprint feature extraction; and inputting the frame rate matrix into a hidden Markov model and determining, according to the output result of the hidden Markov model, whether the audio is normal audio of the collection object.
  • The voiceprint detection method provided by this application converts the collected audio of the collection object into a PCM-encoded WAV format file, then performs silence removal, framing and acoustic feature extraction to obtain a frame rate matrix, performs audio recognition with the hidden Markov model, and judges from the output result of the hidden Markov model whether the audio is the normal sound of the collection object, thereby recognizing and detecting whether the sound of the collection object is normal.
  • FIG. 1 is a schematic flowchart of a method for detecting voiceprints of a sound collection object provided in an embodiment of the application
  • FIG. 2 is a schematic diagram of another process of a method for detecting voiceprint of a sound collection object provided in an embodiment of the application;
  • FIG. 3 is a schematic structural diagram of a voiceprint detection device for a sound collection object provided in an embodiment of the application.
  • the method for detecting the voiceprint of a sound collection object in the embodiment of the present application includes:
  • Step 101: Convert the audio of the collection object into a WAV format file processed by PCM encoding.
  • The audio collected from the collection object needs to be converted into an uncompressed, pure-waveform Windows PCM file, that is, a WAV format file processed by PCM encoding.
  • The PCM stream of a WAV format file stores the file header and the waveform points of the sound; the coordinates of the waveform points are used to draw the sound waveform.
  • Step 102: Remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream.
  • Step 103: Perform sound framing on the PCM audio stream based on the moving window function.
  • The sound can be divided into frames by the moving window function, that is, cut into multiple small segments.
  • Step 104: Perform waveform transformation on the framed PCM audio stream, and obtain a frame rate matrix after voiceprint feature extraction.
  • After framing, the PCM audio stream needs to be waveform transformed.
  • Based on the physiological characteristics of the human ear, the acoustic feature extraction module extracts acoustic features from the sound waveform, yielding a 12-row frame rate matrix. N frames of speech are recognized as one state, every 3 states are combined into a phoneme, and multiple phonemes are combined into a word; for Chinese, initials and finals serve as the phoneme set. The state of each frame is determined by matching, with the trained acoustic model, the state value with the highest probability, so that each frame is assigned a state number.
  • Step 105: Input the frame rate matrix into the hidden Markov model, and judge whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  • To combine the audio frames that have been assigned different state numbers, a state network is constructed through the hidden Markov model and the path of the sound is matched in the state network, which decodes the audio and outputs a new PCM stream.
  • By marking the audio, the accuracy of each audio segment is judged. One way is to preset environment variables and determine whether each audio segment shows waveform distortion, thereby determining whether the audio is normal audio of the collection object.
  • The voiceprint detection method provided by the embodiment of this application converts the collected audio of the collection object into a PCM-encoded WAV format file, then performs silence removal, framing and acoustic feature extraction to obtain a frame rate matrix.
  • The hidden Markov model then performs audio recognition. From the output result of the hidden Markov model it is judged whether the audio is the normal sound of the collection object, which realizes the recognition and detection of whether the sound of the collection object is normal.
  • This application provides another embodiment of a method for detecting voiceprints of a sound collection object.
  • the method for detecting voiceprints of a sound collection object in this embodiment of the application includes:
  • Step 201: Collect the audio of the collection object through a microphone.
  • Step 202: Convert the audio of the collection object into a WAV format file processed by PCM encoding.
  • Step 203: Remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream.
  • Step 204: Perform sound framing on the PCM audio stream based on the moving window function.
  • Step 205: Perform waveform transformation on the framed PCM audio stream, and obtain a frame rate matrix after voiceprint feature extraction.
  • Steps 202 to 205 in this embodiment are the same as steps 101 to 104 of the previous embodiment and are not repeated here.
  • Step 206: Input the frame rate matrix into the hidden Markov model, and determine whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  • the hidden Markov model needs to be trained before it is used, and the hidden Markov model can be trained through the BW-GA method.
  • This application provides another embodiment of a voiceprint detection device for a sound collection object.
  • the voiceprint detection device for a sound collection object in this embodiment of the application includes:
  • the conversion module is used to convert the audio of the collection object into a WAV format file processed by PCM encoding.
  • the cutting module is used to remove the silence at the beginning and the end of the WAV format file to obtain the PCM audio stream to be processed.
  • the framing module is used to perform sound framing on the PCM audio stream based on the moving window function.
  • the feature extraction module is used to perform waveform transformation on the framed PCM audio stream and obtain the frame rate matrix after voiceprint feature extraction.
  • the recognition module is used to input the frame rate matrix into the hidden Markov model and judge whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  • the collection module is used to collect the audio of the collection object through a microphone.
  • the training module is used to train the hidden Markov model.
  • This application also provides an embodiment of equipment for detecting the voiceprint of a sound collection object; the equipment includes a processor and a memory:
  • the memory is used to store the program code and transmit the program code to the processor;
  • the processor is configured to execute, according to the instructions in the program code, any one of the voiceprint detection methods described in the foregoing method embodiments.
  • This application also provides a computer-readable storage medium used to store program code, where the program code is used to execute any one of the voiceprint detection methods described in the foregoing method embodiments.
  • This application also provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the voiceprint detection methods described in the foregoing method embodiments.
  • In the embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

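A method, device and equipment for detecting the voiceprint of a sound collection object. The method includes: converting the collected audio of the collection object into a PCM-encoded WAV format file (101); removing the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream (102); performing sound framing on the PCM audio stream based on a moving window function (103); performing waveform transformation on the framed PCM audio stream and obtaining a frame rate matrix after voiceprint feature extraction (104); and inputting the frame rate matrix into a hidden Markov model and determining, according to the output result of the hidden Markov model, whether the audio is normal audio of the collection object (105). The method realizes the recognition and detection of whether the sound of the collection object is normal.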

Description

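Method, device and equipment for detecting the voiceprint of a sound collection object
Technical Field
This application relates to the technical field of audio recognition, and in particular to a method, device and equipment for detecting the voiceprint of a sound collection object.
Background
A voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by electroacoustic instruments. The vocal organs that different people use when speaking differ considerably in size and shape, so the voiceprint patterns of any two people are different. With voiceprint recognition technology, the sound signal can be converted into an electrical signal and then recognized by a computer, yielding a voiceprint recognition result.
Voiceprint recognition requires collecting the sound of the collection object. During collection, background sounds or sounds made by objects other than the collection object may be mixed in, and the sounds of these non-target objects interfere with the sound detection of the collection object. It is therefore necessary to perform voiceprint recognition on the collected audio to determine whether the audio has waveform distortion and whether it is the normal sound of the collection object.
Summary
This application provides a method, device and equipment for detecting the voiceprint of a sound collection object, which are used to detect whether the collected audio is the normal sound of the collection object.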
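In view of this, the first aspect of this application provides a method for detecting the voiceprint of a sound collection object, including:
converting the audio of the collection object into a WAV format file processed by PCM encoding;
removing the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream;
performing sound framing on the PCM audio stream based on a moving window function;
performing waveform transformation on the framed PCM audio stream, and obtaining a frame rate matrix after voiceprint feature extraction;
inputting the frame rate matrix into a hidden Markov model, and determining whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
Optionally, before the converting of the audio of the collection object into a WAV format file processed by PCM encoding, the method further includes:
collecting the audio of the collection object through a microphone.
Optionally, the frame rate matrix has 12 rows.
Optionally, before the inputting of the frame rate matrix into the hidden Markov model and the determining of whether the audio is normal audio of the collection object according to the output result of the hidden Markov model, the method further includes:
training the hidden Markov model.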
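The second aspect of this application provides a device for detecting the voiceprint of a sound collection object, including:
a conversion module, configured to convert the audio of the collection object into a WAV format file processed by PCM encoding;
a cutting module, configured to remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream;
a framing module, configured to perform sound framing on the PCM audio stream based on a moving window function;
a feature extraction module, configured to perform waveform transformation on the framed PCM audio stream and obtain a frame rate matrix after voiceprint feature extraction;
a recognition module, configured to input the frame rate matrix into a hidden Markov model and determine whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
Optionally, the device further includes:
a collection module, configured to collect the audio of the collection object through a microphone.
Optionally, the device further includes:
a training module, configured to train the hidden Markov model.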
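The third aspect of this application provides equipment for detecting the voiceprint of a sound collection object; the equipment includes a processor and a memory:
the memory is used to store program code and transmit the program code to the processor;
the processor is configured to execute, according to the instructions in the program code, any one of the voiceprint detection methods for a sound collection object described in the first aspect.
The fourth aspect of this application provides a computer-readable storage medium used to store program code, where the program code is used to execute any one of the voiceprint detection methods for a sound collection object described in the first aspect.
The fifth aspect of this application provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the voiceprint detection methods for a sound collection object described in the first aspect.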
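As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
This application provides a method for detecting the voiceprint of a sound collection object, including: converting the audio of the collection object into a WAV format file processed by PCM encoding; removing the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream; performing sound framing on the PCM audio stream based on a moving window function; performing waveform transformation on the framed PCM audio stream and obtaining a frame rate matrix after voiceprint feature extraction; and inputting the frame rate matrix into a hidden Markov model and determining, according to the output result of the hidden Markov model, whether the audio is normal audio of the collection object. The method converts the collected audio of the collection object into a PCM-encoded WAV format file, then performs silence removal, framing and acoustic feature extraction to obtain a frame rate matrix, performs audio recognition with the hidden Markov model, and judges from the output result of the hidden Markov model whether the audio is the normal sound of the collection object, thereby realizing the recognition and detection of whether the sound of the collection object is normal.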
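Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for detecting the voiceprint of a sound collection object provided in an embodiment of this application;
FIG. 2 is another schematic flowchart of a method for detecting the voiceprint of a sound collection object provided in an embodiment of this application;
FIG. 3 is a schematic structural diagram of a device for detecting the voiceprint of a sound collection object provided in an embodiment of this application.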
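Detailed Description
To help those skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.
For ease of understanding, refer to FIG. 1. In one embodiment of the method for detecting the voiceprint of a sound collection object provided by this application, the method includes:
Step 101: Convert the audio of the collection object into a WAV format file processed by PCM encoding.
It should be noted that the audio collected from the collection object needs to be converted into an uncompressed, pure-waveform Windows PCM file, that is, a WAV format file processed by PCM encoding. The PCM stream of a WAV format file stores the file header and the waveform points of the sound, and the coordinates of the waveform points are used to draw the sound waveform.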
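As an illustration of step 101, the following is a minimal sketch of writing and reading an uncompressed PCM WAV file with Python's standard `wave` module. The function names `write_pcm_wav` and `read_pcm_wav`, the 16 kHz sample rate and the 16-bit mono layout are assumptions made for the example; the source does not specify them.

```python
import struct
import wave


def write_pcm_wav(path, samples, sample_rate=16000):
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM (assumed)
        wav.setframerate(sample_rate)  # the wave module writes the file header
        # WAV stores samples little-endian, hence the "<" in the format string.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_pcm_wav(path):
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav.getframerate()
        count = wav.getnframes()
        raw = wav.readframes(count)
    # Each sample is one "waveform point"; its index is the time coordinate.
    return sample_rate, list(struct.unpack("<%dh" % count, raw))
```

Plotting the returned `samples` against their indices gives the waveform diagram that the text refers to.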
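Step 102: Remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream.
It should be noted that, before the WAV format file is input into the algorithm model, the silence at the beginning and the end of the file needs to be removed to reduce interference.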
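The source does not say how silence is detected. A minimal sketch of step 102, assuming a simple absolute-amplitude threshold (the function name `trim_silence` and the default threshold of 500 are hypothetical), could look like this:

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute amplitude is below threshold.

    Quiet samples in the middle of the recording are kept; only the two
    ends are cut, matching the "beginning and end" wording of step 102.
    """
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

A production system would more likely decide on short-time energy per frame rather than per sample, so that a single noisy sample does not end the silent region; the per-sample rule is used here only to keep the sketch short.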
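Step 103: Perform sound framing on the PCM audio stream based on the moving window function.
It should be noted that the sound can be divided into frames by the moving window function, that is, cut into multiple small segments.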
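Step 103 can be sketched as sliding a window along the sample stream. The source does not name the window function or the frame sizes, so the Hamming window, the 400-sample frame length and the 160-sample hop below (25 ms and 10 ms at 16 kHz) are assumptions chosen because they are common in speech processing.

```python
import math


def frame_signal(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames, each multiplied by a Hamming window.

    frame_len and hop are in samples and frame_len must be at least 2.
    Trailing samples that do not fill a whole frame are dropped.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * window[n] for n in range(frame_len)])
    return frames
```

Because the hop is shorter than the frame, consecutive frames overlap, which keeps information near frame edges from being lost when the window tapers it toward zero.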
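Step 104: Perform waveform transformation on the framed PCM audio stream, and obtain a frame rate matrix after voiceprint feature extraction.
It should be noted that, after framing, the PCM audio stream needs to be waveform transformed. Based on the physiological characteristics of the human ear, the acoustic feature extraction module extracts acoustic features from the sound waveform, yielding a 12-row frame rate matrix. N frames of speech are recognized as one state, every 3 states are combined into a phoneme, and multiple phonemes are combined into a word; for Chinese, initials and finals serve as the phoneme set. The state of each frame is determined by matching, with the trained acoustic model, the state value with the highest probability, so that each frame is assigned a state number.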
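The source does not name the feature used in step 104. Mel-frequency cepstral coefficients (MFCCs) are one widely used feature motivated by human hearing, and keeping 12 coefficients per frame gives a matrix with 12 rows, so the sketch below uses MFCCs as an assumption. The function names, the 26-filter bank and the naive DFT are illustrative choices, not the patent's implementation.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    # Bin frequencies; exact when the frame length is even.
    bin_hz = [sample_rate / 2.0 * b / (num_bins - 1) for b in range(num_bins)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_matrix(frames, sample_rate=16000, num_filters=26, num_coeffs=12):
    """Return a num_coeffs x len(frames) matrix: one column per frame."""
    num_bins = len(frames[0]) // 2 + 1
    bank = mel_filterbank(num_filters, num_bins, sample_rate)
    columns = []
    for frame in frames:
        power = power_spectrum(frame)
        log_energy = [
            math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank
        ]
        # DCT-II of the log energies; coefficient 0 (overall level) is skipped.
        columns.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energy))
            for c in range(1, num_coeffs + 1)
        ])
    # Transpose so that rows are coefficients and columns are frames.
    return [list(row) for row in zip(*columns)]
```

In practice an FFT routine would replace the naive DFT; the matrix layout (12 rows, one column per frame) is the part that corresponds to the text.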
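Step 105: Input the frame rate matrix into the hidden Markov model, and judge whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
It should be noted that, to combine the audio frames that have been assigned different state numbers, a state network is constructed through the hidden Markov model and the path of the sound is matched in the state network, which decodes the audio and outputs a new PCM stream. By marking the audio, the accuracy of each audio segment is judged. One way is to preset environment variables and determine whether each audio segment shows waveform distortion, thereby determining whether the audio is normal audio of the collection object.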
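Matching a path through the state network in step 105 is commonly done with the Viterbi algorithm, although the source does not name it. The sketch below assumes a discrete-observation hidden Markov model (each frame already mapped to a symbol index) and adds a hypothetical decision helper, `looks_normal`, that compares the average log-probability per frame against a preset threshold; both the helper and the threshold are illustrative assumptions.

```python
import math


def viterbi(observations, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    num_states = len(start_p)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s] = best log-probability of any state path ending in state s.
    scores = [log(start_p[s]) + log(emit_p[s][observations[0]])
              for s in range(num_states)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(num_states):
            best_prev = max(range(num_states),
                            key=lambda r: scores[r] + log(trans_p[r][s]))
            new_scores.append(scores[best_prev] + log(trans_p[best_prev][s])
                              + log(emit_p[s][obs]))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(num_states), key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


def looks_normal(log_prob, num_frames, min_avg_log_prob):
    """Hypothetical decision rule: accept if the per-frame score is high enough."""
    return log_prob / num_frames >= min_avg_log_prob
```

For example, with two states, `start_p = [0.6, 0.4]`, `trans_p = [[0.7, 0.3], [0.4, 0.6]]`, `emit_p = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]` and observations `[0, 1, 2]`, the best path is `[0, 0, 1]` with probability 0.01512.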
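The voiceprint detection method provided by this embodiment converts the collected audio of the collection object into a PCM-encoded WAV format file, then performs silence removal, framing and acoustic feature extraction to obtain a frame rate matrix, performs audio recognition with the hidden Markov model, and judges from the output result of the hidden Markov model whether the audio is the normal sound of the collection object, thereby realizing the recognition and detection of whether the sound of the collection object is normal.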
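For ease of understanding, refer to FIG. 2. This application provides another embodiment of the method for detecting the voiceprint of a sound collection object, in which the method includes:
Step 201: Collect the audio of the collection object through a microphone.
It should be noted that, in this embodiment, the audio of the target collection object is first collected through a microphone.
Step 202: Convert the audio of the collection object into a WAV format file processed by PCM encoding.
Step 203: Remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream.
Step 204: Perform sound framing on the PCM audio stream based on the moving window function.
Step 205: Perform waveform transformation on the framed PCM audio stream, and obtain a frame rate matrix after voiceprint feature extraction.
It should be noted that steps 202 to 205 in this embodiment are the same as steps 101 to 104 of the previous embodiment and are not repeated here.
Step 206: Input the frame rate matrix into the hidden Markov model, and determine whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
It should be noted that the hidden Markov model needs to be trained before it is used; it can be trained through the BW-GA method.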
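The source does not expand "BW-GA". Assuming that "BW" refers to Baum-Welch re-estimation (the usual training procedure for hidden Markov models), one re-estimation step for a discrete-observation model can be sketched as below. The "GA" part is not shown, and all function names are hypothetical. The sketch uses unscaled probabilities, so it is only suitable for short sequences.

```python
def forward(obs, start_p, trans_p, emit_p):
    """alpha[t][s]: probability of obs[0..t] and being in state s at time t."""
    n = len(start_p)
    alpha = [[start_p[s] * emit_p[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([emit_p[s][o] * sum(prev[r] * trans_p[r][s] for r in range(n))
                      for s in range(n)])
    return alpha


def backward(obs, trans_p, emit_p):
    """beta[t][s]: probability of obs[t+1..] given state s at time t."""
    n = len(trans_p)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans_p[s][r] * emit_p[r][o] * nxt[r] for r in range(n))
                        for s in range(n)])
    return beta


def baum_welch_step(obs, start_p, trans_p, emit_p):
    """One EM re-estimation; returns (new_start, new_trans, new_emit, likelihood).

    The returned likelihood is that of obs under the parameters passed in.
    """
    n, num_symbols, T = len(start_p), len(emit_p[0]), len(obs)
    alpha = forward(obs, start_p, trans_p, emit_p)
    beta = backward(obs, trans_p, emit_p)
    likelihood = sum(alpha[-1])
    # gamma[t][s]: posterior probability of being in state s at time t.
    gamma = [[alpha[t][s] * beta[t][s] / likelihood for s in range(n)]
             for t in range(T)]
    # xi[t][r][s]: posterior probability of the transition r -> s at time t.
    xi = [[[alpha[t][r] * trans_p[r][s] * emit_p[s][obs[t + 1]] * beta[t + 1][s]
            / likelihood for s in range(n)] for r in range(n)]
          for t in range(T - 1)]
    new_start = gamma[0][:]
    new_trans = [[sum(xi[t][r][s] for t in range(T - 1))
                  / sum(gamma[t][r] for t in range(T - 1))
                  for s in range(n)] for r in range(n)]
    new_emit = [[sum(gamma[t][s] for t in range(T) if obs[t] == k)
                 / sum(gamma[t][s] for t in range(T))
                 for k in range(num_symbols)] for s in range(n)]
    return new_start, new_trans, new_emit, likelihood
```

Calling `baum_welch_step` repeatedly, feeding each result back in, never decreases the likelihood of the training sequence; iteration usually stops once the improvement falls below a tolerance.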
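For ease of understanding, refer to FIG. 3. This application provides an embodiment of a device for detecting the voiceprint of a sound collection object, in which the device includes:
a conversion module, configured to convert the audio of the collection object into a WAV format file processed by PCM encoding.
a cutting module, configured to remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream.
a framing module, configured to perform sound framing on the PCM audio stream based on the moving window function.
a feature extraction module, configured to perform waveform transformation on the framed PCM audio stream and obtain a frame rate matrix after voiceprint feature extraction.
a recognition module, configured to input the frame rate matrix into the hidden Markov model and judge whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
The device may further include:
a collection module, configured to collect the audio of the collection object through a microphone.
The device may further include:
a training module, configured to train the hidden Markov model.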
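This application also provides an embodiment of equipment for detecting the voiceprint of a sound collection object; the equipment includes a processor and a memory:
the memory is used to store program code and transmit the program code to the processor;
the processor is configured to execute, according to the instructions in the program code, any one of the voiceprint detection methods described in the foregoing method embodiments.
This application also provides a computer-readable storage medium used to store program code, where the program code is used to execute any one of the voiceprint detection methods described in the foregoing method embodiments.
This application also provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the voiceprint detection methods described in the foregoing method embodiments.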
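In the embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the various embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.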

Claims (10)

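  1. A method for detecting the voiceprint of a sound collection object, characterized by including:
    converting the audio of the collection object into a WAV format file processed by PCM encoding;
    removing the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream;
    performing sound framing on the PCM audio stream based on a moving window function;
    performing waveform transformation on the framed PCM audio stream, and obtaining a frame rate matrix after voiceprint feature extraction;
    inputting the frame rate matrix into a hidden Markov model, and determining whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  2. The method for detecting the voiceprint of a sound collection object according to claim 1, characterized in that, before the converting of the audio of the collection object into a WAV format file processed by PCM encoding, the method further includes:
    collecting the audio of the collection object through a microphone.
  3. The method for detecting the voiceprint of a sound collection object according to claim 1, characterized in that the frame rate matrix has 12 rows.
  4. The method for detecting the voiceprint of a sound collection object according to claim 1, characterized in that, before the inputting of the frame rate matrix into the hidden Markov model and the determining of whether the audio is normal audio of the collection object according to the output result of the hidden Markov model, the method further includes:
    training the hidden Markov model.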
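  5. A device for detecting the voiceprint of a sound collection object, characterized by including:
    a conversion module, configured to convert the audio of the collection object into a WAV format file processed by PCM encoding;
    a cutting module, configured to remove the silence at the beginning and the end of the WAV format file to obtain a to-be-processed PCM audio stream;
    a framing module, configured to perform sound framing on the PCM audio stream based on a moving window function;
    a feature extraction module, configured to perform waveform transformation on the framed PCM audio stream and obtain a frame rate matrix after voiceprint feature extraction;
    a recognition module, configured to input the frame rate matrix into a hidden Markov model and determine whether the audio is normal audio of the collection object according to the output result of the hidden Markov model.
  6. The device for detecting the voiceprint of a sound collection object according to claim 5, characterized by further including:
    a collection module, configured to collect the audio of the collection object through a microphone.
  7. The device for detecting the voiceprint of a sound collection object according to claim 5, characterized by further including:
    a training module, configured to train the hidden Markov model.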
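  8. Equipment for detecting the voiceprint of a sound collection object, characterized in that the equipment includes a processor and a memory:
    the memory is used to store program code and transmit the program code to the processor;
    the processor is configured to execute, according to the instructions in the program code, the method for detecting the voiceprint of a sound collection object according to any one of claims 1 to 4.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store program code, and the program code is used to execute the method for detecting the voiceprint of a sound collection object according to any one of claims 1 to 4.
  10. A computer program product including instructions, characterized in that, when run on a computer, it causes the computer to execute the method for detecting the voiceprint of a sound collection object according to any one of claims 1 to 4.
PCT/CN2019/127882 2019-12-24 2019-12-24 Method, device and equipment for detecting the voiceprint of a sound collection object WO2021127975A1 (zh)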

Priority Applications (2)

Application Number Priority Date Filing Date Title
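CN201980003351.9A CN111108553A (zh) 2019-12-24 2019-12-24 Method, device and equipment for detecting the voiceprint of a sound collection object
PCT/CN2019/127882 WO2021127975A1 (zh) 2019-12-24 2019-12-24 Method, device and equipment for detecting the voiceprint of a sound collection object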

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/127882 WO2021127975A1 (zh) 2019-12-24 2019-12-24 Method, device and equipment for detecting the voiceprint of a sound collection object

Publications (1)

Publication Number Publication Date
WO2021127975A1 (zh) 2021-07-01

Family

ID=70427482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127882 WO2021127975A1 (zh) 2019-12-24 2019-12-24 Method, device and equipment for detecting the voiceprint of a sound collection object

Country Status (2)

Country Link
CN (1) CN111108553A (zh)
WO (1) WO2021127975A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687295A (zh) * 2020-12-22 2021-04-20 联想(北京)有限公司 Input control method and electronic device
CN115240687A (zh) * 2022-06-30 2022-10-25 国网安徽省电力有限公司电力科学研究院 GIS voiceprint signal acquisition device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
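CN102129860A (zh) * 2011-04-07 2011-07-20 魏昕 Text-dependent speaker recognition method based on an infinite-state hidden Markov model
CN202124017U (zh) * 2011-06-10 2012-01-25 沈阳君天科技股份有限公司 Embedded-system-based device for starting a car directly by voice and for anti-theft
CN102404278A (zh) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 Song request system based on voiceprint recognition and application method thereof
CN102815279A (zh) * 2011-06-10 2012-12-12 沈阳君天科技股份有限公司 Embedded-system-based method and device for starting a car directly by voice and for anti-theft
CN104064189A (zh) * 2014-06-26 2014-09-24 厦门天聪智能软件有限公司 Modeling and verification method for voiceprint dynamic passwords
US20180358020A1 (en) * 2017-06-13 2018-12-13 Beijing Didi Infinity Technology And Development Co., Ltd. Method, apparatus and system for speaker verification
CN110390948A (zh) * 2019-07-24 2019-10-29 厦门快商通科技股份有限公司 Method and system for fast speech recognition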

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
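CN1808567A (zh) * 2006-01-26 2006-07-26 覃文华 Voiceprint authentication device for verifying the presence of a live person and authentication method thereof
US9460722B2 (en) * 2013-07-17 2016-10-04 Verint Systems Ltd. Blind diarization of recorded calls with arbitrary number of speakers
CN104464724A (zh) * 2014-12-08 2015-03-25 南京邮电大学 Speaker recognition method for deliberately disguised speech
CN108172241B (zh) * 2017-12-27 2020-11-17 上海传英信息技术有限公司 Music recommendation method and music recommendation system based on an intelligent terminal
CN108847217A (zh) * 2018-05-31 2018-11-20 平安科技(深圳)有限公司 Speech segmentation method, apparatus, computer device and storage medium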

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GU ZHIXIN: "Research on Mode and Algorithm of Identity Authentication Based on Voiceprint", CHINESE DOCTORAL DISSERTATIONS & MASTER'S THESES FULL-TEXT DATABASE, GRADUATE SCHOOL OF PEKING UNION MEDICAL COLLEGE, CN, 1 January 2005 (2005-01-01), CN, XP055827317, ISSN: 1671-6779 *

Also Published As

Publication number Publication date
CN111108553A (zh) 2020-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957478

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19957478

Country of ref document: EP

Kind code of ref document: A1