WO2021128003A1 - Voiceprint identity identification method and related device - Google Patents

Voiceprint identity identification method and related device Download PDF

Info

Publication number
WO2021128003A1
WO2021128003A1 (PCT/CN2019/127987)
Authority
WO
WIPO (PCT)
Prior art keywords
target phoneme
sample
formant
frequency deviation
formant frequency
Prior art date
Application number
PCT/CN2019/127987
Other languages
English (en)
French (fr)
Inventor
郑琳琳
Original Assignee
广州国音智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州国音智能科技有限公司 filed Critical 广州国音智能科技有限公司
Priority to CN201980003349.1A priority Critical patent/CN111108552A/zh
Priority to PCT/CN2019/127987 priority patent/WO2021128003A1/zh
Publication of WO2021128003A1 publication Critical patent/WO2021128003A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates

Definitions

  • the present invention relates to the technical field of voiceprint identification, in particular to a voiceprint identity identification method and related devices.
  • Voiceprint identity identification refers to the process of comprehensively analyzing and comparing the acoustic features of the speech of an unknown or uncertain speaker with those of a known speaker, and concluding whether the two are the same person.
  • in the prior art, the main method is to display the sample speech as a spectrogram, manually locate phonemes one by one, and compare voiceprint features individually; this method is inefficient.
  • the present application provides a voiceprint identity identification method and related devices, which are used to solve the technical problem that existing voiceprint identity identification methods are inefficient because voiceprint features are compared manually one by one.
  • the first aspect of this application provides a voiceprint identity identification method, including: obtaining a sample speech; extracting 4 formants of the target phoneme in the sample speech; calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech (the speech under examination) to obtain a deviation result; and
  • obtaining the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
  • preferably, before the calculating of the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain the deviation result, the method further includes: obtaining the questioned speech;
  • extracting the 4 formants of the target phoneme in the questioned speech.
  • said extracting of the 4 formants of the target phoneme in the sample speech includes:
  • extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding (LPC).
  • the preset formant frequency deviation standard includes:
  • when the 4 formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the questioned speech satisfy all of the following: the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, it is determined that the target phoneme in the sample speech and the target phoneme in the questioned speech
  • belong to the same speaker;
  • when any one of these conditions is not satisfied, it is determined that the target phoneme in the sample speech and the target phoneme in the questioned speech
  • belong to different speakers.
  • the second aspect of the present application provides a voiceprint identity identification device, including:
  • the first acquisition module is used to acquire a sample speech
  • the first extraction module is used to extract 4 formants of the target phoneme in the sample speech
  • a calculation module configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result
  • the identification module is used to obtain the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and the preset formant frequency deviation standard.
  • it also includes:
  • the second acquisition module is configured to acquire the questioned speech
  • the second extraction module is used to extract the 4 formants of the target phoneme in the questioned speech.
  • the first extraction module is specifically used for:
  • the 4 formants of the target phoneme in the sample speech are extracted based on the linear predictive coding technology.
  • a third aspect of the present application provides a voiceprint identity identification device, the device including a processor and a memory;
  • the memory is used to store program code and transmit the program code to the processor
  • the processor is configured to execute any one of the voiceprint identity identification methods described in the first aspect according to instructions in the program code.
  • the fourth aspect of the present application provides a computer-readable storage medium, the computer-readable storage medium is used to store program code, and the program code is used to execute any of the voiceprint identity identification methods described in the first aspect.
  • the fifth aspect of the present application provides a computer program product including instructions, which when run on a computer, causes the computer to execute any of the voiceprint identity identification methods described in the first aspect.
  • This application provides a voiceprint identity identification method, including: obtaining a sample speech; extracting 4 formants of the target phoneme in the sample speech; calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result; and obtaining the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
  • the voiceprint identity identification method provided in this application extracts the four formants of the target phoneme in the sample speech, calculates the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result, and uses
  • the correspondence between the deviation result and the preset formant frequency deviation standard to determine whether the target phoneme in the sample speech and the target phoneme in the questioned speech come from the same speaker, so as to obtain
  • the voiceprint identity identification result, solving the technical problem that existing methods, which compare voiceprint features manually one by one, are inefficient.
  • FIG. 1 is a schematic flowchart of an embodiment of a voiceprint identity identification method provided by this application
  • FIG. 2 is a schematic flowchart of another embodiment of a voiceprint identity identification method provided by this application.
  • FIG. 3 is a schematic structural diagram of an embodiment of a voiceprint identity identification device provided by this application.
  • An embodiment of a voiceprint identity identification method provided in this application includes:
  • Step 101 Obtain a sample speech.
  • the sample speech can be obtained through a voice recording device.
  • Step 102 Extract 4 formants of the target phoneme in the sample speech.
  • a sample speech may contain multiple different phonemes, and each phoneme generally has 4 formants; when extracting formants from the sample speech, a phoneme that does not have 4 formants cannot be used as the target phoneme.
  • Step 103 Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result.
  • the target phoneme in the questioned speech also has 4 formants, and the calculated deviation result includes 4 formant frequency deviation values.
  • Step 104 According to the correspondence between the deviation result and the preset formant frequency deviation standard, obtain the voiceprint identity identification result of the target phoneme.
  • the voiceprint identity identification method in this embodiment extracts the four formants of the target phoneme in the sample speech and calculates the formant frequency deviation between each formant of the target phoneme in the sample speech and each
  • formant of the target phoneme in the questioned speech to obtain a deviation result.
  • from the correspondence between the deviation result and the preset formant frequency deviation standard, it is judged whether the target phoneme in the sample speech and the target phoneme in the questioned speech come from the same speaker, yielding the voiceprint identity identification result and solving the technical problem that existing methods, which compare voiceprint features manually one by one, are inefficient.
  • Referring to FIG. 2, another embodiment of a voiceprint identity identification method provided in this application includes:
  • Step 201 Obtain the questioned speech.
  • the questioned speech can be obtained from a voiceprint identification database.
  • Step 202 Extract 4 formants of the target phoneme in the questioned speech.
  • a questioned speech may contain multiple different phonemes, and each phoneme generally has 4 formants; a phoneme that does not have 4 formants cannot be used as the target phoneme.
  • the formants can be extracted by linear predictive coding.
  • Step 203 Obtain a sample speech.
  • step 203 and step 201 can be performed simultaneously or sequentially.
  • Step 204 Extract 4 formants of the target phoneme in the sample speech.
  • the formants can be extracted by linear predictive coding, where the target phoneme in the sample speech and the target phoneme in the questioned speech are the same phoneme.
  • Step 205 Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result.
  • the target phoneme in the questioned speech also has 4 formants, and the calculated deviation result includes 4 formant frequency deviation values.
  • the calculation of the formant frequency deviation belongs to the prior art, so the specific calculation process is not described in detail here.
  • Step 206 According to the corresponding relationship between the deviation result and the preset formant frequency deviation standard, obtain the voiceprint identity identification result of the target phoneme.
  • preset formant frequency deviation standards include:
  • when the 4 formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the questioned speech satisfy all of the following: the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, it is determined that the target phoneme in the sample speech and the target phoneme of the questioned speech belong to the same speaker;
  • when any one of these conditions is not satisfied, it is determined that the target phoneme in the sample speech and the target phoneme of the questioned speech belong to different speakers.
  • when the 4 deviation values in the deviation result satisfy the preset standard, the identification result is that the two target phonemes belong to the same speaker. For example, suppose the four formant frequency deviations between the target phoneme of the questioned speech and the target phoneme of the sample speech are calculated as F1: 8%, F2: 7%, F3: 5%, F4: 4%; since F1, F2, F3, and F4 all meet the same-speaker conditions of the preset formant frequency deviation standard, the target phoneme of the questioned speech and the target phoneme of the sample speech belong to the same speaker.
  • when the 4 deviation values fail any one of the conditions, the identification result is that the two target phonemes belong to different speakers. For example, suppose the four formant frequency deviations are calculated as F1: 16%, F2: 15%, F3: 5%, F4: 8%; since F1, F2, and F4 do not meet the preset formant frequency deviation standard, the target phoneme of the questioned speech and the target phoneme of the sample speech belong to different speakers.
  • Referring to FIG. 3, an embodiment of a voiceprint identity identification device provided in this application includes:
  • the first obtaining module 301 is used to obtain a sample speech.
  • the first extraction module 302 is used to extract 4 formants of the target phoneme in the sample speech.
  • the calculation module 303 is used to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result.
  • the identification module 304 is configured to obtain the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and the preset formant frequency deviation standard.
  • the second acquiring module 305 is used to acquire the questioned speech.
  • the second extraction module 306 is used to extract 4 formants of the target phoneme in the questioned speech.
  • the first extraction module 302 is specifically configured to:
  • extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  • This application provides an embodiment of a voiceprint identity identification device, the device includes a processor and a memory;
  • the memory is used to store the program code and transmit the program code to the processor
  • the processor is configured to execute the voiceprint identity identification method in the foregoing method embodiments according to the instructions in the program code.
  • This application provides an embodiment of a computer-readable storage medium, where the computer-readable storage medium is used to store program code, and the program code is used to execute the voiceprint identity identification method in the aforementioned method embodiments.
  • the present application also provides a computer program product including instructions, which, when run on a computer, causes the computer to execute the voiceprint identity identification method in the aforementioned method embodiments.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or a communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A voiceprint identity identification method and related devices, the method including: obtaining a sample speech (101); extracting 4 formants of the target phoneme in the sample speech (102); calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result (103); and obtaining the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard (104). This solves the technical problem that existing voiceprint identity identification methods are inefficient because voiceprint features are compared manually one by one.

Description

A voiceprint identity identification method and related devices. Technical Field
The present invention relates to the technical field of voiceprint identification, and in particular to a voiceprint identity identification method and related devices.
Background Art
Voiceprint identity identification refers to the process of comprehensively analyzing and comparing the acoustic features of the speech of an unknown or uncertain speaker with those of a known speaker, and concluding whether the two are the same person. In the prior art, the main method is to display the sample speech as a spectrogram, manually locate phonemes one by one, and compare voiceprint features individually; this method is inefficient.
Summary of the Invention
The present application provides a voiceprint identity identification method and related devices, which are used to solve the technical problem that existing voiceprint identity identification methods are inefficient because voiceprint features are compared manually one by one.
In view of this, a first aspect of the present application provides a voiceprint identity identification method, including:
obtaining a sample speech;
extracting 4 formants of a target phoneme in the sample speech;
calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in a questioned speech (检材语音, the speech under examination) to obtain a deviation result;
obtaining a voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
Preferably, before the calculating of the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain the deviation result, the method further includes:
obtaining the questioned speech;
extracting the 4 formants of the target phoneme in the questioned speech.
Preferably, the extracting of the 4 formants of the target phoneme in the sample speech includes:
extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
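The patent names linear predictive coding (LPC) as the extraction technique but gives no algorithm. The sketch below shows one common LPC-based formant estimate: fit prediction coefficients with the autocorrelation method (Levinson-Durbin recursion) and read resonance frequencies off the complex roots of the prediction polynomial. The function names, the model order of 12, and the pre-emphasis/windowing steps are illustrative assumptions, not details from the patent.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def extract_formants(frame, sample_rate, num_formants=4, order=12):
    """Return the lowest `num_formants` resonance frequencies (Hz)
    found among the roots of the LPC polynomial."""
    # Pre-emphasis and windowing are common preprocessing steps.
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    windowed = emphasized * np.hamming(len(emphasized))
    a = lpc_coefficients(windowed, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-2]    # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sample_rate / (2.0 * np.pi))
    return freqs[:num_formants].tolist()
```

In practice the frame would be a voiced segment of the target phoneme; rules of thumb such as order ≈ 2 + sample_rate/1000 are often used to pick the model order.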
Preferably, the preset formant frequency deviation standard includes:
when the 4 formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the questioned speech satisfy all of the following: the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, determining that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to the same speaker;
when the 4 formant frequency deviations fail any one of these conditions, determining that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to different speakers.
A second aspect of the present application provides a voiceprint identity identification device, including:
a first acquisition module, configured to obtain a sample speech;
a first extraction module, configured to extract 4 formants of a target phoneme in the sample speech;
a calculation module, configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in a questioned speech to obtain a deviation result;
an identification module, configured to obtain a voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
Preferably, the device further includes:
a second acquisition module, configured to obtain the questioned speech;
a second extraction module, configured to extract the 4 formants of the target phoneme in the questioned speech.
Preferably, the first extraction module is specifically configured to:
extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
A third aspect of the present application provides a voiceprint identity identification apparatus, the apparatus including a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the voiceprint identity identification method of any one of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium, the computer-readable storage medium being configured to store program code, the program code being used to execute the voiceprint identity identification method of any one of the first aspect.
A fifth aspect of the present application provides a computer program product including instructions which, when run on a computer, causes the computer to execute the voiceprint identity identification method of any one of the first aspect.
It can be seen from the above technical solutions that the present application has the following advantages:
The present application provides a voiceprint identity identification method, including: obtaining a sample speech; extracting 4 formants of the target phoneme in the sample speech; calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result; and obtaining the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard. By extracting the 4 formants of the target phoneme in the obtained sample speech, calculating the formant frequency deviations against the corresponding formants of the target phoneme in the questioned speech, and judging from the correspondence between the deviation result and the preset standard whether the two target phonemes come from the same speaker, the method obtains a voiceprint identity identification result and solves the technical problem that existing methods, which compare voiceprint features manually one by one, are inefficient.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an embodiment of a voiceprint identity identification method provided by the present application;
FIG. 2 is a schematic flowchart of another embodiment of a voiceprint identity identification method provided by the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a voiceprint identity identification device provided by the present application.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
For ease of understanding, referring to FIG. 1, an embodiment of a voiceprint identity identification method provided by the present application includes:
Step 101: Obtain a sample speech.
It should be noted that the sample speech can be obtained through a voice recording device.
Step 102: Extract 4 formants of the target phoneme in the sample speech.
It should be noted that a sample speech may contain multiple different phonemes, and each phoneme generally has 4 formants; when extracting formants of phonemes in the sample speech, a phoneme that does not have 4 formants cannot be used as the target phoneme.
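The note above implies a screening step: only phonemes for which a full set of 4 formants can be measured qualify as target phonemes. A trivial sketch of that filter, with an assumed data shape (phoneme label mapped to whatever formant frequencies could be extracted for it) that is purely illustrative:

```python
def select_target_phonemes(measured):
    """Keep only phonemes for which all 4 formants were measurable.

    `measured` maps a phoneme label to the list of formant frequencies
    (Hz) extracted for it; fewer than 4 entries means the phoneme
    cannot serve as a target phoneme.
    """
    return {label: f[:4] for label, f in measured.items() if len(f) >= 4}
```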
Step 103: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result.
It should be noted that the target phoneme in the questioned speech also has 4 formants, and the calculated deviation result includes 4 formant frequency deviation values.
Step 104: Obtain the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and the preset formant frequency deviation standard.
It should be noted that, from the correspondence between the deviation result and the preset formant frequency deviation standard, it is determined whether the target phoneme in the sample speech and the target phoneme in the questioned speech belong to the same speaker.
The voiceprint identity identification method in this embodiment extracts the 4 formants of the target phoneme in the obtained sample speech, calculates the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result, and judges, from the correspondence between the deviation result and the preset formant frequency deviation standard, whether the two target phonemes come from the same speaker, thereby obtaining a voiceprint identity identification result and solving the technical problem that existing voiceprint identity identification methods, which compare voiceprint features manually one by one, are inefficient.
For ease of understanding, referring to FIG. 2, another embodiment of a voiceprint identity identification method provided by the present application includes:
Step 201: Obtain the questioned speech.
It should be noted that the questioned speech can be obtained from a voiceprint identification database.
Step 202: Extract 4 formants of the target phoneme in the questioned speech.
It should be noted that a questioned speech may contain multiple different phonemes, and each phoneme generally has 4 formants; when extracting formants of phonemes in the questioned speech, a phoneme that does not have 4 formants cannot be used as the target phoneme. The formants can be extracted by linear predictive coding.
Step 203: Obtain a sample speech.
It should be noted that step 203 and step 201 can be performed simultaneously or sequentially.
Step 204: Extract 4 formants of the target phoneme in the sample speech.
It should be noted that the formants can be extracted by linear predictive coding, where the target phoneme in the sample speech and the target phoneme in the questioned speech are the same phoneme.
Step 205: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result.
It should be noted that the target phoneme in the questioned speech also has 4 formants, and the calculated deviation result includes 4 formant frequency deviation values. The calculation of the formant frequency deviation belongs to the prior art, so the specific calculation process is not described in detail here.
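The patent treats the deviation computation as prior art and does not spell out the formula. One plausible reading, assumed here, is the per-formant relative frequency deviation in percent, with the questioned-speech formant as the reference; the choice of denominator is this sketch's assumption, not something the patent fixes:

```python
def formant_deviations(sample_formants, questioned_formants):
    """Per-formant relative frequency deviation, in percent.

    Both inputs are the 4 formant frequencies (Hz) of the same target
    phoneme, one measured from the sample speech and one from the
    questioned speech.
    """
    return [abs(fs - fq) / fq * 100.0
            for fs, fq in zip(sample_formants, questioned_formants)]
```

A formant pair of 756 Hz (sample) against 700 Hz (questioned) then yields a deviation of 8%.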
Step 206: Obtain the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and the preset formant frequency deviation standard.
It should be noted that the preset formant frequency deviation standard includes:
when the 4 formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the questioned speech satisfy all of the following: the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, it is determined that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to the same speaker;
when any one of these conditions is not satisfied, it is determined that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to different speakers.
When the 4 deviation values in the deviation result satisfy the preset standard, the identification result is that the two target phonemes belong to the same speaker. For example, suppose the four formant frequency deviations between the target phoneme of the questioned speech and the target phoneme of the sample speech are calculated as F1: 8%, F2: 7%, F3: 5%, F4: 4%; since F1, F2, F3, and F4 all meet the same-speaker conditions of the preset standard, the target phoneme of the questioned speech and the target phoneme of the sample speech belong to the same speaker.
When the 4 deviation values fail any one of the conditions, the identification result is that the two target phonemes belong to different speakers. For example, suppose the four formant frequency deviations are calculated as F1: 16%, F2: 15%, F3: 5%, F4: 8%; since F1, F2, and F4 do not meet the preset formant frequency deviation standard, the target phoneme of the questioned speech and the target phoneme of the sample speech belong to different speakers.
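The threshold test of this embodiment can be written down directly. Because the patent states the third and fourth thresholds as a 5%-6% range, the 6% used here is one illustrative choice within that range rather than a value fixed by the patent:

```python
# F1, F2, F3, F4 upper bounds in percent; 6.0 is an assumed choice
# within the patent's stated 5%-6% range for F3 and F4.
THRESHOLDS = (12.0, 9.0, 6.0, 6.0)

def same_speaker(deviations, thresholds=THRESHOLDS):
    """Same speaker only if every formant deviation is below its
    threshold; failing any single condition means different speakers."""
    return all(d < t for d, t in zip(deviations, thresholds))
```

Applied to the two worked examples above, (8%, 7%, 5%, 4%) passes every threshold, while (16%, 15%, 5%, 8%) fails the first, second, and fourth conditions.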
For ease of understanding, referring to FIG. 3, an embodiment of a voiceprint identity identification device provided by the present application includes:
a first acquisition module 301, configured to obtain a sample speech;
a first extraction module 302, configured to extract 4 formants of the target phoneme in the sample speech;
a calculation module 303, configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain a deviation result;
an identification module 304, configured to obtain the voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and the preset formant frequency deviation standard.
Further, the device also includes:
a second acquisition module 305, configured to obtain the questioned speech;
a second extraction module 306, configured to extract 4 formants of the target phoneme in the questioned speech.
Further, the first extraction module 302 is specifically configured to:
extract 4 formants of the target phoneme in the sample speech based on linear predictive coding.
The present application provides an embodiment of a voiceprint identity identification apparatus; the apparatus includes a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the voiceprint identity identification method in the foregoing method embodiments.
The present application provides an embodiment of a computer-readable storage medium, the computer-readable storage medium being configured to store program code for executing the voiceprint identity identification method in the foregoing method embodiments.
The present application also provides a computer program product including instructions which, when run on a computer, causes the computer to execute the voiceprint identity identification method in the foregoing method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or a communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. A voiceprint identity identification method, characterized by comprising:
    obtaining a sample speech;
    extracting 4 formants of a target phoneme in the sample speech;
    calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in a questioned speech (the speech under examination) to obtain a deviation result;
    obtaining a voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
  2. The voiceprint identity identification method according to claim 1, characterized in that before the calculating of the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the questioned speech to obtain the deviation result, the method further comprises:
    obtaining the questioned speech;
    extracting the 4 formants of the target phoneme in the questioned speech.
  3. The voiceprint identity identification method according to claim 1, characterized in that the extracting of the 4 formants of the target phoneme in the sample speech comprises:
    extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  4. The voiceprint identity identification method according to claim 1, characterized in that the preset formant frequency deviation standard comprises:
    when the 4 formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the questioned speech satisfy all of the following: the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, determining that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to the same speaker;
    when the 4 formant frequency deviations fail any one of these conditions, determining that the target phoneme in the sample speech and the target phoneme in the questioned speech belong to different speakers.
  5. A voiceprint identity identification device, characterized by comprising:
    a first acquisition module, configured to obtain a sample speech;
    a first extraction module, configured to extract 4 formants of a target phoneme in the sample speech;
    a calculation module, configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in a questioned speech to obtain a deviation result;
    an identification module, configured to obtain a voiceprint identity identification result of the target phoneme according to the correspondence between the deviation result and a preset formant frequency deviation standard.
  6. The voiceprint identity identification device according to claim 5, characterized by further comprising:
    a second acquisition module, configured to obtain the questioned speech;
    a second extraction module, configured to extract the 4 formants of the target phoneme in the questioned speech.
  7. The voiceprint identity identification device according to claim 5, characterized in that the first extraction module is specifically configured to:
    extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  8. A voiceprint identity identification apparatus, characterized in that the apparatus comprises a processor and a memory;
    the memory is configured to store program code and transmit the program code to the processor;
    the processor is configured to execute, according to instructions in the program code, the voiceprint identity identification method of any one of claims 1-4.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, the program code being used to execute the voiceprint identity identification method of any one of claims 1-4.
  10. A computer program product comprising instructions, characterized in that, when run on a computer, it causes the computer to execute the voiceprint identity identification method of any one of claims 1-4.
PCT/CN2019/127987 2019-12-24 2019-12-24 Voiceprint identity identification method and related device WO2021128003A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980003349.1A CN111108552A (zh) 2019-12-24 2019-12-24 Voiceprint identity identification method and related device
PCT/CN2019/127987 WO2021128003A1 (zh) 2019-12-24 2019-12-24 Voiceprint identity identification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/127987 WO2021128003A1 (zh) 2019-12-24 2019-12-24 Voiceprint identity identification method and related device

Publications (1)

Publication Number Publication Date
WO2021128003A1 true WO2021128003A1 (zh) 2021-07-01

Family

ID=70427480

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127987 WO2021128003A1 (zh) 2019-12-24 2019-12-24 Voiceprint identity identification method and related device

Country Status (2)

Country Link
CN (1) CN111108552A (zh)
WO (1) WO2021128003A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640454B (zh) * 2020-05-13 2023-08-11 广州国音智能科技有限公司 Spectrogram matching method, apparatus, device, and computer-readable storage medium
CN111627421B (zh) * 2020-05-13 2023-08-11 广州国音智能科技有限公司 Speech recognition method, apparatus, device, and computer-readable storage medium
CN112951274A (zh) * 2021-02-07 2021-06-11 脸萌有限公司 Speech similarity determination method and device, and program product
CN113409796B (zh) * 2021-05-11 2022-09-27 武汉大晟极科技有限公司 Speech identity verification method based on long-term formant measurement

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983178A (en) * 1997-12-10 1999-11-09 Atr Interpreting Telecommunications Research Laboratories Speaker clustering apparatus based on feature quantities of vocal-tract configuration and speech recognition apparatus therewith
WO2003007292A1 (en) * 2001-07-13 2003-01-23 Innomedia Pte Ltd Speaker verification utilizing compressed audio formants
CN103714826A (zh) * 2013-12-18 2014-04-09 安徽讯飞智元信息科技有限公司 Automatic formant matching method for voiceprint identification
CN107680601A (zh) * 2017-10-18 2018-02-09 深圳势必可赢科技有限公司 Identity verification method and device based on spectrogram and phoneme retrieval
CN109473105A (zh) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 Text-independent voiceprint verification method, device, and computer equipment
CN109979466A (zh) * 2019-03-21 2019-07-05 广州国音智能科技有限公司 Voiceprint identity identification method, device, and computer-readable storage medium
CN110164454A (zh) * 2019-05-24 2019-08-23 广州国音智能科技有限公司 Audio identity discrimination method and device based on formant deviation

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050049103A (ko) * 2003-11-21 2005-05-25 삼성전자주식회사 Method and apparatus for dialog enhancement using formant bands
US20050171774A1 (en) * 2004-01-30 2005-08-04 Applebaum Ted H. Features and techniques for speaker authentication
CN101359473A (zh) * 2007-07-30 2009-02-04 国际商业机器公司 Method and apparatus for automatic speech conversion
CN101887722A (zh) * 2009-06-18 2010-11-17 博石金(北京)信息技术有限公司 Fast voiceprint authentication method
CN107767860B (zh) * 2016-08-15 2023-01-13 中兴通讯股份有限公司 Voice information processing method and device
CN106710604A (zh) * 2016-12-07 2017-05-24 天津大学 Formant enhancement device and method for improving speech intelligibility
GB201801657D0 (en) * 2017-11-21 2018-03-21 Cirrus Logic Int Semiconductor Ltd Speaker enrolment


Also Published As

Publication number Publication date
CN111108552A (zh) 2020-05-05

Similar Documents

Publication Publication Date Title
WO2021128003A1 (zh) Voiceprint identity identification method and related device
CN108305615B (zh) Object recognition method, device, storage medium, and terminal
US9536547B2 (en) Speaker change detection device and speaker change detection method
WO2021128741A1 (zh) Speech emotion fluctuation analysis method and apparatus, computer device, and storage medium
Rouvier et al. An open-source state-of-the-art toolbox for broadcast news diarization
Singh et al. Speaker's voice characteristics and similarity measurement using Euclidean distances
Tian et al. Spoofing detection from a feature representation perspective
US20160180852A1 (en) Speaker identification using spatial information
WO2019148586A1 (zh) Speaker identification method and device for multi-speaker speech
US10089994B1 (en) Acoustic fingerprint extraction and matching
WO2021127998A1 (zh) Voiceprint identification method and related device
CN109979466B (zh) Voiceprint identity identification method, device, and computer-readable storage medium
CN110164454B (zh) Audio identity discrimination method and device based on formant deviation
US11417344B2 (en) Information processing method, information processing device, and recording medium for determining registered speakers as target speakers in speaker recognition
WO2021127976A1 (zh) Method and device for selecting phonemes available for comparison
Shirali-Shahreza et al. Effect of MFCC normalization on vector quantization based speaker identification
CN113348504A (zh) System and method for secondary segmentation clustering, automatic speech recognition, and transcript generation
Gupta et al. Applications of MFCC and Vector Quantization in speaker recognition
Estevez et al. Study on the fairness of speaker verification systems across accent and gender groups
CN112735432B (zh) Audio recognition method, apparatus, electronic device, and storage medium
CN111402898B (zh) Audio signal processing method, apparatus, device, and storage medium
CN112652313A (zh) Voiceprint recognition method, apparatus, device, storage medium, and program product
CN109378004B (zh) Phoneme comparison method, apparatus, device, and computer-readable storage medium
WO2021051533A1 (zh) Blacklist identification method, apparatus, device, and storage medium based on address information
Tomchuk Spectral masking in MFCC calculation for noisy speech

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19957580

Country of ref document: EP

Kind code of ref document: A1