WO2021127998A1 - Voiceprint identification method and related apparatus - Google Patents

Voiceprint identification method and related apparatus

Info

Publication number
WO2021127998A1
Authority
WO
WIPO (PCT)
Prior art keywords
target phoneme
sample
frequency deviation
deviation
formant
Prior art date
Application number
PCT/CN2019/127977
Other languages
English (en)
French (fr)
Inventor
郑琳琳
Original Assignee
广州国音智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州国音智能科技有限公司
Priority to PCT/CN2019/127977 (WO2021127998A1)
Priority to CN201980003350.4A (CN111108551B)
Publication of WO2021127998A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the technical field of voiceprint identification, and in particular to a voiceprint identification method and related devices.
  • Voiceprint identification refers to the process of comprehensively analyzing and comparing the acoustic features of the voice of an unknown or uncertain speaker with those of a known speaker, and concluding whether the two are the same person.
  • Existing voiceprint identification methods generally compare the same phoneme in the evidence (questioned) speech and the sample speech, computing the formant frequency deviation of that phoneme to obtain deviation values. If the computed deviation values fall within a preset range, the phoneme in the sample speech and the phoneme in the evidence speech are considered to come from the same speaker; otherwise, from different speakers. In some cases, however, mood fluctuations or similar factors cause the computed deviation values to fall slightly outside the preset range, so that recordings of the same speaker are mistakenly identified as coming from different speakers.
  • The present application provides a voiceprint identification method and related devices, which are used to solve the technical problem in existing voiceprint identification methods whereby a slight deviation of the computed deviation values from the preset range, caused by mood fluctuations or similar factors, leads to the same speaker being mistakenly identified as different speakers.
  • A first aspect of this application provides a voiceprint identification method, including:
  • obtaining a sample speech;
  • extracting 4 formants of the target phoneme in the sample speech;
  • calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech, obtaining 4 deviation values;
  • when the deviation values meet the preset formant frequency deviation standard, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
  • when a deviation value does not meet the preset formant frequency deviation standard, calculating the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
  • judging whether the difference is within a preset range; if so, adjusting the audio time range of the target phoneme in the sample speech and returning to the calculating step until the deviation values meet the preset standard, then outputting the same-speaker identification result; otherwise, outputting the identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
  • Preferably, before the step of calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values, the method further includes:
  • obtaining the evidence speech;
  • extracting the 4 formants of the target phoneme in the evidence speech.
  • Preferably, the preset formant frequency deviation standard includes:
  • when the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech satisfy: the first formant frequency deviation is less than 12%, the second formant frequency deviation is less than 9%, the third formant frequency deviation is less than 5%-6%, and the fourth formant frequency deviation is less than 5%-6%, determining that the target phoneme in the sample speech and the target phoneme in the evidence speech come from the same speaker.
  • Preferably, extracting the 4 formants of the target phoneme in the sample speech includes:
  • extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding (LPC).
  • A second aspect of the present application provides a voiceprint identification device, including:
  • a first acquisition module, used to acquire a sample speech;
  • a first extraction module, used to extract 4 formants of the target phoneme in the sample speech;
  • a first calculation module, configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech, obtaining 4 deviation values;
  • an output module, configured to output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker when the deviation values meet the preset formant frequency deviation standard;
  • a second calculation module, configured to calculate, when a deviation value does not meet the preset formant frequency deviation standard, the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
  • a judgment module, used to judge whether the difference is within a preset range and, if so, to adjust the audio time range of the target phoneme in the sample speech and trigger the first calculation module until the deviation values meet the preset standard, then output the same-speaker identification result; otherwise, to output the identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
  • Preferably, the device further includes:
  • a second acquisition module, used to acquire the evidence speech;
  • a second extraction module, used to extract the 4 formants of the target phoneme in the evidence speech.
  • Preferably, the first extraction module is specifically used to:
  • extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  • A third aspect of the present application provides voiceprint identification equipment, the equipment including a processor and a memory;
  • the memory is used to store program code and transmit the program code to the processor;
  • the processor is configured to execute any one of the voiceprint identification methods described in the first aspect according to the instructions in the program code.
  • A fourth aspect of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store program code, and the program code is used to execute any one of the voiceprint identification methods described in the first aspect.
  • A fifth aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the voiceprint identification methods described in the first aspect.
  • This application provides a voiceprint identification method, including: obtaining a sample speech; extracting 4 formants of the target phoneme in the sample speech; calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech, obtaining 4 deviation values; when the deviation values meet the preset formant frequency deviation standard, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; when a deviation value does not meet the preset formant frequency deviation standard, calculating the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard; judging whether the difference is within a preset range and, if so, adjusting the audio time range of the target phoneme in the sample speech and returning to the step of calculating the formant frequency deviations to obtain 4 deviation values, until the deviation values meet the preset standard, then outputting the same-speaker identification result; otherwise, outputting the identification result that the target phonemes belong to different speakers.
  • The voiceprint identification method in this application computes the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain deviation values.
  • When a deviation value does not meet the preset formant frequency deviation standard, the method computes the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the standard, and judges whether the difference is within a preset range. If so, the deviation value deviates only slightly from the preset standard value, and the audio time range of the target phoneme in the sample speech is adjusted until the deviations meet the standard, whereupon the same-speaker result is output; if not, the deviation is large and the different-speaker result is output. This resolves the misidentification of the same speaker as different speakers that slight, mood-induced deviations would otherwise cause.
  • FIG. 1 is a schematic flowchart of an embodiment of a voiceprint identification method provided by this application.
  • FIG. 2 is a schematic flowchart of another embodiment of a voiceprint identification method provided by this application.
  • FIG. 3 is a schematic structural diagram of an embodiment of a voiceprint identification device provided by this application.
  • An embodiment of a voiceprint identification method provided in this application includes:
  • Step 101: Obtain a sample speech.
  • It should be noted that the sample speech can be obtained through a voice recording device.
  • Step 102: Extract 4 formants of the target phoneme in the sample speech.
  • It should be noted that a sample speech may contain multiple different phonemes, and each phoneme typically has 4 formants. When extracting the formants of the phonemes in the sample speech, a phoneme that does not have 4 formants cannot serve as the target phoneme.
  • Step 103: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values.
  • It should be noted that the target phoneme in the evidence speech likewise has 4 formants, so the computed result comprises 4 formant frequency deviation values.
  • Step 104: When the deviation values meet the preset formant frequency deviation standard, output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker.
  • Step 105: When a deviation value does not meet the preset formant frequency deviation standard, calculate the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard.
  • Step 106: Determine whether the difference is within the preset range. If so, adjust the audio time range of the target phoneme in the sample speech and return to step 103 until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
  • It should be noted that when a deviation value does not meet the preset formant frequency deviation standard, there is a gap between the deviation value and the preset standard value.
  • Computing the difference between the deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard quantifies this gap, making the degree of deviation between the deviation value and the preset standard value easy to grasp at a glance.
  • The voiceprint identification method in this embodiment of the application computes the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain deviation values.
  • When a deviation value does not meet the preset formant frequency deviation standard, the method computes the difference between that deviation value and the corresponding upper limit in the standard and judges whether the difference is within the preset range. If so, the deviation value deviates only slightly from the standard value, and the audio time range of the target phoneme in the sample speech is adjusted until the deviations meet the standard, whereupon the same-speaker result is output; if not, the different-speaker result is output. This resolves the technical problem that slight, mood-induced deviations cause the same speaker to be misidentified as different speakers.
  • Referring to FIG. 2, another embodiment of a voiceprint identification method provided in this application includes:
  • Step 201: Obtain the evidence speech.
  • It should be noted that the evidence speech can be obtained from a voiceprint identification database.
  • Step 202: Extract 4 formants of the target phoneme in the evidence speech.
  • It should be noted that the evidence speech may contain multiple different phonemes, and each phoneme typically has 4 formants; a phoneme that does not have 4 formants cannot serve as the target phoneme.
  • The formants can be extracted by linear predictive coding (LPC).
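  • As a rough illustration of this extraction step, the sketch below pulls formant candidates from a single analysis frame with the autocorrelation LPC method (Levinson-Durbin recursion followed by root solving). The patent only names linear predictive coding; the LPC order, pre-emphasis constant, window, and the 90 Hz / 400 Hz plausibility filters are illustrative assumptions, not values from the disclosure.

```python
# Minimal LPC formant-extraction sketch (all numeric choices are assumptions).
import numpy as np

def levinson_durbin(r: np.ndarray, order: int) -> np.ndarray:
    """Solve the LPC normal equations; returns the prediction polynomial, a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def extract_formants(frame: np.ndarray, fs: int, order: int = 12) -> list:
    """Return up to 4 formant frequencies (Hz) of one frame, lowest first."""
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = levinson_durbin(r, order)
    roots = [z for z in np.roots(a) if z.imag > 0]           # one root per conjugate pair
    formants = []
    for z in roots:
        freq = np.angle(z) * fs / (2 * np.pi)                # pole angle -> frequency
        bw = -np.log(np.abs(z)) * fs / np.pi                 # pole radius -> bandwidth
        if freq > 90 and bw < 400:                           # drop implausible poles
            formants.append(freq)
    return sorted(formants)[:4]                              # F1..F4 (fewer if absent)
```

  • A phoneme for which this returns fewer than 4 frequencies would, per the rule above, be rejected as a target phoneme.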
  • Step 203: Obtain a sample speech.
  • It should be noted that step 203 and step 201 can be performed simultaneously or one after the other.
  • Step 204: Extract 4 formants of the target phoneme in the sample speech.
  • It should be noted that the formants can be extracted by linear predictive coding, where the target phoneme in the sample speech and the target phoneme in the evidence speech are the same phoneme.
  • Step 205: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values.
  • It should be noted that the target phoneme in the evidence speech likewise has 4 formants, so the computed result comprises 4 formant frequency deviation values.
  • The calculation of the formant frequency deviation belongs to the prior art, so its specific calculation process is not elaborated here.
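  • Since the patent treats the deviation computation as prior art and fixes no formula, the sketch below assumes one common convention: the relative difference of each formant pair, expressed in percent of the evidence-speech formant.

```python
def formant_deviations(sample_f: list, evidence_f: list) -> list:
    """4 deviation values for F1..F4, in percent (assumed relative-difference convention)."""
    return [abs(s - e) / e * 100.0 for s, e in zip(sample_f, evidence_f)]
```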
  • Step 206: When the deviation values meet the preset formant frequency deviation standard, output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker.
  • It should be noted that the preset formant frequency deviation standard includes: when the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech satisfy: the first formant frequency deviation is less than 12%, the second formant frequency deviation is less than 9%, the third formant frequency deviation is less than 5%-6%, and the fourth formant frequency deviation is less than 5%-6%, it is determined that the target phoneme in the sample speech and the target phoneme in the evidence speech come from the same speaker.
  • When the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, the identification result that the target phoneme in the evidence speech and the target phoneme in the sample speech belong to the same speaker is output. For example, suppose the four formant frequency deviations between the target phoneme of the evidence speech and the target phoneme of the sample speech are computed as F1: 8%, F2: 7%, F3: 5%, F4: 4%. Since F1, F2, F3, and F4 all meet the preset formant frequency deviation standard, the result that the two target phonemes belong to the same speaker is output.
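  • A minimal sketch of this check follows, with the upper limits taken from the standard just stated; the 5%-6% band for the third and fourth formants is resolved here to its upper end (6%), which is the bound the worked examples in this document use (e.g. F4: 7% > 6%).

```python
LIMITS = (12.0, 9.0, 6.0, 6.0)   # assumed upper limits in percent for F1..F4

def meets_standard(devs: list, limits=LIMITS) -> bool:
    """True when every formant frequency deviation is below its upper limit."""
    return all(d < lim for d, lim in zip(devs, limits))

# Example from the text: meets_standard([8.0, 7.0, 5.0, 4.0]) -> True
```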
  • Step 207: When a deviation value does not meet the preset formant frequency deviation standard, calculate the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard.
  • It should be noted that when any deviation value fails the preset standard, the difference between that deviation value and the corresponding upper limit is computed; this difference quantifies the gap between the deviation value and the preset standard value, making the degree of deviation easy to grasp at a glance.
  • For example, suppose the four computed deviation values are F1: 11%, F2: 8%, F3: 5%, F4: 7%. Comparing them with the preset standard shows that the fourth deviation fails it, i.e. F4: 7% > 6%, so the difference between F4 and the upper limit of the fourth formant frequency deviation (6%) is computed: 7% - 6% = 1%, a small degree of deviation.
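  • Continuing the illustrative sketches above, step 207 reduces to subtracting the corresponding upper limit from each failing deviation value:

```python
def excess_over_limit(devs: list, limits=LIMITS) -> dict:
    """Map 1-based formant index -> deviation minus its upper limit, failing values only."""
    return {i + 1: d - lim
            for i, (d, lim) in enumerate(zip(devs, limits)) if d >= lim}

# Example from the text: excess_over_limit([11.0, 8.0, 5.0, 7.0]) -> {4: 1.0},
# i.e. F4 exceeds its 6% upper limit by 1%.
```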
  • Step 208: Determine whether the difference is within the preset range. If so, adjust the audio time range of the target phoneme in the sample speech and return to step 205 until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
  • It should be noted that if the difference is within the preset range, the gap between the deviation value and the preset standard value is small and may have been caused by the speaker's mood fluctuations or other such factors; it can be narrowed by appropriately adjusting the audio time range of the target phoneme in the sample speech until the deviation values meet the preset standard, whereupon the same-speaker identification result is output.
  • For example, following the example above, suppose the preset range is 0%-2%. The computed difference of 1% falls within the preset range, indicating a small degree of deviation between the deviation value and the preset standard value. The audio time range of the target phoneme in the sample speech is then adjusted appropriately; the adjustment can be determined case by case, for instance shrinking the segment by 2 ms on the audio time axis of the sample speech. The method returns to step 205 and recomputes the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech to obtain 4 deviation values, repeating until the deviation values meet the preset formant frequency deviation standard, whereupon the same-speaker voiceprint identification result is output.
  • If the difference is not within the preset range, the gap between the deviation value and the preset standard value is large, and the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers is output.
  • For example, suppose the preset range is 0%-2% and the four computed deviation values are F1: 11%, F2: 8%, F3: 5%, F4: 10%. Comparing these four values with the preset formant frequency deviation standard shows that the fourth deviation fails it, i.e. F4: 10% > 6%; the difference between F4 and the fourth upper limit (6%) is 10% - 6% = 4%, which lies outside the preset range, so the different-speaker identification result is output.
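  • The retry loop of step 208 can be sketched as below, reusing the illustrative helpers above. The 0%-2% preset range and the 2 ms trim follow the examples in the text; the bounded retry count is an added safeguard that the patent does not specify.

```python
PRESET_RANGE = (0.0, 2.0)   # percent, from the example above
TRIM_SECONDS = 0.002        # 2 ms per retry, from the example above

def identify(sample_segment, evidence_f, fs, max_rounds=20):
    """Return a same/different-speaker verdict for one target phoneme."""
    for _ in range(max_rounds):
        sample_f = extract_formants(sample_segment, fs)
        if len(sample_f) < 4:                 # phoneme must exhibit 4 formants
            break
        devs = formant_deviations(sample_f, evidence_f)
        if meets_standard(devs):
            return "same speaker"
        excess = excess_over_limit(devs)
        if not all(PRESET_RANGE[0] <= e <= PRESET_RANGE[1] for e in excess.values()):
            return "different speakers"       # gap too large to attribute to mood
        # adjust the audio time range of the target phoneme and recompute
        sample_segment = sample_segment[:-int(TRIM_SECONDS * fs)]
    return "different speakers"
```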
  • Referring to FIG. 3, an embodiment of a voiceprint identification device provided in this application includes:
  • a first acquisition module 301, used to acquire a sample speech;
  • a first extraction module 302, used to extract 4 formants of the target phoneme in the sample speech;
  • a first calculation module 303, configured to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
  • an output module 304, configured to output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker when the deviation values meet the preset formant frequency deviation standard;
  • a second calculation module 305, used to calculate, when a deviation value does not meet the preset formant frequency deviation standard, the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
  • a judgment module 306, used to judge whether the difference is within the preset range and, if so, to adjust the audio time range of the target phoneme in the sample speech and trigger the first calculation module 303 until the deviation values meet the preset standard, then output the same-speaker identification result; otherwise, to output the identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
  • Further, the device includes:
  • a second acquisition module 307, used to acquire the evidence speech;
  • a second extraction module 308, used to extract the 4 formants of the target phoneme in the evidence speech.
  • Further, the first extraction module 302 is specifically configured to:
  • extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  • This application provides an embodiment of voiceprint identification equipment; the equipment includes a processor and a memory;
  • the memory is used to store program code and transmit the program code to the processor;
  • the processor is configured to execute, according to the instructions in the program code, the voiceprint identification method of the foregoing method embodiments.
  • This application provides an embodiment of a computer-readable storage medium; the computer-readable storage medium is used to store program code, and the program code is used to execute the voiceprint identification method of the foregoing method embodiments.
  • This application also provides an embodiment of a computer program product including instructions which, when run on a computer, cause the computer to execute the voiceprint identification method of the foregoing method embodiments.
  • In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • The mutual coupling, direct coupling, or communication connection displayed or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented either in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of this application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephone Function (AREA)

Abstract

A voiceprint identification method and related devices. The method computes the formant frequency deviation between the target phoneme in the sample speech and the target phoneme in the evidence speech to obtain 4 deviation values (103). When a deviation value does not meet the preset formant frequency deviation standard, the difference between the deviation value and the upper limit of the formant frequency deviation corresponding to it in the standard is computed (105), and it is judged whether the difference is within a preset range; if so, the audio time range of the target phoneme in the sample speech is adjusted (106) until the condition is met, whereupon the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker is output (104); otherwise, the different-speaker identification result is output. This solves the technical problem in existing voiceprint identification methods whereby mood fluctuations and similar factors cause the computed deviation values to deviate slightly from the preset range, so that the same speaker is mistakenly identified as different speakers.

Description

Voiceprint identification method and related apparatus
TECHNICAL FIELD
This application relates to the technical field of voiceprint identification, and in particular to a voiceprint identification method and related devices.
BACKGROUND
Voiceprint identification refers to the process of comprehensively analyzing and comparing the acoustic features of the voice of an unknown or uncertain speaker with those of a known speaker, and concluding whether the two are the same person. Existing voiceprint identification methods generally compare the same phoneme in the evidence speech and the sample speech and compute the formant frequency deviation of that phoneme to obtain deviation values. If the computed deviation values fall within a preset range, the phoneme in the sample speech and the phoneme in the evidence speech are considered to come from the same speaker; otherwise, from different speakers. In some cases, however, mood fluctuations or similar factors cause the computed deviation values to deviate slightly from the preset range, so that the same speaker is mistakenly identified as different speakers.
SUMMARY
This application provides a voiceprint identification method and related devices, which are used to solve the technical problem in existing voiceprint identification methods whereby a slight deviation of the computed deviation values from the preset range, caused by mood fluctuations or similar factors, leads to the same speaker being mistakenly identified as different speakers.
In view of this, a first aspect of this application provides a voiceprint identification method, including:
obtaining a sample speech;
extracting 4 formants of the target phoneme in the sample speech;
calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
when the deviation values meet the preset formant frequency deviation standard, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
when a deviation value does not meet the preset formant frequency deviation standard, calculating the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
judging whether the difference is within a preset range; if so, adjusting the audio time range of the target phoneme in the sample speech and returning to the step of calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values, until the deviation values meet the preset formant frequency deviation standard, then outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
Preferably, before the step of calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values, the method further includes:
obtaining the evidence speech;
extracting the 4 formants of the target phoneme in the evidence speech.
Preferably, the preset formant frequency deviation standard includes:
when the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech satisfy: the first formant frequency deviation is less than 12%, the second formant frequency deviation is less than 9%, the third formant frequency deviation is less than 5%-6%, and the fourth formant frequency deviation is less than 5%-6%, determining that the target phoneme in the sample speech and the target phoneme in the evidence speech come from the same speaker.
Preferably, extracting the 4 formants of the target phoneme in the sample speech includes:
extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
A second aspect of this application provides a voiceprint identification device, including:
a first acquisition module, used to acquire a sample speech;
a first extraction module, used to extract 4 formants of the target phoneme in the sample speech;
a first calculation module, used to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
an output module, used to output, when the deviation values meet the preset formant frequency deviation standard, the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
a second calculation module, used to calculate, when a deviation value does not meet the preset formant frequency deviation standard, the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
a judgment module, used to judge whether the difference is within a preset range and, if so, adjust the audio time range of the target phoneme in the sample speech and trigger the first calculation module until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
Preferably, the device further includes:
a second acquisition module, used to acquire the evidence speech;
a second extraction module, used to extract the 4 formants of the target phoneme in the evidence speech.
Preferably, the first extraction module is specifically used to:
extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
A third aspect of this application provides voiceprint identification equipment, the equipment including a processor and a memory;
the memory is used to store program code and transmit the program code to the processor;
the processor is used to execute, according to the instructions in the program code, the voiceprint identification method of any one of the first aspect.
A fourth aspect of this application provides a computer-readable storage medium, the computer-readable storage medium being used to store program code, the program code being used to execute the voiceprint identification method of any one of the first aspect.
A fifth aspect of this application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the voiceprint identification method of any one of the first aspect.
The above technical solutions show that this application has the following advantages:
This application provides a voiceprint identification method, including: obtaining a sample speech; extracting 4 formants of the target phoneme in the sample speech; calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values; when the deviation values meet the preset formant frequency deviation standard, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; when a deviation value does not meet the preset formant frequency deviation standard, calculating the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard; judging whether the difference is within a preset range and, if so, adjusting the audio time range of the target phoneme in the sample speech and returning to the step of calculating the formant frequency deviations to obtain 4 deviation values, until the deviation values meet the preset formant frequency deviation standard, then outputting the same-speaker voiceprint identification result; otherwise, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
The voiceprint identification method in this application computes the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain deviation values. When a deviation value does not meet the preset formant frequency deviation standard, the method computes the difference between the deviation value and the upper limit of the formant frequency deviation corresponding to it in the standard and judges whether the difference is within the preset range. If so, the deviation value deviates only slightly from the preset standard value, and the audio time range of the target phoneme in the sample speech is adjusted so that the formant frequency deviations between the target phoneme of the sample speech and the target phoneme of the evidence speech meet the preset standard, whereupon the voiceprint identification result that the two target phonemes belong to the same speaker is output. If the difference is not within the preset range, the deviation value departs considerably from the preset standard value, and the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers is output. This solves the technical problem in existing voiceprint identification methods whereby mood fluctuations and similar factors cause the computed deviation values to deviate slightly from the preset range, so that the same speaker is mistakenly identified as different speakers.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of an embodiment of a voiceprint identification method provided by this application;
FIG. 2 is a schematic flowchart of another embodiment of a voiceprint identification method provided by this application;
FIG. 3 is a schematic structural diagram of an embodiment of a voiceprint identification device provided by this application.
DETAILED DESCRIPTION
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
For ease of understanding, referring to FIG. 1, an embodiment of a voiceprint identification method provided by this application includes:
Step 101: Obtain a sample speech.
It should be noted that the sample speech can be obtained through a voice recording device.
Step 102: Extract 4 formants of the target phoneme in the sample speech.
It should be noted that a sample speech may contain multiple different phonemes, and each phoneme typically has 4 formants. When extracting the formants of the phonemes in the sample speech, a phoneme that does not have 4 formants cannot serve as the target phoneme.
Step 103: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values.
It should be noted that the target phoneme in the evidence speech likewise has 4 formants, and the computed result comprises 4 formant frequency deviation values.
Step 104: When the deviation values meet the preset formant frequency deviation standard, output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker.
Step 105: When a deviation value does not meet the preset formant frequency deviation standard, calculate the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard.
Step 106: Determine whether the difference is within the preset range. If so, adjust the audio time range of the target phoneme in the sample speech and return to step 103 until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
It should be noted that when a deviation value does not meet the preset formant frequency deviation standard, there is a gap between the deviation value and the preset standard value. Computing the difference between the deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard quantifies this gap, making the degree of deviation easy to grasp at a glance.
It is then judged whether the difference is within the preset range. If so, the gap between the deviation value and the preset standard value is small and may have been caused by the speaker's mood fluctuations or other such factors; it can be narrowed by appropriately adjusting the audio time range of the target phoneme in the sample speech until the deviation values meet the preset standard, whereupon the same-speaker voiceprint identification result is output. If the difference is not within the preset range, the gap is large, and the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers is output.
The voiceprint identification method in this embodiment of the application thus computes the formant frequency deviations, narrows slight, mood-induced deviations by adjusting the audio time range of the target phoneme in the sample speech, and outputs the same-speaker result when the adjusted deviations meet the standard or the different-speaker result otherwise, solving the technical problem that slight deviations from the preset range cause the same speaker to be misidentified as different speakers.
For ease of understanding, referring to FIG. 2, another embodiment of a voiceprint identification method provided by this application includes:
Step 201: Obtain the evidence speech.
It should be noted that the evidence speech can be obtained from a voiceprint identification database.
Step 202: Extract 4 formants of the target phoneme in the evidence speech.
It should be noted that the evidence speech may contain multiple different phonemes, and each phoneme typically has 4 formants. When extracting the formants of the phonemes in the evidence speech, a phoneme that does not have 4 formants cannot serve as the target phoneme. The formants can be extracted by linear predictive coding.
Step 203: Obtain a sample speech.
It should be noted that step 203 and step 201 can be performed simultaneously or one after the other.
Step 204: Extract 4 formants of the target phoneme in the sample speech.
It should be noted that the formants can be extracted by linear predictive coding, where the target phoneme in the sample speech and the target phoneme in the evidence speech are the same phoneme.
Step 205: Calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values.
It should be noted that the target phoneme in the evidence speech likewise has 4 formants, and the computed result comprises 4 formant frequency deviation values. The calculation of the formant frequency deviation belongs to the prior art, so its specific calculation process is not elaborated here.
Step 206: When the deviation values meet the preset formant frequency deviation standard, output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker.
It should be noted that the preset formant frequency deviation standard includes: when the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech satisfy: the first formant frequency deviation is less than 12%, the second formant frequency deviation is less than 9%, the third formant frequency deviation is less than 5%-6%, and the fourth formant frequency deviation is less than 5%-6%, it is determined that the target phoneme in the sample speech and the target phoneme in the evidence speech come from the same speaker.
When the first formant frequency deviation is less than 12%, the second is less than 9%, the third is less than 5%-6%, and the fourth is less than 5%-6%, the identification result that the target phoneme in the evidence speech and the target phoneme in the sample speech belong to the same speaker is output. For example, suppose the four formant frequency deviations between the target phoneme of the evidence speech and the target phoneme of the sample speech are computed as F1: 8%, F2: 7%, F3: 5%, F4: 4%. Since F1, F2, F3, and F4 all meet the preset formant frequency deviation standard, the result that the target phoneme of the evidence speech and the target phoneme of the sample speech belong to the same speaker is output.
Step 207: When a deviation value does not meet the preset formant frequency deviation standard, calculate the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard.
It should be noted that when any deviation value does not meet the preset formant frequency deviation standard, the difference between that deviation value and the upper limit of the formant frequency deviation corresponding to it in the preset standard is computed. This difference quantifies the gap between the deviation value and the preset standard value, making the degree of deviation easy to grasp at a glance. For example, suppose the four computed deviation values are F1: 11%, F2: 8%, F3: 5%, F4: 7%. Comparing these four values with the preset formant frequency deviation standard shows that the fourth of them does not meet the standard, i.e. F4: 7% > 6%. The difference between the deviation value F4 and the upper limit of the fourth formant frequency deviation in the preset standard (6%) is therefore computed: 7% - 6% = 1%, a small degree of deviation.
Step 208: Determine whether the difference is within the preset range. If so, adjust the audio time range of the target phoneme in the sample speech and return to step 205 until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
It should be noted that it is judged whether the difference is within the preset range. If so, the gap between the deviation value and the preset standard value is small and may have been caused by the speaker's mood fluctuations or other such factors; it can be narrowed by appropriately adjusting the audio time range of the target phoneme in the sample speech until the deviation values meet the preset standard, whereupon the same-speaker voiceprint identification result is output. For example, following the example above, suppose the preset range is 0%-2%. The computed difference of 1% falls within the preset range, indicating a small degree of deviation between the deviation value and the preset standard value. The audio time range of the target phoneme in the sample speech is then adjusted appropriately; the adjustment can be determined case by case, for instance shrinking the segment by 2 ms on the audio time axis of the sample speech. The method returns to step 205 and recomputes the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech to obtain 4 deviation values, repeating until the deviation values meet the preset formant frequency deviation standard, whereupon the same-speaker voiceprint identification result is output.
If the difference is not within the preset range, the gap between the deviation value and the preset standard value is large, and the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers is output. For example, suppose the preset range is 0%-2% and the four computed deviation values are F1: 11%, F2: 8%, F3: 5%, F4: 10%. Comparing these four values with the preset formant frequency deviation standard shows that the fourth of them does not meet the standard, i.e. F4: 10% > 6%. The difference between the deviation value F4 and the upper limit of the fourth formant frequency deviation in the preset standard (6%) is therefore computed: 10% - 6% = 4%. This difference is not within the preset range and departs considerably from the preset standard, so the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers is output.
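As a hypothetical end-to-end run of the sketches introduced earlier in this document, the snippet below feeds a synthetic stand-in waveform through the pipeline; in practice the segment would come from the recorded sample speech and the evidence formants from the questioned recording, and all numeric values here are illustrative.

```python
import numpy as np

fs = 16_000
t = np.arange(int(0.03 * fs)) / fs                      # a 30 ms pseudo-phoneme frame
segment = sum(np.sin(2 * np.pi * f * t) for f in (700, 1200, 2500, 3400))
evidence_f = [695.0, 1180.0, 2480.0, 3420.0]            # illustrative evidence F1..F4
print(identify(segment, evidence_f, fs))                # prints the verdict
```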
For ease of understanding, referring to FIG. 3, an embodiment of a voiceprint identification device provided by this application includes:
a first acquisition module 301, used to acquire a sample speech;
a first extraction module 302, used to extract 4 formants of the target phoneme in the sample speech;
a first calculation module 303, used to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
an output module 304, used to output, when the deviation values meet the preset formant frequency deviation standard, the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
a second calculation module 305, used to calculate, when a deviation value does not meet the preset formant frequency deviation standard, the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
a judgment module 306, used to judge whether the difference is within the preset range and, if so, adjust the audio time range of the target phoneme in the sample speech and trigger the first calculation module 303 until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
Further, the device includes:
a second acquisition module 307, used to acquire the evidence speech;
a second extraction module 308, used to extract the 4 formants of the target phoneme in the evidence speech.
Further, the first extraction module 302 is specifically used to:
extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
This application provides an embodiment of voiceprint identification equipment; the equipment includes a processor and a memory;
the memory is used to store program code and transmit the program code to the processor;
the processor is used to execute, according to the instructions in the program code, the voiceprint identification method of the foregoing method embodiments.
This application provides an embodiment of a computer-readable storage medium; the computer-readable storage medium is used to store program code, and the program code is used to execute the voiceprint identification method of the foregoing method embodiments.
This application also provides an embodiment of a computer program product including instructions which, when run on a computer, cause the computer to execute the voiceprint identification method of the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection displayed or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the various embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit can be implemented either in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (10)

  1. A voiceprint identification method, characterized by including:
    obtaining a sample speech;
    extracting 4 formants of the target phoneme in the sample speech;
    calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
    when the deviation values meet the preset formant frequency deviation standard, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
    when a deviation value does not meet the preset formant frequency deviation standard, calculating the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
    judging whether the difference is within a preset range; if so, adjusting the audio time range of the target phoneme in the sample speech and returning to the step of calculating the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values, until the deviation values meet the preset formant frequency deviation standard, then outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, outputting the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to different speakers.
  2. The voiceprint identification method according to claim 1, characterized in that before the calculating of the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values, the method further includes:
    obtaining the evidence speech;
    extracting the 4 formants of the target phoneme in the evidence speech.
  3. The voiceprint identification method according to claim 1, characterized in that the preset formant frequency deviation standard includes:
    when the formant frequency deviations between the target phoneme in the sample speech and the target phoneme in the evidence speech satisfy: the first formant frequency deviation is less than 12%, the second formant frequency deviation is less than 9%, the third formant frequency deviation is less than 5%-6%, and the fourth formant frequency deviation is less than 5%-6%, determining that the target phoneme in the sample speech and the target phoneme in the evidence speech come from the same speaker.
  4. The voiceprint identification method according to claim 1, characterized in that extracting the 4 formants of the target phoneme in the sample speech includes:
    extracting the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  5. A voiceprint identification device, characterized by including:
    a first acquisition module, used to acquire a sample speech;
    a first extraction module, used to extract 4 formants of the target phoneme in the sample speech;
    a first calculation module, used to calculate the formant frequency deviation between each formant of the target phoneme in the sample speech and each formant of the target phoneme in the evidence speech to obtain 4 deviation values;
    an output module, used to output, when the deviation values meet the preset formant frequency deviation standard, the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker;
    a second calculation module, used to calculate, when a deviation value does not meet the preset formant frequency deviation standard, the difference between the deviation value that does not meet the preset standard and the upper limit of the formant frequency deviation corresponding to it in the preset standard;
    a judgment module, used to judge whether the difference is within a preset range and, if so, adjust the audio time range of the target phoneme in the sample speech and trigger the first calculation module until the deviation values meet the preset formant frequency deviation standard, then output the voiceprint identification result that the target phoneme in the sample speech and the target phoneme in the evidence speech belong to the same speaker; otherwise, output the voiceprint identification result that they belong to different speakers.
  6. The voiceprint identification device according to claim 5, characterized by further including:
    a second acquisition module, used to acquire the evidence speech;
    a second extraction module, used to extract the 4 formants of the target phoneme in the evidence speech.
  7. The voiceprint identification device according to claim 5, characterized in that the first extraction module is specifically used to:
    extract the 4 formants of the target phoneme in the sample speech based on linear predictive coding.
  8. Voiceprint identification equipment, characterized in that the equipment includes a processor and a memory;
    the memory is used to store program code and transmit the program code to the processor;
    the processor is used to execute, according to the instructions in the program code, the voiceprint identification method of any one of claims 1-4.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store program code, the program code being used to execute the voiceprint identification method of any one of claims 1-4.
  10. A computer program product including instructions, characterized in that, when run on a computer, it causes the computer to execute the voiceprint identification method of any one of claims 1-4.
PCT/CN2019/127977 2019-12-24 2019-12-24 Voiceprint identification method and related apparatus WO2021127998A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/127977 WO2021127998A1 (zh) 2019-12-24 2019-12-24 Voiceprint identification method and related apparatus
CN201980003350.4A CN111108551B (zh) 2019-12-24 2019-12-24 Voiceprint identification method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/127977 WO2021127998A1 (zh) 2019-12-24 2019-12-24 Voiceprint identification method and related apparatus

Publications (1)

Publication Number Publication Date
WO2021127998A1 true WO2021127998A1 (zh) 2021-07-01

Family

ID=70427468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127977 WO2021127998A1 (zh) 2019-12-24 2019-12-24 Voiceprint identification method and related apparatus

Country Status (2)

Country Link
CN (1) CN111108551B (zh)
WO (1) WO2021127998A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566189A (zh) * 2022-04-28 2022-05-31 之江实验室 Speech emotion recognition method and system based on three-dimensional deep feature fusion

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627421B (zh) * 2020-05-13 2023-08-11 广州国音智能科技有限公司 Speech recognition method, apparatus, device, and computer-readable storage medium
CN113409796B (zh) * 2021-05-11 2022-09-27 武汉大晟极科技有限公司 Speech identity verification method based on long-term formant measurement

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1530925A (zh) * 2003-03-16 2004-09-22 广东省深圳市人民检察院 Cantonese voiceprint identification method
US20050171774A1 (en) * 2004-01-30 2005-08-04 Applebaum Ted H. Features and techniques for speaker authentication
CN103714826A (zh) * 2013-12-18 2014-04-09 安徽讯飞智元信息科技有限公司 Automatic formant matching method for voiceprint identification
CN109979466A (zh) * 2019-03-21 2019-07-05 广州国音智能科技有限公司 Voiceprint identity identification method, device, and computer-readable storage medium
CN110164454A (zh) * 2019-05-24 2019-08-23 广州国音智能科技有限公司 Audio identity discrimination method and device based on formant deviation

Also Published As

Publication number Publication date
CN111108551A (zh) 2020-05-05
CN111108551B (zh) 2023-05-26

Similar Documents

Publication Publication Date Title
CN108305615B (zh) Object recognition method and device, storage medium, and terminal
WO2021127998A1 (zh) Voiceprint identification method and related apparatus
US9536547B2 (en) Speaker change detection device and speaker change detection method
US9947324B2 (en) Speaker identification method and speaker identification device
WO2021128003A1 (zh) Voiceprint identity identification method and related apparatus
Singh et al. Speaker's voice characteristics and similarity measurement using Euclidean distances
US9047866B2 (en) System and method for identification of a speaker by phonograms of spontaneous oral speech and by using formant equalization using one vowel phoneme type
US10490194B2 (en) Speech processing apparatus, speech processing method and computer-readable medium
KR100631786B1 (ko) Method and apparatus for recognizing speech by measuring the reliability of frames
US20160180852A1 (en) Speaker identification using spatial information
CN108780645B (zh) Speaker verification computer system with text-transcript adaptation of a universal background model and an enrolled speaker model
JP2006079079A (ja) Distributed speech recognition system and method
CN106847259B (zh) Method for screening and optimizing audio keyword templates
KR101616112B1 (ko) Speaker separation system and method using voice feature vectors
US9792898B2 (en) Concurrent segmentation of multiple similar vocalizations
CN106910495A (zh) Audio classification system and method applied to abnormal sound detection
CN111863033A (zh) Training method, apparatus, server, and storage medium for an audio quality recognition model
Hughes et al. The individual and the system: assessing the stability of the output of a semi-automatic forensic voice comparison system
JP5803125B2 (ja) Device and program for detecting a suppressed state from speech
WO2021127976A1 (zh) Method and apparatus for selecting phonemes available for comparison
JP2013235050A (ja) Information processing apparatus and method, and program
Tomchuk Spectral masking in MFCC calculation for noisy speech
CN114678040B (zh) Voice consistency detection method, apparatus, device, and storage medium
Ganchev et al. Performance evaluation for voice conversion systems
JP2021001988A (ja) Speech recognition device, speech recognition method, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19957180

Country of ref document: EP

Kind code of ref document: A1