WO2019075965A1 - Identity verification method and apparatus based on spectrogram and phoneme retrieval - Google Patents

Identity verification method and apparatus based on spectrogram and phoneme retrieval

Info

Publication number
WO2019075965A1
WO2019075965A1 · PCT/CN2018/075774 · CN2018075774W
Authority
WO
WIPO (PCT)
Prior art keywords
phoneme
vowel
identity
audio file
sample audio
Prior art date
Application number
PCT/CN2018/075774
Other languages
English (en)
French (fr)
Inventor
晏青
Original Assignee
深圳势必可赢科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳势必可赢科技有限公司 filed Critical 深圳势必可赢科技有限公司
Publication of WO2019075965A1 publication Critical patent/WO2019075965A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The invention relates to the field of speech recognition, and in particular to an identity verification method and apparatus based on spectrograms and phoneme retrieval.
  • After adulthood, the human voice can remain relatively stable for a long time. Experiments have shown that whether a speaker deliberately imitates another person's voice and tone or speaks in a soft whisper, the voiceprint remains the same even when the imitation is lifelike. Based on these two characteristics of the voiceprint, investigators can compare the voiceprint of a criminal obtained during an investigation with the voiceprint of a suspect using voiceprint identification technology, quickly identify the criminal, and provide reliable evidence for solving the case.
  • The existing voiceprint identity verification method mainly consists of manually searching, one by one, for similar vowels in the spectrogram displayed for the recorded audio and comparing the voiceprint features one at a time.
  • The biggest disadvantage of this method is that finding similar vowels takes a great deal of time and effort, and the same syllables must also be labeled manually for statistics.
  • In practical voiceprint identification, the high requirements on the comparison of voiceprint features lead to repeated comparison of different vowels and vowel combinations; traditional manual searching therefore wastes a great deal of labor, and a single phoneme retrieval function cannot provide valid identification results for voiceprint identification casework.
  • The invention solves the technical problem of searching for and locating phonemes in practical voiceprint identification, and displays the phonemes visually to improve the identification efficiency of case handlers.
  • The invention provides an identity verification method and apparatus based on a spectrogram and phoneme retrieval, which solve the technical problem of searching for and locating phonemes in practical voiceprint identification and display the phonemes visually to improve the identification efficiency of case handlers.
  • The invention provides an identity verification method based on a spectrogram and phoneme retrieval, comprising:
  • Acquiring the spectrogram corresponding to the sample audio file specifically includes:
  • acquiring spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
  • constructing the corresponding spectrogram according to the spectral parameters.
  • Acquiring the speech feature parameters of the sample audio file specifically includes:
  • Constructing the phoneme recognition model, inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval, and obtaining matching phonemes specifically includes:
  • inputting the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtaining matching phonemes according to the probability distribution.
  • Marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes specifically includes:
  • The invention provides an identity verification apparatus based on a spectrogram and phoneme retrieval, comprising:
  • a first acquiring unit configured to acquire the spectrogram corresponding to the sample audio file;
  • a second acquiring unit configured to acquire the speech feature parameters of the sample audio file;
  • a phoneme retrieval unit configured to construct a phoneme recognition model and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
  • an identity verification unit configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • The first acquiring unit specifically includes:
  • a parameter acquiring subunit configured to acquire the spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
  • a spectrogram construction subunit configured to construct the corresponding spectrogram according to the spectral parameters.
  • The second acquiring unit specifically includes:
  • a speech feature parameter acquiring subunit configured to acquire the Mel-frequency cepstral coefficients of the sample audio file.
  • The phoneme retrieval unit specifically includes:
  • a phoneme recognition model construction subunit configured to input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
  • a phoneme retrieval subunit configured to input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval and obtain matching phonemes according to the probability distribution.
  • The identity verification unit specifically includes:
  • a marking subunit configured to mark the matching phonemes on the spectrogram and obtain vowels or vowel combinations having the same mark;
  • a first judging subunit configured to judge whether the speech features of the first group of vowels or vowel combinations having the same mark match;
  • a second judging subunit configured to judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity.
  • Compared with the prior art, the present invention has the following advantages:
  • The invention provides an identity verification method based on a spectrogram and phoneme retrieval, comprising: acquiring a spectrogram corresponding to a sample audio file; acquiring speech feature parameters of the sample audio file; constructing a phoneme recognition model and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes; and marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • By constructing a phoneme recognition model, the phonemes in the sample audio file that meet the requirements are retrieved, the matching phonemes are compared against the spectrogram corresponding to the sample audio file, and the identity of the person to be identified corresponding to the sample audio file is determined.
  • The phoneme recognition model retrieves multiple matching phonemes, which improves the accuracy of the comparison, solves the technical problem of searching for and locating phonemes in practical voiceprint identification, and displays the phonemes visually, improving the identification efficiency of case handlers.
  • FIG. 1 is a schematic flowchart of one embodiment of an identity verification method based on a spectrogram and phoneme retrieval according to the present invention;
  • FIG. 2 is a schematic flowchart of another embodiment of an identity verification method based on a spectrogram and phoneme retrieval according to the present invention;
  • FIG. 3 is a schematic structural diagram of one embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval according to the present invention;
  • FIG. 4 is a schematic structural diagram of another embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval according to the present invention.
  • The embodiments of the invention provide an identity verification method and apparatus based on a spectrogram and phoneme retrieval, which solve the technical problem of searching for and locating phonemes in practical voiceprint identification and display the phonemes visually to improve the identification efficiency of case handlers.
  • An embodiment of the present invention provides one embodiment of an identity verification method based on a spectrogram and phoneme retrieval, including:
  • Sample audio is collected by any recording terminal to form a sample audio file, and the spectrogram corresponding to the sample audio file is acquired.
  • A phoneme recognition model is constructed, and the speech feature parameters are input into the phoneme recognition model for phoneme retrieval to obtain matching phonemes.
  • The matching phonemes are marked on the spectrogram to obtain vowels or vowel combinations having the same mark, an identity check is performed on the vowels or vowel combinations having the same mark, and it is determined whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • The obtained matching phonemes are marked on the spectrogram, an identity check is performed on them, and it is determined whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • The phonemes in the sample audio file that meet the requirements are retrieved, and the matching phonemes are compared against the spectrogram corresponding to the sample audio file to determine the identity of the person to be identified corresponding to the sample audio file.
  • This is more accurate than manual comparison, and the phoneme recognition model retrieves multiple matching phonemes, which further improves the accuracy of the comparison and solves the technical problem of searching for and locating phonemes in practical voiceprint identification.
  • Displaying the phonemes visually improves the identification efficiency of case handlers.
  • The above is one embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention.
  • Another embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention is described below.
  • An embodiment of the present invention provides another embodiment of an identity verification method based on a spectrogram and phoneme retrieval, including:
  • acquiring spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
  • Sample audio is collected by any recording terminal to form a sample audio file, and the spectral parameters of the sample audio file are acquired, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type.
  • Mel-frequency cepstral coefficients (MFCC) are features widely used in automatic speech and speaker recognition.
  • The preset phoneme dictionary, the preset acoustic model, and the preset phoneme language model are input into the phoneme recognizer to construct the phoneme recognition model, where the preset acoustic model is the speech model of a person whose identity has already been established.
  • The preset phoneme language model is preset according to the language type of the person to be identified.
  • The Mel-frequency cepstral coefficients are input into the phoneme recognition model for phoneme retrieval, and matching phonemes are obtained according to the probability distribution.
  • The matching phonemes are marked on the spectrogram to obtain vowels or vowel combinations having the same mark.
  • The human voice is also shaped by the speaker's own physiology, such as the size of the nasal cavity, pharyngeal cavity, and oral cavity, and therefore has its own formant regions.
  • By exploiting changes in the shape and size of these resonating spaces (for example, changing the shape of the throat and mouth), we can change the formants of the sound.
  • The reason we can distinguish different voices and vowels is mainly the positions at which their formants are distributed.
  • The speech features of the first group of vowels or vowel combinations having the same mark are judged. If the speech features match, the type of the matched vowel or vowel combination is determined and step 2044 is performed. If the speech features do not match, whether the speech features of the next group of vowels or vowel combinations having the same mark match is judged.
  • The types of matched vowels or vowel combinations are counted to obtain the number of types of matched vowels or vowel combinations, which is compared with the preset required quantity. If the number of types of matched vowels or vowel combinations reaches the preset required quantity, it is determined that the identity verification of the person to be identified corresponding to the sample audio file passes; if it does not, it is determined that the identity verification does not pass.
  • The above is another embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention.
  • One embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention is described below.
  • The present invention provides one embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval, including:
  • a first acquiring unit 301 configured to acquire the spectrogram corresponding to the sample audio file;
  • a second acquiring unit 302 configured to acquire the speech feature parameters of the sample audio file;
  • a phoneme retrieval unit 303 configured to construct a phoneme recognition model and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
  • an identity verification unit 304 configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • The above is one embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention.
  • Another embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention is described below.
  • The present invention provides another embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval, including:
  • a first acquiring unit 401 configured to acquire the spectrogram corresponding to the sample audio file.
  • The first acquiring unit 401 specifically includes:
  • a parameter acquiring subunit 4011 configured to acquire the spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
  • a spectrogram construction subunit 4012 configured to construct the corresponding spectrogram according to the spectral parameters.
  • a second acquiring unit 402 configured to acquire the speech feature parameters of the sample audio file.
  • The second acquiring unit 402 specifically includes:
  • a speech feature parameter acquiring subunit 4021 configured to acquire the Mel-frequency cepstral coefficients of the sample audio file.
  • a phoneme retrieval unit 403 configured to construct a phoneme recognition model and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes.
  • The phoneme retrieval unit 403 specifically includes:
  • a phoneme recognition model construction subunit 4031 configured to input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
  • a phoneme retrieval subunit 4032 configured to input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval and obtain matching phonemes according to the probability distribution.
  • an identity verification unit 404 configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  • The identity verification unit 404 specifically includes:
  • a marking subunit 4041 configured to mark the matching phonemes on the spectrogram and obtain vowels or vowel combinations having the same mark;
  • an analysis subunit 4042 configured to analyze the formant characteristics of the vowels or vowel combinations having the same mark;
  • a first judging subunit 4043 configured to judge whether the speech features of the first group of vowels or vowel combinations having the same mark match;
  • a second judging subunit 4044 configured to judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An identity verification method and apparatus based on spectrograms and phoneme retrieval. The method comprises: acquiring a spectrogram corresponding to a sample audio file (101); acquiring speech feature parameters of the sample audio file (102); constructing a phoneme recognition model and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes (103); and marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes (104). The method solves the technical problem of searching for and locating phonemes in practical voiceprint identification, displays the phonemes visually, and improves the identification efficiency of case handlers.

Description

Identity verification method and apparatus based on spectrogram and phoneme retrieval
This application claims priority to Chinese patent application No. 201710971618.6, filed with the Chinese Patent Office on October 18, 2017 and entitled "一种基于语谱图和音素检索的身份同一性检验方法及装置" (Identity verification method and apparatus based on spectrogram and phoneme retrieval), the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of speech recognition, and in particular to an identity verification method and apparatus based on spectrograms and phoneme retrieval.
Background
After adulthood, a person's voice remains relatively stable for a long time. Experiments have shown that whether a speaker deliberately imitates another person's voice and tone or speaks in a soft whisper, the voiceprint remains the same even when the imitation is lifelike. Based on these two characteristics of the voiceprint, investigators can compare the voiceprint of a criminal obtained during an investigation with the voiceprint of a suspect using voiceprint identification technology, quickly identify the criminal, and provide reliable evidence for solving the case.
The existing voiceprint identity verification method mainly consists of manually searching, one by one, for similar vowels in the spectrogram displayed for the recorded audio and comparing the voiceprint features one at a time. The biggest drawback of this method is that finding similar vowels takes a great deal of time and effort, and the same syllables must also be labeled manually for statistics. In practical voiceprint identification, the high requirements on the comparison of voiceprint features mean that different vowels and vowel combinations are compared repeatedly; traditional manual searching therefore wastes a great deal of labor, and a single phoneme retrieval function cannot provide valid identification results for voiceprint identification casework. The present invention solves the technical problem of searching for and locating phonemes in practical voiceprint identification, displays the phonemes visually, and improves the identification efficiency of case handlers.
Summary of the Invention
The present invention provides an identity verification method and apparatus based on spectrograms and phoneme retrieval, which solve the technical problem of searching for and locating phonemes in practical voiceprint identification, display the phonemes visually, and improve the identification efficiency of case handlers.
The present invention provides an identity verification method based on a spectrogram and phoneme retrieval, comprising:
acquiring a spectrogram corresponding to a sample audio file;
acquiring speech feature parameters of the sample audio file;
constructing a phoneme recognition model, and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes.
Preferably, acquiring the spectrogram corresponding to the sample audio file specifically comprises:
acquiring spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
constructing the corresponding spectrogram according to the spectral parameters.
Preferably, acquiring the speech feature parameters of the sample audio file specifically comprises:
acquiring Mel-frequency cepstral coefficients of the sample audio file.
Preferably, constructing the phoneme recognition model and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes specifically comprises:
inputting a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
inputting the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtaining matching phonemes according to the probability distribution.
Preferably, marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes specifically comprises:
marking the matching phonemes on the spectrogram, and obtaining vowels or vowel combinations having the same mark;
analyzing the formant characteristics of the vowels or vowel combinations having the same mark;
judging whether the speech features of a first group of the vowels or vowel combinations having the same mark match;
if so, determining the type of the matched vowel or vowel combination and proceeding to the next step;
if not, judging whether the speech features of the next group of vowels or vowel combinations having the same mark match;
judging whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
if so, determining that the identity verification of the person to be identified corresponding to the sample audio file passes.
The present invention provides an identity verification apparatus based on a spectrogram and phoneme retrieval, comprising:
a first acquiring unit, configured to acquire a spectrogram corresponding to a sample audio file;
a second acquiring unit, configured to acquire speech feature parameters of the sample audio file;
a phoneme retrieval unit, configured to construct a phoneme recognition model, and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
an identity verification unit, configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
Preferably, the first acquiring unit specifically comprises:
a parameter acquiring subunit, configured to acquire spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
a spectrogram construction subunit, configured to construct the corresponding spectrogram according to the spectral parameters.
Preferably, the second acquiring unit specifically comprises:
a speech feature parameter acquiring subunit, configured to acquire Mel-frequency cepstral coefficients of the sample audio file.
Preferably, the phoneme retrieval unit specifically comprises:
a phoneme recognition model construction subunit, configured to input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
a phoneme retrieval subunit, configured to input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtain matching phonemes according to the probability distribution.
Preferably, the identity verification unit specifically comprises:
a marking subunit, configured to mark the matching phonemes on the spectrogram, and obtain vowels or vowel combinations having the same mark;
an analysis subunit, configured to analyze the formant characteristics of the vowels or vowel combinations having the same mark;
a first judging subunit, configured to judge whether the speech features of a first group of the vowels or vowel combinations having the same mark match;
if so, determine the type of the matched vowel or vowel combination and proceed to the next step;
if not, judge whether the speech features of the next group of vowels or vowel combinations having the same mark match;
a second judging subunit, configured to judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
if so, determine that the identity verification of the person to be identified corresponding to the sample audio file passes.
As can be seen from the above technical solutions, the present invention has the following advantages:
The present invention provides an identity verification method based on a spectrogram and phoneme retrieval, comprising: acquiring a spectrogram corresponding to a sample audio file; acquiring speech feature parameters of the sample audio file; constructing a phoneme recognition model, and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes; marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes.
In the present invention, a phoneme recognition model is constructed to retrieve the phonemes in the sample audio file that meet the requirements, and the matching phonemes are compared against the spectrogram corresponding to the sample audio file to determine the identity of the person to be identified. This is more accurate than manual comparison, and because the phoneme recognition model retrieves multiple matching phonemes, the accuracy of the comparison is further improved. The technical problem of searching for and locating phonemes in practical voiceprint identification is solved, the phonemes are displayed visually, and the identification efficiency of case handlers is improved.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and other drawings can be obtained from the provided drawings by a person of ordinary skill in the art without creative effort.
FIG. 1 is a schematic flowchart of one embodiment of an identity verification method based on a spectrogram and phoneme retrieval according to the present invention;
FIG. 2 is a schematic flowchart of another embodiment of an identity verification method based on a spectrogram and phoneme retrieval according to the present invention;
FIG. 3 is a schematic structural diagram of one embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval according to the present invention;
FIG. 4 is a schematic structural diagram of another embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval according to the present invention.
Detailed Description of the Embodiments
The embodiments of the present invention provide an identity verification method and apparatus based on spectrograms and phoneme retrieval, which solve the technical problem of searching for and locating phonemes in practical voiceprint identification, display the phonemes visually, and improve the identification efficiency of case handlers.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to FIG. 1, an embodiment of the present invention provides one embodiment of an identity verification method based on a spectrogram and phoneme retrieval, comprising:
101. Acquire a spectrogram corresponding to a sample audio file.
It should be noted that sample audio is collected by any recording terminal to form a sample audio file, and the spectrogram corresponding to the sample audio file is acquired.
102. Acquire speech feature parameters of the sample audio file.
It should be noted that the speech feature parameters of the sample audio file are extracted.
103. Construct a phoneme recognition model, and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes.
It should be noted that a phoneme recognition model is constructed, and the speech feature parameters are input into the phoneme recognition model for phoneme retrieval to obtain matching phonemes.
104. Mark the matching phonemes on the spectrogram, obtain vowels or vowel combinations having the same mark, perform an identity check on the vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
It should be noted that the obtained matching phonemes are marked on the spectrogram, an identity check is performed on them, and it is determined whether the identity verification of the person to be identified corresponding to the sample audio file passes.
In this embodiment of the present invention, a phoneme recognition model is constructed to retrieve the phonemes in the sample audio file that meet the requirements, and the matching phonemes are compared against the spectrogram corresponding to the sample audio file to determine the identity of the person to be identified. This is more accurate than manual comparison, and because the phoneme recognition model retrieves multiple matching phonemes, the accuracy of the comparison is further improved. The technical problem of searching for and locating phonemes in practical voiceprint identification is solved, the phonemes are displayed visually, and the identification efficiency of case handlers is improved.
The above describes one embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention. Another embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention is described below.
Referring to FIG. 2, an embodiment of the present invention provides another embodiment of an identity verification method based on a spectrogram and phoneme retrieval, comprising:
2011. Acquire spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type.
It should be noted that sample audio is collected by any recording terminal to form a sample audio file, and the spectral parameters of the sample audio file are acquired, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type.
2012. Construct the corresponding spectrogram according to the spectral parameters.
It should be noted that the corresponding spectrogram is constructed from the five spectral parameters obtained above.
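As an illustration of this step, the sketch below builds a log-magnitude spectrogram from a sample audio file in Python. It is a minimal example and not the patent's implementation: the pre-emphasis coefficient stands in for the high-frequency boost coefficient, the dB floor stands in for the displayed dynamic range, and the frame length, hop size, and window type are assumed values.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import get_window, stft

def build_spectrogram(path, window_type="hann", frame_len=512, hop=128,
                      preemph=0.97, dynamic_range_db=70.0):
    """Compute a log-magnitude spectrogram of a sample audio file.

    The parameters are illustrative stand-ins for the spectral parameters
    named in the text (window type, high-frequency boost, dynamic range).
    """
    sr, audio = wavfile.read(path)
    audio = audio.astype(np.float64)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                 # mix down to mono
    # High-frequency boost via a first-order pre-emphasis filter.
    audio = np.append(audio[0], audio[1:] - preemph * audio[:-1])
    # Windowed short-time Fourier transform.
    win = get_window(window_type, frame_len)
    _, _, spec = stft(audio, fs=sr, window=win, nperseg=frame_len,
                      noverlap=frame_len - hop)
    mag_db = 20.0 * np.log10(np.abs(spec) + 1e-10)
    # Clip everything more than `dynamic_range_db` below the peak.
    mag_db = np.maximum(mag_db, mag_db.max() - dynamic_range_db)
    return sr, mag_db
```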
202. Acquire Mel-frequency cepstral coefficients of the sample audio file.
It should be noted that the Mel-frequency cepstral coefficients of the sample audio file are acquired.
Mel-frequency cepstral coefficients (MFCC) are features widely used in automatic speech and speaker recognition.
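A brief sketch of the MFCC extraction step, assuming librosa as the feature-extraction toolkit (the patent does not name one):

```python
import librosa

def extract_mfcc(path, n_mfcc=13):
    """Extract MFCC features from the sample audio file; 13 coefficients
    per frame is a common but assumed choice."""
    audio, sr = librosa.load(path, sr=None)          # keep the native rate
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T                                    # shape: (frames, n_mfcc)
```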
2031. Input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct a phoneme recognition model.
It should be noted that a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model are input into a phoneme recognizer to construct the phoneme recognition model, where the preset acoustic model is the speech model of a person whose identity has already been established, and the preset phoneme language model is preset to match the language type of the person to be identified.
2032. Input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtain matching phonemes according to the probability distribution.
It should be noted that the Mel-frequency cepstral coefficients are input into the phoneme recognition model for phoneme retrieval, and matching phonemes are obtained according to the probability distribution.
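The patent does not specify the internals of the recognizer, so the following sketch only illustrates the general idea: a frame-level acoustic model supplies phoneme posteriors, a phoneme bigram matrix plays the role of the preset phoneme language model, and the highest-probability phoneme is kept whenever it clears a confidence threshold. The `acoustic_model` callable, the bigram matrix, the phoneme inventory, and the threshold are hypothetical stand-ins.

```python
import numpy as np

def retrieve_phonemes(mfcc, acoustic_model, phoneme_bigram, phonemes,
                      min_prob=0.5):
    """Frame-wise phoneme retrieval sketch.

    `acoustic_model` is assumed to map an MFCC matrix (frames x coeffs)
    to a (frames x len(phonemes)) matrix of posterior probabilities.
    """
    post = acoustic_model(mfcc)
    prev = np.full(len(phonemes), 1.0 / len(phonemes))
    hits = []
    for t, frame in enumerate(post):
        # Combine the acoustic score with the phoneme language model.
        score = frame * (prev @ phoneme_bigram)
        score /= score.sum()
        best = int(np.argmax(score))
        if score[best] >= min_prob:
            # Collapse consecutive frames of the same phoneme.
            if not hits or hits[-1][0] != phonemes[best]:
                hits.append((phonemes[best], t))     # (phoneme, start frame)
        prev = score
    return hits
```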
2041. Mark the matching phonemes on the spectrogram, and obtain vowels or vowel combinations having the same mark.
It should be noted that the matching phonemes are marked on the spectrogram, and vowels or vowel combinations having the same mark are obtained.
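One possible way to realize this marking step, assuming matplotlib and reusing the spectrogram and retrieval sketches above (the vowel inventory shown is illustrative):

```python
import matplotlib.pyplot as plt

def mark_phonemes(mag_db, sr, hop, hits, vowels=("a", "o", "e", "i", "u")):
    """Overlay retrieved phoneme labels on the spectrogram so that vowels
    with the same label can be located visually."""
    frames = mag_db.shape[1]
    extent = [0.0, frames * hop / sr, 0.0, sr / 2.0]
    plt.imshow(mag_db, origin="lower", aspect="auto", extent=extent,
               cmap="magma")
    for phone, frame in hits:                        # hits: (phoneme, frame)
        t = frame * hop / sr
        color = "cyan" if phone in vowels else "white"
        plt.axvline(t, color=color, linewidth=0.5)
        plt.text(t, 0.95 * sr / 2.0, phone, color=color, fontsize=8,
                 rotation=90, va="top")
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()
```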
2042. Analyze the formant characteristics of the vowels or vowel combinations having the same mark.
It should be noted that the formant characteristics of the vowels or vowel combinations having the same mark are analyzed.
In speech acoustics, the human voice is likewise shaped by the speaker's own physiology, such as the size of the nasal cavity, pharyngeal cavity, and oral cavity, and therefore has its own formant regions. By exploiting changes in the shape and size of these resonating spaces (for example, changing the shape of the throat and mouth), we can change the formants of the sound. The reason we can distinguish different voices and vowels is mainly the positions at which their formants are distributed.
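Formant positions for a marked vowel segment can be estimated with a standard LPC root-finding procedure. This is a textbook illustration of the kind of formant analysis referred to here, not the patent's own algorithm; the LPC order rule of thumb and the 90 Hz floor are assumed values.

```python
import numpy as np
import librosa
from scipy.signal import get_window

def estimate_formants(frame, sr, order=None):
    """Estimate the first few formant frequencies (Hz) of a vowel frame."""
    if order is None:
        order = 2 + sr // 1000                    # common rule of thumb
    frame = frame * get_window("hamming", len(frame))
    a = librosa.lpc(frame.astype(float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2.0 * np.pi))
    return freqs[freqs > 90.0][:4]                # roughly F1..F4
```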
2043. Judge whether the speech features of the first group of vowels or vowel combinations having the same mark match;
if so, determine the type of the matched vowel or vowel combination and proceed to the next step;
if not, judge whether the speech features of the next group of vowels or vowel combinations having the same mark match.
It should be noted that the speech features of the first group of vowels or vowel combinations having the same mark are judged. If the speech features match, the type of the matched vowel or vowel combination is determined and step 2044 is performed; if the speech features do not match, whether the speech features of the next group of vowels or vowel combinations having the same mark match is judged.
2044. Judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
if so, determine that the identity verification of the person to be identified corresponding to the sample audio file passes.
It should be noted that the types of matched vowels or vowel combinations are counted to obtain the number of types of matched vowels or vowel combinations, and this number is compared with the preset required quantity. If the number of types of matched vowels or vowel combinations reaches the preset required quantity, it is determined that the identity verification of the person to be identified corresponding to the sample audio file passes; if it does not reach the preset required quantity, it is determined that the identity verification of the person to be identified corresponding to the sample audio file does not pass.
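The counting-and-threshold decision can be written compactly; the threshold of three matched vowel types below is only an illustrative value for the preset required quantity.

```python
def identity_check(matched, required_types=3):
    """Decide the verification result from the matched vowels or vowel
    combinations, where `matched` is an iterable of labels (e.g. "a", "ai")
    whose formant features were judged to match."""
    kinds = set(matched)
    return len(kinds) >= required_types, sorted(kinds)

# Example: three distinct matched vowel types, so verification passes.
ok, kinds = identity_check(["a", "i", "a", "ei"])
print(ok, kinds)   # True ['a', 'ei', 'i']
```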
The above describes another embodiment of the identity verification method based on a spectrogram and phoneme retrieval provided by the present invention. One embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention is described below.
Referring to FIG. 3, the present invention provides one embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval, comprising:
a first acquiring unit 301, configured to acquire a spectrogram corresponding to a sample audio file;
a second acquiring unit 302, configured to acquire speech feature parameters of the sample audio file;
a phoneme retrieval unit 303, configured to construct a phoneme recognition model, and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
an identity verification unit 304, configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
The above describes one embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention. Another embodiment of the identity verification apparatus based on a spectrogram and phoneme retrieval provided by the present invention is described below.
Referring to FIG. 4, the present invention provides another embodiment of an identity verification apparatus based on a spectrogram and phoneme retrieval, comprising:
a first acquiring unit 401, configured to acquire a spectrogram corresponding to a sample audio file.
The first acquiring unit 401 specifically comprises:
a parameter acquiring subunit 4011, configured to acquire spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
a spectrogram construction subunit 4012, configured to construct the corresponding spectrogram according to the spectral parameters.
a second acquiring unit 402, configured to acquire speech feature parameters of the sample audio file.
The second acquiring unit 402 specifically comprises:
a speech feature parameter acquiring subunit 4021, configured to acquire Mel-frequency cepstral coefficients of the sample audio file.
a phoneme retrieval unit 403, configured to construct a phoneme recognition model, and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes.
The phoneme retrieval unit 403 specifically comprises:
a phoneme recognition model construction subunit 4031, configured to input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
a phoneme retrieval subunit 4032, configured to input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtain matching phonemes according to the probability distribution.
an identity verification unit 404, configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
The identity verification unit 404 specifically comprises:
a marking subunit 4041, configured to mark the matching phonemes on the spectrogram, and obtain vowels or vowel combinations having the same mark;
an analysis subunit 4042, configured to analyze the formant characteristics of the vowels or vowel combinations having the same mark;
a first judging subunit 4043, configured to judge whether the speech features of the first group of vowels or vowel combinations having the same mark match;
if so, determine the type of the matched vowel or vowel combination and proceed to the next step;
if not, judge whether the speech features of the next group of vowels or vowel combinations having the same mark match;
a second judging subunit 4044, configured to judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
if so, determine that the identity verification of the person to be identified corresponding to the sample audio file passes.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of the technical features therein can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. An identity verification method based on a spectrogram and phoneme retrieval, characterized by comprising:
    acquiring a spectrogram corresponding to a sample audio file;
    acquiring speech feature parameters of the sample audio file;
    constructing a phoneme recognition model, and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
    marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  2. The identity verification method based on a spectrogram and phoneme retrieval according to claim 1, characterized in that acquiring the spectrogram corresponding to the sample audio file specifically comprises:
    acquiring spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
    constructing the corresponding spectrogram according to the spectral parameters.
  3. The identity verification method based on a spectrogram and phoneme retrieval according to claim 2, characterized in that acquiring the speech feature parameters of the sample audio file specifically comprises:
    acquiring Mel-frequency cepstral coefficients of the sample audio file.
  4. The identity verification method based on a spectrogram and phoneme retrieval according to claim 3, characterized in that constructing the phoneme recognition model and inputting the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes specifically comprises:
    inputting a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
    inputting the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtaining matching phonemes according to the probability distribution.
  5. The identity verification method based on a spectrogram and phoneme retrieval according to claim 4, characterized in that marking the matching phonemes on the spectrogram, performing an identity check on vowels or vowel combinations having the same mark, and determining whether the identity verification of the person to be identified corresponding to the sample audio file passes specifically comprises:
    marking the matching phonemes on the spectrogram, and obtaining vowels or vowel combinations having the same mark;
    analyzing the formant characteristics of the vowels or vowel combinations having the same mark;
    judging whether the speech features of a first group of the vowels or vowel combinations having the same mark match;
    if so, determining the type of the matched vowel or vowel combination and proceeding to the next step;
    if not, judging whether the speech features of the next group of vowels or vowel combinations having the same mark match;
    judging whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
    if so, determining that the identity verification of the person to be identified corresponding to the sample audio file passes.
  6. An identity verification apparatus based on a spectrogram and phoneme retrieval, characterized by comprising:
    a first acquiring unit, configured to acquire a spectrogram corresponding to a sample audio file;
    a second acquiring unit, configured to acquire speech feature parameters of the sample audio file;
    a phoneme retrieval unit, configured to construct a phoneme recognition model, and input the speech feature parameters into the phoneme recognition model for phoneme retrieval to obtain matching phonemes;
    an identity verification unit, configured to mark the matching phonemes on the spectrogram, perform an identity check on vowels or vowel combinations having the same mark, and determine whether the identity verification of the person to be identified corresponding to the sample audio file passes.
  7. The identity verification apparatus based on a spectrogram and phoneme retrieval according to claim 6, characterized in that the first acquiring unit specifically comprises:
    a parameter acquiring subunit, configured to acquire spectral parameters of the sample audio file, including: bandwidth, dynamic range, attenuation coefficient, high-frequency boost coefficient, and window type;
    a spectrogram construction subunit, configured to construct the corresponding spectrogram according to the spectral parameters.
  8. The identity verification apparatus based on a spectrogram and phoneme retrieval according to claim 7, characterized in that the second acquiring unit specifically comprises:
    a speech feature parameter acquiring subunit, configured to acquire Mel-frequency cepstral coefficients of the sample audio file.
  9. The identity verification apparatus based on a spectrogram and phoneme retrieval according to claim 8, characterized in that the phoneme retrieval unit specifically comprises:
    a phoneme recognition model construction subunit, configured to input a preset phoneme dictionary, a preset acoustic model, and a preset phoneme language model into a phoneme recognizer to construct the phoneme recognition model;
    a phoneme retrieval subunit, configured to input the Mel-frequency cepstral coefficients into the phoneme recognition model for phoneme retrieval, and obtain matching phonemes according to the probability distribution.
  10. The identity verification apparatus based on a spectrogram and phoneme retrieval according to claim 9, characterized in that the identity verification unit specifically comprises:
    a marking subunit, configured to mark the matching phonemes on the spectrogram, and obtain vowels or vowel combinations having the same mark;
    an analysis subunit, configured to analyze the formant characteristics of the vowels or vowel combinations having the same mark;
    a first judging subunit, configured to judge whether the speech features of a first group of the vowels or vowel combinations having the same mark match;
    if so, determine the type of the matched vowel or vowel combination and proceed to the next step;
    if not, judge whether the speech features of the next group of vowels or vowel combinations having the same mark match;
    a second judging subunit, configured to judge whether the number of types of matched vowels or vowel combinations reaches a preset required quantity;
    if so, determine that the identity verification of the person to be identified corresponding to the sample audio file passes.
PCT/CN2018/075774 2017-10-18 2018-02-08 Identity verification method and apparatus based on spectrogram and phoneme retrieval WO2019075965A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710971618.6A CN107680601B (zh) 2017-10-18 2017-10-18 Identity verification method and apparatus based on spectrogram and phoneme retrieval
CN201710971618.6 2017-10-18

Publications (1)

Publication Number Publication Date
WO2019075965A1 true WO2019075965A1 (zh) 2019-04-25

Family

ID=61141447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075774 WO2019075965A1 (zh) 2017-10-18 2018-02-08 Identity verification method and apparatus based on spectrogram and phoneme retrieval

Country Status (2)

Country Link
CN (1) CN107680601B (zh)
WO (1) WO2019075965A1 (zh)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766417B (zh) * 2018-05-29 2019-05-17 广州国音科技有限公司 Identity verification method and apparatus based on automatic phoneme retrieval
CN109065023A (zh) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 Voice identification method, apparatus and device, and computer-readable storage medium
CN109087651B (zh) * 2018-09-05 2021-01-19 广州势必可赢网络科技有限公司 Voiceprint identification method, system and device based on video and spectrograms
CN109378004B (zh) * 2018-12-17 2022-05-27 广州势必可赢网络科技有限公司 Phoneme comparison method, apparatus and device, and computer-readable storage medium
CN109448733A (zh) * 2019-01-07 2019-03-08 广州势必可赢网络科技有限公司 Spectrogram generation method, system and related apparatus
CN109817223A (zh) * 2019-01-29 2019-05-28 广州势必可赢网络科技有限公司 Phoneme labeling method and apparatus based on audio fingerprints
CN109979466B (zh) * 2019-03-21 2021-09-17 广州国音智能科技有限公司 Voiceprint identity verification method and apparatus, and computer-readable storage medium
CN110164454B (zh) * 2019-05-24 2021-08-24 广州国音智能科技有限公司 Audio identity discrimination method and apparatus based on formant deviation
CN110570842B (zh) * 2019-10-25 2020-07-10 南京云白信息科技有限公司 Speech recognition method and system based on phoneme similarity and pronunciation accuracy
WO2021127976A1 (zh) * 2019-12-24 2021-07-01 广州国音智能科技有限公司 Method and apparatus for selecting phonemes suitable for comparison
CN111108552A (zh) * 2019-12-24 2020-05-05 广州国音智能科技有限公司 Voiceprint identity verification method and related apparatus
CN111640453B (zh) * 2020-05-13 2023-06-16 广州国音智能科技有限公司 Spectrogram matching method, apparatus and device, and computer-readable storage medium
CN112259086A (zh) * 2020-10-15 2021-01-22 杭州电子科技大学 Voice conversion method based on spectrogram synthesis
CN112133289B (zh) * 2020-11-24 2021-02-26 北京远鉴信息技术有限公司 Voiceprint identification model training and voiceprint identification method, apparatus, device and medium
CN112382300A (zh) * 2020-12-14 2021-02-19 北京远鉴信息技术有限公司 Voiceprint identification method, model training method, apparatus, device and storage medium
CN113921017A (zh) * 2021-12-14 2022-01-11 深圳市声扬科技有限公司 Voice identity verification method and apparatus, electronic device and storage medium
CN114255764B (zh) * 2022-02-28 2022-06-28 深圳市声扬科技有限公司 Audio information processing method and apparatus, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656069A (zh) * 2009-09-17 2010-02-24 陈拙夫 Chinese voice information communication system and communication method thereof
US20140035920A1 (en) * 2008-08-12 2014-02-06 Adobe Systems Incorporated Colorization of audio segments
US20140185862A1 (en) * 2012-12-21 2014-07-03 Digimarc Corporation Messaging by writing an image into a spectrogram
WO2015191140A2 (en) * 2014-03-24 2015-12-17 Taylor Thomas Jason Voice-key electronic commerce
CN106023986A (zh) * 2016-05-05 2016-10-12 河南理工大学 Speech recognition method based on sound-effect mode detection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100406307B1 (ko) * 2001-08-09 2003-11-19 삼성전자주식회사 Voice registration method and voice registration system, and voice recognition method and voice recognition system based thereon
CN103714826B (zh) * 2013-12-18 2016-08-17 讯飞智元信息科技有限公司 Automatic formant matching method for voiceprint identification
CN106710589B (zh) * 2016-12-28 2019-07-30 百度在线网络技术(北京)有限公司 Artificial-intelligence-based speech feature extraction method and apparatus
CN106920545B (zh) * 2017-03-21 2020-07-28 百度在线网络技术(北京)有限公司 Artificial-intelligence-based speech feature extraction method and apparatus
CN106952649A (zh) * 2017-05-14 2017-07-14 北京工业大学 Speaker recognition method based on convolutional neural networks and spectrograms


Also Published As

Publication number Publication date
CN107680601A (zh) 2018-02-09
CN107680601B (zh) 2019-02-01

Similar Documents

Publication Publication Date Title
WO2019075965A1 (zh) Identity verification method and apparatus based on spectrogram and phoneme retrieval
Dhingra et al. Isolated speech recognition using MFCC and DTW
CN103714826B (zh) Automatic formant matching method for voiceprint identification
US11727954B2 (en) Diagnostic techniques based on speech-sample alignment
CN106782517A (zh) Speech audio keyword filtering method and device
US11776561B2 (en) Diagnostic techniques based on speech models
AU2020234072B2 (en) Diagnostic techniques based on speech models
CN109273012A (zh) 一种基于说话人识别和数字语音识别的身份认证方法
Babu et al. Forensic speaker recognition system using machine learning
Barczewska et al. Detection of disfluencies in speech signal
CN109087651B (zh) 一种基于视频与语谱图的声纹鉴定方法、系统及设备
US20230306985A1 (en) Analyzing speech using speech models and segmentation based on acoustic features
US20230317099A1 (en) Analyzing speech using speech-sample alignment and segmentation based on acoustic features
Liu Word fragments identification using acoustic-prosodic features in conversational speech
Nath et al. Feature Selection Method for Speaker Recognition using Neural Network
Sigmund Search for keywords and vocal elements in audio recordings
Toledano et al. BioSec Multimodal Biometric Database in Text-Dependent Speaker Recognition.
Bansal et al. Speaker Adaptation on Hidden Markov Model using MFCC and Rasta-PLP and Comparative Study
Boon Mandarin Language Learning System for Nasal Voice User
Muniandy et al. Mandarin Language Learning System for Nasal Voice User
Fulop et al. Advanced time-frequency displays applied to forensic speaker identification
Asani An Enhanced Speech Recognition Algorithm Using Levinson-Durbin, DTW and Maximum Likelihood Classification
Ibrahim An educational text-dependent speaker recognition system
JPH01289997A (ja) Voice registration system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18867936

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 08.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18867936

Country of ref document: EP

Kind code of ref document: A1