CN113409796A - Voice identity verification method based on long-term formant measurement - Google Patents
Voice identity verification method based on long-term formant measurement
- Publication number
- CN113409796A (application CN202110510987.1A)
- Authority
- CN
- China
- Prior art keywords
- voice
- long
- frequency
- distance
- resonance peak
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
The invention provides a voice identity verification method based on long-term formant measurement. Given a known voice file from a single speaker, the distance between the long-term formant data of any two speech segments in the known voice file is calculated to obtain an upper limit distance D_max and a lower limit distance D_min. When a material-testing (questioned) voice is acquired, the long-term formant distance D between the material-testing voice and the known voice file is calculated. If D is smaller than the lower limit distance, the material-testing voice and the known voice file are judged to have the same identity; if D is larger than the upper limit distance, they are judged not to have the same identity; if D lies between the upper and lower limits, a hypothesis test method is used to verify identity. By acquiring the long-term formants of the voice files and verifying voice identity from the long-term formant distance combined with a hypothesis test method, the invention improves verification precision.
Description
Technical Field
The invention belongs to the technical field of voice detection, and particularly relates to a voice identity verification method based on long-term formant measurement.
Background
Formants are important features in voiceprint identification: they not only provide a reference for distinguishing consonants and vowels, but also carry the personality characteristics of the speaker. Formant frequency is affected by vocal tract length (a longer vocal tract produces lower vowel formants) and by the relative proportions of the various parts of the vocal tract.
There are many ways to measure formant frequency. The most classical is to measure the center frequency values of the formants of different vowels. However, the correlation between the formant frequencies of different vowels, and between different formants, is weak, and this reduces identification accuracy. Another approach is dynamic analysis: individuals leave traces of their specific articulatory movement patterns when they speak, and these traces reflect the personality characteristics of the speaker. But formant dynamics are affected by both segmental and prosodic context, and this method requires further study of the differences between speaking contexts.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice identity verification method based on long-term formant measurement that improves verification precision.
The technical scheme adopted by the invention for solving the technical problems is as follows: a voice identity verification method based on long-term formant measurement comprises the following steps:
given a known voice file from the same speaker, calculating the distance between the long-term formant data of any two speech segments in the known voice file to obtain an upper limit distance D_max and a lower limit distance D_min;
when a material-testing voice is collected, calculating the long-term formant distance D between the material-testing voice and the known voice file, and making the following judgment:
when D < D_min, the material-testing voice is judged to have the same identity as the known voice file, i.e., the same speaker;
when D > D_max, the material-testing voice is judged not to have the same identity as the known voice file, i.e., a different speaker;
when D_min ≤ D ≤ D_max, a hypothesis test method is used to verify identity.
According to the above method, the upper limit distance D_max and the lower limit distance D_min are calculated as follows:
let the measurement data of the 4 long-term formants of 2 speech segments in the known voice file be the matrices X1 and Y1, where

$$X_1=\begin{bmatrix}x_{F11}&x_{F12}&\cdots&x_{F1m}\\x_{F21}&x_{F22}&\cdots&x_{F2m}\\x_{F31}&x_{F32}&\cdots&x_{F3m}\\x_{F41}&x_{F42}&\cdots&x_{F4m}\end{bmatrix},\qquad
Y_1=\begin{bmatrix}y_{F11}&y_{F12}&\cdots&y_{F1n}\\y_{F21}&y_{F22}&\cdots&y_{F2n}\\y_{F31}&y_{F32}&\cdots&y_{F3n}\\y_{F41}&y_{F42}&\cdots&y_{F4n}\end{bmatrix}$$

In the formula, x_F11 … x_F1m are the first to m-th formant data at the first frequency of the first speech segment, x_F21 … x_F2m the data at the second frequency, x_F31 … x_F3m the data at the third frequency, and x_F41 … x_F4m the data at the fourth frequency of the first speech segment; y_F11 … y_F1n are the first to n-th formant data at the first frequency of the second speech segment, and y_F21 … y_F2n, y_F31 … y_F3n and y_F41 … y_F4n the data at the second, third and fourth frequencies of the second speech segment; the first to fourth frequencies are sequentially increasing or sequentially decreasing frequencies;
the column data of each long-term formant measurement matrix form the formant vectors x_i = [x_F1i, x_F2i, x_F3i, x_F4i] and y_i = [y_F1i, y_F2i, y_F3i, y_F4i]. The center positions of the m vectors of the first speech segment and of the n vectors of the second speech segment are calculated separately: let x_c = [x_F1c, x_F2c, x_F3c, x_F4c] be the center of the X1 matrix and y_c = [y_F1c, y_F2c, y_F3c, y_F4c] the center of the Y1 matrix. Following the clustering principle, the sum of the distances from x_c to the x_i is minimized, so x_c and y_c are obtained by solving the minimum problems

$$x_c=\arg\min_{x}\sum_{i=1}^{m}\lVert x-x_i\rVert,\qquad y_c=\arg\min_{y}\sum_{i=1}^{n}\lVert y-y_i\rVert$$
On the basis of x_c and y_c, the long-term formant distance D* of the two speech segments is calculated as the Euclidean distance between the centers:

$$D^{*}=\lVert x_c-y_c\rVert=\sqrt{\sum_{k=1}^{4}\left(x_{Fkc}-y_{Fkc}\right)^{2}}$$
The distance between every two speech segments of different parts of the known voice file is calculated respectively by the above method, and the maximum and minimum values are taken as the upper limit distance D_max and the lower limit distance D_min.
According to the method, the long-term formant distance D of the material-testing voice is calculated by the same method as the long-term formant distance D* of two speech segments in the known voice file.
According to the method, the hypothesis testing method is a t-test, which comprises the following specific steps:
let the measurement data of the 4 long-term formants of the material-testing voice be the matrix Z1, where

$$Z_1=\begin{bmatrix}z_{F11}&z_{F12}&\cdots&z_{F1j}\\z_{F21}&z_{F22}&\cdots&z_{F2j}\\z_{F31}&z_{F32}&\cdots&z_{F3j}\\z_{F41}&z_{F42}&\cdots&z_{F4j}\end{bmatrix}$$

In the formula, z_F11 … z_F1j are the first to j-th formant data at the first frequency of the material-testing voice, and z_F21 … z_F2j, z_F31 … z_F3j and z_F41 … z_F4j the data at the second, third and fourth frequencies of the material-testing voice;
let x_F21, x_F22, …, x_F2m follow the normal distribution N(u, σ²) and z_F21, z_F22, …, z_F2j follow N(v, σ²). According to statistical theory, under the hypothesis u = v the formant data at the second frequency yield the statistic

$$T=\frac{\bar{x}_{F2}-\bar{z}_{F2}}{S_w\sqrt{\frac{1}{m}+\frac{1}{j}}}\sim t(m+j-2),\qquad S_w^{2}=\frac{(m-1)S_x^{2}+(j-1)S_z^{2}}{m+j-2}$$

where x̄_F2 (xF2mean) and S_x are the mean and standard deviation of x_F21, x_F22, …, x_F2m, and z̄_F2 (zF2mean) and S_z are the mean and standard deviation of z_F21, z_F22, …, z_F2j;
given a significance level α, when

$$|T|<t_{\alpha/2}(m+j-2)$$

the material-testing voice is judged to have the same identity as the known voice file; otherwise, the material-testing voice is judged not to have the same identity as the known voice file.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for voice identity verification based on long-term formant measurements when executing the computer program.
A non-transitory computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for voice identity verification based on long-term formant measurements.
The invention has the beneficial effects that: by acquiring the long-term formants of the voice files and verifying voice identity from the long-term formant distance combined with a hypothesis test method, verification precision can be improved.
Drawings
FIG. 1 shows the frequencies of the formants LTF2 and LTF3 at vowels in different contexts of speech.
FIG. 2 is a formant spectrum.
FIG. 3 is a plot of formant F1-F3 frequency versus time.
FIG. 4 is a frequency distribution plot of formants F1-F3.
FIG. 5 is a graph of the long-term formant LTF2 and LTF3 distribution for different speakers.
FIG. 6 is a graph of the long term formant LTF2 and LTF3 distribution for the same speaker.
FIG. 7 is a t-test confidence interval distribution map.
FIG. 8 is a flowchart of a method according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following specific examples and figures.
FIG. 1 depicts the frequency variation of LTF2 and LTF3 for multiple test subjects in both natural-speech and reading contexts; it can be seen that the variation of the mean frequencies of LTF2 and LTF3 across the two contexts is very small. LTF4 is more strongly affected by telephone transmission bandwidth, so the invention selects LTF2 and LTF3 as the basis for voiceprint verification.
As shown in FIG. 2, the positions of the vowel formants F1-F4 in the voice file to be identified are determined by combining linear predictive analysis with manual correction; along the spectral envelope from low to high frequency they are F1 to F4 in sequence. Because formant F4 is unstable, formants F1-F3 are used as the identification basis. The curves of formant F1-F3 frequency versus time are shown in FIG. 3, and from the frequency and occurrence probability of each formant the long-term formant F1-F3 frequency distribution curves of FIG. 4 can be drawn.
From these long-term formant frequency distribution characteristics, different speakers have different LTF2 and LTF3 distributions. FIG. 5 depicts the vowel LTF2 and LTF3 distributions of 2 test subjects: the two solid lines are the LTF2 distributions of the two subjects, and the two dashed lines are their LTF3 distributions. The LTF2 and LTF3 of the 2 subjects differ not only in mean frequency but also markedly in the interval covered by the distribution curve and in the curve shape.
The vowel LTF2 and LTF3 distributions measured in different contexts for the same speaker are shown in FIG. 6, where the two solid lines are the LTF2 distributions and the two dashed lines the LTF3 distributions measured in different contexts. For the same speaker, LTF2 and LTF3 in different contexts show not only small variation in mean frequency but also very similar distribution intervals and shapes. Therefore a probabilistic hypothesis test can be applied to measured LTF2 and LTF3 data to determine whether a questioned speech sample comes from the target speaker.
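The long-term formant distributions discussed above (FIGS. 4-6) are, in essence, normalized histograms of a formant's frame-by-frame frequency measurements pooled over a whole recording. A minimal sketch of how such a distribution could be computed, assuming the frame-wise formant track is already available from a tracker such as the linear predictive analysis mentioned above (the function name and the `bin_hz` bin width are illustrative, not from the patent):

```python
from collections import Counter

def ltf_distribution(formant_track_hz, bin_hz=50):
    """Long-term formant distribution: pool a formant's frame-by-frame
    frequency measurements (Hz) over the whole recording into bins of
    width bin_hz, then normalize so the probabilities sum to 1."""
    counts = Counter(int(f // bin_hz) * bin_hz for f in formant_track_hz)
    total = sum(counts.values())
    return {lo: n / total for lo, n in sorted(counts.items())}

# Illustrative LTF2 track concentrated around 1500 Hz
track = [1490, 1502, 1511, 1549, 1560, 1610]
dist = ltf_distribution(track, bin_hz=100)
```

Plotting `dist` for each formant would give distribution curves of the kind shown in FIGS. 4-6.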
Based on the above principle and research, the present invention provides a voice identity verification method based on long-term formant measurement, as shown in fig. 8, the method includes:
S1, given a known voice file from the same speaker, calculate the distance between the long-term formant data of any two speech segments in the known voice file to obtain an upper limit distance D_max and a lower limit distance D_min.
Let the measurement data of the 4 long-term formants of 2 speech segments in the known voice file be the matrices X1 and Y1, where

$$X_1=\begin{bmatrix}x_{F11}&x_{F12}&\cdots&x_{F1m}\\x_{F21}&x_{F22}&\cdots&x_{F2m}\\x_{F31}&x_{F32}&\cdots&x_{F3m}\\x_{F41}&x_{F42}&\cdots&x_{F4m}\end{bmatrix},\qquad
Y_1=\begin{bmatrix}y_{F11}&y_{F12}&\cdots&y_{F1n}\\y_{F21}&y_{F22}&\cdots&y_{F2n}\\y_{F31}&y_{F32}&\cdots&y_{F3n}\\y_{F41}&y_{F42}&\cdots&y_{F4n}\end{bmatrix}$$

In the formula, x_F11 … x_F1m are the first to m-th formant data at the first frequency of the first speech segment, x_F21 … x_F2m the data at the second frequency, x_F31 … x_F3m the data at the third frequency, and x_F41 … x_F4m the data at the fourth frequency of the first speech segment; y_F11 … y_F1n are the first to n-th formant data at the first frequency of the second speech segment, and y_F21 … y_F2n, y_F31 … y_F3n and y_F41 … y_F4n the data at the second, third and fourth frequencies of the second speech segment; the first to fourth frequencies are sequentially increasing or sequentially decreasing frequencies.
The column data of each long-term formant measurement matrix form the formant vectors x_i = [x_F1i, x_F2i, x_F3i, x_F4i] and y_i = [y_F1i, y_F2i, y_F3i, y_F4i]. The center positions of the m vectors of the first speech segment and of the n vectors of the second speech segment are calculated separately: let x_c = [x_F1c, x_F2c, x_F3c, x_F4c] be the center of the X1 matrix and y_c = [y_F1c, y_F2c, y_F3c, y_F4c] the center of the Y1 matrix. Following the clustering principle, the sum of the distances from x_c to the x_i is minimized, so x_c and y_c are obtained by solving the minimum problems

$$x_c=\arg\min_{x}\sum_{i=1}^{m}\lVert x-x_i\rVert,\qquad y_c=\arg\min_{y}\sum_{i=1}^{n}\lVert y-y_i\rVert$$
On the basis of x_c and y_c, the long-term formant distance D* of the two speech segments is calculated as the Euclidean distance between the centers:

$$D^{*}=\lVert x_c-y_c\rVert=\sqrt{\sum_{k=1}^{4}\left(x_{Fkc}-y_{Fkc}\right)^{2}}$$
The distance between every two speech segments of different parts of the known voice file is calculated respectively by the above method, and the maximum and minimum values are taken as the upper limit distance D_max and the lower limit distance D_min.
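The center-and-distance computation of step S1 can be sketched as follows. The center defined by the minimum-sum-of-distances ("clustering principle") criterion is the geometric median, which has no closed form; a standard Weiszfeld fixed-point iteration is used here as one way to solve the minimum problem. This is a sketch, not the patent's implementation, and the function names are illustrative:

```python
import math

def center(vectors, iters=200, eps=1e-9):
    """Geometric median: the point whose summed Euclidean distance to
    all formant vectors is minimal (Weiszfeld fixed-point iteration)."""
    dim = len(vectors[0])
    # start from the arithmetic mean
    c = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    for _ in range(iters):
        num, den = [0.0] * dim, 0.0
        for v in vectors:
            d = math.dist(v, c)
            if d < eps:            # center coincides with a data point
                return list(v)
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * v[k]
        c = [s / den for s in num]
    return c

def formant_distance(x_vecs, y_vecs):
    """Long-term formant distance D*: Euclidean distance between the
    centers of two sets of 4-dimensional formant vectors."""
    return math.dist(center(x_vecs), center(y_vecs))

def distance_bounds(segments):
    """D_max and D_min over all pairs of known same-speaker segments."""
    ds = [formant_distance(a, b)
          for i, a in enumerate(segments) for b in segments[i + 1:]]
    return max(ds), min(ds)

# Illustrative 4-dimensional formant vectors for two speech segments
xs = [(0, 0, 0, 0), (2, 0, 0, 0), (0, 2, 0, 0), (2, 2, 0, 0)]
ys = [(3, 0, 0, 0), (5, 0, 0, 0), (3, 2, 0, 0), (5, 2, 0, 0)]
```

By symmetry the centers of `xs` and `ys` are (1, 1, 0, 0) and (4, 1, 0, 0), so `formant_distance(xs, ys)` converges to 3. Using the arithmetic mean instead of the geometric median would be a simpler (but different) choice of center.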
S2, when a material-testing voice is collected, calculate the long-term formant distance D between the material-testing voice and the known voice file; the method for calculating D is the same as that for the long-term formant distance D* between two speech segments of the known voice file.
The following judgment is then made: when D < D_min, the material-testing voice is judged to have the same identity as the known voice file, i.e., the same speaker; when D > D_max, the material-testing voice is judged not to have the same identity as the known voice file, i.e., a different speaker; when D_min ≤ D ≤ D_max, a hypothesis test is used to verify identity.
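The three-way decision of step S2 can be sketched directly; the function and label names below are illustrative, not from the patent:

```python
def decide(d, d_min, d_max):
    """Three-way identity decision on the long-term formant distance D."""
    if d < d_min:
        return "same speaker"         # identity accepted outright
    if d > d_max:
        return "different speaker"    # identity rejected outright
    return "hypothesis test needed"   # D_min <= D <= D_max: fall back to the t-test
```

Only the middle branch requires the t-test described next.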
The hypothesis testing method is a t testing method, and comprises the following specific steps:
Let the measurement data of the 4 long-term formants of the material-testing voice be the matrix Z1, where

$$Z_1=\begin{bmatrix}z_{F11}&z_{F12}&\cdots&z_{F1j}\\z_{F21}&z_{F22}&\cdots&z_{F2j}\\z_{F31}&z_{F32}&\cdots&z_{F3j}\\z_{F41}&z_{F42}&\cdots&z_{F4j}\end{bmatrix}$$

In the formula, z_F11 … z_F1j are the first to j-th formant data at the first frequency of the material-testing voice, and z_F21 … z_F2j, z_F31 … z_F3j and z_F41 … z_F4j the data at the second, third and fourth frequencies of the material-testing voice.
Let x_F21, x_F22, …, x_F2m follow the normal distribution N(u, σ²) and z_F21, z_F22, …, z_F2j follow N(v, σ²). According to statistical theory, the sample means of the formant data at the second frequency are distributed as

$$\bar{x}_{F2}\sim N\!\left(u,\frac{\sigma^{2}}{m}\right),\qquad \bar{z}_{F2}\sim N\!\left(v,\frac{\sigma^{2}}{j}\right)$$

where x̄_F2 (xF2mean) and S_x are the mean and standard deviation of x_F21, x_F22, …, x_F2m, and z̄_F2 (zF2mean) and S_z are the mean and standard deviation of z_F21, z_F22, …, z_F2j.
Two hypotheses are made: H0: u = v and H1: u ≠ v. If H0 holds, then

$$T=\frac{\bar{x}_{F2}-\bar{z}_{F2}}{S_w\sqrt{\frac{1}{m}+\frac{1}{j}}}\sim t(m+j-2),\qquad S_w^{2}=\frac{(m-1)S_x^{2}+(j-1)S_z^{2}}{m+j-2}$$
When performing the hypothesis test of H0 against H1, a significance level α is given; when

$$|T|<t_{\alpha/2}(m+j-2)$$
the material-testing voice is judged to have the same identity as the known voice file, i.e., H0 is accepted; otherwise, the material-testing voice is judged not to have the same identity as the known voice file, i.e., H0 is rejected.
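A minimal sketch of this pooled two-sample t-test using only the standard library; the critical value must be supplied from a t-table, e.g. about 2.306 for a two-sided test at α = 0.05 with m = j = 5 (df = 8). The function names and sample data are illustrative, not from the patent:

```python
import math
from statistics import mean, stdev

def pooled_t(x, z):
    """Two-sample t statistic with pooled standard deviation S_w;
    degrees of freedom are m + j - 2."""
    m, j = len(x), len(z)
    sw2 = ((m - 1) * stdev(x) ** 2 + (j - 1) * stdev(z) ** 2) / (m + j - 2)
    t = (mean(x) - mean(z)) / math.sqrt(sw2 * (1.0 / m + 1.0 / j))
    return t, m + j - 2

def same_identity(x, z, t_crit):
    """Accept H0 (same speaker) when |T| < t_crit = t_{alpha/2}(m + j - 2)."""
    t, _ = pooled_t(x, z)
    return abs(t) < t_crit

# Illustrative LTF2 measurements (Hz) for a known and a questioned sample
x = [1200, 1210, 1190, 1205, 1195]
z = [1202, 1208, 1193, 1207, 1196]
```

With these samples the means differ by about 1 Hz against a pooled spread of several Hz, so H0 is accepted; replacing `z` with measurements around 1400 Hz would drive |T| far past the critical value and reject H0.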
As shown in FIG. 7, for two test materials to be considered to come from the same speaker at the 95% confidence level, the long-term formants measured from the two files must satisfy the inequality

$$\left|\bar{x}_{F2}-\bar{z}_{F2}\right|<c,\qquad c=t_{0.05}(m+j-2)\,S_w\sqrt{\frac{1}{m}+\frac{1}{j}}$$

where t_{0.05}(m+j-2) is the t-distribution critical value corresponding to m + j - 2 degrees of freedom at reliability level α = 0.05. As can be seen from FIG. 7, the larger 1 - α is, the greater the confidence that H0 holds. Since the t distribution is symmetric about the vertical axis, let 2β = 1 - α, i.e., β = (1 - α)/2.
When performing the identity hypothesis test on the two samples, in order to determine a reasonable value range for β, upper and lower limits of β can be determined from comparisons among the known samples. When β exceeds the upper limit, the tested materials are considered to have identity; when β falls below the lower limit, identity of the tested materials is rejected; when β lies between the two limits, a comprehensive judgment needs to be made in conjunction with the distance D.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the voice identity verification method based on the long-term formant measurement when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for voice identity verification based on long-term formant measurements.
The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.
Claims (6)
1. A voice identity verification method based on long-term formant measurement, characterized by comprising the following steps:
given a known voice file from the same speaker, calculating the distance between the long-term formant data of any two speech segments in the known voice file to obtain an upper limit distance D_max and a lower limit distance D_min;
when a material-testing voice is collected, calculating the long-term formant distance D between the material-testing voice and the known voice file, and making the following judgment:
when D < D_min, judging that the material-testing voice has the same identity as the known voice file, i.e., the same speaker;
when D > D_max, judging that the material-testing voice does not have the same identity as the known voice file, i.e., a different speaker;
when D_min ≤ D ≤ D_max, verifying identity using a hypothesis test method.
2. The method of claim 1, wherein the upper limit distance D_max and the lower limit distance D_min are calculated as follows:
let the measurement data of the 4 long-term formants of 2 speech segments in the known voice file be the matrices X1 and Y1, where

$$X_1=\begin{bmatrix}x_{F11}&x_{F12}&\cdots&x_{F1m}\\x_{F21}&x_{F22}&\cdots&x_{F2m}\\x_{F31}&x_{F32}&\cdots&x_{F3m}\\x_{F41}&x_{F42}&\cdots&x_{F4m}\end{bmatrix},\qquad
Y_1=\begin{bmatrix}y_{F11}&y_{F12}&\cdots&y_{F1n}\\y_{F21}&y_{F22}&\cdots&y_{F2n}\\y_{F31}&y_{F32}&\cdots&y_{F3n}\\y_{F41}&y_{F42}&\cdots&y_{F4n}\end{bmatrix}$$

In the formula, x_F11 … x_F1m are the first to m-th formant data at the first frequency of the first speech segment, x_F21 … x_F2m the data at the second frequency, x_F31 … x_F3m the data at the third frequency, and x_F41 … x_F4m the data at the fourth frequency of the first speech segment; y_F11 … y_F1n are the first to n-th formant data at the first frequency of the second speech segment, and y_F21 … y_F2n, y_F31 … y_F3n and y_F41 … y_F4n the data at the second, third and fourth frequencies of the second speech segment; the first to fourth frequencies are sequentially increasing or sequentially decreasing frequencies;
the column data of each long-term formant measurement matrix form the formant vectors x_i = [x_F1i, x_F2i, x_F3i, x_F4i] and y_i = [y_F1i, y_F2i, y_F3i, y_F4i]. The center positions of the m vectors of the first speech segment and of the n vectors of the second speech segment are calculated separately: let x_c = [x_F1c, x_F2c, x_F3c, x_F4c] be the center of the X1 matrix and y_c = [y_F1c, y_F2c, y_F3c, y_F4c] the center of the Y1 matrix. Following the clustering principle, the sum of the distances from x_c to the x_i is minimized, so x_c and y_c are obtained by solving the minimum problems

$$x_c=\arg\min_{x}\sum_{i=1}^{m}\lVert x-x_i\rVert,\qquad y_c=\arg\min_{y}\sum_{i=1}^{n}\lVert y-y_i\rVert$$
On the basis of x_c and y_c, the long-term formant distance D* of the two speech segments is calculated as the Euclidean distance between the centers:

$$D^{*}=\lVert x_c-y_c\rVert=\sqrt{\sum_{k=1}^{4}\left(x_{Fkc}-y_{Fkc}\right)^{2}}$$
3. The method of claim 2, wherein the long-term formant distance D of the material-testing voice is calculated by the same method as the long-term formant distance D* of two speech segments in the known voice file.
4. The method of claim 3, wherein the hypothesis testing method is a t-test with the following specific steps:
let the measurement data of the 4 long-term formants of the material-testing voice be the matrix Z1, where

$$Z_1=\begin{bmatrix}z_{F11}&z_{F12}&\cdots&z_{F1j}\\z_{F21}&z_{F22}&\cdots&z_{F2j}\\z_{F31}&z_{F32}&\cdots&z_{F3j}\\z_{F41}&z_{F42}&\cdots&z_{F4j}\end{bmatrix}$$

In the formula, z_F11 … z_F1j are the first to j-th formant data at the first frequency of the material-testing voice, and z_F21 … z_F2j, z_F31 … z_F3j and z_F41 … z_F4j the data at the second, third and fourth frequencies of the material-testing voice;
let x_F21, x_F22, …, x_F2m follow the normal distribution N(u, σ²) and z_F21, z_F22, …, z_F2j follow N(v, σ²). According to statistical theory, under the hypothesis u = v the formant data at the second frequency yield the statistic

$$T=\frac{\bar{x}_{F2}-\bar{z}_{F2}}{S_w\sqrt{\frac{1}{m}+\frac{1}{j}}}\sim t(m+j-2),\qquad S_w^{2}=\frac{(m-1)S_x^{2}+(j-1)S_z^{2}}{m+j-2}$$

where x̄_F2 (xF2mean) and S_x are the mean and standard deviation of x_F21, x_F22, …, x_F2m, and z̄_F2 (zF2mean) and S_z are the mean and standard deviation of z_F21, z_F22, …, z_F2j;
given a significance level α, when

$$|T|<t_{\alpha/2}(m+j-2)$$

the material-testing voice is judged to have the same identity as the known voice file; otherwise, the material-testing voice is judged not to have the same identity as the known voice file.
5. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein: the processor, when executing the computer program, performs the steps of the method for verifying speech identity based on long-term formant measurements according to any one of claims 1 to 4.
6. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when being executed by a processor realizes the steps of a method for verifying speech identity based on long-term formant measurements according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110510987.1A CN113409796B (en) | 2021-05-11 | 2021-05-11 | Voice identity verification method based on long-term formant measurement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110510987.1A CN113409796B (en) | 2021-05-11 | 2021-05-11 | Voice identity verification method based on long-term formant measurement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113409796A true CN113409796A (en) | 2021-09-17 |
CN113409796B CN113409796B (en) | 2022-09-27 |
Family
ID=77678249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110510987.1A Active CN113409796B (en) | 2021-05-11 | 2021-05-11 | Voice identity verification method based on long-term formant measurement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113409796B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016209888A1 (en) * | 2015-06-22 | 2016-12-29 | Rita Singh | Processing speech signals in voice-based profiling |
CN111105815A (en) * | 2020-01-20 | 2020-05-05 | 深圳震有科技股份有限公司 | Auxiliary detection method and device based on voice activity detection and storage medium |
CN111108552A (en) * | 2019-12-24 | 2020-05-05 | 广州国音智能科技有限公司 | Voiceprint identity identification method and related device |
CN111108551A (en) * | 2019-12-24 | 2020-05-05 | 广州国音智能科技有限公司 | Voiceprint identification method and related device |
Non-Patent Citations (5)
Title
- ERICA GOLD et al., "Examining correlations between phonetic parameters: Implications for forensic speaker comparison", The Journal of the Acoustical Society of America
- ERICA GOLD et al., "Examining long-term formant distributions as a discriminant in forensic speaker comparisons under a likelihood ratio framework", The Journal of the Acoustical Society of America
- MICHAEL JESSEN et al., "Long-term formant distribution as a forensic-phonetic feature", The Journal of the Acoustical Society of America
- CAO Honglin, "Application of long-term formant distribution features in voiceprint identification", Chinese Journal of Forensic Sciences
- JIA Liwen, "Changes in the long-term formant distribution of speech at increased volume and their influence on voiceprint identification", Journal of Shanxi Datong University (Natural Science Edition)
Also Published As
Publication number | Publication date |
---|---|
CN113409796B (en) | 2022-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yu et al. | Spoofing detection in automatic speaker verification systems using DNN classifiers and dynamic acoustic features | |
US9536547B2 (en) | Speaker change detection device and speaker change detection method | |
Becker et al. | Forensic speaker verification using formant features and Gaussian mixture models. | |
US20210125603A1 (en) | Acoustic model training method, speech recognition method, apparatus, device and medium | |
CN101136199B (en) | Voice data processing method and equipment | |
Mandasari et al. | Quality measure functions for calibration of speaker recognition systems in various duration conditions | |
US8271283B2 (en) | Method and apparatus for recognizing speech by measuring confidence levels of respective frames | |
US20090313016A1 (en) | System and Method for Detecting Repeated Patterns in Dialog Systems | |
Jin et al. | Cute: A concatenative method for voice conversion using exemplar-based unit selection | |
Ferragne et al. | Vowel systems and accent similarity in the British Isles: Exploiting multidimensional acoustic distances in phonetics | |
CN113409796B (en) | Voice identity verification method based on long-term formant measurement | |
US20230178099A1 (en) | Using optimal articulatory event-types for computer analysis of speech | |
WO2002029785A1 (en) | Method, apparatus, and system for speaker verification based on orthogonal gaussian mixture model (gmm) | |
Vair et al. | Loquendo-Politecnico di torino's 2006 NIST speaker recognition evaluation system. | |
Laskowski et al. | Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum | |
Kinoshita et al. | Sub-band cepstral distance as an alternative to formants: Quantitative evidence from a forensic comparison experiment | |
Arcienega et al. | Pitch-dependent GMMs for text-independent speaker recognition systems. | |
CN113705671A (en) | Speaker identification method and system based on text related information perception | |
Dusan | On the relevance of some spectral and temporal patterns for vowel classification | |
Al-Manie et al. | Automatic speech segmentation using the Arabic phonetic database | |
Nath et al. | Composite feature selection method based on spoken word and speaker recognition | |
Andreev et al. | Attacking the problem of continuous speech segmentation into basic units | |
Xie et al. | Kurtosis normalization after short-time gaussianization for robust speaker verification | |
Tomar | Discriminant feature space transformations for automatic speech recognition | |
Nath et al. | Feature Selection Method for Speaker Recognition using Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||