GB2583420A - Speaker identification - Google Patents
Speaker identification
- Publication number
- GB2583420A (application GB2009795.2A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- speech
- voice biometric
- biometric process
- audio signal
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
        - G06F1/26—Power supply means, e.g. regulation thereof
          - G06F1/32—Means for saving power
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
        - G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
          - G06F21/31—User authentication
            - G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/18—Artificial neural networks; Connectionist approaches
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/22—Interactive procedures; Man-machine interfaces
          - G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
          - G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
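The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `identify_speaker` function, the `CascadeResult` record, the `CountingScorer` stub, the score scale, and both thresholds. The stub simply returns a fixed score, standing in for whatever voice biometric processes an implementation would actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Audio = Sequence[float]            # hypothetical representation of the audio signal
Scorer = Callable[[Audio], float]  # hypothetical: higher score = closer match


@dataclass
class CascadeResult:
    first_score: float
    second_score: Optional[float]  # None when the second stage never ran
    accepted: bool


def identify_speaker(
    audio: Audio,
    first_stage: Scorer,
    second_stage: Scorer,
    first_threshold: float,
    second_threshold: float,
) -> CascadeResult:
    """Run the second process only if the first makes a positive initial determination."""
    first_score = first_stage(audio)
    if first_score < first_threshold:
        # Initial determination is negative: the second process is never started.
        return CascadeResult(first_score, None, False)
    second_score = second_stage(audio)
    return CascadeResult(first_score, second_score, second_score >= second_threshold)


class CountingScorer:
    """Stub scorer (assumption): returns a fixed score and counts how often it is called."""

    def __init__(self, score: float) -> None:
        self.score = score
        self.calls = 0

    def __call__(self, audio: Audio) -> float:
        self.calls += 1
        return self.score


audio = [0.0, 0.1, -0.1]

# Case 1: the first stage rejects, so the second stage is skipped entirely.
skipped_second = CountingScorer(0.9)
rejected = identify_speaker(audio, CountingScorer(0.2), skipped_second, 0.5, 0.8)

# Case 2: the first stage passes, and the second stage makes the final decision.
used_second = CountingScorer(0.9)
accepted = identify_speaker(audio, CountingScorer(0.7), used_second, 0.5, 0.8)
```

In this sketch the final decision rests on the second score alone; the call counter on the stub makes it easy to confirm that the second process is not invoked when the first one rejects.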
Claims (32)
1. A method of speaker identification, comprising: receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.
2. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Acceptance Rate than the first voice biometric process.
3. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Rejection Rate than the first voice biometric process.
4. A method according to claim 1, wherein the second voice biometric process is configured to have a lower Equal Error Rate than the first voice biometric process.
5. A method according to any preceding claim, wherein the first voice biometric process is selected as a relatively low power process compared to the second voice biometric process.
6. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a result of the second voice biometric process.
7. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a fusion of a result of the first voice biometric process and a result of the second voice biometric process.
8. A method according to any preceding claim, wherein the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a method using machine learning techniques such as Deep Neural Nets (DNNs); and a method using a Support Vector Machine.
9. A method according to any preceding claim, wherein the second voice biometric process is selected from the following: a neural net process; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; and an i-vector process.
10. A method according to any preceding claim, wherein the first voice biometric process is performed in a first device and the second voice biometric process is performed in a second device remote from the first device.
11. A method according to any preceding claim, comprising maintaining the second voice biometric process in a low power state, and activating the second voice biometric process if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user.
12. A method according to any preceding claim, comprising activating the second voice biometric process in response to an initial determination based on a partial completion of the first voice biometric process that the speech might be the speech of an enrolled user, and deactivating the second voice biometric process in response to a determination based on a completion of the first voice biometric process that the speech is not the speech of the enrolled user.
13. A method according to any preceding claim, comprising: detecting a trigger phrase in the received audio signal; and responsive to the detecting of a trigger phrase, performing the first voice biometric process on the received audio signal.
14. A method according to any preceding claim, comprising: detecting voice activity in the received audio signal; and responsive to the detecting of voice activity, performing the first voice biometric process on at least a part of the received audio signal.
15. A method according to any of claims 1 to 14, comprising: detecting voice activity in the received audio signal; responsive to the detecting of voice activity, performing keyword detection; and responsive to detecting a keyword, performing the first voice biometric process on at least a part of the received audio signal.
16. A method according to any preceding claim, comprising: performing the first voice biometric process on the entire received audio signal.
17. A method according to any preceding claim, comprising using an initial determination by the first voice biometric process, that the speech is the speech of an enrolled user, as an indication that the received audio signal comprises speech.
18. A method according to any of claims 1 to 17, comprising: performing at least a part of a voice biometric process suitable for determining whether a signal contains speech of an enrolled user, and generating an output signal when it is determined that the signal contains human speech.
19. A method according to claim 18, comprising comparing a similarity score with a first threshold to determine whether the signal contains speech of an enrolled user, and comparing the similarity score with a second, lower, threshold to determine whether the signal contains speech.
20. A method according to claim 19, comprising determining that the signal contains human speech before it is possible to determine whether the signal contains speech of an enrolled user.
21. A method according to any preceding claim, wherein the first voice biometric process is configured as an analog processing system, and the second voice biometric process is configured as a digital processing system.
22. A method according to any preceding claim, further comprising performing one or more tests on the received audio signal, to determine whether the received audio signal has properties that indicate that it may result from a replay attack.
23. A method according to claim 22, comprising performing the second voice biometric process on the received audio signal only if it is determined that the received audio signal does not have properties that indicate that it may result from a replay attack.
24. A speaker identification system, comprising: an input for receiving an audio signal representing speech; a first device including a first processor for performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and a second device including a second processor for performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is initiated if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, and wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.
25. A speaker identification system according to claim 24, wherein the first device comprises a first integrated circuit, and the second device comprises a second integrated circuit.
26. A speaker identification system according to claim 24 or 25, wherein the first device comprises a dedicated biometrics integrated circuit.
27. A speaker identification system according to claim 26, wherein the first device is an accessory device.
28. A speaker identification system according to claim 27, wherein the first device is a listening device.
29. A speaker identification system according to claim 24 or 25, wherein the second device comprises an applications processor.
30. A speaker identification system according to claim 29, wherein the second device is a handset device.
31. A speaker identification system according to claim 30, wherein the second device is a smartphone.
32. A speaker identification system according to any of claims 24 to 31, wherein the first device is arranged to perform a spoof detection process on the audio signal, to identify if the audio signal is the result of an audio spoof attack, and wherein the output of the first voice biometric process is gated by the output of the spoof detection process, such that, if a spoof attack is detected, the first voice biometric process is prevented from initiating the second voice biometric process.
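Three of the decision rules recited in the claims lend themselves to a short Python sketch: the two-threshold comparison of claim 19, the fusion of results in claim 7, and the spoof gating of claim 32. The function names, the `Detection` enum, the score scale, and in particular the weighted-sum fusion rule are assumptions chosen for illustration; the claims do not specify how the fusion is computed.

```python
from enum import Enum


class Detection(Enum):
    NO_SPEECH = "no_speech"
    SPEECH = "speech"
    ENROLLED_SPEAKER = "enrolled_speaker"


def classify_score(score: float, speaker_threshold: float, speech_threshold: float) -> Detection:
    """Compare one similarity score with two thresholds (cf. claim 19).

    The speech threshold must be the lower of the two.
    """
    if speech_threshold >= speaker_threshold:
        raise ValueError("speech_threshold must be lower than speaker_threshold")
    if score >= speaker_threshold:
        return Detection.ENROLLED_SPEAKER
    if score >= speech_threshold:
        return Detection.SPEECH
    return Detection.NO_SPEECH


def fuse_scores(first_score: float, second_score: float, second_weight: float = 0.7) -> float:
    """Combine both results into one score (cf. claim 7).

    A weighted sum is only one possible fusion rule, assumed here for illustration.
    """
    if not 0.0 <= second_weight <= 1.0:
        raise ValueError("second_weight must lie in [0, 1]")
    return (1.0 - second_weight) * first_score + second_weight * second_score


def may_start_second_stage(first_stage_accepts: bool, spoof_detected: bool) -> bool:
    """Gate the first-stage output with the spoof detector (cf. claim 32)."""
    return first_stage_accepts and not spoof_detected


print(classify_score(0.5, speaker_threshold=0.8, speech_threshold=0.3))
print(may_start_second_stage(first_stage_accepts=True, spoof_detected=True))
```

Weighting the second score more heavily reflects the requirement that the second process is the more discriminative one, but the default weight of 0.7 is arbitrary and would need tuning in any real system.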
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB2210387.3A GB2608710B (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
GB2210986.2A GB2609093B (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/877,660 US11264037B2 (en) | 2018-01-23 | 2018-01-23 | Speaker identification |
GBGB1809474.8A GB201809474D0 (en) | 2018-01-23 | 2018-06-08 | Speaker identification |
US201862733755P | 2018-09-20 | 2018-09-20 | |
PCT/GB2019/050185 WO2019145708A1 (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
Publications (3)
Publication Number | Publication Date |
---|---|
GB202009795D0 (en) | 2020-08-12 |
GB2583420A (en) | 2020-10-28 |
GB2583420B (en) | 2022-09-14 |
Family
ID=67395939
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB2210387.3A Active GB2608710B (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
GB2210986.2A Active GB2609093B (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
GB2009795.2A Active GB2583420B (en) | 2018-01-23 | 2019-01-23 | Speaker identification |
Country Status (4)
Country | Link |
---|---|
KR (1) | KR20200108858A (en) |
CN (1) | CN111656440A (en) |
GB (3) | GB2608710B (en) |
WO (1) | WO2019145708A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201801526D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for authentication |
GB201801659D0 (en) | 2017-11-14 | 2018-03-21 | Cirrus Logic Int Semiconductor Ltd | Detection of loudspeaker playback |
US10915614B2 (en) * | 2018-08-31 | 2021-02-09 | Cirrus Logic, Inc. | Biometric authentication |
WO2021154600A1 (en) * | 2020-01-27 | 2021-08-05 | Pindrop Security, Inc. | Robust spoofing detection system using deep residual neural networks |
KR102493866B1 (en) * | 2020-02-20 | 2023-01-30 | Cirrus Logic International Semiconductor Ltd. | Audio system with digital microphone |
US11341974B2 (en) | 2020-05-21 | 2022-05-24 | Cirrus Logic, Inc. | Authenticating received speech |
US11721346B2 (en) * | 2020-06-10 | 2023-08-08 | Cirrus Logic, Inc. | Authentication device |
CA3190161A1 (en) * | 2020-08-21 | 2022-02-24 | Pindrop Security, Inc. | Improving speaker recognition with quality indicators |
CN113327618B (en) * | 2021-05-17 | 2024-04-19 | 西安讯飞超脑信息科技有限公司 | Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium |
CN113327617B (en) * | 2021-05-17 | 2024-04-19 | 西安讯飞超脑信息科技有限公司 | Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium |
CN113516987B (en) * | 2021-07-16 | 2024-04-12 | 科大讯飞股份有限公司 | Speaker recognition method, speaker recognition device, storage medium and equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1399915A2 (en) * | 2001-06-19 | 2004-03-24 | Securivox Ltd | Speaker recognition system |
US20150356974A1 (en) * | 2013-01-17 | 2015-12-10 | Nec Corporation | Speaker identification device, speaker identification method, and recording medium |
US20170351487A1 (en) * | 2016-06-06 | 2017-12-07 | Cirrus Logic International Semiconductor Ltd. | Voice user interface |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020194003A1 (en) * | 2001-06-05 | 2002-12-19 | Mozer Todd F. | Client-server security system and method |
WO2006054205A1 (en) * | 2004-11-16 | 2006-05-26 | Koninklijke Philips Electronics N.V. | Audio device for and method of determining biometric characteristics of a user. |
EP1938093B1 (en) * | 2005-09-22 | 2012-07-25 | Koninklijke Philips Electronics N.V. | Method and apparatus for acoustical outer ear characterization |
US9384738B2 (en) * | 2014-06-24 | 2016-07-05 | Google Inc. | Dynamic threshold for speaker verification |
GB201801530D0 (en) * | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for authentication |
2019
- 2019-01-23: GB application GB2210387.3A (GB2608710B), status Active
- 2019-01-23: CN application CN201980009737.0A (CN111656440A), status Pending
- 2019-01-23: KR application KR1020207022108A (KR20200108858A), status Application Discontinuation
- 2019-01-23: GB application GB2210986.2A (GB2609093B), status Active
- 2019-01-23: WO application PCT/GB2019/050185 (WO2019145708A1), status Application Filing
- 2019-01-23: GB application GB2009795.2A (GB2583420B), status Active
Also Published As
Publication number | Publication date |
---|---|
KR20200108858A (en) | 2020-09-21 |
CN111656440A (en) | 2020-09-11 |
GB202210387D0 (en) | 2022-08-31 |
GB2609093A (en) | 2023-01-25 |
GB202210986D0 (en) | 2022-09-07 |
GB2583420B (en) | 2022-09-14 |
GB2608710A (en) | 2023-01-11 |
GB2609093B (en) | 2023-05-10 |
GB202009795D0 (en) | 2020-08-12 |
WO2019145708A1 (en) | 2019-08-01 |
GB2608710B (en) | 2023-05-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
GB2583420A (en) | Speaker identification | |
US11042616B2 (en) | Detection of replay attack | |
US20220093108A1 (en) | Speaker identification | |
Permanasari et al. | Speech recognition using dynamic time warping (DTW) | |
US20200227071A1 (en) | Analysing speech signals | |
US11037574B2 (en) | Speaker recognition and speaker change detection | |
CN108630202A (en) | Speech recognition equipment, audio recognition method and speech recognition program | |
US9147401B2 (en) | Method and apparatus for speaker-calibrated speaker detection | |
CN110335593A (en) | Sound end detecting method, device, equipment and storage medium | |
CN112002349B (en) | Voice endpoint detection method and device | |
US11437022B2 (en) | Performing speaker change detection and speaker recognition on a trigger phrase | |
Jaafar et al. | Automatic syllables segmentation for frog identification system | |
CN110111776A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
CN111161746B (en) | Voiceprint registration method and system | |
CN110580897B (en) | Audio verification method and device, storage medium and electronic equipment | |
US11468899B2 (en) | Enrollment in speaker recognition system | |
CN109273012B (en) | Identity authentication method based on speaker recognition and digital voice recognition | |
GB2576960A (en) | Speaker recognition | |
US10818298B2 (en) | Audio processing | |
JP6616182B2 (en) | Speaker recognition device, discriminant value generation method, and program | |
US11074917B2 (en) | Speaker identification | |
CN109271480A (en) | A kind of voice searches topic method and electronic equipment | |
Mishra et al. | Speaker identification, differentiation and verification using deep learning for human machine interface | |
CN108573712B (en) | Voice activity detection model generation method and system and voice activity detection method and system | |
US20200321022A1 (en) | Method and apparatus for detecting an end of an utterance |