GB2583420A - Speaker identification - Google Patents

Speaker identification

Info

Publication number
GB2583420A
Authority
GB
United Kingdom
Prior art keywords
speech
voice biometric
biometric process
audio signal
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2009795.2A
Other versions
GB2583420B (en)
GB202009795D0 (en)
Inventor
John Paul Lesso
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cirrus Logic International Semiconductor Ltd
Original Assignee
Cirrus Logic International Semiconductor Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/877,660 (US11264037B2)
Application filed by Cirrus Logic International Semiconductor Ltd
Priority to GB2210387.3A (GB2608710B)
Priority to GB2210986.2A (GB2609093B)
Publication of GB202009795D0
Publication of GB2583420A
Application granted
Publication of GB2583420B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Abstract

A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
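The two-stage arrangement in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `identify_speaker` function, the `CascadeResult` type, the idea that each biometric process returns a single similarity score, and the two thresholds. The point it shows is only the control flow: the second, more discriminative process runs only when the first process makes a positive initial determination.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a biometric process maps audio samples to a similarity score.
Scorer = Callable[[Sequence[float]], float]


@dataclass
class CascadeResult:
    accepted: bool                  # final decision on the enrolled speaker
    second_stage_ran: bool          # whether the costlier process was invoked
    first_score: float
    second_score: Optional[float]   # None when the second stage was skipped


def identify_speaker(
    audio: Sequence[float],
    first_stage: Scorer,
    second_stage: Scorer,
    first_threshold: float,
    second_threshold: float,
) -> CascadeResult:
    """Run the first process, then the second only on an initial match."""
    first_score = first_stage(audio)
    if first_score < first_threshold:
        # No initial determination of an enrolled user: second stage is never run.
        return CascadeResult(False, False, first_score, None)

    # Initial match: hand the same audio to the more discriminative process.
    second_score = second_stage(audio)
    return CascadeResult(
        second_score >= second_threshold, True, first_score, second_score
    )
```

In this sketch the final decision rests on the second score alone; other decision rules, such as combining both results, would need a different return expression.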

Claims (32)

1. A method of speaker identification, comprising: receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.
2. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Acceptance Rate than the first voice biometric process.
3. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Rejection Rate than the first voice biometric process.
4. A method according to claim 1, wherein the second voice biometric process is configured to have a lower Equal Error Rate than the first voice biometric process.
5. A method according to any preceding claim, wherein the first voice biometric process is selected as a relatively low power process compared to the second voice biometric process.
6. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a result of the second voice biometric process.
7. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a fusion of a result of the first voice biometric process and a result of the second voice biometric process.
8. A method according to any preceding claim, wherein the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a method using machine learning techniques such as Deep Neural Nets (DNNs); and a method using a Support Vector Machine.
9. A method according to any preceding claim, wherein the second voice biometric process is selected from the following: a neural net process, a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; and an i-vector process.
10. A method according to any preceding claim, wherein the first voice biometric process is performed in a first device and the second voice biometric process is performed in a second device remote from the first device.
11. A method according to any preceding claim, comprising maintaining the second voice biometric process in a low power state, and activating the second voice biometric process if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user.
12. A method according to any preceding claim, comprising activating the second voice biometric process in response to an initial determination based on a partial completion of the first voice biometric process that the speech might be the speech of an enrolled user, and deactivating the second voice biometric process in response to a determination based on a completion of the first voice biometric process that the speech is not the speech of the enrolled user.
13. A method according to any preceding claim, comprising: detecting a trigger phrase in the received audio signal; and responsive to the detecting of a trigger phrase, performing the first voice biometric process on the received audio signal.
14. A method according to any preceding claim, comprising: detecting voice activity in the received audio signal; and responsive to the detecting of voice activity, performing the first voice biometric process on at least a part of the received audio signal.
15. A method according to any of claims 1 to 14, comprising: detecting voice activity in the received audio signal; responsive to the detecting of voice activity, performing keyword detection; and responsive to detecting a keyword, performing the first voice biometric process on at least a part of the received audio signal.
16. A method according to any preceding claim, comprising: performing the first voice biometric process on the entire received audio signal.
17. A method according to any preceding claim, comprising using an initial determination by the first voice biometric process, that the speech is the speech of an enrolled user, as an indication that the received audio signal comprises speech.
18. A method according to any of claims 1 to 17, comprising: performing at least a part of a voice biometric process suitable for determining whether a signal contains speech of an enrolled user, and generating an output signal when it is determined that the signal contains human speech.
19. A method according to claim 18, comprising comparing a similarity score with a first threshold to determine whether the signal contains speech of an enrolled user, and comparing the similarity score with a second, lower, threshold to determine whether the signal contains speech.
20. A method according to claim 19, comprising determining that the signal contains human speech before it is possible to determine whether the signal contains speech of an enrolled user.
21. A method according to any preceding claim, wherein the first voice biometric process is configured as an analog processing system, and the second voice biometric process is configured as a digital processing system.
22. A method according to any preceding claim, further comprising performing one or more tests on the received audio signal, to determine whether the received audio signal has properties that indicate that it may result from a replay attack.
23. A method according to claim 22, comprising performing the second voice biometric process on the received audio signal only if it is determined that the received audio signal does not have properties that indicate that it may result from a replay attack.
24. A speaker identification system, comprising: an input for receiving an audio signal representing speech; a first device including a first processor for performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and a second device including a second processor for performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is initiated if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, and wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.
25. A speaker identification system according to claim 24, wherein the first device comprises a first integrated circuit, and the second device comprises a second integrated circuit.
26. A speaker identification system according to claim 24 or 25, wherein the first device comprises a dedicated biometrics integrated circuit.
27. A speaker identification system according to claim 26, wherein the first device is an accessory device.
28. A speaker identification system according to claim 27, wherein the first device is a listening device.
29. A speaker identification system according to claim 24 or 25, wherein the second device comprises an applications processor.
30. A speaker identification system according to claim 29, wherein the second device is a handset device.
31. A speaker identification system according to claim 30, wherein the second device is a smartphone.
32. A speaker identification system according to any of claims 24 to 31, wherein the first device is arranged to perform a spoof detection process on the audio signal, to identify if the audio signal is the result of an audio spoof attack, and wherein the output of the first voice biometric process is gated by the output of the spoof detection process, such that, if a spoof attack is detected, the first voice biometric process is prevented from initiating the second voice biometric process.
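
Two of the decision steps in the claims lend themselves to a short Python sketch: comparing one similarity score against two thresholds (claim 19) and fusing the results of the two biometric processes (claim 7). The names `Detection`, `classify_score` and `fuse_scores`, the default weight, and the weighted-sum fusion rule itself are assumptions for illustration only; the claims do not specify how the fusion is performed.

```python
from enum import Enum


class Detection(Enum):
    NO_SPEECH = "no_speech"
    SPEECH = "speech"                      # speech, but not confirmed as enrolled
    ENROLLED_SPEAKER = "enrolled_speaker"


def classify_score(
    similarity: float, speaker_threshold: float, speech_threshold: float
) -> Detection:
    """Compare one similarity score with a higher and a lower threshold."""
    if speech_threshold >= speaker_threshold:
        raise ValueError("speech_threshold must be lower than speaker_threshold")
    if similarity >= speaker_threshold:
        return Detection.ENROLLED_SPEAKER
    if similarity >= speech_threshold:
        return Detection.SPEECH
    return Detection.NO_SPEECH


def fuse_scores(
    first_score: float, second_score: float, second_weight: float = 0.7
) -> float:
    """Weighted-sum fusion (an assumed rule, not taken from the claims)."""
    if not 0.0 <= second_weight <= 1.0:
        raise ValueError("second_weight must lie in [0, 1]")
    return (1.0 - second_weight) * first_score + second_weight * second_score
```

A caller might, for example, pass the fused score to `classify_score` or compare it with a separate acceptance threshold; either choice is a design decision outside what the claims state.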
GB2009795.2A 2018-01-23 2019-01-23 Speaker identification Active GB2583420B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2210387.3A GB2608710B (en) 2018-01-23 2019-01-23 Speaker identification
GB2210986.2A GB2609093B (en) 2018-01-23 2019-01-23 Speaker identification

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15/877,660 US11264037B2 (en) 2018-01-23 2018-01-23 Speaker identification
GBGB1809474.8A GB201809474D0 (en) 2018-01-23 2018-06-08 Speaker identification
US201862733755P 2018-09-20 2018-09-20
PCT/GB2019/050185 WO2019145708A1 (en) 2018-01-23 2019-01-23 Speaker identification

Publications (3)

Publication Number Publication Date
GB202009795D0 (en) 2020-08-12
GB2583420A (en) 2020-10-28
GB2583420B (en) 2022-09-14

Family

ID=67395939

Family Applications (3)

Application Number Status Publication Priority Date Filing Date Title
GB2210387.3A Active GB2608710B (en) 2018-01-23 2019-01-23 Speaker identification
GB2210986.2A Active GB2609093B (en) 2018-01-23 2019-01-23 Speaker identification
GB2009795.2A Active GB2583420B (en) 2018-01-23 2019-01-23 Speaker identification

Family Applications Before (2)

Application Number Status Publication Priority Date Filing Date Title
GB2210387.3A Active GB2608710B (en) 2018-01-23 2019-01-23 Speaker identification
GB2210986.2A Active GB2609093B (en) 2018-01-23 2019-01-23 Speaker identification

Country Status (4)

Country Link
KR (1) KR20200108858A (en)
CN (1) CN111656440A (en)
GB (3) GB2608710B (en)
WO (1) WO2019145708A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201801526D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801659D0 (en) 2017-11-14 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of loudspeaker playback
US10915614B2 (en) * 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
WO2021154600A1 (en) * 2020-01-27 2021-08-05 Pindrop Security, Inc. Robust spoofing detection system using deep residual neural networks
KR102493866B1 (en) * 2020-02-20 2023-01-30 Cirrus Logic International Semiconductor Ltd Audio system with digital microphone
US11341974B2 (en) 2020-05-21 2022-05-24 Cirrus Logic, Inc. Authenticating received speech
US11721346B2 (en) * 2020-06-10 2023-08-08 Cirrus Logic, Inc. Authentication device
CA3190161A1 (en) * 2020-08-21 2022-02-24 Pindrop Security, Inc. Improving speaker recognition with quality indicators
CN113327618B (en) * 2021-05-17 2024-04-19 Xi'an Xunfei Super Brain Information Technology Co Ltd Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium
CN113327617B (en) * 2021-05-17 2024-04-19 Xi'an Xunfei Super Brain Information Technology Co Ltd Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium
CN113516987B (en) * 2021-07-16 2024-04-12 iFlytek Co Ltd Speaker recognition method, speaker recognition device, storage medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1399915A2 (en) * 2001-06-19 2004-03-24 Securivox Ltd Speaker recognition system
US20150356974A1 (en) * 2013-01-17 2015-12-10 Nec Corporation Speaker identification device, speaker identification method, and recording medium
US20170351487A1 (en) * 2016-06-06 2017-12-07 Cirrus Logic International Semiconductor Ltd. Voice user interface

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194003A1 (en) * 2001-06-05 2002-12-19 Mozer Todd F. Client-server security system and method
WO2006054205A1 (en) * 2004-11-16 2006-05-26 Koninklijke Philips Electronics N.V. Audio device for and method of determining biometric characteristincs of a user.
EP1938093B1 (en) * 2005-09-22 2012-07-25 Koninklijke Philips Electronics N.V. Method and apparatus for acoustical outer ear characterization
US9384738B2 (en) * 2014-06-24 2016-07-05 Google Inc. Dynamic threshold for speaker verification
GB201801530D0 (en) * 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication

Also Published As

Publication number Publication date
KR20200108858A (en) 2020-09-21
CN111656440A (en) 2020-09-11
GB202210387D0 (en) 2022-08-31
GB2609093A (en) 2023-01-25
GB202210986D0 (en) 2022-09-07
GB2583420B (en) 2022-09-14
GB2608710A (en) 2023-01-11
GB2609093B (en) 2023-05-10
GB202009795D0 (en) 2020-08-12
WO2019145708A1 (en) 2019-08-01
GB2608710B (en) 2023-05-17

Similar Documents

Publication Publication Date Title
GB2583420A (en) Speaker identification
US11042616B2 (en) Detection of replay attack
US20220093108A1 (en) Speaker identification
Permanasari et al. Speech recognition using dynamic time warping (DTW)
US20200227071A1 (en) Analysing speech signals
US11037574B2 (en) Speaker recognition and speaker change detection
CN108630202A (en) Speech recognition equipment, audio recognition method and speech recognition program
US9147401B2 (en) Method and apparatus for speaker-calibrated speaker detection
CN110335593A (en) Sound end detecting method, device, equipment and storage medium
CN112002349B (en) Voice endpoint detection method and device
US11437022B2 (en) Performing speaker change detection and speaker recognition on a trigger phrase
Jaafar et al. Automatic syllables segmentation for frog identification system
CN110111776A (en) Interactive voice based on microphone signal wakes up electronic equipment, method and medium
CN111161746B (en) Voiceprint registration method and system
CN110580897B (en) Audio verification method and device, storage medium and electronic equipment
US11468899B2 (en) Enrollment in speaker recognition system
CN109273012B (en) Identity authentication method based on speaker recognition and digital voice recognition
GB2576960A (en) Speaker recognition
US10818298B2 (en) Audio processing
JP6616182B2 (en) Speaker recognition device, discriminant value generation method, and program
US11074917B2 (en) Speaker identification
CN109271480A (en) A kind of voice searches topic method and electronic equipment
Mishra et al. Speaker identification, differentiation and verification using deep learning for human machine interface
CN108573712B (en) Voice activity detection model generation method and system and voice activity detection method and system
US20200321022A1 (en) Method and apparatus for detecting an end of an utterance