GB2583420A - Speaker identification

Info

Publication number: GB2583420A
Application number: GB2009795.2A
Authority: GB
Inventors: John Paul Lesso
Original assignee: Cirrus Logic International Semiconductor Ltd
Current assignee: Cirrus Logic International Semiconductor Ltd
Priority date: 2018-01-23
Filing date: 2019-01-23
Publication date: 2020-10-28
Anticipated expiration: 2039-01-23
Also published as: KR20200108858A; CN111656440A; GB202210387D0; GB2609093A; GB202210986D0; GB2583420B; GB2608710A; GB2609093B; GB202009795D0; WO2019145708A1; GB2608710B

Abstract

A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
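By way of illustration only, the two-stage arrangement summarised above might be sketched as follows. This is a minimal sketch and not the claimed implementation: the names `identify_speaker` and `CascadeResult`, the use of scalar similarity scores, and the two thresholds are all assumptions introduced for the example. The scoring functions stand in for whatever first and second voice biometric processes are chosen.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a biometric process maps audio samples to a similarity score.
Scorer = Callable[[Sequence[float]], float]


@dataclass
class CascadeResult:
    first_score: float
    second_score: Optional[float]  # None when the second process never ran
    accepted: bool


def identify_speaker(
    audio: Sequence[float],
    first_process: Scorer,
    second_process: Scorer,
    first_threshold: float,
    second_threshold: float,
) -> CascadeResult:
    """Run the first process; run the second only on an initial match."""
    first_score = first_process(audio)
    if first_score < first_threshold:
        # No initial determination of an enrolled user: the second,
        # more discriminative process is never started.
        return CascadeResult(first_score, None, False)

    # Initial determination made: hand the same audio to the second process.
    second_score = second_process(audio)
    return CascadeResult(first_score, second_score, second_score >= second_threshold)
```

In this sketch the final decision rests on the second process alone; other decision rules, such as combining both results, are equally possible.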

Claims

1. A method of speaker identification, comprising: receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.

2. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Acceptance Rate than the first voice biometric process.

3. A method according to claim 1, wherein the second voice biometric process is configured to have a lower False Rejection Rate than the first voice biometric process.

4. A method according to claim 1, wherein the second voice biometric process is configured to have a lower Equal Error Rate than the first voice biometric process.

5. A method according to any preceding claim, wherein the first voice biometric process is selected as a relatively low power process compared to the second voice biometric process.

6. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a result of the second voice biometric process.

7. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a fusion of a result of the first voice biometric process and a result of the second voice biometric process.

8. A method according to any preceding claim, wherein the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a method using machine learning techniques such as Deep Neural Nets (DNNs); and a method using a Support Vector Machine.

9. A method according to any preceding claim, wherein the second voice biometric process is selected from the following: a neural net process; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; and an i-vector process.

10. A method according to any preceding claim, wherein the first voice biometric process is performed in a first device and the second voice biometric process is performed in a second device remote from the first device.

11. A method according to any preceding claim, comprising maintaining the second voice biometric process in a low power state, and activating the second voice biometric process if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user.

12. A method according to any preceding claim, comprising activating the second voice biometric process in response to an initial determination based on a partial completion of the first voice biometric process that the speech might be the speech of an enrolled user, and deactivating the second voice biometric process in response to a determination based on a completion of the first voice biometric process that the speech is not the speech of the enrolled user.

13. A method according to any preceding claim, comprising: detecting a trigger phrase in the received audio signal; and responsive to the detecting of a trigger phrase, performing the first voice biometric process on the received audio signal.

14. A method according to any preceding claim, comprising: detecting voice activity in the received audio signal; and responsive to the detecting of voice activity, performing the first voice biometric process on at least a part of the received audio signal.

15. A method according to any of claims 1 to 14, comprising: detecting voice activity in the received audio signal; responsive to the detecting of voice activity, performing keyword detection; and responsive to detecting a keyword, performing the first voice biometric process on at least a part of the received audio signal.

16. A method according to any preceding claim, comprising: performing the first voice biometric process on the entire received audio signal.

17. A method according to any preceding claim, comprising using an initial determination by the first voice biometric process, that the speech is the speech of an enrolled user, as an indication that the received audio signal comprises speech.

18. A method according to any of claims 1 to 17, comprising: performing at least a part of a voice biometric process suitable for determining whether a signal contains speech of an enrolled user, and generating an output signal when it is determined that the signal contains human speech.

19. A method according to claim 18, comprising comparing a similarity score with a first threshold to determine whether the signal contains speech of an enrolled user, and comparing the similarity score with a second, lower, threshold to determine whether the signal contains speech.

20. A method according to claim 19, comprising determining that the signal contains human speech before it is possible to determine whether the signal contains speech of an enrolled user.

21. A method according to any preceding claim, wherein the first voice biometric process is configured as an analog processing system, and the second voice biometric process is configured as a digital processing system.

22. A method according to any preceding claim, further comprising performing one or more tests on the received audio signal, to determine whether the received audio signal has properties that indicate that it may result from a replay attack.

23. A method according to claim 22, comprising performing the second voice biometric process on the received audio signal only if it is determined that the received audio signal does not have properties that indicate that it may result from a replay attack.
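For illustration, the dual-threshold comparison of claim 19 and the fusion of results in claim 7 might be sketched as below. This is a sketch under stated assumptions, not the claimed method: the claims do not specify how results are fused, so the weighted sum in `fuse_scores` is one assumed possibility, and the names `Detection`, `classify_score`, `fuse_scores` and `weight_second` are hypothetical.

```python
from enum import Enum


class Detection(Enum):
    NO_SPEECH = 0
    SPEECH = 1
    ENROLLED_SPEAKER = 2


def classify_score(score: float, speech_threshold: float, speaker_threshold: float) -> Detection:
    """Compare one similarity score with two thresholds (cf. claim 19).

    The second, lower threshold only indicates that the signal contains speech;
    the first, higher threshold indicates speech of an enrolled user.
    """
    if speech_threshold >= speaker_threshold:
        raise ValueError("speech_threshold must be lower than speaker_threshold")
    if score >= speaker_threshold:
        return Detection.ENROLLED_SPEAKER
    if score >= speech_threshold:
        return Detection.SPEECH
    return Detection.NO_SPEECH


def fuse_scores(first_score: float, second_score: float, weight_second: float = 0.7) -> float:
    """Assumed fusion rule: a weighted sum of the two process results (cf. claim 7)."""
    if not 0.0 <= weight_second <= 1.0:
        raise ValueError("weight_second must lie in [0, 1]")
    return (1.0 - weight_second) * first_score + weight_second * second_score
```

Weighting the second result more heavily reflects that the second process is selected to be more discriminative; the particular value 0.7 is arbitrary.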

24. A speaker identification system, comprising: an input for receiving an audio signal representing speech; a first device including a first processor for performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and a second device including a second processor for performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is initiated if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, and wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.

25. A speaker identification system according to claim 24, wherein the first device comprises a first integrated circuit, and the second device comprises a second integrated circuit.

26. A speaker identification system according to claim 24 or 25, wherein the first device comprises a dedicated biometrics integrated circuit.

27. A speaker identification system according to claim 26, wherein the first device is an accessory device.

28. A speaker identification system according to claim 27, wherein the first device is a listening device.

29. A speaker identification system according to claim 24 or 25, wherein the second device comprises an applications processor.

30. A speaker identification system according to claim 29, wherein the second device is a handset device.

31. A speaker identification system according to claim 30, wherein the second device is a smartphone.

32. A speaker identification system according to any of claims 24 to 31, wherein the first device is arranged to perform a spoof detection process on the audio signal, to identify if the audio signal is the result of an audio spoof attack, and wherein the output of the first voice biometric process is gated by the output of the spoof detection process, such that, if a spoof attack is detected, the first voice biometric process is prevented from initiating the second voice biometric process.
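As a non-limiting illustration of the gating described in claim 32, the interaction between the first device and the second device might be modelled as below. This is a simplified software sketch under assumptions: the class `SecondDevice`, the functions `first_device_output` and `run_system`, and the `wake_count` counter are hypothetical names, and the spoof detector and biometric processes are passed in as placeholder callables.

```python
from typing import Callable, Sequence

Audio = Sequence[float]


def first_device_output(biometric_match: bool, spoof_detected: bool) -> bool:
    """Gate the first process's result with the spoof detector's result."""
    return biometric_match and not spoof_detected


class SecondDevice:
    """Stand-in for the second device, which is idle until it is woken."""

    def __init__(self, second_process: Callable[[Audio], float], threshold: float) -> None:
        self._process = second_process
        self._threshold = threshold
        self.wake_count = 0  # illustrative counter of how often the device ran

    def on_wake(self, audio: Audio) -> bool:
        self.wake_count += 1
        return self._process(audio) >= self._threshold


def run_system(
    audio: Audio,
    first_match: Callable[[Audio], bool],
    spoof_detector: Callable[[Audio], bool],
    second_device: SecondDevice,
) -> bool:
    """Return True only if both stages accept and no spoof attack is detected."""
    wake = first_device_output(first_match(audio), spoof_detector(audio))
    if not wake:
        # Either no initial match or a suspected spoof: the second
        # voice biometric process is not initiated.
        return False
    return second_device.on_wake(audio)
```

Placing the gate on the first device means that a detected spoof prevents the second device from being woken at all, rather than being rejected after the second process has run.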