GB2581594A - Detection of liveness

Info

Publication number: GB2581594A
Application number: GB2004477.2A
Authority: GB
Inventors: Paul Lesso John
Original assignee: Cirrus Logic International Semiconductor Ltd
Current assignee: Cirrus Logic International Semiconductor Ltd
Priority date: 2017-10-13
Filing date: 2018-10-11
Publication date: 2020-08-26
Anticipated expiration: 2038-10-11
Also published as: WO2019073235A1; KR20200062320A; GB202004477D0; GB2581594B; CN111201568A

Abstract

Detecting a replay attack on a voice biometrics system comprises: receiving a speech signal; generating an ultrasound signal; detecting a reflection of the generated ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of the liveness of a speaker based on the detected Doppler shifts. Identifying whether the received speech signal is indicative of liveness based on the detected Doppler shifts comprises determining whether the detected Doppler shifts correspond to a speech articulation rate.

Claims

1. A method of detecting liveness, the method comprising: receiving a speech signal; generating an ultrasound signal; detecting a reflection of the generated ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of the liveness of a speaker based on the detected Doppler shifts, wherein identifying whether the received speech signal is indicative of liveness based on the detected Doppler shifts comprises: determining whether the detected Doppler shifts correspond to a speech articulation rate.

2. A method according to claim 1, wherein determining whether the detected Doppler shifts correspond to a speech articulation rate comprises: determining whether the detected Doppler shifts correspond to facial movements at a frequency in the range of 4-10Hz.

3. A method according to claim 1 or 2, wherein determining whether the detected Doppler shifts correspond to a speech articulation rate comprises: determining an articulation rate associated with the speech signal; and determining whether the detected Doppler shifts correspond to facial movements at the articulation rate associated with the speech signal.

4. A method according to claim 2, further comprising, if it is determined that the detected Doppler shifts correspond to facial movements at a frequency in the range of 4-10Hz: determining an articulation rate associated with the speech signal; determining whether the detected Doppler shifts correspond to lip movements at the articulation rate associated with the speech signal; and determining that the received speech signal is indicative of liveness if the detected Doppler shifts correspond to lip movements at the articulation rate associated with the speech signal.

5. A method according to any of claims 1 to 4, for use in a voice biometrics system, wherein identifying whether the received speech signal is indicative of liveness comprises determining whether the received speech signal may be a product of a replay attack.
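
The check described in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the Doppler shifts have already been extracted as one estimate (in Hz) per analysis frame, and it finds the dominant modulation rate with a brute-force discrete Fourier scan. The function names, the 1-20 Hz scan range, the 0.25 Hz step and the 1 Hz tolerance are all hypothetical choices made for illustration.

```python
import math


def dominant_rate_hz(samples, frame_rate_hz, f_min=1.0, f_max=20.0, step=0.25):
    """Return the modulation frequency (Hz) carrying the most energy in `samples`.

    `samples` is a sequence of per-frame Doppler shift estimates; the scan
    range and step are illustrative assumptions, not values from the claims.
    """
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]
    best_f, best_power = 0.0, -1.0
    f = f_min
    while f <= f_max:
        w = 2.0 * math.pi * f / frame_rate_hz
        re = sum(x * math.cos(w * n) for n, x in enumerate(centred))
        im = sum(x * math.sin(w * n) for n, x in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
        f += step
    return best_f


def doppler_indicates_liveness(doppler_shifts_hz, frame_rate_hz,
                               speech_rate_hz=None, tolerance_hz=1.0):
    """Apply the 4-10 Hz test of claim 2, then optionally the rate match of claim 3."""
    rate = dominant_rate_hz(doppler_shifts_hz, frame_rate_hz)
    if not 4.0 <= rate <= 10.0:
        return False  # movement is not at a speech articulation rate
    if speech_rate_hz is not None and abs(rate - speech_rate_hz) > tolerance_hz:
        return False  # facial movement does not match the received speech
    return True
```

A static reflector such as a loudspeaker produces no modulation in the 4-10Hz band, so this sketch returns `False` for it; how the articulation rate of the speech signal itself is measured is left outside the sketch.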

6. A system for liveness detection, the system comprising: at least one microphone input, for receiving an audio signal from a microphone; and at least one transducer output, for transmitting a signal to an ultrasound transducer, and the system being configured for: receiving a speech signal at the at least one microphone input; generating an ultrasound signal by transmitting a signal at the at least one transducer output; detecting a reflection of the generated ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of the liveness of a speaker based on the detected Doppler shifts, wherein identifying whether the received speech signal is indicative of liveness based on the detected Doppler shifts comprises: determining whether the detected Doppler shifts correspond to a speech articulation rate.

7. A device comprising a system as claimed in claim 6.

8. A device as claimed in claim 7, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

9. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 1 to 5.

10. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 1 to 5.

11. A device comprising the non-transitory computer readable storage medium as claimed in claim 10.

12. A device as claimed in claim 11, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

13. A method of liveness detection, the method comprising: generating an ultrasound signal; receiving an audio signal comprising a reflection of the ultrasound signal; using the received audio signal comprising the reflection of the ultrasound signal to detect the liveness of a speaker; monitoring ambient ultrasound noise; and adjusting the operation of a system receiving the audio signal, based on a level of the reflected ultrasound and the monitored ambient ultrasound noise.

14. A method according to claim 13, for use in a voice biometrics system, wherein detecting the liveness of a speaker comprises determining whether a received speech signal may be a product of a replay attack, and comprising: adjusting the operation of the voice biometrics system based on a level of the reflected ultrasound and the monitored ambient ultrasound noise.

15. A method according to claim 14, comprising: detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal may be the result of a replay attack on the voice biometrics system based on the detected Doppler shifts, the method further comprising: determining a reliance to be placed on the identification whether the received speech signal may be the result of a replay attack, based on the level of the monitored ambient ultrasound noise.

16. A method according to claim 15, wherein determining the reliance to be placed on the identification comprises not performing the identification if the level of the monitored ambient ultrasound noise exceeds a first threshold level.

17. A method according to claim 14, comprising: detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal may be the result of a replay attack on the voice biometrics system based on the detected Doppler shifts, wherein identifying whether the received speech signal may result from a replay attack based on the detected Doppler shifts comprises: determining a correlation between the detected Doppler shifts and the received speech signal; and adapting a threshold correlation value to be used in identifying whether the received speech signal may result from a replay attack, based on the level of the monitored ambient ultrasound noise.
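
One way to picture claims 15 to 17 is the following Python sketch. It is hedged throughout: the claims state that the identification may be skipped above a first noise threshold and that the correlation threshold is adapted to the ambient ultrasound noise, but they give no numbers and no direction of adaptation. The 60 dB first threshold, the 6-30 dB signal-to-noise range, the base threshold of 0.5, and the choice to demand a higher correlation when the reflection is weak relative to the noise are all assumptions made here.

```python
def adapted_threshold(base, snr_db, min_snr_db=6.0, good_snr_db=30.0, max_extra=0.3):
    """Raise the required correlation as the reflected-to-ambient ratio falls.

    All numeric defaults are hypothetical; the claims only say the threshold
    is adapted based on the monitored ambient ultrasound noise.
    """
    if snr_db >= good_snr_db:
        return base
    clipped = max(snr_db, min_snr_db)
    fraction = (good_snr_db - clipped) / (good_snr_db - min_snr_db)
    return base + max_extra * fraction


def liveness_decision(correlation, reflected_db, ambient_db,
                      base_threshold=0.5, first_threshold_db=60.0):
    """Return "skip", "live" or "possible_replay" for one utterance."""
    if ambient_db > first_threshold_db:
        return "skip"  # claim 16: do not perform the identification at all
    threshold = adapted_threshold(base_threshold, reflected_db - ambient_db)
    return "live" if correlation >= threshold else "possible_replay"
```

For example, a correlation of 0.6 passes when the reflection is 40 dB above the ambient noise but is treated as a possible replay when the margin is only 10 dB, because the adapted threshold has risen to 0.75.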

18. A system for liveness detection, the system comprising: at least one microphone input, for receiving an audio signal from a microphone; and at least one transducer output, for transmitting a signal to an ultrasound transducer, and the system being configured for: generating an ultrasound signal; receiving an audio signal comprising a reflection of the ultrasound signal; using the received audio signal comprising the reflection of the ultrasound signal to detect the liveness of a speaker; monitoring ambient ultrasound noise; and adjusting the operation of a system receiving the audio signal, based on a level of the reflected ultrasound and the monitored ambient ultrasound noise.

19. A device comprising a system as claimed in claim 18.

20. A device as claimed in claim 19, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

21. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 13 to 17.

22. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 13 to 17.

23. A device comprising the non-transitory computer readable storage medium as claimed in claim 22.

24. A device as claimed in claim 23, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

25. A method of liveness detection in a device, the method comprising: receiving a speech signal from a voice source; generating and transmitting an ultrasound signal through a transducer of the device; detecting a reflection of the transmitted ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of liveness of a speaker based on the detected Doppler shifts, and the method further comprising: obtaining information about a position of the device; and adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device.

26. A method according to claim 25, wherein adapting the generating and transmitting of the ultrasound signal comprises: adjusting a transmit power of the ultrasound signal.

27. A method according to claim 25 or 26, wherein the device has multiple transducers, and wherein adapting the generating and transmitting of the ultrasound signal comprises: selecting the transducer in which the ultrasound signal is generated.

28. A method according to claim 25, 26 or 27, wherein obtaining information about a position of the device comprises obtaining information about an orientation of the device.

29. A method according to claim 25, 26, 27 or 28, wherein obtaining information about a position of the device comprises obtaining information about a distance of the device from the voice source.

30. A method according to claim 25, wherein the device is a mobile phone comprising at least a first transducer at a lower end of the device and a second transducer at an upper end of the device, and wherein adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device comprises: transmitting the ultrasound signal from the first transducer at an intensity in the range of 70-90dB SPL at 1 cm if the information about the position of the device indicates that the device is being used in a close talk mode.

31. A method according to claim 25, wherein the device is a mobile phone comprising at least a first transducer at a lower end of the device and a second transducer at an upper end of the device, and wherein adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device comprises: transmitting the ultrasound signal at an intensity in the range of 90-110dB SPL at 1 cm if the information about the position of the device indicates that the device is being used in a near talk mode.

32. A method according to claim 27, wherein adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device comprises: transmitting the ultrasound signal from the first transducer if the information about the position of the device indicates that the device is being used in a generally horizontal orientation.

33. A method according to claim 27 or 32, wherein adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device comprises: transmitting the ultrasound signal from the second transducer if the information about the position of the device indicates that the device is being used in a generally vertical orientation.

34. A method according to any of claims 25 to 33, wherein adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device comprises: preventing transmission of the ultrasound signal if the information about the position of the device indicates that the device is being used in a far talk mode.

35. A method according to claim 29, wherein adapting the generating and transmitting of the ultrasound signal comprises adjusting a transmit power of the ultrasound signal, with a higher power being used when the device is further from the voice source, for distances below a predetermined maximum distance.

36. A method according to any of claims 25 to 35, wherein obtaining information about a position of the device comprises obtaining information as to which of multiple loudspeaker transducers is closest to the voice source, and adapting the generating and transmitting of the ultrasound signal comprises transmitting the ultrasound signal primarily or entirely from that loudspeaker.

37. A method according to any of claims 25 to 36, comprising obtaining information about the position of the device from one or more of the following: gyroscopes, accelerometers, proximity sensors, light level sensors, touch sensors, sound level sensors, and a camera.

38. A method according to any of claims 25 to 37, for use in a voice biometrics system, wherein identifying whether the received speech signal is indicative of liveness comprises determining whether the received speech signal may be a product of a replay attack.
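
The position-dependent behaviour of claims 30 to 34 can be summarised as a small lookup, sketched below in Python. The intensity ranges and the close, near and far talk modes come from the claims. The `TransmitPlan` type, the string labels, and the decision to choose the near talk transducer from the orientation are illustrative assumptions, since claim 31 does not say which transducer is used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TransmitPlan:
    """Hypothetical container for the outcome of the position-based adaptation."""
    transducer: Optional[str]                       # "first" (lower), "second" (upper) or None
    level_db_spl_at_1cm: Optional[Tuple[int, int]]  # permitted intensity range


def plan_transmission(talk_mode: str, orientation: str = "horizontal") -> TransmitPlan:
    """Map device position information to an ultrasound transmission plan."""
    if talk_mode == "far":
        return TransmitPlan(None, None)        # claim 34: prevent transmission
    if talk_mode == "close":
        return TransmitPlan("first", (70, 90))  # claim 30: lower transducer, 70-90dB SPL
    if talk_mode == "near":
        # Claims 32 and 33: horizontal use selects the first transducer,
        # vertical use the second. Combining this with near talk is an assumption.
        transducer = "first" if orientation == "horizontal" else "second"
        return TransmitPlan(transducer, (90, 110))  # claim 31: 90-110dB SPL
    raise ValueError(f"unknown talk mode: {talk_mode!r}")
```

In a real device the `talk_mode` and `orientation` inputs would be derived from the sensors listed in claim 37; that derivation is not shown here.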

39. A system for liveness detection in a device, the system comprising: at least one microphone input, for receiving an audio signal from a microphone; and at least one transducer output, for transmitting a signal to an ultrasound transducer, and the system being configured for: receiving a speech signal from the at least one microphone input; generating a control signal through the transducer output, for transmitting an ultrasound signal through a transducer of the device; detecting a reflection of the transmitted ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of liveness of a speaker based on the detected Doppler shifts, and the method further comprising: obtaining information about a position of the device; and adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device.

40. A device comprising a system as claimed in claim 39.

41. A device as claimed in claim 40, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

42. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 25 to 38.

43. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 25 to 38.

44. A device comprising the non-transitory computer readable storage medium as claimed in claim 43.

45. A device as claimed in claim 44, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

46. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies; separating the input sound signal into an audio band component and a non-audio band component; identifying possible interference within the audio band from the non-audio band component; and adjusting the operation of a downstream speech processing module based on said identification.

47. The method of claim 46, wherein identifying possible interference within the audio band from the non-audio band component comprises determining whether a power level of the non-audio band component exceeds a threshold value and, if so, identifying possible interference within the audio band from the non-audio band component.

48. The method of claim 46, wherein identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components.

49. The method of claim 48, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises: measuring a signal power in the audio band component Pa; measuring a signal power in the non-audio band component Pb; and if (Pa/Pb) < threshold limit, flagging the quality of the input sound signal as unreliable for speech processing; and wherein the step of adjusting comprises controlling the operation of a downstream speech processing module based on the flagged unreliable quality.
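
The power comparison of claim 49 reduces to a few lines of Python. This is a sketch only: it assumes the input has already been split into the two band components, uses mean squared amplitude as the power measure, and picks a threshold limit of 10 purely for illustration. The handling of a silent non-audio band is also an assumption.

```python
def mean_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def flag_unreliable(audio_band, non_audio_band, threshold_limit=10.0):
    """Return True when (Pa/Pb) < threshold limit, as in claim 49.

    `threshold_limit` is a hypothetical value; the claim does not specify one.
    """
    pa = mean_power(audio_band)
    pb = mean_power(non_audio_band)
    if pb == 0.0:
        return False  # assumption: no non-audio energy means nothing to flag
    return (pa / pb) < threshold_limit
```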

50. The method of claim 48, wherein the step of comparing comprises: detecting the envelope of the signal of the non-audio band component; detecting a level of correlation between the envelope of the signal and the audio band component; and determining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value.
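
The envelope comparison of claim 50 might look like the following Python sketch. The claim does not say how the envelope is detected or how correlation is measured, so the rectify-and-smooth envelope detector, the Pearson correlation coefficient, and the 0.8 threshold used here are assumptions.

```python
import math


def envelope(samples, window):
    """Rectify the signal and smooth it with a trailing moving average."""
    rectified = [abs(x) for x in samples]
    smoothed = []
    running = 0.0
    for i, x in enumerate(rectified):
        running += x
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def interference_suspected(audio_band, non_audio_band, window, threshold=0.8):
    """Flag possible interference when the audio band tracks the ultrasound envelope."""
    level = abs(correlation(audio_band, envelope(non_audio_band, window)))
    return level > threshold
```

The idea behind the comparison is that an amplitude-modulated ultrasound carrier that has been demodulated by a non-linearity leaves a copy of its envelope in the audio band, whereas genuine speech is unrelated to that envelope.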

51. The method of claim 48, wherein the step of comparing comprises: simulating the effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal; detecting a level of correlation between the simulated non-linear signal and the audio band component; and determining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value.

52. The method of claim 50 or 51, wherein the step of adjusting comprises flagging a detection of possible non-audio band interference within the audio band to a downstream speech processing module.

53. The method of any of claims 46 to 52, wherein the step of adjusting comprises providing a compensated sound signal to a downstream speech processing module.

54. The method of claim 53, wherein the step of providing a compensated sound signal comprises subtracting a simulated non-linear signal from the audio band component to provide a compensated output signal; and providing the compensated output signal to a downstream speech processing module.

55. The method of claim 48, wherein the steps of comparing and adjusting comprise: simulating the effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal; subtracting the simulated non-linear signal from the audio band component to provide a compensated output signal; and providing the compensated output signal to a downstream speech processing module.

56. The method of claim 54 or 55, wherein the step of subtracting comprises: applying the simulated non-linearity signal to a filter; and subtracting the filtered simulated non-linearity signal from the audio band component of the input sound signal to provide a compensated output signal.

57. A method according to claim 56, wherein the filter is an adaptive filter, and the method comprises adapting the adaptive filter such that the component of the filtered simulated non-linearity signal in the compensated output signal is minimised.

58. The method of claim 57, wherein adapting the adaptive filter comprises adapting a gain of the filter.

59. The method of claim 57 or 58, wherein adapting the adaptive filter comprises adapting filter coefficients of the filter.
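
The compensation path of claims 54 to 58 can be sketched in Python as a single-tap adaptive canceller. Several assumptions are made here that the claims do not state: the non-linearity is modelled as a square law, the adaptive filter is reduced to one gain (the case of claim 58), and the gain is adapted with the least-mean-squares (LMS) rule. The step size is hypothetical and would need tuning in practice.

```python
def cancel_interference(audio_band, non_audio_band, step=0.002):
    """Subtract a scaled, simulated non-linear signal from the audio band.

    Returns the compensated output samples and the final adapted gain.
    """
    gain = 0.0
    output = []
    for a, u in zip(audio_band, non_audio_band):
        simulated = u * u                   # assumed square-law non-linearity
        error = a - gain * simulated        # compensated output sample
        gain += step * error * simulated    # LMS update: reduce the residual
        output.append(error)
    return output, gain
```

A small step size is chosen so that the gain settles on the interference level rather than following the speech itself; a multi-tap filter with adapted coefficients, as in claim 59, would follow the same pattern with a vector of gains.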

60. The method of claim 54 or 55, wherein the step of simulating a non-linearity comprises providing the non-audio band component to an adaptive non-linearity module, and wherein the method comprises controlling the adaptive non-linearity module such that the component of the simulated non-linearity signal in the compensated output signal is minimised.

61. The method of any of claims 46 to 60, further comprising the step of: measuring a signal power in the non-audio band component Pb, wherein the method is responsive to the step of measuring the signal power, such that: if the measured signal power level Pb is below a threshold level X, the method comprises flagging the input sound signal as free of non-audio band interference, and if the measured signal power level Pb is above a threshold level X, the method performs the step of identifying possible interference within the audio band from the non-audio band component.

62. The method of any of claims 46 to 61, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; and filtering the input sound signal to obtain a non-audio band component of the input sound signal.

63. The method of any of claims 46 to 62, wherein the speech processing system is a voice biometrics system.

64. A method of detecting an ultrasound interference signal, the method comprising: filtering an input signal to obtain an audio band component of the input signal; filtering the input signal to obtain an ultrasound component of the input signal; detecting an envelope of the ultrasound component of the input signal; detecting a degree of correlation between the audio band component of the input signal and the envelope of the ultrasound component of the input signal; and detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the envelope of the ultrasound component of the input signal exceeds a threshold level.

65. A method of detecting an ultrasound interference signal, the method comprising: filtering an input signal to obtain an audio band component of the input signal; filtering the input signal to obtain an ultrasound component of the input signal; modifying the ultrasound component to simulate an effect of a non-linear downconversion of the input signal; detecting a degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal; and detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal exceeds a threshold level.

66. A method of processing a signal containing an ultrasound interference signal, the method comprising: filtering an input signal to obtain an audio band component of the input signal; filtering the input signal to obtain an ultrasound component of the input signal; modifying the ultrasound component to simulate an effect of a non-linear downconversion of the input signal; and comparing the audio band component of the input signal and the modified ultrasound component.

67. A method according to claim 66, wherein comparing the audio band component of the input signal and the modified ultrasound component comprises: detecting a degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal; and detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal exceeds a threshold level.

68. A method according to claim 67, further comprising sending the audio band component of the input signal to a speech processing module only if no ultrasound interference signal is detected.

69. A method according to claim 66, wherein comparing the audio band component of the input signal and the modified ultrasound component comprises: applying the modified ultrasound component of the input signal to a filter; and subtracting the filtered modified ultrasound component of the input signal from the audio band component of the input signal to obtain an output signal.

70. A method according to claim 69, wherein the filter is an adaptive filter, and the method comprises adapting the adaptive filter such that the component of the filtered modified ultrasound component in the output signal is minimised.

71. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies; separating the input sound signal into an audio band component and a non-audio band component; identifying possible interference within the audio band from the non-audio band component; and adjusting the operation of a downstream speech processing module based on said identification.

72. A device comprising a system as claimed in claim 71.

73. A device as claimed in claim 72, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

74. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 46 to 70.

75. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 46 to 70.

76. A device comprising the non-transitory computer readable storage medium as claimed in claim 75.

77. A device as claimed in claim 76, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.