GB2580821A - Analysing speech signals - Google Patents


Info

Publication number
GB2580821A
GB2580821A (application GB2004481.4A)
Authority
GB
United Kingdom
Prior art keywords
audio signal
speech
channel
speaker
determining whether
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2004481.4A
Other versions
GB202004481D0 (en)
GB2580821B (en)
Inventor
John Paul Lesso
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cirrus Logic International Semiconductor Ltd
Original Assignee
Cirrus Logic International Semiconductor Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB1719731.0A (GB2567503A)
Priority claimed from GB1719734.4A (GB201719734D0)
Application filed by Cirrus Logic International Semiconductor Ltd
Publication of GB202004481D0
Publication of GB2580821A
Application granted
Publication of GB2580821B
Current legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method of analysis of an audio signal comprises: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; and analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user. Based on the analysing, information is obtained about at least one of a channel and noise affecting the audio signal.
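As a rough illustration of the extraction step in the abstract, a common way to separate voiced from unvoiced components uses frame-level energy and zero-crossing-rate heuristics. The patent does not prescribe a particular method; the function name and thresholds below are illustrative assumptions only:

```python
import numpy as np

def classify_frames(signal, frame_len=320, energy_thresh=0.01, zcr_thresh=0.25):
    """Label each frame of a speech signal as 'voiced' or 'unvoiced'.

    A frame with high energy and a low zero-crossing rate is treated as
    voiced; everything else is treated as unvoiced.  The thresholds are
    illustrative choices, not values taken from the patent.
    """
    labels = []
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)                          # frame power
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)    # sign changes per sample
        is_voiced = energy > energy_thresh and zcr < zcr_thresh
        labels.append("voiced" if is_voiced else "unvoiced")
    return labels
```

A low-frequency periodic signal (voiced-like) yields high energy and few zero crossings, while broadband noise (unvoiced-like) crosses zero roughly every other sample.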

Claims (43)

1. A method of analysis of an audio signal, the method comprising: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
2. A method according to claim 1, wherein extracting first and second components of the audio signal comprises: identifying periods when the audio signal contains voiced speech; and identifying remaining periods of speech as containing unvoiced speech.
3. A method according to claim 1 or 2, wherein analysing the first and second components of the audio signal with the models of the first and second acoustic classes of the speech of the enrolled user comprises: comparing magnitudes of the audio signal at a number of predetermined frequencies with magnitudes in the models of the first and second acoustic classes of the speech.
4. A method according to any preceding claim, comprising compensating the received audio signal for channel and/or noise.
5. A method according to any preceding claim, comprising: performing a speaker identification process on the received audio signal to form a provisional decision on an identity of a speaker; selecting the models of the first and second acoustic classes of the speech of the enrolled user, from a plurality of models, based on the provisional decision on the identity of the speaker; compensating the received audio signal for channel and/or noise; and performing a second speaker identification process on the compensated received audio signal to form a final on the identity of the speaker.
6. A method according to claim 5, wherein compensating the received audio signal for channel and/or noise comprises: identifying at least one part of a frequency spectrum of the received audio signal where a noise level exceeds a threshold level; and ignoring the identified part of the frequency spectrum of the received audio signal when performing the second speaker identification process.
7. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise voiced speech and unvoiced speech.
8. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second phoneme classes.
9. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second fricatives.
10. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise fricatives and sibilants.
11. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
12. A device comprising a system as claimed in any of claims 1 to 10.
13. A device as claimed in claim 12, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
14. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 1 to 10.
15. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 1 to 10.
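The comparison in claim 3 can be read as a channel estimate: if the speaker matches the enrolled models, the measured log-magnitudes differ from the modelled ones by the same channel response at each predetermined frequency, for both acoustic classes. A minimal sketch under that reading (the function name and the averaging of the two per-class differences are illustrative, not taken from the claims):

```python
import numpy as np

def estimate_channel(voiced_db, unvoiced_db, voiced_model_db, unvoiced_model_db):
    """Estimate a channel frequency response from two acoustic classes.

    Each argument is an array of log-magnitudes (dB) sampled at the same
    predetermined frequencies.  For a matching speaker, measured minus
    modelled magnitude should equal the channel response for both the
    voiced and the unvoiced component; averaging the two differences
    gives a single estimate.  Illustrative sketch, not the patented
    algorithm itself.
    """
    diff_voiced = np.asarray(voiced_db) - np.asarray(voiced_model_db)
    diff_unvoiced = np.asarray(unvoiced_db) - np.asarray(unvoiced_model_db)
    return (diff_voiced + diff_unvoiced) / 2.0
```

If the two per-class differences disagree strongly, the mismatch itself is informative: it suggests either noise or that the speech is not that of the enrolled user.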
16. A method of determining whether a received signal may result from a replay attack, the method comprising: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
17. A method according to claim 16, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a low frequency roll-off.
18. A method according to claim 17, wherein determining whether the channel has a low frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies below a lower cut-off frequency.
19. A method according to claim 16 or 17, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a high frequency roll-off.
20. A method according to claim 19, wherein determining whether the channel has a high frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies above an upper cut-off frequency.
21. A method according to claim 16, 17 or 19, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has ripple in a pass-band thereof.
22. A method according to claim 21, wherein determining whether the channel has ripple in a pass-band thereof comprises determining whether a degree of ripple over a central part of the pass-band, for example from 100 Hz to 10 kHz, exceeds a threshold amount.
23. A system for determining whether a received signal may result from a replay attack, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
24. A device comprising a system as claimed in any of claims 16 to 22.
25. A device as claimed in claim 24, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
26. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 16 to 22.
27. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 16 to 22.
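The loudspeaker tests of claims 16 to 22 reduce to simple checks on an estimated channel magnitude response. A minimal sketch, with illustrative cut-off frequencies and thresholds (the claims fix no numeric values beyond the 100 Hz to 10 kHz pass-band example in claim 22):

```python
import numpy as np

def has_loudspeaker_characteristics(freqs, channel_db,
                                    low_cutoff=200.0,
                                    ripple_thresh_db=3.0):
    """Heuristic replay-attack test on an estimated channel response.

    Flags the channel as loudspeaker-like if it rolls off below a lower
    cut-off frequency, or shows more than `ripple_thresh_db` of
    peak-to-peak ripple over the central pass-band (100 Hz to 10 kHz).
    All thresholds are illustrative assumptions.
    """
    freqs = np.asarray(freqs, dtype=float)
    channel_db = np.asarray(channel_db, dtype=float)

    # Low-frequency roll-off: response below the cut-off sits well
    # under the pass-band average.
    low = channel_db[freqs < low_cutoff]
    passband = channel_db[(freqs >= 100.0) & (freqs <= 10000.0)]
    rolloff = low.size > 0 and low.mean() < passband.mean() - 6.0

    # Pass-band ripple: peak-to-peak variation over the central band.
    ripple = (passband.max() - passband.min()) > ripple_thresh_db

    return rolloff or ripple
```

A genuine acoustic path tends to be comparatively smooth and flat over this band, whereas a small loudspeaker typically cannot reproduce the lowest frequencies and shows resonance ripple, which is what these checks exploit.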
28. A method of speaker identification, comprising: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
29. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining an average spectrum of a part of the cleaned audio signal representing voiced speech.
30. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining a first average spectrum of a part of the cleaned audio signal representing a first acoustic class and obtaining a second average spectrum of a part of the cleaned audio signal representing a second acoustic class, and wherein comparing the average spectrum with a long term average speaker model for an enrolled speaker comprises comparing the first average spectrum with a long term average speaker model for the first acoustic class for the enrolled speaker and comparing the second average spectrum with a long term average speaker model for the second acoustic class for the enrolled speaker.
31. A method according to claim 28, wherein the first acoustic class is voiced speech and the second acoustic class is unvoiced speech.
32. A method according to claim 28, 29, 30 or 31, comprising comparing the average spectrum with respective long term average speaker models for each of a plurality of enrolled speakers; and determining based on the comparison whether the speech is the speech of one of the enrolled speakers.
33. A method according to claim 32, further comprising comparing the average spectrum with a Universal Background Model; and including a result of the comparing the average spectrum with the Universal Background Model in determining whether the speech is the speech of one of the enrolled speakers.
34. A method according to claim 32, comprising identifying one of the enrolled speakers as a most likely candidate as a source of the speech.
35. A method according to any of claims 28 to 34, comprising: obtaining information about the effects of a channel and/or noise on the received audio signal by: receiving the audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
36. A method according to claim 35, comprising analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of a plurality of enrolled users, to obtain respective hypothetical values of the channel, and determining that the speech is not the speech of any enrolled speaker whose models give rise to physically implausible hypothetical values of the channel.
37. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains variations exceeding a threshold level across the relevant frequency range.
38. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains significant discontinuities.
39. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
40. A device comprising a system as claimed in claim 39.
41. A device as claimed in claim 40, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
42. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 28 to 38.
43. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 28 to 38.
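The long term average spectrum comparison of claims 28 to 34 can be sketched as follows. The negative mean-squared-distance score and the function names are illustrative assumptions; the claims do not specify a scoring function:

```python
import numpy as np

def ltas_score(cleaned_frames_db, speaker_model_db):
    """Score a long-term average spectrum against an enrolled model.

    `cleaned_frames_db` is a 2-D array (frames x frequency bins) of
    log-magnitude spectra after channel/noise compensation.  The frames
    are averaged into a single long-term average spectrum (LTAS) and
    scored against the enrolled speaker's LTAS model by negative mean
    squared distance, so higher means more similar.  The scoring
    function is an illustrative choice.
    """
    ltas = np.mean(cleaned_frames_db, axis=0)
    return -float(np.mean((ltas - np.asarray(speaker_model_db)) ** 2))

def identify(cleaned_frames_db, models):
    """Return the enrolled speaker whose LTAS model scores highest."""
    return max(models, key=lambda name: ltas_score(cleaned_frames_db, models[name]))
```

Per claims 30 and 31, the same comparison can be run twice, once on the voiced frames and once on the unvoiced frames, against separate per-class models, and per claim 33 a Universal Background Model score can be included as a reference against which the enrolled-speaker scores are judged.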
GB2004481.4A 2017-10-13 2018-10-11 Analysing speech signals Active GB2580821B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762571978P 2017-10-13 2017-10-13
US201762578667P 2017-10-30 2017-10-30
GB1719731.0A GB2567503A (en) 2017-10-13 2017-11-28 Analysing speech signals
GBGB1719734.4A GB201719734D0 (en) 2017-10-30 2017-11-28 Speaker identification
PCT/GB2018/052905 WO2019073233A1 (en) 2017-10-13 2018-10-11 Analysing speech signals

Publications (3)

Publication Number Publication Date
GB202004481D0 (en) 2020-05-13
GB2580821A (en) 2020-07-29
GB2580821B (en) 2022-11-09

Family

Family ID: 66100464

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2004481.4A Active GB2580821B (en) 2017-10-13 2018-10-11 Analysing speech signals

Country Status (3)

Country Link
CN (1) CN111201570A (en)
GB (1) GB2580821B (en)
WO (1) WO2019073233A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808595A (en) * 2020-06-15 2021-12-17 颜蔚 Voice conversion method and device from source speaker to target speaker

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2002103680A2 (en) * 2001-06-19 2002-12-27 Securivox Ltd Speaker recognition system
US20070129941A1 (en) * 2005-12-01 2007-06-07 Hitachi, Ltd. Preprocessing system and method for reducing FRR in speaking recognition
WO2013022930A1 (en) * 2011-08-08 2013-02-14 The Intellisis Corporation System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
EP2860706A2 (en) * 2013-09-24 2015-04-15 Agnitio S.L. Anti-spoofing

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device


Also Published As

Publication number Publication date
GB202004481D0 (en) 2020-05-13
GB2580821B (en) 2022-11-09
WO2019073233A1 (en) 2019-04-18
CN111201570A (en) 2020-05-26
