GB2580821A - Analysing speech signals - Google Patents


Info

Publication number
GB2580821A
GB2580821A (application GB2004481.4A)
Authority
GB
United Kingdom
Prior art keywords
audio signal
speech
channel
speaker
determining whether
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2004481.4A
Other versions
GB202004481D0 (en)
GB2580821B (en)
Inventor
John Paul Lesso
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cirrus Logic International Semiconductor Ltd
Original Assignee
Cirrus Logic International Semiconductor Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB1719731.0A (GB2567503A)
Priority claimed from GB1719734.4A (GB201719734D0)
Application filed by Cirrus Logic International Semiconductor Ltd
Publication of GB202004481D0
Publication of GB2580821A
Application granted
Publication of GB2580821B
Current legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method of analysis of an audio signal comprises: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; and analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user. Based on the analysing, information is obtained about at least one of a channel and noise affecting the audio signal.
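As a rough illustration of the extraction step in the abstract, a common way to separate voiced from unvoiced components uses frame-level energy and zero-crossing-rate heuristics. The patent does not prescribe a particular method; the function name and thresholds below are illustrative assumptions only:

```python
import numpy as np

def classify_frames(signal, frame_len=320, energy_thresh=0.01, zcr_thresh=0.25):
    """Label each frame of a speech signal as 'voiced' or 'unvoiced'.

    A frame with high energy and a low zero-crossing rate is treated as
    voiced; everything else is treated as unvoiced.  The thresholds are
    illustrative choices, not values taken from the patent.
    """
    labels = []
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)                          # frame power
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)    # sign changes per sample
        is_voiced = energy > energy_thresh and zcr < zcr_thresh
        labels.append("voiced" if is_voiced else "unvoiced")
    return labels
```

A low-frequency periodic signal (voiced-like) yields high energy and few zero crossings, while broadband noise (unvoiced-like) crosses zero roughly every other sample.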

Claims (43)

1. A method of analysis of an audio signal, the method comprising: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
2. A method according to claim 1, wherein extracting first and second components of the audio signal comprises: identifying periods when the audio signal contains voiced speech; and identifying remaining periods of speech as containing unvoiced speech.
3. A method according to claim 1 or 2, wherein analysing the first and second components of the audio signal with the models of the first and second acoustic classes of the speech of the enrolled user comprises: comparing magnitudes of the audio signal at a number of predetermined frequencies with magnitudes in the models of the first and second acoustic classes of the speech.
4. A method according to any preceding claim, comprising compensating the received audio signal for channel and/or noise.
5. A method according to any preceding claim, comprising: performing a speaker identification process on the received audio signal to form a provisional decision on an identity of a speaker; selecting the models of the first and second acoustic classes of the speech of the enrolled user, from a plurality of models, based on the provisional decision on the identity of the speaker; compensating the received audio signal for channel and/or noise; and performing a second speaker identification process on the compensated received audio signal to form a final on the identity of the speaker.
6. A method according to claim 5, wherein compensating the received audio signal for channel and/or noise comprises: identifying at least one part of a frequency spectrum of the received audio signal where a noise level exceeds a threshold level; and ignoring the identified part of the frequency spectrum of the received audio signal when performing the second speaker identification process.
7. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise voiced speech and unvoiced speech.
8. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second phoneme classes.
9. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second fricatives.
10. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise fricatives and sibilants.
11. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
12. A device comprising a system as claimed in any of claims 1 to 10.
13. A device as claimed in claim 12, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
14. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 1 to 10.
15. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 1 to 10.
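The comparison in claim 3 can be read as a channel estimate: if the speaker matches the enrolled models, the measured log-magnitudes differ from the modelled ones by the same channel response at each predetermined frequency, for both acoustic classes. A minimal sketch under that reading (the function name and the averaging of the two per-class differences are illustrative, not taken from the claims):

```python
import numpy as np

def estimate_channel(voiced_db, unvoiced_db, voiced_model_db, unvoiced_model_db):
    """Estimate a channel frequency response from two acoustic classes.

    Each argument is an array of log-magnitudes (dB) sampled at the same
    predetermined frequencies.  For a matching speaker, measured minus
    modelled magnitude should equal the channel response for both the
    voiced and the unvoiced component; averaging the two differences
    gives a single estimate.  Illustrative sketch, not the patented
    algorithm itself.
    """
    diff_voiced = np.asarray(voiced_db) - np.asarray(voiced_model_db)
    diff_unvoiced = np.asarray(unvoiced_db) - np.asarray(unvoiced_model_db)
    return (diff_voiced + diff_unvoiced) / 2.0
```

If the two per-class differences disagree strongly, the mismatch itself is informative: it suggests either noise or that the speech is not that of the enrolled user.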
16. A method of determining whether a received signal may result from a replay attack, the method comprising: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
17. A method according to claim 16, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a low frequency roll-off.
18. A method according to claim 17, wherein determining whether the channel has a low frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies below a lower cut-off frequency.
19. A method according to claim 16 or 17, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a high frequency roll-off.
20. A method according to claim 19, wherein determining whether the channel has a high frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies above an upper cut-off frequency.
21. A method according to claim 16, 17 or 19, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has ripple in a pass-band thereof.
22. A method according to claim 21, wherein determining whether the channel has ripple in a pass-band thereof comprises determining whether a degree of ripple over a central part of the pass-band, for example from 100 Hz to 10 kHz, exceeds a threshold amount.
23. A system for determining whether a received signal may result from a replay attack, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
24. A device comprising a system as claimed in any of claims 16 to 22.
25. A device as claimed in claim 24, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
26. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 16 to 22.
27. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 16 to 22.
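The loudspeaker tests of claims 16 to 22 reduce to simple checks on an estimated channel magnitude response. A minimal sketch, with illustrative cut-off frequencies and thresholds (the claims fix no numeric values beyond the 100 Hz to 10 kHz pass-band example in claim 22):

```python
import numpy as np

def has_loudspeaker_characteristics(freqs, channel_db,
                                    low_cutoff=200.0,
                                    ripple_thresh_db=3.0):
    """Heuristic replay-attack test on an estimated channel response.

    Flags the channel as loudspeaker-like if it rolls off below a lower
    cut-off frequency, or shows more than `ripple_thresh_db` of
    peak-to-peak ripple over the central pass-band (100 Hz to 10 kHz).
    All thresholds are illustrative assumptions.
    """
    freqs = np.asarray(freqs, dtype=float)
    channel_db = np.asarray(channel_db, dtype=float)

    # Low-frequency roll-off: response below the cut-off sits well
    # under the pass-band average.
    low = channel_db[freqs < low_cutoff]
    passband = channel_db[(freqs >= 100.0) & (freqs <= 10000.0)]
    rolloff = low.size > 0 and low.mean() < passband.mean() - 6.0

    # Pass-band ripple: peak-to-peak variation over the central band.
    ripple = (passband.max() - passband.min()) > ripple_thresh_db

    return rolloff or ripple
```

A genuine acoustic path tends to be comparatively smooth and flat over this band, whereas a small loudspeaker typically cannot reproduce the lowest frequencies and shows resonance ripple, which is what these checks exploit.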
28. A method of speaker identification, comprising: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
29. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining an average spectrum of a part of the cleaned audio signal representing voiced speech.
30. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining a first average spectrum of a part of the cleaned audio signal representing a first acoustic class and obtaining a second average spectrum of a part of the cleaned audio signal representing a second acoustic class, and wherein comparing the average spectrum with a long term average speaker model for an enrolled speaker comprises comparing the first average spectrum with a long term average speaker model for the first acoustic class for the enrolled speaker and comparing the second average spectrum with a long term average speaker model for the second acoustic class for the enrolled speaker.
31. A method according to claim 28, wherein the first acoustic class is voiced speech and the second acoustic class is unvoiced speech.
32. A method according to claim 28, 29, 30 or 31, comprising comparing the average spectrum with respective long term average speaker models for each of a plurality of enrolled speakers; and determining based on the comparison whether the speech is the speech of one of the enrolled speakers.
33. A method according to claim 32, further comprising comparing the average spectrum with a Universal Background Model; and including a result of the comparing the average spectrum with the Universal Background Model in determining whether the speech is the speech of one of the enrolled speakers.
34. A method according to claim 32, comprising identifying one of the enrolled speakers as a most likely candidate as a source of the speech.
35. A method according to any of claims 28 to 34, comprising: obtaining information about the effects of a channel and/or noise on the received audio signal by: receiving the audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
36. A method according to claim 35, comprising analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of a plurality of enrolled users, to obtain respective hypothetical values of the channel, and determining that the speech is not the speech of any enrolled speaker whose models give rise to physically implausible hypothetical values of the channel.
37. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains variations exceeding a threshold level across the relevant frequency range.
38. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains significant discontinuities.
39. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
40. A device comprising a system as claimed in claim 39.
41. A device as claimed in claim 40, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
42. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 28 to 38.
43. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 28 to 38.
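The long term average spectrum comparison of claims 28 to 34 can be sketched as follows. The negative mean-squared-distance score and the function names are illustrative assumptions; the claims do not specify a scoring function:

```python
import numpy as np

def ltas_score(cleaned_frames_db, speaker_model_db):
    """Score a long-term average spectrum against an enrolled model.

    `cleaned_frames_db` is a 2-D array (frames x frequency bins) of
    log-magnitude spectra after channel/noise compensation.  The frames
    are averaged into a single long-term average spectrum (LTAS) and
    scored against the enrolled speaker's LTAS model by negative mean
    squared distance, so higher means more similar.  The scoring
    function is an illustrative choice.
    """
    ltas = np.mean(cleaned_frames_db, axis=0)
    return -float(np.mean((ltas - np.asarray(speaker_model_db)) ** 2))

def identify(cleaned_frames_db, models):
    """Return the enrolled speaker whose LTAS model scores highest."""
    return max(models, key=lambda name: ltas_score(cleaned_frames_db, models[name]))
```

Per claims 30 and 31, the same comparison can be run twice, once on the voiced frames and once on the unvoiced frames, against separate per-class models, and per claim 33 a Universal Background Model score can be included as a reference against which the enrolled-speaker scores are judged.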
GB2004481.4A 2017-10-13 2018-10-11 Analysing speech signals Active GB2580821B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762571978P 2017-10-13 2017-10-13
US201762578667P 2017-10-30 2017-10-30
GB1719731.0A GB2567503A (en) 2017-10-13 2017-11-28 Analysing speech signals
GBGB1719734.4A GB201719734D0 (en) 2017-10-30 2017-11-28 Speaker identification
PCT/GB2018/052905 WO2019073233A1 (en) 2017-10-13 2018-10-11 Analysing speech signals

Publications (3)

Publication Number Publication Date
GB202004481D0 (en) 2020-05-13
GB2580821A (en) 2020-07-29
GB2580821B (en) 2022-11-09

Family

Family ID: 66100464

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2004481.4A Active GB2580821B (en) 2017-10-13 2018-10-11 Analysing speech signals

Country Status (3)

Country Link
CN (1) CN111201570A (en)
GB (1) GB2580821B (en)
WO (1) WO2019073233A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808595A (en) * 2020-06-15 2021-12-17 颜蔚 Voice conversion method and device from source speaker to target speaker

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2002103680A2 (en) * 2001-06-19 2002-12-27 Securivox Ltd Speaker recognition system
US20070129941A1 (en) * 2005-12-01 2007-06-07 Hitachi, Ltd. Preprocessing system and method for reducing FRR in speaking recognition
WO2013022930A1 (en) * 2011-08-08 2013-02-14 The Intellisis Corporation System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
EP2860706A2 (en) * 2013-09-24 2015-04-15 Agnitio S.L. Anti-spoofing

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device


Also Published As

Publication number Publication date
GB202004481D0 (en) 2020-05-13
GB2580821B (en) 2022-11-09
WO2019073233A1 (en) 2019-04-18
CN111201570A (en) 2020-05-26
