GB2580821A - Analysing speech signals - Google Patents
- Publication number
- GB2580821A GB2580821A GB2004481.4A GB202004481A GB2580821A GB 2580821 A GB2580821 A GB 2580821A GB 202004481 A GB202004481 A GB 202004481A GB 2580821 A GB2580821 A GB 2580821A
- Authority
- GB
- United Kingdom
- Prior art keywords
- audio signal
- speech
- channel
- speaker
- determining whether
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
Abstract
A method of analysis of an audio signal comprises: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; and analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user. Based on the analysing, information is obtained about at least one of a channel and noise affecting the audio signal.
Claims (43)
1. A method of analysis of an audio signal, the method comprising: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
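The core idea of claim 1 — comparing per-class spectra of the received speech against an enrolled user's per-class models to expose the channel — can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the log-domain averaging, and the equal weighting of the two classes are all assumptions.

```python
import numpy as np

def estimate_channel(observed_voiced, observed_unvoiced,
                     model_voiced, model_unvoiced):
    """Estimate a per-frequency channel response (in dB) by comparing
    observed class-averaged log-spectra against the enrolled user's models.
    A linear channel shifts the log-spectra of BOTH classes by the same
    amount, so averaging the two per-class differences suppresses
    class-specific deviations (hypothetical simplification; the patent
    does not fix a formula)."""
    diff_v = observed_voiced - model_voiced      # dB difference, voiced class
    diff_u = observed_unvoiced - model_unvoiced  # dB difference, unvoiced class
    return 0.5 * (diff_v + diff_u)
```

Because a channel affects every acoustic class equally, agreement between the two per-class differences is itself evidence that the deviation comes from the channel rather than from speaker mismatch.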
2. A method according to claim 1, wherein extracting first and second components of the audio signal comprises: identifying periods when the audio signal contains voiced speech; and identifying remaining periods of speech as containing unvoiced speech.
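The voiced/unvoiced split in claim 2 is commonly done with short-time energy and zero-crossing rate; the sketch below uses that classic heuristic with illustrative thresholds. The patent does not specify a detector, so treat this as one plausible stand-in.

```python
import numpy as np

def classify_frames(x, frame_len=400):
    """Label each frame 'voiced' or 'unvoiced' using short-time energy and
    zero-crossing rate (ZCR). Voiced speech tends to have high energy and
    a low ZCR; unvoiced (noise-like) speech has a high ZCR. The thresholds
    below are illustrative, not taken from the patent."""
    n = len(x) // frame_len
    labels = []
    for i in range(n):
        frame = x[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)
        # Fraction of sample pairs whose sign flips:
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        labels.append('voiced' if energy > 1e-3 and zcr < 0.2 else 'unvoiced')
    return labels
```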
3. A method according to claim 1 or 2, wherein analysing the first and second components of the audio signal with the models of the first and second acoustic classes of the speech of the enrolled user comprises: comparing magnitudes of the audio signal at a number of predetermined frequencies with magnitudes in the models of the first and second acoustic classes of the speech.
4. A method according to any preceding claim, comprising compensating the received audio signal for channel and/or noise.
5. A method according to any preceding claim, comprising: performing a speaker identification process on the received audio signal to form a provisional decision on an identity of a speaker; selecting the models of the first and second acoustic classes of the speech of the enrolled user, from a plurality of models, based on the provisional decision on the identity of the speaker; compensating the received audio signal for channel and/or noise; and performing a second speaker identification process on the compensated received audio signal to form a final decision on the identity of the speaker.
6. A method according to claim 5, wherein compensating the received audio signal for channel and/or noise comprises: identifying at least one part of a frequency spectrum of the received audio signal where a noise level exceeds a threshold level; and ignoring the identified part of the frequency spectrum of the received audio signal when performing the second speaker identification process.
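Claim 6's compensation step — discard spectral regions where noise dominates before the second speaker-identification pass — might look like the sketch below. The threshold value and the NaN-masking convention are illustrative assumptions, not details from the patent.

```python
import numpy as np

def mask_noisy_bins(signal_spectrum_db, noise_spectrum_db,
                    noise_floor_db=-40.0):
    """Return the signal spectrum with bins whose estimated noise exceeds
    a threshold replaced by NaN, plus the boolean keep-mask. A second
    speaker-identification pass would then score only the kept bins.
    The -40 dB floor is an illustrative value."""
    keep = noise_spectrum_db < noise_floor_db
    return np.where(keep, signal_spectrum_db, np.nan), keep
```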
7. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise voiced speech and unvoiced speech.
8. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second phoneme classes.
9. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise first and second fricatives.
10. A method according to any of claims 1 to 6, wherein the first and second acoustic classes of the speech comprise fricatives and sibilants.
11. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
12. A device comprising a system as claimed in claim 11.
13. A device as claimed in claim 12, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
14. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 1 to 10.
15. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 1 to 10.
16. A method of determining whether a received signal may result from a replay attack, the method comprising: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
17. A method according to claim 16, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a low frequency roll-off.
18. A method according to claim 17, wherein determining whether the channel has a low frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies below a lower cut-off frequency.
19. A method according to claim 16 or 17, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has a high frequency roll-off.
20. A method according to claim 19, wherein determining whether the channel has a high frequency roll-off comprises determining whether the channel decreases at a constant rate for frequencies above an upper cut-off frequency.
21. A method according to claim 16, 17 or 19, wherein determining whether the channel has at least one characteristic of a loudspeaker comprises: determining whether the channel has ripple in a pass-band thereof.
22. A method according to claim 21, wherein determining whether the channel has ripple in a pass-band thereof comprises determining whether a degree of ripple over a central part of the pass-band, for example from 100 Hz to 10 kHz, exceeds a threshold amount.
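Claims 17 to 22 list three loudspeaker signatures: low-frequency roll-off, high-frequency roll-off, and pass-band ripple. A sketch of testing an estimated channel response for all three follows; the cut-off frequencies and ripple threshold are illustrative choices, and only the 100 Hz - 10 kHz ripple band is taken from the claims themselves (claim 22).

```python
import numpy as np

def loudspeaker_signature(freqs_hz, channel_db,
                          low_cut=200.0, high_cut=15000.0, ripple_db=3.0):
    """Check an estimated channel response (dB vs frequency) for the three
    loudspeaker-like traits of claims 17-22. low_cut, high_cut and
    ripple_db are illustrative values, not taken from the patent."""
    low = freqs_hz < low_cut
    high = freqs_hz > high_cut
    band = (freqs_hz >= 100.0) & (freqs_hz <= 10000.0)  # claim 22's example band
    # Gain falling steadily as frequency drops below the lower cut-off:
    low_rolloff = np.all(np.diff(channel_db[low]) > 0)
    # Gain falling steadily as frequency rises above the upper cut-off:
    high_rolloff = np.all(np.diff(channel_db[high]) < 0)
    # Peak-to-peak variation across the central pass-band:
    ripple = (channel_db[band].max() - channel_db[band].min()) > ripple_db
    return low_rolloff, high_rolloff, ripple
```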
23. A system for determining whether a received signal may result from a replay attack, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; obtaining information about a channel affecting said audio signal; and determining whether the channel has at least one characteristic of a loudspeaker.
24. A device comprising a system as claimed in claim 23.
25. A device as claimed in claim 24, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
26. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 16 to 22.
27. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 16 to 22.
28. A method of speaker identification, comprising: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
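The comparison in claim 28 — a long-term average spectrum of cleaned speech scored against an enrolled speaker's stored average-spectrum model — can be sketched as below. The RMS log-spectral distance and the acceptance threshold are illustrative choices; the patent does not fix a metric.

```python
import numpy as np

def match_long_term_average(cleaned_frames_db, speaker_model_db,
                            max_dist_db=6.0):
    """Average the per-frame log-spectra of cleaned speech into a long-term
    average spectrum, then compare it with the enrolled speaker's model.
    The RMS distance metric and 6 dB threshold are illustrative."""
    lta = cleaned_frames_db.mean(axis=0)  # long-term average spectrum
    dist = np.sqrt(np.mean((lta - speaker_model_db) ** 2))
    return dist, dist < max_dist_db
```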
29. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining an average spectrum of a part of the cleaned audio signal representing voiced speech.
30. A method according to claim 28, wherein obtaining an average spectrum of at least a part of the cleaned audio signal comprises obtaining a first average spectrum of a part of the cleaned audio signal representing a first acoustic class and obtaining a second average spectrum of a part of the cleaned audio signal representing a second acoustic class, and wherein comparing the average spectrum with a long term average speaker model for an enrolled speaker comprises comparing the first average spectrum with a long term average speaker model for the first acoustic class for the enrolled speaker and comparing the second average spectrum with a long term average speaker model for the second acoustic class for the enrolled speaker.
31. A method according to claim 30, wherein the first acoustic class is voiced speech and the second acoustic class is unvoiced speech.
32. A method according to claim 28, 29, 30 or 31, comprising comparing the average spectrum with respective long term average speaker models for each of a plurality of enrolled speakers; and determining based on the comparison whether the speech is the speech of one of the enrolled speakers.
33. A method according to claim 32, further comprising comparing the average spectrum with a Universal Background Model; and including a result of the comparing the average spectrum with the Universal Background Model in determining whether the speech is the speech of one of the enrolled speakers.
34. A method according to claim 32, comprising identifying one of the enrolled speakers as a most likely candidate as a source of the speech.
35. A method according to any of claims 28 to 34, comprising: obtaining information about the effects of a channel and/or noise on the received audio signal by: receiving the audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user; and based on said analysing, obtaining information about at least one of a channel and noise affecting said audio signal.
36. A method according to claim 35, comprising analysing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of a plurality of enrolled users, to obtain respective hypothetical values of the channel, and determining that the speech is not the speech of any enrolled speaker whose models give rise to physically implausible hypothetical values of the channel.
37. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains variations exceeding a threshold level across the relevant frequency range.
38. A method according to claim 36, wherein a hypothetical value of the channel is considered to be physically implausible if it contains significant discontinuities.
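Claims 36 to 38 screen hypothetical channel estimates: a candidate speaker is rejected if their model implies a channel with physically implausible variation or significant discontinuities. A minimal sketch of that plausibility test, with illustrative thresholds standing in for the unspecified "threshold level":

```python
import numpy as np

def channel_is_plausible(channel_db, max_range_db=30.0, max_step_db=6.0):
    """Reject a hypothetical channel estimate (claims 36-38) if its overall
    variation across the frequency range is too large, or if it contains a
    sharp bin-to-bin discontinuity. Both thresholds are illustrative."""
    total_range = channel_db.max() - channel_db.min()   # claim 37: variation
    largest_step = np.max(np.abs(np.diff(channel_db)))  # claim 38: discontinuity
    return total_range <= max_range_db and largest_step <= max_step_db
```

An enrolled speaker whose model forces the channel estimate to fail this test can be eliminated as the source of the speech, as claim 36 describes.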
39. A system for analysis of an audio signal, the system comprising an input for receiving an audio signal, and being configured for: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; comparing the average spectrum with a long term average speaker model for an enrolled speaker; and determining based on the comparison whether the speech is the speech of the enrolled speaker.
40. A device comprising a system as claimed in claim 39.
41. A device as claimed in claim 40, wherein the device comprises a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.
42. A computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to any one of claims 28 to 38.
43. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of claims 28 to 38.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762571978P | 2017-10-13 | 2017-10-13 | |
US201762578667P | 2017-10-30 | 2017-10-30 | |
GB1719731.0A GB2567503A (en) | 2017-10-13 | 2017-11-28 | Analysing speech signals |
GBGB1719734.4A GB201719734D0 (en) | 2017-10-30 | 2017-11-28 | Speaker identification |
PCT/GB2018/052905 WO2019073233A1 (en) | 2017-10-13 | 2018-10-11 | Analysing speech signals |
Publications (3)
Publication Number | Publication Date |
---|---|
GB202004481D0 GB202004481D0 (en) | 2020-05-13 |
GB2580821A true GB2580821A (en) | 2020-07-29 |
GB2580821B GB2580821B (en) | 2022-11-09 |
Family
ID=66100464
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB2004481.4A Active GB2580821B (en) | 2017-10-13 | 2018-10-11 | Analysing speech signals |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN111201570A (en) |
GB (1) | GB2580821B (en) |
WO (1) | WO2019073233A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113808595A (en) * | 2020-06-15 | 2021-12-17 | 颜蔚 | Voice conversion method and device from source speaker to target speaker |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002103680A2 (en) * | 2001-06-19 | 2002-12-27 | Securivox Ltd | Speaker recognition system |
US20070129941A1 (en) * | 2005-12-01 | 2007-06-07 | Hitachi, Ltd. | Preprocessing system and method for reducing FRR in speaking recognition |
WO2013022930A1 (en) * | 2011-08-08 | 2013-02-14 | The Intellisis Corporation | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
EP2860706A2 (en) * | 2013-09-24 | 2015-04-15 | Agnitio S.L. | Anti-spoofing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105244031A (en) * | 2015-10-26 | 2016-01-13 | 北京锐安科技有限公司 | Speaker identification method and device |
- 2018
- 2018-10-11 WO PCT/GB2018/052905 patent/WO2019073233A1/en active Application Filing
- 2018-10-11 GB GB2004481.4A patent/GB2580821B/en active Active
- 2018-10-11 CN CN201880065835.1A patent/CN111201570A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
GB202004481D0 (en) | 2020-05-13 |
GB2580821B (en) | 2022-11-09 |
WO2019073233A1 (en) | 2019-04-18 |
CN111201570A (en) | 2020-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111316668B (en) | Detection of loudspeaker playback | |
CN110832580B (en) | Detection of replay attacks | |
US11270707B2 (en) | Analysing speech signals | |
GB2578386A (en) | Detection of replay attack | |
GB2588040A (en) | Detection of replay attack | |
CN107910014B (en) | Echo cancellation test method, device and test equipment | |
US20200227071A1 (en) | Analysing speech signals | |
US11017781B2 (en) | Reverberation compensation for far-field speaker recognition | |
CN108877823B (en) | Speech enhancement method and device | |
GB2578545A (en) | Magnetic detection of replay attack | |
KR20180063282A (en) | Method, apparatus and storage medium for voice detection | |
US20180033427A1 (en) | Speech recognition transformation system | |
GB2593300A (en) | Biometric user recognition | |
JP2019053321A (en) | Method for detecting audio signal and apparatus | |
CN111866690A (en) | Microphone testing method and device | |
KR20180067608A (en) | Method and apparatus for determining a noise signal, and method and apparatus for removing a noise noise | |
CA2869884C (en) | A processing apparatus and method for estimating a noise amplitude spectrum of noise included in a sound signal | |
CN104217728A (en) | Audio processing method and electronic device | |
CN104707331A (en) | Method and device for generating game somatic sense | |
CN107767860B (en) | Voice information processing method and device | |
CN111161746A (en) | Voiceprint registration method and system | |
GB2581677A (en) | Speaker enrolment | |
GB2580821A (en) | Analysing speech signals | |
CN110797008B (en) | Far-field voice recognition method, voice recognition model training method and server | |
CN110995914A (en) | Double-microphone testing method and device |