WO2015012680A2 - A method for speech watermarking in speaker verification - Google Patents


Info

Publication number: WO2015012680A2
Authority: WIPO (PCT)
Prior art keywords: speaker, speech, speech signal, watermarking, voice
Application number: PCT/MY2014/000138
Other languages: French (fr)
Other versions: WO2015012680A3 (en)
Inventors: Syed Abdul Rahman AL-HADDAD SYED MOHAMED, M. Iqbal Saripan, Shyamala C. DORAISAMY, Abd. Rahman RAMLI, Mohammad Ali NEMATOLLAHI
Original Assignee: Universiti Putra Malaysia
Priority claimed from MYPI2013701280A (MY180944)
Application filed by Universiti Putra Malaysia

Classifications

    • G10L 19/018 — Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G10L 17/20 — Speaker identification or verification: pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L 25/78 — Detection of presence or absence of voice signals



Abstract

The present invention relates to a method for speech watermarking in speaker verification, comprising the steps of: embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of: selecting frames having least speaker-specific information from the speech signal to carry watermark data; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.

Description

A METHOD FOR SPEECH WATERMARKING IN SPEAKER VERIFICATION
Background of the Invention
Field of the Invention
This invention relates to a method for speech watermarking to provide a secure communication system in speaker verification, and more particularly to a method for speech watermarking by taking into account speaker-specific information and characteristics of speech features.
Description of Related Arts
Speaker verification is a process of verifying a speaker's identity from a speech signal to provide secure access in a communication system, particularly in a distance communication system involving critical subject matter such as telephone banking and air traffic control. In order to establish a secure communication system, a speaker verification process must be employed before any further action is taken. Conventional speaker verification techniques are exposed to two vulnerable points: firstly, the speech could be manipulated while it is recorded, before being transmitted; and secondly, while the speech signal passes through the communication channel.
There are various techniques for performing speaker verification, and one of the most well-known is speech watermarking. Speech watermarking improves the security of conventional speaker verification by embedding a watermark inside the speech signal at the transmitter side and extracting it at the receiver side. Apart from the security issues, selecting proper features is another concern for conventional speaker verification because of their discriminant ability, reliability and robustness. Speaker recognition based on speech features has several common problems, such as long-term effects due to physiological changes, the emotional state of the speaker, illness, time of day, fatigue, and auditory accommodation. This is because the speaker-specific features have a different concentration in each frame of the speech signal. Other problems of feature-based speaker verification are the time and cost of training, the amount of data required for training, the level of security to be achieved, and whether to develop a text-dependent or text-independent system. Furthermore, noise in the speech signal is a major contributor to the mismatch between the training and testing phases, which can degrade speaker verification performance. Many researchers have tried to combat the effects of undesired features, as well as developing speaker modelling techniques, to improve accuracy.
One of the prior arts, US Patent 6,892,175 B1, discloses a method for encoding a watermark in a digital message such as a speech signal. The cited patent generates a spread spectrum signal, wherein the spread spectrum signal is representative of the digital information, and embeds the spread spectrum signal in the speech signal. A drawback of the cited patent is that the spread spectrum signal of the watermark is embedded in all frames of the speech signal. As a speech signal has less bandwidth than an audio signal, it can carry fewer watermark bits, which leads to lower watermark capacity. Furthermore, implementing speech watermarking in all frames of the speech signal may degrade the accuracy of the speaker verification while consuming more time.
The paper by Marcos Faundez-Zanuy et al. discloses a speech watermarking method which combines the spread spectrum approach with simplified frequency masking. However, that paper also does not consider the speaker-specific features when embedding the watermark data. Therefore, many challenges and opportunities in the robustness, accuracy and efficiency of speech watermarking methods are yet to be explored, particularly in distance speaker verification. Accordingly, it can be seen from the prior arts that there exists a need for a speech watermarking method that is more secure while efficiently considering the speaker-specific features of the speech signal in the speaker verification process. The speech watermarking method should be robust under unintentional attacks (e.g. background noise, compression, amplitude scaling) and fragile under intentional attacks (e.g. copying, cutting or removing). The speech watermarking method must also provide enough capacity to transmit verification data through the speech signal. There is also a trade-off between capacity, inaudibility and robustness that should be considered when designing a speech watermarking method.
References
• Marcos Faundez-Zanuy et al., Pattern Recognition, Elsevier, vol. 40, pp. 3027-3034, February 2007.
• Faundez-Zanuy, Marcos, Jose J. Lucena-Molina, and Martin Hagmüller. "Speech Watermarking: An Approach for the Forensic Analysis of Digital Telephonic Recordings." Journal of Forensic Sciences 55.4 (2010): 1080-1087.
Summary of Invention
It is an objective of the present invention to provide a robust, efficient and accurate speech watermarking method for speaker verification.
It is also an objective of the present invention to provide a speech watermarking method that uses the frames having the least speaker-specific features.
It is yet another objective of the present invention to provide a speech watermarking method that selects the frames with the least speaker-specific features to carry the watermark data.
It is a further objective of the present invention to provide an efficient speech watermarking method for a genuine distance speaker verification technique.
Accordingly, these objectives may be achieved by following the teachings of the present invention. The present invention relates to a method for speech watermarking in speaker verification, comprising the steps of: embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of: selecting frames having least speaker-specific information from the speech signal to carry watermark data; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
Brief Description of the Drawings
The features of the invention will be more readily understood and appreciated from the following detailed description when read in conjunction with the accompanying drawings of the preferred embodiment of the present invention, in which: Fig. 1 is a flow chart of a method for embedding speech watermarking in speaker verification of the present invention.
Fig. 2 is a schematic diagram of a method for speech watermarking in speaker verification of the present invention.
Fig. 3 is a flow chart of frame selection in the method of the speech watermarking in the present invention.
Fig. 4 shows a step of detecting voice activity for separating voice and non-voice frames.
Fig. 5 is a schematic diagram for a method of embedding the speech watermarking in speaker verification in the present invention.
Fig. 6 is a schematic diagram for a method of extracting speech watermarking in speaker verification in the present invention.
Detailed Description of the Invention
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting but merely as a basis for claims. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modification, equivalents and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include," "including," and "includes" mean including, but not limited to. Further, the words "a" or "an" mean "at least one" and the word "plurality" means one or more, unless otherwise mentioned. Where the abbreviations or technical terms are used, these indicate the commonly accepted meanings as known in the technical field. For ease of reference, common reference numerals will be used throughout the figures when referring to the same or similar features common to the figures. The present invention will now be described with reference to Figs. 1-6.
The present invention provides a method for speech watermarking in speaker verification, comprising the steps of:
embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of:
selecting frames having least speaker-specific information from the speech signal to carry watermark data;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
Referring to Fig. 1, the method for speech watermarking in speaker verification of the present invention comprises embedding watermark data into the speech signal. The watermark embedding process is employed at the transmitter side, whereby only the watermarked speech signal is available at the receiver. The watermarked speech signal is then transmitted over a communication channel to the receiver, where it goes through the watermark extraction method shown in Fig. 2 before being further processed.
As shown in Fig. 3, the speech signal first undergoes frame selection to prioritize the frames of the speech signal that will carry the watermark data. This is because the speaker-specific information is not uniformly distributed across all frames of the speech signal. In a preferred embodiment of the method for speech watermarking in speaker verification, the speaker-specific information depends on system noise, fundamental frequencies, and the system features and source features of the speaker-specific information. In a preferred embodiment, the system features relate to the structure of the speaker's vocal folds, while the source features depend on the manner and vibration of the speaker's vocal cords.
Fig. 3 shows a preferred embodiment of the step of selecting frames of the speech signal. In a preferred embodiment, the fundamental frequency for the frame selection is estimated using Linear Predictive Coding (LPC). LPC is applied to each frame of the speech signal to calculate the predicted number of dominant frequencies and the prediction error, which is used for extracting the glottal closure instant (GCI) in the next step. In a preferred embodiment, the most speaker-discriminant frequencies are located in the low frequencies below 600 Hz and the high frequencies above 3500 Hz. Some frequencies are located in the mid-frequency region of 500 Hz to 3500 Hz, which is most important for phonetic speech verification. In the preferred embodiment, phonetic speaker verification shows that stops, fricatives, nasals, diphthongs and vowels carry important speaker-specific information, in ascending order. Said frequencies are then weighted for comparison between the frames of the speech signal. In the preferred embodiment of the present invention, higher-order spectral analysis (HOS), which is also used in applications such as speech enhancement, channel selection and blind source separation, is applied to each frame to detect the Gaussianity of the speech signal. In the preferred embodiment, the variance, skewness and kurtosis are used to select the noisiest frames in the speech signal. The noisiest frames are preferred because noise is known to be the main source of mismatch between the enrolment (training) and testing sets in speaker verification systems; in addition, noise does not carry much speaker-specific information. In a preferred embodiment of the method for speech watermarking in speaker verification, the frames with the least speaker-specific information are selected to carry the watermark data. Therefore, the embedded watermark cannot change the noisy frames severely, and the watermark will be imperceptible and inaudible.
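As a rough illustration of the higher-order-statistics frame ranking described above, the sketch below scores each frame by how Gaussian (noise-like) it looks using the sample skewness and excess kurtosis. The frame length, the scoring rule and all names are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def frame_noise_scores(signal, frame_len=160):
    """Rank frames by higher-order statistics (skewness, excess kurtosis).

    Frames that look most noise-like (closest to Gaussian) carry the
    least speaker-specific information and are preferred watermark
    carriers in the method's frame-selection step.
    """
    n_frames = len(signal) // frame_len
    scores = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        mu, sigma = frame.mean(), frame.std() + 1e-12
        z = (frame - mu) / sigma
        skew = np.mean(z ** 3)
        kurt = np.mean(z ** 4) - 3.0   # excess kurtosis; ~0 for Gaussian noise
        # Small |skewness| + |excess kurtosis| => close to Gaussian noise
        scores.append(abs(skew) + abs(kurt))
    return np.argsort(scores)          # most noise-like frames first

# Usage: five tonal (voiced-like) frames followed by five noise frames
rng = np.random.default_rng(0)
sig = np.concatenate([np.sin(np.arange(800) * 0.3),   # frames 0-4: tonal
                      rng.standard_normal(800)])      # frames 5-9: noise
order = frame_noise_scores(sig)
```

A pure sine frame has an excess kurtosis of about -1.5, so the noise frames (score near 0) are ranked ahead of the tonal ones.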
When the frames with the least speaker-specific features have been selected, voice activity detection is applied to the selected frames to detect the presence or absence of the speaker's voice in the speech signal. The step of detecting voice activity in the speech signal categorizes the selected frames into voice and non-voice frames. In a preferred embodiment, the Magnitude Sum Function (MSF), the pitch period and the Zero Crossing Rate (ZCR) are utilized to determine the voiced and non-voiced frames. Fig. 4 shows a preferred embodiment of the voice activity detection for separating voice and non-voice frames. The ZCR counts the number of times the speech signal crosses the x-axis. In a preferred embodiment, a non-voice frame has a higher ZCR than a voice frame due to its high-frequency character. On the other hand, the MSF reflects the energy of the speech signal; the preferred embodiment shows that a voice frame has more energy than a non-voice frame due to its lower frequency content. Fig. 4 also shows that the pitch period in a voice frame is higher than in a non-voice frame.
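The ZCR/MSF voice activity test above can be sketched as follows. The thresholds and helper names are illustrative assumptions, and the pitch-period test the patent also uses is omitted here for brevity.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of consecutive samples changing sign."""
    return np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))

def msf(frame):
    """Magnitude Sum Function: sum of absolute sample values (energy proxy)."""
    return np.sum(np.abs(frame))

def is_voiced(frame, zcr_thresh=0.25, msf_thresh=None):
    """Voiced frames tend to have low ZCR and high MSF; non-voiced frames
    the opposite. Both thresholds are invented for illustration."""
    if msf_thresh is None:
        msf_thresh = 0.1 * len(frame)
    return zcr(frame) < zcr_thresh and msf(frame) > msf_thresh

# Usage: a 120 Hz tone at 8 kHz sampling looks voiced; faint white noise does not
t = np.arange(400)
voiced_like = np.sin(2 * np.pi * 120 * t / 8000)
unvoiced_like = 0.05 * np.random.default_rng(1).standard_normal(400)
```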
In a preferred embodiment of the method for speech watermarking in speaker verification, the step of embedding watermark data comprises modifying the probability distribution function of the Linear Predictive Coding (LPC) coefficients. However, it may be difficult to modify each LPC coefficient individually, because the LPCs vary during the embedding and extraction process even without a speech manipulation attack. In another preferred embodiment, constants may be applied to shape the probability density function of the LPCs in the method for embedding the speech watermarking. This is done by multiplying all LPCs by one constant and adding another constant to all LPCs. These constants change the variance and the mean of the LPCs. Therefore, all LPCs of the speech frames are embedded with the watermark to increase robustness, instead of embedding the watermark in just one LPC. Fig. 5 shows a preferred embodiment of a schematic diagram for a method of embedding the speech watermarking in speaker verification. In the preferred embodiment, the schematic diagram depicts how the probability density function is shaped by constants named alpha and beta. Fig. 6 shows a preferred embodiment of a schematic diagram for a method of extracting the speech watermarking in speaker verification. The preferred embodiment in Fig. 6 shows how the watermark may be detected by using the mean and the standard deviation.
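A minimal sketch of the alpha/beta pdf-shaping idea, assuming the LPC coefficients have already been computed elsewhere (e.g. by Levinson-Durbin analysis). The target means and standard deviation are invented for illustration, since the patent gives no values.

```python
import numpy as np

# Hypothetical target statistics for watermark bits 0 and 1; the method
# shapes the LPC pdf with two constants (alpha, beta) but specifies no values.
TARGET_MEAN = {0: -0.2, 1: 0.2}
TARGET_STD = 0.5

def embed_bit(lpc, bit):
    """Scale (alpha) and shift (beta) every LPC coefficient so that the
    coefficient set's mean and std match the targets for the given bit."""
    alpha = TARGET_STD / (lpc.std() + 1e-12)
    beta = TARGET_MEAN[bit] - alpha * lpc.mean()
    return alpha * lpc + beta

def extract_bit(lpc_wm):
    """Decide the bit from the mean of the shaped distribution."""
    midpoint = 0.5 * (TARGET_MEAN[0] + TARGET_MEAN[1])
    return 1 if lpc_wm.mean() > midpoint else 0

# Usage: a hypothetical 12th-order LPC vector round-trips both bit values
rng = np.random.default_rng(3)
lpc = rng.standard_normal(12)
```

Because every coefficient is shifted and scaled together, the watermark is spread over all LPCs of the frame, matching the robustness rationale above.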
In a preferred embodiment of the method for speech watermarking in speaker verification, the step of extracting watermark data from the speech signal comprises the steps of:
performing synchronization of a decoder to the speech signal;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
extracting watermark data from the speech signal according to the presence or absence of the speaker's voice.
In the preferred embodiment of the present invention as shown in Fig. 2, the step of extracting watermark data from the speech signal is performed on the receiver side of the communication system. When the receiver receives the watermarked speech signal, the watermarked frames must be distinguished from the non-watermarked frames. Therefore, synchronization is performed to align the received speech signals. The step of performing synchronization may also improve the timing and robustness between the transmitter and the receiver. Besides that, through the step of synchronization, other information such as metadata, parity, a cyclic redundancy check (CRC) and watermark information may also be sent from the transmitter to the receiver.
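One common way to realize such a synchronization step is to search for a known pilot sequence by cross-correlation. This is a hedged sketch of that general idea, not the patent's specific scheme; the pilot length and signal layout are assumptions.

```python
import numpy as np

def synchronize(received, sync_seq):
    """Locate a known synchronization sequence in the received signal by
    sliding cross-correlation; the peak index gives the alignment offset
    from which frame segmentation can start."""
    corr = np.correlate(received, sync_seq, mode="valid")
    return int(np.argmax(corr))

# Usage: a hypothetical +/-1 pilot of 64 samples buried between noise bursts
rng = np.random.default_rng(2)
pilot = rng.choice([-1.0, 1.0], size=64)
stream = np.concatenate([0.1 * rng.standard_normal(300), pilot,
                         0.1 * rng.standard_normal(300)])
offset = synchronize(stream, pilot)
```

The correlation peak (64 at perfect alignment) stands well above the noise-region sidelobes, so the offset is recovered exactly.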
In the step of performing synchronization, first, synchronization is performed for timing between the transmitter and the receiver. Second, based on the synchronization information, the watermarked speech signal is segmented into frames. Third, Voice Activity Detection (VAD) is applied to each frame to distinguish voice from non-voice speech. Fourth, based on the VAD decision, the type of watermarking method is determined and the LPCs are extracted from the frame. Finally, the watermark is detected based on the shape of the probability density function of the LPCs used in the method of embedding the speech watermarking. The method of the present invention, by considering speaker-specific information and embedding watermark data according to the speech characteristics of voice and non-voice frames, provides a solution to security issues in speaker verification as well as improving the efficacy and accuracy of the speaker verification. Therefore, the frames with the least speaker-specific information are selected to carry the watermark data, to preserve the performance of the speaker-specific features in the speaker verification. The method of the present invention may stand alone as a method for speech watermarking, and may also be used in conventional speaker verification to solve security problems over channels without any degradation in performance, accuracy or efficiency.
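The extraction steps above can be sketched as a small receiver loop. The VAD and bit detector below are deliberately simplified stand-ins (an energy test and a mean-sign test) for the MSF/ZCR/pitch VAD and the LPC pdf-shaping detector of the method; all names and the frame length are assumptions.

```python
import numpy as np

FRAME = 100   # hypothetical frame length in samples

def vad(frame):
    """Toy VAD: an energy test standing in for the MSF/ZCR/pitch decision."""
    return np.mean(np.abs(frame)) > 0.2

def detect_bit(frame, voiced):
    """Toy detector: assumes the embedder shifted each carrier frame's
    sample mean positive for bit 1 and negative for bit 0 (a stand-in
    for detection from the shaped LPC distribution)."""
    return 1 if frame.mean() > 0 else 0

def receiver(signal, carrier_frames):
    """Steps 2-5 of the extraction: segment the (already synchronized)
    signal into frames, apply VAD to each carrier frame, then run the
    detector selected by the VAD decision."""
    bits = []
    for i in carrier_frames:
        frame = signal[i * FRAME:(i + 1) * FRAME]
        voiced = vad(frame)
        bits.append(detect_bit(frame, voiced))
    return bits

# Usage: three carrier frames with means +0.3, -0.3, +0.3 encode bits 1, 0, 1
sig = np.zeros(5 * FRAME)
for idx, mean in zip([1, 2, 4], [0.3, -0.3, 0.3]):
    sig[idx * FRAME:(idx + 1) * FRAME] = mean
bits = receiver(sig, [1, 2, 4])
```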
Although the present invention has been described with reference to specific embodiments, also shown in the appended figures, it will be apparent for those skilled in the art that many variations and modifications can be done within the scope of the invention as described in the specification and defined in the following claims.

Claims

1. A method for speech watermarking in speaker verification, comprising the steps of:
embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of:
selecting frames having least speaker-specific information from the speech signal to carry watermark data;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
2. A method for speech watermarking in speaker verification according to claim 1, wherein the speaker-specific information depends on system noise, fundamental frequencies, system features and source features of the speaker-specific information.
3. A method for speech watermarking in speaker verification according to claim 1, wherein the step of extracting watermark data from the speech signal comprises the steps of:
performing synchronization of a decoder to the speech signal;
detecting voice activity to detect presence or absence of the speaker's voice in the speech signal; and
extracting the watermark data from the speech signal according to the presence or absence of the speaker's voice.
PCT/MY2014/000138 2013-07-22 2014-05-29 A method for speech watermarking in speaker verification WO2015012680A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2013701280A MY180944A (en) 2012-09-14 2013-07-22 A method for speech watermarking in speaker verification
MYPI2013701280 2013-07-22

Publications (2)

Publication Number Publication Date
WO2015012680A2 true WO2015012680A2 (en) 2015-01-29
WO2015012680A3 WO2015012680A3 (en) 2015-03-26

Family

ID=51542420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2014/000138 WO2015012680A2 (en) 2013-07-22 2014-05-29 A method for speech watermarking in speaker verification

Country Status (1)

Country Link
WO (1) WO2015012680A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531176B (en) * 2016-10-27 2019-09-24 天津大学 The digital watermarking algorithm of audio signal tampering detection and recovery

Citations (1)

Publication number Priority date Publication date Assignee Title
US6892175B1 (en) 2000-11-02 2005-05-10 International Business Machines Corporation Spread spectrum signaling for speech watermarking

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2002514318A (en) * 1997-01-31 2002-05-14 T-Netix, Inc. System and method for detecting recorded speech

Non-Patent Citations (2)

Title
FAUNDEZ-ZANUY, MARCOS; LUCENA-MOLINA, JOSE J.; HAGMULLER, MARTIN: "Speech Watermarking: An Approach for the Forensic Analysis of Digital Telephonic Recordings", JOURNAL OF FORENSIC SCIENCES, vol. 55, no. 4, 2010, pages 1080 - 1087, XP055159377, DOI: 10.1111/j.1556-4029.2010.01395.x
FAUNDEZ-ZANUY, MARCOS ET AL., PATTERN RECOGNITION JOURNAL, vol. 40, February 2007, ELSEVIER, pages 3027 - 3034

Cited By (11)

Publication number Priority date Publication date Assignee Title
GB2552722A (en) * 2016-08-03 2018-02-07 Cirrus Logic Int Semiconductor Ltd Speaker recognition
WO2018025024A1 (en) * 2016-08-03 2018-02-08 Cirrus Logic International Semiconductor Limited Speaker recognition
GB2567339A (en) * 2016-08-03 2019-04-10 Cirrus Logic Int Semiconductor Ltd Speaker recognition
US10726849B2 (en) 2016-08-03 2020-07-28 Cirrus Logic, Inc. Speaker recognition with assessment of audio frame contribution
US10950245B2 (en) 2016-08-03 2021-03-16 Cirrus Logic, Inc. Generating prompts for user vocalisation for biometric speaker recognition
GB2567339B (en) * 2016-08-03 2022-04-06 Cirrus Logic Int Semiconductor Ltd Speaker recognition
US11735191B2 (en) 2016-08-03 2023-08-22 Cirrus Logic, Inc. Speaker recognition with assessment of audio frame contribution
US11269976B2 (en) 2019-03-20 2022-03-08 Saudi Arabian Oil Company Apparatus and method for watermarking a call signal
CN113113021A (en) * 2021-04-13 2021-07-13 效生软件科技(上海)有限公司 Voice biological recognition authentication real-time detection method and system
CN114999502A (en) * 2022-05-19 2022-09-02 贵州财经大学 Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method
CN114999502B (en) * 2022-05-19 2023-01-06 贵州财经大学 Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method

Also Published As

Publication number Publication date
WO2015012680A3 (en) 2015-03-26

Similar Documents

Publication Publication Date Title
Mowlaee et al. Advances in phase-aware signal processing in speech communication
WO2015012680A2 (en) A method for speech watermarking in speaker verification
Mak et al. A study of voice activity detection techniques for NIST speaker recognition evaluations
Cooke A glimpsing model of speech perception in noise
Hu et al. Pitch‐based gender identification with two‐stage classification
EP2224433B1 (en) An apparatus for processing an audio signal and method thereof
Nematollahi et al. An overview of digital speech watermarking
RU2680352C1 (en) Encoding mode determining method and device, the audio signals encoding method and device and the audio signals decoding method and device
Hu et al. Segregation of unvoiced speech from nonspeech interference
KR20130031849A (en) A bandwidth extender
Narayanan et al. The role of binary mask patterns in automatic speech recognition in background noise
Kakouros et al. Evaluation of spectral tilt measures for sentence prominence under different noise conditions
CN102376306B (en) Method and device for acquiring level of speech frame
Wang et al. Detection of speech tampering using sparse representations and spectral manipulations based information hiding
Celik et al. Pitch and duration modification for speech watermarking
Wang et al. Tampering Detection Scheme for Speech Signals using Formant Enhancement based Watermarking.
Wang et al. Formant enhancement based speech watermarking for tampering detection
Ijitona et al. Improved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech signals using linear prediction error variance
Nematollahi et al. Semifragile speech watermarking based on least significant bit replacement of line spectral frequencies
Srinivasan et al. A model for multitalker speech perception
Joglekar et al. DeepComboSAD: Spectro-Temporal Correlation Based Speech Activity Detection for Naturalistic Audio Streams
Nishimura Reversible audio data hiding based on variable error-expansion of linear prediction for segmental audio and G. 711 speech
Wang et al. Speech Watermarking Based on Source-filter Model of Speech Production.
El-Maleh Classification-based Techniques for Digital Coding of Speech-plus-noise
Mawalim et al. Improving Security in McAdams Coefficient-Based Speaker Anonymization by Watermarking Method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14766550

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14766550

Country of ref document: EP

Kind code of ref document: A2