WO2015183254A1 - Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system - Google Patents

Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system Download PDF

Info

Publication number
WO2015183254A1
Authority
WO
WIPO (PCT)
Prior art keywords
glottal
glottal pulse
signal
pulse
database
Prior art date
Application number
PCT/US2014/039722
Other languages
English (en)
French (fr)
Inventor
Rajesh DACHIRAJU
Aravind GANAPATHIRAJU
Original Assignee
Interactive Intelligence, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interactive Intelligence, Inc.
Priority to JP2016567717A (JP6449331B2, ja)
Priority to BR112016027537-3A (BR112016027537B1, pt)
Priority to EP14893138.9A (EP3149727B1, en)
Priority to AU2014395554A (AU2014395554B2, en)
Priority to CA3178027A (CA3178027A1, en)
Priority to NZ725925A (NZ725925A, en)
Priority to PCT/US2014/039722 (WO2015183254A1, en)
Priority to CA2947957A (CA2947957C, en)
Publication of WO2015183254A1 (en)
Priority to ZA2016/07696A (ZA201607696B, en)
Priority to AU2020227065A (AU2020227065B2, en)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers

Definitions

  • the present invention generally relates to telecommunications systems and methods, as well as speech synthesis. More particularly, the present invention pertains to the formation of the excitation signal in a Hidden Markov Model based statistical parametric speech synthesis system.
  • a method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system is presented.
  • fundamental frequency values are used to form the excitation signal.
  • the excitation is modeled using a voice source pulse selected from a database of a given speaker.
  • the voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used for formation of the excitation signal.
  • Use of a novel distance metric and preservation of the original signals extracted from the speaker's voice samples help capture the low frequency information of the excitation signal.
  • segment edge artifacts are removed by applying a unique segment joining method to improve the quality of synthetic speech while creating a true representation of the voice quality of a speaker.
  • a method is presented to create a glottal pulse database from a speech signal, comprising the steps of: performing pre-filtering on the speech signal to obtain a pre-filtered signal; analyzing the pre-filtered signal to obtain inverse filtering parameters; performing inverse filtering of the speech signal using the inverse filtering parameters; computing an integrated linear prediction residual signal using the inversely filtered speech signal; identifying glottal segment boundaries in the speech signal; segmenting the integrated linear prediction residual signal into glottal pulses using the identified glottal segment boundaries from the speech signal; performing normalization of the glottal pulses; and forming the glottal pulse database by collecting all normalized glottal pulses obtained for the speech signal.
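  • A minimal sketch of these database creation steps follows, assuming Python with numpy, scipy, and librosa available; the LP order, the pre-emphasis coefficient, and all function names are illustrative assumptions, not values prescribed by this document.

```python
import numpy as np
import scipy.signal
import librosa

def build_glottal_pulse_database(speech, fs, epochs, lp_order=18, pre_emph=0.97):
    """Create energy-normalized glottal pulses from a speech signal.

    epochs: glottal segment/cycle boundaries as sample indices, e.g. from a
    zero frequency filtering (ZFF) epoch extractor.
    """
    # Step 1: pre-filtering (here, a simple pre-emphasis filter).
    pre_filtered = scipy.signal.lfilter([1.0, -pre_emph], [1.0], speech)

    # Step 2: LP analysis on the pre-filtered signal.
    a = librosa.lpc(pre_filtered, order=lp_order)

    # Step 3: inverse filter the ORIGINAL (not pre-filtered) signal with the
    # prediction-error filter to obtain the integrated LP residual (ILPR).
    ilpr = scipy.signal.lfilter(a, [1.0], speech)

    # Steps 4-5: segment the ILPR at the glottal boundaries, then
    # energy-normalize each pulse and collect the database.
    pulses = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        pulse = ilpr[start:end]
        energy = np.sqrt(np.sum(pulse ** 2))
        if energy > 0:
            pulses.append(pulse / energy)
    return pulses
```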
  • a method is presented to form parametric models, comprising the steps of: computing a glottal pulse distance metric between a number of glottal pulses; clustering the glottal pulse database into a number of clusters to determine centroid glottal pulses; forming a corresponding vector database by associating a vector with each glottal pulse in the glottal pulse database, wherein the association is determined mathematically using the centroid glottal pulses and the distance metric; determining Eigenvectors of the vector database; and forming parametric models by associating a glottal pulse from the glottal pulse database with each determined Eigenvector.
  • a method is presented to synthesize speech using input text, comprising the steps of: a) converting the input text into context dependent phone labels; b) processing the phone labels created in step (a) using trained parametric models to predict fundamental frequency values, the duration of the synthesized speech, and spectral features of the phone labels; c) creating an excitation signal using an Eigen glottal pulse and one or more of the predicted fundamental frequency values, spectral features of the phone labels, and duration of the synthesized speech; and d) combining the excitation signal with the spectral features of the phone labels using a filter to create the synthetic speech output.
  • Figure 1 is a diagram illustrating an embodiment of a Hidden Markov Model based Text to Speech system.
  • Figure 2 is a diagram illustrating an embodiment of a signal.
  • Figure 3 is a diagram illustrating an embodiment of excitation signal creation.
  • Figure 4 is a diagram illustrating an embodiment of excitation signal creation.
  • Figure 5 is a diagram illustrating an embodiment of overlap boundaries.
  • Figure 6 is a diagram illustrating an embodiment of excitation signal creation.
  • Figure 7 is a diagram illustrating an embodiment of glottal pulse identification.
  • Figure 8 is a diagram illustrating an embodiment of glottal pulse database creation.

DETAILED DESCRIPTION
  • Excitation is generally assumed to be a quasi-periodic sequence of impulses for voiced regions.
  • T0 represents the pitch period and F0 represents the fundamental frequency.
  • the excitation in unvoiced regions is modeled as white noise. In voiced regions, the excitation is not actually an impulse sequence; it is instead a sequence of voice source pulses which occur due to vibration of the vocal folds.
  • the pulses' shapes may vary depending on various factors such as the speaker, the mood of the speaker, the linguistic context, emotions, etc.
  • Source pulses have been treated mathematically as vectors by length normalization (through resampling) and impulse alignment, as described in European Patent EP 2242045 (granted June 27, 2012, to Thomas Drugman et al.).
  • the length-normalized source pulse signal is then resampled to match the target pitch.
  • the source pulse is not chosen from a database, but is obtained through a series of calculations which compromise the pulse characteristics in the frequency domain.
  • the approximate excitation signal used for creating a pulse database does not capture low frequency source content as there is no pre-filtering done while determining the Linear Prediction (LP) coefficients, which are used for inverse filtering.
  • speech unit signals are represented by a set of parameters which can be used to synthesize speech.
  • the parameters may be learned by statistical models, such as HMMs, for example.
  • speech may be represented as a source-filter model, wherein source/excitation is a signal which when passed through an appropriate filter produces a given sound.
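  • As a runnable toy illustration of the source-filter model, the snippet below passes a quasi-periodic impulse-train excitation through an arbitrary all-pole filter standing in for the vocal tract; the formant frequencies and pole radii are made-up values chosen only to produce a speech-like buzz, not parameters from the system described here.

```python
import numpy as np
import scipy.signal

fs = 16000                               # sampling frequency in Hz
f0 = 120.0                               # fundamental frequency in Hz
t0 = int(fs / f0)                        # pitch period T0 in samples
excitation = np.zeros(fs)                # one second of excitation
excitation[::t0] = 1.0                   # quasi-periodic impulse train (voiced)

# An arbitrary two-formant all-pole filter acting as the vocal tract.
poles = np.array([0.97 * np.exp(2j * np.pi * 500 / fs),
                  0.95 * np.exp(2j * np.pi * 1500 / fs)])
a = np.poly(np.concatenate([poles, np.conj(poles)])).real
speech = scipy.signal.lfilter([1.0], a, excitation)  # source through filter
```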
  • Figure 1 is a diagram illustrating an embodiment of a Hidden Markov Model (HMM) based Text to Speech (TTS) system.
  • the Speech Database 105 may contain an amount of speech data for use in speech synthesis.
  • a speech signal 106 is converted into parameters.
  • the parameters may be comprised of excitation parameters and spectral parameters.
  • Excitation Parameter Extraction 110 and Spectral Parameter Extraction 115 occur on the speech signal 106, which travels from the Speech Database 105.
  • a Hidden Markov Model 120 may be trained using these extracted parameters and the Labels 107 from the Speech Database 105. Any number of HMM models may result from the training, and these context dependent HMMs are stored in a database 125.
  • the synthesis phase begins as the context dependent HMMs 125 are used to generate parameters 140.
  • the parameter generation 140 may utilize input from a corpus of text 130 from which speech is to be synthesized.
  • the text 130 may undergo analysis 135 and the extracted labels 136 are used in the generation of parameters 140.
  • excitation and spectral parameters may be generated in 140.
  • the excitation parameters may be used to generate the excitation signal 145, which is input, along with the spectral parameters, into a synthesis filter 150.
  • Filter parameters are generally Mel frequency cepstral coefficients (MFCC) and are often modeled as a statistical time series using HMMs.
  • The predicted filter and fundamental frequency time series values may be used in synthesis: an excitation signal is created from the fundamental frequency values, while the MFCC values are used to form the filter.
  • Synthesized speech 155 is produced when the excitation signal passes through the filter.
  • the formation of the excitation signal 145 is integral to the quality of the output, or synthesized, speech 155. Low frequency information of the excitation, however, is not captured. It will thus be appreciated that an approach is needed to capture the low frequency source content of the excitation signal and to improve the quality of synthetic speech.
  • FIG. 2 is a graphical illustration of an embodiment of the signal regions of a speech segment, indicated generally at 200.
  • the signal has been broken down into segments based on fundamental frequency values for categories such as voiced, unvoiced, and pause segments.
  • the vertical axis 205 illustrates fundamental frequency in Hertz (Hz) while the horizontal axis 210 represents the passage of milliseconds (ms).
  • the time series F0 215 represents the fundamental frequency.
  • the voiced region 220 can be seen as a series of peaks and may be referred to as a non-zero segment.
  • the non-zero segments 220 may be concatenated to form an excitation signal for the entire speech, as described in further detail below.
  • the unvoiced region 225 is seen as having no peaks in the graphical illustration 200 and may be referred to as a zero segment.
  • the zero segments may represent a pause or an unvoiced segment given by the phone labels.
  • Figure 3 is a diagram illustrating an embodiment of excitation signal creation indicated generally at 300.
  • Figure 3 illustrates the creation of the excitation signal for both u nvoiced and pause segments.
  • the fundamental frequency time series values, represented as F0, form signal regions 305 that are broken down into voiced, unvoiced, and pause segments based on the F0 values.
  • An excitation signal 320 is created for unvoiced and pause segments. Where pauses occur, zeroes (0) are placed in the excitation signal. In unvoiced regions, white noise of appropriate energy (in one embodiment, this may be determined empirically by listening tests) is used as the excitation signal.
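  • A minimal sketch of this unvoiced/pause excitation logic follows; the frame length, label names, and noise gain are illustrative assumptions (the document notes only that the noise energy may be determined empirically by listening tests).

```python
import numpy as np

def unvoiced_pause_excitation(labels, fs, frame_ms=5.0, noise_gain=0.1):
    """labels: one string per frame, either 'pause' or 'unvoiced'."""
    samples_per_frame = int(fs * frame_ms / 1000.0)
    rng = np.random.default_rng(0)
    frames = []
    for label in labels:
        if label == "pause":
            # zeroes are placed in the excitation signal where pauses occur
            frames.append(np.zeros(samples_per_frame))
        else:
            # white noise of appropriate energy for unvoiced regions
            frames.append(noise_gain * rng.standard_normal(samples_per_frame))
    return np.concatenate(frames)
```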
  • the signal regions, 305, along with the Glottal Pulse 310 are used for excitation generation 315 and subsequent generation of the excitation signal 320.
  • the Glottal Pulse 310 comprises an Eigen glottal pulse that has been identified from the glottal pulse database, the creation of which is described in further detail in Figure 8 below.
  • FIG. 4 is a diagram illustrating an embodiment of excitation signal creation for a voiced segment, indicated generally at 400. It is assumed that an Eigen glottal pulse has been identified from the glottal pulse database (described in further detail in Figure 7 below).
  • the signal region 405 comprises F0 values, which may be predicted by models, from the voiced segment.
  • the lengths of the F0 segments, which may be represented by Nf, are used to determine the length N of the excitation signal, in samples, using the mathematical equation:

  N = Nf × fs × (5/1000)

  • fs represents the sampling frequency of the signal.
  • the value 5/1000 represents the interval of 5 ms durations at which the F0 values are determined. It should be noted that any interval of a designated duration of a unit time may be used.
  • Another array, designated F0′(i), is obtained by linearly interpolating the F0 array.
  • glottal boundaries are created, 410, which mark the pitch boundaries of the excitation signal of the voiced segments in the signal region 405.
  • the pitch period array may be computed using the following mathematical equation:

  P0(i) = fs / F0′(i)
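  • A minimal sketch of the boundary computation follows, combining the two equations above: the excitation length N is derived from the number of F0 frames, F0 is linearly interpolated to per-sample values F0′(i), and boundaries are spaced one pitch period fs/F0′(i) apart. It assumes strictly positive F0 values, as expected within a voiced segment; names are illustrative.

```python
import numpy as np

def glottal_boundaries(f0_frames, fs, frame_ms=5.0):
    """f0_frames: F0 values of one voiced segment, one per 5 ms frame."""
    nf = len(f0_frames)
    n = int(nf * fs * frame_ms / 1000.0)    # N = Nf * fs * (5/1000)
    frame_idx = np.arange(nf) * fs * frame_ms / 1000.0
    f0_interp = np.interp(np.arange(n), frame_idx, f0_frames)  # F0'(i)
    boundaries = []
    i = 0.0
    while i < n:
        boundaries.append(int(i))
        i += fs / f0_interp[int(i)]         # advance by pitch period P0(i)
    return np.array(boundaries), n
```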
  • the glottal pulse 415 is used along with the identified glottal boundaries 410 in the overlap adding 420 of a glottal pulse beginning at each glottal boundary.
  • the excitation signal 425 is then created through the process of "stitching", or segment joining, to avoid boundary effects, which are further described in Figures 5 and 6.
  • Figure 5 is a diagram illustrating an embodiment of overlap boundaries, indicated generally at 500.
  • the illustration 500 represents a series of glottal pulses 515 and overlapping glottal pulses 520 in the segment.
  • the vertical axis 505 represents the amplitude of excitation.
  • the horizontal axis 510 may represent the frame number.
  • FIG. 6 is a diagram illustrating an embodiment of excitation signal creation for a voiced segment, indicated generally at 600.
  • Stitching may be used to form the final excitation signal of voiced segments (from Figure 4), which is ideally devoid of boundary effects.
  • any number of different excitation signals may have been formed through the overlap add method illustrated in Figure 4 and in the diagram 500 (Figure 5).
  • the different excitation signals may have a steadily increasing shift in the glottal boundaries 605 and an equal amount of circular left shift 630 for the glottal pulse signal.
  • if the glottal pulse signal 615 is of a length less than the corresponding pitch period, then the glottal pulse may be zero-extended 625 to the length of the pitch period before the circular left shift 630 is performed.
  • the highest pitch period present in the given voiced segment is represented as m * w.
  • Glottal pulses are created and associated with each pitch boundary array P_m.
  • the glottal pulses 620 may be obtained from the glottal pulse signal of some length N by first zero-extending it to the pitch period and then circularly left shifting it by m * w samples.
  • an excitation signal 635 is formed, initialized to zero (0), by overlap-adding the shifted glottal pulses at the shifted glottal boundaries.
  • the formed signal is a single stitched excitation corresponding to the shift m.
  • the arithmetic mean of all of the single stitched excitation signals is then computed 640, which represents the final excitation signal for the voiced segment 645.
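  • A minimal sketch of this stitching procedure follows: each single stitched excitation is built by overlap-adding the zero-extended, circularly left-shifted glottal pulse at correspondingly shifted boundaries, and the arithmetic mean of the stitched signals is returned. The number of shifts and the shift step w are illustrative assumptions, and the boundaries and periods are assumed to come from the boundary computation sketched above.

```python
import numpy as np

def stitched_excitation(pulse, boundaries, periods, n, num_shifts=4, w=8):
    """boundaries, periods: per-boundary sample indices and integer pitch
    periods, e.g. from the glottal boundary computation sketched earlier."""
    excitations = []
    for m in range(num_shifts):
        exc = np.zeros(n)                            # initialize to zero
        for b, p in zip(boundaries, periods):
            start = b + m * w                        # shifted glottal boundary
            if start >= n:
                continue
            padded = np.zeros(p)
            seg = pulse[:p]
            padded[:len(seg)] = seg                  # zero-extend to pitch period
            shifted = np.roll(padded, -m * w)        # equal circular left shift
            end = min(start + p, n)
            exc[start:end] += shifted[:end - start]  # overlap-add the pulse
        excitations.append(exc)                      # one single stitched excitation
    return np.mean(excitations, axis=0)              # mean = final voiced excitation
```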
  • FIG 7 is a diagram illustrating an embodiment of glottal pulse identification, indicated generally at 700.
  • any two given glottal pulses may be used to compute the distance metric/dissimilarity between them. These are taken from the glottal pulse database 840 created in process 800 (further described in Figure 8 below).
  • the computation may be performed by decomposing the two given glottal pulses x_i and x_j into sub-band components.
  • the given glottal pulse may be transformed into the frequency domain by using a method such as Discrete Cosine Transform (DCT), for example.
  • DCT Discrete Cosine Transform
  • the frequency band may be split into a number of bands, which are demodulated and converted into the time domain. In this example, three bands are used for illustrative purposes.
  • the sub-band metric, which may be represented as d_s(f, g), where d_s represents the distance between the two sub-band components f and g, may be computed as described in the following paragraphs.
  • the normalized circular cross correlation of the two sub-band components is computed and denoted R_{f,g}(n), and its Discrete Hilbert Transform is denoted R̂_{f,g}(n). Using the normalized circular cross correlation and its Discrete Hilbert Transform, the envelope signal may be determined as:

  H_{f,g}(n) = √(R_{f,g}(n)² + R̂_{f,g}(n)²)

  • taking cos θ(f, g) to be the maximum value of H_{f,g}(n), the sub-band metric d_s(f, g) between the two sub-band components f and g may be determined as:

  d_s(f, g) = √(2(1 − cos θ(f, g)))
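  • A minimal sketch of this sub-band distance follows, using an FFT-based normalized circular cross correlation and scipy.signal.hilbert for the Discrete Hilbert Transform; taking cos θ(f, g) as the maximum of the envelope H is an assumption consistent with the metric vanishing for identical, aligned components.

```python
import numpy as np
from scipy.signal import hilbert

def subband_distance(f, g):
    n = max(len(f), len(g))
    F, G = np.fft.fft(f, n), np.fft.fft(g, n)
    r = np.real(np.fft.ifft(F * np.conj(G)))        # circular cross correlation
    r /= (np.linalg.norm(f) * np.linalg.norm(g))    # normalization
    r_hat = np.imag(hilbert(r))                     # Discrete Hilbert Transform
    envelope = np.sqrt(r ** 2 + r_hat ** 2)         # H_{f,g}(n)
    cos_theta = np.max(envelope)                    # cos(theta(f, g))
    return np.sqrt(max(0.0, 2.0 * (1.0 - cos_theta)))  # d_s(f, g)
```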
  • the glottal pulse database 840 may be clustered into a number of clusters, for example 256 (or M), using a modified k-means algorithm 705. Instead of the Euclidean distance metric, the distance metric defined above is used. The centroid of a cluster C is then updated with that element of the cluster whose sum of squares of distances from all other elements of that cluster is minimum, such that:

  c = argmin over x ∈ C of Σ_{y ∈ C} d(x, y)²

  • the clustering iterations are terminated when there is no shift in any of the centroids of the k clusters.
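  • A minimal sketch of this modified k-means follows: assignment uses the custom glottal pulse distance, and each centroid is updated to the cluster member minimizing the sum of squared distances to the other members (a medoid-style update). The naive first-M initialization and the precomputed full distance matrix are illustrative simplifications; the latter is quadratic in database size.

```python
import numpy as np

def cluster_pulses(pulses, distance, m_clusters=256, max_iter=50):
    """distance: the glottal pulse distance metric defined above."""
    num = len(pulses)
    d = np.array([[distance(a, b) for b in pulses] for a in pulses])
    centroids = list(range(min(m_clusters, num)))    # naive initialization
    assign = np.zeros(num, dtype=int)
    for _ in range(max_iter):
        assign = np.argmin(d[:, centroids], axis=1)  # nearest-centroid assignment
        new_centroids = []
        for k in range(len(centroids)):
            members = np.where(assign == k)[0]
            if len(members) == 0:
                new_centroids.append(centroids[k])
                continue
            # medoid update: member with minimal sum of squared distances
            cost = (d[np.ix_(members, members)] ** 2).sum(axis=1)
            new_centroids.append(int(members[np.argmin(cost)]))
        if new_centroids == centroids:               # no centroid shift: stop
            break
        centroids = new_centroids
    return centroids, assign
```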
  • a vector, a set of N real numbers (for example 256), is associated with every glottal pulse 710 in the glottal pulse database 840 to form a corresponding vector database 715.
  • the association is performed as follows: for a given glottal pulse x_i, the vector

  v_i = [ψ_1(x_i), ψ_2(x_i), ψ_3(x_i), …, ψ_j(x_i), …, ψ_256(x_i)]

  is computed, where

  ψ_j(x_i) = d²(x_i, c_j) − d²(x_i, x_0) − d²(c_j, x_0),

  x_0 is a fixed glottal pulse picked from the database, d²(x_i, c_j) represents the square of the distance metric defined above between the two glottal pulses x_i and c_j, and c_1, c_2, …, c_j, …, c_256 are the centroid glottal pulses determined by clustering.
  • In step 720, Principal Component Analysis (PCA) is performed to compute the Eigenvectors of the vector database 715.
  • any one Eigenvector may be chosen 725.
  • the closest matching vector 730 to the chosen Eigenvector from the vector database 715 is then determined in the sense of Euclidean distance.
  • the glottal pulse from the pulse database 840 which corresponds to the closest matching vector 730 is regarded as the resulting Eigen glottal pulse 735 associated with an Eigenvector.
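  • A minimal sketch of the vector association, PCA, and Eigen glottal pulse selection follows; PCA is computed here via SVD of the mean-centered vector database, and comparing the chosen Eigenvector against the uncentered vectors is an assumption about a detail the text leaves open.

```python
import numpy as np

def eigen_glottal_pulse(pulses, centroids, distance, x0_idx=0, component=0):
    """centroids: centroid glottal pulses from clustering; distance: the
    glottal pulse distance metric defined above."""
    x0 = pulses[x0_idx]                              # fixed reference pulse x_0
    d2_c_x0 = np.array([distance(c, x0) ** 2 for c in centroids])
    vectors = []
    for x in pulses:                                 # psi_j(x_i) per centroid
        d2_x_x0 = distance(x, x0) ** 2
        vectors.append([distance(x, c) ** 2 - d2_x_x0 - d2c
                        for c, d2c in zip(centroids, d2_c_x0)])
    v = np.array(vectors)                            # the vector database

    centered = v - v.mean(axis=0)                    # PCA via SVD
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigvec = vt[component]                           # one chosen Eigenvector

    # closest matching vector in the Euclidean sense -> Eigen glottal pulse
    idx = int(np.argmin(np.linalg.norm(v - eigvec, axis=1)))
    return pulses[idx]
```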
  • FIG. 8 is a diagram illustrating an embodiment of glottal pulse database creation indicated generally at 800.
  • a speech signal, 805, undergoes pre-filtering, such as pre-emphasis 810.
  • Linear Prediction (LP) Analysis 815 is performed using the pre-filtered signal to obtain the LP coefficients.
  • Because the LP coefficients are determined from the pre-filtered signal, low frequency information of the excitation may be captured.
  • Once the coefficients are determined, they are used to inverse filter 820 the original speech signal 805, which is not pre-filtered, to compute the Integrated Linear Prediction Residual (ILPR) signal 825.
  • the ILPR signal 825 may be used as an approximation to the excitation signal, or voice source signal.
  • the ILPR signal 825 is segmented 835 into glottal pulses using the glottal segment/cycle boundaries that have been determined from the speech signal 805.
  • the segmentation 835 may be performed using the Zero Frequency Filtering (ZFF) technique.
  • the resulting glottal pulses may then be energy normalized. All of the glottal pulses for the entire speech training data are combined in order to form the glottal pulse database 840.
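  • A minimal sketch of ZFF-style epoch extraction follows, for locating the glottal cycle boundaries used in the segmentation above: the differenced speech is passed twice through a zero-frequency resonator, the local mean over roughly an average pitch period is removed, and positive-going zero crossings mark the epochs. The trend-removal window length is an illustrative assumption.

```python
import numpy as np

def zff_epochs(speech, fs, mean_window_ms=10.0):
    x = np.diff(speech, prepend=speech[0])       # difference the speech signal
    y = x.copy()
    for _ in range(2):                           # two zero-frequency resonators:
        out = np.zeros_like(y)                   # y[n] = x[n] + 2y[n-1] - y[n-2]
        for n in range(len(y)):
            out[n] = y[n]
            if n >= 1:
                out[n] += 2.0 * out[n - 1]
            if n >= 2:
                out[n] -= out[n - 2]
        y = out
    w = int(fs * mean_window_ms / 1000.0)        # ~ average pitch period
    kernel = np.ones(2 * w + 1) / (2 * w + 1)
    y = y - np.convolve(y, kernel, mode="same")  # remove the polynomial trend
    # epochs: positive-going zero crossings of the trend-removed signal
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
```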


Priority Applications (10)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2016567717A (JP6449331B2, ja) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| BR112016027537-3A (BR112016027537B1, pt) | 2014-05-28 | 2014-05-28 | Method for creating a glottal pulse database from a speech signal in a speech synthesis system, method for creating parametric models for use in training the speech synthesis system executed by a generic computer processor, and method for synthesizing speech using input text |
| EP14893138.9A (EP3149727B1, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| AU2014395554A (AU2014395554B2, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| CA3178027A (CA3178027A1, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| NZ725925A (NZ725925A, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| PCT/US2014/039722 (WO2015183254A1, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| CA2947957A (CA2947957C, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| ZA2016/07696A (ZA201607696B, en) | 2014-05-28 | 2016-11-08 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |
| AU2020227065A (AU2020227065B2, en) | 2014-05-28 | 2020-09-03 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2014/039722 (WO2015183254A1, en) | 2014-05-28 | 2014-05-28 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2015183254A1 (en) | 2015-12-03 |

Family

ID=54699420

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2014/039722 | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system | 2014-05-28 | 2014-05-28 |

Country Status (8)

Country Link
EP (1) EP3149727B1 (hu)
JP (1) JP6449331B2 (hu)
AU (2) AU2014395554B2 (hu)
BR (1) BR112016027537B1 (hu)
CA (2) CA2947957C (hu)
NZ (1) NZ725925A (hu)
WO (1) WO2015183254A1 (hu)
ZA (1) ZA201607696B (hu)



Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5400434A (en) * 1990-09-04 1995-03-21 Matsushita Electric Industrial Co., Ltd. Voice source for synthetic speech system
JP2002244689A (ja) * 2001-02-22 2002-08-30 Rikogaku Shinkokai Method for synthesizing an average voice and method for synthesizing an arbitrary speaker's voice from the average voice
JP5075865B2 (ja) * 2009-03-25 2012-11-21 株式会社東芝 Speech processing apparatus, method, and program
JP5085700B2 (ja) * 2010-08-30 2012-11-28 株式会社東芝 Speech synthesis apparatus, speech synthesis method, and program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6795807B1 (en) * 1999-08-17 2004-09-21 David R. Baraff Method and means for creating prosody in speech regeneration for laryngectomees
US8386256B2 (en) * 2008-05-30 2013-02-26 Nokia Corporation Method, apparatus and computer program product for providing real glottal pulses in HMM-based text-to-speech synthesis
US20120123782A1 (en) * 2009-04-16 2012-05-17 Geoffrey Wilfart Speech synthesis and coding methods
US20140142946A1 (en) * 2012-09-24 2014-05-22 Chengjun Julian Chen System and method for voice transformation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3149727A4 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10014007B2 (en) 2014-05-28 2018-07-03 Interactive Intelligence, Inc. Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
US10255903B2 (en) 2014-05-28 2019-04-09 Interactive Intelligence Group, Inc. Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
US10621969B2 (en) 2014-05-28 2020-04-14 Genesys Telecommunications Laboratories, Inc. Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
WO2017210630A1 (en) 2016-06-02 2017-12-07 Interactive Intelligence Group, Inc. Technologies for authenticating a speaker using voice biometrics
EP3469580A4 (en) * 2016-06-02 2020-01-08 Genesys Telecommunications Laboratories, Inc. TECHNOLOGIES FOR AUTHENTICATING A SPEAKER BY LANGUAGE BIOMETRY
US10614814B2 (en) 2016-06-02 2020-04-07 Interactive Intelligence Group, Inc. Technologies for authenticating a speaker using voice biometrics
WO2018043708A1 (ja) * 2016-09-05 2018-03-08 国立研究開発法人情報通信研究機構 音声のイントネーション構造を抽出する方法及びそのためのコンピュータプログラム

Also Published As

Publication number Publication date
CA2947957C (en) 2023-01-03
EP3149727A4 (en) 2018-01-24
NZ725925A (en) 2020-04-24
AU2014395554A1 (en) 2016-11-24
EP3149727B1 (en) 2021-01-27
BR112016027537B1 (pt) 2022-05-10
EP3149727A1 (en) 2017-04-05
CA3178027A1 (en) 2015-12-03
AU2020227065A1 (en) 2020-09-24
ZA201607696B (en) 2019-03-27
JP6449331B2 (ja) 2019-01-09
JP2017520016A (ja) 2017-07-20
AU2020227065B2 (en) 2021-11-18
CA2947957A1 (en) 2015-12-03
BR112016027537A2 (hu) 2017-08-15
AU2014395554B2 (en) 2020-09-24

Similar Documents

Publication Publication Date Title
AU2020227065B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
US10621969B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
JP4802135B2 (ja) Speaker authentication registration and verification method and apparatus
US10014007B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
KR20130133858A (ko) Detection of speech syllable/vowel/phone boundaries using auditory attention cues
CN108369803B (zh) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
US11929058B2 (en) Systems and methods for adapting human speaker embeddings in speech synthesis
JP2017520016A5 (ja) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
EP3113180B1 (en) Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal
JP6142401B2 (ja) Speech synthesis model training apparatus, method, and program
Vasudev et al. Speaker identification using FBCC in Malayalam language
JP2012058293A (ja) Unvoiced filter training device, speech synthesis device, unvoiced filter training method, and program
Yakoumaki et al. Emotional speech classification using adaptive sinusoidal modelling.
KR100488121B1 (ko) Speaker verification apparatus and method applying speaker-specific cepstral weights to improve discrimination between speakers
Pan et al. Comprehensive voice conversion analysis based on DGMM and feature combination
CN116741156A (zh) Semantic-scene-based speech recognition method, apparatus, device, and storage medium
CN116884385A (zh) Speech synthesis method, apparatus, and computer-readable storage medium
Apte Innovative wavelet based speech model using optimal mother wavelet generated from pitch synchronous LPC trajectory

Legal Events

| Code | Title | Details |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14893138; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2947957; Country of ref document: CA |
| ENP | Entry into the national phase | Ref document number: 2016567717; Country of ref document: JP; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2014395554; Country of ref document: AU; Date of ref document: 20140528; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112016027537 |
| REEP | Request for entry into the european phase | Ref document number: 2014893138; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 2014893138; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 112016027537; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20161123 |