EP3113180B1 - Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal - Google Patents

Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal

Info

Publication number
EP3113180B1
EP3113180B1 (application EP15306085.0A)
Authority
EP
European Patent Office
Prior art keywords
speech
gap
speech signal
transcript
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP15306085.0A
Other languages
German (de)
French (fr)
Other versions
EP3113180A1 (en)
Inventor
Pierre Prablanc
Quang Khanh Ngoc DUONG
Alexey Ozerov
Patrick Perez
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by InterDigital CE Patent Holdings SAS
Priority to PL15306085T (PL3113180T3)
Priority to EP15306085.0A (EP3113180B1)
Publication of EP3113180A1
Application granted
Publication of EP3113180B1
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005Correction of errors induced by the transmission channel, if related to the coding algorithm

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Description

    Field of the invention
  • The present principles relate to a method for performing audio inpainting on a speech signal, and an apparatus for performing audio inpainting on a speech signal.
  • Background
  • Audio inpainting is the problem of recovering audio samples which are missing or distorted due to, e.g., lost IP packets during a Voice over IP (VoIP) transmission or any other kind of deterioration. Audio inpainting algorithms have various applications, ranging from IP packet loss recovery (especially in VoIP or mobile telephony) and voice censorship cancellation to various types of damaged-audio repair, including declipping and declicking. Moreover, inpainting may be used for speech modification, e.g., to replace a word or a sequence of words in a speech sequence by other words. While signal completion has been thoroughly investigated for image and video inpainting, it is much less the case in the context of audio data in general and speech in particular.
  • Adler et al. [1] introduced an audio inpainting algorithm for the specific purpose of audio declipping, i.e., intended to recover missing audio samples in the time domain that were clipped due to, e.g., the limited range of the acquisition device. Techniques for filling missing segments in the time-frequency domain have also been developed [2,3]. However, these methods are not suitable in the case of large spectral holes, especially when all frequency bins are missing in certain time frames. Drori et al. [4] proposed another approach to audio inpainting in the spectral domain, relying on exemplar spectral patches taken from the known part of the spectrogram. Bahat et al. [7] proposed a method for filling moderate gaps, e.g. corresponding to the loss of several successive IP packets, especially in the case of speech signals. These approaches are based on the self-similarity of some speech features within the signal and thus perform poorly if the missing part is actually very different from the rest. The known approaches, including the speech-specific method in [7], are unable to cope with situations where quite large temporal segments are missing from a speech signal. For example, one such gap can cover an entire word or a sequence of words. Indeed, methods based on audio patch similarity or speech feature similarities are unable to recreate entire missing words.
  • Summary of the Invention
  • A novel method to fill gaps in speech data while preserving speech meaning and voice characteristics is disclosed in claim 1.
  • It has been found that, if a gap occurs in a speech signal, it is very helpful to use any kind of available information in order to fill the gap, and that it is possible to fill the gap by using a text transcript of the corresponding utterance. The disclosed speech audio inpainting technique plausibly recovers speech parts that are lost due to, e.g., specific audio editing or lossy transmission, with the help of synthetic speech generated from the text transcript of the missing part. The synthesized speech is modified by conventional voice conversion (e.g., as in [5]) to fit the original speaker's voice.
  • A text transcript of the missing speech part is generated or given: e.g., it can be provided by a user, inferred by natural language processing techniques from the known phrases before and/or after the gap, or available from any other source. The text transcript of the missing speech part is used to complete an obfuscated speech signal. This allows leveraging the recent progress of text-to-speech (TTS) synthesizers in generating very natural and high-quality speech data.
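  • As a purely illustrative sketch (not part of the claimed method), the transcript of the missing part can be rendered by any off-the-shelf TTS engine; the snippet below assumes the pyttsx3 Python package and an output file name chosen for this example only:

    # Minimal sketch: synthesize the transcript of the missing part with an
    # off-the-shelf TTS engine (pyttsx3 is only an illustrative choice).
    import pyttsx3

    def synthesize_missing_part(text, out_path="gap_synth.wav"):
        engine = pyttsx3.init()
        engine.save_to_file(text, out_path)   # render the utterance to a wave file
        engine.runAndWait()
        return out_path

    # synthesize_missing_part("the quick brown fox")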
  • In principle, a method for speech inpainting comprises synthesizing speech for a gap that occurs in a speech signal using a transcript of the speech signal, converting the synthesized speech by voice conversion according to the original speech signal, and blending the synthesized converted speech into the original speech signal to fill the gap.
  • An apparatus for performing speech inpainting on a speech signal is disclosed in claim 11. The apparatus comprises a speech analyzer that is adapted for detecting a gap in the speech signal, a speech synthesizer that is adapted for performing automatic speech synthesis from a text transcript at least for the gap, a voice converter that is adapted for performing voice conversion to adapt the synthesized speech to the original speaker's voice, and a mixer that is adapted for blending the converted synthesized speech into the original speech audio track. In one embodiment of the mixer, temporal and/or phase mismatches are removed.
  • Voice conversion is a process that transforms a speech signal uttered in the voice of one person, called the source speaker, so that it sounds as if it had been uttered by another person, called the target speaker. In a usual voice conversion workflow, two steps have to be considered: a learning step and a conversion step. During the learning step, a mapping function is learned that maps voice parameters of the source speaker to voice parameters of the target speaker. To model differences between the two speakers, some training data from both speakers are needed. For conversion within the same language, it is more conventional to use parallel training data, i.e. a set of sentences uttered by both the source and the target speaker. In the present case, the target speaker is the one whose data are missing, whereas the "source speaker" is the synthesized speech. For the training, target data can be extracted from the region surrounding the gap or, in the case of a famous speaker, in one embodiment it can be retrieved from a database, e.g. on the Internet. In another embodiment, training data for the target speaker can be recorded, e.g. by asking the target speaker to say some words, utterances or sentences. Source data are then synthesized with a text-to-speech synthesizer from the transcript of the source speech, in one embodiment.
  • In one embodiment, text is extracted from the available speech signal by means of automatic speech recognition (ASR). Then, it is determined that one or more words or sounds are missing due to a gap in the speech signal, a context of the remainder of the speech signal is analyzed, and, according to the context and the remainder, one or more words, sounds or syllables are determined that are omitted by the gap. This can be done by estimating or guessing (e.g., in one embodiment by using a dictionary), or by obtaining from any source a complete transcript of the speech signal that covers at least the gap. It is easier to locate the gap if the complete transcript covers some more speech before and/or after the gap.
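  • As an illustrative sketch only (the function names and the use of the standard-library sequence matcher are assumptions, not part of the claims), the words omitted by the gap can be located by aligning the ASR output of the damaged signal against such a complete transcript:

    import difflib

    def words_missing_in_gap(asr_words, full_transcript_words):
        """Align ASR output of the damaged signal against a complete transcript and
        return the word spans present in the transcript but absent from the ASR output."""
        sm = difflib.SequenceMatcher(a=asr_words, b=full_transcript_words, autojunk=False)
        missing = []
        for tag, _, _, j1, j2 in sm.get_opcodes():
            if tag in ("insert", "replace"):      # words only in the complete transcript
                missing.append(full_transcript_words[j1:j2])
        return missing

    # e.g. words_missing_in_gap("the quick fox".split(), "the quick brown fox".split())
    # -> [["brown"]]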
  • All following occurrences of the words "embodiment(s)" and "implementation(s)", if referring to feature combinations different from those defined by the independent claims, refer to examples which were originally filed but which do not represent embodiments/implementations of the presently claimed invention; these examples are still shown for illustrative purposes only.
  • In one embodiment, a computer readable medium has stored thereon executable instructions that when executed on a processor cause a processor to perform a method as disclosed above.
  • It is clear that in case of fully missing words it may in general simply be impossible to recover the missing speech, because it is not known what was said. At least some embodiments of the present principles provide a solution for this case, for example by generating the transcript based on the undistorted speech signal.
  • In one embodiment, a method for performing speech inpainting on a speech signal comprises automatically generating a transcript on an input speech signal, determining voice characteristics of the input speech signal, processing the input speech signal, whereby a processed speech signal is obtained, detecting a gap in the processed speech signal, automatically synthesizing from the transcript speech at least for the gap, voice converting the synthesized speech according to the determined voice characteristics, and inpainting the processed speech signal, wherein the voice converted synthesized speech is filled into the gap.
  • In one embodiment, an apparatus for performing speech inpainting on a speech signal comprises at least one hardware component, such as a hardware processor, and a non-transitory, tangible, computer-readable storage medium tangibly embodying at least one software component, and when executing on the at least one hardware processor, the software component causes the hardware processor to automatically perform the steps of claim 1.
  • Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.
  • Brief description of the drawings
  • Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in
  • Fig.1
    a general workflow of a speech inpainting system;
    Fig.2
    embodiments of a voice conversion system;
    Fig.3
    two embodiments for learning voice conversion;
    Fig.4
    an overview of post-processing of the converted utterance;
    Fig.5
    a flow-chart of a method for performing speech inpainting, and
    Fig.6
    a block diagram of an apparatus for performing speech inpainting.
  • Detailed description of the invention
  • Fig.1 shows a general workflow of a speech inpainting system. An input speech signal has a missing part 10, i.e. a gap. A textual transcript of the missing part is available; for example, it can be generated from the original speech signal. A speech utterance corresponding to the missing part 10 is synthesized 51 from the known text transcription through text-to-speech synthesis in a TTS synthesis block 11. However, such TTS synthesis systems may synthesize speech only phoneme by phoneme. Thus, if a gap occurs in the middle of a phoneme, it is unlikely that only the utterance corresponding to the missing part can be recovered. In one embodiment, it is more appropriate to also synthesize the first and the last phonemes, corresponding to the beginning and the end of the missing part respectively, in order to reproduce the linguistic information given the pronunciation context. It is also advantageous because it avoids speech discontinuity issues. After automatic speech synthesis 11, the generated speech is used for the gap filling. However, the synthesized speech generally has few similarities with the original speaker in terms of timbre and prosody. Therefore, its spectral features and fundamental frequency (F0) trajectory are adapted via voice conversion 12 to be similar to those of the target speech. Finally, the gap is filled by the voice converted synthesized speech signal, which results in an inpainted output signal 13. In the general pipeline, a conventional speech analysis-synthesis system (e.g. [6]) is used. This system enables flexible modifications on speech signals without loss of naturalness. In one embodiment, three parameters are extracted from the input signal: a STRAIGHT smooth spectrogram representing the evolution of the vocal tract without time and frequency interference, an F0 trajectory with a voiced/unvoiced detector, and an aperiodic component. The first two parameters are manipulated by voice conversion to modify the speech. A STRAIGHT smooth spectrogram is known e.g. from [6]. STRAIGHT is a speech tool for speech analysis and synthesis. It allows flexible manipulations on speech because it decomposes speech, according to the source-filter model, into three parts: a smooth spectrum representing a spectral envelope, a fundamental frequency F0 measurement, and an aperiodic component. Basically, the fundamental frequency F0 measurement and the aperiodic component correspond to the source of the source-filter model, while the smooth spectrum representing a spectral envelope corresponds to the filter. The smooth STRAIGHT spectrum is a good representation of the envelope, because STRAIGHT reconstructs the envelope as if it were sampled by the source. Manipulating this spectrum allows a good modification of the timbre of the voice.
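  • For illustration only, a comparable source-filter decomposition and resynthesis can be sketched with the WORLD vocoder (a STRAIGHT-family analysis-synthesis tool) via the pyworld Python package; the package choice and file names are assumptions, not the system actually used here:

    import numpy as np
    import soundfile as sf
    import pyworld as pw

    x, fs = sf.read("speech.wav")          # mono speech signal, assumed file name
    x = np.ascontiguousarray(x, dtype=np.float64)

    f0, t = pw.harvest(x, fs)              # F0 trajectory (0 Hz marks unvoiced frames)
    sp = pw.cheaptrick(x, f0, t, fs)       # smooth spectral envelope (the "filter")
    ap = pw.d4c(x, f0, t, fs)              # aperiodicity (part of the "source")

    # After F0 and the spectral envelope have been modified by voice conversion,
    # a waveform can be resynthesized from the three components:
    y = pw.synthesize(f0, sp, ap, fs)
    sf.write("resynthesized.wav", y, fs)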
  • In one embodiment, the voice conversion system 12 comprises two steps. First a mapping function is learned on training data, and then it is used to convert new utterances. In order to obtain the mapping function, the parameters to convert are extracted (e.g. with the STRAIGHT system) and aligned with dynamic time warping (DTW [8]). Then the learning phase is performed, e.g. with a Gaussian mixture model (GMM [9]) or non-negative matrix factorization (NMF [10]), to obtain the mapping function.
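  • A minimal sketch of such a learning and conversion step is given below, assuming (frames, dims) feature matrices, a DTW alignment and a joint-density GMM mapping; the library choices, component count and function names are illustrative assumptions, not the prescribed implementation:

    import numpy as np
    import librosa
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def dtw_align(src_feats, tgt_feats):
        """DTW-align two (frames, dims) feature matrices and return aligned frame pairs."""
        _, wp = librosa.sequence.dtw(X=src_feats.T, Y=tgt_feats.T, metric="euclidean")
        wp = wp[::-1]                                  # warping path in forward order
        return src_feats[wp[:, 0]], tgt_feats[wp[:, 1]]

    def learn_mapping(X, Y, n_components=8):
        """Learning step: fit a joint-density GMM on stacked [source, target] vectors."""
        return GaussianMixture(n_components=n_components,
                               covariance_type="full").fit(np.hstack([X, Y]))

    def convert(gmm, X, dx):
        """Conversion step: minimum mean-square-error estimate E[y | x] under the joint GMM."""
        mu, S, w = gmm.means_, gmm.covariances_, gmm.weights_
        K = gmm.n_components
        # responsibilities of each mixture component given only the source part x
        logp = np.stack([np.log(w[k]) + multivariate_normal.logpdf(
                             X, mu[k, :dx], S[k, :dx, :dx], allow_singular=True)
                         for k in range(K)], axis=1)
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        Y_hat = np.zeros((X.shape[0], mu.shape[1] - dx))
        for k in range(K):
            A = S[k, dx:, :dx] @ np.linalg.inv(S[k, :dx, :dx])
            Y_hat += resp[:, [k]] * (mu[k, dx:] + (X - mu[k, :dx]) @ A.T)
        return Y_hat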
  • Fig.2 shows different embodiments of a voice conversion system, using a speech database. It is important to note that the original speech samples from the database do not necessarily need to cover the words or the context of the current speech signal on which the inpainting is performed. The mapping function allowing to perform the prediction comprises two kinds of parameters: general parameters that need to be calculated only once, and parameters specific to the utterance that have to be calculated for each utterance to be converted. The general parameters may comprise e.g. Gaussian Mixture Model (GMM) parameters for GMM-based voice conversion and/or a phoneme dictionary for Non-negative Matrix Factorization (NMF)-based voice conversion. The specific parameters may comprise posterior probabilities for GMM-based voice conversion and/or temporal activation matrices for NMF-based voice conversion.
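  • By way of illustration, an exemplar-based NMF conversion can be sketched as follows: the general parameter is a pair of aligned source/target spectral dictionaries, and the utterance-specific parameter is the activation matrix estimated for each new utterance (the names and the plain Euclidean update rule are assumptions of this sketch):

    import numpy as np

    def nmf_activations(V, W, n_iter=200, eps=1e-12):
        """Estimate activations H >= 0 with the dictionary W held fixed, so that
        V is approximately W @ H (multiplicative updates, Euclidean cost)."""
        H = np.random.rand(W.shape[1], V.shape[1])
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
        return H

    # A_src, A_tgt: DTW-aligned magnitude-spectrum exemplars, shape (freq_bins, n_exemplars)
    # S_src: magnitude spectrogram of the synthesized utterance, shape (freq_bins, n_frames)
    def nmf_convert(S_src, A_src, A_tgt):
        H = nmf_activations(S_src, A_src)   # utterance-specific temporal activations
        return A_tgt @ H                    # converted spectrogram with the target timbre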
  • In one embodiment, where the speaker is a well-known person for whom many original speech samples can be retrieved from the Internet, the user is asked to enter, for a partly available speech signal 22, the speaker's identity in a query 21. The query results in voice characteristics of the speaker, or in original speech samples of the speaker from which the voice characteristics are extracted. The synthesized or original speech samples 23 obtained from a database or from the Internet 24 may be used to fill the gap. This approach may use standard voice conversion 25.
  • In another embodiment, where it is not possible to obtain sufficient original speech samples (e.g. because the speaker is not famous), voice characteristics of the speaker are retrieved upon a query 26 or automatically from the remaining part of the speech signal 27, which serve as a small set of training data. The synthesized speech for the gap 28 is voice converted 29 using the retrieved voice characteristics from around the gap.
  • Thus, two options may be considered to obtain training data, depending on whether the target speaker is a famous person or not. If the target speaker is e.g. well-known, it is generally possible to retrieve characteristic voice data from the Internet via the speaker's identity, or to try guessing the identity with automatic speaker recognition. Otherwise, only local data, i.e. data around the gap or some additional data, are available and the voice conversion system is adapted to the amount of data.
  • Fig.3 shows two embodiments 30, 35 for learning voice conversion. As described above, a mapping function is learned on training data, and then it is used to convert new utterances. In order to obtain the mapping function 34, speech is generated from the training data by a text-to-speech block 31, 38 (e.g. a speech synthesizer), and voice conversion parameters are extracted (e.g. with the STRAIGHT system) and aligned 32 to the synthesized speech with dynamic time warping (DTW). Then the learning phase is performed 33, 39, e.g. with a Gaussian mixture model (GMM [9]) or Non-negative Matrix Factorization (NMF [10]), to obtain the mapping function 34. In one embodiment 30, only a small amount of training data is available, since only the speech surrounding the gap can be used as reliable speech from which to extract voice parameters. In another embodiment 35, a large amount of training data can be obtained from a database 36 such as the Internet, and automatic speech recognition 37 is used.
  • After speech parameters are converted thanks to the mapping function 34, a waveform signal is resynthesized, e.g. by a conventional STRAIGHT synthesizer with the new voice parameters.
  • In some embodiments, one or more additional steps may need to be performed, since once conversion is performed the resulting speech may still not perfectly fill the gap, for the following reasons. First, edge mismatches such as spectral, fundamental frequency and phase discontinuities may need to be counteracted. Indeed, spectral trajectories of the formants are naturally smooth due to the slow variation of the vocal tract shape. Fundamental frequency and temporal phase are not as smooth as the spectral trajectories, but still need continuity to sound natural. Although the speech signal is converted, it is unlikely that the parameters of the spectral envelope trajectory, fundamental frequency and temporal phase are temporally continuous at the borders of the gap. Thus, in one embodiment, the parameters of the spectral envelope trajectory, fundamental frequency and temporal phase are adapted to the ones nearby in the non-missing part of the speech, so that any discontinuity at the border is reduced. Besides, the duration of the converted utterance may be longer or shorter than that of the true missing utterance. Therefore, in one embodiment, the speaking rate is converted to match the available part of the speech signal. If the speaking rate cannot be converted, at least a temporal adjustment may be done on the global time scaling of the converted utterance.
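  • A minimal sketch of one such boundary adaptation is given below, assuming the F0 values of the reliable speech just before and after the gap are known; the linear-ramp strategy and all names are illustrative assumptions:

    import numpy as np

    def match_f0_boundaries(converted_f0, f0_before, f0_after, ramp=20):
        """Offset the converted F0 trajectory near its edges so that it meets the
        reliable F0 values at the gap borders; unvoiced frames (F0 == 0) are left untouched."""
        f0 = converted_f0.copy()
        voiced = f0 > 0
        if not voiced.any():
            return f0
        idx = np.where(voiced)[0]
        start_offset = f0_before - f0[idx[0]]
        end_offset = f0_after - f0[idx[-1]]
        n = len(f0)
        fade_in = np.clip(1.0 - np.arange(n) / ramp, 0.0, 1.0)               # full offset at the left edge
        fade_out = np.clip(1.0 - (n - 1 - np.arange(n)) / ramp, 0.0, 1.0)    # and at the right edge
        f0[voiced] += start_offset * fade_in[voiced] + end_offset * fade_out[voiced]
        return f0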
  • A method dealing with these issues is briefly outlined in Fig.4, which shows an overview of post-processing of the converted utterance. First, the converted set of frames may not properly fill the gap. This can be seen as spectral discontinuities 4a. According to an embodiment of the present principles, the gap may be properly filled by finding 41, in the spectral domain, the best frames at the ends of the converted spectrogram and merging them with the reliable spectrogram of the available portion of the speech signal. This can be done with the known dynamic time warping (DTW) algorithm. Aligning the converted and reliable spectra is a way to determine which data are used to fill the gap. Then, in a similar adjustment to handle phase discontinuities 4b, the best samples to merge are found 42 on the signal waveform. Such an issue appears, for instance, when speech is voice converted; here the waveform signal has the particularity of being periodic. This property is exploited in a cross-correlation between the edges of the reliable signal and the beginning of the converted signal. Peaks in the cross-correlation indicate the best indices at which to merge both signals. Then, the fundamental frequency F0 trajectory is modified 43 so that F0 and F0-derivative (dF0/dt) discontinuities 4c are minimized, especially at the edges of the converted signal 4d. The F0 trajectory can be computed in the same way as the spectral parameters. The edges of the resulting signal are "allocated" to the gap edges. However, the body of the signal may not be suited to the gap: it may be too long or too short. Therefore, in one embodiment, the converted signal with F0 modification is time-scaled 44 (without pitch modification, in one embodiment) according to the indices found in the phase adjustment step. Finally, the length-adjusted (i.e. "stretched" or "compressed") signal is overlap-added 45 at the edges to minimize fuzzy artefacts that could still remain.
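  • The waveform-level part of this post-processing can be sketched as follows, assuming illustrative sample counts, search range and a linear crossfade: a cross-correlation peak selects the merge index, and a short overlap-add crossfade hides any residual discontinuity:

    import numpy as np

    def best_merge_index(reliable_edge, converted, search=400):
        """Return the offset into `converted` where the periodic waveform best matches
        the last samples of the reliable signal (peak of the cross-correlation)."""
        segment = converted[:search + len(reliable_edge)]
        xcorr = np.correlate(segment, reliable_edge, mode="valid")
        return int(np.argmax(xcorr))

    def crossfade_merge(reliable, converted, overlap=256):
        """Overlap-add the converted signal after the reliable one with a linear crossfade."""
        fade = np.linspace(0.0, 1.0, overlap)
        joint = reliable[-overlap:] * (1.0 - fade) + converted[:overlap] * fade
        return np.concatenate([reliable[:-overlap], joint, converted[overlap:]])

    # Usage sketch: drop the first k samples of the converted signal so that its
    # phase lines up with the reliable edge, then crossfade the two signals.
    # k = best_merge_index(reliable[-256:], converted)
    # output = crossfade_merge(reliable, converted[k:])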
  • One advantage of the disclosed audio inpainting technique is that even long gaps can be inpainted. It is also robust when only a small amount of data is available in voice conversion.
  • Fig.5 shows a flow-chart of a method for performing speech inpainting on a speech signal, according to one embodiment. The method 50 comprises determining 51 voice characteristics of the speech signal, detecting 52 a gap in the speech signal, automatically synthesizing 53, from a transcript, speech at least for the gap, voice converting 54 the synthesized speech according to the determined voice characteristics, and inpainting 55 the speech signal, wherein the voice converted synthesized speech is filled into the gap.
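  • Purely as an orchestration sketch of these steps, the flow of Fig.5 could be expressed as follows; every component interface below is a hypothetical placeholder, not an API defined by the present principles:

    def inpaint_speech(signal, fs, transcript, analyzer, tts, voice_converter, mixer):
        """Hypothetical top-level flow following Fig.5; all arguments are assumed objects
        exposing the methods used below."""
        voice_chars = analyzer.voice_characteristics(signal, fs)      # step 51
        gap = analyzer.detect_gap(signal, fs)                         # step 52
        synth = tts.synthesize(transcript.text_for(gap))              # step 53
        converted = voice_converter.convert(synth, voice_chars)       # step 54
        return mixer.fill_gap(signal, gap, converted)                 # step 55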
  • In one embodiment, the method further comprises a step of automatically generating 56 said transcript on an input speech signal.
  • In one embodiment, the method further comprises a step of processing 57 the speech signal, wherein the gap is generated during the processing, and wherein the transcript is generated before the processing.
  • In one embodiment, the step of automatically synthesizing 53 speech at least for the gap comprises retrieving from a database recorded speech data from a natural speaker. This may support, enhance, replace or control the synthesis.
  • In one embodiment, the method further comprises steps of detecting 581 that the transcript does not cover the gap, determining 582 one or more words or sounds omitted by the gap, and adding 583 the estimated word or sound to the transcript before synthesizing speech from the transcript.
  • In one embodiment, the determining 582 is done by estimating or guessing the one or more words or sounds (e.g. from a dictionary).
  • In one embodiment, the determining 582 is done by retrieving a complete transcript of the speech through other channels (e.g. the Internet).
  • In one embodiment, the determined voice characteristics comprise parameters for a spectral envelope and a fundamental frequency F0 (in other words, the timbre and prosody of the speech).
  • In one embodiment, the method further comprises adapting parameters for a spectral envelope trajectory, a fundamental frequency and temporal phase at one or both boundaries of the gap to match the corresponding parameters of the available adjacent speech signal before and/or after the gap. This is in order for the parameters to be temporally continuous before and/or after the gap.
  • In one embodiment, the method further comprises a step of time-scaling the voice-converted speech signal before it is filled into the gap.
  • Fig.6 shows a block diagram of an apparatus 60 for performing speech inpainting on a speech signal, according to one embodiment. The apparatus comprises a speech analyser 61 for detecting a gap G in the speech signal SI, a speech synthesizer 62 for automatically synthesizing from a transcript T speech SS at least for the gap, a voice converter 63 for converting the synthesized speech SS according to the determined voice characteristics VC, and a mixer 64 for inpainting the speech signal, wherein the voice converted synthesized speech VCS is filled into the gap G of the speech signal to obtain an inpainted speech output signal SO.
  • In one embodiment, the apparatus further comprises a voice analyzer 65 for determining voice characteristics of the speech signal.
  • In one embodiment, the apparatus further comprises a speech-to-text converter 66 for automatically generating a transcript of the speech signal.
  • In one embodiment, the apparatus further comprises a database having stored speech data of example phonemes or words of natural speech, and the speech synthesizer 62 retrieves speech data from the database for automatically synthesizing the speech at least for the gap.
  • In one embodiment, the apparatus further comprises an interface 67 for receiving a complete transcript of the speech signal, the transcript covering at least text that is omitted by the gap.
  • In one embodiment, the apparatus further comprises a time-scaler for time-scaling the voice-converted speech signal before it is filled into the gap.
  • In one embodiment, an apparatus for performing speech inpainting on a speech signal comprises a processor and a memory storing instructions that, when executed by the processor, cause the apparatus to perform the method steps of any of the methods disclosed above.
  • It is noted that the use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. Furthermore, the use of the article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Furthermore, the invention resides in each and every novel feature or combination of features.
  • It should be noted that although the STRAIGHT system is mentioned, other types of speech analysis and synthesis systems may be used other than STRAIGHT, as would be apparent to those of ordinary skill in the art, all of which are contemplated within the scope of the invention.
  • While there has been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the scope of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate be implemented in hardware, software, or a combination of the two. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
  • The scope of the present invention is defined in the appended claims.
  • Cited References
    1. [1] Amir Adler, Valentin Emiya, Maria Jafari, Michael Elad, Remi Gribonval, Mark D. Plumbley, "Audio inpainting," IEEE Transactions on Audio, Speech and Language Processing, IEEE, 2012, 20 (3), pp. 922 - 932 , XP011397627
    2. [2] P. Smaragdis et al. "Missing data imputation for spectral audio signal," Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2009
    3. [3] J. Le Roux et al., "Computational auditory induction as a missing data model-fitting problem with Bregman divergence," Speech Communication, vol. 53, no. 5, pp. 658-676, 2011
    4. [4] I. Drori et al. "Spectral sound gap filling," Proc. ICPR 2004, pp. 871-874
    5. [5] Jani Nurminen, Hanna Silen, Victor Popa, Elina Helander and Moncef Gabbouj (2012). "Voice Conversion, Speech Enhancement, Modeling and Recognition - Algorithms and Applications", Dr. S Ramakrishnan (Ed.), ISBN: 978-953-51-0291-5, InTech, DOI: 10.5772/37334.
    6. [6] Hideki Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited," in 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '97, Munich, Germany, April 21-24, 1997, pages 1303-1306.
    7. [7] Y. Bahat, Y. Y. Schechner, and M. Elad, "Self-content-based audio inpainting," Signal Processing, vol. 111, pp. 61-72, 2015.
    8. [8] D. Ellis (2003). Dynamic Time Warp (DTW) in Matlab, Web resource, available at http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/. Visited 4/29/2015.
    9. [9] Toda, T.; Black, A.W.; Tokuda, K., "Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    10. [10] Aihara, R.; Nakashika, T.; Takiguchi, T.; Ariki, Y., "Voice conversion based on Non-negative matrix factorization using phoneme-categorized dictionary," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7894-7898, 4-9 May 2014.

Claims (15)

  1. A method (50) comprising:
    - obtaining (51) voice characteristics of a speech signal;
    - detecting (52) a gap in the speech signal;
    - automatically synthesizing (53), from a transcript, speech at least for the gap in the speech signal;
    - voice converting (54) the synthesized speech according to the obtained voice characteristics of the speech signal; and
    - inpainting (55) the speech signal, wherein the voice converted synthesized speech is filled into the gap.
  2. The method of claim 1, comprising automatically generating (56) said transcript from the speech signal.
  3. The method according to claim 1 or 2, comprising processing (57) the speech signal, wherein the gap occurs during the processing, and wherein the transcript is generated before the processing.
  4. The method according to any of claims 1-3, wherein automatically synthesizing (53), from a transcript, speech at least for the gap comprises retrieving from a database recorded speech data from a natural speaker.
  5. The method according to any of claims 1-4, comprising
    - detecting (581) that the transcript does not cover the gap;
    - determining (582) one or more words or sounds omitted within the gap; and
    - adding (583) the determined word or sound to the transcript before synthesizing speech from the transcript.
  6. The method according to claim 5, wherein the determining (582) is done by estimating or guessing the one or more words or sounds.
  7. The method according to claim 5, wherein the determining (582) is done by retrieving a complete transcript of the speech through other channels.
  8. The method according to any of claims 1-7, wherein the voice characteristics comprise parameters for a spectral envelope and a fundamental frequency.
  9. The method according to any one of claims 1-8, comprising adapting parameters for a spectral envelope trajectory, a fundamental frequency and a temporal phase at one or both boundaries of the gap to match the corresponding parameters of the available adjacent speech signal before and/or after the gap.
  10. The method according to one of the claims 1-9, comprising time-scaling the voice-converted speech signal before it is filled into the gap.
  11. An apparatus (60) comprising:
    - a speech analyser (61) for detecting a gap in a speech signal;
    - a speech synthesizer (62) for automatically synthesizing, from a transcript, speech at least for a gap in the speech signal;
    - means for obtaining voice characteristics of the speech signal;
    - a voice converter (63) for converting the synthesized speech according to the obtained voice characteristics of the speech signal; and
    - a mixer (64) for inpainting the speech signal, wherein the voice converted synthesized speech is filled into the gap of the speech signal.
  12. The apparatus of claim 11, wherein said means for obtaining comprises a voice analyzer (65) for obtaining the voice characteristics of the speech signal.
  13. The apparatus of claim 11 or 12, comprising a speech-to-text converter (66) for automatically generating a transcript of the speech signal.
  14. The apparatus of one of the claims 11-13, comprising a database having stored speech data of example phonemes or words of natural speech, wherein the speech synthesizer (62) retrieves speech data from the database for automatically synthesizing the speech at least for the gap.
  15. The apparatus of one of the claims 11-14, comprising an interface (67) for receiving a complete transcript of the speech signal, the transcript covering at least text that is omitted by the gap.
EP15306085.0A 2015-07-02 2015-07-02 Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal Active EP3113180B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PL15306085T PL3113180T3 (en) 2015-07-02 2015-07-02 Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal
EP15306085.0A EP3113180B1 (en) 2015-07-02 2015-07-02 Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP15306085.0A EP3113180B1 (en) 2015-07-02 2015-07-02 Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal

Publications (2)

Publication Number Publication Date
EP3113180A1 EP3113180A1 (en) 2017-01-04
EP3113180B1 true EP3113180B1 (en) 2020-01-22

Family

ID=53610835

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15306085.0A Active EP3113180B1 (en) 2015-07-02 2015-07-02 Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal

Country Status (2)

Country Link
EP (1) EP3113180B1 (en)
PL (1) PL3113180T3 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3281194B1 (en) * 2015-04-10 2019-05-01 Dolby International AB Method for performing audio restauration, and apparatus for performing audio restauration
JP6452061B1 (en) * 2018-08-10 2019-01-16 クリスタルメソッド株式会社 Learning data generation method, learning method, and evaluation apparatus
US11356492B2 (en) * 2020-09-16 2022-06-07 Kyndryl, Inc. Preventing audio dropout

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117614B (en) * 2010-01-05 2013-01-02 索尼爱立信移动通讯有限公司 Personalized text-to-speech synthesis and personalized speech feature extraction
US9583111B2 (en) * 2013-07-17 2017-02-28 Technion Research & Development Foundation Ltd. Example-based audio inpainting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
PL3113180T3 (en) 2020-06-01
EP3113180A1 (en) 2017-01-04

Similar Documents

Publication Publication Date Title
EP3855340B1 (en) Cross-lingual voice conversion system and method
US10733974B2 (en) System and method for synthesis of speech from provided text
US10255903B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
AU2020227065B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
EP4018439B1 (en) Systems and methods for adapting human speaker embeddings in speech synthesis
US10706867B1 (en) Global frequency-warping transformation estimation for voice timbre approximation
EP3113180B1 (en) Method for performing audio inpainting on a speech signal and apparatus for performing audio inpainting on a speech signal
CN116994553A (en) Training method of speech synthesis model, speech synthesis method, device and equipment
CA3004700C (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
US10446133B2 (en) Multi-stream spectral representation for statistical parametric speech synthesis
JP7040258B2 (en) Pronunciation converter, its method, and program
KR102051235B1 (en) System and method for outlier identification to remove poor alignments in speech synthesis
JP5245962B2 (en) Speech synthesis apparatus, speech synthesis method, program, and recording medium
KR20100111544A (en) System for proofreading pronunciation using speech recognition and method therefor
US11302300B2 (en) Method and apparatus for forced duration in neural speech synthesis
JP6468518B2 (en) Basic frequency pattern prediction apparatus, method, and program
CN116884385A (en) Speech synthesis method, device and computer readable storage medium
CN114299912A (en) Speech synthesis method and related device, equipment and storage medium
Chomwihoke et al. Comparative study of text-to-speech synthesis techniques for mobile linguistic translation process
Qian et al. A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20171005

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20180115

GRAJ Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted

Free format text: ORIGINAL CODE: EPIDOSDIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

INTC Intention to grant announced (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20180618

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: INTERDIGITAL CE PATENT HOLDINGS

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20190926

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1227425

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200215

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602015045941

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: FP

REG Reference to a national code

Ref country code: NO

Ref legal event code: T2

Effective date: 20200122

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200614

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200522

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200423

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200422

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602015045941

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1227425

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200122

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20201023

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20200731

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200731

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200702

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200702

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200731

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200731

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200122

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230511

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NO

Payment date: 20230719

Year of fee payment: 9

Ref country code: GB

Payment date: 20230725

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230725

Year of fee payment: 9

Ref country code: DE

Payment date: 20230726

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: PL

Payment date: 20240621

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 20240725

Year of fee payment: 10