EP1273003B1 - Method and device for determining prosodic markers (Verfahren und Vorrichtung zum Bestimmen prosodischer Markierungen) - Google Patents
Method and device for determining prosodic markers
- Publication number
- EP1273003B1 (application EP01940136A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- prosodic
- neural network
- input
- autoassociators
- neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Definitions
- The present invention relates to a method for determining prosodic markers and to a device for carrying out the method.
- Phrase boundaries and word accents (pitch accents) can serve as prosodic markers. Phrases are groupings of words within a text that are usually spoken together, that is, without pauses in between. Pauses occur only at the ends of the phrases, the phrase boundaries. Inserting such pauses at the phrase boundaries of synthesized speech significantly increases its intelligibility and naturalness.
- In stage 1 of such a two-stage approach, the stable prediction or determination of both phrase boundaries and accents causes problems.
- The aim is to create a method for preparing and structuring an unknown text to be spoken that can be trained with a smaller training text and achieves roughly the same recognition rates as known methods trained with larger texts.
- Prosodic markers are determined by a neural network on the basis of linguistic categories.
- Depending on the language of a text, subdivisions of words into various linguistic categories are known. Within the scope of this invention, for example, 14 categories are provided for German and 23 categories for English. With knowledge of these categories, a neural network is trained to recognize structures and thus to predict or determine a prosodic marker on the basis of groupings of, for example, 3 to 15 consecutive words.
- The method includes capturing the properties of each prosodic marker by neural autoassociators and evaluating, in a neural classifier, the detailed output information that each autoassociator outputs in the form of a so-called error vector.
- The use of neural networks according to the invention makes it possible to predict phrase boundaries accurately when generating prosodic parameters for speech synthesis systems.
- The neural network according to the invention is robust against a small amount of training material (sparse training material).
- The use of neural networks allows time- and cost-saving training methods and a flexible application of a method according to the invention, and of a corresponding device, to any language. Little additionally prepared information and little expert knowledge are required to initialize such a system for a particular language.
- The neural network according to the invention is therefore well suited for synthesizing texts from multiple languages with a multilingual TTS system. Because the neural networks according to the invention can be trained without expert knowledge, they can be initialized more cheaply than known methods for determining phrase boundaries.
- The two-stage structure comprises several autoassociators, each of which is trained on one phrasing strength for all linguistic classes to be evaluated.
- Parts of the neural network are thus formed class-specifically.
- The training material is usually statistically asymmetric, that is, there are many words without phrase boundaries but only a few with phrase boundaries.
- In contrast to the state of the art, dominance within a neural network is avoided by training the respective autoassociators class-specifically.
- FIG. 1 schematically shows a neural network 1 according to the invention with an input 2, an intermediate layer 3 and an output 4 for determining prosodic markers.
- The input 2 is made up of nine input groups 5 for performing a part-of-speech (POS) sequence analysis.
- POS: part of speech
- Each input group 5 comprises, in adaptation to the German language, 14 neurons 6, not all of which are shown in FIG. 1 for reasons of clarity. There is thus one neuron 6 for each linguistic category.
- The linguistic categories are subdivided as follows: NUM (numerals), VERB (verbs), VPART (verb particles), PRON (pronouns), PREP (prepositions), NOUN (nouns, proper names), PART (particles), DET (articles), CONJ (conjunctions), ADV (adverbs), ADJ (adjectives), PDET (PREP + DET), INTJ (interjections), PUNCT (punctuation marks).
- The output 4 is formed by a neuron with a continuous characteristic, which means that the output values can take all values of a certain number range, e.g. all real numbers between 0 and 1.
- In the embodiment shown in FIG. 1, nine input groups 5 are provided for entering the categories of the individual words.
- The category of the word for which it is to be determined whether or not a phrase boundary is present at its end is applied to the middle input group 5a.
- The categories of the predecessors of the word under examination are applied to the four input groups 5b to the left of input group 5a, and those of its successors to the input groups 5c arranged to the right.
- Predecessors are all words that, in the context, are arranged immediately before the word under examination.
- Successors are all words that, in the context, are arranged immediately after the word under examination. The neural network 1 of FIG. 1 thus evaluates a context of at most nine words.
- The category of the word under examination is applied to input group 5a; that is, the value +1 is applied to the neuron 6 that corresponds to the category of the word, and the value -1 to the remaining neurons 6 of input group 5a.
- In the same way, the categories of the four words preceding or following the word under examination are applied to input groups 5b and 5c respectively. If no corresponding predecessors or successors exist, as is the case e.g. at the beginning and at the end of a text, the value 0 is applied to the neurons 6 of the corresponding input groups 5b, 5c.
- A further input group 5d is provided for entering the preceding phrase boundaries. The last nine phrase boundaries can be entered at this input group 5d.
- A convenient subdivision of the linguistic categories of English comprises 23 categories, so that the dimension of the input space is 216 (which is consistent with 9 × 23 category inputs plus the 9 inputs of group 5d).
- The input data form an input vector x of dimension m.
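
The input encoding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the left-padding of missing boundary history with zeros, and the use of one number per previous phrase boundary in group 5d are assumptions. The resulting dimension for German (9 × 14 + 9 = 135) is derived by the same arithmetic that yields 216 for the 23 English categories.

```python
# The 14 German categories listed in the text (one neuron per category).
CATEGORIES = ["NUM", "VERB", "VPART", "PRON", "PREP", "NOUN", "PART",
              "DET", "CONJ", "ADV", "ADJ", "PDET", "INTJ", "PUNCT"]
CONTEXT = 4             # predecessors / successors per side (groups 5b, 5c)
N_PREV_BOUNDARIES = 9   # size of input group 5d


def encode_group(category):
    """One input group: +1 at the word's category, -1 elsewhere, all 0 if no word."""
    if category is None:
        return [0.0] * len(CATEGORIES)
    return [1.0 if c == category else -1.0 for c in CATEGORIES]


def encode_input(tags, position, prev_boundaries):
    """Build the input vector x for the word at `position`.

    tags: category name of every word in the text.
    prev_boundaries: earlier boundary values, oldest first (assumed encoding:
    one number per boundary, padded with 0 when fewer than nine exist).
    """
    x = []
    for offset in range(-CONTEXT, CONTEXT + 1):
        i = position + offset
        tag = tags[i] if 0 <= i < len(tags) else None  # outside the text -> zeros
        x.extend(encode_group(tag))
    recent = list(prev_boundaries)[-N_PREV_BOUNDARIES:]
    padding = [0.0] * (N_PREV_BOUNDARIES - len(recent))
    x.extend(padding + [float(b) for b in recent])
    return x


tags = ["DET", "NOUN", "VERB", "PREP", "DET", "NOUN", "PUNCT"]
x = encode_input(tags, position=1, prev_boundaries=[0])
print(len(x))  # 135
```

With the word at position 1, the first three input groups lie before the start of the text and are therefore filled with zeros.
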
- The neural network according to the invention is trained with a training file that contains a text and the information on the phrase boundaries of the text. These phrase boundaries can contain purely binary values, that is, only the information whether or not a phrase boundary is present. If the neural network is trained with such a training file, the output at output 4 is binary. Output 4 itself generates continuous output values, which are, however, assigned to discrete values by means of a threshold decision.
- The output may contain not only binary values but multi-level values, so that information about the strength of the phrase boundary is taken into account.
- For this purpose, the neural network is to be trained with a training file that contains multi-level information on the phrase boundaries.
- The gradation can range from two levels to any number of levels, so that a quasi-continuous output can be achieved.
- FIG. 3 shows an example sentence with a three-level evaluation, with the output values 0 for no phrase boundary, 1 for a primary phrase boundary and 2 for a secondary phrase boundary.
- In this example, a secondary phrase boundary is located after the term "secondary", and a primary phrase boundary after each of the terms "phrase boundary" and "required".
- FIG. 4 shows a preferred embodiment of the neural network according to the invention.
- This neural network again comprises an input 2, which is shown in FIG. 4 only schematically as a single element but is constructed in exactly the same way as the input 2 of FIG. 1.
- In this embodiment, the intermediate layer 3 consists of several autoassociators 7 (AA1, AA2, AA3), each of which represents a model for one predetermined phrasing strength.
- The autoassociators 7 are subnetworks that are trained to detect a specific phrasing strength.
- The output of the autoassociators 7 is connected to a classifier 8.
- The classifier 8 is a further neural subnetwork, which also includes the output already described with reference to FIG. 1.
- The embodiment shown in FIG. 4 comprises three autoassociators, each of which can detect a specific phrasing strength, so that this embodiment is suitable for detecting two different phrasing strengths and the absence of a phrase boundary.
- Each autoassociator is trained with the data of the class it represents. That is, each autoassociator is trained with the data belonging to the phrasing strength it represents.
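
The class-specific training just described implies that the training samples are first split by phrasing strength, so that the many words without a phrase boundary cannot dominate the models of the rarer classes. A minimal sketch of that split, assuming a hypothetical sample format of `(input_vector, label)` pairs:

```python
from collections import defaultdict


def split_by_class(samples):
    """Group input vectors by their phrasing-strength label.

    samples: iterable of (input_vector, label) pairs (assumed format).
    Returns {label: [input_vector, ...]}; each list would be the training
    set of the autoassociator that models this label.
    """
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    return dict(by_class)


# Toy data: label 0 = no boundary, 1 and 2 = two boundary strengths.
samples = [([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1),
           ([0.5, 0.5], 0), ([-1.0, 1.0], 2)]
per_class = split_by_class(samples)
for label, vectors in sorted(per_class.items()):
    print(label, len(vectors))
```

Each autoassociator then only ever sees the vectors of its own class, regardless of how unevenly the classes are represented in the text.
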
- The autoassociators map the m-dimensional input vector x to an n-dimensional vector z, where $n \ll m$.
- The vector z is mapped to an output vector x'.
- The mappings are performed by means of matrices $w_1 \in \mathbb{R}^{n \times m}$ and $w_2 \in \mathbb{R}^{n \times m}$.
- The autoassociators are trained so that their output vectors x' match the input vectors x as closely as possible (FIG. 5, left side). As a result, the information of the m-dimensional input vector x is compressed into the n-dimensional vector z. It is assumed that no information is lost and that the model captures the properties of the class.
- The compression ratio m:n of the individual autoassociators may differ.
- An error vector $e_{rec} = (x - x')^2$ is calculated for each autoassociator (FIG. 5, right side). The squaring takes place elementwise.
- This error vector $e_{rec}$ is a measure of the distance of the vector x' from the input vector x and is thus inversely proportional to the probability that the phrase boundary assigned to the respective autoassociator is present.
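
The compression, reconstruction and error computation of a single autoassociator can be sketched as below. The sketch makes two assumptions that the text does not confirm: the mappings are purely linear (no bias, no activation function), and, because both matrices are given with shape n × m, the second one is applied transposed to get back from n to m dimensions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def reconstruction_error(w1, w2, x):
    """Return the elementwise squared error e_rec = (x - x')^2.

    w1, w2: n x m matrices (lists of n rows with m entries each).
    """
    z = matvec(w1, x)                   # compress: m -> n dimensions
    x_rec = matvec(transpose(w2), z)    # reconstruct: n -> m (assumed w2^T)
    return [(a - b) ** 2 for a, b in zip(x, x_rec)]


# Toy autoassociator with m = 3, n = 2 that keeps the first two components.
w1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]
w2 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]
e_rec = reconstruction_error(w1, w2, [1.0, -1.0, 1.0])
print(e_rec)  # [0.0, 0.0, 1.0]
```

In the toy example the third component cannot be reconstructed, so the whole error is concentrated there. An input that fits the class model well would give an error vector close to zero.
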
- The complete neural network, comprising the autoassociators and the classifier, is shown schematically in FIG. 6. It shows autoassociators 7 for k classes.
- The individual elements $p_i$ of the output vector p indicate the probability with which a phrase boundary has been detected at autoassociator i.
- If the probability $p_i$ is greater than 0.5, this is evaluated as the presence of the corresponding phrase boundary i. If the probability $p_i$ is less than 0.5, this means that phrase boundary i is not present here.
- If the output vector p has more than two elements $p_i$, it is expedient to evaluate the output vector p such that the phrase boundary whose probability $p_i$ is greatest compared with the other probabilities $p_i$ of the output vector p is taken to be present.
- If a phrase boundary is determined whose probability $p_i$ lies around 0.5, for example in the range from 0.4 to 0.6, it may be expedient to carry out a further routine that checks whether the phrase boundary is present.
- This further routine can be based on either a rule-driven or a data-driven approach.
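
The decision rules above (threshold at 0.5, largest probability for more than two classes, and a re-check band from 0.4 to 0.6) can be sketched as two small functions. The function names and the returned `needs_check` flag are illustrative assumptions. The further checking routine itself is not specified in the text and is therefore only signalled, not implemented.

```python
def threshold_decision(p_i, low=0.4, high=0.6):
    """Binary case: boundary i counts as present if p_i > 0.5.

    Returns (present, needs_check); needs_check marks probabilities in the
    uncertain band that a further routine should verify.
    """
    present = p_i > 0.5
    needs_check = low <= p_i <= high
    return present, needs_check


def argmax_decision(p, low=0.4, high=0.6):
    """Multi-class case: pick the class with the largest probability.

    Returns (index_of_winning_class, needs_check).
    """
    best = max(range(len(p)), key=lambda i: p[i])
    needs_check = low <= p[best] <= high
    return best, needs_check


print(threshold_decision(0.8))             # (True, False)
print(argmax_decision([0.1, 0.7, 0.2]))    # (1, False)
print(argmax_decision([0.3, 0.55, 0.15]))  # (1, True)
```
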
- When training with a training file that includes the corresponding phrasing information, the individual autoassociators 7 are each trained, in a first training phase, on their predetermined phrasing strength. As stated above, the input vectors x that correspond to the phrase boundary assigned to the respective autoassociator are applied to both the input and the output side of the individual autoassociators 7.
- In a second training phase, the weighting elements of the autoassociators 7 are held fixed and the classifier 8 is trained.
- The error vectors $e_{rec}$ of the autoassociators are applied to the input side of the classifier 8, and the vectors containing the values for the different phrase boundaries are applied to its output side.
- The classifier learns to determine the output vectors p from the error vectors.
- In a third training phase, a fine adjustment of all weighting elements of the entire neural network (the k autoassociators and the classifier) takes place.
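
The three training phases differ mainly in which parts of the network have adjustable weights. The following sketch only records that schedule. The names `PHASES` and `trainable_parts` are hypothetical, and the actual optimisation procedure is not described in the text and is deliberately left out.

```python
# Which parts of the network are adjusted in each phase (True = trainable).
PHASES = (
    ("phase 1: class-specific autoassociator training",
     {"autoassociators": True, "classifier": False}),
    ("phase 2: classifier training on error vectors",
     {"autoassociators": False, "classifier": True}),
    ("phase 3: fine adjustment of the whole network",
     {"autoassociators": True, "classifier": True}),
)


def trainable_parts(phase_index):
    """Return the names of the parts whose weights may change in a phase."""
    _, flags = PHASES[phase_index]
    return [part for part, trainable in flags.items() if trainable]


for index, (name, _) in enumerate(PHASES):
    print(name, "->", trainable_parts(index))
```

Freezing the autoassociators in the second phase means the classifier sees stable error vectors while it learns, before all weights are released together for the final fine adjustment.
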
- The classifier 8 shown in FIG. 6 has weighting matrices GW, each of which is assigned to one autoassociator 7.
- The weighting matrix GW assigned to the i-th autoassociator 7 has weighting factors $w_n$ in the i-th row. The remaining elements of the matrix are equal to zero.
- The number of weighting factors $w_n$ corresponds to the dimension of the input vector, with each weighting element $w_n$ relating to one component of the input vector.
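
The structure of the weighting matrices can be illustrated as follows. Because the matrix for autoassociator i is zero outside row i, multiplying it by that autoassociator's error vector contributes only to component i of the result. The sketch stops at these weighted sums: how the classifier turns them into the probabilities $p_i$ is not stated in this extract, and the helper names are assumptions.

```python
def build_gw(i, weights, k):
    """k x m matrix with `weights` in row i and zeros everywhere else."""
    m = len(weights)
    return [list(weights) if row == i else [0.0] * m for row in range(k)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def classifier_sums(weight_rows, error_vectors):
    """Add up GW_i * e_rec_i over all k autoassociators."""
    k = len(weight_rows)
    sums = [0.0] * k
    for i, (weights, e_rec) in enumerate(zip(weight_rows, error_vectors)):
        contribution = matvec(build_gw(i, weights, k), e_rec)
        sums = [s + c for s, c in zip(sums, contribution)]
    return sums


# Two autoassociators (k = 2), input dimension m = 3.
weight_rows = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
error_vectors = [[0.0, 0.0, 1.0], [4.0, 0.0, 0.0]]
print(build_gw(1, weight_rows[1], 2))  # [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
print(classifier_sums(weight_rows, error_vectors))  # [3.0, 2.0]
```

Component i of the result is simply the weighted sum of the i-th error vector, which is why one weighting factor per input component is enough.
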
- A neural network according to the invention was trained with a predetermined English text. The same text was used to train an HMM recognizer. The performance criteria determined were the percentage of correctly recognized phrase boundaries (B-corr), the percentage of all words correctly rated as being followed or not followed by a phrase boundary (total), and the percentage of incorrectly recognized words without a phrase boundary (NB-ncorr).
- B-corr: percentage of correctly recognized phrase boundaries
- NB-ncorr: incorrectly recognized words without a phrase boundary
- The results shown in the table show that, with respect to the correctly recognized phrase boundaries and the total of correctly recognized words, the neural networks according to the invention yield approximately the same results as an HMM recognizer.
- With respect to erroneously detected phrase boundaries at places where there is actually no phrase boundary, the neural networks according to the invention are substantially better than the HMM recognizer. This kind of error is particularly serious in speech-to-text conversion, since these errors produce a false accentuation that is immediately noticeable.
- One of the neural networks according to the invention was trained with a fraction of the training text used in the above experiments (5%, 10%, 30%, 50%). The following results were achieved (fraction of the training text: B-corr, total, NB-ncorr): 5%: 70.50%, 89.96%, 4.65%; 10%: 75.00%, 90.76%, 4.57%; 30%: 76.30%, 91.48%, 4.16%; 50%: 78.01%, 91.53%, 4.44%.
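
The three performance criteria can be computed from a reference labelling and a predicted labelling as sketched below. The exact denominators are not given in the text. This sketch assumes that B-corr is relative to all true boundaries, total to all words, and NB-ncorr to all words without a boundary. The toy labels are invented for illustration and are not the patent's data.

```python
def boundary_metrics(reference, predicted):
    """Return (b_corr, total, nb_ncorr) in percent.

    reference, predicted: one 0/1 value per word, 1 = a phrase boundary
    follows the word. Both classes must occur in `reference`.
    """
    pairs = list(zip(reference, predicted))
    at_boundaries = [p for r, p in pairs if r == 1]
    at_non_boundaries = [p for r, p in pairs if r == 0]
    b_corr = 100.0 * sum(p == 1 for p in at_boundaries) / len(at_boundaries)
    total = 100.0 * sum(r == p for r, p in pairs) / len(pairs)
    nb_ncorr = (100.0 * sum(p == 1 for p in at_non_boundaries)
                / len(at_non_boundaries))
    return b_corr, total, nb_ncorr


reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
predicted = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
b_corr, total, nb_ncorr = boundary_metrics(reference, predicted)
print(f"B-corr {b_corr:.2f}%  total {total:.2f}%  NB-ncorr {nb_ncorr:.2f}%")
```

In the toy example two of three boundaries are found, eight of ten words are rated correctly, and one of seven non-boundary words is wrongly marked as a boundary.
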
- The embodiment described above has k autoassociators. For a precise evaluation of the phrase boundaries it may be expedient to use a large number of autoassociators, where up to 20 autoassociators may be appropriate. This achieves a quasi-continuous course of the output values.
- The neural networks described above are realized as computer programs that run independently on a computer to convert the linguistic categories of a text into its prosodic markers. They thus constitute an automatically executable method.
- The computer program can also be stored on an electronically readable data carrier and thus be transferred to another computer system.
- The computer system 9 has an internal bus 10 to which a memory area 11, a central processing unit 12 and an interface 13 are connected.
- The interface 13 establishes a data connection to other computer systems via a data line 14.
- An acoustic output unit 15, a graphic output unit 16 and an input unit 17 are also connected to the internal bus.
- The acoustic output unit 15 is connected to a loudspeaker 18, the graphic output unit 16 to a screen 19 and the input unit 17 to a keyboard 20.
- Texts can be transmitted to the computer system 9 via the data line 14 and the interface 13 and are stored in the memory area 11.
- The memory area 11 is divided into several areas in which texts, audio files, application programs for carrying out the method according to the invention, and other application and utility programs are stored.
- The texts stored as text files are analyzed with predetermined program packages, and the respective linguistic categories of the words are determined. The prosodic markers are then determined from the linguistic categories using the method according to the invention. These prosodic markers are in turn fed into a further program package, which uses them to create audio files that are transmitted via the internal bus 10 to the acoustic output unit 15 and output by it as speech on the loudspeaker 18.
- With a similar construction of the device and adapted training, the method can also be used to evaluate an unknown text with regard to a prediction of stresses, e.g. according to the internationally standardized ToBI labels (tones and breaks indices), and/or of the sentence melody. These adaptations have to be made depending on the language of the text to be processed, since prosody is always language-specific.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Description
Linguistic categories

| Category | Description |
|---|---|
| NUM | Numerals |
| VERB | Verbs |
| VPART | Verb particles |
| PRON | Pronouns |
| PREP | Prepositions |
| NOMEN | Nouns, proper names |
| PART | Particles |
| DET | Articles |
| CONJ | Conjunctions |
| ADV | Adverbs |
| ADJ | Adjectives |
| PDET | PREP+DET |
| INTJ | Interjections |
| PUNCT | Punctuation marks |

| | B-corr | Total | NB-ncorr |
|---|---|---|---|
| Ext. autoassoc. | 80.33% | 91.68% | 4.72% |
| Autoassoc. | 78.10% | 90.95% | 3.93% |
| HMM | 79.48% | 91.60% | 5.57% |

| Fraction of the training text | B-corr | Total | NB-ncorr |
|---|---|---|---|
| 5% | 70.50% | 89.96% | 4.65% |
| 10% | 75.00% | 90.76% | 4.57% |
| 30% | 76.30% | 91.48% | 4.16% |
| 50% | 78.01% | 91.53% | 4.44% |
Claims (12)
- 1. Method for determining prosodic markers, wherein phrase boundaries and word accents serve as prosodic markers and wherein prosodic markers are determined by a neural network (1) on the basis of linguistic categories, characterised by the steps of capturing the properties of each prosodic marker by neural autoassociators (7), each of which is trained on one specific prosodic marker, and evaluating the output information output by each of the autoassociators (7) in a neural classifier (8).
- 2. Method according to claim 1, characterised in that phrase boundaries are determined as prosodic markers and preferably also evaluated and/or rated.
- 3. Method according to claim 1 and/or claim 2, characterised in that the linguistic categories of at least three words of a text to be synthesised are applied to the input (2) of the network (1).
- 4. Method according to one of the preceding claims, characterised in that the autoassociators (1) are trained for a respective predetermined phrase boundary.
- 5. Method according to claim 4, characterised in that the neural classifier (8) is trained after all autoassociators (7) have been trained.
- 6. Neural network for determining prosodic markers, wherein phrase boundaries and word accents serve as prosodic markers, with an input (2), an intermediate layer (3) and an output (4), the input being designed to capture linguistic categories of words of a text to be analysed, characterised in that properties of each prosodic marker can be captured by neural autoassociators (7), each of which is trained on one specific prosodic marker, and that the output information output by each of the autoassociators (7) can be evaluated in a neural classifier (8).
- 7. Neural network according to claim 6, characterised in that the intermediate layer (3) has at least two autoassociators (7).
- 8. Neural network according to claim 6 or 7, characterised in that the input (2) has input groups (5) which have several neurons (6), each assigned to one linguistic category, and each input group serves to capture the linguistic category of one word of the text to be analysed.
- 9. Neural network according to one of claims 6 to 8, characterised in that the network is designed to output a binary, ternary or quaternary phrasing level.
- 10. Neural network according to one of claims 7 to 9, characterised in that the network is designed to output a quasi-continuous phrasing range.
- 11. Method according to one of claims 1 to 5, characterised by the use of a neural network according to one of claims 6 to 10.
- 12. Device for determining prosodic markers with a computer system (9) that has a memory area (11) in which a program for executing a neural network according to one of claims 6 to 10 is stored.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE10018134A DE10018134A1 (de) | 2000-04-12 | 2000-04-12 | Verfahren und Vorrichtung zum Bestimmen prosodischer Markierungen |
| DE10018134 | 2000-04-12 | ||
| PCT/DE2001/001394 WO2001078063A1 (de) | 2000-04-12 | 2001-04-09 | Verfahren und vorrichtung zum bestimmen prosodischer markierungen |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP1273003A1 (de) | 2003-01-08 |
| EP1273003B1 (de) | 2005-12-07 |
Family
ID=7638473
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP01940136A Expired - Lifetime EP1273003B1 (de) | 2000-04-12 | 2001-04-09 | Verfahren und vorrichtung zum bestimmen prosodischer markierungen |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US7409340B2 (de) |
| EP (1) | EP1273003B1 (de) |
| DE (2) | DE10018134A1 (de) |
| WO (1) | WO2001078063A1 (de) |
Families Citing this family (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE10207875A1 (de) * | 2002-02-19 | 2003-08-28 | Deutsche Telekom Ag | Parametergesteuerte Sprachsynthese |
| US20060293890A1 (en) * | 2005-06-28 | 2006-12-28 | Avaya Technology Corp. | Speech recognition assisted autocompletion of composite characters |
| US20070055526A1 (en) * | 2005-08-25 | 2007-03-08 | International Business Machines Corporation | Method, apparatus and computer program product providing prosodic-categorical enhancement to phrase-spliced text-to-speech synthesis |
| US7860705B2 (en) * | 2006-09-01 | 2010-12-28 | International Business Machines Corporation | Methods and apparatus for context adaptation of speech-to-speech translation systems |
| JP4213755B2 (ja) * | 2007-03-28 | 2009-01-21 | 株式会社東芝 | 音声翻訳装置、方法およびプログラム |
| US9583095B2 (en) * | 2009-07-17 | 2017-02-28 | Nec Corporation | Speech processing device, method, and storage medium |
| TWI573129B (zh) * | 2013-02-05 | 2017-03-01 | 國立交通大學 | 編碼串流產生裝置、韻律訊息編碼裝置、韻律結構分析裝置與語音合成之裝置及方法 |
| US9195656B2 (en) | 2013-12-30 | 2015-11-24 | Google Inc. | Multilingual prosody generation |
| CN105374350B (zh) * | 2015-09-29 | 2017-05-17 | 百度在线网络技术(北京)有限公司 | 语音标注方法及装置 |
| US20180018973A1 (en) | 2016-07-15 | 2018-01-18 | Google Inc. | Speaker verification |
| WO2018048934A1 (en) * | 2016-09-06 | 2018-03-15 | Deepmind Technologies Limited | Generating audio using neural networks |
| JP6750121B2 (ja) | 2016-09-06 | 2020-09-02 | ディープマインド テクノロジーズ リミテッド | 畳み込みニューラルネットワークを使用したシーケンスの処理 |
| US11080591B2 (en) | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
| KR102458808B1 (ko) | 2016-10-26 | 2022-10-25 | 딥마인드 테크놀로지스 리미티드 | 신경망을 이용한 텍스트 시퀀스 처리 |
| KR102071582B1 (ko) * | 2017-05-16 | 2020-01-30 | 삼성전자주식회사 | 딥 뉴럴 네트워크(Deep Neural Network)를 이용하여 문장이 속하는 클래스(class)를 분류하는 방법 및 장치 |
| CN109492223B (zh) * | 2018-11-06 | 2020-08-04 | 北京邮电大学 | 一种基于神经网络推理的中文缺失代词补全方法 |
| CN111354333B (zh) * | 2018-12-21 | 2023-11-10 | 中国科学院声学研究所 | 一种基于自注意力的汉语韵律层级预测方法及系统 |
| CN111508522A (zh) * | 2019-01-30 | 2020-08-07 | 沪江教育科技(上海)股份有限公司 | 一种语句分析处理方法及系统 |
| US11610136B2 (en) * | 2019-05-20 | 2023-03-21 | Kyndryl, Inc. | Predicting the disaster recovery invocation response time |
| KR20210099988A (ko) * | 2020-02-05 | 2021-08-13 | 삼성전자주식회사 | 뉴럴 네트워크의 메타 학습 방법 및 장치와 뉴럴 네트워크의 클래스 벡터 학습 방법 및 장치 |
| CN112786023B (zh) * | 2020-12-23 | 2024-07-02 | 竹间智能科技(上海)有限公司 | 标记模型构建方法及语音播报系统 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2764343B2 (ja) * | 1990-09-07 | 1998-06-11 | 富士通株式会社 | 節/句境界抽出方式 |
| EP0708958B1 (de) * | 1993-07-13 | 2001-04-11 | Theodore Austin Bordeaux | Spracherkennungssystem für mehrere sprachen |
| WO1995030193A1 (en) * | 1994-04-28 | 1995-11-09 | Motorola Inc. | A method and apparatus for converting text into audible signals using a neural network |
| JP3536996B2 (ja) * | 1994-09-13 | 2004-06-14 | ソニー株式会社 | パラメータ変換方法及び音声合成方法 |
| US5950162A (en) * | 1996-10-30 | 1999-09-07 | Motorola, Inc. | Method, device and system for generating segment durations in a text-to-speech system |
| BE1011892A3 (fr) * | 1997-05-22 | 2000-02-01 | Motorola Inc | Methode, dispositif et systeme pour generer des parametres de synthese vocale a partir d'informations comprenant une representation explicite de l'intonation. |
| US6134528A (en) * | 1997-06-13 | 2000-10-17 | Motorola, Inc. | Method device and article of manufacture for neural-network based generation of postlexical pronunciations from lexical pronunciations |
-
2000
- 2000-04-12 DE DE10018134A patent/DE10018134A1/de not_active Ceased
-
2001
- 2001-04-09 EP EP01940136A patent/EP1273003B1/de not_active Expired - Lifetime
- 2001-04-09 WO PCT/DE2001/001394 patent/WO2001078063A1/de not_active Ceased
- 2001-04-09 DE DE50108314T patent/DE50108314D1/de not_active Expired - Lifetime
-
2003
- 2003-01-27 US US10/257,312 patent/US7409340B2/en not_active Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| US20030149558A1 (en) | 2003-08-07 |
| DE50108314D1 (de) | 2006-01-12 |
| DE10018134A1 (de) | 2001-10-18 |
| WO2001078063A1 (de) | 2001-10-18 |
| EP1273003A1 (de) | 2003-01-08 |
| US7409340B2 (en) | 2008-08-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP1273003B1 (de) | Verfahren und vorrichtung zum bestimmen prosodischer markierungen | |
| DE69908047T2 (de) | Verfahren und System zur automatischen Bestimmung von phonetischen Transkriptionen in Verbindung mit buchstabierten Wörtern | |
| DE602004012909T2 (de) | Method and apparatus for modeling a speech recognition system and for estimating a word error rate based on a text | |
| DE60111329T2 (de) | Adaptation of the phonetic context to improve speech recognition | |
| DE69818161T2 (de) | Automated grouping of meaningful sentences | |
| DE60126564T2 (de) | Method and arrangement for speech synthesis | |
| DE69010941T2 (de) | Method and device for automatically determining phonological rules for a continuous speech recognition system | |
| DE3337353C2 (de) | Speech analyzer based on a hidden Markov model | |
| DE69427083T2 (de) | Speech recognition system for multiple languages | |
| DE60004862T2 (de) | Automatic determination of the accuracy of a pronunciation dictionary in a speech recognition system | |
| DE69622565T2 (de) | Method and apparatus for dynamically adapting a large-vocabulary speech recognition system and for using constraints from a database in a large-vocabulary speech recognition system | |
| DE69818231T2 (de) | Method for discriminative training of speech recognition models | |
| DE69519328T2 (de) | Method and arrangement for converting speech to text | |
| DE69315374T2 (de) | Speech recognition system for true-to-life language translation | |
| DE19825205C2 (de) | Method, apparatus and product for generating postlexical pronunciations from lexical pronunciations with a neural network | |
| DE69618503T2 (de) | Speech recognition for tonal languages | |
| DE3416238C2 (de) | Extremely narrowband transmission system and method for transmitting messages | |
| DE19942178C1 (de) | Method for preparing a database for automatic speech processing | |
| DE20004416U1 (de) | Speech recognition device using multiple feature streams | |
| DE4310190A1 (de) | Speaker verification system using nearest-neighbor distance measurement | |
| WO1998011534A1 (de) | Method for adapting a hidden Markov sound model in a speech recognition system | |
| DE602004004310T2 (de) | System with combined statistical and rule-based grammar model for speech recognition and speech understanding | |
| DE69519229T2 (de) | Method and apparatus for adapting a speech recognizer to dialectal language variants | |
| DE60133537T2 (de) | Automatic retraining of a speech recognition system | |
| EP1058235B1 (de) | Playback method for voice-controlled systems with text-based speech synthesis | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 20021002 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
| | RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB IT |
| | 17Q | First examination report despatched | Effective date: 20040728 |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | GRAC | Information related to communication of intention to grant a patent modified | Free format text: ORIGINAL CODE: EPIDOSCIGR1 |
| | GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB IT |
| | REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D; Free format text: NOT ENGLISH |
| | REF | Corresponds to: | Ref document number: 50108314; Country of ref document: DE; Date of ref document: 20060112; Kind code of ref document: P |
| | GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) | Effective date: 20060118 |
| | ET | Fr: translation filed | |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | Effective date: 20060908 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20110427; Year of fee payment: 11 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20110419; Year of fee payment: 11 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: IT; Payment date: 20110422; Year of fee payment: 11 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20110620; Year of fee payment: 11 |
| | GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20120409 |
| | REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST; Effective date: 20121228 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20120409 |
| | REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 50108314; Country of ref document: DE; Effective date: 20121101 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20120430. Ref country code: IT; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20120409 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20121101 |
