EP1810277A1 - Method for the distributed construction of a voice recognition model, and device, server and computer programs for implementing it - Google Patents
Method for the distributed construction of a voice recognition model, and device, server and computer programs for implementing it
- Publication number
- EP1810277A1 (application EP05815123A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- server
- modeling
- entity
- modeled
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- the present invention relates to the field of embedded speech recognition, and more particularly to the manufacture of the voice recognition models used in the context of embedded recognition.
- a user terminal practicing on-board recognition captures a voice signal to be recognized from the user. It compares it with predetermined recognition patterns stored in the user terminal, each corresponding to a word (or a sequence of words), to recognize, among them, the word (or sequence of words) that has been pronounced by the user. It then performs an operation according to the recognized word.
- embedded recognition avoids the transfer delays that occur with centralized or distributed recognition, which are due to network exchanges between the user terminal and a server performing all or part of the recognition tasks.
- embedded recognition is especially effective for speech recognition tasks such as the personalized directory.
- the model of a word is a set of information representing several ways of pronouncing the word (accentuation/omission of certain phonemes and/or variety of speakers, etc.). Models can also represent not a single word but a sequence of words. It is possible to manufacture the model of a word from an initial representation of the word, this initial representation being either textual (a string of characters) or vocal. In some cases, the models corresponding to the vocabulary to be recognized are fabricated on a server and then downloaded to the user terminal.
- the document EP 1 047 046 describes an architecture comprising a user terminal, comprising an on-board recognition module, and a server connected by a communication network.
- the user terminal captures an entity to be modeled, for example a contact name intended to be stored in a voice directory of the user terminal. Then it sends to the server data representative of the contact name.
- the server determines from these data a reference model representative of the contact name (for example a Markov model) and communicates it to the user terminal, which stores it in a reference model lexicon associated with the speech recognition module.
- this architecture involves the transmission to the user terminal of all the parameters of the reference model for each contact name to be registered, which implies a large number of data to be transmitted, and therefore significant communication costs and delays.
- the present invention aims to propose a solution that does not have such disadvantages.
- the invention proposes a method of distributed construction of a voice recognition model of an entity to be modeled.
- the model is intended to be used by a device with a base of built models and a reference database storing modeling elements.
- the device is able to communicate with a server via a communication link.
- the method comprises at least the following steps: the device obtains the entity to be modeled;
- the device transmits data representative of the entity on the communication link destined for the server;
- the server receives the data to be modeled and carries out a processing to determine from these data a set of modeling parameters indicating modeling elements;
- the server transmits the modeling parameters to the device on the communication link;
- the device receives the modeling parameters and determines the voice recognition model of the entity to be modeled according to at least the modeling parameters and at least one modeling element stored in the reference base and indicated in the transmitted modeling parameters;
- the device stores the voice recognition model of the entity to be modeled in the base of built models.
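- as a minimal sketch of this exchange (all names below are hypothetical; the patent specifies the messages exchanged, not an API), the device-side steps might look as follows:

```python
# Hypothetical sketch of the device side of the distributed
# construction method; every identifier here is invented.

def assemble_model(parameters, elements):
    # Placeholder for device-side assembly, e.g. concatenating the
    # indicated phoneme models along the received graph.
    return {"parameters": parameters, "elements": elements}

def device_build_model(entity, server, reference_base, built_models):
    """Steps of the method that are incumbent on the device."""
    # the device obtains the entity and sends representative data;
    # the server answers with modeling parameters indicating elements.
    parameters = server.determine_modeling_parameters(entity)
    # resolve each indicated modeling element in the local reference base
    elements = [reference_base[eid] for eid in parameters["element_ids"]]
    model = assemble_model(parameters, elements)
    built_models[entity] = model  # store in the base of built models
    return model
```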
- the device is an on-board voice recognition user terminal.
- the invention thus makes it possible to benefit from the power of the resources available on a server, and thus not to be limited, during the first stages of the construction of the model, by memory dimension constraints specific to the device (for example a user terminal), while limiting the amount of data transferred over the network.
- the transferred data do not correspond to the complete model of the entity to be modeled, but to information that will enable the device to build the complete model by relying on a generic knowledge base stored in the device.
- the invention makes it possible, by centralized evolution, maintenance and / or updating operations, carried out on the knowledge bases of the server, to make the devices benefit from these evolutions.
- the invention proposes a device capable of communicating with a server via a communication link. It comprises:
- a reference database storing modeling elements
- the device is adapted to implement the steps of a method according to the first aspect of the invention which are incumbent on the device, to form the model of the entity to be modeled;
- the device is a user terminal for performing embedded voice recognition using on-board voice recognition means adapted to compare data representative of an audio signal to be recognized, captured by the user terminal, with speech recognition patterns stored in the user terminal.
- the invention proposes a server for performing a part of recognition model manufacturing tasks.
- the server includes: means for receiving, via the communication link, data to be modeled transmitted by the device;
- the server is further adapted to implement the steps of a method according to the first aspect of the invention which is the responsibility of the server.
- the invention proposes a computer program for creating speech recognition models from an entity to be modeled, executable by a processing unit of a device intended to perform on-board voice recognition.
- This computer program comprises instructions for performing the steps, which are the responsibility of the device, of a method according to the first aspect of the invention, during execution of the program by the processing unit.
- the invention provides a computer program for forming speech recognition models, executable by a processing unit of a server and comprising instructions for executing the steps, which are the responsibility of the server, of a method according to the first aspect of the invention, during a program execution by the processing unit.
- FIG. 1 represents a system comprising a user terminal and a server in an implementation mode of the invention
- FIG. 2 represents a lexical graph determined from the character string "Petit" by a server in one embodiment of the invention
- FIG. 3 represents a lexical graph determined from the character string "Petit", taking the contexts into account, by a server in one embodiment of the invention
- FIG. 4 represents an acoustic modeling graph determined from the string "Petit" by a server in one embodiment of the invention.
- FIG. 1 represents a user terminal 1, which comprises a voice recognition module 2, a lexicon 5 storing recognition patterns, a model making module 6 and a reference base 7.
- the reference base 7 stores modeling elements. These elements were previously provided in a configuration step of the base 7 of the terminal, at the factory or by download.
- each contact name in the directory is associated with a respective recognition model stored in the lexicon 5, which thus includes the set of recognizable contact names.
- the corresponding signal is captured using a microphone 3 and supplied at the input of the recognition module 2.
- This module 2 implements a recognition algorithm analyzing the signal (for example by performing an acoustic analysis to determine a sequence of frames and associated cepstral coefficients) and determining if it corresponds to one of the recognition models stored in lexicon 5.
- the user terminal 1 dials the phone number stored in the voice directory in association with the name of the recognized contact.
- the models stored in the lexicon 5 are for example Markov models corresponding to the names of the contacts. It is recalled that such a model comprises probability densities and a Markov chain. It allows the calculation of the probability of an observation X for a given message m.
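- as a standard reminder (this classical formulation is not spelled out in the text), for an observation sequence $X = (x_1, \dots, x_T)$ such a model gives

$$
P(X \mid m) \;=\; \sum_{q_1, \dots, q_T} \pi_{q_1}\, D_{q_1}(x_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, D_{q_t}(x_t),
$$

where $\pi$ denotes the initial-state probabilities, $a$ the transition probabilities of the Markov chain, and $D_q$ the probability density attached to state $q$.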
- the manufacture of the recognition models stored in the lexicon 5 is distributed between the user terminal 1 and a server 9.
- the server 9 and the user terminal 1 are connected by a bidirectional link 8.
- the server 9 comprises a module 10 for determining modeling parameters and a plurality of bases 11 comprising rules of the lexical and/or syntactic and/or acoustic type, and/or knowledge relating in particular to variants depending on the languages, the accents, exceptions in the field of proper names, etc.
- the plurality of bases 11 thus makes it possible to obtain the set of possible pronunciation variants of an entity to be modeled, when such modeling is desired.
- the user terminal 1 is adapted to obtain an entity to be modeled 15 (in the case considered here: the "PETIT" contact name) provided by the user, for example in textual form, via keys included in the user terminal 1.
- the user terminal 1 then establishes a data mode connection via the communication link 8, and sends the server 9 via this link 8 the character string "Petit" corresponding to the contact name "PETIT".
- the server 9 receives the character string and performs processing using the module 10 and the plurality of bases 11, to output a set of modeling parameters indicating modeling elements.
- the server 9 sends the modeling parameters to the user terminal 1.
- the user terminal 1 receives these modeling parameters that indicate modeling elements, extracted from the reference base 7
- the reference base 7 includes a recognition pattern for each phoneme, for example a Markov model.
- the module 10 for determining modeling parameters of the server 9 is adapted to determine a phonetic graph corresponding to the string of characters received. Using the plurality of bases 11, it determines from the received character string the different possible pronunciations of the word. Then it represents each of these pronunciations as a succession of phonemes. Thus, from the received character string "Petit", the module 10 of the server determines the two following pronunciations: p.e.t.i or p.t.i, depending on whether the mute e is pronounced or not. These variants correspond to respective successions of phonemes, represented jointly in the form p.(e|()).t.i or by the phonetic graph shown in FIG. 2.
- the server 9 then returns to the user terminal 1 a set of modeling parameters describing these variants.
- the exchange is for example the following: Terminal -> Server: "Petit"; Server -> Terminal: p.(e|()).t.i
- when the user terminal receives these modeling parameters describing phoneme sequences, it constructs the model of the word "PETIT" from the phonetic graph and from the Markov models stored in the base of modeling elements for each of the phonemes /p/, /e/, /t/, /i/. Then it stores the Markov model thus constructed for the contact name "PETIT" in the lexicon 5.
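- as an illustrative sketch (the expansion below is an assumption about how the notation can be processed; all names are invented), the variant expression p.(e|()).t.i can be expanded on the device into its phoneme sequences, whose stored models are then concatenated:

```python
from itertools import product

def expand_variants(units):
    """units lists the alternatives at each position, e.g.
    [["p"], ["e", ""], ["t"], ["i"]] for p.(e|()).t.i ("" = omitted e)."""
    for choice in product(*units):
        yield [ph for ph in choice if ph]  # drop omitted phonemes

def build_word_model(units, reference_base):
    # one branch per pronunciation variant, each branch being the
    # sequence of phoneme models stored in the reference base
    return [[reference_base[ph] for ph in seq] for seq in expand_variants(units)]

print(list(expand_variants([["p"], ["e", ""], ["t"], ["i"]])))
# -> [['p', 'e', 't', 'i'], ['p', 't', 'i']]
```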
- the reference base 7 of the user terminal 1 can store sets of phoneme models for several languages.
- the server 9 also transmits an indication of the set of phoneme models to use.
- the exchange will for example be of the type:
- Terminal -> Server: "Petit"
- the server 9, using the plurality of bases 11, detects and takes into account the "supposed" language of origin of the name. It thus generates relevant pronunciation variants for it (see "Generating proper name pronunciation variants for automatic recognition", K. Bartkova, Proceedings ICPhS'2003, 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3-9 August 2003, pp. 1321-1324).
- the module 10 for determining modeling parameters of the server 9 is adapted to take into account, in addition, the contextual influences, that is to say the phonemes preceding and following the current phoneme, as shown in FIG. 3.
- the module 10 in one embodiment can then send modeling parameters describing the phonetic graph taking into account the contexts.
- the reference base 7 comprises the Markov models of the phonemes taking into account the contexts.
- other embodiments within the field of the invention can represent pronunciations as a succession of phonetic units other than phonemes, e.g. polyphones (covering several phonemes) or sub-phonetic units which take into account, for example, the separation between the hold and the burst of plosives.
- the base 7 comprises respective models of such phonetic units.
- in one embodiment, the server takes the contexts into account. In another, it is the terminal that takes the contexts into account for the modeling, on the basis of a lexical description of the entity to be modeled (for example a standard lexical graph simply indicating the phonemes) transmitted by the server.
- the module 10 of the server 9 is adapted to determine, from the information sent by the terminal relating to the entity to be modeled, an acoustic modeling graph.
- such an acoustic modeling graph, determined by the module 10 from the phonetic graph obtained from the string "Petit", is represented in FIG. 4. This graph is the support of the model of the word "PETIT". Circles, numbered 1 to 14, represent the states of the Markov chain.
- the D labels designate the probability density functions, which model the spectral shapes that are observed on a signal and that result from an acoustic analysis.
- the Markov chain constrains the temporal order in which these spectral forms must be observed. We consider here that the densities of probabilities are associated with the states of the Markov chain (in another embodiment, the densities are associated with the transitions).
- the upper part of the graph corresponds to the pronunciation variant p.e.t.i
- the lower part corresponds to the variant p.t.i
- Dp1, Dp2, Dp3 denote the three densities associated with the phoneme /p/; De1, De2, De3 the three densities associated with the phoneme /e/; Dt1, Dt2, Dt3 the three densities associated with the phoneme /t/; and Di1, Di2, Di3 the three densities associated with the phoneme /i/.
- the choice of three states and three densities per phoneme acoustic model (corresponding respectively to the beginning, the middle and the end of the phoneme) is common, but not unique; one can use more or fewer states and densities for each phoneme model.
- Each density is in fact made up of a weighted sum of several Gaussian functions defined on the space of the acoustic parameters (space corresponding to the measurements made on the signal to be recognized).
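- in the usual formulation (a standard reminder rather than a quotation from the text), each density is thus a Gaussian mixture over the acoustic parameter space:

$$
D(x) \;=\; \sum_{k=1}^{K} w_k\, \mathcal{N}(x;\, \mu_k, \Sigma_k), \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1.
$$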
- in FIG. 4, some Gaussian functions of some densities, Dp1 for example, are schematically represented.
- the server 9 is adapted to transmit to the user terminal 1 information from the acoustic modeling graph determined by the module 10, which provides the list of successive transitions between states and indicates, for each state, the identifier of the associated density.
- the exchange is for example of the type: Terminal -> Server: "Petit"
- the first block of information, transmitted between the <Transitions-Graph> and </Transitions-Graph> tags, thus describes all 28 transitions of the acoustic graph, with each starting state and each arrival state.
- a second block, transmitted between the <Density-States> and </Density-States> tags, describes the association of the densities with the states of the graph, by specifying the state / associated-density-identifier pairs.
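- a hypothetical rendering of such a response (the tag names come from the description; the contents shown are purely illustrative):

```text
Terminal -> Server: "Petit"
Server -> Terminal:
  <Transitions-Graph>
    1 2; 2 3; 3 4; ...            (28 pairs: starting state, arrival state)
  </Transitions-Graph>
  <Density-States>
    1 Dp1; 2 Dp2; 3 Dp3; ...      (pairs: state, associated density identifier)
  </Density-States>
```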
- the reference base 7 has the parameters of the probability densities associated with the received identifiers. These parameters are parameters of description and / or precision of the densities.
- for example, from the received density identifier Dp1, it provides the weighted sum describing the density, as well as the values of the weighting coefficients and the parameters of the Gaussians involved in the summation.
- when the user terminal 1 receives the modeling parameters described above, it extracts from the base 7 the parameters of the probability densities associated with the identifiers listed in the <Density-States> block, and builds the model of the word "PETIT" from these density parameters and modeling parameters. Then it stores the model thus constructed for the contact name "PETIT" in the lexicon 5.
- the server 9 is adapted to transmit to the user terminal 1 information from the acoustic modeling graph determined by the module 10, which provides, in addition to the list of successive transitions between states and the identifier of the associated density for each state as before, the definition of densities according to the Gaussian functions.
- the server 9 sends to the user terminal 1, in addition to the two blocks of information described above, a block of additional information transmitted between the <Gaussian-Densities> and </Gaussian-Densities> tags, in which each density Dp1, Dp2, ..., Di3 of the graph is described in terms of the identifiers of its Gaussian functions and their associated weights.
- the reference base 7 has parameters describing the Gaussian associated with the received identifiers.
- when the user terminal receives the modeling parameters described above, it constructs the model of the word "PETIT" from these parameters and, for each Gaussian indicated in the <Gaussian-Densities> block, from the parameters stored in the reference base 7.
- in one embodiment, the server knows the state of the reference base 7 of the terminal 1 and knows how to determine what is or is not stored in the base 7. It is adapted to provide only the description of the phonetic graph when it determines that the models of the phonemes present in the phonetic graph are stored in the base 7. For the phonemes whose models are not described in the base 7, it determines the acoustic modeling graph. It supplies the user terminal 1 with the information of the <Transitions-Graph> and <Density-States> blocks relating to the densities that it determines as known in the base 7, and furthermore provides the information of the <Gaussian-Densities> block relating to the densities not defined in the base 7 of the user terminal.
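- this server-side decision can be sketched as follows (a hypothetical illustration with invented names): only densities absent from the terminal's base 7 receive full Gaussian definitions, known densities being referenced by identifier alone.

```python
def plan_blocks(required_density_ids, known_in_base7):
    # densities the terminal cannot resolve in its local base 7
    missing = [d for d in required_density_ids if d not in known_in_base7]
    return {
        "Transitions-Graph": True,               # always transmitted
        "Density-States": required_density_ids,  # state / identifier pairs
        "Gaussian-Densities": missing,           # full definitions only here
    }

print(plan_blocks(["Dp1", "Dp2", "Dx1"], {"Dp1", "Dp2"}))
# -> 'Gaussian-Densities': ['Dx1'] (only the unknown density is described fully)
```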
- in another embodiment, the server 9 does not know the contents of the reference base 7 of the user terminal 1, and the latter is adapted, in the event that it receives from the server 9 information comprising an identifier of a modeling element (for example a probability density or a Gaussian) whose parameters are not included in the reference base 7, to send a request to the server 9 to obtain these missing parameters, in order to determine the modeling element and enrich the base 7.
- the server 9 can also search, among the modeling units that it knows to be available in the reference base 7, those that most resemble the units required by a new model to be constructed, for example one corresponding to a different language. In this case, it can adapt the modeling parameters to be transmitted to the user terminal 1 so as to describe, as well as possible and as a function of the modeling elements already stored in the base 7, a model or a modeling element absent from the base 7 and required by the user terminal, thereby limiting the complementary data to transfer to and store in the terminal.
- the example described above corresponds to the provision by the user terminal of the entity to be modeled in text form, for example via the keyboard.
- Other modes of input or recovery of the entity to be modeled can be implemented according to the invention.
- the entity to be modeled is retrieved by the user terminal 1 from a received call identifier (display name / number).
- the entity to be modeled is captured by the user terminal 1 from one or more examples of pronunciation of the contact name by the user.
- the user terminal 1 transmits to the server 9 these examples of the entity to be modeled (either directly in acoustic form, or after an analysis determining acoustic parameters, for example cepstral coefficients).
- the server 9 is then adapted, from the received data, to determine a phonetic graph and/or an acoustic modeling graph (directly from the data, for example in a single-speaker type approach, or after the determination of the phonetic graph), and to send the modeling parameters to the user terminal 1.
- the terminal uses these modeling parameters (which in particular indicate modeling elements described in the base 7) and the modeling elements thus indicated and available in the base 7, to construct the model of the entity to be modeled.
- the user terminal 1 is adapted to optimize the lexicon of the constructed models, by factoring any redundancies. This operation consists in determining the parts common to several models stored in the lexicon 5 (for example the identical beginning or end of a word). It makes it possible to avoid unnecessarily duplicating calculations during the decoding phase and thus to save computing resources.
- the factorization of the models can concern words, complete sentences or portions of sentences.
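- one common way to realize such factoring (an assumption here; the patent does not prescribe a data structure) is to share the beginnings of the phoneme sequences in a prefix tree, so that the decoder evaluates each shared part only once:

```python
def build_prefix_tree(words):
    """words maps a name to its phoneme sequence; shared prefixes
    end up as shared nodes, i.e. factored model parts."""
    root = {}
    for name, phonemes in words.items():
        node = root
        for ph in phonemes:
            node = node.setdefault(ph, {})  # reuse the node if it exists
        node["#"] = name                    # mark the end of a word
    return root

lexicon = {"PETIT": ["p", "e", "t", "i"], "PETRA": ["p", "e", "t", "r", "a"]}
tree = build_prefix_tree(lexicon)
# the branch p -> e -> t is stored once and shared by both models
```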
- in one embodiment, the factoring step is performed by the server, for example from a list of words sent by the terminal, or from a new word to be modeled sent by the terminal together with a list of words, stored at the server and known by the server as listing the words whose models are stored in the terminal.
- the server sends information relating to the common factors thus determined.
- the user terminal 1 is adapted to send to the server 9, in addition to the entity to be modeled, additional information, for example the indication of the language used, so that the server performs the phonetic analysis accordingly, and/or the characteristics of the phonetic units to be provided or the acoustic models to be used, or the indication of the accent or any other characterization of the speaker allowing the generation of pronunciation or modeling variants adapted to this speaker (note that this information can be stored on the server, if it can automatically identify the calling terminal), etc.
- the solution according to the invention applies to all kinds of embedded recognition applications, the voice directory application indicated above being mentioned only as an example.
- the lexicon 5 described above contains recognizable contact names; however, it may also contain common nouns and/or recognizable phrases.
- Transmissions from the server can be in the form of sending blocks of data in response to a particular request from the terminal, or by sending blocks with tags similar to those presented above.
- the examples described above correspond to the implementation of the invention within a user terminal.
- the co ⁇ struction of models-reGonnaissanee- is distributedernon ⁇ not ⁇ e ⁇ itre server cm and a user terminal, but between a server and a gateway adapted to be connected to several user terminals, for example a residential gateway, within the same home (residential gateway).
- This configuration makes it possible to pool the construction of the models.
- voice recognition is then performed either exclusively by the user terminal (the constructed models are transmitted to it by the gateway), or by the gateway, or by both in the case of distributed recognition.
- the present invention therefore advantageously makes it possible to take advantage of the server's multiple knowledge bases (for example multilingual bases) for the construction of models, bases which cannot, for reasons of memory capacity, be installed on a device of the user terminal or gateway type, while limiting the amount of information to be transmitted over the communication link between the device and the server.
- the invention also allows greater ease of implementation of evolutions in model determination, since it suffices to perform the maintenance, updating and evolution operations on the server's bases, and not on each device.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Telephonic Communication Services (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0411873 | 2004-11-08 | ||
PCT/FR2005/002695 WO2006051180A1 (fr) | 2004-11-08 | 2005-10-27 | Procede de construction distribuee d'un modele de reconnaissance vocale , dispositif, serveur et programmes d'ordinateur pour mettre en œuvre un tel procede |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1810277A1 true EP1810277A1 (de) | 2007-07-25 |
Family
ID=34950626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05815123A Withdrawn EP1810277A1 (de) | 2004-11-08 | 2005-10-27 | Verfahren zur verteilten konstruktion eines stimmenerkennungsmodells sowie vorrichtung, server und computerprogramme zu seiner implementierung |
Country Status (3)
Country | Link |
---|---|
US (1) | US20080103771A1 (de) |
EP (1) | EP1810277A1 (de) |
WO (1) | WO2006051180A1 (de) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070129949A1 (en) * | 2005-12-06 | 2007-06-07 | Alberth William P Jr | System and method for assisted speech recognition |
US8140336B2 (en) * | 2005-12-08 | 2012-03-20 | Nuance Communications Austria Gmbh | Speech recognition system with huge vocabulary |
US9129599B2 (en) * | 2007-10-18 | 2015-09-08 | Nuance Communications, Inc. | Automated tuning of speech recognition parameters |
GB2466242B (en) * | 2008-12-15 | 2013-01-02 | Audio Analytic Ltd | Sound identification systems |
US11087739B1 (en) * | 2018-11-13 | 2021-08-10 | Amazon Technologies, Inc. | On-device learning in a hybrid speech processing system |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
US5960399A (en) * | 1996-12-24 | 1999-09-28 | Gte Internetworking Incorporated | Client/server speech processor/recognizer |
DE19751123C1 (de) * | 1997-11-19 | 1999-06-17 | Deutsche Telekom Ag | Vorrichtung und Verfahren zur sprecherunabhängigen Sprachnamenwahl für Telekommunikations-Endeinrichtungen |
US6463413B1 (en) | 1999-04-20 | 2002-10-08 | Matsushita Electrical Industrial Co., Ltd. | Speech recognition training for small hardware devices |
DE19918382B4 (de) * | 1999-04-22 | 2004-02-05 | Siemens Ag | Erstellen eines Referenzmodell-Verzeichnisses für ein sprachgesteuertes Kommunikationsgerät |
US6442519B1 (en) * | 1999-11-10 | 2002-08-27 | International Business Machines Corp. | Speaker model adaptation via network of similar users |
US20030182113A1 (en) * | 1999-11-22 | 2003-09-25 | Xuedong Huang | Distributed speech recognition for mobile communication devices |
US6823306B2 (en) * | 2000-11-30 | 2004-11-23 | Telesector Resources Group, Inc. | Methods and apparatus for generating, updating and distributing speech recognition models |
EP1215661A1 (de) * | 2000-12-14 | 2002-06-19 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Sprachgesteuertes tragbares Endgerät |
FI20010792A (fi) * | 2001-04-17 | 2002-10-18 | Nokia Corp | Käyttäjäriippumattoman puheentunnistuksen järjestäminen |
CN1409527A (zh) * | 2001-09-13 | 2003-04-09 | 松下电器产业株式会社 | 终端器、服务器及语音辨识方法 |
CN1453767A (zh) * | 2002-04-26 | 2003-11-05 | 日本先锋公司 | 语音识别装置以及语音识别方法 |
US7386443B1 (en) * | 2004-01-09 | 2008-06-10 | At&T Corp. | System and method for mobile automatic speech recognition |
- 2005
- 2005-10-27 US US11/667,184 patent/US20080103771A1/en not_active Abandoned
- 2005-10-27 WO PCT/FR2005/002695 patent/WO2006051180A1/fr active Application Filing
- 2005-10-27 EP EP05815123A patent/EP1810277A1/de not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO2006051180A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2006051180A1 (fr) | 2006-05-18 |
US20080103771A1 (en) | 2008-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200320977A1 (en) | Speech recognition method and apparatus | |
EP1362343B1 (de) | Verfahren, modul, vorrichtung und server zur spracherkennung | |
JP4267081B2 (ja) | 分散システムにおけるパターン認識登録 | |
KR101670150B1 (ko) | 이름 발음을 위한 시스템 및 방법 | |
JP5062171B2 (ja) | 音声認識システム、音声認識方法および音声認識用プログラム | |
KR101120765B1 (ko) | 스위칭 상태 스페이스 모델과의 멀티모덜 변동 추정을이용한 스피치 인식 방법 | |
US20070260455A1 (en) | Feature-vector compensating apparatus, feature-vector compensating method, and computer program product | |
JP2002091477A (ja) | 音声認識システム、音声認識装置、音響モデル管理サーバ、言語モデル管理サーバ、音声認識方法及び音声認識プログラムを記録したコンピュータ読み取り可能な記録媒体 | |
CN107104994B (zh) | 语音识别方法、电子装置及语音识别系统 | |
EP1810277A1 (de) | Verfahren zur verteilten konstruktion eines stimmenerkennungsmodells sowie vorrichtung, server und computerprogramme zu seiner implementierung | |
Lee et al. | The I4U mega fusion and collaboration for NIST speaker recognition evaluation 2016 | |
EP1642264B1 (de) | Spracherkennung für grosse dynamische vokabulare | |
JP2009128490A (ja) | 学習データ選択装置、学習データ選択方法、プログラムおよび記録媒体、音響モデル作成装置、音響モデル作成方法、プログラムおよび記録媒体 | |
EP1803116B1 (de) | Spracherkennungsverfahren mit temporaler markereinfügung und entsprechendes system | |
CN114023309A (zh) | 语音识别系统、相关方法、装置及设备 | |
US7853451B1 (en) | System and method of exploiting human-human data for spoken language understanding systems | |
EP1285435B1 (de) | Syntax- und semantische-analyse von sprachbefehlen | |
CN109524000A (zh) | 离线对话实现方法和装置 | |
Kos et al. | A speech-based distributed architecture platform for an intelligent ambience | |
FR3058253B1 (fr) | Procede de traitement de donnees audio issues d'un echange vocal, systeme et programme d'ordinateur correspondant. | |
WO2005010868A1 (ja) | 音声認識システム及びその端末とサーバ | |
US20240233706A1 (en) | Text-based speech generation | |
EP1741092A1 (de) | Verfahren und system zur spracherkennung durch kontextuelle modellierung der spracheinheiten | |
JP5956913B2 (ja) | 言語モデル生成装置、言語モデル生成方法、プログラム、記録媒体 | |
EP4443425A1 (de) | Verfahren und systeme zur sprachsteuerung |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012
| 17P | Request for examination filed | Effective date: 20070504
| AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR
| DAX | Request for extension of the european patent (deleted) |
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: FRANCE TELECOM
| 17Q | First examination report despatched | Effective date: 20090717
| GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
| 18D | Application deemed to be withdrawn | Effective date: 20100626