ES2311344B1 - METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING. - Google Patents
METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING.
- Publication number
- ES2311344B1
- Authority
- ES
- Spain
- Prior art keywords
- user
- training
- recognition
- platform
- words
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
A speech recognition method with progressive training of the platform, which reaches recognition levels similar to those of platforms that require a dedicated training phase while still offering services before and during that training process.
Description
Speech recognition method with progressive training.
The described method performs speech recognition with results similar to those obtained through a dedicated training phase, while allowing the platform in charge of recognizing the words spoken by the user to offer services for as long as that training process lasts.
Most current speech recognition methods work with a statistical model that determines the conditional probability that a given word produces the observed acoustic sequence. By comparing these probabilities, it is possible to determine which word the user most likely said. This statistical model consists of a set of states and transition probabilities between those states. While the possible states are usually fixed by the model used, the transition probabilities are usually treated as model parameters, and different values of them allow the method to be tuned to different conditions (speaker, noise conditions, etc.). These parameters can be optimized through various methods, the most common being those based on training.
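The scoring described above (the probability that a word's model produced the observed sequence, followed by choosing the most probable word) can be sketched with the standard forward algorithm for a hidden Markov model. This is a minimal illustration, not the patent's implementation: it assumes each word has its own HMM over discrete acoustic symbols (real systems use continuous feature vectors), equal prior probability for every word, and the hypothetical names `WordHMM` and `recognize`.

```python
import math
from typing import Dict, List, Sequence


class WordHMM:
    """Hypothetical discrete-observation HMM for one vocabulary word."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start = start  # start[i] = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | current state i)
        self.emit = emit    # emit[i][k] = P(acoustic symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs | word) via the forward algorithm, rescaled every frame."""
        if not obs:
            return float("-inf")
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Sum over all predecessor states, then emit the symbol at frame t.
                alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n))
                         * self.emit[j][obs[t]] for j in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")
            log_p += math.log(scale)  # log P(obs) = sum of per-frame log scales
            alpha = [a / scale for a in alpha]
        return log_p


def recognize(models: Dict[str, WordHMM], obs: Sequence[int]) -> str:
    """Return the word whose model most probably produced obs (equal priors)."""
    return max(models, key=lambda w: models[w].log_likelihood(obs))


# Two toy left-to-right word models: "yes" mostly emits symbol 0, "no" symbol 1.
yes = WordHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]])
no = WordHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]])
print(recognize({"yes": yes, "no": no}, [0, 0, 1, 0]))  # prints "yes"
```

Working in log space with per-frame rescaling avoids numerical underflow on long utterances, since raw sequence probabilities shrink geometrically with the number of frames.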
Depending on whether or not they require prior specific training, speech recognition methods can be divided into two large groups:
- a) Methods that require a specific training phase. These methods require the end user to train the system before use. They are usually speaker-dependent and have a large recognition domain (they recognize a wide variety of words and phrases). To train the system, the user must repeat a series of words and/or phrases so that the system can adjust its parameters.
- b) Methods that do not require a specific training phase. These methods are speaker-independent and have a reduced recognition domain, usually limited to a few hundred words.
The following definitions are used for the various entities that make up the solution of the invention:
- a) User terminal. A terminal with basic functionality for making voice calls.
- b) Speech recognition platform. The platform responsible for recognizing the words spoken by the user and, where appropriate, triggering the actions to be performed as a result.
The present invention is based on modifying the speech recognition platform so that, starting from a system that behaves like one that does not require a specific training phase (able to recognize a limited number of words), it progresses, through gradual, non-dedicated training performed by the end user, to a system that behaves like one that requires a specific training phase (able to recognize thousands of words or even natural language).
When a user accesses the recognition platform through their user terminal, the platform retrieves the user's profile, checking the level of training achieved and the user-specific parameters. If no profile exists for the user, one is created and stored, and it is used from then on as that user's profile. Depending on the training status for that user, the platform offers different versions of the services, which differ in the richness of the vocabulary available to the user: users who have reached a higher training level can use a larger number of words that the platform will recognize, while users at lower training levels have a smaller vocabulary.
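A minimal sketch of the tiered service described above, assuming the training level is stored as an integer in the profile and each level unlocks an additional set of words on top of the previous ones. The tier contents, `UserProfile`, and `available_vocabulary` are hypothetical illustrations; the patent does not specify vocabularies or how levels are numbered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical vocabulary tiers: level 0 is a small, speaker-independent
# command set; each higher level adds words on top of the lower levels.
VOCABULARY_TIERS: List[Set[str]] = [
    {"yes", "no", "help", "cancel"},
    {"balance", "transfer", "account", "pay"},
    {"statement", "schedule", "beneficiary", "yesterday"},
]


@dataclass
class UserProfile:
    user_id: str
    training_level: int = 0  # 0 = untrained user
    # Speaker-specific model parameters, keyed by word (placeholder type).
    parameters: Dict[str, object] = field(default_factory=dict)


def available_vocabulary(profile: UserProfile) -> Set[str]:
    """Union of all tiers up to and including the user's training level."""
    top = min(profile.training_level, len(VOCABULARY_TIERS) - 1)
    words: Set[str] = set()
    for tier in VOCABULARY_TIERS[: top + 1]:
        words |= tier
    return words
```

With this layout an untrained user (level 0) is offered only the four command words, while a user at level 2 or above can use all twelve. Making the tiers cumulative keeps every word a user could already use available after moving up a level.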
Each time the platform correctly recognizes a word or phrase spoken by the user during use of a service, it readjusts its internal parameters so as to maximize the probability that, given the previous correct observations and the new one, the recognized words would have been those the user actually said.
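One way to realize this readjustment is Viterbi (best-path) training with accumulated counts: each confirmed utterance is aligned to the word model's states, its counts are added to those of all earlier confirmed utterances, and the probabilities are re-normalized. This gives a smoothed approximation of the maximum-likelihood parameters for all confirmed utterances, with the simplification that earlier utterances keep the alignment they received when they were added. The technique is swapped in for illustration; the patent does not name a re-estimation algorithm. `AdaptiveWordHMM`, the discrete-symbol observations, and the pseudo-count `prior` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


class AdaptiveWordHMM:
    """Hypothetical word HMM re-estimated from every confirmed utterance.

    Probabilities are always normalized counts, i.e. a smoothed best-path
    approximation of the maximum-likelihood estimate over all confirmed data.
    """

    def __init__(self, n_states: int, n_symbols: int, prior: float = 1.0) -> None:
        # Pseudo-counts (prior > 0) keep unseen events from reaching probability 0.
        self.start_c = [prior] * n_states
        self.trans_c = [[prior] * n_states for _ in range(n_states)]
        self.emit_c = [[prior] * n_symbols for _ in range(n_states)]

    @staticmethod
    def _norm(row: List[float]) -> List[float]:
        total = sum(row)
        return [c / total for c in row]

    def probs(self) -> Tuple[List[float], List[List[float]], List[List[float]]]:
        """Current start, transition and emission probabilities."""
        return (self._norm(self.start_c),
                [self._norm(r) for r in self.trans_c],
                [self._norm(r) for r in self.emit_c])

    def viterbi_path(self, obs: Sequence[int]) -> List[int]:
        """Most probable state sequence for obs under the current parameters."""
        start, trans, emit = self.probs()
        n = len(start)
        delta = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
        back: List[List[int]] = []
        for o in obs[1:]:
            ptrs, new = [], []
            for j in range(n):
                best = max(range(n), key=lambda i: delta[i] + math.log(trans[i][j]))
                ptrs.append(best)
                new.append(delta[best] + math.log(trans[best][j]) + math.log(emit[j][o]))
            back.append(ptrs)
            delta = new
        state = max(range(n), key=lambda i: delta[i])
        path = [state]
        for ptrs in reversed(back):  # follow back-pointers to the first frame
            state = ptrs[state]
            path.append(state)
        return path[::-1]

    def update(self, obs: Sequence[int]) -> None:
        """Add the counts of one correctly recognized utterance."""
        path = self.viterbi_path(obs)
        self.start_c[path[0]] += 1
        for a, b in zip(path, path[1:]):
            self.trans_c[a][b] += 1
        for s, o in zip(path, obs):
            self.emit_c[s][o] += 1
```

Because only counts are stored, each update costs one Viterbi pass over the new utterance and earlier utterances never need to be reprocessed, which suits training that happens in the background of normal service use.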
This method requires the vocabulary available to the user to be small in the early stages of training and to grow progressively; otherwise the recognition success rate would be very low, preventing both the use of the services and the training of the platform itself.
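The text states that the vocabulary must start small and grow, but not what triggers each step. The sketch below assumes a hypothetical rule: a user moves up one level after a minimum number of correct recognitions at the current level, provided that accuracy over the last 20 attempts is at least 90%. `TrainingProgress` and all thresholds are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class TrainingProgress:
    """Hypothetical promotion rule; the patent does not define one."""
    level: int = 0
    max_level: int = 2
    confirmed_at_level: int = 0
    # Outcomes of the most recent recognition attempts (True = correct).
    recent: Deque[bool] = field(default_factory=lambda: deque(maxlen=20))
    min_confirmed: int = 30    # assumed: correct recognitions needed per level
    min_accuracy: float = 0.9  # assumed: recent accuracy needed to move up

    def record(self, correct: bool) -> bool:
        """Record one recognition outcome; return True if the user moved up."""
        self.recent.append(correct)
        if correct:
            self.confirmed_at_level += 1
        accuracy = sum(self.recent) / len(self.recent)
        if (self.level < self.max_level
                and self.confirmed_at_level >= self.min_confirmed
                and len(self.recent) == self.recent.maxlen
                and accuracy >= self.min_accuracy):
            self.level += 1
            self.confirmed_at_level = 0
            self.recent.clear()  # judge the new level on fresh attempts only
            return True
        return False
```

Requiring recent accuracy as well as a count of successes reflects the concern in the text: enlarging the vocabulary before the speaker model is reliable would lower the recognition rate and stall both the service and the training.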
Another advantage of the present method is that it increases the platform's recognition success rate, since the platform adapts its internal recognition parameters to the characteristics of the user's speech.
To complement this description and to aid understanding of the characteristics of the invention, a set of drawings is attached to this specification in which, by way of illustration and not limitation, the following is shown:
Figure 1 shows a complete flow chart of the described method.
As can be seen in the diagram of Figure 1, when a call is established through a user terminal, the system first identifies the user based on data taken from the user terminal, information obtained from the call itself (for example, the user's voice), or data provided by external systems. If a stored profile exists for the identified user, the system retrieves it; if no such profile exists, it creates a new one associated with the calling user.
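The identification and profile step of Figure 1 can be sketched as a chain of identification sources followed by a get-or-create lookup. The order of the sources (terminal data, then the caller's voice, then an external system), the names `CallInfo`, `Profile`, `ProfileStore` and `open_session`, and the choice to return no profile for an unidentified caller are all assumptions for illustration; the text only lists the possible data sources.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CallInfo:
    # Hypothetical fields describing what the platform sees when a call arrives.
    caller_number: Optional[str] = None  # data taken from the user terminal
    voiceprint_id: Optional[str] = None  # speaker identification on the call audio
    external_id: Optional[str] = None    # identity supplied by an external system


@dataclass
class Profile:
    user_id: str
    training_level: int = 0
    parameters: Dict[str, object] = field(default_factory=dict)


# Identification sources, tried in this (assumed) order; each returns an id or None.
IDENTIFIERS: List[Callable[[CallInfo], Optional[str]]] = [
    lambda c: c.caller_number,
    lambda c: c.voiceprint_id,
    lambda c: c.external_id,
]


class ProfileStore:
    """In-memory stand-in for on-platform or external profile storage."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Profile] = {}

    def get_or_create(self, user_id: str) -> Profile:
        if user_id not in self._profiles:
            self._profiles[user_id] = Profile(user_id)  # new users start untrained
        return self._profiles[user_id]


def open_session(call: CallInfo, store: ProfileStore) -> Optional[Profile]:
    """Identify the caller and retrieve (or create) their profile."""
    for identify in IDENTIFIERS:
        user_id = identify(call)
        if user_id:
            return store.get_or_create(user_id)
    return None  # assumed: an unidentified caller gets no stored profile
```

Swapping the in-memory dictionary for a database would correspond to keeping profiles on an external storage medium, which the text also allows.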
Training of the platform for each user is progressive and, until it is fully complete, the platform offers the user partial speech recognition services, which are gradually extended as the vocabulary of words recognized by the system grows. Once training is complete, the service offered is full speech recognition.
Thus, the method of the present invention requires a speech recognition platform capable of:
- a) Creating, retrieving, storing and modifying user profiles, either on the platform itself or on an external storage medium.
- b) Adapting its recognition parameters according to those stored in the user's profile.
- c) Adapting its operation to the user's training level, offering broader vocabularies to users who have reached a higher training level.
- d) Adjusting the recognition parameters of a specific user through training based on the recognition of certain words, phrases and/or phonemes.
Having sufficiently described the nature of the invention, it is stated for the appropriate purposes that the materials, shape, size and arrangement of the elements described may be modified, provided that this does not alter the essential characteristics of the invention claimed below.
Claims (2)
- a) Creating, retrieving, storing and modifying user profiles, either on the platform itself or on an external storage medium.
- b) Adapting its recognition parameters according to those stored in the user's profile.
- c) Adapting its operation to the user's training level, offering broader vocabularies to users who have reached a higher training level.
- d) Adjusting the recognition parameters of a specific user through training based on the recognition of certain words, phrases and/or phonemes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ES200601101A ES2311344B1 (en) | 2006-04-28 | 2006-04-28 | METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING. |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ES200601101A ES2311344B1 (en) | 2006-04-28 | 2006-04-28 | METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING. |
Publications (2)
Publication Number | Publication Date |
---|---|
ES2311344A1 (en) | 2009-02-01 |
ES2311344B1 | 2009-12-17 |
Family
ID=40260957
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ES200601101A (ES2311344B1, Expired - Fee Related) | 2006-04-28 | 2006-04-28 | METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING. |
Country Status (1)
Country | Link |
---|---|
ES (1) | ES2311344B1 (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69924596T2 (en) * | 1999-01-20 | 2006-02-09 | Sony International (Europe) Gmbh | Selection of acoustic models by speaker verification |
US6766295B1 (en) * | 1999-05-10 | 2004-07-20 | Nuance Communications | Adaptation of a speech recognition system across multiple remote sessions with a speaker |
EP1079615A3 (en) * | 1999-08-26 | 2002-09-25 | Matsushita Electric Industrial Co., Ltd. | System for identifying and adapting a TV-user profile by means of speech technology |
US6895257B2 (en) * | 2002-02-18 | 2005-05-17 | Matsushita Electric Industrial Co., Ltd. | Personalized agent for portable devices and cellular phone |
US7174298B2 (en) * | 2002-06-24 | 2007-02-06 | Intel Corporation | Method and apparatus to improve accuracy of mobile speech-enabled services |
2006
- 2006-04-28: ES200601101A (ES), published as ES2311344B1, not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
ES2311344A1 (en) | 2009-02-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11341958B2 (en) | Training acoustic models using connectionist temporal classification | |
CN110675855B (en) | Voice recognition method, electronic equipment and computer readable storage medium | |
US9911420B1 (en) | Behavior adjustment using speech recognition system | |
ES2233002T3 (en) | SPEECH RECOGNITION SYSTEM WITH UPDATED LEXIC BY INTRODUCTION OF SPELLED WORDS. | |
CN107767861B (en) | Voice awakening method and system and intelligent terminal | |
US20170323644A1 (en) | Speaker identification device and method for registering features of registered speech for identifying speaker | |
US8296141B2 (en) | System and method for discriminative pronunciation modeling for voice search | |
US9484019B2 (en) | System and method for discriminative pronunciation modeling for voice search | |
CN109461436A (en) | Method and system for correcting pronunciation errors of voice recognition | |
CN111179917B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
US9135912B1 (en) | Updating phonetic dictionaries | |
Lee et al. | Joint learning of phonetic units and word pronunciations for ASR | |
CN102063900A (en) | Speech recognition method and system for overcoming confusing pronunciation | |
Goel et al. | Approaches to automatic lexicon learning with limited training examples | |
US20160232892A1 (en) | Method and apparatus of expanding speech recognition database | |
WO2017166625A1 (en) | Acoustic model training method and apparatus for speech recognition, and electronic device | |
KR20190012419A (en) | System and method for evaluating speech fluency automatically | |
US20170270923A1 (en) | Voice processing device and voice processing method | |
US20180012602A1 (en) | System and methods for pronunciation analysis-based speaker verification | |
ES2311344B1 (en) | METHOD OF RECOGNITION OF SPEECH WITH PROGRESSIVE TRAINING. | |
KR20160061071A (en) | Voice recognition considering utterance variation | |
Sim et al. | Robust phone set mapping using decision tree clustering for cross-lingual phone recognition | |
CN112614485A (en) | Recognition model construction method, voice recognition method, electronic device, and storage medium | |
KR102199445B1 (en) | Method and apparatus for discriminative training acoustic model based on class, and speech recognition apparatus using the same | |
CN113160804B (en) | Hybrid voice recognition method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EC2A | Search report published | Date of ref document: 20090201; kind code of ref document: A1 |
| | FD2A | Announcement of lapse in Spain | Effective date: 20170216 |