ES2311344B1 - SPEECH RECOGNITION METHOD WITH PROGRESSIVE TRAINING

Info

Publication number: ES2311344B1
Application number: ES200601101A
Authority: ES
Inventors: Alvaro Maso Besga; Tomas Brezmes Llecha
Original assignee: France Telecom Espana SA
Current assignee: Orange Espana SA
Priority date: 2006-04-28
Filing date: 2006-04-28
Publication date: 2009-12-17
Anticipated expiration: 2026-04-28
Also published as: ES2311344A1

Abstract

Speech recognition method with progressive training of the platform, which achieves recognition levels similar to those of platforms that require a specific training phase while allowing services to be offered before and during the training process.

Description

Speech recognition method with progressive training.

Object of the invention

The described method performs speech recognition with results similar to those obtained through a dedicated training phase, while allowing services to be offered on the platform responsible for recognizing the words used by the user for as long as the training process lasts.

Background of the invention

Most current speech recognition methods operate by means of a statistical model that determines the conditional probability that a given word produces the observed auditory sequence. By comparing these probabilities, it is possible to determine which word the user most probably said. This statistical model consists of a series of states and of transition probabilities between those states. While the possible states are usually predetermined by the model used, the transition probabilities are usually treated as parameters of the model, and different values of these parameters allow the operation of the method to be adjusted to different conditions (speaker, noise conditions, etc.). These parameters can be optimized by various methods, the most common being those based on training. Depending on whether or not prior specific training is needed, speech recognition methods can be divided into two large groups:

a) Methods that require a specific training phase. These methods require the end user to train the system before using it. They are usually speaker-dependent and have an extensive recognition domain (they recognize a wide variety of words and phrases). To train the system, the user must repeat a series of words and/or phrases so that the system can adjust its parameters.

b) Methods that do not require a specific training phase. These methods are characterized by being speaker-independent and by having a reduced recognition domain, normally limited to a few hundred words.
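
The statistical model described in this background can be illustrated with a minimal sketch, assuming a discrete hidden Markov model per word and the standard forward algorithm. The patent does not name a specific model or algorithm, so the functions forward_likelihood and recognize, the WORD_MODELS table and all of its numbers are hypothetical and chosen only for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs   : non-empty list of observation symbol indices
    start : start[s] is the probability of beginning in state s
    trans : trans[p][s] is the transition probability from state p to s
    emit  : emit[s][o] is the probability that state s emits symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def recognize(obs, word_models):
    """Score every word model and return (most likely word, all scores)."""
    scores = {w: forward_likelihood(obs, *m) for w, m in word_models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two states, two observation symbols (0 and 1).
# Each entry is (start, trans, emit).
WORD_MODELS = {
    "yes": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "no": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

best_word, word_scores = recognize([0, 0, 1], WORD_MODELS)
```

Here the transition probabilities play the role of the adjustable parameters mentioned above: changing them changes the likelihood each word model assigns to the same observation sequence.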

Description of the invention

The following definitions are used for the various entities that make up the solution of the invention:

a) User terminal. Terminal with basic functionality for making voice calls.

b) Speech recognition platform. Platform responsible for recognizing the words used by the user and, where appropriate, for triggering the actions to be carried out as a consequence of them.

The present invention is based on modifying the speech recognition platform so that, starting from a system that behaves like one that does not require a specific training phase (capable of recognizing a limited number of words), it becomes, through progressive, non-dedicated training performed by the end user, a system that behaves like one that requires a specific training phase (capable of recognizing thousands of words or even natural language).

When a user accesses the recognition platform through their user terminal, the platform retrieves the user's profile and checks the level of training completed and the parameters specific to that user. If no profile exists for the user, one is created and stored, and it is used from then on as that user's profile. Depending on the state of training for the user in question, the platform offers the user different versions of the services, which differ in the richness of the vocabulary available to the user. Thus, users who have reached a higher level of training can use a larger number of words that will be recognized by the platform, while users at lower training levels have a smaller vocabulary.
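
A minimal sketch of this profile handling follows, assuming an in-memory dictionary as the profile store and three vocabulary tiers. The names UserProfile, VOCABULARY_TIERS, get_or_create_profile and available_vocabulary, as well as the example words, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary tiers, indexed by training level.
VOCABULARY_TIERS = [
    {"yes", "no"},
    {"yes", "no", "balance", "help"},
    {"yes", "no", "balance", "help", "transfer", "operator"},
]


@dataclass
class UserProfile:
    user_id: str
    training_level: int = 0  # new users start at the lowest level
    parameters: dict = field(default_factory=dict)  # user-specific parameters


def get_or_create_profile(store, user_id):
    """Retrieve the stored profile, creating and storing one if none exists."""
    if user_id not in store:
        store[user_id] = UserProfile(user_id)
    return store[user_id]


def available_vocabulary(profile):
    """Return the vocabulary offered at the profile's training level."""
    level = min(profile.training_level, len(VOCABULARY_TIERS) - 1)
    return VOCABULARY_TIERS[level]
```

A first-time caller receives the smallest tier; once the stored training level rises, the same call to available_vocabulary returns a richer word set.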

Each time the platform correctly recognizes a word or phrase spoken by the user during the use of a service, it readjusts its internal parameters so as to maximize the probability that, given the previous correct observations and the new observation, the recognized words would have been those that the user spoke.
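
One simple way to read this readjustment is as maximum-likelihood re-estimation of transition probabilities over all confirmed observations. The sketch below assumes that each correctly recognized utterance has already been aligned to a sequence of model states; a real system would typically have to infer that alignment, for example with Baum-Welch. The class TransitionEstimator and its methods are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class TransitionEstimator:
    """Accumulates state transitions from confirmed recognitions."""

    def __init__(self, n_states):
        self.n_states = n_states
        self.counts = defaultdict(float)  # (prev_state, next_state) -> count

    def add_confirmed(self, state_path):
        """Add the state path of one correctly recognized utterance."""
        for prev, nxt in zip(state_path, state_path[1:]):
            self.counts[(prev, nxt)] += 1.0

    def transition_matrix(self):
        """Relative frequencies, i.e. the maximum-likelihood estimate given
        all paths seen so far. States never left get a uniform row."""
        matrix = []
        for p in range(self.n_states):
            row = [self.counts[(p, s)] for s in range(self.n_states)]
            total = sum(row)
            if total == 0:
                matrix.append([1.0 / self.n_states] * self.n_states)
            else:
                matrix.append([v / total for v in row])
        return matrix
```

Because the counts are cumulative, each new confirmed utterance refines the estimate without discarding the earlier correct observations.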

This method requires the vocabulary available to the user to be small in the early phases of training and to grow progressively; otherwise, the recognition success rate would be very low, which would prevent both the use of the services and the training of the platform itself.
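
The patent does not state when the vocabulary should grow. As a hedged illustration, the sketch below promotes a user to the next training level only after enough recognition attempts have been made and the success rate is high. The function next_level and the thresholds min_attempts and min_accuracy are assumptions made for this example.

```python
def next_level(level, confirmed, attempts, max_level,
               min_attempts=20, min_accuracy=0.9):
    """Return the training level after evaluating recent recognition results.

    confirmed : number of correctly recognized utterances at this level
    attempts  : total number of recognition attempts at this level
    The thresholds are hypothetical; the patent gives no numeric values.
    """
    if attempts < min_attempts:
        return level  # not enough evidence yet, keep the small vocabulary
    if confirmed / attempts >= min_accuracy:
        return min(level + 1, max_level)  # unlock a larger vocabulary
    return level  # accuracy too low, keep training at this level
```

Holding the level until accuracy is high keeps the success rate from collapsing, which is the reason given above for starting with a small vocabulary.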

Another advantage of the present method is that it increases the recognition success rate of the platform, because the platform adapts its internal recognition parameters to the characteristics of the user's speech.

Description of the figures

To complement this description and to aid understanding of the features of the invention, a set of drawings accompanies this specification in which, by way of illustration and without limitation, the following is shown:

Figure 1 shows a complete flowchart of the described method.

Preferred Embodiment of the Invention

As shown in the diagram of Figure 1, when a call is established through a user terminal, the system first identifies the user on the basis of certain data taken from the user terminal, of information obtained from the call itself (for example, the user's voice), or of other data provided by external systems. If a stored profile exists for the identified user, the system retrieves it; if no such profile exists, it creates a new one associated with the calling user.

The training of the platform for each user is progressive and, until it has been fully completed, the platform offers the user partial voice recognition services, which are gradually extended as the vocabulary of words recognized by the system grows. Once training has been completed, the service offered is full voice recognition.

Thus, the method of the present invention requires, in order to be carried out, a speech recognition platform capable of:

a) Creating, retrieving, storing and modifying user profiles, either on the platform itself or on an external storage medium.

b) Adapting its recognition parameters according to those stored in the user's profile.

c) Adapting its operation to the user's training level, offering broader vocabularies to users who have reached a higher training level.

d) Adjusting the recognition parameters of a specific user through training based on the recognition of certain words, phrases and/or phonemes.

Having sufficiently described the nature of the invention, it is stated for the appropriate purposes that the materials, shape, size and arrangement of the elements described may be modified, provided that this does not alter the essential features of the invention as claimed below.

Claims

1. Speech recognition method with progressive training, which uses a user terminal as a means of making voice calls and a speech recognition platform capable of recognizing the words used by the user and, where appropriate, of triggering the actions to be carried out as a consequence of them, characterized in that initially, with each user, it behaves as a system that does not require a specific training phase, being able to recognize a limited number of words; on each connection it retrieves the specific parameters of each user and the level of training completed up to that point, or registers a new user if no profile has been created because the user is accessing the system for the first time; and, depending on the state of training for the user in question, the platform offers different versions of recognition that differ in the richness of the vocabulary available to the specific user; each time the platform correctly recognizes a word or phrase spoken by the user, it readjusts its internal parameters so as to maximize the probability that, given the previous correct observations and the new observation, the recognized words would have been those that the user spoke, for which purpose the vocabulary available to the user is required to be small in the first phases of training and to grow progressively, so that the recognition success rate is high; so that when a user has reached a higher level of training, that user can use a larger number of words that are recognized by the platform, which is gradually transformed into a system capable of recognizing a multitude of words or even natural language.

2. Method according to the preceding claim, characterized in that it requires for its development a speech recognition platform capable of: