FR2940497A1

FR2940497A1 - METHOD FOR CONTROLLING AN APPLICATION FROM A VOICE SIGNAL AND ASSOCIATED DEVICE FOR ITS IMPLEMENTATION

Info

Publication number: FR2940497A1
Application number: FR0858987A
Authority: FR
Inventors: Aymeric Zils; Bruno Verbrugghe; Damien Henry; Olivier Lescurieux; Nicolas Delorme
Original assignee: VOXLER
Current assignee: VOXLER
Priority date: 2008-12-23
Filing date: 2008-12-23
Publication date: 2010-06-25
Anticipated expiration: 2028-12-23
Also published as: WO2010072965A1; FR2940497B1

Abstract

The present invention relates to a method for controlling an application, for example a video game, by a voice signal (4), characterized in that it comprises the step of defining a control vocabulary, that is to say a set of control words, preferably onomatopoeias, each having one or more characteristic phonemes that distinguish it from the other words of the control vocabulary. As soon as a control word contained in the voice signal (4) is detected, a specific action is generated (21, 41). The control word is detected by means of at least one discrimination machine (38) able to detect and discriminate the pronunciation of different phonemes while optionally excluding a set of other phonemes; this discrimination machine (38) distinguishes, for example, between at least two characteristic phonemes or between at least two groups of characteristic phonemes. The specific action is then driven (17, 44, 45) as a function of at least one continuous quantity (55) of the voice signal, such as duration, intensity, pitch or timbre.

Description

Method for controlling an application from a voice signal and associated device for its implementation

The present invention relates to a method for controlling an application from a voice signal and to the associated device for implementing it. The invention aims in particular to make control of the application more responsive while keeping it reliable and precise.

The invention finds a particularly advantageous, but not exclusive, application in controlling actions by voice in video-game applications. Actions are understood to include, in particular, movement and actions such as jumping or shooting by the video game's characters, as well as the game environment (scenery, music).

However, the invention could be used in other applications, such as controlling digital musical instruments, an avatar, or a toy, domestic or military robot, or any other application that can be controlled by a user, for example mouse-simulation software for people with disabilities, or navigation in a database or in a list such as a mobile phone's contact directory.

Traditional voice control systems are known that offer basic control by recognizing a whole word once it has been spoken. The control is then "discrete", in that each word is recognized identically however it was pronounced. For example, recognizing the words "gauche" (left) or "droite" (right) offers only two discrete choices, left or right, and therefore generates only the two discrete actions associated with those words. Such systems necessarily wait for the end of the word before starting an action.

The invention makes it possible to start the action corresponding to the detected word before the user has finished pronouncing it, which increases the responsiveness of the voice control method. To this end, the invention detects the presence of voice, then recognizes not the whole of each control word of the application but only its first phonemes, and then lets the application be driven according to how the rest of the control word is pronounced. For control words of several syllables, the detection/driving process can be repeated, so that the control word contains several detection parts and several driving parts.

In the remainder of this document, a CONTROL WORD means a word whose pronunciation makes it possible to control an application, first by detection and then by driving.

In the remainder of this document, the DETECTION PHONEMES of a control word mean the phonemes used to recognize a part of the control word, which makes it possible to trigger an action in an application.

In the remainder of this document, the DRIVING PHONEMES of a control word mean the phonemes used to track characteristics of the voice that make it possible to drive an action in an application.

In the remainder of this document, the CONTROL VOCABULARY means the set of control words that must be discriminated simultaneously in order to control an application.

The conventional implementation of detection within a control word uses speech recognition techniques known to those skilled in the art, based for example on Hidden Markov Models (HMMs).

An optimized implementation of detection within a control word, called detection "by characteristic phoneme", consists in detecting at least one characteristic phoneme of each part to be detected in the control word, the first part to be detected preferably being located at the beginning of the word. This relies on voice analysis machines able to discriminate these characteristic phonemes, called discrimination machines. For example, to detect "Flip" and "Flap", a machine is used that discriminates between the pronunciation of "A" and of "I".

In the remainder of this document, the CHARACTERISTIC PHONEME of a word means the phoneme that distinguishes it from the other words of the control vocabulary.

However, for a characteristic phoneme to be detected reliably, the discrimination machine must not produce a false detection on any of the phonemes pronounced before the characteristic phonemes of the words to be detected. In the example above, one must therefore make sure that the "A"/"I" machine does not produce a false detection when "F" or "L" is pronounced.
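The "Flip"/"Flap" example can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes some upstream classifier has already turned the voice signal into a stream of phoneme labels, and the names `CHARACTERISTIC` and `detect_command` are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical control vocabulary: characteristic phoneme -> control word.
CHARACTERISTIC = {"I": "Flip", "A": "Flap"}


def detect_command(phonemes: Iterable[str]) -> Optional[str]:
    """Return the control word as soon as a characteristic phoneme is heard.

    Earlier phonemes such as "F" and "L" are not in the table, so they are
    ignored rather than producing a false detection.
    """
    for phoneme in phonemes:
        word = CHARACTERISTIC.get(phoneme)
        if word is not None:
            return word  # decided before the rest of the word is spoken
    return None


print(detect_command(["F", "L", "I", "P"]))
```

In this sketch the decision is taken at the vowel, so the final "P" is never needed to choose between the two words.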

In the remainder of this document, a DISCRIMINATION MACHINE means a voice analysis tool able to detect and discriminate the pronunciation of different phonemes, while optionally excluding a set of other phonemes from detection.

In addition, to make detection more reliable, several discrimination machines can be combined in order to detect several characteristic phonemes of each control word. Detection then bears on a group of phonemes rather than on a single one. It is enough to make sure that no discrimination machine produces a false detection on any phoneme pronounced before the characteristic phonemes of any word of the control vocabulary. This method is referred to below as the principle of "non-detection of the earlier phonemes" of the control words.

This early recognition of the control word makes it possible to carry out the second step of the invention, which consists in using the driving phonemes of the control word, that is to say the phonemes that follow the detection phonemes, to exert additional control over the application, referred to as driving the application. This driving is based on the prosodic parameters of the voice, for example the duration, pitch, energy or timbre with which the driving phonemes of the control word are pronounced.

For example, the pronunciation of the control word "gauche" (left) will influence how the action is driven, depending on whether it is brief ("gauche") or longer ("gauuuuuuche"), whether it is spoken in a low voice, a high voice or a voice whose pitch varies, and whether it is spoken in a loud voice, a soft voice or a voice whose intensity varies. These prosodic parameters add further voice controls and can be applied to continuous values (for example a position or a speed), which the discrete recognition of control words does not allow. They also make it possible to drive a second action within a control word. Thus the first phonemes (detection phonemes) are used to choose between two actions, and the following ones (driving phonemes) are used to drive the actions; driving may be reduced to a choice between two other actions.
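As a rough illustration of driving a continuous value from prosody, the sketch below estimates intensity as the RMS level of a frame of samples and maps the time a phoneme is held to a value between 0 and 1. The 20 ms frame length, the 1 s saturation time and the function names are assumptions made for the example; the patent does not specify them.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame, used here as an intensity estimate."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def duration_to_value(n_frames: int, frame_seconds: float = 0.02,
                      max_seconds: float = 1.0) -> float:
    """Map how long the driving phonemes are held to a value in [0, 1]."""
    held = n_frames * frame_seconds
    return min(held / max_seconds, 1.0)


# A long "gauuuuuuche" held for 100 frames saturates the control value.
print(duration_to_value(100))
```

Pitch or timbre could feed the same kind of mapping, each one acting as a separate control axis.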

More precisely, in the invention, a control vocabulary is chosen that consists of a set of control words each comprising a sequence of phonemes, a phoneme being regarded as the smallest discrete or distinctive unit (that is, one that distinguishes words from one another) that can be isolated by segmentation in the spoken chain. In the invention, a phoneme may correspond to a sound that is sustained or not, and pitched or not (for example the various vowels and consonants). The control vocabulary consists of a set of words, called control words, whose pronunciations are distinct and which can therefore be reliably discriminated from one another, preferably before the end of their pronunciation. The control words chosen for voice control are preferably onomatopoeias, that is to say short, preferably monosyllabic words such as "Paf" or "Fire", or longer words such as "bazooka" or "fusil" (rifle) that have a characteristic phoneme detectable before the end of the word, in the case of optimized detection by characteristic phoneme.

Voice control according to the invention comprises several steps. In a first step, the presence of any vocal sound is detected, regardless of its content; this detection can instantly generate a generic action. This step is preferably carried out even before the classification step defined below. However, this first step is not mandatory and could be omitted. Next, in a detection step, the system recognizes the detection phonemes of the control word that trigger an action, among all the detection phonemes of the control vocabulary predefined for controlling the application. In the case of optimized detection by characteristic phoneme, this detection is based on recognizing at least one characteristic phoneme of the onomatopoeia using at least one discrimination machine. A specific action corresponding to the detected part of the control word is then started.

Several discrimination machines can be used in parallel to improve detection reliability, in which case launching a specific action requires the combination of several detection phonemes.
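One way to picture the parallel combination is to require that every machine report a phoneme and that the resulting tuple match a known word. The sketch below is only an assumption about how such a combination might look; the `SIGNATURES` table, the word "Boum" and the function `combine` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical signatures: one characteristic phoneme per machine, per word.
SIGNATURES: Dict[Tuple[Optional[str], ...], str] = {
    ("P", "A"): "Paf",
    ("B", "U"): "Boum",
}


def combine(outputs: Sequence[Optional[str]]) -> Optional[str]:
    """Return a control word only if every machine reported a phoneme
    and the combination matches a known signature."""
    if any(output is None for output in outputs):
        return None
    return SIGNATURES.get(tuple(outputs))


print(combine(["P", "A"]))
```

A mismatched pair such as `["P", "U"]` yields no action, which is how the combination trades a little latency for fewer false triggers.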

Moreover, several discrimination machines can be used in succession, so that a specific action can be started each time one of the different detection phonemes is detected. Once the system has recognized the detection part of the control word and determined the specific action to be carried out, the action is driven by prosodic control over the pronunciation of the driving phonemes of the control word. This prosodic control drives the specific action that has been started as a function of a continuous quantity extracted from the voice signal, such as the duration, intensity, pitch or timbre of the voice, each of which can constitute a control axis for the action.

The specific action, and where applicable the generic action, ends when the vocal sound ends or when a particular phoneme of the control word is recognized.
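The sequence of steps described above (voice presence, detection, driving, end of action) can be sketched as a small per-frame loop. This is an illustrative assumption, not the patented implementation: each frame is taken to be a `(phoneme, intensity)` pair produced by some upstream analysis, with `None` standing for silence, and the names `CHARACTERISTIC` and `run` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical control vocabulary: characteristic phoneme -> specific action.
CHARACTERISTIC = {"I": "Flip", "A": "Flap"}


def run(frames: Iterable[Tuple[Optional[str], float]]) -> List[str]:
    """Turn a stream of (phoneme, intensity) frames into application events."""
    events: List[str] = []
    voiced = False                 # True while a vocal sound is present
    action: Optional[str] = None   # specific action currently engaged
    for phoneme, intensity in frames:
        if phoneme is None:        # silence: any running action ends
            if voiced:
                events.append("end")
            voiced, action = False, None
            continue
        if not voiced:             # first voiced frame: generic action
            voiced = True
            events.append("generic")
        if action is None:         # detection step
            action = CHARACTERISTIC.get(phoneme)
            if action is not None:
                events.append(f"start:{action}")
        else:                      # driving step, here by intensity
            events.append(f"drive:{action}:{intensity:.1f}")
    return events


frames = [("F", 0.2), ("L", 0.3), ("A", 0.5), ("A", 0.8), (None, 0.0)]
print(run(frames))
```

The generic action fires on the very first voiced frame, before the word is known, which is where the responsiveness gain described in the text comes from.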

As a variant, the process begins with the step of driving by prosodic parameters if the driving part is located at the beginning of the control word, and a specific action is triggered when the detection part of the control word is recognized.

With the optimized method of detection by characteristic phoneme, which relies on machines that discriminate only simple phonemes, the method according to the invention has extremely low CPU consumption.

In addition, the discrimination machines used are very reliable, so that choosing characteristic phonemes that are sufficiently far apart in pronunciation yields a highly reliable system.

Moreover, the method according to the invention is easily adapted to several users. The phoneme recognition algorithms used by the discrimination machines offer generic multi-speaker settings by default. In addition, to improve their reliability, these algorithms can easily be adapted to any new user through an extremely short training phase: a simple recording of a few phonemes or of each word.

The invention therefore relates to a method for controlling an application, for example a video game, by a voice signal, characterized in that it comprises the following steps:
- defining a control vocabulary, that is to say a set of control words, preferably onomatopoeias, each having one or more characteristic phonemes that distinguish it from the other words of the control vocabulary,
- detecting a control word contained in the voice signal by means of at least one discrimination machine able to detect and discriminate the pronunciation of different phonemes while optionally excluding a set of other phonemes, this discrimination machine distinguishing, for example, between at least two characteristic phonemes or between at least two groups of characteristic phonemes,
- generating a specific action corresponding to the detected control word, and then
- driving the generated specific action as a function of at least one continuous quantity of the voice signal, such as the duration, intensity, pitch or timbre of the signal.

According to one implementation, the discrimination machine detects the characteristic phoneme or group of phonemes of the control word and generates the specific action before the end of the pronunciation of the control word.

According to one implementation, the continuous quantity of the voice signal as a function of which the action is driven is located in time after the detection phoneme detected by the discrimination machine.

According to one implementation, as soon as the presence of a voice signal is detected, even before the control word is recognized, a generic action is generated.

According to one implementation, the specific action may differ from the generic action.

According to one implementation, in which the method uses at least two discrimination machines, several specific actions are started in succession as each of the machines successively detects characteristic phonemes or groups of phonemes in the control word.

According to one implementation, in which two discrimination machines are used:
- a first specific action is generated after the first detection phoneme of the control word has been detected by a first discrimination machine able to discriminate N different phonemes, so that this first action can be generated among N possible first actions, and
- a second specific action is generated after the second characteristic phoneme of the control word has been detected by a second discrimination machine able to discriminate M distinct phonemes, so that this second action can be generated among N x M possible second actions.
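The N and N x M counts can be checked with a small enumeration. The phoneme sets and action names below are invented for illustration (N = 2, M = 3); only the counting argument comes from the text.

```python
from itertools import product

FIRST = ["A", "I"]         # N = 2 phonemes for the first machine (assumed)
SECOND = ["O", "U", "E"]   # M = 3 phonemes for the second machine (assumed)

# One possible first action per phoneme recognized by the first machine.
first_actions = {p: f"first_{p}" for p in FIRST}

# The second action depends on both detections, hence N x M possibilities.
second_actions = {(p, q): f"second_{p}{q}" for p, q in product(FIRST, SECOND)}

print(len(first_actions), len(second_actions))
```

Because the second action is keyed on the pair of detections, a two-phoneme control word can select among six outcomes here while the first action still fires after the first phoneme alone.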

According to one implementation, the specific action ends:
- as soon as a characteristic phoneme or group of phonemes is detected, or
- as soon as one of the continuous parameters of the voice signal crosses a certain threshold, for example as soon as the voice signal disappears.

According to one implementation, between 1 and 7 discrimination machines are used in order to be able to differentiate between 1 and 15 control words.

According to one implementation, the discrimination machine uses hidden Markov models.

According to one implementation, actions are generated and driven in synchronization with the video frames of the game being controlled.

According to one implementation, the method according to the invention is used in combination with control through another control interface, such as a game pad, a keyboard, an eye-tracking device or a motion-tracking device.

According to one embodiment, a contact microphone is used in order to cope with noisy environments.

The invention further relates to a device for controlling an application, for example a video game, by a voice signal, characterized in that, a control vocabulary being defined, that is to say a set of control words, preferably onomatopoeias, each control word having one characteristic phoneme or several characteristic phonemes that distinguish it from the other words of the control vocabulary, the device comprises: - means for detecting a control word contained in the voice signal, these means comprising at least one discrimination machine able to detect and discriminate the pronunciation of different phonemes while optionally excluding a set of other phonemes, this discrimination machine distinguishing for example between at least two characteristic phonemes or between at least two groups of characteristic phonemes, - means for generating a specific action corresponding to the detected control word, and - means for controlling the generated specific action as a function of at least one continuous quantity of the voice signal, such as the duration, intensity, pitch or timbre of the signal.

The invention will be better understood on reading the following description and examining the accompanying figures. These figures are given by way of illustration only and in no way limit the invention. They show:

Figure 1: a schematic representation of the various modules of the voice control system according to the invention for controlling an action in a video game;

Figure 2: tables showing examples of detection phonemes and control phonemes used in the method according to the invention to discriminate control words and control the corresponding actions;

Figures 3a, 3b, 3c: tables showing the different discrimination machines that can be used in the method according to the invention to detect the characteristic phonemes of several pairs of control words, in the case of detection optimized by characteristic phoneme;

Figure 3d: a table comparing, on an example, conventional speech recognition with detection optimized by characteristic phoneme;

Figure 4: a schematic representation of a system according to the invention for commanding an action in a video game from the words "bazooka" or "fusil" (rifle), in the case of optimized detection with the characteristic phonemes [y] and [u].

Identical elements keep the same reference from one figure to the next.

Figure 1 shows a schematic representation of a voice control system 1 according to the invention for controlling an action in a video game 3 from a voice signal 4.

To this end, the system 1 comprises a microphone 6 connected to the input of an analysis module 9 via an analog/digital converter 7. The microphone 6 may be a conventional air microphone or any other voice capture device, in particular a microphone designed to be insensitive to ambient noise, such as a contact microphone.

The module 9, which extracts parameters from the voice signal 4, is connected to the input of a module 10, called the mapping module, which is linked to the inputs of the video game 3 to be controlled. The mapping module 10 maps the parameters extracted from the voice signal 4 onto the control of the action in the video game 3.

More specifically, the analysis module 9 comprises a voice presence detection module 13 able to detect whether or not the voice signal 4 is present, regardless of its content.

The analysis module 9 also comprises a classification module 15 able to detect a control word contained in the voice signal. This control word, for example an onomatopoeia such as "pif" or "paf", has a characteristic phoneme that distinguishes it from the other predefined control words. Detection optimized by characteristic phoneme uses discrimination machines that detect these characteristic phonemes (see Figures 3a to 3d).

To this end, the module 15 detects the detection part of the control word. In the case of detection optimized by characteristic phoneme, the module 15 comprises at least one discrimination machine, or a combination of several discrimination machines, each able to discriminate at least two phonemes. Combining several discrimination machines makes it possible in particular to use a concatenation of characteristic phonemes rather than a single characteristic phoneme.

Thus, the classification carried out by the module 15 can operate on the following elements: - elements such as phonemes (for example "A", "P", ...), - combinations of elements, such as sets of phonemes that are not temporally ordered (for example ["P" + "A"] or ["C" + "L" + "A"]), - combinations of elements such as temporally ordered phonemes (for example "gau", "che", "bazoo", "ka"), - any combination of simple and complex elements.

In the case of detection by conventional speech recognition, phoneme recognition machines are known to those skilled in the art and are used in particular in the field of speech recognition. These machines can be built from algorithms such as statistical and probabilistic models of the HMM (Hidden Markov Model) type, distance models, etc. Such a speech recognition system is one example of a discrimination machine.

In the case of detection optimized by characteristic phoneme, in one example, the discrimination machines are chosen from among: an A/I vowel discrimination machine that separates control words in "a" from control words in "i"; an A/I/OU vowel machine that separates control words in "a", control words in "i" and control words in "ou"; a P/K consonant discrimination machine that separates control words in "p" from control words in "k"; an F, CH, SSS/vowels consonant machine that separates control words in "f", "ch" or "sss" from any vowel; and a P, T, K/B, D, G machine that separates control words in "p", "t" or "k" from control words in "b", "d" or "g". Any other type of discrimination machine may nevertheless be used, provided that it respects the principle of non-detection of the preceding phonemes of the control words.

In a simple word recognition method, a single discrimination machine is used to detect control words that differ only in the phonemes detectable by the chosen discrimination machine. Thus, with an A/I vowel discrimination machine, it is possible to distinguish, for example, the control words "Flip" and "Flap", the machine preferably respecting the principle of non-detection of the preceding phonemes of the control words, that is to say that it produces no false detection when "F" and "L" are pronounced. With a P/K discrimination machine, it is possible to distinguish the control words "Pan" and "Clac", and so on.

For more complex control word detection, several discrimination machines can be used in succession, so that a specific action can be started each time one of the different detection phonemes is detected (loop 2). For example, combining the two machines A/I and P/K makes it possible to recognize 4 different words: "Paf", "Pif", "Clac" and "Clic".
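The combination of a P/K machine and an A/I machine can be sketched as follows. This is a minimal illustration, not the implementation described in the patent: the functions `pk_machine` and `ai_machine` are hypothetical stand-ins that receive already-labelled phonemes, whereas a real discrimination machine would work on audio features. Returning `None` models the principle of non-detection of phonemes the machine must ignore.

```python
# Hypothetical stand-ins for two discrimination machines. Each one returns
# its class for the phonemes it discriminates and None for everything else.
def pk_machine(phoneme):
    return phoneme if phoneme in ("p", "k") else None


def ai_machine(phoneme):
    return phoneme if phoneme in ("a", "i") else None


# The 2 x 2 combinations of the two machines give the 4 control words.
WORDS = {
    ("p", "a"): "Paf",
    ("p", "i"): "Pif",
    ("k", "a"): "Clac",
    ("k", "i"): "Clic",
}


def recognize(phonemes):
    """Apply the P/K machine first, then the A/I machine, in succession."""
    consonant = vowel = None
    for ph in phonemes:
        if consonant is None:
            consonant = pk_machine(ph)
        elif vowel is None:
            vowel = ai_machine(ph)
    return WORDS.get((consonant, vowel))


print(recognize(["k", "l", "a", "k"]))
```

With the phoneme sequence of "Clac", the "l" is ignored by the A/I machine and the final "k" arrives after both machines have already decided.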

In a variant, one of the classes of the discrimination machine, or of one of the discrimination machines, corresponds to everything that is not the phoneme, or one of the phonemes, to be detected, with the exception of the preceding phonemes of the control words. It is thus possible to associate one action with a word or the beginning of a word, and another action with any spoken word that is not recognized (any other group of phonemes).

The module 9 also comprises a control module 17 able to extract the continuous parameters of the voice signal, such as duration, intensity, pitch and timbre. This extraction module 17 may be implemented like the one described in patent application FR-2905510.

The mapping module 10 comprises a generic action module 20, a specific action generation module 21, and a specific action modulation module 22, associated respectively with the analysis modules 13, 15 and 17. These modules generate actions in the game to be controlled according to the voice parameters detected by the module 9. Examples of mapping are also described in patent application FR-2905510.

Thus, in a use phase, when the user emits a voice signal 4, this voice signal is converted by the converter 7 into a digital voice signal 4.

As soon as the module 13 detects the presence of the voice signal 4, a generic action is generated by the module 20 in the game 3. The response time of the method according to the invention is thus almost instantaneous.

Then, when the classification module 15 detects the control word present in the voice signal by means of the discrimination machine or machines, the specific action module 21 generates a specific action corresponding to the detected onomatopoeia.

The action modulation module 22 then makes it possible to control the action according to the continuous parameters of the voice signal extracted by the extraction module 17. Each of the continuous values can serve as a control axis for the action. For example, if the specific action generated by the module 21 following the detection is a character starting to run, the pitch of the voice signal can control the direction of movement, while the intensity can control the speed.
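The running-character example can be illustrated with a small mapping function. This is a sketch under stated assumptions: the `VoiceFrame` type, the pitch range of 100 to 400 Hz, the normalized intensity scale and the `max_speed` value are all hypothetical choices made for illustration, none of which come from the patent.

```python
from dataclasses import dataclass


@dataclass
class VoiceFrame:
    pitch_hz: float    # pitch extracted from the voice signal (assumed unit)
    intensity: float   # assumed to be normalized between 0.0 and 1.0


def clamp(value, low, high):
    return min(max(value, low), high)


def modulate_run(frame, pitch_range=(100.0, 400.0), max_speed=10.0):
    """Map pitch to a direction in [-1, 1] and intensity to a speed."""
    low, high = pitch_range
    t = clamp((frame.pitch_hz - low) / (high - low), 0.0, 1.0)
    direction = -1.0 + 2.0 * t          # low pitch: one way, high pitch: the other
    speed = max_speed * clamp(frame.intensity, 0.0, 1.0)
    return direction, speed


print(modulate_run(VoiceFrame(pitch_hz=250.0, intensity=0.5)))
```

Each continuous parameter drives one control axis independently, so further axes (duration, timbre) could be added in the same way.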

In some embodiments, several specific actions can be started in succession as the phonemes of the control word are detected one after another. For example, if a first P/K discrimination machine is used in combination with an A/I discrimination machine to detect the 4 onomatopoeias "Paf", "Pif", "Clac" and "Clic", then when the first consonant is detected, a first action can be commanded from among two possible first actions corresponding to the 2 groups "Paf+Pif" and "Clac+Clic". Then, when the vowel is detected, a second action can be commanded from among the 4 groups "Paf", "Pif", "Clac" and "Clic". An action can thus be generated as soon as the classification module 15 detects the first characteristic phoneme of the control word. This early triggering further reinforces the impression that the command acts instantaneously. The specific action can then be controlled continuously on the control phoneme "a" or "i" of the control word.
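The early triggering described above can be sketched as a generator that emits an action each time a detection phoneme is recognized. The action names and the function `progressive_actions` are hypothetical, and the input is assumed to be a stream of already-labelled phonemes.

```python
# Hypothetical action tables: 2 possible first actions, then 2 x 2 second actions.
FIRST_ACTIONS = {"p": "group Paf+Pif", "k": "group Clac+Clic"}
SECOND_ACTIONS = {
    ("p", "a"): "Paf",
    ("p", "i"): "Pif",
    ("k", "a"): "Clac",
    ("k", "i"): "Clic",
}


def progressive_actions(phonemes):
    """Yield an action as soon as each detection phoneme is recognized."""
    consonant = None
    for ph in phonemes:
        if consonant is None:
            if ph in FIRST_ACTIONS:
                consonant = ph
                yield FIRST_ACTIONS[ph]      # first action, before the word ends
        elif ph in ("a", "i"):
            yield SECOND_ACTIONS[(consonant, ph)]
            return                           # word identified, nothing more to emit


print(list(progressive_actions(["k", "l", "i", "k"])))
```

The first action is available after a single phoneme, which is what gives the impression of an immediate response.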

In a variant, the voice presence detection module 13 and the corresponding action module 20 can be omitted, so that an action is generated only once a detection phoneme has been detected by a discrimination machine.

Figure 2 shows tables giving, for a pair of control words, the detection phonemes used for the detection and the control phonemes used to control the action started following the detection.

The action that has been started can be controlled from a phoneme different from the one or ones used to detect the control word. Thus the words "bazooka" and "fusil" (rifle) are written in phonetic notation as [b] [a] [z] [u] [k] [a] and [f] [y] [z] [i] respectively. The detection phonemes [b] [a] [z] [u] [k] and [f] [y] [z] are used to detect one of the two words, while the control phoneme [a] controls the action started following the detection of the word "bazooka" and the control phoneme [i] controls the action started following the detection of the word "fusil".

The detected action can also be controlled from one of the phonemes used to classify the word. Thus the words "gauche" (left) and "droite" (right) are written in phonetic notation as [g] [o] [ʃ] and [d] [R] [u] [a] [t] respectively. The phonemes [o] and [a] are used both as detection phonemes of the control words (module 15) and as control phonemes (module 17) of the action corresponding to the control words. In this case, the detection/control phoneme triggers the action, and sustaining the sound of the detection/control phoneme makes it possible to control this action.

Furthermore, the action can be stopped either by a detection phoneme indicating the end of the action, or by detection of the end of the voice signal. Thus, for the pair "gauche"/"droite", it is the detection of the phoneme ending the word ([ʃ] for "gauche" or [t] for "droite") that ends the action, whereas for the pair "bazooka"/"fusil" the action is stopped when the pronunciation of the word ends (Figure 2). The action can also be ended when a predefined threshold is reached on any one of the prosodic parameters (for example, duration: the action is stopped when a period expires; intensity or pitch: the action stops when an intensity or pitch threshold is reached).
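The stopping conditions listed above can be gathered into a single predicate. This is only a sketch: the function name `should_stop`, its parameters and the threshold values used in the example are assumptions made for illustration.

```python
def should_stop(voice_present, phoneme, elapsed_s, intensity, *,
                end_phonemes=frozenset(), max_duration_s=None,
                max_intensity=None):
    """Return True when any configured stopping condition is met."""
    if not voice_present:                    # end of the voice signal
        return True
    if phoneme in end_phonemes:              # detection phoneme ending the word
        return True
    if max_duration_s is not None and elapsed_s >= max_duration_s:
        return True                          # duration threshold reached
    if max_intensity is not None and intensity >= max_intensity:
        return True                          # intensity threshold reached
    return False


# "droite": the final [t] ends the action while the voice is still present.
print(should_stop(True, "t", 0.4, 0.3, end_phonemes={"t"}))
```

For the "bazooka"/"fusil" pair, no end phoneme would be configured and the first test alone (voice no longer present) would end the action.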

In the remainder of the document and in the tables of Figures 3a to 3d, which show machines that can be used to discriminate between different control words, the following notation is used: the expression [x]+[y] denotes the cumulative recognition of the two phonemes [x] and [y], the word being detected once the phonemes [x] and [y] have both been detected by the discrimination machines, regardless of the order in which they were identified; the expression [x] - [y] denotes a successive cumulative recognition of the phonemes [x] and [y], the word being detected once the phoneme [x] and then the phoneme [y] have been detected by the discrimination machines; the expression ([x]) denotes a single optional phoneme [x]; and the expression {[x]+[y]} denotes a concatenation of phonemes, which corresponds to a long-duration [x] if [x]=[y].

Figure 3a shows the recognition of two words ("flip" and "flap") using a machine that discriminates between the two phonemes "I" and "A".

Figure 3b shows the recognition of two words ("gauche" and "droite") using a detection on the phonemes "G" and "D", or "O" and "A". To obtain a more robust detection, two discrimination machines can be combined to perform a cumulative recognition of two or more phonemes, of the type "G" + "O" and "D" + "A". Alternatively, a recognition of the type "O" and "OU or A" is performed; in this case, a specific machine is built in which the "OU" and the "A" are treated as one and the same phoneme. In the case of combined discrimination machines, the specific action can be started on the first detection phoneme, then continued if the second detection phoneme confirms the onomatopoeia detected in the first test, or stopped otherwise.

However, there are cases in which words cannot be recognized by combinations of simple elements without introducing a notion of time. An example is given in Figure 3c. Thus the word "babafi" cannot be differentiated from the word "fibaba" by considering only the simple phonemes "F", "I", "B", "A". The sequence of the recognized phonemes must therefore be taken into account to differentiate the word "babafi" from the word "fibaba". The detection of the phoneme "B" followed by the phoneme "F" distinguishes the word "babafi" from "fibaba", which is characterized by the phoneme "F" followed by the phoneme "B".
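The difference between unordered cumulative recognition ([x]+[y]) and ordered recognition ([x] - [y]) can be checked on the "babafi"/"fibaba" example. The two helper functions below are hypothetical and operate on lists of already-recognized phonemes.

```python
def matches_unordered(detected, required):
    """[x]+[y]: every required phoneme was detected, in any order."""
    return set(required) <= set(detected)


def matches_ordered(detected, required):
    """[x] - [y]: the required phonemes were detected in this order."""
    remaining = iter(detected)
    # "ph in remaining" consumes the iterator up to the first match, so each
    # required phoneme must be found after the previous one.
    return all(ph in remaining for ph in required)


babafi = ["b", "a", "b", "a", "f", "i"]
fibaba = ["f", "i", "b", "a", "b", "a"]

# Unordered recognition cannot tell the two words apart.
print(matches_unordered(babafi, ["b", "f"]), matches_unordered(fibaba, ["b", "f"]))
# Ordered recognition of "B" then "F" matches only "babafi".
print(matches_ordered(babafi, ["b", "f"]), matches_ordered(fibaba, ["b", "f"]))
```

Ordered matching here is a subsequence test: other phonemes may occur between the required ones, which is consistent with machines that ignore the phonemes they are not built to detect.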

It should also be noted that the granularity of the classification can vary from the word down to the phoneme.

This makes it possible, for example, to recognize two almost identical words, one having a control part and the other not, but also to differentiate two words whose control part uses the same phoneme. For example, Figure 3d shows two example implementations that recognize the words "plus", "pluuuuuuus", "minus" and "minuuuuuus".

With conventional word-based training, as shown in column 28, the system learns two classes: the short word without control ("plus" and "minus") and the long word with control ("pluuuuuus" and "minuuuuus"). Control is possible before the end of the word because the recognition system sends its recognition hypotheses while the word is being pronounced. The duration of the long word can vary.

With training optimized by characteristic phonemes, as shown in column 29, the system performs recognition phoneme by phoneme: [p], [l], [y], [s], [m], [i], [n] (corresponding to the letters P, L, U, S, M, I, N). The word is detected by recognizing the concatenation of its first characteristic phonemes; the piloting phase is then characterized by multiple, continuous recognition of the piloting phoneme [y] (letter U). In this example, detecting the phoneme [s] at the end of the word is optional. It can nevertheless be useful for specifying an action at the end of word recognition. Words with piloting can then be combined with recognition of words without piloting, and different actions can be assigned depending on whether or not the recognized word has a piloting phase.
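The phoneme-by-phoneme scheme of column 29 can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the recognizer is assumed to emit one phoneme label per analysis frame, and the names `PREFIXES`, `track` and `pilot_length`, as well as the event tuples, are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Characteristic prefixes taken from the example words (assumed encoding).
PREFIXES = {("p", "l"): "plus", ("m", "i", "n"): "minus"}
PILOT = "y"  # piloting phoneme (letter U)
END = "s"    # optional final phoneme

Event = Tuple[str, str, int]  # (kind, word, number of piloting frames so far)


def track(frames: Iterable[str]) -> List[Event]:
    """Turn a per-frame phoneme stream into detect / pilot / end events."""
    events: List[Event] = []
    prefix: List[str] = []
    word: Optional[str] = None
    held = 0
    for ph in frames:
        if word is None:
            if prefix and prefix[-1] == ph:
                continue  # same prefix phoneme held over several frames
            prefix.append(ph)
            word = PREFIXES.get(tuple(prefix))
            if word is not None:
                events.append(("detect", word, 0))
            elif not any(k[:len(prefix)] == tuple(prefix) for k in PREFIXES):
                # No word starts this way: restart, keeping ph if it can open a word.
                prefix = [ph] if any(k[0] == ph for k in PREFIXES) else []
        elif ph == PILOT:
            held += 1  # each recognized [y] frame extends the piloting phase
            events.append(("pilot", word, held))
        elif ph == END:
            events.append(("end", word, held))
            prefix, word, held = [], None, 0
    return events


def pilot_length(events: List[Event]) -> int:
    """Longest piloting count seen, usable to tell 'plus' from 'pluuuus'."""
    return max((n for kind, _, n in events if kind == "pilot"), default=0)
```

A caller could compare `pilot_length` against a frame-count threshold of its own choosing to decide whether the word had a piloting phase; the patent does not specify such a threshold.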

Figure 4 gives an example of implementation of the method according to the invention in a shooting game in which two actions can be commanded by two distinct words. The word "Fusil" (rifle) commands a rifle shot, while the word "Bazooka" commands a bazooka shot. The prosodic parameters of the end of these words control, for example, the direction of the shot, its intensity, or any other characteristic of the shot.

For this purpose, a system 35 comprises a classification module 37 comprising a machine 38 for discriminating the vowels [u]/[y], a specific action module 41 that maps the word detected by module 37 to the action to be triggered, and two control machines 44 and 45 (one per action that can be triggered) for controlling the triggered action continuously or discretely. A microphone 47 associated with an analog/digital converter 48 is connected to the classification module 37 and to the control machines 44, 45 in order to route the digitized voice signal to these modules.

Thus, as shown, when a user 50 pronounces the word "bazooka" into the microphone 47, the machine 38 detects the presence of the phoneme [u], referenced 51, and the module 41 then triggers the bazooka shot corresponding to the detected phoneme.
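The role of the [u]/[y] discrimination in Figure 4 can be sketched as a lookup on the first discriminating vowel. This is a simplified illustration, not the patent's implementation: the phoneme lists are rough transcriptions assumed for the example, and `ACTION_BY_VOWEL` and `first_action` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# Assumed mapping: [y] occurs in "fusil", [u] occurs in "bazooka".
ACTION_BY_VOWEL = {"y": "rifle_shot", "u": "bazooka_shot"}


def first_action(phonemes: Sequence[str]) -> Optional[Tuple[str, int]]:
    """Return (action, index) for the first discriminating vowel, if any."""
    for index, ph in enumerate(phonemes):
        action = ACTION_BY_VOWEL.get(ph)
        if action is not None:
            # The action is known here, before the end of the word is heard.
            return action, index
    return None
```

Because the decision depends on a single vowel, the action can be triggered while the remainder of the word is still being pronounced, leaving the final [a] of "bazooka" available for piloting.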

The control module 45 then extracts parameters of the voice signal, in particular of the phoneme [a] referenced 53, such as duration, pitch, intensity, timbre or combinations of these parameters, and, where appropriate, performs statistical calculations on these parameters to generate a set 55 of continuous or discrete parameters.

These parameters 55 are then used by the module 45 to modulate the triggered bazooka shot. With discrete control, emitting a sound that is high or low relative to a threshold directs the shot to the left or to the right. With continuous control, pronouncing the [a] of "bazooka" more or less loudly makes it possible, for example, to vary the intensity of the shot or its direction continuously.
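The two control styles can be sketched as two small mapping functions. All numeric values below (the 200 Hz pitch threshold, the -40 dB to 0 dB range) and the choice of which side a high pitch maps to are assumptions made for the example; the patent does not specify them.

```python
def discrete_direction(pitch_hz: float, threshold_hz: float = 200.0) -> str:
    """Discrete control: pitch above or below a threshold picks a side.

    Mapping high pitch to "right" is an arbitrary choice for this sketch.
    """
    return "right" if pitch_hz >= threshold_hz else "left"


def continuous_intensity(level_db: float,
                         floor_db: float = -40.0,
                         ceil_db: float = 0.0) -> float:
    """Continuous control: map loudness linearly onto [0, 1], clamped."""
    x = (level_db - floor_db) / (ceil_db - floor_db)
    return min(1.0, max(0.0, x))
```

The discrete function yields one of two outcomes per measurement, whereas the continuous function yields a value that follows the voice level smoothly, which matches the distinction drawn between the two control modes.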

In a variant, the system also includes modules 13 and 20, so that the game character can perform an action, such as cocking his weapon, as soon as a sound is detected, even before it is known which type of weapon (bazooka or rifle) will be commanded.

The invention can be generalized to multi-syllabic words. For example, the vocabulary may consist of "bateau" and "couper"; from the "b" of "bateau", the word can be differentiated from "couper". The "a" of "bateau" can then be held to generate continuous control, the next state is reached when "t" is pronounced, and finally the action is piloted for as long as "o" is pronounced. This notion of looping is symbolized by the arrow 2 in Figure 1.

The control method according to the invention can be used as the sole mode of control, but also in a multimodal setting, for example in combination with a game pad, a keyboard, a motion recognition system such as that implemented in the Wii console (registered trademark) or the Eyetoy games (registered trademark), or an eye tracking system.

In the control method according to the invention, one or more of the words of the control vocabulary can be replaced by a sound such as a hand clap. An example of a control vocabulary is {"Up", "Down", hand-clap sound}.

For an efficient implementation in a video game system, the mapping outputs of modules 20 to 22 are synchronized with the video frames, that is, with the screen refresh rate, which is conventionally 60 frames per second. The video game programmer can influence game actions at each screen refresh, i.e. once every 1/60 of a second. It is therefore useful to provide control parameters synchronized with this period or with multiples of this period.

It will be noted that the voice signal is processed at a frequency determined by the acoustics of the phenomena to be analyzed, a frequency unrelated to the 60 frames per second; a synchronization system retrieves the latest available voice analysis information in synchrony with the video refresh rate.
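One simple way to picture this synchronization is to let each video frame read the most recent analysis result available at that instant. The sketch below assumes timestamped analysis results sorted by time; the names `latest_at` and `sample_per_frame` and the analysis rate used in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def latest_at(times: Sequence[float], values: Sequence[float],
              t: float) -> Optional[float]:
    """Most recent analysis value with timestamp <= t (times sorted ascending)."""
    i = bisect_right(times, t)
    return values[i - 1] if i else None


def sample_per_frame(times: Sequence[float], values: Sequence[float],
                     n_frames: int, fps: float = 60.0) -> List[Optional[float]]:
    """One control value per video frame, read at the frame's start time."""
    return [latest_at(times, values, k / fps) for k in range(n_frames)]
```

For instance, if the voice analysis produced a value every 10 ms, each frame at 60 frames per second would pick up whichever value was last produced before the frame began, and some analysis values would simply be skipped.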

However, it is also possible to synchronize at higher frequencies.

Claims

1. Method for controlling an application, for example a video game, by a voice signal (4), characterized in that it comprises the following steps:
- defining a control vocabulary, that is to say a set of control words, preferably onomatopoeias, each having a characteristic phoneme or a plurality of characteristic phonemes making it possible to distinguish each word from the other words of the control vocabulary,
- detecting (15) a control word contained in the voice signal (4) by means of at least one discrimination machine (38) for detecting and discriminating the pronunciation of different phonemes while possibly excluding a set of other phonemes, this discrimination machine (38) ensuring for example the distinction between at least two characteristic phonemes or the distinction between at least two groups of characteristic phonemes,
- generating (21, 41) a specific action corresponding to the detected control word, then
- driving (17, 44, 45) the specific action generated as a function of at least one continuous magnitude (55) of the voice signal, such as the duration, intensity, pitch or timbre of the signal.

2. Method according to claim 1, characterized in that the discrimination machine detects the phoneme or group of phonemes characteristic of the control word and generates the specific action before the end of the pronunciation of the control word.

3. Method according to claim 1 or 2, characterized in that the continuous magnitude of the voice signal (4) according to which the action is driven is located in time after the detection phoneme detected by the discrimination machine (38).

4. Method according to one of claims 1 to 3, characterized in that as soon as the presence of a voice signal (4) is detected (13), even before the control word is recognized, a generic action is generated.

5. Method according to claim 4, characterized in that the specific action may be different from the generic action.

6. Method according to one of claims 1 to 5, characterized in that, the method employing at least two discrimination machines, several specific actions are engaged successively as each of the machines successively detects characteristic phonemes or groups of characteristic phonemes in the control word.

7. Method according to claim 6, characterized in that, two discrimination machines being used,
- a first specific action is generated after the detection of the first detection phoneme of the control word by a first discrimination machine capable of discriminating N distinct phonemes, so that a first action can be generated among N first possible actions, and
- a second specific action is generated after the detection of the second characteristic phoneme of the control word by a second discrimination machine capable of discriminating M distinct phonemes, so that this second action can be generated among N x M second possible actions.

8. Method according to one of claims 1 to 7, characterized in that the specific action ends:
- as soon as a characteristic phoneme or group of characteristic phonemes is detected, or
- as soon as one of the continuous parameters of the voice signal crosses a certain threshold, for example as soon as the voice signal disappears.

9. Method according to one of claims 1 to 8, characterized in that between 1 and 7 discrimination machines are implemented in order to differentiate between 1 and 15 control words.

10. Method according to one of claims 1 to 9, characterized in that the discrimination machine uses hidden Markov models.

11. Method according to one of claims 1 to 10, characterized in that the actions are generated and driven in synchrony with the video frames of a game to be controlled.

12. Method according to one of claims 1 to 11, characterized in that it is implemented in combination with a control method using another control interface, such as a game pad, a keyboard, an eye tracking device, or a motion tracking device.

13. Method according to one of claims 1 to 12, characterized in that, to resist noisy environments, a contact microphone is used.

14. Device for controlling an application, for example a video game, by a voice signal (4), characterized in that, a control vocabulary, that is to say a set of control words, preferably onomatopoeias, being defined, each control word having a characteristic phoneme or several characteristic phonemes making it possible to distinguish it from the other words of the control vocabulary, the device comprises:
- means (15) for detecting a control word contained in the voice signal (4), said means (15) comprising at least one discrimination machine (38) for detecting and discriminating the pronunciation of different phonemes while possibly excluding a set of other phonemes, said discrimination machine (38) ensuring, for example, the distinction between at least two characteristic phonemes or the distinction between at least two groups of characteristic phonemes,
- means (21, 41) for generating a specific action corresponding to the detected control word, and
- means (17, 44, 45) for driving the specific action generated as a function of at least one continuous magnitude (55) of the voice signal, such as the duration, intensity, pitch or timbre of the signal.