FR3078573A1

FR3078573A1 - LEXICAL ANALYSIS METHOD

Info

Publication number: FR3078573A1
Application number: FR1851876A
Authority: FR
Inventors: Yohann Lebret; Nathanael Delahaye
Original assignee: Delta Dore SA
Current assignee: Delta Dore SA
Priority date: 2018-03-05
Filing date: 2018-03-05
Publication date: 2019-09-06
Anticipated expiration: 2038-03-05
Also published as: FR3078573B1

Abstract

The invention relates to a method for the lexical analysis of a character string, implemented by an electronic device comprising a memory in which a list of predetermined character substrings is stored. The electronic device is adapted to receive the character string as input and to store in the memory, as output, an ordered list of the predetermined character substrings identified in the character string. The method comprises the iterative steps of determining, by means of a first hash (French "condensat"), whether characters of the character string correspond to the possible start of a tested predetermined character substring and, if so, determining whether the hash of the tested substring matches the hash of the substring of the same length taken from the character string.

Description

The present invention relates to the field of semantic analysis in the context of speech recognition. More particularly, the invention relates to a method for the lexical analysis of a character string.

The lexical analysis of a string of characters (or symbols) consists in transforming the character string, that is, a sequence of characters, into an ordered list of words, that is, character substrings having a predetermined meaning. In a speech recognition context, the character string may be the result of a speech recognition algorithm. It must then be possible to recognize any predetermined keywords within this character string.

It is therefore necessary to search the character string for predetermined character substrings. However, a conventional search algorithm requires a computation time that grows non-linearly with the number of substrings to be searched. This computation time, which can become very large, is particularly to be avoided in a system where speech recognition is used for voice control. If the lexical analysis time becomes significant, it introduces latency in the response of the voice control system, and this latency can disturb a user of the system.

Algorithms have been proposed to address this problem, for example the Rabin-Karp algorithm or the Boyer-Moore algorithm. However, these algorithms do not provide all the characteristics expected in a voice control system, namely:

- the ability to search for multiple substrings without the computation time growing non-linearly,

- no reliance on a computationally intensive preliminary phase, so that the list of substrings to be searched can be modified without rerunning such intensive computations,

- adaptability of the algorithm to the length of the substrings to be searched,

- adaptability of the algorithm to the length of the character string to be processed.

An object of the present invention is to propose a method for the lexical analysis of a character string that does not have the drawbacks of the prior art.

The invention relates to a method for the lexical analysis of a character string, the method being executed by an electronic device, the electronic device comprising a memory in which a list of predetermined character substrings is stored, the electronic device being adapted to receive a character string as input and to store in the memory, as output, an ordered list of the substrings included in the character string, the method comprising the preliminary steps of:

- for each predetermined character substring, determining a hash, called the prefix hash, as a function of the first P characters of said substring, P being predetermined,

- for each predetermined character substring, determining, in a table, called the hash table, stored in the memory of the electronic device, an entry corresponding to the prefix hash associated with said substring,

- for each entry of the hash table corresponding to a prefix, determining an ordered list of the associated predetermined substrings, each predetermined substring being associated with a hash, called the substring hash, that is a function of the predetermined substring, the list being ordered by decreasing length of the predetermined substrings, and

- receiving the character string and determining a position of a pointer on the character string, the position initially corresponding to the first character of the character string,

the method further comprising the iterative steps of:

- determining a first hash of the substring formed by P characters of the character string starting from the position of the pointer,

- determining whether a prefix hash entry of the hash table corresponds to this first hash,

- if so, for each length of the predetermined substrings of the list associated with said prefix hash entry, and in the order of the list, determining a hash of the substring formed by a number of characters of the character string equal to said length starting from the position of the pointer, and, if said hash is equal to a substring hash associated with one of the predetermined substrings of the same length, then:

o verifying that said predetermined substring is equal to the substring formed by a number of characters of the character string equal to said length starting from the position of the pointer, and, if so,

o recording said substring, called the identified substring, in the ordered list of the predetermined substrings included in the character string,

o positioning the pointer on the character string at a new position equal to the current position plus a number of characters equal to the length of the identified substring,

- if not, positioning the pointer on the character string at a new position corresponding to a shift of one character along the character string.

Advantageously, this method performs the lexical analysis of a character string with less computation time than a method that requires a character-by-character comparison of a character substring to determine whether or not said substring is present in the character string. The computation time thus grows linearly, rather than non-linearly, with the size of the list of predefined character substrings, which ensures faster lexical analysis.
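The preliminary and iterative steps set out above can be sketched as follows. This is only an illustrative reading of the method, not the patented implementation: the use of CRC-32 as the hash function, the names `digest`, `build_table` and `analyze`, and the choice to skip keywords shorter than P are all assumptions that do not come from the text.

```python
from zlib import crc32

P = 3  # prefix length suggested by the description


def digest(text):
    # Stand-in hash function: CRC-32 is an assumption, the text names none.
    return crc32(text.encode("utf-8"))


def build_table(keywords):
    """Preliminary steps: index each keyword under the hash of its prefix."""
    table = {}
    for word in keywords:
        if len(word) < P:
            continue  # assumption: keywords shorter than P are skipped
        record = (len(word), digest(word), word)
        table.setdefault(digest(word[:P]), []).append(record)
    for entries in table.values():
        entries.sort(key=lambda entry: entry[0], reverse=True)  # longest first
    return table


def analyze(text, table):
    """Iterative steps: single left-to-right pass over the input string."""
    found, pos = [], 0
    while pos + P <= len(text):
        match = None
        window_hashes = {}  # one hash per distinct candidate length
        for length, word_hash, word in table.get(digest(text[pos:pos + P]), []):
            window = text[pos:pos + length]
            if len(window) < length:
                continue  # candidate would run past the end of the input
            if length not in window_hashes:
                window_hashes[length] = digest(window)
            # Compare hashes first, then confirm with a full string comparison.
            if window_hashes[length] == word_hash and window == word:
                match = word
                break
        if match is None:
            pos += 1  # no keyword starts here: shift by one character
        else:
            found.append(match)
            pos += len(match)  # jump past the identified substring
    return found
```

With the hypothetical keyword list `["ETAGE", "ETAT", "LUMIERE", "ALLUME"]`, calling `analyze("ALLUME LA LUMIERE ETAGE", build_table(...))` returns `["ALLUME", "LUMIERE", "ETAGE"]`, in order of appearance. Because each list is sorted longest first, the longest keyword sharing a prefix is tried before shorter ones.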

According to a complementary embodiment of the invention, the method comprises the subsequent steps of updating the list of predetermined character substrings:

- receiving a substring,

- determining the hash, called the prefix hash, as a function of the first P characters of said substring,

- determining an entry in the hash table corresponding to said prefix hash,

- recording the substring in the ordered list of substrings associated with the determined entry, the position of the substring in the list depending on the length of the substring,

- determining a hash of the substring and recording it in the list in association with the substring.
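The update steps above can be sketched as a single insertion that keeps each list ordered by decreasing length, so that no global recomputation is needed when a keyword is added. The function name `add_keyword`, the CRC-32 hash, the tuple layout and the rejection of keywords shorter than P are assumptions made for this illustration.

```python
from zlib import crc32

P = 3  # prefix length suggested by the description


def digest(text):
    # Stand-in hash function: CRC-32 is an assumption, the text names none.
    return crc32(text.encode("utf-8"))


def add_keyword(table, word):
    """Insert `word` into `table`, a dict mapping a prefix hash to a list of
    (length, substring hash, substring) tuples sorted by decreasing length."""
    if len(word) < P:
        # Assumption: the text does not say how shorter keywords are handled.
        raise ValueError("keyword shorter than the prefix length P")
    entries = table.setdefault(digest(word[:P]), [])
    # Find the first record strictly shorter than the new keyword.
    index = 0
    while index < len(entries) and entries[index][0] >= len(word):
        index += 1
    entries.insert(index, (len(word), digest(word), word))


table = {}
for keyword in ["ETAT", "ETAGERE", "ETAGE", "LUMIERE"]:  # hypothetical keywords
    add_keyword(table, keyword)
```

Only the list attached to one prefix entry is touched by each insertion, which matches the stated goal of modifying the keyword list without an intensive preliminary phase.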

The invention also relates to an electronic device for the lexical analysis of a character string, the electronic device comprising a memory in which a list of predetermined character substrings is stored, the electronic device being adapted to:

- for each predetermined character substring, determine a hash, called the prefix hash, as a function of the first P characters of said substring, P being predetermined,

- for each predetermined character substring, determine, in a table, called the hash table, stored in the memory of the electronic device, an entry corresponding to the prefix hash associated with said substring,

- for each entry of the hash table corresponding to a prefix, determine an ordered list of the associated predetermined substrings, each predetermined substring being associated with a hash, called the substring hash, that is a function of the predetermined substring, the list being ordered by decreasing length of the predetermined substrings, and

- receive a character string and determine a position of a pointer on the character string, the position initially corresponding to the first character of the character string,

the method further comprising the iterative steps of:

- determining a first hash of the substring formed by P characters of the character string starting from the position of the pointer,

- determining whether a prefix hash entry of the hash table corresponds to this first hash,

- if so, for each length of the predetermined substrings of the list associated with said prefix hash entry, and in the order of the list, determining a hash of the substring formed by a number of characters of the character string equal to said length starting from the position of the pointer, and, if said hash is equal to a substring hash associated with one of the predetermined substrings of the same length, then:

o verifying that said predetermined substring is equal to the substring formed by a number of characters of the character string equal to said length starting from the position of the pointer, and, if so,

o recording said substring, called the identified substring, in an ordered list of the predetermined substrings included in the character string, said list being stored in the memory of the electronic device,

o positioning the pointer on the character string at a new position equal to the current position plus a number of characters equal to the length of the identified substring,

- if not, positioning the pointer on the character string at a new position corresponding to a shift of one character along the character string.

The invention also relates to a computer program comprising instructions for implementing, by a processor of an electronic device, a lexical analysis method described in this document when the computer program is executed by the processor.

The invention also relates to a recording medium, readable by an electronic device, on which said computer program is stored.

- Fig. 1 is a flowchart of a method for the lexical analysis of a character string according to an embodiment of the invention,

- Figs. 2A to 2J illustrate the implementation of the method for the lexical analysis of a character string according to an embodiment of the invention,

- Fig. 3 is a flowchart for updating a hash table for a method for the lexical analysis of a character string according to an embodiment of the invention,

- Fig. 4 schematically shows the hardware architecture of an electronic device adapted to implement a method for the lexical analysis of a character string according to an embodiment of the invention.

Fig. 1 is a flowchart of a method 100 for the lexical analysis of a character string according to an embodiment of the invention. The lexical analysis method 100 can be executed by an electronic device as described below with reference to Fig. 4. Said electronic device comprises a memory in which a list of predetermined character substrings is stored. These substrings correspond to predetermined words, or "keywords", that the lexical analysis method 100 aims to find (or identify) within the character string. The electronic device is adapted to receive the character string as input. According to one embodiment of the invention, the character string comes from a speech recognition module of the electronic device. The electronic device is adapted to store in a memory, as output, an ordered list of the substrings included in the character string. In other words, the electronic device implementing the lexical analysis method 100 receives the character string as input and determines, as output, a list of the predetermined character substrings identified in the character string. This list is ordered in the sense that the character substrings appear in it in the same order as they appear in the character string. The method can be implemented by an electronic device on a character string of finite length. According to another embodiment of the invention, the lexical analysis method 100 can be implemented continuously on a "buffer" type character string that is constantly fed, for example by a speech recognition module.

Advantageously, the method traverses the character string to be analyzed only once, without any backtracking. As described below, at each tested position the method tests for the presence of one of the predetermined substrings. The method relies on a hash table, advantageously stored in a memory of the electronic device implementing the method. This hash table can be initially predetermined. The hash table can also be extended by adding new character substrings. The method for adding a character substring to the hash table is described more particularly below with reference to Fig. 3.

The present description of Fig. 1 assumes that a predetermined hash table is stored in the memory of the electronic device implementing the method.

A hash table comprises a plurality of entries, each entry corresponding to a hash (French "condensat", that is, the result of a hash function) of a prefix of a character substring, together with said prefix. A prefix means the first P characters of a character substring. The parameter P therefore defines a prefix length. The value of P must be as small as possible, so that short substrings can be handled, while remaining large enough to discriminate well between hashes. P is advantageously set to 3, a value that offers a good compromise. Thus, for example, with P = 3 the prefix of the character substring "ETAGE" (French for "floor") is "ETA". Each entry corresponding to a prefix is associated with a list of the substrings that start with this prefix. This list is ordered by substring size, from the longest to the shortest. The length of each character substring is recorded in the list, together with a hash of the character substring.
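One possible in-memory layout for such an entry is sketched below. The class names `KeywordRecord` and `PrefixEntry`, the helper `make_entry`, the CRC-32 hash and the keywords other than "ETAGE" are assumptions chosen for illustration; the text only specifies what an entry must contain.

```python
from dataclasses import dataclass, field
from zlib import crc32

P = 3  # prefix length suggested by the description


def digest(text):
    # Stand-in hash function: CRC-32 is an assumption, the text names none.
    return crc32(text.encode("utf-8"))


@dataclass
class KeywordRecord:
    text: str    # the predetermined substring
    length: int  # its length, used to order the list
    hash: int    # hash of the whole substring


@dataclass
class PrefixEntry:
    prefix: str                                  # first P characters
    records: list = field(default_factory=list)  # longest substring first


def make_entry(prefix, keywords):
    """Collect the keywords starting with `prefix`, longest first."""
    records = [
        KeywordRecord(word, len(word), digest(word))
        for word in keywords
        if word[:P] == prefix
    ]
    records.sort(key=lambda record: record.length, reverse=True)
    return PrefixEntry(prefix, records)


# "ETAGE" comes from the text; the other keywords are hypothetical.
entry = make_entry("ETAGE"[:P], ["ETAT", "ETAGE", "ETAGERE", "LUMIERE"])
table = {digest(entry.prefix): entry}  # the table is keyed by the prefix hash
```

Storing the length and the substring hash next to each keyword means neither has to be recomputed while the input string is scanned.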

An example of a hash table is given below:

Prefix hash      Prefix     Character substring
                            Substring     Length                  Hash
Hash(Prefix A)   Prefix A   Substring 1   Length of substring 1   Hash(substring 1)
                            Substring 2   Length of substring 2   Hash(substring 2)
                            Substring 3   Length of substring 3   Hash(substring 3)
Hash(Prefix B)   Prefix B   Substring 4   Length of substring 4   Hash(substring 4)
Hash(Prefix C)   Prefix C   Substring 5   Length of substring 5   Hash(substring 5)
                            Substring 6   Length of substring 6   Hash(substring 6)

Table 1 - example of a hash table

In the example hash table shown in Table 1, "substring i", with i = 1 to 6, denotes six different character substrings.

The three substrings 1, 2 and 3 share the same prefix, "prefix A". That is, these three substrings start with the same first P characters. These three substrings are associated with prefix A. In other words, a list of substrings is associated with prefix A, this list here comprising the three character substrings 1, 2 and 3. Similarly, substrings 5 and 6 share the same prefix, "prefix C", that is, they start with the same first P characters, which correspond to "prefix C". The prefix is given in the second column, "Prefix". The first column, "Prefix hash", contains the result of a hash function applied to said prefix (a function denoted "Hash()").

The "Length" column contains the length of the corresponding substring. This advantageously makes it possible to order each list of substrings associated with a prefix by decreasing length. The last column, "Hash", contains a hash of the corresponding substring. The hash function used to determine the hash of the prefix and the hash function used to determine the hash of a substring may be different.

In other words, the entry corresponding to prefix C is associated with a list here comprising two substrings, substring 5 and substring 6. This list further comprises, for each substring, the length of the substring and a hash of the substring. The list is ordered.

Thus, in the example given, the length of substring 5 is greater than or equal to the length of substring 6.
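By way of illustration only, the table structure described above can be sketched in Python as a dictionary keyed by prefix hash. The names `digest`, `Entry` and `build_table` are hypothetical, CRC-32 merely stands in for the unspecified hash function (and is used here for both prefixes and substrings, although the description allows two different functions), P = 3 is an assumed value, and the three words are borrowed from the example of Table 2 given further on.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List

P = 3  # prefix length; assumed value for illustration


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); the description does not name a function."""
    return zlib.crc32(text.encode("utf-8"))


@dataclass(frozen=True)
class Entry:
    substring: str
    length: int
    substring_hash: int


def build_table(substrings: List[str]) -> Dict[int, List[Entry]]:
    """Group substrings under the hash of their first P characters."""
    table: Dict[int, List[Entry]] = {}
    for s in substrings:
        table.setdefault(digest(s[:P]), []).append(Entry(s, len(s), digest(s)))
    for entries in table.values():
        # Each list is ordered by decreasing substring length.
        entries.sort(key=lambda e: e.length, reverse=True)
    return table


table = build_table(["ETAL", "ETAGERE", "ETAGE"])
print([e.substring for e in table[digest("ETA")]])
# ['ETAGERE', 'ETAGE', 'ETAL']
```

Storing the length and the substring hash next to each substring means neither has to be recomputed during the analysis.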

Once the hash table has been defined or predetermined, the lexical analysis method 100 can be implemented.

In a first step (not shown), the electronic device receives a character string to be analyzed. This character string may be supplied by a voice recognition module of the electronic device.

In a step 100, the electronic device positions a pointer on the first character of the character string. This position may correspond, for example, to an index "n = 0" (see for example Fig. 2A).

In a next step 105, the electronic device determines a hash of the substring formed by the first P characters of the character string starting from the current position of the pointer. In other words, if the pointer is at position "n" of the character string, and if "P = 3", then the electronic device determines a hash of the three-character substring comprising the characters at positions "n", "n+1" and "n+2".
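Step 105 can be sketched as follows, purely as an illustration. The function names `digest` and `prefix_hash_at`, the use of CRC-32 as hash function and the input string are all assumptions made for this sketch.

```python
import zlib

P = 3  # prefix length, as in the "P = 3" case mentioned above


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption, not the function of the method."""
    return zlib.crc32(text.encode("utf-8"))


def prefix_hash_at(text: str, n: int, p: int = P) -> int:
    """Hash of the characters at positions n, n+1, ..., n+p-1."""
    return digest(text[n:n + p])


sample = "AAETAGERE"  # hypothetical input string
print(sample[2:2 + P])                             # ETA
print(prefix_hash_at(sample, 2) == digest("ETA"))  # True
```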

In a next step 110, the electronic device determines whether the hash determined in the previous step 105 matches one of the "prefix" hashes of the hash table (see Fig. 2C or 2F). The electronic device therefore traverses the hash table to search for a prefix hash equal to the hash calculated in the preceding step 105. Comparing hashes rather than character strings speeds up the comparison and reduces the computation time required. Indeed, the comparison is then made on hashes, i.e. numbers, rather than on character strings, which would have to be compared character by character. The comparison is thus faster and/or requires less computation time.

If no hash matches, then the electronic device proceeds to step 115. In this step 115, the electronic device increments the position of the pointer on the character string in order to test the next position (see the transition from Fig. 2A to Fig. 2B, or from Fig. 2E to Fig. 2F). In other words, if the pointer was at position "n" of the character string, this position is incremented by one, "n" taking the value "n+1". The method then continues with a new step 105.
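The loop formed by steps 105, 110 and 115 can be sketched as below. This is a minimal sketch under assumptions: `next_candidate` is a hypothetical helper, CRC-32 stands in for the hash function, and the prefix hashes are held in a Python set rather than traversed one by one.

```python
import zlib
from typing import Optional, Set

P = 3  # prefix length; assumed value


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption of this sketch."""
    return zlib.crc32(text.encode("utf-8"))


def next_candidate(text: str, start: int, prefix_hashes: Set[int]) -> Optional[int]:
    """Return the first position >= start whose P-character hash is known."""
    n = start
    while n + P <= len(text):
        # Step 110: the comparison is made on integers, not on characters.
        if digest(text[n:n + P]) in prefix_hashes:
            return n
        n += 1  # step 115: move the pointer to the next position
    return None


known = {digest("ETA")}
print(next_candidate("AAETAGERE", 0, known))  # 2
```

With the hypothetical input "AAETAGERE", positions 0 ("AAE") and 1 ("AET") are rejected and position 2 ("ETA") is retained.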

If a prefix hash is found (see Fig. 2C or 2F) whose value is identical to the hash calculated in step 105, then the electronic device proceeds to step 120. In this step 120, the electronic device retrieves the ordered list of substrings associated with the prefix corresponding to the prefix hash found. Steps 125 and 130 are then executed for each substring of the list.

In a step 125, successively for each length of the substrings in the list associated with said prefix hash entry, and in the order of this list, the electronic device determines a hash of the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer. In other words, the electronic device determines, starting with the length of the longest substring in the list, a hash of the substring of the same length formed by the characters of the character string starting from the current position of the pointer.

If the two hashes do not match, then, in a step 130, the electronic device selects the substring of the next smaller size and repeats step 125. If the two hashes match, and the hash is associated with several substrings in the list, then these substrings are tested (in step 125) one after the other, in the order in which they appear in the list. For substrings of identical length, this order is of little importance.

If no hash determined in step 125 matches a hash of a substring in the list, and once the list has been traversed in its entirety, the electronic device then proceeds to step 115 in order to increment the position of the pointer. This means that no substring from the list of substrings associated with the prefix hash has been identified, and the method resumes at step 105 from the next position (n = n+1) of the character string.

If, on the contrary, the hash determined in step 125 matches a hash of a substring in the list associated with the prefix hash, this means that said substring has very probably been identified in the character string. Depending on the embodiment of the invention, an optional step 135 is performed to verify the equality between the identified substring and the substring of the same length formed by the characters of the character string starting from the current position of the pointer. In other words, the electronic device verifies that the substring retrieved from the hash table is equal to the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer. This step is optional, depending in particular on the hash function used. Its purpose is to guard against collisions of the hash function.
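Steps 125 to 135 can be sketched as follows. This is an illustrative sketch only: `match_at` is a hypothetical helper, CRC-32 stands in for the hash function, the candidate list uses the words of Table 2 given further on, and skipping a candidate when fewer characters remain than its length is an assumption not stated in the description.

```python
import zlib
from typing import List, Optional, Tuple

Candidate = Tuple[str, int, int]  # (substring, length, substring hash)


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption of this sketch."""
    return zlib.crc32(text.encode("utf-8"))


def match_at(text: str, n: int, candidates: List[Candidate],
             verify: bool = True) -> Optional[str]:
    """Test the candidates, longest first, against the text at position n."""
    for substring, length, substring_hash in candidates:
        window = text[n:n + length]
        if len(window) < length:
            continue  # assumption: too few characters left, try a shorter one
        if digest(window) != substring_hash:
            continue  # step 130: go on to the next candidate
        if verify and window != substring:
            continue  # optional step 135: the hash match was a collision
        return substring
    return None


# Candidate list already ordered by decreasing length.
eta_list = [(w, len(w), digest(w)) for w in ("ETAGERE", "ETAGE", "ETAL")]
print(match_at("AAETAGERE", 2, eta_list))  # ETAGERE
print(match_at("XETALX", 1, eta_list))     # ETAL
```

Leaving `verify` enabled makes the result independent of the collision behaviour of the chosen hash function, at the cost of one character-by-character comparison per hash match.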

Once a substring of the hash table has been identified in the character string, the electronic device records said substring in the memory. Advantageously, said substring is recorded in an ordered list corresponding to a list of the substrings of the hash table that have been identified in the character string. The electronic device may also record the position in the character string (i.e. the value of "n"), and possibly the length, of each substring thus identified.

This ordered list, not to be confused with the ordered lists of the hash table, ultimately constitutes the result of the lexical analysis method. The lexical analysis method 100 makes it possible to record in this list the substrings that have been identified (or recognized).

Once a substring has been recorded in the memory, the electronic device proceeds to step 145. In this step 145, the position of the pointer on the character string is moved forward by a number of characters corresponding to the length of the identified substring. In other words, following step 145, the lexical analysis method 100 continues from the first character of the character string after the substring that has just been identified (see Fig. 2J).
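Putting the steps described above together, the whole method 100 might be sketched as follows. This is a minimal sketch, not a definitive implementation: the names `digest`, `Match`, `build_table` and `analyse` are hypothetical, CRC-32 stands in for the hash function, P = 3 is assumed, and the vocabulary and input string are invented for the example.

```python
import zlib
from typing import Dict, List, NamedTuple, Tuple

P = 3  # prefix length; assumed value


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption of this sketch."""
    return zlib.crc32(text.encode("utf-8"))


class Match(NamedTuple):
    position: int
    substring: str


Table = Dict[int, List[Tuple[str, int, int]]]


def build_table(words: List[str]) -> Table:
    table: Table = {}
    for w in words:
        table.setdefault(digest(w[:P]), []).append((w, len(w), digest(w)))
    for entries in table.values():
        entries.sort(key=lambda e: e[1], reverse=True)  # longest first
    return table


def analyse(text: str, table: Table, verify: bool = True) -> List[Match]:
    found: List[Match] = []
    n = 0  # step 100: pointer on the first character
    while n + P <= len(text):
        entries = table.get(digest(text[n:n + P]), [])  # steps 105 and 110
        hit = None
        for word, length, word_hash in entries:  # steps 120 to 130
            window = text[n:n + length]
            if len(window) == length and digest(window) == word_hash:
                if not verify or window == word:  # optional step 135
                    hit = word
                    break
        if hit is None:
            n += 1  # step 115
        else:
            found.append(Match(n, hit))  # record the substring and position
            n += len(hit)  # step 145
    return found


vocabulary = build_table(["LIGHT", "LIGHTS", "OPEN"])  # hypothetical words
print(analyse("OPENLIGHTSNOW", vocabulary))
# [Match(position=0, substring='OPEN'), Match(position=4, substring='LIGHTS')]
```

Because each list is ordered by decreasing length, "LIGHTS" is tested before "LIGHT", so the longest substring at a given position is the one recorded.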

Figs. 2A to 2J illustrate the implementation of the method 100 of lexical analysis of a character string according to an embodiment of the invention.

A predetermined hash table is stored in the memory of the electronic device implementing the method. Said hash table is shown below:

| Hash | Prefix | Substring | Length | Hash |
|---|---|---|---|---|
| Hash(ETA) | ETA | ETAGERE | 7 | Hash(ETAGERE) |
| | | ETAGE | 5 | Hash(ETAGE) |
| | | ETAL | 4 | Hash(ETAL) |

Table 2 - Example of a hash table

In Fig. 2A, a pointer (represented in the figures by an arrow pointing downwards) is positioned at the very beginning of the character string. In the example given, the value of P is equal to 3. The first three characters of the character string, starting from the position of the pointer, are therefore shown shaded. The electronic device, in accordance with step 105 of the method 100 described above, determines a hash of the three shaded characters "AAE" of the character string. Since this hash does not match the hash of the prefix "ETA" recorded in the hash table, the pointer is moved by one position along the character string (step 115 of method 100), as illustrated in Fig. 2B.

In Fig. 2B, likewise, the hash of "AET", the P (P = 3) characters of the character string starting from the pointer, is determined. Since it does not match the hash of the prefix "ETA" in the hash table either, the pointer is moved again, as illustrated in Fig. 2C.

In Fig. 2C, the hash of the three shaded characters "ETA" of the character string matches the hash of the prefix "ETA" in the hash table. The electronic device then retrieves the substrings associated with the prefix ETA, i.e. "ETAGERE", "ETAGE" and "ETAL" (French for "shelf", "floor" and "stall"), and can search for them one by one (steps 120, 125, 130 and 135 of method 100).

As shown in Fig. 2D, the electronic device begins by searching for the longest substring, i.e. "ETAGERE". To do so, the electronic device determines a hash corresponding to the shaded characters in Fig. 2D, i.e. the substring of the character string, starting from the pointer, that has the same length as the substring "ETAGERE". In the case illustrated, the hash determined does match the hash of the substring "ETAGERE" in the hash table. A character-by-character verification of the substring may then be performed (step 135 of method 100). The substring "ETAGERE" has therefore been identified in the character string. This substring is recorded in an ordered list comprising the result of the lexical analysis.

In Fig. 2E, in accordance with step 145 of method 100, the pointer is moved to just after the identified substring. Since the hash of the shaded substring does not match any known hash, the pointer is moved by one position, as illustrated in Fig. 2F.

In Fig. 2F, the electronic device determines that the hash of the substring formed by the shaded characters matches the hash of the prefix "ETA".

In Fig. 2G, the electronic device determines the hash of the substring of the same length as the substring "ETAGERE" starting from the current position of the pointer, i.e. of the shaded characters in Fig. 2G. Since the hashes are different, the electronic device moves on to the next substring associated with the prefix ETA, i.e. "ETAGE".

Thus, in Fig. 2H, the electronic device determines the hash of the substring of the same length as the substring "ETAGE" starting from the current position of the pointer, i.e. of the shaded characters in Fig. 2H. Since the hashes are different, the electronic device moves on to the next substring associated with the prefix ETA, i.e. "ETAL".

Thus, in Fig. 2I, the electronic device determines the hash of the substring of the same length as the substring "ETAL" starting from the current position of the pointer, i.e. of the shaded characters in Fig. 2I. The hashes match. The substring "ETAL" is identified, and the electronic device can then move the pointer to just after the identified substring, as illustrated in Fig. 2J.
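The walk-through of Figs. 2A to 2I can be replayed with the short trace below. The input string "AAETAGEREXETAL" is hypothetical: it is merely consistent with the fragments quoted above ("AAE", "AET", "ETA", "ETAGERE", "ETAL") and is not taken from the figures. The names `digest` and `trace` are likewise assumptions, CRC-32 stands in for the hash function, and for brevity the substring hashes are recomputed on the fly instead of being read from the table.

```python
import zlib
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str, List[str], Optional[str]]


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption of this sketch."""
    return zlib.crc32(text.encode("utf-8"))


def trace(text: str, words: List[str], p: int = 3) -> List[Event]:
    """Return (position, prefix, candidates tested, substring found) tuples."""
    table: Dict[int, List[str]] = {}
    for w in sorted(words, key=len, reverse=True):  # longest first
        table.setdefault(digest(w[:p]), []).append(w)
    events: List[Event] = []
    n = 0
    while n + p <= len(text):
        tested: List[str] = []
        hit = None
        for w in table.get(digest(text[n:n + p]), []):
            tested.append(w)
            window = text[n:n + len(w)]
            if digest(window) == digest(w) and window == w:
                hit = w
                break
        events.append((n, text[n:n + p], tested, hit))
        n += len(hit) if hit else 1
    return events


events = trace("AAETAGEREXETAL", ["ETAGERE", "ETAGE", "ETAL"])
for event in events:
    print(event)
# (0, 'AAE', [], None)
# (1, 'AET', [], None)
# (2, 'ETA', ['ETAGERE'], 'ETAGERE')
# (9, 'XET', [], None)
# (10, 'ETA', ['ETAGERE', 'ETAGE', 'ETAL'], 'ETAL')
```

The last line of the trace corresponds to Figs. 2F to 2I: the three candidates are tried in order of decreasing length until "ETAL" matches.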

The method 100 is thus executed iteratively on the character string until the end of the character string. According to one embodiment of the invention, the character string is continuous, i.e. it can be extended at any time, for example by a voice recognition module.

Fig. 3 shows a flowchart of a method 300 for updating a hash table for a method of lexical analysis of a character string according to an embodiment of the invention. This method 300 may be executed before method 100 is carried out. Method 300 may also be carried out at any time during the execution of method 100, the hash table then being updated dynamically. If a lexical analysis of a character string is in progress when the hash table is updated, the method may be reset to its first step in order to take into account the entry or entries added to or deleted from the hash table. This is not mandatory, however.

The update method 300 is illustrated using the hash table shown above in Table 2.

In a first step 301, the electronic device receives a substring (here, for example, "ETAT"). This substring is in principle different from the substrings already recorded in the hash table. A step of verifying that the substring is not already present may be executed. In that case, the substring may be ignored if the verification shows that it is already present in the hash table.

In a next step 305, the electronic device determines a hash corresponding to the prefix of the received substring, i.e. as a function of the first P characters of said substring. In other words, the electronic device determines a hash of the substring formed by the first P characters of the received substring (in this example, Hash(ETA)).

In a step 310, the electronic device determines whether an entry of the hash table, i.e. a "Hash(prefix)", matches the hash determined in the previous step.

If not, a new entry is created in the hash table and the substring is placed in a new list associated with this new entry.

If so, the received substring must be recorded in the list associated with the identified prefix hash (in this example, Hash(ETA)).

In a step 315, the electronic device traverses the associated list in order to determine the position at which the received substring is to be recorded, as a function of the length of said substring. Indeed, the list is ordered, each substring of the list being ranked according to its length. The substrings are ordered by decreasing length. This is a convention: the substrings could equally be ordered by increasing length, in which case method 100 would simply have to be modified to traverse the list from its end. The received substring is recorded in the list according to its length and the length of the other substrings already present in the hash table. The length of the substring is also recorded.

In a next step 320, the electronic device determines the hash of the received substring.

In a step 325, the electronic device records the hash of the substring in the hash table, in association with the substring. Ultimately, the electronic device records, in the ordered list of substrings associated with the determined entry, the received substring in association with its length and a hash, the position of the substring in the list being a function of the length of the received substring.
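Steps 301 to 325 can be sketched as a single insertion routine. This is a hedged sketch: `add_substring` is a hypothetical name, CRC-32 stands in for the hash function, P = 3 is assumed, and inserting a new substring ahead of existing substrings of the same length is merely one possible choice, since the order among substrings of equal length is of little importance.

```python
import zlib
from typing import Dict, List, Tuple

P = 3  # prefix length; assumed value

Table = Dict[int, List[Tuple[str, int, int]]]


def digest(text: str) -> int:
    """Stand-in hash (CRC-32); an assumption of this sketch."""
    return zlib.crc32(text.encode("utf-8"))


def add_substring(table: Table, word: str) -> bool:
    """Insert word into the table; return False if it was already present."""
    key = digest(word[:P])  # step 305: hash of the prefix
    entries = table.setdefault(key, [])  # step 310: reuse or create the entry
    if any(existing == word for existing, _, _ in entries):
        return False  # optional presence check: the substring is ignored
    # Step 315: skip every entry that is strictly longer than the new word.
    index = 0
    while index < len(entries) and entries[index][1] > len(word):
        index += 1
    # Steps 320 and 325: record the word with its length and its hash.
    entries.insert(index, (word, len(word), digest(word)))
    return True


table: Table = {}
for w in ("ETAGERE", "ETAGE", "ETAL"):
    add_substring(table, w)
add_substring(table, "ETAT")
print([entry[0] for entry in table[digest("ETA")]])
# ['ETAGERE', 'ETAGE', 'ETAT', 'ETAL']
```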

The hash table illustrated above is thus modified as follows (the added substring is "ETAT"):

| Hash | Prefix | Substring | Length | Hash |
|---|---|---|---|---|
| Hash(ETA) | ETA | ETAGERE | 7 | Hash(ETAGERE) |
| | | ETAGE | 5 | Hash(ETAGE) |
| | | ETAT | 4 | Hash(ETAT) |
| | | ETAL | 4 | Hash(ETAL) |

Table 3 - Updated hash table
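The update of steps 315 to 325 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the value `P = 3`, the use of CRC-32 as the hash function, the names `digest` and `add_substring`, and the choice of ETAL as the newly added entry are all hypothetical and are not specified by the description.

```python
import zlib

P = 3  # assumed prefix length; the description only says P is predetermined


def digest(text):
    """Stand-in hash function (CRC-32); the description does not name one here."""
    return zlib.crc32(text.encode("utf-8"))


def add_substring(table, substring):
    """Insert `substring` into the list of its prefix hash, longest first."""
    if len(substring) < P:
        raise ValueError("a substring shorter than P has no P-character prefix")
    bucket = table.setdefault(digest(substring[:P]), [])
    # Step 315: walk the ordered list to find the insertion position,
    # i.e. the first entry strictly shorter than the new substring.
    position = 0
    while position < len(bucket) and bucket[position][1] >= len(substring):
        position += 1
    # Steps 320 and 325: record the substring with its length and its hash.
    bucket.insert(position, (substring, len(substring), digest(substring)))


table = {}
for word in ("ETAGERE", "ETAGE", "ETAT"):
    add_substring(table, word)
add_substring(table, "ETAL")  # assumed to be the added entry
print([entry[0] for entry in table[digest("ETA")]])
# prints ['ETAGERE', 'ETAGE', 'ETAT', 'ETAL']
```

Inserting after entries of equal length keeps the list in decreasing order of length without reordering the entries already present.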

Fig. 4 schematically represents the hardware architecture of an electronic device 400 adapted to implement a method for the lexical analysis of a character string according to an embodiment of the invention. The electronic device 400 is, for example, adapted to implement the methods described in Fig. 2 and in Fig. 3.

According to an embodiment of the invention, the electronic device 400 comprises a memory in which a list of predetermined character substrings is recorded. The electronic device 400 is adapted to:

- for each predetermined character substring, determine a hash, called the prefix hash, as a function of the first P characters of said substring, P being predetermined,

- for each predetermined character substring, determine, in a table called the hash table, stored in the memory of the electronic device, an entry corresponding to the prefix hash associated with said substring,

- for each entry of the hash table corresponding to a prefix, determine an ordered list of the associated predetermined substrings, each predetermined substring being associated with a hash, called the substring hash, that is a function of the predetermined substring, the list being ordered by decreasing length of the predetermined substrings, and,

- receive a character string and determine a position of a pointer on the character string, the position initially corresponding to the first character of the character string.

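The electronic device 400 is adapted to implement the iterative steps of:

- determining a first hash of the substring formed by P characters of the character string starting from the position of the pointer,

- determining whether a prefix hash entry of the hash table corresponds to this first hash and, if so, for each length of the predetermined substrings of the list associated with said prefix hash entry, in the order of the list, determining a hash of the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if said hash is equal to a substring hash associated with one of the predetermined substrings of the same length, then:

o verifying that said predetermined substring is equal to the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if this is the case:

o recording said substring, called the identified substring, in the ordered list of predetermined substrings included in the character string,

o positioning the pointer on the character string at a new position corresponding to the current position plus a number of characters equal to the length of the identified substring,

otherwise, positioning the pointer on the character string at a new position corresponding to a move of one character along the character string.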
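The preparatory steps and the iterative steps that the electronic device 400 is adapted to carry out can be sketched in Python as follows. This is a hedged illustration, not the implementation of the invention: `P = 3`, the CRC-32 hash and the names `digest`, `build_table` and `analyse` are assumptions made for the example.

```python
import zlib

P = 3  # assumed prefix length; the description only says P is predetermined


def digest(text):
    """Stand-in hash function (CRC-32); chosen only for illustration."""
    return zlib.crc32(text.encode("utf-8"))


def build_table(substrings):
    """Preparatory steps: group substrings by prefix hash, longest first."""
    table = {}
    for s in substrings:
        table.setdefault(digest(s[:P]), []).append((s, len(s), digest(s)))
    for bucket in table.values():
        bucket.sort(key=lambda entry: entry[1], reverse=True)
    return table


def analyse(text, table):
    """Iterative steps: return the ordered list of identified substrings."""
    found = []
    pos = 0
    while pos + P <= len(text):
        # First hash: the P characters starting at the pointer position.
        bucket = table.get(digest(text[pos:pos + P]), [])
        match = None
        current_length = None
        for substring, length, substring_hash in bucket:
            if pos + length > len(text):
                continue  # not enough characters left for this length
            if length != current_length:
                # Hash the input once per distinct length, in list order.
                current_length = length
                candidate = text[pos:pos + length]
                candidate_hash = digest(candidate)
            # Equal hashes are confirmed by a direct comparison.
            if candidate_hash == substring_hash and candidate == substring:
                match = substring
                break
        if match is None:
            pos += 1  # no substring identified: move by one character
        else:
            found.append(match)
            pos += len(match)  # jump past the identified substring
    return found


table = build_table(["ETAGERE", "ETAGE", "ETAT", "ETAL"])
print(analyse("ETAGEETATXETAL", table))
# prints ['ETAGE', 'ETAT', 'ETAL']
```

Because each list is traversed in decreasing order of length, the longest predetermined substring starting at the pointer position is identified first.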

Thus, according to an embodiment of the invention, the electronic device 400 comprises, connected by a communication bus: a processor or CPU ("Central Processing Unit") 401; a memory MEM 402 of RAM ("Random Access Memory") and/or ROM ("Read Only Memory") type; a network module NET 403; a storage module STCK 404 of internal storage type; and possibly a plurality of modules 405 to 40N for implementing functionalities of the electronic device 400. For example, the module 405 can be a voice recognition module, which comprises a microphone and is adapted to convert a captured audio signal into a character string. Said character string can be subjected to the lexical analysis method that is the subject of the present document. A module 40N can be a module for controlling a home automation device connected to the electronic device 400.

The storage module STCK 404 can be of the hard disk HDD ("Hard Disk Drive") or SSD ("Solid-State Drive") type, or of the external storage medium reader type, such as an SD ("Secure Digital") card reader. The processor CPU 401 can record data, or information, in the memory MEM 402 or in the storage module STCK 404. According to an embodiment of the invention, the storage module STCK 404 can be the memory described previously in the description of Fig. 2 or of Fig. 3. The processor CPU 401 can read data recorded in the memory MEM 402 or in the storage module STCK 404. These data can correspond to configuration parameters, instructions, a character string received by the electronic device 400, the hash table, a list of predetermined substrings or any data previously described. The network module NET 403 allows the electronic device 400 to be connected to a local network and/or the Internet. The electronic device 400 can therefore receive a character string via the network module NET 403. The electronic device 400 can access a remote storage or memory in order to retrieve and/or record any data previously described.

The processor CPU 401 is capable of executing instructions loaded into the memory MEM 402, for example from the storage module STCK 404 or via the network module NET 403. When the electronic device 400 is powered up, the processor CPU 401 is capable of reading instructions from the memory MEM 402 and executing them. These instructions form a computer program causing the implementation, by the processor CPU 401, of all or part of the methods and steps described above, particularly the method described in Figs. 1, 2 or 3. Thus, all or part of the methods and steps described above can be implemented in software form by execution of a set of instructions by a programmable machine, such as a DSP ("Digital Signal Processor") or a microcontroller. All or part of the methods and steps described here can also be implemented in hardware form by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit"). The functions of the electronic device 400 can be integrated into an existing electronic device by a software update, for example by updating the firmware of the electronic device.

The electronic device 400 can be all or part of a device for controlling a home automation installation.

It should be noted that the method is adapted to recognize character substrings of length greater than or equal to P. According to a complementary embodiment of the invention, the method comprises a specific procedure for recognizing character substrings of length less than P. This specific procedure does not use hashing and recognizes these substrings of length less than P by direct comparison of the character substrings, possibly character by character. These steps can advantageously be inserted after step 101 (or after step 115 following an iteration) and before step 105.
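A direct comparison for substrings shorter than P might look like the following Python sketch. The description does not detail this procedure, so everything here is an assumption made for illustration: the name `match_short`, the example entries "ON" and "LE", the value `P = 3`, and the choice to try longer candidates first.

```python
P = 3  # assumed prefix length; the description only says P is predetermined


def match_short(text, pos, short_substrings):
    """Return the short substring found at `pos` by direct comparison, or None.

    `short_substrings` is assumed to contain only entries shorter than P.
    Longer candidates are tried first (an assumption mirroring the ordered
    lists of the hash table).
    """
    for candidate in sorted(short_substrings, key=len, reverse=True):
        # str.startswith compares the characters directly, without hashing.
        if text.startswith(candidate, pos):
            return candidate
    return None


short_entries = ["ON", "LE"]  # hypothetical entries, all shorter than P
print(match_short("ON ETAGE", 0, short_entries))  # prints ON
print(match_short("ETAGE", 0, short_entries))     # prints None
```

In this sketch, such a check would run at the current pointer position before the prefix hash of step 105 is computed.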

Claims

1) Method for the lexical analysis of a character string, the method being executed by an electronic device, the electronic device comprising a memory in which a list of predetermined character substrings is recorded, the electronic device being adapted to receive, as input, a character string and to record in the memory, as output, an ordered list of the predetermined substrings included in the character string, the method comprising the prior steps of:

- for each predetermined character substring, determining a hash (condensate), called the prefix hash, as a function of the first P characters of said substring, P being predetermined,

- for each predetermined character substring, determining, in a table, called hash table, stored in the memory of the electronic device, an entry corresponding to the prefix condensate associated with said substring,

- for each entry of the hash table corresponding to a prefix, determining an ordered list of the associated predetermined substrings, each predetermined substring being associated with a hash, called the substring hash, that is a function of the predetermined substring, the list being ordered by decreasing length of the predetermined substrings, and,

- receiving the character string and determining a position of a pointer on the character string, the position initially corresponding to the first character of the character string,

the method further comprising the iterative steps of:

- determining a first hash of the substring formed by P characters of the character string starting from the position of the pointer,

- determining whether a prefix hash entry of the hash table corresponds to this first hash and, if so, for each length of the predetermined substrings of the list associated with said prefix hash entry, in the order of the list, determining a hash of the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if said hash is equal to a substring hash associated with one of the predetermined substrings of the same length, then:

o verifying that said predetermined substring is equal to the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if this is the case:

o recording said substring, called the identified substring, in the ordered list of predetermined substrings included in the character string,

o positioning the pointer on the character string at a new position corresponding to the current position plus a number of characters equal to the length of the identified substring,

otherwise, positioning the pointer on the character string at a new position corresponding to a move of one character along the character string.

2) Method according to the preceding claim, the method comprising the subsequent steps of updating the list of predetermined character substrings:

- receiving a substring,

- determining the hash, called the prefix hash, as a function of the first P characters of said substring,

- determining an entry in the hash table corresponding to said prefix hash,

- recording the substring in the ordered list of substrings associated with the determined entry, the position in the list of substrings being a function of the length of the substring,

- determining and recording a hash of the substring in association with the substring in the list.

3) Electronic device allowing a lexical analysis of a character string, the electronic device comprising a memory in which a list of predetermined character substrings is recorded, the electronic device being adapted to:

- receive a character string and determine a position of a pointer on the character string, the position initially corresponding to the first character of the character string, the electronic device being further adapted to:

- determine a first hash of the substring formed by P characters of the character string starting from the position of the pointer,

- determine whether a prefix hash entry of the hash table corresponds to this first hash and, if so, for each length of the predetermined substrings of the list associated with said prefix hash entry, in the order of the list, determine a hash of the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if said hash is equal to a substring hash associated with one of the predetermined substrings of the same length, then:

o verify that said predetermined substring is equal to the substring formed by a number of characters of the character string equal to said length, starting from the position of the pointer, and, if this is the case:

o record said substring, called the identified substring, in an ordered list of predetermined substrings included in the character string, said list being recorded in the memory of the electronic device,

o position the pointer on the character string at a new position corresponding to the current position plus a number of characters equal to the length of the identified substring,

otherwise, position the pointer on the character string at a new position corresponding to a move of one character along the character string.

4) Computer program, characterized in that it comprises instructions for implementing, by a processor of an electronic device, the lexical analysis method according to one of claims 1 to 2, when the computer program is executed by the processor.

5) Recording medium, readable by an electronic device, on which the computer program according to the preceding claim is stored.