FR2883652A1

FR2883652A1 - Method for accessing a sub markup (e.g. a digital image) while reading an extensible markup language file, involving verifying the existence of an index in the file, in response to a request for access to the sub markup, by verifying the presence of the index in the sub markup

Info

Publication number: FR2883652A1
Application number: FR0502873A
Authority: FR
Inventors: Jean Jacques Moreau; Herve Ruellan
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-03-23
Filing date: 2005-03-23
Publication date: 2006-09-29
Anticipated expiration: 2025-03-23
Also published as: FR2883652B1

Abstract

The method involves verifying, in an extensible markup language (XML) file and in response to a request for access to a chosen sub markup, the existence of an index associated with a mother markup and represented in the form of a table. A carriage return is inserted at the end of the index. The existence of the index is verified by verifying its presence in the sub markup. If this verification is positive, an identifier of the request is compared with the identifiers of the table according to a search and comparison law. The position of the sub markup is then obtained and the sub markup is accessed directly. Independent claims are also included for the following: (1) a device for accessing a sub markup while reading a markup language file; (2) an information medium readable by a computer system and comprising instructions of a computer program for implementing the method of accessing the sub markup; (3) a computer program stored on an information medium and including instructions for implementing the sub markup accessing method.

Description

The present invention relates to accessing a sub-tag while reading a file written in markup language.

It has general application in the processing of data streams written in markup language, and more particularly in XML (eXtensible Markup Language).

XML is a computer language used to format documents by means of tags (markup).

In practice, XML is a format for describing data, not for representing or displaying it.

The XML format is increasingly used for storing and transmitting digital data, for example to represent a photo album.

This is not without problems, in particular when the amount of data to be represented is large: accessing the data can then take a long time.

This is particularly the case when only a subset of the data is needed, for example one digital photograph from a photo album.

Rather than accessing all of the photographs, the usual problem is to determine whether a particular photograph is contained in the album (for example, to check whether the person making the request is authorized to view that photograph).

XML is not an indexed format. It is therefore not possible to access directly the information concerning only one particular photograph.

Worse, attempting to do so would inevitably miss the namespace declarations that enclose this data.

A known solution is to use a database to index the XML data automatically. However, this solution is slow and restrictive. It is slow because accessing a single file, however large, is faster than accessing a database, even a high-performance one.

It is restrictive because updating an entire database is more difficult than updating a simple file. Moreover, in a conventional load-balancing system, a file can be shared between several computers more easily than a database, which instead becomes a bottleneck.

It is therefore useful to be able to index an XML file easily.

However, for ease of use, this index must be contained in the XML file itself, not in a separate file.

Moreover, the indexed XML file must remain readable by any XML parser, including parsers that do not support the index (in that case the index is obviously unavailable, but the data remain fully readable as if the index were absent).

The Applicant set out to solve the problem of adding such an index to an XML file.

The present invention provides a solution to this problem. It relates to a method of accessing a sub-tag while reading a file written in markup language.

According to a general definition of the invention, the method is characterized in that it comprises the following steps: 1) in response to a request for access to a chosen sub-tag contained in a mother tag, the request being accompanied by an identifier, checking in the file for the existence of an index associated with the mother tag, the index being represented as a table contained in the file and comprising, for each sub-tag of the mother tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; 2) if the check is positive, going to the table and comparing the identifier of the request with the identifiers in the table according to a chosen search and comparison law; 3) if the comparison is positive, obtaining the position of the sub-tag and accessing the sub-tag directly from the position thus obtained.

In one embodiment, the identifiers are stored in the index in a chosen order and the search and comparison law is of the dichotomous (binary search) type.

In practice, the identifiers are stored in ascending order.
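The three steps, with a dichotomous search over identifiers kept in ascending order, can be sketched in Python as below. The sketch assumes the index table has already been read into memory as a list of (identifier, byte position) pairs; `access_subtag` and `read_size` are hypothetical names, and returning a fixed number of bytes stands in for handing the position over to an XML parser.

```python
import io
from bisect import bisect_left


def access_subtag(f, index, identifier, read_size=4096):
    """Return up to read_size bytes starting at the requested sub-tag, or None.

    index: list of (identifier, byte_position) rows in ascending identifier
    order, or None when no index was found in the file (hypothetical shape).
    """
    if index is None:
        # Step 1 negative: no index, the caller falls back to a sequential scan.
        return None
    # Step 2: dichotomous search; (identifier,) sorts just before (identifier, pos).
    i = bisect_left(index, (identifier,))
    if i == len(index) or index[i][0] != identifier:
        return None
    # Step 3: jump straight to the recorded position instead of parsing from the start.
    f.seek(index[i][1])
    return f.read(read_size)


data = b"<C><I id='a'/><I id='b'/></C>"
rows = [("a", data.index(b"<I id='a'")), ("b", data.index(b"<I id='b'"))]
snippet = access_subtag(io.BytesIO(data), rows, "b", read_size=11)
```

Only one comparison per halving of the table is needed, which is why the identifiers are kept sorted.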

In another embodiment, an index is first created by the following steps: i) creating a table comprising, for each sub-tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; ii) sorting the correspondence lines in a chosen order according to the identifier; and iii) writing the index, in markup language, from the table thus created, into the file at a chosen location.
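Steps i) to iii) can be sketched in Python as below. The description does not give the exact text layout of the index, so this sketch assumes one: a header holding the URI and the sizes o, c and n, one fixed-width "OFFSET KEY" record per line in ascending key order, and a trailer holding a pointer back to the start of the index followed by the URI. The function name `build_index_comment` and the default offset width are illustrative. Identifiers containing `--` are rejected because XML forbids that sequence inside a comment, and spaces are rejected because this layout uses them as field separators.

```python
INDEX_URI = "http://crf.canon.fr/oml/xml-index"


def build_index_comment(rows, offset_width=10):
    """Build the index comment from (identifier, byte_offset) pairs.

    Assumed layout (hypothetical, ASCII only):
        <!-- URI o c n
        OFFSET KEY            one fixed-width record per sub-tag
        POINTER URI -->       POINTER = distance from "<!--" to this field
    """
    table = sorted(rows)  # step ii: sort the correspondence lines by identifier
    for key, offset in table:
        if "--" in key or " " in key or not key.isascii():
            raise ValueError(f"identifier not storable in this layout: {key!r}")
        if len(str(offset)) > offset_width:
            raise ValueError(f"offset {offset} exceeds {offset_width} digits")
    key_width = max((len(key) for key, _ in table), default=0)
    head = f"<!-- {INDEX_URI} {offset_width} {key_width} {len(table)}\n"
    records = "".join(  # step i: one line of correspondence per sub-tag
        f"{offset:0{offset_width}d} {key.ljust(key_width)}\n" for key, offset in table
    )
    pointer = len(head) + len(records)  # relative distance back to "<!--"
    return f"{head}{records}{pointer} {INDEX_URI} -->\n"  # step iii
```

Appending the result after the root element (`indexed = original + build_index_comment(rows)`) leaves the recorded sub-tag offsets valid, since nothing before them moves.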

In practice, verifying the existence of an index consists in verifying the presence of a comment tag written in markup language and capable of containing the index.

Storing the index in a comment tag means that it does not interfere with parsers that do not implement the invention: such parsers simply ignore the index.
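This behaviour can be checked with Python's standard `xml.etree.ElementTree`, which discards comments by default. The album below carries an index comment after the root element, written in a hypothetical concrete layout (sizes o, c and n, fixed-width offset/key records, then a pointer and the URI); only the URI comes from the description.

```python
import xml.etree.ElementTree as ET

indexed_album = (
    '<COLLECTION><IMG id="a"/><IMG id="b"/></COLLECTION>\n'
    "<!-- http://crf.canon.fr/oml/xml-index 4 1 2\n"
    "0012 a\n"
    "0025 b\n"
    "59 http://crf.canon.fr/oml/xml-index -->\n"
)

# A comment after the root element is legal XML, and the default parser drops
# it, so code that knows nothing about the index sees an ordinary album.
root = ET.fromstring(indexed_album)
image_ids = [img.get("id") for img in root]
```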

For example, the comment tag is placed at the end of the file.

Alternatively, the comment tag is the first sub-tag of the mother tag.

In another embodiment, verifying the existence of an index consists in verifying the presence of a predetermined attribute in the mother tag.
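Detecting such an attribute only requires the start tag of the mother tag. Below is a minimal sketch with `xml.etree.ElementTree.XMLPullParser`. It assumes the mother tag is the root element and uses the `index:INDEXED="true"` attribute and the namespace URI from the description's own example further on. `has_index_attribute` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

INDEX_NS = "http://crf.canon.fr/oml/index"  # from the description's example


def has_index_attribute(chunks):
    """Return True if the root (mother) tag carries index:INDEXED="true".

    Parsing stops at the first "start" event, so the rest of the file is
    never read.
    """
    parser = ET.XMLPullParser(events=("start",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            # ElementTree exposes namespaced attributes as "{uri}local-name".
            return elem.get(f"{{{INDEX_NS}}}INDEXED") == "true"
    return False
```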

In yet another embodiment, in each line of correspondence the identifier is stored in a reduced form obtained by applying a reduction law, and the search and comparison law uses that reduction law.

In practice, the reduction law is stored in the file.

The present invention also relates to a device for accessing a sub-tag while reading a file written in markup language.

According to another aspect of the invention, the device comprises: - verification means capable, in response to a request for access to a chosen sub-tag contained in a mother tag and accompanied by an identifier, of checking in the file for the existence of an index associated with the mother tag, the index being represented as a table contained in the file and comprising, for each sub-tag of the mother tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; - processing means capable, if the check is positive, of going to the table and comparing the identifier of the request with the identifiers in the table according to a chosen search and comparison law; and - access means capable, if the comparison is positive, of obtaining the position of the sub-tag and accessing the sub-tag directly from the position thus obtained.

According to one embodiment, the device further comprises creation means capable of creating a table comprising, for each sub-tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file, of sorting the correspondence lines in a chosen order according to the identifier, and of writing the index, in markup language, from the table thus created, into the file at a chosen location.

The present invention also relates to an information medium readable by a computer system, characterized in that it comprises instructions of a computer program for implementing the access method described above when the program is loaded and executed by a computer system.

The present invention also relates to a removable information medium, partially or totally readable by a computer system, characterized in that it comprises instructions of a computer program for implementing an access method described above when the program is loaded and executed by a computer system.

Finally, the present invention relates to a computer program stored on an information medium, the program comprising instructions for implementing an access method described above when the program is loaded and executed by a computer system.

Other features and advantages of the invention will become apparent in the light of the detailed description and the drawings, in which:
- Figure 1 is a flowchart illustrating the algorithm for searching for a sub-tag while reading an XML file according to the invention;
- Figure 2 is a flowchart illustrating the search by dichotomy (binary search) according to the invention;
- Figure 3 is a flowchart illustrating the basic steps of the method according to the invention;
- Figure 4 is a flowchart illustrating the algorithm for creating an index according to the invention;
- Figure 5 is a flowchart illustrating the algorithm for writing the index according to the invention; and
- Figure 6 schematically represents the hardware resources of a computer capable of implementing the invention.

Figure 1 shows a flowchart illustrating the steps of an algorithm for searching for a sub-tag (for example a digital image or photograph) while reading an XML file (for example one containing a photo album).

In practice, each digital photograph or image has a unique identifier (referred to as the key in the rest of the algorithm).

For the last image in the file, for example, the identifier has the value 4191b1dd-f302-6a48-a99b-2366f4d0ec1b. This value identifies the image uniquely across all albums, and therefore in particular within the album considered.

To walk through the algorithm, assume that we want to obtain the digital data for this particular image.

We know that it is the last image, but the computer's parser would only find this out by reading through the entire XML file before finally reaching the image.

The method starts from an XML file (step E1), opens the file (step E2) and moves to the end of the file (step E3).

In step E4, the method, in a preferred embodiment, reads the XML file from the end (since that is where the index is located).

In step E5, the method checks that the file does end with a comment; otherwise it performs a search without the index (step E6).

In step E7, the method reads the field preceding the end of the comment and checks (step E8) that it is equal to http://crf.canon.fr/oml/xml-index. This URI indicates the presence of an index in the format specified by the invention. Otherwise, the method performs a search without the index.
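Steps E3 to E12 can be sketched in Python as follows. The description fixes the order of the trailing fields (pointer, URI, end of comment) but not their exact encoding. This sketch therefore assumes whitespace-separated fields and a decimal pointer giving the byte distance from the opening `<!--` of the index to the pointer field itself. `locate_index` is a hypothetical name. For brevity the sketch works on the file contents in memory; a reader of large files would seek to the end and read only the tail.

```python
INDEX_URI = b"http://crf.canon.fr/oml/xml-index"


def locate_index(data: bytes):
    """Return the byte position of the index's "<!--", or None when the caller
    should fall back to a search without index (steps E6/E13).

    Assumed trailer layout (hypothetical): "POINTER URI -->", where POINTER is
    the distance from the start of the index comment to the POINTER field.
    """
    tail = data.rstrip()
    if not tail.endswith(b"-->"):                 # E5: does the file end with a comment?
        return None
    body = tail[:-3].rstrip()
    if not body.endswith(INDEX_URI):              # E7/E8: field before "-->" is the URI?
        return None
    before_uri = body[: -len(INDEX_URI)].rstrip()
    fields = before_uri.rsplit(maxsplit=1)
    if not fields or not fields[-1].isdigit():    # E9: relative pointer field
        return None
    pointer_pos = len(before_uri) - len(fields[-1])
    start = pointer_pos - int(fields[-1])         # E10: jump to the beginning of the index
    if start < 0 or not data.startswith(b"<!--", start):  # E11/E12: opening comment?
        return None
    return start
```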

At this stage, the method knows that an index conforming to the format of the invention is present.

In step E9, the method reads the preceding field. This is the pointer (relative distance) to the beginning of the index.

In step E10, the method goes directly to the beginning of the index and reads the next field (step E11).

In step E12, the method checks that this field opens a comment. If not, the method performs a search without the index (step E13).

In step E14, the method (optionally) checks that the next field is the URI http://crf.canon.fr/oml/xml-index. This step serves as a second check, in case the index was corrupted at the time of the first check.

The method then successively reads the size o of the offset field, the size c of the key field, and the number n of records.
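Reading o, c and n can be sketched with a regular expression. The sketch assumes the same kind of layout as above: whitespace-separated decimal fields on the first line of the comment, with the URI of step E14 optional. `read_header` is a hypothetical name.

```python
import re

INDEX_URI = b"http://crf.canon.fr/oml/xml-index"
# "<!--", the optional URI checked in step E14, then the three sizes o, c and n.
HEADER = re.compile(
    rb"<!--\s+(?:" + re.escape(INDEX_URI) + rb"\s+)?(\d+)\s+(\d+)\s+(\d+)\n"
)


def read_header(data: bytes, start: int):
    """Return (o, c, n, position of the first record), or None if malformed."""
    m = HEADER.match(data, start)
    if m is None:
        return None
    o, c, n = (int(g) for g in m.groups())
    return o, c, n, m.end()
```

With fixed-width records of the form "OFFSET KEY" plus a newline, record i then starts at `first + i * (o + 1 + c + 1)` under this assumed layout.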

Since the index is sorted by ascending key, the method then performs, in step E15, a search by dichotomy (binary search) to find the requested key, if it is present.

The process is repeated until the method reaches the key (steps E16 or E18) or an empty interval corresponding to an unsuccessful search (steps E16 and E17).

The binary search runs in O(log n) instead of O(n) for a sequential search. It is therefore considerably faster, because it requires fewer comparisons. Since all keys and all offsets have the same size, it is easy to point to any entry of the index.
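A quick worked comparison illustrates the complexity claim. It uses an index of one million entries, a figure chosen only as an example:

$$
\lceil \log_2(n+1) \rceil = \lceil \log_2(1\,000\,001) \rceil = 20 \qquad\text{versus}\qquad n = 1\,000\,000
$$

A binary search therefore examines at most 20 entries, where a sequential search may examine all one million. The fixed field sizes are what make each probe a single jump. If r denotes the fixed size of one entry (o + c plus any separators), entry i of the table starts at $s + i\,r$, where s is the position of the first entry.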

The method returns the offset of the key if it found it; otherwise it returns the value zero (step E17).

The method accesses the requested tag directly via the offset obtained above (step E18).

Figure 2 describes the search by dichotomy in detail. The initial steps E50 to E56 consist in defining the beginning P1 of the file (step E50), the end P2 of the file (step E52), and the middle Pm of the file containing the index (steps E54 and E56). The search goes to the middle of the file containing the index and reads the corresponding entry Pm (step E58). If the key at the middle matches the requested key (step E60), i.e. the identifiers are equal, the corresponding offset is returned (step E62). Otherwise, depending on whether the middle key is greater or smaller than the key being searched for (step E64), the method continues the search between the beginning and the middle (step E70) or between the middle and the end of the file (step E82). The process is repeated until the method reaches the requested key, i.e. equal identifiers (steps E80 or E84), or an empty interval (step E88).
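The loop of Figure 2 can be sketched in Python over the raw bytes of the index. The sketch assumes that each record is a zero-padded offset of o digits, a space, a key left-justified to c characters, and a newline; this record layout is an assumption, not taken from the description. Like the description, the function returns 0 when the key is absent. P1 and P2 here are entry numbers within the index table, and `dichotomy` is a hypothetical name.

```python
def dichotomy(data: bytes, first: int, o: int, c: int, n: int, key: str) -> int:
    """Binary search over n fixed-width "OFFSET KEY" records starting at `first`.

    Returns the stored offset for `key`, or 0 when it is absent (step E17).
    """
    size = o + 1 + c + 1                  # offset, space, padded key, newline
    target = key.encode("ascii").ljust(c)
    p1, p2 = 0, n - 1                     # E50/E52: first and last entries
    while p1 <= p2:                       # an empty interval ends the search (E88)
        pm = (p1 + p2) // 2               # E54/E56: middle entry
        rec = first + pm * size
        middle_key = data[rec + o + 1 : rec + o + 1 + c]  # E58: read entry Pm
        if middle_key == target:          # E60: identifiers are equal
            return int(data[rec : rec + o])  # E62: return the stored offset
        if middle_key > target:           # E64: continue in the lower half
            p2 = pm - 1
        else:                             # ... or in the upper half
            p1 = pm + 1
    return 0
```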

The algorithm described with reference to Figure 1 thus provides direct access to an XML tag. In the example, once the offset of the tag is known, the method goes directly to the last IMG tag, the one whose CONTENT ID attribute contains the value 4191b1dd-f302-6a48-a99b-2366f4d0ec1b.

In practice, the algorithm is modified to work with XML namespaces.

XML namespaces allow two users to create tags with the same name. The conflict is avoided by prefixing these tags with different prefixes, for example:

<namespace1:IMG> <namespace2:IMG>

The problem could, however, arise again if two users define the same namespace.

To solve this problem once and for all, a namespace is uniquely identified by a URI. It is the URI (for example http://crf.canon.fr/ns/one), and not its prefix (for example namespace1), that is used for all comparisons.

These namespace URIs are themselves declared on a tag. For example, the following unambiguously defines the two prefixes namespace1 and namespace2:

<COLLECTION xmlns:namespace1="http://crf.canon.fr/ns/one" xmlns:namespace2="http://crf.canon.fr/ns/two">

To handle namespaces correctly, they must therefore be parsed.
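Comparing namespaces by URI rather than by prefix is what standard namespace-aware parsers do. Python's `xml.etree.ElementTree` reports each qualified name as `{URI}local-name`, so two different prefixes bound to the same URI yield the same tag. The extra `alias` prefix below is hypothetical and is added only to show this.

```python
import xml.etree.ElementTree as ET

doc = (
    '<COLLECTION xmlns:namespace1="http://crf.canon.fr/ns/one"'
    ' xmlns:namespace2="http://crf.canon.fr/ns/two"'
    ' xmlns:alias="http://crf.canon.fr/ns/one">'
    "<namespace1:IMG/><namespace2:IMG/><alias:IMG/>"
    "</COLLECTION>"
)

# Each prefix is replaced by its URI, so comparisons never depend on the prefix.
tags = [child.tag for child in ET.fromstring(doc)]
```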

A new step E19 is therefore inserted between steps E17 and E18 above. This step handles namespaces.

In step E19, the method reads the XML file in the conventional way from the beginning up to the following comment. This comment marks the point from which the index can be used without any namespace problem, since by then the method has read the namespace declarations and therefore knows their values:

<!-- http://crf.canon.fr/oml/xmlindex-start -->

As a variant, the comment tag is not merely an indicator that the index may be used but the index itself. The index itself can thus be placed in the first sub-tag of the mother tag.
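Step E19 can be sketched with Python's `xml.parsers.expat`, which reports namespace declarations and comments as they are parsed. The sketch keeps a stack per prefix so that only the declarations still in scope at the marker are returned. It stops parsing by raising an exception from the comment handler once the marker is reached, and `CurrentByteIndex` then gives the marker's byte position. The marker text is the one given above. The function and exception names are hypothetical.

```python
import xml.parsers.expat

START_MARKER = "http://crf.canon.fr/oml/xmlindex-start"


class _MarkerReached(Exception):
    """Raised from the comment handler to stop parsing at the marker."""


def namespaces_before_marker(data: bytes):
    """Parse from the start of the file up to the marker comment.

    Returns (prefix -> URI mapping in scope at the marker, byte offset of the
    marker), or None if there is no marker. The default namespace uses None.
    """
    scopes = {}  # prefix -> stack of URIs; inner declarations shadow outer ones
    found = {}
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def start_ns(prefix, uri):
        scopes.setdefault(prefix, []).append(uri)

    def end_ns(prefix):
        scopes[prefix].pop()

    def comment(text):
        if text.strip() == START_MARKER:
            found["offset"] = parser.CurrentByteIndex  # position of "<!--"
            raise _MarkerReached

    parser.StartNamespaceDeclHandler = start_ns
    parser.EndNamespaceDeclHandler = end_ns
    parser.CommentHandler = comment
    try:
        parser.Parse(data, True)
    except _MarkerReached:
        in_scope = {prefix: uris[-1] for prefix, uris in scopes.items() if uris}
        return in_scope, found["offset"]
    return None
```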

In another variant, the presence of the index can also be indicated by a predetermined attribute on the mother tag, for example:

<COLLECTION xmlns:index="http://crf.canon.fr/oml/index" index:INDEXED="true">

In the photo album example, all the images have a unique identifier. This may not always be the case: the identifier may have to be constructed, or at least reduced in size. Reducing its size reduces the number of character comparisons to be performed. When using the index, the value of the requested key must be compared again and again with the information contained in the index, and the longer the key, the longer each comparison takes. On large files, this time is not negligible.

However, the way keys are shortened differs from one application to another, so no single common method can be imposed.

To overcome this drawback, the method according to the invention writes the key-reduction method directly into the XML file.

In practice, the following field is added after the size of the index (the fifth field from the start of the index): the value of the reduction function (that is, the hash function) used. Two example values of this field are:

hash="id.value()"
hash="id.value()[1] + id.value()[-1]"

The first case is the identity function. In the second, the hash value of a key is formed from its first and last characters.
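The two reduction functions quoted above can be mirrored in Python as a sketch. The function names and the lookup table are hypothetical: the text stores the hash as a JavaScript expression, and this sketch assumes a reader that maps the known expression strings to functions through a whitelist rather than evaluating arbitrary code taken from the file.

```python
def identity_hash(key):
    """hash="id.value()": the key is kept unchanged."""
    return key


def first_last_hash(key):
    """hash="id.value()[1] + id.value()[-1]": first and last characters.

    Python strings are 0-indexed, so the first character is key[0].
    """
    return key[0] + key[-1]


def reduction_for(expr):
    """Map the text of the stored hash field to a Python function.

    Hypothetical whitelist: only the two expressions quoted in the text
    are recognized; any other value raises KeyError.
    """
    known = {
        "id.value()": identity_hash,
        "id.value()[1]+id.value()[-1]": first_last_hash,
    }
    return known["".join(expr.split())]  # ignore whitespace differences
```

For example, `reduction_for('id.value()[1] + id.value()[-1]')("IMG_0042")` yields `"I2"`. A two-character reduction makes each comparison cheap, but many different keys then share the same reduced value, so a reader using such a reduction would need to check the candidate sub-tags it reaches.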

In practice, the JavaScript programming language is used to define how the hash values are computed. This language is well known to those skilled in the art and readily available.

With reference to FIG. 3, the basic steps of the algorithm for finding a particular image (sub-tag) while reading an XML file are described. These basic steps comprise a step E90 of determining a position P in the file; a step E92 of positioning the file at that position P; a step E94 of reading the offset; a step E96 of reading the key; and a step E98 of returning the data.
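FIG. 3 is not reproduced here, so the record layout below is an assumption: each index entry is taken to be a zero-padded offset of fixed width, one space, a key of fixed width and a newline, with the entries sorted in ascending key order as in claims 2 and 3. Under that assumption, steps E90 to E98 can be sketched in Python as a binary (dichotomous) search; the function name `find_offset` and its parameters are hypothetical.

```python
import io


def find_offset(f, entries_start, count, offset_width, key_width, key):
    """Binary search over `count` fixed-width index records.

    f is a seekable text stream, entries_start the position of the first
    record. Returns the stored offset of `key`, or None if it is absent.
    """
    record_len = offset_width + 1 + key_width + 1  # offset, space, key, newline
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        p = entries_start + mid * record_len  # E90: determine position P
        f.seek(p)                             # E92: position the file at P
        offset = int(f.read(offset_width))    # E94: read the offset
        f.read(1)                             # skip the separating space
        stored = f.read(key_width)            # E96: read the key
        if stored == key:
            return offset                     # E98: return the data
        if stored < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

For example, with `f = io.StringIO("HEADER0000123 K001\n0000456 K002\n")`, the call `find_offset(f, 6, 2, 7, 4, "K002")` returns 456. The returned offset is where the sub-tag's `<` starts, so the reader can seek there and parse only that element.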

With reference to FIG. 4, the steps of the algorithm for creating the index according to the invention are described.

Before the index can be used, it must first be created.

The index can be created either while the XML data are being generated, or in a post-processing step once all the data are available.

Only the second approach is considered below; the first is easily derived from it.

The type of element to be indexed is assumed to be given by an XPath path.

XPath is a syntax for describing the path to a node in an XML document.

In the case of the digital photo album, all the IMG tags are to be indexed. These are sub-tags contained in the COLLECTION mother tag, so the XPath path given is:

"COLLECTION/IMG"

At the same time, the key values are to be taken from the CONTENT_ID attribute. This information is added directly to the XPath path, which becomes:

"COLLECTION/IMG@CONTENT_ID"

As a variant, a second XPath path is used to designate the key values, since they may be stored at a different location in the XML document.

Using two XPath paths, one for the tags and one for the keys, has the further advantage that the first path can then contain conditions. For example, "COLLECTION/IMG[position() > 5]" indexes all the images except the first five.
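The two-path idea can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under stated assumptions: ElementTree implements a limited XPath subset that does not support comparisons such as `position() > 5`, so the condition is emulated by skipping the first matches, and the tag path is evaluated relative to the mother tag (here the root element COLLECTION). The function name `select_keys` is hypothetical.

```python
import xml.etree.ElementTree as ET


def select_keys(xml_text, tag_path, key_attr, skip=0):
    """Return the key values of the elements selected by tag_path.

    tag_path is relative to the root element; the first `skip` matches are
    dropped to mimic a [position() > skip] condition.
    """
    root = ET.fromstring(xml_text)
    elements = root.findall(tag_path)[skip:]
    return [el.get(key_attr) for el in elements]
```

With seven IMG sub-tags, `select_keys(doc, "IMG", "CONTENT_ID", skip=5)` keeps only the keys of the sixth and seventh images. ElementTree does not report character offsets, so building the actual index needs a scanner that tracks positions in the file.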

If the keys are to be shortened, as described above, a hash function is also supplied.

With this information, this subpart of the XML document can now be indexed.

The index creation algorithm is as follows. In step E100, the method creates an empty array t, which will contain the index at the end of the algorithm.

In step E102, a variable i is initialized to one. In step E104, the method scans the XML document until it finds the searched tag.

In step E106, the method records the position o of this tag in the file (that is, the number of characters from the beginning of the file up to the < that opens the tag).

In step E108, the method reads through the attributes of this tag and identifies (step E110) the one containing the key.

In step E112, the method reads the value c of the attribute containing the key; if necessary, c is replaced by its hash value. In step E114, the method stores (o, c) in t[i]. In step E116, i is incremented.

The method then skips the child tags (sub-tags), reads the closing tag of the searched tag, and reads the next tag (step E118). If there are no more tags, the method continues at step E122.

If this is again a tag to be indexed, the method continues at step E106; otherwise, it continues at step E118.

In step E122, the method sorts the array t in ascending key order.

In step E124, the method writes the index.
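Steps E100 to E122 can be sketched in Python as below. The simplifying assumptions are not from the text: the indexed tags are never nested in one another, attribute values use double quotes, and no matching tag appears inside a comment or CDATA section, so regular expressions are enough to find the tags and their character offsets. The function name `build_index` and its parameters are hypothetical.

```python
import re

# Attribute pattern: name="value" (names may carry a namespace prefix).
ATTR = re.compile(r'([\w:.-]+)="([^"]*)"')


def build_index(xml_text, tag_name="IMG", key_attr="CONTENT_ID", reduce=None):
    """Return the array t of (offset, key) pairs, sorted by ascending key."""
    tag = re.compile(r"<" + re.escape(tag_name) + r"\b([^>]*)>")
    t = []                                      # E100: empty array t
    for m in tag.finditer(xml_text):            # E104/E118: next tag to index
        o = m.start()                           # E106: characters before the '<'
        attrs = dict(ATTR.findall(m.group(1)))  # E108/E110: find the key attribute
        if key_attr not in attrs:
            continue
        c = attrs[key_attr]                     # E112: key value c
        if reduce is not None:
            c = reduce(c)                       # E112: hashed value of c, if requested
        t.append((o, c))                        # E114/E116: store (o, c), next i
    t.sort(key=lambda entry: entry[1])          # E122: ascending key order
    return t
```

Child tags with other names are never matched, which plays the role of skipping the sub-tags in step E118. Offsets are counted in characters, as in the text; for a UTF-8 file containing non-ASCII characters, byte offsets would differ.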

With reference to FIG. 5, the algorithm for writing the index according to the invention is described.

In a preferred embodiment, the index itself is written as follows. In step E200, the XML file is opened. In step E202, the method positions itself at the very end of the file.

The method checks that no index is already present and, if one is, removes it.

In step E204, the method records the current position p (that is, the start of the index).

In step E206, the method writes an opening comment.

For example, the comment is http://crf.canon.fr/oml/xml-index (step E208).

The method writes the length of the offset field, that is, the length of the shortest string able to hold the largest offset in the array t. For example, the offset field length is 7 (step E210).

The method writes the length of the key field (step E212). In the photo album example, all the keys have the same length. The method then writes the size of the array t (step E214).

In step E222, the method traverses the array t and, for each element, writes the offset followed by the key.

In step E224, the method writes the variable p. In step E226, it writes the comment text http://crf.canon.fr/oml/xml-index; in step E228, it writes the closing comment; finally, it closes the file (step E230).
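Writing p just before the closing marker lets a reader find the index from the end of the file, and the same lookup serves the check for an existing index before step E204. A minimal Python sketch, assuming the fields are separated by whitespace and the comment opens with `<!-- ` followed by the marker; the function names are hypothetical, and a real reader would read only the last bytes of a large file rather than the whole text.

```python
MARKER = "http://crf.canon.fr/oml/xml-index"


def locate_index(xml_text):
    """Return p, the start of a trailing index, or None if there is none."""
    tail = xml_text.rstrip()
    if not tail.endswith("-->"):
        return None
    end = tail.rfind(MARKER)                   # marker written in step E226
    head = tail[:end].rstrip() if end >= 0 else ""
    token = head.rsplit(None, 1)[-1] if head else ""
    if not token.isdigit():                    # p written in step E224
        return None
    p = int(token)
    # The opening comment and marker (E206/E208) must really start at p.
    if not xml_text.startswith("<!-- " + MARKER, p):
        return None
    return p


def strip_existing_index(xml_text):
    """Remove a previously written index, if any, before writing a new one."""
    p = locate_index(xml_text)
    return xml_text if p is None else xml_text[:p]
```

The final sanity check guards against a file that merely ends with some other comment: p is accepted only if the opening comment and marker are found exactly at that position.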

As a variant, as mentioned with reference to FIG. 1, the comment is written not at the end of the file but in the first sub-tag of the mother tag to be indexed.

In another variant, instead of the comment, a predetermined attribute is written in the mother tag to be indexed.

For simplicity, the spaces that separate the fields have been omitted above.

Furthermore, if the index is to be easily readable by a human, a carriage return must be inserted after the index header, after each key, and at the end of the index.

In addition, so that all offsets have the same size, leading zeros may have to be added. For example, offset 123 becomes 0000123, so that every offset occupies 7 characters.
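Steps E206 to E228, with the separating spaces, carriage returns and leading zeros just described, can be sketched in Python as follows. Assumptions not stated in the text: `t` is non-empty and already sorted, fields are separated by single spaces, shorter keys are padded with trailing spaces, and the optional hash field is omitted. The function name `write_index` is hypothetical.

```python
MARKER = "http://crf.canon.fr/oml/xml-index"


def write_index(t, p):
    """Serialize the sorted array t of (offset, key) pairs as one XML comment.

    p is the position at which the returned text will be appended (step
    E204). t must be non-empty, and keys must not contain "--", which is
    forbidden inside XML comments.
    """
    offset_width = len(str(max(o for o, _ in t)))  # E210: widest offset
    key_width = max(len(k) for _, k in t)          # E212
    lines = [f"<!-- {MARKER} {offset_width} {key_width} {len(t)}"]  # E206-E214
    for o, k in t:                                 # E222: offset, then key
        lines.append(f"{o:0{offset_width}d} {k:<{key_width}}")  # zero-padded offset
    lines.append(f"{p} {MARKER} -->")              # E224-E228: p, marker, close
    return "\n".join(lines) + "\n"                 # carriage return at the end
```

If the largest offset has seven digits, the offset field is 7 characters wide and offset 123 is written 0000123, as in the example above. Because p is simply the length of the file before appending, the caller computes it from the current end of the file.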

Several subparts of the same XML document may need to be indexed.

In that case, the index is modified as follows (other variants are possible): the index contains several sub-indices.

A master index lists the various sub-indices. It is this master index that is placed at the very end of the file.
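Because the master index sits at the very end of the file, a reader can start from its trailer. A minimal Python sketch, assuming whitespace-separated fields that end with the start offset of each sub-index, the number of indices, the marker and the closing comment, as in the compact representation given in this section; the function name `sub_index_starts` is hypothetical.

```python
MARKER = "http://crf.canon.fr/oml/xml-index"


def sub_index_starts(master_comment):
    """Return the start offsets of the sub-indices listed in the trailer."""
    tokens = master_comment.split()
    if tokens[-2:] != [MARKER, "-->"]:
        raise ValueError("not a master index trailer")
    n = int(tokens[-3])                          # number of indices
    return [int(tok) for tok in tokens[-3 - n:-3]]  # offsets of indices 1..n
```

Each returned offset can then be used to read the header of the corresponding sub-index (offset field size, key field size, number of entries) before searching it.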

A compact representation of these sub-indices, which fits in a single XML comment, is as follows:

- opening comment
- http://crf.canon.fr/oml/xml-index
- size of the offset field of index 1
- size of the key field of index 1
- size of index 1
- Offset(1,1), Key(1,1)
- Offset(2,1), Key(2,1)
- ...
- Offset(n,1), Key(n,1)
- ...
- size of the offset field of index n
- size of the key field of index n
- size of index n
- Offset(1,n), Key(1,n)
- Offset(2,n), Key(2,n)
- ...
- Offset(n,n), Key(n,n)
- offset of the start of index 1
- offset of the start of index 2
- ...
- offset of the start of index n
- number of indices
- http://crf.canon.fr/oml/xml-index
- closing comment

With reference to FIG. 6, the physical resources of a programmable apparatus 200 capable of implementing the invention are shown.

The apparatus 200 comprises a communication bus 202 to which the following are connected:
- a central processing unit 203 (microprocessor, CPU), which controls the exchanges between the various elements of the apparatus;
- a read-only memory (ROM) 204, which may contain the programs of the invention (Prog1, Prog2);
- a random access memory (RAM) 206;
- a hard disk 212, which may contain the aforementioned programs;
- a keyboard 210;
- a digital camera 201;
- a screen 208;
- a diskette drive 214 adapted to receive a diskette 216 and to read from it or write to it documents processed or to be processed according to the invention;
- a communication interface 218 connected to a communication network 220, for example the Internet, the interface being able to transmit and receive documents.

The communication bus 202 enables communication and interoperability between the various elements included in the apparatus or connected to it. The representation of the bus is not limiting; in particular, the central unit can send instructions to any element of the apparatus either directly or through another element of the apparatus.

The executable code of each program enabling the programmable apparatus to carry out the processing according to the invention can be stored, for example, on the hard disk 212 or in the read-only memory 204.

According to an alternative embodiment, the diskette 216 may contain documents as well as the executable code of the aforementioned programs, which, once read by the apparatus, is stored on the hard disk 212.

According to another embodiment, the executable code of the programs can be received over the communication network, via the interface 218, and stored in the same way as described above.

The diskettes can be replaced by any information medium such as, for example, a compact disc (CD-ROM) or a memory card. In general, an information storage means readable by a computer or a microprocessor, whether or not integrated into the apparatus and possibly removable, is adapted to store one or more programs whose execution carries out the method according to the invention.

More generally, the program or programs can be loaded into one of the storage means of the apparatus before being executed.

The central unit 203 controls and directs the execution of the instructions or software code portions of the program or programs according to the invention, which are stored on the hard disk 212, in the read-only memory 204, or in the other storage elements mentioned above. On power-up, the program or programs stored in a non-volatile memory, for example the hard disk 212 or the ROM 204, are transferred into the RAM 206, which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters needed to implement the invention.

It should be noted that the programmable apparatus comprising the device according to the invention may also be a programmed apparatus.

This apparatus then contains the code of the computer program or programs, for example fixed in an application-specific integrated circuit (ASIC).

Claims

1. A method of accessing a sub-tag when reading a file written in markup language, the method being characterized in that it comprises the following steps: 1) in response to a request for access to a chosen sub-tag contained in a mother tag and accompanied by an identifier, checking in the file the existence of an index associated with the mother tag and represented in the form of a table contained in said file and including, for each sub-tag of the mother tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; 2) if the verification is positive, going to the table and comparing the identifier of the request with the identifiers of said table according to a chosen search and comparison law; 3) if the comparison is positive, obtaining the position of the sub-tag and directly accessing the sub-tag from said position thus obtained.

2. Method according to claim 1, wherein the identifiers are arranged in the index in a chosen order and the search and comparison law is of the dichotomous (binary search) type.

3. Method according to claim 2, wherein the storage order of the identifiers is of ascending type.

4. Method according to any one of the preceding claims, wherein an index is created beforehand by the following steps: i) creating a table comprising, for each sub-tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; ii) sorting the correspondence lines in an order chosen according to the identifier; iii) writing the index in markup language, from the table thus created, at a chosen location in the file.

5. Method according to any one of the preceding claims, wherein verifying the existence of an index consists in verifying the presence of a comment tag written in markup language and able to contain said index.

6. The method of claim 5, wherein said comment tag is placed at the end of the file.

7. The method of claim 5, wherein said comment tag is the first sub-tag of the mother tag.

8. The method according to one of claims 1 to 4, wherein verifying the existence of an index consists in verifying the presence of a predetermined attribute in the mother tag.

9. Method according to any one of the preceding claims, characterized in that, in each correspondence line, the identifier is in a form reduced by applying a reduction law, the search and comparison law using said reduction law.

10. Method according to claim 9, characterized in that the reduction law is stored in the file.

11. The method of claim 9 or 10, characterized in that the reduction law is a hash function.

12. Device for accessing a sub-tag when reading a file written in markup language, characterized in that it comprises: verification means able, in response to a request for access to a chosen sub-tag contained in a mother tag and accompanied by an identifier, to check in the file the existence of an index associated with the mother tag and represented in the form of a table contained in said file and including, for each sub-tag of the mother tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file; processing means able, if the verification is positive, to go to the table and compare the identifier of the request with the identifiers of said table according to a chosen search and comparison law; and access means able, if the comparison is positive, to obtain the position of the sub-tag and to directly access the sub-tag from said position thus obtained.

13. Device according to claim 12, characterized in that it further comprises creation means able to create a table comprising, for each sub-tag, at least one line of correspondence between an identifier assigned to the sub-tag and at least the position of the sub-tag in the file, to sort the correspondence lines in an order chosen according to the identifier, and to write the index in markup language, from the table thus created, at a chosen location in the file.

14. Information medium readable by a computer system, characterized in that it comprises instructions of a computer program for implementing an access method according to any one of claims 1 to 11, when this program is loaded and executed by a computer system.

15. Removable information medium, partially or completely readable by a computer system, characterized in that it comprises instructions of a computer program for implementing an access method according to any one of claims 1 to 11, when the program is loaded and executed by a computer system.

16. A computer program stored on an information medium, said program comprising instructions for implementing an access method according to any one of claims 1 to 11, when the program is loaded and executed by a computer system.