FR3118250A1

FR3118250A1 - Document management system integrated in a computer operating system, corresponding method and computer program

Info

Publication number: FR3118250A1
Application number: FR2013560A
Authority: FR
Inventors: Franck Meyer; Tiphaine Marie; Philippe PORRETTA
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2022-06-24

Abstract

Document management system integrated in a computer operating system, corresponding method and computer program. The invention relates to a document management method implemented within an operating system of a computer (22). Upon detection (21) of a selection of a document by a user (Ut-A), such a method implements a notification (23) to said user of at least one document similar to said selected document, two similar documents both being associated with at least one term belonging to a corpus (11) of terms of interest, defined for said user and personalized according to a profile of centers of interest of said user. Figure for abstract: Fig 1

Description

Document management system integrated in a computer operating system, corresponding method and computer program

The field of the invention is that of document management and information retrieval. More specifically, the invention relates to document management within the operating system of a computer, making it possible in particular to offer a user, in real time, documents similar to a given document.

Prior art

It is recalled, as a preliminary, that an operating system (often abbreviated OS) is a set of programs that directs the use of a computer's resources by application software. There are many different operating systems, such as Windows®, Mac OS®, GNU/Linux® or Android®, whose functionalities differ with regard to the execution of programs, the use of main memory or peripherals, the handling of file systems, etc.

However, most of these operating systems let users search for documents by keyword. This feature is useful for a user who is looking for a document saved in a storage space on his computer, but who has forgotten its exact name or the location of the directory in which it is archived. The user can thus enter one or more keywords in a search field of the file explorer of his computer. In response, the operating system offers him one or more documents associated with the keywords entered, for example documents whose file name contains the keyword or keywords.

Useful as this feature is, it is nevertheless limited, and does not meet all the needs of users in terms of document search.

Indeed, in many cases, it can be useful for the user, when considering a particular document, to know which of the documents saved on his computer (or more generally on a disk or a workspace associated with him) are, from his point of view, the most similar to that particular document.

The user may indeed need to easily find other versions of the same document, which may have been saved in another directory of his storage space.

He may also be interested in identifying possible duplicates of this document, in order to delete them and thus free up space on his computer.

Finally, in the context of a bibliographic-type search on his personal computer, for example, the user may need to find, by rebound from a particular document serving as context, different documents dealing with the same subject of interest.

Existing operating systems do not, to date, meet these different needs.

In addition, purchase recommendation techniques are known in other fields, such as e-commerce, which aim to offer a user products similar to those whose characteristics he is consulting, or which he is considering buying. Thus, a customer who consults the web page of the DVD of a cartoon may be recommended the DVD of another cartoon likely to interest him as well. These recommendations are called item-to-item, that is to say that they start from a product to propose a similar product. They thus use a product as a starting context.

It should be noted that there are also older techniques, namely simple automated implicit search techniques, which filter a document base according to user preferences and always return the same results. Such results generally suffer from relevance problems, because taking all the preferences into account simultaneously for filtering tends to drown out the useful information (note that these automated implicit search techniques are suited to articles, or catalogs of content, that change continuously, but not to a large base of relatively static documents).

Item-to-item recommendations, for their part, are built from similarity matrices, which can be constructed from the purchase logs of all the customers of the e-commerce site (a process known as collaborative filtering), or from keywords extracted by the merchant site from a description of the product (a process known as content-based).

These item-to-item recommendations, which are useful in that they help guide the user in his purchases, nevertheless have the disadvantage of not being personalized. The item-to-item recommendations made by the merchant site are the same for all users, and do not take into account a user's possible centers of interest: they depend solely on the current context, namely the item being consulted or being purchased. Moreover, this technique, deployed at the level of a website, is not directly transposable to the context of the operating system of a personal computer.

There is therefore a need for a document management technique which does not have these various drawbacks of the prior art. In particular, there is a need for such a technique which can be directly integrated into the operating system of a computer. There is also a need for such a technique which makes it possible to offer the user personalized item-to-item recommendations, according to his profile and/or his centers of interest. There is a further need for such a technique which finds advantageous applications in the professional field, and constitutes an effective tool for companies.

The invention meets this need by proposing a document management method implemented within an operating system of a computer. Upon detection of a selection of a document by a user, such a method implements a notification to the user of at least one document similar to the selected document, two similar documents both being associated with at least one term belonging to a corpus of terms of interest, defined for the user and personalized according to a profile of centers of interest of the latter.

Thus, the invention is based on an entirely new and inventive approach to document management and information retrieval by the operating system of a computer. Indeed, the method according to one embodiment of the invention is based on the personalized notification, in real time, to the user of documents similar to the one he is about to consult. It should be noted that selection of a document means any event linked to the life cycle of this document on the computer, namely the opening of this document in an application software, its simple selection via a pointing device (for example a mouse) in a file explorer, its deletion, or its moving, copying, receiving or sending by the user.

In addition, in the known techniques of the prior art, the notion of similarity between documents is invariant from one user to another, because it is based on a list of keywords established a priori for all users (or, in the neighboring field of collaborative filtering, which is not applicable here, on the usage or purchase logs of a large number of users). The technique of the invention is based, on the contrary, on a personalization of this similarity, according to a profile of the user's centers of interest (keyword profile). Thus, for the same context document, denoted X, the list of documents similar to X offered to a user A will be distinct from the list of documents similar to X offered to a user B, if A and B have different interest profiles.
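The effect of this personalization can be illustrated with a minimal sketch. The function name `similar_documents`, the sample documents and the two corpora are hypothetical, and plain substring matching is assumed as a stand-in for whatever term matching an actual implementation would use.

```python
def similar_documents(selected, documents, corpus):
    """Return the ids of documents sharing at least one corpus term with `selected`.

    Assumption: a term is "associated" with a document when it occurs as a
    substring of the document text (case-insensitive).
    """
    selected_text = documents[selected].lower()
    # Terms of the user's corpus that are associated with the selected document.
    selected_terms = {t for t in corpus if t in selected_text}
    result = []
    for doc_id, text in documents.items():
        if doc_id == selected:
            continue
        if any(t in text.lower() for t in selected_terms):
            result.append(doc_id)
    return result


# Hypothetical document base shared by two users.
documents = {
    "X": "budget review for the 5G rollout project",
    "D1": "5G antenna specification",
    "D2": "annual budget forecast",
    "D3": "holiday photos index",
}
corpus_a = {"5g", "antenna"}       # interest profile of user A (assumed)
corpus_b = {"budget", "forecast"}  # interest profile of user B (assumed)

list_for_a = similar_documents("X", documents, corpus_a)
list_for_b = similar_documents("X", documents, corpus_b)
```

With these sample values, the same context document X leads to a different list for each user, because only the terms of that user's own corpus are taken into account.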

Note that a term of interest is considered to be associated with a document if the document is textual in nature and the term of interest appears in the body or in the name of the document. For a non-textual document, for example an image or a photograph, a term of interest is associated with it if it appears in annotations, or tags, associated with the document. A term of interest can also be associated with a document, textual or not, if this document is saved in a directory whose access path includes the term of interest (for example, the name of the directory where the document is stored contains the term of interest).
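These association rules can be sketched as a single predicate. The `Document` class, its field names and the sample paths are hypothetical, and case-insensitive substring matching is an assumption made for the illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str            # full access path, including the file name
    textual: bool        # True for text documents, False for images, etc.
    body: str = ""       # text content (empty for non-textual documents)
    tags: list = field(default_factory=list)  # annotations of the document


def is_associated(term, doc):
    """Apply the three association rules described above (illustrative only)."""
    term = term.lower()
    directory, name = os.path.split(doc.path.lower())
    # Rule for any document: the term appears in the directory access path.
    if term in directory:
        return True
    # Rule for textual documents: the term appears in the body or the name.
    if doc.textual:
        return term in name or term in doc.body.lower()
    # Rule for non-textual documents: the term appears in the tags.
    return any(term in tag.lower() for tag in doc.tags)


report = Document("/home/user/docs/report.txt", textual=True,
                  body="Status of the Aurora project")
photo = Document("/home/user/pictures/img_001.jpg", textual=False,
                 tags=["Aurora", "team"])
scan = Document("/home/user/aurora/scan.png", textual=False)
```

Here `report` is associated with "aurora" through its body, `photo` through its tags, and `scan` through its directory path.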

According to an advantageous characteristic, such a notification of the similar document(s) is carried out by displaying, in a window of a user interface, a list of said at least one similar document and an action button for said at least one similar document. In particular, in one embodiment, the action button displayed may depend on the type of document selection detected.

Thus, depending on a context associated with the type of action that the user is about to perform on the document, a contextual action shortcut button is displayed, allowing the user to act on the similar document(s) proposed to him.

For example, when a user selects or opens a document, he directly obtains, for example via a pop-up window displayed on the screen of his computer, a list of K similar documents together with access to the directories in which they are saved, and a direct access button for opening these K documents in one click. This window is displayed in real time, as soon as the user shows interest in a document archived in a storage space on his computer or in a shared directory associated with him. Thanks to the access links that appear there, the user can very quickly access the documents that are similar, from his point of view (according to the personalization of his centers of interest), to the document he is about to consult.

When it is detected that the user has selected the document with a view to deleting it, the contextual action button that is displayed can allow access to the directory of the similar document and deletion of the latter in one click.

Similarly, when it is detected that the user has selected the document with a view to moving it, the contextual action button that is displayed can allow access to the directory of the similar document and a move of this similar document to the same target directory as the current document.

It is thus possible, for any event in the life cycle of a document, to propose a contextual action button allowing the user to carry out, on the similar document proposed to him, a quick action identical to the one he is about to perform on the current document.
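The dependence of the action button on the detected event can be sketched as a simple lookup. The `SelectionEvent` enumeration, the button labels and the `build_notification` helper are all hypothetical names introduced for this illustration; the text above does not prescribe any particular data structure.

```python
from enum import Enum


class SelectionEvent(Enum):
    """Life-cycle events treated as a 'selection' (names are assumptions)."""
    OPEN = "open"
    SELECT = "select"
    DELETE = "delete"
    MOVE = "move"
    COPY = "copy"


# One contextual shortcut per event type: the action offered on the similar
# documents mirrors the action the user is about to perform.
ACTION_LABELS = {
    SelectionEvent.OPEN: "Open",
    SelectionEvent.SELECT: "Open",
    SelectionEvent.DELETE: "Delete",
    SelectionEvent.MOVE: "Move to same folder",
    SelectionEvent.COPY: "Copy to same folder",
}


def build_notification(event, similar_docs):
    """Pair each similar document with the action button matching the event."""
    label = ACTION_LABELS[event]
    return [{"document": doc, "action": label} for doc in similar_docs]


notification = build_notification(
    SelectionEvent.DELETE, ["report_v1.docx", "report_copy.docx"]
)
```

A user interface (or a voice assistant) would then render each entry of `notification` as a list item with its shortcut.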

It will of course be noted that, as a variant, the notification of similar documents can be made by means of a voice assistant, which presents the user with a list of similar documents in audio form and offers to perform certain actions on them by voice command.

According to one embodiment, such a method also comprises a display of a similarity score assigned to said at least one similar document. The user can thus easily identify which documents are the most relevant, because they are the most similar to the one he is about to consult. This is particularly advantageous for finding duplicates, or for finding earlier versions of the selected document, which are likely to have very high similarity scores. It is also a useful feature when the user wishes to perform a bibliographic search by rebound, for example to identify in a library of articles the article or articles dealing with the subjects most similar to that of the article he has selected. Again, as a variant, a voice assistant can replace this display principle.
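The passage above does not say how the similarity score is computed. Purely as an assumption, the sketch below uses cosine similarity between vectors of term weights (one weight per term of interest) and ranks the candidates by decreasing score; the function names and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(selected_id, vectors, k=3):
    """Return up to k (document id, score) pairs, most similar first."""
    target = vectors[selected_id]
    scored = [
        (doc_id, cosine_similarity(target, vec))
        for doc_id, vec in vectors.items()
        if doc_id != selected_id
    ]
    # Documents sharing no term of interest with the selection are dropped.
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical weights over three terms of interest.
vectors = {
    "report_v2.docx": [2, 1, 0],
    "report_v1.docx": [2, 1, 0],   # earlier version: same term weights
    "budget.xlsx": [0, 3, 0],
    "photo.jpg": [0, 0, 0],        # no term of interest at all
}
ranking = rank_similar("report_v2.docx", vectors)
```

Under this assumed measure, an earlier version or a duplicate with the same term weights scores close to 1, which matches the remark that such documents are likely to have very high similarity scores.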

According to one embodiment, said at least one similar document is extracted, from an identifier of the selected document, from a table of similar documents that is updated cyclically and loaded into a random access memory of said computer.

Thus, to offer relevant recommendations to the user in real time, the table of similar documents is updated cyclically, for example once or twice a day. In addition, it is loaded into the random access memory (RAM) of the computer, so that it can be accessed very quickly, and thus allow a personalized proposal to be presented to the user rapidly.
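A minimal sketch of such an in-memory table follows, assuming a plain dictionary keyed by document identifier. The class name `SimilarDocumentTable`, its methods and the sample identifiers are hypothetical; the scheduling of the cyclic update (for example once or twice a day) is left out.

```python
class SimilarDocumentTable:
    """In-memory lookup table: document id -> list of (similar id, score)."""

    def __init__(self):
        self._table = {}

    def refresh(self, new_table):
        # The whole table is replaced in a single assignment, so a lookup
        # never observes a half-updated table during the cyclic update.
        self._table = dict(new_table)

    def lookup(self, doc_id, k=3):
        # Dictionary access keeps the lookup fast enough to run on every
        # selection event; unknown documents simply yield an empty list.
        return self._table.get(doc_id, [])[:k]


table = SimilarDocumentTable()
table.refresh({
    "doc-001": [("doc-007", 0.92), ("doc-013", 0.41), ("doc-020", 0.10)],
    "doc-007": [("doc-001", 0.92)],
})
top_two = table.lookup("doc-001", k=2)
```

Keeping the precomputed lists sorted by decreasing score means that the K best recommendations are obtained with a simple slice.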

According to one aspect of the invention, such a method comprises a preliminary phase of construction, by the user, of the corpus of terms of interest, comprising a set of words and/or expressions and/or proper nouns reflecting his centers of interest.

Thus, the terms of interest can be simple words as well as groups of words, expressions, proper nouns, names of projects on which the user is working, etc. These terms are chosen by the user, who therefore fully personalizes his corpus of terms of interest, depending for example on his profession, his function in the company, the projects on which he works, and his centers of interest. This construction phase can be declarative, in the sense that the user proposes a list of around ten or around a hundred key terms that define his profile. It is also possible to establish a certain number of standard profiles, for example by profession, which can then be adapted and personalized by the user. This construction phase can also consist of a simple validation, by the user, of a list of key terms inferred by the operating system by observing the documents and applications consulted by the user over a given period of time.
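The last variant (terms inferred by the system, then validated by the user) can be sketched as follows. How the terms are inferred is not specified above; counting word frequencies in the consulted documents is an assumption made here, and the names `propose_terms`, `validate` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much larger.
STOPWORDS = {"the", "of", "and", "for", "a", "to", "in", "on"}


def propose_terms(consulted_texts, n=3):
    """Propose the n most frequent non-stopword words as candidate terms."""
    counts = Counter()
    for text in consulted_texts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def validate(proposed, accepted_by_user):
    """Keep only the proposed terms that the user explicitly accepted."""
    return {term for term in proposed if term in accepted_by_user}


# Hypothetical texts of documents consulted over the observation period.
consulted = [
    "Aurora budget review",
    "Aurora antenna design",
    "Aurora budget forecast",
]
proposed = propose_terms(consulted, n=2)
corpus = validate(proposed, accepted_by_user={"aurora"})
```

The user stays in control of the final corpus: a proposed term enters it only after validation.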

According to another aspect, such a document management method also comprises a step, implemented cyclically, of indexing documents stored in at least one storage space associated with the computer, on the basis of the corpus of terms of interest. This indexing step delivers a two-dimensional matrix, one dimension of which corresponds to the indexed documents and the other to the terms of interest of the corpus, a coefficient of the matrix corresponding to a weight representative of a number of occurrences of the term of interest in the document. Such a storage space can be a storage space of the computer, or a directory associated with it, a shared storage space, etc.
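The indexing step can be sketched as the construction of a documents-by-terms matrix. In this illustration the weight is simply the raw occurrence count, which is one possible reading of "a weight representative of a number of occurrences"; the function name `build_index` and the sample data are hypothetical.

```python
import re


def build_index(documents, corpus):
    """Build a documents x terms matrix of occurrence counts.

    Returns (doc_ids, terms, matrix) where matrix[i][j] is the number of
    occurrences of terms[j] in the text of doc_ids[i].
    """
    terms = sorted(corpus)
    doc_ids = sorted(documents)
    matrix = []
    for doc_id in doc_ids:
        text = documents[doc_id].lower()
        # re.escape guards against terms containing regex metacharacters.
        row = [len(re.findall(re.escape(term.lower()), text)) for term in terms]
        matrix.append(row)
    return doc_ids, terms, matrix


# Hypothetical storage space content and user corpus.
documents = {
    "a.txt": "5G rollout plan: 5G antennas and budget",
    "b.txt": "budget budget budget",
}
corpus = {"5g", "budget"}
doc_ids, terms, matrix = build_index(documents, corpus)
```

Because the columns are restricted to the user's own terms of interest, two users indexing the same storage space obtain different matrices.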

The invention also relates to a computer program product comprising program code instructions for implementing a method as described above, when it is executed by a processor.

The invention also relates to a computer-readable recording medium on which is recorded a computer program comprising program code instructions for executing the steps of the document management method according to the invention as described above.

Such a recording medium can be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or else a magnetic recording means, for example a USB key or a hard disk.

On the other hand, such a recording medium may be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by radio or by other means, so that the computer program it contains can be executed remotely. The program according to the invention can in particular be downloaded over a network, for example the Internet.

Alternativement, le support d'enregistrement peut être un circuit intégré dans lequel le programme est incorporé, le circuit étant adapté pour exécuter ou pour être utilisé dans l'exécution du procédé de gestion de documents précité.Alternatively, the recording medium may be an integrated circuit in which the program is incorporated, the circuit being suitable for executing or for being used in the execution of the aforementioned document management method.

The invention further relates to a document management system integrated into an operating system of a computer, which comprises:
- a software probe configured to detect a selection of a document by a user,
- a notification module configured to notify the user of at least one document similar to the selected document, two documents being similar when they are both associated with at least one term belonging to a corpus of terms of interest, defined for the user and personalized according to the user's interest profile.

It is recalled that the selection of a document means any event linked to the life cycle of this document on the computer, namely the opening of this document in application software, its simple selection via a pointing device (for example a mouse) in a file explorer, its deletion, its moving or copying by the user, or its reception or sending. The software probe is thus configured to detect any event linked to the life cycle of this document on the computer.

According to one aspect, such a system also comprises a module for accessing a table of similar documents, the table associating with a reference document at least one document that is similar to it. The notification module is configured to query the access module using an identifier of the selected document provided by the software probe, and to notify the user of a list of at least one document similar to the selected document, extracted from the table by the access module.

According to another aspect, such a system also comprises:
- a user profile management module, configured to build the corpus of terms of interest defined for the user,
- a module for indexing documents stored in at least one storage space of the computer, on the basis of the corpus of terms of interest,
- a module for calculating similarity scores associated with the indexed documents, configured to generate the table of similar documents.

More generally, such a document management system has, in combination, all or some of the characteristics set out throughout this document.

The aforementioned document management system and corresponding computer program have at least the same advantages as those conferred by the document management method according to the present invention.

Presentation of figures

Other aims, characteristics and advantages of the invention will appear more clearly on reading the following description, given purely as an illustrative and non-limiting example, in relation to the figures, among which:

The first figure shows, in schematic form, the hardware and functional structure of the document management system according to one embodiment of the invention;

The second figure illustrates the main steps implemented by the real-time functional brick of the document management system of the first figure.

Detailed description of embodiments of the invention

The general principle of the invention is based on the automatic and personalized notification, to a user, of documents similar to the one he is about to consult on his workstation, for the purposes of opening, selecting, modifying, deleting, copying, moving or sending it, for example.

Note that, throughout this document, reference is made to the operating system of a computer. The term "computer" must be understood in the broad sense and covers a personal computer of the PC type as well as a laptop, a smartphone, a tablet, or any other computing equipment provided with a user interface and an operating system. Only the term "computer" is used hereafter, for the sake of simplicity.

The general structure of a document management system according to one embodiment of the invention is now presented, with reference to the first figure. Such a system comprises two main functional bricks:

- A first brick, referenced 100, which operates cyclically or in batches. For example, this first functional brick 100 performs a daily or twice-daily update.
- A second brick, referenced 200, which operates in real time while the user is working on his computer.

The structure and operation of the first, batch-processing brick 100 are described first.

It comprises a GEST_PROF module, referenced 10, for managing user profiles, and in this example that of the user Ut-A in particular. The GEST_PROF module 10 is configured to retrieve and manage the list of keywords and expressions that interest the user Ut-A, i.e. his corpus of terms of interest. The module therefore comprises an interface unit through which it obtains the keywords, proper names or expressions linked to the interests of the user Ut-A, as entered by the latter using the keyboard or an audio or graphical interface of the computer. This construction of the corpus of terms of interest is carried out during a preliminary system configuration phase.

Several approaches can be envisaged for this phase of building the corpus of terms of interest:

- The user Ut-A can self-declare the ten or one hundred terms of interest that best define his profile;
- He can also enter them by ticking checkboxes for hierarchically organized themes and sub-themes;
- The user Ut-A can also be offered a corpus of terms of interest corresponding to his job profile, which he can validate or personalize;
- Finally, this corpus of terms of interest can be inferred by the GEST_PROF module 10 after a phase of observing the behavior of the user Ut-A; the inferred corpus can then be submitted to the user Ut-A for validation.

More generally, any method for obtaining in advance a corpus of terms of interest personalized for the user Ut-A can be implemented by the GEST_PROF module 10, without departing from the scope of the invention.

The corpus of terms of interest 11 thus generated by the profile management module GEST_PROF 10 feeds a personalized indexing software module INDEX 13.

This indexing module INDEX 13 cyclically scans the hard disk 12 of the workstation of the user Ut-A, or any other storage space specified by configuration (shared directory, external hard disk, etc.). Every textual document is recognized, and its entire content is read by the INDEX module 13, which searches it for all the terms of interest of the corpus 11 of the user Ut-A. Similarly, the INDEX module 13 scans the directories containing photographs, images or other non-textual documents, looking for annotations that contain terms of interest from the corpus 11, or analyzing the access paths of these documents in search of these terms. For example, if the corpus 11 contains the term of interest "EGYPTE", travel photographs stored in a directory C:/Congrès/Egypte could be indexed with the keyword "EGYPTE", even if they carry no annotation or tag.
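
As an illustration of the path-based indexing of non-textual documents described above, the following minimal Python sketch matches the terms of interest against the components of an access path. The function names `normalize` and `terms_from_path`, the file name `IMG_0001.jpg`, the extra term "BUDGET" and the case- and accent-insensitive comparison are assumptions made for the example; the description does not specify how the path is analyzed.

```python
import unicodedata
from pathlib import PurePosixPath


def normalize(text: str) -> str:
    """Strip accents and fold case so that 'Egypte' matches 'EGYPTE' (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()


def terms_from_path(path: str, corpus: set[str]) -> set[str]:
    """Return the terms of interest that appear as a component of the access path."""
    components = {normalize(part) for part in PurePosixPath(path).parts}
    return {term for term in corpus if normalize(term) in components}


# Hypothetical corpus and file name, used only to exercise the sketch.
corpus = {"EGYPTE", "BUDGET"}
keywords = terms_from_path("C:/Congrès/Egypte/IMG_0001.jpg", corpus)
print(keywords)
```

With this corpus, the photograph is associated with the single keyword "EGYPTE" although it carries no annotation.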

In this embodiment, the INDEX module 13 therefore indexes the documents contained in the hard disk 12 only on the basis of the restricted vocabulary of the corpus 11, which allows indexing that is personalized for the user Ut-A.

The personalized indexing module INDEX 13 thus builds a Documents x Terms matrix, referenced 14, which is updated at each operating cycle of the software brick 100. In this matrix 14, each row corresponds to a document indexed by the INDEX module 13, and each column corresponds to a term of interest of the corpus 11. In a first, simple approach, each coefficient a_i,j of the matrix corresponds to the number of occurrences of the term of interest j in the document i. Each document is thus represented by a vector of a certain number of terms of interest extracted from the personalized corpus 11 of the user Ut-A, according to the principle of the Vector Space Model described by SALTON Gerard, WONG Anita, and YANG Chung-Shu in "A vector space model for automatic indexing", Communications of the ACM, 1975, vol. 18, no. 11, p. 613-620.
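
The construction of such a count matrix can be sketched as follows. This is a minimal illustration, not the implementation of the INDEX module 13: the function name `count_matrix`, the sample documents and the whole-word, case-insensitive matching rule are assumptions.

```python
import re


def count_matrix(documents: dict[str, str], corpus: list[str]) -> dict[str, list[int]]:
    """One row per document, one column per term of interest.

    Each cell holds the number of occurrences of the term in the document.
    """
    matrix = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        matrix[doc_id] = [
            # Whole-word, case-insensitive match (assumed matching rule).
            len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))
            for term in corpus
        ]
    return matrix


# Hypothetical corpus and documents.
corpus = ["egypt", "budget"]
documents = {
    "report.txt": "Budget review: the Egypt budget is approved.",
    "notes.txt": "Travel notes about Egypt.",
}
matrix = count_matrix(documents, corpus)
print(matrix)
```

Because the columns are limited to the terms of the corpus, the matrix stays small even when the indexed documents are long.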

The document d_i can then be represented in the form d_i = (w_1,i, w_2,i, w_3,i, …, w_t,i), where each coefficient w_j,i corresponds to a distinct term of interest of the corpus 11. As soon as a term is associated with the document, its value in the vector is non-zero. Various prior-art techniques exist for calculating these values, also called the weights of the terms of interest. Among these, the weighting method known as TF-IDF (Term Frequency-Inverse Document Frequency) is described below by way of example. This method, often used in information retrieval and in particular in text mining, makes it possible to evaluate the importance of a term contained in a document, relative to a corpus. The weight increases in proportion to the number of occurrences of the term in the document. It also varies according to the frequency of the term in the corpus. The method is described in particular in the article by AIZAWA Akiko, "An information-theoretic perspective of tf–idf measures", Information Processing & Management, 2003, vol. 39, no. 1, p. 45-65.

The term frequency TF can correspond to the raw frequency of the term (i.e. the number of occurrences of this term in the document considered), which can alternatively be normalized logarithmically to dampen differences, or normalized by the maximum raw frequency in the document to take the length of the latter into account.
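
The three term-frequency variants mentioned above can be sketched as follows. The function names are hypothetical, and the logarithmic form log10(1 + f) is one common choice assumed here; the description does not fix the exact formula.

```python
import math


def tf_raw(count: int) -> float:
    """Raw frequency: the number of occurrences of the term in the document."""
    return float(count)


def tf_log(count: int) -> float:
    """Logarithmic normalization, assumed here to be log10(1 + f)."""
    return math.log10(1 + count)


def tf_max_normalized(count: int, doc_counts: list[int]) -> float:
    """Raw frequency divided by the highest raw frequency in the same document."""
    highest = max(doc_counts, default=0)
    return count / highest if highest else 0.0


# Hypothetical raw counts of three terms of interest in one document.
doc_counts = [1, 2, 4]
variants = [(tf_raw(c), tf_log(c), tf_max_normalized(c, doc_counts)) for c in doc_counts]
print(variants)
```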

The inverse document frequency IDF is a measure of the importance of the term across the whole set of documents. It aims to give greater weight to the least frequent terms, which are considered more discriminating. It consists of calculating the logarithm (in base 10 or base 2) of the inverse of the proportion of documents in matrix 14 that contain the term j, in the form:

$\mathrm{idf}_j = \log \dfrac{|D|}{|\{d_i : a_{i,j} \neq 0\}|}$

where |D| is the total number of documents scanned by the indexing module INDEX 13, and where the denominator is the number of documents d_i with which the term j is associated (i.e. the number of non-zero coefficients a_i,j in column j of matrix 14).

The weight of a term j in a document i (i.e. the coefficient a_i,j of matrix 14) is obtained by multiplying the two measures, the term frequency and the inverse document frequency: $a_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_j$
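
A minimal sketch of this weighting, applied to a count matrix whose rows are documents and whose columns are terms of interest, is given below. The function name `tf_idf`, the use of raw counts as term frequency, the choice of base 10 and the convention of a zero weight for a term that appears in no document are assumptions made for the example.

```python
import math


def tf_idf(counts: list[list[int]]) -> list[list[float]]:
    """Weight each cell by raw term frequency times inverse document frequency."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Base-10 logarithm of |D| / df; terms found in no document get 0 (assumed convention).
    idf = [math.log10(n_docs / df[j]) if df[j] else 0.0 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


# Hypothetical counts: three documents, two terms of interest.
counts = [[1, 2], [1, 0], [0, 0]]
weights = tf_idf(counts)
print(weights)
```

In this example the second term occurs in only one of the three documents, so it receives a higher inverse document frequency than the first term, which occurs in two.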

After matrix 14 has been normalized by a TF-IDF type method, it is supplied as input to a CALC_SIMIL module 15, which is configured to calculate and store the personalized similarities between documents. This CALC_SIMIL module 15 performs a so-called K-nearest-neighbors search, which is known from the state of the art and is therefore not described in more detail here. For a better understanding of the prior technique of content-based document matching, reference may be made, by way of example, to the article by BILLSUS Daniel and PAZZANI Michael J., "User modeling for adaptive news access", User modeling and user-adapted interaction, 2000, vol. 10, no. 2-3, p. 147-18. That article describes the use of K-nearest-neighbors techniques but, unlike the present invention, without upstream selection of the keywords representing the documents for the calculation of similarities.

This K-nearest-neighbors search makes it possible to associate each document d_i of matrix 14 with a list of K documents that are similar to it, this similarity being calculated from the terms of interest of the corpus 11 specific to the user, and therefore personalized. The number K is of course configurable.

This search is based on a comparison of the vectors d_i = (w_1,i, w_2,i, w_3,i, …, w_t,i) representing each of the documents d_i, for example by measuring the Euclidean distance or the cosine of the angle between the vectors.

The CALC_SIMIL module 15 thus generates a table TAB consisting of the list of documents d_i and, for each document d_i, the K documents that are similar to it. This table TAB is then stored on the disk 16 of the user's workstation, in a local database.
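
The K-nearest-neighbors search and the resulting table can be sketched as follows, using the cosine of the angle between vectors as the similarity measure. This is an exhaustive pairwise comparison written for clarity; the function names `cosine` and `k_nearest` and the sample vectors are hypothetical, and the description does not state which search algorithm the CALC_SIMIL module 15 uses.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is null."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_nearest(vectors: dict[str, list[float]], k: int) -> dict[str, list[tuple[str, float]]]:
    """For each document, keep the k other documents with the highest cosine."""
    table = {}
    for doc_id, vec in vectors.items():
        scored = [(other, cosine(vec, w)) for other, w in vectors.items() if other != doc_id]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        table[doc_id] = scored[:k]
    return table


# Hypothetical weighted vectors over two terms of interest.
vectors = {
    "a.txt": [1.0, 0.0],
    "b.txt": [2.0, 0.0],
    "c.txt": [1.0, 1.0],
    "d.txt": [0.0, 3.0],
}
table = k_nearest(vectors, k=2)
print(table["a.txt"])
```

The exhaustive comparison grows quadratically with the number of documents, which is acceptable for a batch process run once or twice a day on a personal workstation but may call for an index structure on larger collections.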

This table TAB, called the "table of similar documents", is updated at each cycle of the indexing module INDEX 13, for example once or twice a day.

The table TAB can also contain, for each document, additional information such as its access path or presentation information (date, keyword cloud, etc.). It can also contain, for each similar document, a similarity score reflecting the similarity of a document with one of its K similar documents. Such a similarity score can be calculated, by way of example, from the cosine of the angle between the vectors representing the two documents being compared.
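
One possible layout for such a table in a local database is sketched below with SQLite. The schema, the table name `similar_docs`, the function names and the sample paths are assumptions; the description only states that the table is stored in a local database and may carry a path and a similarity score.

```python
import sqlite3


def store_table(conn: sqlite3.Connection, table: dict) -> None:
    """Rewrite the whole table, as a batch cycle might do at each update."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similar_docs "
        "(doc_id TEXT, similar_id TEXT, score REAL, path TEXT)"
    )
    conn.execute("DELETE FROM similar_docs")
    rows = [
        (doc_id, similar_id, score, path)
        for doc_id, neighbours in table.items()
        for similar_id, score, path in neighbours
    ]
    conn.executemany("INSERT INTO similar_docs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def similar_to(conn: sqlite3.Connection, doc_id: str) -> list:
    """Return (similar_id, score, path) rows for a document, best score first."""
    cursor = conn.execute(
        "SELECT similar_id, score, path FROM similar_docs "
        "WHERE doc_id = ? ORDER BY score DESC",
        (doc_id,),
    )
    return cursor.fetchall()


# In-memory database and hypothetical entries, for illustration only.
conn = sqlite3.connect(":memory:")
store_table(conn, {"a.txt": [("c.txt", 0.71, "/docs/c.txt"), ("b.txt", 1.0, "/docs/b.txt")]})
result = similar_to(conn, "a.txt")
print(result)
```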

The structure and operation of the second, real-time software brick 200 are now described.

It comprises a DET_EVT module 21, which listens to the events of the operating system of the computer 22. Such a DET_EVT module 21 constitutes a software probe that detects system events on a PC, and in particular events that may be linked to the life cycle of a document, such as for example:

- selection of a document: for example, the user Ut-A selects a document in a file explorer, using a mouse-type pointing device;

- opening of a document: for example, the user Ut-A opens a document in a word processing application;

- deletion of a document;

- moving of a document;

- copying of a document;

- reception or sending of a document as an attachment to an email, or via any other messaging software or the like;

- etc.

The principle of this type of DET_EVT module 21 is known from the state of the art and is not detailed here. It consists of interfacing with the operating system of the computer 22 (for example, Windows®) and listening to the various events logged by the system as they occur (keyboard, mouse, applications launched, voice assistants, files selected, etc.). Among the system events, those that may be linked to the life cycle of a document are monitored, so that the notification module NOTIF_DOC_SIMIL, referenced 23, can then be called. When the DET_EVT module 21 detects one of the monitored events, it retrieves as a parameter an identifier of the document to which this event relates, and calls the notification module NOTIF_DOC_SIMIL 23, passing it this parameter.

This NOTIF_DOC_SIMIL module 23 is the main module of the real-time brick 200. Its function is to recommend similar documents to the user Ut-A according to the context, in particular the reference document that is selected, modified, deleted or opened, and whose identifier is received from the event listening module DET_EVT 21.

On receipt of the identifier of the document selected by the user Ut-A, the NOTIF_DOC_SIMIL module 23 sends a request to the access module ACC_TAB 20, which manages online access to the table of similar documents TAB. This access module ACC_TAB 20 receives as input the identifier of the document selected by the user Ut-A and delivers as output a list of documents similar to the selected document, accompanied by their presentation information. This list therefore contains all the documents that thematically resemble the reference document that the user Ut-A has just selected for action.

The NOTIF_DOC_SIMIL module 23 then notifies the user Ut-A of this list, as illustrated by the second figure.
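
The chain formed by the probe, the notification module and the access module can be sketched as follows. The class names, the event names in `MONITORED_EVENTS`, the `display` callback and the sample identifiers are hypothetical; in particular, the sketch does not show how events are actually obtained from the operating system, which the description treats as known from the state of the art.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names for the document life-cycle events that are monitored.
MONITORED_EVENTS = {"select", "open", "delete", "move", "copy", "send", "receive"}


@dataclass
class SimilarDoc:
    doc_id: str
    score: float


class AccessModule:
    """Stands in for ACC_TAB: read access to the table of similar documents."""

    def __init__(self, tab: dict[str, list[SimilarDoc]]) -> None:
        self._tab = tab

    def lookup(self, doc_id: str) -> list[SimilarDoc]:
        return self._tab.get(doc_id, [])


class NotificationModule:
    """Stands in for NOTIF_DOC_SIMIL: queries the table, then notifies the user."""

    def __init__(self, access: AccessModule,
                 display: Callable[[str, list[SimilarDoc]], None]) -> None:
        self._access = access
        self._display = display

    def on_document_event(self, doc_id: str) -> None:
        similar = self._access.lookup(doc_id)
        if similar:  # nothing is shown when the table has no entry
            self._display(doc_id, similar)


class EventProbe:
    """Stands in for DET_EVT: filters events and forwards the document identifier."""

    def __init__(self, notifier: NotificationModule) -> None:
        self._notifier = notifier

    def handle(self, event_type: str, doc_id: str) -> bool:
        if event_type not in MONITORED_EVENTS:
            return False
        self._notifier.on_document_event(doc_id)
        return True


# Usage with a hypothetical table; the display callback simply records what it is given.
shown = []
tab = {"doc31": [SimilarDoc("docA", 0.92), SimilarDoc("docB", 0.75)]}
notifier = NotificationModule(
    AccessModule(tab),
    lambda ref, docs: shown.append((ref, [d.doc_id for d in docs])),
)
probe = EventProbe(notifier)
forwarded = probe.handle("select", "doc31")
ignored = probe.handle("keypress", "doc31")
```

Passing the display function as a parameter keeps the sketch independent of the output channel, which in the description may be a pop-up window or a voice assistant.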

This figure shows the window 30 of the file explorer of the computer 22, which is displayed on the screen of the latter when the user explores the tree structure of the storage space 12 of the computer. In the list of documents displayed in this window 30, the document referenced 31 appears highlighted, because it is the document that the user Ut-A has selected, during a step S1 of his navigation, using the mouse of his computer.

As indicated above, this event S1 is detected by the software probe DET_EVT 21, which supplies the NOTIF_DOC_SIMIL module 23 with an identifier of this document 31. The NOTIF_DOC_SIMIL module 23 queries the access module ACC_TAB 20 with the identifier of the selected document 31 as input parameter, and receives in return the list of documents similar to the selected document 31, as extracted from the table of similar documents TAB.

During a step S2, the NOTIF_DOC_SIMIL module 23 then displays a pop-up window 32 showing, for each recommended similar document, a set of associated information, such as its name, its main themes and/or a short extract (a "snippet", a programming term for a small reusable portion of source code or, here, of text), its location, a one-click action button, and a similarity index with respect to the reference document 31. Other information could of course also be displayed. In addition, in a variant embodiment, this information could be delivered to the user by a voice assistant instead of being displayed on screen.

Thus, in the example shown in this figure, window 32 is organized into four columns:

- a first column 324 contains the name of the similar document, as well as the access path to the directory in which it is stored (for example …/Projet_Ordalideal/Point project IDEAL-S May 2020-V16.pptx);
- a second column 323 contains a brief description of the similar document;
- a third column offers contextual OPEN 322 action buttons on which user Ut-A can click, for example using the mouse, to perform an action on the similar document;
- finally, a fourth column indicates the similarity score 321 of the similar document with the reference document 31.
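The four-column layout listed above can be sketched as a simple record plus a text renderer. This is an illustration only: the `Row` type, its field names, the `|` separator, the two-decimal score format and the sample values are assumptions, and an actual embodiment would draw a graphical window rather than a text line.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One entry of window 32 (hypothetical field names)."""
    name: str         # column 324: document name
    directory: str    # column 324: access path to its directory
    description: str  # column 323: brief description
    action: str       # button 322 label
    score: float      # similarity score 321


def format_row(row: Row) -> str:
    # Name and path share the first column, as in the description.
    location = f"{row.directory}/{row.name}"
    return f"{location} | {row.description} | [{row.action}] | {row.score:.2f}"


# Sample values, invented for the example.
example = Row("report.pptx", "/docs/project", "Project status review", "OPEN", 0.8731)
line = format_row(example)
```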

Note that, in this example, the contextual action button OPEN 322 opens the similar document in one click. However, in one embodiment, the actions offered as shortcuts via buttons 322 themselves depend on the user action detected by the probe. For example, when a deletion of the reference document 31 is detected, the action button 322 displayed can delete the similar document in one click; similarly, when documents are moved, an action button 322 could move the similar documents to the same destination (even if they come from various other directories), and so on.

If user Ut-A is about to send a user Ut-B a document "comparatif_rentrée_trottinettes_électriques" (a back-to-school comparison of electric scooters), the system according to one embodiment of the invention can suggest also sending the document "guide_2020_vélos_électriques" (a 2020 guide to electric bikes), which is similar to it.

Thus, more generally, the action buttons 322 can be "contextual action replication" buttons that apply, in one click and for each similar document, the same action as the one applied to the context document 31. For example, if user Ut-A moves a document "bilan_annuel_2020.docx" to a new directory "C:\BILANS", the system points out all the other similar documents (BILAN _2012.doc, BILAN_ARCH_2010.doc, etc.) from older directories and offers for each of them, via a contextual button 322 "Move also" ("Déplacer aussi"), to move them in one click to the same destination directory.
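The idea of replicating the detected action on each similar document can be sketched as a mapping from the detected action to a button label, plus a planner that lists the operations to perform. This is a sketch under stated assumptions: the `UserAction` enumeration, the labels other than "Move also", the function names and the sample paths are hypothetical, and the planner only builds a list of operations without touching the file system.

```python
from enum import Enum
from typing import Optional


class UserAction(Enum):
    """Actions the probe might detect (hypothetical enumeration)."""
    OPEN = "open"
    DELETE = "delete"
    MOVE = "move"
    SEND = "send"


# Only "Move also" appears in the description; the other labels are assumed.
BUTTON_LABELS = {
    UserAction.OPEN: "Open",
    UserAction.DELETE: "Delete also",
    UserAction.MOVE: "Move also",
    UserAction.SEND: "Send also",
}


def button_for(action: UserAction) -> str:
    """Label of button 322 for the action detected on the reference document."""
    return BUTTON_LABELS[action]


def plan_replication(
    action: UserAction,
    similar_paths: list[str],
    destination: Optional[str] = None,
) -> list[tuple[str, str, Optional[str]]]:
    """Build (action, source, destination) tuples; nothing is executed here."""
    if action is UserAction.MOVE and destination is None:
        raise ValueError("a move needs a destination directory")
    target = destination if action is UserAction.MOVE else None
    return [(action.value, path, target) for path in similar_paths]


# Sample plan, using a file name and directory from the example above.
plan = plan_replication(UserAction.MOVE, ["D:/old/BILAN_ARCH_2010.doc"], "C:/BILANS")
```

Separating the plan from its execution keeps the user in control: each tuple would correspond to one button 322, and nothing happens until the user clicks it.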

It will of course be understood that, in an embodiment based on a voice assistant, the action buttons 322 can be replaced by spoken commands such as "open such-and-such document". In that case, the voice assistant interprets the command spoken by user Ut-A, which the operating system then converts into an action of selecting and then opening the document.

The display duration of the pop-up window 32 can be configured by user Ut-A according to their needs. A display of a few seconds is usually enough for the user to quickly take note of the list of similar documents. However, in a particular context where user Ut-A is, for example, carrying out a bibliographic search in a database of articles or publications, it may be useful to configure a longer display time for window 32, or even a permanent display.

Note that, throughout the description given above in relation to Figures 1 and 2, the term "module" may refer to a software component, a hardware component, or a set of hardware and software components, a software component itself corresponding to one or more computer programs or subprograms or, more generally, to any element of a program capable of implementing a function or a set of functions.

The accompanying figure illustrates only one particular way, among several possible, of implementing the document management system so that it carries out the steps of the method detailed above in relation to Figures 1 and 2 (in any one of the various embodiments, or in a combination of these embodiments). Indeed, these steps can be carried out either on a reprogrammable computing machine (a PC, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module).

Where the document management system is implemented on a reprogrammable computing machine, the corresponding program (that is, the sequence of instructions) may be stored on a storage medium, removable (such as a floppy disk, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or a processor.

Claims

Document management method implemented within an operating system of a computer (22),
characterized in that, upon detection (21) of a selection of a document (31) by a user (Ut-A), said method implements a notification (23) to said user of at least one document similar to said selected document,
two similar documents both being associated with at least one term belonging to a corpus (11) of terms of interest, defined for said user and personalized according to a profile of centers of interest of said user.

Document management method according to Claim 1, characterized in that said notification of said at least one similar document is carried out by displaying, in a window (32) of a user interface, a list of said at least one similar document and an action button on said at least one similar document.

Document management method according to claim 2, characterized in that said action button displayed depends on a type of selection detected.

Document management method according to claim 2 or 3, characterized in that it also comprises a display of a similarity score (321) assigned to said at least one similar document.

Document management method according to any one of Claims 1 to 4, characterized in that said at least one similar document is extracted, from an identifier of said selected document, from a table of similar documents (TAB) that is cyclically updated and loaded into a random access memory of said computer (22).

Document management method according to any one of Claims 1 to 5, characterized in that it comprises a preliminary phase of construction, by said user, of said corpus (11) of terms of interest comprising a set of words and/or expressions and/or proper names reflecting the user's areas of interest.

Document management method according to any one of Claims 1 to 6, characterized in that it also comprises a step, implemented cyclically, of indexing (13) documents stored in at least one storage space (12) associated with said computer, from said corpus (11) of terms of interest,
said indexing step delivering a two-dimensional matrix (14) of which one dimension corresponds to said indexed documents and one dimension corresponds to said terms of interest of said corpus, a coefficient of said matrix corresponding to a weight representative of a number of occurrences of said term of interest in said document.

Computer program product comprising program code instructions for implementing a method according to any one of claims 1 to 7, when executed by a processor.

Document management system integrated into a computer operating system,
characterized in that it comprises:
- a software probe (21) configured to detect a selection of a document by a user,
- a notification module (23) configured to notify said user of at least one document similar to said selected document, two documents being similar when they are both associated with at least one term belonging to a corpus of terms of interest, defined for said user and personalized according to an interest profile of said user.

Document management system according to Claim 9, characterized in that it also comprises an access module (20) to a table of similar documents (TAB), said table associating with a reference document at least one document which is similar to it, and in that said notification module (23) is configured to interrogate said access module (20) from an identifier of said selected document provided by said software probe (21), and to notify said user of a list of at least one document similar to said selected document, extracted from said table by said access module.

Document management system according to either of Claims 9 and 10, characterized in that it also comprises:
- a user profile management module (10), configured to build said corpus (11) of terms of interest defined for said user,
- a module (13) for indexing documents stored in at least one storage space (12) associated with said computer, from said corpus (11) of terms of interest,
- a module (15) for calculating similarity scores associated with said indexed documents, configured to generate said table of similar documents (TAB).