FR3136298A1

FR3136298A1 - METHOD FOR ASSOCIATION OF DATA TO A DIGITAL DOCUMENT, ASSOCIATED SYSTEM

Info

Publication number: FR3136298A1
Application number: FR2205349A
Authority: FR
Inventors: Barbara DELACROIX; Marvin SANT
Original assignee: Scriptor Artis
Current assignee: Scriptor Artis
Priority date: 2022-06-02
Filing date: 2022-06-02
Publication date: 2023-12-08

Abstract

METHOD FOR ASSOCIATION OF DATA TO A DIGITAL DOCUMENT, ASSOCIATED SYSTEM. Method for associating generated data with a first digital document, said method comprising: display (AFF1) of a set of data (ENS1) in a first window (F1); actuation (ACT1) of a digital command to extract said set of data (ENS1) displayed in the first window (F1); execution (EXEC1) of a learning algorithm generating a text from the set of data (ENS1); execution (EXEC2) of a second learning algorithm to generate a set of keywords; execution (EXEC3) of a third learning algorithm to generate a set of textual queries and uniform resource locators; generation (GEN2) of a second graphical window (F2) superimposed on the first window (F1), said second window (F2) comprising a digital actuator making it possible to associate these data with an identifier of a predefined document. Figure for the abstract: Fig. 1

Description

METHOD FOR ASSOCIATION OF DATA TO A DIGITAL DOCUMENT, ASSOCIATED SYSTEM

Field of the invention

The invention relates to a method for associating a set of data with a digital document, such as a document being drafted. The field of the invention relates to methods for assisting a user of a data network in collecting data with a view to preparing a digital document. More particularly, it relates to methods for generating containers of heterogeneous data for subsequent use.

State of the art

Data containers exist for gathering or collecting data from a data network for processing. One known example is the “shopping cart”, a function generally offered on online shopping websites, which makes it possible to gather data while viewing a web page without losing the navigation thread on the network. This function has the advantage of parallelizing tasks carried out on a data server while retaining browsing histories, in order to facilitate the operations of a user consulting different data resources. It is useful when data are collected with a view to a single operation, namely an online payment of a sum. However, it is not suited to carrying out several operations of different kinds on heterogeneous data.

Browsers used to display resources of a data network also offer options for pinning content and adding it to a list of favorites. However, these solutions do not make it possible to rank the collected content or to differentiate it according to a document to be drafted later.

There are also data containers in the form of electronic tools for taking notes with a view to writing a dissertation. Most of the time, however, these solutions do not make it possible to structure the document according to the data collected in the notes.

Yet some digital documents require the collection of a large amount of heterogeneous data. The user of word-processing software loses considerable time gathering the data in an unstructured way. Moreover, while collecting these data, it is difficult to judge at the same time their relevance and their arrangement in a document.

Consequently, a “shopping cart” function, a function for adding browser favorites, or a software function for producing a digital notebook does not make it possible, within a single method, to collect data while archiving a data structure and qualifying the resources of these data.

There is therefore a need to define a data container that helps structure a digital document according to the nature of the data collected.

According to a first aspect, the invention relates to a method for associating generated data with a first digital document from a data resource displayed on a display of a computing device, said method comprising:

Display of a first set of data accessible from at least one first uniform resource locator in a first window of a browser of a data network;
Actuation of a digital command to extract a first subset of data displayed in the first window and to record said first subset of data in a memory;
Execution of a first learning algorithm comprising a language model and pre-trained with a first set of training data, said language model comprising a statistical model which models the distribution of sequences of discrete symbols in a natural language, to generate a first set of output data from the at least one extracted subset of data and from a first data model defining a first specific training domain, said first specific training domain comprising a set of data defining inputs of the first algorithm and sets of desired outputs of the first algorithm;
Execution of a second learning algorithm comprising a second language model to generate a set of keywords defining a second set of outputs from at least one extracted subset of data;
Execution of a third learning algorithm comprising a third language model to generate a set of textual queries forming inputs to a search engine comprising a set of resources indexed on a data network;
Submission of said queries to a first search engine, retrieval of a set of uniform resource locators returned by the search engine and processing of said resources to filter them according to a predefined criterion, said filtered resources defining a third set of outputs;
Generation of a second graphical window superimposed on the first window, said second window displaying at least one item of data from each set of outputs and a first digital actuator making it possible to record the first uniform resource locator and said generated output data in a memory, the actuation of the first actuator resulting in the creation of an association of the output data with a user identifier.
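The chain of steps listed above can be sketched as follows. This is only an illustrative outline under stated assumptions, not the claimed implementation: the three learning algorithms and the search engine are passed in as plain callables, the third algorithm is assumed to take the extracted text as input, and every name (`OverlayPayload`, `build_overlay`, `keep_https`) as well as the HTTPS-only filter standing in for the "predefined criterion" is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class OverlayPayload:
    """Data displayed in the second window F2 (hypothetical structure)."""
    source_url: str                 # first uniform resource locator
    generated_text: str             # first set of outputs (e.g. a summary)
    keywords: List[str]             # second set of outputs
    related_urls: List[str] = field(default_factory=list)  # third set of outputs


def keep_https(url: str) -> bool:
    """Assumed example of a 'predefined criterion': keep HTTPS resources only."""
    return urlparse(url).scheme == "https"


def build_overlay(
    source_url: str,
    extracted_text: str,
    summarize: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    make_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    criterion: Callable[[str], bool] = keep_https,
) -> OverlayPayload:
    """Run the three algorithms on the extracted subset and filter search results."""
    text = summarize(extracted_text)
    keywords = extract_keywords(extracted_text)
    urls: List[str] = []
    for query in make_queries(extracted_text):
        for url in search(query):
            # Keep each resource once, and only if it meets the criterion.
            if criterion(url) and url not in urls:
                urls.append(url)
    return OverlayPayload(source_url, text, keywords, urls)


# Usage with stand-in functions in place of the language models and search engine.
payload = build_overlay(
    "https://example.org/article",
    "Solar panels convert light into electricity.",
    summarize=lambda t: t[:20],
    extract_keywords=lambda t: ["solar", "electricity"],
    make_queries=lambda t: ["solar panel efficiency"],
    search=lambda q: ["https://example.org/a", "http://example.org/b"],
)
```

Passing the models as callables keeps the sketch independent of any particular model or search API, none of which the text specifies.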

One advantage is to allow navigation through pages of interest on a data network while providing an analysis of the data consulted that quickly gives indications of relevance, together with a means of collecting these data without losing the navigation thread.

According to one embodiment, actuation of the first actuator results in the creation of an association of the output data with an identifier of a memory space and/or with a first predefined digital document.

One advantage is that a document can be built dynamically by annotating it, in the course of a discussion, with information of interest that can be organized and ordered according to a plan or structure that is predefined or that can be modified according to the data collected.

According to one embodiment, the first set of output data is a natural-language summary of, or opinion on, the extracted subset of data when the latter is a natural-language text.

One advantage is that a user who consults numerous data resources can make a decision quickly.

According to one embodiment, the method comprises processing the first subset of data to homogenize said data.

One advantage is to reduce noise and optimize the output of the learning function or functions.

According to one embodiment, a second digital actuator gives access to all the data of at least one set of output data.

One advantage is to allow further exploration on demand, depending on the context of the generated data. The second window provides decision support for retaining a relevant item of data, consulting it at the time of analysis, or consulting it later.

According to one embodiment, the method comprises producing a set of data, as input to the third learning algorithm, defining a specific training domain, said third learning algorithm being pre-trained on a generic domain, said specific training domain comprising a second data model defining a second specific training domain, said second specific training domain comprising natural-language texts as inputs and natural-language queries defining desired outputs of the third learning algorithm.

According to one embodiment, the first learning algorithm, the second learning algorithm and the third learning algorithm are GPT-3 type transformers, GPT standing for “Generative Pre-Training Transformer”.

One advantage is that queries can be defined in natural language with a syntax or grammar specific to a particular domain.

According to one embodiment, the first learning algorithm has a dimension of 12288, the second learning algorithm has a dimension of 1024 and the third learning algorithm has a dimension of 4096.

One advantage is to allocate the appropriate computing capacity to each function used. Using different dimensions for the different algorithms makes it possible to produce the output best suited to the desired result. Such a configuration thus offers a good trade-off between the relevance of the results returned by the three algorithms and a reduced computation time.

According to one embodiment, the method comprises a configuration step for defining a size of the output data of the first algorithm, a maximum number of keywords output by the second algorithm, a maximum number of queries output by the third algorithm, and a maximum number of resource locators for each query produced by the third algorithm.
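As a rough illustration of this configuration step, the four limits could be held in a small settings object and applied to the raw outputs. The names `OutputLimits` and `apply_limits`, the default values, and the choice of characters as the unit for the text size are all assumptions made for this sketch; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OutputLimits:
    """Hypothetical configuration; default values are arbitrary."""
    max_text_chars: int = 600       # size of the first algorithm's output (unit assumed)
    max_keywords: int = 8           # second algorithm
    max_queries: int = 3            # third algorithm
    max_urls_per_query: int = 5     # resource locators kept per query


def apply_limits(
    text: str,
    keywords: List[str],
    urls_by_query: Dict[str, List[str]],
    limits: OutputLimits,
) -> Tuple[str, List[str], Dict[str, List[str]]]:
    """Truncate each set of outputs to its configured maximum."""
    kept_queries = list(urls_by_query)[: limits.max_queries]
    return (
        text[: limits.max_text_chars],
        keywords[: limits.max_keywords],
        {q: urls_by_query[q][: limits.max_urls_per_query] for q in kept_queries},
    )


limits = OutputLimits(max_text_chars=5, max_keywords=2, max_queries=1, max_urls_per_query=1)
limited = apply_limits(
    "abcdefgh",
    ["a", "b", "c"],
    {"q1": ["u1", "u2"], "q2": ["u3"]},
    limits,
)
```

In practice the text size would more likely be passed to the model as a generation parameter than applied by truncation afterwards; truncation is used here only to keep the sketch self-contained.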

One advantage is that a second window can be created containing rich, varied information that a user can analyze. The user is not overwhelmed by a multitude of information; the invention makes it possible to produce enough data to help the user make a decision.

According to one embodiment, the method comprises a step of associating a plurality of output data with the same first digital document.

One advantage is to provide assistance in designing a document from several resources collected during one or more browsing sessions on a data network.

According to one embodiment, the association of the set of output data with the first digital document comprises indexing said data within an ordered sequence of a plurality of output data corresponding to other subsets of data from the same set of data or from another set of data.
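One possible reading of this ordered indexing is a container, keyed by a document identifier, that holds the collected outputs in sequence. The sketch below assumes a simple in-memory list; `CollectedItem`, `DocumentContainer` and `attach` are hypothetical names, and the text does not say how the sequence is stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectedItem:
    """Output data produced for one extracted subset (hypothetical structure)."""
    source_url: str
    generated_text: str
    keywords: List[str]


@dataclass
class DocumentContainer:
    """Ordered sequence of output data associated with one digital document."""
    document_id: str
    items: List[CollectedItem] = field(default_factory=list)

    def attach(self, item: CollectedItem, position: Optional[int] = None) -> int:
        """Insert an item at a given index (append by default); return its index."""
        if position is None:
            position = len(self.items)
        # Clamp so that out-of-range positions fall back to the nearest end.
        position = max(0, min(position, len(self.items)))
        self.items.insert(position, item)
        return position


doc = DocumentContainer("draft-001")
doc.attach(CollectedItem("https://example.org/a", "Summary A", ["a"]))
doc.attach(CollectedItem("https://example.org/b", "Summary B", ["b"]))
# An item from another page can be placed ahead of the earlier ones.
doc.attach(CollectedItem("https://example.org/c", "Summary C", ["c"]), position=0)
order = [item.generated_text for item in doc.items]
```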

According to a second aspect, the invention relates to a system comprising at least one data server comprising a memory and computing means making it possible to define a workspace in which at least one digital document being drafted and user profile data are recorded, the system further comprising an electronic terminal provided with a display, a memory and a processor, said system being configured to carry out the steps of the method of the invention.

Brief description of the figures

Other features and advantages of the invention will become apparent on reading the following detailed description, with reference to the appended figures, which illustrate:

: an example of steps implemented according to an example embodiment of the method;

: an example representation of a browser window and of a window produced by the method of the invention;

: an example of a system comprising a data server for executing the learning functions.

Description of the invention

The figure shows an example of the various steps of the method of the invention. According to one embodiment, when browsing a data network such as the Internet, the user accesses content displayed on a display of an electronic terminal such as a personal computer, a mobile telephone such as a smartphone, a virtual-reality headset, a display device with an augmented-reality function, or a digital tablet. The content may be stored in one or more remote memories of data servers. A URL, standing for uniform resource locator, makes it possible to retrieve data by means of a data request sent to at least one data server.

The method of the invention therefore comprises a first step AFF₁ of displaying digital content coming from at least one remote server. This display is preferably performed by a browser, such as a web browser. More generally, a browser is software for consulting and displaying structured content from a data network such as the World Wide Web. A browser is an HTTP client.

The method comprises a step of selecting all or part of the content. This step is denoted ACT₁ in the figure. It can be triggered from a browser button or from a menu accessible through the browser that offers commands for accessing and interpreting the displayed data. The button may, for example, be provided by a browser extension module, called a “plug-in”, giving access to a new software function such as the one executed by the method of the invention.

According to one embodiment, in order to select data contained in a portion of the displayed page, referred to as data subset SSENS₁, a plurality of buttons arranged in different areas of the browser makes it possible to select said data subset SSENS₁. The method of the invention relates more particularly to the case where the data SSENS₁ represent symbols of a natural language.

The figure shows an example of a browser comprising different areas, each containing data. The different areas define digital content of different kinds: there may be text areas, titles, menus, images, footers or headers, etc. The figure illustrates several data subsets SSENS₁, SSENS₂, SSENS₃, SSENS₄ having different characteristics depending on their layout, font, etc. In the case shown, the user selects the subset SSENS₁, which comprises several paragraphs represented by rectangles in the figure. Selecting all the data of the subset SSENS₁ results in the production of the window F₂ by the method of the invention. One advantage of selecting a subset of the digital content is that the algorithms described below process the data in a more relevant way. Only the data of interest are processed by the artificial intelligence algorithms, which allows optimized processing and a reduction of noise in the output data produced.

According to one example, in order to select the digital content defining the data of the subset SSENS₁ displayed in the browser, a selection tool such as a mouse or a touch screen makes it possible to select a given region of the browser. According to one embodiment, the areas are defined by tags or by a machine-interpretable language and can be delimited automatically by a function interpreting the language that encodes the data in the browser. With this option, a plurality of areas can be segmented automatically. One benefit is to offer an analysis of portions of interest of the page that are structurally coherent when considered together, such as a succession of paragraphs under a title. Another advantage is that data from another area that is of no interest to the user, or that is only remotely related to the content of another area, are not taken into account.
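Automatic delimitation of areas from tags could, for instance, group the paragraphs that follow each title. The sketch below assumes the page is HTML and uses Python's standard `html.parser` module; the `Zone` and `ZoneSegmenter` names and the heading-plus-paragraph rule are illustrative assumptions, not the segmentation described by the invention.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Zone:
    """A title and the paragraphs found under it (hypothetical structure)."""
    title: str
    paragraphs: List[str] = field(default_factory=list)


class ZoneSegmenter(HTMLParser):
    """Groups <p> text under the most recent heading tag."""

    def __init__(self) -> None:
        super().__init__()
        self.zones: List[Zone] = []
        self._current_tag: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside inline tags (<b>, <a>, ...) is collected as well.
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return
        text = " ".join("".join(self._buffer).split())
        if tag in HEADINGS:
            self.zones.append(Zone(text))
        elif text:
            if not self.zones:
                self.zones.append(Zone(""))  # paragraphs before any title
            self.zones[-1].paragraphs.append(text)
        self._current_tag = None


page = "<h2>Intro</h2><p>First <b>bold</b> text.</p><p>Second.</p><h2>Menu</h2><p>Links</p>"
segmenter = ZoneSegmenter()
segmenter.feed(page)
segmenter.close()
zones = segmenter.zones
```

Each resulting zone is a candidate subset such as SSENS₁ that can be offered to the user for selection.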

When the digital content is a digital text, possibly including metadata, the selected data are recorded in a memory. The metadata may correspond to links, URLs, a language, tags of a language interpreted by the browser, or other data relating to the displayed data. The memory may be a memory of the electronic terminal or a memory of a remote server.

The method of the invention comprises a pre-processing step applied to the selected data. According to one embodiment, this processing includes automatic actions that homogenize the data, in particular the removal of formatting such as underlining, capital letters, font and text style, as well as the removal of line breaks. The processed text is stored so as to define an input of an artificial intelligence algorithm.
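By way of non-limiting illustration, the pre-processing step could be sketched as follows. The sketch assumes that formatting is carried by markup tags and that removing capital letters means lower-casing; the function name `preprocess` is hypothetical.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Homogenize selected text before it is passed to a learning algorithm."""
    # Normalize Unicode compatibility forms (ligatures, full-width characters).
    text = unicodedata.normalize("NFKC", text)
    # Drop markup tags, which carry formatting such as underlining or font.
    text = re.sub(r"<[^>]+>", "", text)
    # Replace line breaks and runs of whitespace with a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove capitalization.
    return text.lower()
```

For example, `preprocess("<u>Hello</u>\nWORLD  again")` returns `"hello world again"`.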

First algorithm: generation of a summary or an opinion

According to one example, a first learning function ALGO₁ is implemented to process the selected text. The first learning function is also called the first learning algorithm, in that the algorithm is instantiated with physical variables and a parameterization to define a function. The instantiation is for example defined by the desired sizes of the output data, and the parameterization can be defined by specific training of a generic model carried out on a specific training domain.

The first learning function can be executed, for example, by means of a service offered by a website, for example in the form of an API (application programming interface). The first learning function preferably comprises a deep learning model. According to one example, the first learning function is trained with a language model MOD_LANG₁, which comprises a statistical model of the distribution of sequences of discrete symbols in a natural language. According to one example, it is pre-trained with a first set of training data TR₁. In the latter case, the pre-trained model is a BERT system ("Bidirectional Encoder Representations from Transformers") or a GPT system ("Generative Pre-trained Transformer"). According to an exemplary embodiment, the GPT-3 language model is implemented in the invention. The DAVINCI model of GPT-3, of dimension 12288, can be used to produce a summary or an opinion from a text provided as input to the model. The training data used for pre-training can, for example, come from a Wikipedia corpus. According to another example, the Common Crawl corpus, comprising a large number of sub-lexical text units encoded by the BPE algorithm, can be used, and/or the WebText2 corpus, and/or the Books1 or Books2 corpus.

According to one embodiment, the method of the invention defines two inputs to this algorithm. The first input corresponds to the selected and possibly pre-processed data. This first input can therefore be an English text of several lines comprising a plurality of sentences.

The second input comprises a set of example templates defining a specific training domain TR₂, comprising inputs of the first algorithm and desired outputs of said algorithm. According to one embodiment, the set of templates comprises input and output pairs of the learning function, each output corresponding to its input. Each input comprises a text of a given length, preferably in English. Each output comprises a text, preferably in English, whose length may depend on the desired result. According to one embodiment, the length of the output text can be a parameter of the first learning algorithm ALGO₁.
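By way of non-limiting illustration, one possible reading of the two inputs is that the input and output pairs of TR₂ are supplied to the model as in-context examples placed before the selected text. The description does not state whether the pairs are used this way or for fine-tuning, so the sketch below is an assumption; `Example`, `build_request` and `max_output_tokens` are hypothetical names, and no particular API is implied.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One input/output pair of the specific training domain TR2."""
    source_text: str
    desired_output: str


def build_request(examples, selected_text, task="Summary", max_output_tokens=80):
    """Assemble the two inputs of ALGO1 into one request payload."""
    parts = [
        f"Text: {ex.source_text}\n{task}: {ex.desired_output}"
        for ex in examples
    ]
    # The selected (and possibly pre-processed) text comes last, with the
    # output left open for the model to complete.
    parts.append(f"Text: {selected_text}\n{task}:")
    return {
        "prompt": "\n\n".join(parts),
        # Desired output length, treated here as a parameter of ALGO1.
        "max_output_tokens": max_output_tokens,
    }
```

Changing `task` from "Summary" to "Opinion", together with a matching set of pairs, would steer the same function toward the other type of output mentioned in the description.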

The output of the first learning algorithm, denoted ENS_A, can correspond to a summary of the text provided as input. According to another example, the output may correspond to an opinion on the text provided as input. The desired outputs of the specific training domain TR₂ can be revised texts obtained with the pre-trained first algorithm ALGO₁. In another case, the desired outputs of the specific training domain TR₂ are data produced "by hand", that is to say generated by one or more individuals. According to another example, the desired outputs comprise a first set of outputs generated by one or more individuals and a second set of outputs generated by the first algorithm ALGO₁ and modified by an individual.

The advantage of using a specific training domain TR₂ is that it yields results that are more relevant to the intended use case. The specific training domain may relate to a technical field, a semantic field, or a mixture of the two.

Second algorithm: concept generation

According to one embodiment of the invention, a second learning algorithm ALGO₂ is executed on the same input, that is to say the selected and possibly pre-processed data SSENS₁. The second learning algorithm is also called the second learning function, in that the algorithm is instantiated with physical variables and a parameterization. The instantiation is for example defined by a desired number of output concepts or keywords, and the parameterization of the algorithm can be defined by specific training of a generic model carried out on a specific training domain. This second learning algorithm may be of the RNN (recurrent neural network) type or of the LSTM ("long short-term memory") type, which is also a neural network. According to another embodiment, the second learning algorithm is a transformer of the BERT or GPT type, such as GPT-3. The ADA model of GPT-3, of dimension 1024, can be used to obtain a set of keywords from a text provided as input to the model.

The second learning algorithm ALGO₂ is trained to output a list of concepts, that is to say keywords, relating to the selected text SSENS₁. According to one example, the second learning function is trained with a language model MOD_LANG₂. Preferably, the second language model MOD_LANG₂ is identical to the first language model MOD_LANG₁. The output of the second learning algorithm ALGO₂ is denoted ENS_B. By way of example, the concepts can be produced according to their occurrence in the text and/or from the semantics of the text taken as a whole, and therefore from the domain or domains of the text. According to one example, the concepts can be produced by considering a dictionary and/or a thesaurus and/or an ontology. The dictionary makes it possible in particular to extract the different roots of a term, or even its synonyms in the case of a dictionary of synonyms. The thesaurus makes it possible to extract the terms of a given domain, and the ontology makes it possible to organize notions hierarchically according to a connected graph of entities and links that structures the notions with respect to one another.
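By way of non-limiting illustration, the production of concepts "according to their occurrence in the text" could be sketched with a simple frequency count. This is a deliberately simple stand-in and not the learning algorithm ALGO₂ itself; the stop-word list and the name `extract_keywords` are hypothetical, and `n_keywords` plays the role of the desired number of concepts mentioned above.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset({
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "for",
    "on", "with", "that", "this", "it", "as", "by",
})


def extract_keywords(text: str, n_keywords: int = 5) -> list:
    """Return the n most frequent non-stop-word terms of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        t for t in tokens if t not in STOPWORDS and len(t) > 2
    )
    return [word for word, _ in counts.most_common(n_keywords)]
```

For example, `extract_keywords("The neural network trains. A neural network learns; the network adapts.", 2)` returns `["network", "neural"]`.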

The second learning algorithm ALGO₂ corresponds to the function that outputs a list of concepts. The second algorithm can be trained by taking into account, during training, the dictionaries, thesauri or ontologies relating to the set of symbol sequences that define the words of a given language.

According to another example, training is supervised: each text used for training is labeled with given concepts. According to another example, training is unsupervised, that is to say the texts used for training are sorted by the algorithm into groups, which are then annotated.

The concepts produced by the second algorithm ALGO₂ from the text SSENS₁ selected in the browser are generated in the same window F₂ as the output of the first algorithm ALGO₁.

Third algorithm: URL generation

Finally, a third learning algorithm ALGO₃ is executed on the text SSENS₁ selected in the browser. The third algorithm ALGO₃ comprises an implementation of an artificial intelligence that outputs natural language queries produced from the input text SSENS₁.

The third learning algorithm is also called the third learning function, in that the algorithm is instantiated with physical variables and a given parameterization. The instantiation is for example defined by a desired number of output queries, and the parameterization can be defined by specific training of a generic model carried out on a specific training domain.

According to one example, the third learning function is trained with a third language model MOD_LANG₃. Preferably, the third language model MOD_LANG₃ is identical to the first language model MOD_LANG₁ or to the second language model MOD_LANG₂.

Such a learning algorithm is for example implemented with a neural network of the RNN (recurrent neural network) type or of the LSTM ("long short-term memory") type. According to another embodiment, the third learning algorithm ALGO₃ is a transformer of the BERT or GPT type, such as GPT-3. This third algorithm ALGO₃ can be pre-trained to produce natural language queries from an input text. The pre-training can be carried out with a corpus of data representing a plurality of natural language texts in a generic domain denoted TR₃. Training on a specific domain TR₄ can be carried out from the selected text SSENS₁ according to the context of the text, the metadata of the selected text SSENS₁, or data that is present in the displayed page containing the text SSENS₁ but was not selected. The specific training domain TR₄ can be defined from the actions taken by a user in response to the first results returned for the selected text SSENS₁ or for another selected text SSENS_i.

According to one embodiment, the queries produced each comprise a sequence of symbols defining sentences in natural language. The method comprises a step of issuing these queries to a search engine MR₁. The search engine MR₁ is for example an application allowing a user to search a data network. The results returned are generally data resources extracted and returned in response to a query composed of natural language terms. The resources may in particular be web pages, forum posts, images, videos, files, books, educational sites, applications, or open source software. The search engine preferably comprises an index of the content, built prior to the search carried out from at least one query produced by the third algorithm ALGO₃.

A search engine MR₁ generally returns a list of locators of the identified resources in the form of a page of results ordered according to a relevance criterion. The relevance criterion is generally calculated from a score representing a probability, computed by comparing the query with the content of a resource.
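By way of non-limiting illustration, ordering resources by a score that compares the query with the content of each resource could be sketched as follows. The description speaks of a score representing a probability; the sketch substitutes a cosine similarity over word counts, which lies between 0 and 1 but is not a probability, and is offered only as a stand-in. The names `relevance` and `rank` are hypothetical.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Count lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, content: str) -> float:
    """Cosine similarity between the word counts of query and content."""
    q, c = _bag_of_words(query), _bag_of_words(content)
    dot = sum(q[t] * c[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0


def rank(query: str, resources: dict) -> list:
    """Order resource locators (keys) by decreasing relevance of their content."""
    return sorted(
        resources,
        key=lambda url: relevance(query, resources[url]),
        reverse=True,
    )
```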

According to one embodiment, a plurality of search engines MR₁, MR₂, etc. are used to produce results for the different queries generated by the third algorithm ALGO₃.

The method of the invention makes it possible to select a portion of the resource locators produced in order to gather them in a list generated in the second window F₂.

In order to select only some of the locators returned by a search engine in response to one or more queries, filters can be configured. The filters make it possible, for example, to remove types of locators according to their category or according to the address of the locator. The method of the invention also makes it possible to configure the assignment of a priority to each locator, for example to sort and select locators by importance. The filtering can be carried out using a predefined criterion and/or a dictionary containing a list of locators of interest.

According to one embodiment, it is possible to retain a restricted number of locators per query, according to a relevance criterion and according to the number of queries generated.
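By way of non-limiting illustration, the filtering, prioritization and per-query limitation of locators could be sketched as follows. The sketch assumes that filtering is done on the host name of the locator and that the number retained per query is a total budget divided by the number of queries; `select_locators`, `blocked_hosts`, `preferred_hosts` and `max_total` are hypothetical names.

```python
from urllib.parse import urlparse


def select_locators(results, blocked_hosts=frozenset(),
                    preferred_hosts=frozenset(), max_total=6):
    """Pick a restricted list of locators for display in window F2.

    results maps each query to a list of (url, relevance_score) pairs.
    """
    # The more queries were generated, the fewer locators each one keeps.
    per_query = max(1, max_total // max(len(results), 1))
    selected = []
    for hits in results.values():
        # Filter: drop locators whose address is on the blocked list.
        kept = [(url, score) for url, score in hits
                if urlparse(url).hostname not in blocked_hosts]
        # Priority: locators of interest first, then by relevance score.
        kept.sort(
            key=lambda hit: (urlparse(hit[0]).hostname in preferred_hosts, hit[1]),
            reverse=True,
        )
        for url, _ in kept[:per_query]:
            if url not in selected:  # avoid listing the same locator twice
                selected.append(url)
    return selected
```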

According to one embodiment, the method of the invention comprises a step of counting the links activated or consulted by a user. This step makes it possible in particular to generate a specific training domain.
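By way of non-limiting illustration, counting activated links and deriving candidate training data from the counts could be sketched as follows. How the counts are turned into a specific training domain is not detailed in the description; the class `ClickLog` and the threshold `min_clicks` are hypothetical.

```python
from collections import Counter


class ClickLog:
    """Count how often a user activates a locator returned for a query."""

    def __init__(self):
        self._counts = Counter()  # (query, url) -> number of activations

    def record(self, query: str, url: str) -> None:
        """Register one activation of a locator."""
        self._counts[(query, url)] += 1

    def training_pairs(self, min_clicks: int = 1) -> list:
        """Return (query, url, clicks) triples seen at least min_clicks times,
        most activated first, as candidate data for a specific domain."""
        return [
            (query, url, n)
            for (query, url), n in self._counts.most_common()
            if n >= min_clicks
        ]
```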

F2 window

The method of the invention comprises a step GEN₂ of generating a window F₂. The window F₂, produced superimposed on the window F₁, comprises a set of gathered data originating from the outputs ENS_A, ENS_B and ENS_C of the three algorithms ALGO₁, ALGO₂ and ALGO₃ and from the generation of the queries GEN₁. The second window F₂ is advantageously generated superimposed on the first window F₁; such a window is referred to in the English-language technical literature as a "pop-up". From the data synthesized in the window F₂, a user can quickly act on the selected content: the first set ENS₁ allows the user to quickly understand the subject of the text or the opinion it expresses, the keywords represent a semantic field relating to the selected text, and the resource locators give rapid access to content related to the selected text. In other words, a user of a data network who has a browser and is consulting data resources can very quickly qualify the data being consulted.

The method of the invention makes it possible not only to display this data in a window F₂ superimposed on the window F₁ but also to propose actions on the collected data.

A first action consists of storing the data of the sets ENS_A, ENS_B and ENS_C in a memory. According to this embodiment, the memory can be that of a remote server or that of a local electronic terminal. A first digital button B₁, which generates a command to save the content gathered in the second window F₂, can be arranged within the second window F₂. Preferably, the stored data of the three sets is associated with a user profile U₁. The user profile can itself be associated with a user identifier and a password of a workspace offering functions for organizing one or more digital documents D₁, D₂, etc. According to one example, a user's workspace may comprise a plurality of digital documents in preparation, such as a thesis, a technical report, a draft publication, or a follow-up document for a trainee or for a person writing a thesis. According to an exemplary embodiment, the data collected within the window F₂ can be associated with a digital document D₁.

A digital document is understood in the broadest sense as a data container, such as a digital file, for example in a predefined format such as .doc, .txt or .pdf, or a database in which data is arranged in an organized manner within a memory.

A second action makes it possible to consult the data DATA₁ and to explore further, on the data network, certain data sets, in particular the data of the sets ENS₁ and ENS₃. When the text of the set ENS₁ exceeds a given size, consulting the data of the set ENS₁ either enlarges the window F₂ or generates a new window displaying all of the data of the set ENS₁. When a certain number of locators URL_i are present in the window F₂ but only a limited number of them are displayed, a command makes it possible to enlarge the window F₂ or to display the list of locators in another window. According to one example, simply activating a locator, for example if it includes a hyperlink, opens a new browser tab or a new browser window displaying the resource accessible from the locator.

Association of data with a user and/or a document

According to one embodiment, the method of the invention comprises a step of associating the collected content DATA₁ available in the window F₂ with a user identifier ID_U1. According to another example, the method of the invention comprises a step of associating the collected content DATA₁ available in the window F₂. According to an exemplary embodiment, a user can first associate the content collected in the window F₂ with a user identifier U₁ and then, in a second step, associate this content DATA₁ with a specific digital document D₁ of their workspace.
According to an exemplary embodiment, the method comprises a step for selecting all or part of the content of the window F₂ and saving it in the workspace of the user U₁. Thus, thanks to the method of the invention, a user U₁ can decide to save only the data ENS_A and associate them with a digital document D₁ without having to save all of the data DATA₁. According to another example, the method of the invention offers a function for first saving the data DATA₁ of each set ENS_A, ENS_B and ENS_C and then selecting a part of them, for example the data of the set ENS_C, in order to associate them with a digital document D₁.
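The two-step association described above (save to the user's workspace, then attach all or part of the content to a document) can be sketched as follows. This is only an illustrative data model, not the implementation of the invention: the class names `CollectedContent` and `Workspace`, their fields, and the example URLs are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class CollectedContent:
    """Hypothetical container for DATA_1 as shown in window F_2."""
    source_url: str                                   # URL_1 of the page in window F_1
    ens_a: str = ""                                   # generated text (ENS_A)
    ens_b: list[str] = field(default_factory=list)    # keywords (ENS_B)
    ens_c: list[str] = field(default_factory=list)    # filtered locators (ENS_C)

    def select(self, *parts: str) -> "CollectedContent":
        """Return a copy keeping only the named sets: "ens_a", "ens_b", "ens_c"."""
        unknown = set(parts) - {"ens_a", "ens_b", "ens_c"}
        if unknown:
            raise ValueError(f"unknown set(s): {sorted(unknown)}")
        return CollectedContent(
            source_url=self.source_url,
            ens_a=self.ens_a if "ens_a" in parts else "",
            ens_b=list(self.ens_b) if "ens_b" in parts else [],
            ens_c=list(self.ens_c) if "ens_c" in parts else [],
        )


@dataclass
class Workspace:
    """Hypothetical workspace tied to one user identifier (ID_U1)."""
    user_id: str
    saved: list[CollectedContent] = field(default_factory=list)
    documents: dict[str, list[CollectedContent]] = field(default_factory=dict)

    def save(self, content: CollectedContent) -> None:
        """Step 1: associate the content with the user."""
        self.saved.append(content)

    def attach(self, document_id: str, content: CollectedContent) -> None:
        """Step 2: associate (part of) the content with a document."""
        self.documents.setdefault(document_id, []).append(content)


# Save everything first, then attach only ENS_C to document D_1.
data_1 = CollectedContent(
    source_url="https://example.org/article",
    ens_a="A short generated summary.",
    ens_b=["patent", "browser"],
    ens_c=["https://example.org/related"],
)
workspace = Workspace(user_id="ID_U1")
workspace.save(data_1)
workspace.attach("D_1", data_1.select("ens_c"))
```

Returning a copy from `select` keeps the full content saved in the workspace while the document receives only the chosen subset, which mirrors the second example in the text.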

Multi-association to the same digital document

According to one embodiment, the method comprises a plurality of associations of data DATA₁ with the same digital document D₁.

The method comprises a step of automatically arranging the different contents DATA₁. The automatic arrangement may comprise, for example, ordering the different contents DATA₁ according to a given plan or structure. According to another example, a proposed ordering of the different contents DATA₁ makes it possible to organize the document D₁ with the data extracted from the different contents DATA₁.
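The text does not specify how the contents are matched to a plan. As one possible heuristic, assumed here purely for illustration, each content can be placed under the plan section whose keywords overlap most with the content's own keywords. The function name `arrange`, the plan format, and the sample data are hypothetical.

```python
def arrange(contents, plan):
    """Order contents by the position of their best-matching plan section.

    contents: list of (label, keywords) pairs, one per collected content
    plan:     ordered list of (section_title, section_keywords) pairs
    Contents that match no section are placed last, in their original order.
    """
    def section_index(keywords):
        best_index, best_overlap = len(plan), 0
        for index, (_, section_keywords) in enumerate(plan):
            overlap = len(set(keywords) & set(section_keywords))
            if overlap > best_overlap:
                best_index, best_overlap = index, overlap
        return best_index

    # sorted() is stable, so contents in the same section keep their order.
    return sorted(contents, key=lambda item: section_index(item[1]))


plan = [
    ("Background", {"history", "context"}),
    ("Method", {"algorithm", "model"}),
    ("Results", {"benchmark", "accuracy"}),
]
contents = [
    ("DATA_1", ["accuracy", "benchmark"]),
    ("DATA_2", ["history"]),
    ("DATA_3", ["cooking"]),
    ("DATA_4", ["model"]),
]
ordered = [label for label, _ in arrange(contents, plan)]
print(ordered)  # ['DATA_2', 'DATA_4', 'DATA_1', 'DATA_3']
```

The result can be presented to the user as a proposed ordering rather than applied directly, in line with the second example in the text.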

The figure illustrates an example of a system of the invention comprising an electronic terminal T₁ and a remote data server SERV₁ for storing the data produced by the different algorithms implemented by the method of the invention. The data network NET₁ is, for example, the Internet. The server SERV₂ is, for example, a server for executing the learning algorithms with predefined configurations. According to one example, the algorithms executed on the server SERV₂ deliver data by means of an API accessible via the data network NET₁.

Claims

Method for associating produced data (DATA₁) to a user identifier (ID_U1), said user being identified within a workspace comprising at least one memory allowing the recording of a first digital document (D₁) from a data resource displayed on a display of computer equipment, said method comprising:

Display (AFF₁) of a first set of data (ENS₁) accessible from at least a first uniform resource locator (URL₁) in a first window (F₁) of a browser of a data network (NET₁);
Actuation (ACT₁) of a digital command to extract a first subset of data (SSENS₁) displayed in the first window (F₁) and to record said first subset of data (SSENS₁) in a memory;
Execution (EXEC₁) of a first learning algorithm (ALGO₁) comprising a language model (MOD_LANG₁) and being pre-trained with a first set of training data (TR₁), said language model (MOD_LANG₁) comprising a statistical model which models the distribution of discrete symbol sequences in a natural language, to generate a first set of output data (ENS_A) from the at least one extracted subset of data (SSENS₁) and from a first data model (MOD₁) defining a first specific training domain (TR₂), said first specific training domain (TR₂) comprising a set of data defining inputs to the first algorithm (ALGO₁) and sets of desired outputs from the first algorithm (ALGO₁);
Execution (EXEC₂) of a second learning algorithm (ALGO₂) comprising a second language model (MOD_LANG₂) to generate a set of keywords defining a second set of outputs (ENS_B) from at least one extracted data subset (SSENS₁, SSENS₂);
Execution (EXEC₃) of a third learning algorithm (ALGO₃) comprising a third language model (MOD_LANG₃) to generate a set of textual queries forming inputs to a search engine (MR₁) comprising a set of resources indexed on a data network (NET₁);
Generation (GEN₁) of said queries within a first search engine (MR₁), recovery of a set of uniform resource locators (URL_i) returned by the search engine (MR₁) and processing of said resources to filter them according to a predefined criterion, said filtered resources defining a third output set (ENS_C);
Generation (GEN₂) of a second graphic window (F₂) superimposed on the first window (F₁), said second window (F₂) displaying at least one piece of data from each set of outputs (ENS_A, ENS_B, ENS_C, DATA₁) and a first digital actuator (B₁) making it possible to record the first uniform resource locator (URL₁) and said output data (ENS_A, ENS_B, ENS_C, DATA₁) produced in a memory, the actuation of the first actuator (B₁) leading to the creation of an association of output data (DATA₁) with a user identifier (ID_U1).

Method according to claim 1 characterized in that the actuation of the first actuator (B₁) results in the creation of an association of the output data (DATA₁) with an identifier of a memory space and/or with a predefined first digital document (D₁).

Method according to claim 1 characterized in that the first set of output data (ENS_A) is a summary or opinion in natural language of the extracted subset of data (SSENS₁) when the latter is a text in natural language.

Method according to claim 1 characterized in that it comprises processing the first subset of data (SSENS₁) to homogenize said data.

Method according to claim 1 characterized in that a second digital actuator (B₂) provides access to all the data of at least one set of output data (ENS_A, ENS_B, ENS_C).

Method according to claim 1 characterized in that it comprises the production of a set of data (MOD₂) as input to the third learning algorithm (ALGO₃) defining a specific training domain (TR₄), said third learning algorithm (ALGO₃) being pre-trained from a generic domain (TR₂), said specific training domain (TR₄) comprising a second data model (MOD₂) defining a second specific training domain (TR₄), said second specific training domain (TR₄) comprising texts in natural languages as input and queries in natural languages defining desired outputs of the third learning algorithm (ALGO₃).

Method according to claim 1 characterized in that the first learning algorithm (ALGO₁), the second learning algorithm (ALGO₂) and the third learning algorithm (ALGO₃) are GPT-3 type transformers, GPT designating "Generative Pre-trained Transformer".

Method according to claim 7 characterized in that the first learning algorithm (ALGO₁) has a dimension of 12288, the second learning algorithm (ALGO₂) has a dimension of 1024 and the third learning algorithm (ALGO₃) has a dimension of 4096.

Method according to claim 1 characterized in that the method comprises a configuration step aimed at defining a size of the output data of the first algorithm (ALGO₁), a maximum number of keywords output from the second algorithm (ALGO₂), a maximum number of queries output from the third algorithm (ALGO₃) and a maximum number of resource locators for each query produced by the third algorithm (ALGO₃).

Method according to claim 1 characterized in that it comprises a step aimed at associating a plurality of output data (DATA_i) with the same first digital document (D₁).

Method according to claim 10 characterized in that the association of all the output data (DATA₁) with the first digital document (D₁) comprises an indexing of said data (DATA₁) according to an ordered sequence of a plurality of output data (DATA_i) corresponding to other subsets of data (SSENS_i) of the same data set (ENS₁) or another data set (ENS_i).

System comprising at least one data server comprising a memory and calculation means making it possible to define a workspace in which at least one digital document being developed and user profile data are recorded, the system further comprising an electronic terminal provided with a display, a memory and a calculator, said system being configured to carry out the steps of the method of any one of claims 1 to 11.
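As a reading aid for the claimed sequence of steps, the sketch below chains three generation steps, a search step and a filtering step, with the limits of the configuration step applied to each output. It is an assumption-laden illustration only: the three learning algorithms and the search engine are replaced by plain stub functions, and the names `Limits` and `run_pipeline`, the default values, the character-based truncation, and the HTTPS filter criterion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical configuration values for the output sizes."""
    max_text_chars: int = 500       # size of the text from ALGO_1
    max_keywords: int = 5           # keywords from ALGO_2
    max_queries: int = 3            # queries from ALGO_3
    max_urls_per_query: int = 2     # locators kept per query


def run_pipeline(subset, summarize, extract_keywords, make_queries,
                 search, keep, limits):
    """Produce ENS_A, ENS_B and ENS_C from an extracted subset of data."""
    ens_a = summarize(subset)[: limits.max_text_chars]
    ens_b = extract_keywords(subset)[: limits.max_keywords]
    queries = make_queries(subset)[: limits.max_queries]

    ens_c = []
    for query in queries:
        # Filter the returned locators with the predefined criterion `keep`.
        hits = [url for url in search(query) if keep(url)]
        for url in hits[: limits.max_urls_per_query]:
            if url not in ens_c:        # avoid duplicates across queries
                ens_c.append(url)
    return {"ENS_A": ens_a, "ENS_B": ens_b, "ENS_C": ens_c}


# Stubs standing in for the learning algorithms and the search engine.
result = run_pipeline(
    subset="Transformers model text. They use attention.",
    summarize=lambda text: text.split(".")[0] + ".",
    extract_keywords=lambda text: ["transformers", "attention", "text"],
    make_queries=lambda text: ["what is attention", "transformer models"],
    search=lambda q: ["https://example.org/" + q.replace(" ", "-"),
                      "http://insecure.example/page"],
    keep=lambda url: url.startswith("https://"),
    limits=Limits(max_keywords=2, max_queries=1),
)
print(result)
```

Passing the algorithms in as callables keeps the sketch independent of any particular model or remote API; in the system described, these calls would presumably be served by the server executing the learning algorithms.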