WO2020169898A1

WO2020169898A1 - Method for providing relevant information associated with a patent

Info

Publication number: WO2020169898A1
Application number: PCT/FR2020/050219
Authority: WO
Inventors: Georges CORNUÉJOLS; Marc CORNUÉJOLS; Marine CORNUÉJOLS
Original assignee: Arkyan
Priority date: 2019-02-21
Filing date: 2020-02-07
Publication date: 2020-08-27

Abstract

The method (10) for providing relevant information associated with a patent, which includes a step (11) of entering a patent identifier, includes the following steps: - converting (12) the identifier into a first set of keywords representing a technical field, in a first database associating an identifier with a set of keywords; - creating (13) a request to collect statistical information, according to the keywords resulting from the conversion and, for each item of said information, a second set of keywords associated with said information - comparing (14) the sets of keywords, - assigning (15) a weight to each item of information collected according to the result of the comparison, - hierarchizing (16) the set of statistics according to the weight of each item of information, - selecting (17) a set of statistics from among the hierarchized sets of statistics and - providing (18) each statistic from the selected set of statistics.

Description

PROCESS FOR PROVIDING RELEVANT INFORMATION ASSOCIATED WITH A PATENT

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method for providing relevant information associated with a patent. It applies, in particular, to searching databases and to enriching queries in patent databases with national statistical data. In addition, the present invention is applicable to the automatic detection of user behavior in relation to patent extensions.

STATE OF THE ART

The compartmentalization of information of different types makes it difficult for a user to cross-reference different sources in order to increase his knowledge of a subject.

This compartmentalization creates a risk that the user fails to identify information from a first source when searching on the basis of information from a second source. In addition, there is a risk of identifying incorrect information, or information that is less relevant than it could be, because of the choices the user makes during the search. Finally, it obviously results in a waste of time for the user, who must analyze information from a first source to identify a search criterion for a second source, perform the search, and sort the results before selecting one.

The databases listing patents for inventions are not linked to databases of statistical information relating to countries, such as, for example, macroeconomic information, whenever this information is not directly linked to statistics dealing with patents. This compartmentalization of information makes it difficult to link patent-type information to other information that is nevertheless necessary when developing a strategic choice related to patents. One such choice is, for example, the geographical extension of a patent application under the right of priority or upon exit from a PCT application. Such a choice requires cross-referencing information representative of the market potential of the patent application with budgetary data.

STATEMENT OF THE INVENTION

The present invention aims to remedy all or part of these drawbacks.

To this end, the present invention relates to a method for providing relevant information associated with a patent comprising a step of entering at least one invention patent identifier, which comprises:

- a step of converting the identifier entered into a first set of keywords representative of at least one technical field of the patent, in a first database associating at least one identifier with a set of keywords;

- a step of creating a request for collecting information representative of a set of national statistics, accessible on a network, according to the keywords resulting from the conversion and, for each said information, a second set of keywords associated with said information,

- a step of comparing the first set of keywords and each second set,

- a step of assigning a weight to each piece of information representative of the set of national statistics collected according to the result of the comparison,

- a step of prioritizing the set of national statistics according to the weight of each said information,

- a step of selecting at least one set of national statistics from among the sets of prioritized national statistics and

- a step of supplying each said national statistic of the selected set of national statistics.

Thanks to these provisions, a query in a patent database is enriched, automatically and reliably, with corresponding national, supranational or subnational statistics, based on keywords associated with the patent, so that a user can quickly receive reliable information. Such information makes it possible, in particular, to make an informed choice about the international extension of a patent application, for example.

In some embodiments, the method that is the subject of the present invention comprises, upstream of the input step, a step of constructing the first database which comprises:

- a step of collecting and indexing public identifiers of at least one patent and

- a step of associating, with each patent whose public data is indexed, at least one piece of information representative of at least one keyword.

These embodiments create a new search database using publicly available data associated with keywords relevant to the technical field of the invention.

In some embodiments, the information is representative of at least one title of a technical class associated with a set of keywords.

These embodiments make it possible to deduce from an international or cooperative class of patents keywords which technically represent the field of application of the sought patent. Such keywords can be associated via a correspondence table, for example.

In some embodiments, the information is representative of at least part of the content of the patent.

Thanks to these provisions, the keywords are deduced from the content of the patent to correspond more precisely to the patent concerned. The content here may refer to the description, claims or abstract, for example.

In some embodiments, the step of constructing the first database comprises a step of creating a tree structure linking at least two patent identifiers, the information representative of at least one keyword being associated according to each identifier linked to the identifier entered during the entry step.

These embodiments make it possible to link patents by patent family, for example according to a priority claimed under the Paris Convention. Patent families can therefore be created, the information held in the database for one patent being able to enrich a request concerning another patent of the same family.

In embodiments, the method that is the subject of the present invention comprises a step of extrapolating a national statistic from a set of national statistics having a missing national statistic, each value obtained by extrapolation being associated with the second set of keywords of the information representative of the set of national statistics to which the extrapolated value belongs.

These embodiments make it possible to extrapolate a value when one or more national statistics are missing from a set of national statistics.

In some embodiments, the method that is the subject of the present invention comprises a step of constructing the second database comprising:

- a step of collecting and indexing sets of national statistics,

- a step of associating, with each set of national statistics indexed, at least one piece of information representative of at least one keyword.

These embodiments make it possible to build a database of disparate national statistics.

In some embodiments, the sets of national statistics are organized in at least one tree structure comprising at least two levels, a so-called “higher” level to which is attached a so-called “lower” level comprising several sets of national statistics, the set of national statistics of the higher level being equal to the sum of the sets of values of the lower level.

These embodiments make it possible to match a patent classification with a classification of national statistical data.

In some embodiments, in the extrapolation step, a national statistic is extrapolated based on the set of higher level national statistics.

Thanks to these provisions, the extrapolation of national statistics is consistent with other sets of national statistics in the same tree structure.
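As a minimal sketch of such an extrapolation, assuming that the higher-level value equals the sum of the lower-level values and that exactly one lower-level value is missing, the missing value can be recovered by subtraction. The function name and the sample figures are hypothetical and are not taken from the application.

```python
from typing import Dict, Optional


def extrapolate_missing(parent_value: float,
                        children: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Fill a single missing lower-level value so that the lower level
    sums to the higher-level value.

    Assumption of this sketch: exactly one lower-level value is missing.
    """
    missing = [name for name, value in children.items() if value is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    known_total = sum(value for value in children.values() if value is not None)
    filled = dict(children)
    filled[missing[0]] = parent_value - known_total
    return filled


# Hypothetical figures for a two-level tree of fruit production values.
result = extrapolate_missing(
    100.0,
    {"greenhouse fruits": 30.0, "other fruit productions": None},
)
```

Because the value is derived from the higher level, the filled-in lower level remains consistent with the rest of the tree.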

In some embodiments, the sets of national statistics are organized in two distinct trees, with a national statistic from one tree being extrapolated based on a set of national statistics from another tree.

These embodiments allow extrapolating from other sets of related national statistics, but in two separate trees.

In embodiments, the first database comprises, for each patent, an identification of at least one holder and a code representative of the geographical scope, the method further comprising:

- a step of associating at least one other patent from the first database corresponding to the first set of keywords of the identifier entered and / or to the holder of the title,

- a step of counting the number of occurrences of each code representative of the geographical scope across each other patent associated with the patent whose identifier is entered,

- a step of weighting each national statistic of the selected set of national statistics according to the number of occurrences,

- during the supply step, the set of national statistics is ranked by decreasing weighting.

Thanks to these provisions, the behavior of managers in similar situations is modeled and makes it possible to deduce, for the manager in charge of the industrial property title studied, an appropriate strategy based on the behavior observed to limit costs and optimize the decisions taken.
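The counting and weighting steps above can be sketched as follows. This is an illustrative reading in which the weighting of a national statistic is simply the number of associated patents covering the same geographical code; the function name, the country codes and the figures are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_by_scope(associated_scopes: List[str],
                  statistics: Dict[str, float]) -> List[Tuple[str, float, int]]:
    """Return (code, value, occurrences) triples sorted by decreasing
    number of occurrences of the geographical code among associated patents."""
    occurrences = Counter(associated_scopes)
    weighted = [(code, value, occurrences.get(code, 0))
                for code, value in statistics.items()]
    return sorted(weighted, key=lambda item: item[2], reverse=True)


# Hypothetical scope codes read from the associated patents.
scopes = ["US", "EP", "US", "JP", "US", "EP"]
# Hypothetical national statistic, one value per geographical code.
stats = {"EP": 120.0, "JP": 52.0, "US": 310.0, "BR": 41.0}
ranked = rank_by_scope(scopes, stats)
```

A code that never appears among the associated patents keeps a weighting of zero and therefore ends up last.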

In embodiments, each national statistic is associated with a geographic area of scope, the method further comprising:

- a step of defining a predetermined limit value,

- a step of classifying each national statistic from the set of national statistics selected by geographic area and by decreasing value,

- an iterative step of calculating the sum of each national statistic, in order of classification, until the value of the sum is greater than or equal to the predetermined limit value, and

- during the supply step, the national statistics provided are the national statistics used in the last iteration of the calculation step.

These embodiments make it possible to enrich the query with geographic areas.
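The iterative summation can be sketched as below: values are sorted in decreasing order and accumulated until the running sum reaches the limit value, and the areas used up to that point are the ones supplied. The function name and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def select_until_limit(values_by_area: Dict[str, float],
                       limit: float) -> List[Tuple[str, float]]:
    """Sort areas by decreasing value and keep adding them until the
    running sum is greater than or equal to the limit value."""
    ordered = sorted(values_by_area.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    total = 0.0
    for area, value in ordered:
        selected.append((area, value))
        total += value
        if total >= limit:
            break
    return selected


# Hypothetical market values per geographic area and a limit of 80.
market = {"DE": 40.0, "FR": 25.0, "IT": 20.0, "ES": 10.0, "PT": 5.0}
kept = select_until_limit(market, 80.0)
```

If the limit is never reached, this sketch returns every area; the application does not say how that case should be handled.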

BRIEF DESCRIPTION OF THE FIGURES

Other advantages, aims and particular characteristics of the invention will emerge from the non-limiting description which follows of at least one particular embodiment of the method which is the subject of the present invention, with reference to the accompanying drawings, in which:

- Figure 1 shows, schematically and in the form of a flowchart, a succession of particular steps of a first embodiment of the method object of the present invention,

- Figure 2 shows, schematically and in the form of a flowchart, a succession of particular steps of a second method embodiment of the present invention,

- Figure 3 shows, schematically and in the form of a flowchart, a succession of particular steps of a third method embodiment of the present invention,

- Figure 4 shows, schematically and in the form of a flowchart, a succession of particular steps of a fourth method embodiment of the present invention,

- Figure 5 shows, schematically, a graphic representation of the objects handled by the method which is the subject of the present invention.

DESCRIPTION OF EXAMPLES OF EMBODIMENT OF THE INVENTION

This description is given without limitation, each characteristic of an embodiment being able to be combined with any other characteristic of any other embodiment in an advantageous manner.

We now note that the figures are not to scale.

It is noted here that the term "patent" designates a patent application, a granted patent, a utility certificate or any other industrial property title classified according to one of the technical patent classifications, including in particular the IPC ("International Patent Classification") and the CPC ("Cooperative Patent Classification").

It should be noted that the term “national statistics” refers to any data characterizing a country or a group of countries. This data can be representative of an economic activity, a political or legal state or representative of the population of a country for example. These examples are given without limitation. By extension, a set of national statistics is a set of numerical values representing the same entity, over several time periods and / or geographic areas. Note that a time period is a duration of predefined length, for example a year or a month. We also note that a geographical area is defined by the demarcation of the borders of a country or a group of countries.

"Machine learning" is any statistical approach technology that allows a computer program to improve its performance in solving a task without this increase in performance being the result of explicit programming. The machine learning considered can implement any algorithm known to those skilled in the art to achieve the objective for which it is applied.

It is noted that the first and the second computer terminal may be one and the same.

FIG. 1 shows a flowchart of steps of a first particular embodiment of the method 10 which is the subject of the present invention. The method 10 of providing relevant information associated with a patent which comprises a step 11 of entering at least one invention patent identifier, comprises:

- a conversion step 12 of the identifier entered into a first set of keywords representative of at least one technical field of the patent, in a first database associating at least one identifier with a set of keywords;

- a step 13 of creating a request for collecting information representative of a set of national statistics, accessible on a network, as a function of the keywords resulting from the conversion and, for each said information, a second set of keywords associated with said information,

- a comparison step 14 of the first set of keywords and of each second set,

- a step 15 of assigning a weight to each piece of information representative of the set of national statistics collected according to the result of the comparison,

- a step 16 of prioritizing the set of national statistics according to the weight of each said information,

- a selection step 17 of at least one set of national statistics from among the sets of prioritized national statistics and

- a step 18 of providing each said national statistic of the selected set of national statistics.

The input step 11 is carried out by means of a first computer terminal comprising a man / machine interface, for example a keyboard or a touch screen. During the entry step 11, an operator inserts into an entry field a series of alphanumeric characters representing an identifier of a patent. An operator enters, for example, a patent application or publication number. For example, the patent identifier can be a code consisting of two letters representative of a geographical area and a set of numerical characters representative of an application number.

In some embodiments, the identifier corresponds to the publication number, filing number or grant number of a patent or patent application. Preferably, each industrial property title in the first database comprises a field representative of the geographical area, country or region, covered by the title. The form of a patent identifier is known to those skilled in the art. The input step 11 may include a step of displaying a predictive input (otherwise called "intuitive input") known to those skilled in the art.
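For illustration, an identifier of the form described above (two letters for the geographical area followed by digits) could be split as in the sketch below. The optional trailing kind code such as "A1", the separators that are stripped, and all names in the code are assumptions of this sketch and are not specified by the application.

```python
import re
from typing import NamedTuple, Optional

# Two letters, then digits, then an optional kind code (assumed here).
_IDENTIFIER = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


class PatentIdentifier(NamedTuple):
    area: str             # two-letter geographical code
    number: str           # numerical part of the identifier
    kind: Optional[str]   # optional kind code, None when absent


def parse_identifier(raw: str) -> PatentIdentifier:
    """Normalize an entered identifier and split it into its parts."""
    cleaned = re.sub(r"[\s/,.-]", "", raw.upper())
    match = _IDENTIFIER.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized patent identifier: {raw!r}")
    return PatentIdentifier(match.group(1), match.group(2), match.group(3))


parsed = parse_identifier("WO 2020/169898 A1")
```

Normalizing the entry before matching lets the same function accept identifiers typed with or without spaces and slashes.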

During conversion step 12, the identifier entered is associated with at least one keyword in the first database. The construction of the first database is described with reference to the flowchart in Figure 2.

The construction step 21 of the first database comprises:

- a step of collecting 22 and indexing public identifiers of at least one patent and

- an association step 23, for each patent whose public data is indexed, at least one piece of information representative of at least one keyword.

The collection step 22 is carried out, for example, by the implementation of an electronic calculation circuit, of the computer or computer server type, configured to receive public patent identifiers via a means of communication, such as a network card for example.

In the collection and indexing step 22, several public patent databases are collected and assembled. Each patent is defined by computer data representative of a unique identifier.

In some embodiments, the method that is the subject of the present invention comprises a plurality of data collection steps 22. Such a plurality of collection steps 22 may correspond either to the successive collection of an identical type of data and / or to the collection of several different types of data.

Preferably, each patent is associated with at least one of the following characteristics:

- computer data representative of a holder of the title, called the "holder",

- computer data representative of at least one technical class, called the "class",

- an agent,

- a priority number,

- a priority date,

- the identification of at least one inventor,

- a filing date,

- a filing number,

- a date of publication,

- a publication number,

- a title,

- at least one document cited in the patent examination procedure,

- an abstract,

- a description,

- at least one claim and / or

- any other information known to those skilled in the art and present in at least one industrial property title database.

The computer data representative of a holder may represent a name, an address, a telephone number, a legal form, a unique identification number issued by a government administration, and / or any other information representative of a company or of an individual.

Each computer data item representative of at least one technical class may correspond to an element of a patent classification such as the "Cooperative Patent Classification" (acronym "CPC") or the "International Patent Classification" (acronym "IPC"). Each code representative of a technical field in the CPC and the IPC comprises at least, in the following order:

- a letter representing a section,

- two digits representative of a class,

- a second letter representative of a subclass,

- at least one digit representative of a group.

In some embodiments, each code further comprises at least two other digits, separated from the digits representative of the group by the symbol "/". Said two other digits represent a subgroup. For example, the alphanumeric string of a data item representative of a technical class would be "A21C 1/06".
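The structure of a classification code described above (section letter, two class digits, subclass letter, group digits and an optional subgroup after "/") can be checked with a regular expression. The sketch below is illustrative only; the names are hypothetical and the pattern follows the description rather than any official validation rule.

```python
import re
from typing import NamedTuple, Optional

# section letter, two class digits, subclass letter, group, optional subgroup
_CODE = re.compile(r"^([A-Z])(\d{2})([A-Z])(\d+)(?:/(\d{2,}))?$")


class ClassCode(NamedTuple):
    section: str
    class_digits: str
    subclass: str
    group: str
    subgroup: Optional[str]


def parse_class_code(raw: str) -> ClassCode:
    """Split a classification code into its components, ignoring spaces."""
    compact = "".join(raw.upper().split())
    match = _CODE.match(compact)
    if match is None:
        raise ValueError(f"unrecognized classification code: {raw!r}")
    return ClassCode(*match.groups())


code = parse_class_code("A21C 1/06")
```

Keeping the components separate makes it straightforward to move up the classification, for example from a subgroup to its group or subclass.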

The association step 23 is carried out, for example, by the implementation of an electronic calculation circuit, of the computer or computer server type, configured to directly or indirectly associate at least one keyword with a patent identifier. An indirect association is achieved, for example, by associating a technical class indicator with a set of keywords via a correspondence table.

In some embodiments, during the association step 23, the information is representative of at least one title of a technical class associated with a set of keywords. For example, the technical class is associated in another database with a title. Keywords can be extracted from said title.
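A simple way to extract keywords from a class title is to split the title into words and drop common function words. The sketch below assumes a small, hypothetical English stop-word list; the application leaves the extraction method to the skilled person.

```python
import re
from typing import List

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to"}


def title_keywords(title: str) -> List[str]:
    """Lower-case the title, split it into words, and keep each
    non-stop-word once, in order of first appearance."""
    seen = set()
    keywords = []
    for word in re.findall(r"[a-z]+", title.lower()):
        if word not in STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


keywords = title_keywords("Techniques of cultivation in orchards")
```

Stored in a correspondence table keyed by class code, such keyword lists give the indirect association between a patent identifier and its keywords.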

In some embodiments, the information is representative of at least part of the content of the patent. For example, the information is representative of at least part of a claim, abstract, title or description.

In embodiments, the information is representative of at least one activity of at least one holder. For example, each holder is associated with an economic field of activity in another database. This field of activity can be represented by a code according to the French nomenclature of activities (acronym "NAF") or any equivalent in other countries.

In some embodiments, the information is representative of a combination of the above information.

In some embodiments, the construction step 20 comprises a step 24 of creating a tree structure connecting at least two patent identifiers, the information representative of at least one keyword being associated according to each identifier linked to the identifier entered during the entry step. The creation step 24 is carried out, for example, by the implementation of an electronic calculation circuit, of the computer or computer server type, configured to associate several patent identifiers in a database according to a determined rule.

In embodiments, one or more patents may be cited during the grant procedure of the patent whose identifier is entered, which links said patents to the patent whose identifier has been entered. Each of said patents is identified by a unique identifier associated with information representative of at least one keyword. In embodiments, two patent identifiers are linked by at least one common priority claim, or by a common owner, for example.
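Linking identifiers through a common priority claim amounts to grouping patents into connected sets. A minimal sketch using a union-find structure is shown below; the identifiers and priority numbers are hypothetical, and the application does not prescribe this particular data structure.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_families(priorities: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group patent identifiers that share at least one priority number,
    directly or through a chain of other patents."""
    parent = {pid: pid for pid in priorities}

    def find(pid: str) -> str:
        # Follow parent links to the representative, shortening the path.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    first_holder: Dict[str, str] = {}
    for pid, numbers in priorities.items():
        for number in numbers:
            if number in first_holder:
                parent[find(pid)] = find(first_holder[number])
            else:
                first_holder[number] = pid

    families = defaultdict(set)
    for pid in priorities:
        families[find(pid)].add(pid)
    return list(families.values())


# Hypothetical identifiers mapped to the priority numbers they claim.
families = build_families({
    "FR1": {"P1"}, "EP2": {"P1"}, "US3": {"P1", "P2"},
    "JP4": {"P2"}, "DE5": {"P9"},
})
```

The same grouping could be driven by a common owner or by citations instead of priorities, by changing what the dictionary values contain.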

In particular embodiments, such as that shown in FIG. 2, the method 20 comprises, upstream of the conversion step 12, a step of reading a data item representative of a content of the patent, the step of conversion 12 being performed as a function of the data read.

Such a reading step can be carried out, for example, by implementing an electronic calculation circuit configured to execute a computer program for reading documents.

This representative data is, for example, text representative of the abstract of the patent in a database, such as a patent register. This textual content is then, for example, added to the first set of keywords, so that the subsequent calculation is tied to the patent itself, which places the class in the context of the application covered by the patent.

During the conversion step 12, each piece of information representative of at least one keyword is obtained by consulting the first database. Then, at least one keyword associated with the information is obtained. Preferably, each keyword obtained is integrated into the first set of keywords. The conversion step 12 is performed, for example, by the computer server querying a database associating, for at least one patent identifier, at least one technical class. This database is, for example, an online patent register, such as the French patent register kept by the INPI (the French "National Institute of Industrial Property"), the European patent register kept by the EPO ("European Patent Office") or the United States patent register kept by the USPTO ("United States Patent and Trademark Office"). This database is preferably the DocDB database maintained by the EPO.

In some embodiments, the converting step includes a step of removing duplicates from the first set of keywords.

In embodiments, at least one keyword for which the representative information is at least one of a part of a claim, title, abstract and / or description is extracted from said element.

In embodiments, at least one keyword for which the representative information is at least one technical class is taken from the title of the technical class.

In some embodiments, at least one keyword is associated with a relevance value, this relevance value being calculated by semantic and / or statistical analysis of the candidate keywords for conversion.

In embodiments in which a tree structure is created, the set of keywords at least partially comprises the keywords of a patent whose identifier is linked to the patent whose identifier is entered. Keywords are thus accumulated for greater precision during the search.
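Accumulating the keywords of linked patents and removing duplicates can be sketched as follows. The dictionaries standing in for the first database and the function name are hypothetical.

```python
from typing import Dict, List


def accumulate_keywords(entered_id: str,
                        links: Dict[str, List[str]],
                        keywords: Dict[str, List[str]]) -> List[str]:
    """Merge the keywords of the entered patent with those of every
    linked patent, dropping duplicates while keeping first-seen order."""
    merged = list(keywords.get(entered_id, []))
    for linked_id in links.get(entered_id, []):
        merged.extend(keywords.get(linked_id, []))
    # dict.fromkeys keeps only the first occurrence of each keyword.
    return list(dict.fromkeys(merged))


# Hypothetical contents of the first database.
db_keywords = {"A": ["fruit", "greenhouse"], "B": ["greenhouse", "irrigation"]}
db_links = {"A": ["B"]}
first_set = accumulate_keywords("A", db_links, db_keywords)
```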

The method of extracting keywords is known to those skilled in the art.

The method 10 comprises a step 13 of creating a request for collecting information representative of a set of national statistics, accessible on a network, as a function of the keywords resulting from the conversion and, for each said item of information, a second set of keywords associated with said information.

A request to collect information is also known as a "search request". In step 13 of creating a request, the first set of keywords is used by an information search engine in the second database. A request is a set of words associated with logical operators such as “AND”, “OR”, or even “AND / OR”. The search engine searches the data at its disposal to find the words according to the defined logical operators.

Preferably, during the creation step 13, the set of words is associated with an “AND / OR” operator, that is to say that the words of the set can appear together or individually in a search field.
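One possible reading of the “AND / OR” operator is that an entry matches as soon as it shares at least one keyword with the request, whether or not the other keywords are present. The sketch below follows that reading; the in-memory dictionary standing in for the second database and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def search(query_keywords: List[str],
           database: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return every entry sharing at least one keyword with the request,
    together with the sorted list of keywords it has in common."""
    query = set(query_keywords)
    hits = []
    for label, keywords in database.items():
        common = query & set(keywords)
        if common:
            hits.append((label, sorted(common)))
    return hits


# Hypothetical second database: label of a statistic -> its keywords.
second_db = {
    "greenhouse fruits": ["greenhouse", "fruits"],
    "number of stethoscopes sold in Europe": ["stethoscopes", "sold", "europe"],
    "agricultural production value of fruits":
        ["agricultural", "production", "value", "fruits"],
}
hits = search(["greenhouse", "cultivation", "fruits"], second_db)
```

Each hit carries its second set of keywords, which is what the comparison step then measures against the first set.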

The second database is detailed with regard to the construction step 31 shown in Figure 3. The construction step 31 of the second database comprises:

- a step of collecting and indexing 32 sets of national statistics,

- a step 33 of associating, with each set of national statistics indexed, at least one piece of information representative of at least one keyword.

Step 32 of collecting and indexing sets of national statistics is performed, for example, by a human operator consulting an open database of national statistical data and recording all or part of the accessible data on a computer server or on a computer terminal, such as a computer. This collection step 32 can also be automatic and carried out by connecting a computer server or collection computer to an open database. Alternatively, national statistical data can be entered by a human operator, via a man-machine interface, on a computer server or collection computer.

Each item of national statistical data thus collected comprises, at a minimum, a set of keywords associated with a numerical value. For example, one item of national statistical data collected is “number of stethoscopes sold in Europe”, associated with the number 175,372.

The comparison step 14 is carried out, for example, by the computer implementation of an algorithm for measuring lexical and / or semantic proximity between two sets of keywords. Such an algorithm is, for example, an algorithm for measuring a distance such as:

- the Hamming distance,

- the Levenshtein distance (the edit distance),

- the Damerau-Levenshtein distance (the edit distance with transposition) and / or

- the Jaro-Winkler distance.

These examples of distances are well known in the prior art and, in the field of text mining, aim to evaluate the distance which separates two text strings, such as the keywords of a class or of an item of national statistical data, for example. This measurement of distance or, conversely, of a similarity index is performed for at least one pair formed of a set, or subset, of keywords of a class and of a set, or subset, of keywords of an item of national statistical data.
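As an illustration of one of the distances listed, the Levenshtein distance can be computed with the classic dynamic-programming recurrence. The normalization into a similarity index between 0 and 1 is an assumption of this sketch and is not defined by the application.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity index in [0, 1]: 1.0 for identical strings
    (normalization by the longer length is an assumed choice)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Applied to each pair of keywords, or to the concatenated keyword strings, such an index gives the value that the weighting step can then use.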

The classes and sets of national statistics can belong to a tree structure, called a classification, made up of nodes and branches. Two such trees are illustrated in figure 5:

- a technical classification 505 on the one hand,

- a classification 510 of national statistical data on the other hand.

The technical classification 505 is made up of two levels of nodes, 540 and 515, with node 540 thematically encapsulating node 515. For example, node 540 corresponds to agricultural techniques, and node 515 to fruit growing.

Two terminal classes 520 are also represented and attached to node 515. These classes correspond to the ends of the tree structure and are, for example, "orchard cultivation techniques" and "greenhouse cultivation techniques". In this document, both a node and an end of the classification are called “class”.

The national statistical data classification 510 is formed of two levels of nodes, 525 and 530, with node 525 thematically encapsulating node 530. For example, node 525 corresponds to the agricultural production values of fruits and vegetables, and node 530 to the agricultural production value of fruits.

Two terminal classes 535 are also represented and attached to node 530. These classes correspond to the ends of the tree structure and are, for example, "greenhouse fruits" and "other fruit productions". In this document, both a node and an end of such a classification of national statistical data are referred to as "national statistical data".

When at least one of the classes or the national statistical data is classified according to a determined tree structure, the comparison step 14 can be modified as follows: the similarity between class and national statistical data pairs is measured at different levels of at least one tree structure, so as to rank the national statistical data according to the level for which the similarity index is maximum.

Thus, preferably, this index is measured starting either from the highest level or the lowest level of a classification and going towards the other end.

In the example of Figure 5, a similarity index is first measured between node 540 and node 525, then between node 540 and node 530, then between node 540 and each terminal data item 535. The same cycle begins again with the children of node 540, and so on, so as to produce a matrix of similarity indices.

This matrix makes it possible to identify, for a given class, the most relevant national statistical data, whatever the level in the tree structure of this national statistical data. Preferably, during the calculations associated with a node, all or part of the keywords of the children of said node are included in the keywords of the node.
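A matrix of similarity indices between classes and national statistical data can be sketched as below. For brevity this sketch uses the Jaccard index on keyword sets rather than one of the edit distances named earlier, and it does not model the inclusion of children's keywords; the labels reuse the Figure 5 examples, while the keyword sets and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of keywords common to both sets among all their keywords."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(classes: Dict[str, Set[str]],
                      statistics: Dict[str, Set[str]]
                      ) -> Dict[Tuple[str, str], float]:
    """Similarity index for every (class, national statistical data) pair,
    whatever the level of each element in its tree."""
    return {(c, s): jaccard(c_words, s_words)
            for c, c_words in classes.items()
            for s, s_words in statistics.items()}


def most_similar(class_name: str,
                 matrix: Dict[Tuple[str, str], float]) -> str:
    """National statistical data with the highest index for a class."""
    candidates = {s: v for (c, s), v in matrix.items() if c == class_name}
    return max(candidates, key=candidates.get)


classes = {
    "fruit growing": {"fruit", "growing"},
    "greenhouse cultivation techniques": {"greenhouse", "cultivation", "techniques"},
}
statistics = {
    "agricultural production value of fruits":
        {"agricultural", "production", "value", "fruits"},
    "greenhouse fruits": {"greenhouse", "fruits"},
    "other fruit productions": {"other", "fruit", "productions"},
}
matrix = similarity_matrix(classes, statistics)
best = most_similar("greenhouse cultivation techniques", matrix)
```

A purely lexical index like this one only finds literal keyword overlaps; matching classes to statistics that use different vocabulary would need the semantic proximity measure the description also mentions.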

In the example of Figure 5, for the “greenhouse cultivation techniques” class, the most similar national statistical data is “greenhouse fruits”. For the “orchard cultivation techniques” class, the most similar national statistical data is, in this example, “agricultural production value of fruits”.

Preferably, the lower a national statistical data associated with a class is in the economic tree, the more its similarity index is increased.

In variants, during a selection step, a user may request, via a man-machine interface, to go up to a higher technical classification level or to go down to a lower technical classification level. In this case, the national statistical data are adapted to correspond to the classification level targeted by the user.

When no similarity index has been calculated for a given pair, the similarity index is determined to be minimal, for example.

The method 10 includes a step 15 of assigning a weight to each piece of information representative of the set of national statistics collected as a function of the result of the comparison. The assigned weight is representative of at least one of the following elements, or of a combination of them:

- the similarity measurement carried out, if applicable,

- the number of keywords common to the two sets,

- the classification level,

- the completeness of the set of national statistics,

- the source of national statistical data,

- a relevance or non-relevance factor given by the user by means of a man-machine interface.
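One simple way to combine several of these elements into a single weight is a weighted sum. The application does not specify how the elements are combined, so the formula, the coefficients and every name below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Evidence:
    similarity: float      # similarity index, assumed to lie in [0, 1]
    common_keywords: int   # number of keywords common to the two sets
    level: int             # depth in the classification, 0 for the root
    completeness: float    # share of non-missing values, in [0, 1]


def weight(evidence: Evidence,
           coefficients: Tuple[float, float, float, float] = (0.5, 0.1, 0.1, 0.3)
           ) -> float:
    """Weighted sum of the elements (assumed combination rule)."""
    a, b, c, d = coefficients
    return (a * evidence.similarity
            + b * evidence.common_keywords
            + c * evidence.level
            + d * evidence.completeness)


example_weight = weight(Evidence(similarity=0.5, common_keywords=2,
                                 level=1, completeness=1.0))
```

A positive coefficient on the level rewards statistics located lower in the tree; a user-supplied relevance factor or a source rating could be added as further terms.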

In particular embodiments, such as that represented in FIG. 1, the step of weighting a set of national statistics is carried out at least partially as a function of data representative of a nationality of an applicant of the patent. This representative data can be read, for example, from an online database, such as a patent register.

The method 10 includes a prioritization step 16 of the set of national statistics as a function of the weight of each said item of information. The prioritization step 16 is performed, for example, by sorting national statistics in relation to a given class as an increasing or decreasing function of the weight assigned. This sorting is carried out, for example, by a computer program. This prioritization step 16 aims to facilitate the selection by a human user or by an automatic selection process carried out, for example, by an artificial intelligence.
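The sorting performed in step 16 can be illustrated with a minimal sketch. It assumes that each national statistic is held as a (label, weight) pair; the function name, labels and weights below are hypothetical and are not taken from the invention.

```python
def prioritize(weighted_stats, descending=True):
    """Sort (label, weight) pairs by weight; ties keep their input order."""
    return sorted(weighted_stats, key=lambda pair: pair[1], reverse=descending)

# Hypothetical weights for the class "greenhouse cultivation techniques".
stats = [
    ("agricultural production values of fruits", 0.4),
    ("greenhouse fruits", 0.9),
    ("cereals", 0.1),
]
ranked = prioritize(stats)
```

In this sketch the item with the highest weight comes first, which corresponds to the decreasing-order variant of the sort.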

In particular embodiments, such as that shown in FIG. 1, the step 16 of prioritizing sets of national statistics is carried out, when at least two classes are read, as a function of at least one class associated with at least one national statistical data item.

In variants, the computer server associates with at least one class an indicator representative of an application or technological character, the prioritization 16 being carried out so that the national statistical data associated with an application class are placed hierarchically higher.

The method 10 includes a step 17 of selecting at least one set of national statistics from among the sets of hierarchical national statistics.

The selection step 17 is carried out, for example, by a human operator selecting, via a man-machine interface and on a computer terminal, at least one national statistical data item associated with at least one class. Alternatively, this selection step 17 is carried out automatically as soon as a pair has a weight greater than a limit value that is either predetermined or determined and learned by an artificial intelligence. Such learning is described with regard to an optional learning step described below.

In some embodiments, the method 10 comprises a recording step, implemented, for example, by a computer server connected to the terminal used by the human operator to record each selection made by the operator so as to gradually constitute a database of links between classes and national statistical data. This registration is intended to facilitate the selection made during a second selection stage.

In some embodiments, a connection step is performed, for example, by implementing a data network, such as the Internet or a cellular network, between the second computer terminal and the computer server recording the associations between classes and national statistical data.

The second selection step is performed, for example, by a human operator selecting, via a man-machine interface and on a computer terminal, at least one association among the registered associations. Alternatively, this selection step is performed automatically based on user behavior learned by artificial intelligence. Such learning is described with regard to an optional learning step described below.

As a result of this process 10, a patent has been associated with a set of national statistics, such association being provided, for example, displayed or used in a subsequent process.

In particular embodiments, the method 10 comprises a step of registering at least one association selected during the second selection step on a computer server.

This registration step is performed, for example, in a similar manner to the registration step, via the connection of the computer server to the second terminal. Preferably, this association is entered in a user profile of the user who performed the second selection step.

In these variants, the step of assigning a weight can be adapted as a function of a user's context, or of an indicator of relevance or non-relevance, and can reduce the field of associations applicable to the user. Thus, the associations recorded during the weight assignment step 17 can be contextually neutral while the associations recorded during the enrollment step take into account the user's own context.

For example, a technical class of “electronic control means” can be associated with both a national statistical datum of “automatic door controls” and a national statistical datum of “automatic gas valve controls”. These two associations may have been selected during the first selection step 17 because they target many electrical control means. During the second selection step, the user, manager of a patent portfolio in the gas field, selects only the automatic controls of gas valves so as to view national statistical data for his use of patents, thus modifying the weight assigned to the sets of national statistics. In embodiments, at least one selection step 17 is performed by a machine learning algorithm.

In particular embodiments, such as that shown in FIG. 1, the method 10 comprises, downstream from the registration step, a step 150 of learning a selection behavior during the second selection step, the machine learning algorithm selecting an association based on at least one learned behavior.

The learning step 150 is performed, for example, in a probabilistic manner by successively reading the selections made by a user to determine user behavior with respect to the first and second sets of keywords. The following description assumes that a first set is defined by the title of a class and that a second set is defined by the title of a set of national statistics. When a class - national statistics set association frequency is less than a determined cutoff frequency, the class - national statistics set association is either withdrawn for that user or pre-selected. When an association frequency is low for a determined number of distinct users, the weight assigned between the class and the set of national statistics can be reduced so that during the selection step, this association is prioritized at a lower level or that, if the selection is automated, this association is deleted.

In deeper learning models, the machine learning algorithm is designed to detect, in a set of keywords not selected by a user against a set of national statistics, the composition of that set so as to establish a rule according to the successive behaviors of the user.

The supply step 18 is carried out, for example, by the implementation of a display on a screen of the second terminal.

Particular steps and particular embodiments of the step of building the second database are described below.

In some embodiments, the method that is the subject of the present invention comprises, downstream of the collection step 32, a step of normalizing at least part of the data collected.

The problem to be solved is the following: a collected data set can include data classified according to several distinct systematic approaches (of taxonomy type for example), making the data inconsistent. In such a set, each data item includes both a value and an indicator of the classification system used. However, the use of several systematic approaches results in an inconsistency between data that nevertheless represent the same object or concept.

An illustration of this problem is the classification of a monetary value of footwear production changing from year to year. In this illustration, this monetary value can be positioned in a class named "A" of a first system for a given year and in a class named "B" of a second system for another year. However, for the first year determined, class “B” corresponds in the first system used to the monetary value of soap production, resulting in an inconsistency in the data collected.

This illustration can be schematized as follows, for data representative of a single object and / or concept: [Table 1]

Thus, during the normalization step, a set of collected data is translated according to a unified systematic approach. This normalization step is performed, for example, by an electronic computing device configured to assign a value to each entry of a database comprising the set of data, said value being associated with a single indicator of the classification system used.

The classification system used can be determined by a user, via a man-machine interface. For example, the user chooses a classification system from a list of all classification systems for which an indicator is detected among the data in the dataset.

The classification system used can be determined automatically, for example, by the electronic computing device configured to statistically determine a number of data associated with each classification system and to select the classification system whose number of occurrences is the highest. Alternatively, this statistical approach can be carried out on a sample of data corresponding to states of the object and / or concept between two determined dates, for example two years before the date of execution of the harmonization step.

Alternatively, the electronic computing device is configured to statistically determine a number of data associated with each classification system for data corresponding to the last state of the object and / or concept that these data represent and to select the classification system with the highest number of occurrences.
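As a minimal sketch of this automatic determination, assuming that each collected entry is a (value, classification system indicator) pair, the classification system with the highest number of occurrences can be found by simple counting. The function name and the sample entries are hypothetical.

```python
from collections import Counter

def most_frequent_system(entries):
    """Return the classification-system indicator carried by the most entries."""
    counts = Counter(system for _value, system in entries)
    return counts.most_common(1)[0][0]

# Hypothetical entries: (value, classification-system indicator).
entries = [(100, "system 1"), (120, "system 2"), (130, "system 2")]
chosen = most_frequent_system(entries)
```

Restricting `entries` to a date range, or to the last state of each object, before calling the function would correspond to the alternatives described above.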

Once the classification system used has been determined, the normalization step preferably implements a correspondence table between classification systems to replace the classification system indicators not used by the indicators of the classification system used. For example, in the table below, the entries corresponding to the lines "class 2010 - A" and "classification system 2010 - 1" are replaced or completed by lines "class 2010 - B" and "classification system 2010 - 2".

The correspondence table can be predetermined, collected or dynamic and obtained by the use of a machine learning algorithm configured to determine a semantic proximity between objects of at least two different classification systems and, according to a value from this semantic proximity, determining a correspondence between the classes corresponding to said objects in the classification systems.

For example, in the table above, the values of 2010 and 2011 are both representative, regardless of the classification system, of the production value of footwear. The proximity between these two labels is total since these labels are identical: a correspondence can thus be automatically generated between class A of the 2010 classification system and class B of the 2011 classification system.

In variations, when this correspondence table associates several classes in an unused system with a class in the used system, the values associated with these classes are added together to produce a composite value. For example, if a value in 2010 for class "A" is 10 and the value in 2010 for a class "B" is 20 and the correspondence table matches "A" to class "C" and "B" to class "C", the value in class "C" for 2010 and in the classification system of "C" is 30.
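The composite value of this example can be reproduced with a short sketch, assuming that the correspondence table is held as a mapping from each class of the unused system to a class of the system used. The function and variable names are hypothetical.

```python
from collections import defaultdict

def merge_classes(values, correspondence):
    """Sum the values of unused-system classes into their class of the system used."""
    merged = defaultdict(float)
    for old_class, value in values.items():
        merged[correspondence[old_class]] += value
    return dict(merged)

values_2010 = {"A": 10, "B": 20}        # values from the example above
correspondence = {"A": "C", "B": "C"}   # both classes map to class "C"
merged = merge_classes(values_2010, correspondence)
```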

Sometimes such an approach is not possible, especially when a class from a classification system that predates the classification system used is split in two. In these cases, it is impossible to know how much of a value associated with a class in the previous classification system is to be assigned to each of the two new classes in the classification system used. In such cases, the electronic calculation device assigns either a value of 0 in the classification system used, or a value determined according to a determined proportion between the two classes of the classification system used.

In these variants, the method that is the subject of the present invention comprises a step of extrapolation 34 of a national statistic from a set of national statistics exhibiting a missing national statistic, each value obtained by extrapolation being associated with the second set of keywords of the information representative of the national statistics set of the extrapolated value. This extrapolation step 34 is performed, for example, by the electronic computing device calculating an average of the relative proportion of known values for classes of the classification system used forming, in a combined manner, a class of the previous classification system for another known dataset. For example, here is a data table:

[Table 2]

In this example, the value associated with class "A" in the 2010 classification system corresponds in part to the value associated with class "B" and in part to the value associated with class "C" in the 2011 classification system. The values associated with the classes “B” and “C” in the 2011 classification system represent respectively 75% and 25% of the total of these two values.

Thus, the value associated with the class “A” in the 2010 classification system can receive as a corresponding value in the classes “B” and “C” of the 2011 classification system respectively the values “1500” and “500”.

In some embodiments, the method that is the subject of the present invention comprises a step of calculating an aggregated value as a function of values collected during the collection step 32, harmonized during the harmonization step and / or extrapolated during step 34 of extrapolation. Such an aggregated value corresponds, for example, to a value of an economic indicator for a patent office, this economic indicator value being obtained by adding the values of the economic indicator for the member countries of this patent office.
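The proportional allocation described above for class “A” can be sketched as follows. Since the underlying table is not reproduced here, the figures are assumptions chosen only to match the 75% / 25% proportions and the resulting values of 1500 and 500: class “A” is assumed to be worth 2000, and classes “B” and “C” are assumed to be known elsewhere as 300 and 100.

```python
def split_value(total, reference_values):
    """Split `total` across classes in proportion to known reference values."""
    reference_sum = sum(reference_values.values())
    return {cls: total * ref / reference_sum for cls, ref in reference_values.items()}

# Assumed figures (not from the source table): "A" = 2000, "B" = 300, "C" = 100.
shares = split_value(2000, {"B": 300, "C": 100})
```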

In some embodiments, the extrapolation step 34 consists of filling in missing values in a series of data.

For example, if the dataset has only values for the years 2010, 2011, 2013 and 2014, what value should be considered for the year 2012? This continuity of data is essential to detect a trend in many uses.

Several extrapolation methods can be used here. In a first variant, a linear interpolation is used: a mathematical function passing through the two points surrounding a missing piece of data is determined, and the value for the missing point is calculated from this function. For example, if the value associated with 2011 is "10" and the value associated with 2013 is "30", the value for 2012 can be estimated as "20", which is the average of "10" and "30".
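The linear interpolation of this first variant can be written as a one-line function. This is a minimal sketch with a hypothetical function name; the numerical values are those of the example above.

```python
def interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Known values: 10 for one year and 30 two years later; the middle year is missing.
estimate = interpolate(2011, 10, 2013, 30, 2012)
```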

Such a method, however, is limited to cases where a value later than the sought value is known and, moreover, greatly loses precision when several successive values are missing. Finally, such a method prevents the value thus calculated from falling outside the interval between the minimum value and the maximum value of the values surrounding the value to be estimated.

To overcome these drawbacks, an alternative extrapolation method used can consist of a trend extrapolation.

In some embodiments, the sets of national statistics are organized in at least one tree structure comprising at least two levels, a so-called “higher” level to which is attached a so-called “lower” level comprising several sets of national statistics, the set of national statistics of the higher level being equal to the sum of the sets of values of the lower level. In these embodiments, a curve fitting function can be employed on a set of values prior to a value to be filled. Once this curve is obtained, the calculation of the missing value can be performed according to one of the coordinates of this value, such as a year for example.

However, such a method can generate curves locally exhibiting growth rates that are too high to be consistent with real values measured a posteriori.

To avoid such a drawback, a value to be filled can be determined by weighting the results of different extrapolation methods, such as, for example, linear interpolation and curve fitting methods.
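One possible reading of this weighting is sketched below. It assumes, purely for illustration, that the curve fit is a least-squares straight line over the values preceding the gap and that the two methods receive equal weights; neither choice is specified in the description, and all names and figures are hypothetical.

```python
def linear_trend(xs, ys, x):
    """Least-squares straight line fitted to (xs, ys), evaluated at x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    variance = sum((a - mean_x) ** 2 for a in xs)
    return mean_y + (covariance / variance) * (x - mean_x)

def blend(estimates, weights):
    """Weighted average of the estimates produced by different methods."""
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)

# Hypothetical series: 2009 to 2011 known, 2012 missing, 2013 known (value 30).
trend = linear_trend([2009, 2010, 2011], [6, 8, 10], 2012)  # curve-fit estimate
interp = (10 + 30) / 2                                       # linear interpolation
filled = blend([trend, interp], [1, 1])                      # equal weights (assumption)
```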

In embodiments, the sets of national statistics are organized in at least one tree structure comprising at least two levels, a so-called “higher” level to which is attached a so-called “lower” level comprising several sets of national statistics, the set of national statistics of the higher level being equal to the sum of the sets of values of the lower level.

In some embodiments, the sets of national statistics are organized in two distinct trees, a national statistic in one tree being extrapolated from a set of national statistics from another tree. In these embodiments, extrapolation methods consist in applying a trend from another set of values to the set of values for which a value must be filled, during a tree comparison step. This other set of values can result from a step 32 of collecting a set of distinct national statistics.

These methods consist in determining a rate of change of a value from a rate of change measured for other known values of another set of values.

For example, in a very simplified way, if between the years 2010 and 2011 the growth rate of the Gross Domestic Product is 5%, a production value of shoes in 2011, to be filled, can be calculated by the use of the formula "production value of shoes in 2010 x 105%".

The set of values used to determine this variation can be predetermined or, preferably, determined automatically.

In variants where the selected set of values is determined automatically, a determination step is carried out during the method which is the subject of the present invention. This determination step is carried out, for example, by an electronic calculation device.

During this determination step, at least one set of values is selected from a set of sets of available values. For each such selected set of values, an evaluation of the relevance of the set of values is performed. This evaluation is carried out, for example, by the comparison between a variation over time of the set of values and a variation over time of the set of values with a value to be filled. If the signatures of the variations match, that is to say have a correlation index greater than a determined value, the set of values evaluated is validated.

The rate of change corresponding to a known value in the known set of values corresponding to an unknown value of the set of values to be filled is then applied to the set of values to be filled to determine the unknown value.
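The determination step and the application of the rate of change can be sketched as follows. The sketch assumes that the "signature" of a variation is the list of year-over-year relative changes, that the correlation index is a Pearson coefficient, and that the threshold is 0.9; these are illustrative assumptions, and all function names and figures are hypothetical.

```python
def changes(series):
    """Year-over-year relative changes of a list of values."""
    return [b / a - 1 for a, b in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fill_next(target_known, reference, threshold=0.9):
    """Estimate the value following target_known from the reference growth.

    The reference must hold one more value than target_known. It is validated
    only if past variations of both series correlate above the threshold.
    """
    k = len(target_known)
    if pearson(changes(target_known), changes(reference[:k])) <= threshold:
        return None  # reference set of values rejected
    return target_known[-1] * reference[k] / reference[k - 1]

# Hypothetical series: the reference grows by 10%, 20%, 5% and then 5%.
reference = [100, 110, 132, 138.6, 145.53]
target = [50, 55, 66, 69.3]            # same past variations, last value unknown
estimate = fill_next(target, reference)
```

Hiding a known value of `target` and comparing it with the estimate would correspond to the reliability check described in the variations below.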

In variations, known values of the set of values to be filled are considered virtually unknown, so that the reliability of the prediction can be assessed during the evaluation step from actual values.

In variants, whether the extrapolation is carried out to fill a missing value in a series of values or to calculate a value at the end of the set of values, the method that is the subject of the present invention uses a machine learning algorithm configured to determine, from a set of known values and from a determined number of parameters associated with these values, a weighting between values associated with parameters of other sets of values.

Below, an example of an extrapolation step is provided.

In this example, we consider that the values collected come from different sources and that these values are representative of different ways of characterizing a country. For example:

- a first data source provides a set of values representative of a number of births, GDP and energy production from renewable sources for France, the United Kingdom and Germany,

- a second data source provides a set of values representative of a monetary value representative of imports and exports of recyclable infant diapers for France and the United Kingdom and

- a third data source provides a set of values representative of a monetary value representative of the production of infant diapers for France and Germany.

Each value set has a value for each of the years 2010, 2011 and 2012, for example. In this example, the production value of infant diapers is unknown for the United Kingdom, regardless of the year.

To determine the production values of infant diapers in the United Kingdom for 2010, 2011 and 2012, the electronic calculation device evaluates the evolution of the GDP of France between 2010 and 2011 and between 2011 and 2012 and compares this evolution with, respectively, the change in the monetary value representative of the production of infant diapers for France between 2010 and 2011 and between 2011 and 2012. If these changes have a higher correlation coefficient than, for example, the evolution of energy production from renewable sources in France between 2010 and 2011 and between 2011 and 2012 compared with, respectively, the change in the monetary value representative of the production of infant diapers for France over the same periods, then the calculation device determines that the change in GDP is more representative than the change in energy production from renewable sources for estimating the production value of infant diapers. Knowing the GDP of the United Kingdom and its evolution over time, the electronic calculation device determines a rule of three applicable in France and Germany between the GDP and the production of infant diapers and applies this rule of three to the value of the GDP of the United Kingdom for each year to determine the production value of infant diapers for the years 2010, 2011 and 2012.
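The rule of three applied to the United Kingdom can be sketched for a single year. The sketch assumes that the ratios observed in France and Germany are simply averaged, which the description does not specify; the function name and all figures are hypothetical.

```python
def rule_of_three(gdp, production, target_gdp):
    """Estimate production from GDP using the average production-to-GDP
    ratio of the countries for which both values are known."""
    ratios = [production[country] / gdp[country] for country in production]
    return target_gdp * sum(ratios) / len(ratios)

# Hypothetical figures for one year, in arbitrary units.
gdp = {"France": 2000, "Germany": 3000, "United Kingdom": 2500}
diapers = {"France": 20, "Germany": 30}   # unknown for the United Kingdom
uk_estimate = rule_of_three(gdp, diapers, gdp["United Kingdom"])
```

Repeating the call with the figures of each year would give the estimates for 2010, 2011 and 2012.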

In a more sophisticated variant, based on machine learning technology, the electronic computing device assigns weighting factors to each of the data types and to recorded parameters specific to each country from known past data of diaper production values for the UK.

FIG. 4 shows a second embodiment 40 of the method that is the subject of the present invention. The embodiment shown in FIG. 4 is compatible with all of the embodiments described above. Steps 11 to 18 are identical to the embodiments described above.

The method 40 additionally comprises steps 41 to 46.

For the implementation of method 40, the first database includes, for each patent, an identification of at least one holder and a code representative of the geographical scope.

The method 40 comprises:

- a step of associating 41 at least one other patent from the first database corresponding to the first set of keywords of the identifier entered and / or to the holder of the title,

- a counting step 42 of the number of occurrences of each code representative of the geographical scope as a function of each other patent associated with the patent whose identifier is entered;

- a weighting step 43 of each national statistic of the set of national statistics selected according to the number of occurrences,

- during the supply step 18, the set of national statistics is classified by decreasing weighting.

In embodiments, the association step 41 associates at least one other patent of the first database corresponding to the first set of keywords of the entered identifier. The association step 41 is performed, for example, by the computer implementation of an algorithm for measuring lexical proximity between two sets of keywords. Such an algorithm is, for example, an algorithm for measuring one of the following distances:

- Hamming distance,

- Levenshtein distance,

- Jaro-Winkler distance.

These examples of distances are well known in the prior art and, in the field of text mining, aim to assess the distance between two text strings, such as the keywords of a class or of national statistical data, for example.

This measurement of distance or, conversely, of similarity index takes place for at least one pair formed of a set, or subset, of keywords of a class and of a set, or subset, of keywords of a national statistical data item.
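As an illustration of one of the distances listed above, the Levenshtein distance between two text strings can be computed with the classic dynamic programming scheme. This is a generic textbook sketch, not an implementation taken from the invention.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]
```

A small distance between a class keyword and a national statistics keyword, for example between "fruit" and "fruits", indicates a high lexical proximity.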

In embodiments, at least one title of a technical class is associated with a set of keywords. For example, keywords can be extracted from said title.

Preferably, these embodiments include a step of evaluating a predominant technical class associated with the identifier entered as a function of each technical class associated with the identifier entered. Each technical class associated with the identifier entered provides information concerning a technical field to which the industrial property title applies. The evaluation step allows the predominant technical area to be deduced.

Preferably, the evaluation step comprises:

- a truncation step of each technical class to obtain a truncated class,

- a step of counting the number of occurrences of classes of the same truncation, the technical field evaluated corresponding to the truncated class of the majority number of occurrences.

During the truncation step, each class is truncated such that the subclass is the last element taken into account. For example, the class “A21C 1/06” is shortened to “A21C”. The truncation step is performed by computer, for example by means of an application server querying the first database to retrieve each class. In embodiments, the class may be shortened to digits representative of the class. For example, the class “A21C 1/06” is shortened to “A21”. In other words, when the technical classification of patents, for example the CPC or the IPC, is seen as a tree structure, the truncation amounts to going up to a higher node in the tree.

Then, for the patent corresponding to the identifier entered, we count the number of classes which, once truncated, are identical. To perform the count by computer, a sorting algorithm known to those skilled in the art can be used.
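The truncation and counting steps can be sketched together. The sketch assumes that class symbols follow the IPC or CPC layout, in which the first four characters identify the subclass and the first three identify the class; the function name and the sample classes are hypothetical.

```python
from collections import Counter

def predominant_class(classes, length=4):
    """Truncate each class symbol to `length` characters (4 keeps the
    subclass, 3 keeps the class) and return the most frequent result."""
    truncated = [symbol.replace(" ", "")[:length] for symbol in classes]
    return Counter(truncated).most_common(1)[0][0]

# Hypothetical classes read for one patent.
classes = ["A21C 1/06", "A21C 3/02", "A21D 2/00"]
subclass = predominant_class(classes)
```

In case of a tie, `most_common` returns the truncated class encountered first; the description instead provides a dedicated tie-breaking count based on corresponding holders.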

The first set of keywords is determined based on the title of the truncated class occurring the most times. Preferably, the evaluation step, the truncation step and the counting step are applied to each other patent of the first database and more particularly those having a corresponding holder.

Corresponding holders are holders whose name is the same and / or whose unique national identifier, for example the number of the business register identification system (acronym "SIREN"), is the same. In other embodiments, two holders are considered as corresponding if, after applying an algorithm for calculating a semantic proximity, the semantic proximity is greater than a predetermined limit value, for example equal to 90% of the maximum semantic proximity value.

In some embodiments, the method 40 includes a step of standardizing the computer data representative of holders stored in the first database. The standardization step, known to those skilled in the art, can, for example, put each character in lowercase, remove special characters, remove accents.
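A minimal sketch of such a standardization is given below, using only the Python standard library. It assumes that "special characters" means anything other than unaccented letters, digits and spaces; the function name and the sample holder name are hypothetical.

```python
import re
import unicodedata

def standardize_holder(name):
    """Lowercase a holder name, strip accents and remove special characters."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^a-z0-9 ]", "", without_accents)
    return " ".join(cleaned.split())  # collapse repeated spaces

standardized = standardize_holder("Société  Générale, S.A.")
```

Two holder names that standardize to the same string can then be compared by simple equality.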

In some embodiments, in the event of equality between the numbers of occurrences, the evaluation step comprises a step of counting the number of occurrences of each truncated class among each patent of the first database presenting a corresponding holder. The first set of keywords is determined based on the title of the truncated class occurring the most times.

To associate at least one patent with the patent whose identifier is entered, the evaluation step is applied to each patent in the first database. Then, the predominant class can be stored in a field of the first database. The first set of keywords associated with the patent whose identifier is entered then has a high semantic proximity to the other patents of the first database of the same predominant class.

In some embodiments, each evaluation step is performed during the construction of the first database, and the field corresponding to the technical area evaluated is updated when at least one class associated with a patent from the first database is updated.

In embodiments, the association step 41 associates at least one other patent from the first database corresponding to the owner of the patent whose identifier is entered. In these embodiments, a corresponding holder is determined as indicated above.

In embodiments, the association step 41 associates at least one other patent from the first database corresponding to the first set of keywords of the entered identifier and a corresponding holder. These embodiments are a combination of the embodiments detailed above.

The method 40 includes a counting step 42 of the number of occurrences of each code representative of the geographical scope according to each other patent associated with the patent whose identifier is entered.

During the counting step 42, the content of the field comprising the geographical area covered by the patent is subjected to a sorting algorithm known to those skilled in the art in order to count and classify the geographical areas by decreasing number of occurrences. The ranking is stored temporarily. In some embodiments, the patents submitted to the sorting algorithm are filtered by holder, for example, for the corresponding holders or even for holders of the same nationality.
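The counting and ranking of geographical areas can be sketched as follows, assuming that each associated patent is represented by a record holding a list of geographical scope codes. The record layout, function name and sample data are hypothetical.

```python
from collections import Counter

def rank_geographic_codes(patents):
    """Count each geographical-scope code over the associated patents and
    return (code, count) pairs by decreasing number of occurrences."""
    counts = Counter(code for patent in patents for code in patent["codes"])
    return counts.most_common()

# Hypothetical associated patents with their geographical scope codes.
patents = [
    {"id": "P1", "codes": ["FR", "DE"]},
    {"id": "P2", "codes": ["FR", "US"]},
    {"id": "P3", "codes": ["FR", "DE"]},
]
ranking = rank_geographic_codes(patents)
```

Filtering `patents` by holder before the call would correspond to the filtered variant mentioned above.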

In some embodiments, the counting step comprises:

- a step of selecting a predetermined value X of holders classified by decreasing number of occurrences,

- a counting step 22 of the number of industrial property titles in the first database in the same technical field for each selected holder,

- a classification step 23 of each zone by decreasing number of occurrences.

During the selection step, an operator specifies, by means of a man / machine interface, for example a keyboard, a mouse or a touch screen, the number of holders X to be taken into account. In embodiments, a choice of predetermined X values may be offered to the user, for example, three, five, or ten.

During counting step 20, the corresponding holders are considered to be the same holder.

The X most recurring holders are selected. A filtering of the first database is performed to exclude holders other than the selected holders.

Then for each patent of at least one of the X selected owners, a counting step similar to counting step 42 and independent of the patent holder is implemented.

The sorted list is stored in memory at least temporarily.

In weighting step 43, each national statistic in the selected set of national statistics is weighted according to the number of occurrences. For example, the weighting can follow a linear law depending on the number of occurrences.

In embodiments, each national statistic is associated with a geographic area of scope, and the method 40 further comprises:

- a definition step 44 of a predetermined limit value,

- a classification step 45 of each national statistic from the set of national statistics selected by geographic area and by decreasing value,

- an iterative step 46 of calculating the sum of each national statistic, in order of classification, until the value of the sum is greater than or equal to the predetermined limit value, and

- in the supply step 18, the national statistics provided are the national statistics used in the last iteration of the calculation step.

During the definition step 44, an operator enters a predetermined limit value by means of a man / machine interface, for example by moving a cursor representing a target market percentage ranging from zero percent to one hundred percent. The first predetermined limit value is then calculated from the market value over all geographic areas.

During the ranking step 45, a sorting algorithm is implemented to rank the geographical areas according to the value of the national statistical data in a decreasing manner. The classification step 45 is preferably implemented by an application server.

During the calculation step 46, it is checked whether the value of the geographical area classified according to the first rank is greater than the predetermined limit value. If necessary, we go to the step of supply 18. Otherwise, we calculate the sum of the national statistics corresponding to the geographical area ranked first with the national statistics corresponding to the geographical area ranked second. If the sum is greater than the predetermined limit value, one passes to the supply step 18, otherwise the calculation is repeated with the national statistics corresponding to the geographical area of the third rank. The calculation step 46 is thus iterated until the sum is greater than the predetermined limit value.
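Steps 44 to 46 can be sketched as a cumulative sum over areas ranked by decreasing value. The sketch stops as soon as the running sum is greater than or equal to the limit, and it assumes that the limit is derived from a target percentage of the total market value; the function name and the figures are hypothetical.

```python
def select_until_limit(stats_by_area, limit):
    """Take areas by decreasing value until the running sum reaches `limit`."""
    selected, total = [], 0
    ranked = sorted(stats_by_area.items(), key=lambda item: item[1], reverse=True)
    for area, value in ranked:
        selected.append(area)
        total += value
        if total >= limit:
            break
    return selected

# Hypothetical market values per area; the operator targets 60% of the total.
market = {"FR": 30, "DE": 40, "US": 20, "JP": 10}
limit = 0.6 * sum(market.values())
areas = select_until_limit(market, limit)
```

With these figures, the first area alone does not reach the limit, so the second area is added before the supply step is reached.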

The most recurring geographic areas are classified by decreasing recurrence. The sorted list is stored in memory at least temporarily.

The supply step 18 may include a display step performed on a screen, for example, by means of an internet browser displaying a secure web page by using the hierarchy provided. The display can be in the form of a map or a list, for example.

Claims

1. Method (10, 40) of providing relevant information associated with a patent which comprises a step of entering (11) at least one invention patent identifier, characterized in that it comprises:

- a step of converting (12) the identifier entered into a first set of keywords representative of at least one technical field of the patent, in a first database associating at least one identifier with a set of keywords,

- a step of creating (13) a request for collecting national statistical information, accessible on a network, as a function of the keywords resulting from the conversion and, for each said item of information, a second set of keywords associated with said information,

- a comparison step (14) of the first set of keywords and of each second set,

- a step of assigning (15) a weight to each piece of information representative of the set of national statistics collected according to the result of the comparison,

- a step of prioritization (16) of the set of national statistics according to the weight of each said information,

- a selection step (17) of at least one set of national statistics among the prioritized sets of national statistics, and

- a supply step (18) of each said national statistic of the selected national statistics set.
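
By way of non-limiting illustration, the comparison (14), weight assignment (15), prioritization (16) and selection (17) steps can be sketched as follows. The claim does not specify how the keyword sets are compared; the Jaccard index used here is only an assumption, and the names `jaccard`, `rank_statistic_sets` and `top_n`, as well as the sample keywords, are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(first: Set[str], second: Set[str]) -> float:
    """Assumed comparison measure: shared keywords over all distinct keywords."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def rank_statistic_sets(
    patent_keywords: Set[str],
    candidate_sets: Dict[str, Set[str]],
    top_n: int = 1,
) -> List[Tuple[str, float]]:
    """Weight each set of statistics, sort by decreasing weight, keep the top_n."""
    # Steps 14 and 15: compare the keyword sets and assign a weight to each set.
    weighted = [
        (name, jaccard(patent_keywords, keywords))
        for name, keywords in candidate_sets.items()
    ]
    # Step 16: prioritize by decreasing weight.
    weighted.sort(key=lambda item: item[1], reverse=True)
    # Step 17: select the best-ranked sets.
    return weighted[:top_n]


# Hypothetical keyword sets.
first_set = {"battery", "electrode", "lithium"}
second_sets = {
    "battery_market": {"battery", "lithium", "storage"},
    "dairy_output": {"milk", "cheese"},
    "electrode_trade": {"electrode"},
}
best = rank_statistic_sets(first_set, second_sets, top_n=2)
```

Any other similarity measure over keyword sets could be substituted for `jaccard` without changing the overall structure of the sketch.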

2. Method (10,40) according to claim 1, which comprises, upstream of the input step (11), a step of building (21) the first database which comprises:

- a step (22) of collecting and indexing public identifiers of at least one patent and

- a step (23) of associating, with each patent whose public data is indexed, at least one piece of information representative of at least one keyword.

3. Method (10,40) according to claim 2, wherein the information is representative of at least one title of a technical class associated with a set of keywords.

4. Method (10,40) according to one of claims 2 or 3, wherein the information is representative of at least part of the content of the patent.

5. Method (10,40) according to one of claims 2 to 4, wherein the construction step (21) of the first database comprises a creation step (24) of a tree structure, linking at least two patent identifiers, the information representative of at least one keyword being associated as a function of each identifier linked to the identifier entered during the entry step.

6. Method (10,40) according to one of claims 1 to 5, which comprises a step of extrapolation (34) of a national statistic from a set of national statistical values having a missing national statistic, each value obtained by extrapolation being associated with the second set of keywords of the information representative of the set of national statistics of the extrapolated value.

7. Method (10,40) according to one of claims 1 to 6, which comprises a construction step (31) of the second database comprising:

- a step (32) of collecting and indexing sets of national statistics and

- a step (33) of associating, with each set of national statistics indexed, at least one piece of information representative of at least one keyword.

8. Method (10,40) according to claim 7, in which the sets of national statistics are organized in at least one tree structure comprising at least two levels, a so-called “upper” level to which is attached a so-called “lower” level comprising several sets of national statistics, where the set of national statistics of the upper level is equal to the sum of the sets of values of the lower level.

9. The method (10, 40) according to claims 6 and 8, wherein, during the extrapolation step (34), a national statistic is extrapolated according to the set of national statistics of a higher level.
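
By way of non-limiting illustration, an extrapolation from the upper level can be sketched as follows, relying on the property that the upper-level value equals the sum of the lower-level values. The claims do not state the extrapolation rule; the even split of the remainder among missing entries is an assumption, and the name `extrapolate_missing` and the sample values are hypothetical.

```python
from typing import Dict, Optional


def extrapolate_missing(
    upper_value: float,
    lower_values: Dict[str, Optional[float]],
) -> Dict[str, Optional[float]]:
    """Fill missing lower-level values (None) from the upper-level total."""
    known_total = sum(v for v in lower_values.values() if v is not None)
    missing = [key for key, v in lower_values.items() if v is None]
    if not missing:
        return dict(lower_values)

    # Assumption: the part of the upper-level total not explained by the known
    # values is shared equally among the missing entries (never below zero).
    remainder = max(upper_value - known_total, 0.0)
    share = remainder / len(missing)
    return {key: (share if v is None else v) for key, v in lower_values.items()}


# Hypothetical upper-level total of 100 with one missing lower-level value.
filled = extrapolate_missing(100.0, {"A": 60.0, "B": None, "C": 30.0})
```

When a single value is missing, the sum property determines it exactly; the equal split only matters when several values are missing at the same level.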

10. Method (10, 40) according to one of claims 6 and 8 or 9, in which the sets of national statistics are organized according to two distinct trees, a national statistic of a tree being extrapolated (34) as a function of a set of national statistics from another tree.

11. Method (40) according to one of claims 1 to 10, in which the first database comprises, for each patent, an identification of at least one holder and a code representative of the geographical scope, the method comprising, in addition:

- a step of associating (41) at least one other patent from the first database corresponding to the first set of keywords of the identifier entered and/or the holder of the title,

- a counting step (42) of the number of occurrences of each code representative of the geographical scope according to each other patent associated with the patent whose identifier is entered,

- a weighting step (43) of each national statistic of the selected set of national statistics according to the number of occurrences,

- during the supply step (18), the set of national statistics is classified by decreasing weighting.
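
By way of non-limiting illustration, the counting (42) and weighting (43) steps, followed by the classification by decreasing weighting, can be sketched as follows. The sketch assumes that the weighting of a national statistic is simply the number of occurrences of the code of its geographic area among the associated patents; the name `weight_by_occurrences` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def weight_by_occurrences(
    stats: Dict[str, float],
    associated_patent_codes: Iterable[Iterable[str]],
) -> List[Tuple[str, float, int]]:
    """Return (area, statistic, weighting) tuples sorted by decreasing weighting."""
    # Step 42: count how often each geographic code appears among the
    # associated patents (each patent counts a given code at most once).
    counts = Counter(
        code for codes in associated_patent_codes for code in set(codes)
    )
    # Step 43 (assumed form): the weighting is the raw occurrence count.
    weighted = [(area, value, counts[area]) for area, value in stats.items()]
    # Supply step 18: classify by decreasing weighting (the sort is stable,
    # so areas with equal weighting keep their original order).
    weighted.sort(key=lambda item: item[2], reverse=True)
    return weighted


# Hypothetical statistics and geographic codes of three associated patents.
sample_stats = {"US": 50.0, "FR": 5.0, "DE": 15.0}
sample_codes = [["US", "FR"], ["FR"], ["FR", "DE", "US"]]
ordered = weight_by_occurrences(sample_stats, sample_codes)
```

With this sample data, FR appears in three associated patents, US in two and DE in one, so the statistics are supplied in that order.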

12. Method (40) according to one of claims 1 to 11, wherein each national statistic is associated with a geographic area of scope, the method further comprising:

- a definition step (44) of a predetermined limit value,

- a classification step (45) of each national statistic from the set of national statistics selected by geographic area and by decreasing value,

- an iterative step of calculating (46) the sum of each national statistic, in order of classification, until the value of the sum is greater than or equal to the predetermined limit value, and

- during the supply step (18), the national statistics provided are the national statistics used in the last iteration of the calculation step.