WO2018115626A1

WO2018115626A1 - Identification of an information source

Info

Publication number: WO2018115626A1
Application number: PCT/FR2017/053433
Authority: WO
Inventors: Jérôme BESOMBES
Original assignee: Office National D'etudes Et De Recherches Aérospatiales
Priority date: 2016-12-20
Filing date: 2017-12-07
Publication date: 2018-06-28
Also published as: FR3060801B1; FR3060801A1

Abstract

Identifying an information source (S2-S4) that has produced content (D2-D4) accessible through at least one database (BD) makes it possible to determine how far that source anticipated other sources that later produced other content. A source that has shown a high level of anticipation can then be monitored, which can give access to new information before that information has been widely repeated, taken up or reused.

Description

IDENTIFICATION OF AN INFORMATION SOURCE

The present invention relates to a method of identifying an information source, as well as an automatic search module and a program that are adapted to implement such a method.

Documentary databases are becoming ever more extensive. They include information, articles, images, videos, messages published via social networks, etc., and they allow authors to produce new content ever more frequently. As a result, document monitoring is becoming increasingly complex. Automated document monitoring systems are used to search for content related to a domain of interest determined by a user, and then to present to that user the collected content that is expected to be most useful to them. The usefulness of this content is currently evaluated according to the following two criteria:

- the age of the information: information is considered more relevant the more recently it was published; and

- the relevance of the information: information is considered more relevant the more closely its subject matches the domain of interest determined by the user.

Known document monitoring systems use these two criteria to rank, in descending order, either the contents that already exist in the queried database or those added during a given period. They then submit this ranking to the user.

However, the relevance of the results of such searches depends heavily on how well the system can take the user's domain of interest into account. Two difficulties limit this ability:

- the difficulty of modeling the domain of interest, that is to say of expressing this domain, for example by keywords, and integrating it into the system; and

- the difficulty of taking into account a possible evolution of the user's domain of interest over time.

To answer these difficulties, different techniques have been developed and implemented:

- collaborative filtering: the system takes into account the usage of other users who are judged to share interests with the current user, for example because they belong to the same social network. The domain of interest of the current user can then be modified according to the domains of interest of these other users; and

- machine learning: the system modifies the domain of interest initially characterized by the user. It can do so, for example, according to the queries the user has successively submitted, the content on certain topics that the user has preferentially consulted, or the ratings the user has given to content previously presented to them.

These two techniques can be used separately or in combination with each other.

However, collaborative filtering is only possible if the user agrees to share their usage with a community. This is not compatible with document monitoring carried out in a competitive or security context. Moreover, machine learning often involves a certain inertia. When necessary, the characteristics of domains of interest derived from prior usage therefore cannot evolve rapidly. In addition, these characteristics, derived from the analysis of the user's usage, are often poor in quality and quantity.

Another known way to improve the relevance of the content provided in response to a query request based on the user's domain of interest is to identify the sources of useful content already provided to this user. Thus, in addition to each content, its source can be communicated to the user, for example a Twitter account, a media or blog website, etc. The user can then "follow" these sources by systematically or selectively consulting the content they produce. These sources may also produce content outside the domain of interest entered by the user, which can help this domain of interest to evolve. Thus, a source that has shown a strong ability to produce useful content for a particular domain of interest can also produce relevant content outside that domain or at its boundaries.

Such identification of sources is already widely practiced in social network analysis. There, the aim is mainly to identify the people at the heart of the network, called opinion leaders. By definition, opinion leaders have strong influence: the content they produce is taken up by many other sources. The same information is therefore returned with a great deal of redundancy in response to a query request.
As a result, content sources that are opinion leaders directly and indirectly mask low-occurrence sources in the responses to a query request. Useful information that is only weakly repeated, also called weak signals, therefore becomes harder to find among a mass of highly publicized information.

In this context, an object of the present invention is to give a user easy access to a content source that provides early information on events, especially if this source has a small audience or appears with a low level of occurrence in the databases. Such an early source of information is referred to herein as a clairvoyant source, regardless of its audience and of its level of influence on other sources. A clairvoyant source with a low audience will be called a "weak source".

Once the user has identified such a clairvoyant source and has access to it, they can "follow" it. They will then receive relevant information as soon as possible after it first appears.

An ancillary object of the present invention is to identify clairvoyant sources for domains of interest that are likely to evolve, either because of the user or because of external trends affecting the community of content sources.

To achieve one or other of these purposes, a first aspect of the present invention provides a method of identifying an information source, which comprises the following steps to be performed using an automatic search module:

/1/ produce at least one query request that corresponds to several events; and

/2/ collect from at least one database the content references obtained in response to the query request, where each content corresponds to at least one of the events, and identify the source and production date of each content.

In general, in the present description, the expression "at least one query request" designates a set of one or more query requests that are implemented during the same execution of the method of the invention.

According to the invention, the method further comprises the following additional steps:

/3/ among the identified sources, select at least one source that has produced at least one content relating to at least one of the events of step /1/;

/4/ for each source selected in step /3/, and for each content produced by this source that relates to one of the events of step /1/, determine the time advance of this source. This advance is measured relative to the date of the event, or relative to a date on which other contents relating to the same event were produced. Then combine the time advances acquired by that source to calculate a numerical value, called the anticipation value and attributed to the source, which varies monotonically with each time advance determined for this source; then

/5/ provide an identifier of at least one of the sources selected in step /3/, together with the anticipation value calculated in step /4/ for this source.

Optionally, in step /3/, a source may be selected only if it has produced contents relating to several of the events of step /1/. The method of the invention then correlates the subsets of contents that were obtained in response to the query request and that relate to different events. This correlation identifies sources that have been active for several events and downweights sources that have been active for only one event. In this way, sources that appeared in the responses but match the events of the query request poorly are discarded. All the contents considered in step /4/ may correspond to references collected in step /2/.
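The optional multi-event selection of step /3/ can be sketched as follows. This is a minimal illustration, not the patented implementation. The `Content` record, its field names and the `min_events` parameter are hypothetical. It also assumes that each collected content has already been associated with the event it relates to.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Content:
    """Hypothetical record for one reference collected in step /2/."""
    reference: str   # e.g. a link or a bibliographic reference
    source: str      # identifier of the source that produced the content
    produced: date   # production date assigned in the database
    event: str       # event of step /1/ the content is assumed to relate to


def select_sources(contents: list[Content], min_events: int = 2) -> dict[str, list[Content]]:
    """Step /3/ sketch: keep sources whose contents cover at least `min_events` distinct events."""
    by_source: dict[str, list[Content]] = defaultdict(list)
    for content in contents:
        by_source[content.source].append(content)
    return {
        source: items
        for source, items in by_source.items()
        if len({c.event for c in items}) >= min_events
    }


contents = [
    Content("r1", "S2", date(2016, 5, 7), "EV1"),
    Content("r2", "S2", date(2016, 5, 30), "EV2"),
    Content("r3", "S3", date(2016, 5, 9), "EV1"),
    Content("r4", "S3", date(2016, 5, 11), "EV1"),
]
print(sorted(select_sources(contents)))  # ['S2']
```

Here S3 produced two contents, but both concern a single event, so S3 is discarded. This matches the downweighting of single-event sources described above.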

In addition, the method of the invention ranks the sources that have been active on several of the events. The ranking is based on their cumulative anticipation with respect to each event, or with respect to other sources for the events concerned. A source that reacted ahead of the others to several events, or anticipated several events more than the other sources did, is a clairvoyant source. The method of the invention reveals such a source and provides its identifier to a document monitoring operator.

In first embodiments of the invention, said at least one query request is constructed directly from several events supplied at the outset. In other words, the events are fixed and known a priori by the document monitoring operator who applies the invention. In this case, said at least one query request is generated in step /1/ from events entered into the automatic search module. Each time advance acquired by a source can then be determined in step /4/ as the difference between the date of one of the events and the date on which the source produced content relating to that event, whose reference was collected in step /2/.
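For the first embodiments, where event dates are known, the time advance and its combination into an anticipation value can be sketched as follows. The description only requires the combination function to vary monotonically with each advance. Using a plain sum, counting in whole days, and the event dates below are all assumptions made for illustration.

```python
from datetime import date


def time_advance(event_date: date, produced: date) -> int:
    """Time advance T(EV) - T(D) in days: positive if the content preceded the event."""
    return (event_date - produced).days


def anticipation_value(advances: list[int]) -> float:
    """Hypothetical combination function f: the sum of a source's time advances.

    Increasing any single advance increases the sum, so the value varies
    monotonically (and increasingly) with each advance.
    """
    return float(sum(advances))


# Hypothetical data: one source, 3 days early on EV1 and 2 days late on EV2.
event_dates = {"EV1": date(2016, 5, 10), "EV2": date(2016, 6, 1)}
source_contents = [("EV1", date(2016, 5, 7)), ("EV2", date(2016, 6, 3))]
advances = [time_advance(event_dates[ev], produced) for ev, produced in source_contents]
print(advances, anticipation_value(advances))  # [3, -2] 1.0
```

A mean, or a sum of clipped advances, would also satisfy the monotonicity condition. A sum additionally rewards sources that cover more events.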

In second embodiments of the invention, the events may not be known a priori, but they underlie the responses obtained to said at least one query request. The respective dates of these events are not known. However, the dates on which many contents corresponding to said at least one query request were produced simultaneously, or within a short time, approximate these event dates. These approximations can then be used to evaluate the time advance of each content. Step /1/ of such second embodiments then comprises entering a characterization of a domain of interest into the automatic search module. The query request is then determined from the entered domain of interest, in a manner known per se. In step /2/, once the content references obtained in response to the query request have been collected, the dates on which the largest numbers of these contents were produced are determined. Each of these dates is then associated with one of the events of step /1/, even if this event remains unknown. Each time advance acquired by a source can then be determined in step /4/ as a difference between two dates. The first is one of the dates on which a large number of contents whose references were collected in step /2/ were produced. The second is the date on which the source produced a content whose reference was also collected in step /2/.
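When the events are unknown, their dates can be approximated from the daily content counts. The sketch below assumes a simple rule that the description does not prescribe. A day counts as an estimated event date when at least `min_count` contents were produced on it and neither adjacent day has more. Each content is then attributed to the nearest estimated date. `min_count` and both function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def estimate_event_dates(production_dates: list[date], min_count: int) -> list[date]:
    """Approximate event dates as local maxima of the number of contents per day.

    Hypothetical rule: keep a day if at least `min_count` contents were
    produced on it and neither neighbouring day has a higher count.
    """
    counts = Counter(production_dates)  # missing days count as 0
    one_day = timedelta(days=1)
    peaks = []
    for day in sorted(counts):
        n = counts[day]
        if n >= min_count and n >= counts[day - one_day] and n >= counts[day + one_day]:
            peaks.append(day)
    return peaks


def advance_from_peaks(produced: date, peaks: list[date]) -> int:
    """Time advance in days relative to the nearest estimated event date.

    Assumption: a content relates to the closest peak; a positive value
    means the content preceded that peak.
    """
    nearest = min(peaks, key=lambda p: abs((p - produced).days))
    return (nearest - produced).days


dates = (
    [date(2016, 3, 1)] + [date(2016, 3, 4)] * 5 + [date(2016, 3, 5)] * 2
    + [date(2016, 4, 10)] * 6 + [date(2016, 4, 11)]
)
peaks = estimate_event_dates(dates, min_count=3)
print(peaks)
print(advance_from_peaks(date(2016, 3, 1), peaks))  # 3
```

With this rule, two adjacent days with the same maximal count would both be kept. A real implementation would probably smooth the counts or merge such plateaus first.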

In third embodiments of the invention, some events can be known a priori, and other events, detected as in the second embodiments, can be added to them.

In general terms for the invention, the anticipation value calculated for each source selected in step /3/ may be an increasing function of each time advance acquired by this source for a content relating to one of the events. A clairvoyant source is then characterized by a high anticipation value.

Optionally, steps /4/ and /5/ can be executed for several sources selected in step /3/. Their identifiers are then provided in step /5/, ranked according to the anticipation values calculated for each of these sources.

Also in general, but optionally, the method of the invention may include an additional step of removing sources whose audience is too large, in order to bring out clairvoyant sources with a small audience. For this purpose, an audience value can be determined for each of the sources selected in step /3/. A source can then be rejected if its audience value exceeds a predetermined threshold value, or exceeds the audience value of at least one other source selected in step /3/. The audience value of a source may include the number of third-party views of the content produced by that source.
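The optional audience filter and the ranking of step /5/ can be combined as in the sketch below. It implements only the fixed-threshold variant. The audience figures are hypothetical, and treating an unknown audience as 0 is an assumption.

```python
from typing import Optional


def rank_sources(
    anticipation: dict[str, float],
    audience: dict[str, int],
    max_audience: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Drop sources whose audience exceeds a threshold, then rank by anticipation value.

    `audience` maps a source to e.g. its number of third-party views; a source
    missing from it is assumed to have an audience of 0. `max_audience` is the
    predetermined threshold (None disables the filter).
    """
    kept = [
        (source, value)
        for source, value in anticipation.items()
        if max_audience is None or audience.get(source, 0) <= max_audience
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


anticipation = {"S2": 12.0, "S3": 4.0, "S4": 9.0}
audience = {"S2": 150, "S3": 80, "S4": 50000}
print(rank_sources(anticipation, audience, max_audience=1000))  # [('S2', 12.0), ('S3', 4.0)]
```

Here the widely viewed source S4 is removed even though its anticipation value is higher than that of S3, which leaves the weaker signals visible.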

Still in general terms for the invention, the query request produced in step /1/ may be an aggregation of several elementary requests. In this case, the aggregation follows predetermined aggregation rules, including rules of semantic or linguistic proximity or equivalence.

In improvements of the invention, the anticipation value of each source selected in step /3/ can also depend on at least one of the following parameters, in addition to the time advances acquired by this source:

- the number of events of step /1/ for which the source whose anticipation value is calculated has produced at least one content;

- the number of events of step /1/ for which the source whose anticipation value is calculated has produced no content;

- the number of contents produced in relation to at least one of the events of step /1/ and whose references were collected in step /2/, whether or not the source whose anticipation value is calculated has produced content related to this event; and

- at least one ratio of peak height to peak width, taken from the variations in the number of contents produced per day in relation to one of the events of step /1/ and whose references were collected in step /2/, whether or not the source whose anticipation value is calculated has produced content related to this event.
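One possible way of folding some of these parameters into the anticipation value is sketched below. The description lists the parameters but not how to combine them, so this formula is entirely hypothetical. Each event's advance is weighted by the height-to-width ratio of its daily-count peak, and a fixed penalty is subtracted for each event the source missed. Peak height is taken as the largest daily count. Peak width is taken as the number of days whose count reaches at least half that height. The sketch also assumes a single advance per event for the source, for example that of its earliest content.

```python
from collections import Counter
from datetime import date


def peak_ratio(production_dates: list[date]) -> float:
    """Height-to-width ratio of the daily-count peak of one event.

    Assumed definitions: height = largest number of contents on one day;
    width = number of days whose count is at least half that height.
    """
    counts = Counter(production_dates)
    height = max(counts.values())
    width = sum(1 for n in counts.values() if 2 * n >= height)
    return height / width


def anticipation_with_parameters(
    advances: dict[str, int],
    contents_per_event: dict[str, list[date]],
    miss_penalty: float = 1.0,
) -> float:
    """Hypothetical anticipation value of one source.

    `advances` maps each event the source covered to its advance in days;
    `contents_per_event` maps every event of step /1/ to the production dates
    of all collected contents about it. Weights are positive, so the value
    still increases monotonically with each advance.
    """
    value = 0.0
    for event, dates in contents_per_event.items():
        if event in advances:
            value += peak_ratio(dates) * advances[event]
        else:
            value -= miss_penalty  # penalize an event the source did not react to
    return value


contents_per_event = {
    "EV1": [date(2016, 3, 4)] * 4 + [date(2016, 3, 5)] * 2 + [date(2016, 3, 6)],
    "EV2": [date(2016, 4, 10)] * 3,
    "EV3": [date(2016, 5, 1)] * 2,
}
print(anticipation_with_parameters({"EV1": 3, "EV2": -1}, contents_per_event))  # 2.0
```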

These additional parameters further highlight a clairvoyant source that has been early and relevant for several events. In addition, each event can be weighted in the calculation of the anticipation value by an importance value. This importance value can be assessed by an expert, or evaluated for example from the height and/or width of the peak in the number of contents produced. It is also possible to penalize a source that has not reacted to one of the events, or to take into account the general reactivity to an event. The anticipation value can thus better reflect how well a source covers several events and how early it reports on them.

The method of the invention may further comprise the following step, performed after step /5/:

/6/ obtain at least one content generated by a source whose identifier and anticipation value were provided in step /5/.

In other words, the document monitoring operator may be provided with one of the contents produced by a source that the method of the invention has revealed as clairvoyant.

Optionally, the method of the invention, including steps /1/ to /5/, can also be executed twice, with the second execution using the second embodiments described above. The domain of interest entered in step /1/ of the second execution can then be derived at least partly from another domain of interest. That other domain relates to a source whose identifier and anticipation value were provided in step /5/ of the first execution. In this way, the first execution identifies a clairvoyant source. The second execution can then focus on subjects of interest of this clairvoyant source that the query request of the first execution may not have covered.

A second aspect of the invention provides an automatic search module, which comprises:

- means for producing at least one query request corresponding to several events, optionally including means for aggregating several elementary query requests;

- collection means, adapted to collect from at least one database the content references obtained in response to said at least one query request, whose contents each correspond to at least one of the events;

- identification means, adapted to identify a source and a production date for each content whose reference has been collected by the collection means;

- selection means, adapted to select, from the sources identified by the identification means, at least one source that has produced at least one content relating to at least one of the events corresponding to the query request;

- calculation means, adapted to determine, for each source selected by the selection means, the time advance of this source for a content relating to one of the events, with respect to the date of this event or to a date on which other contents relating to the same event were produced. The calculation means then combine the time advances acquired by that selected source to calculate a numerical value, called the anticipation value and attributed to the source, which varies monotonically with each time advance determined for that source; and

- output means, adapted to provide an identifier of at least one of the selected sources, together with the anticipation value calculated for this source.

Such an automatic search module is adapted to perform a method according to the first aspect of the invention, possibly including the improvements and embodiments mentioned for this method.

Optionally, the selection means may be adapted to select from the sources identified by the identification means, at least one source that has produced content relating to several of the events corresponding to the query request.

More particularly, to execute the first embodiments mentioned above, the means for producing the query request can be adapted to let a user enter several events, and to produce the query request from the entered events. In this case, the calculation means can be adapted to determine each time advance for a selected source that has produced content relating to several of the events. Each advance is the difference between the date of one of the events and the date on which this source produced content related to that event, whose reference was collected by the collection means.

To execute the second embodiments mentioned above, the means for producing the query request are adapted to let a user enter a domain of interest. The automatic search module then further comprises counting means. For several dates, these counting means count the contents produced on each date whose references were collected by the collection means. They then determine the dates on which the largest numbers of these contents were produced. Each date thus determined is associated with an event that corresponds to said at least one query request. In addition, the calculation means may be adapted to determine each time advance for a source selected by the selection means, as a difference between two dates. The first is one of the dates on which a large number of contents whose references were collected by the collection means were produced. The second is the date on which the source produced a content whose reference was also collected by the collection means.

Finally, a third aspect of the invention provides a computer program comprising code adapted to carry out a method according to the first aspect of the invention when this code is read and executed by at least one processor that has access to the database. For the purposes of the present patent application, such a program is considered a product in its own right, derived from the invention and providing a new function to a computer. For this reason, it is referred to as a computer program product.

Other features and advantages of the present invention will emerge from the following description of non-limiting exemplary embodiments, with reference to the appended drawings, in which:

- Figure 1 is a timing diagram showing several content sources;

- Figure 2a is a step diagram for first possible embodiments of the present invention;

- Figure 2b is a time diagram of the production of contents whose references have been collected, illustrating the first embodiments of the invention;

- Figure 3a is a step diagram for second possible embodiments of the present invention; and

- Figure 3b is a time diagram of the production of contents whose references have been collected, illustrating the second embodiments of the invention.

In these figures, identical references designate elements that are identical, or that have identical roles.

In the present description, an event is any occurrence or manifestation that was in the news at a given time, called the date of the event. Depending on the context, a date may mean a calendar day, but more generally it identifies a moment at any level of precision: a date with a time, a week number, a month, or only a year, etc.

Content means any data or document that can be obtained in response to a query request. This includes a link to a site or site page, any information or set of information, and any article, image, video or message, including a message published via a social network.

Database means any grouping or collection of contents that can be queried by formulating a query request, some of which can be selected and provided to a user in response. A reference is any type of reference that gives the user access to a content. Examples include access references such as a link to a web page of the content, bibliographic references, or combinations of references of different types. For simplicity and clarity of writing, a collected content and a content whose reference has been collected are sometimes treated as the same. Source means any author to whom a content is attributed, or any publisher referenced for the publication of the content. A source may produce several contents relating to the same event. The same content may also have several different sources, for example several authors who collaborated.

The identifier of a source means any coordinate or reference that identifies the source uniquely, for example an e-mail address or a website or social network reference. The production date of a content is a date assigned to that content in a database to record when the content was incorporated into this database. It is thus the date on which the content was made available to users of the database.

An automatic search module means any functional, hardware or software entity that can query a database and return references of the contents that correspond to the query request used. Such a module may include a search engine of the kind known to the general public. To implement the present invention, it also includes functions for calculating an anticipation value. It can additionally be enriched with optional functions, such as collaborative filtering or machine learning processes used to build query requests.

Sources produce content that is stored in one or more databases, at production times that may vary between contents and/or sources.

A user of a communication network that provides access to the database(s) can use an automatic search module to query them on the basis of a request formulated by this user. This request determines the user's domain of interest, which is the subject of their content search. It can be built using a form that characterizes the domain of interest, called a generic ontology in the jargon of those skilled in the art. The user completes fields of the generic ontology, such as a field indicating the product concerned, a field indicating the usage, and fields indicating models, target, brand name, sourcing, origin, etc. The combination of these fields as completed by the user is called the business ontology. It expresses the user's domain of interest for querying the database(s).

The automatic search module then establishes a query request for the database(s) on the basis of the business request. This query request can be established solely from the business ontology as elaborated by the user.
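Turning a completed business ontology into a request could look like the sketch below. The field names and the boolean query syntax are hypothetical, since the description specifies neither. Alternatives within a field are OR-ed, filled fields are AND-ed, and empty fields are ignored.

```python
def business_request(ontology: dict[str, list[str]]) -> str:
    """Build a boolean query string from the filled fields of a generic ontology.

    Hypothetical syntax: terms within one field are alternatives (OR),
    all filled fields must be satisfied (AND), empty fields are skipped.
    """
    clauses = []
    for terms in ontology.values():
        filled = [t.strip() for t in terms if t.strip()]
        if filled:
            clauses.append("(" + " OR ".join(f'"{t}"' for t in filled) + ")")
    return " AND ".join(clauses)


ontology = {"product": ["drone"], "usage": ["delivery", "surveillance"], "brand": []}
print(business_request(ontology))  # ("drone") AND ("delivery" OR "surveillance")
```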

However, to provide the user with more relevant content or a richer service, it may be advantageous to combine the business request with other requests into a final request used to query the database(s). In this case, the business request and each other request are called elementary requests. They are aggregated to build the final request, called the query request, with which the database(s) is (are) queried. This aggregation uses predefined aggregation rules that are well known to those skilled in the art. Such rules express in particular semantic or linguistic proximities or equivalences, or even binary operations on the contents of the fields of the elementary requests.
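A simple aggregation of elementary requests could look like the sketch below. It is an illustration only, not the rules used by any particular system. Each elementary request is modeled as a set of terms. A hypothetical equivalence table expands each set with its synonyms. The sets are then combined by a binary operation, union ("or") or intersection ("and").

```python
def expand(terms: set[str], equivalences: dict[str, set[str]]) -> set[str]:
    """Add the terms declared equivalent (hypothetical rule table) to a term set."""
    expanded = set(terms)
    for term in terms:
        expanded |= equivalences.get(term, set())
    return expanded


def aggregate(
    requests: list[set[str]],
    equivalences: dict[str, set[str]],
    operation: str = "or",
) -> set[str]:
    """Aggregate elementary requests into one query request after equivalence expansion."""
    if not requests:
        raise ValueError("at least one elementary request is required")
    expanded = [expand(r, equivalences) for r in requests]
    if operation == "or":
        return set.union(*expanded)
    if operation == "and":
        return set.intersection(*expanded)
    raise ValueError(f"unknown operation: {operation}")


equivalences = {"uav": {"drone"}, "drone": {"uav"}}
business = {"uav"}
source_profile = {"drone", "delivery"}
print(sorted(aggregate([business, source_profile], equivalences, "and")))  # ['drone', 'uav']
```

The equivalence step is what allows "uav" and "drone" to match in the intersection. Without it, the two elementary requests would share no term.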

According to a first possibility, the business request established by the user can be combined with at least one other business request established by another user, preferably on condition that these users are close to each other. Those skilled in the art commonly call such a querying method collaborative filtering.

Optionally, one of the elementary requests combined with the user's business request may characterize the domain of interest of a content source that is active, in particular, in the user's domain of interest. The user's business request can thus be enriched or oriented according to that of the source. The user's domain of interest can then follow that of the source, possibly also taking into account how the source's domain of interest evolves. The source's business request, which serves as an elementary request aggregated with that of the user, may have been established by the source itself, for example to facilitate access to the content it has produced. Alternatively, it may have been established automatically, in particular by a content editing module.

According to a second possibility, the automatic search module may store business requests previously established by the user and aggregate them as elementary requests to build the query request. The aggregation can then result from a learning process that extrapolates from the business requests the user has successively established. Alternatively or in combination, the user's business request can also be combined with content the user has previously consulted, possibly taking into account the user's assessments of some of this content. In this way, the query request can anticipate an evolution of the user's domain of interest. Those skilled in the art commonly call such a querying method machine-learning filtering.

In Figures 1, 2b and 3b, the horizontal axis symbolically represents time, denoted t, in chronological order from left to right. In Figure 1, S₁-S₄ denote sources of contents stored in the database(s) BD. These contents are indicated generically by the letter D. The automatic search module is designated by the reference 1. The user, or document monitoring operator, is denoted U, and the query request is denoted RQ. The brace on the right of Figure 1 designates the contents D of the database BD that correspond to the query request RQ, excluding those that do not.

The automatic search module 1 collects the references of the contents D that correspond to the query request RQ, as well as the production dates and sources of these contents. In the example of Figure 1, the source S₁ produces no content that corresponds to the query request RQ. The source S₂ produces several contents that correspond to the query request RQ, including the content denoted D₂, produced at date T(D₂). Likewise, the source S₃ produces several such contents, including the content denoted D₃, produced at date T(D₃). The source S₄ also produces several such contents, including the content denoted D₄, produced at date T(D₄). By way of illustration, the content D₂ produced by the source S₂ precedes the content D₃ produced by the source S₃, which itself precedes the content D₄ produced by the source S₄. The automatic search module 1 thus collects in particular the references, the source identifiers and the production dates of the contents D₂, D₃ and D₄.

Figures 2a and 2b illustrate first embodiments of the invention, in which the events are initial data of the method. In step ST₁, a series of events is entered, for example by the user U. These events are individually designated EV₁ for the first, with event date T(EV₁), EV₂ for the second, with event date T(EV₂), etc. From this series of events EV₁, EV₂, ..., the automatic search module 1 constructs the query request RQ in step ST₂ and uses it to query the database BD. The query returns a set of contents, denoted D_i, D_j, .... For these contents, the automatic search module 1 collects the references, the identifiers of the sources that produced them, and their production dates (step ST₃). Thus, the content D_i was produced by the source S(D_i) at date T(D_i), the content D_j was produced by the source S(D_j) at date T(D_j), etc.

In step ST4, the automatic search module 1 classifies the contents collected in response to the interrogation request RQ according to the sources that produced them. For example, the same source S_k has produced at least the two contents D_m and D_n, the content D_m at the date T(D_m) and the content D_n at the date T(D_n). Optionally, sources that each correspond to only one collected content reference can be removed from the rest of the process, so that only multi-content sources are retained. For each of these sources, each content it has produced is matched with the event of step ST1 to which this content relates, and a time advance is calculated. For example, the content D_m produced by the source S_k concerns the event EV_x, and the time advance of the source S_k for this content D_m is T(EV_x) - T(D_m), where T(EV_x) is the date of the event EV_x and T(D_m) is the production date of the content D_m by the source S_k. Similarly, the content D_n, also produced by the source S_k, relates to the event EV_y: the time advance of the source S_k for this content D_n is T(EV_y) - T(D_n), where T(EV_y) is the date of the event EV_y and T(D_n) is the production date of the content D_n. The source S_k is furthermore selected such that the events EV_x and EV_y are different from one another. All the time advances acquired by the source S_k for the different contents it has produced are then combined in step ST5 to calculate an anticipation value VA(S_k), which is attributed to this source S_k. In other words: VA(S_k) = f{..., T(EV_x) - T(D_m), T(EV_y) - T(D_n), ...}, where f is a function combining all the time advances of the same content source. The anticipation value VA(S_k) is then supplied to the user U together with an identifier of the source S_k.

The diagram of FIG. 2b represents the variations of the number of contents produced per day for the two events EV_x (curve denoted x) and EV_y (curve denoted y), whose references were collected in step ST3. The ordinate axis of this diagram, denoted N/d, thus shows these numbers of contents produced per unit of time, for example per day. The curve x shows that the contents relating to the event EV_x were mostly produced after the date T(EV_x) of this event. This is the case, in particular, for the content D_m, since the date difference T(EV_x) - T(D_m), which constitutes the time advance, is negative. In contrast, many contents relating to the event EV_y, including the content D_n, were produced before this event EV_y, and correspond to anticipatory or premonitory contents. The time advance T(EV_y) - T(D_n) is then positive.
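A minimal Python sketch of steps ST4 and ST5 follows, assuming each collected reference is already linked to the event it concerns and that dates are `datetime.date` values. The `Content` record, the function names, and the use of `sum` as the combination function f are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Content:
    """A collected content reference: producing source, production date T(D), related event."""
    source: str
    produced: date
    event: str  # identifier of the event EV_x this content relates to (assumed known)


def time_advances(contents, event_dates):
    """Step ST4: group contents by source and compute T(EV_x) - T(D) in days.

    Sources whose contents all relate to a single event are dropped, mirroring
    the optional restriction to sources covering different events.
    """
    by_source = defaultdict(list)
    for c in contents:
        by_source[c.source].append(c)
    advances = {}
    for source, items in by_source.items():
        if len({c.event for c in items}) < 2:
            continue
        advances[source] = [(event_dates[c.event] - c.produced).days for c in items]
    return advances


def anticipation_values(advances, f=sum):
    """Step ST5: combine each source's advances with f (sum is only a placeholder for f)."""
    return {source: f(values) for source, values in advances.items()}
```

With this layout, a content produced five days after its event contributes an advance of -5, and one produced twelve days before its event contributes +12.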

Returning to FIG. 2a, step ST1 is executed using input means of the automatic search module 1, step ST2 is executed by means for producing interrogation requests, step ST3 is executed by content collection means in combination with means for identifying the sources and production dates of the contents, and steps ST4 and ST5 are executed by content source selection means in combination with calculation means of the automatic search module 1.

FIGS. 3a and 3b illustrate second embodiments of the invention, in which the events to which the collected contents refer are not initially known to the user U. Step ST1' consists of an input, for example by the user U, of a domain of interest DI, for example by means of a business request as described above. In step ST2', the automatic search module 1 constructs the interrogation request RQ from the domain of interest DI. The interrogation of the database BD by the request RQ, and step ST3 of collecting the references of the contents that correspond to the request RQ, are identical to those of the embodiments of FIG. 2a.

The additional step ST3', illustrated by FIG. 3b, aims to determine, as plausibly as possible, the dates of the events concerned by the contents whose references were collected in step ST3. This determination is more reliable when the content production dates are grouped into separate, or roughly separate, periods, so that a distinct event can be attributed to each period.

In step ST3', the automatic search module 1 identifies maxima in the variations of the number N/d of contents produced per day during an analysis period PA, whose references were collected in step ST3. This is the total number of contents collected per unit of time, for example per day, regardless of the event concerned by each content. For example, the curve of the number N/d as a function of time t represented in the diagram of FIG. 3b has three maxima, denoted M1, M2 and M3, at the dates T(M1), T(M2) and T(M3) respectively. Assuming that each maximum of the curve of N/d as a function of time was probably caused by an event covered by the interrogation request RQ, the maximum M1 is associated, in the rest of the process, with a first event that would have occurred on the date T(M1); likewise, the maximum M2 is associated with a second event that would have occurred on the date T(M2), and the maximum M3 with a third event that would have occurred on the date T(M3). According to a further plausibility assumption, each content whose reference was collected in step ST3 concerns the one of the events thus identified that is chronologically closest to it. Thus, in the example of FIG. 3b, the content D2 is assumed to relate to the event of the maximum M1, which it anticipates; the content D3 is also assumed to relate to the event of the maximum M1, but is posterior to it; and the content D4 is assumed to relate to the event of the maximum M2, which it anticipates. In FIG. 3b, T(RQ) designates the date on which the interrogation request RQ is submitted to the database BD. The date T(RQ) may be, but need not be, the end of the analysis period PA during which the variations of the number N/d as a function of time t are analyzed: the analysis period PA may end before the date T(RQ).

Step ST4' of FIG. 3a corresponds to step ST4 of FIG. 2a, with the actual date of the event concerned by each content replaced by the most likely date of an event that would be concerned by this content, obtained from the variations of the number N/d as a function of time t as just described. Thus, in the example of FIG. 3b, the anticipation value VA(S2) of the source S2 depends on the positive time advance T(M1) - T(D2), the anticipation value VA(S3) of the source S3 depends on the negative time advance T(M1) - T(D3), and the anticipation value VA(S4) of the source S4 depends on the positive time advance T(M2) - T(D4).

Optionally, in the second embodiments as well, an anticipation value can be calculated only for those sources that have produced at least two contents corresponding to the request RQ and concerning different maxima of the number N/d. Thus, for a source S_k that produced a content D_m chronologically close to a maximum M_x of the number N/d of contents produced per day, and that also produced a content D_n chronologically close to a maximum M_y of the number N/d, the anticipation value VA(S_k) depends on the two time advances T(M_x) - T(D_m) and T(M_y) - T(D_n), as indicated in steps ST4' and ST5' of FIG. 3a. The function f, whose variables are the time advances of the same source and which was used in the first embodiments of the invention (FIGS. 2a and 2b), can be used identically for the second embodiments of the invention (FIGS. 3a and 3b).
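Steps ST3' and ST4' can be sketched in Python as follows. The sketch assumes production dates are `datetime.date` values, takes a maximum to be a day whose count exceeds the previous day's and is not below the next day's (real data would likely need smoothing first), and applies the optional two-maxima filter; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def daily_counts(dates, start, end):
    """N/d: number of collected contents produced on each day of the analysis period PA."""
    per_day = Counter(dates)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [(d, per_day[d]) for d in days]


def find_maxima(counts):
    """Days whose count is higher than the previous day's and not lower than the next one's.

    Each such maximum M is taken as the date T(M) of a presumed event (step ST3').
    """
    return [
        counts[i][0]
        for i in range(1, len(counts) - 1)
        if counts[i][1] > counts[i - 1][1] and counts[i][1] >= counts[i + 1][1]
    ]


def advances_by_source(contents, peaks):
    """Step ST4': assign each (source, production date) pair to the chronologically
    closest maximum and compute T(M) - T(D) in days. Sources whose contents all fall
    on a single maximum are dropped (the optional filter of the second embodiments).
    """
    per_source = defaultdict(list)
    for source, produced in contents:
        nearest = min(peaks, key=lambda m: abs((m - produced).days))
        per_source[source].append((nearest, (nearest - produced).days))
    return {
        source: [adv for _, adv in pairs]
        for source, pairs in per_source.items()
        if len({m for m, _ in pairs}) >= 2
    }
```

The per-source advances returned here can be passed to the same combination function f as in the first embodiments.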

Step ST1' is executed using the input means of the automatic search module 1, step ST2' is executed by the means for producing interrogation requests, step ST3 is again executed by the content collection means in combination with the means for identifying the sources and production dates of the contents, and steps ST4' and ST5' are executed by the content source selection means in combination with the calculation means of the automatic search module 1.

Preferably, the function f used to calculate the anticipation values is an increasing function of the algebraic value of each time advance, defined as the date of the event, or of a maximum of the curve of the number N/d as a function of time t, minus the production date of the content, so that a more clairvoyant source has a higher anticipation value. The function f can also be such that the contribution to the anticipation value of the time advance corresponding to the first content produced by the source in relation to an event is greater than the contribution of any other content produced by the same source in relation to the same event. An example of such a function f can be given as follows. For each event EV_x of date T(EV_x), which is either provided as input to the automatic search module or detected by analyzing the number of documents produced per unit of time in response to the request RQ, and for each source S_k having produced, at the date T(D_n), a content D_n that relates to EV_x and was therefore returned in response to the request RQ, a contribution VA_x(S_k) to the anticipation value of the source S_k relative to the event EV_x can be, for example:

$$
VA_x(S_k) =
\begin{cases}
T(EV_x) - T(D_n) & \text{if } T(D_n) < T(EV_x) \text{ and } T(EV_x) - T(D_n) < MA,\\
0 & \text{if } T(D_n) < T(EV_x) \text{ and } T(EV_x) - T(D_n) > MA,\\
\max\bigl(0,\ MA - (T(D_n) - T(EV_x))\bigr) & \text{if } T(EV_x) < T(D_n).
\end{cases}
$$

where MA is a predetermined constant representing the maximum value allowed for an anticipation contribution. If the source S_k has produced several contents D_n relating to the same event EV_x, the corresponding contributions to the anticipation value can be added together. The anticipation value of the source S_k can then be the sum of these contributions over all the events EV_x: VA(S_k) = Σ_x VA_x(S_k).
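The example contribution and its sum over events can be written as a short Python sketch, assuming dates are expressed as day numbers. The text does not settle the boundary cases (an advance exactly equal to MA, or a content produced on the event date); the choices below are assumptions flagged in the comments, and the function names are hypothetical.

```python
def contribution(t_event, t_content, ma):
    """VA_x(S_k) for one content, transcribing the three cases of the example.

    t_event is T(EV_x), t_content is T(D_n), ma is the constant MA (all in days).
    """
    advance = t_event - t_content
    if advance > 0:  # content produced before the event
        # Assumption: an advance exactly equal to MA is counted as MA.
        return advance if advance <= ma else 0
    # Content produced after the event (a same-day content is treated the same way,
    # which is an assumption: the text does not cover T(D_n) = T(EV_x)).
    return max(0, ma - (t_content - t_event))


def anticipation_value(pairs, ma):
    """VA(S_k): sum of contributions over all (T(EV_x), T(D_n)) pairs of one source;
    several contents concerning the same event are simply added together."""
    return sum(contribution(t_event, t_content, ma) for t_event, t_content in pairs)
```

As printed, the third case gives a content produced after the event a contribution that starts at MA and decreases to zero as the delay grows.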

In improvements of the invention, the function f may additionally depend on at least one of the following parameters:

- the number N⁺_k of events, among those of step ST1 or among those identified in step ST3', that are concerned, or assumed to be concerned, by the collected contents produced by the source S_k whose anticipation value is being calculated. The anticipation value VA(S_k) can thus take into account that the source S_k has produced relevant contents for a large number of events. By way of example, the anticipation value can be replaced by VA'(S_k) = VA(S_k) · N⁺_k, where VA(S_k) is as defined above;

- the number N⁻_k of events, among those of step ST1 or among those identified in step ST3', that are not concerned, or are assumed not to be concerned, by any of the collected contents produced by the source S_k. The anticipation value VA(S_k) can thus take into account that the source S_k has been silent, or has failed, with respect to certain events. By way of example, the anticipation value can be replaced by VA''(S_k) = VA(S_k) / N⁻_k, where VA(S_k) is again as defined above;

- for each event EV_x, the number NC(EV_x) of collected contents that have been produced, or are assumed to have been produced, in connection with this event, regardless of whether or not the source S_k has produced content related to this event. The contribution to the anticipation value VA(S_k) of each pair formed by a content produced by the source S_k and the event concerned by this content can thus be modulated according to the importance that this event has had for all the identified sources. By way of example, the contribution to the anticipation value can be replaced by VA_x'(S_k) = VA_x(S_k) · NC(EV_x), where VA_x(S_k) is as defined above; and

- for each event, the value of a ratio HL_x of peak height over peak width, relating to the variations of the number N/d of collected contents produced per unit of time in relation to this event, regardless of whether or not the source S_k has produced content relating to this event. The contribution to the anticipation value of each pair formed by a content produced by the source S_k and the event concerned by this content can thus be modulated according to characteristics of the reaction caused by this event among all the identified sources. By way of example, the contribution to the anticipation value can be replaced by VA_x''(S_k) = VA_x(S_k) · HL_x, where VA_x(S_k) is again as defined above.
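Two of these parameters lend themselves to a brief Python sketch: the counts N⁺_k and N⁻_k together with the variant VA''(S_k) = VA(S_k) / N⁻_k, and the ratio HL_x. The text defines neither the behavior when N⁻_k = 0 nor how peak width is measured; the sketch assumes VA is left unchanged in the first case and uses a full-width-at-half-maximum convention for the second. Function names are hypothetical.

```python
def event_coverage(source_events, all_events):
    """Return (N+_k, N-_k): events the source produced content about, and events it was silent on."""
    covered = set(source_events) & set(all_events)
    return len(covered), len(set(all_events) - covered)


def penalize_silence(va, n_minus):
    """VA''(S_k) = VA(S_k) / N-_k; leaving VA unchanged when N-_k = 0 is an assumption."""
    return va / n_minus if n_minus else va


def peak_height_over_width(counts, peak_index):
    """HL_x: peak height divided by peak width, where the width is assumed to be the number
    of consecutive days around the peak whose count is at least half the peak height."""
    height = counts[peak_index]
    half = height / 2
    lo = peak_index
    while lo > 0 and counts[lo - 1] >= half:
        lo -= 1
    hi = peak_index
    while hi < len(counts) - 1 and counts[hi + 1] >= half:
        hi += 1
    return height / (hi - lo + 1)
```

A sharp peak (large height, few days wide) yields a large HL_x, while a slow, spread-out reaction to an event yields a small one.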

Optionally, steps ST4 and ST5, or ST4' and ST5', can be executed separately for several different sources. The identifiers of these sources can then be provided to the user U at the end of the process in descending order of the calculated anticipation values, so that the most clairvoyant sources are presented first to the user U.

It is also possible that clairvoyant sources having low levels of occurrence in the response obtained to the request RQ are relegated to poorer anticipation values by other sources having higher levels of occurrence. The level of occurrence of a source, or occurrence value, can be determined in particular as the number of contents it has produced that were collected in step ST3. A source can then optionally be excluded from the rest of the process if its occurrence value is greater than a fixed threshold value, or greater than a limit value adjusted according to the other sources of collected contents.
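The ranking and the optional occurrence filter can be combined in a small Python sketch. The occurrence value is taken, as in the text, to be the number of collected contents per source; the default limit of three times the median occurrence value is a hypothetical stand-in for the adjusted limit the text mentions.

```python
import statistics


def rank_sources(anticipation, occurrence, threshold=None):
    """Drop sources whose occurrence value exceeds the threshold, then return
    (source, VA) pairs in descending order of anticipation value.

    anticipation: source -> VA(S_k); occurrence: source -> number of collected contents.
    If no fixed threshold is given, a limit adjusted to the other sources is used;
    three times the median occurrence value is a hypothetical choice.
    """
    if threshold is None:
        threshold = 3 * statistics.median(occurrence.values())
    kept = {s: va for s, va in anticipation.items() if occurrence[s] <= threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)
```

A fixed threshold reproduces the first option of the text; leaving it unset lets the limit follow the distribution of occurrence values among the collected sources.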

Once a clairvoyant source has been identified according to the invention by its high anticipation value, it is possible to consult one of the contents it has produced that were collected. The invention thus reduces the time the user spends searching for precursor information relating to an event.

However, a clairvoyant source may produce precursor contents in separate domains, so that some of these precursor contents are not collected by the interrogation request. In other words, some contents produced by the clairvoyant source do not relate to the events covered by the interrogation request. In this case, the method of the invention may be executed a first time, for a first set of events, to identify the clairvoyant source, then executed a second time for a second set of events that differs from the first but better matches all the fields of activity of the clairvoyant source. For this purpose, in step ST1' of the second execution, the domain of interest that is entered is advantageously defined taking into account a field of interest of the clairvoyant source. In general, the invention makes it possible to indicate to the user which sources were the first to be active in his field of interest. Thus, by subsequently following these sources, in the same field of interest or in similar domains, in particular domains extrapolated by collaborative filtering or learning, the user can gain direct access to precursor contents. Access to such precursor contents can then be provided to the user by specific or priority means, for example by means of alerts, so that the user is aware of these precursor contents even if they still present a signal that is weak for conventional search engines. The invention thus favors speed of access to new information over information that is already widely available. Indeed, genuinely new, or precursor, information has not yet had time to be repeated, taken up and/or reused by secondary content sources other than the initial source of the precursor information.

It is understood that the invention may be reproduced while adapting or modifying secondary aspects thereof with respect to the embodiments just described in detail. In particular, other mathematical expressions can be used to calculate the anticipation value of a source, provided that this value varies monotonically as a function of each time advance of the evaluated source. In addition, the shapes of the curves shown in FIGS. 2b and 3b are only examples of changes in the daily number of contents corresponding to an interrogation request. In particular, the number of maxima of this daily number during the analysis period, the value of each maximum, and the width and/or area of each peak can be arbitrary, and can vary independently from one peak to another.

Claims

1. A method of identifying an information source, comprising the following steps performed using an automatic search module (1):

/1/ producing at least one interrogation request (RQ) that corresponds to several events; and

/2/ collecting, from at least one database (BD), references of contents (D2-D4) that are obtained in response to the interrogation request (RQ), each of these contents corresponding to at least one of the events, and, for each content, identifying a source (S2-S4) and a date (T(D2)-T(D4)) of production of said content; characterized in that the method further comprises:

/3/ among the identified sources (S2-S4), selecting at least one source that has produced at least one content (D2-D4) relating to at least one of the events of step /1/;

/4/ for each source selected in step /3/, and for each content produced by said source that relates to one of the events of step /1/, determining a time advance acquired by said source in producing said content, relative to a date of the event or relative to a date on which other contents relating to said event were produced, then combining the time advances acquired by one of the sources selected in step /3/ in order to calculate a numerical value, called anticipation value and attributed to said source, which varies monotonically with each time advance determined for this source; then

/5/ providing an identifier of at least one of the sources selected in step /3/, together with the anticipation value calculated in step /4/ for said source.

2. Method according to claim 1, wherein said at least one interrogation request (RQ) is produced in step /1/ from an entry of events (EV1, EV2, ...) into the automatic search module (1), and each time advance acquired by a source is determined in step /4/ as a difference between the date of one of the events and a date on which said source produced a content relating to said event, the reference of said content having been collected in step /2/.

3. Method according to claim 1, wherein step /1/ comprises: entering a characterization of a domain of interest (DI) into the automatic search module (1), the interrogation request (RQ) being determined from the entered domain of interest; and wherein step /2/ comprises, after collecting the content references obtained in response to the interrogation request (RQ): determining dates on which larger numbers of said contents were produced, each date thus determined being associated with one of the events of step /1/, and according to which each time advance acquired by a source is determined in step /4/ as a difference between one of the dates on which a larger number of contents whose references were collected in step /2/ were produced, and a date on which said source produced a content whose reference was also collected in step /2/.

4. Method according to any one of the preceding claims, wherein the anticipation value calculated for each source selected in step /3/ is an increasing function of each time advance acquired by said source in producing a content relating to one of the events.

5. Method according to any one of the preceding claims, wherein steps /4/ and /5/ are performed for a plurality of sources selected in step /3/, and the identifiers of said sources are provided in step /5/ ranked according to the anticipation values calculated for each of said sources.

6. Method according to any one of the preceding claims, wherein an occurrence value is further determined for each of the sources selected in step /3/, and one of said sources is rejected if its occurrence value is greater than a predetermined threshold value, or greater than the occurrence value of at least one of the other sources selected in step /3/.

7. Method according to any one of the preceding claims, wherein the interrogation request (RQ) produced in step /1/ is an aggregation of several elementary requests, established according to predetermined aggregation rules, in particular rules of proximity or of semantic or linguistic equivalence.

8. Method according to any one of the preceding claims, wherein the anticipation value is calculated for each source selected in step /3/ also as a function of at least one of the following parameters:

a number of events, among the events of step /1/, in relation to each of which at least one content has been produced by said source;

a number of events, among the events of step /1/, in relation to each of which no content has been produced by said source;

a number of contents that have been produced in relation to at least one of the events of step /1/ and whose references were collected in step /2/, whether or not said source has produced a content related to said event; and

at least one value of a ratio of the height of a peak over the width of said peak, relating to variations of a number of contents produced per day in relation to one of the events of step /1/ and whose references were collected in step /2/, whether or not said source has produced a content related to said event.

9. Method according to any one of the preceding claims, further comprising the following step, performed after step /5/: /6/ obtaining at least one content that has been produced by a source whose identifier and anticipation value were provided in step /5/.

10. Method according to any one of the preceding claims, comprising a first execution of steps /1/ to /5/, then a second execution of steps /1/ to /5/ according to claim 3, wherein the domain of interest (DI) whose characterization is entered in step /1/ of the second execution is determined at least partially from another domain of interest relating to a source whose identifier and anticipation value were provided in step /5/ of the first execution.

11. Automatic search module (1), comprising:

means for producing at least one interrogation request (RQ) that corresponds to several events;

collection means, adapted to collect, from at least one database (BD), references of contents (D2-D4) that are obtained in response to said at least one interrogation request (RQ), and whose contents each correspond to at least one of the events; and

identification means, adapted to identify a source (S2-S4) and a production date for each content (D2-D4) whose reference has been collected by the collection means; characterized in that the automatic search module (1) further comprises:

selection means, adapted to select, from the sources (S2-S4) identified by the identification means, at least one source that has produced at least one content relating to at least one of the events corresponding to the interrogation request (RQ);

calculation means, adapted to determine, for each source selected by the selection means, a time advance acquired by said source in producing a content relating to one of the events, relative to a date of said event or relative to a date on which other contents relating to said event were produced, then to combine the time advances acquired by one of the selected sources in order to calculate a numerical value, called anticipation value and attributed to said source, which varies monotonically with each time advance determined for that source; and

output means, adapted to provide an identifier of at least one of the selected sources, together with the anticipation value calculated for said source.

12. Automatic search module (1) according to claim 11, wherein the means for producing the interrogation request (RQ) are adapted to allow a user (U) to enter several events (EV1, EV2, ...), and are further adapted to produce the interrogation request from the entered events, and the calculation means are adapted to determine each time advance, for a selected source that has produced contents relating to several of the events, as a difference between the date of one of the events and a date on which said source produced a content relating to said event, the reference of said content having been collected by the collection means.

13. Automatic search module (1) according to claim 11, wherein the means for producing the interrogation request (RQ) are adapted to allow a user (U) to enter a domain of interest, and the automatic search module (1) further comprises counting means adapted to count, for several dates, the contents produced on each of said dates and whose references were collected by the collection means, then to determine those of said dates on which larger numbers of said contents were produced, each date thus determined being associated with one of the events, and the calculation means are adapted to determine each time advance, for a source selected by the selection means, as a difference between one of the dates on which a larger number of contents whose references were collected by the collection means were produced, and a date on which said source produced a content whose reference was also collected by the collection means.

14. Automatic search module (1) according to any one of claims 11 to 13, said automatic search module being further adapted to perform a method according to any one of claims 4 to 10.

15. A computer program product, comprising code adapted to cause execution of a method according to any one of claims 1 to 10 when said code is read and executed by at least one processor, said at least one processor having access to said at least one database (BD).