WO2005004003A1

WO2005004003A1 - Method and device for analysing a data stream

Info

Publication number: WO2005004003A1
Application number: PCT/FR2004/050296
Authority: WO
Inventors: Aurélien LETEINTURIER; Jean-Luc Stehle
Original assignee: Everbee Networks
Priority date: 2003-06-30
Filing date: 2004-06-29
Publication date: 2005-01-13
Also published as: EP1639506A1; FR2856868B1; FR2856868A1

Abstract

The invention relates to a method and a device for analysing a data stream passing through a network in order to determine whether the stream conforms to one or several protocols or grammars and/or to classify said stream into one or several particular themes and/or to collect the information needed to filter transmissions, in particular in order to improve network security. The invention uses limited computing resources and can be implemented in a device placed inline on the link between the network and a workstation. The symbols forming the stream are taken into account successively as they pass, the analysis being carried out on the fly in two nested phases, one of which consists in locating patterns and the other in analysing their respective positions and/or relative frequencies. A learning mechanism makes it possible to improve the relevance of the results.

Description

METHOD AND SYSTEM FOR ANALYZING A DATA STREAM

The present invention relates to a method and a system intended for analyzing a data flow passing over a network, with a view, in particular, to collecting the information necessary for filtering transmissions so as to improve network security. Today's communication networks make a very large amount of information and/or applications available to an ever-increasing number of users. This wealth of possibilities nevertheless has some drawbacks. The global Internet network indeed contains as many pornographic pages as documents dealing with academic subjects, if not more. Unwanted mail, consisting of advertising messages that are often pornographic in nature, floods mailboxes. This represents considerable discomfort and loss of time for users, not to mention the cost of these unwanted mails in terms of IT resources. Mention should also be made of computer viruses and other harmful codes which may contaminate the user's computer when the user consults information on the network or simply receives infected e-mail. Some of these harmful codes can be detected simply by noting the presence of certain patterns (viral signatures, etc.) in the data flow, but simply detecting these patterns is generally far from sufficient to ensure effective protection. The content of the pages available on the global network, and more generally of the applications accessible over the network (reservation services, banking operations, etc.), evolves rapidly. Network access providers often host pages or applications with varied content on the same machine. It is therefore impossible to give rules allowing a user to filter information using the address of origin of this information as the sole data. With regard to electronic mail, such filtering poses an insoluble problem.
To filter unwanted content effectively, it is therefore necessary to analyze the entire flow of data circulating on the network, or between the network and a workstation. It is then necessary to decide, depending on the result of this analysis, whether or not to deliver the flow to the user. Filtering must be carried out not only with respect to the address of the transmitter or to a port number but also, and above all, with respect to the content of the information which circulates. Classification by keyword turns out to be insufficient, since certain words can be interpreted differently depending on the context, or may even be the root of different words (for example "porn" and "Pornichet", or the strings "sex" in "sexagenarian" and "anal" in "analysis"). Access to harmless documentation should not be hindered by overly strict rules, nor should the classification be too lax. In general, the term theme will designate the class into which the data flow, or a part of it, is to be classified. In the case of a text in natural language, themes may be semantic fields such as violence, weapons or pornography, or texts of a scientific, legal or religious nature, etc. If the data flow contains executable code, the theme is linked to characteristics of this code: is it likely to have read and/or write access to certain files? Is it likely to replicate itself in order to spread over the network? Does it contain an already referenced computer virus? Note that a data flow can belong to several themes. The classification algorithms commonly used rely on a large corpus of documents and require large amounts of memory. They also, of course, require the relevant documents to have been found and classified beforehand. To analyze the network flow effectively, it is also necessary to determine which protocol is used by a particular communication flow. Where appropriate, the flow must also be analyzed in order to extract various information of a semantic nature (port used, address of the sender, type of encryption used, etc.), which will be used to decide what action to take to ensure network security. Note that several protocols can be nested. A text transmitted via a given protocol may contain a part complying with another protocol. This text may itself contain harmful codes (viruses, Trojans, etc.), for example in macros of certain office files (Word, Excel) or in an executable program. Such nesting is called encapsulation. The encapsulation of protocols plays a role similar to that of quotation marks, which make it possible, in a text, to quote another text whose grammatical structure may be very different from that of the first. The problem posed is therefore complex.
On the one hand, it is necessary to recognize which protocol, among a large number of possible protocols, is used in a given part of the flow. On the other hand, depending on this protocol, it is necessary to determine the theme or themes to which all or part of the content of the data flow relates. The invention which is the subject of this patent provides an original solution to this problem. It makes it possible to analyze, on the fly, data flows passing over a network. For the purposes of the present invention, "on the fly" means that the symbols constituting the data stream are taken into account sequentially as they are encountered, none of them being stored in memory. In addition, and this is one of the major advantages of the present invention, this analysis uses only limited computer resources, both in memory and in computing power. It can therefore be implemented in particular on a small device placed inline between the network and a particular workstation or, more generally, between the network and a computer system connected to this network. This device can be an independent unit, and it can also be integrated into a modem, a network card, a router or any other communication unit. Network security can thus be ensured within the network itself, and not on the user's workstation. The invention which is the subject of this patent therefore prevents a significant part of the computer resources of the workstation from being used by security applications. Moreover, it makes it possible to filter unwanted content before it enters the user's workstation. Among the many applications of the invention, it makes it possible to produce an antivirus which detects harmful codes even before they have reached the computer, and which can thus block their passage. Conventional antiviruses, by contrast, can only detect viruses once they have entered the workstation.
The invention also makes it possible to carry out parental control functions, filtering texts which children should not read. Likewise, it can allow a company to filter network access and to prevent employees from being tempted to spend too much time on websites unrelated to their duties. In the following, the data flow passing over the network is said to be composed of symbols (for example bytes), and a pattern is a sequence of symbols whose presence must be sought within the flow. These patterns include words whose presence in the analyzed data stream can provide useful indications for determining the protocol and/or classifying the text. In addition, the presence of certain patterns (such as prohibited URLs, virus signatures and, more generally, harmful executable codes) must trigger immediate actions, in some cases going as far as breaking the communication and/or stopping the analysis. These patterns will hereinafter be called priority patterns. The invention which is the subject of this patent is structured around four entities. The first entity is a dictionary, called the pattern dictionary, containing the patterns to be searched for and providing, for some of them, additional information useful for determining the protocol and/or classifying the flow. The pattern dictionary can be built a priori, for example using a corpus of texts whose classification is known, and it can be enriched by self-learning as the invention is used. The pattern dictionary can relate to one or more themes. The second entity is an algorithm making it possible to detect on the fly whether patterns of the pattern dictionary are present in the data stream.
For each symbol constituting the data flow, this algorithm implements a small software unit, hereinafter called a first syntactic analysis process. This process is responsible for detecting whether or not there is a pattern starting at this symbol and equal to a pattern in the dictionary. The pattern dictionary is stored in a specific computer format, in the form of a tree structure which makes it possible to optimize both the memory space occupied and the computer resources needed by the syntactic and semantic algorithms. The third entity is a semantic analysis performing the classification of all or part of the data flow by means of a semantic algorithm. It uses evaluation variables linked to the patterns detected, as well as to their frequencies of appearance and/or the mutual positions of the patterns detected. These evaluation variables then make it possible to determine the probabilities that the data flow belongs to a given theme. Learning mechanisms make it possible to improve the relevance of the results of the invention. This learning is done either in a supervised manner, by supplying data streams whose classification is known in advance, or in an unsupervised manner, during normal operation of the invention. The fourth entity performs the protocol search. It determines whether the patterns detected by the syntactic algorithm are arranged with respect to each other in accordance with the syntax of one of the protocols sought (IP, FTP, application protocol, etc.), that is, whether they satisfy the grammar of this protocol. From a more abstract point of view, the question is to recognize whether the stream analyzed (or a part of it) conforms to one or several grammars among a family of grammars to be analyzed. This grammatical analysis collects certain semantic information in passing. Conventionally, a compiler is used to decide whether a text conforms to a given grammar (cf. for example A. Aho, R. Sethi, J. Ullman: Compilers: Principles, Techniques, and Tools, French translation InterEditions 1989).
More precisely, this task is carried out by the first two stages of compilation, which are lexical analysis (division of the flow into lexical units) and syntactic analysis (grouping of these lexical units into grammatical structures). Collecting the information sought is the third stage of compilation, namely semantic analysis. These conventional techniques are not applicable in the present case. They do not make it possible to carry out the search for patterns, the semantic analysis and the search for protocols on the fly, in a single pass over the stream to be analyzed. In addition, they do not allow several protocols to be analyzed simultaneously, according to several grammars, during this single pass. Indeed, running as many compilers as there are grammars taken into account would increase the necessary computer resources (computation time and memory space). This cost would quickly become prohibitive, in particular in the case of an implementation on an embedded processor placed on the network. The principle of using an automaton to recognize whether a text (a series of symbols) belongs to a given language (and therefore satisfies a grammar) is classic (cf. for example J.-L. Stehlé, P. Hochard: Computers et Langages, Éditions Ellipses 1989). When the grammar is rational, a finite automaton is sufficient. When the grammar is algebraic and not rational, a pushdown automaton is required. Note that, in certain cases, using a pushdown automaton even for a rational grammar can lead to significant savings in computer resources (memory, computing power/processing time) compared with using a finite automaton. The reader is referred to the works cited for details on grammar concepts.
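To make the distinction between the two kinds of automata concrete, here is a minimal Python sketch using toy languages that are not taken from the patent. The first is a two-state finite automaton for the rational language (ab)*. The second is a stack-based (pushdown) recognizer for properly nested brackets, which play the role of nested encapsulation levels. The function names are hypothetical.

```python
def finite_automaton_accepts(text: str) -> bool:
    """Finite automaton for the rational language (ab)*: two states suffice."""
    # transitions[state][symbol] -> next state; a missing entry means rejection
    transitions = {0: {"a": 1}, 1: {"b": 0}}
    state = 0
    for symbol in text:
        nxt = transitions[state].get(symbol)
        if nxt is None:
            return False
        state = nxt
    return state == 0  # state 0 is the only final state


def pushdown_automaton_accepts(text: str) -> bool:
    """Pushdown automaton for properly nested brackets, an algebraic,
    non-rational language (the nesting depth is unbounded)."""
    openers = {")": "(", "]": "["}
    stack: list[str] = []
    for symbol in text:
        if symbol in "([":
            stack.append(symbol)          # open one encapsulation level
        elif symbol in openers:
            if not stack or stack.pop() != openers[symbol]:
                return False              # closing symbol without matching opener
        # any other symbol is payload and leaves the stack unchanged
    return not stack                      # accept only if every level was closed
```

The stack lets the second recognizer track an unbounded nesting depth, which no finite set of states can do.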
One of the original ideas of the present invention lies in the way in which these theoretical concepts are implemented. This implementation allows several grammars to be taken into account simultaneously and on the fly while minimizing the computer resources (computation time, memory space) used. More specifically, the invention relates to a method of on-the-fly analysis of a data stream presented as a series of symbols taken from a set of symbols hereinafter called the alphabet of symbols. The method which is the subject of the present invention implements a dictionary, hereinafter called the pattern dictionary, composed of certain particular sequences of symbols hereinafter called patterns. The pattern dictionary is represented in the form of a tree structure made up of branches and nodes. Each branch of the tree structure starts from a node of the tree structure, hereinafter called the start node of the branch, and arrives at another node of the tree structure, hereinafter called the end node of the branch. A node of the tree from which no branch leaves is hereinafter called a terminal node. The tree structure is such that there is one and only one node at which no branch arrives. This node is hereinafter called the root. Each branch of the tree structure is assigned a symbol from the alphabet of symbols, hereinafter called the label of the branch. Each node of the tree is associated with a series of symbols, hereinafter called the prefix of this node, composed of the labels of the branches connecting the root to this node, taken in the order in which they are encountered. The prefix of the root is the empty sequence. Each node is also associated with a family of patterns formed by the patterns which are contained in the pattern dictionary and which begin with the prefix of this node. The number of patterns contained in this family is hereinafter called the richness of the node.
No node of the tree has a richness equal to zero. In addition, every terminal node of the tree has a prefix equal to a pattern contained in the pattern dictionary. For any pattern contained in the pattern dictionary, there is one and only one node of the tree whose prefix is equal to this pattern. Each node of the tree is associated with a number called the address of that node, such that the addresses of two different nodes are different numbers. The method comprises a first preliminary step of building the pattern dictionary. The method comprises a first phase consisting in detecting the presence or absence, within the data stream, of patterns belonging to the pattern dictionary. This first phase operates on the fly and takes into account the symbols constituting the data flow one after the other. If a pattern belonging to the pattern dictionary is present in the data stream, this presence is detected as soon as the last symbol constituting this pattern is taken into account. The first phase implements first syntactic analysis processes operating in parallel, each of which is associated with the following: - on the one hand, one of the symbols constituting the data flow, this symbol being hereinafter called the start symbol of this first syntactic analysis process and never being modified during the execution of this process, - on the other hand, a number equal to the address of a node of the tree structure, this number being hereinafter called the position of this first syntactic analysis process and being intended to be modified during the execution of this process. Each first syntactic analysis process begins to run as soon as its start symbol is taken into account. It is responsible for detecting, for all successive values of the integer N, whether the sequence of N consecutive symbols extracted from the data stream, starting at this start symbol, is equal to one of the patterns of the pattern dictionary.
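The tree structure described above is essentially a trie. The following minimal Python sketch builds such a tree, gives each node a unique address and maintains its richness. It assumes byte patterns and a hypothetical `PatternTree` class. In this sketch at most one branch with a given label leaves a node. The `remove` method prunes nodes whose richness drops to zero, so every terminal node keeps a prefix equal to a pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    address: int                                   # unique number of the node
    children: dict = field(default_factory=dict)   # branch label -> end node
    richness: int = 0                              # patterns beginning with the prefix
    is_pattern: bool = False                       # prefix equals a pattern


class PatternTree:
    """Hypothetical trie representing the pattern dictionary."""

    def __init__(self) -> None:
        self._next_address = 0
        self.root = self._new_node()               # prefix: the empty sequence
        self.nodes = {self.root.address: self.root}

    def _new_node(self) -> Node:
        node = Node(address=self._next_address)    # addresses are never reused
        self._next_address += 1
        return node

    def contains(self, pattern: bytes) -> bool:
        node = self.root
        for symbol in pattern:
            node = node.children.get(symbol)
            if node is None:
                return False
        return node.is_pattern

    def add(self, pattern: bytes) -> None:
        if self.contains(pattern):
            return                                 # one node per pattern
        node = self.root
        node.richness += 1
        for symbol in pattern:
            child = node.children.get(symbol)
            if child is None:                      # new branch labelled `symbol`
                child = self._new_node()
                node.children[symbol] = child
                self.nodes[child.address] = child
            child.richness += 1
            node = child
        node.is_pattern = True

    def remove(self, pattern: bytes) -> None:
        if not self.contains(pattern):
            return
        node = self.root
        node.richness -= 1
        for symbol in pattern:
            child = node.children[symbol]
            child.richness -= 1
            if child.richness == 0:                # no pattern uses this branch any more
                del node.children[symbol]
                self._forget(child)
                return
            node = child
        node.is_pattern = False                    # still the prefix of longer patterns

    def _forget(self, node: Node) -> None:
        del self.nodes[node.address]
        for child in node.children.values():
            self._forget(child)
```

For example, after adding `b"porn"`, `b"port"` and `b"pop"`, both the root and the node reached by "p" have richness 3. Removing `b"pop"` deletes only the node reached by its final "p".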
When a first syntactic analysis process begins to run, its position is equal to the address of the root of the tree. Each time a symbol of the data flow is taken into account, the execution of a first syntactic analysis process comprises the following steps: - the step of identifying whether there are, in the tree structure, one or more branches starting from the node whose address is equal to the position of this process and whose label is equal to the symbol taken into account, this or these branches being hereinafter called the active branch(es) for this process, - the step consisting in: stopping the execution of this process if it has no active branch; if there is only one active branch, setting its position to the address of the end node of this active branch; if there are several active branches, duplicating the process as many times as necessary so as to associate a copy of it with each of these active branches, the position of each copy then being equal to the address of the end node of the active branch with which it is associated, - the step consisting in indicating, for this process if its execution was not stopped during the previous step, or for each of its copies if it was duplicated, whether the prefix of the node whose address is equal to its position is equal to a pattern contained in the pattern dictionary, this pattern then being hereinafter called a pattern detected by this first syntactic analysis process.
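The steps above can be simulated with a list of (start symbol index, position) pairs. This illustrative sketch, not the patent's implementation, assumes a hypothetical encoding of the tree: a dictionary mapping (node address, label) to the list of end-node addresses. A list is used so that several branches with the same label may leave a node, which is exactly the case that triggers duplication.

```python
from collections.abc import Iterable

# Hypothetical tree encoding: (node address, label) -> end-node addresses.
# A list lets several branches with the same label leave one node, which is
# the situation in which a process has to be duplicated.
Branches = dict[tuple[int, int], list[int]]


def run_first_phase(stream: Iterable[int], branches: Branches, root: int,
                    pattern_at: dict[int, bytes]) -> list[tuple[int, int, bytes]]:
    """Return (start index, end index, pattern) for every detected pattern.

    pattern_at maps each node whose prefix is a pattern to that pattern.
    A first syntactic analysis process is a (start index, position) pair.
    """
    processes: list[tuple[int, int]] = []
    detected: list[tuple[int, int, bytes]] = []
    for i, symbol in enumerate(stream):
        processes.append((i, root))            # new process for this start symbol
        survivors = []
        for start, position in processes:
            # zero active branches: the process stops (it is not carried over);
            # one: its position moves; several: one copy per active branch
            for end_node in branches.get((position, symbol), []):
                survivors.append((start, end_node))
        processes = survivors
        for start, position in processes:      # report patterns ending here
            if position in pattern_at:
                detected.append((start, i, pattern_at[position]))
    return detected
```

When labels are unique among the branches leaving each node, at most one process per start symbol survives, and only start symbols within the last D symbols (D being the depth of the tree) can still have a running process. Memory therefore does not grow with the length of the stream.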
After each symbol of the data stream is taken into account, the first phase further comprises a first complementary step. This step consists in providing a list of all the patterns detected, when this symbol is taken into account, by at least one of the first syntactic analysis processes being executed. This list is hereinafter called the list of detected patterns. It therefore includes all the patterns present in the stream to be analyzed which end at the symbol taken into account and which are equal to a pattern of the pattern dictionary. Preferably, according to the invention, the pattern dictionary associates with all or part of the patterns one or more numerical values, hereinafter called pattern coefficients, and/or information, hereinafter called pattern information. If the pattern dictionary associates pattern coefficients and/or pattern information with a pattern, the first phase then provides these pattern coefficients and/or this pattern information at the same time as it detects the presence of this pattern. Preferably, according to the invention, the method also uses one or more variables, hereinafter called evaluation variables. It further comprises a preliminary step consisting in initializing these evaluation variables to values fixed in advance, hereinafter called the initial values of the evaluation variables. The method then further comprises a second phase implementing an analysis algorithm. This algorithm takes into account one or more arguments and, depending on their value, modifies the value of all or part of the evaluation variables. The first phase and the second phase are then executed on the fly, in a nested fashion. The analysis algorithm is executed as soon as the first phase has provided a detected pattern, this detected pattern then being one of the arguments of the analysis algorithm.
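The nesting of the two phases can be sketched as a single loop. After each symbol, the first phase updates the running processes, and the analysis algorithm is called immediately for each detected pattern with its pattern coefficient. In this minimal Python sketch, the tree is derived from the prefixes of the patterns and the pattern coefficients are plain floats. `analyse_on_the_fly` and the `analysis` callback are hypothetical names.

```python
from typing import Callable, Iterable

# analysis(evaluation_variables, detected_pattern, pattern_coefficient)
Analysis = Callable[[dict, bytes, float], None]


def analyse_on_the_fly(stream: Iterable[int],
                       coefficients: dict[bytes, float],
                       initial_values: dict[str, float],
                       analysis: Analysis) -> dict[str, float]:
    """Run the first phase (pattern detection) and the second phase
    (analysis algorithm) nested, in a single pass over the stream."""
    # Tree derived from the patterns: one address per distinct prefix,
    # address 0 being the root (the empty prefix sorts first).
    prefixes = sorted({p[:k] for p in coefficients for k in range(len(p) + 1)})
    address = {q: i for i, q in enumerate(prefixes)}
    step = {(address[q[:-1]], q[-1]): address[q] for q in prefixes if q}
    pattern_of = {address[p]: p for p in coefficients}

    evaluation = dict(initial_values)          # preliminary step: initial values
    active: list[int] = []                     # positions of the running processes
    for symbol in stream:
        # first phase: a new process starts at the root, the others advance,
        # and those without an active branch stop
        active = [step[(a, symbol)] for a in [0] + active if (a, symbol) in step]
        # second phase, nested: analyse each detected pattern immediately
        for a in active:
            if a in pattern_of:
                pattern = pattern_of[a]
                analysis(evaluation, pattern, coefficients[pattern])
    return evaluation
```

With the text's own example, the pattern "sex" could carry a positive coefficient and "sexagenarian" a negative one. The longer pattern then cancels the root word when both are detected.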
Preferably, according to the invention, the pattern dictionary associates with all or part of the patterns one or more numerical values, hereinafter called pattern coefficients, and/or information, hereinafter called pattern information. If the pattern dictionary associates pattern coefficients and/or pattern information with a pattern, the first phase then provides them at the same time as it detects the presence of this pattern, and they are then arguments of the analysis algorithm. Preferably, according to the invention, the analysis algorithm is also executed as soon as the first phase has finished taking into account a symbol of the data stream, this symbol then being an argument of the analysis algorithm. Preferably, according to the invention, the first preliminary step further comprises a sub-step consisting in marking all or part of the patterns as priority patterns and in associating at least one action to be executed with each of these priority patterns. The first complementary step then further comprises the sub-step of launching the execution of the action(s) associated with all the priority patterns contained in the list of detected patterns. The execution of the analysis algorithm is inhibited when one of the detected patterns is a priority pattern associated with an action whose effect is to inhibit this analysis algorithm. Preferably, according to the invention, the method comprises the prior step of modifying the pattern dictionary, the result of this modification being hereinafter called the modified pattern dictionary. This step also modifies the tree structure by adding new branches and new nodes and/or by removing branches and nodes. The result of this modification is hereinafter called the modified tree structure.
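The handling of priority patterns in the first complementary step can be sketched in a few lines of Python. The sketch assumes that each priority pattern maps to a hypothetical `PriorityAction` holding a callback and a flag saying whether the action inhibits the analysis algorithm. The signature `b"BADSIG"` in the test is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PriorityAction:
    run: Callable[[bytes], None]     # action launched when the pattern is detected
    inhibits_analysis: bool = False  # True if the action inhibits the analysis algorithm


def first_complementary_step(detected: list[bytes],
                             priority: dict[bytes, PriorityAction]) -> bool:
    """Launch the actions of every priority pattern in the list of detected
    patterns; return False if the analysis algorithm must be inhibited."""
    analysis_allowed = True
    for pattern in detected:
        action = priority.get(pattern)
        if action is not None:
            action.run(pattern)
            if action.inhibits_analysis:
                analysis_allowed = False
    return analysis_allowed
```

A caller would skip the second phase for the current symbol whenever this function returns False.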
In the modified tree structure, any terminal node has a prefix equal to a pattern contained in the modified pattern dictionary. For any pattern contained in the modified pattern dictionary, there is one and only one node of the modified tree structure whose prefix is equal to this pattern. Preferably, according to the invention, the analysis algorithm of the second phase consists in determining the probability or probabilities that all or part of the data constituting the data flow belongs to one or more particular themes. These probabilities are hereinafter called membership probabilities and form part of the evaluation variables. The membership probabilities are calculated using a function, hereinafter called the evaluation function, which takes into account the patterns detected during the first phase. The analysis algorithm of the second phase modifies the values of all or part of the evaluation variables, in particular the membership probabilities, by applying the following steps for each symbol taken into account: - the step consisting, if at the end of the first phase a pattern belonging to the pattern dictionary and ending with the symbol taken into account has been detected in the data flow, in modifying all or part of the evaluation variables by implementing an algorithm hereinafter called the re-evaluation algorithm, this algorithm taking into account the detected pattern and, if the pattern dictionary associates pattern coefficients and/or pattern information with it, these pattern coefficients and/or this pattern information, - the step consisting, if at the end of the first phase no pattern belonging to the pattern dictionary and ending with the symbol taken into account has been detected in the data flow, in modifying all or part of the evaluation variables by implementing an algorithm hereinafter called the relaxation algorithm, - the step consisting, after the re-evaluation algorithm or the relaxation algorithm as appropriate, in applying an algorithm hereinafter called the final probability calculation algorithm, which takes as arguments the values of all or part of the evaluation variables and provides provisional values of the probability or probabilities that the part of the data stream ending at the symbol taken into account belongs to the particular theme or themes, these values being hereinafter called local probabilities. The evaluation function then consists in calculating these membership probabilities, for each particular theme, from the successive values of the local probabilities provided by the final probability calculation algorithm, once all the symbols constituting the data flow have been taken into account. Preferably, according to the invention, the re-evaluation algorithm boils down to a first family of functions taking as variables all or part of the evaluation variables. If the pattern dictionary associates pattern coefficients and/or pattern information with the detected pattern, this first family of functions also takes these pattern coefficients and/or this pattern information as variables. This first family of functions then provides as a result the new values of all or part of the evaluation variables. The function(s) constituting this first family depend on a family of parameters hereinafter called parameters of the re-evaluation algorithm.
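The patent leaves the three families of functions open. As an illustration only, the following Python sketch makes four assumptions: additive re-evaluation with the pattern coefficient, multiplicative relaxation by a decay factor, a logistic final probability calculation, and an evaluation function that keeps the maximum local probability. These choices, the function name and the default parameter values are all assumptions, not taken from the patent.

```python
import math
from typing import Iterable, Optional


def _logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def membership_probability(events: Iterable[Optional[float]],
                           initial_value: float = 0.0,
                           decay: float = 0.9,
                           bias: float = 2.0) -> float:
    """Illustrative evaluation function for one theme.

    events yields, for each symbol taken into account, the sum of the pattern
    coefficients of the patterns detected at that symbol, or None if none.
    """
    score = initial_value                      # the evaluation variable
    membership = 0.0
    for coefficient in events:
        if coefficient is not None:
            score += coefficient               # re-evaluation algorithm
        else:
            score *= decay                     # relaxation algorithm
        local = _logistic(score - bias)        # final probability calculation
        membership = max(membership, local)    # evaluation function
    return membership
```

One such evaluation variable would be kept for each theme. Relaxation makes old detections fade, so the local probability reflects the recent part of the stream.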
The relaxation algorithm then boils down to a second family of functions taking as variables all or part of the evaluation variables and providing as a result the new values of all or part of these evaluation variables. The function(s) constituting this second family depend on a family of parameters hereinafter called parameters of the relaxation algorithm. The final probability calculation algorithm then boils down to a third family of functions taking the evaluation variables as variables and providing the local probabilities as a result. The function(s) constituting this third family depend on a family of parameters hereinafter called parameters of the final probability calculation algorithm. The pattern coefficients, the initial values of the evaluation variables and the parameters of the three algorithms (re-evaluation, relaxation and final probability calculation) are hereinafter collectively called the calibration parameters. Preferably, according to the invention, the method further comprises the additional preliminary step of associating with each symbol of the alphabet of symbols a weighting coefficient, hereinafter called the symbol weighting coefficient. The second family of functions and/or the third family of functions then also take as an additional variable the symbol weighting coefficient associated with the symbol taken into account, the calibration parameters then further comprising these symbol weighting coefficients.
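The following sketch keeps illustrative choices of the same kind as a purely additive re-evaluation and a decay-based relaxation, stated here on their own. The calibration parameters can be grouped in one record. A symbol weighting coefficient can then enter the relaxation as an exponent on the decay factor, so that, for example, whitespace with weight 0 does not relax the score. `CalibrationParameters`, `reevaluate`, `relax` and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CalibrationParameters:
    """Hypothetical grouping of the calibration parameters for one theme."""
    pattern_coefficients: dict[bytes, float]
    initial_value: float = 0.0        # initial value of the evaluation variable
    gain: float = 1.0                 # parameter of the re-evaluation algorithm
    decay: float = 0.9                # parameter of the relaxation algorithm
    bias: float = 2.0                 # parameter of the final probability calculation
    symbol_weights: dict[int, float] = field(default_factory=dict)  # missing -> 1.0


def reevaluate(score: float, pattern: bytes, params: CalibrationParameters) -> float:
    """First family of functions: add the scaled pattern coefficient."""
    return score + params.gain * params.pattern_coefficients.get(pattern, 0.0)


def relax(score: float, symbol: int, params: CalibrationParameters) -> float:
    """Second family, weighted by the symbol: weight 0 leaves the score
    unchanged, a larger weight relaxes it faster."""
    weight = params.symbol_weights.get(symbol, 1.0)
    return score * params.decay ** weight
```

Keeping every tunable value in one record makes the recalibration step easier to express, since it only has to produce a new record.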
Preferably, according to the invention, the method comprises an additional step, hereinafter called recalibration. After the analysis of the data flow, and depending on the results provided by the evaluation function, this step consists in executing one or more of the following sub-steps: - the sub-step of adding one or more patterns to the pattern dictionary, - the sub-step of removing one or more patterns from the pattern dictionary, - the sub-step of varying all or part of the calibration parameters. Preferably, according to the invention, the method comprises a preliminary phase, hereinafter called the learning phase, consisting in repeating the following steps several times: - the step of operating the first phase and the second phase on data flows for which the probabilities of belonging to particular themes are known in advance, - the step of performing the recalibration, so that the membership probabilities determined during the second phase of the process are as close as possible to the values known in advance. Preferably, according to the invention, the pattern dictionary and the calibration parameters can be modified using information from an external source. Preferably, according to the invention, the patterns contained in the pattern dictionary are classified into three categories, hereinafter called the category of operational patterns, the category of candidate patterns and the category of learning patterns. The pattern dictionary then associates pattern coefficients at least with each of the candidate patterns and with each of the learning patterns. Patterns belonging to the category of candidate patterns are detected during the first phase if present in the data stream, but they are not taken into account during the second phase. The method also includes a self-learning phase taking place in parallel with the first phase and the second phase.
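A supervised learning phase could be sketched as follows. The sketch assumes that the two phases reduce to a logistic function of the sum of the coefficients of the detected patterns. Recalibration then becomes a stochastic gradient step on the log-likelihood, which is ordinary logistic regression substituted here for the unspecified recalibration rule. The function `learn`, its parameters and the sample words are hypothetical.

```python
import math


def _logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def learn(samples: list[tuple[list[bytes], float]],
          coefficients: dict[bytes, float],
          bias: float = 0.0, rate: float = 0.5,
          rounds: int = 200) -> dict[bytes, float]:
    """Hypothetical learning phase.

    samples: (patterns detected in a training stream, membership probability
    known in advance for that stream). Returns recalibrated coefficients.
    """
    coefficients = dict(coefficients)          # do not modify the caller's dict
    for _ in range(rounds):                    # repeat the two steps
        for detected, target in samples:
            # step 1: first and second phases, reduced to a logistic score
            score = sum(coefficients.get(p, 0.0) for p in detected) - bias
            predicted = _logistic(score)
            # step 2: recalibration, a gradient step towards the known value
            for p in detected:
                coefficients[p] = coefficients.get(p, 0.0) + rate * (target - predicted)
    return coefficients
```

On perfectly separable toy data such as a single word per class, the coefficients keep growing slowly. A real recalibration would stop once the predicted probabilities are close enough to the known ones.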
This self-learning phase itself comprises two parts, hereinafter called apprentice selection and apprentice training. When the presence of a pattern belonging to the category of candidate patterns has been detected during the first phase, apprentice selection modifies all or part of the pattern coefficients of this pattern, taking into account all or part of the evaluation variables. Depending on the values then taken by these pattern coefficients, apprentice selection also decides whether or not to move this pattern into the category of learning patterns. When the presence of a pattern belonging to the category of learning patterns is detected during the first phase, apprentice training gives new values to the pattern coefficients of this pattern. These new values are determined from the evaluation variables and the previous values of the pattern coefficients of this pattern. Depending on how these pattern coefficients evolve, apprentice training may moreover modify the category of this pattern. Preferably, according to the invention, the method implements a family of grammars, each of these grammars comprising one or more rules which a series of patterns may or may not satisfy. A series of patterns satisfying all the rules of a grammar belonging to this family is hereinafter said to be grammatically correct for this grammar. The objective of the analysis algorithm of the second phase is then to perform a grammatical analysis of the sequences of patterns formed from all or part of the patterns detected by the first syntactic analysis processes during the first phase. Such a series of patterns is hereinafter called a detected sequence.
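The transitions between pattern categories can be sketched as a small state machine. In this Python sketch, both the promotion rule and the stabilisation rule are assumptions chosen for illustration. A candidate is promoted after enough detections with an average evaluation far from 0.5, and a learning pattern becomes operational once its coefficient stops moving. `PatternEntry`, `self_learning_step` and all thresholds are hypothetical.

```python
from dataclasses import dataclass

CANDIDATE, LEARNING, OPERATIONAL = "candidate", "learning", "operational"


@dataclass
class PatternEntry:
    category: str = CANDIDATE
    coefficient: float = 0.0   # pattern coefficient updated by self-learning
    hits: int = 0              # number of detections while a candidate


def self_learning_step(entry: PatternEntry, evaluation: float,
                       promote_hits: int = 3, margin: float = 0.3,
                       rate: float = 0.5, settle: float = 0.05) -> None:
    """Update a detected pattern; evaluation is the value of an evaluation
    variable (for example a local probability) at the time of detection."""
    if entry.category == CANDIDATE:            # apprentice selection
        entry.hits += 1
        entry.coefficient += (evaluation - entry.coefficient) / entry.hits  # running mean
        if entry.hits >= promote_hits and abs(entry.coefficient - 0.5) > margin:
            entry.category = LEARNING          # informative enough to train
    elif entry.category == LEARNING:           # apprentice training
        previous = entry.coefficient
        entry.coefficient += rate * (evaluation - entry.coefficient)
        if abs(entry.coefficient - previous) < settle:
            entry.category = OPERATIONAL       # the coefficient has stabilised
```

A pattern that keeps appearing in neutral contexts stays a candidate, so it never influences the classification.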
This grammatical analysis checks, for each of the detected sequences and for each of the grammars belonging to the family of grammars, whether the detected sequence is grammatically correct for this grammar. Preferably, according to the invention, the family of grammars is represented by an automaton made up of states and transitions. Each state of the automaton is associated with a number called the address of this state, so that the addresses of two different states are different numbers. At least one of the states of the automaton is called a final state of this automaton. Each grammar taken from the family of grammars is associated with a state of the automaton, called the initial state of the automaton for this grammar, and each transition of the automaton is associated with two states of the automaton, hereinafter called the start state and the arrival state of the transition. A grammar taken from the family of grammars is associated with each of the final states of the automaton, and each transition of the automaton is assigned a set composed of one or more sequences of symbols, hereinafter called the total label of this transition; one of the symbol sequences of the total label is a pattern belonging to the pattern dictionary and is hereinafter called the lexical label of the transition. The method comprises a second preliminary step consisting in building this automaton.
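The second preliminary step (building the automaton) is sketched below for the simplest possible kind of grammar, assumed here to be a fixed sequence of patterns, so that each grammar becomes a chain of transitions from its initial state to its final state. Total labels contain only the lexical label; `Transition`, `Automaton` and `build_automaton` are hypothetical names, and the grammar names and patterns used with them are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    start: int           # address of the start state
    arrival: int         # address of the arrival state
    total_label: tuple   # one or more symbol sequences
    lexical_label: str   # the element of total_label that is a dictionary pattern


@dataclass
class Automaton:
    initial_state: dict  # grammar name -> address of its initial state
    final_state: dict    # address of a final state -> grammar name
    transitions: list

    def check(self, pattern_dictionary):
        """Verify that every lexical label is in its total label and in the dictionary."""
        for t in self.transitions:
            if t.lexical_label not in t.total_label:
                return False
            if t.lexical_label not in pattern_dictionary:
                return False
        return bool(self.final_state)


def build_automaton(grammars):
    """Hypothetical construction where each grammar is a fixed sequence of patterns."""
    initial_state, final_state, transitions = {}, {}, []
    next_address = 0  # distinct states always receive distinct addresses
    for name, sequence in grammars.items():
        state = next_address
        initial_state[name] = state
        next_address += 1
        for pattern in sequence:
            arrival = next_address
            next_address += 1
            transitions.append(Transition(state, arrival, (pattern,), pattern))
            state = arrival
        final_state[state] = name
    return Automaton(initial_state, final_state, transitions)
```

For example, `build_automaton({"G1": ["HELO", "MAIL FROM"], "G2": ["GET"]})` gives initial states 0 and 3 and final states 2 and 4.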
The second phase implements second grammatical analysis processes operating in parallel, each of them being associated with: - a pattern detected during the first phase, hereinafter called the start pattern of this second grammatical analysis process, which is never modified during the execution of the second grammatical analysis process, - a grammar belonging to the family of grammars, - a number equal to the address of a state of the automaton, this number forming part of the evaluation variables, being hereinafter called the position of the second grammatical analysis process and being intended to be modified during the execution of this second grammatical analysis process. The second grammatical analysis process begins to run as soon as its start pattern is detected and is responsible for analyzing whether a sequence of detected patterns beginning with this start pattern is grammatically correct for the grammar associated with this second grammatical analysis process. At the moment when this second grammatical analysis process begins to run, its position is equal to the address of the initial state of the automaton for the grammar associated with it.
The execution of the second grammatical analysis process includes, when each of the detected patterns is taken into account: - a) a filtering step consisting in deciding whether one or more of the transitions of the automaton will be used when the detected pattern is taken into account; this filtering step successively considers all the transitions whose start state is the state whose address is equal to the position of the second grammatical analysis process, and applies to each transition thus considered a decision algorithm whose purpose is to decide whether this transition will be used when the detected pattern is taken into account, such a transition being hereinafter called an active transition for this detected pattern; this decision algorithm takes as arguments all or part of the total label of the transition as well as the detected pattern taken into account, - b) an execution step consisting, - if there is no active transition for the second grammatical analysis process, in implementing a stop algorithm intended to decide whether this second grammatical analysis process should be stopped and, if so, in ending this second grammatical analysis process, - if there is only one active transition for the second grammatical analysis process, in giving the position of the second grammatical analysis process the value of the address of the arrival state of the active transition, - if there are several active transitions for the second grammatical analysis process, in making as many duplicates of this second grammatical analysis process as necessary, so as to associate with each of these active transitions one copy of this second grammatical analysis process resulting from the duplication(s), the position of this copy then being equal to the address of the arrival state of the transition with which this copy is associated, - c) a signaling step consisting, for the second grammatical analysis process or, in the case where there was duplication during the
previous step, for each of the copies of the second grammatical analysis process resulting from this or these duplication(s), in reporting whether the state whose address is equal to the position of the second grammatical analysis process is a final state of the automaton. The second phase then further comprises, after each of the detected patterns has been taken into account, a second complementary step consisting in providing a list of all the final states whose address is equal to the position of at least one of the second grammatical analysis processes in progress, this list being hereinafter called the list of detected final states. It results from the combination of the technical features of the invention that each final state of the list of detected final states corresponds to a detected sequence that is grammatically correct for the grammar associated with this final state and ends with the detected pattern taken into account. Preferably, according to the invention, each of the second grammatical analysis processes is associated with a series of symbols called the stack of this second grammatical analysis process; this stack is part of the evaluation variables, has an initial value defined in advance at the time when the second grammatical analysis process begins to run, and is intended to be modified during the execution of the second grammatical analysis process. The decision algorithm, implemented during the filtering step and applied to a transition, then also takes the value of the stack as an argument and, when the transition is an active transition, it also determines a sequence of symbols called the new stack for this active transition.
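Taken together, the filtering, execution and signaling steps amount to simulating a nondeterministic automaton over the sequence of detected patterns. In the sketch below, assumed for illustration, a process is a `(start index, grammar, position)` tuple, transitions are `(start state, arrival state, total label)` tuples, the decision algorithm simply checks whether the detected pattern belongs to the total label, and the stop algorithm is a callback that by default always stops. `run_grammars` is a hypothetical name.

```python
def run_grammars(detected_patterns, transitions, initial_state, final_state,
                 should_stop=lambda process: True):
    """Yield, after each detected pattern, the list of detected final states."""
    processes = []  # each process: (start pattern index, grammar, position)
    for index, pattern in enumerate(detected_patterns):
        # For every grammar, a new process starts at this start pattern.
        processes += [(index, g, s) for g, s in initial_state.items()]
        survivors = []
        for start, grammar, position in processes:
            # a) filtering: hypothetical decision algorithm "pattern in total label"
            active = [arrival for (src, arrival, label) in transitions
                      if src == position and pattern in label]
            # b) execution: no active transition -> stop algorithm decides;
            #    one -> move; several -> one copy per active transition
            if not active:
                if not should_stop((start, grammar, position)):
                    survivors.append((start, grammar, position))
                continue
            survivors += [(start, grammar, arrival) for arrival in active]
        processes = survivors
        # c) signaling, then the second complementary step
        yield sorted({p for _, _, p in processes if p in final_state})
```

Passing `should_stop=lambda process: False` keeps processes alive across unrelated patterns, which is one possible (hypothetical) stop algorithm.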
The execution step also performs: - if there is only one active transition for the second grammatical analysis process, the operation of replacing the stack with the new stack for this active transition, - if there has been duplication, the operation, for each of the second grammatical analysis processes resulting from the duplication(s), of replacing its stack with the new stack for the active transition with which this process is associated. The stop algorithm implemented in the absence of active transitions then also takes the stack as an argument. Preferably, according to the invention, each of the second grammatical analysis processes is associated with a variable called the result variable, which forms part of the evaluation variables and which, when this second grammatical analysis process is started, takes an initial value defined in advance. The execution of the second grammatical analysis process then includes, when each of the detected patterns is taken into account, an additional calculation step whose purpose is to modify the value of the result variable; this calculation step takes as arguments: - the detected pattern, - the value of the result variable before modification, - the total label of the active transition or, if the second grammatical analysis process comes from a duplication, the total label of the active transition with which it is associated, and provides as its result a value which is then assigned as the new value of the result variable.
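With a stack and a result variable, each process behaves like a run of a pushdown automaton. The sketch assumes a total label made of a lexical label plus a stack operation name, a decision algorithm that returns the new stack (or `None` when the transition is not active), a stop algorithm that always stops, and a calculation step that counts the pushes performed. The resulting grammar recognizes balanced `open`/`close` pattern sequences; all names and the grammar itself are hypothetical.

```python
def decide(total_label, pattern, stack):
    """Hypothetical decision algorithm: new stack if the transition is active, else None."""
    lexical, operation = total_label
    if pattern != lexical:
        return None
    if operation == "push":
        return stack + ("X",)
    if operation == "pop_last":  # only active when the pop empties the stack
        return () if stack == ("X",) else None
    if operation == "pop":       # only active when something remains afterwards
        return stack[:-1] if len(stack) > 1 else None
    return stack


def compute_result(pattern, result, total_label):
    """Hypothetical calculation step: count the pushes performed so far."""
    return result + (1 if total_label[1] == "push" else 0)


TRANSITIONS = [  # (start state, arrival state, total label)
    (0, 0, ("open", "push")),
    (0, 0, ("close", "pop")),
    (0, 1, ("close", "pop_last")),
    (1, 0, ("open", "push")),
]
FINAL = {1}


def run_with_stack(detected_patterns, initial=0):
    """Report (start index, result) for every process sitting on a final state."""
    processes = []  # (start index, position, stack, result variable)
    reports = []
    for index, pattern in enumerate(detected_patterns):
        processes.append((index, initial, (), 0))  # initial stack and result
        survivors = []
        for start, position, stack, result in processes:
            for src, arrival, label in TRANSITIONS:
                if src != position:
                    continue
                new_stack = decide(label, pattern, stack)
                if new_stack is not None:  # active: move, or duplicate if several
                    survivors.append((start, arrival, new_stack,
                                      compute_result(pattern, result, label)))
            # No active transition: the hypothetical stop algorithm always stops.
        processes = survivors
        reports.append(sorted((start, result) for start, pos, _, result in processes
                              if pos in FINAL))
    return reports
```

For instance, `run_with_stack(["open", "open", "close", "close"])` reports `(1, 1)` after the third pattern (the balanced sequence starting at index 1) and `(0, 2)` after the fourth.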
Preferably, according to the invention, the method applies to a data flow passing over a communications network, and it further comprises a final phase consisting, depending on the values taken by the evaluation variables, either in letting the data flow pass without any modification or in executing one or more of the following actions: - modifying the content of the data flow, - modifying the destination address of the data flow, - sending information to a previously specified address, - blocking the passage of the data flow. Preferably, according to the invention, the method further comprises an initial phase of temporary storage of all or part of the data flow, the data flow or the part of the data flow thus stored being released from storage and transmitted, with or without modification, at the end of the final phase. The invention also relates to a system for analyzing a data flow, this analysis being carried out on the fly, the data flow being presented as a series of symbols taken from a set of symbols hereinafter called the alphabet of symbols, the system implementing a dictionary, hereinafter called the pattern dictionary, composed of certain particular sequences of symbols hereinafter called patterns. The pattern dictionary is represented in the form of a tree structure made up of branches and nodes. Each branch of the tree structure starts from a node of the tree structure, hereinafter called the start node of this branch, and arrives at another node of the tree structure, hereinafter called the end node of the branch. A node of the tree structure from which no branch leaves is hereinafter called a terminal node. The tree structure is such that there is one and only one node at which no branch arrives; this node is hereinafter called the root.
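A sketch of the final phase follows, under the assumption that the evaluation variables are summarized by one membership probability per theme and that the decision reduces to two thresholds. The buffered chunks stand for the temporarily stored part of the flow, and the `alerts` list stands in for the previously specified address; modifying the content or the destination address is left out for brevity. `final_phase` and the threshold values are hypothetical.

```python
def final_phase(buffer, evaluation, alerts, block_above=0.9, alert_above=0.6):
    """Decide what to do with a temporarily stored flow.

    buffer: list of bytes chunks held back during the initial storage phase.
    evaluation: theme -> membership probability (assumed summary of the
    evaluation variables). Returns the bytes released downstream, or None
    if the flow is blocked.
    """
    if not evaluation:
        return b"".join(buffer)  # nothing known: let the flow pass unchanged
    theme, probability = max(evaluation.items(), key=lambda item: item[1])
    if probability >= block_above:
        return None  # block the passage of the data flow
    if probability >= alert_above:
        alerts.append((theme, probability))  # send information elsewhere
    return b"".join(buffer)  # release from storage and transmit unchanged
```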
Each branch of the tree structure is assigned a symbol from the alphabet of symbols, hereinafter called the label of this branch, and each node of the tree structure is associated with a series of symbols hereinafter called the prefix of this node, this prefix being composed of the labels of the branches connecting the root to this node, taken in the order in which they are encountered. The prefix of the root is the empty sequence. Each node is also associated with a family of patterns formed of the patterns contained in the pattern dictionary that begin with the prefix of this node, the number of patterns contained in the family associated with this node being hereinafter called the richness of the node. The tree structure is such that none of its nodes has a richness equal to zero; it is further such that every terminal node has a prefix equal to a pattern contained in the pattern dictionary and that, for every pattern contained in the pattern dictionary, there is one and only one node of the tree structure whose prefix is equal to this pattern. Each node of the tree structure is associated with a number called the address of that node, so that the addresses of two different nodes are different numbers. The system includes first storage means for storing this tree structure. The system comprises first processing means for detecting whether or not patterns belonging to the pattern dictionary are present within the data flow, these first processing means operating on the fly and successively taking into account the symbols constituting the data flow. The first processing means are such that, if a pattern belonging to the pattern dictionary is present in the data flow, they detect its presence as soon as the last symbol constituting this pattern is taken into account.
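The tree structure described above is essentially a trie. A minimal sketch follows, assuming for illustration that patterns are Python strings, that node addresses are simply indices into a list, and that each node keeps at most one branch per label; `Node` and `build_tree` are hypothetical names. It records the prefix and richness of every node, so the stated invariants can be checked directly.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    address: int
    prefix: str
    children: dict = field(default_factory=dict)  # branch label -> end node address
    richness: int = 0         # number of dictionary patterns beginning with prefix
    is_pattern: bool = False  # prefix equals a pattern of the dictionary


def build_tree(patterns):
    """Build the tree structure for a pattern dictionary.

    Addresses are list indices, so two different nodes always have different
    addresses; the root (empty prefix) has address 0.
    """
    nodes = [Node(address=0, prefix="")]
    for pattern in set(patterns):
        current = nodes[0]
        current.richness += 1  # every pattern begins with the empty prefix
        for symbol in pattern:
            if symbol not in current.children:
                child = Node(address=len(nodes), prefix=current.prefix + symbol)
                nodes.append(child)
                current.children[symbol] = child.address
            current = nodes[current.children[symbol]]
            current.richness += 1  # this node's prefix is a prefix of the pattern
        current.is_pattern = True
    return nodes
```

With `build_tree(["abc", "abd", "b"])`, the root has richness 3, the node with prefix `"ab"` has richness 2, and every terminal node carries a pattern.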
The first processing means make it possible to implement first syntactic analysis processes operating in parallel, each of them being associated with: - on the one hand, one of the symbols constituting the data flow, this symbol being hereinafter called the start symbol of this first syntactic analysis process and never being modified during its execution, - on the other hand, storage means intended to contain a number equal to the address of a node of the tree structure, this number being hereinafter called the position of the first syntactic analysis process and being intended to be modified during its execution. The first syntactic analysis process begins to run as soon as its start symbol is taken into account and is responsible for detecting, for all the successive values of the integer N, whether the sequence of N consecutive symbols extracted from the data flow, this extraction starting at the start symbol, is equal to one of the patterns in the pattern dictionary. When this first syntactic analysis process begins to run, its position is equal to the address of the root of the tree structure.
When each of the symbols constituting the data flow is taken into account, the first processing means carry out, for each first syntactic analysis process, the following operations: - the operation of locating whether there are, in the tree structure, one or more branches starting from the node whose address is equal to the position of this first syntactic analysis process and whose label is equal to the symbol taken into account, this or these branches being hereinafter called the active branch(es) for this first syntactic analysis process, - the operation consisting, - if there is no active branch for this first syntactic analysis process, in stopping the execution of this first syntactic analysis process, - if there is only one active branch for this first syntactic analysis process, in giving the position of this first syntactic analysis process the value of the address of the end node of this active branch, - if there are several active branches for this first syntactic analysis process, in duplicating this first syntactic analysis process as many times as necessary, so as to associate with each of these active branches a copy of this first syntactic analysis process, the position of this copy then being equal to the address of the end node of the active branch with which it is associated, - the operation, for this first syntactic analysis process if its execution was not stopped during the previous step or, in the event of duplication during the previous step, for each copy of the first syntactic analysis process, of indicating whether the prefix of the node whose address is equal to the position of this first syntactic analysis process is equal to a pattern contained in the pattern dictionary, this pattern then being hereinafter called a pattern detected by this first syntactic analysis process.
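The on-the-fly operations of the first syntactic analysis processes can be sketched as follows. A new process starts at every symbol (its start symbol), each process is represented only by its start index and position, and collecting one successor per active branch covers the three cases at once: no active branch removes the process, one moves it, several duplicate it. The adjacency-list encoding `{address: [(label, end address), ...]}` and the names `build_branches` and `scan` are assumptions; because the tree is built like an ordinary trie, duplication never actually occurs here, although the loop would handle a tree with repeated labels.

```python
def build_branches(patterns):
    """Hypothetical helper: tree as {node address: [(label, end node address), ...]}
    plus {node address: pattern} for nodes whose prefix is a pattern."""
    branches = {0: []}  # address 0 is the root
    pattern_at = {}
    for pattern in patterns:
        node = 0
        for symbol in pattern:
            nxt = next((end for label, end in branches[node] if label == symbol), None)
            if nxt is None:
                nxt = len(branches)
                branches[nxt] = []
                branches[node].append((symbol, nxt))
            node = nxt
        pattern_at[node] = pattern
    return branches, pattern_at


def scan(stream, branches, pattern_at, root=0):
    """Run the first syntactic analysis processes; yield (index, detected patterns)."""
    processes = []  # (start symbol index, position) for each running process
    for index, symbol in enumerate(stream):
        processes.append((index, root))  # a new process starts at this symbol
        survivors = []
        for start, position in processes:
            # Active branches: leave the current node and carry the current symbol.
            active = [end for label, end in branches[position] if label == symbol]
            survivors += [(start, end) for end in active]  # stop / move / duplicate
        processes = survivors
        # First complementary operation: the list of detected patterns.
        yield index, [pattern_at[p] for _, p in processes if p in pattern_at]
```

On the stream `"xabcab"` with the dictionary `["abc", "bc", "ab"]`, both `"abc"` and `"bc"` are detected when the symbol at index 3 is taken into account, that is, as soon as their last symbol arrives.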
After each of the symbols constituting the data flow has been taken into account, the first processing means also carry out a first complementary operation consisting in providing a list of all the patterns detected, when this symbol is taken into account, by at least one of the first syntactic analysis processes running; this list is hereinafter called the list of detected patterns. It results from the combination of the technical features of the invention that the list of detected patterns includes all the patterns of the pattern dictionary that are present in the data flow to be analyzed and end at the symbol taken into account. Preferably, according to the invention, the pattern dictionary associates with all or part of the patterns one or more numerical values, hereinafter called pattern coefficients, and/or information, hereinafter called pattern information, and the system comprises means for storing these pattern coefficients and/or this pattern information; the first processing means are then such that, if the pattern dictionary associates pattern coefficients and/or pattern information with a pattern, these pattern coefficients and/or this pattern information are supplied by the first processing means at the same time as the presence of the pattern is detected. Preferably, according to the invention, the system comprises storage means for storing one or more variables, hereinafter called evaluation variables, and processing means for initializing these evaluation variables to values fixed in advance, hereinafter called the initial values of the evaluation variables. The system then further comprises second processing means for implementing an analysis algorithm, this analysis algorithm taking into account one or more arguments and having the effect, depending on the value of these arguments, of modifying the value of all or part of the evaluation variables.
The system is such that the first processing means and the second processing means are executed on the fly and in a nested manner, the analysis algorithm being implemented as soon as the first processing means have provided a detected pattern, this detected pattern then being one of the arguments of the analysis algorithm. Preferably, according to the invention, the pattern dictionary associates with all or part of the patterns one or more numerical values, hereinafter called pattern coefficients, and/or information, hereinafter called pattern information, and the system comprises means for storing the pattern coefficients and/or the pattern information. The first processing means are then such that, if the pattern dictionary associates pattern coefficients and/or pattern information with a pattern, these pattern coefficients and/or this pattern information are provided by the first processing means at the same time as the presence of this pattern is detected, the pattern coefficients and/or the pattern information then being arguments of the analysis algorithm. Preferably, according to the invention, the analysis algorithm is also implemented as soon as the first processing means have finished taking into account a symbol constituting the data flow, this symbol then being an argument of the analysis algorithm. Preferably, according to the invention, the first processing means further comprise a functionality for marking all or part of the patterns as priority patterns and for associating with each of these priority patterns at least one action to be executed. The system according to the present invention further comprises processing means for executing the actions associated with these priority patterns. The first complementary operation then also consists in launching the execution of the action(s) associated with all of the priority patterns contained in the list of detected patterns.
The system further comprises processing means for inhibiting the implementation of the analysis algorithm when one of the detected patterns is a priority pattern associated with an action whose effect is to inhibit the analysis algorithm. Preferably, according to the invention, the system further comprises processing means for carrying out the prior operation of modifying the pattern dictionary, the result of this modification being hereinafter called the modified pattern dictionary, and of modifying the tree structure by adding new branches and new nodes and/or by deleting branches and nodes. The result of this modification is hereinafter called the modified tree structure. The modified tree structure is such that every terminal node of the modified tree structure has a prefix equal to a pattern contained in the modified pattern dictionary and that, for every pattern contained in the modified pattern dictionary, there is one and only one node of the modified tree structure whose prefix is equal to this pattern. Preferably, according to the invention, the analysis algorithm consists in determining the probability or probabilities that all or part of the data constituting the data flow belongs to one or more particular themes, these probabilities being hereinafter called membership probabilities and forming part of the evaluation variables. These membership probabilities are calculated using a function, hereinafter called the evaluation function, which takes into account the patterns detected by the first processing means.
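The interplay between priority patterns and the nested analysis algorithm might be sketched as follows, assuming that each priority pattern maps to an `(action, inhibits_analysis)` pair and that the analysis algorithm is a callable updating a dictionary of evaluation variables. The signature string `"SIGNATURE"` and all function names are hypothetical.

```python
def first_complementary_operation(detected, priority, analysis, variables):
    """Run the actions of the detected priority patterns, then, unless one of
    them inhibits it, feed every detected pattern to the analysis algorithm.

    priority: pattern -> (action, inhibits_analysis), an assumed representation.
    Returns True if the analysis algorithm was inhibited.
    """
    inhibited = False
    for pattern in detected:
        if pattern in priority:
            action, inhibits = priority[pattern]
            action(pattern)
            inhibited = inhibited or inhibits
    if not inhibited:
        for pattern in detected:  # nested call: analysis runs as patterns arrive
            analysis(pattern, variables)
    return inhibited


# Usage with a hypothetical signature that raises an alert and stops the analysis.
alerts = []
variables = {"count": 0}  # evaluation variables at their initial values


def count_patterns(pattern, variables):
    variables["count"] += 1  # stand-in analysis algorithm


priority = {"SIGNATURE": (alerts.append, True)}
first = first_complementary_operation(["abc", "def"], priority, count_patterns, variables)
second = first_complementary_operation(["SIGNATURE", "abc"], priority, count_patterns, variables)
```

After these two calls, the ordinary patterns have been counted once each, while the second list only triggers the alert.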
The system includes processing means for modifying the values of all or part of the evaluation variables, in particular the values of the membership probabilities, by applying the following operations for each symbol taken into account: - the operation, if a pattern belonging to the pattern dictionary and ending with the symbol taken into account has been detected in the data flow by the first processing means, of modifying all or part of these evaluation variables by implementing an algorithm hereinafter called the re-evaluation algorithm, this re-evaluation algorithm taking into account the pattern that has been detected; the system is such that, if the pattern dictionary associates pattern coefficients and/or pattern information with the pattern that has been detected, the re-evaluation algorithm also takes into account these pattern coefficients and/or this pattern information, - the operation, if no pattern belonging to the pattern dictionary and ending with the symbol taken into account has been detected in the data flow by the first processing means, of modifying all or part of the evaluation variables by implementing an algorithm hereinafter called the relaxation algorithm, - the operation, after implementation of the re-evaluation algorithm or of the relaxation algorithm as appropriate, of applying an algorithm, hereinafter called the final probability calculation algorithm, which takes as arguments the values of all or part of the evaluation variables and provides as its result provisional values of the probability or probabilities that the part of the data flow ending at the symbol taken into account belongs to the particular theme or themes; the values then taken by this probability or these probabilities are hereinafter called local probabilities.
The system further comprises processing means for calculating, using the evaluation function, the membership probabilities for each particular theme once all the symbols constituting the data flow have been taken into account, by using the successive values of the local probabilities provided by the final probability calculation algorithm. Preferably, according to the invention, the re-evaluation algorithm reduces to a first family of functions taking as variables all or part of the evaluation variables; if the pattern dictionary associates pattern coefficients and/or pattern information with the pattern that has been detected, this first family of functions also takes these pattern coefficients and/or this pattern information as variables. The first family of functions provides as its result the new values of all or part of the evaluation variables, the function(s) constituting this first family being dependent on a family of parameters hereinafter called the parameters of the re-evaluation algorithm. The relaxation algorithm reduces to a second family of functions taking as variables all or part of the evaluation variables and providing as its result the new values of all or part of the evaluation variables, the function(s) constituting this second family being dependent on a family of parameters hereinafter called the parameters of the relaxation algorithm. The final probability calculation algorithm reduces to a third family of functions taking the evaluation variables as variables and providing the local probabilities as its result, the function(s) constituting this third family being dependent on a family of parameters hereinafter called the parameters of the final probability calculation algorithm.
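The per-symbol loop (re-evaluation when a pattern ends at the current symbol, relaxation otherwise, then the final probability calculation) and the evaluation function can be sketched for a single theme. Everything concrete here is an assumption: a single evaluation variable `s`, additive re-evaluation by pattern coefficients, multiplicative relaxation with parameter `decay`, a logistic final probability with parameters `a` and `b`, and an evaluation function that keeps the maximum local probability. `analyse` is a hypothetical name.

```python
import math


def analyse(detections_per_symbol, coefficients, s0=0.0, decay=0.9, a=1.0, b=-2.0):
    """detections_per_symbol: for each symbol, the patterns ending at that symbol.

    Returns (membership probability, list of local probabilities).
    """
    s = s0  # evaluation variable, at its initial value
    local = []
    for detected in detections_per_symbol:
        if detected:
            # Re-evaluation algorithm: add the coefficients of the detected patterns.
            s += sum(coefficients.get(p, 0.0) for p in detected)
        else:
            # Relaxation algorithm: let the evaluation variable decay.
            s *= decay
        # Final probability calculation algorithm (logistic, parameters a and b).
        local.append(1.0 / (1.0 + math.exp(-(a * s + b))))
    membership = max(local, default=0.0)  # evaluation function over local probabilities
    return membership, local
```

Here `s0`, `coefficients`, `decay`, `a` and `b` together play the role of the calibration parameters; extending the sketch to several themes would keep one evaluation variable per theme.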
The pattern coefficients, the initial values of the evaluation variables, the parameters of the re-evaluation algorithm, the parameters of the relaxation algorithm and the parameters of the final probability calculation algorithm are hereinafter collectively called the calibration parameters. The system includes storage means for storing the values of the calibration parameters. Preferably, according to the invention, the system further comprises processing means for carrying out an additional operation consisting in associating with each symbol of the alphabet of symbols a weighting coefficient, hereinafter called the symbol weighting coefficient. The second family of functions and/or the third family of functions then also take as an additional variable the symbol weighting coefficient associated with the symbol taken into account. The calibration parameters further include the symbol weighting coefficients. Preferably, according to the invention, the system further comprises processing means for carrying out an additional operation, hereinafter called recalibration, which consists, after the analysis of the data flow and depending on the results provided by the evaluation function, in executing one or more of the following sub-operations: - the sub-operation of adding one or more patterns to the pattern dictionary, - the sub-operation of removing one or more patterns from the pattern dictionary, - the sub-operation of varying all or part of the calibration parameters.
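To show how a symbol weighting coefficient can enter the second family of functions, the sketch below assumes that relaxation multiplies the evaluation variable by `decay ** weight`. A symbol with weight 0 (here the space character, an arbitrary choice) then leaves the evaluation variable untouched, while other symbols default to weight 1. `relax` and `score_after` are hypothetical names.

```python
def relax(s, symbol, decay, symbol_weight):
    """Relaxation taking the symbol weighting coefficient as an extra variable."""
    return s * decay ** symbol_weight.get(symbol, 1.0)


def score_after(text, s0, decay, symbol_weight):
    """Apply relaxation for every symbol of text (no pattern detected here)."""
    s = s0
    for symbol in text:
        s = relax(s, symbol, decay, symbol_weight)
    return s
```

With a weight of 0 for spaces, `score_after("a  b", 4.0, 0.5, {" ": 0.0})` relaxes only twice and returns `1.0`, whereas without weights the same text returns `0.25`; padding a flow with spaces would therefore not dilute the evidence already gathered.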
Preferably, according to the invention, the system comprises processing means for carrying out a prior operation, hereinafter called learning, which consists in repeating the following sub-operations several times: - the sub-operation of operating the first processing means and the second processing means on data flows whose probabilities of belonging to particular themes are known in advance, - the sub-operation of performing the recalibration so that the membership probabilities determined by the second processing means are as close as possible to the values fixed in advance. Preferably, according to the invention, the system comprises processing means for modifying the pattern dictionary and the calibration parameters using information from an external source. Preferably, according to the invention, the patterns contained in the pattern dictionary are classified into three categories, hereinafter called the category of operational patterns, the category of candidate patterns and the category of learning patterns. The pattern dictionary associates pattern coefficients at least with each of the candidate patterns and with each of the learning patterns. The system is then such that the possible presence, in the data flow, of patterns belonging to the category of candidate patterns is detected by the first processing means but is not taken into account by the second processing means. The system further comprises processing means for carrying out a self-learning operation, this self-learning operation itself comprising two parts hereinafter called selection of apprentices and training of apprentices.
The selection of apprentices consists, when the presence of a pattern belonging to the category of candidate patterns has been detected by the first processing means, in modifying all or part of the pattern coefficients of the pattern thus detected, this modification taking into account all or part of the evaluation variables. The selection of apprentices also consists, depending on the values taken by the pattern coefficients of the pattern thus detected, in moving or not moving this pattern into the category of learning patterns. The training of apprentices consists, when the presence of a pattern belonging to the category of learning patterns is detected by the first processing means, in giving new values to the pattern coefficients of the pattern thus detected, these new values being determined from the evaluation variables and the prior values of the pattern coefficients of this pattern; depending on the evolution of the values of these pattern coefficients, the training of apprentices can also modify the category of the pattern thus detected. Preferably, according to the invention, the system comprises processing means for implementing a family of grammars, each of these grammars comprising one or more rules that a series of patterns may or may not verify. A series of patterns verifying all the rules of a grammar belonging to this family of grammars is hereinafter said to be grammatically correct for this grammar.
The objective of the analysis algorithm implemented by the second processing means is then to perform a grammatical analysis of the sequences of patterns formed of all or part of the patterns detected by the first processing means, such a sequence of patterns being hereinafter called a detected sequence; this grammatical analysis verifies, for each of the detected sequences and for each of the grammars belonging to the family of grammars, whether the detected sequence is grammatically correct for this grammar. Preferably, according to the invention, the family of grammars is represented by an automaton consisting of states and transitions. Each state of the automaton is associated with a number called the address of this state, so that the addresses of two different states are different numbers. At least one of the states of the automaton is called a final state of the automaton. Each grammar taken from the family of grammars is associated with a state of the automaton, called the initial state of the automaton for this grammar. Each transition of the automaton is associated with two states of the automaton, hereinafter called the start state of the transition and the arrival state of the transition. A grammar taken from the family of grammars is associated with each of the final states of the automaton, and each transition of the automaton is assigned a set composed of one or more sequences of symbols, hereinafter called the total label of this transition; one of the symbol sequences of the total label is a pattern belonging to the pattern dictionary and is hereinafter called the lexical label of the transition. The system includes storage means for storing the automaton and processing means for performing a second preliminary operation of building the automaton.
The system includes processing means for implementing second grammatical analysis processes operating in parallel, each of them being associated with: - a pattern detected by the first processing means, this pattern being hereinafter called the start pattern of this second grammatical analysis process and never being modified during its execution, - a grammar belonging to the family of grammars, - a number equal to the address of a state of the automaton, this number forming part of the evaluation variables, being hereinafter called the position of this second grammatical analysis process and being intended to be modified during its execution. The system comprises processing means for starting the execution of a second grammatical analysis process as soon as its start pattern is detected, this second grammatical analysis process being responsible for analyzing whether a series of detected patterns beginning with this start pattern is grammatically correct for the grammar associated with it. When the second grammatical analysis process begins to execute, its position is equal to the address of the initial state of the automaton for the grammar associated with this process.
These processing means also make it possible to implement the second grammatical analysis processes and, when each of the detected patterns is taken into account, to carry out the following operations: - a) a filtering operation consisting in deciding whether one or more of the transitions of the automaton will be used when the detected pattern is taken into account; this filtering operation successively considers all the transitions whose start state is the state whose address is equal to the position of the second grammatical analysis process, and applies to each transition thus considered a decision algorithm whose purpose is to decide whether this transition will be used when the detected pattern is taken into account, such a transition being hereinafter called an active transition for this detected pattern; the decision algorithm takes as arguments all or part of the total label of the transition as well as the detected pattern taken into account, - b) an execution operation consisting, - if there is no active transition for the second grammatical analysis process, in implementing a stop algorithm whose purpose is to decide whether this second grammatical analysis process should be stopped and, if so, in ending this second grammatical analysis process, - if there is only one active transition for this second grammatical analysis process, in giving the position of this process the value of the address of the arrival state of this active transition, - if there are several active transitions for the second grammatical analysis process, in performing as many duplications of the second grammatical analysis process as necessary, so as to associate with each of these active transitions a copy of this second grammatical analysis process resulting from this or these duplication(s), the position of this copy then being equal to the address of the arrival state of the transition with which this copy is associated, - c) a signaling operation
consisting, for the second grammatical analysis process or, in the case of duplication during the previous step, for each of the copies of the second grammatical analysis process resulting from the duplication(s), in reporting whether the state whose address is equal to the position of the second grammatical analysis process is a final state of the automaton. The system further comprises processing means allowing, after each of the detected patterns has been taken into account, a second complementary operation to be carried out, consisting in providing a list of all the final states whose address is equal to the position of at least one running second grammatical analysis process, this list being hereinafter called the list of detected final states. It results from the combination of the technical features of the invention that each final state in this list of detected final states corresponds to a detected sequence that is grammatically correct for the grammar associated with this final state and that ends with the detected pattern taken into account. Preferably, according to the invention, each of the second grammatical analysis processes is associated with a series of symbols called the stack of this second grammatical analysis process; this stack is part of the evaluation variables, has an initial value defined in advance when the second grammatical analysis process begins to execute, and is intended to be modified during the execution of the second grammatical analysis process. The system then comprises storage means making it possible to store this stack; the decision algorithm implemented during the filtering step and applied to a transition also takes the value of the stack as an argument and, when the transition is an active transition, also determines a series of symbols called the new stack for this active transition.
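The filtering, execution and signaling operations can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy automaton (`TRANSITIONS`, loosely modeled on a mail dialogue), the decision algorithm (a transition is active when its label equals the detected pattern) and the stop algorithm (a process without an active transition always stops) are all hypothetical choices. Each running process is represented only by its position, that is, the address of its current state; a new process is started at the initial state for every detected pattern.

```python
# Hypothetical automaton: state address -> list of (transition label, arrival state).
TRANSITIONS = {
    0: [("HELO", 1)],
    1: [("MAIL", 2)],
    2: [("RCPT", 3)],
    3: [("RCPT", 3), ("DATA", 4)],
}
INITIAL_STATE = 0
FINAL_STATES = {4}


def decide(label, pattern):
    """Decision algorithm (assumption): a transition is active when its label equals the pattern."""
    return label == pattern


def take_pattern(positions, pattern):
    """Apply filtering, execution and signaling to every running second process.

    `positions` holds the current state of each running process; the function
    returns the new positions and the list of detected final states.
    """
    new_positions = []
    for state in positions:
        # a) filtering: collect the arrival states of the active transitions
        active = [arrival for label, arrival in TRANSITIONS.get(state, []) if decide(label, pattern)]
        # b) execution: none -> stop algorithm (here: always stop);
        #    one -> move; several -> one copy of the process per active transition
        new_positions.extend(active)
    # c) signaling and second complementary operation: list of detected final states
    detected = sorted({state for state in new_positions if state in FINAL_STATES})
    return new_positions, detected


def analyse(detected_patterns):
    positions, report = [], []
    for pattern in detected_patterns:
        positions.append(INITIAL_STATE)  # a new process starts on each detected pattern
        positions, final_states = take_pattern(positions, pattern)
        report.append(final_states)
    return report
```

For example, `analyse(["HELO", "MAIL", "RCPT", "RCPT", "DATA"])` reports final state 4 only after the last pattern, the point at which a grammatically correct sequence ends.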
The execution operation also performs - if there is only one active transition for the second grammatical analysis process, the operation of replacing the stack with the new stack for this active transition, - if there has been duplication, the operation, for each of the second grammatical analysis processes resulting from the duplication(s), of replacing the stack of that process with the new stack for the active transition with which it is associated. The stop algorithm implemented in the absence of active transitions also takes the stack as an argument. Preferably, according to the invention, each of the second grammatical analysis processes is associated with a variable called the result variable, this result variable being part of the evaluation variables. The system then comprises storage means making it possible to store the value of the result variable, and processing means making it possible, when a second grammatical analysis process starts, to set the result variable to an initial value defined in advance, and also to modify the value of the result variable when each of the detected patterns is taken into account; this modification takes into account the detected pattern as well as the total label of an active transition for this second grammatical analysis process or, if the second grammatical analysis process results from a duplication, the total label of the active transition with which it is associated.
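Adding a stack and a result variable turns each second process into something close to a pushdown automaton run. The sketch below is one possible illustration under stated assumptions: the one-state automaton, the total labels `(expected pattern, stack action)`, the choice of "[" and "]" as opening and closing patterns (playing the role of quotation marks around an encapsulated protocol), the result variable holding the deepest nesting reached, and the rule that a final state is reported only with an empty stack are all hypothetical.

```python
# Hypothetical one-state automaton (state 0 is initial and final). Each total label
# is (expected pattern, stack action).
TRANSITIONS = {0: [(("[", "push"), 0), (("]", "pop"), 0), (("w", "keep"), 0)]}
FINAL_STATES = {0}


def decide(label, pattern, stack):
    """Decision algorithm (assumption): return the new stack if active, else None."""
    expected, action = label
    if pattern != expected:
        return None
    if action == "push":
        return stack + (pattern,)
    if action == "pop":
        return stack[:-1] if stack else None  # nothing to close: not active
    return stack


def take_pattern(processes, pattern):
    """processes: list of (state, stack, result); result = deepest nesting seen so far."""
    out = []
    for state, stack, result in processes:
        for label, arrival in TRANSITIONS.get(state, []):
            new_stack = decide(label, pattern, stack)
            if new_stack is None:
                continue
            # each copy receives the new stack and an updated result variable
            out.append((arrival, new_stack, max(result, len(new_stack))))
        # a process with no active transition is simply stopped (stop algorithm)
    # assumption: report a final state only when the stack is empty
    accepted = [p for p in out if p[0] in FINAL_STATES and not p[1]]
    return out, accepted


def analyse(patterns):
    processes, accepted = [(0, (), 0)], []  # initial state, empty stack, result 0
    for pattern in patterns:
        processes, accepted = take_pattern(processes, pattern)
    return processes, accepted
```

With the input `["[", "w", "[", "w", "]", "]"]`, the single process ends in state 0 with an empty stack and a result of 2, whereas a lone "]" has no active transition and the process stops.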
Preferably, according to the invention, the system applies to a data flow passing over a communications network and further comprises processing means making it possible, depending on the values taken by the evaluation variables, to let the data flow through without any modification or to execute one or more of the following actions: - modify the content of the data flow, - modify the destination address of the data flow, - send information to a previously specified address, - block the passage of the data flow. Preferably, according to the invention, the system further comprises means for temporarily storing all or part of the data stream, and means for retransmitting the stream thus stored, with or without modification. As we have seen, the analysis is carried out on the fly.
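The choice between letting the flow through and applying one or more of the listed actions can be expressed as a simple rule table over the evaluation variables. The sketch below assumes a hypothetical `policy` of `(variable name, threshold, action)` rules; the text does not specify how the values of the evaluation variables are mapped to actions.

```python
from enum import Enum


class Action(Enum):
    MODIFY_CONTENT = "modify the content of the data flow"
    REDIRECT = "modify the destination address"
    NOTIFY = "send information to a specified address"
    BLOCK = "block the data flow"


def choose_actions(evaluation, policy):
    """Return the actions triggered by the evaluation variables.

    `policy` is a hypothetical list of (variable name, threshold, action) rules;
    an empty result means the flow is let through unmodified.
    """
    triggered = []
    for name, threshold, action in policy:
        if evaluation.get(name, 0.0) >= threshold and action not in triggered:
            triggered.append(action)
    return triggered
```

A single evaluation variable may trigger several actions at once, for example blocking a flow and notifying an administrator address.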

The symbols that make up the data flow are taken into account one after the other, as they arrive. The analysis is carried out in two phases which run in a nested fashion. The first phase is responsible for identifying the patterns, the presence of a pattern being detected when the last symbol making up this pattern is taken into account. If the pattern is a priority pattern, this makes it possible to perform various actions attached to this priority pattern, going as far, in certain cases (viruses, harmful codes, ...), as blocking the communication or stopping the analysis. This first phase implements the first syntactic analysis processes, which operate in parallel, each of them being responsible for detecting the presence of one or more patterns starting at a particular symbol of the data flow, the start symbol. Several patterns are detected from the same start symbol only when one of these patterns is the beginning of another. Each first syntactic analysis process starts when its start symbol is taken into account, and stops as soon as the series of symbols encountered since this start symbol is no longer the beginning of any pattern in the dictionary. In real applications, only a small number of first syntactic analysis processes generally operate simultaneously (it is exceptional for this number to exceed ten). In addition, each of them requires only very limited computing resources. It must keep in memory, on the one hand, the position of its start symbol within the data flow (or, equivalently, the number of symbols taken into account since this start symbol) and, on the other hand, the address of a node of the tree structure, that is to say two integers. The operations that a first syntactic analysis process must carry out for each symbol taken into account are limited to a very small number of elementary instructions described above.
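The first phase can be sketched as follows: a new first process starts at every symbol, each process holds its start index and its current node in the pattern tree, and it stops as soon as no branch with the current symbol as label leaves its node. This is a minimal sketch; it assumes that sibling branches carry distinct labels, so there is never more than one active branch and the duplication step of the method never occurs. Node objects stand in for the integer node addresses.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch label -> arrival node
    pattern: Optional[str] = None  # set when this node's prefix is a dictionary pattern


def build_tree(patterns: List[str]) -> Node:
    root = Node()
    for p in patterns:
        node = root
        for symbol in p:
            node = node.children.setdefault(symbol, Node())
        node.pattern = p
    return root


def first_phase(stream: str, patterns: List[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (start index, end index, pattern) when the last symbol of a pattern is read."""
    root = build_tree(patterns)
    processes: List[Tuple[int, Node]] = []  # (start symbol index, position in the tree)
    for i, symbol in enumerate(stream):
        processes.append((i, root))  # a new first process starts at every symbol
        survivors = []
        for start, node in processes:
            child = node.children.get(symbol)
            if child is None:
                continue  # no active branch: the process stops
            survivors.append((start, child))
            if child.pattern is not None:
                yield start, i, child.pattern
        processes = survivors
```

On the stream "axes" with the dictionary {axe, axes, xe}, the detections are (0, 2, "axe"), (1, 2, "xe") and (0, 3, "axes"): overlapping patterns are all reported, each as soon as its last symbol is read.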
The first phase of the analysis therefore consumes only a small part of the power of a processor embedded in a device inserted in series on the network. In addition, the first syntactic analysis processes can easily be implemented on one or more specific hardware blocks operating in parallel with the main processor of the device. As soon as a pattern is detected, if none of the possible actions attached to this pattern has stopped the analysis, the second phase takes this pattern into account, running nested with the first phase. This second phase has two modes of operation. One of them performs a semantic analysis in order to classify the theme of all or part of the data flow. The other determines whether the flow conforms to one protocol among a set of given protocols. In certain embodiments, cited here only by way of nonlimiting example of the possibilities of the present invention, these two operating modes are implemented simultaneously, the semantic analysis then taking place only if the data flow conforms to certain protocols. Some examples of implementation are presented below by way of illustration, in no way limiting the possibilities of the present invention. In these examples, the invention is implemented in a device through which passes the entire data flow connecting a computer to the network, this device being inserted in series on a cable of a computer network or, in other embodiments, integrated into the computer or connected to one of its communication ports. All the information passing through this device is then analyzed in accordance with the method which is the subject of the present invention. In some networks, all or part of the workstations or computer units making up the network are equipped with such a device. Each of these devices contains a processor implementing the method of the invention, and storage means making it possible to store the dictionary of patterns and the calibration parameters.
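The nesting of the two phases, including the short-circuit by priority patterns, can be sketched as a single loop over the symbols. In this sketch the tree is a nested dictionary, `stop_patterns` is a hypothetical set of priority patterns whose attached action halts the analysis, and `second_phase` is any callback standing in for the second phase; the function also returns the patterns it handed over, to make the behavior easy to inspect.

```python
_END = object()  # key marking a node whose prefix is a dictionary pattern


def build_tree(patterns):
    root = {}
    for p in patterns:
        node = root
        for symbol in p:
            node = node.setdefault(symbol, {})
        node[_END] = p
    return root


def analyse(stream, patterns, stop_patterns, second_phase):
    """Run both phases nested, symbol by symbol.

    Returns (index at which the analysis was stopped or None, patterns handed to phase 2).
    """
    root = build_tree(patterns)
    nodes = []  # current tree node of each running first process
    handed = []
    for i, symbol in enumerate(stream):
        # first phase: advance every process (plus a new one started at this symbol)
        nodes = [n[symbol] for n in nodes + [root] if symbol in n]
        detected = [n[_END] for n in nodes if _END in n]  # list of detected patterns
        if any(p in stop_patterns for p in detected):
            return i, handed  # priority action: block and stop the analysis
        for p in detected:
            second_phase(p)  # second phase runs nested, as soon as a pattern is detected
            handed.append(p)
    return None, handed
```

On "the evil virus" with "virus" as a stopping priority pattern, the patterns "the", "evil" and "vir" reach the second phase, and the analysis stops on the last symbol of "virus".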
This dictionary of patterns and these calibration parameters can be modified from a central computer server which sends updates. There are many applications. One of them is the detection of viruses: as soon as the first phase has detected the presence of a string of symbols characteristic of a known virus (hereinafter called the fingerprint of the virus), the device can block the flow of information to prevent the detected virus from spreading through the network. Note that, in this example, the presence of a virus is only detected once the entire fingerprint of the virus, or at least a significant part of it, has been analyzed by the device. To ensure effective blocking of the information, the device is provided with a buffer memory that temporarily stores the information until the analysis has shown that the information is authorized to circulate. In a particular embodiment of the invention, the device uses a processor capable of analyzing on the fly a flow of several tens of megabits per second. Holding a few kilobytes in buffer memory therefore introduces a minimal delay, imperceptible to the user of a computer workstation equipped with this device. One of the great advantages of the invention, in this example of application, is that as soon as a new virus is known, a central computer server can send updates of the dictionary of patterns to all the devices on the network, allowing them to prevent the spread of this new virus. If the invention is deployed on a global network such as the Internet, it therefore provides significantly better protection against viruses than all existing systems. In the example described above, the invention only detects already known viruses. In another example of implementation, presented here by way of illustration and in no way limiting the possibilities of the present invention, indications can be obtained on the harmful nature of an executable code.
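The buffer memory only needs to retain the bytes that could still belong to a fingerprint in progress: if the longest fingerprint has length L, holding back the last L - 1 bytes is enough to guarantee that no byte of a fingerprint is forwarded before the fingerprint is complete. The sketch below illustrates this idea; for brevity it uses plain substring search in place of the first phase, and `relay` and its arguments are hypothetical names.

```python
def relay(chunks, fingerprints):
    """Forward a byte stream chunk by chunk, blocking it when a fingerprint appears.

    Returns (bytes forwarded, whether the flow was blocked).
    """
    keep = max(len(f) for f in fingerprints) - 1  # longest match that may still complete
    forwarded = bytearray()
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        if any(f in buffer for f in fingerprints):
            return bytes(forwarded), True  # block: buffered bytes are never sent
        cut = max(len(buffer) - keep, 0)
        forwarded += buffer[:cut]  # these bytes can no longer be part of a fingerprint
        buffer = buffer[cut:]
    forwarded += buffer  # end of stream: flush the remaining bytes
    return bytes(forwarded), False
```

A fingerprint split across two chunks, such as `b"EV"` followed by `b"IL"`, is still caught, because the first part was held back in the buffer.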
If this code contains instructions for writing and/or erasing files, it can do considerable damage. If it contains instructions for opening and then reading a file, followed by instructions for sending mail, the code could be a virus that duplicates itself and sends itself to a set of correspondents. It could also simply be intended to spy on the content of a workstation without the knowledge of its authorized user. To determine the theme of information passing through the network, whether a harmful character, as above, or a pornographic, political or religious character, the method which is the subject of the invention can, in its second phase, analyze the frequencies of appearance and/or the mutual positions of the patterns detected during its first phase. Each pattern is associated with pattern coefficients relating to each of the themes to be detected. In addition, each theme has its own evaluation variables, initialized to zero when the method starts (here the initial values of the evaluation variables are zero) and modified for each pattern detected. The new value of an evaluation variable is determined as a weighted sum of its old value and of the pattern coefficients provided by the dictionary of patterns (the weighting coefficients of this sum being among the parameters of the re-evaluation algorithm). If a symbol of the information to be analyzed is processed without any pattern being detected, the relaxation algorithm multiplies the evaluation variables by coefficients between 0 and 1 (these coefficients being among the parameters of the relaxation algorithm), so that the evaluation variables decay exponentially as long as no pattern is detected over successive symbols of the analyzed information. The evaluation variable associated with a theme is therefore linked to the number and positions of the patterns detected in the recent past.
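The re-evaluation and relaxation algorithms just described can be sketched per symbol as follows. The parameter names `old_weight`, `new_weight` and `decay` are hypothetical, and a single decay coefficient is used for all themes for brevity, whereas the text allows one coefficient per evaluation variable.

```python
def process_symbol(evaluation, detected, coefficients, old_weight=1.0, new_weight=1.0, decay=0.8):
    """Update the per-theme evaluation variables after one symbol.

    evaluation:   theme -> current value (initially 0.0)
    detected:     patterns detected on this symbol by the first phase
    coefficients: pattern -> {theme: pattern coefficient}
    """
    if detected:
        # re-evaluation: weighted sum of the old value and the pattern coefficients
        for theme in evaluation:
            contribution = sum(coefficients.get(p, {}).get(theme, 0.0) for p in detected)
            evaluation[theme] = old_weight * evaluation[theme] + new_weight * contribution
    else:
        # relaxation: exponential decay while no pattern is detected
        for theme in evaluation:
            evaluation[theme] *= decay
    return evaluation


def run(detections, coefficients, themes):
    """Apply process_symbol to a sequence of per-symbol detection lists."""
    evaluation = {theme: 0.0 for theme in themes}  # initial values are zero
    for detected in detections:
        process_symbol(evaluation, detected, coefficients)
    return evaluation
```

If "gun" carries a coefficient of 2.0 for the theme "violence" and is followed by two symbols without any detection, the violence variable falls from 2.0 to 2.0 × 0.8 × 0.8 = 1.28.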
When it is relevant to identify a pair of patterns, separated or not by intermediate symbols, a similar mechanism makes it possible to keep up to date an evaluation variable linked to the presence in the recent past of one or both patterns of the pair; this variable is all the higher as the two patterns are close, and takes even greater values if the pair to be detected was present several times in the recent past. The same technique can be applied to a pair formed by a first pattern belonging to a first family and a second pattern belonging to a second family (or to the same family). In some examples, more complex configurations can be handled, such as triplets or quadruplets of patterns. The final probability calculation algorithm uses, in this implementation example, weighted sums of the values of the different evaluation variables, combined with thresholds (certain intermediate variables are only taken into account if they exceed a particular threshold), the thresholds and the weighting coefficients being among the parameters of the final probability calculation algorithm. Information is thus all the more likely to be considered as belonging to a particular theme as a larger number of patterns and/or pairs of patterns that are significant for the analyzed theme, and close to each other, are detected in it. Let us describe, by way of illustrative example and in no way limiting the possibilities of the present invention, a way of implementing self-learning mechanisms to improve the relevance of the results. In this example, we only consider data flows containing texts for which we seek to determine whether they belong to a particular theme (violence, pornography, ...), but the principle can be generalized to any type of theme on any type of data flow.
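A pair evaluation variable and a thresholded final probability can be sketched as follows. The pair variable uses a decaying trace of recent occurrences of the first pattern, so that a close second pattern adds more than a distant one and repeated pairs accumulate. The logistic squashing in `final_probability` is an assumption added to map the weighted sum into (0, 1); the text only specifies weighted sums combined with thresholds. All names are hypothetical.

```python
import math


class PairTracker:
    """Evaluation variable for an ordered pair (first, second) of patterns."""

    def __init__(self, first, second, decay=0.9):
        self.first, self.second, self.decay = first, second, decay
        self.recent_first = 0.0  # decaying trace of recent occurrences of `first`
        self.value = 0.0         # the pair's evaluation variable

    def step(self, detected):
        """Take one symbol into account; `detected` lists the patterns ending on it."""
        self.recent_first *= self.decay  # relaxation of both quantities
        self.value *= self.decay
        if self.second in detected:
            # the closer (and the more often) `first` preceded, the larger the increase
            self.value += self.recent_first
        if self.first in detected:
            self.recent_first += 1.0


def pair_value(detections, first, second, decay=0.9):
    tracker = PairTracker(first, second, decay)
    for detected in detections:
        tracker.step(detected)
    return tracker.value


def final_probability(variables, weights, thresholds, bias=0.0):
    """Thresholded weighted sum, squashed to (0, 1) with a logistic function (assumption)."""
    score = bias
    for name, weight in weights.items():
        value = variables.get(name, 0.0)
        if value > thresholds.get(name, 0.0):  # variables below their threshold are ignored
            score += weight * value
    return 1.0 / (1.0 + math.exp(-score))
```

With a decay of 0.5, "gun" followed two symbols later by "shot" gives a pair value of 0.25; the same pair four symbols apart gives only 0.0625.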
The candidate patterns added to the pattern dictionary can be words thought to be related to the theme and relevant to its detection, or words generated automatically according to certain rules (minimum number of letters, proportion of vowels and consonants, their arrangement, ...). The invention is then run on a large number of texts, whose theme may be known and provided in advance (supervised learning) or not known in advance and determined by the invention (unsupervised learning). The possible presence of candidate patterns in the data stream is detected during the first phase of the method which is the subject of the present invention, but this information is not taken into account during the second phase of this method. For each candidate pattern, its frequency of appearance in the various themes to which the data supplied to the invention belong is determined, the intermediate results needed to compute these frequencies being stored in the pattern coefficients of the analyzed pattern. Depending on the values taken by these frequencies after a large number of texts have been analyzed, one estimates whether the pattern is more or less correlated with one or other of the themes, and whether it is likely to contribute to the detection of a theme or, on the contrary, to its exclusion. If so, the pattern is selected and moves into the category of patterns in training. It is then taken into account in the second phase of the method which is the subject of the present invention, and its pattern coefficients are gradually adjusted so as to improve the relevance of the membership probabilities found during this second phase. This adjustment of the pattern coefficients is done using a specific adjustment algorithm.
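The frequency-based selection of candidate patterns can be sketched for the supervised case as follows. The selection criterion here, a smoothed log-ratio of document frequencies inside and outside the theme compared with a `threshold`, is a hypothetical choice; the text does not say how correlation is measured. A positive score marks evidence for the theme and a negative score evidence against it.

```python
import math
from collections import Counter


def select_candidates(texts, candidates, theme, threshold=1.0):
    """Return {candidate: log-ratio} for candidates whose frequency differs strongly
    between texts of `theme` and the other texts (positive: for, negative: against).

    texts: list of (words, set of themes) pairs whose themes are known in advance.
    """
    in_theme, outside = Counter(), Counter()
    n_in = n_out = 0
    for words, themes in texts:
        present = set(words) & set(candidates)
        if theme in themes:
            n_in += 1
            in_theme.update(present)
        else:
            n_out += 1
            outside.update(present)
    selected = {}
    for c in candidates:
        # add-one smoothing keeps the ratio finite for words never seen on one side
        f_in = (in_theme[c] + 1) / (n_in + 2)
        f_out = (outside[c] + 1) / (n_out + 2)
        score = math.log(f_in / f_out)
        if abs(score) >= threshold:
            selected[c] = score
    return selected


# Tiny illustrative corpus (made-up data).
SAMPLE = [
    (["gun", "shot"], {"violence"}), (["gun", "fight"], {"violence"}), (["gun"], {"violence"}),
    (["cake", "recipe"], {"cooking"}), (["cake"], {"cooking"}), (["recipe", "oven"], {"cooking"}),
]
```

On `SAMPLE` with the theme "violence", "gun" is selected with a positive score, "cake" with a negative score, and a word absent from every text is not selected.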
The adjustment algorithm takes as input, on the one hand, the membership probabilities as calculated during the second phase of the method, as well as the theme of the analyzed text when it is known (supervised learning) and, on the other hand, the previous values of the pattern coefficients that were used to calculate these membership probabilities. The adjustment algorithm outputs the new values of the pattern coefficients. The practical applications of the method and system which are the subject of the present invention are numerous. One may cite, by way of illustrative and in no way limiting example, assistance in sorting and classifying cover letters and curricula vitae. Large recruitment consultancy firms and the recruiting departments of large companies receive a large number of applications by e-mail, often unsolicited. The present invention then makes it possible to perform a first classification and to redirect each application to the people or departments likely to be particularly interested in its profile. The relevance of the classification is continually checked by the recipients of the applications, and the results of this check are used to feed the self-learning mechanisms. When the second phase performs a grammatical analysis (that is, checks whether the analyzed flow respects one grammar among a given family of grammars), then, for each of these grammars and for each of the patterns detected during the first phase, a second grammatical analysis process is started. Each of them consumes few computing resources. In addition to the information related to its start symbol and/or start pattern and the address of a state of the automaton, it must, in certain cases, keep a stack and/or a result variable intended to receive the semantic information (port number, sender address, type of encryption used, etc.) required by the application.
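The adjustment algorithm described at the start of this passage takes the computed membership probability, the known theme and the previous pattern coefficients, and outputs new coefficients. One common way to do this, swapped in here purely as an illustration and not stated in the text, is a gradient step for a logistic model in which the probability is the logistic function of the sum of the coefficients of the detected patterns. The names `membership_probability`, `adjust` and `rate` are hypothetical.

```python
import math


def membership_probability(detected_counts, coefficients):
    """Logistic model (assumption): sigmoid of the coefficient-weighted pattern counts."""
    score = sum(coefficients.get(p, 0.0) * n for p, n in detected_counts.items())
    return 1.0 / (1.0 + math.exp(-score))


def adjust(coefficients, detected_counts, probability, belongs, rate=0.1):
    """One supervised adjustment step; returns the new pattern coefficients.

    probability: membership probability computed with the previous coefficients
    belongs:     whether the analyzed text is known to belong to the theme
    """
    error = (1.0 if belongs else 0.0) - probability
    new = dict(coefficients)
    for p, n in detected_counts.items():
        # patterns seen more often in the text receive a larger correction
        new[p] = new.get(p, 0.0) + rate * error * n
    return new
```

Starting from a zero coefficient, a text that contains "gun" twice and belongs to the theme moves the coefficient of "gun" up by 0.1 × 0.5 × 2 = 0.1, which raises the membership probability of similar texts.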
This represents only a few tens or hundreds of bytes, and the computing required by the second grammatical analysis processes remains very limited. Like the first syntactic analysis processes, the second grammatical analysis processes can easily run on a specific hardware block or, better, on several specific hardware blocks operating as auxiliary processors in parallel with the main processor of the device. In a particular embodiment of the invention, cited here only by way of example, a specific hardware component with a massively parallel architecture is produced, comprising, alongside a main processor and on the same component, a large number of specialized elementary coprocessors, each with only a small memory space, these specialized coprocessors being intended to host syntactic and grammatical analysis processes and to run them in parallel. Other characteristics of the invention will appear in the description of the figures given below, by way of description and without limitation, with reference to: FIG. 1, which schematically illustrates the way in which a data stream 1 composed of symbols S is analyzed by the method which is the subject of the present invention; FIG. 2, which illustrates how a dictionary of patterns can be represented by a tree structure 6; FIG. 3, which illustrates a particular case of an automaton representing a family of grammars. FIG. 1 schematically represents, on an illustrative and nonlimiting example of the possibilities of the present invention, the way in which a data stream 1 composed of symbols S is analyzed by the method which is the subject of the present invention. The first phase of the method makes it possible to detect, within this data stream, various patterns M1, M2, M3, ..., M15 belonging to a dictionary of patterns.
It will be noted that these detected patterns can be totally or partially superimposed, and that certain patterns (like M11) can be limited to a single symbol. The second phase of the method then analyzes the sequences of patterns thus detected and checks whether these sequences are grammatically correct for one or more grammars. Some patterns, like M2, M4 and M9, do not belong to a grammatically correct sequence. On the other hand, the sequence of consecutive patterns {M7, M8, M10, M11, M12} (in dotted lines in FIG. 1) forms a grammatically correct sequence. The sequence of patterns {M1, M3, M5, M6, M13, M14, M15} also forms a grammatically correct sequence, this sequence being interrupted between the patterns M6 and M13, between which the grammatically correct sequence {M7, M8, M10, M11, M12} is nested. This example illustrates the case of encapsulated protocols. The situation is similar to that of a text quoting another text between quotation marks, the patterns M6 and M13 playing here the same role as the opening and closing quotation marks. FIG. 2 explains, on an illustrative and nonlimiting example of the possibilities of the present invention, the way in which a pattern dictionary can be represented by a tree structure 6. This tree structure is made up of branches 7 (represented by large arrows in FIG. 2) and nodes 8 (represented by circles in FIG. 2). Each branch 7 is associated, on the one hand, with a starting node 9 (the origin of the arrow) and a destination node 10 (the end of the arrow) and, on the other hand, with a label 13 which is a symbol of the alphabet of symbols (in the present example, the alphabet of symbols consists of the 26 lowercase letters of the Latin alphabet). The root 12 is the only node at which no branch arrives. The prefix of a node 8 in the tree structure is the series of symbols made up of the labels 13 of the branches connecting the root to this node, these labels 13 being taken in the order in which they are encountered. In the present example, the pattern dictionary includes 10 patterns, {ai, axe, axes, axial, bas, basse, ci, col, colle, cou}, and each of these patterns corresponds to one and only one node, whose prefix is this pattern. In FIG. 2, these nodes, whose prefixes are patterns in the dictionary, are colored gray. The terminal nodes 11 are those from which no branch leaves, and the prefix of a terminal node is always a pattern belonging to the dictionary of patterns. When the prefix of a non-terminal node is a pattern in the dictionary, this pattern is the beginning of at least one other pattern in the dictionary (as can be seen in this example with "bas" and "basse", or "col" and "colle"). FIG. 3 illustrates a particular case of automaton 34 representing a family of grammars, this automaton 34 being composed of states 35 (represented by circles in FIG. 3) and transitions 36 (represented by arrows in FIG. 3), each transition 39 being associated with a starting state 40 and an arrival state 41 (which, in FIG. 3, are the states represented at the origin and at the end of the arrow representing this transition). Certain states are designated initial states 48 and final states 38, and are represented in FIG. 3 by double arrows entering or leaving the state. There is exactly one initial state per grammar.
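The tree properties listed for FIG. 2 can be checked mechanically. This sketch assumes that the ten patterns of FIG. 2 are the French words {ai, axe, axes, axial, bas, basse, ci, col, colle, cou}, a reading consistent with the 26-letter alphabet and with the prefix pairs the text cites; for brevity, each node is identified by its prefix rather than by an integer address.

```python
PATTERNS = ["ai", "axe", "axes", "axial", "bas", "basse", "ci", "col", "colle", "cou"]


def build_tree(patterns):
    """Map every node (identified here by its prefix) to the set of its branch labels."""
    tree = {"": set()}  # the root has the empty prefix
    for p in patterns:
        for k in range(1, len(p) + 1):
            tree.setdefault(p[:k], set())
            tree[p[:k - 1]].add(p[k - 1])  # branch from the parent, labelled with the k-th symbol
    return tree


def richness(prefix, patterns):
    """Number of dictionary patterns that begin with the node's prefix."""
    return sum(1 for p in patterns if p.startswith(prefix))


TREE = build_tree(PATTERNS)
TERMINAL = sorted(node for node, labels in TREE.items() if not labels)
NON_TERMINAL_PATTERNS = sorted(p for p in PATTERNS if TREE[p])
```

The tree has 21 nodes including the root. Its seven terminal nodes are all patterns, and "axe", "bas" and "col" are patterns whose nodes are not terminal because each is the beginning of a longer pattern. Every node has non-zero richness.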

Claims

1. Method for analyzing a data flow (1), said analysis being carried out on the fly; said data stream (1) presenting itself as a series of symbols taken from a set of symbols hereinafter called the alphabet of symbols; said method implementing a dictionary hereinafter called a pattern dictionary; said pattern dictionary being composed of certain particular sequences of symbols hereinafter called patterns; said dictionary of patterns being represented in the form of a tree structure (6); said tree (6) consisting of branches (7) and nodes (8); each branch (7) of said tree (6) starting from a node of said tree (6), hereinafter called the start node (9) of said branch, and arriving at another node of said tree (6), hereinafter called the arrival node (10) of said branch; a node of said tree (6) from which no branch of said tree (6) leaves being hereinafter called a terminal node (11); said tree structure (6) being such that there is one and only one node at which no branch arrives, this node being hereinafter called the root (12); each branch of said tree structure (6) being assigned a symbol of said alphabet of symbols, this symbol being hereinafter called the label (13) of said branch; each node of said tree (6) being associated with a series of symbols hereinafter called the prefix of said node; said prefix of said node being composed of the labels of the branches (7) connecting said root (12) to said node, these labels being taken in the order in which they are encountered; the prefix of said root (12) being the empty sequence; each node also being associated with a family of patterns formed by the patterns contained in said dictionary of patterns that begin with the prefix of said node, the number of patterns contained in said family associated with said node being hereinafter called the richness of said node; said tree structure (6) being such that no node of said tree structure (6) has a richness equal to zero; said tree structure (6) being further such that any terminal node (11) of said tree structure (6) has a prefix equal to a pattern contained in said dictionary of patterns; said tree structure (6) being further such that for any pattern contained in said dictionary of patterns, there is one and only one node of the tree structure (6) whose prefix is equal to this pattern; each node of said tree (6) being associated with a number called the address of said node, such that the addresses of two different nodes are different numbers; said method comprising a first preliminary step of constituting said dictionary of patterns; said method comprising a first phase consisting in detecting the presence or absence, within said data stream (1), of patterns belonging to said dictionary of patterns; said first phase operating on the fly and successively taking into account the symbols constituting said data stream (1); said first phase being such that if a pattern belonging to said dictionary of patterns is present in said data stream (1), this presence is detected as soon as the last symbol constituting said pattern is taken into account; said first phase implementing first syntactic analysis processes operating in parallel; each of said first syntactic analysis processes being associated with - on the one hand, one of the symbols constituting said data flow (1), said symbol being hereinafter called the start symbol of said first syntactic analysis process and never being modified during the execution of said first syntactic analysis process, - on the other hand, a number equal to the address of a node of said tree structure (6), said number being hereinafter called the position of said first syntactic analysis process and being intended to be modified during the execution of said first syntactic analysis process, said first syntactic analysis process starting to run as soon as said start symbol is taken into account and being responsible for detecting, for all the successive values of the integer N, whether the sequence of N consecutive symbols extracted from said data stream (1), this extraction starting at said start symbol, is equal to one of the patterns of said dictionary of patterns; said position of said first syntactic analysis process being, when said first syntactic analysis process begins to run, equal to the address of the root (12) of said tree (6); the execution of said first syntactic analysis process including, when each of the symbols constituting said data flow (1) is taken into account, the following steps: - the step of identifying whether there is, in said tree structure (6), one or more branch(es) starting from the node whose address is equal to the position of said first syntactic analysis process and whose label is equal to said symbol taken into account, this or these branch(es) being hereinafter called the active branch(es) for said first syntactic analysis process, - the step - if there is no active branch for said first syntactic analysis process, of stopping the execution of said first syntactic analysis process, - if there is only one active branch for said first syntactic analysis process, of giving the position of said first syntactic analysis process the value of the address of the arrival node (10) of said active branch, - if there are several active branches for said first syntactic analysis process, of duplicating said first syntactic analysis process as many times as necessary so as to associate with each of said active branches a copy of said first syntactic analysis process, the position of said copy then being equal to the address of the arrival node (10) of the active branch with which it is associated, - the step, for said first syntactic analysis process if its execution was not stopped during the previous step or, in the case of duplication in the previous step, for each of said copies of said first syntactic analysis process, of reporting whether the prefix of the node whose address is equal to the position of said first syntactic analysis process is equal to a pattern contained in said dictionary of patterns, this pattern then being hereinafter called a pattern detected by said first syntactic analysis process; said first phase further comprising, after each of the symbols constituting said data flow (1) has been taken into account, a first complementary step consisting in providing a list of all the patterns detected, when said symbol is taken into account, by at least one of said running first syntactic analysis processes; said list being hereinafter called the list of detected patterns; so that said list of detected patterns includes all the patterns present in said stream to be analyzed that end at said symbol taken into account and are equal to a pattern of said dictionary of patterns.

2. Method according to claim 1; said dictionary of patterns associating with all or part of said patterns one or more numerical values hereinafter called pattern coefficients and/or information hereinafter called pattern information; said first phase being such that, if said pattern dictionary associates pattern coefficients and/or pattern information with said pattern, said pattern coefficients and/or said pattern information are supplied by said first phase at the same time as the presence of said pattern is detected.

3. Method according to claim 1; said method further using one or more variables hereinafter called evaluation variables; said method further comprising a preliminary step of initializing said evaluation variables to values fixed in advance and hereinafter called initial values of evaluation variables; said method further comprising a second phase, implementing an analysis algorithm, said analysis algorithm taking into account one or more arguments and having the effect, according to the value of said arguments, of modifying the value of all or part of said evaluation variables; said method being such that said first phase and said second phase are executed on the fly, in a nested fashion, said analysis algorithm being executed as soon as said first phase has provided a detected pattern, said detected pattern then being one of the arguments of said analysis algorithm.

4. Method according to claim 3; said dictionary of patterns associating with all or part of said patterns one or more numerical values hereinafter called pattern coefficients and/or information hereinafter called pattern information; said first phase being such that, if said pattern dictionary associates pattern coefficients and/or pattern information with said pattern, said pattern coefficients and/or said pattern information are supplied by said first phase at the same time as the presence of said pattern is detected, said pattern coefficients and/or said pattern information then being arguments of said analysis algorithm.

5. Method according to claim 3; said analysis algorithm being further executed as soon as said first phase has completed taking into account a symbol constituting said data flow (1), said symbol taken into account then being an argument of said analysis algorithm.

6. Method according to any one of claims 3 to 5; said first preliminary step further comprising a sub-step consisting in marking all or part of said patterns as priority patterns, and in associating with each of said priority patterns at least one action to be executed; said first complementary step further comprising the sub-step of launching the execution of the action(s) associated with all of the priority patterns contained in said list of detected patterns; the implementation of said analysis algorithm being inhibited when one of said detected patterns is a priority pattern with which an action having the effect of inhibiting said analysis algorithm is associated.

7. Method according to any one of claims 3 to 6; said method comprising the prior step of modifying said dictionary of patterns, the result of this modification being hereinafter called the modified pattern dictionary, and of modifying said tree structure by adding new branches and new nodes and / or by removing branches and nodes, the result of this modification being hereinafter called the modified tree structure; said modified tree structure being such that any terminal node of said modified tree structure has a prefix equal to a pattern contained in said modified pattern dictionary; said modified tree structure further being such that, for any pattern contained in said modified pattern dictionary, there is one and only one node of said modified tree structure whose prefix is equal to this pattern.
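The invariants required of the modified tree structure (each terminal node's prefix is a pattern, exactly one node per pattern, no node of zero richness) can be kept by pruning on removal. The sketch below uses a hypothetical `Node` class. It shows one way of maintaining these invariants, not necessarily the one used in the application.

```python
from typing import Dict

class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}   # branch label -> arrival node
        self.is_pattern = False                 # prefix of this node is a dictionary pattern

def add_pattern(root: Node, pattern: str) -> None:
    node = root
    for symbol in pattern:                      # create missing branches and nodes
        node = node.children.setdefault(symbol, Node())
    node.is_pattern = True

def remove_pattern(root: Node, pattern: str) -> bool:
    """Remove `pattern`, pruning nodes whose richness drops to zero."""
    path = [root]
    for symbol in pattern:
        nxt = path[-1].children.get(symbol)
        if nxt is None:
            return False
        path.append(nxt)
    if not path[-1].is_pattern:
        return False
    path[-1].is_pattern = False
    # walk back towards the root, deleting branches that no longer lead to any pattern
    for depth in range(len(pattern), 0, -1):
        node = path[depth]
        if node.is_pattern or node.children:
            break
        del path[depth - 1].children[pattern[depth - 1]]
    return True

def leaves_are_patterns(node: Node) -> bool:
    # every terminal node must carry a pattern
    if not node.children:
        return node.is_pattern
    return all(leaves_are_patterns(c) for c in node.children.values())
```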

8. The method of claim 4; said analysis algorithm of said second phase consisting in determining the probability or probabilities that all or part of the data constituting said data flow belongs to one or more particular themes; said probabilities being hereinafter called membership probabilities; said membership probabilities being one of said evaluation variables; said membership probabilities being calculated using a function, hereinafter called the evaluation function; said evaluation function taking into account the patterns detected during said first phase; said analysis algorithm of said second phase modifying the values of all or part of said evaluation variables, in particular the value of said membership probabilities, by applying, for each symbol taken into account, the following steps: - the step, if, at the end of said first phase, a pattern belonging to said dictionary of patterns and ending with said symbol taken into account has been detected in said data flow, of modifying all or part of said evaluation variables by implementing an algorithm hereinafter called the re-evaluation algorithm, said re-evaluation algorithm taking into account said pattern having been detected; said method being such that, if said pattern dictionary associates pattern coefficients and / or pattern information with said pattern having been detected, said re-evaluation algorithm further takes into account said pattern coefficients and / or said pattern information; - the step, if, at the end of said first phase, no pattern belonging to said dictionary of patterns and ending with said symbol taken into account has been detected in said data flow, of modifying all or part of said evaluation variables by implementing an algorithm hereinafter called the relaxation algorithm; - the step, after implementation of the re-evaluation algorithm or of the relaxation algorithm, as the case may be, of applying an algorithm, hereinafter called the final probability calculation algorithm, which takes as arguments the values of all or part of said evaluation variables and provides as a result provisional values of the probability or probabilities that the part of said data flow ending at said symbol taken into account belongs to said particular theme or themes, the values then taken by said probability or probabilities being hereinafter called local probabilities; said evaluation function consisting in calculating said membership probabilities, for each particular theme, taking into account the successive values of said local probabilities provided by said final probability calculation algorithm, after all the symbols constituting said data flow have been taken into account.
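The per-symbol cycle of claim 8 can be sketched as follows for a single theme. Every concrete choice is an assumption, since the claim leaves these functions open. The choices are: one evaluation variable `score`, re-evaluation that adds the pattern coefficient, relaxation that multiplies by a hypothetical `decay`, a logistic final probability with a hypothetical `bias`, and an evaluation function that keeps the largest local probability.

```python
import math
from typing import Dict

def membership_probability(stream: str, coefficients: Dict[str, float],
                           decay: float = 0.9, bias: float = -2.0) -> float:
    score = 0.0                                   # evaluation variable, initial value 0
    longest = max((len(p) for p in coefficients), default=1)
    window = ""
    local = []                                    # successive local probabilities
    for symbol in stream:
        window = (window + symbol)[-longest:]
        hits = [p for p in coefficients if window.endswith(p)]
        if hits:                                  # re-evaluation algorithm
            for p in hits:
                score += coefficients[p]
        else:                                     # relaxation algorithm
            score *= decay
        # final probability calculation algorithm: logistic squashing
        local.append(1.0 / (1.0 + math.exp(-(score + bias))))
    # evaluation function (assumed): the largest local probability observed
    return max(local, default=0.0)
```

Relaxation lets evidence fade with distance, so two trigger words close together push the local probability higher than the same words far apart.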

9. The method of claim 8; said re-evaluation algorithm being reduced to a first family of functions taking as variables all or part of said evaluation variables; said first family of functions being such that, if said dictionary of patterns associates pattern coefficients and / or pattern information with said pattern which has been detected, said first family of functions also takes as variables said pattern coefficients and / or said pattern information; said first family of functions providing as a result the new values of all or part of said evaluation variables; the function or functions constituting said first family of functions being functions dependent on a family of parameters hereinafter called parameters of the re-evaluation algorithm; said relaxation algorithm being reduced to a second family of functions taking as variables all or part of said evaluation variables and supplying as a result the new values of all or part of said evaluation variables; the function or functions constituting said second family of functions being functions dependent on a family of parameters hereinafter called parameters of the relaxation algorithm; said final probability calculation algorithm being reduced to a third family of functions taking as variables said evaluation variables and providing as a result said local probabilities; the function or functions constituting said third family of functions being functions dependent on a family of parameters hereinafter called parameters of the final probability calculation algorithm; said pattern coefficients, said initial values of the evaluation variables, said parameters of the re-evaluation algorithm, said parameters of the relaxation algorithm and said parameters of the final probability calculation algorithm being hereinafter called the calibration parameters.

10. The method of claim 9; said method further comprising the additional preliminary step of associating with each symbol of said alphabet of symbols a weighting coefficient hereinafter called the symbol weighting coefficient; said second family of functions and / or said third family of functions also taking as an additional variable said symbol weighting coefficient associated with said symbol taken into account; said calibration parameters further comprising said symbol weighting coefficients.
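Claims 9 and 10 group every tunable quantity into the calibration parameters. The sketch below gathers them in a hypothetical `Calibration` dataclass and gives one member of each function family. The symbol weighting coefficient enters the relaxation family as an exponent on the decay. That placement is an assumption; the claim only says the second and/or third family take the weight as a variable.

```python
import math
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Calibration:
    """Hypothetical container for the calibration parameters of claims 9 and 10."""
    pattern_coefficients: Dict[str, float]
    initial_score: float = 0.0      # initial value of the single evaluation variable
    gain: float = 1.0               # parameter of the re-evaluation family
    decay: float = 0.9              # parameter of the relaxation family
    bias: float = -2.0              # parameter of the final-probability family
    symbol_weights: Dict[str, float] = field(default_factory=dict)  # claim 10

def reevaluate(score: float, pattern: str, cal: Calibration) -> float:
    # first family: new score from the old score and the pattern coefficient
    return score + cal.gain * cal.pattern_coefficients[pattern]

def relax(score: float, symbol: str, cal: Calibration) -> float:
    # second family: the symbol weight (default 1) scales how much the score decays;
    # a weight of 0 (e.g. for whitespace) leaves the score unchanged
    return score * cal.decay ** cal.symbol_weights.get(symbol, 1.0)

def final_probability(score: float, cal: Calibration) -> float:
    # third family: logistic squashing of the score into a local probability
    return 1.0 / (1.0 + math.exp(-(score + cal.bias)))
```

Keeping all parameters in one object makes the recalibration of claim 11 a matter of editing that object.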

11. Method according to claim 9 or 10; said method comprising an additional step, hereinafter called recalibration, consisting, after the analysis of said data flow and as a function of the results provided by said evaluation function, in executing one or more of the following sub-steps: - the sub-step of adding one or more patterns to said dictionary of patterns, - the sub-step of removing one or more patterns from said dictionary of patterns, - the sub-step of varying all or part of said calibration parameters.

12. The method of claim 11; said method comprising a prior phase, hereinafter called the learning phase, consisting in repeating the following steps: - the step of applying said first phase and said second phase to data flows whose probabilities of belonging to particular themes are known in advance, - the step of executing said recalibration, so that said membership probabilities determined during said second phase of said method are as close as possible to values fixed in advance.
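The learning phase of claim 12 alternates analysis of labelled flows with recalibration. The sketch below swaps in logistic regression trained by per-sample gradient steps as the recalibration rule. The application does not say which update is used, and `rate` and `rounds` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def count_patterns(stream: str, patterns) -> Dict[str, int]:
    # number of (possibly overlapping) occurrences of each pattern
    return {p: sum(1 for i in range(len(stream)) if stream.startswith(p, i))
            for p in patterns}

def probability(stream: str, coef: Dict[str, float], bias: float) -> float:
    counts = count_patterns(stream, coef)
    z = bias + sum(coef[p] * n for p, n in counts.items())
    return 1.0 / (1.0 + math.exp(-z))

def learn(samples: List[Tuple[str, float]], coef: Dict[str, float],
          bias: float = 0.0, rate: float = 0.5, rounds: int = 200):
    """Repeat analysis + recalibration so outputs approach the known targets."""
    coef = dict(coef)
    for _ in range(rounds):
        for stream, target in samples:
            error = target - probability(stream, coef, bias)   # analysis
            for p, n in count_patterns(stream, coef).items():  # recalibration:
                coef[p] += rate * error * n                    # vary coefficients
            bias += rate * error
    return coef, bias
```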

13. The method of claim 11; said pattern dictionary and said calibration parameters being modifiable using information from an external source.

14. The method of claim 8; the patterns contained in said pattern dictionary being classified into three categories of patterns, hereinafter called the category of operational patterns, the category of candidate patterns and the category of learning patterns; said pattern dictionary being such that it associates pattern coefficients at least with each of said candidate patterns and with each of said learning patterns; said method being such that the possible presence, in said data flow, of patterns belonging to said category of candidate patterns is detected during said first phase but is not taken into account during said second phase; said method further comprising a self-learning phase; said self-learning phase taking place in parallel with said first phase and said second phase; said self-learning phase itself comprising two parts, hereinafter called selection of apprentices and training of apprentices; said selection of apprentices consisting, when the presence of a pattern belonging to the category of candidate patterns has been detected during said first phase, in modifying all or part of the pattern coefficients of the pattern thus detected, this modification taking into account all or part of said evaluation variables; said selection of apprentices further consisting in deciding, according to the values taken by said pattern coefficients of said pattern thus detected, whether or not to move said pattern thus detected into the category of learning patterns; said training of apprentices consisting, when the presence of a pattern belonging to the category of learning patterns is detected during said first phase, in giving new values to the pattern coefficients of the pattern thus detected, said new values being determined from said evaluation variables and from the prior values of said pattern coefficients of said pattern thus detected; said training of apprentices being able, depending on the evolution of the values of said pattern coefficients of said pattern thus detected, to modify the category of said pattern thus detected.
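A minimal sketch of the self-learning phase of claim 14 follows. Both apprentice steps blend a single pattern coefficient towards the current local probability (taken as the evaluation variable). Candidates are promoted to learning patterns above a threshold. Learning patterns are demoted when their coefficient falls too low and become operational after enough detections. All thresholds and the blending rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    category: str            # "operational", "candidate" or "learning"
    coefficient: float = 0.0
    hits: int = 0

PROMOTE_TO_LEARNING = 0.7    # hypothetical thresholds
PROMOTE_TO_OPERATIONAL = 5
DEMOTE_BELOW = 0.3

def on_detection(entry: Entry, current_probability: float, alpha: float = 0.5) -> None:
    """Self-learning update when a candidate or learning pattern is detected."""
    if entry.category == "operational":
        return                                   # handled by the normal second phase
    entry.hits += 1
    # blend the pattern coefficient towards the current evaluation
    entry.coefficient = (1 - alpha) * entry.coefficient + alpha * current_probability
    if entry.category == "candidate":            # selection of apprentices
        if entry.coefficient >= PROMOTE_TO_LEARNING:
            entry.category, entry.hits = "learning", 0
    else:                                        # training of apprentices
        if entry.coefficient < DEMOTE_BELOW:
            entry.category = "candidate"
        elif entry.hits >= PROMOTE_TO_OPERATIONAL:
            entry.category = "operational"
```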

15. Method according to any one of claims 3 to 14; said method implementing a family of grammars, each of said grammars comprising one or more rules capable of being verified or not by a series of patterns; a series of patterns verifying all the rules of a grammar belonging to said family of grammars being hereinafter called grammatically correct for said grammar; said analysis algorithm of said second phase having the objective of carrying out a grammatical analysis of the sequences of patterns formed by all or part of the patterns detected by said first syntactic analysis processes implemented during said first phase, such a sequence of patterns being hereinafter called a detected sequence; said grammatical analysis verifying, for each of said detected sequences and for each of the grammars belonging to said family of grammars, whether said detected sequence is grammatically correct for said grammar.

16. The method of claim 15; said family of grammars being represented by an automaton (34); said automaton (34) consisting of states (35) and transitions (36); each state of said automaton (34) being associated with a number called the address of said state, so that the addresses of two different states (35) are different numbers; at least one of the states (35) of said automaton (34) being called a final state (38) of said automaton (34); each grammar taken from said family of grammars being associated with a state of said automaton (34), called the initial state (48) of said automaton for said grammar; each transition (39) of said automaton (34) being associated with two states (35) of said automaton (34), hereinafter called the start state (40) of the transition (39) and the end state (41) of the transition (39); each of said final states (38) of said automaton (34) being associated with a grammar taken from said family of grammars; each transition (39) of said automaton (34) further being assigned a set composed of one or more symbol sequences, hereinafter called the total label of said transition (39), one of said symbol sequences of said total label being a pattern belonging to said dictionary of patterns and being hereinafter called the lexical label of said transition (39); said method comprising a second preliminary step consisting in building said automaton (34); said second phase implementing second grammatical analysis processes operating in parallel; each of said second grammatical analysis processes being associated with: - a pattern detected during said first phase, said detected pattern being hereinafter called the start pattern of said second grammatical analysis process and never being modified during the execution of said second grammatical analysis process, - a grammar belonging to said family of grammars, - a number equal to the address of a state of said automaton (34), said number being part of said evaluation variables, being hereinafter called the position of said second grammatical analysis process and being intended to be modified during the execution of said second grammatical analysis process; said second grammatical analysis process starting to run upon detection of said start pattern and being responsible for analyzing whether a series of detected patterns beginning with said start pattern is grammatically correct for said grammar associated with said second grammatical analysis process; said position of said second grammatical analysis process being, at the time when said second grammatical analysis process begins to run, equal to the address of the initial state (48) of said automaton (34) for the grammar associated with said second grammatical analysis process; the execution of said second grammatical analysis process including, when taking into account each of the detected patterns: - a) a filtering step consisting in deciding whether one or more of the transitions (36) of said automaton (34) will be used while taking into account said detected pattern, said filtering step successively considering all the transitions (36) whose start state (40) is the state whose address is equal to the position of said second grammatical analysis process, and applying to each of the transitions (36) thus considered a decision algorithm, said decision algorithm having the object of deciding whether said transition (39) will be used while taking into account said detected pattern, such a transition (39) being hereinafter called an active transition for said detected pattern, said decision algorithm taking as arguments all or part of the total label of said transition (39) as well as said detected pattern taken into account, - b) an execution step consisting, - if there is no active transition for said second grammatical analysis process, in implementing a stop algorithm intended to decide whether said second grammatical analysis process should be stopped and, if so, in ending said second grammatical analysis process, - if there is only one active transition for said second grammatical analysis process, in giving as value to the position of said second grammatical analysis process the address of the end state (41) of said active transition (39), - if there are several active transitions for said second grammatical analysis process, in making as many duplicates of said second grammatical analysis process as necessary, so as to associate with each of said active transitions a copy of said second grammatical analysis process resulting from said duplication(s), the position of said copy then being equal to the address of the end state (41) of the transition (39) with which said copy is associated, - c) a signaling step consisting, for said second grammatical analysis process or, in the case where there was duplication during the previous step, for each of said copies of said second grammatical analysis process resulting from said duplication(s), in reporting whether the state whose address is equal to the position of said second grammatical analysis process is a final state (38) of said automaton (34); said second phase further comprising, after taking into account each of said detected patterns, a second complementary step consisting in providing a list of all final states (38) whose address is equal to the position of at least one of said second grammatical analysis processes running; said list being hereinafter called the list of detected final states; so that each final state (38) of said list of detected final states corresponds to a detected sequence, grammatically correct for the grammar associated with this final state (38), and ending at said detected pattern taken into account.
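The filtering, execution and signalling steps of claim 16 can be sketched as below. A process is a (grammar, position) pair, and each detected pattern starts one new process per grammar. The decision algorithm is the hypothetical rule "the pattern belongs to the total label", and the stop algorithm always stops. Identical copies are merged, which is an optimisation not present in the claim.

```python
from typing import Dict, List, Set, Tuple

Transition = Tuple[int, int, Set[str]]   # (start state, end state, total label)

def run_grammars(detected: List[str],
                 transitions: List[Transition],
                 initial: Dict[str, int],   # grammar -> address of its initial state
                 final: Dict[int, str]      # final state address -> its grammar
                 ) -> List[Tuple[int, int]]:
    """Return (index of the last detected pattern, final state) for each correct sequence."""
    processes: List[Tuple[str, int]] = []      # running processes: (grammar, position)
    detected_final = []
    for i, pattern in enumerate(detected):
        # each detected pattern can be the start pattern of one new process per grammar
        processes += [(grammar, state) for grammar, state in initial.items()]
        survivors = []
        for grammar, position in processes:
            # a) filtering: decision algorithm = "pattern is in the total label"
            active = [end for start, end, label in transitions
                      if start == position and pattern in label]
            # b) execution: none -> stop; one -> move; several -> one copy per transition
            survivors += [(grammar, end) for end in active]
        processes = list(dict.fromkeys(survivors))   # merge identical copies
        # c) signalling: final states reached after this pattern
        detected_final += [(i, pos) for grammar, pos in processes
                           if final.get(pos) == grammar]
    return detected_final
```

For example, a hypothetical "login" grammar accepting the sequence USER then PASS reports its final state after the second pattern.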

17. The method of claim 16; each of said second grammatical analysis processes being associated with a series of symbols called the stack of said second grammatical analysis process; said stack being part of said evaluation variables, said stack having an initial value defined in advance at the time when said second grammatical analysis process begins to execute, said stack being intended to be modified during the execution of said second grammatical analysis process; said decision algorithm, implemented during said filtering step and applied to a transition (39), further taking as argument the value of said stack and, when said transition (39) is an active transition, further determining a series of symbols called the new stack for said active transition; said execution step further performing: - if there is a single active transition (39) for said second grammatical analysis process, the operation of replacing said stack with said new stack for said active transition (39), - if there has been duplication, the operation, for each of said second grammatical analysis processes resulting from said duplication(s), of replacing the stack of that process with said new stack for the active transition (39) with which that process is associated; said stop algorithm, implemented in the absence of active transitions (36), further taking said stack as argument.

18. Method according to claim 16 or 17; each of said second grammatical analysis processes being associated with a variable called the result variable, said result variable being part of said evaluation variables; said result variable being, at the start of said second grammatical analysis process, set to an initial value defined in advance; the execution of said second grammatical analysis process including, when taking into account each of the detected patterns, an additional calculation step intended to modify the value of said result variable; said calculation step taking as arguments - said detected pattern, - the value of said result variable before modification, - the total label of said active transition or, if said second grammatical analysis process results from a duplication, the total label of the active transition with which it is associated, and providing as a result a value which is then assigned as the new value of said result variable.
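Claims 17 and 18 turn each process into a pushdown process carrying a stack and a result variable. The sketch below uses a hypothetical grammar of properly nested `<b>`/`<i>` tags. Besides its lexical label, each transition carries a hypothetical rule tag standing in for the rest of the total label. The decision algorithm reads the stack and returns the new stack, or `None` when the transition is not active. The result variable records the maximum nesting depth.

```python
from typing import List, Optional, Tuple

Stack = Tuple[str, ...]
# (start state, end state, lexical label, hypothetical stack rule)
TRANSITIONS = [
    (0, 1, "<b>", "push"), (0, 1, "<i>", "push"),
    (1, 1, "<b>", "push"), (1, 1, "<i>", "push"),
    (1, 1, "</b>", "pop_more"), (1, 1, "</i>", "pop_more"),
    (1, 2, "</b>", "pop_last"), (1, 2, "</i>", "pop_last"),
]
FINAL = {2}

def decide(label: str, rule: str, pattern: str, stack: Stack) -> Optional[Stack]:
    """Decision algorithm: the new stack if the transition is active, else None."""
    if pattern != label:
        return None
    if rule == "push":
        return stack + (label[1:-1],)            # remember the opened tag name
    want = label[2:-1]                            # tag name closed by this pattern
    if not stack or stack[-1] != want:
        return None
    if rule == "pop_last" and len(stack) == 1:
        return ()
    if rule == "pop_more" and len(stack) > 1:
        return stack[:-1]
    return None

def step(procs: List[Tuple[int, Stack, int]], pattern: str):
    """Filter, then move/duplicate/stop each (position, stack, result) process."""
    out = []
    for position, stack, depth in procs:
        for start, end, label, rule in TRANSITIONS:
            if start != position:
                continue
            new_stack = decide(label, rule, pattern, stack)
            if new_stack is not None:
                # result variable (claim 18): maximum nesting depth so far
                out.append((end, new_stack, max(depth, len(new_stack))))
    return out                                    # no active transition -> process stops

def accepts(patterns: List[str]) -> List[int]:
    procs = [(0, (), 0)]                          # one process, empty initial stack
    for p in patterns:
        procs = step(procs, p)
    return [depth for pos, _, depth in procs if pos in FINAL]
```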

19. Method according to any one of claims 3 to 18; said method applying to a data flow passing over a communications network, said method further comprising a final phase consisting, according to the values taken by said evaluation variables, in letting said data flow pass without any modification or in executing one or more of the following actions: - modifying the content of said data flow, - modifying the destination address of said data flow, - sending information to a previously specified address, - blocking the passage of said data flow.

20. The method of claim 19; said method further comprising an initial phase of temporary storage of all or part of said data flow, said data flow or the stored part of said data flow being released from storage and transmitted, with or without modification, at the end of said final phase.
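Claims 19 and 20 can be illustrated by a filter that buffers the flow, evaluates it, and then passes, modifies or blocks it, optionally notifying a pre-specified recipient. The evaluation callback, thresholds and replacement text are hypothetical, and the redirection action is omitted for brevity.

```python
from typing import Callable, Iterable, Optional

def filter_flow(chunks: Iterable[bytes],
                evaluate: Callable[[bytes], float],
                block_above: float = 0.9,
                mask_above: float = 0.5,
                notify: Optional[Callable[[float], None]] = None) -> Optional[bytes]:
    """Buffer the flow (claim 20), then pass, modify or block it (claim 19)."""
    buffered = b"".join(chunks)           # temporary storage of the data flow
    score = evaluate(buffered)            # stands in for the evaluation variables
    if notify is not None and score >= mask_above:
        notify(score)                     # send information to a pre-specified address
    if score >= block_above:
        return None                       # block the passage of the flow
    if score >= mask_above:
        return b"[filtered]"              # modify the content of the flow
    return buffered                       # release the flow unchanged
```

Buffering allows a pattern split across two network chunks (for instance `b"a vir"` followed by `b"us"`) to be evaluated as one piece.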

21. System for the analysis of a data flow (1); said analysis being carried out on the fly; said data flow (1) presenting itself as a series of symbols taken from a set of symbols hereinafter called the alphabet of symbols; said system implementing a dictionary hereinafter called a pattern dictionary; said dictionary of patterns being composed of certain particular sequences of symbols hereinafter called patterns; said dictionary of patterns being represented in the form of a tree structure (6); said tree (6) consisting of branches (7) and nodes (8); each branch (7) of said tree (6) starting from a node of said tree (6), hereinafter called the start node (9) of said branch, and arriving at another node of said tree (6), hereinafter called the arrival node (10) of said branch; a node of said tree (6) from which no branch of said tree (6) starts being hereinafter called a terminal node (11); said tree structure (6) being such that there is one and only one node at which no branch arrives, this node being hereinafter called the root (12); each branch of said tree structure (6) being assigned a symbol of said alphabet of symbols, this symbol being hereinafter called the label of said branch (13); each node of said tree (6) being associated with a series of symbols hereinafter called the prefix of said node; said prefix of said node being composed of the labels of the branches (7) connecting said root (12) to said node, these labels being taken in the order in which they are encountered; the prefix of said root (12) being the empty sequence; each node also being associated with a family of patterns formed by the patterns contained in said dictionary of patterns that begin with the prefix of said node, the number of patterns contained in said family associated with said node being hereinafter called the richness of said node; said tree structure (6) being such that no node of said tree structure (6) has a richness equal to zero; said tree structure (6) being further such that any terminal node (11) of said tree structure (6) has a prefix equal to a pattern contained in said dictionary of patterns; said tree structure (6) being further such that, for any pattern contained in said dictionary of patterns, there is one and only one node of the tree structure (6) whose prefix is equal to this pattern; each node of said tree (6) being associated with a number called the address of said node, so that the addresses of two different nodes are different numbers; said system comprising first storage means making it possible to store said tree structure; said system comprising first processing means making it possible to detect whether or not patterns belonging to said dictionary of patterns are present within said data flow (1); said first processing means operating on the fly and successively taking into account the symbols constituting said data flow (1); said first processing means being such that, if a pattern belonging to said dictionary of patterns is present in said data flow (1), said first processing means detect its presence as soon as the last symbol constituting said pattern is taken into account; said first processing means making it possible to implement first syntactic analysis processes, hereinafter called first parsing processes, operating in parallel; each of said first parsing processes being associated with: - on the one hand, one of the symbols constituting said data flow (1), said symbol being hereinafter called the start symbol of said first parsing process and never being modified during the execution of said first parsing process, - on the other hand, storage means intended to contain a number equal to the address of a node of said tree structure (6), said number being hereinafter called the position of said first parsing process and being intended to be modified during the execution of said first parsing process; said first parsing process starting to execute as soon as said start symbol is taken into account and being responsible for detecting, for all successive values of the integer N, whether the sequence of N consecutive symbols extracted from said data flow (1), this extraction starting at said start symbol, is equal to one of the patterns of said dictionary of patterns; said position of said first parsing process being, when said first parsing process begins to run, equal to the address of the root (12) of said tree (6); said first processing means making it possible, when taking into account each of the symbols constituting said data flow (1), to carry out the following operations for each first parsing process: - the operation of locating whether there are, in said tree structure (6), one or more branches starting from the node whose address is equal to the position of said first parsing process and whose label is equal to said symbol taken into account, such branches being hereinafter called the active branches for said first parsing process, - the operation consisting, - if there is no active branch for said first parsing process, in stopping the execution of said first parsing process, - if there is only one active branch for said first parsing process, in giving as value to the position of said first parsing process the address of the arrival node (10) of said active branch, - if there are several active branches for said first parsing process, in duplicating said first parsing process as many times as necessary, so as to associate with each of said active branches a copy of said first parsing process, the position of said copy then being equal to the address of the arrival node (10) of the active branch with which it is associated, - the operation, for said first parsing process if its execution was not stopped during the previous operation or, in the case where there was duplication during the previous operation, for each of said copies of said first parsing process, of indicating whether the prefix of the node whose address is equal to the position of said first parsing process is equal to a pattern contained in said pattern dictionary, this pattern then being hereinafter called a pattern detected by said first parsing process; said first processing means further making it possible, after taking into account each of the symbols constituting said data flow (1), to perform a first complementary operation consisting in providing a list of all the patterns detected, when taking into account said symbol, by at least one of said first parsing processes running; said list being hereinafter called the list of detected patterns; so that said list of detected patterns includes all the patterns present in said data flow to be analyzed that end at said symbol taken into account and are equal to a pattern of said dictionary of patterns.
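The first processing means of claim 21 can be sketched as follows. In this hypothetical `Tree`, node addresses are list indices and the branches leaving a node are stored in a dictionary keyed by label. At most one branch can therefore be active, so the duplication case of the claim never arises in this representation. Each entry of `positions` plays the role of one running first parsing process.

```python
from typing import Dict, List

class Tree:
    """Pattern dictionary stored as a tree; node addresses are list indices."""
    def __init__(self, patterns: List[str]) -> None:
        self.children: List[Dict[str, int]] = [{}]   # address 0 is the root
        self.pattern_at: Dict[int, str] = {}          # node address -> pattern
        for p in patterns:
            node = 0
            for symbol in p:
                if symbol not in self.children[node]:
                    self.children.append({})
                    self.children[node][symbol] = len(self.children) - 1
                node = self.children[node][symbol]
            self.pattern_at[node] = p

def scan(tree: Tree, stream: str) -> List[List[str]]:
    """For each symbol, the list of detected patterns ending at that symbol."""
    positions: List[int] = []          # positions of the running parsing processes
    result = []
    for symbol in stream:
        positions.append(0)            # a new process starts here, at the root
        # follow the active branch; a process with no active branch stops
        positions = [tree.children[pos][symbol] for pos in positions
                     if symbol in tree.children[pos]]
        # first complementary operation: list of detected patterns
        result.append([tree.pattern_at[pos] for pos in positions
                       if pos in tree.pattern_at])
    return result

t = Tree(["he", "she", "his", "hers"])
print(scan(t, "ushers"))   # [[], [], [], ['she', 'he'], [], ['hers']]
```

Each pattern is reported as soon as its last symbol is taken into account, as the claim requires, and overlapping matches ("she" and "he") are both detected.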

22. The system of claim 21; said dictionary of patterns associating with all or part of said patterns one or more numerical values hereinafter called pattern coefficients and / or information hereinafter called pattern information; said system comprising means for storing said pattern coefficients and / or said pattern information; said first processing means being such that, if said pattern dictionary associates pattern coefficients and / or pattern information with said pattern, said pattern coefficients and / or said pattern information are supplied by said first processing means at the same time as the presence of said pattern is detected.

23. The system of claim 21; said system comprising storage means making it possible to store one or more variables hereinafter called evaluation variables; said system further comprising processing means making it possible to initialize said evaluation variables to values fixed in advance and hereinafter called initial values of evaluation variables; said system further comprising second processing means making it possible to implement an analysis algorithm, said analysis algorithm taking into account one or more arguments and having the effect, according to the value of said arguments, of modifying the value of all or part of said evaluation variables; said system being such that said first processing means and said second processing means operate on the fly, in a nested manner, said analysis algorithm being implemented as soon as said first processing means have provided a detected pattern, said detected pattern then being one of the arguments of said analysis algorithm.

24. The system of claim 23; said dictionary of patterns associating with all or part of said patterns one or more numerical values hereinafter called pattern coefficients and / or information hereinafter called pattern information; said system comprising means for storing said pattern coefficients and / or said pattern information; said first processing means being such that, if said pattern dictionary associates pattern coefficients and / or pattern information with said pattern, said pattern coefficients and / or said pattern information are supplied by said first processing means at the same time as the presence of said pattern is detected, said pattern coefficients and / or said pattern information then being arguments of said analysis algorithm.

25. The system of claim 23; said analysis algorithm being further implemented as soon as said first processing means have completed taking into account a symbol constituting said data flow (1), said symbol taken into account then being an argument of said analysis algorithm.

26. System according to any one of claims 23 to 25; said first processing means further comprising a functionality making it possible to mark all or part of said patterns as priority patterns and to associate with each of said priority patterns at least one action to be executed; said system further comprising processing means making it possible to execute said actions associated with said priority patterns; said first complementary operation further consisting in launching the execution of the action(s) associated with all of the priority patterns contained in said list of detected patterns; said system further comprising processing means making it possible to inhibit the implementation of said analysis algorithm when one of said detected patterns is a priority pattern associated with an action having the effect of inhibiting said analysis algorithm.

27. System according to any one of claims 23 to 26; said system further comprising processing means making it possible to carry out the prior operation of modifying said dictionary of patterns, the result of this modification being hereinafter called the modified pattern dictionary, and of modifying said tree structure by adding new branches and new nodes and / or by deleting branches and nodes, the result of this modification being hereinafter called the modified tree structure; said modified tree structure being such that any terminal node of said modified tree structure has a prefix equal to a pattern contained in said modified pattern dictionary; said modified tree structure further being such that, for any pattern contained in said modified pattern dictionary, there is one and only one node of said modified tree structure whose prefix is equal to this pattern.

28. The system of claim 24; said analysis algorithm consisting in determining the probability or probabilities that all or part of the data constituting said data flow belongs to one or more particular themes; said probabilities being hereinafter called membership probabilities; said membership probabilities forming part of said evaluation variables; said membership probabilities being calculated using a function hereinafter called the evaluation function; said evaluation function taking into account the patterns detected by said first processing means; said system comprising processing means making it possible to modify the values of all or part of said evaluation variables, in particular the values of said membership probabilities, by applying the following operations for each symbol taken into account: - the operation, if a pattern belonging to said pattern dictionary and ending with said symbol taken into account has been detected in said data flow by said first processing means, of modifying all or part of said evaluation variables by implementing an algorithm hereinafter called the re-evaluation algorithm, said re-evaluation algorithm taking into account said detected pattern; said system being such that, if said pattern dictionary associates pattern coefficients and/or pattern information with said detected pattern, said re-evaluation algorithm also takes into account said pattern coefficients and/or said pattern information; - the operation, if no pattern belonging to said pattern dictionary and ending with said symbol taken into account has been detected in said data flow by said first processing means, of modifying all or part of said evaluation variables by implementing an algorithm hereinafter called the relaxation algorithm; - the operation, after implementation of the re-evaluation algorithm or of the relaxation algorithm, as the case may be, of applying an algorithm hereinafter called the final probability calculation algorithm, which takes as arguments the values of all or part of said evaluation variables and provides as a result provisional values of the probability or probabilities that the part of said data flow ending at said symbol taken into account belongs to said particular theme or themes, the values then taken by said probability or probabilities being hereinafter called local probabilities; said system comprising processing means making it possible to calculate, using the evaluation function, said membership probabilities for each particular theme, taking into account the successive values of said local probabilities provided by said final probability calculation algorithm, after all the symbols constituting said data flow have been taken into account.
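The per-symbol loop of this claim (re-evaluation when a pattern ends at the current symbol, relaxation otherwise, then a final probability calculation) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `analyse` is hypothetical, pattern detection is simulated with a sliding window instead of the tree structure, re-evaluation adds pattern coefficients, relaxation is an exponential decay, the final probability calculation is a logistic function with a hypothetical `bias`, and the evaluation function takes the maximum of the local probabilities.

```python
import math


def analyse(stream, dictionary, themes, decay=0.9, bias=2.0):
    """On-the-fly analysis: a single pass over the symbols of `stream`.

    `dictionary` maps each pattern to {theme: pattern coefficient}.
    Returns the membership probability of each theme, computed here
    (assumption) as the maximum of the successive local probabilities.
    """
    max_len = max(len(p) for p in dictionary)
    scores = {t: 0.0 for t in themes}      # evaluation variables
    membership = {t: 0.0 for t in themes}  # evaluation function (running maximum)
    window = ""
    for symbol in stream:
        window = (window + symbol)[-max_len:]
        # Stand-in for the first processing means: patterns ending at this symbol.
        detected = [p for p in dictionary if window.endswith(p)]
        if detected:
            # Re-evaluation algorithm: add the coefficients of the detected patterns.
            for p in detected:
                for t, coeff in dictionary[p].items():
                    scores[t] += coeff
        else:
            # Relaxation algorithm: evaluation variables decay towards zero.
            for t in themes:
                scores[t] *= decay
        for t in themes:
            # Final probability calculation: logistic squashing gives a local probability.
            local = 1.0 / (1.0 + math.exp(-(scores[t] - bias)))
            membership[t] = max(membership[t], local)
    return membership
```

With `{"casino": {"gambling": 3.0}}` as the dictionary, the stream `"play casino now"` pushes the gambling score to 3 when the final `o` of `casino` is read, giving a local probability of about 0.73; a stream containing no pattern stays at about 0.12.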

29. The system of claim 28; said re-evaluation algorithm being reduced to a first family of functions taking as variables all or part of said evaluation variables; said first family of functions being such that, if said pattern dictionary associates pattern coefficients and/or pattern information with said detected pattern, said first family of functions also takes as variables said pattern coefficients and/or said pattern information; said first family of functions providing as a result the new values of all or part of said evaluation variables; the function or functions constituting said first family of functions being functions dependent on a family of parameters hereinafter called parameters of the re-evaluation algorithm; said relaxation algorithm being reduced to a second family of functions taking as variables all or part of said evaluation variables and providing as a result the new values of all or part of said evaluation variables; the function or functions constituting said second family of functions being functions dependent on a family of parameters hereinafter called parameters of the relaxation algorithm; said final probability calculation algorithm being reduced to a third family of functions taking as variables said evaluation variables and providing as a result said local probabilities; the function or functions constituting said third family of functions being functions dependent on a family of parameters hereinafter called parameters of the final probability calculation algorithm; said pattern coefficients, said initial values of the evaluation variables, said parameters of the re-evaluation algorithm, said parameters of the relaxation algorithm and said parameters of the final probability calculation algorithm being hereinafter collectively called calibration parameters; said system comprising storage means making it possible to store the values of said calibration parameters.

30. The system of claim 29; said system further comprising processing means making it possible to carry out an additional operation consisting in associating with each symbol of said alphabet of symbols a weighting coefficient hereinafter called the symbol weighting coefficient; said second family of functions and/or said third family of functions also taking as an additional variable said symbol weighting coefficient associated with said symbol taken into account; said calibration parameters further comprising said symbol weighting coefficients.
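Claims 29 and 30 describe the three algorithms as parameterized function families whose parameters, together with pattern coefficients, initial values and symbol weights, form the calibration parameters. A minimal sketch for a single theme follows; the concrete function shapes (additive gain, weighted exponential decay, logistic) and all names are assumptions chosen for illustration. A symbol weight of 0 makes a symbol neutral for relaxation, so that, for instance, whitespace does not erode the score.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Calibration:
    """Hypothetical calibration parameters for a single theme."""
    initial_score: float = 0.0   # initial value of the evaluation variable
    gain: float = 1.0            # parameter of the re-evaluation algorithm
    decay: float = 0.8           # parameter of the relaxation algorithm
    bias: float = 2.0            # parameters of the final probability calculation
    slope: float = 1.0
    symbol_weights: dict = field(default_factory=dict)  # symbol -> weighting coefficient


def reevaluate(score, pattern_coeff, cal):
    """First family: new evaluation variable from the old one and a pattern coefficient."""
    return score + cal.gain * pattern_coeff


def relax(score, symbol, cal):
    """Second family: decay whose strength depends on the symbol weighting coefficient."""
    weight = cal.symbol_weights.get(symbol, 1.0)  # unlisted symbols weigh 1.0
    return score * cal.decay ** weight


def local_probability(score, cal):
    """Third family: map the evaluation variable to a local probability."""
    return 1.0 / (1.0 + math.exp(-cal.slope * (score - cal.bias)))
```

Keeping every tunable value in one `Calibration` object makes the later recalibration and learning steps a matter of adjusting its fields.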

31. System according to claim 29 or 30; said system further comprising processing means making it possible to carry out an additional operation, hereinafter called recalibration, consisting, after the analysis of said data flow and according to the results provided by said evaluation function, in executing one or more of the following sub-operations: - the sub-operation of adding one or more patterns to said pattern dictionary, - the sub-operation of removing one or more patterns from said pattern dictionary, - the sub-operation of varying all or part of said calibration parameters.

32. The system of claim 31; said system comprising processing means making it possible to carry out a prior operation, hereinafter called learning, consisting in repeatedly executing the following sub-operations: - the sub-operation of operating said first processing means and said second processing means on data flows whose probabilities of belonging to particular themes are known in advance, - the sub-operation of executing said recalibration so that said membership probabilities determined by said second processing means are as close as possible to values fixed in advance.
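The learning loop of claims 31 and 32 (run the analysis on flows whose membership is known, then recalibrate toward the target values) can be sketched as below. Everything here is an assumption for illustration: the analysis is reduced to a sum of coefficients over pattern occurrences followed by a logistic function, and recalibration only varies pattern coefficients with an error-driven update (in effect stochastic gradient descent for logistic regression with a fixed bias). Adding or removing patterns would amount to inserting or deleting keys of `coeffs`.

```python
import math


def probability(text, coeffs, bias=1.0):
    # Simplified stand-in for the analysis: sum the coefficients of every
    # dictionary pattern occurring in the text, then squash to (0, 1).
    score = sum(c * text.count(p) for p, c in coeffs.items())
    return 1.0 / (1.0 + math.exp(-(score - bias)))


def recalibrate(text, target, coeffs, lr=0.5):
    # Vary the calibration parameters (here, pattern coefficients) in
    # proportion to the error between target and computed probability.
    err = target - probability(text, coeffs)
    for p in coeffs:
        n = text.count(p)
        if n:
            coeffs[p] += lr * err * n


def learn(corpus, coeffs, epochs=200):
    # Repeat analysis and recalibration over flows with known membership.
    for _ in range(epochs):
        for text, target in corpus:
            recalibrate(text, target, coeffs)
    return coeffs
```

After training on `("win casino jackpot", 1.0)` and `("meeting agenda", 0.0)` with all coefficients starting at zero, the first text scores above 0.9 and the second below 0.1.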

33. The system of claim 31; said system comprising processing means making it possible to modify said dictionary of patterns and said calibration parameters using information coming from an external source.

34. The system of claim 28; the patterns contained in said pattern dictionary being classified into three categories of patterns hereinafter called the category of operational patterns, the category of candidate patterns and the category of learning patterns; said pattern dictionary associating pattern coefficients at least with each of said candidate patterns and with each of said learning patterns; said system being such that the possible presence, in said data flow, of patterns belonging to said category of candidate patterns is detected by said first processing means but is not taken into account by said second processing means; said system further comprising processing means making it possible to carry out a self-learning operation; said self-learning operation itself comprising two parts hereinafter called selection of apprentices and training of apprentices; said selection of apprentices consisting, when the presence of a pattern belonging to the category of candidate patterns has been detected by said first processing means, in modifying all or part of the pattern coefficients of the pattern thus detected, this modification taking into account all or part of said evaluation variables; said selection of apprentices further consisting in deciding, according to the values taken by said pattern coefficients of said pattern thus detected, whether or not to move said pattern thus detected into the category of learning patterns; said training of apprentices consisting, when the presence of a pattern belonging to the category of learning patterns is detected by said first processing means, in giving new values to the pattern coefficients of the pattern thus detected, said new values being determined from said evaluation variables and from the prior values of said pattern coefficients of said pattern thus detected; said training of apprentices being able, depending on the evolution of the values of said pattern coefficients of said pattern thus detected, to modify the category of said pattern thus detected.
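The self-learning operation amounts to a small state machine per pattern: candidate, then learning ("apprentice"), then operational. The sketch below is one hypothetical policy, not the one defined by the patent. While a pattern is a candidate, its coefficient accumulates the deviation of an evaluation variable (`theme_probability`) from 0.5. Once the accumulated evidence reaches `promote_at`, the pattern becomes a learning pattern whose coefficient is an exponential moving average. After `confirm_after` observations it is either made operational or sent back to the candidate category. All thresholds and names are assumptions.

```python
CANDIDATE, LEARNING, OPERATIONAL = "candidate", "learning", "operational"


class SelfLearner:
    """Hypothetical promotion policy: candidate -> learning -> operational."""

    def __init__(self, promote_at=1.0, alpha=0.2, confirm_after=5, confirm_level=0.7):
        self.promote_at = promote_at        # evidence needed to become an apprentice
        self.alpha = alpha                  # smoothing rate during training
        self.confirm_after = confirm_after  # training observations before a verdict
        self.confirm_level = confirm_level  # coefficient needed to become operational
        self.category, self.coeff, self.seen = {}, {}, {}

    def add_candidate(self, pattern):
        self.category[pattern] = CANDIDATE
        self.coeff[pattern] = 0.0
        self.seen[pattern] = 0

    def on_detected(self, pattern, theme_probability):
        """Called on each detection; theme_probability is an evaluation variable."""
        cat = self.category.get(pattern)
        if cat == CANDIDATE:
            # Selection of apprentices: accumulate evidence above/below 0.5.
            self.coeff[pattern] += theme_probability - 0.5
            if self.coeff[pattern] >= self.promote_at:
                self.category[pattern] = LEARNING
                self.coeff[pattern] = theme_probability
        elif cat == LEARNING:
            # Training of apprentices: new value from the old value and the evaluation variable.
            old = self.coeff[pattern]
            self.coeff[pattern] = (1 - self.alpha) * old + self.alpha * theme_probability
            self.seen[pattern] += 1
            if self.seen[pattern] >= self.confirm_after:
                if self.coeff[pattern] >= self.confirm_level:
                    self.category[pattern] = OPERATIONAL
                else:
                    # Demote to candidate and start collecting evidence again.
                    self.add_candidate(pattern)
        return self.category.get(pattern)
```

A pattern that keeps appearing in strongly on-theme flows (probability 0.9) becomes a learning pattern after three detections and operational after five more; one that appears mostly in off-theme flows stays a candidate.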

35. System according to any one of claims 23 to 34; said system comprising processing means making it possible to implement a family of grammars, each of said grammars comprising one or more rules that a series of patterns may or may not verify; a series of patterns verifying all the rules of a grammar belonging to said family of grammars being hereinafter said to be grammatically correct for said grammar; said analysis algorithm implemented by said second processing means having the objective of performing a grammatical analysis of the sequences of patterns formed of all or part of the patterns detected by said first processing means, such a sequence of patterns being hereinafter called a detected sequence; said grammatical analysis verifying, for each of said detected sequences and for each of the grammars belonging to said family of grammars, whether said detected sequence is grammatically correct for said grammar.
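At its most abstract, claim 35 defines a grammar as a set of rules and calls a detected sequence grammatically correct when it verifies all of them. The sketch below models each rule as a predicate over the sequence of detected patterns. The rule builders `starts_with` and `contains` and the two example grammars in `FAMILY` are hypothetical.

```python
def starts_with(token):
    """Rule: the first detected pattern must be `token`."""
    return lambda seq: len(seq) > 0 and seq[0] == token


def contains(token):
    """Rule: `token` must appear somewhere in the detected sequence."""
    return lambda seq: token in seq


def grammatically_correct(sequence, grammar):
    """A sequence is grammatically correct for a grammar if it verifies every rule."""
    return all(rule(sequence) for rule in grammar)


def matching_grammars(sequence, family):
    """Names of the grammars of the family for which the detected sequence is correct."""
    return [name for name, grammar in family.items() if grammatically_correct(sequence, grammar)]


FAMILY = {
    "http_get": [starts_with("GET"), contains("HTTP/1.1")],
    "smtp_hello": [starts_with("HELO")],
}
```

Checking each grammar independently like this is simple but scales poorly; the automaton representation of the following claim lets all grammars of the family be tracked in a single pass.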

36. The system of claim 35; said family of grammars being represented by an automaton (34); said automaton (34) consisting of states (35) and transitions (36); each state of said automaton (34) being associated with a number called the address of said state, so that the addresses of two different states (35) are different numbers; at least one of the states (35) of said automaton (34) being called a final state (38) of said automaton (34); each grammar taken from said family of grammars being associated with a state of said automaton (34), called the initial state (48) of said automaton for said grammar; each transition (39) of said automaton (34) being associated with two states (35) of said automaton (34), hereinafter called the start state (40) of the transition (39) and the arrival state (41) of the transition (39); each of said final states (38) of said automaton (34) being associated with a grammar taken from said family of grammars; each transition (39) of said automaton (34) also being assigned a set composed of one or more symbol sequences, hereinafter called the total label of said transition (39), one of said symbol sequences of said total label being a pattern belonging to said pattern dictionary and being hereinafter called the lexical label of said transition (39); said system comprising storage means making it possible to store said automaton (34); said system comprising processing means making it possible to carry out a second preliminary operation consisting in building said automaton (34); said system comprising processing means making it possible to implement second grammatical analysis processes operating in parallel; each of said second grammatical analysis processes being associated with: - a pattern detected by said first processing means, said detected pattern being hereinafter called the starting pattern of said second grammatical analysis process and never being modified during the execution of said second grammatical analysis process, - a grammar belonging to said family of grammars, - a number equal to the address of a state of said automaton (34), said number being part of said evaluation variables, said number being hereinafter called the position of said second grammatical analysis process and being intended to be modified during the execution of said second grammatical analysis process; said system comprising processing means making it possible to start the execution of said second grammatical analysis process as soon as said starting pattern is detected; said second grammatical analysis process being responsible for analyzing whether a series of detected patterns beginning with said starting pattern is grammatically correct for said grammar associated with said second grammatical analysis process; said position of said second grammatical analysis process being, at the time when said second grammatical analysis process begins to run, equal to the address of the initial state (48) of said automaton (34) for the grammar associated with said second grammatical analysis process; said processing means making it possible to implement said second grammatical analysis processes allowing the following operations to be carried out when each of the detected patterns is taken into account: - a) a filtering operation consisting in deciding whether one or more of the transitions (36) of said automaton (34) will be used when said detected pattern is taken into account, said filtering operation successively considering all the transitions (36) whose start state (40) is the state whose address is equal to the position of said second grammatical analysis process, and applying to each of the transitions (36) thus considered a decision algorithm, the purpose of said decision algorithm being to decide whether said transition (39) will be used when said detected pattern is taken into account, such a transition (39) being hereinafter called an active transition for said detected pattern, said decision algorithm taking as arguments all or part of the total label of said transition (39) as well as said detected pattern taken into account; - b) a corresponding execution operation consisting: - if there is no active transition for said second grammatical analysis process, in implementing a stop algorithm whose purpose is to decide whether said second grammatical analysis process should be stopped and, if so, in ending said second grammatical analysis process, - if there is only one active transition for said second grammatical analysis process, in giving the position of said second grammatical analysis process the value of the address of the arrival state (41) of said active transition (39), - if there are several active transitions for said second grammatical analysis process, in making as many duplicates of said second grammatical analysis process as necessary, so as to associate with each of said active transitions a copy of said second grammatical analysis process resulting from said duplication(s), the position of said copy then being equal to the address of the arrival state (41) of the transition (39) with which said copy is associated; - c) a signaling operation consisting, for said second grammatical analysis process or, in the case of duplication during the previous step, for each of said copies of said second grammatical analysis process resulting from said duplication(s), in reporting whether the state whose address is equal to the position of said second grammatical analysis process is a final state (38) of said automaton (34); said system further comprising processing means making it possible, after each of said detected patterns has been taken into account, to perform a second complementary operation consisting in providing a list of all final states (35) whose address is equal to the position of at least one of said second grammatical analysis processes being executed; said list being hereinafter called the list of detected final states; so that each final state (38) of said list of detected final states corresponds to a detected sequence that is grammatically correct for the grammar associated with this final state (38) and that ends at said detected pattern taken into account.
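The filtering, execution and signaling steps of claim 36 can be sketched as a set of lightweight processes advanced in lockstep over an automaton. This sketch makes several assumptions. The decision algorithm is simply "the detected pattern belongs to the total label". The stop algorithm always stops a process that has no active transition. Every detected pattern starts a new process for every grammar at that grammar's initial state, and such a process dies at once if it cannot consume the pattern. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    start: int          # address of the start state
    end: int            # address of the arrival state
    label: frozenset    # total label: symbol sequences accepted on this edge


@dataclass(frozen=True)
class Process:
    start_pattern: str  # starting pattern, never modified
    grammar: str
    position: int       # address of the current state


class AutomatonAnalyser:
    def __init__(self, transitions, initial, final):
        self.outgoing = {}
        for t in transitions:
            self.outgoing.setdefault(t.start, []).append(t)
        self.initial = initial   # grammar -> address of its initial state
        self.final = final       # address of a final state -> its grammar
        self.processes = []

    def feed(self, pattern):
        """Take one detected pattern into account; return the detected final states."""
        # Assumption: every detected pattern may start a process for every grammar.
        for grammar, state in self.initial.items():
            self.processes.append(Process(pattern, grammar, state))
        survivors = []
        for proc in self.processes:
            # a) Filtering: decision algorithm = "pattern belongs to the total label".
            active = [t for t in self.outgoing.get(proc.position, []) if pattern in t.label]
            # b) Execution: no active transition -> stop (assumed stop algorithm);
            #    one -> move; several -> one copy per active transition.
            for t in active:
                survivors.append(Process(proc.start_pattern, proc.grammar, t.end))
        self.processes = survivors
        # c) Signaling: final states reached by at least one running process.
        return sorted({(p.position, self.final[p.position])
                       for p in self.processes if p.position in self.final})
```

Creating a fresh `Process` for each active transition handles the "one" and "several" cases uniformly: a single active transition yields one successor, and several yield the duplicates the claim describes.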

37. The system of claim 36; each of said second grammatical analysis processes being associated with a series of symbols called the stack of said second grammatical analysis process; said stack being part of said evaluation variables, said stack having an initial value defined in advance at the time when said second grammatical analysis process begins to execute, said stack being intended to be modified during the execution of said second grammatical analysis process; said system comprising storage means for storing said stack; said decision algorithm, implemented during said filtering operation and applied to a transition (39), further taking as argument the value of said stack and, when said transition (39) is an active transition, further determining a series of symbols called the new stack for said active transition; said execution operation further performing: - if there is a single active transition (39) for said second grammatical analysis process, the operation of replacing said stack with said new stack for said active transition (39), - if there has been duplication, the operation, for each of said second grammatical analysis processes originating from said duplication(s), of replacing the stack of said second grammatical analysis process originating from said duplication(s) with said new stack for the active transition (39) with which said second grammatical analysis process originating from said duplication(s) is associated; said stop algorithm, implemented in the absence of active transitions (36), taking said stack as argument.
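Giving each process a stack turns the automaton into a pushdown automaton, so nested structures such as balanced open/close tags become recognizable. The sketch below rests on several assumptions. Each transition may pop one required symbol and/or push one symbol. The decision algorithm returns the new stack, or `None` when the transition is not active. Stacks are immutable tuples, so duplicated processes never share state. A sequence is accepted when it ends in a final state with an empty stack. The names `PTransition`, `decide` and `run` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PTransition:
    start: int
    end: int
    label: frozenset
    push: Optional[str] = None  # symbol pushed when the transition is taken
    pop: Optional[str] = None   # symbol required on top of the stack, then removed


def decide(t, pattern, stack):
    """Decision algorithm: the new stack if `t` is active, otherwise None."""
    if pattern not in t.label:
        return None
    if t.pop is not None:
        if not stack or stack[-1] != t.pop:
            return None
        stack = stack[:-1]
    if t.push is not None:
        stack = stack + (t.push,)
    return stack


def run(patterns, transitions, initial, finals):
    """Follow one process started at the first pattern, duplicating as needed."""
    processes = [(initial, ())]  # (position, stack); stacks are immutable tuples
    for pattern in patterns:
        nxt = []
        for position, stack in processes:
            for t in transitions:
                if t.start == position:
                    new_stack = decide(t, pattern, stack)
                    if new_stack is not None:
                        nxt.append((t.end, new_stack))
        processes = nxt  # processes with no active transition are stopped
    # Assumed acceptance: a final state reached with an empty stack.
    return any(pos in finals and not stack for pos, stack in processes)
```

With a single state carrying an "open" transition that pushes `T` and a "close" transition that pops `T`, the sequence open, open, close, close is accepted, while open, close, close is rejected at the third pattern.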

38. System according to any one of claims 36 or 37; each of said second grammatical analysis processes being associated with a variable called result variable, said result variable being part of said evaluation variables; said system comprising storage means making it possible to store the value of said result variable and processing means making it possible, when starting said second grammatical analysis process, to set said result variable to an initial value defined in advance, and further allowing the value of said result variable to be modified when each of the detected patterns is taken into account, this modification taking into account said detected pattern as well as the total label of said active transition for said second grammatical analysis process, or if said second grammatical analysis process comes from a duplication, the total label of the active transition with which it is associated.

39. System according to any one of claims 23 to 38; said system being applied to a data flow passing over a communications network, said system further comprising processing means making it possible, depending on the values taken by said evaluation variables, either to let said data flow pass without any modification or to execute one or more of the following actions: - modify the content of said data flow, - modify the destination address of said data flow, - send information to a previously specified address, - block the passage of said data flow.

40. The system of claim 39; said system further comprising means for temporarily storing all or part of said data stream, and for retransmission, with or without modification, of the stream thus stored.
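Claims 39 and 40 tie the analysis to a filtering decision, with temporary storage so that the flow can be retransmitted unchanged, retransmitted modified, or withheld. The sketch below covers only three of the listed actions (pass, modify, block); redirection and notification are omitted. Its thresholds, its "danger coefficient" per pattern and the function name `relay` are hypothetical. Because the whole flow is buffered before the decision, a pattern split across two network chunks is still found.

```python
def relay(chunks, patterns, block_at=0.9, redact_at=0.5):
    """Buffer a flow, analyse it, then pass, modify or block it.

    `patterns` maps byte patterns to hypothetical danger coefficients.
    Returns (action, payload); payload is None when the flow is blocked.
    """
    buffer = b"".join(chunks)  # temporary storage of the whole flow
    hits = {p: c for p, c in patterns.items() if p in buffer}
    level = max(hits.values(), default=0.0)
    if level >= block_at:
        return "block", None
    if level >= redact_at:
        # Modify the content: mask each sufficiently dangerous pattern.
        for p, c in hits.items():
            if c >= redact_at:
                buffer = buffer.replace(p, b"*" * len(p))
        return "modify", buffer
    return "pass", buffer
```

Buffering trades latency and memory for the ability to act on the complete flow; an on-the-fly variant would instead have to decide before the end of the flow has arrived.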