WO2004061651A2 - A method and system for dynamically creating parsers in a message broker - Google Patents

A method and system for dynamically creating parsers in a message broker

Info

Publication number
WO2004061651A2
Authority
WO
WIPO (PCT)
Prior art keywords
message
rule
tree
memory
grammar
Prior art date
Application number
PCT/EP2003/015015
Other languages
French (fr)
Other versions
WO2004061651A3 (en)
Inventor
Marc Fiammante
Gérard Simon
Original Assignee
International Business Machines Corporation
Compagnie Ibm France
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation, Compagnie Ibm France
Priority to AU2003303605A (AU2003303605A1)
Priority to EP03808294A (EP1581869A2)
Publication of WO2004061651A2
Publication of WO2004061651A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Definitions

  • the present invention generally relates to Enterprise Application Integration programs and more particularly to the function of message analysis performed by the Message Broker which transforms the content of a message received from a first application to make it comprehensible by the target application.
  • MOM Message Oriented Middleware
  • the heterogeneous applications can communicate without being aware of the communication means between them.
  • MOM programs provide message management between applications which become loosely coupled in a connectionless mode.
  • the MOM takes care of the message transport layer and manages message queues for the applications.
  • each application using the service of the MOM needs to know the format of the messages of the application it communicates with.
  • EAI Enterprise Application Integration
  • applications are adapted to appear transaction oriented, using real-time messaging mechanisms (e.g. publish/subscribe, request/reply, synchronous, asynchronous) to communicate.
  • a new programming layer, the Message Broker programs, is responsible for looking into the message content to identify the message type, look at the corresponding message format, identify the logical content of the message, modify the message content if necessary, initiate the formatting adaptation to the target application and route the message to a MOM queue or directly to the target application.
  • WebSphere MQIntegrator V2 (noted WMQI in the present document) of IBM (IBM, WebSphere and MQIntegrator are trademarks of IBM Corporation in certain countries).
  • the Message Brokers currently sold by the EAI software manufacturers support binary messages (C language or Cobol messages), tagged messages optionally including fixed formats (such as SWIFT, EDI or FIX messages), or tagged but flexible, self-describing messages such as XML messages.
  • a Message Broker may be generated with different parsers, each of them performing the syntactical analysis of a different message type.
  • Message Broker programs provide an interface to add more parsers.
  • the Message Brokers are not able to support messages with a more flexible syntax, such as e-mail messages. Messages of this type are well known as context free language based messages. In the rest of the document they will be named 'e-mail like' type messages.
  • the WMQI Message Broker supports XML type messages, Cobol like Custom Wire Format messages and tagged messages. The e-mail like type messages have a more complex structure which cannot be described by COBOL or C structures or tags.
  • an e-mail address can include optional comments between "(" and ")" at any place in the address.
  • the expression Dupont.(I am the greatest) Marc @(the)Negas.WBA is a valid address and is equivalent to Dupont.Marc@Negas.WBA or even to Dupont(I am).
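
The equivalence stated above can be illustrated with a minimal sketch that removes parenthesised comments from an address. The function name `strip_comments` is hypothetical, and the sketch assumes a simplified syntax: it tracks nested comments with a depth counter and drops whitespace outside comments, but ignores quoted strings and backslash escapes, which a complete RFC 2822 parser would have to handle. The depth counter is the reason a fixed record layout or a flat tag scheme cannot describe such messages.

```python
def strip_comments(address: str) -> str:
    """Remove parenthesised comments (which may nest) and stray whitespace."""
    out = []
    depth = 0  # current nesting level of "(" ... ")"
    for ch in address:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')' in address")
            depth -= 1
        elif depth == 0 and not ch.isspace():
            out.append(ch)  # keep only characters outside any comment
    if depth != 0:
        raise ValueError("unclosed comment in address")
    return "".join(out)


print(strip_comments("Dupont.(I am the greatest) Marc @(the)Negas.WBA"))
# prints Dupont.Marc@Negas.WBA
```
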
  • WMQI provides a C language API to add a parser so that it will handle any given message type or format provided that a suitable parser is available.
  • Any new parser plugged into a Message Broker should also be adapted to provide an output format compatible with the logical message representation in memory expected by the other functions (routing, formatting) of the Message Broker.
  • the method comprising the steps of, for each type of e-mail like types of messages, creating in memory a Rule-Tree which is a tree based logical representation in memory of the rules of the LL(k) grammar corresponding to the type of the e-mail like types of messages; then, when a message is received and identified by the Message Broker as being one of said e-mail like types, reading the corresponding Rule-Tree and trying to match the LL(k) grammar rules on said message; if a grammar rule is matched, building in memory the Message-Tree which is a tree based logical representation in memory of the message; then, reading the logical representation of the message built in memory in output of the preceding step execution and using it in the following processing step of the Message Broker.
  • the entry of the LL(k) grammar rules by
  • the context free language based messages can be based on a LL(k) grammar because the syntax of these messages can be analyzed on the basis of a set of LL(k) grammar rules that one person skilled in the art can easily define.
  • the LL(k) grammars up to now known for high level programming language compilation, are particularly adapted to the syntactical analysis of context free language based messages.
  • the token discovery process is adapted to context free language based messages, thus avoiding the weakness of the algorithms based on repeated token discovery process used with tagged or data structured.
  • One consequence is the good quality of the corresponding code to perform the syntactical analysis and the simplicity of the maintenance of that code.
  • One other advantage of the invention is the possibility to create in one pass the rule-tree corresponding to one context free language based message type.
  • a generic parser is created once and plugged in the Message Broker. It will be able to parse the context free language based messages by the
  • the solution provides a high dynamicity as the creation of the rule-tree in memory may be facilitated by a graphical user interface according to the preferred embodiment.
  • the EAI administrator can not only create a new rule-tree corresponding to a new LL(k) grammar but can also modify an existing rule-tree file if he wants to reflect a change in the rules of that LL(k) grammar.
  • the Message Broker can be enriched in one pass with different parsers handling different context free language based messages such as e-mail messages defined by the RFC2822.
  • a customer buying a Message Broker to build an EAI environment can use the parser plug in interface of the Message Broker to add the e-mail like type message parser.
  • a Message Broker program manufacturer can enrich the program by adding such a parser in the Message Broker program.
  • a Message Broker program manufacturer can also include as a program component the rule-tree generator and the generic LL(k) grammar parser, to let the user himself generate the parsers to process any type of e-mail type messages he desires.
  • Fig. 1 illustrates an EAI environment allowing a set of heterogeneous applications to communicate together
  • Fig. 2 shows the logical blocks of a Message Broker according to the preferred embodiment, implemented as a program executing in a computing system;
  • Fig. 3 shows the general flow chart of the method of the Message Broker according to the preferred embodiment
  • Fig. 4 shows the flow chart of the method to generate a parser used in the method for brokering messages according to the preferred embodiment
  • Fig. 5 shows the Rule-Tree memory graphical illustration
  • Fig. 6 illustrates one embodiment of the invention using an object oriented programming language on one example of looking of a match with one rule of the LL(k) grammar
  • Fig. 7 shows a graphical illustration of one rule taken as an example.
  • Fig. 1 illustrates the use of a Message Broker (105) in the Enterprise Application Integration programs for a typical business environment.
  • a set of applications (140) may operate on different computers communicating through local (150) or wide area networks (130), either public or private.
  • the EAI programs are preferably installed and operate on a separate computer (100) connected to the application computers. Any application may receive messages from other applications and send back answers or send new requests to other applications through the EAI computer.
  • Standalone customers (110, 120) connected through a public network to an SMTP server (140) can also communicate with the applications through the EAI programs.
  • the applications which communicate through the EAI programs may use different programming languages depending on the time they were developed, they may also use on the computers different hardware and operating systems.
  • a MOM software uses a common transport protocol such as HTTP or TCP/IP to perform asynchronous communication with the applications.
  • the messages received by the MOM are sent to the Message Broker which looks into their content and routes them to the queue of the MOM for a target application.
  • the Message Broker software layer has been adapted to the application environment: it knows the message formats and the processing rules, and supports different types of messages.
  • the Message Broker according to the preferred embodiment of the invention, is able to receive e-mail like type messages.
  • Fig. 2 describes the program logical blocks of the Message Broker (200) according to the preferred embodiment.
  • the dotted lines describe the path used by an e-mail like type message in the Message Broker logical blocks.
  • the Message Broker receives messages from applications through an INPUT/OUTPUT interface logical block (250).
  • the INPUT/OUTPUT interface may be of different types: the Message Broker may have a direct API interface with an application (241, 215) or may have an interface to the MOM which handles (235, 245, 265) a common transport layer for all the applications (HTTP as an example).
  • the output of this interface logical block is a byte stream containing information which needs to be processed by the Message Broker and transmitted to the target application.
  • An e-mail like type message is received via the MOM (245) by the Message Broker.
  • a message type identification block (255) switches the message byte stream to one of the parser logical blocks (260, 265, 270).
  • In Fig. 2 more than one parser is represented, each of them corresponding to a specific message type.
  • Some parsers may be provided by the Message Broker, other ones may have been added using the parser plug-in API available in most of the Message Brokers.
  • the e-mail like type message is sent to a logical block (270) for executing a generic code able to parse any e-mail like type messages.
  • the message type identification block (255) provides to the e-mail type message parser (270) a pointer to a specific LL(k) grammar Rule-tree saved in a Rule-Tree data base (275) .
  • the LL(k) grammar Rule-Tree has been previously created via a new logical block (242).
  • the logical block for executing the generic code for parsing (270) receives as input data the message byte stream and a pointer to a specific Rule-Tree.
  • the syntactical analysis of the message performed in the parsing logical block consists in reading the message and identifying the matches with the rules in the Rule-Tree.
  • the output of the parsing may be an exception error transmitted to the Message Broker which sends back an error to the originator application if no match has been found with the rules. If matches are found during the syntactical analysis of the message, the message follows the grammar rules and the output of the parsing logical block is a logical representation of the message which has been built in memory during the message analysis. It is noted that most of the Message Brokers have a tree based representation of the message. Thus, most of the time, the parsing logical block provides to the Message Broker a tree based representation of the e-mail like type message in memory. The Message Broker uses the Message-Tree to create a message content as expected by the target application in the Message Content Processing logical block (280).
  • the process is performed according to processing rules (290) defined by the EAI environment administrator.
  • the processing rules are used to analyze a specific message semantic content (a message format to update a banking account, for instance) and to perform logical routing.
  • with specific processing rules, a new field expected by a target application may be computed in the Message Content Processing logical block (280).
  • the message content is then sent to one or more target application(s) or one or more specific message queues in the MOM through the INPUT/OUTPUT interface logical block (250).
  • the message formats (295) are applied on the output interface (250) for producing the physical byte stream as expected by the target application(s).
  • the generic code for parsing e-mail like types of messages is a LL(k) grammar based syntactical analyzer. Furthermore, this analyzer reads a byte stream message, and referring to a Rule-Tree representing the LL(k) grammar rules in memory, finds the possible matches in the message with the grammar rules.
  • a logical block (242) is added for allowing an EAI administrator to enter on the computer the description of the rules for a LL(k) grammar corresponding to the syntax of a specific e-mail type of message. This logical block builds in memory a tree based representation of the grammar rules.
  • the Rule-Tree may be stored on a data base for further utilization.
  • the grammar rule entry is performed through a graphical user interface.
  • grammar rules can be easily graphically represented as a tree.
  • the Rule-Tree being the logical view of the graphical representation. It is noted also that still in the logical block for Rule-Tree preparation, an existing Rule-Tree in memory can be edited for modifications of the grammar rules.
  • Each Rule-Tree created by the EAI administrator corresponds to a LL(k) grammar syntax and thus to one e-mail like type of message.
  • an access to a Rule-Tree corresponding to a given e-mail like message type is assigned to the parser (270) in the Message Type Identification logical block (255).
  • the parser will perform the syntactical analysis of a byte stream corresponding to the e-mail like type identified in the previous logical block and will build in memory a Message-Tree representing the message in the way expected by the following Message Broker logical block (280). Consequently, by creating a Rule-Tree, the EAI administrator can dynamically enable the generic e-mail like type message parser (270) to support parsing of messages of the corresponding e-mail like type.
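
The dynamic enabling described above can be pictured with a minimal sketch in which the Message Type Identification step looks up a Rule-Tree and hands it, together with the message body, to a single generic parser. Everything named here is an assumption made for illustration: the one-line type header, the `rule_tree_db` dictionary standing in for the Rule-Tree data base (275), and `toy_generic_parser`, which only splits on a separator and stands in for the real LL(k) interpreter. The point of the sketch is that a new message type is enabled by adding data, not code.

```python
from typing import Callable, Dict, List

RuleTree = Dict[str, str]                      # toy stand-in for a real Rule-Tree
Parser = Callable[[bytes, RuleTree], List[str]]

rule_tree_db: Dict[str, RuleTree] = {}         # plays the role of data base (275)


def toy_generic_parser(body: bytes, rule_tree: RuleTree) -> List[str]:
    """Stand-in for parser (270): behaviour is driven only by rule_tree data."""
    return body.decode("ascii").split(rule_tree["separator"])


def identify_and_parse(stream: bytes,
                       parser: Parser = toy_generic_parser) -> List[str]:
    """Stand-in for block (255): pick the Rule-Tree, then call the parser."""
    header, _, body = stream.partition(b"\n")  # hypothetical one-line type header
    message_type = header.decode("ascii")
    if message_type not in rule_tree_db:
        raise LookupError(f"no Rule-Tree registered for type {message_type!r}")
    return parser(body, rule_tree_db[message_type])


# The administrator enables a new type at run time by adding an entry.
rule_tree_db["csv-like"] = {"separator": ","}
print(identify_and_parse(b"csv-like\na,b,c"))  # prints ['a', 'b', 'c']
```
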
  • a syntax grammar usually is described with a notation called a syntactic meta-language. This notation is used to define the set of rules that describe the structure of programming languages.
  • the usual notations derive from BNF (Backus-Naur Form), a notation that started with the Algol 60 programming language (Naur, 1960).
  • A more recent form, called EBNF (extended BNF), is described in the ISO/IEC 14977 standard.
  • Unix systems provide the lex and yacc tokenizer and parser generators, which use a form of BNF.
  • LL grammars are used for compiling context free based programming languages; the first L stands for left to right scanning of the input programming source code and the second L for producing the leftmost derivation (also known as top down parsing).
  • the LL grammar based compilers generate source code based on dedicated function calls with names matching the rules.
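
For contrast with the data-driven approach of the invention, the conventional style mentioned above, where each grammar rule becomes a dedicated function named after it, can be sketched as follows. The toy grammar (`list : '(' item (',' item)* ')'` and `item : NAME`) and all function names are assumptions chosen for illustration; they do not come from the patent. Changing a rule in this style means changing and redeploying code.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z]+|[(),])")


def tokenize(text):
    """Split the input into names and the punctuation ( ) ,"""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def expect(tokens, pos, literal):
    """Consume one literal token or fail."""
    if pos < len(tokens) and tokens[pos] == literal:
        return pos + 1
    raise SyntaxError(f"expected {literal!r} at token {pos}")


def parse_item(tokens, pos):
    """item : NAME"""
    if pos < len(tokens) and tokens[pos].isalpha():
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected a name at token {pos}")


def parse_list(tokens, pos=0):
    """list : '(' item (',' item)* ')'"""
    pos = expect(tokens, pos, "(")
    item, pos = parse_item(tokens, pos)
    items = [item]
    while pos < len(tokens) and tokens[pos] == ",":  # one token of lookahead
        item, pos = parse_item(tokens, pos + 1)
        items.append(item)
    pos = expect(tokens, pos, ")")
    return items, pos


print(parse_list(tokenize("(a, b, c)"))[0])  # prints ['a', 'b', 'c']
```
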
  • Fig 3 shows the general flowchart of the method of a Message Broker according to the preferred embodiment.
  • the EAI administrator has to enable the Message Broker to support an e-mail like type of messages (300) .
  • the EAI administrator preferably uses a graphical editor to enter a graphical representation of the LL(k) grammar rules corresponding to the syntax of these e-mail like type of messages.
  • a programming language or a descriptive language such as XML could be used to enter the grammar rules but this operating mode is not preferred as less dynamic than with a graphical editor.
  • a Rule-Tree is created in memory and saved for further use by the Message Broker.
  • the Rule-Trees are generally stored on a data base accessible by the computer operating the EAI programs.
  • the administrator will generate a new reference to this Rule-Tree in the Message Broker in order that the Message Type Identification be able to identify this new type of message and this new Rule-Tree.
  • the tools for generating and installing a new parser in a Message Broker are part of the Message Brokers basic functions.
  • the Message Broker can at any time handle the e-mail like type of messages for which the syntax follows the grammar rules described in the Rule-Tree.
  • a Message Type Identification is performed and the message is transmitted to a parser which can be of any type.
  • in the Message Type Identification logical block, information in the message headers or specified in its own properties is used to determine how the message body is parsed: - If the message has a recognizable routing header, the domain identified in the header is used to decide which message parser is invoked.
  • the parser specified can be a plug-in parser such as an e-mail like type message parser.
  • the message is handled as a binary object.
  • the message byte stream is provided to the generic parser for e-mail like type message interpretation.
  • This interpretation step (330) consists in reading the message and analyzing whether the syntax is in line with the grammar rules read in the tree.
  • the Rule-Tree branches will be explored until a rule match or rule matches with the content of the message are found. A Message-Tree is built while looking for matches; one branch is created each time a match is found.
  • a Message-Tree is stored in memory in a format adapted to the Message Broker processing step (380) following the parsing.
  • the parsing step is an interpretation rather than a compilation because its execution results in a creation in memory of a directly usable data, the Message-Tree.
  • the Message-Tree is used in the following step of the Message Broker to create a message content (360) as expected by the target application
  • Fig. 4 describes the steps of dynamic creation of a Rule-Tree by the EAI administrator.
  • a graphical tree editor is started (410). If there is a LL(k) grammar rule or node to enter (answer yes to test 420), a node is drawn as a leaf and/or a rule is drawn as a branch (430) of a tree. The corresponding logical representation of the leaves and branch is then created in memory (440). If there is no more new node or rule to enter (answer no to test 420), the tree as drawn is saved in memory.
  • the saving can be done in a data base having a repertory in which one entry contains one initial rule name of a LL(k) grammar Rule-Tree and a pointer to it.
  • if the administrator changes an existing rule, he can start the Rule-Tree creation (400) and choose to start it on an already saved Rule-Tree that he identifies by its initial rule name.
  • the graphical editor (410) retrieves the Rule-Tree, displays it and allows deletion of a tree, a branch or a leaf, or entry of a new leaf or new branch in the case he has a new rule or node to enter (answer yes to test 420).
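
The save, retrieve and edit cycle described for Fig. 4 can be sketched without the graphical editor. The sketch assumes a hypothetical `RuleTreeStore` whose repertory maps an initial rule name to the location of a saved tree, uses JSON files as the storage format, and represents a tree as nested dictionaries with `name` and `children` keys; none of these choices comes from the patent.

```python
import json
import tempfile
from pathlib import Path


class RuleTreeStore:
    """Hypothetical store: the repertory maps an initial rule name to a saved tree."""

    def __init__(self, directory: Path):
        self.directory = directory
        self.repertory = {}  # initial rule name -> path of the saved tree

    def save(self, initial_rule: str, tree: dict) -> None:
        path = self.directory / f"{initial_rule}.json"
        path.write_text(json.dumps(tree))
        self.repertory[initial_rule] = path

    def load(self, initial_rule: str) -> dict:
        return json.loads(self.repertory[initial_rule].read_text())


def delete_child(tree: dict, name: str) -> bool:
    """Remove the first branch or leaf called `name` anywhere below `tree`."""
    children = tree.get("children", [])
    for i, child in enumerate(children):
        if child["name"] == name:
            del children[i]
            return True
        if delete_child(child, name):
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    store = RuleTreeStore(Path(tmp))
    store.save("addr_list", {"name": "addr_list", "children": [
        {"name": "addr", "children": []},
        {"name": "comment", "children": []},
    ]})
    edited = store.load("addr_list")      # retrieve by initial rule name
    delete_child(edited, "comment")       # the administrator deletes a leaf
    store.save("addr_list", edited)       # the change takes effect on next load
    print([c["name"] for c in store.load("addr_list")["children"]])
```
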
  • StopChkAudRs ( [Status] RqUID [AsyncRqUID Custid] ) [ [RecCtrlOut ] (DepAcctld
  • This LL(k) grammar rule is graphically represented as a tree in Fig. 7.
  • Fig. 7 is a visual representation of the LL(k) grammar rule given above as an example. It is represented as a part of a tree of the complete LL(k) grammar rules on the display of the EAI administrator operating the graphical editor activated by the Rule-Tree preparation operation (in logical block 242 of Fig. 2).
  • StopChkAudRs (700) is the named rule that is represented as a tree.
  • if the attribute Seq, for sequence, is indicated (710, 720, 730), the following sequence of rules must match in the order as noted. If the attribute SWT, for switch, is indicated (760), any of the rules can match.
  • + is a sign used in this graphical representation to indicate that the element is not terminal and is a reference to another named rule.
  • in the rule in Fig. 7 only three elements (740, 750, 770) are terminal/final elements.
  • the non terminal element 770 has a cardinality from 1 to n; combined with an optional path, this means that it may not be found or may be repeated up to n times. Optional paths are represented by the dotted lines (the cardinality of these paths may be 0 or 1). This is why the cardinality of element 770, which is reachable via a dotted line, may be from 0 to n.
  • Fig. 5 is a visual representation of a general LL(k) grammar as a tree on the display of the EAI administrator operating the graphical editor activated by the Rule-Tree preparation operation (in logical block 242 of Fig. 2).
  • the e-mail like message type syntax can be well defined by the rules of a LL(k) Grammar.
  • a syntax grammar usually is described with a notation called a syntactic meta-language. The various syntactic parts are named. Terminal elements are usually called tokens and rules aggregating those terminals are called non-terminals.
  • the valid sequences of symbols for the language or the message are described using the notation.
  • the syntax of any valid language or message structure is also described.
  • the tree (505) has an entry point which is the initial rule name table (510). Two branches (also called children) (515, 580), each having a named main node (520, 580), are attached to this table, which is the root of the tree.
  • the first branch (515) is the main branch of the tree. In this first branch there are two references (550, 570) to the second branch (580). This implies that when discovering the first branch (515) three different sub branches are found (530, 560, 570), each of them either having a sub sub branch (540, 550) or being a leaf (530, 560, 570).
  • the leaves may be a reference to a named rule (550, 570) which implies a virtual extension of the branch of the tree towards a new branch (580) having an entry point stored in the Rule Name hash table.
  • in a LL(k) grammar, a type and optionally attributes are associated with each node in the Rule-Tree.
  • the types can be a rule, a loop, any, match, and switch.
  • the attributes may define the lookahead depth (k), the token identifier to be found, the optional attribute for the node.
  • the e-mail like type of messages sent by the Message Type Identifier according to the preferred embodiment Fig. 6 are analyzed by the parser according to a specific LL(k) grammar Rule-Tree.
  • the parser accesses the entry of the Rule-Tree by reading the Rule-Tree repertory of the Rule-Tree data base.
  • the steps of the flow chart of Fig. 6 describe the exploration of the message to find matches with the different branches of the Rule-Tree.
  • an EAI administrator may decide to create and include in the Message Broker a Rule-Tree for RFC2822 defined e-mail messages according to the preferred embodiment.
  • the following e-mail is provided as a byte stream to the e-mail parser:
  • the message parsing consists in creating a Message-Tree if the message byte stream follows the rules of the grammar as described in the Rule-Tree.
  • the type and attributes of each node of the Rule-Tree, the tree representation of the LL(k) grammar rules, are read in order to find a match with the input byte stream message.
  • the node type of "rule" aggregates other nodes and has a name that can be referenced in other rules such as match. If the node type is "loop" the generic code will loop around all of the children until a token match fails. If the node type is "any" all of the node children are tested for a match.
  • the token identifier or the named rule is matched. If the type of the node is "switch" then an attribute specifies the parser name that will proceed to subsequent analysis of the message and then return to the current parser. By default all node children are processed in sequence. Any node with the optional attribute will not generate a parsing failure if no recognition occurs in the node.
  • the generic code will produce the Message-Tree, an in memory representation of the syntax tree of the parsed byte stream message buffer.
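
A minimal sketch of such generic code, under stated assumptions, is given here. It interprets a Rule-Tree made of nodes of type "rule", "loop", "any" and "match" with an `optional` attribute, and returns a Message-Tree as nested `(rule name, branches)` tuples. The `Node` class, the `(identifier, text)` token pairs, the example grammar and the convention that a "match" node refers to a named rule when its name appears in the rule table are all hypothetical. The "switch" type and the lookahead depth k are not modelled: the sketch simply falls back to the starting position when a branch fails, which is a simplification rather than the lookahead mechanism of an LL(k) parser.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]            # (token identifier, text), e.g. ("WORD", "marc")


@dataclass
class Node:
    type: str                      # "rule", "loop", "any" or "match"
    name: str = ""                 # rule name, token identifier or referenced rule
    children: List["Node"] = field(default_factory=list)
    optional: bool = False         # an optional node never causes a failure


def match_sequence(children, tokens, pos, rules):
    """Default behaviour: all children must match one after the other."""
    branches = []
    for child in children:
        result = match_node(child, tokens, pos, rules)
        if result is None:
            return None
        found, pos = result
        branches.extend(found)
    return branches, pos


def match_node(node: Node, tokens: List[Token], pos: int,
               rules: Dict[str, Node]) -> Optional[Tuple[list, int]]:
    """Return (Message-Tree branches, next position), or None when no match."""
    result = None
    if node.type == "match":
        if node.name in rules:                         # reference to a named rule
            result = match_node(rules[node.name], tokens, pos, rules)
        elif pos < len(tokens) and tokens[pos][0] == node.name:
            result = ([tokens[pos][1]], pos + 1)       # leaf holds the token text
    elif node.type == "rule":
        seq = match_sequence(node.children, tokens, pos, rules)
        if seq is not None:
            result = ([(node.name, seq[0])], seq[1])   # one branch per matched rule
    elif node.type == "any":
        for child in node.children:                    # first child that matches
            result = match_node(child, tokens, pos, rules)
            if result is not None:
                break
    elif node.type == "loop":
        branches, cur = [], pos
        while True:                                    # repeat until a match fails
            seq = match_sequence(node.children, tokens, cur, rules)
            if seq is None or seq[1] == cur:
                break
            branches.extend(seq[0])
            cur = seq[1]
        result = (branches, cur)
    else:
        raise ValueError(f"unsupported node type {node.type!r}")
    if result is None and node.optional:
        return [], pos
    return result


def parse(initial_rule: str, tokens: List[Token], rules: Dict[str, Node]):
    result = match_node(rules[initial_rule], tokens, 0, rules)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("message does not follow the grammar rules")
    return result[0][0]


RULES = {
    "addr": Node("rule", "addr",
                 [Node("match", "WORD"), Node("match", "AT"), Node("match", "WORD")]),
    "addr_list": Node("rule", "addr_list", [
        Node("match", "addr"),
        Node("loop", children=[Node("match", "COMMA"), Node("match", "addr")]),
    ]),
}
TOKENS = [("WORD", "marc"), ("AT", "@"), ("WORD", "negas.wba"),
          ("COMMA", ","), ("WORD", "g"), ("AT", "@"), ("WORD", "ibm")]
print(parse("addr_list", TOKENS, RULES))
```

Because the grammar lives entirely in `RULES`, replacing that dictionary changes the accepted message syntax without touching `match_node`.
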
  • Fig. 6 illustrates a different embodiment of the invention as the LL(k) grammar Rule-Tree created by the EAI administrator is in fact a Rule-Object model as used for object programming languages.
  • the generic code for parsing is an object oriented programming code using the LL(k) grammar Rule-Object model dynamically created by the EAI administrator.
  • the object oriented programming language can be either Smalltalk or Java (Java is a trademark of Sun Microsystems in certain countries) or any other object oriented programming language.
  • the result of the parsing of an e-mail like type message is a Message-Object which is a way to logically represent the message more general than a Message-Tree. This logical representation can be used, for instance, by the further process of the Message Broker.
  • the flow chart of Fig. 6 describes a part of the method for parsing an e-mail like type message according to this specific object oriented implementation of the invention.
  • This part of the method is for looking for a match between the message syntax and a specific rule of the LL(k) grammar (Rule A). To perform the complete parsing of the e-mail like type message, this part is repeated with all the rules of the LL(k) grammar which depend on the initial rule until one match is found.
  • Rule A is the initial rule of the LL(k) grammar
  • the result of the process illustrated in Fig. 6 is the result of the parsing of the e-mail like type Message.
  • if the result of the rule A match process (680) is OK(a), a is the Message-Object. If the result is NotOk, an exception error will be issued for this message.
  • in Fig. 6 it is assumed that the EAI administrator has entered the rules of a LL(k) grammar corresponding to the syntax of the e-mail like type messages that one wants to parse.
  • the EAI administrator has created a Rule-Object model.
  • the process for Rule A matching (605) is triggered by a first message "getObject(s)" where s is the e-mail like type message byte stream.
  • the purpose of the getObject messages is to check if there is a match between the rule and the byte stream and, if there is a match, to get back the result Message-Object.
  • rule A is represented, as illustrated in Fig. 6, by five objects (610, 620, 630, 650, 660) which are the rule A object (610) having Class A as instantiation, the object Cardinal CB (620) having a method (mb) to hydrate the result (a) and its rule B process object (630) and the object Cardinal CC (650) having a method mc to hydrate the result (a) .
  • the method for finding a match is started (600) by sending the message getObject(s) to the rule B object with message byte stream s to be parsed.
  • the rule A object calls every cardinality object to hydrate this instance.
  • the hydrateAsNeeded(a, s) is sent (615) to the first
  • This cardinality object will invoke as needed (according to the cardinality rules) the rule B Process by sending (625) a getObject(s) message to the rule B Object
  • the result of the message can be OK (with an appropriate b object) or NotOk (635). If an OK is returned, then the a object is hydrated by sending to it the mb message with the result as parameter "a.mb(b)". Depending on whether the cardinality specifications are satisfied, an OK or NotOk result is returned
  • Cardinality object CC: this cardinality object will invoke as needed (according to the cardinality rules) the rule C Process by sending (655) a getObject(s) message to the rule C Object
  • the result of the message can be OK (with an appropriate c object) or NotOk (665). If an OK is returned, then the a object is hydrated by sending to it the mc message with the result as parameter "a.mc(c)". Depending on whether the cardinality specifications are satisfied, an OK or NotOk result is returned (670). Note that the result of the cardinality can be OK even if the result of the rule CB is NotOk, if the rule C is not mandatory (cardinality may be zero).
  • if Rule A is not matched (NotOk received from a cardinality object, e.g. in 640 or 670), a NotOk object is returned and the current cursor in the stream is reset to its initial position (600). In this case, the process continues at the level of the first invoker (sender of getObject message 600). If rule A is matched, then an OK result is returned with the a object as the recognized token object (680).
  • the result of the parsing implemented in object oriented programming code is a message object instance that can be used by the following processing of the Message Broker using also object oriented programming language.
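
The message exchange of Fig. 6 can be approximated by the following sketch, written in Python rather than Smalltalk or Java. It is an illustration under assumptions, not the patented implementation: `get_object` and `hydrate_as_needed` stand for the getObject and hydrateAsNeeded messages, `None` stands for NotOk, rules B and C are reduced to hypothetical single-token rules, and the stream is a list of tokens rather than a byte stream.

```python
from typing import List, Optional


class Stream:
    """Message token stream with a cursor."""
    def __init__(self, tokens: List[str]):
        self.tokens = list(tokens)
        self.pos = 0


class TokenRule:
    """Terminal rule object: get_object returns the token, or None for NotOk."""
    def __init__(self, expected: str):
        self.expected = expected

    def get_object(self, s: Stream) -> Optional[str]:
        if s.pos < len(s.tokens) and s.tokens[s.pos] == self.expected:
            s.pos += 1
            return self.expected
        return None


class Cardinal:
    """Cardinality object: invokes its rule as needed and hydrates the result."""
    def __init__(self, rule, setter: str, minimum: int = 1,
                 maximum: Optional[int] = 1):
        self.rule, self.setter = rule, setter
        self.minimum, self.maximum = minimum, maximum   # maximum None = unbounded

    def hydrate_as_needed(self, target, s: Stream) -> bool:
        count = 0
        while self.maximum is None or count < self.maximum:
            before = s.pos
            obj = self.rule.get_object(s)
            if obj is None:
                break
            getattr(target, self.setter)(obj)           # e.g. a.mb(b)
            count += 1
            if s.pos == before:                         # nothing consumed: stop
                break
        return count >= self.minimum                    # OK or NotOk


class Rule:
    """Rule object: builds a result_class instance via its cardinality objects."""
    def __init__(self, result_class, cardinals: List[Cardinal]):
        self.result_class, self.cardinals = result_class, cardinals

    def get_object(self, s: Stream):
        start = s.pos
        result = self.result_class()
        for cardinal in self.cardinals:
            if not cardinal.hydrate_as_needed(result, s):
                s.pos = start                           # reset cursor on NotOk
                return None
        return result


class A:
    """Hypothetical Class A whose instances are hydrated through mb and mc."""
    def __init__(self):
        self.b, self.c = None, []

    def mb(self, b):
        self.b = b

    def mc(self, c):
        self.c.append(c)


rule_a = Rule(A, [Cardinal(TokenRule("B"), "mb"),
                  Cardinal(TokenRule("C"), "mc", minimum=0, maximum=None)])
a = rule_a.get_object(Stream(["B", "C", "C"]))
print(a.b, a.c)  # prints B ['C', 'C']
```

With `minimum=0`, the second cardinality object reports OK even when rule C never matches, which mirrors the remark that a non-mandatory rule does not make rule A fail.
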

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method is disclosed for enhancing a Message Broker operating in a data processing system with parsers dynamically created for e-mail like types of messages. The method comprises the steps of, for each type of e-mail like types of messages, creating in memory a Rule-Tree which is a tree based logical representation in memory of the rules of the LL(k) grammar corresponding to the type of the e-mail like types of messages; then, when a message is received and identified by the Message Broker as being one of said e-mail like types, reading the corresponding Rule-Tree and trying to match the LL(k) grammar rules on said message; if a grammar rule is matched, building in memory the Message-Tree which is a tree based logical representation in memory of the message; then, reading the logical representation of the message built in memory as output of the preceding step and using it in the following processing step of the Message Broker. The entry of the LL(k) grammar rules by the EAI administrator is preferably done with a graphical editor to bring more dynamicity to the parser creation.

Description

A METHOD AND SYSTEM FOR DYNAMICALLY CREATING PARSERS IN A MESSAGE BROKER
The present invention generally relates to Enterprise Application Integration programs and more particularly to the function of message analysis performed by the Message Broker which transforms the content of a message received from a first application to make it comprehensible by the target application.
Background of the Invention
With the use of Message Oriented Middleware (MOM) programs, the heterogeneous applications can communicate without being aware of the communication means between them. Using an asynchronous communication protocol, MOM programs provide message management between applications which become loosely coupled in a connectionless mode. The MOM takes care of the message transport layer and manages message queues for the applications. However, each application using the service of the MOM, needs to know the format of the messages of the application it communicates with.
In order to provide deeper Enterprise Application Integration (EAI), applications are adapted to appear transaction oriented, using real-time messaging mechanisms (e.g. publish/subscribe, request/reply, synchronous, asynchronous) to communicate. A new programming layer, the Message Broker programs, is responsible for looking into the message content to identify the message type, look at the corresponding message format, identify the logical content of the message, modify the message content if necessary, initiate the formatting adaptation to the target application and route the message to a MOM queue or directly to the target application. One example is the WebSphere MQIntegrator V2 (noted WMQI in the present document) of IBM (IBM, WebSphere and MQIntegrator are trademarks of IBM Corporation in certain countries).
The Message Brokers currently sold by the EAI software manufacturers support binary-form messages (C language or Cobol messages), tagged messages optionally including fixed formats (such as SWIFT, EDI or FIX messages), or tagged but flexible, self-describing messages such as XML.
A Message Broker may be generated with different parsers, each of them performing the syntactical analysis of a different message type. Usually, Message Broker programs provide an interface to add more parsers. However, the Message Brokers are not able to support more highly flexible syntax messages such as e-mail messages. This type of message is well known under the name of context free language based messages. In the rest of the document they will be named 'e-mail like' type messages. The WMQI Message Broker supports XML type messages, Cobol like Custom Wire Format messages and tagged messages. The e-mail like type messages have a more complex structure which cannot be described by COBOL or C structures or tags. As an example, an e-mail address can include optional comments between "(" and ")" at any place in the address. According to RFC2822, which describes e-mail syntax, the expression Dupont.(I am the greatest) Marc @(the)Negas.WBA is a valid address and is equivalent to Dupont.Marc@Negas.WBA or even to Dupont(I am).(the)Marc(greatest)@(The)Negas(in nevada).WBA. If a message contains such variable content text, fixed parsing cannot handle it, and a tag description using the "(" would lead to too complex a definition because such a comment can occur at any location. In addition, these messages contain a syntactic structure that cannot be described with fixed tags because their interpretation relies on a large sequence of tokens rather than one simple tag/value or even recursive tag/subtag sequences. Consequently, modifying the message broker parsers as they exist today would lead to a huge amount of code being repeatedly executed to apply the tagged or data structured message token discovery at multiple locations in the message, i.e. as many times as these locations are authorized by the syntax of these highly flexible messages. Furthermore, not all new cases could be covered by such a syntactical analysis. One other drawback of this approach would be the poor quality of the corresponding code for maintenance purposes.
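To make the equivalence of these address forms concrete, the following is a minimal sketch, not part of the described method, that removes RFC2822 style comments from a bare address. The function name `strip_comments` is hypothetical, and the sketch assumes that whitespace outside quoted strings is insignificant, which holds for the addresses above but not for display names.

```python
def strip_comments(address: str) -> str:
    """Remove parenthesised comments and unquoted whitespace from an address."""
    out = []
    depth = 0          # nesting depth of "(" ... ")" comments
    in_quotes = False  # inside a "quoted string", where "(" is literal
    escaped = False    # previous character was a backslash
    for ch in address:
        if escaped:
            # An escaped character is kept only when outside a comment.
            if depth == 0:
                out.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
            if depth == 0:
                out.append(ch)
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif depth > 0:
            # Inside a comment: only track nesting, emit nothing.
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
        elif ch == '"':
            in_quotes = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
    return "".join(out)


first = strip_comments("Dupont.(I am the greatest) Marc @(the)Negas.WBA")
second = strip_comments("Dupont(I am).(the)Marc(greatest)@(The)Negas(in nevada).WBA")
```

Both calls reduce to the same plain address, which is why a parser has to accept a comment at every position where the syntax allows one.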
It is noted that to enrich the current Message Brokers with new parsers, one can use the parser plug-in API provided by most of the Message Brokers as of today. WMQI provides a C language API to add a parser so that it will handle any given message type or format provided that a suitable parser is available. Any new parser plugged in a Message Broker should also be adapted to provide output format compatible with the logical message representation in memory as expected by the other functions (routing, formatting) of the Message Broker.
Summary of the Invention
It is an object of the present invention to add to a Message Broker parsers able to handle context free language based messages such as e-mail messages.
It is another object of the invention to provide a quick easy way to add context free language based message parsers in a Message Broker.
These objects are achieved by the use of a method as claimed in claims 1 to 9 for enhancing a Message Broker operating in a data processing system with parsers dynamically created for e-mail like types of messages. The method comprises the steps of, for each e-mail like type of message, creating in memory a Rule-Tree, which is a tree based logical representation in memory of the rules of the LL(k) grammar corresponding to that e-mail like type of message; then, when a message is received and identified by the Message Broker as being one of said e-mail like types, reading the corresponding Rule-Tree and trying to match the LL(k) grammar rules on said message; if a grammar rule is matched, building in memory the Message-Tree, which is a tree based logical representation in memory of the message; then, reading the logical representation of the message built in memory as output of the preceding step and using it in the following processing step of the Message Broker. The entry of the LL(k) grammar rules by the EAI administrator is preferably done with a graphical editor to make parser creation more dynamic.
These objects are also reached with the computing system comprising means adapted for carrying out the method according to any one of claims 1 to 9. The context free language based messages can be based on a LL(k) grammar because the syntax of these messages can be analyzed on the basis of a set of LL(k) grammar rules that a person skilled in the art can easily define. The LL(k) grammars, known up to now for high level programming language compilation, are particularly adapted to the syntactical analysis of context free language based messages.
With the use of LL(k) grammar, the token discovery process is adapted to context free language based messages, thus avoiding the weakness of the algorithms based on the repeated token discovery process used with tagged or data structured messages. One consequence is the good quality of the corresponding code to perform the syntactical analysis and the simplicity of the maintenance of that code.
One other advantage of the invention is the possibility to create in one pass the rule-tree corresponding to one context free language based message type. A generic parser is created once and plugged in the Message Broker. It will be able to parse the context free language based messages by the use of the rule-tree just defined.
The solution provides a high dynamicity as the creation of the rule-tree in memory may be facilitated by a graphical user interface according to the preferred embodiment. Furthermore, the EAI administrator can not only create a new rule-tree corresponding to a new LL(k) grammar but can also modify an existing rule-tree file if he wants to reflect a change in the rules of that LL(k) grammar.
Consequently, with the solution of the present invention, the Message Broker can be enriched in one pass with different parsers handling different context free language based messages such as e-mail messages defined by RFC2822. With the solution of the present invention, a customer buying a Message Broker to build an EAI environment can use the parser plug-in interface of the Message Broker to add the e-mail like type message parser. A Message Broker program manufacturer can enrich the program by adding such a parser in the Message Broker program. Using the solution of the present invention, a Message Broker program manufacturer can also include as a program component the rule-tree generator and the generic LL(k) grammar parser, to let users generate themselves the parsers to process any type of e-mail like message they desire.
Brief Description of the Drawings
Fig. 1 illustrates an EAI environment allowing a set of heterogeneous applications to communicate together;
Fig. 2 shows the logical blocks of a Message Broker according to the preferred embodiment, implemented as a program executing in a computing system;
Fig. 3 shows the general flow chart of the method of the Message Broker according to the preferred embodiment;
Fig. 4 shows the flow chart of the method to generate a parser used in the method for brokering messages according to the preferred embodiment;
Fig. 5 shows the Rule-Tree memory graphical illustration;
Fig. 6 illustrates one embodiment of the invention using an object oriented programming language, on the example of looking for a match with one rule of the LL(k) grammar;
Fig. 7 shows a graphical illustration of one rule taken as an example .
Detailed Description of the preferred embodiment
Fig. 1 illustrates the use of a Message Broker (105) in the Enterprise Application Integration programs for a typical business environment. A set of applications (140) may operate on different computers communicating through local (150) or wide area networks (130), either public or private. The EAI programs are preferably installed and operate on a separate computer (100) connected to the application computers. Any application may receive messages from other applications and send back answers or send new requests to other applications through the EAI computer. Standalone customers (110, 120) connected through a public network to an SMTP server (140) can also communicate with the applications through the EAI programs. The applications which communicate through the EAI programs may use different programming languages depending on the time they were developed; they may also run on computers with different hardware and operating systems. A MOM software (115) uses a common transport protocol such as HTTP or TCP/IP to perform asynchronous communication with the applications. The messages received by the MOM are sent to the Message Broker, which looks into their content and routes them to the queue of the MOM for a target application. The Message Broker software layer has been adapted to the application environment: it knows the message formats and the processing rules, and supports different types of messages. The Message Broker, according to the preferred embodiment of the invention, is able to receive e-mail like type messages.
Fig. 2 describes the program logical blocks of the Message Broker (200) according to the preferred embodiment. The dotted lines describe the path used by an e-mail like type message in the Message Broker logical blocks. The Message Broker receives messages from applications through an INPUT/OUTPUT interface logical block (250). The INPUT/OUTPUT interface may be of different types: the Message Broker may have a direct API interface with an application (241, 215) or may have an interface to the MOM which handles (235, 245, 265) a common transport layer for all the applications (HTTP as an example). The output of this interface logical block is a byte stream containing information which needs to be processed by the Message Broker and transmitted to the target application. An e-mail like type message is received via the MOM (245) by the Message Broker. A message type identification block (255) switches the message byte stream to one of the parser logical blocks (260, 265, 270). In Fig. 2 more than one parser is represented, each of them corresponding to a specific message type. Some parsers may be provided by the Message Broker; others may have been added using the parser plug-in API available in most Message Brokers. The e-mail like type message is sent to a logical block (270) which executes generic code able to parse any e-mail like type message. To specify the e-mail type of message, the message type identification block (255) provides to the e-mail type message parser (270) a pointer to a specific LL(k) grammar Rule-Tree saved in a Rule-Tree data base (275). As explained later in the document in reference to the same figure, the LL(k) grammar Rule-Tree has been previously created via a new logical block (242). Thus, the logical block executing the generic parsing code (270) receives as input data the message byte stream and a pointer to a specific Rule-Tree.
The syntactical analysis of the message performed in the parsing logical block consists in reading the message and identifying the matches with the rules in the Rule-Tree. If no match has been found with the rules, the output of the parsing may be an exception error transmitted to the Message Broker, which sends back an error to the originator application. If matches are found during the syntactical analysis of the message, the message follows the grammar rules and the output of the parsing logical block is a logical representation of the message which has been built in memory during the message analysis. It is noted that most of the Message Brokers have a tree based representation of the message. Thus, most of the time, the parsing logical block provides to the Message Broker a tree based representation of the e-mail like type message in memory. The Message Broker uses the Message-Tree to create a message content as expected by the target application in the Message Content Processing logical block (280). The process is performed according to processing rules (290) defined by the EAI environment administrator. The processing rules are used to analyze a specific message semantic content (a message format to update a banking account, for instance) and to perform logical routing. According to specific processing rules, a new field expected by a target application may be computed in the Message Content Processing logical block (280). The message content is then sent to one or more target application(s) or one or more specific message queues in the MOM through the INPUT/OUTPUT interface logical block (250). The message formats (295) are applied on the output interface (250) for producing the physical byte stream as expected by the target application(s).
The generic code for parsing e-mail like types of messages is a LL(k) grammar based syntactical analyzer. This analyzer reads a byte stream message and, referring to a Rule-Tree representing the LL(k) grammar rules in memory, finds the possible matches in the message with the grammar rules. Coming back to the Message Broker, in the preferred embodiment, a logical block (242) is added for allowing an EAI administrator to enter on the computer the description of the rules of a LL(k) grammar corresponding to the syntax of a specific e-mail type of message. This logical block builds in memory a tree based representation of the grammar rules. The Rule-Tree may be stored in a data base for further use.
It is noted that, in the preferred embodiment, the grammar rule entry is performed through a graphical user interface. As a matter of fact, grammar rules can easily be represented graphically as a tree, the Rule-Tree being the logical view of the graphical representation. It is also noted that, still in the logical block for Rule-Tree preparation, an existing Rule-Tree in memory can be edited to modify the grammar rules.
Each Rule-Tree created by the EAI administrator corresponds to a LL(k) grammar syntax and thus to one e-mail like type of message. As explained earlier, access to the Rule-Tree corresponding to a given e-mail like message type is given to the parser (270) by the Message Type Identification logical block (255). The parser will perform the syntactical analysis of a byte stream corresponding to the e-mail like type identified in the previous logical block and will build in memory a Message-Tree representing the message in the way expected by the following logical block of the Message Broker (280). Consequently, by creating a Rule-Tree, the EAI administrator can dynamically enable the generic e-mail like type message parser (270) to support parsing of messages of the corresponding e-mail like type.
This is made possible because a syntax grammar is usually described with a notation called a syntactic meta-language. This notation is used to define the set of rules that describe the structure of programming languages. The usual notations derive from BNF (Backus-Naur Form), a notation that started with the Algol 60 programming language (Naur, 1960). A more recent form, called EBNF (extended BNF), is described in the ISO/IEC 14977 standard. As an example, Unix systems provide the lex and yacc tools for building tokenizers and parsers, which use a form of BNF. LL grammars are used for compiling context free based programming languages; the first L stands for left to right scanning of the input source code and the second L for producing the leftmost derivation (also known as top down parsing). The LL grammar based compilers generate source code based on dedicated function calls with names matching the rules.
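As an illustration of top down parsing with functions named after rules, here is a hand written sketch for a toy grammar `address : dotted '@' dotted ; dotted : word ('.' word)*`. The grammar, the class `Parser` and the tokenizer are assumptions made for this example; they are not taken from RFC2822 or from any generated compiler. One token of lookahead (`peek`) decides every step, so this toy grammar is LL(1).

```python
import re

# Hypothetical token shapes: words, or the single characters "." and "@".
TOKEN = re.compile(r"[A-Za-z0-9-]+|[.@]")


def tokenize(text):
    """Split text into tokens; characters that match nothing are skipped."""
    return TOKEN.findall(text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Look at the next token without consuming it (lookahead of 1)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r} at token {self.pos}")
        self.pos += 1

    def word(self):
        tok = self.peek()
        if tok is None or tok in (".", "@"):
            raise SyntaxError(f"expected a word at token {self.pos}")
        self.pos += 1
        return tok

    def dotted(self):
        """dotted : word ('.' word)*"""
        parts = [self.word()]
        while self.peek() == ".":
            self.pos += 1
            parts.append(self.word())
        return ".".join(parts)

    def address(self):
        """address : dotted '@' dotted"""
        local = self.dotted()
        self.expect("@")
        domain = self.dotted()
        if self.peek() is not None:
            raise SyntaxError("unexpected trailing tokens")
        return {"local-address": local, "domain": domain}


parsed = Parser(tokenize("Dupont.Marc@Negas.WBA")).address()
```

Here every rule is fixed in code, so changing the grammar means changing and redeploying the parser.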
Fig. 3 shows the general flowchart of the method of a Message Broker according to the preferred embodiment. According to the preferred embodiment, the EAI administrator has to enable the Message Broker to support an e-mail like type of message (300). The EAI administrator preferably uses a graphical editor to enter a graphical representation of the LL(k) grammar rules corresponding to the syntax of this e-mail like type of message. A programming language or a descriptive language such as XML could be used to enter the grammar rules, but this operating mode is not preferred as it is less dynamic than a graphical editor. At the end of this entry a Rule-Tree is created in memory and saved for further use by the Message Broker. The Rule-Trees are generally stored in a data base accessible by the computer operating the EAI programs. The administrator will generate a new reference to this Rule-Tree in the Message Broker so that the Message Type Identification is able to identify this new type of message and this new Rule-Tree. The tools for generating and installing a new parser in a Message Broker are part of the basic functions of Message Brokers.
Once the Rule-Tree is generated, the Message Broker can at any time handle the e-mail like type of messages whose syntax follows the grammar rules described in the Rule-Tree. When such a message is received by the Message Broker, a Message Type Identification is performed and the message is transmitted to a parser, which can be of any type. The Message Type Identification logical block uses information in the message headers, or specified in its own properties, to determine how the message body is parsed:
- If the message has a recognizable routing header, the domain identified in the header is used to decide which message parser is invoked.
- If the message does not have a recognizable header, or the header does not identify the domain, but the properties in the Message Type Identification logical block indicate the domain of the message, the parser specified by the property is invoked. The parser specified can be a plug-in parser such as an e-mail like type message parser.
- If the message domain cannot be identified, the message is handled as a binary object.
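The three cases above amount to a simple precedence rule, which can be sketched as follows. The function name `select_domain`, the use of `None` for "not identified" and the domain labels `"binary"` and `"email-like"` are assumptions of this sketch, not names used by any Message Broker product.

```python
from typing import Optional

BINARY_DOMAIN = "binary"  # hypothetical label for the fallback binary object handling


def select_domain(header_domain: Optional[str],
                  property_domain: Optional[str]) -> str:
    """Pick the message domain, and hence the parser, for an incoming message.

    header_domain   -- domain found in a recognizable routing header, else None
    property_domain -- domain configured in the identification block, else None
    """
    if header_domain is not None:
        return header_domain      # case 1: the header decides
    if property_domain is not None:
        return property_domain    # case 2: the block's own properties decide
    return BINARY_DOMAIN          # case 3: handled as a binary object


choice = select_domain(None, "email-like")
```

A plug-in parser such as the generic e-mail like parser would then be looked up under the returned domain.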
If the message is of the e-mail type corresponding to the new Rule-Tree (answer Yes to test 315), the message byte stream is provided to the generic parser for e-mail like type message interpretation. This interpretation step (330) consists in reading the message and analyzing whether its syntax is in line with the grammar rules read in the tree. As described later in the document in reference to Fig. 6, the Rule-Tree branches are explored until a rule match or rule matches with the content of the message are found. A Message-Tree is built while looking for matches: one branch is created each time a match is found. At the end of the interpretation a Message-Tree is stored in memory in a format adapted to the Message Broker processing step (380) following the parsing. The parsing step, according to the preferred embodiment, is an interpretation rather than a compilation because its execution results in the creation in memory of directly usable data, the Message-Tree. The Message-Tree is used in the following step of the Message Broker to create a message content (360) as expected by the target application(s) receiving it. The message is then routed (370) towards the target application(s). These last two steps have not changed in the Message Broker.
Fig. 4 describes the steps of dynamic creation of a Rule-Tree by the EAI administrator. When the administrator starts the Rule-Tree creation (400), a graphical tree editor is started (410). If there is a LL(k) grammar rule or node to enter (answer yes to test 420), a node is drawn as a leaf and/or a rule is drawn as a branch (430) of a tree. The corresponding logical representation of the leaves and branch is then created in memory (440). If there is no more new node or rule to enter (answer no to test 420), the tree as drawn is saved in memory. The saving can be done in a data base having a repertory in which one entry contains one initial rule name of a LL(k) grammar Rule-Tree and a pointer to it. Also, when the administrator changes an existing rule, he can start the Rule-Tree creation (400) and choose to start it on an already saved Rule-Tree that he identifies by its initial rule name. The graphical editor (410) retrieves the Rule-Tree, displays it and allows deletion of a tree, a branch or a leaf, or entry of a new leaf or new branch in the case he has a new rule or node to enter (answer yes to test 420).
One example of a LL(k) grammar rule is:
StopChkAudRs : ( [Status] RqUID [AsyncRqUID CustId] ) [ [RecCtrlOut ] (DepAcctId | CardAcctId | LoanAcctId ) ] [SelRangeDt ChkRange] (StopChkMsgRec )*];
This LL(k) grammar rule is graphically represented as a tree in Fig. 7. Fig. 7 is a visual representation of the LL(k) grammar rule above given as an example. It is represented as a part of a tree of the complete LL(k) grammar rules on the display of the EAI administrator operating the graphical editor activated by the Rule-Tree preparation operation (in logical block 242 of Fig. 2) .
StopChkAudRs (700) is the named rule that is represented as a tree. When the attribute Seq, for sequence, is indicated (710, 720, 730), the following sequence of rules must match in the order noted. When the attribute SWT, for switch, is indicated (760), any of the rules can match. + is a sign used in this graphical representation to indicate that the element is not terminal and is a reference to another named rule. In the example of the rule in Fig. 7 only three elements (740, 750, 770) are terminal/final elements.
The non terminal element 770 has a cardinality from 1 to n, which means that it may be repeated up to n times. The dotted lines represent optional paths (the cardinality of these paths may be 0 or 1). This is why the cardinality of element 770, which is reachable via a dotted line, may be from 0 to n: it may not be found at all or may be repeated up to n times.
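The arithmetic behind this remark can be written down as a small sketch. It assumes, as a modelling choice of this example only, that a cardinality is a pair (minimum, maximum) with `None` standing for the unbounded "n"; the function name `compose` is hypothetical.

```python
def compose(path, element):
    """Effective cardinality of an element reached through a path.

    Both arguments are (minimum, maximum) pairs; maximum None means unbounded.
    """
    (path_min, path_max), (elem_min, elem_max) = path, element
    if path_max == 0 or elem_max == 0:
        top = 0                    # the element can never occur
    elif path_max is None or elem_max is None:
        top = None                 # unbounded on either side stays unbounded
    else:
        top = path_max * elem_max
    return path_min * elem_min, top


# A dotted (optional) path has cardinality 0..1; the element itself is 1..n.
effective = compose((0, 1), (1, None))
```

With these inputs the minimum becomes 0 and the maximum stays unbounded, matching the "0 to n" reading of the figure.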
Fig. 5 is a visual representation of a general LL(k) grammar as a tree on the display of the EAI administrator operating the graphical editor activated by the Rule-Tree preparation operation (in logical block 242 of Fig. 2). The e-mail like message type syntax can be well defined by the rules of a LL(k) Grammar. A syntax grammar usually is described with a notation called a syntactic meta-language. The various syntactic parts are named. Terminal elements are usually called tokens and rules aggregating those terminals are called non-terminals . The valid sequences of symbols for the language or the message are described using the notation. The syntax of any valid language or message structure is also described.
The tree (505) has an entry point which is the initial rule name table (510). Two branches (also called children) (515, 580) are attached to this table, which is the root of the tree; each has a main node (520, 580) which is named. The first branch (515) is the main branch of the tree. In this first branch there are two references (550, 570) to the second branch (580). This implies that when exploring the first branch (515) three different sub branches are found (530, 560, 570), each of them either having a sub sub branch (540, 550) or being a leaf (530, 560, 570). A leaf may be a reference to a named rule (550, 570), which implies a virtual extension of the branch of the tree towards a new branch (580) having an entry point stored in the Rule Name hash table.
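The idea of a reference leaf that virtually extends a branch through the rule name table can be sketched with a dictionary of named branches. The class `Node`, the function `expand`, the depth guard and all labels below are hypothetical; the layout only loosely mirrors Fig. 5 (one main branch holding two references to a second branch).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None  # name of a rule in the table, for reference leaves


def expand(node, table, depth=0, max_depth=8):
    """Yield (depth, label) pairs, following reference leaves through the table.

    max_depth is a guard of this sketch against endlessly recursive grammars.
    """
    yield depth, node.label
    if node.ref is not None and depth < max_depth:
        yield from expand(table[node.ref], table, depth + 1, max_depth)
    for child in node.children:
        yield from expand(child, table, depth + 1, max_depth)


# Rule name table: the entry point of the whole Rule-Tree.
table = {
    "first": Node("first", [
        Node("token-a"),
        Node("group", [Node("use-second", ref="second")]),
        Node("use-second", ref="second"),
    ]),
    "second": Node("second", [Node("token-b")]),
}

labels = [label for _, label in expand(table["first"], table)]
```

The second branch is stored once but is visited twice during exploration, once per reference.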
In a LL(k) grammar, each node in the Rule-Tree has an associated type and, optionally, attributes. The types can be rule, loop, any, match, and switch. The attributes may define the lookahead depth (k), the token identifier to be found, and the optional attribute for the node. The use of the node types and attributes is explained later in the document in reference to Fig. 6 describing the method for performing a syntactical analysis of a byte stream message in accordance with the LL(k) grammar rules described in the Rule-Tree.
The e-mail like type messages sent by the Message Type Identifier according to the preferred embodiment are analyzed by the parser according to a specific LL(k) grammar Rule-Tree. The parser accesses the entry of the Rule-Tree by reading the Rule-Tree repertory of the Rule-Tree data base. The steps of the flow chart of Fig. 6 describe the exploration of the message to find matches with the different branches of the Rule-Tree.
As an example, an EAI administrator may decide to create and include in the Message Broker a Rule-Tree for RFC2822 defined e-mail messages according to the preferred embodiment.
The following e-mail is provided as a byte stream to the e-mail parser:
Received: from x.y.test by example.net via TCP with ESMTP id ABC12345 for <mary@example.net>; 21 Nov 1997 10:05:43 -0600
Received: from machine.example by x.y.test; 21 Nov 1997 10:01:22 -0600
From: John Doe <jdoe@machine.example>
To: Mary Smith <mary@example.net>, from@domain,":sysmail"@ Some-Group.Some-Org,Guess.(I am the greatest) Who @(the)Vegas.WBA
Reply-To: reply@domain
Sender: sender@domain
Subject: Saying Hello, test of a subject with @ special \ ( characters ) ( inside of the #~ subject
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID : <1234 @ local.machine.example>

This is a message just to say hello. So, "Hello".
Once parsed according to the method described, the following Message-Tree is obtained:
Message
  +--> Received: from x. etc
  +--> Received: from machine, etc
  +--> From:
         +--> Angle address
                +--> display-name : John Doe
                +--> local-address: jdoe
                +--> domain : machine.example
  +--> To:
         +--> Angle address
                +--> display-name : Mary Smith
                +--> local-address: mary
                +--> domain : example.net
         +--> Classic address
                +--> local-address: from
                +--> domain : domain
         +--> Quoted address
                +--> local-address: :sysmail
                +--> domain : Some-Group.Some-Org
         +--> Classic address
                +--> local-address: Guess.Who
                +--> domain : Vegas.WBA
  +--> Reply-To:
         +--> Classic address
                +--> local-address: reply
                +--> domain : domain
  +--> Sender:
         +--> Classic address
                +--> local-address: sender
                +--> domain : domain
  +--> Subject: Saying Hello, test of a subject
  +--> Date: Fri, 21 Nov 1997
  +--> Message-ID: <1234@local.machine
  +--> Body of message This is a message
The message parsing consists in creating a Message-Tree if the message byte stream follows the rules of the grammar as described in the Rule-Tree. The type and attributes of each node of the Rule-Tree, the tree representation of the LL(k) grammar rules, are read in order to find a match with the input byte stream message. In the branches of the Rule-Tree, a node of type "rule" aggregates other nodes and has a name that can be referenced in other rules, such as by a match node. If the node type is "loop", the generic code loops over all of the children until a token match fails. If the node type is "any", all of the node children are tested for a match. If the type of the node is "match", then the token identifier or the named rule is matched. If the type of the node is "switch", then an attribute specifies the name of the parser that will proceed with the subsequent analysis of the message and then return to the current parser. By default all node children are processed in sequence. Any node with the optional attribute will not generate a parsing failure if no recognition occurs in the node. The generic code will produce the Message-Tree, an in memory representation of the syntax tree of the parsed byte stream message buffer.
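The node types just described can be sketched as a small interpreter. This is a simplified illustration under several assumptions, not the generic code of the preferred embodiment: nodes are plain dictionaries, the input is a list of already tokenized (identifier, text) pairs rather than a byte stream, a loop may match zero times, "any" keeps the first child that matches, the "switch" type is left out, and a failed attempt simply resumes from the saved position instead of using a k token lookahead. The helper names `tok`, `ref`, `sequence`, `parse` and `dotted` are hypothetical.

```python
def tok(name, **attrs):
    """Leaf matching one token identifier."""
    return {"type": "match", "token": name, **attrs}


def ref(name, **attrs):
    """Leaf referencing a named rule."""
    return {"type": "match", "rule": name, **attrs}


def sequence(children, rules, tokens, pos):
    """Match all children in order; return (subtrees, new_pos) or None."""
    found = []
    for child in children:
        got = parse(child, rules, tokens, pos)
        if got is None:
            return None
        found.extend(got[0])
        pos = got[1]
    return found, pos


def parse(node, rules, tokens, pos=0):
    """Interpret one Rule-Tree node at tokens[pos:]."""
    kind, result = node["type"], None
    if kind == "match" and "token" in node:
        if pos < len(tokens) and tokens[pos][0] == node["token"]:
            result = ([tokens[pos][1]], pos + 1)
    elif kind == "match":                      # reference to a named rule
        result = parse(rules[node["rule"]], rules, tokens, pos)
    elif kind == "rule":                       # children in sequence, named branch
        got = sequence(node["children"], rules, tokens, pos)
        if got is not None:
            result = ([{node["name"]: got[0]}], got[1])
    elif kind == "any":                        # first alternative that matches
        for child in node["children"]:
            result = parse(child, rules, tokens, pos)
            if result is not None:
                break
    elif kind == "loop":                       # repeat until an iteration fails
        found, cur = [], pos
        while True:
            got = sequence(node["children"], rules, tokens, cur)
            if got is None or got[1] == cur:   # failed, or consumed nothing
                break
            found.extend(got[0])
            cur = got[1]
        result = (found, cur)
    else:
        raise ValueError(f"unsupported node type: {kind}")
    if result is None and node.get("optional"):
        result = ([], pos)                     # optional nodes never fail
    return result


def dotted(name):
    """Rule: word ('.' word)*"""
    return {"type": "rule", "name": name, "children": [
        tok("word"),
        {"type": "loop", "children": [tok("dot"), tok("word")]},
    ]}


RULES = {
    "address": {"type": "rule", "name": "address", "children": [
        ref("local-address"), tok("at"), ref("domain")]},
    "local-address": dotted("local-address"),
    "domain": dotted("domain"),
}
TOKENS = [("word", "jdoe"), ("at", "@"), ("word", "machine"),
          ("dot", "."), ("word", "example")]
tree, end = parse(RULES["address"], RULES, TOKENS)
```

Because the grammar lives in `RULES` as data, supporting a new message type means supplying a new dictionary, while `parse` stays unchanged; this is the property the Rule-Tree approach relies on.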
Fig. 6 illustrates a different embodiment of the invention, in which the LL(k) grammar Rule-Tree created by the EAI administrator is in fact a Rule-Object model as used in object oriented programming languages. Furthermore, the generic code for parsing is object oriented programming code using the LL(k) grammar Rule-Object model dynamically created by the EAI administrator. The object oriented programming language can be either Smalltalk or Java (Java is a trademark of Sun Microsystems in certain countries) or any other object oriented programming language. The result of the parsing of an e-mail like type message is a Message-Object, which is a more general way to logically represent the message than a Message-Tree. This logical representation can be used, for instance, by the further processing of the Message Broker. The flow chart of Fig. 6 describes a part of the method for parsing an e-mail like type message according to this specific object oriented implementation of the invention. This part of the method is for looking for a match between the message syntax and a specific rule of the LL(k) grammar (Rule A). To perform the complete parsing of the e-mail like type message, this part is repeated with all the rules of the LL(k) grammar which depend on the initial rule until one match is found. If Rule A is the initial rule of the LL(k) grammar, the result of the process illustrated in Fig. 6 is the result of the parsing of the e-mail like type message. If the result of the rule A match process (680) is OK(a), a is the Message-Object. If the result is NotOk, an exception error will be issued for this message.
In Fig. 6, it is assumed that the EAI administrator has entered the rules of a LL(k) grammar corresponding to the syntax of the e-mail like type messages that one wants to parse. The EAI administrator has created a Rule-Object model. The process for Rule A matching (605) is triggered by a first message "getObject(s)" where s is the e-mail like type message byte stream. The purpose of a getObject message is to check whether there is a match between the rule and the byte stream and, if there is a match, to get back the resulting Message-Object. One other message is used in the match process, "hydrateAsNeeded(a, s)", a being a buffer allocated for the Message-Object which will be filled during the process of matching. The message "hydrateAsNeeded" means that the allocated Message-Object buffer must be filled or completed by other elements. In the example of Fig. 6, it is assumed that Rule A comprises the successive execution of two rules, Rule B and Rule C; each of these rules must be matched n times, that is with a certain "Cardinality". If the cardinality is zero the occurrence need not be verified; if the cardinality is n, the occurrence must be verified n times and, if it is not the case, there is an error. The same process is described below using the object representations.
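The getObject and hydrateAsNeeded contract can be illustrated with a small object model. The text names Smalltalk and Java as candidate languages; Python is used here only to keep the sketch short. The classes `Stream`, `Literal`, `Cardinality` and `SequenceRule`, the (low, high) range used to express a cardinality, and the choice of literal strings "b" and "c" for rules B and C are all assumptions of this sketch. Returning `None` plays the role of NotOk and returning an object plays the role of OK.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    data: str
    pos: int = 0  # current cursor in the stream


class Literal:
    """Stand-in for a rule process: matches one fixed piece of text."""
    def __init__(self, text):
        self.text = text

    def get_object(self, stream):
        if self.text and stream.data.startswith(self.text, stream.pos):
            stream.pos += len(self.text)
            return self.text
        return None  # NotOk


class Cardinality:
    """Invokes a rule as often as needed and hydrates the target object."""
    def __init__(self, rule, hydrate, low, high=None):
        self.rule, self.hydrate = rule, hydrate
        self.low, self.high = low, high  # high None means unbounded

    def hydrate_as_needed(self, target, stream):
        count = 0
        while self.high is None or count < self.high:
            before = stream.pos
            obj = self.rule.get_object(stream)
            if obj is None:
                break
            self.hydrate(target, obj)      # e.g. a.mb(b)
            count += 1
            if stream.pos == before:       # nothing consumed: stop repeating
                break
        return count >= self.low           # OK only if the minimum is reached


class SequenceRule:
    """Composite rule: every cardinality object must report OK, in order."""
    def __init__(self, factory, cardinalities):
        self.factory, self.cardinalities = factory, cardinalities

    def get_object(self, stream):
        start = stream.pos
        target = self.factory()            # a = new ClassA
        for cardinality in self.cardinalities:
            if not cardinality.hydrate_as_needed(target, stream):
                stream.pos = start         # reset the cursor on NotOk
                return None
        return target


class ClassA:
    def __init__(self):
        self.bs, self.cs = [], []

    def mb(self, b):
        self.bs.append(b)

    def mc(self, c):
        self.cs.append(c)


rule_a = SequenceRule(ClassA, [
    Cardinality(Literal("b"), ClassA.mb, 1, 2),   # rule B, one or two times
    Cardinality(Literal("c"), ClassA.mc, 0, 1),   # rule C, optional
])
s = Stream("bbc")
a = rule_a.get_object(s)
```

When the first cardinality cannot be satisfied, `get_object` returns `None` and leaves the stream cursor where it started, so the invoker can try another rule.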
For instance, in this model Rule A is represented, as illustrated in Fig. 6, by five objects (610, 620, 630, 650, 660): the Rule A object (610), which has ClassA as its instantiation class; the Cardinality object CB (620), which has a method (mb) to hydrate the result (a), and its Rule B process object (630); and the Cardinality object CC (650), which has a method (mc) to hydrate the result (a), and its Rule C process object (660).
The method for finding a match is started (600) by sending the message getObject(s) to the Rule A object, with the message byte stream s to be parsed. The Rule A object prepares an instance a of the class ClassA associated with the rule ("a=new ClassA;") (610).
Then, the Rule A object calls every cardinality object to hydrate this instance.
The hydrateAsNeeded(a, s) message is sent (615) to the first Cardinality object CB. This cardinality object invokes the Rule B process as needed (according to the cardinality rules) by sending (625) a getObject(s) message to the Rule B object (630). Note that the details of the Rule B process are not shown (the process is similar to the Rule A match process).
The result of the message can be OK (with an appropriate b object) or NotOk (635). If an OK is returned, the a object is hydrated by sending it the mb message with the result as parameter, "a.mb(b)". Depending on whether the cardinality specification is satisfied, an OK or NotOk result is returned (640). Note that the result of the cardinality can be OK even if the result of Rule B is NotOk, if Rule B is not mandatory (cardinality may be zero). If the Rule A object is composite (sequence), the process continues from the Rule A object by sending a hydrateAsNeeded(a, s) message to the second cardinality object (645). Similarly to the match verified for Rule B, a match is verified with Rule C as follows. The hydrateAsNeeded(a, s) message is sent (645) to the second Cardinality object CC. This cardinality object invokes the Rule C process as needed (according to the cardinality rules) by sending (655) a getObject(s) message to the Rule C object (660). Note that the details of the Rule C process are not shown (the process is similar to the Rule A match process). The result of the message can be OK (with an appropriate c object) or NotOk (665). If an OK is returned, the a object is hydrated by sending it the mc message with the result as parameter, "a.mc(c)". Depending on whether the cardinality specification is satisfied, an OK or NotOk result is returned (670). Note that the result of the cardinality can be OK even if the result of Rule C is NotOk, if Rule C is not mandatory (cardinality may be zero).
If Rule A is not matched (NotOk received from a cardinality object, e.g. in 640 or 670), a NotOk object is returned and the current cursor in the stream is reset to its initial position (600). In this case, the process continues at the level of the first invoker (the sender of the getObject message 600). If Rule A is matched, an OK result is returned with the a object as the recognized token object (680).
The result of the parsing implemented in object oriented programming code is a Message-Object instance that can be used by the subsequent processing of the Message Broker, which may also be written in an object oriented programming language.
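The Rule A match process of Fig. 6 can be sketched end to end as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: Python is used as one possible object oriented language; `ByteStream`, `LiteralRule`, `Cardinality`, `RuleA` and the `low`/`high` occurrence bounds are hypothetical names; and Rule B and Rule C are reduced to single-byte literal rules for brevity. The names `ClassA`, `mb` and `mc` follow the text.

```python
from typing import Callable, Optional, Tuple


class ByteStream:
    """Message byte stream `s` with a read cursor (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0


class LiteralRule:
    """Stand-in for the Rule B / Rule C objects (630, 660)."""

    def __init__(self, literal: bytes) -> None:
        self.literal = literal

    def get_object(self, s: ByteStream) -> Tuple[bool, Optional[str]]:
        end = s.pos + len(self.literal)
        if s.data[s.pos:end] == self.literal:
            s.pos = end
            return True, self.literal.decode("ascii")   # OK(b) or OK(c)
        return False, None                              # NotOk


class Cardinality:
    """Cardinality object (620, 650); `low`/`high` bounds are an assumption."""

    def __init__(self, rule, low: int, high: int, hydrate: Callable) -> None:
        self.rule, self.low, self.high, self.hydrate = rule, low, high, hydrate

    def hydrate_as_needed(self, a, s: ByteStream) -> bool:
        count = 0
        while count < self.high:
            ok, obj = self.rule.get_object(s)
            if not ok:
                break
            self.hydrate(a, obj)        # corresponds to a.mb(b) or a.mc(c)
            count += 1
        return count >= self.low        # OK with zero matches when low == 0


class ClassA:
    """Message-Object class associated with Rule A."""

    def __init__(self) -> None:
        self.b_parts = []
        self.c_parts = []

    def mb(self, b) -> None:
        self.b_parts.append(b)

    def mc(self, c) -> None:
        self.c_parts.append(c)


class RuleA:
    """Composite (sequence) rule object (610)."""

    def __init__(self, cardinalities) -> None:
        self.cardinalities = cardinalities

    def get_object(self, s: ByteStream):
        start = s.pos
        a = ClassA()                                    # "a=new ClassA;"
        for cardinality in self.cardinalities:
            if not cardinality.hydrate_as_needed(a, s):
                s.pos = start                           # reset cursor on NotOk
                return False, None
        return True, a                                  # OK(a)


rule_a = RuleA([
    Cardinality(LiteralRule(b"B"), 2, 2, ClassA.mb),    # Rule B: exactly twice
    Cardinality(LiteralRule(b"C"), 0, 1, ClassA.mc),    # Rule C: optional
])

ok_stream = ByteStream(b"BBC")
ok, a = rule_a.get_object(ok_stream)

optional_stream = ByteStream(b"BB")
opt_ok, opt_a = rule_a.get_object(optional_stream)

bad_stream = ByteStream(b"BC")
bad_ok, bad_a = rule_a.get_object(bad_stream)
```

With these assumed cardinalities, the stream "BBC" yields an object hydrated by both rules, "BB" still matches because Rule C is optional, and "BC" fails on Rule B after one occurrence, so the cursor returns to its initial position.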

Claims

1. A method for enhancing a Message Broker operating in a data processing system with parsers dynamically created for e-mail like types of messages, said method comprising the steps of:
- for each type of e-mail like types of messages, creating in memory a Rule-Tree which is a tree based logical representation in memory of the rules of the LL(k) grammar corresponding to the type of the e-mail like types of messages;
- when a message is received and identified by the Message Broker as being one of said e-mail like types, reading the corresponding Rule-Tree and trying to match the LL(k) grammar rules on said message;
- if a grammar rule is matched, building in memory the Message-Tree which is a tree based logical representation in memory of the message;
- reading the logical representation of the message built in memory in output of the preceding step execution and using it in the following processing step of the Message Broker.
2. The method of claim 1 wherein if no grammar rule is matched, sending back to the Message Broker an exception error on the syntax of the message because the LL(k) grammar rules were not matched in said message.
3. The method of any one of claims 1 or 2 wherein the step of creating in memory a Rule-Tree further comprises a step of using a graphical user interface to draw the tree representing the grammar rules.
4. The method of any one of claims 1 or 2 wherein the step of creating in memory a Rule-Tree further comprises a step of using a programming language to enter the tree representing the grammar rules.
5. The method of any one of claims 1 to 4 wherein the step of creating in memory a Rule-Tree further comprises an editor allowing a modification or deletion of an existing Rule-Tree.
6. The method of any one of claims 1 to 5 further comprising a step for storing the Rule-Trees in a data base in the step for creating the Rule-Trees and wherein, in the step for reading the corresponding Rule-Tree, the Rule-Tree is read from said data base.
7. The method of any one of claims 1 to 6 wherein the Rule-Tree data base comprises a repertory for pointing to the initial named rules of the Rule-Trees.
8. The method of any one of claims 1 to 7 wherein each Rule-Tree comprises a hash table as a repertory of all the named rules of the LL(k) grammar Rule-Tree.
9. The method of any one of claims 1 to 8 wherein the step for creating a Rule-Tree is for creating a Rule-Object model, wherein the step for reading the Rule-Tree is for reading the Rule-Object model, and wherein the step for building in memory the Message-Tree is for building a Message-Object which is an object oriented logical representation in memory of the message.
10. A computing system comprising means adapted for carrying out the method according to any one of claims 1 to 9.
PCT/EP2003/015015 2003-01-07 2003-11-14 A method and system for dynamically creating parsers in a message broker WO2004061651A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2003303605A AU2003303605A1 (en) 2003-01-07 2003-11-14 A method and system for dynamically creating parsers in a message broker
EP03808294A EP1581869A2 (en) 2003-01-07 2003-11-14 A method and system for dynamically creating parsers in a message broker

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03368006.7 2003-01-07
EP03368006 2003-01-07

Publications (2)

Publication Number Publication Date
WO2004061651A2 true WO2004061651A2 (en) 2004-07-22
WO2004061651A3 WO2004061651A3 (en) 2005-07-14

Family

ID=32695665

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/015015 WO2004061651A2 (en) 2003-01-07 2003-11-14 A method and system for dynamically creating parsers in a message broker

Country Status (4)

Country Link
EP (1) EP1581869A2 (en)
CN (1) CN1997965A (en)
AU (1) AU2003303605A1 (en)
WO (1) WO2004061651A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2354850A (en) * 1999-09-29 2001-04-04 Ibm Message broker using tree structures

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "LL parser" WIKIPEDIA, [Online] 5 November 2002 (2002-11-05), XP002326871 Retrieved from the Internet: <URL:http://en.wikipedia.org/w/index.php?title=LL_parser&oldid=1577428> [retrieved on 2005-05-02] *
DUNN T: "Design and Implementation Considerations for IBM WebSphere MQ Integrator Message Flows" IBM DEVELOPERWORKS, [Online] 20 November 2002 (2002-11-20), pages 1-4, XP002326621 Retrieved from the Internet: <URL:http://www-106.ibm.com/developerworks/websphere/library/techarticles/0211_dunn/dunn.html> [retrieved on 2005-04-28] *
WAGNER U ET AL: "Generic Parser for an Evolving Mapping Language" INTERNET ARTICLE, [Online] December 2000 (2000-12), XP002326870 Retrieved from the Internet: <URL:http://cib.bau.tu-dresden.de/forschungsbericht/res-act/2001/res-act-01.pdf> [retrieved on 2005-05-02] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636787B2 (en) 2005-01-21 2009-12-22 Huawei Technologies Co., Ltd. Parser for parsing text-coded protocol
CN101180627B (en) * 2005-01-28 2011-06-15 爱克发公司 Message-based connectivity manager.
CN112860233A (en) * 2019-11-28 2021-05-28 华为技术有限公司 Target syntax tree generation method and related equipment
CN112860233B (en) * 2019-11-28 2024-03-15 华为云计算技术有限公司 Method for generating target grammar tree and related equipment
CN115712563A (en) * 2022-11-03 2023-02-24 上海安般信息科技有限公司 Grammar variation-based fuzzy test method

Also Published As

Publication number Publication date
CN1997965A (en) 2007-07-11
EP1581869A2 (en) 2005-10-05
AU2003303605A8 (en) 2004-07-29
AU2003303605A1 (en) 2004-07-29
WO2004061651A3 (en) 2005-07-14

Similar Documents

Publication Publication Date Title
JP3272014B2 (en) Method and apparatus for creating a data processing dictionary including hierarchical data processing information
US7391735B2 (en) Parsing messages with multiple data formats
US8701080B2 (en) Template components having constraints representative of best practices in integration software development
US8495136B2 (en) Transaction-initiated batch processing
CN111338637B (en) Code generation method and device
JPH11272667A (en) Method and device for preparing structured document and storage medium stored with program for preparing structured document
KR20010024487A (en) Method and apparatus for structured communication
KR20110066087A (en) Consolidating duplicate messages for a single destination on a computer network
US20200111487A1 (en) Voice capable api gateway
CN112764726A (en) Data synthesis method and device
US6829758B1 (en) Interface markup language and method for making application code
WO2004061651A2 (en) A method and system for dynamically creating parsers in a message broker
CN110311826B (en) Network equipment configuration method and device
CN115168365B (en) Data storage method and device, electronic equipment and storage medium
US11973595B2 (en) Recast repetitive messages
CN113626001A (en) API dynamic editing method and device based on script
Bacchiani et al. A session subtyping tool
EP1584027A2 (en) A method and system for improving message syntactic analysis in a message broker
CN113296745B (en) Data processing method and device, computer readable storage medium and processor
CN112181474B (en) Block chain service processing method, electronic device and computer storage medium
CN117707503A (en) Method, apparatus, device, storage medium and program product for generating code
JP2004265164A (en) Service cooperation system and service cooperation method between client and server using data transfer protocol
CN118535136A (en) API request function specification management method, terminal equipment and medium
Sinnott et al. Finite state machine based SDL
CN117193890A (en) Method and device for realizing unified mode call by integrating external interfaces

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 20038A83463

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2003808294

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2003808294

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2003808294

Country of ref document: EP

NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP