CA2246943A1 - Method and means for encoding storing and retrieving hierarchical data processing information for a computer system - Google Patents

Method and means for encoding storing and retrieving hierarchical data processing information for a computer system

Info

Publication number
CA2246943A1
CA2246943A1 CA002246943A CA2246943A CA2246943A1 CA 2246943 A1 CA2246943 A1 CA 2246943A1 CA 002246943 A CA002246943 A CA 002246943A CA 2246943 A CA2246943 A CA 2246943A CA 2246943 A1 CA2246943 A1 CA 2246943A1
Authority
CA
Canada
Prior art keywords
data
data stream
definition
command
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002246943A
Other languages
French (fr)
Inventor
Dora Vell
Peter K. Shum
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IBM Canada Ltd
Original Assignee
Ibm Canada Limited-Ibm Canada Limitee
Dora Vell
Peter K. Shum
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ibm Canada Limited-Ibm Canada Limitee, Dora Vell, Peter K. Shum filed Critical Ibm Canada Limited-Ibm Canada Limitee
Priority to CA002246943A priority Critical patent/CA2246943A1/en
Publication of CA2246943A1 publication Critical patent/CA2246943A1/en
Abandoned legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]

Abstract

A data transmission dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data transmission information. The dictionary comprises a group of one or more computer searchable definition trees relating to transmission information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of the command, reply or data to which it relates. Each tree includes a root node identified by name, e.g. a codepoint. The root node includes information describing the type of definition tree concerned (i.e. whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes. These nodes represent components of the definition represented by the tree. The descendant nodes include level information describing the level of the node within its tree.
The nodes may include attribute information, and may include value requirements relating to transmission information represented by the nodes.

Description

CA 02246943 1998-09-04
METHOD AND MEANS FOR ENCODING STORING AND RETRIEVING
HIERARCHICAL DATA PROCESSING INFORMATION FOR A COMPUTER
SYSTEM

FIELD OF THE INVENTION
This invention relates to data processing and storage systems and in particular to methods and means for specifying the syntax of a hierarchical language for use in data transmissions of such systems.
BACKGROUND OF THE INVENTION
Data processing, for instance distributed processing, requires a connection protocol that defines specific flows and interactions. These flows and interactions convey the intent and results of distributed processing requests. The protocol is necessary for semantic connectivity between applications and processors in a distributed environment. The protocol must define the responsibilities between the participants and specify when flows should occur and their contents. Distributed applications allow operations to be processed over a network of cooperating processors.

Clients and servers send information between each other using that set of protocols.
These protocols define the order in which messages can be sent and received, the data that accompanies the messages, remote processor connection flows, and the means for converting data that is received from foreign environments.

The client provides the connection between the application and the servers via protocols.
It supports the application end of the connection by: (1) Initiating a remote connection (2) Translating requests from the application into the standardized format, otherwise known as generating, (3) Translating replies from standardized formats into the application format, otherwise known as parsing, (4) Disconnecting the link from the remote processor when the application terminates or when it switches processors.

The server responds to requests received from the client. It supports the server end of the connection by: (1) Accepting a connection (2) Receiving input requests and data and converting them to its own internal format (parsing) (3) Constructing (generating) and sending standardized reply messages and data.

A particular distributed data processing architecture uses the Distributed Data Management Architecture (DDM) for the standardized format of the messages. DDM
provides the conceptual framework for constructing common interfaces for command and reply interchange between a client and a server. Most DDM commands have internal statement counterparts.

BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 depicts DDM Objects.
Fig. 2 depicts a DDM Object Interchange Format.
Fig. 3 depicts a flowchart illustrating depth first searching.
Figs. 4a and 4b illustrate an example DDM Object: Root Node as defined in the architecture.
Figs. 5a and 5b illustrate an example of the Root Node OPNQRY.
Fig. 6 comprises a diagram representing a method of constructing the definition for loosely coupled files.
Fig. 7 illustrates a tree for the Command portion of ACCRDBRM.
Fig. 8 depicts an example of retrieving a definition for the LCF method.
Fig. 9 depicts a CASE method as used in RSM.
Fig. 10 comprises a diagram representing the construction of a DDM definition by the root storage method.
Fig. 11 depicts an example of retrieving a definition for the RSM method.
Fig. 12 depicts an ADDG Flowchart.
Fig. 13 depicts a flowchart for step 1 of ADDG; generate DDMTXT.
Fig. 14 depicts a flowchart for step 2 of ADDG; create DDM definitions.
Fig. 15 depicts a flowchart for step 3 of ADDG; assemble DDM definitions.

Fig. 16 depicts ADDG tool pseudocode.
Figs. 17a-l depict an implemented DDM dictionary and retrieval method in accordance with the instant invention.
Fig. 18 comprises a representation of a DDM Command in the form of a tree.
Fig. 19 illustrates the DDM Dictionary Definition Syntax.
Fig. 20 depicts parsers and generators in a Distributed System.
Fig. 21 illustrates a tree for the Command portion of OPNQRY.
Fig. 22 illustrates a tree for the Command Data portion of OPNQRY.
Fig. 23 illustrates a tree for the Reply Data portion of OPNQRY.
Fig. 24 depicts the method of parsing and generation employed by the instant invention.

DEFINITIONS
The following definitions are provided to assist in understanding the invention described below. Additional information may be found in the manual, "IBM Distributed Data Management Architecture Level 3: Reference, SC21-9526".

DSS (Data Stream Structure): DDM can be viewed as a multi-layer architecture for communicating data management requests between servers located on different data processing systems. All information is exchanged in the form of objects mapped onto a data stream appropriate to the communication facilities being used by DDM. A data stream structure is a set of bytes which contains, among others, information about whether the enclosed structure is a request, reply, or data (an object structure); whether the structure is chained to other structures; etc. There are three general types of DDM data stream structures: "request structures" (RQSDSS), which are used for all requests to a target system for processing; "reply structures" (RPYDSS), which are used for all replies from a target system to a source system regarding the conditions detected during the processing of the request; and "object structures" (OBJDSS), which are used for all objects sent between systems.

Mnemonic: specifies a short form of the full name of a DDM object.

Class: describes a set of objects that have a common structure and respond to the same commands.

Codepoint: A codepoint (code point) specifies the data representation of a dictionary class. Codepoints are hexadecimal synonyms for the named terms of the DDM
architecture. Codepoints are used to reduce the number of bytes required to identify the class of an object in memory and in data streams.

Command: Commands are messages sent to a server to request the execution of a function by that server. For example the command "Get_Record" can be sent to a file system. Each DDM command normally returns (results in the sending of) one or more reply messages or data objects.

DDM commands can be described under four headings:

1. Description. The description part usually includes a Command Name or the mnemonic name of the command such as "OPNQRY"; and an Expanded Name such as "Open Query" that is a description of the command function.
2. Parameters. The parameters or instance variables describe the objects that can (or must be) sent as parameters of the command. The parameters can be sent in any order because they are identified by their class codepoints. The parameters are generally associated with a set of attributes (characteristics):
(a) Required, Optional or Ignorable. A Required attribute specifies that support or use of a parameter is required: when a parameter is specified as being required in a parameter list for a command, the parameter must be sent for that command. All receivers supporting the command must recognize and process the parameter as defined. When specified in the parameter list of a reply message, the parameter must be sent for that reply message. All receivers must accept the parameter. An Optional attribute specifies that support or use of a parameter is optional. When a parameter is specified as being optional in a parameter list for a command, the parameter can optionally be sent for that command. All receivers supporting the command must recognize and process the parameter as defined and use the default value if it is not sent. When specified in the parameter list of a reply message, the parameter can optionally be sent for that reply message. All receivers must accept the parameter. An Ignorable attribute specifies that a parameter can be ignored by the receiver of a command if the receiver does not provide the support requested. The parameter can be sent optionally by all senders. The parameter must be recognized by all receivers. The receiver is not required to support the architected default value and does not have to validate the specified value;
(b) Repeatable or Not Repeatable. A Repeatable attribute specifies that a parameter can be repeated in the value of the object variable being described. There are no requirements that the elements of the list be unique or that the elements of the list be in any order;
(c) Length characteristic: This describes the length requirements or restrictions of the corresponding data transmission.
3. Command Data: the list of all the possible classes of data objects (for example, records) that can be associated with the command. Each data object is generally associated with a set of attributes (characteristics), as are the parameters.
4. Reply Data: The reply data section lists all possible classes of data objects that can be returned for the command. The list may contain notes about selecting the data objects to return. These are the reply data objects that are normally returned for the command. When exception conditions occur, the reply data objects may not be returned; instead, reply messages may return a description of the exception conditions.
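The Required and Repeatable attributes described under heading 2 lend themselves to a table-driven check. The following is a minimal sketch only: the class `ParamRule`, the function `check_parameters` and the codepoint values are hypothetical and are not taken from the DDM architecture, and the Ignorable attribute and length characteristic are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRule:
    """Hypothetical attribute record for one parameter of a command."""
    required: bool = False
    repeatable: bool = False


def check_parameters(rules, sent):
    """Check the codepoints in `sent` (arrival order) against `rules`.

    Order is never checked, because parameters are identified by codepoint.
    Returns a list of human-readable problems (empty when the list is valid).
    """
    errors = []
    for codepoint in sent:
        if codepoint not in rules:
            errors.append(f"unrecognized 0x{codepoint:04X}")
    for codepoint, rule in rules.items():
        count = sent.count(codepoint)
        if rule.required and count == 0:
            errors.append(f"missing required 0x{codepoint:04X}")
        if not rule.repeatable and count > 1:
            errors.append(f"repeated 0x{codepoint:04X}")
    return errors


# Illustrative rules with made-up codepoints.
RULES = {
    0x0001: ParamRule(required=True),
    0x0002: ParamRule(repeatable=True),
    0x0003: ParamRule(),
}
```

With these rules, `check_parameters(RULES, [0x0002, 0x0001, 0x0002])` reports no problems regardless of the order in which the parameters arrive.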

All DDM commands may be enclosed in a RQSDSS before transmission:

RQSDSS(command(command parameters))

All DDM command data objects and reply data objects may be enclosed in an OBJDSS structure for transmission:

OBJDSS(command-data-object(object parameters))
OBJDSS(reply-data-object(object parameters))

All DDM command replies may be enclosed in a RPYDSS structure for transmission:

RPYDSS(command-reply(reply parameters))

Parsing: the process of verifying syntactic correctness of a DDM string (DDM stream), and of translating it into a recognizable internal format.

Generation: the process of creating a valid DDM string from an internal format.

Tree: A tree structure is either: (a) an empty structure, or (b) a node with a number of subtrees which are acyclic tree structures. A node y which is directly below node x is called a direct descendant of x; if x is at level i and y is at level i+1, then x is the parent of y and y is the child of x. Also, x is said to be an ancestor of y. The root of the tree is the only node in the tree with no parent. If a node has no descendants it is called a terminal node or a leaf. A node which is neither a terminal node nor a root node is an internal node.
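As an illustration of the tree terminology defined above, the following minimal sketch records parent, children and level for each node and classifies nodes as root, internal or terminal. The class `Node`, the helper `ancestors` and the node names are illustrative assumptions, not part of the DDM architecture.

```python
class Node:
    """A tree node that records its parent, children and level."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.level = 0  # the root is taken to be at level 0 (an assumption)

    def add(self, child):
        """Attach `child` directly below this node (build the tree top-down)."""
        child.parent = self
        child.level = self.level + 1
        self.children.append(child)
        return child

    def kind(self):
        """Classify the node using the definitions in the text."""
        if self.parent is None:
            return "root"
        return "internal" if self.children else "terminal"


def ancestors(node):
    """Return the names of all ancestors, nearest first."""
    names = []
    while node.parent is not None:
        node = node.parent
        names.append(node.name)
    return names


# x is the parent of y, and y is the parent of z.
x = Node("x")
y = x.add(Node("y"))
z = y.add(Node("z"))
```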

DDM Architecture Dictionary: The architecture dictionary describes a set of named descriptions of objects. The primary objects listed in the dictionary are broken down into the classes "CLASS" and "HELP". Each of these objects has an external name and an external codepoint that can be used to locate it. These are complex objects (nested collections of many sub-objects). The entries in a dictionary are of varying length and each contains a single complete object. For scalar objects, all of the data of the object immediately follows the length and class codepoint of the object. For collection objects, the data following the length and class codepoint of the collection consists of four byte binary numbers specifying the entry number in the dictionary at which the collection is stored.
The DDM Architecture Dictionary is also referred to as the DDM Architecture document.

DDM Architecture: The DDM architecture is fully described by the DDM Architecture Dictionary.

Forest: A grouping of trees.

Parameter: There are three kinds of DDM objects, as shown in Figure 1.

First there are simple scalars which contain only a single instance of one of the DDM data classes, such as a single number or a single character string. DDM attributes, such as length, alignment and scale are simple scalars.
Then, there are mapped scalars which contain a sequence of instances of the DDM data classes that are mapped onto a byte stream by an external descriptor that specifies their class identifier and other attributes.
Finally, there are collections which contain a sequence of scalar and collection objects. DDM commands, reply messages, and attribute lists are all examples of collection objects.
All objects (including parameters) are transmitted as a contiguous string of bytes with the following format:
(a) a two byte binary length. The length field of an object always includes the length of the length field and the length of the codepoint field, as well as the length of the object's data value;
(b) a two byte binary value that specifies the codepoint of the class that describes the object. All objects are instances of the "CLASS" object that specifies the variables of the object, specifies the commands to which the object can respond, and provides the programming to respond to messages;
(c) an object's data area consists of the data value of primitive classes of objects, such as numbers and character strings or the element objects of a collection. A parameter can be either a scalar or a collection.

Since the class of a DDM object describes its parameters, it thereby defines the interchange data stream form as shown in Figure 2. This makes it possible to transmit a command consisting of multiple scalar parameters from one manager to another.
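The object format described in items (a) to (c) can be sketched as follows. This is an illustration under stated assumptions: big-endian byte order is assumed (the text specifies only "two byte binary" fields), the function names `encode_object` and `decode_objects` are hypothetical, and the codepoint values are made up rather than real DDM codepoints.

```python
import struct

# Assumption: both two-byte fields are unsigned and big-endian.
HEADER = struct.Struct(">HH")  # (length, codepoint)


def encode_object(codepoint, data):
    """Prefix `data` with a length and codepoint.

    The length counts the length field, the codepoint field and the data.
    """
    return HEADER.pack(HEADER.size + len(data), codepoint) + data


def decode_objects(buf):
    """Split `buf` into consecutive (codepoint, data) pairs."""
    objects, offset = [], 0
    while offset < len(buf):
        length, codepoint = HEADER.unpack_from(buf, offset)
        if length < HEADER.size or offset + length > len(buf):
            raise ValueError("length field inconsistent with buffer")
        objects.append((codepoint, buf[offset + HEADER.size:offset + length]))
        offset += length
    return objects


# Two scalars wrapped in a collection (all codepoints are illustrative).
first = encode_object(0x0001, b"AB")
second = encode_object(0x0002, b"C")
collection = encode_object(0x0100, first + second)
```

Decoding the data area of the collection with the same function yields its element objects, which is how a receiver can walk a collection one level at a time.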

Definition: A definition as used in reference to data processing structures and operations described herein is the association of a name with an attribute list. Definitions are used to specify the characteristics of variables, values and other aspects of objects.

Database Management System (DBMS): A software system that has a catalog describing the data it manages. It controls the access to data stored within it. The DBMS also has transaction management and data recovery facilities to protect data integrity.

SQL (Structured Query Language): A language used in database management systems to access data in the database.

Depth First Search: is a means of systematically visiting nodes in a tree. The order is as follows: (1) Visit the root node; (2) Visit the children of the root node; (3) To visit a child, choose its children and visit them in turn. In general, other alternatives at the same level or below are ignored as long as the current node that is being visited is not a terminal node. One way to implement depth-first search is depicted in Figure 3.

The corresponding pseudo-code is:
1. Form a one element queue consisting of the root node.
2. Until the queue is empty, remove the first element from the queue and add the first element's children, if any, to the front of the queue.

Other types of searches are possible, such as breadth-first search, which expands the nodes in order of their proximity to the start node, measured by the number of arcs between them.
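The two-step queue procedure for depth-first search can be sketched as below; adding children to the back of the queue instead of the front gives the breadth-first order. The function name `traverse`, the dictionary-of-children representation and the node names are assumptions made for illustration.

```python
from collections import deque


def traverse(root, children, depth_first=True):
    """Return the order in which nodes are visited.

    `children` maps a node name to the list of its child names.
    """
    queue = deque([root])          # step 1: one-element queue holding the root
    order = []
    while queue:                   # step 2: repeat until the queue is empty
        node = queue.popleft()
        order.append(node)
        kids = children.get(node, [])
        if depth_first:
            # Add children to the FRONT, keeping their left-to-right order.
            queue.extendleft(reversed(kids))
        else:
            # Breadth-first variant: add children to the BACK.
            queue.extend(kids)
    return order


# A small illustrative tree: A has children B and C, and so on.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
```

For this tree the depth-first order is A, B, D, E, C, F, while the breadth-first order is A, B, C, D, E, F.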

Application Requester (AR): the source of a request to a remote relational database management system (DBMS). The AR is considered a client.

Application Server(AS): the target of a request from an AR. The DBMS at the AS site provides the data. The AS is considered a server.

DESCRIPTION OF THE IBM DISTRIBUTED DATA MANAGEMENT (DDM) LANGUAGE
The Distributed Data Management (DDM) Architecture (as described in the IBM
publication, "IBM Distributed Data Management Architecture Level 3: Reference, SC21-9526") describes a standardized language for Distributed Applications. This language is used by the data management components of existing systems to request data services from one another. It manipulates data interchange amongst different kinds of currently existing systems; efficient data interchange amongst systems of the same kind; common data management facilities for new systems; and evolution of new forms of data management. DDM provides the abstract models necessary for bridging the gap between disparate real operating system implementations. Some of the services addressed by the DDM distributed database models are to
(a) establish a connection with a remote database;
(b) create application specific access methods (packages) in the database or drop them from the database. These packages include the definitions of application variables used for input and output of SQL statements defined by the Application Requester;
(c) retrieve descriptions of answer set data;
(d) execute SQL statements bound in a database package;
(e) dynamically prepare and execute SQL statements in the database;
(f) maintain consistent unit of work boundaries between the application requester and the database;
(g) terminate the connection with the database.

SPECIFICATION OF DDM OBJECTS
The DDM Architecture is defined by a "dictionary" of terms that describe the concepts, structures, and protocols of DDM. DDM entities are called objects. They are also synonymously called terms. See Figures 4a and 4b for a sample DDM Object. The object drawn is EXCSATRD (Exchange Server Attributes Reply Data). In order to obtain more information about the object EXCSATRD, one should look at the objects that form EXCSATRD. For example, the objects EXTNAM, MGRLVLLS, SRVCLSNM, SRVNAM and SRVRLSLV, which constitute the parameters of EXCSATRD, are themselves DDM objects and can be found elsewhere in the architecture (architecture dictionary) in alphabetical order. Every object has a help variable. This variable is for supplemental information and explains the purpose and the semantics of the object. Another example of a DDM
Command as documented in the DDM architecture reference above is depicted in Figures 5a and 5b.

Like object-oriented languages, DDM has three characteristics that make it object-oriented.
These are encapsulation, inheritance, and polymorphism.

Encapsulation is a technique for minimizing interdependencies amongst separately written objects by defining strict external interfaces. DDM uses this concept to define each object class (an instance of which is an object) that is part of the architecture. Most of the DDM
object classes have the following attributes: inscmd (instance commands), clscmd (class commands), insvar (instance variables), clsvar (class instance variables). In addition, there are other attributes, namely length and class.

Length indicates length or size of the object. There are two length attributes associated with most objects: one is the abstract length, referring to the fact that if the entire object class were to be transmitted, including help text, it would be as long as the value specified with the attribute. This is always "~", where "~" represents an indefinite length due to its abstract nature. The second length attribute is a part of the instance variable list. It specifies the length of the object when it is transmitted as part of the protocol. The length of some objects is clear (fixed) at the time of definition. Most objects, however, have variable lengths which are determined depending on their use. Thus, these objects have their lengths available only at the time of transmission of the objects.

Class indicates the class name or codepoint. Each object class has a name which briefly describes its type. Each object class also has a codepoint which is an alternate and more efficient (for transmission) way of naming it. This attribute is specified twice for every DDM
object class, first as a brief description and then, as part of the instance variable list (as a hexadecimal number). There are some DDM objects which are not self-describing when they are transmitted. That is, when these objects are transmitted they are recognized by the receiver from the context; the length and the codepoint, which are essential for the recognition of the object by the receiver, are not transmitted even though these attributes are defined for these objects by DDM. The second characteristic, Inheritance, is a technique that allows new, more specialized classes to be built from the existing classes. DDM uses the inheritance structure to encourage the reusability of the definition (and eventually of the code, if the implementation is object-oriented). The class COMMAND, for example, is the superclass of all commands. From the superclass, the subclass inherits its structure. The third characteristic, Polymorphism, is a technique that allows the same command to be understood by different objects, which respond differently.
In this disclosure, the following will be used:
N: the number of terms in the dictionary (number of trees);
m: the total number of nodes in the expansion of a DDM command or reply message (number of nodes in a tree);
k: the number of top level nodes, approximately N/10 in the specific application described herein;
average number of children per node.

OTHER METHODS
This section describes other hierarchical language storage and retrieval methodologies, including Loosely Coupled Files (LCF) and Root Storage Method (RSM).

LOOSELY COUPLED FILES (LCF)
Given that the DDM model isolates dictionaries from processing, LCF design represents the DDM dictionaries by a collection of static data structures, which may be generated by macros. Each DDM Dictionary is assembled and link-edited into separate load modules.
Isolation of DDM objects requires as search arguments, (a) the object name (character string) and (b) the dictionary identification. The dictionaries closely resemble the structure of the DDM documentation, i.e. comprising a network of nodes. Thus, if one is familiar with the DDM documentation, one may correlate DDM concepts (scalars, collections, codepoints) to the LCF DDM Dictionaries.

LCF Retrieval Methodology:
Since but a single definition of each DDM object exists, the requirement to generate the object or to recognize its existence is dependent upon that single definition. Thus, LCF
creates generation and parsing methods which are driven entirely by the DDM dictionaries.
Any DDM object to be generated first isolates the object definition within the appropriate
dictionary. Then, it "pushes" the length and codepoint attributes onto a stack if the object is a collection and proceeds recursively through all the instance variables of the collection, halting when a scalar (leaf or terminal node) is encountered. When a scalar (terminal node) is reached, a generator routine is invoked, which "pushes" the scalar length, codepoint as well as the scalar value onto the stack. The length is returned to the invoker at the higher level. In this fashion, when all instance variables of a collection have been processed, the length of the collection is the sum of the lengths returned from the individual invocations.
The example below depicts the LCF pseudo-code for building the definition at run-time.
Note that recursion is used. Another way is depicted in Figure 6 without recursion (i.e.
recursion is simulated).

Example

Newdef LCF_Construct (IN Codepoint)
(* LCF Method for constructing Definition *)
    Search for the file identified by the Codepoint
    Scan for all its parameters (or instance variables), if any
    If There Are Some Then
    Do;
        Scan file for instance variables
        Do for all the Instance Variables
            Definition = Definition + LCF_Construct(Codepoint of the Instance Variable)
        End Do;
    End If;
End LCF_Construct;
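The recursive construction above can be rendered as a short runnable sketch. It is an illustration under assumptions: the dictionary `FILES` stands in for LCF's one-file-per-term layout, the term names are made up, the definition is represented as a list of (level, term) pairs, and the retrieval counter exists only to show that one file is fetched per node of the tree.

```python
# Hypothetical stand-in for LCF's files: each term maps to the list of its
# parameters (instance variables). The names are illustrative, not DDM terms.
FILES = {
    "ROOT": ["P1", "P2"],
    "P1": ["P3"],
    "P2": [],
    "P3": [],
}


def lcf_construct(term, counter, level=0):
    """Build the definition of `term` by recursively expanding its parameters.

    `counter` is a one-element list used to count file retrievals.
    """
    counter[0] += 1                    # one retrieval per term visited
    parameters = FILES[term]           # "search for the file", then scan it
    definition = [(level, term)]
    for parameter in parameters:       # recurse into every instance variable
        definition += lcf_construct(parameter, counter, level + 1)
    return definition


retrievals = [0]
definition = lcf_construct("ROOT", retrievals)
```

For this four-node tree the sketch performs four retrievals, and the whole expansion is repeated every time the definition is needed.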

To illustrate the LCF flow and provide some insight with regard to the impact of Dictionary access and recursion on path length, consider the example illustrated in Fig. 7, which depicts the definition tree to be built. LCF maintains 13 files for this tree. The retrieval of this definition by the LCF method is depicted in Fig. 8.

Hence, LCF retrieves each file, sequentially searches for parameters in each file (the search argument is a variable length character string, or DDM Mnemonic, such as RDBNAM in the example above), and then for each parameter found, gets the file and extracts its parameters. This is a recursive method. This recursive step is done at run time, each time one wants to generate or parse a DDM stream. This means that the method to construct a DDM Dictionary definition is an exhaustive search that goes through the entire file. Hence, in order to build the definition, LCF requires m retrievals, and with each retrieval there is a sequential search to locate the parameters.
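The dictionary-driven generation step of the LCF retrieval methodology (push a collection's length and codepoint, recurse through its instance variables, and return lengths upward) can be sketched as follows. Several points are assumptions: the layout of `DICTIONARY`, its names and codepoints are hypothetical; byte order is taken as big-endian; and the four header bytes are added to each collection length, following the object format in which a length includes its own length and codepoint fields.

```python
import struct

# Hypothetical dictionary: "members" is None for a scalar, or a list of
# instance-variable names for a collection. None of these are DDM terms.
DICTIONARY = {
    "COLL": {"codepoint": 0x1000, "members": ["SCAL1", "SCAL2"]},
    "SCAL1": {"codepoint": 0x0001, "members": None},
    "SCAL2": {"codepoint": 0x0002, "members": None},
}


def generate(name, values, stack):
    """Push the encoding of `name` onto `stack` and return its length."""
    entry = DICTIONARY[name]
    if entry["members"] is None:
        # Scalar: push length, codepoint and value in one piece.
        value = values[name]
        length = 4 + len(value)
        stack.append(struct.pack(">HH", length, entry["codepoint"]) + value)
        return length
    # Collection: reserve the header slot, then recurse into the members.
    slot = len(stack)
    stack.append(b"")
    length = 4 + sum(generate(m, values, stack) for m in entry["members"])
    stack[slot] = struct.pack(">HH", length, entry["codepoint"])
    return length


stack = []
total = generate("COLL", {"SCAL1": b"AB", "SCAL2": b"C"}, stack)
stream = b"".join(stack)
```

The collection header is filled in only after its members have been generated, since its length is not known until their lengths have been returned.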

LCF Storage Methodology:
LCF stores each DDM definition in a file, in the format shown in Figures 5a and 5b. This means that each term is stored in a separate file with information that is not needed by the parsing and generation processes. Also, each of its instance variables is stored in the same fashion, etc.

The storage requirements for LCF are approximately 1000 + 100m bytes per term in the dictionary, i.e. assuming 1000 bytes of head and tail overhead plus 100 bytes per internal node. Hence, the storage requirements for the entire dictionary are approximately:
(1000 + 100m) N.

ROOT STORAGE METHOD
The Root Storage Method (RSM) approximates or simulates the recursion aspects of DDM
object definition construction by an appropriate implementation technique (nested CASE
statements, CASE within CASE within CASE). Given this direction, the objects defined within the DDM dictionaries can be entirely eliminated or restricted to objects of a given type. RSM restructures the DDM Dictionaries by first eliminating the dictionary identifier as an element in the search argument, and hence all dictionaries are merged together. Then, the dictionary search arguments are changed from character strings to codepoints. The character strings are still maintained within the dictionary. Finally, objects defined within the dictionaries are restricted to root nodes only. Thus, only DDM commands, command data, reply messages and reply data are defined. However, the constituent instance variables of any given DSS (or parameters), collection or scalar are not defined.

RSM RETRIEVAL METHODOLOGY
Once the object has been identified to satisfy a request, then for each root level object, a unique root level object generator exists, which will generate one complete object. The object generator non-recursively constructs the instance variables (collections and scalars) which constitute the object. Consequently, the Generator must simulate the recursion inherent in the generation of all instance variables comprising that object. Figure 9 depicts the CASE within CASE method. Figure 10 depicts the flowchart of RSM object construction. With this approach, the DDM dictionaries are partitioned such that objects are defined within static data structures and the constituent instance variables are hardcoded.
Note that in this method, the definitions of the various parameters are hardcoded multiple times, and that this method is not extendible to all possible variations of DDM. For example, it is limited by the number of levels of nesting that CASE statements allow.

To construct the definition for ACCRDBRM (as depicted in Figure 7 ), RSM undertakes the steps depicted in Figure 11.

To construct a definition, one must execute one retrieval with cost proportional to log N
to the base 2, and m CASE statements. Thus, RSM retrieves the root term definition.
Thereafter, the parameter expansions are hard-coded into the procedure. This method approximates the recursion aspects of DDM Object Generation by an implementation technique (e.g. CASE within CASE... etc.). Due to limitations in programming languages, there are only so many levels of nesting of CASE statements that are possible, hence making the method not expandable. This appears to be a hard limitation. If DDM expands to have more levels, the RSM will exhaust its usefulness. If DDM strings reach a depth exceeding the nesting limit, then redesigning of the code will have to be done. In addition, this method is not well suited to parsing, because DDM is not static. When parsing DDM
strings, the parameters at each level of a DDM term in the tree can appear in any order. The CASE within a CASE... does not provide all possible combinations of parameter ordering.
Also, for each occurrence of the parameter in the dictionary, the semantic procedure associated with it is duplicated. The programs are hardcoded, and therefore difficult to maintain. Due to the increased size, the programs are more complex. In order to maintain the program, recompilation is performed each time. Hence, in order to obtain the definition of the DDM term, there is one retrieval necessary and one sequential search in the top level file. Then, a series of embedded CASE statements provide the rest of the DDM
definition.

RSM STORAGE METHODOLOGY
RSM stores only the root or "top level" definitions. The constituent instance variables of the parameters are not defined. This means that only the top level codepoint definitions are stored as data. All the parameters derived through the root are hardcoded in the program. This results in the loss of information, including some of the necessary information required to parse and generate a DDM string. That is, all the information about the structure of the parameters is not available as data. If there are changes in the dictionary, this may result in consistency problems. While LCF stored all the information for all the codepoints, this method only stores the structural information for the top level codepoints. The storage requirements for RSM are approximately 1000+100m bytes per top level term, assuming 1000 bytes for head and tail overhead plus 100 bytes per internal node. Hence, there are about (1000+100m)k bytes for the entire dictionary, where k is the number of top level terms. The rest of the information for the structure of the parameters is hardcoded in the program as depicted in Figure 9. Assuming there are N/10 top level objects, then the cost of storage is (1000+100m) N/10 bytes.

DRAWBACKS OF THE LCF AND RSM METHODS
LCF maintains a set of files without constructing the definition. This means that each time a definition of an object is required, LCF has to reconstruct it using the methods described above. There is no added value to reconstructing the definition each time it is required since the same definition will be required over and over again. In addition, LCF does not keep a very compact form of each of the definitions of each of the parameters; it remembers information that is not needed, i.e. information that is not essential for parsing and generating. The invention herein overcomes these drawbacks by expanding the definition of a DDM command inside the data structure, and therefore not requiring its reconstruction each time it is accessed, and by defining a short form of the data to describe the essence of the definition in a few bytes.

RSM only stores the top level node definition of the tree. The rest of the definition is hardcoded in the program. While this saves on space compared to the LCF method, RSM does not record the information of the root node in a compact fashion. RSM maintenance may be difficult due to hard coding of each parameter and duplication of code for each instance of the parameter in the dictionary. RSM is also subject to the limitations of programming languages such as the level of nesting of CASE statements. The invention herein overcomes these problems.

SUMMARY OF THE INVENTION
Inconveniences of other methods discussed above and elsewhere herein are remedied by the means and method provided by the instant invention which is described hereafter.

In accordance with one aspect of the invention a data transmission dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data transmission information. The dictionary is comprised of a group of one or more computer searchable definition trees relating to transmission information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of the command, reply or data to which it relates. Each tree includes a root node identified by name, such as a codepoint. The root node includes information describing the type of definition tree concerned (i.e. whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes, which nodes represent components of the definition represented by the tree. The descendant nodes include level information describing the level of the node within its tree. The nodes may include attribute information, and may include value requirements relating to transmission information represented by the nodes.

The root node of each definition in the dictionary may include information relating to length restrictions of transmission information represented by its definition tree.

The attribute information may include a requirement as to whether data transmission information represented by a node is required, optional or ignorable.

The attribute information also may include information on length, repeatability or non-repeatability of data transmission information represented by the node.

Advantageously, the root node of each of the definition trees may be made the sole accessible entry for the tree.

As their size tends to be compact, the definition trees may be stored in main memory of the computer system using them, for use by parsing or generating programming to process data transmission for the computer system.

Advantageously, the definition trees are stored in a compact linear form, preferably expressed in a depth first search form.

In accordance with another aspect of the invention there is provided a method of creating the data transmission dictionary above by deriving a group of one or more computer searchable definition trees from a first definition group of nodes defining portions of commands, replies or data usable by a computer system; compacting each of the nodes by retaining only information necessary for the processing of data transmission streams according to the definition trees; assembling each definition tree by sequencing the compacted nodes in a linear form starting with the root node of each of the definition trees, by placing information included in each compacted node in a resulting implemented dictionary; and by assembling each child node of said definition tree in turn. The process of assembling each child node involves placing information included in the child node in the resulting implemented dictionary and assembling each of the child's child nodes in turn. The process of assembling a terminal node involves placing information included in the terminal node in the resulting implemented dictionary.

In accordance with still another aspect of the invention means is provided for constructing the data transmission dictionary described above which comprise an extractor for deriving a group of one or more computer searchable definition trees from a first definition group of nodes defining portions of commands, replies or data usable by a computer system. A compactor is provided for compacting each of the nodes while retaining only information necessary for the processing of data transmission streams according to the definition trees. An assembler is provided for assembling each definition tree starting with the root node for each tree. The assembler can place information included in each compacted root node in the resulting implemented dictionary and assemble each of the compacted node's child nodes, if any, in turn. The assembler is adapted to place information included in each child node in the resulting implemented dictionary and to assemble each of said child's child nodes, if any, in turn.

In accordance with a further aspect of the invention the dictionary described above is incorporated into a computer system for use by it for encoding, storing or retrieving hierarchically related data transmission information for use by said computer system internally or in communication with another computer system.

In accordance with another aspect of the invention there is provided a method of encoding and decoding a data transmission of one or more computer systems using the dictionary described above using the following steps:
separating the data transmission into command, reply or data parts corresponding to individual definitions in the dictionary and ensuring that the parts conform to required specifications of the data transmission protocol used by the system;
for each of the parts, retrieving a corresponding definition tree from the dictionary and stepping through the data transmission, ensuring that required information is present and that relevant rules are obeyed for the tree structure for each of said nodes encountered in the data transmission; and also ensuring that structural and value rules relating to the nodes, as described in the definition corresponding to the node, are adhered to.

Advantageously, in the above method, when used for encoding the data transmission, the dictionary definitions serve as a roadmap for the translation of internal data structures of the computer system into a data transmission which conforms to requirements of the definitions.

Advantageously as well, in the aforementioned method, when used for decoding a data transmission, the dictionary definitions serve as a roadmap for the verification of the data transmission according to the definition requirements and the translation into internal data structures of the computer system.

In accordance with another aspect of the invention there is provided a distributed computer system comprising a source system and destination system. The source system includes an application requestor, a parser and a generator supporting the application requestor. The destination system includes a server and a parser and generator supporting the server. The parsers and generators have access to one or more dictionaries constructed in accordance with the dictionary described above for the purpose of processing data transmissions between the source and destination systems.

The distributed computer system described above may contain the destination and source systems within one or a local computer system.

In accordance with yet another aspect of the invention a data processing dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data processing information. The dictionary is comprised of a group of one or more computer searchable definition trees relating to data processing information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of the command, reply or data to which it relates. Each tree includes a root node identified by name. The root node includes information describing the type of definition tree concerned (i.e. whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes, which nodes represent components of the definition represented by the tree. The descendant nodes include level information describing the level of the node within its tree. The nodes may include attribute information, and may include value requirements relating to data processing information represented by the nodes.

It may prove advantageous for some of the nodes of the tree to be linked to data stored by the data processing system for representing or accessing the data stored.

DETAILED DESCRIPTION OF THE INVENTION
In the invention described herein below the definitions of DDM commands, replies, and data are stored in command, reply, and data trees, respectively.

This invention, which will be termed the DDM Dictionary Structure Optimizer (DDSO) (including method and means), compacts the definition of nodes of the DDM command, reply and data trees by retaining only the information necessary for parsing and generation of the DDM data streams. DDSO also assembles the definition of a DDM command, reply, or data by sequencing the compacted nodes in the corresponding tree in a depth first search manner. Definitions are created by first scanning the DDM Architecture document (which may advantageously be on line) and then by extracting the necessary information. Then, each of the definitions is assembled. In order to explain DDSO, it is first described how to create the DDM Dictionary structure of the invention from the DDM architecture document, then what the storage and retrieval methodologies are, and the formal specification of the definition syntax. Finally, the advantages and disadvantages of DDSO are discussed.

CREATING THE DDM DICTIONARY DATA STRUCTURE
The DDM Dictionary Data Structure is a compact form of definitions derived from selections of the dictionary defined by the DDM architecture document. Each definition is expressed as a tree (having one or more nodes) in a linear form, preferably in depth first search form, with each of the nodes defined in a compact form. In general, the steps are the following:

Step 0: (Extraction Stage) Get all the codepoints (identifiers of the nodes) for the trees required in the forest. The DDM architecture provides a network of nodes that are pointing to each other. This stage extracts the nodes needed for the trees of the application. Only the root nodes are given to the Extraction Stage. This step calculates which nodes are needed for the definitions.

Step 1: (Compaction Stage) Scan all the DDM files created in step 0 for essential information, i.e. the top level codepoint for each node and all node parameters. Retain the information in DDSO form for the parameters. The specifics of the DDSO form are described below. An example of DDSO form is: "RN1:2401,~255", which indicates attributes (RN), level in the tree (1), unique identifier (2401) and length attribute (~255).

Step 2: (Assembly Stage) This step assembles (expands) each of the parameters. This means that if a parameter itself has parameters (i.e. it is a parent) then the children are added in a depth first search manner, and they are given one level higher than that of the parent.
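
The Assembly Stage can be illustrated with a minimal Python sketch. It is not the ADDG implementation; the table layout, the function name `assemble`, and the codepoints A001 to A004 with their attributes and lengths are all hypothetical, invented only to show children being emitted depth first at one level higher than their parent.

```python
# Hypothetical compacted parameters: codepoint -> (attributes, length, child codepoints).
# All values here are invented for illustration; they are not taken from DDM.
COMPACTED = {
    "A001": ("RN", "~255", ["A002", "A003"]),
    "A002": ("ON", "0008", []),
    "A003": ("RN", "~064", ["A004"]),
    "A004": ("OR", "0004", []),
}

def assemble(codepoint, level, table, out=None):
    """Emit this node in compact form, then each child in turn, one level deeper."""
    if out is None:
        out = []
    attrs, length, children = table[codepoint]
    out.append(f"{attrs}{level}:{codepoint},{length}")
    for child in children:
        assemble(child, level + 1, table, out)
    return out

assembled = assemble("A001", 1, COMPACTED)
print("/".join(assembled))
```

The result is a flat, depth first list in which the level digit alone records the nesting, which is the property the linear form relies on.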

ADDG (Automated DDM Dictionary Generator) is a convenient tool which can be used to create one or more DDM Dictionary data structures (dictionaries) from the DDM architecture document. ADDG has three steps, as depicted in Figure 12:

1. Generate DDMTXT: This exec steps through the DDM architecture document extracting the information required by the user. This includes the root nodes specified by the user as well as all the nodes required in the expansion of the root nodes. Each of these nodes is extracted into a file with filename equal to the DDM mnemonic term and a file type of DDMTXT. Other files are generated, such as DDM FLVL, which provides a list of all DDM terms which are going to be expanded; EXPCDPT FILE, which provides a list of all valid part specifications (a part specification specifies whether the DDM object is a command, reply or data object) and their corresponding DDM codepoints; and DDM HEX, which provides a list of all DDM mnemonics with corresponding codepoints. The Generate_DDMTXT high level flowchart is depicted in Figure 13.

2. Create DDM Definitions: The Generate_DDMTXT exec must be run before the Create_DDM_Definitions exec. Create_DDM_Definitions creates the DDM_DEF FILE, which contains a DDM definition for each DDM term. It follows the specific rules that were set up in the DDSO form for the dictionary. Create_DDM_Definitions is depicted in Figure 14.

3. Assemble DDM Definitions: The Generate_DDMTXT and Create_DDM_Definitions execs must have been executed before this exec is run. This exec assembles all top level DDM terms by assembling parts of several DDM definitions. It also contains the source language specific statements in order to store each definition. The definitions are stored in a file. Pseudocode for the Assemble_DDM_Definitions is depicted in Figure 15.

The pseudocode for the ADDG tool is shown in Figure 16.

There are therefore two main operations involved in constructing the definition, and these are compaction and assembly. Compaction involves storing each parameter in the compacted form, while assembly is an expansion process that reassembles a complete definition of a root node in depth first search format. It is possible to compact the definitions of each parameter without performing the assembly. Resulting storage savings over LCF will occur. However, the performance overhead of LCF to create the definition will have to be incurred, since the definition will have to be created at run-time as opposed to creating the definition before runtime, as is done in the instant invention. It is also possible to assemble the definition without compacting it. Due to the duplication of certain internal nodes, and large storage requirements for each node, this alternative may not prove attractive. However, if compaction and assembly are both done then maximum benefits may be obtained from the instant invention.

Storage Methodology
DDSO stores the DDM definition files in the format shown by the example depicted in Figures 17a-l. A DDM definition is a linear expression of a tree, assembled in depth first search manner, and contains the information required, namely: information required for the root node and information stored for non-root nodes. The root node requires 6 bytes for its definition and each non-root node requires 11 bytes. If there are m nodes in the tree then the tree requires 11m + 6 bytes. Hence, for N trees in a dictionary, 11mN + 6N bytes are required. In addition, a small search table requires 6 bytes per tree, hence 6N bytes. Therefore the total implementation requires 11mN + 12N bytes.
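
As a quick check of the arithmetic above, the following Python sketch evaluates the storage estimate term by term. The function name and the sample values N = 10 and m = 5 are arbitrary assumptions chosen for illustration; only the constants 11, 6 and 6 come from the text.

```python
def ddso_dictionary_bytes(n_trees, nodes_per_tree,
                          node_bytes=11, root_bytes=6, table_entry_bytes=6):
    """Total bytes for N trees of m non-root nodes each, plus the search table."""
    per_tree = node_bytes * nodes_per_tree + root_bytes   # 11m + 6
    return n_trees * (per_tree + table_entry_bytes)       # N(11m + 6) + 6N

total = ddso_dictionary_bytes(n_trees=10, nodes_per_tree=5)
print(total)
```

For these sample values the result agrees with the closed form 11mN + 12N.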

Note that in the example, the constants 11 and 6, i.e. the number of bytes per internal and root nodes respectively are slightly higher. Certain additional characters ( "/"'s) and punctuation (",") were added to improve human readability.

For the example application, approximately 5088 bytes of data are required for the dictionary itself, and a small lookup table of about 510 bytes is required for the purposes of searching. Since the definition is already constructed, the cost of retrieval reduces to the cost of a search through the lookup table, e.g. the cost of a binary search.

1. Information Stored for Root Node:
The following attribute information is stored for the root node:

(a) Carrier Type: i.e. whether it is a request, reply, or data object. In DDM there is one general format for the request data stream structure. The request envelope (RQSDSS) fields must be specified in a certain order because they are not self-defining structures. Only one command can be carried by a RQSDSS. Similarly, in DDM there is one general format for the reply data stream structure. All fields must be specified in the order required because the reply envelope (RPYDSS) is not a self-defining structure. Similarly, the data object envelope (OBJDSS) has a pre-specified format, and carries all objects except the commands and reply messages. An OBJDSS, however, may carry multiple objects;

(b) The codepoint of the root node;

(c) The length characteristic. The length characteristic includes descriptions for fixed length objects, variable length objects, objects with a maximum length, and objects with a minimum length.

2. Information Stored for Internal Nodes and Leaves (terminal nodes):
The following attribute information is stored for non-root nodes:
(a) whether the node is Required, Optional, or Ignorable;
(b) whether the node (and its descendants) is repeatable or not;
(c) the level or depth of the node in the tree;
(d) the length characteristic of that node.

The first attribute stored is the Required, Optional, or Ignorable attribute.

A Required attribute specifies that support or use of a parameter is required: when a parameter is specified as being required in a parameter list for a command, the parameter must be sent for that command. All receivers (of transmissions) supporting the command must recognize and process the parameter as defined. When specified in the parameter list of a reply message, the parameter must be sent for that reply message. All receivers must accept the parameter.

An Optional attribute specifies that support or use of a parameter is optional. When a parameter is specified as being optional in a parameter list for a command, the parameter can optionally be sent for that command. All receivers supporting the command must recognize and process the parameter as defined and use the default value if it is not sent. When specified in the parameter list of a reply message, the parameter can optionally be sent for that reply message. All receivers must accept the parameter.

An Ignorable attribute specifies that a parameter can be ignored by the receiver of a command if the receiver does not provide the support requested. The parameter can be sent optionally by all senders. The parameter codepoint must be recognized by all receivers. The receiver can ignore the parameter value.

Next is the Repeatable or Not Repeatable attribute. A Repeatable attribute specifies that a parameter can be repeated. If it is specified as Not Repeatable, it cannot. There are no requirements that the elements of the list be unique, or that the elements of the list be in any order. The information stored for root and non-root nodes is logically depicted in Figures 21-23.
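
A minimal Python sketch of how a receiver might apply these attributes when checking a parameter list follows. It is an illustration under stated assumptions, not the patented parser: the function name `check_parameters` and the error strings are hypothetical, and the two-letter attribute codes (first letter R, O or I; second letter R or N) and codepoints are those of the request example given in this document.

```python
from collections import Counter

# Attribute codes per codepoint, as in the request example of this document:
# first letter R(equired)/O(ptional)/I(gnorable), second R(epeatable)/N(ot repeatable).
DEFINITION = {"2110": "ON", "2113": "RN", "2114": "RN", "2132": "ON"}

def check_parameters(received, definition):
    """Return a list of violations; parameters may arrive in any order."""
    errors = []
    counts = Counter(received)
    for codepoint, attrs in definition.items():
        if attrs[0] == "R" and counts[codepoint] == 0:
            errors.append(f"missing required parameter {codepoint}")
        if attrs[1] == "N" and counts[codepoint] > 1:
            errors.append(f"parameter {codepoint} may not repeat")
    for codepoint in counts:
        if codepoint not in definition:
            errors.append(f"unknown parameter {codepoint}")
    return errors

print(check_parameters(["2114", "2113", "2110"], DEFINITION))
print(check_parameters(["2113", "2113"], DEFINITION))
```

Because the check counts occurrences rather than walking positions, it accepts any ordering of the parameters, which is what the attribute rules above require.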

For example, a top level node with the description " 1,200C,~ " has a carrier of 1 (request), codepoint of hex'200C' and variable length (i.e. up to an unspecified limit).

In addition, a parameter, or internal node, with the following description: "RN2:2408,~255" means that the parameter is required, non-repeatable, at level 2, has a codepoint of hex'2408' and has variable length of up to 255.

ORDERING OF THE PARAMETERS
In the embodiment described, the parameters for each full tree are listed in a linear fashion; for example, for the tree depicted in Figure 18, the ordering of the parameter definitions in the tree for depth first search is: N0, N1, L1, N2, N2.1, L2, N2.2, L3, N3, L4, N4, N4.1, N4.1a, L5, N4.1b, L6, N5, L7, where:

N stands for Node and L stands for Leaf.

The order of the tree is maintained. The tree can be reconstituted in a hierarchical form since depth first search order was used and depth information was maintained.

Other Parameter Orderings: Because all the valid orderings in which DDM parameters are sent are orderings of depth first search (not just those limited to the left-to-right notation convention), it is more convenient to store the definition in this manner. It would be possible, but more expensive, to store them in another order. Additional information, e.g. parent information, would have to be added to the definition so that the tree may be reconstructed from the linear form.
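
The reconstitution of a tree from its linear form can be sketched in Python as follows. This is only an illustration: the function name `rebuild` and the dictionary layout are hypothetical, and the level numbers attached to the node names are assumptions inferred from the naming in the ordering above, since Figure 18 is not reproduced here.

```python
# Depth first ordering from the text, with ASSUMED levels (root = 1).
FLAT = [("N0", 1), ("N1", 2), ("L1", 3), ("N2", 2), ("N2.1", 3), ("L2", 4),
        ("N2.2", 3), ("L3", 4), ("N3", 2), ("L4", 3), ("N4", 2), ("N4.1", 3),
        ("N4.1a", 4), ("L5", 5), ("N4.1b", 4), ("L6", 5), ("N5", 2), ("L7", 3)]

def rebuild(flat):
    """Rebuild a nested tree from (name, level) pairs given in depth first order."""
    root = None
    path = []  # (level, node) pairs from the root down to the latest node
    for name, level in flat:
        node = {"name": name, "children": []}
        # Drop finished siblings and deeper nodes; the new top is the parent.
        while path and path[-1][0] >= level:
            path.pop()
        if path:
            path[-1][1]["children"].append(node)
        else:
            root = node
        path.append((level, node))
    return root

tree = rebuild(FLAT)
print([child["name"] for child in tree["children"]])
```

No parent pointers are stored; the depth first order together with the level of each node is enough to recover the hierarchy.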

RETRIEVAL MECHANISM
In the embodiment of the invention described, the retrieval mechanism is based on a simple search technique, a binary search. However, other suitable search methods can be used depending on the range of the codepoints, the values of the codepoints, the size of the forest to be implemented, etc.
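
A binary search over a sorted lookup table might look like the Python sketch below, which uses the standard `bisect` module. The key layout (codepoint, carrier), the function name `find_definition`, the placeholder definition strings and the codepoint 2FFF are assumptions made for illustration; the text does not specify how the search table is keyed.

```python
from bisect import bisect_left

# Hypothetical lookup table, sorted by (codepoint, carrier).
# The definition strings are placeholders, not real DDSO definitions.
ENTRIES = sorted([
    ((0x200C, 1), "<request definition for 200C>"),
    ((0x200C, 3), "<command data definition for 200C>"),
    ((0x2FFF, 2), "<reply definition for an invented codepoint>"),
])
KEYS = [key for key, _ in ENTRIES]

def find_definition(codepoint, carrier):
    """Binary search the sorted keys; return the stored definition or None."""
    key = (codepoint, carrier)
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return ENTRIES[i][1]
    return None

print(find_definition(0x200C, 3))
```

Since each definition is stored fully assembled, the search is the whole cost of retrieval.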

DDM Dictionary Syntax
Figure 19 depicts DDM dictionary definition syntax for commands, replies and data using the embodiment of the invention described herein.

Interpretation Rules
The rules describing DDM Dictionary syntax can be interpreted as follows:
3. := means "is defined by", e.g. A := B means that A is defined by B.
4. | means logical or, e.g. A := B | C means that A is defined as either B or C.
5. Lower case characters represent terminal nodes of the definition and are defined as literals.
6. Upper case characters represent non-terminal nodes and are defined as a collection of terminals and non-terminals.
7. Quotes: Items in quotes are literals. For example 'B' means the letter B.

Acronyms & Syntax used in Figure 19
Carrier indicates the DSS carrier;
0 indicates the DSS carrier used for partial replies;
1 indicates the DSS carrier field RQSDSS (request DSS), used for commands;
2 indicates the DSS carrier field RPYDSS (reply DSS), used for replies;
3 indicates the DSS carrier field OBJDSS (object DSS), used for objects;
Codept indicates the DDM codepoint: identifier for the DDM term;
Maxlen indicates the maximum length of the DDM term;
Minlen indicates the minimum length of the DDM term;
Level indicates the level of the DDM tree, i.e. indicates the level of nesting of the parameter;
Length is the length of the DDM parameter;
~ means variable length;
$ signals the end of the definition;
LOWERA indicates a lower level architecture used by DDM. This allows for DDM to include other architectures.

The formal specification of the definition basically means the following (still referring to Figure 19):

DDM_ENTRY: Line 1 is the top level entry and defines the root node. The root node can have either a request, reply or data object envelope and this is specified by the Carrier. A carrier for the specific application has four possible values, 0 through 3, but this can be extended for other types of envelopes. In addition to the carrier, the root node information includes the codepoint, Codept, of the node and the length specification of the root node (the length specification of the root node is usually variable length, although this is not required. The length specification can specify a fixed length field, a maximum length field, a minimum length field or a variable length field). The root node can be composed of DDM objects, referred to as DDM_PARMS (first line in the formal specification), or can be composed of objects of a lower level architecture and can either have itself a lower level data value (Line 2) or can be a collection of lower level objects (Line 3).

DDM_PARMS: If the root node contains a collection of DDM objects and lower level objects, then this DDM definition is followed. The DDM object can either be (a) a terminal object (Line 4), with information such as required/optional/ignorable, repeatable/non-repeatable, level of the terminal object in the tree (with root node being level 1), the codepoint and length characteristic; (b) a terminal object with lower level object contents, with the same characteristics as the terminal object above (Lines 5-6); (c) two DDM_PARMS objects. This allows a DDM_PARMS object to recursively define itself in order to allow more than one terminal object and more than one depth in the tree (Line 7); (d) one DDM_PARMS object. This is a syntactic trick to allow for the '$' which indicates the end of the object, and is required in the definition (Line 8).

LOWOBJ: Allows for the same structure as a DDM object and hence allows nesting and terminal nodes. The terminal nodes contain the same basic information as a DDM terminal node (Lines 9-11).
Line 12: A carrier can have values ranging from '0' to '3'. This can be expanded to more values as the need arises.
Line 13: The level of the parameter in the tree. The root has level 1 and its children have level 2. If a node has level i then its children have level i+1.
Line 14: Codept indicates any valid DDM codepoint.
Line 15: Length characteristic for DDM: For example, it may take on the following values: (a) dddd, such as 1233, which means fixed length of 1233, (b) ~ , which means variable length, (c) ~maxlen, such as ~255, which means that the DDM object has a maximum length of 255, (d) minlen~, such as 255~, which means that the DDM object has length of at least 255. Note that there are only four characters for length. This can easily be expanded as needed.
Lines 16 and 17: Specification of minlen and maxlen.
Line 18: "roi" means that the parameter is either required, optional, or ignorable.
Line 19: "rn" means that the parameter is either repeatable or not.
Line 20: "d" is any valid digit from 0 to 9.

It is possible to modify the formal specification of the syntax in various ways, without changing the intent and the meaning of the invention. Various ways of modifying it include: (a) adding more carrier types; (b) adding more attributes to the root node, or to the parameter nodes; as more attribute characteristics are added to the architecture, more attribute place holders or more valid values may be added to describe DDM; (c) length specifications could change, such as to add more digits to one length specification, or to add a parameter which has both minimum and maximum length specifications. As DDM evolves, the formal specification for the dictionary syntax will evolve as well.

Example: The files depicted in Figures 5a,b can be stored as follows:

Request:
1,200C,~ /ON2:2110,0022/RN2:2113,0068/RN2:2114,0008/ON2:2132,0006$

Command Data:
3,200C,~ /ON2:2412,~ ,LOWERA/RR3:0010,A~/OR3:147A, ~ $
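
To show how such a stored definition can be read back, here is a minimal Python sketch that splits the Request definition above into its root and parameter entries. It is not the parser of the invention: the function name, the field names and the regular expression are hypothetical, the variable length marker is taken as it appears in this text, and entries with a LOWERA field (as in the Command Data definition) are deliberately not handled.

```python
import re

# One parameter entry: support (R/O/I), repeatability (R/N), level, codepoint, length.
NODE = re.compile(r"^([ROI])([RN])(\d+):([0-9A-Fa-f]{4}),(.+)$")

def parse_definition(text):
    """Split 'carrier,codept,length/node/node...$' into a root dict and node dicts."""
    body = text.strip()
    if not body.endswith("$"):
        raise ValueError("definition must end with '$'")
    root_part, *node_parts = body[:-1].split("/")
    carrier, codept, length = (field.strip() for field in root_part.split(","))
    root = {"carrier": int(carrier), "codept": codept, "length": length}
    nodes = []
    for part in node_parts:
        match = NODE.match(part.replace(" ", ""))
        if match is None:
            raise ValueError(f"unrecognised entry: {part!r}")
        support, repeat, level, node_codept, node_length = match.groups()
        nodes.append({"support": support, "repeatable": repeat == "R",
                      "level": int(level), "codept": node_codept,
                      "length": node_length})
    return root, nodes

REQUEST = "1,200C,~/ON2:2110,0022/RN2:2113,0068/RN2:2114,0008/ON2:2132,0006$"
root, nodes = parse_definition(REQUEST)
print(root, len(nodes))
```

Each entry carries its own level, so a single left-to-right pass over the string yields everything needed to rebuild the tree.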

There are two degenerate cases one can look at to compare DDSO with LCF and RSM. These are:

(a) A tree with one node: while DDSO stores the node in compact form, LCF stores one node in one file; LCF still needs to scan the file, but does not need to perform the assembly. RSM in the case of the tree with one node reduces to LCF, since there are no CASE statements associated with one node. Hence in the case of the tree with one node, DDSO still maintains its advantage of storage compaction, and is still slightly better than LCF and RSM in performance.

(b) A forest with one tree: in this case, DDSO avoids the binary search. LCF and RSM still have to construct the definition. Hence, in the case of a forest with one tree, the invention has advantages.

HOW DDSO DEFINITIONS ARE USED
The DDSO definitions are retrieved in both the parsing and the generation processing of DDM strings. Parsing means receiving a DDM string, checking its syntactic correctness and building the equivalent internal data structure for use by the local processor. Generation means receiving an internal data structure and building the DDM string using the definition tree. Figure 20 depicts the parsing and generation process in a requester-server distributed system. An application program first submits a request in internal format.
(Step 1) The request is translated into the DDM string by the generation process (the generator consults the DDM Dictionary to do this).
(Step 2) Then, the request is sent to the server, which receives it. The parser translates the request into internal format by consulting the DDM Dictionary for syntax verification.
(Step 3) Then, the internally formatted request is executed by the server. This can be one of various different suitable types of servers such as file servers, or database servers.
(Step 4) The server issues one or more replies in internal format, which are translated by the generator (the generator consults the DDM Dictionary) into a DDM string or strings.
(Step 5) The DDM reply is sent to the source system.
(Step 6) Finally, the source system's parser translates the DDM reply into internal format (the parser consults the DDM Dictionary) and returns to the application program.

CONCEPTUAL LAYERING OF DDM
In the specific embodiment described, the parser and generator advantageously share a common design which stems from partitioning DDM data streams (DDM strings) into a series of layers.

The first, or topmost layer, Layer Zero, consists of a DDM Command or a DDM Reply, which constitutes a logical object. A request for parsing or generating must always come at Layer Zero.

Next is Layer One, which is derived from breaking up this logical object into one or more Data Stream Structures, or DSSs (or data communications envelopes), which are linked to each other. For example, the DDM Command to execute an SQL Statement is accompanied by various parameters as well as command data (the SQL statement). DSSs can include a command part and zero or more command data parts; or, a reply part and zero or more reply data parts; or, one or more reply data parts.

Layer Two consists of the structural properties of a tree without looking at the specific values of the nodes within that tree. An example of a structural property of the tree is the length value at each node, which is the sum of its children's lengths plus a constant (for its own length field and codepoint, or identifier).
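
The length property can be illustrated with a short Python sketch. The node layout, the function names and the sample values are hypothetical; in particular, the constant of 4 bytes (a 2-byte length field plus a 2-byte codepoint) is an assumption made for this example and is not stated in the text above.

```python
HEADER_BYTES = 4  # ASSUMED: 2-byte length field + 2-byte codepoint per node

def encoded_length(node):
    """Length of a node: header plus either its value or the sum of its children."""
    if "children" in node:
        return HEADER_BYTES + sum(encoded_length(c) for c in node["children"])
    return HEADER_BYTES + len(node["value"])

def lengths_consistent(node):
    """Check each declared length against the length implied by the structure."""
    if node["declared"] != encoded_length(node):
        return False
    return all(lengths_consistent(c) for c in node.get("children", []))

# Hypothetical collection with two scalar children.
collection = {
    "declared": 18,
    "children": [
        {"declared": 8, "value": b"\x00\x00\x00\x01"},
        {"declared": 6, "value": b"ab"},
    ],
}
print(encoded_length(collection), lengths_consistent(collection))
```

The check looks only at structure and never interprets a value, which is what separates this layer from the per-node value checks of Layer Three.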

Finally, Layer Three consists of each node of the DDM Tree. Each node has structural properties in the tree and valid required values. Examples of the structural properties within the tree include whether the node is required, optional, ignorable, repeatable, a collection, or a scalar. ("Collection" refers to an internal node, and "scalar" refers to a leaf node.) Examples of values of the nodes: leaf nodes carry values, and these values carry certain restrictions. For example, leaves may be of certain data types, such as enumerated value data types, or they may have certain length restrictions, such as maximum length.
Non-leaf nodes don't have values but have length restrictions.
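A sketch of how such per-node properties might be checked is given below. The `NodeDef` record, its field names, and the node names are hypothetical. Only the required, repeatable, enumerated-value, and maximum-length properties from the text are modelled; the ignorable property is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NodeDef:
    name: str
    required: bool = False
    repeatable: bool = False
    max_length: Optional[int] = None       # length restriction on a scalar
    allowed: Optional[frozenset] = None    # enumerated values for a scalar
    children: list = field(default_factory=list)  # non-empty means a collection

def check_collection(definition, received):
    """received: list of (name, value) pairs found directly under the collection."""
    errors, counts = [], {}
    defs = {child.name: child for child in definition.children}
    for name, value in received:
        d = defs.get(name)
        if d is None:
            errors.append(f"{name}: not defined here")
            continue
        counts[name] = counts.get(name, 0) + 1
        if counts[name] > 1 and not d.repeatable:
            errors.append(f"{name}: not repeatable")
        if d.max_length is not None and len(value) > d.max_length:
            errors.append(f"{name}: too long")
        if d.allowed is not None and value not in d.allowed:
            errors.append(f"{name}: value not allowed")
    for child in definition.children:
        if child.required and child.name not in counts:
            errors.append(f"{child.name}: required but missing")
    return errors

# Hypothetical command definition with one required, one enumerated, one repeatable node.
CMD = NodeDef("CMD", children=[
    NodeDef("DBNAME", required=True, max_length=8),
    NodeDef("MODE", allowed=frozenset({"R", "W"})),
    NodeDef("TAG", repeatable=True),
])
```

Calling `check_collection(CMD, [("DBNAME", "SALES"), ("TAG", "a"), ("TAG", "b")])` returns an empty list, while omitting `DBNAME` produces a "required but missing" entry.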

SOFTWARE ARCHITECTURE FOR DDM PARSING AND GENERATION METHODS

There are three major levels of the DDM Parsing/Generation Process which correspond to the three layers mentioned above, and are depicted in Figure 24.

The first level deals with the processing of a DDM Entry (Multiple Related Data Stream Structures), or relating two logical DDM Objects together. For example, a command must always be followed by command data if it has any. The "links" between the two Data Stream Structures (DSSs) (command, command data objects) are established by the processing of the DDM Entry. This level takes care of linking DSSs together, through various continuation bits, and ensures that the rules as defined by DDM architecture for linkage are enforced.
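The linkage rules of this level can be sketched as a check over the sequence of DSSs in one entry. The sketch is a simplification under stated assumptions: the "various continuation bits" are reduced to a single boolean per DSS meaning "another DSS follows", and the tag `RP` for a reply part is invented here (the text only names CD, CP, and RD).

```python
def check_entry(dsss):
    """dsss: list of (kind, chained) pairs; kind is one of "CP", "CD", "RP", "RD".

    Returns None when the entry is well formed, otherwise a short error message.
    """
    if not dsss:
        return "empty entry"
    kinds = [kind for kind, _ in dsss]
    head, tail = kinds[0], kinds[1:]
    if head == "CP":
        # A command part followed by zero or more command data parts.
        ok = all(kind == "CD" for kind in tail)
    elif head in ("RP", "RD"):
        # A reply part plus reply data parts, or reply data parts alone.
        ok = all(kind == "RD" for kind in tail)
    else:
        ok = False
    if not ok:
        return "invalid sequence of parts"
    flags = [chained for _, chained in dsss]
    # Every DSS except the last must announce that another one follows.
    if any(not flag for flag in flags[:-1]) or flags[-1]:
        return "continuation flags do not match the sequence"
    return None
```

Keeping this check separate from the tree processing mirrors the layering in the text: nothing here looks inside a DSS.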

The second level involves processing one Data Stream Structure at a time. This level takes one of the DSSs and looks at its internal structure. Each DSS is composed of a tree.
This level obtains the definition of the relevant DDM object from the DDM Dictionary, and then proceeds to step through the definition, and starts comparing it to the actual data received (parsing), or uses it as a roadmap to generate the appropriate data stream (generation). While level 1 was concerned with the relationship between DSSs, level 2, the DDM layer, takes care of the relationships between the nodes within a DDM tree, with such activities as length checking for collection objects, etc.
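The length checking done at this level can be sketched as a single loop over one DSS. The wire layout (2-byte length, 2-byte codepoint), the codepoint values, and the `IS_COLLECTION` lookup standing in for the dictionary definition are assumptions for illustration only.

```python
import struct

# Hypothetical stand-in for the definition: codepoint -> True if it is a collection.
IS_COLLECTION = {0xA001: True, 0xA101: False, 0xA102: True, 0xA201: False}

def parse_tree(data):
    """Return (level, codepoint, value) rows in stream order; value is None for collections."""
    nodes = []
    ends = [len(data)]  # end offsets of the enclosing collections; outermost is the buffer
    pos = 0
    while pos < len(data):
        if pos + 4 > ends[-1]:
            raise ValueError("truncated node header")
        length, codepoint = struct.unpack_from(">HH", data, pos)
        if codepoint not in IS_COLLECTION:
            raise ValueError("codepoint not in definition")
        if length < 4 or pos + length > ends[-1]:
            raise ValueError("node overruns its parent")
        level = len(ends)  # the root is level 1
        if IS_COLLECTION[codepoint]:
            nodes.append((level, codepoint, None))
            ends.append(pos + length)  # children must fit before this offset
            pos += 4
        else:
            nodes.append((level, codepoint, data[pos + 4:pos + length]))
            pos += length
        # Close every collection whose declared length has been consumed exactly.
        while len(ends) > 1 and pos == ends[-1]:
            ends.pop()
    return nodes
```

A collection whose declared length is shorter than the sum of its children is rejected as soon as a child would cross the parent's end offset.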

The third level (the action level) concerns itself with individual nodes, and includes: Action Execution, Action Specifics, and a Link to a Lower Level Architecture. The Action Execution sublevel is the next natural level down and deals with individual nodes. These nodes have properties, such as: required, optional, ignorable, repeatable, etc. It is the responsibility of the Action Execution sublevel to ensure that required nodes are parsed or generated and that other structural properties of the codepoints are obeyed. The Action Specifics sublevel deals with the values in individual nodes. The nodes are either collection objects (i.e. internal nodes, in which case they are composed of other DDM nodes), or they are scalars (i.e. leaf nodes). The collection objects have no specific values associated with them. The scalars do, and it is the responsibility of this sublevel of the hierarchy to ensure that the values parsed or generated are the correct ones. The length attribute is also verified against its corresponding definition in the dictionary. The third sublevel, or the lower level architecture sublevel, deals with more complex scalar objects defined in another architecture, such as the Formatted Data Object Content Architecture developed by IBM Corporation.
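One way to picture the Action Specifics sublevel is a table that maps each scalar codepoint to exactly one action that validates and converts its value. The codepoints, the action names, and the particular restrictions below are hypothetical.

```python
def check_name(value):
    # Hypothetical length restriction on a name-like scalar.
    if not 1 <= len(value) <= 18:
        raise ValueError("name length out of range")
    return value.decode("ascii")

def check_mode(value):
    # Hypothetical enumerated value: one byte selecting an access mode.
    modes = {b"\x01": "read", b"\x02": "write"}
    if value not in modes:
        raise ValueError("mode is not one of the enumerated values")
    return modes[value]

# One action per codepoint; placeholder codepoint values.
ACTIONS = {0xA101: check_name, 0xA103: check_mode}

def run_action(codepoint, value):
    """Look up the single action registered for a codepoint and apply it to the value."""
    try:
        action = ACTIONS[codepoint]
    except KeyError:
        raise ValueError(f"no action for codepoint {codepoint:#06x}") from None
    return action(value)
```

With this arrangement a change to one parameter touches one entry of the table, and swapping the table for another set of actions leaves the surrounding parsing and generation logic unchanged.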

The common Parser and Generator design provides advantages including maintainability, generality, and a non-recursive methodology.

Maintainability is due to the fact that changes in the syntax of DDM are only limited to the action specifics portion. For example, if a parameter changes, it is very easy to locate the unique instance of its action in the code. Also, the common logic makes it easier to maintain the code. The Parsing and Generation processes use common data structures, such as the Length Tree Data Structure.

The code is very general, in that changes in the dictionary are localized to the action specifics (Generality). One could merely change the action specifics part and have a Parser and Generator for a Distributed File System Application, for example. The structure of DDM is followed and hence changes can easily be incorporated.

The actions described above are for a Data Base Application. However, it would be relatively easy for a person skilled in the art herein to build a set of actions for another application of DDM and substitute the new set to achieve the intended results.

Another advantage of the use of the dictionary of the invention is that the method of use simulates recursion by having a completely expanded dictionary. That is, the DDM Tree is expanded in a depth-first search manner. Therefore, the method has the advantages of a recursive solution without the overhead of the actual recursion.
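The idea of trading run-time recursion for a pre-expanded definition can be sketched as follows. The nested definitions, the node names, and the `(level, name)` row format are hypothetical. `expand` stands for the one-time step done ahead of time, and `walk` for the run-time step that only loops.

```python
# Hypothetical nested definitions: name -> child names (leaves have no entry).
NESTED = {"CMD": ["A", "B"], "B": ["C", "D"]}

def expand(root):
    """One-time step: flatten the tree into (level, name) rows in depth-first order."""
    rows, stack = [], [(1, root)]
    while stack:
        level, name = stack.pop()
        rows.append((level, name))
        # Push children in reverse so the first child is visited first.
        for child in reversed(NESTED.get(name, [])):
            stack.append((level + 1, child))
    return rows

def walk(rows):
    """Run-time step: one pass over the rows; the level column recovers each node's ancestors."""
    path, seen = [], []
    for level, name in rows:
        del path[level - 1:]  # drop everything at this depth or deeper
        path.append(name)
        seen.append("/".join(path))
    return seen
```

For the sample definitions, `expand("CMD")` yields the rows for CMD, A, B, C, D at levels 1, 2, 2, 3, 3, and `walk` rebuilds each node's full path from the level values alone, without any recursive call.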

In terms of storage requirements, DDSO shows useful advantages. The efficient utilization of storage is due to the fact that only essential information is retained. The dictionary is encoded into a specific format so that it will contain the definition in its most minimal form while still including information about all the nodes in the tree of the definition, including the optionality information about the node, the node's length information, and the node's level information.

Also, there is only one dictionary access per top level DDM definition. One dictionary access gives access to the entire definition as opposed to the definition of the node only.
By comparison, LCF requires as many accesses as the number of parameters in the tree.
RSM requires one access per top level node, but only provides structural information for the top level node and not the entire definition tree.

In addition to being more storage efficient and requiring only one dictionary access to obtain the full definition, DDSO constructs the definition prior to compile time. Since the definition has been expanded prior to compilation, the recursive step is not done at run time, which would be at the expense of the user. DDSO incurs the cost once per definition prior to compiling the code. DDSO uses binary searches into a table of top-level nodes. DDSO could also utilize other search methods, such as hashing etc. LCF and RSM appear to be limited to sequential search methods.

DDSO code is less complex. DDSO has a unique action for the same node and hence does not duplicate code unnecessarily. DDSO is independent of the programming language. Also, DDSO can use a table driven method while RSM has hardcoded programs. DDSO encodes the definitions as data. A change in DDM architecture would require RSM to change the program rather than just the data. For clarity, maintenance, and simplicity, the table driven approach has advantages. Also, the method is expandable for future use. DDSO appears to be independent of programming language, while RSM appears limited to the number of nestings of CASE statements allowed in the implementation of programming languages.

DDSO compacts the definitions, and defines a grammar to describe DDM. The expansion of the trees is done before compile time, and hence the recursive step of LCF need not be done for each DDM tree parsed or generated. DDSO is a table-driven method, in which the table contains the node identifier followed by a pointer to the already expanded definition.

DDM DICTIONARY DATA STRUCTURE EXAMPLE
An example of a DDM dictionary according to the invention herein is depicted in Figures 17a-l. Some points to note about the example are:
1. Data Structures Used: In this example, a DDM Dictionary data structure and retrieval mechanism are discussed. It is composed of the following declarations:
TABLE: a table containing:
--Specification and codepoint: used to search for a root level codepoint concatenated with the specification, which indicates: CD - command data, CP - command part, RD - reply data, to distinguish between carrier types.
--Length of definition
--Pointer of definition: this points to the definition of the tree.
This table is used for binary search. The specification and root level are listed in alphabetical/numerical order.
TBLBASE: a pointer to the table, used to remember the starting location of the table.
TBL_PTR: a pointer used to search through the table.
DDM TBL: a template used in conjunction with TBL_PTR to search in the table and obtain the necessary fields.

2. Specific Method to Retrieve the Data:
(a) Find out part specification and codepoint in last four character positions.
(b) Do a binary search in the table to match desired codepoint. When found, then move to the definition buffer area.

The retrieval mechanism depicted in Figures 17k,l is based on a simple binary search. However, other search methods can be used to fit the particular application.
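A binary search over such a table might look like the following sketch. The key format (a two-letter specification followed by four hexadecimal characters of codepoint), the codepoint values, and the use of a plain string in place of the length and pointer fields are assumptions made for illustration; the figures themselves are not reproduced here.

```python
from bisect import bisect_left

# Hypothetical table rows, kept sorted by key: (specification + codepoint, definition).
TABLE = [
    ("CDA001", "definition of command data for A001"),
    ("CPA001", "definition of command part A001"),
    ("CPA002", "definition of command part A002"),
    ("RDA001", "definition of reply data for A001"),
]
KEYS = [key for key, _ in TABLE]

def lookup(spec, codepoint):
    """Return the definition for (spec, codepoint), or None if it is not in the table."""
    key = f"{spec}{codepoint:04X}"
    i = bisect_left(KEYS, key)  # binary search for the leftmost position of key
    if i < len(KEYS) and KEYS[i] == key:
        return TABLE[i][1]
    return None
```

A single successful lookup hands back the whole definition for that root, so no further table accesses are needed while the tree is parsed or generated. Replacing `bisect_left` with a dictionary lookup would give the hashing alternative.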
The above-described embodiments are merely illustrative of the application of the principles of the invention. Other arrangements may be devised by those skilled in the art without departing from the spirit and scope of the invention.

The present invention is not limited to the specifically disclosed embodiments, and variations and modifications may be made without departing from the scope of the present invention.

Claims (26)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for generating a data stream for transmission between one or more computer systems using a data dictionary, said data dictionary comprising a plurality of data definition trees, each said tree having a root node for identifying the definition of said tree, said method steps comprising:
(1) receiving a command data from an application;
(2) using a definition portion of said command data as an index into said data dictionary by matching said definition portion of said command to a root node in said data dictionary;
(3) generating a data stream for said command data from a data definition tree identified by said root node, said tree representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream.
2. The method of claim 1, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream to determine whether required information is present and whether relevant rules are obeyed for the type of command specified by said generated data stream.
3. The method of claim 2, wherein a processor in said destination system uses a definition specified in said generated data stream to index a destination data dictionary located in said destination system, said destination data dictionary comprising a plurality of data definition trees, each said destination dictionary tree having a root node for identifying the definition thereof, and wherein the tree defined by said root node specified in said destination system is used to build a command from said generated data stream, said command being understood by said destination system.
4. The method of claim 1, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream for semantic correctness.
5. The method of claim 1, wherein said data definition tree comprises a linear, depth-first arrangement of a plurality of linear and terminal descent nodes expressing a plurality of parameters associated with said root node.
6. The method of claim 5, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream to determine whether required information is present and whether relevant rules are obeyed for the type of command specified by said generated data stream.
7. The method of claim 6, wherein a processor in said destination system uses a definition specified in said generated data stream to index a destination data dictionary located in said destination system, said destination data dictionary comprising a plurality of data definition trees, each said destination dictionary tree having a root node for identifying the definition thereof, and wherein the tree defined by said root node specified in said destination system is used to build a command from said generated data stream, said command being understood by said destination system.
8. The method of claim 7, wherein said command built from said generated data stream is executed by a server processor in said destination system.
9. The method of claim 7, wherein said generated data stream is a communications packet having a command mapped therein.
10. The method of claim 7, wherein subsequent to execution of said command built from said generated data stream, a reply command is returned from a processor in said destination system.
11. The method of claim 10, wherein said destination system (1) uses a definition portion of said reply data as an index into said destination data dictionary by matching said reply data definition portion to a reply root node in said destination data dictionary;
(2) generates a second data stream for said reply data from a data definition tree identified by said reply root node, said tree for said reply root node representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream.
12. The method of claim 5, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream for semantic correctness thereof.
13. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for parsing a data stream received from one or more computer systems using a data dictionary, said data dictionary comprising a plurality of data definition trees, each said tree having a root node for identifying the definition of said tree, said method steps comprising:
(1) receiving a data stream;
(2) retrieving a definition portion from said data stream;
(3) using said definition portion of said data stream as an index into said data dictionary by matching said definition portion of said data stream to a root node in said dictionary;
(4) parsing said data stream into a command data using a data definition tree identified by said root node, said tree representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream, wherein said data definition tree comprises one or more internal or terminal descent nodes expressing parameters required for implementation of said command.
14. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to generate a data stream for transmission between one or more computer systems using a data dictionary, said data dictionary comprising a plurality of data definition trees, each said tree having a root node for identifying the definition of said tree, said device comprising:
means for receiving a command data from an application;
means for using a definition portion of said command data as an index into said data dictionary by matching said definition portion of said command to a root node in said dictionary;
means for generating a data stream for said command data from a data definition tree identified by said root node, said tree representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream.
15. The system of claim 14, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream to determine whether required information is present and whether relevant rules are obeyed for the type of command specified by said generated data stream.
16. The system of claim 15, wherein a processor in said destination system uses a definition specified in said generated data stream to index a destination data dictionary located in said destination system, said destination data dictionary comprising a plurality of data definition trees, each said destination dictionary tree having a root node for identifying the definition thereof, and wherein the tree defined by said root node specified in said destination system is used to build a command from said generated data stream, said command being understood by said destination system.
17. The system of claim 14, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream for semantic correctness thereof.
18. The system of claim 14, wherein said data definition tree comprises a linear, depth-first arrangement of a plurality of linear and terminal descent nodes expressing a plurality of parameters associated with said root node.
19. The system of claim 18, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream to determine whether required information is present and whether relevant rules are obeyed for the type of command specified by said generated data stream.
20. The system of claim 19, wherein a processor in said destination system uses a definition specified in said generated data stream to index a destination data dictionary located in said destination system, said destination data dictionary comprising a plurality of data definition trees, each said destination dictionary tree having a root node for identifying the definition thereof, and wherein the tree defined by said root node specified in said destination system is used to build a command from said generated data stream, said command being understood by said destination system.
21. The system of claim 20, wherein said command built from said generated data stream is executed by a server processor in said destination system.
22. The system of claim 20, wherein said generated data stream is a communications packet having a command mapped therein.
23. The system of claim 20, wherein subsequent to execution of said command built from said generated data stream, a reply command is returned from a processor in said destination system.
24. The system of claim 23, wherein said destination system further comprises:
means for using a definition portion of said reply data as an index into said destination data dictionary by matching said reply data definition portion to a reply root node in said destination data dictionary;
means for generating a second data stream for said reply data from a data definition tree identified by said reply root node, said tree for said reply root node representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream.
25. The system of claim 18, wherein said generated data stream is received at a destination system, and wherein said destination system parses said generated data stream for semantic correctness thereof.
26. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to parse a data stream received from one or more computer systems using a data dictionary, said data dictionary comprising a plurality of data definition trees, each said tree having a root node for identifying the definition of said tree, said device comprising:
means for receiving a data stream;
means for retrieving a definition portion from said data stream;
means for using said definition portion of said data stream as an index into said data dictionary by matching said definition portion of said data stream to a root node in said dictionary;
means for parsing said data stream into a command data using a data definition tree identified by said root node, said tree representing a compacted expression of either a request command data stream, a reply command data stream, or an object command data stream, wherein said data definition tree comprises one or more internal or terminal descent nodes expressing parameters required for implementation of said command.
CA002246943A 1991-03-28 1991-03-28 Method and means for encoding storing and retrieving hierarchical data processing information for a computer system Abandoned CA2246943A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002246943A CA2246943A1 (en) 1991-03-28 1991-03-28 Method and means for encoding storing and retrieving hierarchical data processing information for a computer system


Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA002039365A Division CA2039365C (en) 1991-03-28 1991-03-28 Method and means for encoding storing and retrieving hierarchical data processing information for a computer system

Publications (1)

Publication Number Publication Date
CA2246943A1 true CA2246943A1 (en) 1992-09-29

Family

ID=4162814


Country Status (1)

Country Link
CA (1) CA2246943A1 (en)


Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued