WO2002103557A1 - Method for handling of data and data structure - Google Patents

Method for handling of data and data structure

Info

Publication number
WO2002103557A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
block
data structure
value
column
Application number
PCT/FI2002/000474
Other languages
English (en)
French (fr)
Inventor
Sari Peltonen
Esa Tervo
Original Assignee
Ebsotech Oy
Priority date
2001-06-01
Filing date
2002-06-03
Publication date
2002-12-27
Application filed by Ebsotech Oy

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • The present invention relates to a method, according to the preamble of Claim 1, for the handling of data.
  • The invention also relates to a data structure, according to Claim 7, and to a method, according to Claim 13, for handling the data structure.
  • More specifically, the invention relates to a method and arrangement for creating a unified and efficient means of handling data in a distributed computing environment, in order to enable efficient transfer and handling of large amounts of data.
  • The invention relates to computer networks in general, and especially to making distributed computing more efficient.
  • The Internet is a global communication network that connects countless local area networks formed by companies, educational establishments and other communities.
  • An individual user or business can have direct access to a local area network, or the connection can be established via a public switched telephone network, an ISDN (Integrated Services Digital Network) connection, or a mobile teleterminal element, for example.
  • Internet traffic uses the different methods provided by the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol family.
  • Service providers rely on methods based on the WWW (World Wide Web) concept, which typically uses HTTP (HyperText Transfer Protocol) in the direction of the end user.
  • The WWW utilizes HTML (HyperText Markup Language) documents, which can contain text, images, video, sound, and links to other documents and, at the same time, to other services. More and more often, these websites are connected to different kinds of background systems, background system networks, and data warehouses, which they connect to at runtime, retrieve data from, or write data to.
  • The purpose of this invention is to create a completely new kind of method for the handling of data, which makes it possible to solve the problems of the currently known technique described above.
  • The invention relates to methods and arrangements with which the scalability of a distributed software system to large data masses can be made significantly more efficient with respect to the individual machines taking part in the handling of data, the transfer of data, and, finally, the number of machines taking part in the handling of data.
  • One preferred implementation of the invention is based on the premise that the format of the data is the same both when it is transferred in the network (Structured Data Protocol) and when it resides in RAM (Structured Data Block), so there is no need for any conversions between the data flow and the memory structures.
  • Another preferred implementation of the invention is based on the premise that the data is structured into a continuous data block, which contains the descriptions and the values of the data and all other information necessary for its direct utilization, in such a way that transferring and defining empty memory allocations is avoided by using an indexed data structure.
  • The goal of the present invention is to significantly enhance the retrieval and saving of data, and especially the performance of distributed data handling in a networked environment, by offering a method according to which data is transferred in a network in exactly the same format in which it is handled in the RAM of individual machines.
  • This goal is achieved via a method characterized by the following phases:
  • Data is retrieved or received from a data source (database, file, data flow) without using traditional two-dimensional arrays, lists, etc., even for temporary storage in RAM memory.
  • RAM memory data is handled in continuous data blocks, which in addition to the actual data contain all the definition and control information needed for its handling as a continuous memory block without any "empty bytes".
  • Data blocks can be transferred directly to the network as such, without the conversion from data structures to data flow, which is typical of the traditional methods.
  • The receiving computer can utilize the data directly from the network, without having to convert the data flow to data structures.
  • A data structure and a method for handling the data structure according to the invention are characterized by what is presented in the characterizing portions of Claims 7 and 13.
  • The proposed method does not require any external changes to existing server or network configurations in order to enable the proposed functionality.
  • Figure 1 is a block diagram representation of a method according to the currently known technique.
  • Figure 2 is a diagrammatical block diagram representation of a method according to the invention for the handling and transferring of data.
  • Figure 3 is a diagrammatical block diagram representation of the utilization of a method according to the invention in a distributed system.
  • Figure 4 represents diagrammatically one data structure according to the invention.
  • Figure 5 shows the data structure of Figure 4 in more detail.
  • Figure 6 shows block 51 of Figure 5 in more detail.
  • Figure 7 is a flow chart representation of one method according to the invention for retrieving the pointer and the space allocation of the value of one cell.
  • Figure 8 is a flow chart representation of a method according to the invention for retrieving the name of a column.
  • A data structure refers to an organized entity of data groups, which includes the functions for handling these data groups and the axioms defining the functions.
  • A data block refers to a block of data, which can, for example, be stored in a computer's memory or transferred in a network.
  • Structured data is usually understood as data stored in a relational database.
  • Here, this concept is expanded to refer to any data that can be presented in a structured format, such as:
  • The number of records (row count) in each data block can be represented by the variable n, where n is greater than or equal to zero.
  • The number of fields (column count) in each data block can be represented by the variable m, where m is greater than or equal to 1. In practice, this means that the number of columns is the same for all the records in a data block.
  • Each individual field can contain fixed or variable sized data.
  • Numeric fields are of fixed length, for example 2 bytes for short integers, 4 bytes for long integers and floats, 8 bytes for doubles, etc.
  • A variable sized field can be a field whose maximum length is known, but which usually contains data that would fit into a smaller space allocation.
  • A string is a typical example.
  • A field of variable size can also be a field whose size is either unlimited or unknown.
  • Figure 4 illustrates the basic structure of such a data block 15, consisting of a definition block 16, which in turn is divided into a header block 18, a column definition block 19 and a value slot block 20.
  • In addition, the data structure contains a data block (blob block) 17.
  • A structured data block can be transferred as such over a network between the machines taking part in distributed computing. This means that there is no need to perform additional conversions between the format used for transfer in a network and the format used for handling in RAM.
  • The logical storage model is exactly the same regardless of whether the data is in a network or in RAM.
  • The structured data block 15 is a single continuous memory block, which contains both the column definitions and the values. No additional or separate memory areas are allocated or used. This kind of optimal memory management prevents memory fragmentation even under intensive usage and in massively repeated operations.
  • An additional benefit of the proposed method is that memory need not be allocated unnecessarily for the actual values. With columns of variable size, the unused part (<MAX_COLUMN_SIZE_in_bytes> - <required_bytesize_for_storing_the_value>) can be avoided altogether.
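To make the saving concrete, the following minimal C sketch totals the unused part (<MAX_COLUMN_SIZE_in_bytes> - <required_bytesize_for_storing_the_value>) that a fixed-width two-dimensional array would reserve for one string column. The function name is hypothetical, and the sketch assumes the required size of a string value is its length plus the terminating null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes a fixed-width array would leave unused for one
 * variable sized (string) column, i.e. the sum over all rows of
 * MAX_COLUMN_SIZE_in_bytes - required_bytesize_for_storing_the_value.
 * Assumption: a value needs strlen(value) + 1 bytes (null terminator). */
size_t unused_bytes_in_fixed_array(const char *const *values, size_t nrows,
                                   size_t max_column_size)
{
    size_t unused = 0;
    for (size_t i = 0; i < nrows; i++) {
        size_t required = strlen(values[i]) + 1;
        assert(required <= max_column_size); /* value must fit the column */
        unused += max_column_size - required;
    }
    return unused;
}
```

For a 16-byte column holding "abc", "hello" and an empty string, the array would leave 37 of its 48 value bytes unused; in the structured data block those bytes need not be allocated at all.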
  • The proposed functionality can be implemented, for example, in the C programming language, offering C-style exported functions for external usage.
  • This kind of internal implementation can be used from all those programming, macro and script languages that support calling external functions.
  • In addition to receiving a copy of a value, it is possible to receive the value as a pointer to the value maintained in the data block.
  • The latter option makes it possible to handle values efficiently without unnecessary copying of memory.
  • A method is also offered for filling a data block (for example, column definitions and values).
  • Figure 1 is a diagrammatical representation of a traditional method. The entities of Figure 1 are as follows:
  • Data is stored on a server 40 in a traditional relational database 41.
  • Data is transferred to the calling computer 42 and converted/stored in RAM 43 in a two-dimensional array 47 for handling; the array consists of fields 12, or cells, defined by rows 10 and columns 11. Some of the fields 13 in the array do not contain data or do not use all of the space allocated for them, but RAM 43 is nevertheless allocated for the field values.
  • Data is transferred to another computer 44 via the network for further handling or storing. To this end, the two-dimensional array is converted to a data flow 45 for transferring. Typically, the extra space allocations of the two-dimensional array ("empty data") 13 are also transferred.
  • The data flow is converted once again and/or stored in RAM 46 in a two-dimensional array 47 for handling. Parts of the array do not contain data, but RAM is nevertheless allocated for them.
  • Figure 2 is a diagrammatical representation of a method according to the invention.
  • The entities of Figure 2 are as follows:
  • Data is stored on a server 40 in a traditional relational database 41.
  • Data is transferred to the calling computer 42 and stored in RAM 43 in a continuous data block 15 for handling. Arrays or other traditional data structures are not used, and there is no "empty data" being stored.
  • Data is transferred to another computer 44 via the network for further handling or storing. Because it is stored in RAM 46 in the data flow format (i.e. data block format) 15 used in the network, there is no need for any conversions.
  • The data block 15 can be transferred to the network as such, and no "empty" or extra data is transferred.
  • The target computer 44 receives the data flow 15 into RAM 46 as such, and can handle it without any prior conversion.
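Because the data block 15 is one continuous memory area, transferring it "as such" amounts to writing its bytes to a socket or stream and reading them back into a buffer on the target computer. The sketch below assumes a POSIX file descriptor and adds a hypothetical 32-bit length prefix (in host byte order) ahead of the block; the helper names are illustrative and not part of the described method.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. Returns 0 on success. */
static int read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the data block as such: hypothetical length prefix, then raw bytes. */
int send_block(int fd, const void *block, uint32_t len)
{
    if (write_all(fd, &len, sizeof len) != 0) return -1;
    return write_all(fd, block, len);
}

/* Receive a block into new memory that is usable directly as a data block.
 * The caller must free() it. Returns NULL on error. */
void *receive_block(int fd, uint32_t *len_out)
{
    uint32_t len;
    if (read_all(fd, &len, sizeof len) != 0) return NULL;
    void *block = malloc(len ? len : 1);
    if (block == NULL) return NULL;
    if (read_all(fd, block, len) != 0) { free(block); return NULL; }
    *len_out = len;
    return block;
}
```

No serialization or parsing step appears on either side: the receiver's buffer already is a usable data block.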
  • Figure 3 is a diagrammatical representation of a method according to the invention applied to a distributed environment according to the example.
  • The entities of Figure 3 are as follows:
  • Data collection phase:
      a. Data is stored on a server 40 in a traditional relational database and retrieved according to the invention into continuous data blocks 15.
      b. Data is stored in files, from which it is read according to the invention into continuous data blocks 15.
      c. Data is retrieved from customer specific data sources according to the invention into continuous data blocks 15.
      d. Data is received from other systems as a data flow according to the invention into continuous data blocks 15.
  • Data handling phase:
      e. Data is transferred as continuous data blocks 15 to the RAM of the computer that handles and combines the data. Arrays or other traditional data structures are not used, and no "empty data" is stored. No conversions are needed during the transfer.
      f. Data is handled in RAM in exactly the same format.
      g. After handling and combining, the data is transferred to a cache computer and a reporting server as continuous data blocks. Arrays or other traditional data structures are not used, and no "empty data" is stored. No conversions are needed during the transfer.
  • Intermediate data storage phase:
      h. Data has been received via the network for end-user handling, intermediate storing of results, and reporting. Because the data is stored in RAM in the data flow format (i.e. data block format) 15 used in the network, no conversions are needed for redistribution.
  • Adjoining data to other systems:
      i. The target computer receives the data flow 15 into RAM as such and can handle it without any conversions or transformations. If necessary, the data is converted to a suitable format, such as XML, as required by the receiving external system.
  • The client computer receives the data flow into RAM as such and can handle it without any conversions or transformations. If necessary, the data is converted to a suitable format, such as XML, as required by the receiving customer system.
  • The data structure 15 comprises a definition block 16 and an indexed data block, the blob block 17.
  • The definition block 16 contains the general definitions of the actual data structure (typically, a table); its header block 18 typically contains the definitions for the number of columns and rows.
  • The column definition block 19 defines the properties of the columns contained in the data block 15, such as each column value's data type and its size in the value slot block 20.
  • The value slot block 20 consists of row specific value slot sequences, each of which contains the value slots (cells) for each column. The value of each individual cell is interpreted either as the data content of the cell or as the location of the data content in the blob block 17.
  • A cell in block 20 is interpreted as the data content for those columns in which the space allocation of the data content always exactly matches the maximum space needed for the column's value.
  • For other columns, a cell in block 20 is interpreted as the index (location) of the value in the blob block 17.
  • The value slot block 20 thus serves the data system as a "map" to the contents of block 17 for fields of variable size.
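The block structure described above can be pictured with the following C sketch. The field names reuse those that appear in the Figure 7 and Figure 8 descriptions (lColOffset, lColSize, lValueOffset, lRowWidth, lBlobOffset, lValOffset); everything else, including the field order, the 32-bit widths, lFlags, lProperties and the COL_VARIABLE_SIZED bit, is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header block 18 at the start of data block 15.
 * All offsets are in bytes from the beginning of the data block. */
typedef struct {
    uint32_t lRowCount;    /* n >= 0 records */
    uint32_t lColCount;    /* m >= 1 columns */
    uint32_t lColOffset;   /* start of column definition block 19 */
    uint32_t lColSize;     /* size of one column definition record 50 */
    uint32_t lValueOffset; /* start of value slot block 20 */
    uint32_t lRowWidth;    /* bytes per row specific value slot queue 51 */
    uint32_t lBlobOffset;  /* start of blob block 17 */
    uint32_t lFlags;       /* optional 32-bit property bitmask */
} BlockHeader;

#define COL_VARIABLE_SIZED 0x1u /* hypothetical column property bit */

/* Hypothetical column definition record 50. */
typedef struct {
    uint32_t lDataType;   /* column value's data type (encoding assumed) */
    uint32_t lValSize;    /* size of the column's slot in block 20 */
    uint32_t lValOffset;  /* slot offset within each row's value slot queue */
    uint32_t lProperties; /* column property bitmask */
} ColumnDef;

/* A cell holds the value itself for fixed sized columns, and a
 * zero-based index into blob block 17 for variable sized columns. */
int cell_holds_blob_index(const ColumnDef *col)
{
    assert(col != NULL);
    return (col->lProperties & COL_VARIABLE_SIZED) != 0;
}
```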
  • The header block 18 is to be implemented in such a way that it provides at least the following information:
  • It is recommended, although not mandatory, to implement the header block 18 in such a way that it also provides the following information:
  • A 32-bit bitmask to describe the properties of, or to provide additional information on, the data block 15. For example, it is recommended to store information about whether the contents were retrieved in big-endian or little-endian byte order. This avoids the need for data conversions if the computer that fills the data block and the computer that handles the data have the same byte order.
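One way to record the byte order in the header bitmask is sketched below in C. The bit position BLOCK_FLAG_BIG_ENDIAN and the function names are hypothetical; the point is that the handling computer only needs to swap numeric values when its byte order differs from the one recorded by the computer that filled the block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit in the header's 32-bit property bitmask: set when the
 * block was filled on a big-endian machine. */
#define BLOCK_FLAG_BIG_ENDIAN 0x00000001u

/* Detect the host byte order at run time. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const uint8_t *)&probe == 0x01;
}

/* Flag value the filling computer would record in the header. */
uint32_t byte_order_flag(void)
{
    return host_is_big_endian() ? BLOCK_FLAG_BIG_ENDIAN : 0u;
}

/* Numeric values need swapping only if the handling computer's byte
 * order differs from the one recorded by the filling computer. */
int needs_byte_swap(uint32_t header_flags)
{
    int block_big = (header_flags & BLOCK_FLAG_BIG_ENDIAN) != 0;
    return block_big != host_is_big_endian();
}
```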
  • The column definition block 19 consists of sequential column definition records 50. The block contains exactly one column definition record 50 for each column.
  • It is recommended, although not mandatory, to implement the column definition record 50 in such a way that it also contains the following information:
  • The value slot block 20 can be seen as a two-dimensional vector that has been implemented by means of a one-dimensional vector.
  • The block consists of sequential row specific value slot queues.
  • The space allocation of the block can be calculated by multiplying the number of rows by the row width.
  • Row width refers to the sum of the sizes of the values of all columns, which may be rounded up so that the row size becomes divisible by 32 bits.
  • The value slot queue of each row consists of the sequential column values of that row.
  • The value is interpreted by data type and/or by a possible column property bitmask.
  • The value slot of a fixed sized column is interpreted as the value of the field.
  • The value slot of a variable sized column contains a zero-based index to the blob block 17.
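The row width and the size of the value slot block follow directly from the column slot sizes. A minimal C sketch, assuming the optional rounding to a multiple of 32 bits (4 bytes) is applied; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row width: sum of the slot sizes of all columns, rounded up so that
 * the row size is divisible by 32 bits (4 bytes). */
uint32_t row_width(const uint32_t *slot_sizes, size_t ncols)
{
    assert(ncols >= 1); /* m >= 1 */
    uint32_t sum = 0;
    for (size_t i = 0; i < ncols; i++)
        sum += slot_sizes[i];
    return (sum + 3u) & ~3u;
}

/* Space allocation of the value slot block: number of rows * row width. */
uint64_t value_slot_block_size(uint32_t nrows, uint32_t width)
{
    return (uint64_t)nrows * width;
}
```

For slot sizes of 4, 2 and 8 bytes the raw sum is 14 bytes, which rounds up to a 16-byte row, so three rows occupy 48 bytes.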
  • A numerical field is a typical example of a fixed sized column.
  • Variable sized columns are all those columns whose values can have a smaller space allocation than the maximum limit, or which do not have a limited maximum size.
  • A typical example of a variable sized column is a text field.
  • The blob block 17 contains the variable sized values and, depending on the data type, possibly also the size of each value.
  • The size of the blob block 17 can be zero. On RISC machines, the implementation must take into account that the data is expected to start on a 32-bit boundary.
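The 32-bit boundary requirement can be met by rounding each section offset up to a multiple of 4 bytes while laying out the block. The following C sketch is one possible layout pass under that assumption; the BlockOffsets type, the parameter names and the choice to align every section (not only the blob block) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round an offset up to the next 32-bit (4-byte) boundary. */
static uint32_t align4(uint32_t off)
{
    return (off + 3u) & ~3u;
}

typedef struct {
    uint32_t lColOffset, lValueOffset, lBlobOffset, lTotalSize;
} BlockOffsets;

/* Hypothetical layout pass: header, then column definition block 19,
 * value slot block 20 and blob block 17, each starting on a 32-bit
 * boundary so that numbers can be read directly on RISC machines. */
BlockOffsets compute_offsets(uint32_t header_size, uint32_t ncols,
                             uint32_t col_rec_size, uint32_t nrows,
                             uint32_t row_width, uint32_t blob_size)
{
    BlockOffsets o;
    o.lColOffset = align4(header_size);
    o.lValueOffset = align4(o.lColOffset + ncols * col_rec_size);
    o.lBlobOffset = align4(o.lValueOffset + nrows * row_width);
    o.lTotalSize = o.lBlobOffset + blob_size; /* blob_size may be zero */
    assert(o.lBlobOffset % 4u == 0);
    return o;
}
```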
  • Figure 6 describes in more detail the structure of the row specific value slot queue 51 located in the value slot block 20.
  • The size of each individual value slot 52 corresponds to the size of the column.
  • The value contained in each slot 52 is interpreted according to the data type and/or the property mask of the column. For fixed sized columns the slot is interpreted as the value of the field as such, and for variable sized columns it is interpreted as an index to the blob block 17.
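Reading an individual slot 52 from a row specific value slot queue 51 can be sketched in C as follows. It assumes numeric slots of 2, 4 or 8 bytes stored in the host byte order, and uses memcpy() so that slots need not be aligned; read_slot() is a hypothetical name. For a variable sized column the returned number is the zero-based blob index, not the value itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw contents of one slot 52, widened to 64 bits.
 * row points to the start of the row's value slot queue 51; val_offset
 * and slot_size come from the column definition record. */
uint64_t read_slot(const uint8_t *row, uint32_t val_offset, uint32_t slot_size)
{
    const uint8_t *p = row + val_offset;
    switch (slot_size) {
    case 2: { uint16_t v; memcpy(&v, p, sizeof v); return v; }
    case 4: { uint32_t v; memcpy(&v, p, sizeof v); return v; }
    case 8: { uint64_t v; memcpy(&v, p, sizeof v); return v; }
    default:
        assert(!"unsupported slot size");
        return 0;
    }
}
```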
  • A data structure according to the invention defines the field (column) of a record as in block 20 in Figure 4.
  • This data block 20 contains, for example, the following elements:
  • A structured data block 15 defines columns and values, and it contains, for example, the following elements with reference to blocks 19 and 18 in Figure 4:
  • The value in the value slot is always either the actual value itself (fixed sized values) or a zero-based index to an internal blob block (variable sized values).
  • Variable sized columns also include columns that may be of fixed size from a database point of view (for example, SQL_CHAR with length 15).
  • By default, variable sized columns comprise all those columns whose maximum size is greater than 4 bytes and in which the value of an individual column may have a space allocation smaller than the maximum limit.
  • Whether a column is variable sized can, however, be seen as a matter of definition. A column that meets the above default requirement for a variable sized column can then be defined as fixed sized if the space allocation of its values is known to vary by no more than 4 bytes from the maximum.
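The default rule and its optional exception can be expressed as a small C predicate. This is a sketch of one reading of the text: the function name and the max_shortfall parameter (the known maximum difference between a value's space allocation and the column maximum, or UINT32_MAX when unknown) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the column is treated as variable sized, 0 if fixed sized.
 * max_size:       maximum size of the column in bytes
 * may_be_smaller: nonzero if individual values may need less space
 * max_shortfall:  known maximum variation from max_size, or UINT32_MAX */
int is_variable_sized(uint32_t max_size, int may_be_smaller,
                      uint32_t max_shortfall)
{
    /* Default rule: more than 4 bytes and values may be smaller. */
    if (max_size <= 4u || !may_be_smaller)
        return 0;
    /* Optional definition: variation of at most 4 bytes -> fixed sized. */
    if (max_shortfall <= 4u)
        return 0;
    return 1;
}
```

With this reading, a SQL_CHAR column of length 15 is variable sized by default, while a 4-byte long integer never is.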
  • The actual value can be retrieved from the blob block 17 with the blob index.
  • A pointer to the actual value can be calculated as follows:
  • The value of the value slot is an index to a null-terminated string.
  • Alternatively, the value of the value slot is an index into the blob block, where the value itself is preceded by a 32-bit integer representing the space allocation of the value.
  • A pointer to the value can then be calculated as follows:
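Assuming that a blob index is a byte offset from the start of the blob block 17, as in the Figure 7 description, the pointer for each of the two storage alternatives could be computed as in the following C sketch. The function names are hypothetical, s is the address of the start of the data block, and INDEXSIZE is taken to be 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INDEXSIZE 4u /* size of a blob index / size prefix, e.g. 4 bytes */

/* Alternative 1: the slot holds an index to a null-terminated string. */
const char *blob_string(const uint8_t *s, uint32_t lBlobOffset, uint32_t index)
{
    return (const char *)(s + lBlobOffset + index);
}

/* Alternative 2: the value is preceded by a 32-bit integer giving its
 * space allocation; the pointer skips that prefix and the size is
 * returned through size_out. */
const uint8_t *blob_value(const uint8_t *s, uint32_t lBlobOffset,
                          uint32_t index, uint32_t *size_out)
{
    const uint8_t *entry = s + lBlobOffset + index;
    uint32_t size;
    memcpy(&size, entry, sizeof size); /* unaligned-safe read */
    assert(size_out != NULL);
    *size_out = size;
    return entry + INDEXSIZE;
}
```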
  • Figure 7 shows an example of retrieving the pointer and the space allocation of a value from row j in column k.
  • The data block 15 begins at the memory address s.
  • Here, s is interpreted as the address of a byte sequence.
  • The size of the value is returned in the variable return_val_size.
  • In this example, the data block 15 has been implemented in such a way that each blob entry always begins with the space allocation of the value.
  • Alternatively, the space allocation of specific values could be determined on the basis of the column data type or column property masks.
  • For example, the values of text-type columns could be defined to be stored as null-terminated, and the size could be determined from the length of the text.
  • ASNUMBER refers to reading a numerical value from a location pointed to by a memory address provided as a parameter.
  • INDEXSIZE is a constant that gives the size of the data type of an index pointing to the blob block 17, for example 4 bytes.
  • The following values can be retrieved from the header block 18 of the data block 15.
  • The column pointer (col_ptr) is given the value s + lColOffset (the offset of the column definition block from the beginning of the data block 15, in bytes) + k * lColSize (the size of one column definition record, in bytes).
  • The value pointer (val_ptr) is given the value s + lValueOffset (the offset of the value slot block 20 from the beginning of the data block 15, in bytes) + j * lRowWidth (the space allocation of one value slot row 51 in the value slot block) + col_ptr.lValOffset (the offset of the column specific value slot cell 52 from the beginning of each row's value slot queue 51 in the value slot block 20).
  • The numerical value read from the memory location pointed to by the value pointer is set as the value index.
  • A memory address is returned as the pointer to the value, calculated by adding the following to the memory address pointing to the beginning of the data block 15: lBlobOffset (the offset of the blob block 17 from the beginning of the data block 15, in bytes) + value index + INDEXSIZE (a constant giving the size of the data type of the blob index, in bytes).
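The Figure 7 steps translate into C roughly as follows. This is a sketch under stated assumptions: the Header and ColDef structs, their field order and 32-bit widths are hypothetical, ASNUMBER is implemented with memcpy() to avoid unaligned reads, and the column is assumed to be variable sized. Reading return_val_size from the start of the blob entry is inferred from the statement that each blob entry begins with the space allocation of the value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INDEXSIZE 4u /* size of the blob index data type in bytes */

/* Hypothetical header and column definition record layouts. */
typedef struct { uint32_t lRowCount, lColCount, lColOffset, lColSize,
                 lValueOffset, lRowWidth, lBlobOffset; } Header;
typedef struct { uint32_t lDataType, lValSize, lValOffset; } ColDef;

/* ASNUMBER: read a 32-bit number from the address given as parameter. */
static uint32_t ASNUMBER(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Figure 7: pointer and space allocation of the value in row j, column k.
 * s is the start of data block 15, read as a byte sequence. */
const uint8_t *get_value(const uint8_t *s, uint32_t j, uint32_t k,
                         uint32_t *return_val_size)
{
    Header h;
    ColDef col;
    memcpy(&h, s, sizeof h);
    assert(j < h.lRowCount && k < h.lColCount);
    /* col_ptr = s + lColOffset + k * lColSize */
    const uint8_t *col_ptr = s + h.lColOffset + k * h.lColSize;
    memcpy(&col, col_ptr, sizeof col);
    /* val_ptr = s + lValueOffset + j * lRowWidth + col_ptr.lValOffset */
    const uint8_t *val_ptr = s + h.lValueOffset + j * h.lRowWidth + col.lValOffset;
    uint32_t value_index = ASNUMBER(val_ptr);
    /* Each blob entry starts with the value's space allocation. */
    *return_val_size = ASNUMBER(s + h.lBlobOffset + value_index);
    return s + h.lBlobOffset + value_index + INDEXSIZE;
}
```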
  • Figure 8 presents an example of retrieving the name of the column for column k.
  • The data block 15 starts at the memory address s. In the formulas, s is interpreted as the address of a byte sequence.
  • In this example, the data block 15 has been implemented in such a way that it includes the optional column structure size and a column specific name field.
  • The column pointer (col_ptr) is given the value s + lColOffset (the offset of the column definition block from the beginning of the data block 15) + k * lColSize (the size of the column definition record in bytes).
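The lookup of the column name can be sketched the same way. The text does not specify where the name field sits inside the column definition record, so this C sketch assumes a hypothetical fixed-size, null-terminated szName array at the end of each record, and uses lColSize from the header as the record stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COL_NAME_LEN 32 /* hypothetical fixed size of the name field */

/* Hypothetical leading part of header block 18. */
typedef struct { uint32_t lRowCount, lColCount, lColOffset, lColSize; } HeaderPrefix;

/* Hypothetical column definition record with a name field. */
typedef struct {
    uint32_t lDataType, lValSize, lValOffset;
    char szName[COL_NAME_LEN]; /* null-terminated column name */
} ColDef;

/* Figure 8: name of column k. col_ptr = s + lColOffset + k * lColSize. */
const char *get_column_name(const uint8_t *s, uint32_t k)
{
    HeaderPrefix h;
    memcpy(&h, s, sizeof h);
    assert(k < h.lColCount);
    const uint8_t *col_ptr = s + h.lColOffset + k * h.lColSize;
    return (const char *)(col_ptr + offsetof(ColDef, szName));
}
```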

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20011160A FI111999B (sv) 2001-06-01 2001-06-01 Method for handling of data and data structure
FI20011160 2001-06-01

Publications (1)

Publication Number Publication Date
WO2002103557A1 (en) 2002-12-27

Family

ID=8561326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2002/000474 WO2002103557A1 (en) 2001-06-01 2002-06-03 Method for handling of data and data structure

Country Status (2)

Country Link
FI (1) FI111999B (sv)
WO (1) WO2002103557A1 (sv)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2305752A (en) * 1994-11-18 1997-04-16 Chase Manhattan Bank Nat Ass Electronic check image storage and retrieval system
US6105017A (en) * 1997-09-15 2000-08-15 International Business Machines Corporation Method and apparatus for deferring large object retrievals from a remote database in a heterogeneous database system
US6151602A (en) * 1997-11-07 2000-11-21 Inprise Corporation Database system with methods providing a platform-independent self-describing data packet for transmitting information

Also Published As

Publication number Publication date
FI20011160A0 (sv) 2001-06-01
FI20011160A (sv) 2002-12-02
FI111999B (sv) 2003-10-15

Similar Documents

Publication Publication Date Title
US7448043B2 (en) System and method of compact messaging in network communications by removing tags and utilizing predefined message definitions
DE69128952T2 (de) Apparatus and method for decoupling data exchange details to provide high-performance communication between software processes
AU636152B2 (en) Apparatus and method for providing decoupling of data exchange details for providing high performance communication between software processes
US6401097B1 (en) System and method for integrated document management and related transmission and access
US9300764B2 (en) High efficiency binary encoding
US20030149823A1 (en) System and method for providing context information
Lum et al. On balancing between transcoding overhead and spatial consumption in content adaptation
CN1625179B (zh) Sending by reference in a customizable, tag-based protocol
JP2000089988A (ja) Document management method for self-managing documents based on document properties
EP1197882A3 (en) System, computer program product and method for managing documents
US20030145096A1 (en) Method and device for delivering information through a distributed information system
CN1414485A (zh) Content conversion system, automatic style sheet selection method and program therefor
US20020107881A1 (en) Markup language encapsulation
CN101609415A (zh) Middleware-based general service invocation system and method
CN105700825B (zh) Thumbnail storage method and device based on the Android system
WO2002103557A1 (en) Method for handling of data and data structure
US20070112848A1 (en) Method and system for concurrently processing multiple large data files transmitted using a multipart format
US20100131673A1 (en) System and method for distributing foveated data in a network
JPH0619744B2 (ja) Method for creating a composite data structure
US7620734B2 (en) System and method for distributing foveated data in a network
Chen et al. A static resource allocation framework for Grid‐based streaming applications
Paepcke et al. Towards Interoperability in Digital Libraries
JP3055498B2 (ja) Database search method
Gordon et al. Bluejay: a browser for linear units in Java
JP3446610B2 (ja) Database cache system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC (COMMUNICATION DATED 09-02-2004, EPO FORM 1205A)

122 Ep: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP