WO1991016682A1 - Method for structuring and storing data in a file - Google Patents

Method for structuring and storing data in a file

Info

Publication number
WO1991016682A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
structuring
name
preceded
Prior art date
Application number
PCT/GB1991/000666
Other languages
English (en)
Inventor
Sydney Reading Hall
Original Assignee
International Union Of Crystallography
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Union Of Crystallography
Publication of WO1991016682A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to handling data and more particularly to a method of structuring or storing text data within a file and to a file containing such data.
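  • a method of structuring or storing data within a file comprising the following steps: (i) arranging the file into a plurality of data blocks, each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items, each preceded by a respective data name.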
  • the nature of the file is preferably such that it is visually readable as text in addition to being machine readable.
  • Each text line contains up to a pre-set maximum number of visible ASCII characters. The limit will normally be set at eighty.
  • Each data item may be directly preceded by the respective data name. Alternatively, a plurality of data names in a group may be followed by a like plurality of data items repeated a desired number of times.
  • the first common feature may be the text string 'data_', and the members of the first set are of the form 'data_blockcode' where 'blockcode' is a unique block code in each case.
  • the second common feature may be just an underline '_', and the members of the second set are of the form '_name' where 'name' is a respective data name.
  • the data handled may relate to any desired subject, but the method is especially suitable for crystallographic data. Another suitable use is for chemical data.
  • the method is especially suitable for the archiving of data and for inputting data to data-bases because of its facility for upwards compatibility and flexibility.
  • the method is also particularly advantageous for the electronic transport of text and data, via computer networks or magnetic media. It is particularly well-suited for submitting publications to technical journals.
  • each data item being stored in the file as a text, character or numerical quantity is uniquely identified by a text name.
  • the text name serves as an identifier which can be interpreted visually as well as by machine.
  • the data items may appear in any order.
  • the method relates to a process for handling text data; although it is primarily intended for computer application it does not itself relate to a computer program. Nor does it relate to a method of presenting information because its format of presentation is arbitrary.
  • According to a second aspect of the present invention there is provided means for structuring or storing data within a file comprising:
  • a data file comprising a plurality of data blocks, each preceded by a respective data block code, and, within each block, a plurality of data items each preceded by a respective data name, wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
  • According to a fourth aspect of the present invention there is provided a method of retrieving data from a file of the above type, comprising listing the requested data items and outputting the requested data items in the order requested, the output file having the same format as the accessed file.
  • the BCCAB archive file is used by the Cambridge Data Centre (U.K.) to prepare the packed crystallographic organic structural data base file ASER.
  • Appendix 1 is an extract from one entry of the BCCAB file.
  • the format is "free" in the sense that many lines have an identifying code (e.g. #Author) which provides flexibility in the order of lines, and for optional line input. Certain data items are "free" in that they are separated either by a single blank or comma. However, all line identifying codes, and many data sequences, are predefined and have a fixed function within the BCCAB definitions. Software processing this format expects predefined protocols to be observed. Violations of this protocol, or the presence of foreign data, will of necessity be treated as a processing error and terminate data access.
  • the second example of a "pre-defined free format" file is that used by the XTAL3.0 Crystallographic Program System (Hall & Stewart, 1990), as shown in Appendix 2. It is classed as a "free format" file because every line, and many individual data items, are tagged with an identification code. This provides for variations in the order of line input but only within strict guidelines.
  • the program initiation lines (those with the line codes in upper case letters) may be in order but the optional control lines (codes in lower case letters) are specific to a particular program.
  • Data items and data codes are also specific to a line. Violation of an input rule will terminate data processing of this file. These types of restrictions are typical of those placed on many "predefined free format" files.
  • STAR: Self-defining Text Archive and Retrieval
  • This file contains standard ASCII text which defines both the data structure (i.e. the arrangement of the data) and the data items. Each data item is explicitly identified by a name and these may be stored in any order. Simple syntactical rules applied to the data names provide access to each data item in a STAR file. No other knowledge of the data items is required.
  • a STAR file is normal text data that can be edited and read with a text editor. Its contents are intelligible as text and can be stored or transmitted electronically without conversion.
  • the structure of a STAR file is simple. Each file is divided into a sequence of data blocks which contain individual data items. The identity of each data item is determined by a preceding data name. It is possible to repeat data items by placing them within simple looping structures.
  • a STAR file can be defined by only a few simple rules. This ensures maximum flexibility in data storage and the widest possible applicability. No assumptions are made about the order of the data blocks or data items, other than the requirement that identifying names be unique. There are no rules regarding the placement of data names or data items within a data block, other than the requirement that the name must precede the item. Access to data in a STAR file is made simply by requesting a specific data name within a specific data block. No prior knowledge is needed about the data type, whether the item is looped, or whether the item exists in the file.
  • As an introduction to STAR file concepts, here are some examples of data syntax. A data block is identified by a unique string with the construction 'data_blockcode'.
  • a data item is identified by a unique data name which starts with an underline '_'.
  • a data item may be repeated individually or in a group. These are referred to as looped data items and are specified with a 'loop_' string. Here is an example of looped data items.
  • a STAR file is a formatted sequential file containing text lines of standard visible ASCII characters. It may be viewed or edited with any standard text editor.
  • a STAR file is divided into any number of sequential data blocks. The information within a data block defines the data structure (i.e. the data order), and the data items. All of this information is intelligible as text.
  • the "save frame" command will now be described.
  • the principal purpose of the "save frame" command is to define a block of data items that can be internally referenced within a data block via a single code.
  • This code is the "save frame" code which is used within the data block as a character string preceded by a "$" character.
  • the save frame command enables data definitions to be repeated within a data block, and yet these definitions are insulated from one another.
  • a save frame definition may precede or follow its reference as a $<frame-code>.
  • Frame codes may also be referenced within other save frames. Recursive references to save frames are not permitted.
  • The following nine syntax rules provide the specifications for a STAR file.
  • a text string is defined as either a sequence of non-blank characters, a sequence of characters bounded by matching single or double quotes (i.e. <'> or <">), or a sequence of lines bounded by a semicolon <;> as the first character of a line.
  • a text string must not span more than one line, except if bounded by semicolons.
  • a data name is a text string starting with an underline '_'.
  • a data item is a text string not starting with an underline '_', and preceded by the identifying data name.
  • a data loop is a list of data names, followed by a repeated list of data items, and preceded by the text string 'loop_'.
  • a save frame is a sequence of data names, data items and data loops preceded by the text string 'save_framecode' where 'framecode' is a unique identifying code within a data block.
  • a save frame sequence is closed by another save frame command, by the text string 'stop_' or by a data block command.
  • a data block is a sequence of data names, data items, data loops and save frames preceded by the text string 'data_blockcode' where 'blockcode' is a unique identifying code within a STAR file.
  • the data block sequence is closed by another data block command or the end of the STAR file.
  • a data name must be unique within each save frame sequence and a data block sequence.
  • a save frame declaration must be unique within a data block sequence.
  • the save frame code may be referred to within a data block as the data item '$framecode'.
  • the key to accessing a STAR file is the data name. It is essential that the data names needed for a given application be defined carefully and precisely in a distributed Glossary. Data names and their definitions must not be changed in the lifetime of the archive file, but new names and definitions may be added as needed. A glossary does not restrict the data that can be stored in a STAR file; it is only to provide information about data items in general use.
  • One application of the STAR file is as a basis for a Crystallographic Information File (CIF). This application will be used to illustrate the STAR file concepts.
  • a data item is assumed to be of type number if it is not bounded by matching single or double quotes and starts with a digit 0-9, a plus '+', a minus '-', or a period '.'.
  • a number may be in integer, real or scientific format. If a number is concatenated with another number bounded by parentheses, it is taken to be the standard deviation [e.g. nn.nnn(m)].
  • a data item is assumed to be of type text if it extends over more than one line.
  • a data item is assumed to be of type character if it is surrounded by matching single or double quotes and is of neither type number nor type text.
  • Appendix 4 shows an example of a CIF file containing two data blocks 'manuscript' and 'crystal-structure'.
  • Data is retrieved from a STAR file by locating its data name. This would normally be done by 'parsing' the file and locating a request list of data names.
  • Existing software called QUASAR uses this approach to access a STAR file. Data items and data blocks are output by QUASAR in the order requested. The QUASAR output file is also in STAR format. For a given data block the same data item may be requested up to 5 times. The STAR file is always checked for logical integrity.
  • the names of the archive file (i.e. the input STAR file) and output file are specified as the strings 'star_arc' and 'star_out', respectively. These are entered at the start of the request list. In the example request list shown in Appendix 5 these file names are 'qtest.arc' and 'qtest.out'.
  • Appendices 6A and 6B show the file 'qtest.out' which is output after entering the request list of Appendix 5.
  • the output is itself a STAR file that can also be processed by a request list. Note that requested items missing from the archive file are flagged with '??'.
  • Appendices 7A and 7B show examples of save frame commands relating to a standard molecular data format.
  • the above-described file formats and the associated method of handling data have the advantage of generality, upwards compatibility and flexibility.
  • the file is machine-independent and portable so that data items are accessible quite independently of their point of origin. It is fundamental that the file allows for future data to be incorporated without the need to modify existing files.
  • the STAR file format meets the requirements of a "universal" archival file. It may be used for archiving all types of text and numerical data, in any order. It is particularly suited to electronic transmission purposes.
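
The syntax rules and the request-list retrieval described in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not the QUASAR software: the names `tokenize`, `parse_star` and `retrieve`, the dictionary layout and the sample data names are all hypothetical. Save frames and the eighty-character line limit are left out, and the input is assumed to be well formed (every item appears inside a data block).

```python
import re

# One token per match: a single-quoted string, a double-quoted string,
# or a run of non-blank characters (hypothetical simplification).
TOKEN = re.compile(r"'([^']*)'" r'|"([^"]*)"' r"|(\S+)")


def tokenize(text):
    """Yield (kind, value): 'word' for bare strings, 'text' for quoted
    or semicolon-bounded strings."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith(";"):
            # A semicolon as the first character of a line opens a
            # multi-line text string, closed by the next such line.
            body = [line[1:]]
            for more in lines:
                if more.startswith(";"):
                    break
                body.append(more)
            yield ("text", "\n".join(body).strip())
            continue
        for match in TOKEN.finditer(line):
            single, double, bare = match.groups()
            if bare is not None:
                yield ("word", bare)
            else:
                yield ("text", single if single is not None else double)


def parse_star(text):
    """Return {blockcode: {data_name: item, or list of items if looped}}."""
    blocks, block = {}, None
    pending = None                      # data name waiting for its item
    loop_names, in_header, column = [], False, 0
    for kind, value in tokenize(text):
        if kind == "word" and value.startswith("data_"):
            block = blocks.setdefault(value[len("data_"):], {})
            pending, loop_names, in_header = None, [], False
        elif kind == "word" and value == "loop_":
            pending, loop_names, in_header, column = None, [], True, 0
        elif kind == "word" and value.startswith("_"):
            if in_header:               # still listing the looped names
                loop_names.append(value)
                block[value] = []
            else:                       # ordinary name, item follows
                pending, loop_names = value, []
        elif pending is not None:
            block[pending] = value
            pending = None
        elif loop_names:                # looped items cycle over the names
            in_header = False
            block[loop_names[column % len(loop_names)]].append(value)
            column += 1
    return blocks


def retrieve(blocks, requests):
    """Return requested items in the order requested; items missing from
    the archive are flagged with '??'."""
    return [(code, name, blocks.get(code, {}).get(name, "??"))
            for code, name in requests]


# Hypothetical sample file; the data names are illustrative only.
SAMPLE = """\
data_example
_cell_length_a   5.959(1)
_title           'A hypothetical structure'
loop_
  _atom_label _atom_x
  C1  0.123
  O1  0.456
_comment
;This text item
spans two lines.
;
"""

blocks = parse_star(SAMPLE)
for row in retrieve(blocks, [("example", "_atom_x"), ("example", "_missing")]):
    print(row)
```

Because access goes only through the block code and the data name, the caller in this sketch needs no prior knowledge of whether an item is looped or present at all, which is the property the description emphasises.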

Abstract

A method of structuring or storing data within a file comprises the following steps: (i) arranging the file into a plurality of data blocks, each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items, each preceded by a respective data name. In this method the data block codes are taken from a first predetermined set, may occur in any order, and all have a first common feature, while the data names are taken from a second predetermined set, may occur in any order, and have a second common feature. The first and second common features are readily distinguishable. The file can be read as text by the user and can also be read by machine.
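
The two steps of the abstract can be illustrated with a short Python sketch that writes data out in this form. It is an assumption-laden illustration rather than the applicant's implementation: the function names `format_item` and `format_star`, the quoting policy and the sample names are hypothetical, the common features are taken to be the 'data_' prefix and the leading underline given in the description, and the eighty-character line limit is not enforced.

```python
def format_item(name, value):
    """Render one data item preceded by its data name."""
    if not name.startswith("_"):
        raise ValueError(f"data name must start with an underline: {name!r}")
    text = str(value)
    if "\n" in text:
        # Multi-line text is bounded by semicolons in the first column.
        return f"{name}\n;{text}\n;"
    needs_quotes = (text == "" or text.startswith("_")
                    or any(ch.isspace() for ch in text))
    if needs_quotes:
        # Assumed policy: switch to double quotes if the text holds a
        # single quote; text containing both kinds is not handled.
        mark = '"' if "'" in text else "'"
        text = mark + text + mark
    return f"{name} {text}"


def format_star(blocks):
    """Render {blockcode: {data_name: value}} as text lines."""
    lines = []
    for code, items in blocks.items():
        lines.append("data_" + code)        # step (i): block code first
        for name, value in items.items():   # step (ii): name, then item
            lines.append(format_item(name, value))
    return "\n".join(lines) + "\n"


# Hypothetical block and data names, for illustration only.
print(format_star({"manuscript": {"_title": "A test", "_count": 3}}))
```

Since the block codes and data names carry their own distinguishing prefixes, the order in which the dictionary entries are written does not affect how the file can later be read.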
PCT/GB1991/000666 1990-04-26 1991-04-26 Method for structuring and storing data in a file WO1991016682A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9009447.5 1990-04-26
GB9009447A GB2243467B (en) 1990-04-26 1990-04-26 Handling data

Publications (1)

Publication Number Publication Date
WO1991016682A1 true WO1991016682A1 (fr) 1991-10-31

Family

ID=10675075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1991/000666 WO1991016682A1 (fr) 1990-04-26 1991-04-26 Method for structuring and storing data in a file

Country Status (5)

Country Link
EP (1) EP0526516A1 (fr)
JP (1) JPH05509183A (fr)
AU (1) AU7763591A (fr)
GB (1) GB2243467B (fr)
WO (1) WO1991016682A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994027232A1 (fr) 1993-05-12 1994-11-24 Apple Computer, Inc. Storage manager for computer system
WO1995010091A1 (fr) * 1993-10-04 1995-04-13 Robert Dixon Method and apparatus for data storage and retrieval

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10177508A (ja) * 1996-12-18 1998-06-30 G & G Pharma Kk Data storage structure on a computer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4878167A (en) * 1986-06-30 1989-10-31 International Business Machines Corporation Method for managing reuse of hard log space by mapping log data during state changes and discarding the log data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Compsac 87, IEEE Proceedings, Computer Software & Applications Conference, 7-9 October 1987, Tokyo, JP, M.M. Blattner et al.: "Data structure and format conversion using syntactive inference", pages 416-421 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994027232A1 (fr) 1993-05-12 1994-11-24 Apple Computer, Inc. Storage manager for computer system
US5857207A (en) * 1993-05-12 1999-01-05 Apple Computer, Inc. Storage manager for computer system
US5870764A (en) * 1993-05-12 1999-02-09 Apple Computer, Inc. Method of managing a data structure for concurrent serial and parallel revision of a work
WO1995010091A1 (fr) * 1993-10-04 1995-04-13 Robert Dixon Method and apparatus for data storage and retrieval
AU695765B2 (en) * 1993-10-04 1998-08-20 Robert Dixon Method and apparatus for data storage and retrieval
US5799308A (en) * 1993-10-04 1998-08-25 Dixon; Robert Method and apparatus for data storage and retrieval

Also Published As

Publication number Publication date
AU7763591A (en) 1991-11-11
JPH05509183A (ja) 1993-12-16
EP0526516A1 (fr) 1993-02-10
GB2243467A (en) 1991-10-30
GB2243467B (en) 1994-03-09
GB9009447D0 (en) 1990-06-20

Similar Documents

Publication Publication Date Title
US7293006B2 (en) Computer program for storing electronic files and associated attachments in a single searchable database
US11361006B2 (en) Systems and methods for document sorting
US6226630B1 (en) Method and apparatus for filtering incoming information using a search engine and stored queries defining user folders
JPS61193266A (ja) Information retrieval system
EP2172853B1 (fr) Database index and database for indexing text documents
US20080250425A1 (en) Systems and methods for interfacing multiple types of object identifiers and object identifier readers to multiple types of applications
EP1590749B1 (fr) Method and system for mapping an XML data structure to an n-dimensional data structure
Sayers et al. Building customized data pipelines using the entrez programming utilities (eUtils)
WO1991016682A1 (fr) Method for structuring and storing data in a file
JP5420317B2 (ja) Conversion parameter generation system and conversion program
WO2002059726A2 (fr) Method for searching a digital document object model
Hulse The ALADDIN atomic physics database system
Hamelers et al. A full text collection of COVID-19 preprints in Europe PMC using JATS XML
US20070214127A1 (en) Scalable data extraction from data stores
Tollefson Importing and Creating Data
JPH11353316A (ja) Abbreviation completion device
LaPointe Jr GDP: Generalized Document Processing
JPH10301940A (ja) Information processing apparatus and method
JPS5820073B2 (ja) Thesaurus construction method
Cutler et al. Input, Output, and the Web
Hall et al. Genesis of the Crystallographic Information File
Blumer The Burrows-Wheeler Transform with applications to bioinformatics
Bose et al. Interventions for post‐transplant anaemia in kidney transplant recipients
Vaghela et al. Unicode Based Multilingual Catalogue Module: A New Feature of SOUL
Goyal et al. dBase to CDS/ISIS: a program to convert data from dBase/foxBase to CDS/ISIS

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR CA CH DE DK ES FI GB HU JP KP KR LK LU MC MG MW NL NO PL RO SD SE SU US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BF BJ CF CG CH CM DE DK ES FR GA GB GR IT LU ML MR NL SE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 1991908220

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1991908220

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1991908220

Country of ref document: EP