GB2243467A - Handling data

Info

Publication number: GB2243467A
Application number: GB9009447A
Authority: GB
Inventors: Sydney Reading Hall
Original assignee: INT UNION OF CRYSTALLOGRAPHY
Current assignee: INT UNION OF CRYSTALLOGRAPHY
Priority date: 1990-04-26
Filing date: 1990-04-26
Publication date: 1991-10-30
Anticipated expiration: 2010-04-26
Also published as: GB9009447D0; EP0526516A1; WO1991016682A1; AU7763591A; JPH05509183A; GB2243467B

Abstract

A method of structuring or storing data within a file has the following steps: (i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable. The file is visually readable as text in addition to being machine readable.

Description

HANDLING DATA

The present invention relates to handling data and more particularly to a method of structuring or storing data within a file and to a file containing such data.

Many existing procedures for computer archiving use a 'fixed format' file in which the data structure is determined by specific data requirements. A fixed format file is simple and fast to access but the data structure cannot be modified without reformatting existing files.

Other archival files are based on 'pre-defined free formats'. This approach does not restrict data to specific positions in the file. Data 'keys' are often used to aid in data recognition, and this permits fewer restrictions on the ordering of data lines and items.

This is an important advantage over the fixed format files. Access to free format files currently in use still requires some advance knowledge of the expected data types and the data structure. The addition of any new data types or structures also requires that processing software be modified. This means that existing data processing software must be altered to provide common access to files which pre- and post-date the file changes. The term 'free format' is therefore misleading because it really refers to an improved flexibility within a relatively restricted data structure.

The inflexibility of the two traditional archival approaches described above restricts the exchange of data, even within the same discipline, especially if the number and nature of data types changes rapidly and continually. This is the case in many data processing fields and as a result a vast repertoire of specialized and 'local' file formats has evolved over the years. A diversity of file formats is tolerable when electronic data transfer is infrequent and processing speeds require that file formats be finely tuned to specific applications. Rapid increases in computing power and in computer networks have signalled an end to this rationale. In the era of widespread data exchange, global data bases, electronic mail and electronic publication submission, the critical need is for a general, flexible and extensible file format.

The present invention seeks to provide an improved file format and an associated method of handling data which overcome one or more of the above problems.

According to a first aspect of the present invention there is provided a method of structuring or storing data within a file comprising the following steps: (i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.

The first common feature may be the text string 'data_', and the members of the first set are of the form 'data_blockcode' where 'blockcode' is a unique block code in each case. The second common feature may be just an underline '_', and the members of the second set are of the form '_name' where 'name' is a respective data name.

The nature of the file is preferably such that it is readable as text in addition to being machine readable.

Each text line contains up to a pre-set maximum number of visible ASCII characters. The limit will normally be set at eighty.
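As an illustration only, the line constraint just described could be checked with a few lines of Python. The function name `check_lines`, the constant `MAX_LINE_LENGTH`, and the decision to accept the tab character alongside visible ASCII (taken here as codes 32 to 126) are assumptions made for this sketch and are not part of the described method.

```python
MAX_LINE_LENGTH = 80  # the limit the text says will normally be used


def check_lines(text, max_len=MAX_LINE_LENGTH):
    """Return (line_number, problem) pairs for lines that break the limits."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_len:
            problems.append((number, "too long"))
        # Assumption: "visible ASCII" means codes 32..126, with tab tolerated
        # because tabs may separate text strings.
        if any(not (32 <= ord(ch) <= 126 or ch == "\t") for ch in line):
            problems.append((number, "non-ASCII or control character"))
    return problems
```

A file that passes this check returns an empty list, so the result can be used directly as a truth value when deciding whether to accept a file.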

Each data item may be directly preceded by the respective data name. Alternatively, a plurality of data names in a group may be followed by a like plurality of data items repeated a desired number of times.

The data handled may relate to any desired subject, but the method is especially suitable for crystallographic data.

The method is especially suitable for the archiving of data and for inputting data to data-bases because of its facility for upwards compatibility and flexibility.

The method is also particularly advantageous for the electronic transport of text and data, via computer networks or magnetic media. It is particularly well suited for submitting publications to technical journals.

According to a second aspect of the present invention there is provided means for structuring or storing data within a file comprising: (i) means for arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) means for arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.

According to a third aspect of the invention there is provided a data file comprising a plurality of data blocks, each preceded by a respective data block code, and, within each block, a plurality of data items each preceded by a respective data name, wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.

The many types of information to which the file is well suited include crystallographic data.

According to a fourth aspect of the present invention there is provided a method of retrieving data from a file of the above type, comprising listing the requested data items and outputting the requested data items in the order requested, the output file having the same format as the accessed file.

A preferred embodiment of the present invention will now be described, by way of example.

First of all it will be of assistance to review three existing "pre-defined free format" files.

The BCCAB archive file is used by the Cambridge Data Centre (U.K.) to prepare the packed crystallographic organic structural data base file ASER. Appendix 1 shows an extract from one entry of the BCCAB file. The format is "free" in the sense that many lines have an identifying code (e.g. #Author) which provides flexibility in the order of lines, and for optional line input. Certain data items are "free" in that they are separated either by a single blank or comma.

However, all line identifying codes, and many data sequences, are predefined and have a fixed function within the BCCAB definitions. Software processing this format expects predefined protocols to be observed. Violations of this protocol, or the presence of foreign data, will of necessity be treated as a processing error and terminate data access.

The second example of a "pre-defined free format" file is that used by the XTAL3.0 Crystallographic Program System (Hall & Stewart, 1990), as shown in Appendix 2.

It is classed as a "free format" file because every line, and many individual data items, are tagged with an identification code. This provides for variations in the order of line input but only within strict guidelines. In this file the program initiation lines (those with the line codes in upper case letters) may be in any order but the optional control lines (codes in lower case letters) are specific to a particular program. Data items, and data codes, are also specific to a line. Violation of an input rule will terminate data processing of this file. These types of restrictions are typical of those placed on many "pre-defined free format" files.

The last example of a "pre-defined free format" file is the Standard Crystallographic File Structure (SCFS) (Brown, 1988) as shown in Appendix 3. This is an archival file structure which is more restrictive than the previous two examples. There is some flexibility in the order of the data sequences (note the end-of-sequence code *EOS) but the data items and the character positions within a sequence are fixed. The addition of extra data types to an SCFS file is almost impossible without invalidating the format of previously archived data.

Turning now to the present invention, a Self-defining Text Archive and Retrieval (STAR) file is proposed, especially for the computer archiving and electronic transmission of text and numerical data. This file contains standard ASCII text which defines both the data structure (i.e. the arrangement of the data) and the data items. Each data item is explicitly identified by a name and these may be stored in any order. Simple syntactical rules applied to the data names provide access to each data item in a STAR file.

No other knowledge of the data items is required.

A STAR file is normal text data that can be edited and read with a text editor. Its contents are intelligible as text and can be stored or transmitted electronically without conversion. The structure of a STAR file is simple. Each file is divided into a sequence of data blocks which contain individual data items. The identity of each data item is determined by a preceding data name. It is possible to repeat data items by placing them within simple looping structures.

It should be noted that a STAR file can be defined by only a few simple rules. This ensures maximum flexibility in data storage and its widest possible applicability. No assumptions are made about the order of the data blocks or data items, other than the requirement that identifying names be unique. There are no rules regarding the placement of data names or data items within a data block, other than the requirement that the name must precede the item. Access to data in a STAR file is made simply by requesting a specific data name within a specific data block. No prior knowledge is needed about either the data type, whether the item is looped, or whether an item exists in the file.
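The access model described above (request a data name within a data block, with no prior knowledge of whether the item exists) can be pictured with a small Python sketch. The nested-dictionary representation, the function name `get_item` and the sample values are assumptions chosen for illustration; the text does not prescribe any in-memory layout.

```python
def get_item(star, block_code, data_name):
    """Return the item stored under data_name in block_code, or None if absent."""
    block = star.get(block_code)
    if block is None:
        return None  # unknown data block: nothing to report, no error raised
    return block.get(data_name)


# Hypothetical already-parsed file: block code -> data name -> data item.
star = {
    "crystal_structure": {
        "_cell_volume": "2310(2)",
        "_chemical_formula": "C23 H36 O7",
    },
}

volume = get_item(star, "crystal_structure", "_cell_volume")
absent = get_item(star, "crystal_structure", "_exptl_dummy")
```

Because the order of names inside a block carries no meaning, a plain mapping is sufficient here; a request for an unknown name simply yields `None`.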

As an introduction to STAR file concepts, here are some examples of data syntax. A data block is identified by a unique string with the construction 'data_blockcode'. An example follows.

data_crystal_structure

A data item is identified by a unique data name which starts with an underline '_'. Three examples of data names, followed by their associated data items, follow.

_cell_volume 2310(2)
_chemical_formula 'C23 H36 O7'
_publication_author_address
; Prof Barry O'Connell
  Department of Chemistry
  University of Kalamazoo
  Michigan U.S.A.
;

A data item may be repeated individually or in a group.

These are referred to as looped data items and are specified with a 'loop_' string. Here is an example of looped data items.

loop_
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_exptl_crystal_face_distance
0 0 -1 0.012
0 0 1 0.012
1 0 0 0.023
-1 0 0 0.023

A STAR file is a formatted sequential file containing text lines of standard visible ASCII characters. It may be viewed or edited with any standard text editor.
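The looped data items described above pair a list of data names with a flat, repeated list of data items. A minimal Python sketch of how such a flat list might be regrouped into rows follows; the function name `loop_rows` and the choice to raise an error on an incomplete final row are assumptions for illustration, not behaviour stated in the text.

```python
def loop_rows(names, items):
    """Group a flat list of looped items into one dict per repetition."""
    if not names:
        raise ValueError("a loop needs at least one data name")
    if len(items) % len(names) != 0:
        # Assumption: an incomplete final row is treated as an error.
        raise ValueError("items do not fill a whole number of rows")
    width = len(names)
    return [dict(zip(names, items[start:start + width]))
            for start in range(0, len(items), width)]


names = ["_exptl_crystal_face_h", "_exptl_crystal_face_k",
         "_exptl_crystal_face_l", "_exptl_crystal_face_distance"]
items = ["0", "0", "-1", "0.012",
         "0", "0", "1", "0.012",
         "1", "0", "0", "0.023",
         "-1", "0", "0", "0.023"]
rows = loop_rows(names, items)
```

Each element of `rows` then corresponds to one crystal face, with its indices and distance addressable by data name.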

A STAR file is divided into any number of sequential data blocks. The information within a data block defines the data structure (i.e. the data order), and the data items. All of this information is intelligible as text.

The following seven syntax rules provide the specification for a STAR file.

1. A text string is defined as either a sequence of non-blank characters, a sequence of characters bounded by matching single or double quotes (i.e. < ' > or < " > ), or a sequence of lines bounded by a semicolon < ; > as the first character of a line. A text string must not span more than one line, except if bounded by semicolons.

2. A data name is a text string starting with an underline '_'.

3. A data item is a text string not starting with an underline '_', and preceded by the identifying data name.

4. A data loop is a list of data names, followed by a repeated list of data items, and preceded by the text string 'loop_'.

5. A data block is a sequence of data names, data items and data loops preceded by the text string 'data_blockcode' where 'blockcode' is a unique block code.

6. A sequence of blank or tab characters is used only to separate text strings.

7. Except if contained within a text string, a single sharp '#' signals that the characters following on a line are used for comment only.
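The seven rules above are compact enough to sketch as a small reader. The following Python is an illustrative sketch only, not the software referred to in this description: the names `tokenize`, `parse_star` and `SAMPLE` are hypothetical, a quoted string is assumed to end at the next matching quote character, and anything after a closing semicolon on the same line is assumed to be scanned as ordinary text strings.

```python
def tokenize(source):
    """Yield (kind, value) pairs; kind is 'word', 'quoted' or 'text'."""
    lines = source.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Rule 1: lines bounded by ';' in column one form one text string.
            body = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            yield ("text", "\n".join(body).strip())
            # Assumption: the rest of the closing ';' line is scanned normally.
            line = lines[i][1:] if i < len(lines) else ""
        pos = 0
        while pos < len(line):
            ch = line[pos]
            if ch in " \t":        # rule 6: blanks and tabs only separate
                pos += 1
            elif ch == "#":        # rule 7: comment runs to end of line
                break
            elif ch in "'\"":      # rule 1: quoted text string
                end = line.find(ch, pos + 1)
                if end == -1:
                    raise ValueError(f"unterminated quote on line {i + 1}")
                yield ("quoted", line[pos + 1:end])
                pos = end + 1
            else:                  # rule 1: run of non-blank characters
                end = pos
                while end < len(line) and line[end] not in " \t":
                    end += 1
                yield ("word", line[pos:end])
                pos = end
        i += 1

def is_name(token):
    return token[0] == "word" and token[1].startswith("_")   # rule 2

def is_item(token):
    kind, value = token
    return kind != "word" or not (value.startswith(("_", "data_"))
                                  or value == "loop_")       # rule 3

def parse_star(source):
    """Return {block_code: {data_name: item, or list of items if looped}}."""
    tokens = list(tokenize(source))
    blocks, current, i = {}, None, 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "word" and value.startswith("data_"):     # rule 5
            current = blocks.setdefault(value[len("data_"):], {})
            i += 1
            continue
        if current is None:
            raise ValueError("data found before any data block code")
        if kind == "word" and value == "loop_":              # rule 4
            i += 1
            names, items = [], []
            while i < len(tokens) and is_name(tokens[i]):
                names.append(tokens[i][1])
                i += 1
            while i < len(tokens) and is_item(tokens[i]):
                items.append(tokens[i][1])
                i += 1
            if not names or len(items) % len(names):
                raise ValueError("loop items do not fill whole rows")
            for column, name in enumerate(names):
                current[name] = items[column::len(names)]
        elif is_name(tokens[i]) and i + 1 < len(tokens) and is_item(tokens[i + 1]):
            current[value] = tokens[i + 1][1]
            i += 2
        else:
            raise ValueError(f"unexpected token {value!r}")
    return blocks

SAMPLE = """\
data_crystal_structure   # block code, then a comment
_cell_volume 2310(2)
_chemical_formula 'C23 H36 O7'
_publication_author_address
; Prof Barry O'Connell
  Department of Chemistry
;
loop_
_exptl_crystal_face_h
_exptl_crystal_face_distance
0 0.012
1 0.023
"""
blocks = parse_star(SAMPLE)
```

The reader needs no list of permitted data names: block codes, loops and names are recognised purely from their leading strings, which is the property the rules are designed to give.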

The key to accessing a STAR file is the data name. It is essential that the data names needed for a given application be defined carefully and precisely in a distributed Glossary. Data names and their definitions must not be changed in the lifetime of the archive file, but new names and definitions may be added as needed. A glossary does not restrict the data that can be stored in a STAR file; it is only to provide information about data items in general use.

One application of the STAR file is as a basis for a Crystallographic Information File (CIF). This application will be used to illustrate the STAR file concepts.

Since the CIF is intended only for crystallographic data and text, this application has imposed some formatting constraints, other than those of the STAR syntax, which simplify data handling but do not inhibit flexibility. These constraints involve certain data typing and text string limitations which may be of use in other scientific applications and are cited here.

1. Lines may not exceed 80 characters in length.

2. Data names and block codes may not exceed 32 characters in length.

3. A data item is assumed to be of type number if it is not bounded by matching single or double quotes and starts with a digit 0-9, a plus '+', a minus '-', or a period '.'. A number may be in integer, real or scientific format. If a number is concatenated with another number bounded by parentheses, the latter is taken to be the standard deviation [e.g. nn.nnn(m)].

4. A data item is assumed to be of type text if it extends over more than one line.

5. A data item is assumed to be of type character if it is surrounded by matching single or double quotes and is of neither type number nor type text.

6. Only one level of loop_ data is permitted. Additional levels of repeated data must be stored as lists within a single text string.
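The typing constraints above can be illustrated with a short Python sketch. The names `classify` and `split_esd` are hypothetical. Two assumptions are made: a multi-line item is classed as text before any other test is applied, and every remaining single-line item is classed as character, whether or not it was quoted. The digits in parentheses are returned as written, since the text does not say how they are to be scaled.

```python
import re

# Integer, real or scientific number, optionally followed by "(digits)".
NUMBER = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?")


def classify(item, quoted=False):
    """Return 'text', 'number' or 'char' for one data item."""
    if "\n" in item:
        return "text"       # constraint 4: extends over more than one line
    if not quoted and item and item[0] in "0123456789+-.":
        return "number"     # constraint 3
    return "char"           # assumption: everything else is character data


def split_esd(item):
    """Split 'nn.nnn(m)' into (value, digits in parentheses or None)."""
    match = NUMBER.fullmatch(item)
    if match is None:
        raise ValueError(f"not a number: {item!r}")
    value, esd = match.groups()
    return float(value), (int(esd) if esd is not None else None)
```

For example, `split_esd("18.757(8)")` yields the value 18.757 together with the bracketed digit 8, leaving the caller to decide how the standard deviation relates to the last decimal place.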

Appendix 4 shows an example of a CIF file containing two data blocks, 'manuscript' and 'crystal_structure'.

Data is retrieved from a STAR file by locating its data name. This would normally be done by 'parsing' the file and locating the data names given in a request list.

Existing software called QUASAR uses this approach to access a STAR file. Data items and data blocks are output by QUASAR in the order requested. The QUASAR output file is also in STAR format. For a given data block the same data item may be requested up to 5 times. The STAR file is always checked for logical integrity. The names of the archive file (i.e. the input STAR file) and output file are specified with the strings 'star_arc_' and 'star_out_', respectively. These are entered at the start of the request list. In the example request list shown in Appendix 5 these file names are 'qtest.arc' and 'qtest.out'.

Appendices 6A and 6B show the file 'qtest.out' which is output after entering the request list of Appendix 5.

The output is itself a STAR file that can also be processed by a request list. Note that requested items missing from the archive file are flagged with '??'.
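The retrieval behaviour described above (items output in the order requested, output in the same format, missing items flagged with '??') can be sketched in Python as follows. This is not QUASAR and does not reproduce its request-list syntax: the function `retrieve`, the helper `quote`, the in-memory archive layout and the decision to emit each looped item as its own single-name loop are all simplifying assumptions, and multi-line text items are not handled.

```python
MISSING = "??"  # flag the text gives for requested items absent from the archive


def quote(item):
    """Quote an item containing blanks so it remains a single text string."""
    if " " not in item and "\t" not in item:
        return item
    return f'"{item}"' if "'" in item else f"'{item}'"


def retrieve(archive, requests):
    """Return output lines for requests = [(block_code, [data names]), ...]."""
    lines = []
    for block_code, names in requests:
        lines.append("data_" + block_code)
        block = archive.get(block_code, {})
        for name in names:              # output follows the requested order
            item = block.get(name)
            if item is None:
                lines.append(f"{name} {MISSING} # requested item not present")
            elif isinstance(item, list):
                # Simplification: each looped item becomes a one-name loop.
                lines.append("loop_")
                lines.append(name)
                lines.extend(quote(value) for value in item)
            else:
                lines.append(f"{name} {quote(item)}")
    return lines


# Hypothetical archive contents, loosely modelled on Appendix 4.
archive = {
    "crystal_structure": {
        "_cell_a": "18.757(8)",
        "_chemical_formula": "C13 H12 O5",
        "_atom_site_label": ["C1", "C2"],
    },
}
requests = [("crystal_structure",
             ["_chemical_formula", "_exptl_dummy", "_cell_a", "_atom_site_label"])]
output = retrieve(archive, requests)
```

Because the output lines again follow the 'data_', 'loop_' and '_name' conventions, they could themselves be fed to a further request list, as the text notes.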

The above-described file format and the associated method of handling data have the advantage of generality, upwards compatibility and flexibility. The file is machine-independent and portable so that data items are accessible quite independently of their point of origin. It is fundamental that the file allows for future data to be incorporated without the need to modify existing files.

The STAR file format meets the requirements of a "universal" archival file. It may be used for archiving all types of text and numerical data, in any order. It is particularly suited to electronic transmission purposes.

Upwards compatibility and flexibility are two very desirable properties for any new archive system. These properties are especially important for fields, such as crystallography, where there is a wide diversity of data types, and where the archival requirements may vary from site to site. It is essential that data files written in one laboratory can be read easily in another, independent of the software with which they were generated. It is also important that these files can be easily "viewed" without the need for sophisticated archival software.

Also important for the long term is the flexibility and the eye-readable nature of the STAR format. Because a CIF may contain local as well as "global" data items, it is ideal for internal as well as external data communication purposes. Existing program systems, such as XTAL, currently use self-defining binary files internally because these are faster and more compact than character files. As computer technology improves, the value of a flexible, eye-readable and easily editable character format outweighs speed and disc considerations.

If parts of a file are lost, e.g. during electronic communication, the whole file is not corrupted; thus the file format has the advantage of being robust.

With a data-base such loss of characters might cause corruption.

APPENDIX 1

?CILHIO12 FIGXIC
#ADATE 880114
#MDATE 891013
#UNIS int 3 since 2 ig 1 ac 1
JRNL 68,41,1319,1986
AUTHOR H.Endres,H.J.Keller,R.Swietlik,D.Schweitzer,K.Angermund,C.Kruger
EQUAL alpha-phase,at 100 deg.K
PROPS Note: at RT, a 9.211(2),b 10.850(4),c 17.488(5),alpha 96.95(2), beta 97.97(2),gamma 90.75(2),v 1717.
#SYSCAT sys A cat 3
CELL a 9.068(2) b 10.721(3) c 17.403(2) alpha 96.56(1) beta 97.75(1) gamma 91.14(2) z 2 cent 1 sg P-1
*RFACT R= 0.0500.
#ERROR Author has supplied missing x coordinate for I2.
CREF nbsid 569947 batch 102
CLASS 1/60 1/39
XCOMPND bis (bis (Ethylenedithio)tetrathiafulvalene) tri-iodide
FORMAL C10 H8 S8 1+,C10 H8 S8,I3 1
#CONN El= S 3 4 5 6 11 12 13 14 21 22 23 24 29 30 31 32 I 37 38 39 V= 3 3 V= 3 37 Ch= + 3 Ch= - 37
Res= Plot= 1 B= 1 1-3 1-4 2-5 2-6 3-7 4-8 5-9 6-10 7-11 8-12 9-13 10-14 11-15 12-16 13-17 14-18 15-16 17-18 B= 2 1-2 7-8 9-10
Res= Plot= 1 B= 1 19-21 19-22 20-23 20-24 21-25 22-26 23-27 24-28 25-29 26-30 27-31 28-32 29-33 30-34 31-35 32-36 33-34 35-36 B= 2 19-20 25-26 27-28
Res= Plot= 1 B= 1 37-38 37-39
#DIAGRAM 435 703 536 703 376 782 377 623 596 782 595 623 284 753 284 654 688 753 688 654
197 803 196 603 775 803 776 603 110 753 110 653 862 753 862 653 770 495 771 292
100 444 100 343 858 444 858 343 562 100 461 100 662 100 0 0 0 0 0 0
#SYMM x, y
#RADIUS C 0.68 H 0.23 I 1.40 S 1.02
#TOLER 0.40
#ATOM
I1 I1 0.0000 0.5000 0.5000
I2 I2 0.3100,2 0.5758,1 0.4936,1
I3 I3 0.5000 0.0000 0.5000
I4 I4 0.8102,1 -0.0756,1 0.5124,1
S1 S1 0.8918,2 -0.0600,2 0.1135,1
S2 S2 0.7017,2 0.1453,2 0.0663,1
S3 S3 0.7229,2 0.2887,1 0.2208,1
......................... data omitted for brevity
S15\C19 S15 C19 1.828,7
S16\C18 S16 C18 1.760,7
C19\C20 C19 C20 1.503,9
#CRYCON 2 0 4 0 21 21 2 22 26 26 27 28 31 31 32 33 36 36 37 38 26 5 6 7 8 0 9 10 11 12 57 13 14 15 16 70 17 18 19 20
24 24 25 25 29 29 30 30 34 34 35 35 39 40 1 3 58 60 57 62 59 64 61 65 63 64 64 65 65 71 73 70 75 72 77 74 78 76 77 78
22 23 24 25 27 28 29 30 32 33 34 35 37 38 39 40 60 61 73 74
#FLAGS err 0 pol 0 val 1 sb 0 bca 0 bcb 0
#END

APPENDIX 2

title glycyl-l-leucine C8H16N2O3 P21 : specify the page header
compid glycyl : specify the file names
STARTX : load symm & cell data
cell 6.369 5.565 15.350 90 102.77 90 : specify the cell
cellsd 0.005 0.005 0.010 0 0.04 0 : specify the cell errors
sgname P 2yl : specify the space group symmetry
celcon C 16 : specify the carbons in the cell
celcon H 32 : specify the hydrogens in the cell
celcon N 4 : specify the nitrogens in the cell
celcon O 6 : specify the oxygens in the cell
ADDREF : convert intensities to |F|s
reduce itof rlp3 : specify I-to-F and Lp function
hklin skip hkl irel sigi rcod : specify hkl data to be processed
hkl glycyl 0 0 2 98695 453 1
hkl glycyl 0 0 4 864 22 1
....................... reflection data omitted for brevity
hkl glycyl 8 7 18 24 34 2
hkl glycyl 8 7 19 33 28 2
ADDATM : add atomic parameters to the bdf
c1 .5673 .9521 -.0362 .045
c2 .5285 .8792 -.1064 .042
....................... atom data omitted for brevity
o5 .7819 .5346 .4362 .033
o6 .8793 .4682 .5562 .035
CRYLSQ ws is cy 4 : refine peak sites from PIG
PIG : check refined atom sites
CRYLSQ ws an cy 4 : refine atom sites anisotropically
FOURR diff : calculate difference map
PEKPIK : pick the top peaks from diff map
PIG : select H atom sites using graphics
peaks prad 0.7 : specify peak site bond radius
REGFE : calc esd's with correlation matrix
BONDLA cdis dang : calc all dist & angle data
LSQPL : calc least squares plane
plane : specify plane calculation
define C3 C4 C7 : specify which sites define the plane
LISTFC s m f : list hkl and F data for publication
ATABLE : list the atom parameters as tables
PIG : use graphics to orient molecule
ORTEP : generate ellipsoid plots
exec mole : specify plot single molecule only
sphere C .2 H .1 N .2 O .25 : specify sphere radii for cell plot
PLOTX laser : plot both frames on hi res laser
CIFIO : input and output CIF archive/publication

APPENDIX 3

TITLE REFORM BARNEY 3/03/85 PAGE 1/ 2
*EOS
CELL DIMENSIONS A B C ALPHA BETA GAMMA
19.6612 8.9988 13.8816 90.0000 99.3561 90.0000
ERRS .0088 .0027 .0074 .0000 .0406 .0000
*EOS
SPACE GROUP NOTATION NROT NRSZ FMUL
CC 2. 2. 4.

*EOS
SYMMETRY R11 2 3 T1 R21 2 3 T2 R31 2 3 T3
1 0 0 .0000000 0 1 0 .0000000 0 0 1 .0000000
1 0 0 .0000000 0-1 0 .0000000 0 0 1 .5000000
*EOS
FORM FACTOR ST/L F+FD FDD
C .0000 6.0169 .0000
C .0200 5.9758 .0000
C .0400 5.8557 .0000
........... form factors omitted for brevity
H 1.9600 .0015 .0000
H 1.9800 .0015 .0000
*EOS
CONDITIONS 1 LAMBDA TEMP(K) SCALE LIN ABS OBS DENS CALC DENS
1.541800 293.00 .518931 .0000 1.1000
*EOS
ATOM NAME X B11 Y B22 Z B33 U B12 P B13 B23 MUL FFN ITF
ATCO O1 .47539 .85308 -.13813 .00000 1.00000 8 3 2
BETA O1 .00425 .01714 .00728 -.00106 .00294 .00187
ATCO O2 .50519 .65074 .13030 .00000 1.00000 8 3 2
BETA O2 .00149 .01131 .00718 .00017 -.00013 -.00064
.............. atom data omitted for brevity
ATCE H31 .00000 .00000 .00000 .54772 .00000
BETE H31 .00000 .00000 .00000 .00000 .00000 .00000
*EOS
HKL H K L ST/L TBAR RC ML E P S FREL SIGFR FCAL
XTAL 0 0 2 .0730 .3000 1 1 4 1 1 222.79 1.01 129.76
XTAL 0 0 6 .2190 .3000 1 1 4 1 1 15.74 .49 8.98
................. reflection data omitted for brevity
XTAL 22 2 -2 .5707 .3000 1 2 2 1 1 31.94 .70 16.09
XTAL 22 2 -3 .5707 .3000 1 2 2 1 1 16.83 .74 9.01
*EOS
END

APPENDIX 4

data_manuscript
_manuscript_summary
; This is some dummy text to show how a multiple data-block STAR file works.
;

data_crystal_structure
_chemical_formula 'C13 H12 O5'
_chemical_name
; 3-(2,5-dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic acid
;
_publication_title
; Structure of WF-3681,
  3-(2,5-Dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic Acid.
;
loop_
_publication_author_name
_publication_author_address
"O'Connell, Barry"
; Department of Chemistry
  University of Kalamazoo
  Michigan U.S.A.
;
'Clark, Joan I.'
; University of Washington
  Seattle WA 98195
  U.S.A.
;
_cell_a 18.757(8)
_cell_b 7.282(2)
_cell_c 17.511(8)
_cell_alpha 90
_cell_beta 91.20(3)
_cell_gamma 90
_cell_volume 2391(3)
_symmetry_space_group '-C 2yc'
loop_
_symmetry_pos_in_xyz
'x,y,z' '-x,-y,-z' '-x,y,1/2-z' 'x,-y,1/2+z'
'1/2+x,1/2+y,z' '1/2-x,1/2-y,-z' '1/2-x,1/2+y,1/2-z' '1/2+x,1/2-y,1/2+z'
_exptl_radiation_wave_length 1.54179
loop_
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_exptl_crystal_face_distance
0 0 -1 0.012
0 0 1 0.012
1 0 0 0.023
-1 0 0 0.023
loop_
_atom_site_label
_atom_site_x/a
_atom_site_y/b
_atom_site_z/c
_atom_site_U_iso
# atom x/a y/b z/c U
C1 .6237(1) -.2055(4) -.3119(2) .053
C2 .6022(2) -.2468(6) -.2322(2) .059
O5' .7504(1) .0454(3) .0417(1) .056

APPENDIX 5

star_arc_qtest.arc
star_out_qtest.out
data_manuscript
_manuscript_summary
data_crystal_structure
_chemical_name
_publication_title
_publication_author_name
_publication_author_address
_cell_a
_cell_b
_cell_c
_cell_alpha
_cell_beta
_cell_gamma
_chemical_name
_symmetry_space_group
_symmetry_pos_in_XYZ
_atom_site_label
_atom_site_x/a
_atom_site_y/b
_atom_site_z/c
_atom_site_U_iso
_atom_site_label
_exptl_radiation_wave_length
_exptl_radiation_type
_exptl_crystal_face_distance
_exptl_dummy
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_atom_site_label
_atom_site_U_iso
_publication_author_name
data_manuscript
_manuscript_summary

APPENDIX 6A

data_manuscript
_manuscript_summary
; This is some dummy text to show how a multiple data-block STAR file works!
;
# -----end-of-data-block-----
data_crystal_structure
_chemical_name
; 3-(2,5-dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic acid
;
_publication_title
; Structure of WF-3681,
  3-(2,5-Dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic Acid.
;
loop_
_publication_author_name
_publication_author_address
"O'Connell, Barry"
; Department of Chemistry
  University of Kalamazoo
  Michigan U.S.A.
;
'Clark, Joan I.'
; University of Washington
  Seattle WA 98195
  U.S.A.
;
_cell_a 18.757(8)
_cell_b 7.282(2)
_cell_c 17.511(8)
_cell_alpha 90
_cell_beta 91.20(3)
_cell_gamma 90
_chemical_name
; 3-(2,5-dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic acid
;
_symmetry_space_group '-C 2yc'
loop_
_symmetry_pos_in_xyz
'x,y,z' '-x,-y,-z' '-x,y,1/2-z' 'x,-y,1/2+z'
'1/2+x,1/2+y,z' '1/2-x,1/2-y,-z' '1/2-x,1/2+y,1/2-z' '1/2+x,1/2-y,1/2+z'

APPENDIX 6B

loop_
_atom_site_label
_atom_site_x/a
_atom_site_y/b
_atom_site_z/c
_atom_site_u_iso
_atom_site_label
C1 .6237(1) -.2055(4) -.3119(2) .053 C1
C2 .6022(2) -.2468(6) -.2322(2) .059 C2
O5' .7504(1) .0454(3) .0417(1) .056 O5'
_exptl_radiation_wave_length 1.54179
_exptl_radiation_type ?? # requested item not present
loop_
_exptl_crystal_face_distance
_exptl_dummy # ?? requested item not present
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
0.012 ?? 0 0 -1
0.012 ?? 0 0 1
0.023 ?? 1 0 0
0.023 ?? -1 0 0
loop_
_atom_site_label
_atom_site_u_iso
C1 .053
C2 .059
O5' .056
loop_
_publication_author_name
"O'Connell, Barry"
'Clark, Joan I.'
# -----end-of-data-block-----
data_manuscript
_manuscript_summary
; This is some dummy text to show how a multiple data-block STAR file works!
;
# -----end-of-data-block-----

Claims

1. A method of structuring or storing data within a file comprising the following steps: (i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
2. A method of structuring or storing data within a file according to claim 1, wherein the first common feature is the text string 'data_', and the members of the first set are of the form 'data_blockcode' where 'blockcode' is a unique block code in each case.
3. A method of structuring or storing data within a file according to claim 1 or 2, wherein the second common feature is an underline '_', and the members of the second set are of the form '_name' where 'name' is a respective data name.
4. A method of structuring or storing data within a file according to any preceding claim, wherein the file is readable as text in addition to being machine readable.
5. A method of structuring or storing data within a file according to claim 4, wherein each text line contains up to a pre-set maximum number of visible ASCII characters.
6. A method of structuring or storing data within a file according to any preceding claim, wherein each data item is directly preceded by the respective data name.
7. A method of structuring or storing data within a file according to any of claims 1 to 5, wherein a plurality of data names in a group are followed by a like plurality of data items repeated a desired number of times.
8. A method of structuring or storing data within a file according to any preceding claim, wherein the data handled is crystallographic data.
9. A method of structuring or storing data within a file substantially as described herein with reference to Appendix 4.
10. Means for structuring or storing data within a file comprising: (i) means for arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) means for arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
11. A data file comprising a plurality of data blocks, each preceded by a respective data block code, and, within each block, a plurality of data items each preceded by a respective data name, wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
12. A data file according to claim 11, wherein the file relates to crystallographic data.
13. A data file substantially as described herein with reference to Appendix 4.
14. A method of retrieving data from a file according to claim 11 or 12, comprising listing the requested data items and outputting the requested data items in the order requested, the output file having the same format as the accessed file.
15. A method of retrieving data from a file substantially as described herein with reference to Appendices 5 and 6.