WO2000041097A1

WO2000041097A1 - Method for generating a file identifier by performing a hash function on the content of a file

Info

Publication number: WO2000041097A1
Application number: PCT/EP1999/010089
Authority: WO
Inventors: Alexander Kolbeck; Bernhard Seen; Thomas Frey; Martin Merck; Thomas Stocker
Original assignee: Giesecke & Devrient Gmbh
Priority date: 1998-12-30
Filing date: 1999-12-17
Publication date: 2000-07-13
Also published as: DE19860803A1; AU2284200A; EP1064607A1

Abstract

The invention relates to a method for generating a file identifier for a file to be written into the memory of a processor integrated circuit. According to the invention, at least one part (15) of the file identifier (13) is generated by performing an unequivocal mathematical operation on a predetermined quantity of the code characters (16) of the data code contained in the file (12). Said operation generates a code sequence (23) of a predetermined number of characters for said quantity of code characters. Preferably, the file identifier additionally comprises one part (14) which contains information on the content of the file.

Description

METHOD FOR CREATING A FILE IDENTIFIER BY APPLYING A HASH FUNCTION TO THE FILE'S CONTENT

The invention proceeds from a method of the type defined in the main claim. Such a method can be found in the DIN standard EN 27816-5, which describes possible designs of application identifiers, each of which characterizes an application on a chip card. According to a design that is important in practice, an application identifier consists of two sections: the first contains a registered identifier of an application provider, and the second an extension that can be freely defined by a user. The use of a registered identifier helps to prevent different providers from assigning identical application identifiers. To avoid duplicate file identifiers, however, the identifier components assigned by the user must also be managed. For example, to designate a further development of an existing application systematically, it must be determined whether, and which, other further developments of the same application already exist. Such administration is complex and error-prone. If a file identifier has been assigned twice, it is usually not immediately apparent whether the identically named files also match functionally.

It is therefore an object of the invention to provide a method which simplifies the testing of a number of files for functional correspondence with a comparison file.

This object is achieved by a method with the features of the main claim. The proposed method is based on the concept of not leaving the choice of a file identifier to the manufacturer of a file alone, but of deriving at least part of the identifier from the data code contained in the file. The method according to the invention provides a file identifier that is unique for each file and is permanently linked to the file. It is no longer necessary to manage file identifiers in order to avoid double assignment of individual identifiers. A particular advantage of the method is that third parties can also use it to create files for integration into existing applications without the risk of an identifier collision. Because databases can easily be checked for existing files of the same type, file manufacturers are helped to avoid gaps or duplicates in version numbering. The method also offers the advantage that the file identifier can be used to check the content of a file for correctness and completeness: every data element corrupted, for example by a faulty transmission, leads to a new file identifier that no longer matches the original. In an advantageous development, the file identifier comprises a number of independent partial identifiers. This allows a file to be selected according to several criteria and thus supports file management. A file identifier is advantageously generated by applying a hash function to the data elements of a file.

A particularly suitable use of the proposed method is an arrangement with the features of independent claim 6. Use in a data carrier with a circuit containing a processor is particularly advantageous because, once the data carrier has been handed over to an end user, errors can generally no longer be remedied. Reliably avoiding multiple storage of identical files ensures the best possible memory utilization. An embodiment of the invention is explained below with reference to the drawing. The figures show:

FIG. 1 a program structure with several applications,

FIG. 2 the structure and interaction of several applications,

FIG. 3 the structure of a file identifier,

FIG. 4 an illustration of the mode of operation of a file identifier-forming function, and

FIG. 5 the structure of a file with a file identifier comprising several partial identifiers.

FIG. 1 illustrates, in a highly simplified manner, the structure of a circuit having a processor with an operating system 20 and an associated memory 21. Arrangements with the structure shown are implemented in chip cards, for example. A plurality of programs 10 implementing technical applications, referred to below as applications, can be located in the memory 21 at the same time. Possible applications 10 in the case of a chip card are, for example, user authentication for account transactions or a wallet function. The applications 10 are typically present in code suitable for interpretation by an interpreter unit or in machine code that can be executed directly. Each application 10 contains a code sequence 11 serving as an application identifier, by means of which it can be selected by a calling entity via the operating system 20. For this purpose, the application identifier 11 is also known to the calling entity. The application identifier 11 is specified by the manufacturer of an application 10; to facilitate administration, it is common to use code components that point to the content of the application 10. As a rule, it is unique to the associated application.

FIG. 2 illustrates the substructure of an application 10. It is divided into a number of files 12, referred to below as package files. Each of these is uniquely identified by a file identifier 13; identical identifiers for two package files 12 within an application 10 are not permitted. Each package file 12 generally implements a single function, such as controlling the transmission of data to an external unit or managing certain similar data. The package files 12 each contain a data code consisting of code characters 16 that are either directly executable or to be interpreted, as indicated by dashes on the right in FIG. 2. An application 10 can comprise one or, as indicated on the left in FIG. 2, several package files 12. Certain package files 12 can be part of different applications 10 in the same form. To reduce the memory requirement, such package files 12 are not inserted into each application 10; instead, a package file 12 that is present exactly once is cross-called by all applications 10 that use it. Such a cross-call of a file in a first application by a second application is indicated in FIG. 2 by an arrow.

To enable reliable cross-calling, the package files 12 are uniquely identified not only within an application 10 but also across all applications in the entire memory space 21. FIG. 3 shows the structure of a suitable unique file identifier 13. Following the DIN standard EN 27816-5, it is divided into two sections, of which the first, 14, contains a characteristic provider code sequence that can be assigned multiple times. According to the standard, it has a length of five bytes and is preferably registered centrally, in particular by the authorities. The second file identifier section 15 contains a user code sequence which can be freely defined by a manufacturer or user. It is unique to the associated package file and, according to the standard, has a length of up to 11 bytes.
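
The two-section layout described above can be illustrated with a minimal sketch. The function names and the sample byte values are hypothetical and are not taken from the standard or from the invention; only the section lengths (five bytes, up to 11 bytes) come from the text.

```python
PROVIDER_LEN = 5   # provider code sequence 14: five bytes, per the text
MAX_USER_LEN = 11  # user code sequence 15: up to 11 bytes, per the text


def compose_file_identifier(provider: bytes, user: bytes) -> bytes:
    """Concatenate provider section 14 and user section 15 into identifier 13."""
    if len(provider) != PROVIDER_LEN:
        raise ValueError("provider code sequence must be exactly 5 bytes")
    if len(user) > MAX_USER_LEN:
        raise ValueError("user code sequence must be at most 11 bytes")
    return provider + user


def split_file_identifier(identifier: bytes) -> tuple[bytes, bytes]:
    """Split identifier 13 back into (provider section 14, user section 15)."""
    if not PROVIDER_LEN <= len(identifier) <= PROVIDER_LEN + MAX_USER_LEN:
        raise ValueError("identifier must be 5 to 16 bytes long")
    return identifier[:PROVIDER_LEN], identifier[PROVIDER_LEN:]


if __name__ == "__main__":
    provider = bytes.fromhex("A000000001")  # made-up provider value
    identifier = compose_file_identifier(provider, b"\x01\x02\x03")
    print(identifier.hex())
```

Because the provider section has a fixed length, the identifier can be split again without a separator or length field.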

The freely definable user code sequence 15 obtains its uniqueness from the application of a predetermined, file identifier-forming mathematical operation to the package file 12 to be designated; the operation generates a code sequence of a certain length from the code characters 16 contained therein. Operations of this kind are known in particular as "hash" functions. Hash functions are used in particular for producing digital signatures and are described in detail in the specialist literature, e.g. in the Handbook of Applied Cryptography, Menezes, van Oorschot, Vanstone, CRC Press 1996, Chapter 9. It is characteristic of hash functions that they irreversibly convert a character string of indefinite length into a character string of fixed length. According to the invention, such a hash function is applied to the code characters 16 contained in a package file 12. The semantic content of the character string is not taken into account.

FIG. 4 illustrates the effect of a hash function. By applying the hash function 22, a code sequence 23 of defined length, having a predetermined number h of characters, with h = 0, 1, 2, ..., N, is formed from the data code of a package file 12 comprising n code characters 16, with n = 1, 2, ..., N. The code sequence 23 is determined only by the characters 16 contained in a package file 12 and uniquely identifies the package file 12. To prevent two different package files 12 from yielding the same code sequence 23 by chance, its length, i.e. the number of characters h, is chosen so that the random generation of identical code sequences 23 from different package files 12 is excluded with sufficient certainty. Suitable hash functions for the intended use can be found in the specialist literature, including the above-mentioned reference, and are therefore not described further here. The code sequence 23 formed by means of the hash function can be used directly as the user code sequence 15 in the file identifier 13.
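
The mapping of FIG. 4 can be sketched as follows. The text names no specific hash function; SHA-256 from Python's standard library and truncation of the digest to h = 11 bytes are assumptions made here purely for illustration, as are the function names. The second function gives a simple union bound on the chance of an accidental match, assuming the hash output behaves like a uniformly random value.

```python
import hashlib

USER_LEN = 11  # h: bytes kept for code sequence 23 (assumed, matches the 11-byte limit)


def code_sequence(data_code: bytes, length: int = USER_LEN) -> bytes:
    """Map data code of any length n to a code sequence of fixed length h."""
    digest = hashlib.sha256(data_code).digest()  # SHA-256 is an assumed choice
    return digest[:length]                       # truncation is an assumed choice


def collision_bound(num_files: int, length: int = USER_LEN) -> float:
    """Upper bound on the chance that any two of num_files share a code sequence.

    Union bound over all pairs, assuming uniformly distributed hash outputs.
    """
    pairs = num_files * (num_files - 1) / 2
    return pairs / 2 ** (8 * length)


if __name__ == "__main__":
    print(code_sequence(b"example package data code").hex())
    print(collision_bound(1_000_000))
```

The bound shrinks by a factor of 256 for every additional byte of h, which is the sense in which the length can be chosen so that accidental matches are excluded with sufficient certainty.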

The entire data code of a package file 12 is expediently subjected to the hash function to form the user code sequence 15. However, it is also possible to use only a section of the data code, delimited by its length, its position, a fixed character or in a similar manner. It can be particularly useful to use a user code sequence 15 consisting of several partial identifiers, each assigned to specific file contents. An example is indicated in FIG. 5. Here the file identifier 13 comprises a user code sequence 15 formed from a total of three partial identifiers 17, 18, 19. The first and the third, 17 and 19, relate to areas of the data code forming the package file 12 that are separate from the code that realizes the application function. The example in FIG. 5 is based on a structuring of the data code of a package file 12 that is frequently found in practice: an interface part 24, which describes the behavior of an implementation brought about by a data code, and an actual implementation part 25. If, for example, a package file 12 is used to encrypt data, the type of the expected parameters and the format of the returned data could be stored in the interface part 24, and the coding of the encryption function itself in the implementation part 25. In the example according to FIG. 5, each of the two file code components 24, 25 is assigned a separate partial identifier using the method described with reference to FIGS. 3 and 4: the interface part 24 an interface identifier 17, and the implementation part 25 a content identifier 18. The third partial identifier 19 results from the name assigned to the file by the manufacturer, the name of the manufacturer and/or the name of the type of application. The use of several partial identifiers 17, 18, 19 increases the number of options for selecting a package file 12. In the example according to FIG. 5, for instance, whole groups of package files 12 of the same type can be selected simultaneously via the interface identifier 17, or addressed individually via the content identifier 18. If a specific package file 12 is not available, an alternative that shows at least the same basic behavior can easily be found.
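
The selection by partial identifiers described above might look like the following sketch. Everything concrete here is an assumption for illustration: SHA-256 as the hash, four bytes per hash-derived partial identifier, a three-byte plain label standing in for partial identifier 19, and all class and function names.

```python
import hashlib
from dataclasses import dataclass

PART_LEN = 4  # assumed length of each hash-derived partial identifier, in bytes


def partial_identifier(section: bytes, length: int = PART_LEN) -> bytes:
    """Hash one section of the data code (SHA-256 and truncation are assumed)."""
    return hashlib.sha256(section).digest()[:length]


@dataclass(frozen=True)
class PackageFile:
    interface_part: bytes       # interface part 24
    implementation_part: bytes  # implementation part 25
    label: bytes                # basis of partial identifier 19 (name-derived)

    @property
    def interface_id(self) -> bytes:   # interface identifier 17
        return partial_identifier(self.interface_part)

    @property
    def content_id(self) -> bytes:     # content identifier 18
        return partial_identifier(self.implementation_part)

    @property
    def user_code_sequence(self) -> bytes:  # user code sequence 15 = 17 + 18 + 19
        return self.interface_id + self.content_id + self.label


def select_by_interface(files, interface_id):
    """Select the whole group of package files that share one interface."""
    return [f for f in files if f.interface_id == interface_id]


def find_alternative(files, interface_id, content_id):
    """Return the exact file if present, else any file with the same interface."""
    candidates = select_by_interface(files, interface_id)
    for f in candidates:
        if f.content_id == content_id:
            return f
    return candidates[0] if candidates else None
```

With a three-byte label, the combined user code sequence is 4 + 4 + 3 = 11 bytes and so fits the length limit mentioned for the two-section identifier; other splits are equally possible.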

As an alternative or in addition to the mathematically derived, possibly multi-part file identifier 13 described above, a package file 12 can also carry, for referencing purposes, a file name 26 which is stored externally in plain text in the usual way. Carrying conventional file names can be particularly useful to make package files 12 easily recognizable, or to introduce a further feature that improves the referencing of a package file.

It is common practice to document the creation of package files over time by version numbers, i.e. by an ascending sequence of integers beginning with 1. Such version numbers are expediently also attached in plain text, like file names 26, to file identifiers formed according to the invention. The file identifier formation described above allows the assignment of version numbers to be automated. If a further package file is to be added to a database that may already contain package files of the same type, a file identifier 13 is first formed as described above which, in addition to at least one partial identifier 17, 18, contains a partial identifier 19 that identifies the manufacturer, the type of application and/or another file group characteristic. The entire file identifier 13 is then checked for agreement with the file identifiers of the package files already stored in the database. If a complete match is found, the package file to be added receives the same version number 26 as the match found. If a match is found only with regard to the specified partial identifiers 17, 19 but not with regard to the content identifier 18, the new package file is given the version number following the one assigned to the matching file found.
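
The automated version assignment can be sketched as below. The data structure and function names are hypothetical. Two details are assumptions that the text leaves open: when several stored files match on partial identifiers 17 and 19, the highest existing version number is incremented, and a file with no match at all starts at version 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileIdentifier:
    interface_id: bytes  # partial identifier 17
    content_id: bytes    # partial identifier 18 (content identifier)
    group_id: bytes      # partial identifier 19 (manufacturer, application type, ...)


def assign_version(new_id: FileIdentifier, database: dict) -> int:
    """Return the version number for new_id and record it in the database.

    database maps FileIdentifier -> version number of files already stored.
    """
    # Complete match: the file is already known and keeps its version number.
    if new_id in database:
        return database[new_id]

    # Match on 17 and 19 only: same kind of file, different content.
    related = [
        version
        for ident, version in database.items()
        if ident.interface_id == new_id.interface_id
        and ident.group_id == new_id.group_id
    ]
    # Assumption: continue after the highest related version, or start at 1.
    version = max(related) + 1 if related else 1
    database[new_id] = version
    return version
```

For example, adding the same identifier twice yields version 1 both times, while a file that differs only in its content identifier receives version 2.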

The proposed method is suitable for determining file identifiers 13 within the framework of the DIN standard EN 27816-5. However, it is not restricted to this application. The application identifier 11 for the applications 10 can likewise be obtained in the same way. File identifiers 13 formed according to the invention do not necessarily have to have a two-part basic structure with provider code sequence 14 and user code sequence 15; they can also consist entirely, for example, of a code sequence obtained by applying a hash function. Furthermore, the length of the file identifier 13 is not fixed at 16 bytes; any length is possible. The same applies analogously to the length of the user code sequence. Instead of the two partial identifiers 17, 18 shown in FIG. 5, more partial identifiers can also be provided.

The proposed concept is particularly suitable for mobile data carriers in the form of chip cards because it enables good use of memory, which is very limited there owing to the size of the device. Multiple storage of identical package files in different applications should be avoided here in particular. In the case of chip cards, the hash function can be implemented in the integrated circuit of the card. Newly loaded package files 12 are then provided with file identifiers 13 only during the loading process. By comparing the file identifier 13 determined during loading with those already present on the card, it can be checked easily and reliably whether a package file 12 already exists on the card. Package files 12 that already exist are not loaded again. With the aid of the file identifier formed by means of a hash function, any errors present in the data code of a package file 12 can also be made directly visible in a simple manner. The proposed method thus opens up the possibility for a provider to allow a user to bring his own package files 12 into the memory 21 of a data carrier without fear of double storage or unintended overwriting.
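
The loading process on the card can be illustrated with a small model. This is a sketch under stated assumptions, not the card implementation: the class name, SHA-256 as the hash, the 11-byte truncation and the sample provider value are all hypothetical, and a Python dictionary stands in for the memory 21.

```python
import hashlib
from typing import Optional


class CardMemory:
    """Toy model of memory 21 that computes file identifiers while loading."""

    def __init__(self, provider: bytes):
        self.provider = provider             # provider code sequence 14
        self.files: dict[bytes, bytes] = {}  # file identifier 13 -> data code

    def identifier_for(self, data_code: bytes) -> bytes:
        # SHA-256 truncated to 11 bytes is an assumed choice of hash function.
        return self.provider + hashlib.sha256(data_code).digest()[:11]

    def load(self, data_code: bytes,
             expected_id: Optional[bytes] = None) -> tuple[bytes, bool]:
        """Load a package file; return (identifier, whether it was newly stored)."""
        identifier = self.identifier_for(data_code)
        # Integrity check: corrupted data code yields a different identifier.
        if expected_id is not None and identifier != expected_id:
            raise ValueError("data code does not match its file identifier")
        # Duplicate check: a file that is already present is not stored again.
        if identifier in self.files:
            return identifier, False
        self.files[identifier] = data_code
        return identifier, True


if __name__ == "__main__":
    card = CardMemory(bytes.fromhex("A000000001"))  # made-up provider value
    ident, stored = card.load(b"package data code")
    print(ident.hex(), stored)
    print(card.load(b"package data code")[1])  # second load is skipped
```

Because the identifier depends only on the data code, two applications that bring the same package file end up sharing one stored copy, and a file cannot overwrite a different file under the same identifier except in the case of a hash collision.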

Claims

1. Method for generating a file identifier for a file to be written into the memory of a circuit containing a processor, characterized in that at least a part of the file identifier (13) is derived by applying a unique mathematical operation to code characters (16) of the data code contained in the file (12).

2. Method according to claim 1, characterized in that the unique mathematical operation is irreversible.

3. The method according to claim 2, characterized by the following steps:

- Determining a mathematical operation which irreversibly generates a code sequence with a predetermined number of characters for a set of code characters,

- Selecting at least part of the characters (16) of the data code of a file (12),

- Applying the mathematical operation (22) to the selected characters (16) to generate a first code sequence (23),

- Accepting the first code sequence (23) at a designated position (15) in the file identifier (13).

4. The method according to claim 3, characterized by the following steps:

- Specifying, by a user, a second code sequence (14) of a certain length,

- Merging the second code sequence (14) with the first code sequence (23) into a file identifier (13).

5. The method according to claim 2, characterized in that the predetermined, irreversible mathematical operation (22) is a hash function.

6. A data carrier with a circuit containing a processor, which has a memory in which there is an application program executable by the processor, the program being at least partially constructed in the form of files which are stored in the memory under respectively assigned, individual file identifiers, characterized in that the file identifier (13) of at least one file (12) contains a component (15) which results from the application of a unique mathematical operation to the code characters (16) contained in the file (12).

7. Data carrier according to claim 6, characterized in that the file identifier (13) contains a second component in the form of a code sequence (14) of certain length, which is determined by the manufacturer of the file (12).

8. A data carrier according to claim 6, characterized in that the file identifier (13) contains at least two partial identifiers (17, 18), each of which is formed by applying a unique mathematical operation to different areas of the data code contained in the file (12).

9. Data carrier according to claim 7, characterized in that the partial identifiers (17, 18) each correspond to data code areas (24, 25) which perform separate sub-functions within the application (10) implemented by the file (12).