EP1125221A1

EP1125221A1 - Method and apparatus for improving code execution performance by using parsed html

Info

Publication number: EP1125221A1
Application number: EP99916169A
Authority: EP
Inventors: Michael Giroux
Original assignee: Bull HN Information Systems Inc
Current assignee: Bull HN Information Systems Inc
Priority date: 1998-05-13
Filing date: 1999-03-26
Publication date: 2001-08-22
Also published as: EP1125221A4; WO1999059087A1

Abstract

An interpreter (42) first parses source HyperText Markup Language (HTML) scripts (40) for active or dynamic Web pages into static and dynamic sequences. These static and dynamic sequences of HTML are written into variable length records in a parsed HTML file (44), with each such record consisting entirely of static or of dynamic HTML commands. A database server (46) in response to a query from a Web browser on a client computer (48) reads the parsed HTML (44), transmitting (47) static HTML records without interpretation to the browser (48), while interpreting the dynamic HTML records and performing required database (45) actions before transmitting (47) the results to the browser (48).

Description

METHOD AND APPARATUS FOR IMPROVING CODE EXECUTION PERFORMANCE BY USING PARSED HTML

Field of the Invention

The present invention generally relates to improving performance of code execution of a Web server, and more specifically to improving code execution performance in a client/server environment by using parsed HTML source code.

Background of the Invention

A World Wide Web (WWW) site typically consists of a collection of HyperText Markup Language (HTML) documents. HTML is a text language that provides for hyper-linked graphic display. A user of a Web browser utilizing the World Wide Web (WWW) typically requests that a Web server download HTML text to his Web browser. The Web browser interprets the downloaded HTML text and generates screen images from the HTML text. The HTML text invariably describes hyper-linked hot spots, that cause further downloads when selected.

HTML documents are typically generated by text editors or by more specialized HTML document editors and are stored as text files in directories on Web servers. By convention, HTML text files have file names that include extensions of either ".HTML" or ".HTM". Web servers recognize these file extensions and tread such documents as static byte streams. Static HTML files are typically transmitted verbatim by Web servers to Web browsers, and are thus fairly efficient for the Web servers to process.

One recent application that has gained popularity on the World Wide

Web is database access. HTML provides an efficient, flexible method of providing a sophisticated user interface to databases. Many WWW access requests ultimately are translated into database accesses. Part of the flexibility of using HTML for this type of application is that user interface changes tend to be reasonably easy and do not require the significant programming resources that were required by earlier generations of database interfaces.

To support dynamic or customized Web content, modern commercial

Web servers typically recognize additional file types. For example, Microsoft recognizes files with file extensions of ".ASP" indicating that the file is an "Active Server Page" containing various fields that must be analyzed by a Web server. In some instances, these various active or dynamic fields are embedded database requests.

Unfortunately, one problem that frequently arises when utilizing active or dynamic server pages is that having a Web server interpret each HTML command in a dynamic HTML file typically requires a significant amount of computer resources. The present solution is to scale the server up by replicating the database and database servers to the extent necessary to provide required levels of service. Currently, some applications are implemented with databases and database servers replicated upwards of thirty times, with access to the replicated servers provided by sophisticated high speed load leveling routers.

This approach works reasonably well for fairly small databases since the amount of data that needs to be replicated for each database server is fairly small. However, this solution does not scale well. In particular, this approach is currently totally infeasible for enterprise level databases consisting of terabytes of data. One reason for this infeasibility is that large companies often have a hard enough time keeping online access to a single copy of their enterprise level databases, given the size of these databases. ' Replicating the databases even a couple of times is not feasible. To this should be added the potential for concurrency problems between database copies that arise any time there is multiple database copies in use and the difficulty of supporting online update for replicated data.

There is therefore a need to provide an efficient mechanism for Web servers to process HTML files containing active or dynamic HTML commands. This is particularly important when the dynamic HTML commands provide access to enterprise level databases. Any such mechanism should reduce the overhead for interpreting active HTML scripts in these applications.

Brief Description of the Drawings

The features and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying FIGURES where like numerals refer to like and corresponding parts and in which:

FIG. 1 is a block diagram of a data processing system, in accordance with the present invention;

FIG. 2 is a flowchart illustrating construction of parsed HTML file by a parser interpreting an original HTML file, as shown in FIG. 1 ;

FIG. 3 is a flowchart illustrating operation of a database server interpreting parsed HTML, as shown in FIG. 1 ;

FIG. 4 is a sample sequences of original HTML;

FIG. 5 is a sample sequence of parsed HTML (PHTML) generated from the sample HTML in FIG. 4; and

FIG. 6 is a block diagram illustrating a General Purpose Computer.

Detailed Description

An interpreter first parses HyperText Markup Language (HTML) scripts for active or dynamic Web pages into static and dynamic (or active) portions. The static HTML is typically transmitted without change to Web browsers in response to WWW requests from the browsers. Dynamic or active HTML commands on the other hand provide database access commands to the database servers. In response to these dynamic HTML commands, the database server will typically perform database queries, and

-3- report the result of these queries to the requesting browser. Static and dynamic sequences of HTML are written into variable length records in a parsed HTML file. Each variable length record will preferably consist entirely of static HTML commands, or entirely of dynamic HTML commands. Preferably, the dynamic HTML records and static HTML records alternate in the parsed HTML file in most instances. A database server in response to a query then reads the parsed HTML. Static HTML records are transmitted without interpretation to the Web browser, while the dynamic HTML records are interpreted, causing database queries to be performed, with the results being transmitted to the Web browser. One advantage of this mechanism is that it is not necessary for a database server to ever interpret the static HTML text. This results in significant savings of computing resources.

FIG. 1 is a block diagram of a data processing system, in accordance with the present invention. Original HTML text 40 stored on disk is parsed by a parser 42 into parsed HTML (PHTML) text 44, again stored in a Non- Volatile Storage Medium. The parser sequentially interprets the original HTML text 40. Sequences of static HTML commands are collected into static HTML records, and sequences of active or dynamic HTML commands are collected into dynamic HTML records. These variable length static and dynamic HTML records are then written to the parsed HTML (PHTML) text 44 file stored in a Non-Volatile Storage Medium.

A web server computer 46, then communicates over a communications line 47 with Web browsers, executing on client computers 48. In accessing enterprise level databases, the web server 46 is typically a mainframe computer. Presently, client computers 48 are typically personal computers (PCs). However, it is envisioned that clients 48 will take other forms in the future. The Web server computer 46 is utilized to provide database access to an enterprise level database 45. A Web server 46 database program sequentially reads the parsed HTML (PHTML) text 44 in response to a Web request. The contents of static HTML records are transmitted without modification to the client computer 48. Dynamic HTML records contain dynamic HTML commands. These dynamic HTML commands provide database query commands, which in turn result in

-4- database accesses to the database 45. The results of the database 45 search are then transmitted from the server computer 46 to the client computer 48 executing the Web browser.

A number of different formats of parsed HTML (PHTML) files 44 are envisioned by the current invention. A first file format utilizes records where a record length is followed by the record contents. The record length may be in ASCII, EBCDIC, or binary, depending on the characteristics of the database computer. In one preferred embodiment, a four-digit ASCII length precedes each record. This conforms to HTTP 1.1 chunk encoding and provides for record lengths of up-to 9999 bytes in length. File systems in most computer architectures automatically identify the end-of-file. For those file systems that do not provide this capability, a zero length record can be used to identify the end of the file. A zero length record can also be transmitted from the server 46 to the client 48 when utilizing HTTP 1.1 chunk encoding to indicate end-of-file. The first couple of bytes are then read from each record to determine whether the record contains dynamic HTML, or static HTML.

Many computer architectures directly support variable length records. In these architectures, it would most often be preferable to utilize a standard variable length record format. For example, the standard symbolic format utilized in Unisys 2200 systems is System Data Format (SDF). This format can be used directly. VAX systems by Digital Equipment Corporation support a similar variable length record format. Count key data records provide a variable record length format on International Business Machines. These architectures typically provide an accurate end-of-file indication, and thus would not require the use of the zero length records to indicate such.

One alternate embodiment is to store the parsed HTML (PHTML) 44 as variable length records in a database. Instead of interpreting the first couple of bytes in each record to determine whether it contains static or dynamic HTML, a flag can be included in each database record for this purpose.

Another alternate embodiment is to guarantee that static and dynamic HTML records alternate in the file. Zero length records may then be

-5- utilized for the rare situations where this alternating does not occur naturally. The reason that this would be a rare occurrence is a result of the operation of the flowchart in FIG. 2 below. Zero length records are typically only necessary when the record length exceeds the maximum supported record length. Taking this a step further, a single record descriptor could be used for alternating static and dynamic HTML text streams. Each such record descriptor would have a field for the length of the static HTML text, and a field for the length of the dynamic HTML text. For example, two four-byte record lengths could precede each double record. Utilizing such an encoding mechanism, an end-of-file could be indicated with both length fields being zero. Both of these alternate embodiments have the advantage that interpreting the first couple of bytes of each HTML record is not necessary to determine whether the record is dynamic or static, since this determination can be made positionally. One cost of these embodiments is slightly longer parsed HTML files 44.

However, the added length in most instances is probably de minimis. The original HTML file 40 is opened (not shown) and a loop is entered. The next HTML command is read, step 52. A test is then made whether the HTML command just read is of the same type, static or dynamic, as the HTML commands being accumulated, step 54. If the type of the HTML command just read, step 52, is different from the type of HTML commands being accumulated, a record containing the HTML commands being accumulated is completed and written out to the parsed HTML file 44, step 56. A new record is then started to accumulate the type of HTML commands just read, step 58. In any case, the HTML command just read, step 52, is then added to the record under construction, step 60. A test is then made whether there is any more input, step 62. If there is more input to be processed from the original HTML file 40, step 62, the loop repeats, and the next HTML command is read, step 52. Otherwise, when no more input remains from the original HTML file 40 to be processed, step 62, the last record is completed and written to the parsed HTML file 44, step 64, and that file is then closed (not shown). As noted above, this mechanism for building a parsed HTML file 44 typically generates alternating static and dynamic HTML records.

-6- FIG. 3 is a flowchart illustrating operation of a Web server 46 interpreting parsed HTML 44, as shown in FIG. 1. In response to a Web request, the Web server 46 opens the parsed HTML file 44 and enters a loop. Within the loop, the next record descriptor is read, step 72. A test is then made whether there are more records in the file, step 74. This is typically a test for end-of-file. Most modern computer architectures automalically provide such an indication. If the end-of-file has not been detected, step 76, a test is made whether the record contains dynamic HTML, step 80. In the preferred embodiment, interpreting the start of the first HTML command in the next record does this. However, other types of record formats would require correspondingly different types of testing. If the record contains dynamic HTML, the HTML in the record is processed, step 82. This typically results in database accesses in response to the HTML commands, with the results of the database queries being transmitted to the Web browser on the client computer 48. Otherwise, when static HTML is being processed, the entire record is transmitted to the Web browser executing on the client computer 48 without interpretation by the server computer 46, step 84. In either case, the loop is then repeated, starting with reading the next record descriptor, step 72. When no more records remain to be processed, step 74, end-of-file processing is performed, step 86. A record length of zero may be transmitted at this point to indicate end-of-file to the Web browser on the client system 48.

FIG. 4 is a sample sequences of original HTML 40. FIG. 5 is a sample sequence of parsed HTML (PHTML) 44 generated from the sample HTML 40 in FIG. 4. The first section of the original HTML 40 contains static HTML commands. In the parsed HTML 44, a record length of "0288" bytes is inserted before these static HTML commands. This is followed by an active or dynamic set of HTML commands. These are preceded by a length of "0059" in the parsed HTML 44. A second set of static HTML follows, proceeded by a length of "0281" in the parsed HTML 44. Finally, the parsed HTML 44 is terminated by a record length of "0000". The PHTML 44 file will typically not physically contain the zero record length indication, but rather this will typically inserted in the end-of-file processing in step 86 in FIG. 3. FIG. 6 is a block diagram illustrating a General Purpose Computer 20. The General Purpose Computer 20 has a Computer Processor 22, and Memory 24, connected by a Bus 26. Memory 24 is a relatively high speed machine readable medium and includes Volatile Memories such as DRAM, and SRAM, and Non-Volatile Memories such as, ROM, FLASH, EPROM, EEPROM, and bubble memory. Also connected to the Bus are Secondary Storage 30, F-xternal Storage 32, output devices such as a monitor 34, input devices such as a keyboard (with mouse) 36, and printers 38. Secondary Storage 30 includes machine-readable media such as hard disk drives, magnetic drum, and bubble memory. External Storage 32 includes machine-readable media such as floppy disks, removable hard drives, magnetic tape, CD-ROM, and even other computers, possibly connected via a communications line 28. The distinction drawn here between Secondary Storage 30 and External Storage 32 is primarily for convenience in describing the invention. As such, it should be appreciated that there is substantial functional overlap between these elements. Computer software such as the parser 42 and user programs can be stored in a Computer Software Storage Medium, such as memory 24, Secondary Storage 30, and External Storage 32. Executable versions of computer software 33, can be read from a Non- Volatile Storage Medium such as External Storage 32, Secondary Storage 30, and Non- Volatile Memory and loaded for execution directly into Volatile Memory, executed directly out of Non-Volatile Memory, or stored on the Secondary Storage 30 prior to loading into Volatile Memory for execution.

Those skilled in the art will recognize that modifications and variations can be made without departing from the spirit of the invention. Therefore, it is intended that this invention encompass all such variations and modifications as fall within the scope of the appended claims.

Claim elements and steps herein have been numbered and/or lettered solely as an aid in readability and understanding. As such, the numbering and/or lettering in itself is not intended to and should not be taken to indicate the ordering of elements and/or steps in the claims.

Claims

ClaimsWhat is claimed is:

1. A method of improving HTML code execution comprising: A) parsing a source HTML file stored in a non-volatile storage medium into a parsed FITML file stored in a non-volatile storage medium, wherein: the parsed HTML file comprises: a set of static HTML records containing static HTML commands, and a set of dynamic HTML records containing dynamic HTML commands, each of the set of static HTML records and each of the set of dynamic HTML records is a variable length record; B) reading the parsed HTML file; and C) processing the parsed HTML file read in step (B), wherein: each of the set of static HTML records is transmitted without interpreting the static HTML commands, and each of the set of dynamic HTML records is interpreted.

2. The method in claim 1 which further comprises: D) performing a database access in response to interpreting one of the set of dynamic HTML records; E) transmitting data as a result of the database access in step (D).

3. The method in claim 2 wherein: the database access accesses a hierarchical database.

4. The method in claim 2 wherein: the database access accesses a relational database.

5. The method in claim 1 wherein: each of the set of static HTML records and each of the set of dynamic HTML records is a record in a hierarchical database.

6. The method in claim 1 wherein: each of the set of static FITML records and each of the set of dynamic HTML records is a record in a sequential database.

7. The method in claim 1 wherein: each of the set of static HTML records and each of the set of dynamic HTML records is a record in a sequential flat file in a variable length record format.

8. The method in claim 7 wherein: the variable length format comprises: a four character alphanumeric record length, and an ordered set of alphanumeric characters.

9. The method in claim 7 wherein: an end of the parsed HTML file is indicated by a record length equal to zero.

10. The method in claim 7 wherein: the sequential flat file utilizes a count-key-data format.

1 1 . The method in claim 7 wherein: the variable length format comprises: a binary record length, and an ordered set of alphanumeric characters.

12. The method in claim 7 wherein: the variable length format comprises: a word length, a byte length of a last word, and an ordered set of alphanumeric characters.

-10-

13. The method in claim 7 wherein: the variable length format comprises: a length header comprising: a static HTML record length, and a dynamic HTML record length, a first ordered set of alphanumeric characters containing static HTML commands, and a second ordered set of alphanumeric characters containing dynamic HTML commands.

14. The method in claim 1 wherein: a one of the set of static HTML records alternates with a one of the set of dynamic HTML records throughout the parsed HTML file.

15. The method in claim 1 wherein step (A) comprises: 1 ) reading an HTML command as a next HTML command, 2) testing whether the next HTML command was a same type as an immediately preceding HTML command, 3) writing a current record to the parsed HTML file if the next HTML command was not the same type as the immediately preceding HTML command, 4) starting a new record as the current record when the next HTML command was not the same type as the immediately preceding HTML command, 5) adding the next HTML command to the current record, 6) testing whether there is another HTML command in the source HTML file, 7) repeating substeps (1), (2), (3), (4), (5), and (6) as a loop if there is another HTML command in the source HTML file, and 8) writing the current record to the parsed HTML file.

-1 1-

16. The method in claim 15 wherein step (A) further comprises: testing whether the next HTML command will fit in the current record, and 9) writing the current record to the parsed HTML file when the next HTML command will not fit in the current record, and, 10) starling a new record as the current record when the next HTML command will not fit in the current record.

1 7. The method in claim 1 wherein: the set of dynamic HTML records contains an SQL command.

1 8. A data processing system for improving HTML code execution comprising: A) means for parsing a source HTML file stored in a non-volatile storage medium into a parsed HTML file stored in a non- volatile storage medium, wherein: the parsed HTML file comprises: a set of static HTML records containing static HTML commands, and a set of dynamic HTML records containing dynamic HTML commands, each of the set of static HTML records and each of the set of dynamic HTML records is a variable length record; B) means for reading the parsed HTML file; and C) means processing the parsed HTML file read in means (B), wherein: each of the set of static HTML records is transmitted without interpreting the static HTML commands, and each of the set of dynamic HTML records is interpreted.

ΓÇó12-

19. Computer software stored in a computer software storage medium for improving HTML code execution comprising: A) a set of computer instructions stored in a computer software storage medium for parsing a source HTML file stored in a non-volatile storage medium into a parsed HTML file stored in a non-volatile storage medium, wherein: the parsed HTML file comprises: a set of static HTML records containing static HTML commands, and a set of dynamic HTML records containing dynamic HTML commands, each of the set of static HTML records and each of the set of dynamic HTML records is a variable length record; B) a set of computer instructions stored in a computer software storage medium for reading the parsed HTML file; and C) a set of computer instructions stored in a computer software storage medium for processing the parsed FITML file read in set of computer instructions (B), wherein: each of the set of static HTML records is transmitted without interpreting the static HTML commands, and each of the set of dynamic HTML records is interpreted.

20. The computer software in claim 19 which further comprises: a set of computer instructions stored in a computer software storage medium for performing a database access in response to interpreting one of the set of dynamic HTML records; and E) transmitting data as a result of the database access in set of computer instructions (D).

21. A computer software storage medium containing the computer software in claim 19.

-13-