CN1222009A - Compression of documents with markup language that preserves syntactical structure

Publication number
CN1222009A
Authority
CN
China
Prior art keywords
code
information
indication
expression
grammar
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 98119772
Other languages
Chinese (zh)
Inventor
小布鲁斯·K·马丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BRUCE K MARTIN JR
Original Assignee
BRUCE K MARTIN JR
Application filed by BRUCE K MARTIN JR
Priority to CN 98119772
Publication of CN1222009A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

A method of compressing documents expressed in a markup language, such as hypertext documents transmitted over a communication channel between a mobile telephone and the Internet. When document elements are compressed and encoded, the coded representation readily conveys the syntactic features of each element: every code carries, at a position fixed relative to the start of the code and independent of the element type, an indication of whether tag attributes and tag content are present. The compressed, coded representation of document elements can therefore be processed efficiently without being expanded or decoded. Moreover, markup language extensions that are not recognized can still be handled efficiently by existing encoders and decoders.

Description

Method of compressing markup language documents while preserving syntactic structure
The present invention relates generally to compressing information for transmission over low-bandwidth communication channels, and more particularly to compressing information that has a certain syntactic structure (such as a document description conforming to the rules of a generalized markup language) for transmission to a receiving device such as a handheld wireless mobile telephone.
Networks such as the Internet have existed for many years; however, they have only recently become a popular medium for exchanging information. The recent rapid growth in Internet use is due in large part to the development of facilities and methods that simplify what a user must do to access and read multimedia information stored on network servers. An access facility known as a hyperlink allows different pieces of information to be organized non-sequentially and lets the user browse linked information easily. By assigning each piece of multimedia information available on the network a unique identifier, known as a Uniform Resource Locator (URL), the user can easily access information without regard to where it is stored. The network clients and servers that participate in such a "hypermedia" network are referred to herein as hypermedia clients and hypermedia servers, respectively.
One important development that has facilitated this growth is the use of markup languages and related processing tools, which define and use a number of elements that specify various syntactic features of a document. Many markup languages in use today conform to international standard ISO 8879:1986, which defines a set of basic rules for tag-based languages referred to herein as the Standard Generalized Markup Language (SGML). Perhaps the most popular SGML-conforming markup language on the Internet is the Hypertext Markup Language (HTML).
Documents expressed in a tag-based markup language are usually displayed and manipulated by application software called a browser or viewer. Such software performs processing that conforms to the rules of the respective markup language, parsing and interpreting the information that represents a document so that the document content is displayed correctly.
Information representing a document in an SGML-like markup language generally comprises elements made up of tags, possibly with corresponding tag attributes and tag content; these elements convey the syntactic features of the information carried in the document.
Tags identify the element type. In HTML, for example, the element representing the entire document is identified by tags placed at the beginning and end of the document, an element representing a paragraph of text is identified by a tag placed at the start of the paragraph, and text to be displayed with underlining is identified by tags placed at the beginning and end of the underlined text.
Tag attributes provide information specifying one or more features of an element. For example, a tag representing an image file embedded in a document includes an attribute specifying the name of the embedded image file. Depending on the markup language specification, tag attributes may be optional or required for a given tag type.
Tag content generally represents information to be displayed to, or manipulated by, the user. Tag content may be optional or required depending on the tag type, and may include "nested" elements that themselves have tags, attributes and content.
Because SGML itself is very flexible, markup languages that conform to SGML provide a very flexible and powerful tool for representing document elements. This flexibility has a cost: additional bandwidth is needed to convey tags and tag attributes, and additional resources are needed to parse and interpret them. In HTML, tags and attributes are expressed as character strings in a form similar to <tagid name=value>, where tagid is the identifier of the tag, name is the name of an attribute, and value is the value assigned to that attribute. A tag may have more than one attribute.
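The textual tag form described above can be illustrated with a short Python sketch that splits a start tag into its identifier and its name=value pairs. The function name parse_start_tag and the two regular expressions are assumptions made for illustration only; they handle just the simple tag shapes discussed here, not the full HTML or SGML syntax.

```python
import re

# Hypothetical helper, for illustration only: split a start tag such as
# <IMG src="/item.gif"> into its tag identifier and name=value pairs.
_TAG = re.compile(r'<\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>')
_ATTR = re.compile(r'([A-Za-z][A-Za-z0-9_-]*)\s*=\s*("[^"]*"|[^\s">]+)')


def parse_start_tag(text):
    """Return (tagid, [(name, value), ...]) for a simple start tag."""
    match = _TAG.match(text)
    if match is None:
        raise ValueError("not a start tag: %r" % text)
    tagid, rest = match.group(1), match.group(2)
    # Quoted values keep their text but lose the surrounding quotes.
    attrs = [(name, value.strip('"')) for name, value in _ATTR.findall(rest)]
    return tagid, attrs
```

For instance, the 21-character string <IMG src="/item.gif"> yields the identifier IMG and the single attribute pair (src, /item.gif); every one of those characters must be transmitted and parsed when the tag is sent uncompressed.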
Because personal computers and workstations with ample computing power and sufficiently wide communication channels are readily available, the additional bandwidth and resources needed to convey and process tags and tag attributes are, in most cases, not a significant disadvantage.
There is, however, growing interest in using mobile devices, particularly handheld devices such as wireless telephones, to access hypermedia servers connected to networks such as the Internet. These devices are severely limited in processing power and memory. In addition, the bandwidth of the communication channel that connects a mobile device to the rest of the network is also extremely limited.
A wireless telephone has only a small fraction of the resources provided by a typical desktop or portable computer. Typically, its processing power is less than one percent of that of most computers, its memory is generally much less than 150 kilobytes (kB), its communication path usually operates in the range of 400 to 19,200 bits per second, and use of the communication path is charged at a rate of dollars per 100 kB or more.
A bandwidth-limited communication channel can be used more effectively by reducing the information capacity requirements of the information conveyed over it. These requirements can be reduced by some form of data formatting or information compression.
General-purpose compression schemes such as Huffman coding have been considered. Unfortunately, they are not very attractive, because the compressed result obtained from a general-purpose scheme obscures the syntactic features of the underlying information. In other words, the identity of tags and the presence of tag attributes and content cannot easily be obtained from the compressed representation. In addition, general-purpose compression schemes cannot reduce information capacity requirements as much as compression schemes based on a specific markup language.
Various compression schemes based on a specific markup language such as HTML have also been considered. Such schemes can achieve a higher degree of compression by exploiting known characteristics of the specific markup language. For example, a compression scheme specific to a markup language need not allow for the possibility of conveying tag content for tags that never have content. Unfortunately, these schemes require the browser or expansion process to be able to process and expand every compressed element. Extensions and changes to the markup language cannot be recovered from the compressed representation unless the browser is modified to handle the new language features; otherwise, compression of a new feature obscures the syntactic features of the element that contains it and of any nested elements. In particular, the browser must be modified even if it is installed in an application or device that cannot use, or has no need for, the new feature.
For example, if a markup-language-specific compression scheme is extended to compress some new display format, a browser cannot recover the display format information from the compressed representation unless the browser is modified to include the processing needed to expand the new feature. Furthermore, without this modification, the browser cannot ignore or skip the new feature and go on to expand the remaining information, because it has no way to determine the extent of the new compressed feature.
It is an object of the present invention to reduce the bandwidth and resources required to convey and process information representing a document without obscuring the syntactic features of the underlying document elements.
According to one aspect of the present invention, a method of reducing the information capacity requirements of input information representing a document comprises: receiving the input information and identifying a plurality of elements therein, each element having a respective type and at least some elements having syntactic information representing one or more respective syntactic features; generating a plurality of codes, where each code has a beginning, represents at least part of a respective element and has lower information capacity requirements than the part it represents, each code conveys the respective element type and a syntax indication of whether the respective syntactic information is present, and each code conveys its syntax indication at a predetermined position relative to the beginning of the code; and generating encoded information representing the document by assembling the plurality of codes, together with any portions of the plurality of elements not represented by the codes, into a form suitable for transmission or storage.
According to another aspect of the present invention, a method of recovering a document comprising a plurality of elements from encoded information comprises: receiving encoded information representing the document and identifying a plurality of codes therein, where each code has a beginning, represents at least part of a respective element, and conveys a type indication of the respective element type and a respective syntax indication of whether syntactic information representing one or more syntactic features of the respective element is present; obtaining each syntax indication from a predetermined position relative to the beginning of the respective code; generating a plurality of decoded representations, where each decoded representation is derived from a respective code and corresponds to the part of the respective element represented by that code, each syntax indication controls the generation of decoded representations of syntactic information, and each decoded representation has higher information capacity requirements than the respective code; and generating output information representing the document by assembling the plurality of decoded representations together with any portions of the plurality of elements not represented by codes.
According to yet another aspect of the present invention, a method of recovering a document from a plurality of compressed encoded elements comprises: processing an encoded element to identify the element type and to obtain a syntax indication of the element's syntactic features, where the syntax indication is obtained from a predetermined position relative to the beginning of the respective encoded element and the compressed representation of the element type is expanded into an uncompressed form of a markup language tag; if the syntax indication indicates that at least one tag attribute is present, processing the tag attribute information in the encoded element by expanding the compressed representation of that information into an uncompressed form of a markup language tag attribute name or tag attribute value; and if the syntax indication indicates that tag content is present, processing the tag content information of the encoded element according to a process appropriate for the tag content.
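One way to picture a code that conveys its syntax indication at a predetermined position relative to its beginning is a single byte whose two high-order bits flag the presence of tag attributes and tag content. The following Python sketch assumes that layout; the bit positions, the constant names and the helper functions are illustrative assumptions and are not taken from the text above.

```python
# Assumed layout (not specified in the text): one byte per tag code, with the
# syntax indications in the two high-order bits and the tag type in the rest.
HAS_ATTRIBUTES = 0x80  # assumed flag: one or more tag attributes follow
HAS_CONTENT = 0x40     # assumed flag: tag content follows
TYPE_MASK = 0x3F       # remaining six bits carry the element type


def make_code(tag_type, has_attributes, has_content):
    """Pack an element type and its two syntax indications into one byte."""
    if not 0 <= tag_type <= TYPE_MASK:
        raise ValueError("tag type does not fit in six bits")
    code = tag_type
    if has_attributes:
        code |= HAS_ATTRIBUTES
    if has_content:
        code |= HAS_CONTENT
    return code


def syntax_indications(code):
    """Read both indications without knowing what the tag type means."""
    return bool(code & HAS_ATTRIBUTES), bool(code & HAS_CONTENT)


def tag_type(code):
    """Recover the element type from the low-order bits."""
    return code & TYPE_MASK
```

Because the flags sit at the same bit positions for every element type, a decoder can learn whether attributes or content follow a code even when the type value itself is unfamiliar to it.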
The various features of the invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings, in which like reference numerals refer to like elements in the several figures. The contents of the following discussion and the drawings are set forth as examples only and should not be understood to limit the scope of the invention.
Fig. 1 is a schematic diagram of the major components of a system in which various aspects of the present invention may be embodied.
Fig. 2 is a block diagram of a process or device that generates a compressed representation of document elements.
Fig. 3 is a block diagram of a process or device that recovers document elements from a compressed representation.
Fig. 4 is a state diagram of a process that generates a compressed representation of document elements.
Fig. 5 is a state diagram of a process that recovers document elements from a compressed representation.
Fig. 6 is a functional flow diagram of a process that compresses or expands document information.
Fig. 7 illustrates a simple document expressed in a markup language.
Fig. 8 illustrates encoded information representing the document of Fig. 7, obtained by a coding process according to the present invention.
Overview
Fig. 1 illustrates a system in which various aspects of the present invention may be implemented. Some of the components shown in the figure may be omitted in different embodiments. As shown, client 1 uses network 40 to access resources provided by server 51 and server 52. Although servers 51 and 52 are contemplated to be hypermedia servers, most likely operating in accordance with the Hypertext Transfer Protocol (HTTP), this is not necessary to practice the invention. In a typical embodiment, remote terminal 11 provides an interface for presenting information to a user and receiving input from the user, and computer 31 exchanges information with network 40 in the manner of a conventional network client.
Computer 31 stores parameters and information in memory 32, which is typically a mix of random access memory (RAM), read-only memory (ROM) and persistent storage such as magnetic and optical disk drives. Computer 31 communicates with remote terminal 11 through receiver 21 and transmitter 22. Information sent by computer 31 through transmitter 22 is received by receiver 16 in remote terminal 11. Information sent by transmitter 15 in remote terminal 11 is received by computer 31 through receiver 21.
In the embodiment shown in Fig. 1, remote terminal 11 comprises display 12, one or more buttons 13, memory 14, transmitter 15 and receiver 16. For example, device 11 may be a wireless telephone such as the MobileAccess™ phone from Mitsubishi Wireless Communications or the Duette phone from Samsung Electronics. In a typical wireless telephone, display 12 is a liquid crystal display (LCD) panel. Buttons 13 represent one or more data input devices such as switches, keys or buttons. Memory 14 represents storage circuitry or another device capable of storing digital information. Preferably, at least part of memory 14 is persistent, meaning that information is retained when device 11 is turned off. In some embodiments, part of memory 14 forms an integrated push/pull cache. Part of memory 14 may also serve as persistent memory or ROM that stores program instructions, in which case device 11 includes a microprocessor or other type of processing circuitry capable of executing those instructions.
The nature of the communication paths between computer 31, servers 51 and 52, receiver 21 and transmitter 22 shown in the figure is not critical to the practice of the invention. Such paths may be implemented, for example, as switched and/or non-switched paths using private and/or public facilities. Likewise, the topology of network 40 is not critical and may be implemented in a variety of ways, including hierarchical and peer-to-peer networks. Computer 31 and server 51 may be located locally with respect to one another and may even be implemented on the same hardware.
The nature of the communication path between computer 31 and device 11 is also not critical to the practice of the invention; in many applications, however, device 11 is a wireless device that uses a communication technology such as electromagnetic transmission at frequencies from radio frequencies through the infrared portion of the spectrum. In applications where device 11 is a wireless telephone, such as a cellular telephone, transmitter 15, receiver 16, receiver 21 and transmitter 22 represent the communication facilities used for ordinary telephone calls.
Remote terminal
In applications where remote terminal 11 and computer 31 together act as client 1 and perform HTTP client functions, remote terminal 11 provides at least three basic functions: (1) a navigation function that lets the user navigate or traverse HTTP Uniform Resource Locator (URL) hyperlinks; (2) a communication function that exchanges information with computer 31; and (3) an interface function that provides a user interface for presenting information to the user and receiving information from the user.
Preferably, these functions are implemented by software-controlled processes that use an event-driven architecture. An event may be initiated, for example, by the user through buttons 13 or by a signal received through receiver 16. The navigation function operates in one of two states. In the "ready" state, the device waits for user input specifying a hyperlink to traverse. In the "pending" state, the communication function has sent a request to computer 31 and the device is waiting to receive a response from computer 31. In HTTP terms, the ready state waits for user input specifying the URL of a hypermedia entity to display or process, and the pending state waits for computer 31 to provide the requested hypermedia entity.
In one embodiment, hypermedia information is exchanged with computer 31 according to the Handheld Device Transport Protocol (HDTP), one version of which is described in "HDTP Specification", document number HDTP-SPEC-DOC-101, published July 15, 1997 by Unwired Planet, Inc., Redwood Shores, California, and incorporated herein by reference. HDTP is similar to HTTP but is better suited to remote devices such as wireless telephones, and is preferably carried over the User Datagram Protocol/Internet Protocol (UDP/IP). UDP/IP is generally considered less reliable than TCP/IP; for example, it guarantees neither that a packet will be received nor that packets will be received in the order in which they were sent. A datagram protocol such as UDP/IP is nevertheless attractive for practicing the invention because it does not require a connection to be established between sender and receiver before information is exchanged. This avoids the exchange of multiple packets needed to set up a session.
In a preferred embodiment, hypermedia information is organized into cards and decks according to the Handheld Device Markup Language (HDML). Multiple decks and other types of information entities can be combined into an information structure known as a digest. One version of this markup language is described in "HDML 2.0 Specification", revision A, document number HDML-SPEC-DOC-200, published in March 1997 by Unwired Planet, Inc., and incorporated herein by reference.
Relay computer
In the embodiment described herein, computer 31 and remote terminal 11 together provide the functions of a conventional hypermedia client. In this embodiment, computer 31 receives information from remote terminal 11 according to HDTP, translates the HDTP information into corresponding HTTP information as needed, and passes the result to server 51. Likewise, computer 31 receives information from server 51 according to HTTP, translates the HTTP information into corresponding HDTP information as needed, and passes the result to remote terminal 11. Compressing the HDTP information exchanged between computer 31 and remote terminal 11 according to the present invention reduces information capacity requirements and also reduces the processing that remote terminal 11 needs to parse and interpret the information. This compression and the complementary expansion are accomplished by encoding and decoding operations performed in remote terminal 11 and computer 31.
Processing procedure
Fig. 2 shows an embodiment of a coding process that generates a compressed representation of document elements according to the present invention. Identify elements 62 receives information representing a document from path 61 and identifies a plurality of elements in that information. At least some elements typically have syntactic information representing some aspect of the document's structure and syntactic features.
Encode 64 generates a plurality of codes that represent at least part of at least some of the document elements. At least some of the codes have lower information capacity requirements than the element information they represent. Each code conveys both the type of the element it represents and an indication of whether syntactic information for the element is present. Preferably, at least some of the syntactic information is itself encoded in a way that reduces its information capacity requirements. Element information is passed along path 63 as needed to process nested information. Nested information can be handled in a variety of ways, including recursive processing.
Assemble 66 generates encoded information representing the document along path 67 by assembling the codes generated by encode 64, together with any elements or parts of elements not represented by those codes, into a form suitable for transmission or storage.
An alternative embodiment of the invention includes code books 68, which provides a plurality of code books. Encode 64 adaptively selects one code book from the plurality and generates one or more codes according to the selected code book. An indication of the selected code book is included in the encoded information.
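The code book selection just described can be sketched in Python as follows. The contents of the two code books, the coverage-based selection rule and the choice to leave uncovered tag names as literal strings are all assumptions made for illustration; the text does not say how the adaptive selection is performed.

```python
# Two hypothetical code books mapping tag names to small integer codes.
CODE_BOOKS = [
    {"HTML": 1, "BODY": 2, "B": 3, "A": 4, "IMG": 5},  # assumed book 0
    {"DECK": 1, "CARD": 2, "ACTION": 3},               # assumed book 1
]


def select_code_book(tag_names):
    """Pick the index of the book that covers the most tags (assumed rule)."""
    def coverage(index):
        book = CODE_BOOKS[index]
        return sum(1 for name in tag_names if name in book)
    return max(range(len(CODE_BOOKS)), key=coverage)


def encode_tags(tag_names):
    """Emit the code book indication first, then one code per tag."""
    index = select_code_book(tag_names)
    book = CODE_BOOKS[index]
    # Tags missing from the chosen book are left as literal names.
    return [index] + [book.get(name, name) for name in tag_names]
```

Placing the code book indication at the front of the output lets a decoder choose the same book before it interprets any tag code.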
Fig. 3 illustrates an embodiment of a decoding process that recovers document elements from an encoded representation according to the present invention. Identify codes 72 receives encoded information representing a document from path 71 and identifies a plurality of codes, each of which represents at least part of a respective document element.
From the codes, decode 74 obtains syntax indications and generates decoded representations. At least some decoded representations have higher information capacity requirements than the respective codes. A syntax indication indicates whether syntactic information representing one or more syntactic features of the document is present. Decoded representations are passed along path 73 as needed to process any nested codes. Nested codes can be handled in a variety of ways, including recursive processing.
Assemble 76 generates output information representing the document along path 77 by assembling the decoded representations generated by decode 74 together with any elements or parts of elements not represented by codes.
An alternative embodiment of the invention includes code books 78, which provides a plurality of code books. Decode 74 adaptively selects one code book from the plurality according to the code book indication in the encoded information and generates one or more decoded representations according to the selected code book.
In another embodiment of the invention, process 80 receives the output information from path 77 and generates, along path 81, a display signal for presentation. In some cases, decode 74 may encounter codes that it cannot decode because they are not recognized or supported by the decoding process. Decode 74 passes these unsupported codes along path 73 to a subsequent process that may be able to use them. Process 80 uses the syntax indications in the unsupported codes to skip or otherwise avoid processing them.
In yet another embodiment, decode 74 includes processing similar to process 80 that generates a display signal for presentation. In such an embodiment, a process within decode 74 uses the element type and syntax indication conveyed by a code to decide which codes to skip, for example because the display device cannot respond appropriately to the element that the code represents.
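The skipping behaviour described above can be sketched in Python under an assumed byte layout. The constants END, STR, HAS_ATTRIBUTES and HAS_CONTENT, the rule that each attribute is a one-byte name code followed by a null-terminated string, and the function names are hypothetical choices made for illustration; only the idea of reading the syntax indications from a fixed position in the code comes from the text.

```python
END = 0x00             # assumed code closing an attribute list or content list
STR = 0x01             # assumed code introducing a null-terminated text string
HAS_ATTRIBUTES = 0x80  # assumed flag bit: attributes follow the tag code
HAS_CONTENT = 0x40     # assumed flag bit: content follows the tag code


def skip_string(data, pos):
    """pos points just past STR; return the position after the terminator."""
    return data.index(0, pos) + 1


def skip_element(data, pos):
    """Skip one encoded element at pos without interpreting its tag type."""
    code = data[pos]
    pos += 1
    if code & HAS_ATTRIBUTES:
        while data[pos] != END:
            pos += 1                          # attribute name code
            pos = skip_string(data, pos + 1)  # STR code plus attribute value
        pos += 1                              # end-of-attributes code
    if code & HAS_CONTENT:
        while data[pos] != END:
            if data[pos] == STR:
                pos = skip_string(data, pos + 1)
            else:
                pos = skip_element(data, pos)  # nested element
        pos += 1                               # end-of-content code
    return pos
```

The function never consults a table of known tag types: the two flag bits alone tell it how much of the stream belongs to the element, so a decoder built before a tag was defined can still step over that tag and continue with the information that follows.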
Encoding
State process
The coding process of encode 64 can be described by the state diagram in Fig. 4, in which each circle represents a state and each transition between states is represented by a line segment, the transition proceeding in the direction of the arrow.
The coding process begins in state 100 (start) and transitions along path 110 to state 101 (encode tag). State 101 generates an encoded representation of a respective element tag. If the element tag is not accompanied by any syntactic information, a transition along path 111 returns to state 101 to generate an encoded representation of the next element tag. If one or more tag attributes are present, a transition along path 112 proceeds to state 102 (encode attribute name). If no tag attributes are present but tag content is present, a transition along path 118 proceeds to state 105 (encode content). If no more element tags are present, a transition along path 122 proceeds to state 107 (end), which ends the coding process.
State 102 generates an encoded representation of a respective attribute name and is followed by a transition along path 113 to state 103 (encode attribute value). State 103 generates an encoded representation of the respective attribute value. If another tag attribute is present, a transition along path 114 returns to state 102 to generate an encoded representation of the next tag attribute.
If no more tag attributes are present, a transition along path 115 proceeds to state 104 (end of attributes). State 104 generates a code marking the end of the tag attributes for the respective element. If tag content is present, a transition along path 117 proceeds to state 105 to process the tag content. If no tag content is present, a transition along path 116 returns to state 101 to process the next element tag.
State 105 generates an encoded representation of respective tag content. If more tag content is present, a transition along path 119 returns to state 105 to process it. If no more tag content is present, a transition along path 120 proceeds to state 106 (end of content). State 106 generates a code marking the end of the tag content for the respective element, and is followed by a transition along path 121 to state 101 to process the next element tag.
As explained in more detail below, tag content may include nested elements. If a nested element is present, a recursive transition is made to state 100 along a path not shown in the figure. After all elements at a given nesting level have been processed, a recursive return transition is made to state 105 along another path not shown.
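The state process above can be followed in a compact Python sketch that walks a nested element structure and emits one symbolic token per state visited. The tuple-based element structure (tag, attributes, content), the token names and the function name encode_element are assumptions made for illustration; a practical encoder would emit compact binary codes rather than tuples.

```python
def encode_element(element, out):
    """Append tokens for one element, mirroring states 101 to 106 of Fig. 4.

    element is an assumed (tag, attributes, content) tuple in which
    attributes is a list of (name, value) pairs and content is a list of
    text strings and nested element tuples.
    """
    tag, attributes, content = element
    # State 101: tag code carrying both syntax indications.
    out.append(("TAG", tag, bool(attributes), bool(content)))
    for name, value in attributes:
        out.append(("ATTR_NAME", name))    # state 102
        out.append(("ATTR_VALUE", value))  # state 103
    if attributes:
        out.append(("ATTR_END",))          # state 104
    for item in content:                   # state 105
        if isinstance(item, str):
            out.append(("TEXT", item))
        else:
            encode_element(item, out)      # recursive transition to state 100
    if content:
        out.append(("CONTENT_END",))       # state 106
    return out
```

An element with neither attributes nor content produces a single TAG token, which corresponds to the transition along path 111 straight back to state 101.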
Example
Fig. 7 illustrates a simple document expressed in a markup language such as HTML. The document is written out in lines, and each line is numbered for ease of discussion. The line numbers do not form part of the markup language. In practical embodiments, it is anticipated that a document would be conveyed without any line or paragraph indications other than those provided by the markup language itself.
Line 1 contains the <HTML> tag that marks the start of an HTML document, and line 8 contains the </HTML> tag that marks its end. In this example, the <HTML> tag has no attributes but has content, namely the document body delimited by the start and end BODY tags on lines 2 and 7, respectively. The content of the <BODY> tag is nested within the content of the <HTML> tag. The BODY tag content, shown on lines 3 through 6, comprises text and several tags.
The portion of the <BODY> tag content on line 3 represents simple text. The portion of content on line 4 represents an element having an IMG tag that has no content but has an attribute with the name src and the value "/item.gif", which specifies the source of an image to display. The portion of content on line 5 represents text that includes a pair of character strings, each marked by start and end B tags, that are to be displayed in boldface. Neither <B> tag has attributes, but each has text content. The portion of content on line 6 is text that includes an element with start and end A tags; the <A> tag has both an attribute and content. The tag attribute has the name href and a value specifying the URL of another document ("http://a.url/info"). The content of the <A> tag is the text "here" that appears immediately before the closing </A> tag.
Fig. 8 is an illustration of the encoded representation obtained by applying the coding process described above to the markup language document shown in Fig. 7. For ease of discussion, the encoded representation in Fig. 8 is arranged in numbered lines and indented to make it easier to follow. In practical embodiments, it is anticipated that the generated encoded information would not contain any line or paragraph indications other than those provided by the encoded representation of the markup language elements.
With reference to Fig. 8, expression symbol XYZ-C} represents to comprise markup language tag<XYZ〉and coding expression and comprise the code that has one or more tag attributes and have the indication of label substance.For example, row 1 expression symbol HTML-C} represents to comprise<HTML〉coding expression of label and have tag attributes not exist and the code of the indication that label substance exists.Equally, be expert at expression symbol on 4.1 IMG-A} represents to comprise<IMG〉label coding expression formula and comprise the code that has one or more tag attributes and do not have the indication of label substance.
According to the example shown in Fig. 8, expression on the row 1 symbol expression<HTML on the row 1 in the HTML-C} presentation graphs 7〉code of label.Explain that as above code has transmitted represented element type and comprised the indication that label substance exists.Expression on the row 2 symbol BODY-C} represents representative<BODY〉the 2nd capable among label<Fig. 7 and point out to exist the code of label substance.
{ STR} represents to be used for indicating the special code that text exists to expression symbol on the row 3.Expression symbol " The item " is represented text itself.This code has always hinted and has had label substance.Text can be in order to the serial of methods sign of explicit or implicit expression code.For example, the beginning of text string can come the implicit expression sign by certain several values that keeps in the text character.This scheme usually with context-sensitive because these retentions for example usually appear in the binary data field.At preferred embodiment, { explicit code that STR} represents indicates by the expression symbol among the figure by resembling in the beginning of text string.The ending of text string can come explicit sign with resembling the such spcial character of null character (NUL) or binary zero, also can start the explicit sign of representing in the code of length value by being included in, or come the implicit expression sign by the code of a non-effective text character.To practice the present invention, not to the restriction of this type of specified scheme.
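By way of illustration only, two of the explicit schemes mentioned above for delimiting a text string can be sketched in Python as follows. The numeric value chosen for the {STR} code, the one-byte length field, the ASCII text encoding, and all function names are assumptions made for this sketch; the description does not specify them.

```python
STR = 0x04  # hypothetical value for the {STR} code; the source assigns no number


def frame_null_terminated(text: str) -> bytes:
    """Explicit start code; the end is marked by a null (binary zero) character."""
    data = text.encode("ascii")
    if b"\x00" in data:
        raise ValueError("text must not contain the terminator")
    return bytes([STR]) + data + b"\x00"


def frame_length_prefixed(text: str) -> bytes:
    """Explicit start code followed by an assumed one-byte length value."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("text too long for a one-byte length")
    return bytes([STR, len(data)]) + data


def read_string(buf: bytes, pos: int, length_prefixed: bool):
    """Return (text, position just past the string) for a string starting at pos."""
    if buf[pos] != STR:
        raise ValueError("expected a {STR} code")
    if length_prefixed:
        start = pos + 2
        end = start + buf[pos + 1]
        return buf[start:end].decode("ascii"), end
    end = buf.index(0, pos + 1)  # position of the null terminator
    return buf[pos + 1:end].decode("ascii"), end + 1
```

The length-prefixed form lets a reader skip a string without scanning it, while the null-terminated form places no limit on string length; either is consistent with the statement that no particular scheme is critical.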
Lines 4.1 through 4.3 collectively represent the coded representation of the document element on line 4 of Fig. 7. On line 4.1, the notation {IMG-A} represents the code for the <IMG> tag and indicates that one or more tag attributes are present. On line 4.2, the notation {src} represents a code for the tag attribute name "src". This code may be a compressed representation of the attribute name itself, as explained more fully below, or it may be some other generic attribute code indicating that the attribute name is defined in a conventional form such as a text string. The notation ("/item.gif") represents a generic text string that provides the attribute value. Alternatively, the attribute value may be encoded in some other form such as a binary code. On line 4.3, the notation {END:img-a} represents a code that marks the end of the tag attributes of the <IMG> tag. In one embodiment of the invention, one code is used to indicate the end of attributes and another code is used to indicate the end of content. In another embodiment, different codes are used according to element type. In yet another embodiment, different codes identify the end of attributes and the end of content according to element type. Referring to the example on line 4.3, under these embodiments the notation "img-a" may be understood to denote a unique {END} code that marks the end of IMG attributes. In the preferred embodiment, however, a single special code such as a null or zero value is used to indicate the end of attributes and the end of content for tags of all types; for that embodiment, "img-a" should be understood merely as a convenience that shows the reader the correspondence between an end code and the tag attributes or content that it terminates.
Lines 5.1 through 5.5 collectively represent the coded representation of the document content on line 5 of Fig. 7. On lines 5.1 and 5.3, the notation {STR} and the accompanying text represent the codes and text of two text strings on line 5 of Fig. 7.
Lines 5.2.1 through 5.2.3 collectively represent the coded representation of the first <B> element on line 5 of Fig. 7. On line 5.2.1, the notation {B-C} represents a code that represents the <B> tag and indicates that content is present. On line 5.2.2, the text content is represented by {STR} "red" using the notation described above. On line 5.2.3, the notation {END:b-c} represents a code that marks the end of the <B> tag content. Similarly, lines 5.4.1 through 5.4.3 collectively represent the coded representation of the second <B> element on line 5 of Fig. 7.
On line 5.5, the notation {STR} "for a limited time" represents the encoding of a text string as described above and completes the coded representation of the document content on line 5 of Fig. 7. In the example of Fig. 8, the coded representation of the document content on line 6 of Fig. 7 is represented collectively by lines 6.1.1 through 6.3. In a practical embodiment of the invention, however, the adjacent text strings "for a limited time" and "Click" could be represented by a single coded representation such as {STR} "for a limited time. Click".
As just explained, lines 6.1.1 through 6.3 collectively represent the coded representation of the document content on line 6 of Fig. 7. On line 6.1.1, the notation {STR} "Click" represents the encoding of a text string, as described above. On line 6.1.2, the notation {A-AC} represents a code that represents the <A> tag and indicates that both tag attributes and tag content are present. On line 6.1.3, the notation {href} ("http://a.url/info") represents the encoding of the tag attribute name and value. On line 6.1.4, the notation {END:a-a} represents a code that marks the end of the <A> tag attributes. On line 6.2.1, the notation {STR} "here" represents the encoding of the text string of the tag content. On line 6.2.2, the notation {END:a-c} marks the end of the <A> tag content. The notation on line 6.3 represents a text string and completes the coded representation of the document content on line 6 of Fig. 7.
The notation {END:body-c} on line 7 and the notation {END:html-c} on line 8 represent codes that mark the end of the <BODY> tag content and the <HTML> tag content, respectively.
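By way of illustration only, the notation used in Fig. 8 can be generated from a small element tree as sketched below in Python. The Element class, the encode function, and the sample document are hypothetical and are not taken from the figures; the sketch emits one notation string per code rather than actual compressed bytes, and the form used for an element with neither attributes nor content is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """Hypothetical in-memory form of a markup element."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    content: List[Union["Element", str]] = field(default_factory=list)


def encode(node: Union[Element, str]) -> List[str]:
    """Return the Fig. 8 style notation for a text string or an element."""
    if isinstance(node, str):
        return ['{STR} "%s"' % node]
    # "A" flags tag attributes and "C" flags tag content, as in {A-AC}.
    flags = ("A" if node.attrs else "") + ("C" if node.content else "")
    # Assumption: an element with no syntactic information is shown without a suffix.
    out = ["{%s-%s}" % (node.tag, flags) if flags else "{%s}" % node.tag]
    if node.attrs:
        for name, value in node.attrs.items():
            out.append('{%s} ("%s")' % (name, value))
        out.append("{END:%s-a}" % node.tag.lower())
    if node.content:
        for child in node.content:
            out.extend(encode(child))  # nested elements are encoded recursively
        out.append("{END:%s-c}" % node.tag.lower())
    return out


sample = Element("BODY", content=[
    "The item",
    Element("IMG", {"src": "/item.gif"}),
    "Click",
    Element("A", {"href": "http://a.url/info"}, ["here"]),
])
for line in encode(sample):
    print(line)
```

The end codes carry the element name here only to mirror the labeling convention of the figure; in the preferred embodiment a single end code would serve for all tags.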
Compression
Various coding or compression schemes may be used to generate codes whose information capacity requirements are lower than those of the document elements, or portions of document elements, that they represent. The codes that are generated convey both the type of the represented document element and an indication of whether syntactic information is present in the document element. The indication of syntactic information is conveyed at a predetermined position relative to the start of the code.
According to a preferred embodiment of the invention, each code has a fixed length of one byte (eight binary digits, or bits), in which one or more bits, for example the two most significant bits, are used to indicate whether syntactic information is present. For HTML, for example, the two bits indicate, respectively, whether one or more tag attributes are present and whether tag content is present. Other code structures may use variable-length codes. For example, a code may comprise a variable-length indication of element type generated by Huffman coding and a separate indication of syntactic information. The indication of syntactic information may be placed at a predetermined position relative to the start of the code.
Rules may be established that allow the predetermined position to vary according to element type; for example, the position may be set immediately after a variable-length indication of element type. In another example, one position may be predetermined for a special class of codes, such as those having one or more particular values, and another position may be predetermined for all other codes. In the preferred embodiment, the predetermined position is fixed and does not depend on element type.
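By way of illustration only, the fixed-length one-byte code of the preferred embodiment can be sketched in Python as follows. The assignment of the most significant bit to tag attributes and the next bit to tag content is an assumption of this sketch, as are the function names; the description states only that two bits, for example the two most significant, carry the two indications.

```python
HAS_ATTRIBUTES = 0x80  # assumed: most significant bit flags tag attributes
HAS_CONTENT = 0x40     # assumed: next bit flags tag content
TYPE_MASK = 0x3F       # the remaining six bits carry the element type


def pack_code(type_id: int, has_attributes: bool, has_content: bool) -> int:
    """Combine a six-bit element type and two syntax flags into one byte."""
    if not 0 <= type_id <= TYPE_MASK:
        raise ValueError("element type must fit in six bits")
    code = type_id
    if has_attributes:
        code |= HAS_ATTRIBUTES
    if has_content:
        code |= HAS_CONTENT
    return code


def unpack_code(code: int):
    """Return (type_id, has_attributes, has_content) for a one-byte code."""
    return code & TYPE_MASK, bool(code & HAS_ATTRIBUTES), bool(code & HAS_CONTENT)
```

Because the two flags sit at the same position in every code, they can be read without knowing the element type, which is what allows a decoder to follow the structure of elements it does not recognize.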
Special codes
In the preferred embodiment, a class of six special codes is defined. These special codes are called "global codes" because, according to this embodiment, all encoders and decoders must correctly interpret and process them. The six codes are described below.
The special code denoted by the notation {CBK} identifies a value that specifies a codebook selected adaptively from a plurality of codebooks. Decoding is performed according to the selected codebook. As briefly explained above, fixed-length eight-bit codes are used to convey element type and the indication of syntactic information. If two bits are used to convey the indication of syntactic information, only six bits remain to convey element type. In general, the number of elements far exceeds the number that can be represented with six bits. This limitation is more acute because these codes are also needed to represent commonly used attribute names and/or attribute values. By organizing the codes into multiple codebooks and selecting an appropriate codebook, the size of the coding space can be enlarged significantly. When an encoder selects a codebook, an indication of the selection is assembled into the coded information so that a complementary decoder can determine which codebook to use for decoding. The {CBK} code is such an indication.
The special code denoted by the notation {CHR} identifies the value of a given character. For example, a document whose text is represented in ASCII (American Standard Code for Information Interchange) cannot represent some of the characters defined in Unicode. Any Unicode character may be represented by a numeric value identified by a {CHR} code.
The special code denoted by the notation {DAT} marks the start of "opaque" data that is not processed by the decoder. Data is said to be opaque when its internal structure need not be known to the encoder or decoder. Opaque data is included in the coded information without modification. The extent of the opaque data is conveyed by a length value that follows the {DAT} code.
The special code denoted by the notation {END} marks the end of elements and syntactic information, as described above.
The special code denoted by the notation {STR} marks the start of a text string, as described above.
The special code denoted by the notation {UNK} identifies an unknown element type. This code improves the ability of existing encoders and decoders to handle documents containing elements that were not defined when they were implemented. An older encoder can convey an unknown element in a form that allows a newer decoder to receive and process the new element. An older decoder can work with an older encoder and can skip an element identified by a {UNK} code while continuing to process other, known codes.
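By way of illustration only, the handling of the {CBK} and {DAT} global codes can be sketched in Python as follows. Every numeric value, the one-byte codebook index that follows {CBK}, the one-byte length that follows {DAT}, and the contents of the two codebooks are assumptions made for this sketch; the description defines the roles of the codes but not their values or layouts. The sketch reads only element-type codes and omits attributes, text, and end codes.

```python
CBK = 0x01  # hypothetical value of the {CBK} global code
DAT = 0x03  # hypothetical value of the {DAT} global code
TYPE_MASK = 0x3F  # low six bits carry the type; the top two bits are syntax flags

# Hypothetical codebooks: the same six-bit value names different tags per book.
CODEBOOKS = {
    0: {0x10: "HTML", 0x11: "BODY"},
    1: {0x10: "IMG", 0x11: "A"},
}


def decode_types(data: bytes):
    """Return a list of ("TAG", name) and ("DAT", raw bytes) items."""
    book = CODEBOOKS[0]  # assumption: codebook 0 is active at the start
    out = []
    pos = 0
    while pos < len(data):
        type_id = data[pos] & TYPE_MASK
        if type_id == CBK:
            book = CODEBOOKS[data[pos + 1]]  # switch to the selected codebook
            pos += 2
        elif type_id == DAT:
            length = data[pos + 1]  # extent of the opaque data
            out.append(("DAT", data[pos + 2:pos + 2 + length]))
            pos += 2 + length  # the opaque bytes are passed through unexamined
        else:
            out.append(("TAG", book[type_id]))
            pos += 1
    return out
```

Reserving the same values for the global codes in every codebook is one way to satisfy the requirement that all encoders and decoders interpret them identically, whichever codebook is active.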
Decoding
The decoding process of decoder 74 may be illustrated by the state diagram shown in Fig. 5, in which each state is represented by a circle. Transitions between states are represented by lines and proceed in the direction of the arrows.
The decoding process begins in state 130 (start) and transitions along path 140 to state 131 (decode tag). State 131 generates a decoded representation of the element tag derived from the associated code. If the associated code indicates that no syntactic information is present, a transition along path 141 returns to state 131 to generate a decoded representation from the next code. If the code indicates that one or more tag attributes are present, a transition along path 142 proceeds to state 132 (decode attribute name). If the code indicates that no tag attributes are present but that tag content is present, a transition along path 148 proceeds to state 135 (decode content). If no more element tags occur, a transition along path 152 proceeds to state 137 (end), which ends the decoding process.
State 132 generates a decoded representation of the associated attribute name. A transition along path 143 proceeds to state 133 (decode attribute value). State 133 generates a decoded representation of the corresponding attribute value. If another tag attribute is present, a transition along path 144 returns to state 132 to generate a decoded representation of the next tag attribute.
When no more tag attributes occur, a transition along path 147 proceeds to state 135 to process tag content if tag content is present. If no tag content is present, a transition along path 146 proceeds to state 131 to process the next code.
State 135 generates a decoded representation of the corresponding tag content. If more tag content is present, a transition along path 149 returns to state 135 to process the next content. If no more tag content occurs, a transition along path 151 proceeds to state 131 to process the next code.
As mentioned above, tag content may include nested codes. If a nested code is present, a recursive transition is made to state 130 along a path that is not shown. After all codes at a given level of nesting have been processed, a recursive return transition is made to state 135 along another path that is not shown.
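By way of illustration only, the states described above can be sketched in Python as a recursive decoder. The token tuples stand in for already-identified codes and are hypothetical, as are the function names; a single ("END",) token is assumed to close both attribute lists and content, in the manner of the preferred embodiment.

```python
def decode(tokens):
    """Rebuild markup text from a list of hypothetical code tokens."""
    out = []
    _decode_content(tokens, 0, out)  # state 130: start
    return "".join(out)              # state 137: end


def _decode_element(tokens, pos, out):
    # State 131: decode the tag and read its two syntax indications.
    _, name, has_attrs, has_content = tokens[pos]
    pos += 1
    out.append("<" + name)
    if has_attrs:
        while tokens[pos][0] != "END":
            attr_name = tokens[pos][1]       # state 132: decode attribute name
            attr_value = tokens[pos + 1][1]  # state 133: decode attribute value
            out.append(' %s="%s"' % (attr_name, attr_value))
            pos += 2
        pos += 1  # consume the end-of-attributes code
    out.append(">")
    if has_content:
        pos = _decode_content(tokens, pos, out)  # state 135: decode content
        pos += 1  # consume the end-of-content code
        out.append("</%s>" % name)
    return pos


def _decode_content(tokens, pos, out):
    while pos < len(tokens) and tokens[pos][0] != "END":
        if tokens[pos][0] == "STR":
            out.append(tokens[pos][1])
            pos += 1
        else:
            # Nested code: recursive transition, returning here afterwards.
            pos = _decode_element(tokens, pos, out)
    return pos
```

The decoder never consults the element name to decide what follows a tag code; the two indications carried by the code are sufficient, so an element of unrecognized type is still traversed correctly.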
Recurrence
The state diagrams shown in Figs. 4 and 5 do not show any provision for recursion. Recursion is not required to practice the invention, but it is an effective technique in many embodiments that process nested elements and codes. The functional flow diagram shown in Fig. 6 represents a recursive process for encoding or decoding document elements expressed in a markup language such as HTML.
Encoding
In the encoding process shown, step 221 performs various initialization tasks. Step 222 initializes the recursion level to zero. Step 223 processes an element tag to generate a coded representation. Step 224 checks whether any tag attributes are present. If so, step 225 processes a tag attribute to generate a coded representation and returns to step 224 to check whether another tag attribute is present. If no more tag attributes occur, processing continues with step 226.
Step 226 checks whether tag content is present. If so, step 227 processes the tag content to generate a coded representation. Step 228 checks whether an element is nested within the tag content. If not, processing returns to step 226 to check whether more tag content is present. If no more tag content occurs, processing continues with step 230. If an element is nested within the tag content, step 229 increments the recursion level and processing continues with step 223.
Step 230 checks whether the current recursion level is zero. If it is not zero, step 231 decrements the recursion level and processing continues with step 226. If the recursion level is zero, step 232 checks whether the encoding process is complete. If not, processing returns to step 223. If the encoding process is complete, step 233 performs various termination tasks.
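By way of illustration only, the encoding steps above can be sketched in Python with an explicit recursion level in place of recursive function calls. The tuple form of an element, (tag, attributes, content), the token tuples that are produced, and the function names are hypothetical; the step numbers in the comments indicate the correspondence with the flow diagram as understood from the description.

```python
def _open(element, out):
    """Steps 223 to 225: emit the tag code and any attribute codes."""
    tag, attrs, content = element
    out.append(("TAG", tag, bool(attrs), bool(content)))  # step 223
    for name, value in attrs.items():                     # steps 224 and 225
        out.append(("ATTR", name, value))
    if attrs:
        out.append(("END",))  # end of attributes
    return iter(content), bool(content)


def encode_document(elements):
    out = []   # step 221: initialization
    level = 0  # step 222: recursion level starts at zero
    for root in elements:  # step 232: repeat until every element is encoded
        pending = [_open(root, out)]
        while pending:
            items, has_content = pending[-1]
            nested = None
            for item in items:  # step 226: more tag content?
                if isinstance(item, str):
                    out.append(("STR", item))  # step 227
                else:
                    nested = item  # step 228: a nested element was found
                    break
            if nested is not None:
                level += 1  # step 229
                pending.append(_open(nested, out))  # continue with step 223
                continue
            if has_content:
                out.append(("END",))  # end of content
            pending.pop()
            if level > 0:   # step 230
                level -= 1  # step 231: resume the enclosing element's content
        assert level == 0
    return out  # step 233: termination
```

Because each entry in `pending` holds an iterator over an element's content, processing resumes after a nested element exactly where it left off, which is the role of the return to step 226.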
Decoding
In the decoding process shown, step 221 performs various initialization tasks. Step 222 initializes the recursion level to zero. Step 223 processes a code to generate a decoded representation. Step 224 checks whether any tag attributes are present. If so, step 225 processes a code representing a tag attribute to generate a decoded representation and returns to step 224 to check whether another tag attribute is present. If no more tag attributes are present, processing continues with step 226.
Step 226 checks whether tag content is present. If so, step 227 processes a code representing tag content to generate a decoded representation. Step 228 checks whether a code is nested within the coded tag content. If not, processing returns to step 226 to check whether more tag content is present. If no more tag content is present, processing continues with step 230. If a code is nested within the coded tag content, step 229 increments the recursion level and processing continues with step 223.
Step 230 checks whether the current recursion level is zero. If it is not zero, step 231 decrements the recursion level and processing continues with step 226. If the recursion level is zero, step 232 checks whether the decoding process is complete. If not, processing returns to step 223. If the decoding process is complete, step 233 performs various termination tasks.

Claims (24)

1. A method for reducing the information capacity requirements of input information representing a document, comprising:
receiving the input information representing the document and identifying therein a plurality of elements, wherein each element has a respective type and at least some of the elements have syntactic information representing one or more respective syntactic features;
generating a plurality of codes, each code having a start and representing at least a portion of a respective element with an information capacity requirement lower than that of the portion it represents, wherein each code conveys the type of the respective element and a syntax indication of whether syntactic information of the respective element is present, and each code conveys the syntax indication at a predetermined position relative to the start of that code; and
generating coded information representing the document by assembling the plurality of codes, and those portions of the elements not represented by the plurality of codes, into a form suitable for transmission or storage.
2. The method according to claim 1, wherein the elements conform to a tag-based markup language, each element comprises a markup language tag, and the syntactic information comprises tag attributes and tag content.
3. The method according to claim 2, wherein the tag-based markup language conforms to a Standard Generalized Markup Language (SGML) Document Type Definition (DTD).
4. The method according to claim 2, wherein the codes are generated in a form such that the syntax indication indicates whether syntactic information is present in a manner that does not depend on element type.
5. The method according to claim 2, wherein the codes have a fixed length and convey the syntax indication at a fixed position relative to the start of the code.
6. The method according to claim 1, wherein the codes have a fixed length and convey the syntax indication at a fixed position relative to the start of the code.
7. The method according to claim 1, further comprising selecting a codebook from a plurality of codebooks, wherein at least some of the codes are generated according to the selected codebook and an indication of the selected codebook is assembled into the coded information.
8. A method for recovering, from coded information, decoded information representing a document, wherein the document comprises a plurality of elements, the method comprising:
receiving the coded information representing the document and identifying therein a plurality of codes, wherein each code has a start, represents at least a portion of a respective element, conveys a respective type indication of the type of the respective element, and conveys a respective syntax indication of whether syntactic information is present, the syntactic information representing one or more syntactic features of the respective element;
obtaining the respective syntax indication from each code at a predetermined position relative to the start of that code;
generating a plurality of decoded representations, wherein each decoded representation is derived from a respective code and corresponds to the respective portion of the element represented by that code, each syntax indication controls the generation of the decoded representations that represent syntactic information, and each resulting decoded representation has an information capacity requirement greater than that of the respective code; and
generating output information representing the document by assembling the plurality of decoded representations and those portions of the elements not represented by the codes.
9. The method according to claim 8, wherein the elements conform to a tag-based markup language, each element comprises a markup language tag, and the syntactic information comprises tag attributes and tag content.
10. The method according to claim 9, wherein the tag-based markup language conforms to a Standard Generalized Markup Language (SGML) Document Type Definition (DTD).
11. The method according to claim 9, wherein the codes have a form such that the syntax indication indicates whether syntactic information is present in a manner that does not depend on element type.
12. The method according to claim 9, further comprising generating a signal for presentation on a display device by processing the output information according to the elements in the output information, wherein the processing uses the syntax indications of one or more elements in the output information to avoid processing syntactic information that affects one or more characteristics of the display.
13. The method according to claim 9, further comprising generating a signal for presentation on a display device by processing the coded information according to the codes in the coded information, wherein the processing uses the syntax indications of one or more codes in the coded information to avoid processing syntactic information that affects one or more characteristics of the display.
14. The method according to claim 9, wherein the coded information comprises one or more instances of unsupported codes, from which corresponding decoded representations cannot be derived, and the output information is generated by also assembling the one or more instances of unsupported codes.
15. The method according to claim 9, wherein the codes have a fixed length and convey the syntax indication at a fixed position relative to the start of the code.
16. The method according to claim 8, wherein the codes have a fixed length and convey the syntax indication at a fixed position relative to the start of the code.
17. The method according to claim 8, wherein the coded information comprises an indication of a codebook selected from a plurality of codebooks, and at least some of the decoded representations are derived from the codes according to the selected codebook.
18. A method for recovering decoded information representing a document from coded information comprising a plurality of compressed coded elements, comprising:
processing a coded element to identify an element type and to obtain a syntax indication of syntactic features of the element, wherein the syntax indication is obtained from the coded element at a predetermined position relative to the start of the coded element, and a compressed representation of the element type is expanded into an uncompressed form of a markup language tag;
if the syntax indication indicates that at least one tag attribute is present, processing tag attribute information in the coded element by expanding a compressed representation of the tag attribute information into an uncompressed form of a markup language tag attribute name or tag attribute value; and
if the syntax indication indicates that tag content is present, processing tag content information in the coded element according to a process appropriate for tag content.
19. The method according to claim 18, wherein the markup language tag conforms to a Standard Generalized Markup Language (SGML) Document Type Definition (DTD).
20. The method according to claim 18, wherein the coded elements have a form such that the syntax indication indicates whether syntactic information is present in a manner that does not depend on element type.
21. The method according to claim 18, further comprising generating a signal for presentation on a display device by processing output information according to the elements in the output information, wherein the processing uses the syntax indications of one or more elements in the output information to avoid processing tag attribute information or tag content that affects one or more characteristics of the display.
22. The method according to claim 18, wherein the coded information comprises one or more instances of coded elements of an unsupported type, which cannot be expanded into an uncompressed form of a markup language tag, and the output information is generated by also assembling the one or more instances of coded elements of an unsupported type.
23. The method according to claim 18, wherein the coded information comprises an indication of a codebook selected from a plurality of codebooks, and a markup language tag, tag attribute name, or tag attribute content is expanded into uncompressed form according to the selected codebook.
24. The method according to claim 18, wherein the compressed representation of the element type has a fixed length and conveys the syntax indication at a fixed position within the compressed representation of the element type.
CN 98119772 1997-12-29 1998-09-28 Compression of documents with markup language that preserves syntactical structure Pending CN1222009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 98119772 CN1222009A (en) 1997-12-29 1998-09-28 Compression of documents with markup language that preserves syntactical structure

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US999518 1997-12-29
CN 98119772 CN1222009A (en) 1997-12-29 1998-09-28 Compression of documents with markup language that preserves syntactical structure

Publications (1)

Publication Number Publication Date
CN1222009A true CN1222009A (en) 1999-07-07

Family

ID=5226459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 98119772 Pending CN1222009A (en) 1997-12-29 1998-09-28 Compression of documents with markup language that preserves syntactical structure

Country Status (1)

Country Link
CN (1) CN1222009A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6928486B2 (en) 1999-12-16 2005-08-09 Nec Corporation Portable radio communication terminal having expression style processing apparatus therein and express style method
CN100426827C (en) * 1999-12-16 2008-10-15 日本电气株式会社 Portable radio communication terminal and its representation style processing method thereof
CN1871824B (en) * 2003-06-04 2010-04-28 高通股份有限公司 Method and apparatus for translating resource names in a wireless environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C01 Deemed withdrawal of patent application (patent law 1993)
WD01 Invention patent application deemed withdrawn after publication