WO2008058423A1 - Method for step-by-step downloading and displaying of PDF files - Google Patents
- Publication number
- WO2008058423A1 (PCT/CN2006/003061)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- page
- client
- indirect
- content server
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
Definitions
- The present invention relates to a method of displaying any PDF document over the Internet.
- PDF: Portable Document Format.
- PDF files are not optimized for step-by-step downloading over a network: each page in a PDF document depends on various resources, and these resources may appear in different places in the document, so the user spends a lot of time downloading before the required document information can be read.
- The method for step-by-step downloading and displaying a PDF file of the present invention is characterized in that it comprises a PDF presentation client program, a content server program, and a preprocessor program, and comprises the following steps:
- When a client wants to display a document, it sends a request to the content server; the request contains the identifier of the document: the name of the document or the document ID;
- When the content server receives a document request, it loads the index data corresponding to the document. Then, it sends the basic information of the document to the client, including (but not limited to): (1) basic information about the document, including the author of the document, the title, etc.;
- A flat PDF document is a standard PDF document that conforms to the PDF specification but does not use the object stream feature. If the initial PDF document contains object streams, the preprocessor breaks them up into separate indirect objects.
- A standard PDF document contains a cross-reference table, which also appears in the flattened PDF.
- This table stores the location of all indirect objects. If the initial document contains object streams, the preprocessor may need to modify the table.
- Flat PDF documents can be stored as files or placed in other types of data stores, such as a database.
- Step-by-step download is a way of downloading document content from a computer over a network.
- The content is downloaded in an order that lets the document be displayed, and lets the user operate on it, as soon as possible. For example, if the user wants to see the first page of the document, the step-by-step download process should display the first page as soon as that page's content has been downloaded. As another example, when a user wishes to go directly to the last page of a document, the download process should allow the last page to be loaded immediately, without first downloading the pages in between.
- The steps are as follows:
- When a client wants to display a document, it sends a request to the content server; the request contains the identifier of the document: the name of the document or the document ID;
- When the content server receives a document request, it loads the index data corresponding to the document. It then sends the basic information of the document to the client, including but not limited to:
- Before a PDF file can be loaded into the content server and presented at a client's request, it must be processed by the preprocessor.
- The input to the preprocessor is the initial PDF file, and the output is a flat PDF document and the corresponding index data.
- The preprocessor reads the original PDF document to detect whether any of its objects are stored in object streams. The detection is done by searching for cross-reference streams and checking the flag of each indirect object listed in them. For details, see the PDF Reference, Fifth Edition, Section 3.4.6.
- The preprocessor decompresses each object stream and writes every object in the stream out as a separate indirect object.
- The cross-reference table is modified accordingly. Finally, the modified cross-reference table is written to the flattened PDF result document.
- The preprocessor can still load all the inline objects, write them to a new document, and modify the cross-reference table accordingly.
- During the rewriting process, the preprocessor may fix some common problems in the cross-reference table and create the correct index data.
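The flattening step can be illustrated with a minimal sketch. It relies on the object stream layout defined in the PDF specification: after decoding, the stream holds `/N` pairs of integers (object number, byte offset relative to `/First`), followed by the objects themselves. The function names `split_object_stream` and `as_indirect_object` are hypothetical, and the sketch assumes the stream uses plain FlateDecode with no predictor; it is not taken from the patent.

```python
import zlib


def split_object_stream(raw_stream: bytes, n: int, first: int) -> dict[int, bytes]:
    """Return {object number: object body} for one object stream.

    raw_stream is the Flate-compressed stream data; n and first are the
    /N and /First entries of the stream dictionary.
    """
    data = zlib.decompress(raw_stream)
    # Header: n pairs of integers (object number, offset relative to `first`).
    header = data[:first].split()
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]
    objects = {}
    for i, (obj_num, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one starts, or at the end of the data.
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[obj_num] = data[start:end].strip()
    return objects


def as_indirect_object(obj_num: int, body: bytes) -> bytes:
    """Wrap a body as a standalone indirect object.

    Objects stored in object streams always have generation number 0.
    """
    return b"%d 0 obj\n%s\nendobj\n" % (obj_num, body)
```

Each body returned by `split_object_stream` can then be passed to `as_indirect_object` and written to the flat document, with its offset recorded for the new cross-reference table.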
- The first part of the index data is document-level data, such as the author of the document, the title, the total number of pages, and so on. It can be generated when the initial document is loaded.
- The second part of the index data is the location and size information of all indirect objects. This data is calculated as each indirect object is written to the flat PDF document.
- The third part of the index data is the page-level data, including page width, page height, the indirect object ID of the page object, and so on. This data can be generated after all data in the document has been loaded.
- The last part of the index data is the dependency list. It requires a dependency detection process that takes a page as input and outputs a list of all indirect objects that the page depends on.
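One possible in-memory shape for the four parts of the index data is sketched below. The class and field names (`IndexData`, `ObjectLocation`, `PageInfo`, `write_objects`) are assumptions made for illustration; the patent does not prescribe a concrete layout. The helper shows how object locations and sizes could be recorded while the flat document is being written.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO


@dataclass
class ObjectLocation:
    offset: int  # byte offset of the indirect object in the flat PDF
    size: int    # length of the serialized object in bytes


@dataclass
class PageInfo:
    width: float
    height: float
    object_id: int  # indirect object ID of the page object
    dependencies: list[int] = field(default_factory=list)


@dataclass
class IndexData:
    # Part 1: document-level data.
    author: str
    title: str
    page_count: int
    # Part 2: location and size of every indirect object.
    objects: dict[int, ObjectLocation] = field(default_factory=dict)
    # Parts 3 and 4: page-level data, each page carrying its dependency list.
    pages: list[PageInfo] = field(default_factory=list)


def write_objects(out: BinaryIO, serialized: dict[int, bytes], index: IndexData) -> None:
    """Append objects to the flat document, recording where each one lands."""
    for obj_id, data in sorted(serialized.items()):
        offset = out.tell()
        out.write(data)
        index.objects[obj_id] = ObjectLocation(offset, len(data))
```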
- A page depends on an indirect object if the page cannot be displayed properly when the data of that indirect object has not been loaded.
- The referenced object is in a sublist of the page tree data structure.
- Referenced objects are used for features that the client does not support, such as annotations, internal data structures, etc.
- An object may depend on other objects without referencing them directly.
- An example is the "name tree".
- A target location (chapter, section, page, etc.) can be represented by a name.
- The actual target location is stored in the name tree.
- The page will therefore depend on the entire name tree.
- The final dependency list contains the IDs of all the indirect objects the page depends on, possibly preceded by a count or followed by an end flag.
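Dependency detection can be pictured as a graph traversal from the page object. The sketch below is only one reading of the text: it assumes that references back into the page tree (the `Parent` key) and references used for unsupported features (here `Annots`) are not followed, and that the name tree is always included because pages depend on it without referencing it directly. The `refs` structure, the key names chosen for skipping, and both function names are hypothetical.

```python
def page_dependencies(page_id, refs, skip_keys=frozenset({"Parent", "Annots"}),
                      always_include=()):
    """Return the sorted IDs of all indirect objects a page depends on.

    refs maps an object ID to {dictionary key: [referenced object IDs]}.
    always_include lists roots (for example the name tree root) that every
    page depends on even without a direct reference.
    """
    seen = set()
    stack = [page_id, *always_include]
    while stack:
        obj_id = stack.pop()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        for key, targets in refs.get(obj_id, {}).items():
            if key in skip_keys:
                continue  # assumed exclusion: page tree links, unsupported features
            stack.extend(targets)
    seen.discard(page_id)  # the page object itself is sent separately
    return sorted(seen)


def encode_dependency_list(ids):
    """Prefix the list with its length (the 'total' variant of the output)."""
    return [len(ids), *ids]
```

For a page whose content stream is object 3 and whose resources are object 4, the result would list 3, 4, everything reachable from 4, and the name tree objects, but not the page's siblings.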
- When the content server starts, it sets up a communication port and waits for connections on that port.
- The server may receive a request to obtain a specific document.
- The content server then loads the flat PDF document and the corresponding index data, and sends the basic document information back to the requester.
- The index data should remain in the computer's memory until the connection between the client and the server is closed.
- The content server also maintains, for each document in each active connection, a list of all indirect objects that have already been transmitted.
- When the presentation client requests a specific page in the document, the content server queries the index data and finds the following data for the requested page:
- The content server should send the indirect object representing the page, as well as all indirect objects in the dependency list that have not been sent previously (by referring to the list of already-sent objects). Each object that is sent should be added to the list of already-sent objects.
- For each indirect object to be transferred, the content server should query the index data to determine its location and size, read that portion of the flattened PDF document, and send it to the client.
- The content server should then send a flag to the client, informing it that all data needed to display the page has been transferred.
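The page-serving steps above can be sketched as a generator that yields the byte ranges to send. The function name `serve_page`, the `END_OF_PAGE` marker value, and the shape of `locations` (object ID mapped to an offset and size pair) are assumptions for illustration; the patent does not define a wire format.

```python
import io
from typing import BinaryIO, Iterator

END_OF_PAGE = b"END"  # hypothetical end-of-page flag


def serve_page(page_object_id: int,
               dependencies: list[int],
               locations: dict[int, tuple[int, int]],
               flat_pdf: BinaryIO,
               already_sent: set[int]) -> Iterator[bytes]:
    """Yield the objects a page needs, skipping those sent earlier."""
    for obj_id in [page_object_id, *dependencies]:
        if obj_id in already_sent:
            continue
        offset, size = locations[obj_id]  # from the index data
        flat_pdf.seek(offset)
        yield flat_pdf.read(size)         # raw bytes of the indirect object
        already_sent.add(obj_id)
    yield END_OF_PAGE                     # all data for the page has been sent
```

Because `already_sent` is kept per document and per connection, an object shared by several pages, such as a font, crosses the network only once.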
- The presentation client maintains a connection to the content server, sends requests based on the user's actions, and displays a page when it becomes available.
- The client maintains an internal form of the PDF document containing the indirect objects that it has received from the content server.
- The internal document may not be complete, but it is sufficient to display the pages requested by the user.
- When a user requests a document, the client creates a connection to a specific content server and sends a document request that includes the ID of the document.
- When the client receives the response from the server, it uses the basic information of all pages to determine the display dimensions of every page. With this dimension information, the client can display blank pages correctly and set the scroll bar accordingly.
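As a rough illustration of how page dimensions alone are enough to lay out blank pages and size the scroll bar, the sketch below stacks pages vertically. The function name `layout_pages`, the `zoom` and `gap` parameters, and the vertical layout itself are assumptions; the patent only says that the page size information is used.

```python
def layout_pages(page_sizes, zoom=1.0, gap=10.0):
    """Return (top offset of each page, total scrollable height).

    page_sizes is a list of (width, height) pairs from the index data.
    """
    tops = []
    y = 0.0
    for _, height in page_sizes:
        tops.append(y)
        y += height * zoom + gap
    total = y - gap if page_sizes else 0.0  # no gap after the last page
    return tops, total
```

With these offsets the client can draw an empty rectangle for every page and map a scroll position to a page number before any page content has arrived.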
- When the document's information has been received, or when a user action requires a new page to be displayed, the client sends a page request containing the page number of the requested page.
- The indirect object data received can be written into the internal form of the PDF document and assembled into a temporary document that, although incomplete, can be used to display the requested page.
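One way to assemble such a temporary document is to concatenate the received objects and write a cross-reference table that lists only those objects. The sketch below is an assumption about how a client might do this, not the patent's method: `assemble_temporary_pdf` is a hypothetical name, every object is assumed to have generation number 0, and the ID of the catalog object (`root_id`) is assumed to be known from the document information. Under the PDF specification, a reference to an object that is not defined is treated as a reference to null, which is what lets the incomplete file remain readable.

```python
def assemble_temporary_pdf(received: dict[int, bytes], root_id: int) -> bytes:
    """Build a PDF file from the indirect objects received so far.

    received maps object ID to the serialized object ("N 0 obj ... endobj").
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for obj_id in sorted(received):
        offsets[obj_id] = len(out)
        out += received[obj_id]
        if not received[obj_id].endswith(b"\n"):
            out += b"\n"
    xref_pos = len(out)
    # Object 0 is always the head of the free list.
    out += b"xref\n0 1\n0000000000 65535 f \n"
    ids = sorted(offsets)
    i = 0
    while i < len(ids):
        # Group consecutive IDs into one cross-reference subsection.
        j = i
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        out += b"%d %d\n" % (ids[i], j - i + 1)
        for obj_id in ids[i:j + 1]:
            out += b"%010d 00000 n \n" % offsets[obj_id]  # 20-byte entry
        i = j + 1
    size = ids[-1] + 1 if ids else 1
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root_id)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```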
- The client may also request pages that the user has not yet viewed. This option keeps the communication connection busy, so that when the user actually needs to view a new page (usually the page after the current one), the client responds faster.
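A simple prefetch policy consistent with this description might look as follows. The function name `next_page_to_prefetch`, the `lookahead` parameter, zero-based page numbers, and the choice to try the following page before the preceding one are all assumptions; the text only says that the next page is the usual candidate.

```python
def next_page_to_prefetch(current_page, page_count, loaded, lookahead=2):
    """Pick the nearest page not yet loaded, preferring pages after the current one.

    Pages are numbered from 0. Returns None when nothing nearby needs loading.
    """
    for step in range(1, lookahead + 1):
        for candidate in (current_page + step, current_page - step):
            if 0 <= candidate < page_count and candidate not in loaded:
                return candidate
    return None
```

The client could call this whenever the connection is idle and send a page request for the returned page number.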
- When the client receives the notification from the server that all the indirect objects required by the requested page have been sent, the page is displayed normally, just as when displaying an ordinary PDF document, although other pages of the document may still be empty at that time.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for step-by-step downloading and displaying of PDF files that uses a PDF presentation client and a content server, together with a preprocessor, to achieve true step-by-step downloading without having to use different file formats.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2006/003061 WO2008058423A1 (fr) | 2006-11-14 | 2006-11-14 | Method for step-by-step downloading and displaying of PDF files |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008058423A1 (fr) | 2008-05-22 |
Family
ID=39401301
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2006/003061 WO2008058423A1 (fr) | 2006-11-14 | 2006-11-14 | Method for step-by-step downloading and displaying of PDF files |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008058423A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051505A (zh) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Document display method, apparatus, and electronic device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2346215A1 (fr) * | 2000-05-05 | 2001-11-05 | Xerox Corporation | Fast re-printable file format using private tags to produce re-printable documents that can be viewed and edited with ordinary tools |
US6538760B1 (en) * | 1998-09-08 | 2003-03-25 | International Business Machines Corp. | Method and apparatus for generating a production print stream from files optimized for viewing |
CN1479899A (zh) * | 2001-02-05 | 2004-03-03 | 皇家菲利浦电子有限公司 | Object transfer method with format adaptation |
2006
- 2006-11-14 WO PCT/CN2006/003061 patent/WO2008058423A1/fr active Application Filing
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6832215B2 (en) | Method for redirecting the source of a data object displayed in an HTML document | |
JP4932240B2 (ja) | Method and system for exposing nested data in a computer-generated document in a transparent manner | |
JP5787963B2 (ja) | Programming interface for a computer platform | |
US7318193B2 (en) | Method and apparatus for automatic document generation based on annotation | |
US20060271574A1 (en) | Exposing embedded data in a computer-generated document | |
US7525996B2 (en) | Intelligent access within a document package | |
US20020143822A1 (en) | Method and apparatus for applying an adaptive layout process to a layout template | |
US20040203624A1 (en) | Technique for sharing of files with minimal increase of storage space usage | |
US20020143523A1 (en) | System and method for providing a file in multiple languages | |
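JP2004265402A (ja) | Method and system for extending the paste functionality of a computer software application | |
CN102063483A (zh) | Serving font files in varying formats based on user agent type | |
JP2006114045A (ja) | Mapping of schema data to data structures | |
JP2006526837A (ja) | Method for browsing content using page-saved files | |
JP2006178952A (ja) | Method and system for linking data ranges of a computer-generated document to associated XML elements | |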
US9317620B2 (en) | Server device | |
US20090006471A1 (en) | Exposing Specific Metadata in Digital Images | |
US20060230057A1 (en) | Method and apparatus for mapping web services definition language files to application specific business objects in an integrated application environment | |
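JP4965014B2 (ja) | Method and device for transferring data objects in a computer communication network, and activation method and activation device | |
JP5964847B2 (ja) | Stitching of dynamic image results | |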
US20090024640A1 (en) | Apparatus and method for improving efficiency of content rule checking in a content management system | |
US7793220B1 (en) | Scalable derivative services | |
EP1345135A2 (fr) | Apparatus, system, method, and computer program for managing documents | |
US8037090B2 (en) | Processing structured documents stored in a database | |
US20080222183A1 (en) | Autonomic rule generation in a content management system | |
US20030217045A1 (en) | Generic proxy for representing search engine partner |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 06805241; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 06805241; Country of ref document: EP; Kind code of ref document: A1 |