WO2008058423A1 - Method for step-by-step downloading and display of PDF files - Google Patents

Info

Publication number
WO2008058423A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
page
client
indirect
content server
Application number
PCT/CN2006/003061
Other languages
English (en)
Chinese (zh)
Inventor
Yuqian Xiong
Original Assignee
Yuqian Xiong
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-11-14
Filing date
2006-11-14
Publication date
2008-05-22
Application filed by Yuqian Xiong
Priority to PCT/CN2006/003061
Publication of WO2008058423A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing

Definitions

  • The present invention relates to a method for displaying any PDF document over the Internet.
  • PDF: Portable Document Format.
  • PDF files are not optimized for step-by-step downloading over a network: each page of a PDF document depends on various resources, and these resources may be scattered throughout the file, so the user may have to wait a long time before the required document information has been downloaded and can be read.
  • The method of the present invention for step-by-step downloading and display of a PDF file is characterized in that it uses a PDF presentation client program, a content server program, and a preprocessor program, and comprises the following steps:
  • When a client wants to display a document, it sends the content server a request containing the document's identifier (the document name or the document ID);
  • When the content server receives a document request, it loads the index data for that document and then sends the document's basic information to the client, including (but not limited to): (1) document-level information such as the author and title;
  • A flat PDF document is a standard PDF document that conforms to the PDF specification but does not use the object stream feature. If the original PDF document contains object streams, the preprocessor breaks them up into separate indirect objects.
  • A standard PDF document contains a cross-reference table, which also appears in the flat PDF document.
  • This table stores the location of every indirect object. If the original document contains object streams, the preprocessor may need to modify the table.
  • Flat PDF documents can be stored as files or placed in other kinds of data stores, such as a database.
  • Step-by-step downloading is a way of downloading document content from a computer over a network.
  • The order in which the document content is downloaded ensures that the document can be displayed, and operated on by the user, as early as possible. For example, if the user wants to see the first page of the document, the step-by-step download process should display the first page as soon as that page's content has been downloaded. Likewise, when the user wants to jump directly to the last page of a document, the download process should allow the last page to be loaded immediately, without first loading the pages in between.
  • The steps are as follows:
  • Before a PDF file is loaded into the content server and presented at a client's request, it must be preprocessed by the preprocessor.
  • The input to the preprocessor is the original PDF file; the output is a flat PDF document and the corresponding index data.
  • The preprocessor loads the original PDF document and detects whether it uses object streams. Detection is done by searching for cross-reference streams and checking the type flag of each indirect object's entry in the cross-reference stream. For details, see PDF Reference, Fifth Edition, Section 3.4.6.
  • If object streams are present, the preprocessor decompresses each object stream and writes every indirect object in it out as a separate indirect object.
  • The cross-reference table is modified accordingly, and the modified table is finally written to the resulting flat PDF document.
  • Even if the document uses no object streams, the preprocessor can still load all of its objects, write them to a new document, and modify the cross-reference table accordingly.
  • During this rewriting process, the preprocessor may also fix some common problems in the cross-reference table and build correct index data.
  • The first part of the index data is document-level data, such as the document's author, title, and total page count. This data can be generated when the original document is loaded.
  • The second part of the index data is the location and size of every indirect object. This data is computed as each indirect object is written to the flat PDF document.
  • The third part of the index data is page-level data, including page width, page height, the indirect object ID of the page object, and so on. This data can be generated once all the data in the document has been loaded.
  • The last part of the index data is the dependency list. Producing it requires a dependency-detection process that takes a page as input and outputs a list of all the indirect objects the page depends on.
  • A page depends on an indirect object if the page cannot be displayed correctly unless that object's data has been loaded.
  • The referenced object is in the child list of the page tree data structure.
  • The referenced object is used for features the client does not support, such as annotations, internal data structures, etc.
  • An object may depend on other objects without referencing them directly.
  • An example of such an object is the name tree.
  • A target location (chapter, section, page, etc.) can be represented by a name.
  • The actual target location is stored in the name tree.
  • In that case, the page depends on the entire name tree.
  • The final dependency list contains the IDs of all the indirect objects the page depends on, possibly preceded by a total count or terminated by an end flag.
  • When the content server starts, it opens a communication port and waits for connections on that port.
  • The server may then receive a request for a specific document.
  • The content server loads the flat PDF document and its index data and sends the document information back to the requester over the network.
  • The index data should remain in the computer's memory until the connection between the client and the server is closed.
  • For each active connection, the content server also maintains a list of all the indirect objects of each document that have already been transmitted.
  • When the presentation client requests a specific page of the document, the content server queries the index data and finds the following data for the requested page: the indirect object representing the page and the page's dependency list.
  • The content server should send the indirect object representing the page, together with every indirect object in the dependency list that has not been sent before (according to the list of already-sent objects). Each object that is sent is added to the list of already-sent objects.
  • For each indirect object to be transferred, the content server queries the index data for its location and size, reads that portion of the flat PDF document, and sends it to the client.
  • Finally, the content server sends the client a flag indicating that all the data needed to display the page has been transferred.
  • The presentation client maintains a connection to the content server, sends requests based on the user's actions, and displays pages as they become available.
  • The client maintains an internal representation of the PDF document containing the indirect objects it has received from the content server.
  • This internal document may be incomplete, but it is sufficient to display the pages the user has requested.
  • When a user requests a document, the client connects to the appropriate content server and sends a document request containing the document ID.
  • When the client receives the server's response, it uses the basic information about all the pages to determine the display size of every page. With this size information, the client can correctly display blank pages and set up the scroll bar accordingly.
  • When the document information has been received, or when a user action requires a new page to be displayed, the client sends a page request containing the number of the requested page.
  • The received indirect object data is written into the internal representation of the PDF document, assembling a temporary document that, although incomplete, can be used to display the requested page.
  • The client may also request pages before the user has navigated to them. This keeps the communication connection busy, so that when the user actually needs a new page (usually the page after the current one), the client can respond faster.
  • When the client receives the server's notification that all the indirect objects required by the requested page have been sent, it displays the page normally, just as it would display an ordinary PDF document, even though other pages of the document may still be empty at that point.
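
The object-stream detection step described above can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: it assumes the cross-reference stream has already been located and its data decoded (for example, Flate-decompressed), that `widths` holds the stream's /W array, and that the stream has a single subsection starting at object number `first_obj` (a real file may list several in /Index). Entries of type 2 are objects stored inside an object stream, so any such entry means the document needs flattening. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

XrefRow = Tuple[int, int, int]


def decode_xref_rows(data: bytes, widths: Tuple[int, int, int]) -> List[XrefRow]:
    """Split decoded cross-reference stream data into (type, field2, field3) rows.

    `widths` mirrors the stream's /W array: the byte width of each field.
    """
    row_len = sum(widths)
    if row_len == 0 or len(data) % row_len != 0:
        raise ValueError("data length is not a multiple of the row width")
    rows: List[XrefRow] = []
    for start in range(0, len(data), row_len):
        pos = start
        fields = []
        for width in widths:
            # Fields are big-endian; a zero-width field reads as 0.
            fields.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        if widths[0] == 0:
            fields[0] = 1  # an absent type field defaults to type 1
        rows.append((fields[0], fields[1], fields[2]))
    return rows


def compressed_objects(rows: List[XrefRow], first_obj: int = 0) -> Dict[int, Tuple[int, int]]:
    """Map each type-2 object number to (object stream number, index in stream)."""
    return {first_obj + i: (f2, f3) for i, (kind, f2, f3) in enumerate(rows) if kind == 2}


if __name__ == "__main__":
    # Four entries with /W [1 2 1]: free, uncompressed at offset 15, two in object stream 5.
    data = bytes([0, 0, 0, 255, 1, 0, 15, 0, 2, 0, 5, 0, 2, 0, 5, 1])
    rows = decode_xref_rows(data, (1, 2, 1))
    print(rows)                      # [(0, 0, 255), (1, 15, 0), (2, 5, 0), (2, 5, 1)]
    print(compressed_objects(rows))  # {2: (5, 0), 3: (5, 1)}
```

A non-empty result from `compressed_objects` is the signal that the preprocessor has object streams to break up.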
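
Writing each object out as a separate indirect object while recording its location and size (the second part of the index data) can be sketched like this. It is an illustrative Python sketch under simplifying assumptions: the objects are already decompressed and serialized as bytes, they are numbered 1..N with generation 0, and `write_flat_pdf` is a hypothetical name. The output uses a classic cross-reference table, whose entries are each exactly 20 bytes as the PDF specification requires.

```python
from typing import Dict, Tuple

Index = Dict[int, Tuple[int, int]]  # object number -> (byte offset, byte size)


def write_flat_pdf(objects: Dict[int, bytes], root: int) -> Tuple[bytes, Index]:
    """Serialize objects as separate indirect objects plus a classic xref table.

    Assumes object numbers 1..N with no gaps, all with generation 0.
    """
    n = len(objects)
    if sorted(objects) != list(range(1, n + 1)):
        raise ValueError("this sketch expects contiguous object numbers 1..N")
    out = bytearray(b"%PDF-1.4\n")
    index: Index = {}
    for num in range(1, n + 1):
        start = len(out)
        out += b"%d 0 obj\n" % num
        out += objects[num]
        out += b"\nendobj\n"
        index[num] = (start, len(out) - start)  # recorded as the object is written
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (n + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0 (20 bytes)
    for num in range(1, n + 1):
        out += b"%010d 00000 n \n" % index[num][0]  # 20-byte in-use entry
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (n + 1, root)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out), index


if __name__ == "__main__":
    objs = {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [] /Count 0 >>",
    }
    pdf, idx = write_flat_pdf(objs, root=1)
    offset, size = idx[1]
    print(pdf[offset:offset + size])
```

Because the (offset, size) pairs are captured during writing, the content server can later serve any single object by slicing the file, without parsing it again.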
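
The dependency-detection process can be illustrated with a small graph traversal. This Python sketch is hypothetical throughout: parsed objects are modelled as plain dicts and lists with a `Ref` class for indirect references; references under `/Parent` and `/Kids` (the page tree) and `/Annots` (an assumed example of a feature the client does not support) are not followed; and implicit dependencies, such as a whole name tree, must be supplied by the caller through `implicit`. The result is count-prefixed, one of the output forms the description mentions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as `6 0 R` (generation ignored here)."""
    num: int


# Keys whose references are not followed: page tree links and, as an example,
# annotations the client is assumed not to render. Skipped wherever they appear.
SKIPPED_KEYS = {"/Parent", "/Kids", "/Annots"}


def direct_refs(value: Any) -> Iterable[Ref]:
    """Yield references contained in one object without following them."""
    stack = [value]
    while stack:
        item = stack.pop()
        if isinstance(item, Ref):
            yield item
        elif isinstance(item, dict):
            stack.extend(v for k, v in item.items() if k not in SKIPPED_KEYS)
        elif isinstance(item, list):
            stack.extend(item)


def page_dependencies(objects: Dict[int, Any], page: int,
                      implicit: Optional[Dict[int, List[int]]] = None) -> List[int]:
    """Return [count, id1, id2, ...] for every object the page transitively needs."""
    implicit = implicit or {}
    seen: Set[int] = set()
    deps: List[int] = []
    stack = [page]
    while stack:
        num = stack.pop()
        if num in seen or num not in objects:
            continue
        seen.add(num)
        if num != page:
            deps.append(num)
        stack.extend(ref.num for ref in direct_refs(objects[num]))
        stack.extend(implicit.get(num, []))  # e.g. every node of a name tree
    deps.sort()
    return [len(deps)] + deps
```

Note that name trees also use `/Kids`, which this sketch skips; that is one reason the implicit dependencies are passed in explicitly rather than discovered.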
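
On the server side, the per-connection bookkeeping of already-sent objects can be sketched as below. The class name, the 1-based page numbering, and the `END_OF_PAGE` marker bytes are hypothetical, since the description does not specify a wire format; dependency lists here are plain ID lists without a count prefix. The index maps each object number to its (offset, size) in the flat PDF, so each object is sent by slicing exactly that byte range.

```python
from typing import Dict, List, Set, Tuple

END_OF_PAGE = b"\x00END"  # hypothetical "page complete" flag


class ConnectionState:
    """Per-connection server state for one flat PDF document."""

    def __init__(self, flat_pdf: bytes, index: Dict[int, Tuple[int, int]],
                 page_objects: List[int], dependencies: Dict[int, List[int]]):
        self.flat_pdf = flat_pdf
        self.index = index                # object number -> (offset, size)
        self.page_objects = page_objects  # page_objects[i] is the object of page i + 1
        self.dependencies = dependencies  # page object number -> dependency IDs
        self.sent: Set[int] = set()       # objects already sent on this connection

    def answer_page_request(self, page_no: int) -> List[bytes]:
        """Return the messages for a 1-based page number, ending with the flag."""
        if not 1 <= page_no <= len(self.page_objects):
            raise IndexError(f"no page {page_no}")
        page_obj = self.page_objects[page_no - 1]
        messages: List[bytes] = []
        for num in [page_obj] + self.dependencies.get(page_obj, []):
            if num in self.sent:
                continue  # the client already holds this object
            offset, size = self.index[num]
            messages.append(self.flat_pdf[offset:offset + size])
            self.sent.add(num)
        messages.append(END_OF_PAGE)
        return messages
```

A second request that shares dependencies with an earlier one transfers only the new objects, which is what keeps jumping between pages cheap.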
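
On the client, the page sizes received with the document information are enough to lay out blank placeholder pages and size the scroll bar, and a simple policy can pick which pages to request next. This Python sketch assumes pages are stacked vertically with a fixed gap and that the prefetch policy is "current page plus the next one"; both are illustrative choices, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Set, Tuple


def layout_pages(sizes: List[Tuple[float, float]], gap: float = 10.0) -> Tuple[List[float], float]:
    """Return the top y-coordinate of each page and the total scrollable height."""
    tops: List[float] = []
    y = 0.0
    for _width, height in sizes:
        tops.append(y)
        y += height + gap
    total = y - gap if sizes else 0.0  # no gap after the last page
    return tops, total


def page_at(tops: List[float], scroll_y: float) -> int:
    """1-based number of the last page whose top is at or above scroll_y."""
    return max(1, bisect_right(tops, scroll_y))


def pages_to_request(current: int, page_count: int, requested: Set[int]) -> List[int]:
    """Request the visible page first, then prefetch the following one."""
    wanted = [current]
    if current < page_count:
        wanted.append(current + 1)
    return [p for p in wanted if p not in requested]


if __name__ == "__main__":
    tops, total = layout_pages([(612, 792), (612, 792), (842, 595)])
    print(tops, total)  # [0.0, 802.0, 1604.0] 2199.0
    print(page_at(tops, 900), pages_to_request(2, 3, {1}))  # 2 [2, 3]
```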
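
Assembling the received indirect objects into the client's internal, possibly incomplete document, and marking a page as displayable once the end flag arrives, might look like this. It is a hypothetical Python sketch: it assumes each server message is either one serialized indirect object (`N G obj ... endobj`) or a hypothetical `END_OF_PAGE` marker agreed with the server, and it keeps raw object bytes rather than a parsed PDF model.

```python
import re
from typing import Dict, Optional, Set

END_OF_PAGE = b"\x00END"  # hypothetical flag; must match the server's marker
OBJ_HEADER = re.compile(rb"\s*(\d+)\s+\d+\s+obj\b")


class PartialDocument:
    """Client-side store of the indirect objects received so far."""

    def __init__(self) -> None:
        self.objects: Dict[int, bytes] = {}  # object number -> serialized object
        self.displayable: Set[int] = set()   # pages whose objects are all present

    def on_message(self, message: bytes, requested_page: int) -> Optional[int]:
        """Store one server message; return the page number once it can be drawn."""
        if message == END_OF_PAGE:
            self.displayable.add(requested_page)
            return requested_page
        match = OBJ_HEADER.match(message)
        if match is None:
            raise ValueError("message is neither an indirect object nor the end flag")
        self.objects[int(match.group(1))] = message
        return None
```

Pages not in `displayable` would still be drawn as the blank placeholders sized from the page-level index data.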

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for step-by-step downloading and display of PDF files that uses a PDF presentation client terminal, a content server terminal, and a preprocessor to achieve true step-by-step downloading without requiring a different file format.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2006/003061 (WO2008058423A1, fr) | 2006-11-14 | 2006-11-14 | Method for step-by-step downloading and display of PDF files

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2006/003061 (WO2008058423A1, fr) | 2006-11-14 | 2006-11-14 | Method for step-by-step downloading and display of PDF files

Publications (1)

Publication Number | Publication Date
WO2008058423A1 (fr) | 2008-05-22

Family

ID=39401301

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/CN2006/003061 (WO2008058423A1, fr) | Method for step-by-step downloading and display of PDF files | 2006-11-14 | 2006-11-14

Country Status (1)

Country Link
WO (1) WO2008058423A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113051505A (zh) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Document display method, apparatus and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CA2346215A1 (fr) * | 2000-05-05 | 2001-11-05 | Xerox Corporation | Fast-reprint file format using private tags to produce reprintable documents that can be viewed and edited with ordinary tools
US6538760B1 (en) * | 1998-09-08 | 2003-03-25 | International Business Machines Corp. | Method and apparatus for generating a production print stream from files optimized for viewing
CN1479899A (zh) * | 2001-02-05 | 2004-03-03 | 皇家飞利浦电子有限公司 | Object transmission method with format adaptation

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 06805241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 06805241

Country of ref document: EP

Kind code of ref document: A1