CN113051504A - Document preview method, apparatus, device, storage medium and program product - Google Patents

Document preview method, apparatus, device, storage medium and program product

Info

Publication number
CN113051504A
Authority
CN
China
Prior art keywords
document
data block
index table
previewed
address index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110309374.1A
Other languages
Chinese (zh)
Other versions
CN113051504B (en
Inventor
邹涛
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110309374.1A
Publication of CN113051504A
Application granted
Publication of CN113051504B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574: Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G06F16/951: Indexing; Web crawling techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a document preview method, a document preview device, document preview equipment, a storage medium and a program product, and relates to the field of intelligent search. The method comprises the following steps: receiving a document preview request, and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, wherein the target data block is a data block comprising a root object of the document to be previewed and indication information of an object address index table; respectively determining a root object and an object address index table of a document to be previewed according to the target data block; searching a first page object to be previewed according to the root object and the object address index table; and acquiring a data block corresponding to an object in a page in the first page object to be previewed, and analyzing and rendering the data block. The method improves the document opening speed during online preview of the document.

Description

Document preview method, apparatus, device, storage medium and program product
Technical Field
The embodiments of the application relate to computer technology, in particular to a document preview method, apparatus, device, storage medium and program product, which can be used in the field of intelligent search.
Background
In the client era, document browsing is usually performed by local browsing through a dedicated reader, and in the internet era, the demand of users for online previewing of documents is very strong.
In a conventional document online preview scheme, the whole document is often downloaded first and then parsed and rendered, and the download speed of the document in the scheme directly affects the opening speed of the preview. Since the size of the document is not controllable, when the document is large, the downloading time of the whole document is long, which results in too slow opening speed of the document when the document is previewed online.
Disclosure of Invention
The application provides a document previewing method, a document previewing device, a document previewing apparatus, a storage medium and a program product for improving the opening speed of a document during online previewing.
According to an aspect of the present application, there is provided a document preview method including:
receiving a document preview request, and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, wherein the target data block is a data block comprising a root object of the document to be previewed and indication information of an object address index table;
respectively determining a root object and an object address index table of the document to be previewed according to the target data block;
searching a first page object to be previewed according to the root object and the object address index table;
and acquiring a data block corresponding to an object in a page in the first page object to be previewed, and analyzing and rendering the data block.
According to another aspect of the present application, there is provided a document preview apparatus including:
the acquisition module is used for receiving a document preview request and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, wherein the target data block comprises a root object of the document to be previewed and indication information of an object address index table;
the first processing module is used for respectively determining a root object and an object address index table of the document to be previewed according to the target data block;
the second processing module is used for searching a first page object to be previewed according to the root object and the object address index table;
and the display module is used for acquiring a data block corresponding to an object in a page in the first page object to be previewed, and analyzing and rendering the data block.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect described above.
According to yet another aspect of the present application, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first aspect.
According to the technical scheme of the application, the opening speed of the document during online preview is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of a document page tree provided in accordance with an embodiment of the present application;
FIG. 2 is a flowchart illustrating a document preview method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a document data block provided according to an embodiment of the application;
FIG. 4 is a schematic diagram of partial information of a data block provided in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of partial information of another data block provided in accordance with an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a document previewing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of an electronic device for implementing a document preview method of an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The document previewing method provided by the embodiment of the application is suitable for a scene of online previewing the document. In the related technology of online previewing documents, a common method is full-text document downloading, that is, previewing after downloading the whole document data, however, the preview opening speed of this method is affected by the size of the document, and when the document is large, the document downloading time is long, resulting in too slow opening speed.
In order to increase the opening speed of the online preview of the document, a streaming loading method may be adopted, that is, data is loaded according to the position of the page to be previewed, and a stream pulling mode is adopted, so that only one section of data corresponding to the page to be previewed in the document to be previewed is requested each time, and thus, under the condition that the whole amount of the document is not needed, the acquired partial data can be normally analyzed and rendered to realize the preview of the corresponding page. The precondition of adopting the streaming loading for the document is that the document elements are organized by taking pages as units, so that only data corresponding to one page can be acquired during previewing, and rapid previewing is realized.
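The idea of requesting only one section of the document at a time can be sketched as follows. The source does not name a transport protocol; this minimal sketch assumes HTTP byte-range requests, and the URL and the helper name `make_range_request` are hypothetical. The request is only built, not sent.

```python
import urllib.request


def make_range_request(url, start, end):
    """Build (but do not send) a request for bytes start..end, inclusive."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    # Assumption: the server holding the document honors the HTTP Range header.
    headers = {"Range": "bytes=%d-%d" % (start, end)}
    return urllib.request.Request(url, headers=headers)


# Hypothetical URL; asks for the first 128 KB of the document only.
req = make_range_request("https://example.com/sample.pdf", 0, 128 * 1024 - 1)
```

Passing such a request to `urllib.request.urlopen` would return only the requested section if the server supports range requests.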
Taking a PDF document as an example, a PDF document is composed of a file header, a file body, a cross reference table and a file trailer, where the file body contains the intra-page objects of each page, and these objects are indexed by page. Illustratively, as shown in fig. 1, the page objects in the PDF document form a tree with a complete inclusion relationship from the root node to the page tree and then to a single page. Therefore, during document preview, only the data required to render one page needs to be fetched in order to achieve a faster opening speed. Once the data of a page is downloaded, the page can be analyzed and rendered while the data of further pages continues to download, so that the document can be viewed while it is downloading and the opening speed of online preview is improved.
In order to implement the page-by-page downloading and previewing, when a data block of a document is obtained in a pull stream manner, a data block capable of analyzing a tree relationship from a root node to a page object and an object address index in the document needs to be obtained first, and then a data block corresponding to a single page is obtained according to the page object and a corresponding address, so that rapid previewing is achieved. Hereinafter, the document preview method provided in the present application will be described in detail by specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating a document preview method according to an embodiment of the present application. The execution subject of the method is a document previewing device, which can be implemented in a software and/or hardware manner, for example, the device can be a terminal such as a personal computer, a mobile phone, and the like. The method comprises the following steps:
S201, receiving a document preview request, and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request.
The target data block is a data block comprising a root object of the document to be previewed and indication information of the object address index table.
The document preview request may be triggered by a user through a terminal, for example, the user clicks a preview icon of a document to be previewed on a user interface of the terminal, so as to trigger the document preview request, and the terminal requests the server for the document to be previewed according to the document preview request. In the implementation of the application, when the terminal requests the document to be previewed from the server, the terminal does not directly request to download the data of the whole document, but acquires the target data block in the document to be previewed on the premise that the document to be previewed is regarded as a plurality of data blocks so as to be convenient for subsequent gradual analysis.
The root object of the document to be previewed is a total entry of the document object, that is, a root node, and the document to be previewed is taken as a PDF document for example, where the PDF document may include a page object, a bookmark object, a form object, and the like, and as described above, the objects in the PDF have a tree relationship, so that the objects can be gradually found from the root object. The object address index table of the document to be previewed includes the corresponding relationship between the object and the address, taking a PDF document as an example, the object address index table is a cross index table, which includes address information of each object in the PDF.
The target data block includes indication information of the root object and the object address index table, and the indication information may be identification, address and the like, so that the root object and the object address index table can be further acquired based on the target data block. For a document to be previewed in a certain data format, the document structure of the document to be previewed is fixed, description information, indication information and the like in the document are usually in fixed data blocks, for example, target data blocks are usually at the head and/or tail of the document, and in practical application, the target data blocks can be obtained according to the format of the document to be previewed.
S202, respectively determining a root object and an object address index table of the document to be previewed according to the target data block.
The indication information of the root object and the object address index table is obtained from the target data block, so that the root object and the object address index table of the document to be previewed can be respectively obtained based on the indication information. It should be noted that, if the root object or the object address index table indicated by the indication information is in the already acquired target data block, the root object or the object address index table may be directly acquired; and if the root object or the object address index table indicated by the indication information is not in the acquired target data block, acquiring the data block corresponding to the indication information according to the indication information to acquire the root object or the object address index table.
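The "use it if already acquired, otherwise acquire the corresponding data block" rule above can be sketched as a small block store. This is an illustrative sketch only: `BlockStore`, `fetch_block` and the in-memory `fake_document` are hypothetical names standing in for whatever the terminal uses to pull a block from the server.

```python
class BlockStore:
    """Keeps already acquired data blocks and pulls missing ones on demand."""

    def __init__(self, fetch_block):
        self._fetch = fetch_block  # hypothetical callable: block index -> bytes
        self._blocks = {}          # block index -> block data already acquired
        self.fetch_count = 0       # number of blocks actually pulled

    def get(self, index):
        if index not in self._blocks:
            # Not acquired yet: pull the block that the indication points to.
            self._blocks[index] = self._fetch(index)
            self.fetch_count += 1
        return self._blocks[index]


# Stand-in for a remote document that has been divided into three blocks.
fake_document = [b"block-0", b"block-1", b"block-2"]
store = BlockStore(lambda i: fake_document[i])
store.get(2)
store.get(2)  # second access is served from the already acquired blocks
```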
S203, searching the first page object to be previewed according to the root object and the object address index table.
After the root object is determined, the tree relationship of the page objects under the root object can be resolved step by step based on the root object and the object address index table, so that the first page object to be previewed can be found. For example, after a user triggers a document preview request, the terminal first loads the home page of the document to be previewed, so the first page to be previewed is the home page. After the home page is loaded according to the embodiment of the present application, the terminal continues with the second page, and the first page to be previewed is then the second page. The embodiment of the present application describes the loading process of a single page in detail; the loading processes of the other pages are similar, and the terminal may load the pages one by one.
S204, acquiring a data block corresponding to the object in the page in the first page object to be previewed, and analyzing and rendering the data block.
The intra-page objects of the first page to be previewed can include content streams such as numbers, characters and pictures, and can also include resource streams such as fonts, color spaces and embedded objects. The addresses of the intra-page objects can be determined through the object address index table, so that the terminal only needs to obtain the data blocks corresponding to the intra-page objects of the first page object to be previewed, and the first page to be previewed can be rendered by analyzing these data blocks in combination with the page size.
In the embodiment of the application, a pull stream mode is adopted for a document to be previewed, a target data block comprising indication information of a root object and an object address index table is obtained firstly, so that the root object and the object address index table are analyzed, then page objects to be previewed can be determined step by step according to the root object, and data blocks corresponding to objects in pages of the page to be previewed can be obtained according to the page objects to be previewed and the object address index table, so that the page to be previewed can be rendered through a small amount of data, the opening speed of online previewing of the document is improved, and full-text file stream type opening is realized by loading the data blocks corresponding to the pages one by one.
On the basis of the above embodiment, how to obtain the target data block and how to determine the root object and the object address index table of the document to be previewed according to the target data block are further described in detail.
Determining respective corresponding addresses of a plurality of data blocks of a document to be previewed according to the document previewing request; and acquiring the target data block according to the respective corresponding addresses of the plurality of data blocks.
In the embodiment of the application, the terminal takes the document to be previewed as a plurality of data blocks, and acquires partial data blocks in the document each time in a pull stream mode. Specifically, after receiving the document preview request, the terminal may first obtain the size of the document to be previewed, and determine the address corresponding to each data block according to the size of the document to be previewed in a manner of dividing the document to be previewed into a plurality of data blocks. Therefore, the root object and the object address index table of the document can be analyzed and obtained by acquiring a small amount of data, and the construction of the page tree is realized.
Optionally, a plurality of data blocks of the document to be previewed are determined according to a fixed size, and at most one of the plurality of data blocks has a size different from sizes of other data blocks. For example, as shown in fig. 3, the document to be previewed is divided from the head part and the tail part in sequence according to a fixed size of 128KB, and finally, a data block smaller than 128KB exists in the middle position of the document to be previewed.
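The fixed-size division can be illustrated with a short sketch. It assumes that blocks are taken alternately from the head and the tail until less than one full block remains in the middle; the source only states that division proceeds from the head and the tail, so the alternation and the name `split_blocks` are assumptions.

```python
BLOCK_SIZE = 128 * 1024  # fixed block size used in the example of fig. 3


def split_blocks(doc_size, block_size=BLOCK_SIZE):
    """Return half-open (start, end) byte ranges of the blocks, in file order."""
    head, tail = [], []
    lo, hi = 0, doc_size
    take_head = True
    while hi - lo >= block_size:
        if take_head:
            head.append((lo, lo + block_size))
            lo += block_size
        else:
            tail.append((hi - block_size, hi))
            hi -= block_size
        take_head = not take_head
    if hi > lo:
        # At most one leftover block, smaller than block_size, in the middle.
        head.append((lo, hi))
    return head + tail[::-1]


blocks = split_blocks(300 * 1024)  # a hypothetical 300 KB document
```

For the 300 KB example the result is a 128 KB head block, a 44 KB middle block and a 128 KB tail block, so the first and last blocks always have the full fixed size when the document is at least two blocks long.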
PDF documents can be divided into linearized documents and conventional documents. In a conventional document, the header stores information such as the file version, and the tail stores the identification information of the root object, namely the indirect object number of the root object, and the address information of the cross reference table. In a linearized document, the identification information of the root object, the address information of the cross reference table and the like are all located at the head of the file, so this information can be obtained by pulling the data blocks in sequence. Therefore, the target data block in the embodiment of the present application is described in two cases.
In an embodiment, in a case that it may be determined in advance that the document to be previewed is a linearized document, the target data block may be a first data block of a plurality of data blocks of the document to be previewed, that is, the terminal first obtains the first data block of the document to be previewed, and then, according to the first data block, respectively searches for the identification information of the root object and the address information of the object address index table.
It should be noted that, in this embodiment, if the identification information of the root object and the address information of the object address index table are not found in the first data block, the second data block of the document to be previewed is continuously obtained for searching, and the data blocks are sequentially pulled. In this way, streaming preview of the document can be achieved by acquiring a small amount of data at a time.
In one embodiment, the target data chunk may be a first data chunk and a last data chunk of a plurality of data chunks of the document to be previewed, in case that it cannot be predetermined that the document to be previewed is a linearized document or a regular document. The terminal can simultaneously acquire a first data block and a last data block of the document to be previewed, and then analyze the first data block to determine whether the document to be previewed is a linearized document; if yes, respectively searching the identification information of the root object and the address information of the object address index table from the first data block, and if not, respectively searching the identification information of the root object and the address information of the object address index table from the last data block.
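The choice between the first and the last data block can be sketched as below. The sketch assumes that a linearized PDF announces itself with a `/Linearized` key near the start of the file (checked here within the first 1024 bytes); the function names and the sample bytes are hypothetical.

```python
def is_linearized(first_block):
    """Heuristic check for the /Linearized key near the start of the file."""
    return b"/Linearized" in first_block[:1024]


def block_with_trailer_info(first_block, last_block):
    """Pick the block in which to look for the root number and xref address."""
    return first_block if is_linearized(first_block) else last_block


# Hypothetical heads of a linearized and of a conventional document.
linear_head = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 54321 >>\nendobj\n"
regular_head = b"%PDF-1.4\n10 0 obj\n<< /Type /Catalog >>\nendobj\n"
```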
For example, fig. 4 shows part of the information in the last data block. In fig. 4, the 10 after the Root tag is the indirect object number (identification information) of the root object, and the 16644 after the startxref tag is the address information of the cross reference table (object address index table).
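Extracting these two values from the last data block might look like the following sketch. Fig. 4 is not reproduced here, so the sample bytes are a hypothetical trailer written in standard PDF syntax with the same numbers; `parse_tail` is an illustrative name.

```python
import re


def parse_tail(block):
    """Return (root object number, xref address) found in a data block."""
    roots = re.findall(rb"/Root\s+(\d+)\s+\d+\s+R", block)
    xrefs = re.findall(rb"startxref\s+(\d+)", block)
    if not roots or not xrefs:
        return None  # indication information is not in this block
    # Use the last occurrence, i.e. the one closest to the end of the file.
    return int(roots[-1]), int(xrefs[-1])


sample_tail = b"trailer\n<< /Size 40 /Root 10 0 R >>\nstartxref\n16644\n%%EOF\n"
info = parse_tail(sample_tail)
```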
For the PDF document, when the first data block is analyzed, it may be further determined whether the document to be previewed is a valid PDF document according to the identifier of the document type in the first data block, and if not, the subsequent steps are stopped to report an error.
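A minimal sketch of this validity check, assuming the document type identifier is the `%PDF-` marker at the very start of the file; real files may need a more tolerant check, and `check_pdf_header` is a hypothetical name.

```python
def check_pdf_header(first_block):
    """Return the version string, or raise if the block is not a PDF head."""
    if not first_block.startswith(b"%PDF-"):
        # Stop the subsequent steps and report an error.
        raise ValueError("not a valid PDF document")
    return first_block[5:8].decode("ascii")


version = check_pdf_header(b"%PDF-1.7\n1 0 obj\n")
```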
In both embodiments, after the identification information of the root object and the address information of the object address index table are found, the root object is searched for in the first data block according to the identification information of the root object; the data block storing the object address index table is acquired according to the address information of the object address index table, and the object address index table is obtained from that data block.
For example, fig. 5 shows part of the information in the first data block. As shown in fig. 4 and fig. 5, after the indirect object number of the root object is determined to be 10, 10obj in the first data block is identified as the root object. After the address information 16644 of the cross reference table is determined, the corresponding data block is obtained according to the address information, so as to obtain the cross reference table.
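Once the data block holding the cross reference table is available, the table can be turned into a mapping from object number to address. The sketch below handles only the classic textual `xref` table layout (not cross-reference streams); the sample table and the name `parse_xref` are hypothetical.

```python
def parse_xref(data):
    """Parse a classic xref table into {object number: byte offset}."""
    tokens = data.split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("data does not start with an xref table")
    table = {}
    i = 1
    # Each subsection starts with "<first object number> <entry count>".
    while i + 1 < len(tokens) and tokens[i].isdigit():
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for k in range(count):
            offset, _generation, flag = tokens[i:i + 3]
            i += 3
            if flag == b"n":  # "n" marks an in-use object, "f" a free entry
                table[first + k] = int(offset)
    return table


sample_xref = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
    b"trailer\n<< /Size 3 >>\n"
)
xref_table = parse_xref(sample_xref)
```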
It should be noted that, when the root object is searched for in the first data block according to the identification information of the root object, if the root object is not found in the first data block, the second data block of the document to be previewed needs to be obtained, and the root object is searched for in the second data block. That is, if the information of the first data block shown in fig. 5 does not include 10obj, the search continues in the second data block. Since the root object of a conventional PDF document is in the document header, the root object is typically found within the first two data blocks. In this way, the page tree of the document can be analyzed by acquiring only several data blocks at the head and tail of the document.
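The block-by-block search for the root object can be sketched as follows. It is a simplification: an object header that straddles a block boundary is not handled, and `find_object` and the sample blocks are hypothetical.

```python
import re


def find_object(obj_num, blocks):
    """Search blocks in order; return (block index, offset) or None."""
    pattern = re.compile(rb"(?<!\d)%d\s+\d+\s+obj\b" % obj_num)
    for index, block in enumerate(blocks):
        match = pattern.search(block)
        if match:
            return index, match.start()
    return None  # not found in the blocks pulled so far


# Hypothetical first and second data blocks; the root object is in the second.
sample_blocks = [
    b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n",
    b"10 0 obj\n<< /Type /Catalog /Pages 20 0 R >>\nendobj\n",
]
location = find_object(10, sample_blocks)
```

Passing a generator that pulls blocks lazily instead of a list would avoid fetching the second block when the first one already contains the object.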
How to search the first page object to be previewed according to the root object and the object address index table is further described below.
Searching a page object according to the root object and the object address index table; searching the identification information of the sub-page object from the page object; and searching the first page object to be previewed according to the identification information of the sub-page object and the object address index table.
Because a tree relationship exists between the root object and the pages, the identification information of the page object can be found in the root object, and the page object can then be found according to that identification information and the object address index table. As shown in fig. 5, the root object, i.e., 10obj, includes the indirect object number of the page (Pages) object, which is 20. The address of the Pages object can therefore be determined from the cross reference table according to its indirect object number, thereby obtaining the Pages object. It should be noted that fig. 5 is only an example in which the Pages object 20obj is in the first data block, so that the Pages object can be found directly in the first data block. In a specific application, if the address of the Pages object is not in the currently acquired data blocks, the corresponding data block is acquired according to the address of the Pages object, so that the Pages object can be found.
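The step from the root object to the address of the Pages object can be sketched like this. The root object bytes mirror the numbers given for fig. 5, while the offsets in `xref_addresses` and the name `pages_ref` are hypothetical.

```python
import re


def pages_ref(root_obj):
    """Return the indirect object number of the Pages object, or None."""
    match = re.search(rb"/Pages\s+(\d+)\s+\d+\s+R", root_obj)
    return int(match.group(1)) if match else None


root_obj = b"10 0 obj\n<< /Type /Catalog /Pages 20 0 R >>\nendobj"
xref_addresses = {10: 15, 20: 70, 30: 140}  # hypothetical cross reference table

pages_num = pages_ref(root_obj)
pages_addr = xref_addresses[pages_num]  # where to read the Pages object
```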
The page object, i.e. Pages object, is the total entry of the page, and the information in the Pages object indicates the specific page included in the document to be previewed, i.e. the number of sub-page objects and the indirect object number of each sub-page object. The sub page is a specific page in the document to be previewed. As shown in fig. 5, the Pages object 20obj includes therein a number of subpage objects (Count) of 1 and an indirect object number of subpage objects (Kids) of 30. In fig. 5, taking one page as an example, in practical application, if there are 10 sub-pages in the document to be previewed, the Count is 10, and the Kids includes 10 indirect object numbers. Therefore, the page tree of the document can be quickly constructed according to the root object and the object address index table, and then the data blocks corresponding to a single page can be acquired one by one, so that streaming preview is realized.
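Reading Count and Kids from a Pages object can be sketched as below. The sample bytes follow the values described for fig. 5; the sketch assumes a flat page tree, whereas in real documents an entry in Kids may itself be an intermediate Pages node. `parse_pages_node` is a hypothetical name.

```python
import re


def parse_pages_node(pages_obj):
    """Return (Count, [indirect object numbers listed in Kids])."""
    count = re.search(rb"/Count\s+(\d+)", pages_obj)
    kids = re.search(rb"/Kids\s*\[(.*?)\]", pages_obj, re.S)
    kid_nums = []
    if kids:
        kid_nums = [int(n) for n in re.findall(rb"(\d+)\s+\d+\s+R", kids.group(1))]
    return (int(count.group(1)) if count else 0), kid_nums


pages_obj = b"20 0 obj\n<< /Type /Pages /Count 1 /Kids [30 0 R] >>\nendobj"
page_count, kid_numbers = parse_pages_node(pages_obj)
```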
When the home page of the document to be previewed is obtained, namely the first page object to be previewed is the home page object, the address of the home page object can be determined from the cross reference table according to the indirect object number of the home page object, and therefore the home page object is searched; when the second page of the document to be previewed is obtained, namely the first page object to be previewed is the second page object, the address of the second page object can be determined from the cross reference table according to the indirect object number of the second page object, and therefore the second page object is searched.
It should be noted that, when searching for the first page object to be previewed, if the address of the first page object to be previewed is in the already acquired data block, the first page object to be previewed may be directly searched for from the address, and if the address of the first page object to be previewed is not in the already acquired data block, the corresponding data block is acquired according to the address of the first page object to be previewed to obtain the first page object to be previewed.
After the first page object to be previewed is found, the intra-page objects of the first page object to be previewed can be determined. As described above, the intra-page objects may include content streams such as numbers, characters and pictures, and may also include resource streams such as fonts, color spaces and embedded objects. The first page object to be previewed contains the indirect object numbers of its intra-page objects, so the addresses of the intra-page objects are determined according to these indirect object numbers and the cross reference table, and the data blocks corresponding to the intra-page objects can be pulled in sequence.
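Mapping the intra-page objects of a page to the data blocks that must be pulled can be sketched as follows. For simplicity the sketch assumes fixed-size blocks counted from the head of the file; the page object, the addresses and both function names are hypothetical.

```python
import re

BLOCK_SIZE = 128 * 1024


def intra_page_refs(page_obj):
    """Indirect object numbers referenced by a page, ignoring /Parent."""
    body = re.sub(rb"/Parent\s+\d+\s+\d+\s+R", b"", page_obj)
    return [int(n) for n in re.findall(rb"(\d+)\s+\d+\s+R\b", body)]


def blocks_needed(obj_nums, xref, block_size=BLOCK_SIZE):
    """Indices of the blocks in which the given objects start, in pull order."""
    return sorted({xref[n] // block_size for n in obj_nums})


page_obj = b"<< /Type /Page /Parent 20 0 R /Contents 40 0 R /Resources 50 0 R >>"
xref_addresses = {40: 200000, 50: 10}  # hypothetical object addresses

refs = intra_page_refs(page_obj)
to_pull = blocks_needed(refs, xref_addresses)
```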
It should also be noted that, if the address of the intra-page object is in the already acquired data block, the intra-page object may be directly acquired therefrom, and if the address of the intra-page object is not in the already acquired data block, the corresponding data block is acquired according to the address of the intra-page object to acquire the intra-page object. And after the data blocks corresponding to the objects in all the pages of the first page to be previewed are acquired, analyzing and rendering can be carried out, so that page previewing is realized.
In the embodiment of the application, the target data block is pulled first, the page tree from the root object to the page object is constructed, and then the data blocks corresponding to all the pages are pulled one by one, so that the pages can be quickly previewed through a small amount of data.
In addition, it should be further noted that the data blocks in the embodiment of the present application may have a fixed size. The terminal then does not need to analyze and calculate how much data to pull each time; it only needs to pull the data block corresponding to an address in the cross reference table. If, after a data block is pulled, the data of an object in that block is found to be incomplete, the terminal simply continues by pulling the next data block. This fixed-size stream pulling reduces the amount of computation on the terminal and increases the preview speed.
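The fixed-size pulling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `read_object` and the callback `fetch_block` are hypothetical, and the sketch treats an object as complete once the PDF keyword `endobj` has been seen, which ignores the corner case of that byte sequence appearing inside stream data.

```python
from typing import Callable, List


def read_object(address: int, block_size: int,
                fetch_block: Callable[[int], bytes],
                end_marker: bytes = b"endobj") -> bytes:
    """Pull fixed-size blocks starting at `address` until the object is complete."""
    index = address // block_size       # block containing the object's first byte
    offset = address % block_size       # position of the object inside that block
    data = fetch_block(index)[offset:]
    while end_marker not in data:       # object incomplete: pull the next block
        index += 1
        next_block = fetch_block(index)
        if not next_block:
            raise ValueError("object is not terminated before the end of the document")
        data += next_block
    end = data.index(end_marker) + len(end_marker)
    return data[:end]


# Simulated document; the block size of 16 bytes is arbitrary.
DOCUMENT = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
pulled: List[int] = []                  # records which blocks were pulled, in order


def fake_fetch(index: int) -> bytes:
    pulled.append(index)
    return DOCUMENT[index * 16:(index + 1) * 16]


start = DOCUMENT.index(b"1 0 obj")      # in practice this comes from the cross reference table
catalog = read_object(start, 16, fake_fetch)
```

The terminal never computes how many bytes the object occupies; it maps the address to a block index with one integer division and keeps pulling consecutive blocks until the object is complete.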
Fig. 6 is a schematic structural diagram of a document previewing apparatus according to an embodiment of the present application. As shown in fig. 6, the document preview apparatus 600 includes:
an obtaining module 601, configured to receive a document preview request, and obtain a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, where the target data block is a data block that includes a root object of the document to be previewed and indication information of an object address index table;
a first processing module 602, configured to determine, according to the target data block, a root object and an object address index table of a document to be previewed respectively;
the second processing module 603 is configured to search for the first page object to be previewed according to the root object and the object address index table;
the display module 604 is configured to obtain a data block corresponding to an object in a page in the first page object to be previewed, and analyze and render the data block.
In one embodiment, the second processing module 603 comprises:
the first searching unit is used for searching the page object according to the root object and the object address index table;
the second searching unit is used for searching the identification information of the sub-page object from the page object;
and the third searching unit is used for searching the first page object to be previewed according to the identification information of the sub-page object and the object address index table.
In one embodiment, the first searching unit includes:
the first searching subunit is used for searching the identification information of the page object from the root object;
and the second searching subunit is used for searching the page object according to the identification information of the page object and the object address index table.
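The lookup chain handled by these units (root object, then page object, then sub-page object, then the first page object to be previewed) can be sketched with plain dictionaries. Everything here is a hypothetical stand-in: the table `XREF`, the pre-parsed objects in `OBJECTS_AT`, and the function names are invented for illustration, and the sketch assumes a flat page tree whose `Kids` entries are pages rather than nested page-tree nodes.

```python
from typing import Any, Dict

# Hypothetical object address index table: indirect object number -> address.
XREF: Dict[int, int] = {1: 15, 2: 74, 3: 133, 4: 250}

# Hypothetical, already parsed objects keyed by their address in the document.
OBJECTS_AT: Dict[int, Dict[str, Any]] = {
    15: {"Type": "Catalog", "Pages": 2},                # root object
    74: {"Type": "Pages", "Kids": [3, 4], "Count": 2},  # page object (page tree node)
    133: {"Type": "Page", "Contents": 5},               # first page
    250: {"Type": "Page", "Contents": 6},               # second page
}


def load(obj_num: int, xref: Dict[int, int],
         objects_at: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve an indirect object number to an object via the address index table."""
    return objects_at[xref[obj_num]]


def find_page(root_num: int, page_index: int, xref: Dict[int, int],
              objects_at: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    root = load(root_num, xref, objects_at)        # root object
    pages = load(root["Pages"], xref, objects_at)  # page object named by the root
    kid_num = pages["Kids"][page_index]            # identification info of the sub-page object
    return load(kid_num, xref, objects_at)         # page object to be previewed


first_page = find_page(1, 0, XREF, OBJECTS_AT)
second_page = find_page(1, 1, XREF, OBJECTS_AT)
```

Each step needs only an object number and the address index table, so a real implementation would pull at most the data blocks that hold these few objects.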
In one embodiment, the obtaining module 601 includes:
the first determining unit is used for determining respective corresponding addresses of a plurality of data blocks of a document to be previewed according to the document previewing request;
and the first acquisition unit is used for acquiring the target data block according to the respective corresponding addresses of the plurality of data blocks.
In one embodiment, the target data block is the first data block of the plurality of data blocks of the document to be previewed;
the first processing module 602 includes:
the fourth searching unit is used for respectively searching the identification information of the root object and the address information of the object address index table according to the first data block;
a fifth searching unit, configured to search the root object according to the identification information of the root object;
and the second acquisition unit is used for acquiring the data block storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from that data block.
In one embodiment, the target data block is the first data block and the last data block in a plurality of data blocks of the document to be previewed;
the first processing module 602 includes:
the analysis unit is used for analyzing the first data block and determining whether the document to be previewed is a linearized document;
a sixth searching unit, configured to search, from the last data block, identification information of the root object and address information of the object address index table, respectively;
a seventh searching unit, configured to search the root object according to the identification information of the root object and the first data block;
and the third acquisition unit is used for acquiring the data block storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from that data block.
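For a PDF document, the analysis of the first and last data blocks could look roughly like the sketch below. It relies on general PDF conventions rather than on details stated in the application: a linearized file carries a `/Linearized` key within its first 1024 bytes, and a classic trailer at the end of the file contains a `/Root` reference followed by `startxref` and the cross reference table offset. The function names are hypothetical, and files whose trailer information is stored in a cross reference stream are not handled.

```python
import re
from typing import Optional, Tuple


def is_linearized(first_block: bytes) -> bool:
    """Check the first data block for a linearization dictionary."""
    return b"/Linearized" in first_block[:1024]


def parse_tail(last_block: bytes) -> Tuple[Optional[int], Optional[int]]:
    """Return (root object number, address of the object address index table).

    Either value is None if it cannot be found in the last data block.
    """
    roots = re.findall(rb"/Root\s+(\d+)\s+\d+\s+R", last_block)
    starts = re.findall(rb"startxref\s+(\d+)", last_block)
    root_num = int(roots[-1]) if roots else None      # last occurrence wins
    xref_addr = int(starts[-1]) if starts else None
    return root_num, xref_addr


# Hypothetical block contents for a non-linearized document.
FIRST = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
LAST = b"trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n416\n%%EOF\n"

linearized = is_linearized(FIRST)
root_num, xref_addr = parse_tail(LAST)
```

With `root_num` and `xref_addr` in hand, the data block that stores the index table can be selected by dividing `xref_addr` by the block size.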
In one embodiment, the seventh searching unit includes:
a third searching subunit, configured to search the root object from the first data block according to the identification information of the root object;
the first obtaining subunit is configured to obtain a second data block of the document to be previewed if the root object is not found in the first data block;
and the fourth searching subunit is used for searching the root object from the second data block.
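The fallback from the first data block to the second can be sketched as below. The names `find_object_in_block`, `find_root`, and `fetch_block` are hypothetical; the sketch scans for the PDF object header `N G obj` and does not handle a header that straddles the boundary between the two blocks.

```python
import re
from typing import Callable, Optional, Tuple


def find_object_in_block(block: bytes, obj_num: int) -> Optional[int]:
    """Return the offset of the header 'N G obj' inside `block`, or None."""
    # The lookbehind keeps object 5 from matching inside '15 0 obj'.
    pattern = re.compile(rb"(?<!\d)" + str(obj_num).encode() + rb"\s+\d+\s+obj")
    match = pattern.search(block)
    return match.start() if match else None


def find_root(root_num: int,
              fetch_block: Callable[[int], bytes]) -> Optional[Tuple[int, int]]:
    """Search the first data block, then the second; return (block index, offset)."""
    for index in (0, 1):
        offset = find_object_in_block(fetch_block(index), root_num)
        if offset is not None:
            return index, offset
    return None


# Hypothetical documents: the root object is in the second block of one
# and in the first block of the other.
blocks_a = [b"%PDF-1.4\n% padding only\n", b"5 0 obj\n<< /Type /Catalog >>\nendobj\n"]
blocks_b = [b"%PDF-1.4\n5 0 obj\n<<>>\nendobj\n", b""]

found_a = find_root(5, lambda i: blocks_a[i])
found_b = find_root(5, lambda i: blocks_b[i])
```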
In one embodiment, there is at most one data block of the plurality of data blocks having a size different from the size of the other data blocks.
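One way to obtain blocks that satisfy this condition is to cut the document into equal byte ranges and let only the final range be shorter. The sketch below assumes the document size is known and, as an additional assumption not stated in the application, that blocks are requested with HTTP `Range` headers; `block_ranges` and `range_header` are hypothetical helper names.

```python
from typing import List, Tuple


def block_ranges(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into inclusive (start, end) byte ranges of block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ranges = []
    for start in range(0, file_size, block_size):
        end = min(start + block_size, file_size) - 1   # the last block may be shorter
        ranges.append((start, end))
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


ranges = block_ranges(10, 4)
sizes = [end - start + 1 for start, end in ranges]
last_header = range_header(*ranges[-1])
```

Because every block except possibly the last has the same size, the block holding any address is simply the address divided by the block size.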
The embodiment of the present application can be used to execute the document preview method in the above method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
The present application also provides an electronic device and a non-transitory computer-readable storage medium storing computer instructions according to embodiments of the present application.
There is also provided, in accordance with an embodiment of the present application, a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 7 is a schematic block diagram of an electronic device for implementing a document preview method of an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the document preview method. For example, in some embodiments, the document preview method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the document preview method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the document preview method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (19)

1. A document preview method, comprising:
receiving a document preview request, and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, wherein the target data block is a data block comprising a root object of the document to be previewed and indication information of an object address index table;
respectively determining a root object and an object address index table of the document to be previewed according to the target data block;
searching a first page object to be previewed according to the root object and the object address index table;
and acquiring a data block corresponding to an object in a page in the first page object to be previewed, and analyzing and rendering the data block.
2. The method of claim 1, wherein the searching for the first page object to be previewed according to the root object and the object address index table comprises:
searching a page object according to the root object and the object address index table;
searching the identification information of the sub-page object from the page object;
and searching the first page object to be previewed according to the identification information of the sub-page object and the object address index table.
3. The method of claim 2, wherein the searching a page object according to the root object and the object address index table comprises:
searching the identification information of the page object from the root object;
and searching the page object according to the identification information of the page object and the object address index table.
4. The method according to any one of claims 1 to 3, wherein the acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request comprises:
determining respective corresponding addresses of a plurality of data blocks of the document to be previewed according to the document previewing request;
and acquiring the target data block according to the respective corresponding addresses of the plurality of data blocks.
5. The method of any of claims 1-3, wherein the target data block is a first data block of the plurality of data blocks of the document to be previewed;
determining a root object and an object address index table of the document to be previewed according to the target data block, wherein the determining comprises the following steps:
according to the first data block, respectively searching the identification information of the root object and the address information of the object address index table;
searching the root object according to the identification information of the root object;
and acquiring a data block for storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from the data block for storing the object address index table.
6. The method of any of claims 1-3, wherein the target data block is a first data block and a last data block of the plurality of data blocks of the document to be previewed;
determining a root object and an object address index table of the document to be previewed according to the target data block, wherein the determining comprises the following steps:
analyzing the first data block to determine whether the document to be previewed is a linearized document;
if not, respectively searching the identification information of the root object and the address information of the object address index table from the last data block;
searching the root object according to the identification information of the root object and the first data block;
and acquiring a data block for storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from the data block for storing the object address index table.
7. The method of claim 6, wherein said finding the root object according to the identification information of the root object and the first data block comprises:
searching the root object from the first data block according to the identification information of the root object;
if the root object is not found in the first data block, acquiring a second data block of the document to be previewed;
and searching the root object from the second data block.
8. The method of any of claims 1-3, wherein there is at most one of the plurality of data blocks having a size different from sizes of other data blocks.
9. A document preview device comprising:
the acquisition module is used for receiving a document preview request and acquiring a target data block in a plurality of data blocks of a document to be previewed according to the document preview request, wherein the target data block comprises a root object of the document to be previewed and indication information of an object address index table;
the first processing module is used for respectively determining a root object and an object address index table of the document to be previewed according to the target data block;
the second processing module is used for searching a first page object to be previewed according to the root object and the object address index table;
and the display module is used for acquiring a data block corresponding to an object in a page in the first page object to be previewed, and analyzing and rendering the data block.
10. The apparatus of claim 9, wherein the second processing module comprises:
the first searching unit is used for searching a page object according to the root object and the object address index table;
the second searching unit is used for searching the identification information of the sub-page object from the page object;
and the third searching unit is used for searching the first page object to be previewed according to the identification information of the sub-page object and the object address index table.
11. The apparatus of claim 10, wherein the first searching unit comprises:
the first searching subunit is used for searching the identification information of the page object from the root object;
and the second searching subunit is used for searching the page object according to the identification information of the page object and the object address index table.
12. The apparatus of any of claims 9-11, wherein the acquisition module comprises:
a first determining unit, configured to determine, according to the document preview request, addresses corresponding to multiple data blocks of the document to be previewed;
and the first acquisition unit is used for acquiring the target data block according to the respective corresponding addresses of the plurality of data blocks.
13. The apparatus of any of claims 9-11, wherein the target data block is a first data block of the plurality of data blocks of the document to be previewed;
the first processing module comprises:
a fourth searching unit, configured to search, according to the first data block, the identification information of the root object and the address information of the object address index table respectively;
a fifth searching unit, configured to search the root object according to the identification information of the root object;
and the second acquisition unit is used for acquiring the data block storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from that data block.
14. The apparatus of any of claims 9-11, wherein the target data block is a first data block and a last data block of the plurality of data blocks of the document to be previewed;
the first processing module comprises:
the analysis unit is used for analyzing the first data block and determining whether the document to be previewed is a linearized document;
a sixth searching unit, configured to search, from the last data block, identification information of the root object and address information of the object address index table, respectively;
a seventh searching unit, configured to search the root object according to the identification information of the root object and the first data block;
and the third acquisition unit is used for acquiring the data block storing the object address index table according to the address information of the object address index table, and acquiring the object address index table from that data block.
15. The apparatus of claim 14, wherein the seventh searching unit comprises:
a third searching subunit, configured to search the root object from the first data block according to the identification information of the root object;
a first obtaining subunit, configured to obtain a second data block of the document to be previewed if the root object is not found in the first data block;
a fourth searching subunit, configured to search the root object from the second data block.
16. The apparatus of any of claims 9-11, wherein there is at most one of the plurality of data blocks having a size different from sizes of other data blocks.
17. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110309374.1A CN113051504B (en) 2021-03-23 2021-03-23 Document preview method, device, apparatus, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110309374.1A CN113051504B (en) 2021-03-23 2021-03-23 Document preview method, device, apparatus, storage medium and program product

Publications (2)

Publication Number Publication Date
CN113051504A true CN113051504A (en) 2021-06-29
CN113051504B CN113051504B (en) 2023-08-01

Family

ID=76514889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110309374.1A Active CN113051504B (en) 2021-03-23 2021-03-23 Document preview method, device, apparatus, storage medium and program product

Country Status (1)

Country Link
CN (1) CN113051504B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357031A (en) * 2022-01-06 2022-04-15 南方电网数字电网研究院有限公司 Dynamic calling method for data viewing engine

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002132835A (en) * 2000-10-24 2002-05-10 Toppan Forms Co Ltd System and method for providing document
WO2013185254A1 (en) * 2012-06-11 2013-12-19 Google Inc. Contextual content for online previews
CN103678698A (en) * 2013-12-27 2014-03-26 福建福昕软件开发股份有限公司北京分公司 Method and device for improving on-line browsing loading speed of PDF document
KR101415179B1 (en) * 2013-01-29 2014-07-04 주식회사 이파피루스 Page loading system and method thereof
CN106156148A (en) * 2015-04-14 2016-11-23 腾讯科技(深圳)有限公司 The rendering intent of a kind of page, device and terminal device
CN106202337A (en) * 2016-07-04 2016-12-07 华中师范大学 A kind of PPT sharing method and realize teacher side and the student side of the method
CN111881650A (en) * 2020-07-20 2020-11-03 北京百度网讯科技有限公司 PDF document generation method and device and electronic equipment

Also Published As

Publication number Publication date
CN113051504B (en) 2023-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant