WO2024087566A1 - Document conversion method and apparatus, computer-readable storage medium, and computer device - Google Patents

Document conversion method and apparatus, computer-readable storage medium, and computer device

Info

Publication number
WO2024087566A1
Authority
WO
WIPO (PCT)
Prior art keywords
page
row
document
preset
columns
Prior art date
Application number
PCT/CN2023/091535
Other languages
English (en)
Chinese (zh)
Inventor
李乐乐
刘海林
Original Assignee
深圳市网旭科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市网旭科技有限公司
Publication of WO2024087566A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Definitions

  • the present application relates to the technical field of document conversion, for example, to a document conversion method and apparatus, a computer-readable storage medium, and a computer device.
  • PDF (Portable Document Format) documents and Word documents are, respectively, commonly used non-editable and editable documents. Because non-editable documents cannot be edited directly, they often need to be converted into editable documents. For example, most PDF documents are non-editable; although some software supports editing PDF documents, it is often less convenient than editing Word documents. Therefore, when users want to reuse content from a PDF document to produce new document content, they usually need to convert the PDF document into a Word document.
  • the present application provides a document conversion method and apparatus, a computer-readable storage medium, and a computer device, which can improve the restoration degree of document conversion.
  • An embodiment of the present application provides a document conversion method for converting a non-editable first document into an editable second document. The document conversion method includes: parsing the first document page by page to obtain all elements of each page of the first document, each element having a position and content; mapping all elements of each page content to each preset page, so that each preset page contains all elements of the corresponding page in the first document; constructing at least one of the following items according to the position and content of all elements in each preset page: at least one text block and at least one shape block; determining, according to preset layout rules, the sections and columns of at least one of each text block and each shape block in each preset page, to obtain the layout of all elements of each page content in the corresponding preset page; and generating the second document according to each preset page with all elements laid out, wherein the element layout of each page of the second document is the same as the element layout of the corresponding preset page.
  • an embodiment of the present application provides a computer device, the computer device comprising a memory and a processor.
  • the memory is configured to store program instructions.
  • the processor is configured to execute the program instructions to implement the above document conversion method.
  • an embodiment of the present application provides a document conversion device, wherein the document conversion device is configured to convert a non-editable first document into an editable second document, and the document conversion device includes a parsing module, a mapping module, a construction module, a layout module, and a generation module.
  • the parsing module is configured to parse the first document page by page to obtain all elements of each page of the first document, each element having a position and content.
  • The mapping module is configured to map all elements of each page content to each preset page, so that each preset page contains all elements of the corresponding page in the first document.
  • the construction module is configured to construct at least one of the following items according to the position and content of all elements in each preset page: at least one text block and at least one shape block.
  • The layout module is configured to determine the sections and columns of at least one of each text block and each shape block in each preset page according to a preset layout rule, and obtain the layout of all elements of each page content in the corresponding preset page.
  • the generation module is configured to generate the second document according to each preset page where all elements are laid out; the element layout of each page of the second document is the same as the element layout of the corresponding preset page.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium is used to store computer program instructions, and the computer program instructions are executed by a processor to implement the above-mentioned document conversion method.
  • FIG. 1 is a schematic diagram of a process flow of a document conversion method provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of elements in a first document provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of mapping a page in a first document to a preset page according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of creating a text block or a shape block in a preset page provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the sub-step flow chart of step S107 of the document conversion method provided in the first embodiment of the present application.
  • FIG. 6 is a schematic diagram of the sub-step flow chart of step S107 of the document conversion method provided in the second embodiment of the present application.
  • FIG. 7 is a schematic diagram of the sub-step flow chart of step S107 of the document conversion method provided in the third embodiment of the present application.
  • FIG. 8 is a schematic diagram of the sub-step flow chart of step S107 of the document conversion method provided in the fourth embodiment of the present application.
  • FIG. 9 is a schematic diagram of the sub-step flow chart of step S105 of the document conversion method provided in the first embodiment of the present application.
  • FIG. 10 is a schematic diagram of the sub-step flow chart of step S105 of the document conversion method provided in the second embodiment of the present application.
  • FIG. 11 is a schematic diagram of a document conversion device provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a document conversion process provided in an embodiment of the present application.
  • FIG. 13a is a schematic diagram of a double-column page of a first document provided in an embodiment of the present application.
  • FIG. 13b is a schematic diagram of a single-column page of a first document provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a computer device provided in an embodiment of the present application.
  • the present application provides a document conversion method, which can convert a non-editable document into an editable document, and can also layout the content of each page in the non-editable document according to the layout method of the editable document, so that the content layout of the document before and after the conversion is the same.
  • FIG. 1 shows a document conversion method provided by an embodiment of the present application, for converting a non-editable first document into an editable second document.
  • the document conversion method includes the following steps.
  • Step S101 parse the first document page by page to obtain all elements of each page of the first document, each element having a position and content, and the content of each element is text content or format content.
  • an element whose content is text content is a text element
  • an element whose content is format content is a format element.
  • Text content includes text, pictures, graphics, etc.
  • Format content includes formats used to represent text content, such as stroke, underline, table border, fill, text highlight, cell background color, etc.
  • For example, element C1, whose content is text, and element C2, whose content is a picture, are text elements.
  • Element S1, whose content is a background color, element S2, whose content is an underline, and element S3, whose content is a rectangular box, are format elements.
  • the text content and the format content representing the text content are represented by different elements respectively.
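The split between text elements and format elements can be illustrated with a small data model. This is a minimal sketch, not the applicant's implementation: the `Element` fields, the `Kind` enum, the `split_by_kind` helper, and all coordinates are assumptions chosen for illustration; only the element names C1, C2, S1, S2, and S3 and their contents come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TEXT = "text"      # text, pictures, graphics
    FORMAT = "format"  # underline, fill, table border, ...


@dataclass(frozen=True)
class Element:
    name: str
    kind: Kind
    content: str
    x: float       # left edge on the page (hypothetical units)
    y: float       # top edge on the page
    width: float
    height: float


def split_by_kind(elements):
    """Separate text elements from format elements, keeping page order."""
    text = [e for e in elements if e.kind is Kind.TEXT]
    fmt = [e for e in elements if e.kind is Kind.FORMAT]
    return text, fmt


# Illustrative page: coordinates are made up for the example.
page = [
    Element("C1", Kind.TEXT, "text", 10, 10, 80, 12),
    Element("C2", Kind.TEXT, "picture", 10, 30, 60, 40),
    Element("S1", Kind.FORMAT, "background color", 10, 10, 80, 12),
    Element("S2", Kind.FORMAT, "underline", 10, 22, 80, 1),
    Element("S3", Kind.FORMAT, "rectangular box", 8, 28, 64, 44),
]
text_elements, format_elements = split_by_kind(page)
```

Keeping the format content in its own elements means that, in this sketch, an underline or background can later be attached to whichever text block covers the same area.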
  • The content of each page of the first document can be obtained with a deep learning model or by a specific parsing method.
  • the first document can be a PDF document, and the content of each page of the first document is each PDF page in the PDF document.
  • PDF documents generally include PDF documents in text format and PDF documents in scanned format.
  • PDF documents in text format can be parsed by the PDFium tool (a PDF rendering engine).
  • the deep learning model can be implemented using an existing model.
  • the first document can also be a picture format document, such as a JPG format document, a PNG format document, and the like.
  • Step S103 mapping all elements of each page content to each preset page, so that each preset page includes all elements of the corresponding page in the first document.
  • each preset page is used to lay out all elements of each page content.
  • the preset page has the same layout method as the second document, and the elements in the preset page can be laid out using the layout method. For example, all elements of each page content of the first document can be laid out in the corresponding preset page to obtain the layout attributes of each element, and the layout attributes obtained after the layout can be supported by the second document. That is, the layout attributes obtained after each element is laid out in the preset page are still effective in the second document.
  • The document conversion method provided by the present application further creates a blank preset page for each page, and all elements of each page of the first document are laid out in the corresponding preset page so that each element has layout attributes.
  • the size of the preset page matches the size of each page of the PDF document.
  • the document conversion method provided by the present application first parses the size of each page in the first document, such as the height and width of the page; then the height and width of the preset page are set according to the height and width of the obtained page, so that the size of each page of the first document matches that of each preset page.
  • FIG. 3 shows a schematic diagram of all elements a1 to an in a page 11 of the first document F being mapped one by one to the preset page X according to their positions.
  • In the page of the first document, the positions of the elements a1 to an are the coordinates (X01, Y01)...(X0n, Y0n).
  • In the preset page X, the positions of the elements a1 to an are the coordinates (X11, Y11)...(X1n, Y1n).
  • Each of the elements a1 to an has a unique corresponding coordinate in the preset page.
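The one-to-one mapping of element positions can be sketched as follows. This is an illustration under stated assumptions: the `Placed` type, the `map_to_preset_page` function, and the page sizes are hypothetical, and the scale factors are an added generalization. The text only states that the preset page size matches the source page size, in which case the factors are 1.0 and coordinates carry over unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    name: str
    x: float
    y: float


def map_to_preset_page(elements, src_size, preset_size):
    """Map each element's (x, y) from the source page to the preset page.

    Sizes are (width, height). With equal sizes both scale factors are
    1.0, so every element keeps its original coordinate.
    """
    sx = preset_size[0] / src_size[0]
    sy = preset_size[1] / src_size[1]
    return [Placed(e.name, e.x * sx, e.y * sy) for e in elements]


source = [Placed("a1", 72.0, 90.0), Placed("a2", 72.0, 120.0)]
same = map_to_preset_page(source, (595.0, 842.0), (595.0, 842.0))
double = map_to_preset_page(source, (595.0, 842.0), (1190.0, 1684.0))
```

Because the mapping is a fixed scale per axis, distinct source coordinates always yield distinct preset-page coordinates.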
  • the second document is an editable document
  • the editable document usually has layout attributes, so that the user can present a better layout when editing the content, so as to achieve neat and standardized content typesetting.
  • the elements of the non-editable first document do not have layout attributes. That is to say, before all elements of each page of the first document are mapped to the preset page, each element does not contain layout attributes.
  • the preset page provides the same layout attributes as those of the second document. After all elements of each page of the first document are laid out in the corresponding preset page, each element has the layout attributes.
  • the second document may be a Word document
  • the layout attributes of the Word document include rows, sections, and columns.
  • the preset page also provides corresponding layout attributes such as rows, sections, and columns. In other words, elements without layout attributes are laid out in the preset page to obtain the layout attributes such as rows, sections, and columns of the elements.
  • Step S105 construct at least one text block and/or at least one shape block according to the positions and contents of all elements in each preset page.
  • the positional relationship between multiple elements is combined with the content to infer whether the multiple elements can be combined together, so that multiple text elements or multiple shape elements are integrated together to form an element block.
  • the element block is a text block or a shape block.
  • the element block in which multiple text elements are integrated together is the text block.
  • Multiple shape elements are combined together to form a shape block.
  • whether to group the elements together to form an element block is mainly determined based on the positional relationship between the elements.
  • For example, if element A and element B overlap in position, the content of element A is text, and the content of element B is a picture, then element A and element B can be combined to form text block C; that is, text block C is equivalent to a picture with text.
  • As another example, if element D1, element D2, and element D3 are all text elements located in multiple consecutive lines, and they have the same font and basically the same text length, then element D1, element D2, and element D3 can be combined to form text block D0; that is, text block D0 is equivalent to a text paragraph.
  • each text block includes position coordinates, area information, rows, and elements of each row.
  • the position coordinates of the text block are represented by the position coordinates of the upper left corner of the text block.
  • the area information represents the size information covered by the text block in the preset page, such as the height and width of the text block.
  • the shape block also includes position coordinates, area information, rows, and elements of each row.
  • the rows of the text block in the preset page are represented by the upper and lower side lines in the horizontal direction, and the left and right side lines in the vertical direction.
  • Before each element block is created, the row of each element is determined first; that is, the elements in each page are divided into rows.
  • Elements are assigned to rows according to their positions. In implementation, elements whose vertical positions differ only slightly are assigned to the same row.
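Row assignment by vertical position can be sketched as below. This is a minimal sketch: the `group_into_rows` function, the `tolerance` value of 3.0, and the coordinates are assumptions, since the text does not say how small a "little difference" in vertical position must be.

```python
def group_into_rows(elements, tolerance=3.0):
    """Group (name, top_y) pairs into rows.

    An element joins the current row when its top coordinate differs
    from that of the row's first element by at most `tolerance`;
    otherwise it starts a new row. Rows are returned top to bottom.
    """
    rows = []
    for name, y in sorted(elements, key=lambda e: e[1]):
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["names"].append(name)
        else:
            rows.append({"y": y, "names": [name]})
    return [row["names"] for row in rows]


# "X" sits 1.2 units below D1, so it lands in D1's row.
elements = [("D1", 100.0), ("D2", 114.5), ("X", 101.2), ("D3", 129.0)]
rows = group_into_rows(elements)
```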
  • the position coordinates of each text block and the covered area are determined according to the row of each row of elements.
  • the upper sideline of the starting row and the lower sideline of the ending row in the text block are used as the upper sideline and lower sideline of the text block, and the left sideline (i.e. the leftmost left sideline) with the smallest horizontal coordinate in the text block and the right sideline (i.e. the rightmost right sideline) with the largest horizontal coordinate in the text block are used as the left sideline and right sideline of the text block, thereby determining the regional information.
  • All elements are located in the area covered by a text block or a shape block. Constructing text blocks and shape blocks helps prevent elements from landing on the wrong line, overflowing, or having their font size changed.
  • the construction of the text block is described below by taking the element block D0 shown in FIG4 as an example.
  • the elements D1, D2, and D3 in the element block D0 are located in different rows, wherein the row where the element D1 is located is the starting row of the element block D0, and the upper edge line of the row where the element D1 is located is used as the upper edge line L1 of the element block D0.
  • the row where the element D3 is located is the ending row of the element block D0, and the lower edge line of the row where the element D3 is located is used as the lower edge line L2 of the element block D0.
  • the left edge line of the row where the element D2 is located is the leftmost line in the element block D0, therefore, the left edge line of the row where the element D2 is located is used as the left edge line of the element block D0.
  • the right edge line of the row where the element D1 is located is the rightmost line in the element block D0, therefore, the right edge line of the row where the element D1 is located is used as the right edge line of the element block D0.
  • the upper, lower, left, and right edges L1-L4 of the element block D0 determine the area covered by the element block D0 and obtain the area information of the element block D0.
  • the coordinates of the upper left corner of the element block D0 can be determined according to the area covered by the element block D0.
  • the row of each element block is first determined based on the positions of all elements; then the text block area is determined based on the row, so that the position coordinates of the text block, that is, the position coordinates of the upper left corner of the text block, and the area information of the text block, that is, the width and height, can be determined.
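The derivation of a block's area from its rows can be sketched as follows. The `Row` type, the `block_area` function, and the coordinates are illustrative assumptions; the sketch also assumes that y grows downward, so the upper-left corner has the smallest x and y.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    left: float
    top: float
    right: float
    bottom: float


def block_area(rows):
    """Return (x, y, width, height) of the block covering `rows`.

    `rows` must be ordered top to bottom.
    """
    top = rows[0].top                   # upper edge line of the starting row
    bottom = rows[-1].bottom            # lower edge line of the ending row
    left = min(r.left for r in rows)    # leftmost left edge line
    right = max(r.right for r in rows)  # rightmost right edge line
    return left, top, right - left, bottom - top


# As in the D0 example: D2 reaches furthest left, D1 furthest right.
d1 = Row(left=60, top=100, right=300, bottom=112)
d2 = Row(left=50, top=114, right=290, bottom=126)
d3 = Row(left=60, top=128, right=280, bottom=140)
area = block_area([d1, d2, d3])
```

The first two values of the result are the position coordinates of the upper left corner, and the last two are the width and height that make up the area information.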
  • For example, if elements E1 to E10 are multiple groups of intersecting border lines, elements E1 to E10 are combined to form a shape block E0, which is equivalent to a table.
  • A shape block also includes position coordinates, area information, rows, and elements of each row. In other words, the method for constructing element blocks can set corresponding rules for the different forms of content in the PDF and construct the element blocks according to those rules.
  • Step S107 determining the sections and columns of each text block and/or each shape block in each preset page according to the preset layout rule, and obtaining the layout of all elements of each page content in the corresponding preset page.
  • the sections and columns of each text block and/or each shape block in each preset page are determined according to the preset layout rules. For example, the sections of each text block and/or each shape block in each preset page are determined according to the preset layout rules, and then the columns of each text block and/or each shape block are determined.
  • the sections of each text block are determined according to a preset layout rule, and then the columns of each text block are determined. Then, the sections and columns of the corresponding shape blocks are determined according to the sections and columns of each text block.
  • the layout of the shape blocks is determined according to the layout of the text blocks, and the corresponding text blocks and shape blocks are placed in the same section and column. How to determine the sections and columns will be described below.
  • Step S109 Generate a second document according to each preset page with all elements laid out, and the second document is an editable document.
  • the element layout of each page of the second document is the same as the element layout of the corresponding preset page.
  • For example, the elements in page F of the first document are mapped to the preset page X and laid out in the preset page, and the second document W is then generated.
  • the preset page X is created using an Extensible Markup Language (XML) file.
  • the document conversion method of this embodiment can convert a non-editable first document into an editable second document, and during the conversion process, the above-mentioned document conversion method can also add section and column layouts to all elements of the content of each page of the first document, which greatly improves the situation where all elements in the first document have position deviations due to the lack of layout when they are converted to the second document, thereby improving the degree of restoration when converting non-editable documents into editable documents.
  • Step S107 may include steps S500-S508.
  • Steps S500-S508 implement the division into sections.
  • Step S500 calculating the gaps between all text blocks in each row, row by row.
  • Step S502 determining the number of columns in each row based on the gaps between all text blocks, wherein when the gap between two text blocks is greater than a first preset value, it is determined that the two text blocks are located in two different columns; when the gap between two text blocks is less than or equal to the first preset value, it is determined that the two text blocks are located in the same column.
  • Step S504 detecting the number of columns in each row, row by row.
  • Step S506 when the number of columns in a row is different from the number of columns in the row before the row, the row and the row before the row are divided into different sections.
  • Step S508 when the number of columns in a row is the same as the number of columns in the row before the row, the row and the row before the row are grouped into the same section.
  • the number of columns of each row of elements in a section should be the same, and the layout needs to be consistent. Therefore, in this embodiment, the number of columns of the upper and lower rows is used to determine whether the upper and lower rows are divided into the same section or different sections.
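Steps S500-S508 can be sketched as two small functions: one counts the columns of a row from the gaps between its text blocks, and one starts a new section whenever the column count changes. The function names, the block extents, and the value 20 used for the first preset value are assumptions for illustration.

```python
def count_columns(blocks, first_preset):
    """Count columns in one row from (left, right) block extents.

    A gap larger than `first_preset` between neighbouring blocks
    separates two columns (steps S500-S502).
    """
    blocks = sorted(blocks)
    columns = 1 if blocks else 0
    for (_, prev_right), (next_left, _) in zip(blocks, blocks[1:]):
        if next_left - prev_right > first_preset:
            columns += 1
    return columns


def split_into_sections(column_counts):
    """Group consecutive row indices with equal column counts (S504-S508)."""
    sections = []
    for index, count in enumerate(column_counts):
        if sections and column_counts[index - 1] == count:
            sections[-1].append(index)   # same count as previous row
        else:
            sections.append([index])     # count changed: new section
    return sections


rows = [
    [(50, 500)],              # one wide block: single column
    [(50, 260), (290, 500)],  # gap of 30 > 20: two columns
    [(50, 260), (290, 500)],
    [(50, 500)],
]
counts = [count_columns(row, first_preset=20) for row in rows]
sections = split_into_sections(counts)
```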
  • Step S107 includes steps S600-S604.
  • The columns of each row are either single-column or double-column, as shown in FIG. 13a and FIG. 13b.
  • When the rows are double-column, the layout of the first document is equivalent to the double-column page shown as page F1.
  • When the rows are single-column, the layout of the first document is equivalent to the single-column page shown as page F2.
  • Steps S600-S604 implement a method of dividing each row of elements into columns.
  • Step S600 calculating the gaps between all text blocks in each row, row by row.
  • Step S602 determining the number of columns in each row based on the gaps between all text blocks, wherein when the gap between two text blocks is greater than a first preset value, it is determined that the two text blocks are located in two different columns; when the gap between two text blocks is less than or equal to the first preset value, it is determined that the two text blocks are located in the same column.
  • Step S604 if the number of columns in a row is greater than two, the row is set to a single column.
  • the rows with more than two columns are set as single columns.
  • In PDF files, there are generally at most two columns. When more than two columns are detected, the text blocks in the row are therefore not actually divided into columns; the apparent columns result from the arrangement of the text blocks. Such rows are determined to be single-column, which quickly settles the columns of rows with more than two detected columns.
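Steps S600-S604 can be sketched as a gap-based column count followed by the "more than two means single column" rule. The function names, block extents, and the value 20 for the first preset value are hypothetical.

```python
def detect_columns(blocks, first_preset):
    """Count gap-separated columns from (left, right) block extents."""
    blocks = sorted(blocks)
    columns = 1
    for (_, prev_right), (next_left, _) in zip(blocks, blocks[1:]):
        if next_left - prev_right > first_preset:
            columns += 1
    return columns


def row_columns(blocks, first_preset):
    """Apply step S604: more than two detected columns means single column."""
    detected = detect_columns(blocks, first_preset)
    return 1 if detected > 2 else detected


two_column_row = [(50, 260), (290, 500)]
# Four short blocks with wide gaps, e.g. values spread across a line.
scattered_row = [(50, 120), (160, 230), (270, 340), (380, 450)]
```

In this sketch, `row_columns(scattered_row, 20)` detects four gap-separated groups and therefore treats the row as a single column, while `two_column_row` stays double-column.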
  • Step S107 may also include steps S700-S704.
  • Step S700 If the number of columns in a row is equal to two, and the width of a column in the row is less than a second preset value, the row is set to a single column.
  • the columns in the row may also be determined through the following steps.
  • Step S702 if the number of columns in a row is equal to two, detect the number of columns in the previous section of the row, the column dividing line of the previous section, and the column dividing line of the section where the row is located.
  • Step S704 if the number of columns in the previous section of the row is also equal to two, and the column dividing line of the previous section does not overlap with the column dividing line of the section where the row is located, set the row to a single column.
  • In steps S702-S704, the reasoning is as follows. Normally, if consecutive rows are double-column, their column dividing lines should overlap, and the rows in the same section are either all single-column or all double-column. Therefore, if the number of columns in a row is equal to two but the column dividing line of the previous section does not overlap with the column dividing line of the section where the row is located, the row is not actually divided into columns and is regarded as a single column. This quickly settles rows whose column count equals two but whose column dividing line does not overlap with that of the previous section.
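The two checks for a row detected as double-column can be sketched together. This is an illustration under assumptions: the text presents step S700 and steps S702-S704 as alternative ways to decide, whereas this sketch simply applies them one after the other; the function name, parameters, and the `tolerance` used to decide whether two dividing lines "overlap" are all hypothetical.

```python
def resolve_two_column_row(column_widths, divider_x, prev_section_columns,
                           prev_divider_x, second_preset, tolerance=1.0):
    """Return the final column count (1 or 2) for a row detected as two-column."""
    # Step S700: a very narrow column suggests the row is not really split.
    if min(column_widths) < second_preset:
        return 1
    # Steps S702-S704: below a two-column section, the row's column
    # dividing line is expected to coincide with that section's line.
    if prev_section_columns == 2 and abs(divider_x - prev_divider_x) > tolerance:
        return 1
    return 2


aligned = resolve_two_column_row((210, 210), 275, 2, 275, second_preset=60)
narrow = resolve_two_column_row((30, 400), 100, 2, 275, second_preset=60)
shifted = resolve_two_column_row((210, 210), 300, 2, 275, second_preset=60)
after_single = resolve_two_column_row((210, 210), 300, 1, None, second_preset=60)
```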
  • the present application can divide a row into columns when the number of columns in a row is equal to two.
  • Step S107 may also include steps S800-S808.
  • Step S800 If the number of columns in a row is equal to one and the number of columns in the previous section of the row is two, determine whether the text block in the row is completely located in the left column of the previous section of the row. In a section, from left to right, the column on the left is the left column, and the column on the right is the right column.
  • Step S802 When the text block of the row is completely located in the left column of the previous section of the row, the row is set to double columns.
  • The columns of the row can also be determined by the following steps.
  • Step S804 if the number of columns in a row is equal to one and the number of columns in the previous section of the row is two, detect the height of the previous section of the row.
  • Step S806 determining whether the height of the previous section of the row is less than a third preset value.
  • Step S808 When the height of the previous section of the row is less than a third preset value, the row is set to a single column, and the columns of the previous section of the row are adjusted to a single column.
  • In this way, the present application can determine the columns of a row when the number of columns in the row is equal to one.
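The handling of a single-column row that follows a double-column section can be sketched as below. As with the previous rules, the text offers steps S800-S802 and steps S804-S808 as alternatives; chaining them in one function, as well as the function name, parameters, and sample values, is an assumption made for illustration.

```python
def resolve_single_column_row(block, prev_left_column, prev_section_height,
                              third_preset):
    """Decide columns for a one-column row that follows a two-column section.

    `block` and `prev_left_column` are (left, right) extents.
    Returns (row_columns, previous_section_columns).
    """
    block_left, block_right = block
    col_left, col_right = prev_left_column
    # Steps S800-S802: the block lies entirely inside the previous
    # section's left column, so the row is treated as double-column.
    if col_left <= block_left and block_right <= col_right:
        return 2, 2
    # Steps S804-S808: a very short two-column section is folded into
    # a single column together with the row.
    if prev_section_height < third_preset:
        return 1, 1
    return 1, 2


in_left_column = resolve_single_column_row((50, 250), (50, 260), 200, 40)
short_section = resolve_single_column_row((50, 500), (50, 260), 20, 40)
unchanged = resolve_single_column_row((50, 500), (50, 260), 200, 40)
```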
  • Step S105 includes steps S900-S908.
  • Steps S900-S908 implement the construction of a shape block.
  • What is constructed here is an explicit table shape block, that is, the area of a table whose border lines are displayed in the PDF document.
  • Step S900 Detect whether there is one or more groups of intersecting border lines in each preset page, each border line corresponding to an element.
  • a set of intersecting border lines includes at least two intersecting border lines.
  • Step S902 If there are one or more groups of intersecting border lines, the areas corresponding to the one or more groups of intersecting border lines are determined as potential explicit table areas, and area information of a shape block is obtained.
  • Step S904 determining the table structure of the potential explicit table area according to the one or more groups of intersecting border lines to obtain one or more cells.
  • imaginary horizontal and vertical lines can be used to calculate whether the horizontal and vertical lines intersect with the vertical border lines or the horizontal border lines. If there are no intersection points, it means that there are merged cells in the horizontal or vertical direction, thereby obtaining the table structure of the explicit table area.
  • Step S906 confirming the area corresponding to each of the cells as the area information of each of the cells.
  • Step S908 obtaining a corresponding explicit table shape block according to the area information of the shape block, the area information of all cells, and the elements corresponding to all border lines.
  • the text block in the corresponding area is set to a table format, thereby improving the convenience of editing in the editable second document and meeting the layout requirements.
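Steps S900-S906 can be sketched for the simplest case, a regular grid. This is a deliberately simplified illustration: the line representation, the `intersects` and `table_cells` functions, and the coordinates are assumptions, and the sketch does not handle merged cells, which the text detects by checking where imaginary lines fail to intersect real border lines.

```python
def intersects(h, v):
    """h = (y, x_start, x_end) horizontal line; v = (x, y_start, y_end)."""
    y, x0, x1 = h
    x, y0, y1 = v
    return x0 <= x <= x1 and y0 <= y <= y1


def table_cells(horizontals, verticals):
    """Return the table area and its cells as (left, top, right, bottom).

    Assumes a regular grid without merged cells. Returns (None, [])
    when no horizontal line intersects a vertical line.
    """
    if not any(intersects(h, v) for h in horizontals for v in verticals):
        return None, []
    xs = sorted({v[0] for v in verticals})
    ys = sorted({h[0] for h in horizontals})
    area = (xs[0], ys[0], xs[-1], ys[-1])
    cells = [
        (left, top, right, bottom)
        for top, bottom in zip(ys, ys[1:])
        for left, right in zip(xs, xs[1:])
    ]
    return area, cells


# A 2 x 2 table drawn with three horizontal and three vertical lines.
horizontals = [(100, 50, 250), (120, 50, 250), (140, 50, 250)]
verticals = [(50, 100, 140), (150, 100, 140), (250, 100, 140)]
area, cells = table_cells(horizontals, verticals)
```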
  • Step S105 includes steps S1001-S1009.
  • What is constructed here is an implicit table shape block, that is, an area in which no border lines are displayed in the PDF document but a table is needed to lay out the corresponding text blocks.
  • Although the content of some areas is not a table, it needs to move as a whole during editing, such as text blocks that have no borders but are arranged in a table layout.
  • Editing these text blocks requires a table to be set up to meet the layout needs.
  • Step S1001 determining a potential implicit table area according to the positional relationship between all text blocks.
  • Step S1003 determining the imaginary border lines of the potential implicit table area, each imaginary border line being represented by an imaginary element that includes position and format content.
  • Step S1005 determining the table structure of the potential implicit table area according to the imaginary border line to obtain one or more cells.
  • the implementation method is the same as step S904.
  • Step S1007 confirming the area corresponding to each of the cells as the area information of each of the cells.
  • Step S1009 obtaining a corresponding implicit table shape block according to the area information of the shape block, the area information of all cells, and the imaginary elements corresponding to all imaginary border lines.
  • The implicit table area is identified and imaginary border lines are added so that the text block content in the area is set to a table format, thereby improving the convenience of editing in the editable second document and meeting layout requirements.
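One possible way to place imaginary border lines around grid-aligned text blocks is sketched below. The text does not specify how the imaginary lines are positioned, so everything here is an assumption: the `imaginary_borders` function, the choice to put inner borders halfway across each gap, and the sample coordinates.

```python
def imaginary_borders(blocks):
    """Derive imaginary border coordinates around grid-aligned text blocks.

    `blocks` are (left, top, right, bottom) boxes. Outer borders hug the
    blocks; inner borders sit halfway across each gap between bands.
    Returns (vertical line x positions, horizontal line y positions).
    """
    def borders(spans):
        spans = sorted(set(spans))
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:   # overlapping spans share a band
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        inner = [(a[1] + b[0]) / 2 for a, b in zip(merged, merged[1:])]
        return [merged[0][0], *inner, merged[-1][1]]

    xs = borders([(b[0], b[2]) for b in blocks])
    ys = borders([(b[1], b[3]) for b in blocks])
    return xs, ys


# Four borderless text blocks arranged as two rows by two columns.
blocks = [
    (50, 100, 120, 112), (160, 100, 240, 112),
    (50, 120, 110, 132), (160, 120, 230, 132),
]
xs, ys = imaginary_borders(blocks)
```

The resulting coordinates could then be turned into imaginary elements and passed through the same cell-building step as an explicit table.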
  • FIG. 11 is a functional module diagram of a document conversion device 100.
  • the document conversion device 100 is configured to convert a non-editable first document into an editable second document.
  • the document conversion device 100 includes a parsing module 101, a mapping module 103, a construction module 105, a layout module 107, and a generation module 109.
  • the parsing module 101 is configured to parse the first document page by page to obtain all elements of each page of the first document, each element having a position and content.
  • For the implementation of the parsing module 101, refer to the description of step S101 above.
  • the mapping module 103 is configured to map all elements of each page content to each preset page, so that each preset page contains all elements of the corresponding page in the first document.
  • For the implementation of the mapping module 103, refer to the description of step S103 above.
  • the construction module 105 is configured to construct at least one of the following items according to the positions and contents of all elements in each preset page: at least one text block and at least one shape block.
  • the construction module 105 can refer to the description of the above step S105 and its sub-steps.
  • the layout module 107 is configured to determine the sections and columns of at least one of each text block and each shape block in each preset page according to a preset layout rule, and obtain the layout of all elements of each page content in the corresponding preset page.
  • the layout module 107 can refer to the description of the above step S107 and its sub-steps.
  • the generating module 109 is configured to generate a second document according to each preset page with all elements laid out, and the second document is an editable document; the element layout of each page of the second document is the same as the element layout of the corresponding preset page.
  • the generating module 109 can refer to the description of the above step S109.
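The division of work among the five modules can be sketched as a small pipeline. This is a toy model under stated assumptions, not the implementation described in the application: the first document is represented as a list of pages of `(x, y, content)` tuples instead of a real PDF, the block-building rule (same baseline, same half of the page) and the two-column layout rule are invented for illustration, and all function and class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Element:
    x: float
    y: float
    content: str


@dataclass
class TextBlock:
    x: float
    y: float
    text: str
    section: int = 0
    column: int = 0


@dataclass
class PresetPage:
    width: float
    elements: List[Element] = field(default_factory=list)
    blocks: List[TextBlock] = field(default_factory=list)


RawPage = List[Tuple[float, float, str]]


def parse(first_document: List[RawPage]) -> List[List[Element]]:
    """Parsing module 101: obtain every element (position + content) per page."""
    return [[Element(x, y, c) for x, y, c in page] for page in first_document]


def map_to_preset_pages(pages: List[List[Element]], width: float = 600.0) -> List[PresetPage]:
    """Mapping module 103: one preset page per source page."""
    return [PresetPage(width=width, elements=list(p)) for p in pages]


def build_blocks(page: PresetPage) -> None:
    """Construction module 105 (toy rule): elements on the same baseline and
    in the same half of the page are joined into one text block."""
    groups: Dict[Tuple[float, bool], List[Element]] = {}
    for e in sorted(page.elements, key=lambda e: (e.y, e.x)):
        groups.setdefault((e.y, e.x >= page.width / 2), []).append(e)
    page.blocks = [
        TextBlock(es[0].x, y, " ".join(e.content for e in es))
        for (y, _), es in groups.items()
    ]


def lay_out(page: PresetPage) -> None:
    """Layout module 107 (toy rule): a single section with two columns."""
    for b in page.blocks:
        b.section = 0
        b.column = 0 if b.x < page.width / 2 else 1


def generate(pages: List[PresetPage]) -> List[List[str]]:
    """Generation module 109: emit blocks in section, column, then top-down order."""
    return [
        [b.text for b in sorted(p.blocks, key=lambda b: (b.section, b.column, b.y))]
        for p in pages
    ]


def convert(first_document: List[RawPage]) -> List[List[str]]:
    pages = map_to_preset_pages(parse(first_document))
    for p in pages:
        build_blocks(p)
        lay_out(p)
    return generate(pages)


second_document = convert(
    [[(10, 10, "Hello"), (60, 10, "world"), (320, 10, "Right"), (10, 30, "Second")]]
)
```

Keeping each stage as a separate function mirrors the module boundaries in FIG. 11, so a stage such as `lay_out` can be replaced by a richer layout rule without touching the others.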
  • FIG. 14 is a schematic diagram of the internal structure of a computer device provided in an embodiment of the present application.
  • the computer device 10 includes a memory 11 and a processor 12.
  • the memory 11 is configured to store program instructions, and the processor 12 is configured to execute the program instructions to implement the above document conversion method.
  • the processor 12 can be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or other data processing chip, which is configured to run program instructions stored in the memory 11.
  • the memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, an SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc.
  • the memory 11 may be an internal storage unit of a computer device, such as a hard disk of a computer device.
  • the memory 11 may also be an external storage device of a computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc. equipped on the computer device.
  • the memory 11 may also include both an internal storage unit of a computer device and an external storage device.
  • the memory 11 may be configured not only to store application software installed in the computer device and various types of data, such as code for implementing the document conversion method, but also to temporarily store data that has been output or is to be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present application relates to a document conversion method. The method comprises: parsing a first document page by page to obtain all elements of the content of each page of the first document, each element having a position and content; correspondingly mapping all elements of the content of each page to each preset page; constructing at least one text block and/or at least one shape block according to the positions and contents of all elements in each preset page; determining, according to a preset layout rule, a section and a column of each text block and/or each shape block in each preset page; and generating a second document according to each preset page in which all elements are laid out. The present application further relates to an apparatus using the document conversion method, as well as a computer-readable storage medium and a computer device.
PCT/CN2023/091535 2022-10-28 2023-04-28 Document conversion method and apparatus, computer-readable storage medium, and computer device WO2024087566A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211332538.3 2022-10-28
CN202211332538.3A CN115510821A (zh) 2022-10-28 2022-10-28 文档转换方法及装置、计算机可读存储介质、计算机设备

Publications (1)

Publication Number Publication Date
WO2024087566A1 (fr) 2024-05-02

Family

ID=84511518

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091535 WO2024087566A1 (fr) 2022-10-28 2023-04-28 Document conversion method and apparatus, computer-readable storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN115510821A (fr)
WO (1) WO2024087566A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115510821A (zh) * 2022-10-28 2022-12-23 深圳市网旭科技有限公司 文档转换方法及装置、计算机可读存储介质、计算机设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582934A (zh) * 2018-12-04 2019-04-05 万兴科技股份有限公司 版式文档的转换方法及装置
CN113361257A (zh) * 2021-06-29 2021-09-07 深圳壹账通智能科技有限公司 Pdf文档解析方法、系统、电子装置及存储介质
CN115114481A (zh) * 2022-06-09 2022-09-27 抖音视界有限公司 文档格式转换方法、装置、存储介质及设备
CN115510821A (zh) * 2022-10-28 2022-12-23 深圳市网旭科技有限公司 文档转换方法及装置、计算机可读存储介质、计算机设备

Also Published As

Publication number Publication date
CN115510821A (zh) 2022-12-23

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23881161

Country of ref document: EP

Kind code of ref document: A1