CN118095204A - Document rearrangement method and device - Google Patents

Document rearrangement method and device

Info

Publication number
CN118095204A
Authority: CN (China)
Prior art keywords: layout, frame, image, frames, document
Legal status: Pending
Application number: CN202311760203.6A
Other languages: Chinese (zh)
Inventor: 黄高准
Current Assignee: Beijing Ape Power Future Technology Co Ltd
Original Assignee: Beijing Ape Power Future Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Ape Power Future Technology Co Ltd
Priority to CN202311760203.6A
Publication of CN118095204A

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The application provides a document rearrangement method and apparatus. The method comprises: detecting a document image with a layout detection model, an image-text detection model, and a character detection model, and determining a layout image, image-text frames, and character frames from the detection results; sorting the layout frames in the layout image into reading order to obtain a layout-frame reading list, and fusing the image-text frames with the character frames to obtain a fusion result; merging layout frames in the layout image by traversing the layout-frame reading list to obtain a column-divided image, and mapping the fusion frames in the fusion result onto the column-divided image to obtain an image to be updated; and updating regions of the image to be updated according to its rectangular-frame distribution information, performing text rearrangement on the updated image, and generating a target file corresponding to the document image from the processing result.

Description

Document rearrangement method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a document rearrangement method and apparatus.
Background
With the development of computer technology, document conversion is applied in more and more scenarios. Most existing document format conversion is based on reflow technology, and layout reflow mainly supports layout documents that carry structural information, such as PDF, CEBX, and EPUB. Layout documents with structured information contain content information such as character codes, positions, font sizes, figure positions, and graphical expressions; based on this information, documents can be conveniently rearranged for different display sizes. For scanned documents without structured information, however, the prior art offers two methods. One converts the image-format document into a format with structured information using a conversion tool, but this is time-consuming and prone to format errors after conversion, which degrade the rearrangement result. The other processes the image document directly, typically by cropping white margins or by switching the display focus in reading order. These approaches use the display area efficiently, but large documents (for example, exercise books and test papers) still read poorly on small-screen devices, and switching the display focus in reading order does not match people's reading habits, so the user experience is poor. An effective solution to these problems is therefore needed.
Disclosure of Invention
In view of the above, embodiments of the present application provide a document rearrangement method to overcome the technical defects in the prior art. Embodiments of the present application also provide a document rearrangement apparatus, a computing device, and a computer-readable storage medium.
According to a first aspect of an embodiment of the present application, there is provided a document rearrangement method including:
detecting a document image with a layout detection model, an image-text detection model, and a character detection model, respectively, and determining a layout image, image-text frames, and character frames according to the detection results;
sorting the layout frames in the layout image into reading order to obtain a layout-frame reading list, and fusing the image-text frames with the character frames to obtain a fusion result;
merging layout frames in the layout image by traversing the layout-frame reading list to obtain a column-divided image, and mapping the fusion frames in the fusion result onto the column-divided image to obtain an image to be updated;
and updating regions of the image to be updated according to its rectangular-frame distribution information, performing text rearrangement on the updated image, and generating a target file corresponding to the document image according to the processing result.
According to a second aspect of the embodiment of the present application, there is provided a document rearranging apparatus including:
a detection module configured to detect a document image with a layout detection model, an image-text detection model, and a character detection model, respectively, and determine a layout image, image-text frames, and character frames according to the detection results;
a sorting module configured to sort the layout frames in the layout image into reading order to obtain a layout-frame reading list, and fuse the image-text frames with the character frames to obtain a fusion result;
a traversing module configured to merge layout frames in the layout image by traversing the layout-frame reading list to obtain a column-divided image, and map the fusion frames in the fusion result onto the column-divided image to obtain an image to be updated;
a generating module configured to update regions of the image to be updated according to its rectangular-frame distribution information, perform text rearrangement on the updated image, and generate a target file corresponding to the document image according to the processing result.
According to a third aspect of embodiments of the present application, there is provided a computing device comprising:
A memory and a processor;
the memory is configured to store computer-executable instructions, and the processor, when executing the computer-executable instructions, implements the steps of the document rearrangement method.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the document rearrangement method.
To generate the target file quickly and accurately from the image, with a rearrangement result highly consistent with the image content, the document image can be detected by a layout detection model, an image-text detection model, and a character detection model, so that the layout image, the image-text frames, and the character frames are determined by three types of neural networks and views of different granularity are obtained. To keep the typesetting of the generated document identical to that of the image, the layout frames in the layout image are sorted into reading order to obtain a layout-frame reading list, and the image-text frames are fused with the character frames to obtain a fusion result. On this basis, the layout frames are merged by traversing the reading list to obtain a column-divided image, and the fusion frames in the fusion result are mapped onto it to obtain an image to be updated, completing a preliminary typesetting of characters, layouts, and columns in the image dimension. Regions of the image to be updated are then updated according to its rectangular-frame distribution information, and text rearrangement is performed on the updated image. Because rearrangement starts from a correct preliminary layout, the result better matches the user's reading habits and the text can be displayed clearly on any device; the target file generated from the processing result keeps its text consistent with the text in the image, which is convenient for downstream use.
Drawings
FIG. 1 is a schematic diagram of a document rearrangement method according to an embodiment of the present application;
FIG. 2 is a flow chart of a document rearrangement method according to one embodiment of the present application;
FIG. 3 is a schematic view of an image in a first document rearrangement method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image in a second document rearrangement method according to an embodiment of the present application;
FIG. 5 is a schematic view of an image in a third document rearrangement method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a document rearrangement apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a computing device according to one embodiment of the application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present application may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present application is not limited to the specific embodiments disclosed below.
The terminology used in the one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the application. As used in one or more embodiments of the application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the application.
In the present application, a document rearrangement method is provided. The present application also relates to a document rearranging apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments.
Referring to the schematic diagram shown in FIG. 1: to generate the target file quickly and accurately from the image, with a rearrangement result highly consistent with the image content, the document image may be detected by a layout detection model, an image-text detection model, and a character detection model, so that the layout image, the image-text frames, and the character frames are determined by three types of neural networks and views of different granularity are obtained. To ensure that the typesetting of the finally generated document matches that of the image, the layout frames in the layout image are sorted into reading order to obtain a layout-frame reading list, and the image-text frames are fused with the character frames to obtain a fusion result. On this basis, the layout frames are merged by traversing the reading list to obtain a column-divided image, and the fusion frames in the fusion result are mapped onto it to obtain an image to be updated, completing a preliminary typesetting of characters, layouts, and columns in the image dimension. Regions of the image to be updated are then updated according to its rectangular-frame distribution information, and text rearrangement is performed on the updated image. Because rearrangement starts from a correct preliminary layout, the result better matches the user's reading habits and the text can be displayed clearly on any device; the target file generated from the processing result keeps its text consistent with the text in the image, which is convenient for downstream use.
Referring to fig. 2, fig. 2 is a flowchart of a document rearrangement method according to an embodiment of the present application, which specifically includes the following steps:
Step S202: detecting the document image with a layout detection model, an image-text detection model, and a character detection model, respectively, and determining a layout image, image-text frames, and character frames according to the detection results.
The document rearrangement method of this embodiment can be applied in any document rearrangement scenario. For example, after rearranging the text of a document image that displays 20 characters per line, each line may instead display 15 or 25 characters, so the rearranged text adapts to the current display device: the content remains convenient to read, and browsing is not hindered by characters that are too large or too small. The document image to be rearranged may be an image uploaded after a user photographs a document, or an image from a PDF document uploaded by a user; this embodiment places no limitation here. This embodiment takes a document image of a test paper photographed and uploaded by a user as an example, so that rearranging its content lets the user answer the test paper more conveniently on a terminal.
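As a concrete illustration of the line-width adaptation described above, the following is a minimal, hypothetical sketch (the `reflow` helper and the character counts simply mirror the 20/15/25-characters-per-line example in the preceding paragraph; it is not an implementation from the application):

```python
# Hypothetical sketch: reflowing extracted text to a device-dependent line
# width, following the 20 -> 15 or 25 characters-per-line example above.

def reflow(text: str, chars_per_line: int) -> list[str]:
    """Break a run of characters into lines of at most chars_per_line."""
    return [text[i:i + chars_per_line]
            for i in range(0, len(text), chars_per_line)]

source_line = "x" * 40  # two 20-character lines in the original image
assert reflow(source_line, 20) == ["x" * 20, "x" * 20]
assert len(reflow(source_line, 15)) == 3   # narrow screen: 15 chars/line
assert len(reflow(source_line, 25)) == 2   # wide screen: 25 chars/line
```

The same text thus fills a different number of lines depending on the target device, without truncating or scaling the characters themselves.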
Specifically, the document image refers to an image uploaded by a user from which text content is to be extracted to generate a document in a set format. Correspondingly, the layout detection model refers to a model that detects and locates layout content in the document image; it can detect the attribute of each content region, with main categories including text, picture, table, header, footer, sidebar, and so on, and each block region may be further divided according to the number of empty rows. That is, the attribute and position of each region in the image can be predicted by the layout detection model.
Correspondingly, the image-text detection model refers to a model that detects and locates picture regions and text regions in the document image that occupy larger areas than individual characters, such as composite figures (e.g., matching questions and complex flowcharts), complex writing (e.g., vertical arithmetic, pinyin-annotated Chinese characters, chemical expressions, field-character grids, and four-line three-space grids), formulas (e.g., numerical formulas and matrices), underlines, and bracket frames. That is, the image-text block regions in the document image can be detected and located by the image-text detection model. Correspondingly, the character detection model refers to a model that detects and locates all characters in the document image, where a character may be a letter, a Chinese character, an initial, a final, a digit, and so on.
Correspondingly, the layout image refers to an image obtained by detecting the document image with the layout detection model and then drawing frames around the different layout contents; it contains the layout frames that select the different layout regions. Correspondingly, an image-text frame is a rectangular frame that selects an image-text block region, and a character frame is a rectangular frame that selects a single character region.
That is, after the document image is obtained and input to the layout detection model, the image-text detection model, and the character detection model, the layout, the image-text blocks, and the text can be analyzed and extracted. The three models are respectively responsible for detecting the layout structure, the image-text structure, and the textual information of the image. The layout detection model detects the position and attribute of every effective content region of the document; the image-text detection model detects complex elements that text detection cannot cover, such as flowcharts, complex Chinese writing, and formulas; and the character detection model detects all the characters in the image. Detection and localization of all textual content are thus completed at different granularities, which is convenient for downstream services.
Further, when the layout image, the image-text frames, and the character frames are generated after the document image is detected by the three models, the result output by each model is in fact used to draw and mark rectangular frames in the document image. In this embodiment, a specific implementation is as follows:
Acquiring a document image, inputting the document image into the layout detection model, the image-text detection model, and the character detection model for processing, and obtaining layout-frame position information, image-text-frame position information, and character-frame position information corresponding to the document image; generating layout frames in the document image according to the layout-frame position information, and obtaining the layout image according to the generation result; and generating image-text frames and character frames in the document image according to the image-text-frame position information and the character-frame position information.
Specifically, the layout-frame position information refers to the coordinates obtained after the layout detection model frames the document content in the document image; the image-text-frame position information refers to the coordinates obtained after the image-text detection model frames the image-text blocks; and the character-frame position information refers to the coordinates obtained after the character detection model frames the characters. Note that the coordinate system can be constructed from the document image size, for example, with a corner of the image as the origin and the long and short sides as the horizontal and vertical axes. The coordinates of each rectangular frame may be recorded by its center point or its corner points; this embodiment places no limitation here.
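The two frame representations mentioned above (center point vs. corner points) are interchangeable; a minimal sketch of the conversion, assuming the image-corner-origin convention just described (the function names are illustrative, not from the application):

```python
# Hypothetical sketch of the coordinate convention above: the image corner
# is the origin, and a rectangle can be stored either by its two corner
# points or by its center plus width/height.

def corners_to_center(x1, y1, x2, y2):
    """(top-left, bottom-right) corners -> (center_x, center_y, w, h)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corners(cx, cy, w, h):
    """(center, size) -> (top-left, bottom-right) corners."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

box = corners_to_center(10, 20, 110, 60)   # a 100x40 frame
assert box == (60.0, 40.0, 100, 40)
assert center_to_corners(*box) == (10.0, 20.0, 110.0, 60.0)
```

Either representation can therefore be used by the downstream fusion and merging steps without loss.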
Based on the above, after the document image uploaded by the user is obtained, it can be input to the layout detection model, the image-text detection model, and the character detection model, so that the layout detection model outputs the layout-frame position information, the image-text detection model outputs the image-text-frame position information, and the character detection model outputs the character-frame position information. On this basis, layout frames can be generated in the document image according to the layout-frame position information to obtain the layout image, and image-text frames and character frames can be generated according to their respective position information, making it convenient to later complete the rearrangement of the text content by combining the image-text frames, the character frames, and the layout image.
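The three-detector step described above can be sketched as follows. This is a hypothetical illustration only: `Box`, `detect_document`, and the stub detectors are assumed names standing in for the application's neural networks, which are not specified at the code level in this excerpt.

```python
# Hypothetical pipeline sketch: any object detector returning rectangles
# could play each role; the names here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str = ""    # e.g. "text", "table", "header" for layout frames

def detect_document(image, layout_model, graphic_model, char_model):
    """Run the three detectors and return their frame lists separately,
    so each granularity stays available to downstream steps."""
    layout_boxes = layout_model(image)    # layout regions with attributes
    graphic_boxes = graphic_model(image)  # formulas, diagrams, underlines...
    char_boxes = char_model(image)        # one frame per character
    return layout_boxes, graphic_boxes, char_boxes

# Stub detectors standing in for the real neural networks:
fake = lambda boxes: (lambda img: boxes)
layout, graphics, chars = detect_document(
    image=None,
    layout_model=fake([Box(0, 0, 100, 50, "text")]),
    graphic_model=fake([Box(10, 5, 40, 20)]),
    char_model=fake([Box(12, 6, 18, 14), Box(20, 6, 26, 14)]),
)
assert len(chars) == 2 and layout[0].label == "text"
```

Keeping the three outputs separate, rather than merging them immediately, is what lets the later steps sort layout frames and fuse image-text frames with character frames independently.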
For example, a document image uploaded by a user for a test paper to be answered is acquired, as shown in FIG. 3(a). To let the user answer on a terminal, so that a test paper completed on the terminal can be corrected automatically, the test-paper content in the document image can be rearranged so that it displays clearly on the user's terminal device, without characters appearing too large or too small. The document image is therefore input to the layout detection model, the image-text detection model, and the character detection model to obtain the layout-frame position information of the frames that select the layout regions, the image-text-frame position information of the frames that select the image-text blocks (such as formulas and composite figures), and the character-frame position information of the frames that select the characters. On this basis, the layout regions can be framed and given attributes in the document image according to the layout-frame position information, yielding the layout image shown in FIG. 3(b); meanwhile, the image-text blocks and characters are framed according to their position information, yielding the image-text frames and character frames shown in FIG. 3(c), which is convenient for downstream services.
In practical applications, when detecting and recognizing the image-text blocks and characters in the document image, the character frames and image-text frames may be generated in the same document image at the same time, or generated in two separate document images; this embodiment places no limitation here.
In conclusion, by detecting and locating the layout, image-text blocks, and characters of the document image at different granularities with the layout detection model, the image-text detection model, and the character detection model, loss of text content during recognition can be avoided, the finally generated target file is kept consistent with the text content of the document image, and downstream services can use the result conveniently.
Step S204: sorting the layout frames in the layout image into reading order to obtain a layout-frame reading list, and fusing the image-text frames with the character frames to obtain a fusion result.
Specifically, after the image-text frames, the character frames, and the layout image are obtained, and to ensure that the text content in the rearrangement result matches the text arrangement in the document image, the layout frames in the layout image can be sorted into reading order, and a layout-frame reading list is obtained from the sorting result, keeping the layout dimension consistent with the layout arrangement of the document. After that, considering that the image-text frames and the character frames are both rectangular frames that detect and locate content in the document image, the two types of frame can be fused once all the text content has been selected, giving a fusion result that contains fusion frames. Combining the two types of rectangular frame after all document content has been framed reduces the computational complexity caused by having multiple frame types and improves rearrangement efficiency.
The layout-frame reading list refers to a list obtained by sorting the layout frames contained in the layout image into reading order. When layout frames are merged later, the reading list guarantees that frames belonging to the same column are merged together, so that in the rearrangement stage the generated result matches the layout arrangement of the document image and the content of each column stays consistent. Correspondingly, the fusion result refers to the result of fusing image-text frames and character frames whose distributions overlap; it contains the fusion frames, which can be understood as the rectangular frames that together select all the image-text content in the document image. When a character frame and an image-text frame have no fusion relation, either frame is used directly as a fusion frame, ensuring that the fusion frames select all the content in the image, which is convenient for downstream use.
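The fusion behavior described above can be sketched in a few lines. This is a hypothetical illustration under a simple assumption (an image-text frame absorbs every character frame it overlaps, and non-overlapping character frames pass through unchanged); the application does not spell out its overlap criterion in this excerpt.

```python
# Hypothetical sketch of the fusion step: image-text frames absorb the
# character frames that overlap them; character frames overlapping nothing
# become fusion frames on their own, as the paragraph above describes.

def overlaps(a, b):
    """True if rectangles a and b (x1, y1, x2, y2) intersect."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def fuse(graphic_frames, char_frames):
    fused = []
    for g in graphic_frames:
        hits = [c for c in char_frames if overlaps(g, c)]
        # Expand the image-text frame to cover every overlapping character frame.
        group = [g] + hits
        fused.append((min(b[0] for b in group), min(b[1] for b in group),
                      max(b[2] for b in group), max(b[3] for b in group)))
        char_frames = [c for c in char_frames if c not in hits]
    return fused + char_frames   # leftover character frames pass through

result = fuse([(0, 0, 10, 10)], [(8, 2, 14, 6), (20, 0, 25, 5)])
assert result == [(0, 0, 14, 10), (20, 0, 25, 5)]
```

After fusion, the downstream steps handle a single frame type, which is the complexity reduction the preceding paragraph points to.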
Further, when the layout frames in the layout image are sorted into reading order, different lists are created according to the reading order, and the order of each layout frame is determined by selecting layout frames from these lists for analysis.
In this embodiment, a specific implementation is as follows:
Creating an initial layout-frame reading list according to the reading order, creating a layout-frame candidate list from the layout frames in the layout image, and creating a layout-frame analysis list for the layout image, where the layout-frame analysis list contains a virtual layout frame; selecting a first layout frame from the layout-frame analysis list, and, when the first layout frame is a non-virtual layout frame, moving it from the analysis list to the initial reading list; selecting a second layout frame from the candidate list according to the first layout frame, and migrating the second layout frame from the candidate list to the analysis list; and taking the analysis list with the second layout frame added as the new analysis list and repeating the step of selecting a first layout frame until the analysis list is empty, at which point the initial reading list of the final sorting period is taken as the layout-frame reading list.
Specifically, the initial layout-frame reading list refers to a list in which layout-frame information is recorded in sequence, so that the frames recorded in it are ordered by reading order. Correspondingly, the layout-frame candidate list records all layout-frame information in the layout image; its frames are added to the analysis list period by period and then written into the initial reading list, which completes the sorting. Correspondingly, the layout-frame analysis list holds the frames to be analyzed: frames are read from the candidate list into it, analyzed, and thereby ordered, so that frames extracted from this list enter the initial reading list in reading order. Correspondingly, the virtual layout frame refers to a placeholder frame added to the analysis list; it seeds the analysis stage and avoids the problem that the cyclic analysis cannot proceed after reading an empty set. Correspondingly, the first layout frame refers to the frame to be analyzed that is read from the analysis list, and the second layout frame refers to the frame selected from the candidate list in association with the first layout frame and added to the analysis list, triggering the analysis task of the next period.
Based on this, when constructing the layout-frame reading list for the frames contained in the layout image, and to ensure that all frames can be ordered, an initial layout-frame reading list is created according to the reading order, a layout-frame candidate list is created from the layout frames in the layout image, and a layout-frame analysis list containing a virtual layout frame is created for the layout image. On this basis, the frames can be analyzed and processed one by one over successive sorting periods.
That is, a first layout frame is selected from the analysis list. If it is the virtual layout frame, a second layout frame is selected directly from the candidate list for subsequent processing; in this case the selection need not refer to any other frame and can be made freely. If the first layout frame is a non-virtual frame, it is moved from the analysis list to the initial reading list; its sorting is then complete, and a second layout frame is selected from the candidate list according to the first frame and migrated into the analysis list. After the analysis list is updated, the step of selecting a first layout frame is executed again on it, so that once every frame has been analyzed and the analysis list is empty, the initial reading list of the final sorting period is taken as the layout-frame reading list. In other words, the initial reading list obtained in the last sorting period becomes the layout-frame reading list.
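The three-list bookkeeping just described can be sketched as follows. Note the hedge in the comments: the "next frame" heuristic used here (topmost, ties broken leftmost) is a deliberate simplification standing in for the boundary-value selection the application details later; only the list mechanics mirror the text above.

```python
# Hypothetical sketch of the virtual-frame / candidate-list / analysis-list
# loop. Frames are (x1, y1, x2, y2) tuples.

VIRTUAL = None  # the virtual layout frame that seeds the analysis list

def build_reading_list(layout_frames):
    candidates = list(layout_frames)   # layout-frame candidate list
    analysis = [VIRTUAL]               # layout-frame analysis list
    reading = []                       # initial layout-frame reading list
    while analysis:
        first = analysis.pop(0)
        if first is not VIRTUAL:
            reading.append(first)      # the first frame is now ordered
        if candidates:
            # Simplified stand-in for the second-frame selection:
            # topmost frame first, ties broken leftmost.
            second = min(candidates, key=lambda f: (f[1], f[0]))
            candidates.remove(second)
            analysis.append(second)
    return reading

frames = [(50, 0, 90, 10), (0, 0, 40, 10), (0, 20, 90, 30)]
assert build_reading_list(frames) == [
    (0, 0, 40, 10), (50, 0, 90, 10), (0, 20, 90, 30)]
```

The virtual seed frame is what keeps the loop's first iteration well-defined: it is popped and discarded, pulling the true first frame out of the candidate list without a special case.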
In conclusion, by creating different lists to analyze each layout frame in the layout image, the result of each layout frame after being sequenced according to the reading sequence can be accurately determined, and the layout frame reading list constructed based on the result is more accurate, so that the problem of typesetting errors can be avoided when the subsequent rearrangement processing is performed on the content in the image.
Further, since the layout frame candidate list contains a plurality of layout frames, in order to ensure that the selected second layout frame is associated with the first layout frame and to improve sorting accuracy, the selection can be implemented according to the boundary values of the layout frames. In this embodiment, the specific implementation manner is as follows:
Filtering the layout frame candidate list according to the first boundary value of the first layout frame to obtain an intermediate layout frame candidate list; sorting the layout frames contained in the intermediate layout frame candidate list according to the second boundary value of each layout frame in the intermediate layout frame candidate list, and selecting a third layout frame according to the sorting result; creating an adjacent layout frame for the third layout frame, and constructing a layout frame candidate set based on the third layout frame and the adjacent layout frame; and filtering the layout frame candidate set according to the first boundary value, sorting the filtering result, and determining the second layout frame according to the sorting result.
Specifically, the first boundary value refers to the horizontal-axis coordinate of the right boundary of the first layout frame. Correspondingly, the intermediate layout frame candidate list refers to the candidate list composed of the layout frames remaining after the layout frame candidate list is filtered according to this boundary value, where filtering is the process of deleting from the layout frame candidate list the layout frames that do not match the first boundary value. Correspondingly, the second boundary value refers to the horizontal-axis coordinate of the left boundary of each layout frame in the intermediate layout frame candidate list. Correspondingly, the third layout frame refers to the layout frame that, after the layout frames in the intermediate layout frame candidate list are sorted by the second boundary value, is selected according to the sorting result as possibly associated with the first layout frame. Correspondingly, the adjacent layout frames are the left and right adjacent layout frames created for the third layout frame, and the layout frame candidate set is obtained by combining the third layout frame with its adjacent layout frames.
Based on the above, when selecting a second layout frame associated with the first layout frame from the layout frame candidate list, the layout frame candidate list is first filtered according to the first boundary value of the first layout frame, rejecting the layout frames that have no association with the first layout frame and thereby obtaining an intermediate layout frame candidate list. The layout frames contained in the intermediate layout frame candidate list can then be sorted according to the second boundary value of each layout frame, so that a third layout frame strongly associated with the first layout frame is selected based on the sorting result. On this basis, adjacent layout frames can be created for the third layout frame, and a layout frame candidate set constructed from the third layout frame and its adjacent layout frames. The candidate set is then filtered according to the first boundary value and the filtering result is sorted; that is, one or more second layout frames are selected from the layout frames possibly associated with the first layout frame and added to the layout frame analysis list for use in subsequent processing, so that the arrangement of layout frames in reading order can be completed accurately.
That is, when the layout frames are ordered according to the reading sequence, a clear sequence is planned for all the content areas in the layout according to the reading habits of the user. Once all the content blocks are handled in the correct reading order, the subsequent typesetting work becomes relatively smooth.
Further, in the sorting stage, a special virtual rectangular box may be created first and added to the stack S (the layout frame analysis list). In each cycle, an element (some layout frame) is popped from the stack S as the current candidate frame s; if the candidate frame is not a virtual frame, it is added to the result list R (the initial layout frame reading list). The next set of candidate boxes is then determined and added to stack S together. At the same time, all candidate boxes that have been added to the stack S are removed from the content area coordinate list I (the layout frame candidate list). This repeats until the stack S is empty, and finally the ordered coordinate list of all content areas (rectangular frames) is returned, yielding the layout frame reading list.
Further, the stage of determining the next set of candidate frames is a process of searching for the next group of candidate rectangular frames according to the right boundary x2 of the current rectangular frame (layout frame) s. That is, rectangular boxes whose left boundary value x1 is smaller than or equal to the current boundary d (i.e., the right boundary value x2) are selected from the content area (rectangular box) coordinate list I; boxes that cannot possibly be the next candidate region are thereby excluded from the selection results. The selected candidate boxes are then sorted in ascending order by the size of y1 (i.e., the upper boundary value), so that the rectangular box with the smallest y1 value is located at the front (i.e., read first). This rectangular box with the smallest y1 value is called the nearest candidate box m. Next, in order to determine the reading order across blank columns, two blank-area candidate frames representing the left and right sides of candidate box m may be created and grouped together with m as the remaining candidate box set. Finally, candidate frames in this set whose right boundary exceeds the boundary d are deleted, and the remaining candidate frames are arranged in descending order of x1 value to suit the "last in, first out" behavior of the stack used to process them. The resulting list is the next group of candidate frames, from which the subsequent reading-order sorting can be completed.
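A hedged sketch of this next-candidate search, assuming axis-aligned boxes with y increasing downward; the exact geometry of the two blank-area frames is an assumption, since the text does not fix their coordinates:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

def next_candidates(s, remaining, page_width):
    # Current boundary d is the right boundary x2 of the current frame s.
    d = s.x2
    # Keep only boxes whose left boundary x1 does not exceed d.
    feasible = [b for b in remaining if b.x1 <= d]
    if not feasible:
        return []
    # Ascending y1: the topmost box is read first.
    feasible.sort(key=lambda b: b.y1)
    m = feasible[0]                                  # nearest candidate box m
    # Blank-area frames flanking m (assumed geometry, not fixed by the text).
    left_blank = Box(0.0, m.y1, m.x1, m.y2)
    right_blank = Box(m.x2, m.y1, page_width, m.y2)
    group = [m, left_blank, right_blank]
    # Drop frames whose right boundary exceeds the boundary d.
    group = [b for b in group if b.x2 <= d]
    # Descending x1 so the leftmost frame is popped first (the stack is LIFO).
    group.sort(key=lambda b: b.x1, reverse=True)
    return group
```

The descending-x1 sort means that once the group is pushed onto the stack, the leftmost frame is popped and read first, matching left-to-right reading order.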
Along the above example, after obtaining the layout image shown in b in fig. 3, a stack S containing a virtual rectangular frame may first be constructed; a candidate frame s is then popped from the stack S, and if the candidate frame s is a non-virtual frame, it is added to the list R. The list I is then screened according to the right boundary of candidate frame s, retaining the candidate frames in list I whose left boundary is smaller than or equal to the right boundary of candidate frame s. The retained candidate frames are sorted in ascending order by their upper boundaries, and according to the sorting result the candidate frame with the smallest upper boundary is selected as the candidate frame m nearest to candidate frame s. Blank-region candidate boxes are then created for the left and right sides of candidate box m and combined with candidate box m into a candidate box set. Finally, candidate frames in the set whose right boundary exceeds the right boundary of candidate frame s are deleted, and the rest are sorted by their left boundaries, so that the candidate frames s+1 corresponding to candidate frame s can be selected according to the sorting result. The candidate frames s+1 are then pushed onto the stack S, and the process of popping candidate frames from the stack S for sorting is repeated until the stack S is empty, at which point a list of the layout frames in the layout image sorted in reading order can be generated from the finally obtained list R. The candidate frames here are the layout frames in the layout image.
In summary, by screening the second layout frame against the boundary values of the first layout frame, the selected second layout frame is guaranteed to have a reading-order relation with the first layout frame, and the generated layout frame reading list is guaranteed to match the arrangement order of the layout frames in the image, so that subsequent typesetting based on it is more accurate.
When the picture frames and the character frames are fused, the overlapping degree between each picture frame and each character frame is calculated, and the distribution relation between them is determined according to the overlapping degree, so that picture frames and character frames with a strong overlapping relation are fused. In this embodiment, the specific implementation manner is as follows:
determining a plurality of picture frames and a plurality of character frames, and calculating a first overlapping degree between each picture frame and each character frame; selecting picture frames and character frames having an overlapping relation according to the first overlapping degree and fusing them, and obtaining a fusion result containing the fusion frames according to the fusion processing; wherein the attribute information of each fusion frame is determined by the picture frame associated with it.
Specifically, the first degree of overlap refers to a value of the degree of overlap between each picture frame and each character frame, where a larger value indicates a higher degree of overlap. Correspondingly, the overlapping relation refers to a relation in which a picture frame and a character frame are highly overlapped, characterizing that the character frame and the picture frame overlap in their arrangement in the document image. Accordingly, the attribute information refers to attribute information of the fusion frame, including but not limited to the size, area, and the like of the fusion frame.
Based on the above, when fusing the character frames and the picture frames, considering the functions of the two kinds of frames, frame selection is performed on all the pictures and characters in the document image, and fusing them into fusion frames of a single type avoids the computational complexity of handling multiple categories. Thus, a plurality of picture frames and a plurality of character frames may be determined, after which a first overlapping degree between each picture frame and each character frame may be calculated. The picture frame and character frame with the highest overlapping degree, that is, those having an overlapping relation according to the first overlapping degree, are then selected and fused, and a fusion result containing the fusion frames is obtained from the fusion processing. In addition, since the character frames and picture frames before fusion each carry their own attributes, the attribute information of each obtained fusion frame can be determined by the picture frame associated with it.
That is, when fusing picture frames and character frames into fusion frames that serve as the minimum granularity for subsequent typesetting, the overlapping degree between all picture frames and character frames can be calculated, for example as an IoUMin (Intersection over Union, minimum variant) value. When the value reaches a preset threshold, the two rectangular frames are considered highly overlapped and can be combined, and the attribute of the resulting fused frame is recorded as the attribute of the frame with the largest area. IoUMin is the ratio of the overlapping area of the picture frame and the character frame to the area of the smaller rectangular frame. As shown in a of fig. 4, when computing the IoUMin value of two rectangular frames, the coordinates of the upper-left corner of the intersection are (xI1, yI1) = (max(x1, x1'), max(y1, y1')), and the coordinates of the lower-right corner of the intersection are (xI2, yI2) = (min(x2, x2'), min(y2, y2')). The intersection area is then intersection_area = max(0, xI2 - xI1 + 1) × max(0, yI2 - yI1 + 1), the area of rectangle A is areaA = (x2 - x1 + 1) × (y2 - y1 + 1), and the area of rectangle B is areaB = (x2' - x1' + 1) × (y2' - y1' + 1), so that IoUMin = intersection_area / min(areaA, areaB).
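The IoUMin computation follows directly from the formulas above (inclusive pixel coordinates, hence the +1 terms):

```python
def iou_min(a, b):
    """IoUMin: intersection area over the area of the smaller box.
    Boxes are (x1, y1, x2, y2) tuples with inclusive pixel coordinates."""
    x1, y1, x2, y2 = a
    x1p, y1p, x2p, y2p = b
    xi1, yi1 = max(x1, x1p), max(y1, y1p)    # upper-left of intersection
    xi2, yi2 = min(x2, x2p), min(y2, y2p)    # lower-right of intersection
    inter = max(0, xi2 - xi1 + 1) * max(0, yi2 - yi1 + 1)
    area_a = (x2 - x1 + 1) * (y2 - y1 + 1)
    area_b = (x2p - x1p + 1) * (y2p - y1p + 1)
    return inter / min(area_a, area_b)
```

When one box is fully contained in the other, IoUMin is 1.0 even though the ordinary IoU would be small, which is exactly why it suits detecting character frames nested inside picture frames.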
Along the above example, after obtaining the picture frames and character frames shown in c in fig. 3, the IoUMin values between the picture frames and the character frames can be calculated, and the picture frames and character frames whose IoUMin values meet the preset overlapping-degree threshold are selected and merged according to the calculation result, with the attribute of the fusion frame obtained after merging recorded as the attribute of the larger of the two rectangular frames. This continues until all the picture frames and character frames have been processed, yielding the fusion result shown in d in fig. 3. In this way, all the characters and pictures in the document image are framed and positioned by fusion frames, so that downstream services can re-typeset the test paper content in the document image according to the arrangement order of the layout frames recorded in list R and the fusion frames contained in the fusion result, and the typeset result can be displayed normally on the terminal device held by the user, making it convenient for the user to answer the test paper.
In conclusion, fusion of the picture frames and the character frames is completed by calculating the overlapping degree, which ensures that all characters and pictures in the image are framed, thereby avoiding inaccurate rearrangement caused by information loss.
Step S206, merging the layout frames in the layout images by traversing the layout frame reading list to obtain column images, and mapping the fusion frames in the fusion result to the column images to obtain the images to be updated.
Specifically, after the reading list of the layout frames and the fusion result are obtained, further, the layout frames recorded in the reading list of the layout frames are considered to be ordered according to the reading sequence, and the whole layout arrangement of the corresponding document images is considered. On the basis, if the document is directly converted according to the layout frame, the problem that contents belonging to different columns are distributed together may occur, for example, the identity information input column contents in the test paper and the test question contents in the test question column are distributed together, so that the real distribution result of the test paper in the document image cannot be corresponded. Therefore, in order to avoid the problem, layout frames in the layout images can be combined by traversing the layout frame reading list, so that a column-division image containing column-division frames is obtained according to the combination result, and the content contained in each column is independently divided. And then, considering that the frame of the frame contained in the frame image is a result obtained from the overall typesetting angle of the content in the document image, the image-text content in each frame can be determined according to the fusion frame in the fusion result, and the fusion frame in the fusion result can be mapped to the frame image, so that the image to be updated containing the frame of the frame can be obtained, the fusion frames are distributed in the frame of the frame according to the arrangement relation, and the image-text content contained in at least one frame of the document image and each frame can be embodied through the image to be updated.
The column-dividing image specifically refers to an image obtained by combining layout frames in the layout image according to layout frame reading lists corresponding to reading sequences, wherein the image comprises column-dividing frames, and each column-dividing frame corresponds to each column of the content in the document image. Correspondingly, the image to be updated specifically refers to a result obtained by mapping a fusion frame in the fusion result to a column-division image, and the image not only comprises the column-division frame but also comprises the fusion frame.
Further, when layout fusion is performed according to the arrangement order of layout frames in the layout frame reading list, the layout frames with the merging relationship are actually constructed into one column. In this embodiment, the specific implementation manner is as follows:
Determining a fourth layout frame corresponding to a first traversing period by traversing the layout frame reading list, and determining a first column corresponding to the first traversing period; detecting the position relation between the fourth layout frame and the first column, creating a layout frame merging task according to the position relation, and executing the layout frame merging task; determining a second traversing period according to a task executing result, taking the second traversing period as the first traversing period, and executing a step of determining a fourth layout frame corresponding to the first traversing period by traversing the layout frame reading list; and determining the column-divided image containing the column-divided frame according to the task execution result corresponding to the target traversal period until the layout frame reading list is empty.
Specifically, the first traversal period specifically refers to a period of traversing the layout frame reading list to read a fourth layout frame, correspondingly, the fourth layout frame specifically refers to a layout frame read according to the layout frame arrangement sequence in the list, and correspondingly, the first column specifically refers to a column required to be determined in the current traversal period. Correspondingly, the position relation specifically refers to a relation of detecting whether the fourth layout frame belongs to the first column, and the corresponding layout frame merging task is a task of merging the layout frames belonging to the same column and not merging the layout frames not belonging to the same column.
In view of this, in merging layout frames, the process of merging layout frames having a reading relationship is actually performed in the reading order. Therefore, a fourth layout frame corresponding to the first traversal period can be determined by traversing the layout frame reading list, and a first column corresponding to the first traversal period is determined; then, the position relation between the fourth layout frame and the first column can be detected, the position relation can be realized by calculating the overlapping degree, and then, the layout frame merging task can be created and executed according to the position relation; after entering the second traversing period, the second traversing period can be used as a first traversing period, and a step of determining a fourth layout frame corresponding to the first traversing period by traversing the layout frame reading list is executed; and under the condition that the reading list of the layout frames is empty, the fact that all the layout frames are combined is achieved is indicated, so that the column division images containing the column division frames can be determined according to the task execution results corresponding to the target traversal period, and the downstream service can be conveniently used.
In sum, the layout frames to be combined are determined by calculating the position relation between the layout frames and the columns, and the layout frames are combined based on the layout frames, so that the layout frames which are combined together can be ensured to conform to the reading habit of a user, and information intersection can not be caused, thereby ensuring that the rearrangement of text contents in an image can be completed rapidly and accurately in the follow-up process, ensuring that the rearrangement result is matched with display equipment, and ensuring that the text contents are displayed normally.
Furthermore, considering that the rectangular frames contained in the column-division image and in the fusion result have different granularities, in order to ensure that fusion frames belonging to the same column are arranged into that column in the generated image to be updated, the overlapping degree between the column frames and the fusion frames can be calculated. In this embodiment, the specific implementation manner is as follows:
Determining a plurality of fusion frames in the fusion result, and determining a plurality of column frames in the column image; calculating second overlapping degree between each fusion frame and each column frame, and selecting the fusion frame and the column frame with the mapping relation according to the second overlapping degree to carry out mapping treatment; and generating the image to be updated according to the mapping processing result, wherein the fusion frame in the image to be updated is positioned in the column frame.
Specifically, the second overlapping degree refers to the overlapping degree between the column dividing frame and the fusion frame, and the description of the overlapping degree can be referred to the description of the first overlapping degree, which is not repeated here. Based on the above, when the mapping processing of the fusion frames to the column images is performed, a plurality of fusion frames can be determined in the fusion result, and a plurality of column frames can be determined in the column images; then, a second overlapping degree between each fusion frame and each column frame can be calculated, and then the fusion frames and the column frames with the mapping relation are selected according to the second overlapping degree to carry out mapping processing; and generating an image to be updated according to the mapping processing result, wherein the fusion frame in the image to be updated is positioned in the column frame.
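A minimal sketch of this mapping step, assuming plain overlap area as the second overlapping degree and boxes given as (x1, y1, x2, y2) tuples; the specification leaves the exact overlap measure open:

```python
def overlap_area(a, b):
    """Axis-aligned overlap area of two (x1, y1, x2, y2) boxes."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def assign_to_columns(fusion_frames, column_frames):
    """Assign each fusion frame index to the index of the column frame
    it overlaps most (one plausible reading of the second overlap degree)."""
    return {
        i: max(range(len(column_frames)),
               key=lambda j: overlap_area(f, column_frames[j]))
        for i, f in enumerate(fusion_frames)
    }
```

The result is the image to be updated in mapping form: every fusion frame is placed inside exactly one column frame.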
That is, when performing the column-division processing, the range of each column and the content it contains are determined. Specifically, when determining the columns, the layout frames in the layout frame reading list can be analyzed and merged one by one in their reading-order arrangement. During merging, it is detected whether the current layout frame lies to the right of the current column: if so, the layout frame does not belong to the current column and a new column is created. If not, a tentative merge of the layout frame with the current column is performed; if the merged column overlaps an existing column, the layout frame is determined not to belong to the current column, otherwise it belongs to the current column and is merged with it. By merging the layout frames belonging to each column, a column-division image containing the column frames is obtained. When mapping the fusion frames in the fusion result to the column-division image, the overlapping degree of each fusion frame with each column frame can be calculated, and each fusion frame is assigned to the column frame with which it overlaps most, thereby determining each column region in the document image and the image-text content it contains, which facilitates subsequent typesetting.
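The column-merging rule can be sketched as follows; the right-of test and the overlap test are assumed to be simple axis-aligned comparisons, which the text does not specify exactly:

```python
def union(a, b):
    """Bounding box covering both (x1, y1, x2, y2) boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlaps(a, b):
    """True when two boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_into_columns(reading_list):
    columns = []
    current = None
    for frame in reading_list:          # frames arrive in reading order
        if current is None:
            current = frame
            continue
        merged = union(current, frame)
        # Frame to the right of the current column starts a new column ...
        starts_new = frame[0] >= current[2]
        if not starts_new:
            # ... as does a tentative merge that collides with a finished column.
            starts_new = any(overlaps(merged, c) for c in columns)
        if starts_new:
            columns.append(current)
            current = frame
        else:
            current = merged
    if current is not None:
        columns.append(current)
    return columns                      # one box per column frame
```

Two vertically stacked frames thus collapse into one column, while a frame past the column's right edge opens the next column.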
Along the above example, after obtaining the layout image shown in b in fig. 3 and the fusion result shown in d in fig. 3, the layout frames in the layout image may be merged: in the merging stage, it is detected whether each layout frame belongs to the same column, the layout frames belonging to the same column are combined according to the detection result, and from the combination result the column-division image shown in b in fig. 4 is obtained, containing an identity information input column and an answer column. After the column-division image is obtained, the fusion frames selecting the image-text content in the fusion result can be mapped into it: in the mapping stage, the overlapping degree between each fusion frame and each column frame is calculated, and each fusion frame is merged into the column frame with which it overlaps most, yielding the image to be updated shown in c of fig. 4, which can be used to determine which fusion frames each column contains.
In conclusion, mapping the fusion frames to the column frames by calculating the overlapping degree allows the fusion frames contained in each column frame to be precisely framed and positioned, thereby determining the image-text content contained in each column frame and improving the accuracy of subsequent document rearrangement.
Step S208, area updating is carried out on the image to be updated according to the rectangular frame distribution information of the image to be updated, text rearrangement processing is carried out on the updated image to be updated, and a target file corresponding to the document image is generated according to a processing result.
Specifically, after the image to be updated is obtained, further, considering that the image to be updated contains a column division frame and a fusion frame, in order to ensure that the fusion frame in the column division frame can be typeset according to the arrangement mode of the graphic content in the document image, the image to be updated can be subjected to area update according to the rectangular frame distribution information of the image to be updated, the text content in the image can be divided according to lines and paragraphs, and the contents belonging to the same line and the same paragraph are combined together. Thereafter, in order to ensure that the generated document can be used on any device, text rearrangement processing can be performed on the updated image to be updated, so that a target file corresponding to the document image can be generated according to the processing result.
The rectangular frame distribution information specifically refers to the distribution information of the fusion frames and column frames in the image to be updated, and the corresponding area update refers to merging fusion frames belonging to the same row in the image to be updated, and merging fusion frames belonging to the same paragraph. Correspondingly, the rearrangement processing refers to adjusting the typesetting in the updated image, so as to ensure that the rearranged result has high adaptability, i.e., the obtained target file can be viewed clearly on any device. Correspondingly, the target file refers to the file obtained by re-typesetting the content contained in the document image; the file may be in image form or in a specified format generated according to requirements, such as a doc-format or html-format file.
Further, when the area of the image to be updated is updated according to the rectangular frame distribution information of the image to be updated, the merging processing of the contents belonging to the same part is performed according to the line merging mode and the paragraph merging mode. In this embodiment, the specific implementation manner is as follows:
Determining a plurality of fusion frames contained in the column frames in the image to be updated, and calculating third overlapping degree among the fusion frames in a first direction dimension to be used as the rectangular frame distribution information; and selecting fusion frames with a row distribution relation according to the rectangular frame distribution information to form a row rectangular frame, and carrying out paragraph updating on the row rectangular frame to serve as area updating processing on the image to be updated.
Specifically, the first direction dimension specifically refers to the longitudinal axis direction, and correspondingly, the third overlapping degree specifically refers to the overlapping degree between the fusion frames along the longitudinal axis direction, which is used for determining the overlapping degree of each fusion frame in the longitudinal axis direction. Correspondingly, the row rectangular frame specifically refers to a rectangular frame obtained by merging fusion frames belonging to the same row.
Based on the above, when the area update is performed, the merging can be performed according to the row unit, so that a plurality of fusion frames contained in the column frame in the image to be updated can be determined first, and the third overlapping degree among the fusion frames is calculated in the first direction dimension and used as rectangular frame distribution information; and then, selecting fusion frames with a line distribution relation according to the rectangular frame distribution information to form a line rectangular frame, and after the line rectangular frame is constructed, updating paragraphs of the line rectangular frame, thereby finishing the area updating processing of the image to be updated.
In specific implementation, when performing line analysis on the image to be updated, coarse line division can be performed first, followed by fine line division. In the coarse division, the IoU in the Y-axis direction (IoUOnY, i.e., the ratio of the overlapping extent of two fusion frames along the Y axis to the smaller of their two heights) can be calculated between all fusion frames in a column; if IoUOnY exceeds a set threshold, the fusion frames can be considered to belong to the same row, thereby completing the mapping from fusion frames to row groups. Further, in the fine division, coarse row division alone cannot fully resolve complex typesetting, such as text wrapping around images, so line subdivision processing is necessary. Specifically, all fusion frames may first be ordered from top to bottom and from left to right. Next, the IoUOnY of each pair of fusion frames in the Y-axis direction can be calculated to determine whether they overlap. Once two fusion frames are found to overlap, it can be further determined whether they belong to the same row, the criterion being that the set of fusion frames with which each of them forms an overlapping relation must be identical. This completes the determination of whether fusion frames belong to the same row. If the set of fusion frames with which one fusion frame overlaps is larger than that of another fusion frame, the former cannot be merged into the same line as any other fusion frame. Performing this line-division processing on every fusion frame yields a plurality of row rectangular frames, realizing the merging of fusion frames within the same line.
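The coarse row grouping can be sketched with a union-find pass over pairwise IoUOnY values; the 0.5 threshold and the transitive grouping are assumptions, since the text only says the threshold is preset:

```python
def iou_on_y(a, b):
    """Overlap of two (x1, y1, x2, y2) boxes along Y over the smaller height."""
    overlap = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min(a[3] - a[1], b[3] - b[1])
    return overlap / smaller if smaller > 0 else 0.0

def coarse_rows(frames, threshold=0.5):
    """Group frame indices into rows: frames whose IoUOnY exceeds the
    threshold share a row; union-find makes the grouping transitive."""
    parent = list(range(len(frames)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            if iou_on_y(frames[i], frames[j]) > threshold:
                parent[find(i)] = find(j)   # same row
    rows = {}
    for i in range(len(frames)):
        rows.setdefault(find(i), []).append(i)
    return list(rows.values())
```

The fine subdivision described above would then refine these groups by comparing each frame's overlap set, which this sketch leaves out.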
Along the above example, after the image to be updated shown in fig. 4c is obtained, in order to ensure that subsequent typesetting is more accurate, the fusion frames in each column of the image to be updated may be subjected to line subdivision processing; for the specific procedure, reference may be made to the above description. In this way, the fusion frames belonging to each row are combined into a row rectangular frame, and the image shown as a in fig. 5 can be obtained according to the combination result, wherein each rectangular frame corresponds to one row of image-text content.
In summary, by performing row merging processing on the fusion frames in the image, fine-to-coarse typesetting can be realized, so that typesetting accuracy is effectively improved, the typesetting of the generated result matches any device, content is displayed normally, and the problem of characters being too small or too large does not occur.
Further, when the paragraph update is performed after the line update is completed, the paragraph update can be completed by calculating the distance. In this embodiment, the specific implementation manner is as follows:
Determining the distance information between the row rectangular frame in the image to be updated and the column frame in the image to be updated; and merging the row rectangular frames in the image to be updated according to the distance information, and performing paragraph updating on the row rectangular frames.
Specifically, the distance information refers to the distance between a row rectangular frame and a column frame; based on this distance information, the row rectangular frames can be merged so that rectangular frames belonging to the same paragraph are combined together, thereby obtaining an image containing paragraph frames. On this basis, after the image containing the row rectangular frames is obtained, the distance information between the row rectangular frames in the image to be updated and the column frames in the image to be updated can be determined; thereafter, the row rectangular frames in the image to be updated may be merged according to the distance information as the process of paragraph updating for the row rectangular frames.
In practical applications, when paragraph analysis is performed, all lines are further organized according to reading logic on the basis of the line analysis result, so as to form the individual paragraphs. Here, a paragraph is characterized by basic format features such as first-line indentation (i.e., the left blank width of the first line of the paragraph), second-line indentation (i.e., the left blank width of the second and subsequent lines of the paragraph, that is, a hanging indent), and the blank distance between paragraphs. Further, considering that the text content may involve paragraphs arranged with first-line indentation, the paragraph analysis may be performed through paragraph rules. Specifically, the left and right indentations of a text line are defined by the distance from the text line to the left and right sides of the column, respectively, and the character width can be calculated from the median derived from the text detection model. It may then be determined, according to specific rules, whether the current text line begins a new paragraph or ends the current paragraph. The specific rules may be: if the right blank width of the current text line is greater than a certain threshold, the end of the current paragraph is determined by this rule; if the left indentation of a text line that is within an existing paragraph and at or after the third line differs from the left indentation of the previous line by as much as the width of X characters, a new paragraph is considered to have started; and if the left indentation of the current text line differs from that of the previous line by more than the width of X characters, a new paragraph is likewise considered to have started. On this basis, the merging of row rectangular frames can be realized, and the row rectangular frames belonging to the same paragraph are combined into a paragraph rectangular frame for convenient subsequent use.
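For illustration, the paragraph rules above (a large right blank width ending a paragraph; a jump in left indentation starting one) might be sketched as follows. The box format, the threshold values expressed in character widths, and all names are assumptions rather than values fixed by the method:

```python
def paragraph_breaks(lines, column, char_width,
                     right_gap_chars=2.0, indent_tol_chars=1.0):
    # lines: (x0, y0, x1, y1) text-line boxes inside a column (x0c, x1c).
    # Returns the indices of lines that start a new paragraph.
    x0c, x1c = column
    starts = [0]  # the first line always opens a paragraph
    for i in range(1, len(lines)):
        prev, cur = lines[i - 1], lines[i]
        # Rule 1: a short previous line (large right blank width) ends its paragraph.
        prev_ended = (x1c - prev[2]) > right_gap_chars * char_width
        # Rules 2/3: a left-indentation jump versus the previous line starts a paragraph.
        indent_jump = abs(cur[0] - prev[0]) > indent_tol_chars * char_width
        if prev_ended or indent_jump:
            starts.append(i)
    return starts
```

Consecutive lines between two break indices would then be merged into one paragraph rectangular frame.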
Along the above example, after obtaining the image shown in fig. 5 a, a segmentation analysis may be performed on the image, that is, the line rectangular frames belonging to the same paragraph in the image are combined, and the combining rule may be referred to the above description. Thereby realizing the combination of the rectangular frames belonging to the same paragraph into rectangular frames of paragraphs, and obtaining the image shown as b in fig. 5 according to the combination result.
In conclusion, performing paragraph analysis after line analysis is completed ensures a more accurate area update result, and the content belonging to the same paragraph is combined together, so that subsequent typesetting can be more accurate.
In addition, when text rearrangement is performed on the updated image, it is considered that the current typesetting may not suit the user's reading or use; therefore, rearrangement processing can be performed according to a document rearrangement strategy. In this embodiment, the specific implementation manner is as follows:
Generating an image to be rearranged according to the image to be updated after the area updating, and loading a document rearrangement strategy; determining character rearrangement information, document attribute information and image attribute information based on the document rearrangement policy; updating the image to be rearranged according to the character rearrangement information, the document attribute information and the image attribute information, and performing text rearrangement processing on the updated image to be updated.
Specifically, the image to be rearranged specifically refers to an image that needs to be rearranged with respect to the typesetting result currently obtained. Correspondingly, the document rearrangement strategy specifically refers to a strategy for rearranging characters, document attributes and images in images to be rearranged.
Based on the above, in order to ensure that the generated target file corresponds to the document image, an image to be rearranged can be generated according to the image to be updated after the area update, and a document rearrangement strategy is loaded; thereafter, character rearrangement information, document attribute information, and image attribute information may be determined based on the document rearrangement policy; and updating the image to be rearranged according to the character rearrangement information, the document attribute information and the image attribute information, so that the text rearrangement processing of the text in the image can be completed.
In practical applications, when the rearrangement processing is performed on the image to be rearranged, the rearrangement parameters are in fact determined based on the strategy before the rearrangement is carried out. The rearrangement parameters are the basic elements that determine the typesetting effect of the image-text content, so multidimensional information can be set for these basic elements. For example: the number of characters per line, which determines the number of characters that can be accommodated in each line after rearrangement and whose magnitude directly affects the typesetting effect and the paragraph layout; the output page resolution, which includes both height and width and sets the overall size of the page; the page margins, which set the margins in the four directions (top, bottom, left and right) of the page and affect the layout of the text content within the page, including visual effects such as the alignment of text blocks and the text size; and the resolution of the input image, which includes the width and height of the input image and is used for reference and adaptation of the processing effect at different resolutions. The rearrangement strategy can be determined based on these parameters, so that after the user defines their values, the text rearrangement effect can be controlled and adjusted more flexibly to meet various reading requirements and aesthetic standards.
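Purely as an illustrative sketch, the multidimensional rearrangement parameters listed above could be grouped into a single configuration object; the field names and the derived content-width computation are assumptions, not part of the described method:

```python
from dataclasses import dataclass

@dataclass
class ReflowParams:
    # Illustrative container for the rearrangement parameters described above.
    chars_per_line: int   # characters accommodated per line after rearrangement
    page_width: int       # output page resolution: width in pixels
    page_height: int      # output page resolution: height in pixels
    margins: tuple        # (top, bottom, left, right) page margins in pixels
    source_width: int     # resolution of the input document image
    source_height: int

    @property
    def content_width(self):
        # Width available for text after subtracting left and right margins.
        return self.page_width - self.margins[2] - self.margins[3]
```

A user (or caller) would set these values once, and the subsequent line-division and paging stages would read from the shared object.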
In summary, after the text content and other information in the image to be rearranged is organized in combination with the multidimensional rearrangement parameters, the document generated after rearrangement can meet the user's reading requirements.
On this basis, in order to enhance the rearrangement effect, the rearrangement processing of the document lines may be performed during the rearrangement processing. In this embodiment, the specific implementation manner is as follows:
Generating an image to be branched according to the updated image to be rearranged, and loading a document branching strategy; determining document branching information based on the character rearrangement information, the document attribute information and the image attribute information according to the document branching strategy; and performing rectangular frame progressive traversing processing on the image to be branched according to the document branching information, and generating a target file corresponding to the document image according to a traversing result.
Specifically, the image to be branched refers to the image, obtained after parameter setting is performed on the image to be rearranged, on which line division still needs to be performed. Correspondingly, the document branching strategy refers to a strategy for rearranging the line text in the image to be branched. Correspondingly, the character rearrangement information refers to information for performing rearrangement processing on characters. Correspondingly, the document attribute information refers to attribute description information corresponding to the document. Correspondingly, the image attribute information refers to attribute description information corresponding to the image. Correspondingly, the document branching information refers to information for performing line division processing on the row rectangular frames in the image.
Based on this, in the rearrangement processing stage, rearrangement of the line may be performed first, that is, a line image to be divided may be generated from the updated image to be rearranged, and a document line policy may be loaded; thereafter, the document branching information can be determined based on the character rearrangement information, the document attribute information and the image attribute information according to the document branching policy; and performing rectangular frame progressive traversal processing on the images to be branched according to the document branching information, namely generating a target file corresponding to the document image according to the traversal result, wherein the obtained document is a document obtained after the line rearrangement processing.
In practical application, during the re-branching processing, the text content is re-divided into lines and organized based on the rearrangement parameters and the row rectangular frames grouped by paragraph. Specifically, the maximum line width of the rearranged paragraphs may be calculated first: a maximum line width is computed for each paragraph based on the rearrangement parameters, and the value may be calculated as: maximum line width = specified maximum number of characters per line × character width. The maximum line width serves as the reference for the subsequent rearrangement operation. Further, each original paragraph may be traversed in turn, then each line in the paragraph, and then the individual fusion frames of each line. In this process, the width of the new line is the sum of the widths of the fusion frames traversed so far. If the width of the new line would exceed its maximum line width after the next fusion frame is added, the new line is finalized and recorded, and processing of a further new line continues from that fusion frame. During processing, the maximum line width of each line is related to its position: for the first line, its maximum line width = the maximum line width of the paragraph − the first-line indentation of the paragraph; for the second and subsequent lines, its maximum line width = the maximum line width of the paragraph − the second-line indentation of the paragraph. If an element such as an image or table whose width is greater than the maximum line width of the current line is encountered during processing, the element as a whole may be treated as a new line.
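The re-branching rule above can be sketched as a greedy line-packing routine. This is a hedged illustration under the assumption that fusion frames are (x0, y0, x1, y1) boxes traversed in reading order; names and defaults are not from the source:

```python
def resplit_lines(paragraph_boxes, max_chars, char_width,
                  first_indent=0.0, hanging_indent=0.0):
    # Repack the fusion frames of one paragraph into new lines whose widths
    # do not exceed the paragraph's maximum line width (max_chars * char_width),
    # minus the first-line or hanging indent depending on the line's position.
    max_width = max_chars * char_width
    new_lines, current, width = [], [], 0.0
    for box in paragraph_boxes:
        box_width = box[2] - box[0]
        limit = max_width - (first_indent if not new_lines else hanging_indent)
        # An oversized element (e.g. an image or table wider than the line
        # limit) becomes a line of its own.
        if box_width > limit and not current:
            new_lines.append([box])
            continue
        # Finalize the current line when adding the next frame would overflow.
        if width + box_width > limit and current:
            new_lines.append(current)
            current, width = [], 0.0
        current.append(box)
        width += box_width
    if current:
        new_lines.append(current)
    return new_lines
```

With a narrower output page (smaller `max_chars`), the same frames naturally repack into more, shorter lines.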
In summary, by rearranging the image-text content in the image, the adaptive processing based on explicit setting and complex typesetting can be realized.
In the rearrangement processing, the rearrangement result may span a plurality of pages, so the document pages can be constructed in combination with a document paging strategy to obtain the target file according to the update result. In this embodiment, the specific implementation manner is as follows:
Generating an image to be paged according to the traversing result, and loading a document paging strategy; and carrying out reconstruction processing on the image to be paged based on the document paging strategy, and generating a target file corresponding to the document image according to a reconstruction processing result.
Specifically, the image to be paged specifically refers to an image obtained after line rearrangement processing. Accordingly, the document paging policy specifically refers to a policy for performing paging processing, and is used for generating a target file of a plurality of pages according to a document image as required.
Based on the above, after the line rearrangement processing is completed, an image to be paged can be generated according to the traversing result, and a document paging strategy is loaded; based on the method, the reconstruction processing is carried out on the images to be paged based on the document paging strategy, and the target file corresponding to the document image can be generated according to the reconstruction processing result.
In practical application, in order to accurately control the layout of the document, the scaling and position of the content in the image can be processed to ensure that each element fits the new page. Specifically, the scaling factor of each new line may be calculated first. That is, in the initial stage, each new line produced by the re-branching processing may be analyzed to calculate a corresponding scaling factor, which is obtained by dividing the resolution width of the output page by the maximum line width of the paragraph. However, if the width of a new line exceeds the maximum line width of the paragraph, there is an exception: in this case, the scaling factor of that new line may be recalculated as the resolution width of the output page divided by the width of the line itself. This ensures that each line, regardless of its length, can fit the new page width while maintaining its integrity. Thereafter, when new pages are constructed, once the width of each new line has been properly scaled and trimmed, the page rearrangement intercepts the new lines from the source image and combines them; after appropriate scaling, each new line is pasted directly into the current new page. In this process, if pasting a new line would cause the page to exceed the layout in height, a new page is opened and the remaining new lines continue to be pasted there. In this way, a target file meeting the requirements can be obtained according to the processing result.
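The scaling rule and the page-overflow rule above can be illustrated as two small helpers; the function names and the height-only paging model are simplifying assumptions:

```python
def scale_factor(line_width, paragraph_max_width, page_width):
    # Default: map the paragraph's maximum line width onto the output page
    # width. Exception: an overlong line is scaled by its own width instead,
    # so that it still fits the page.
    if line_width > paragraph_max_width:
        return page_width / line_width
    return page_width / paragraph_max_width

def paginate(line_heights, page_height):
    # Accumulate scaled line heights into pages, opening a new page whenever
    # pasting the next line would make the current page exceed its height.
    pages, used = [[]], 0.0
    for i, h in enumerate(line_heights):
        if used + h > page_height and pages[-1]:
            pages.append([])
            used = 0.0
        pages[-1].append(i)  # record the line index placed on this page
        used += h
    return pages
```

Each line thus receives a factor that preserves its integrity, and overflowing lines simply continue on a freshly opened page.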
Along the above example, after the image shown as b in fig. 5 is obtained, in order to ensure that the finally generated document meets the user's requirements and adapts to the display of the terminal held by the user, the rectangular frames in the image can be re-divided into lines according to the rearrangement parameters preset by the user; new lines are obtained through this re-division, and the document pages are then constructed from the new lines. The image shown as c in fig. 5 can thus be generated, in which the number of characters contained in each line is reduced relative to the document image, so that the display of the image is better adapted to the terminal device held by the user, the user can conveniently answer the test paper, and automatic correction of the test paper based on the result of the user's answers on the terminal is supported.
For example, if the terminal equipment held by the user is a mobile phone, the number of characters in each row can be reduced when the test paper content in the document image is rearranged, and the text surrounding the picture content is rearranged below the picture, so that the user can conveniently answer the test paper on the mobile phone. If the terminal device held by the user is a computer, the number of characters in each row can be increased when the test paper content in the document image is rearranged, and the text surrounding the picture content is rearranged at the left or right of the picture, so that the user can conveniently and directly watch the whole paper surface content of the test paper on the computer, and the user can conveniently answer.
In order to improve the rapid and accurate generation of the target file on the basis of the image, and the document rearrangement result is highly consistent with the content in the image, the document image can be detected by a layout detection model, a graphic detection model and a character detection model respectively so as to determine the layout image, the graphic frame and the character frame through three types of neural networks; and images with different dimensions are obtained from different granularity division angles. In order to ensure that the content typesetting in the finally generated document is the same as the content typesetting in the image, the layout frames in the layout image can be ordered according to the reading sequence to obtain a layout frame reading list, and the frame and the character frame are fused to obtain a fusion result; on the basis, merging layout frames in the layout images by traversing the layout frame reading list to obtain column images, and mapping fusion frames in fusion results to the column images to obtain images to be updated; the preliminary typesetting of characters, layouts and columns in the image dimension can be achieved. And then the area of the image to be updated can be updated according to the rectangular frame distribution information of the image to be updated, and text rearrangement processing is carried out on the updated image to be updated, so that the text rearrangement can be carried out on the basis of correct preliminary typesetting, the rearrangement result is more in accordance with the reading habit of a user, and the text content can be clearly displayed on any equipment, thereby generating a target file corresponding to the document image according to the processing result, and the text content in the document is consistent with the text content in the image, so that the downstream use is convenient.
The document rearrangement method provided by the application is further described below by taking its application in a scenario of an assignment to be answered as an example. The method specifically includes the following steps:
Step S1, obtaining a document image, inputting the document image into a layout detection model, a picture and text detection model and a character detection model for processing, and obtaining layout frame position information, picture and text frame position information and character frame position information corresponding to the document image.
And S2, generating a layout frame in the document image according to the layout frame position information, and obtaining the layout image according to the generation result.
And S3, generating a frame and a character frame in the document image according to the frame position information and the character frame position information.
Step S4, an initial layout frame reading list is created according to the reading sequence, a layout frame candidate list is created according to the layout frames in the layout images, and a layout frame analysis list is created for the layout images, wherein the layout frame analysis list contains virtual layout frames.
Step S5, selecting a first layout frame from the layout frame analysis list, and moving the first layout frame from the layout frame analysis list to the initial layout frame reading list under the condition that the first layout frame is a non-virtual layout frame.
Step S6, selecting a second layout frame from the layout frame candidate list according to the first layout frame, and transferring the second layout frame from the layout frame candidate list to the layout frame analysis list.
And S7, taking the layout analysis list added with the second layout as a layout analysis list, and executing the step of selecting the first layout in the layout analysis list until the layout analysis list is empty, and taking the initial layout reading list corresponding to the target ordering period as a layout reading list.
Wherein selecting a second layout frame from the layout frame candidate list according to the first layout frame comprises: filtering the layout frame candidate list according to a first boundary value of the first layout frame to obtain an intermediate layout frame candidate list; sorting the layout frames contained in the intermediate layout frame candidate list according to the second boundary value of each layout frame in the intermediate layout frame candidate list, and selecting a third layout frame according to the sorting result; creating an adjacent layout frame aiming at the third layout frame, and constructing a layout frame candidate set based on the third layout frame and the adjacent layout frame; and filtering the layout frame candidate set according to the first boundary value, sorting the filtering result, and determining a second layout frame according to the sorting result.
Step S8, determining a plurality of frames and a plurality of character frames, and calculating a first overlapping degree between each frame and each character frame.
Step S9, selecting a picture frame and a character frame with overlapping relation according to the first overlapping degree to fuse, and obtaining a fusion result containing the fusion frame according to the fusion result; the attribute information of the fusion frame is determined by the picture frame associated with the fusion frame.
Step S10, determining a fourth layout frame corresponding to the first traversal period through traversing the layout frame reading list, and determining a first column corresponding to the first traversal period.
Step S11, detecting the position relation between the fourth layout frame and the first column, creating a layout frame merging task according to the position relation, and executing the layout frame merging task.
Step S12, determining a second traversing period according to a task execution result, taking the second traversing period as a first traversing period, and executing a step of determining a fourth layout frame corresponding to the first traversing period through traversing the layout frame reading list;
And S13, determining a column image containing a column frame according to a task execution result corresponding to the target traversal period until the layout frame reading list is empty.
Step S14, determining a plurality of fusion frames in the fusion result and determining a plurality of column frames in the column image.
And S15, calculating second overlapping degree between each fusion frame and each column frame, and selecting the fusion frame and the column frame with the mapping relation according to the second overlapping degree to carry out mapping processing.
And S16, generating an image to be updated according to the mapping processing result, wherein a fusion frame in the image to be updated is positioned in the column frame.
Step S17, a plurality of fusion frames contained in the column frames in the image to be updated are determined, and third overlapping degree among the fusion frames is calculated in the first direction dimension and used as rectangular frame distribution information.
And S18, selecting fusion frames with row distribution relation to form row rectangular frames according to the rectangular frame distribution information, and determining the distance information between the row rectangular frames in the image to be updated and the column dividing frames in the image to be updated.
And S19, merging the row rectangular frames in the image to be updated according to the distance information, generating the image to be rearranged according to the merging result, and loading a document rearrangement strategy.
Step S20, character rearrangement information, document attribute information, and image attribute information are determined based on the document rearrangement policy.
And S21, updating the image to be rearranged according to the character rearrangement information, the document attribute information and the image attribute information, and performing text rearrangement processing on the updated image to be updated.
Step S22, generating an image to be branched according to the updated image to be rearranged, and loading a document branching strategy.
Step S23, determining the document branching information based on the character rearrangement information, the document attribute information and the image attribute information according to the document branching strategy.
And step S24, performing rectangular frame progressive traversing processing on the images to be segmented according to the document segmentation information, generating the images to be paged according to the traversing result, and loading a document paging strategy.
And S25, carrying out reconstruction processing on the image to be paged based on the document paging strategy, and generating a target file corresponding to the document image according to the reconstruction processing result.
Corresponding to the above method embodiment, the present application further provides an embodiment of a document rearrangement apparatus, and fig. 6 shows a schematic structural diagram of a document rearrangement apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
The detection module 602 is configured to detect the document image through the layout detection model, the graphic detection model and the character detection model respectively, and determine a layout image, a graphic frame and a character frame according to the detection result;
the ordering module 604 is configured to order the layout frames in the layout image according to the reading sequence to obtain a layout frame reading list, and fuse the frame and the character frame to obtain a fusion result;
the traversing module 606 is configured to merge the layout frames in the layout image by traversing the layout frame reading list to obtain a column-divided image, and map the fusion frames in the fusion result to the column-divided image to obtain an image to be updated;
The generating module 608 is configured to perform area update on the image to be updated according to the rectangular frame distribution information of the image to be updated, perform text rearrangement processing on the updated image to be updated, and generate a target file corresponding to the document image according to a processing result.
In an alternative embodiment, the detection module 602 is further configured to:
Acquiring a document image, inputting the document image into a layout detection model, a picture and text detection model and a character detection model for processing, and acquiring layout frame position information, picture and text frame position information and character frame position information corresponding to the document image; generating a layout frame in the document image according to the layout frame position information, and obtaining a layout image according to a generation result; and generating a frame and a character frame in the document image according to the frame position information and the character frame position information.
In an alternative embodiment, the ranking module 604 is further configured to:
Creating an initial layout frame reading list according to a reading sequence, creating a layout frame candidate list according to the layout frames in the layout images, and creating a layout frame analysis list aiming at the layout images, wherein the layout frame analysis list comprises virtual layout frames; selecting a first layout frame from the layout frame analysis list, and moving the first layout frame from the layout frame analysis list to the initial layout frame reading list under the condition that the first layout frame is a non-virtual layout frame; selecting a second layout frame from the layout frame candidate list according to the first layout frame, and migrating the second layout frame from the layout frame candidate list to the layout frame analysis list; and taking the layout analysis list added with the second layout as the layout analysis list, executing the step of selecting the first layout in the layout analysis list until the layout analysis list is empty, and taking the initial layout reading list corresponding to the target ordering period as the layout reading list.
In an alternative embodiment, the ranking module 604 is further configured to:
Filtering the layout frame candidate list according to the first boundary value of the first layout frame to obtain an intermediate layout frame candidate list; sorting the layout frames contained in the intermediate layout frame candidate list according to the second boundary value of each layout frame in the intermediate layout frame candidate list, and selecting a third layout frame according to the sorting result; creating an adjacent layout frame for the third layout frame, and constructing a layout frame candidate set based on the third layout frame and the adjacent layout frame; and filtering the layout frame candidate set according to the first boundary value, sorting the filtering result, and determining the second layout frame according to the sorting result.
In an alternative embodiment, the ranking module 604 is further configured to:
determining a plurality of picture and text frames and a plurality of character frames, and calculating a first overlapping degree between each picture and text frame and each character frame; selecting a picture and text frame and a character frame having an overlapping relation according to the first overlapping degree for fusion, and obtaining a fusion result containing a fusion frame according to a result of the fusion; the attribute information of the fusion frame is determined by the picture and text frame associated with the fusion frame.
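A hedged sketch of the overlap-based fusion step: the `overlap` measure (intersection area over the smaller box's area) and the 0.5 threshold are assumptions introduced for illustration, not the patent's definition of the first overlapping degree:

```python
def overlap(a, b):
    """Intersection area over the smaller box's area (one common overlap measure)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller else 0.0

def fuse(pictext_boxes, char_boxes, threshold=0.5):
    """Merge each character box into the first picture-and-text box it overlaps;
    the fusion frame keeps the picture-and-text box's identity (its attributes)
    and grows to the union of the two extents."""
    fused = [list(p) for p in pictext_boxes]
    leftover = []
    for c in char_boxes:
        for f in fused:
            if overlap(f, c) >= threshold:
                f[0] = min(f[0], c[0]); f[1] = min(f[1], c[1])
                f[2] = max(f[2], c[2]); f[3] = max(f[3], c[3])
                break
        else:
            leftover.append(list(c))  # character box with no overlapping partner
    return fused + leftover
```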
In an alternative embodiment, the traversal module 606 is further configured to:
Determining a fourth layout frame corresponding to a first traversing period by traversing the layout frame reading list, and determining a first column corresponding to the first traversing period; detecting the position relation between the fourth layout frame and the first column, creating a layout frame merging task according to the position relation, and executing the layout frame merging task; determining a second traversing period according to a task execution result, taking the second traversing period as the first traversing period, and returning to the step of determining a fourth layout frame corresponding to the first traversing period by traversing the layout frame reading list, until the layout frame reading list is empty; and determining the column-divided image containing the column-divided frames according to the task execution result corresponding to a target traversal period.
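One way the position-relation check and merging task might look, as an illustrative simplification: consecutive frames from the reading list are merged into one column frame when their left and right edges line up within a tolerance (`x_tol` is an assumed parameter, not a value from the patent):

```python
def merge_into_columns(reading_list, x_tol=5):
    """Walk the reading list; append each layout frame to the current column
    frame when its horizontal extent matches, otherwise start a new column."""
    columns = []
    for box in reading_list:
        if columns:
            col = columns[-1]
            # same column if the left and right edges line up within tolerance
            if abs(col[0] - box[0]) <= x_tol and abs(col[2] - box[2]) <= x_tol:
                col[1] = min(col[1], box[1])
                col[3] = max(col[3], box[3])
                continue
        columns.append(list(box))
    return columns
```

The returned column frames stand in for the column-divided image produced once the reading list is exhausted.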
In an alternative embodiment, the traversal module 606 is further configured to:
Determining a plurality of fusion frames in the fusion result, and determining a plurality of column frames in the column image; calculating a second overlapping degree between each fusion frame and each column frame, and selecting a fusion frame and a column frame having a mapping relation according to the second overlapping degree for mapping processing; and generating the image to be updated according to the mapping processing result, wherein the fusion frames in the image to be updated are positioned inside the column frames.
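A minimal sketch of the mapping step, under two assumptions made here for illustration: the second overlapping degree reduces to raw intersection area, and each fusion frame is assigned to the single column frame it overlaps most:

```python
def _inter_area(a, b):
    """Intersection area of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy

def map_to_columns(fused_boxes, column_boxes):
    """Assign each fusion frame to the column frame it overlaps most,
    so that every fusion frame ends up inside some column frame.
    Returns {column index: [fusion frames]}."""
    assignment = {i: [] for i in range(len(column_boxes))}
    for f in fused_boxes:
        best = max(range(len(column_boxes)),
                   key=lambda i: _inter_area(f, column_boxes[i]))
        assignment[best].append(f)
    return assignment
```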
In an alternative embodiment, the generating module 608 is further configured to:
Determining a plurality of fusion frames contained in the column frames in the image to be updated, and calculating a third overlapping degree among the fusion frames in a first direction dimension as the rectangular frame distribution information; and selecting fusion frames having a row distribution relation according to the rectangular frame distribution information to form a row rectangular frame, and performing paragraph updating on the row rectangular frame as the area updating processing on the image to be updated.
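The row-grouping step can be illustrated as follows, reading the "first direction dimension" as the vertical axis: fusion frames whose vertical extents overlap by at least half the shorter frame's height are taken to have a row distribution relation and merged into one row rectangular frame. Both the interpretation of the direction dimension and the 0.5 threshold are assumptions:

```python
def rows_from_boxes(boxes, min_overlap=0.5):
    """Group (x1, y1, x2, y2) boxes into row rectangles when their vertical
    extents overlap by at least min_overlap of the shorter box's height."""
    rows = []
    for b in sorted(boxes, key=lambda b: b[1]):
        for row in rows:
            top, bot = max(row[1], b[1]), min(row[3], b[3])
            shorter = min(row[3] - row[1], b[3] - b[1])
            if shorter and (bot - top) / shorter >= min_overlap:
                # vertical overlap is large enough: same row, grow the rectangle
                row[0] = min(row[0], b[0]); row[1] = min(row[1], b[1])
                row[2] = max(row[2], b[2]); row[3] = max(row[3], b[3])
                break
        else:
            rows.append(list(b))
    return rows
```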
In an alternative embodiment, the generating module 608 is further configured to:
Determining the distance information between the row rectangular frame in the image to be updated and the column frame in the image to be updated; and merging the row rectangular frames in the image to be updated according to the distance information, and performing paragraph updating on the row rectangular frames.
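A sketch of the distance-based paragraph updating, under the assumption that a row rectangular frame continues the previous paragraph when it starts near the column frame's left margin and the vertical gap to the previous row is small; `gap_tol` and `indent_tol` are illustrative parameters, not values from the patent:

```python
def merge_paragraphs(row_boxes, column_box, gap_tol=4, indent_tol=8):
    """Merge consecutive row rectangles into paragraph rectangles using their
    distance to the column frame's left edge and to the previous row."""
    paragraphs = []
    for row in row_boxes:
        near_margin = abs(row[0] - column_box[0]) <= indent_tol
        if paragraphs and near_margin and row[1] - paragraphs[-1][3] <= gap_tol:
            # continuation line: fold the row into the current paragraph
            p = paragraphs[-1]
            p[0] = min(p[0], row[0]); p[2] = max(p[2], row[2])
            p[3] = max(p[3], row[3])
            continue
        paragraphs.append(list(row))  # indented or distant row starts a paragraph
    return paragraphs
```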
In an alternative embodiment, the generating module 608 is further configured to:
Generating an image to be rearranged according to the image to be updated after the area updating, and loading a document rearrangement strategy; determining character rearrangement information, document attribute information and image attribute information based on the document rearrangement policy; updating the image to be rearranged according to the character rearrangement information, the document attribute information and the image attribute information, and performing text rearrangement processing on the updated image to be updated.
In an alternative embodiment, the generating module 608 is further configured to:
Generating an image to be branched according to the updated image to be rearranged, and loading a document branching strategy; determining document branching information based on the character rearrangement information, the document attribute information and the image attribute information according to the document branching strategy; and performing rectangular frame progressive traversing processing on the image to be branched according to the document branching information, and generating a target file corresponding to the document image according to a traversing result.
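The document branching (line-division) step resembles greedy line breaking; the sketch below is a stand-in, not the claimed row-by-row rectangular-frame traversal, and assumes words arrive as (text, width) pairs with a fixed inter-word space:

```python
def break_lines(words, max_width, space=1):
    """Greedy line division: pack (text, width) words into lines no wider
    than max_width, starting a new line whenever the next word would not fit."""
    lines, current, used = [], [], 0
    for text, width in words:
        needed = width if not current else used + space + width
        if current and needed > max_width:
            lines.append(" ".join(current))
            current, used = [text], width
        else:
            current.append(text)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines
```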
In an alternative embodiment, the generating module 608 is further configured to:
Generating an image to be paged according to the traversing result, and loading a document paging strategy; and carrying out reconstruction processing on the image to be paged based on the document paging strategy, and generating a target file corresponding to the document image according to a reconstruction processing result.
In order to generate the target file rapidly and accurately on the basis of the image, with a document rearrangement result that is highly consistent with the content in the image, the document image can be detected by a layout detection model, a picture and text detection model and a character detection model respectively, so that the layout image, the picture and text frame and the character frame are determined through three types of neural networks, and images of different dimensions are obtained from different granularities of division. In order to ensure that the typesetting of the content in the finally generated document is the same as that in the image, the layout frames in the layout image can be sorted according to the reading sequence to obtain a layout frame reading list, and the picture and text frames and the character frames can be fused to obtain a fusion result. On this basis, the layout frames in the layout image are merged by traversing the layout frame reading list to obtain a column image, and the fusion frames in the fusion result are mapped to the column image to obtain an image to be updated, which achieves a preliminary typesetting of characters, layouts and columns in the image dimension. The area of the image to be updated can then be updated according to the rectangular frame distribution information of the image to be updated, and text rearrangement processing can be performed on the updated image to be updated, so that text rearrangement is carried out on the basis of a correct preliminary typesetting; the rearrangement result thus better conforms to the reading habits of users, and the text content can be clearly displayed on any device. A target file corresponding to the document image is finally generated according to the processing result, and the text content in the document is consistent with that in the image, which is convenient for downstream use.
The above is an exemplary scheme of the document rearrangement apparatus of this embodiment. It should be noted that the technical solution of the document rearrangement apparatus and the technical solution of the document rearrangement method belong to the same concept; for details of the technical solution of the document rearrangement apparatus that are not described in detail, reference may be made to the description of the technical solution of the document rearrangement method. Furthermore, the components in the apparatus embodiments should be understood as functional modules established to implement the steps of the program flow or the steps of the method, rather than as actual functional partitions or separate physical units. An apparatus claim defined by such a set of functional modules should be understood as a functional module architecture that implements the solution mainly by means of the computer program described in the specification, and not as a physical apparatus that implements the solution mainly by means of hardware.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with an embodiment of the present application. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of network interface, wired or wireless, such as a network interface controller (NIC), for example an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 700 may also be a mobile or stationary server.
Wherein the processor 720 is configured to execute computer-executable instructions that implement the steps of the document rearrangement method.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the document rearrangement method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the document rearrangement method.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the document rearrangement method.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the document rearrangement method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the document rearrangement method.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. Alternative embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (15)

1. A document rearrangement method, comprising:
detecting the document image through a layout detection model, a picture and text detection model and a character detection model respectively, and determining a layout image, a picture and text frame and a character frame according to detection results;
Sequencing the layout frames in the layout images according to the reading sequence to obtain a layout frame reading list, and fusing the picture frames and the character frames to obtain a fusion result;
Merging layout frames in the layout images by traversing the layout frame reading list to obtain column-division images, and mapping fusion frames in the fusion results to the column-division images to obtain images to be updated;
And carrying out area update on the image to be updated according to the rectangular frame distribution information of the image to be updated, carrying out text rearrangement processing on the updated image to be updated, and generating a target file corresponding to the document image according to a processing result.
2. The method according to claim 1, wherein the detecting the document image through the layout detection model, the picture and text detection model and the character detection model respectively, and determining the layout image, the picture and text frame and the character frame according to the detection results, comprises:
Acquiring a document image, inputting the document image into a layout detection model, a picture and text detection model and a character detection model for processing, and acquiring layout frame position information, picture and text frame position information and character frame position information corresponding to the document image;
generating a layout frame in the document image according to the layout frame position information, and obtaining a layout image according to a generation result;
And generating a picture and text frame and a character frame in the document image according to the picture and text frame position information and the character frame position information.
3. The method of claim 1, wherein the sorting the layout frames in the layout image according to the reading order to obtain the layout frame reading list comprises:
Creating an initial layout frame reading list according to a reading sequence, creating a layout frame candidate list according to the layout frames in the layout images, and creating a layout frame analysis list aiming at the layout images, wherein the layout frame analysis list comprises virtual layout frames;
Selecting a first layout frame from the layout frame analysis list, and moving the first layout frame from the layout frame analysis list to the initial layout frame reading list under the condition that the first layout frame is a non-virtual layout frame;
selecting a second layout frame from the layout frame candidate list according to the first layout frame, and migrating the second layout frame from the layout frame candidate list to the layout frame analysis list;
And taking the layout frame analysis list to which the second layout frame has been added as the layout frame analysis list, returning to the step of selecting a first layout frame from the layout frame analysis list until the layout frame analysis list is empty, and taking the initial layout frame reading list corresponding to a target ordering period as the layout frame reading list.
4. A method according to claim 3, wherein said selecting a second layout frame from said candidate list of layout frames based on said first layout frame comprises:
Filtering the layout frame candidate list according to the first boundary value of the first layout frame to obtain an intermediate layout frame candidate list;
sorting the layout frames contained in the intermediate layout frame candidate list according to the second boundary value of each layout frame in the intermediate layout frame candidate list, and selecting a third layout frame according to the sorting result;
Creating an adjacent layout frame for the third layout frame, and constructing a layout frame candidate set based on the third layout frame and the adjacent layout frame;
And filtering the layout frame candidate set according to the first boundary value, sorting the filtering result, and determining the second layout frame according to the sorting result.
5. The method of claim 1, wherein the fusing the picture and text frame and the character frame to obtain a fusion result comprises:
determining a plurality of picture and text frames and a plurality of character frames, and calculating a first overlapping degree between each picture and text frame and each character frame;
Selecting a picture and text frame and a character frame having an overlapping relation according to the first overlapping degree for fusion, and obtaining a fusion result containing a fusion frame according to a result of the fusion; wherein the attribute information of the fusion frame is determined by the picture and text frame associated with the fusion frame.
6. The method of claim 1, wherein the merging the layout frames in the layout image by traversing the layout frame reading list to obtain a column image comprises:
determining a fourth layout frame corresponding to a first traversing period by traversing the layout frame reading list, and determining a first column corresponding to the first traversing period;
detecting the position relation between the fourth layout frame and the first column, creating a layout frame merging task according to the position relation, and executing the layout frame merging task;
Determining a second traversing period according to a task execution result, taking the second traversing period as the first traversing period, and returning to the step of determining a fourth layout frame corresponding to the first traversing period by traversing the layout frame reading list, until the layout frame reading list is empty;
And determining the column-divided image containing the column-divided frames according to the task execution result corresponding to a target traversal period.
7. The method according to claim 1, wherein mapping the fusion frame in the fusion result to the column image to obtain an image to be updated comprises:
Determining a plurality of fusion frames in the fusion result, and determining a plurality of column frames in the column image;
calculating a second overlapping degree between each fusion frame and each column frame, and selecting a fusion frame and a column frame having a mapping relation according to the second overlapping degree for mapping processing;
And generating the image to be updated according to the mapping processing result, wherein the fusion frame in the image to be updated is positioned in the column frame.
8. The method according to claim 1, wherein the performing area update on the image to be updated according to the rectangular frame distribution information of the image to be updated comprises:
Determining a plurality of fusion frames contained in the column frames in the image to be updated, and calculating a third overlapping degree among the fusion frames in a first direction dimension as the rectangular frame distribution information;
and selecting fusion frames with a row distribution relation according to the rectangular frame distribution information to form a row rectangular frame, and carrying out paragraph updating on the row rectangular frame to serve as area updating processing on the image to be updated.
9. The method of claim 8, wherein the performing paragraph updates for the row rectangular box comprises:
determining the distance information between the row rectangular frame in the image to be updated and the column frame in the image to be updated;
and merging the row rectangular frames in the image to be updated according to the distance information, and performing paragraph updating on the row rectangular frames.
10. The method according to claim 1, wherein the text rearrangement processing for the updated image to be updated includes:
generating an image to be rearranged according to the image to be updated after the area updating, and loading a document rearrangement strategy;
Determining character rearrangement information, document attribute information and image attribute information based on the document rearrangement policy;
Updating the image to be rearranged according to the character rearrangement information, the document attribute information and the image attribute information, and performing text rearrangement processing on the updated image to be updated.
11. The method according to claim 10, wherein the method further comprises:
Generating an image to be branched according to the updated image to be rearranged, and loading a document branching strategy;
Determining document branching information based on the character rearrangement information, the document attribute information and the image attribute information according to the document branching strategy;
And performing rectangular frame progressive traversing processing on the image to be branched according to the document branching information, and generating a target file corresponding to the document image according to a traversing result.
12. The method of claim 11, wherein generating the target file corresponding to the document image according to the traversal result includes:
generating an image to be paged according to the traversing result, and loading a document paging strategy;
And carrying out reconstruction processing on the image to be paged based on the document paging strategy, and generating a target file corresponding to the document image according to a reconstruction processing result.
13. A document rearrangement device, comprising:
the detection module is configured to detect the document image through the layout detection model, the picture and text detection model and the character detection model respectively, and determine the layout image, the picture and text frame and the character frame according to detection results;
The ordering module is configured to order the layout frames in the layout image according to the reading sequence to obtain a layout frame reading list, and fuse the picture and text frames and the character frames to obtain a fusion result;
The traversing module is configured to merge layout frames in the layout images by traversing the layout frame reading list to obtain column images, and map fusion frames in the fusion results to the column images to obtain images to be updated;
The generating module is configured to perform area update on the image to be updated according to the rectangular frame distribution information of the image to be updated, perform text rearrangement processing on the updated image to be updated, and generate a target file corresponding to the document image according to a processing result.
14. A computing device, comprising:
A memory and a processor;
the memory is configured to store computer executable instructions and the processor is configured to execute the computer executable instructions to implement the steps of the method of any one of claims 1 to 12.
15. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 12.
CN202311760203.6A 2023-12-20 2023-12-20 Document rearrangement method and device Pending CN118095204A (en)

Publications (1)

Publication Number Publication Date
CN118095204A true CN118095204A (en) 2024-05-28


