CN116152835A - Layout analysis method, device, equipment and storage medium - Google Patents

Layout analysis method, device, equipment and storage medium

Info

Publication number
CN116152835A
Authority
CN
China
Prior art keywords
original image
layout analysis
sub
image
analysis result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310097728.XA
Other languages
Chinese (zh)
Inventor
乔美娜
刘珊珊
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310097728.XA
Publication of CN116152835A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Abstract

The disclosure provides a layout analysis method, device, equipment and storage medium. It relates to the technical field of artificial intelligence, in particular to deep learning, image processing and computer vision, and can be applied to scenarios such as OCR. The layout analysis method comprises: acquiring an original image; performing structural analysis processing on the original image to determine the structure type of the original image; if the structure type is a single structure, dividing the original image into a plurality of sub-images and performing layout analysis processing on the sub-images to obtain sub-image layout analysis results; and obtaining a final layout analysis result of the original image according to the sub-image layout analysis results. The accuracy of the layout analysis result is thereby improved.

Description

Layout analysis method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to deep learning, image processing and computer vision, and can be applied to scenarios such as optical character recognition (OCR). More specifically, it relates to a layout analysis method, device, equipment and storage medium.
Background
Optical character recognition (Optical Character Recognition, OCR) technology plays an important role in a variety of scenarios. Layout analysis is a basic step of OCR: it yields the attributes and positions of the elements in an image, which facilitates subsequent processing.
In the related art, layout analysis of a long image typically scales the longest side of the original image to a fixed length, resizes the original image while preserving its aspect ratio, and then performs layout analysis on the resized image.
Disclosure of Invention
The disclosure provides a layout analysis method, a layout analysis device, layout analysis equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a layout analysis method including: acquiring an original image; performing structural analysis processing on an original image to determine the structural type of the original image; if the structure type is a single structure, dividing the original image into a plurality of sub-images, and carrying out layout analysis processing on the sub-images to obtain a sub-image layout analysis result; and obtaining a final layout analysis result of the original image according to the sub-image layout analysis result.
According to another aspect of the present disclosure, there is provided a layout analysis apparatus including: an acquisition module for acquiring an original image; a structure analysis module for performing structural analysis processing on the original image to determine the structure type of the original image; and a first processing module for, if the structure type is a single structure, dividing the original image into a plurality of sub-images, performing layout analysis processing on the sub-images to obtain sub-image layout analysis results, and obtaining a final layout analysis result of the original image according to the sub-image layout analysis results.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above aspects.
According to the technical solution of the present disclosure, the accuracy of the layout analysis result can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic illustration of an original image of different structure types provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an application scenario provided according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the overall architecture of layout analysis provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an electronic device for implementing the layout analysis method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, layout analysis of a long image typically scales the longest side of the original image to a fixed length, resizes the original image while preserving its aspect ratio, and then performs layout analysis on the resized image.
However, resizing may shrink the image significantly, and performing layout analysis on the shrunken image degrades the analysis, resulting in inaccurate layout analysis results.
In order to improve the accuracy of the layout analysis result, the present disclosure provides the following embodiments.
FIG. 1 is a schematic diagram of a first embodiment of the present disclosure. This embodiment provides a layout analysis method, which includes:
101. Acquire an original image.
102. Perform structural analysis processing on the original image to determine the structure type of the original image.
103. If the structure type is a single structure, divide the original image into a plurality of sub-images and perform layout analysis processing on the sub-images to obtain sub-image layout analysis results; then obtain a final layout analysis result of the original image according to the sub-image layout analysis results.
The original image refers to an image to be subjected to layout analysis.
Layout analysis refers to automatically identifying the attributes and regressing the positions of elements such as pictures, tables and paragraphs in an image, so as to obtain the attribute information and the position information of the elements in the image.
Accordingly, the layout analysis result includes the attribute information and the position information of the elements in the image. The attribute information indicates the type of an element, such as a paragraph, a picture or a table. The position information is represented by, for example, (x, y, w, h), where x and y are the coordinates of the upper-left corner of the element, and w and h are the width and height of the element. A sub-image layout analysis result gives the attribute information and position information of the elements in that sub-image; the final layout analysis result gives the attribute information and position information of the elements in the original image.
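The result format described above can be sketched as a small data structure. This is a minimal illustration only: the names `LayoutElement`, `LayoutResult`, `bottom` and `right` are hypothetical and do not come from the disclosure, which specifies only an attribute plus an (x, y, w, h) position.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayoutElement:
    """One detected element: its attribute and its (x, y, w, h) position."""
    attribute: str  # element type, e.g. "paragraph", "picture", "table"
    x: int          # x coordinate of the upper-left corner
    y: int          # y coordinate of the upper-left corner
    w: int          # width of the element
    h: int          # height of the element

    @property
    def bottom(self) -> int:
        # y coordinate of the lower edge (hypothetical helper)
        return self.y + self.h

    @property
    def right(self) -> int:
        # x coordinate of the right edge (hypothetical helper)
        return self.x + self.w


# A layout analysis result is simply a list of elements.
LayoutResult = List[LayoutElement]

example_result: LayoutResult = [
    LayoutElement("paragraph", 40, 30, 520, 180),
    LayoutElement("table", 40, 240, 520, 300),
]
```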
After the final layout analysis result is obtained, elements with different attributes can be identified by adopting different identification modes. For example, a picture recognition model is used to recognize a picture region, a text recognition model is used to recognize a paragraph region, and so on.
The structure types of the original image include: a single structure, or a hybrid structure.
The single structure type further includes: left-right structure, or up-down structure.
The left-right structure means that the image has obvious separators that divide the image content into at least two parts, and all parts have the same height.
The up-down structure means that the image content is laid out from top to bottom, with no separator.
The hybrid structure means that there are both a left-right structure and an up-down structure in the image content.
The separator may specifically be a column separator (column), a layout format that runs vertically from top to bottom and divides the layout into several parts.
Taking the example that the separator is a column, fig. 2 shows an image of a left-right structure, an image of an up-down structure, and an image of a mixed structure, respectively. Rectangular boxes in an image represent elements in the image, which may be paragraphs, tables, pictures, etc., for example.
For an original image of a single structure, the original image can be split into a plurality of sub-images according to its structure type. For example, an original image of the left-right structure is split into a plurality of sub-images in the left-right direction, and an original image of the up-down structure is split into a plurality of sub-images in the up-down direction.
An image has a width and a height; the left-right direction refers to the width direction of the image, and the up-down direction refers to the height direction of the image.
For sub-images split in the left-right direction, every sub-image has the same height, equal to the height of the original image, and the widths of the sub-images sum to the width of the original image.
For sub-images split in the up-down direction, every sub-image has the same width, equal to the width of the original image, and the heights of the sub-images sum to the height of the original image.
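The two size constraints above can be expressed as a simple check. The sketch below is illustrative only; the function name `is_valid_split` and the direction strings are assumptions, not terms from the disclosure.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)


def is_valid_split(original: Size, subs: List[Size], direction: str) -> bool:
    """Check that sub-image sizes satisfy the constraints for the given direction."""
    width, height = original
    if direction == "left-right":
        # Same height as the original; widths add up to the original width.
        same_height = all(h == height for _, h in subs)
        return same_height and sum(w for w, _ in subs) == width
    if direction == "up-down":
        # Same width as the original; heights add up to the original height.
        same_width = all(w == width for w, _ in subs)
        return same_width and sum(h for _, h in subs) == height
    raise ValueError(f"unknown direction: {direction}")
```

For instance, a 1000x400 image split into 300x400 and 700x400 parts passes the left-right check, while parts of 300x400 and 600x400 do not, because 100 pixels of width are unaccounted for.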
After a plurality of sub-images are obtained, carrying out layout analysis processing on each sub-image to obtain a sub-image layout analysis result of each sub-image, and obtaining a final layout analysis result of the original image according to the sub-image layout analysis result.
In this embodiment, the sub-images are obtained by splitting the original image, so each sub-image is smaller than the original image. Performing layout analysis on the sub-images therefore requires no shrinking, or less shrinking, which improves the accuracy of the layout analysis result. In addition, performing structural analysis on the original image and splitting only original images of a single structure ensures that the elements in each sub-image remain complete, which further improves the accuracy of the layout analysis result.
For better understanding of the present disclosure, application scenarios of embodiments of the present disclosure are described.
Fig. 3 is a schematic diagram of an application scenario provided according to an embodiment of the present disclosure.
The user may send the original image to be analyzed to the server 302 through the user terminal 301. The server 302 performs layout analysis on the original image and, after obtaining the layout analysis result, may return it to the user terminal 301, which displays it to the user. The user terminal includes, for example: personal computers (Personal Computer, PCs), mobile devices (e.g., cell phones, tablet computers), notebook computers, wearable devices (e.g., smart watches), and the like. The server may be a local server or a cloud server. The user terminal and the server may communicate over a wired network and/or a wireless network. The above takes layout analysis performed on the server as an example; it is understood that if the user terminal has layout analysis capability, the layout analysis may also be performed locally on the user terminal.
As shown in fig. 4, the overall architecture of layout analysis includes: aiming at an original image, obtaining a final layout analysis result of the original image through structural analysis, segmentation, sub-image layout analysis and sub-image layout analysis result combination; or, obtaining a final layout analysis result of the original image through structural analysis and full-page analysis.
Structural analysis refers to performing structural analysis processing on an original image to obtain the structure type of the original image, where the structure type is either a single structure or a hybrid structure. The single structure is further either a left-right structure or an up-down structure. The hybrid structure includes both a left-right structure and an up-down structure.
If the original image is of a single structure, the original image is split to obtain a plurality of sub-images; layout analysis (sub-image layout analysis) is performed on each sub-image to obtain its sub-image layout analysis result; and the sub-image layout analysis results of the plurality of sub-images are then merged to obtain the final layout analysis result.
If the original image is of a hybrid structure, whole-image layout analysis is performed to obtain the final layout analysis result. In whole-image layout analysis, for example, the longest side of the original image may be scaled to a fixed value, the original image resized while preserving its aspect ratio, and layout analysis then performed on the resized image.
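The two branches of this architecture can be sketched as a small dispatcher. This is a hedged illustration, not the disclosed implementation: `analyze_layout` and the toy `split`, `analyze` and `resize` callables are hypothetical stand-ins for the splitting step, the layout analysis model and the size conversion, and the toy "image" is just a list of labels.

```python
from typing import Callable, List, Sequence


def analyze_layout(
    image: Sequence[str],
    structure_type: str,
    split: Callable[[Sequence[str], str], List[Sequence[str]]],
    analyze: Callable[[Sequence[str]], List[str]],
    resize: Callable[[Sequence[str]], Sequence[str]],
) -> List[str]:
    """Route an image to the split-and-merge branch or the whole-image branch."""
    if structure_type in ("left-right", "up-down"):
        merged: List[str] = []
        for sub_image in split(image, structure_type):
            merged.extend(analyze(sub_image))  # merge by concatenation
        return merged
    if structure_type == "hybrid":
        return analyze(resize(image))          # whole-image layout analysis
    raise ValueError(f"unknown structure type: {structure_type}")


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_split(image, structure_type):
    mid = len(image) // 2
    return [image[:mid], image[mid:]]


def toy_analyze(image):
    return [f"element:{item}" for item in image]


def toy_resize(image):
    return image
```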
In combination with the application scenario, the embodiment of the disclosure further provides a layout analysis method.
Fig. 5 is a schematic diagram of a second embodiment of the present disclosure, where the present embodiment provides a layout analysis method, the method includes:
501. Acquire an original image.
502. Use a pre-trained structure analysis model to perform structural analysis processing on the original image to determine the structure type of the original image.
In this embodiment, the structure types are taken to be the left-right structure, the up-down structure and the hybrid structure as an example.
The structural analysis model may include a feature extraction network whose input is an original image, whose output is an image feature of the original image, and a classification network whose input is an image feature of the original image, whose output is a structural type of the original image.
The backbone network of the feature extraction network may be a convolutional neural network, and the convolutional neural network is used for performing convolutional processing on the original image to obtain the image features of the original image.
The classification network mainly comprises a fully connected layer and is specifically a three-class classifier. It processes the image features of the original image to obtain the structure type of the original image; for example, the left-right structure, the up-down structure and the hybrid structure are represented by 0, 1 and 2, respectively.
During training, some image samples can be collected, each image sample is labeled with the structure type (0, 1, 2), and then the image samples and the labeling information thereof are used for training, so that a structure analysis model can be obtained.
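The classification step can be illustrated with a tiny pure-Python sketch: a single fully connected layer maps an image feature vector to three logits, and the largest logit selects the label 0, 1 or 2. The feature vector stands in for the output of the convolutional feature extraction network, which is omitted here; the weights, bias and function names are made-up values for illustration, not trained parameters or names from the disclosure.

```python
from typing import List, Sequence, Tuple

# Label convention from the text: 0 = left-right, 1 = up-down, 2 = hybrid.
STRUCTURE_TYPES = {0: "left-right", 1: "up-down", 2: "hybrid"}


def fully_connected(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """One fully connected layer: logits[i] = dot(weights[i], features) + bias[i]."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def classify_structure(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[int, str]:
    """Return (class id, structure type) for the largest logit."""
    logits = fully_connected(features, weights, bias)
    class_id = max(range(len(logits)), key=logits.__getitem__)
    return class_id, STRUCTURE_TYPES[class_id]


# Hypothetical parameters for a 2-dimensional feature vector.
toy_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_bias = [0.0, 0.0, 0.2]
```

In practice the two networks would be trained jointly on the labeled image samples; the sketch only shows how the classifier output maps to a structure type.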
503. If the original image is of the left-right structure, split the original image into a plurality of sub-images based on the position information of the column separators in the original image, perform layout analysis processing on the sub-images to obtain sub-image layout analysis results, and merge the sub-image layout analysis results to obtain the final layout analysis result of the original image.
Here, the separator is taken to be a column separator as an example.
A pre-trained column detection model may be used to perform column detection on the original image to determine the position information of the column separators in the original image.
The column detection model may be an object detection model. During training, the column separators in the image samples are annotated as targets, for example with their position information, and the column detection model is trained on the image samples and their column annotations.
When the column detection model is applied to the original image, the positions of the column separators in the original image are detected. Assuming that the width direction of the image is the x-axis direction and the height direction is the y-axis direction, the detected position information of the column separators includes, for example, x1, x2, and so on. Taking x1 as an example, the vertical line x = x1, which is perpendicular to the x-axis, may be used to divide the original image into two parts, each of which is taken as a sub-image.
If n column separators are detected (n is a positive integer), the original image is split into (n+1) sub-images.
The height of each sub-image after slicing is the same for the original image of the left-right structure.
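The column-based split can be sketched as follows, assuming the column detection model has already produced the x coordinates of the separators. The function name `split_by_columns` and the choice to return (x, y, w, h) boxes in original-image coordinates are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in original-image coordinates


def split_by_columns(width: int, height: int, separator_xs: Sequence[int]) -> List[Box]:
    """Cut the image along the vertical lines x = x1, x2, ...

    n separators strictly inside the image give n + 1 sub-images,
    all with the full height of the original image.
    """
    cuts = sorted(x for x in set(separator_xs) if 0 < x < width)
    edges = [0] + cuts + [width]
    return [
        (left, 0, right - left, height)
        for left, right in zip(edges, edges[1:])
    ]
```

With a 1000x400 image and separators at x = 300 and x = 650, the sketch returns three boxes of widths 300, 350 and 350, each 400 high.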
After obtaining the plurality of sub-images, a pre-trained layout analysis model can be adopted for each sub-image, and layout analysis processing can be carried out on each sub-image so as to obtain a sub-image layout analysis result of each sub-image.
The layout analysis model may be an object detection model. During training, the elements in the image samples are annotated as targets, for example with their attribute information and position information, and the layout analysis model is trained on the image samples and their annotations. When the trained layout analysis model is then used for layout analysis, it identifies the attribute information and position information of the elements in the image being processed.
The attribute information indicates the type of an element, such as a picture, a paragraph or a table. The position information indicates the position of the element, represented for example by [x, y, w, h], where (x, y) is the upper-left corner of the element and w and h are the width and height of the element, respectively.
After each sub-graph is processed by adopting the layout analysis model, the attribute information and the position information of the elements in each sub-graph, namely the layout analysis result of the sub-graph, can be obtained.
Different subgraphs can adopt the same layout analysis model, and a plurality of subgraphs can be subjected to layout analysis processing in parallel.
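Parallel processing with one shared model might look like the sketch below. A thread pool is only one possible choice and is an assumption here, as are the names `analyze_sub_images` and `toy_layout_model`; the toy model simply reports one paragraph covering the whole sub-image.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Element = Tuple[str, int, int, int, int]  # (attribute, x, y, w, h)
SubImage = Tuple[str, int, int]           # hypothetical: (name, width, height)


def analyze_sub_images(
    sub_images: Sequence[SubImage],
    layout_model: Callable[[SubImage], List[Element]],
    max_workers: int = 4,
) -> List[List[Element]]:
    """Apply the same layout model to every sub-image in parallel.

    Executor.map returns results in input order, so result i
    belongs to sub-image i.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(layout_model, sub_images))


def toy_layout_model(sub_image: SubImage) -> List[Element]:
    # Stand-in for a trained model: one paragraph spanning the sub-image.
    _, w, h = sub_image
    return [("paragraph", 0, 0, w, h)]
```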
After the sub-image layout analysis results are obtained, the sub-image layout analysis results of the plurality of sub-images can be combined to obtain the layout analysis results of the original images.
Merging means concatenating the sub-image layout analysis results. Taking two sub-images as an example, with a first sub-image layout analysis result and a second sub-image layout analysis result, the merged overall layout analysis result is [first sub-image layout analysis result, second sub-image layout analysis result].
In this embodiment, an original image of the left-right structure is split based on the separators. The elements within each region delimited by the separators are complete, that is, no element spans two regions, so splitting along the separators keeps every element in a sub-image intact. This improves the accuracy of the sub-image layout analysis results and, in turn, the accuracy of the final layout analysis result of the original image.
In addition, because the final layout analysis result is obtained by merging the sub-image layout analysis results, it can be obtained simply and efficiently.
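A merge step can be sketched as a concatenation. The source states only that the sub-image results are combined into one list; shifting each element by its sub-image's offset so that positions refer to the original image is an assumption added here for illustration, and `merge_results` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Element = Tuple[str, int, int, int, int]  # (attribute, x, y, w, h)
Offset = Tuple[int, int]                  # upper-left corner of the sub-image


def merge_results(
    sub_results: Sequence[Tuple[Offset, Sequence[Element]]],
) -> List[Element]:
    """Concatenate sub-image results in order.

    Assumption: each element is given in sub-image coordinates and is
    translated by the sub-image offset into original-image coordinates.
    """
    merged: List[Element] = []
    for (dx, dy), elements in sub_results:
        for attribute, x, y, w, h in elements:
            merged.append((attribute, x + dx, y + dy, w, h))
    return merged
```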
504. If the original image is of the up-down structure, split the original image into a plurality of sub-images based on the position information of target elements in the original image, perform layout analysis processing on the sub-images to obtain sub-image layout analysis results, and merge the sub-image layout analysis results to obtain the final layout analysis result of the original image.
Specifically, for a target area in the original image, the position information of the target element in the target area is detected, where the target area is a partial area of the original image and the target element is the uppermost element in the target area; the original image is then split into a plurality of sub-images according to the position information of the target elements.
For the original images of the upper and lower structures, the target area can be determined in a multi-iteration mode.
Initially, the original image may be divided into upper and lower portions, for example, based on the center line of the original image in the up-down direction, the original image may be divided into an upper portion and a lower portion, and the upper portion may be used as an initial target area.
For the upper part (initial target area), a layout analysis model may be employed to detect attribute information and position information of target elements in the upper part.
After the layout analysis model has analyzed the upper part, the attribute information and position information of the elements in the upper part are available, and from them the attribute information and position information of the uppermost element (i.e., the target element) can be obtained, for example attribute information = paragraph and position information = [x1, y1, w1, h1], where (x1, y1) is the upper-left corner of the target element (the paragraph), and w1 and h1 are its width and height, respectively.
After the position information of the target element is obtained, it can be used to split the original image. Taking the upper-left corner of the original image as the coordinate origin, for example, the original image is cut along the horizontal line y = (y1 + h1), which is perpendicular to the y-axis, and the region containing the target element is taken as a sub-image. The position information of the first sub-image is thus [0, 0, w0, y1 + h1], where (0, 0) is the coordinate origin, i.e., the upper-left corner of the original image, w0 is the width of the original image, y1 is the ordinate of the upper-left corner of the target element, and h1 is the height of the target element.
After the first sub-image is obtained, the rest of the original image, i.e., the part with y > (y1 + h1), is taken as the target area of the next iteration. The target element in that target area is identified in the same way and the region containing it is taken as a sub-image, so that the sub-images are obtained one by one.
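The iteration can be sketched as below. The detector is a hypothetical callable that returns element boxes in original-image coordinates for a horizontal band; in the disclosure this role is played by the layout analysis model. Two details are assumptions of the sketch rather than statements from the source: the loop stops when no element is detected, and any leftover strip at the bottom becomes a final sub-image.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]              # (x, y, w, h), original coordinates
Detector = Callable[[int, int], List[Box]]   # (band_top, band_bottom) -> elements


def split_top_down(width: int, height: int, detect: Detector) -> List[Box]:
    """Cut below the uppermost element of each target area, one cut per iteration."""
    sub_images: List[Box] = []
    top = 0
    band_bottom = height // 2  # initial target area: the upper half
    while top < height:
        elements = detect(top, band_bottom)
        if not elements:
            break
        _, y1, _, h1 = min(elements, key=lambda box: box[1])  # uppermost element
        cut = min(y1 + h1, height)      # cut along the line y = y1 + h1
        if cut <= top:                  # guard against a detector that makes no progress
            break
        sub_images.append((0, top, width, cut - top))
        top = cut
        band_bottom = height            # next target area: everything below the cut
    if top < height:
        sub_images.append((0, top, width, height - top))  # assumed: keep the remainder
    return sub_images


def make_toy_detector(elements: Sequence[Box]) -> Detector:
    """Toy detector: returns the known elements lying fully inside the band."""
    def detect(band_top: int, band_bottom: int) -> List[Box]:
        return [b for b in elements if b[1] >= band_top and b[1] + b[3] <= band_bottom]
    return detect
```

The first sub-image produced this way is [0, 0, w0, y1 + h1], matching the position information given above.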
For each sub-graph, a pre-trained layout analysis model can be adopted to carry out layout analysis processing on each sub-graph so as to obtain a sub-graph layout analysis result of each sub-graph.
After the sub-image layout analysis results are obtained, the sub-image layout analysis results of the plurality of sub-images can be combined to obtain the layout analysis results of the original images.
For details of the sub-image layout analysis and the merging of the sub-image layout analysis results, refer to the corresponding description for the left-right structure.
In this embodiment, an original image of the up-down structure is split based on the target elements. Because each target element is complete, splitting at the target elements keeps the elements in each sub-image intact, which improves the accuracy of the sub-image layout analysis results and, in turn, the accuracy of the final layout analysis result of the original image.
In addition, because the final layout analysis result is obtained by merging the sub-image layout analysis results, it can be obtained simply and efficiently.
505. If the original image is of the hybrid structure, perform whole-image layout analysis on the original image to obtain the layout analysis result of the original image.
Whole-image layout analysis means that the original image is not split and layout analysis is performed on the original image as a whole.
If the longest side of the original image is larger than a fixed value, scaling the longest side to a fixed length, performing size conversion (resize) on the original image on the premise of keeping the proportion of the original image, and performing layout analysis on the image after size conversion.
And carrying out layout analysis processing on the size-converted image by adopting a layout analysis model.
Size conversion compresses an original image whose longest side exceeds the fixed value, and subsequent processing operates on the compressed image. Processing a compressed image generally gives worse results than processing the original image (a sub-image obtained by splitting is a part of the original image and can be regarded as original image content). In particular, layout analysis performed on the compressed image is less accurate than layout analysis performed on the original image.
In addition, if the longest side of a sub-image is larger than the fixed value required by the layout analysis model, the size conversion described above may also be applied to the sub-image, and layout analysis is then performed on the converted image, whose longest side has been compressed to the fixed value.
It will be appreciated that although it is also possible to compress the sub-image, since the sub-image is smaller in size than the original image, the compression degree can be reduced compared to directly compressing the original image, thereby improving the accuracy of layout analysis.
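The effect of splitting on the compression ratio can be checked with a short calculation. The sketch below computes the target size when the longest side is scaled to a fixed length with the aspect ratio preserved; `resize_longest_side` and the example sizes are illustrative assumptions, and no actual pixel resampling is performed.

```python
from typing import Tuple


def resize_longest_side(width: int, height: int, fixed_length: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, scale) for aspect-preserving size conversion.

    Images whose longest side does not exceed fixed_length are left unchanged.
    """
    longest = max(width, height)
    if longest <= fixed_length:
        return width, height, 1.0
    scale = fixed_length / longest
    return round(width * scale), round(height * scale), scale
```

For example, with a fixed length of 1000, an 800x4000 long image is scaled by 0.25 to 200x1000, whereas an 800x2000 sub-image cut from it is scaled by only 0.5, and an 800x1000 sub-image is not scaled at all.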
In this embodiment, whole-image layout analysis is performed on original images of the hybrid structure, so that original images of all structure types can be processed and a corresponding final layout analysis result obtained for each, which improves the overall coverage of the final layout analysis results.
Fig. 6 is a schematic diagram of a third embodiment of the present disclosure, where a layout analysis apparatus 600 includes: an acquisition module 601, a structure analysis module 602 and a first processing module 603.
The acquisition module 601 is used for acquiring an original image; the structure analysis module 602 is configured to perform structure analysis processing on an original image to determine a structure type of the original image; the first processing module 603 is configured to segment the original image into a plurality of sub-images if the structure type is a single structure, and perform layout analysis processing on the sub-images to obtain a sub-image layout analysis result; and obtaining a final layout analysis result of the original image according to the sub-image layout analysis result.
In this embodiment, the sub-images are obtained by splitting the original image, so each sub-image is smaller than the original image. When layout analysis is performed on a sub-image, the sub-image either does not need to be reduced or needs to be reduced to a lesser degree, which improves the accuracy of the layout analysis result. In addition, by performing structure analysis processing on the original image and splitting only an original image with a single structure, the integrity of the elements in each sub-image can be ensured, which further improves the accuracy of the layout analysis result.
In some embodiments, the single structure includes a left-right structure, and the original image of the left-right structure includes a separator. The first processing module is further configured to: detect position information of the separator in the original image; and divide the original image into a plurality of sub-images according to the position information of the separator, wherein the plurality of sub-images have the same height.
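The work of the first processing module can be sketched as three steps: split, analyze each sub-image, and combine. The sketch below passes each step in as a callable; the names `split`, `analyze` and `combine` are assumptions, and the string-based usage is a toy stand-in for real images and layout results.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")
Result = TypeVar("Result")
Final = TypeVar("Final")


def process_single_structure(
    image: Image,
    split: Callable[[Image], List[Image]],     # hypothetical splitting step
    analyze: Callable[[Image], Result],        # hypothetical layout analysis model
    combine: Callable[[List[Result]], Final],  # hypothetical combining step
) -> Final:
    """Split the image, analyze every sub-image, and combine the results."""
    sub_images = split(image)
    sub_results = [analyze(sub_image) for sub_image in sub_images]
    return combine(sub_results)


# Toy usage: a string stands in for an image, "|" for a separator.
final = process_single_structure(
    "AB|CD",
    split=lambda s: s.split("|"),
    analyze=str.lower,
    combine="".join,
)
print(final)
```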
In this embodiment, an original image with a left-right structure is split based on the separator. The elements within each region delimited by the separator are relatively complete, that is, no single element lies in two different regions. Splitting the original image at the separator therefore preserves the integrity of the elements in each sub-image, which improves the accuracy of the sub-image layout analysis result and, in turn, the accuracy of the final layout analysis result of the original image.
In some embodiments, the single structure includes an up-down structure. The first processing module is further configured to: detect position information of a target element in a target area in the original image, wherein the target area is a partial area of the original image and the target element is the uppermost element in the target area; and divide the original image into a plurality of sub-images according to the position information of the target element, wherein the plurality of sub-images have the same width.
In this embodiment, an original image with an up-down structure is split based on the target element. Because the target element is complete, splitting the original image based on the target element preserves the integrity of the elements in each sub-image, which improves the accuracy of the sub-image layout analysis result and, in turn, the accuracy of the final layout analysis result of the original image.
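A minimal sketch of separator-based splitting follows. It assumes that the separator positions have already been detected and are given as x coordinates in pixels; the function name `split_left_right` and the (left, top, right, bottom) box convention are illustrative choices, not part of the disclosure.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_left_right(width: int, height: int, separator_xs: Sequence[int]) -> List[Box]:
    """Return one crop box per column; every box spans the full image height."""
    # Keep only separators strictly inside the image, sorted left to right.
    cuts = [0] + sorted(x for x in separator_xs if 0 < x < width) + [width]
    # Pair consecutive cut positions and drop zero-width columns.
    return [(left, 0, right, height) for left, right in zip(cuts, cuts[1:]) if right > left]


# Hypothetical two-column page, 2000x3000 pixels, separator detected at x=1000.
print(split_left_right(2000, 3000, [1000]))
```

Each returned box keeps the original height, matching the statement that the sub-images have the same height.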
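One way to read the up-down split is sketched below. It assumes that element bounding boxes are already available, that the target area is a vertical band given as (top, bottom), and that the cut is made at the top edge of the uppermost element in that band. All three points, and the name `split_up_down`, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_up_down(
    width: int,
    height: int,
    element_boxes: Sequence[Box],
    target_area: Tuple[int, int],  # assumed vertical band (top, bottom)
) -> List[Box]:
    """Cut the image horizontally at the top edge of the uppermost element
    whose top lies inside the target area; all boxes keep the full width."""
    area_top, area_bottom = target_area
    tops = [box[1] for box in element_boxes if area_top <= box[1] < area_bottom]
    if not tops or min(tops) <= 0:
        return [(0, 0, width, height)]  # nothing to cut on
    cut_y = min(tops)
    return [(0, 0, width, cut_y), (0, cut_y, width, height)]


# Hypothetical 1000x3000 page; the target area is the lower half of the page.
elements = [(100, 200, 900, 1400), (100, 1600, 900, 2100), (100, 2300, 900, 2900)]
print(split_up_down(1000, 3000, elements, (1500, 3000)))
```

Because the cut falls on an element boundary rather than through an element, no element is divided between the two sub-images.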
In some embodiments, the first processing module 603 is further configured to: and merging the sub-image layout analysis results to obtain the final layout analysis result.
Because the final layout analysis result is obtained by merging the sub-image layout analysis results, it can be obtained simply and efficiently.
Fig. 7 is a schematic diagram of a fourth embodiment of the present disclosure. This embodiment provides a layout analysis apparatus 700 that differs from the previous embodiment in that, in addition to an acquisition module 701, a structure analysis module 702 and a first processing module 703, it further includes a second processing module 704.
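The disclosure states only that the sub-image results are merged. The sketch below shows one plausible merge, assuming each sub-image result is a list of labeled boxes in sub-image coordinates and that merging shifts them back by the offset of the sub-image within the original image. The data shapes and the name `merge_results` are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Region = Tuple[str, Box]         # (category label, bounding box)


def merge_results(sub_results: Sequence[Tuple[Box, Sequence[Region]]]) -> List[Region]:
    """Map every region from sub-image coordinates back to original-image
    coordinates and collect all regions into one list."""
    merged: List[Region] = []
    for (offset_x, offset_y, _, _), regions in sub_results:
        for label, (left, top, right, bottom) in regions:
            merged.append(
                (label, (left + offset_x, top + offset_y, right + offset_x, bottom + offset_y))
            )
    return merged


# Hypothetical results for two columns of a 2000x3000 page.
left_column = ((0, 0, 1000, 3000), [("title", (50, 40, 950, 120))])
right_column = ((1000, 0, 2000, 3000), [("table", (30, 500, 970, 1500))])
print(merge_results([left_column, right_column]))
```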
For details of the acquisition module 701, the structure analysis module 702 and the first processing module 703, reference may be made to the previous embodiment.
The second processing module 704 is configured to perform whole-image layout analysis processing on the original image if the structure type is a hybrid structure, so as to obtain a final layout analysis result of the original image.
In this embodiment, whole-image layout analysis processing is performed on an original image with a hybrid structure, so that original images of different structure types can all be processed, a corresponding final layout analysis result can be obtained for each structure type, and the overall coverage of the final layout analysis results is improved.
It is to be understood that in the embodiments of the disclosure, the same or similar content in different embodiments may be referred to each other.
It can be understood that "first", "second", etc. in the embodiments of the present disclosure are used only for distinction and do not indicate importance, chronological order, or the like.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. Electronic device 800 may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the methods and processes described above, for example, the layout analysis method. For example, in some embodiments, the layout analysis method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the layout analysis method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the layout analysis method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A layout analysis method, comprising:
acquiring an original image;
performing structure analysis processing on the original image to determine the structure type of the original image;
if the structure type is a single structure, dividing the original image into a plurality of sub-images, and carrying out layout analysis processing on the sub-images to obtain a sub-image layout analysis result; and obtaining a final layout analysis result of the original image according to the sub-image layout analysis result.
2. The method of claim 1, wherein,
the single structure includes: a left-right structure;
the original image of the left-right structure comprises a separator;
the splitting the original image into a plurality of sub-images includes:
detecting position information of the separator in the original image;
and dividing the original image into a plurality of sub-images according to the position information of the separator, wherein the plurality of sub-images have the same height.
3. The method of claim 1, wherein,
the single structure includes: an up-down structure;
the splitting the original image into a plurality of sub-images includes:
detecting position information of a target element in a target area in the original image, wherein the target area is a partial area of the original image, and the target element is the uppermost element in the target area;
and dividing the original image into a plurality of sub-images according to the position information of the target element, wherein the plurality of sub-images have the same width.
4. A method according to any one of claims 1-3, wherein said obtaining a final layout analysis result of said original image from said sub-image layout analysis result comprises:
and merging the sub-image layout analysis results to obtain the final layout analysis result.
5. A method according to any one of claims 1-3, further comprising:
and if the structure type is a hybrid structure, carrying out whole-image layout analysis processing on the original image so as to obtain a final layout analysis result of the original image.
6. A layout analysis apparatus comprising:
the acquisition module is used for acquiring an original image;
the structure analysis module is used for carrying out structure analysis processing on the original image so as to determine the structure type of the original image;
the first processing module is used for dividing the original image into a plurality of sub-images if the structure type is a single structure, and carrying out layout analysis processing on the sub-images to obtain a sub-image layout analysis result; and obtaining a final layout analysis result of the original image according to the sub-image layout analysis result.
7. The apparatus of claim 6, wherein,
the single structure includes: a left-right structure;
the original image of the left-right structure comprises a separator;
the first processing module is further configured to:
detecting position information of the separator in the original image;
and dividing the original image into a plurality of sub-images according to the position information of the separator, wherein the plurality of sub-images have the same height.
8. The apparatus of claim 6, wherein,
the single structure includes: an up-down structure;
the first processing module is further configured to:
detecting position information of a target element in a target area in the original image, wherein the target area is a partial area of the original image, and the target element is the uppermost element in the target area;
and dividing the original image into a plurality of sub-images according to the position information of the target element, wherein the plurality of sub-images have the same width.
9. The apparatus of any of claims 6-8, wherein the first processing module is further to:
and merging the sub-image layout analysis results to obtain the final layout analysis result.
10. The apparatus of any of claims 6-8, further comprising:
and the second processing module is used for carrying out whole-image layout analysis processing on the original image if the structure type is a hybrid structure so as to obtain a final layout analysis result of the original image.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-5.
CN202310097728.XA 2023-01-19 2023-01-19 Layout analysis method, device, equipment and storage medium Pending CN116152835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310097728.XA CN116152835A (en) 2023-01-19 2023-01-19 Layout analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310097728.XA CN116152835A (en) 2023-01-19 2023-01-19 Layout analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116152835A true CN116152835A (en) 2023-05-23

Family

ID=86373140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310097728.XA Pending CN116152835A (en) 2023-01-19 2023-01-19 Layout analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116152835A (en)

Similar Documents

Publication Publication Date Title
US11861919B2 (en) Text recognition method and device, and electronic device
CN114550177B (en) Image processing method, text recognition method and device
US11810319B2 (en) Image detection method, device, storage medium and computer program product
CN113780098B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
JP7393472B2 (en) Display scene recognition method, device, electronic device, storage medium and computer program
US20220027661A1 (en) Method and apparatus of processing image, electronic device, and storage medium
CN113627439A (en) Text structuring method, processing device, electronic device and storage medium
US20230045715A1 (en) Text detection method, text recognition method and apparatus
CN116844177A (en) Table identification method, apparatus, device and storage medium
CN113378712A (en) Training method of object detection model, image detection method and device thereof
US11881044B2 (en) Method and apparatus for processing image, device and storage medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN113361371B (en) Road extraction method, device, equipment and storage medium
CN116152835A (en) Layout analysis method, device, equipment and storage medium
CN114842489A (en) Table analysis method and device
CN114511863A (en) Table structure extraction method and device, electronic equipment and storage medium
CN113435257A (en) Method, device and equipment for identifying form image and storage medium
CN113378958A (en) Automatic labeling method, device, equipment, storage medium and computer program product
CN113313125A (en) Image processing method and device, electronic equipment and computer readable medium
CN116259064B (en) Table structure identification method, training method and training device for table structure identification model
CN115497113B (en) Information generation method, device, electronic equipment and storage medium
CN116541549B (en) Subgraph segmentation method, subgraph segmentation device, electronic equipment and computer readable storage medium
CN116259064A (en) Table structure identification method, training method and training device for table structure identification model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination