CN115909356A - Method and device for determining paragraph of digital document, electronic equipment and storage medium - Google Patents

Method and device for determining paragraph of digital document, electronic equipment and storage medium

Info

Publication number
CN115909356A
Authority
CN
China
Prior art keywords
digital document
detection
paragraph
determining
coordinate information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211736986.XA
Other languages
Chinese (zh)
Inventor
罗骁
徐天适
田丰
王晓亮
黄宇恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GRG Banking IT Co Ltd
Original Assignee
GRG Banking Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GRG Banking Equipment Co Ltd
Priority to CN202211736986.XA
Publication of CN115909356A
Priority to PCT/CN2023/137045 (WO2024140094A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a method and device for determining paragraphs of a digital document, an electronic device and a storage medium, and belongs to the field of character recognition. The method comprises: acquiring coordinate information of a plurality of first detection frames of the digital document; inputting the coordinate information of the plurality of first detection frames into a target detection model, and obtaining a first connection relation between edge points of the plurality of first detection frames output by the target detection model; and determining a first text paragraph outline of the digital document based on the first connection relation; wherein the target detection model is obtained by training based on sample coordinate information and the corresponding sample connection relation. The method addresses the difficulty of restoring the layout when the digital document is inclined or deformed, can improve the accuracy of determining document paragraph outlines, effectively avoids recognition stalls and information loss in optical character recognition, and improves the accuracy of optical character recognition.

Description

Method and device for determining paragraph of digital document, electronic equipment and storage medium
Technical Field
The present application relates to the field of text recognition, and in particular, to a method and an apparatus for determining paragraphs of a digital document, an electronic device, and a storage medium.
Background
In government affairs, auditing and similar scenarios, large numbers of digital documents (images, PDF files and the like) often need to be processed. Reading and organizing these documents manually is time-consuming and labor-intensive, so documents are now generally scanned, or parsed and analyzed, by optical character recognition (OCR).
Optical character recognition is the process of using electronic devices to examine characters printed on paper and then translating their shapes into computer text by means of character recognition methods.
Common digital documents may be inclined, deformed or have complex typesetting, so that document paragraph outlines cannot be accurately divided. As a result, recognition stalls and information loss occur during optical character recognition and parsing, and the accuracy of optical character recognition is reduced.
Disclosure of Invention
The present application is directed to solving at least one of the problems in the prior art. Therefore, the application provides a paragraph determining method and device of a digital document, an electronic device and a storage medium, and the accuracy of determining the outline of the document paragraph is improved.
In a first aspect, the present application provides a method for determining paragraphs of a digital document, the method comprising:
acquiring coordinate information of a plurality of first detection frames of the digital document;
inputting the coordinate information of the plurality of first detection frames into a target detection model, and obtaining a first connection relation among edge points of the plurality of first detection frames output by the target detection model;
determining a first text paragraph outline of the digital document based on the first connection relationship;
the target detection model is obtained by training based on sample coordinate information and a corresponding sample connection relation.
According to the method for determining paragraphs of a digital document, the coordinate information of the plurality of first detection frames of the digital document is acquired, and the connection relation learned by the target detection model through deep learning is used to judge whether the edge points of the plurality of first detection frames can be connected into one paragraph, so that paragraph outlines are accurately divided. This addresses the difficulty of restoring the layout of a digital document when the document is inclined or deformed, can improve the accuracy of determining document paragraph outlines, effectively avoids recognition stalls and information loss in optical character recognition, and improves the accuracy of optical character recognition.
According to an embodiment of the application, the acquiring coordinate information of a plurality of first detection boxes of a digital document includes:
determining the plurality of first detection boxes of the digital document;
acquiring corner coordinates of the plurality of first detection frames;
based on the coordinates of the corner points of the first detection frames, carrying out average interpolation between the corner points of the same first detection frame to obtain edge points of the first detection frames;
determining coordinate information of the plurality of first detection frames based on the edge points of the plurality of first detection frames.
According to an embodiment of the present application, the determining the plurality of first detection boxes of the digital document includes:
acquiring a plurality of second detection frames of the digital document according to an optical character recognition algorithm;
and fusing the plurality of second detection frames along a first direction to obtain the plurality of first detection frames, and fusing at least one second detection frame positioned in the same line into one first detection frame, wherein the first direction is the text line direction of the digital document.
According to an embodiment of the present application, the inputting the coordinate information of the plurality of first detection frames into a target detection model, and obtaining a first connection relation between edge points of the plurality of first detection frames output by the target detection model, includes:
inputting the coordinate information of the plurality of first detection frames into a graph convolution neural network of the target detection model, and obtaining the coordinate characteristics of the plurality of first detection frames output by the graph convolution neural network;
and inputting the coordinate characteristics of the plurality of first detection frames into a graph convolution self-encoder of the target detection model to obtain the first connection relation output by the graph convolution self-encoder.
According to one embodiment of the application, the coordinate characteristics of the first detection frame represent characteristic information of a first coordinate point of the first detection frame and characteristic information of a second coordinate point adjacent to the first coordinate point.
According to an embodiment of the present application, the determining a first text paragraph outline of the digital document based on the first connection relation includes:
determining a plurality of first text paragraph edges of the digital document based on the first connection relationship;
and connecting the plurality of first text paragraph edges, and removing the first text paragraph edges which cannot be closed to obtain the first text paragraph outline.
In a second aspect, the present application provides an apparatus for determining paragraphs of a digital document, the apparatus comprising:
the acquisition module is used for acquiring coordinate information of a plurality of first detection frames of the digital document;
the first processing module is used for inputting the coordinate information of the plurality of first detection frames into a target detection model and obtaining a first connection relation among the edge points of the plurality of first detection frames output by the target detection model;
a second processing module, configured to determine a first text paragraph outline of the digital document based on the first connection relationship;
the target detection model is obtained by training based on sample coordinate information and a corresponding sample connection relation.
According to the device for determining paragraphs of a digital document, the coordinate information of the plurality of first detection frames of the digital document is acquired, and the connection relation learned by the target detection model through deep learning is used to judge whether the edge points of the plurality of first detection frames can be connected into one paragraph, so that paragraph outlines are accurately divided. This addresses the difficulty of restoring the layout of a digital document when the document is inclined or deformed, can improve the accuracy of determining document paragraph outlines, effectively avoids recognition stalls and information loss in optical character recognition, and improves the accuracy of optical character recognition.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for determining paragraphs of a digital document according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for paragraph determination of a digital document as described in the first aspect above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method for paragraph determination of a digital document as described in the first aspect above.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a method for determining paragraphs of a digital document according to an embodiment of the present application;
fig. 2 is a second schematic flowchart of a method for determining paragraphs of a digital document according to an embodiment of the present application;
fig. 3 is a schematic flowchart of generating a first detection frame according to an embodiment of the present application;
fig. 4 is a schematic flowchart of determining a first text paragraph outline according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating determination of connection relations of edge points in training of a target detection model according to an embodiment of the present application;
fig. 6 is a schematic flowchart of training a target detection model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a device for determining paragraphs of a digital document according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and in the claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It will be appreciated that terms so used are interchangeable under appropriate circumstances, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one kind, and the number of objects is not limited; for example, a first object can be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The method for determining a paragraph of a digital document, the apparatus for determining a paragraph of a digital document, the electronic device, and the readable storage medium provided in the embodiments of the present application are described in detail below with reference to fig. 1 to 8 through specific embodiments and application scenarios thereof.
As shown in fig. 1, the paragraph determination method of a digital document includes steps 110 to 130.
Step 110, coordinate information of a plurality of first detection frames of the digital document is obtained.
The digital document refers to an electronic document in which information data is processed by a digitizing technique such as paper scanning, image optimization, compression filing, and the like.
For example, the digital document may be an electronic document such as an image and a PDF file.
The coordinate information of the first detection frame is used for describing the position of the first detection frame in the space.
As shown in fig. 3, the plurality of first detection frames may be whole-line detection frames obtained by performing optical character recognition on the digital document and processing the result.
The coordinate information of the first detection frame includes coordinate information of a plurality of edge points on the first detection frame.
Step 120, inputting the coordinate information of the plurality of first detection frames into the target detection model, and obtaining a first connection relation between the edge points of the plurality of first detection frames output by the target detection model.
The target detection model is obtained by training based on the sample coordinate information and the corresponding sample connection relation.
The sample coordinate information includes coordinate information of sample coordinate points, and the sample connection relationship is a connection relationship between the sample coordinate points.
In this step, the target detection model is configured to process the coordinate information of the first detection frames and to output the first connection relation between the edge points of the plurality of first detection frames; the first connection relation describes the relation between the edge points of the first detection frames.
The target detection model is a detection model trained based on supervised deep learning.
In the training stage, an input object and the expected output corresponding to that input object are preset for the target detection model; the input object is then fed into the target detection model, the resulting output is compared with the expected output, and the model is updated by back-propagation.
In this step, the input object is the sample coordinate information, and the expected output is the sample connection relation.
The target detection model processes the coordinate information of the first detection frames through deep learning, in which low-level features are combined to form more abstract high-level features that represent attribute categories or features.
Step 130, determining a first text paragraph outline of the digital document based on the first connection relation.
In this step, the first connection relation is a connection relation between a plurality of edge points in the first detection frame, and the first text paragraph outline is a boundary line of a text paragraph or an image of the digital document.
As shown in fig. 4, after the first connection relation is determined, the target detection model may aggregate the edge points of the first detection frames to form a paragraph outline, thereby extracting the text paragraph outline.
In the related art, layout restoration techniques are usually used to detect and divide document paragraphs. However, when a document is inclined, deformed or has complex typesetting, the efficiency and accuracy of paragraph detection and division are low. Redundant information such as illustrations, decoration and shading also interferes with paragraph detection and division, which in turn causes problems such as recognition stalls and semantic loss in optical character recognition and parsing, and lowers the accuracy of optical character recognition.
In the embodiments of the present application, whether the edge points of the plurality of first detection frames can be connected into one paragraph is judged from the coordinate information of the plurality of first detection frames of the digital document in space and the connection relation learned by the target detection model through deep learning, so that paragraph outlines are accurately divided.
According to the method for determining paragraphs of a digital document, the coordinate information of the plurality of first detection frames of the digital document is acquired, and the connection relation learned by the target detection model through deep learning is used to judge whether the edge points of the plurality of first detection frames can be connected into one paragraph, so that paragraph outlines are accurately divided. This addresses the difficulty of restoring the layout of a digital document when the document is inclined or deformed, can improve the accuracy of determining document paragraph outlines, effectively avoids recognition stalls and information loss in optical character recognition, and improves the accuracy of optical character recognition.
In some embodiments, obtaining coordinate information of a plurality of first detection boxes of a digital document comprises:
determining a plurality of first detection boxes of the digital document;
acquiring corner coordinates of a plurality of first detection frames;
based on the coordinates of the corner points of the first detection frames, carrying out average interpolation between the corner points of the same first detection frame to obtain edge points of the first detection frames;
and determining coordinate information of the plurality of first detection frames based on the edge points of the plurality of first detection frames.
The corner coordinates are used for describing position information of corners of the first detection frame in space, wherein the number of the acquired corner coordinates is determined according to the shape of the first detection frame.
For example, when the shape of the first detection frame is a triangle, the number of corner points of the first detection frame is three; when the shape of the first detection frame is pentagonal, the number of corner points of the first detection frame is five, and the shape of the first detection frame is not limited here.
Based on the corner coordinates of the plurality of first detection frames, average interpolation is carried out between the corner points of the same first detection frame to obtain the edge points of each first detection frame.
It should be noted that connecting the corner points of a first detection frame may yield either its edge lines or its internal diagonals; average interpolation is performed only along the edge lines of the first detection frame to obtain the plurality of edge points.
In this step, interpolating between corner points yields a plurality of edge points, which increases the amount of coordinate data describing the first detection frame, exposes more features in the coordinate information, and allows a more accurate connection relation to be obtained.
Average interpolation between two corner points is carried out according to the coordinate distance between the corner points and on the principle that every two adjacent edge points are separated by the same distance, which ensures that the spacing between adjacent edge points is fixed.
Because the edge points obtained by average interpolation are equally spaced, using them as the input of the target detection model makes the output of the target detection model more accurate and its processing more efficient.
For example, assuming that the points a and b are corner points and the distance between the corner point a and the corner point b is 4, 3 edge points can be interpolated in the middle by using an average interpolation method, and the distances between the adjacent edge points are all 1.
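The average interpolation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `spacing` parameter, and the choice to keep the corner points themselves in the returned list are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_edge_points(a: Point, b: Point, spacing: float) -> List[Point]:
    """Return evenly spaced points strictly between corner a and corner b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = (dx * dx + dy * dy) ** 0.5
    # Number of equal segments the side is split into (at least one).
    segments = max(1, round(length / spacing))
    return [
        (a[0] + dx * k / segments, a[1] + dy * k / segments)
        for k in range(1, segments)
    ]


def box_edge_points(corners: List[Point], spacing: float) -> List[Point]:
    """Walk the corners in order and interpolate along each side only.

    Diagonals are never used, because only consecutive corners are paired.
    Keeping the corners in the result is an assumption of this sketch.
    """
    points: List[Point] = []
    for i, corner in enumerate(corners):
        nxt = corners[(i + 1) % len(corners)]
        points.append(corner)
        points.extend(interpolate_edge_points(corner, nxt, spacing))
    return points


# Corners a and b at distance 4 with a spacing of 1 give 3 interior edge points.
print(interpolate_edge_points((0.0, 0.0), (4.0, 0.0), 1.0))
```

With corners at distance 4 and a spacing of 1, the sketch reproduces the example in the text: three interior points, each 1 apart.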
In actual implementation, when coordinate information of the first detection frame is acquired, the first detection frame needs to be acquired first.
In some embodiments, determining a plurality of first detection boxes for a digital document comprises:
acquiring a plurality of second detection frames of the digital document according to an optical character recognition algorithm;
and fusing the plurality of second detection frames along the first direction to obtain the plurality of first detection frames, wherein at least one second detection frame located in the same line is fused into one first detection frame.
As shown in fig. 3, the second detection frames may be a plurality of small rectangles obtained by recognizing the digital document with the optical character recognition algorithm.
In this embodiment, the plurality of first detection frames are obtained by fusing the plurality of second detection frames along a first direction based on the coordinate information of the second detection frames, wherein the first direction is a text line direction of the digital document.
For example, when the text lines are horizontal, the first direction is horizontal; when the text lines are vertical, the first direction is vertical.
If a line contains only one second detection frame, that second detection frame is the first detection frame; if a line contains a plurality of second detection frames, they are fused into one first detection frame. When fusing the second detection frames, the size of the region of interest of each second detection frame in the second-direction coordinate needs to be calculated along the first direction, and a fusion threshold is set.
Detection frames whose region of interest in the second-direction coordinate is larger than the fusion threshold are regarded as belonging to the same line, and these second detection frames are fused along the first direction to obtain the first detection frame.
The fusion threshold can be adjusted to suit the situation. When the font of the recognized text is large, the fusion threshold can be increased appropriately, to avoid the region of interest being too short in the second direction to contain the characters of the text line, which would cause information loss.
When the font of the recognized text is small, the fusion threshold can be reduced appropriately, to avoid the region of interest being too long in the second direction and containing information from more than one text line, which would cause character recognition errors.
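One possible reading of the fusion step is sketched below. The text does not specify how the region of interest is computed, so this example substitutes a relative overlap ratio along the second direction and compares it with the fusion threshold. The axis-aligned box layout `(x0, y0, x1, y1)`, the horizontal text direction, the function names and the default threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned (assumed)


def vertical_overlap_ratio(a: Box, b: Box) -> float:
    """Overlap of two boxes along the second (vertical) direction, relative to the shorter box."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0


def fuse_into_lines(boxes: List[Box], fusion_threshold: float = 0.5) -> List[Box]:
    """Fuse small OCR boxes lying on the same horizontal text line into one line-level box."""
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, line in enumerate(lines):
            if vertical_overlap_ratio(line, box) > fusion_threshold:
                # Same line: grow the line-level box to cover the new box.
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            # No existing line overlaps enough: the box starts a new line.
            lines.append(box)
    return lines


second_frames = [(0, 0, 10, 10), (12, 1, 20, 11), (0, 20, 8, 30)]
print(fuse_into_lines(second_frames))
```

In this sketch, a line containing a single second detection frame passes through unchanged, which matches the case described in the text where that frame is itself the first detection frame.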
In some embodiments, inputting the coordinate information of the plurality of first detection frames to the object detection model, and obtaining the first connection relationship between the edge points of the plurality of first detection frames output by the object detection model, includes:
inputting the coordinate information of the first detection frames into a graph convolution neural network of the target detection model to obtain the coordinate characteristics of the first detection frames output by the graph convolution neural network;
and inputting the coordinate characteristics of the first detection frames into a graph convolution self-encoder of the target detection model to obtain a first connection relation output by the graph convolution self-encoder.
The target detection model comprises a graph convolution neural network used for processing edge point coordinate information in the first detection frame and a graph convolution self-encoder used for processing coordinate characteristics of the first detection frame.
In actual implementation, the input of the object detection model includes coordinate information of a plurality of edge points in the first detection box.
When the graph convolution neural network processes the coordinate information of the first detection frame, all adjacent edge points may be taken into consideration, and the mutual distances, positions and other information of all edge points are integrated, so as to determine the most probable connection relationship, where the feature and state of each individual edge point are affected by the adjacent edge points, even the adjacent edge points of the adjacent edge points.
In this embodiment, the graph convolution neural network is configured to process the coordinate information of the first detection frames and output the coordinate characteristics of the first detection frames, and the graph convolution self-encoder is configured to process the coordinate characteristics output by the graph convolution neural network and output the first connection relation between the edge points of the plurality of first detection frames.
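To illustrate how the feature of an edge point can be influenced by its adjacent edge points, and by their adjacent edge points in turn, the following sketch applies a simplified graph convolution step in plain Python. It uses mean aggregation over each node and its neighbors, which is a simplification of the normalized adjacency used in typical graph convolutional networks. The layer shape, the function name `gcn_layer` and the toy graph are assumptions, not details taken from the application.

```python
from typing import Dict, List


def gcn_layer(features: Dict[int, List[float]],
              neighbors: Dict[int, List[int]],
              weight: List[List[float]]) -> Dict[int, List[float]]:
    """One simplified graph-convolution step.

    Each node averages its own feature with its neighbors' features,
    applies a linear map (weight has shape in_dim x out_dim), then ReLU.
    """
    out: Dict[int, List[float]] = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbors.get(node, [])]
        mean = [sum(col) / len(group) for col in zip(*group)]
        projected = [sum(m * w for m, w in zip(mean, column)) for column in zip(*weight)]
        out[node] = [max(0.0, v) for v in projected]
    return out


# Three edge points on a line; point 2 is adjacent to both 1 and 3.
coords = {1: [0.0, 0.0], 2: [2.0, 0.0], 3: [4.0, 0.0]}
adjacency = {1: [2], 2: [1, 3], 3: [2]}
identity = [[1.0, 0.0], [0.0, 1.0]]

h1 = gcn_layer(coords, adjacency, identity)   # 1-hop information
h2 = gcn_layer(h1, adjacency, identity)       # 2-hop information
print(h1[1], h2[1])
```

After two stacked steps, the feature of edge point 1 depends on edge point 3 even though the two are not directly adjacent, which mirrors the statement that each edge point is affected by the adjacent points of its adjacent points.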
In actual implementation, before processing the coordinate information of the first detection frame, the target detection model needs to be pre-trained.
An embodiment of training is described below.
First, the target detection model is trained using a pre-labeled data set (G, V) of edge point coordinates and edge connection relations, where G is the edge point information and V is the edge connection relation information.
As shown in fig. 5, assume that the relations to be predicted are the edge connection relation $V_{12}$ between edge point 1 and edge point 2 and the edge connection relation $V_{62}$ between edge point 6 and edge point 2.
For a connection relation to be predicted, the input of the graph convolution neural network is the set consisting of the two endpoint edge points of that relation, the edge points adjacent to them, and the edge points adjacent to those adjacent points. As shown in fig. 5, this set includes edge point 1, edge point 2, edge point 3, edge point 4, edge point 5, edge point 6 and edge point 7.
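Collecting the endpoints, their adjacent points and the next adjacent points amounts to taking a two-hop neighborhood. A minimal sketch follows. The adjacency map below is hypothetical and is not the graph of fig. 5, and how adjacency between edge points is defined is not specified in the text, so it is treated here as a given input.

```python
from typing import Dict, Iterable, Set


def two_hop_input_set(endpoints: Iterable[int],
                      adjacency: Dict[int, Set[int]]) -> Set[int]:
    """Collect the endpoints, their adjacent points, and the points adjacent to those."""
    selected = set(endpoints)
    frontier = set(endpoints)
    for _ in range(2):  # two hops: adjacent points, then next adjacent points
        frontier = {n for p in frontier for n in adjacency.get(p, set())} - selected
        selected |= frontier
    return selected


# Hypothetical adjacency between edge points (illustrative only).
adjacency = {
    1: {2, 3}, 2: {1, 4}, 3: {1, 5}, 4: {2, 6},
    5: {3, 7}, 6: {4, 8}, 7: {5}, 8: {6},
}
print(sorted(two_hop_input_set((1, 2), adjacency)))
```

Points farther than two hops from both endpoints (7 and 8 in this hypothetical graph) are left out of the input set.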
As shown in fig. 6, after the set is input into the graph convolution neural network, the coordinate characteristics output by the graph convolution neural network are obtained. The coordinate characteristics are then input into the graph convolution self-encoder for decoding, and the decoding result is input into a binary classifier, which outputs one of two results, 0 or 1.
Here, 0 indicates that the connection relation does not hold and 1 indicates that it holds. The values of $V_{12}$ and $V_{62}$ are taken from the corresponding outputs of the binary classifier; in this embodiment, $V_{12} = 1$ and $V_{62} = 0$, meaning that the connection relation $V_{12}$ holds and the connection relation $V_{62}$ does not hold.
The results are then compared with the pre-labeled results, the cross-entropy loss is calculated, and back-propagation is performed to update the target detection model, completing the training of the target detection model.
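The decoding, binary classification and loss computation can be sketched as below. The text does not specify the decoder, so this example assumes the inner-product decoder commonly used with graph autoencoders, and it assumes that the loss is binary cross-entropy. The function names, the 0.5 decision threshold and the sample feature vectors are illustrative assumptions.

```python
import math
from typing import List


def link_probability(z_i: List[float], z_j: List[float]) -> float:
    """Inner-product decoder (assumed): sigmoid of the dot product of two node features."""
    score = sum(a * b for a, b in zip(z_i, z_j))
    score = max(-60.0, min(60.0, score))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-score))


def predict_connection(z_i: List[float], z_j: List[float], threshold: float = 0.5) -> int:
    """Binary decision: 1 if the connection relation holds, 0 otherwise."""
    return 1 if link_probability(z_i, z_j) > threshold else 0


def binary_cross_entropy(prob: float, label: int) -> float:
    """Loss between the predicted probability and the pre-labeled result (0 or 1)."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# Illustrative feature vectors for edge points 1, 2 and 6 (made up for the example).
z1, z2, z6 = [1.0, 2.0], [1.0, 2.0], [-3.0, 0.0]
v12 = predict_connection(z1, z2)
v62 = predict_connection(z6, z2)
loss = binary_cross_entropy(link_probability(z1, z2), 1)
print(v12, v62)
```

In an actual training loop, the gradient of this loss would be propagated back through the decoder and the graph convolution layers; that part depends on the chosen deep learning framework and is omitted here.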
After the training of the target detection model is completed, the coordinate information of the edge points of the first detection frame may be input into the target detection model, and the target detection model may determine a connection relationship between the edge points and output the first connection relationship between the edge points of the first detection frame.
For example, suppose the input coordinate information of the first detection frame includes the coordinate information of 3 edge points.
The target detection model first judges edge point 1 and edge point 2, and the result is either that edge point 1 and edge point 2 are connected or that they are not connected. It then judges the connection relationship between edge point 1 and edge point 3 and the connection relationship between edge point 2 and edge point 3.
Combining the three connection relationships gives the overall connection among edge point 1, edge point 2 and edge point 3, and the three edge points are connected accordingly.
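The pairwise judgment in this example can be sketched as a loop over all pairs of edge points. The `close_enough` predicate is a hypothetical stand-in for the trained target detection model (a plain distance threshold), used only so that the sketch runs.

```python
import math
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]


def pairwise_connections(
    points: Sequence[Point], predict: Callable[[Point, Point], int]
) -> Dict[Tuple[int, int], int]:
    """Judge every pair of edge points; keys are 1-based edge point numbers."""
    return {
        (i + 1, j + 1): predict(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def close_enough(a: Point, b: Point, max_dist: float = 15.0) -> int:
    """Hypothetical stand-in for the model: connect points closer than max_dist."""
    return 1 if math.dist(a, b) <= max_dist else 0


edge_points = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
relations = pairwise_connections(edge_points, close_enough)
```

For three edge points this produces exactly the three relationships named above: (1, 2), (1, 3) and (2, 3).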
In some embodiments, the coordinate characteristics of the first detection frame characterize characteristic information of a first coordinate point of the first detection frame and characteristic information of a second coordinate point adjacent to the first coordinate point.
Here, adjacent means that the first coordinate point neighbors the second coordinate point; the coordinate feature of the first detection frame thus represents the feature information of a plurality of adjacent coordinate points.
Because the coordinate feature of the first detection frame jointly considers the features of the first coordinate point and of the second coordinate point adjacent to it, it reflects the distance and position information of all adjacent edge points, which facilitates accurate judgment of the connection relationship.
The coordinate feature of the first detection frame is obtained by inputting the edge points into the target detection model; it is the feature output by the graph convolution neural network in the target detection model.
The coordinate features of the first detection frame are then passed on within the target detection model, and the output of the target detection model includes the first connection relationship among the plurality of edge points of the first detection frame.
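How a coordinate feature can combine a point with its adjacent points is illustrated below with a deliberately simplified aggregation step: each point's feature is the mean of its own coordinates and those of its neighbors. This is not the graph convolution neural network of the application (which has learned weights); the function name and data are assumptions.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def aggregate(coords: Dict[int, Point], adjacency: Dict[int, Set[int]]) -> Dict[int, Point]:
    """Simplified, weight-free aggregation: average a point with its adjacent points."""
    features: Dict[int, Point] = {}
    for node in coords:
        group = [node] + sorted(adjacency.get(node, set()))
        mean_x = sum(coords[n][0] for n in group) / len(group)
        mean_y = sum(coords[n][1] for n in group) / len(group)
        features[node] = (mean_x, mean_y)
    return features


# Three hypothetical edge points on one line, each adjacent to its direct neighbor.
coords = {1: (0.0, 0.0), 2: (4.0, 0.0), 3: (8.0, 0.0)}
adjacency = {1: {2}, 2: {1, 3}, 3: {2}}
features = aggregate(coords, adjacency)
```

After one such step, the feature of edge point 1 already carries position information about edge point 2, which is the property the text relies on.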
In some embodiments, determining a first text paragraph outline of the digital document based on the first connection relationship comprises:
determining a plurality of first text paragraph edges of the digital document based on the first connection relationship;
and connecting a plurality of first text paragraph edges, and removing the first text paragraph edges which cannot be closed to obtain a first text paragraph outline.
In this embodiment, the first text paragraph edge is an edge obtained by connecting a plurality of edge points based on the first connection relationship.
In actual implementation, the edge points are connected according to the first connection relationship between them to obtain a plurality of first text paragraph edges. The first text paragraph edges that cannot be closed are then removed, and the region enclosed by the remaining closed first text paragraph edges forms a closed shape, which is the first text paragraph outline.
Obtaining a closed first text paragraph outline ensures that the text line information within the outline is complete, making recognition more accurate.
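One simple reading of "removing the edges which cannot be closed" is to repeatedly drop any edge that has an end point touched by no other edge. The sketch below follows that assumed interpretation; the application does not state the exact procedure, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, Set, Tuple


def prune_open_edges(edges: Iterable[Tuple[int, int]]) -> Set[FrozenSet[int]]:
    """Repeatedly remove edges that have an end point of degree 1 (dangling edges)."""
    remaining = {frozenset(edge) for edge in edges}
    while True:
        degree = defaultdict(int)
        for edge in remaining:
            for vertex in edge:
                degree[vertex] += 1
        dangling = {edge for edge in remaining if any(degree[v] < 2 for v in edge)}
        if not dangling:
            return remaining
        remaining -= dangling


# A closed quadrilateral 1-2-3-4 with an open tail 4-5-6 attached (hypothetical data).
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5), (5, 6)]
closed = prune_open_edges(edges)
```

The tail is removed in two passes (first 5-6, then 4-5), leaving only the four edges that enclose a region. Under this interpretation an edge bridging two closed loops would be kept, so a stricter cycle test may be needed in practice.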
One embodiment is described below.
As shown in FIG. 2, a digital document (e.g., a PDF or an image) is first input into an optical character recognition (OCR) module, and the OCR module outputs a plurality of OCR detection boxes (i.e., second detection frames).
As shown in FIG. 3, the plurality of second detection frames in the first row are fused along the first direction to obtain the plurality of first detection frames in the second row.
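The fusion step can be sketched as follows under simplifying assumptions: boxes are axis-aligned tuples `(x_min, y_min, x_max, y_max)`, the text line direction is horizontal, and two boxes count as being on the same line when they overlap vertically by at least half the height of the shorter one. The representation, the criterion and the names are all assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def same_line(a: Box, b: Box, min_overlap: float = 0.5) -> bool:
    """Assumed criterion: vertical overlap relative to the shorter box."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shorter > 0 and overlap / shorter >= min_overlap


def fuse_boxes(boxes: List[Box]) -> List[Box]:
    """Merge second detection boxes on the same text line into one first detection box."""
    fused: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, line in enumerate(fused):
            if same_line(line, box):
                fused[i] = (
                    min(line[0], box[0]),
                    min(line[1], box[1]),
                    max(line[2], box[2]),
                    max(line[3], box[3]),
                )
                break
        else:
            fused.append(box)
    return fused


second_boxes = [(0.0, 0.0, 40.0, 10.0), (45.0, 1.0, 90.0, 11.0), (0.0, 20.0, 60.0, 30.0)]
first_boxes = fuse_boxes(second_boxes)
```

Here the two boxes on the first text line are merged into one, and the box on the second line is kept as its own first detection box. A skewed document would need the line direction estimated first, which this sketch does not attempt.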
The plurality of first detection frames are processed to obtain the corner points of the first detection frames, and a plurality of equally spaced edge points are obtained by average interpolation.
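Average interpolation between corner points can be sketched as plain linear interpolation along each side of a detection frame. The number of points per side and the function name are assumptions; the application does not state how many edge points are generated.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interpolate_edge_points(corners: Sequence[Point], per_side: int = 3) -> List[Point]:
    """Return the corners plus `per_side` equally spaced points on each side, in order."""
    points: List[Point] = []
    count = len(corners)
    for k in range(count):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % count]
        # step 0 emits the corner itself; the end corner starts the next side.
        for step in range(per_side + 1):
            t = step / (per_side + 1)
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# Corner points of one hypothetical first detection frame, listed clockwise.
corners = [(0.0, 0.0), (8.0, 0.0), (8.0, 4.0), (0.0, 4.0)]
edge_points = interpolate_edge_points(corners, per_side=1)
```

Points are equally spaced along each side; sides of different lengths therefore get different spacings, which is a simplification of this sketch.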
The plurality of edge points are input into the target detection model, and the first connection relationships output by the target detection model are obtained. Within the model, a graph convolution neural network (GCN) processes the edge points and corner points to obtain a plurality of coordinate features, and a graph convolution self-encoder (GAE) processes the coordinate features to obtain a plurality of first connection relationships.
As shown in FIG. 4, according to the plurality of first connection relationships, the edge points of the first detection frames in the first row are connected to obtain the plurality of first text paragraph edges in the second row; the first text paragraph edges that cannot be closed are removed, and the outline enclosed by the remaining first text paragraph edges is the first text paragraph outline.
The paragraph determining method of the digital document can be applied to the terminal, and can be specifically executed by hardware or software in the terminal.
The terminal includes, but is not limited to, a portable communication device such as a mobile phone or a tablet computer having a touch sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be understood that in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In the following various embodiments, a terminal including a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and joystick.
In the method for determining a paragraph of a digital document provided in the embodiments of the present application, the execution subject may be an electronic device, or a functional module or functional entity in the electronic device that is capable of implementing the method. The electronic device mentioned in the embodiments of the present application includes, but is not limited to, a mobile phone, a tablet computer, a camera, a wearable device, and the like. The method for determining a paragraph of a digital document provided in the embodiments of the present application is described below with an electronic device as the execution subject.
According to the method for determining paragraphs of a digital document provided by the embodiment of the application, the execution subject can be a paragraph determination device of the digital document. The embodiments of the present application take a method for determining paragraphs of a digital document executed by a paragraph determination device of the digital document as an example, and describe a paragraph determination device of a digital document provided by the embodiments of the present application.
The embodiment of the application also provides a paragraph determining device of the digital document.
As shown in fig. 7, the paragraph determining apparatus of the digital document includes:
an obtaining module 710, configured to obtain coordinate information of a plurality of first detection boxes of a digital document;
the first processing module 720 is configured to input the coordinate information of the plurality of first detection frames to the target detection model, and obtain a first connection relationship between edge points of the plurality of first detection frames output by the target detection model;
a second processing module 730, configured to determine a first text paragraph outline of the digital document based on the first connection relationship;
the target detection model is obtained by training based on the sample coordinate information and the corresponding sample connection relation.
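The division into an obtaining module and two processing modules can be sketched as three callables chained by a small class. The class, its field names and the stub implementations are hypothetical and only show how data flows between the modules; they do not implement OCR or the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Connection = Tuple[int, int]  # indices of two connected edge points


@dataclass
class ParagraphDeterminer:
    obtain: Callable[[str], List[Point]]                                    # cf. obtaining module 710
    first_process: Callable[[List[Point]], List[Connection]]                # cf. first processing module 720
    second_process: Callable[[List[Point], List[Connection]], List[Point]]  # cf. second processing module 730

    def run(self, document: str) -> List[Point]:
        points = self.obtain(document)
        connections = self.first_process(points)
        return self.second_process(points, connections)


# Stub modules (assumed): fixed edge points, each point connected to the next one.
determiner = ParagraphDeterminer(
    obtain=lambda _doc: [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)],
    first_process=lambda pts: [(i, (i + 1) % len(pts)) for i in range(len(pts))],
    second_process=lambda pts, conns: [pts[i] for i, _ in conns],
)
outline = determiner.run("example.pdf")
```

Keeping the three stages as separate callables mirrors the module boundaries in the text and lets each stage be replaced or tested on its own.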
With the paragraph determining apparatus of the digital document, the coordinate information of the first detection frames of the digital document is acquired, and a deep-learning target detection model judges the connection relationships, that is, whether the edge points of the plurality of first detection frames can be connected into one paragraph, so that paragraph outlines are divided accurately. This addresses the difficulty of restoring the layout of a digital document when the document is skewed or deformed, improves the accuracy of determining the paragraph outlines of the document, effectively avoids situations in which optical character recognition is cut off and information is lost, and improves the accuracy of optical character recognition.
In some embodiments, the obtaining module 710 is configured to determine the plurality of first detection boxes of the digital document;
acquiring corner coordinates of the plurality of first detection frames;
based on the coordinates of the corner points of the plurality of first detection frames, carrying out average interpolation between the corner points of the same first detection frame to obtain the edge points of the plurality of first detection frames;
and determining the coordinate information of the plurality of first detection frames based on the edge points of the plurality of first detection frames.
In some embodiments, the obtaining module 710 is further configured to obtain a plurality of second detection boxes of the digital document according to an optical character recognition algorithm;
and fusing the plurality of second detection frames along a first direction to obtain the plurality of first detection frames, wherein at least one second detection frame located on the same line is fused into one first detection frame, and the first direction is the text line direction of the digital document.
In some embodiments, the first processing module 720 is configured to input the coordinate information of the plurality of first detection boxes to a graph convolution neural network of the target detection model, and obtain the coordinate features of the plurality of first detection boxes output by the graph convolution neural network;
and inputting the coordinate characteristics of the first detection frames into a graph convolution self-encoder of the target detection model to obtain a first connection relation output by the graph convolution self-encoder.
In some embodiments, the coordinate characteristics of the first detection frame characterize characteristic information of a first coordinate point of the first detection frame and characteristic information of a second coordinate point adjacent to the first coordinate point.
In some embodiments, the second processing module 730 is configured to determine a plurality of first text paragraph edges of the digital document based on the first connection relationship;
and connecting a plurality of first text paragraph edges, and removing the first text paragraph edges which cannot be closed to obtain a first text paragraph outline.
The paragraph determining apparatus of the digital document in the embodiment of the present application may be an electronic device, and may also be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like; it may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like. The embodiments of the present application are not specifically limited in this respect.
The paragraph determination device of the digital document in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiment of the present application.
The paragraph determining apparatus for a digital document provided in the embodiment of the present application can implement each process implemented in the method embodiments of fig. 1 to fig. 7, and is not described here again to avoid repetition.
In some embodiments, as shown in fig. 8, an electronic device 800 is further provided in the embodiments of the present application, and includes a processor 801, a memory 802, and a computer program stored in the memory 802 and capable of being executed on the processor 801, where the computer program, when executed by the processor 801, implements each process of the above paragraph determination method for a digital document, and can achieve the same technical effect, and is not described herein again to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
The embodiments of the present application further provide a non-transitory computer-readable storage medium, where a computer program is stored on the non-transitory computer-readable storage medium, and when executed by a processor, the computer program implements each process of the above paragraph determining method for a digital document, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for determining paragraphs of the digital document is implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another like element in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions recited, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A method for paragraph determination in a digital document, comprising:
acquiring coordinate information of a plurality of first detection frames of the digital document;
inputting the coordinate information of the first detection frames into a target detection model, and obtaining a first connection relation between edge points of the first detection frames output by the target detection model;
determining a first text paragraph outline of the digital document based on the first connection relationship;
the target detection model is obtained by training based on sample coordinate information and a corresponding sample connection relation.
2. The method for determining paragraphs of a digital document according to claim 1, wherein the obtaining coordinate information of a plurality of first detection boxes of the digital document comprises:
determining the plurality of first detection boxes of the digital document;
acquiring corner coordinates of the plurality of first detection frames;
based on the coordinates of the corner points of the plurality of first detection frames, carrying out average interpolation between the corner points of the same first detection frame to obtain the edge points of the plurality of first detection frames;
determining coordinate information of the plurality of first detection frames based on the edge points of the plurality of first detection frames.
3. The method of claim 2, wherein the determining the plurality of first detection boxes of the digital document comprises:
acquiring a plurality of second detection frames of the digital document according to an optical character recognition algorithm;
and fusing the plurality of second detection frames along a first direction to obtain the plurality of first detection frames, and fusing at least one second detection frame positioned in the same line into one first detection frame, wherein the first direction is the text line direction of the digital document.
4. The method for determining paragraphs of a digital document according to claim 1, wherein the inputting the coordinate information of the first detection frames into a target detection model to obtain a first connection relationship between edge points of the first detection frames output by the target detection model comprises:
inputting the coordinate information of the plurality of first detection frames into a graph convolution neural network of the target detection model to obtain the coordinate characteristics of the plurality of first detection frames output by the graph convolution neural network;
and inputting the coordinate characteristics of the plurality of first detection frames into a graph convolution self-encoder of the target detection model to obtain the first connection relation output by the graph convolution self-encoder.
5. The method according to claim 4, wherein the coordinate feature of the first detection box represents feature information of a first coordinate point of the first detection box and feature information of a second coordinate point adjacent to the first coordinate point.
6. The method for determining paragraphs of a digital document according to any of claims 1-5, wherein the determining a first text paragraph outline of the digital document based on the first connection relation comprises:
determining a plurality of first text paragraph edges of the digital document based on the first connection relationship;
and connecting the plurality of first text paragraph edges, and removing the first text paragraph edges which cannot be closed to obtain the first text paragraph outline.
7. A paragraph determination apparatus for a digital document, comprising:
the acquisition module is used for acquiring coordinate information of a plurality of first detection frames of the digital document;
the first processing module is used for inputting the coordinate information of the first detection frames into a target detection model and obtaining a first connection relation among edge points of the first detection frames output by the target detection model;
a second processing module, configured to determine a first text paragraph outline of the digital document based on the first connection relationship;
the target detection model is obtained by training based on sample coordinate information and a corresponding sample connection relation.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for paragraph determination of a digital document according to any one of claims 1 to 6 when executing the program.
9. A non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing a method for paragraph determination of a digital document according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements a method for paragraph determination of a digital document according to any of claims 1-6.
CN202211736986.XA 2022-12-30 2022-12-30 Method and device for determining paragraph of digital document, electronic equipment and storage medium Pending CN115909356A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211736986.XA CN115909356A (en) 2022-12-30 2022-12-30 Method and device for determining paragraph of digital document, electronic equipment and storage medium
PCT/CN2023/137045 WO2024140094A1 (en) 2022-12-30 2023-12-07 Paragraph determination method and apparatus for digital document, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211736986.XA CN115909356A (en) 2022-12-30 2022-12-30 Method and device for determining paragraph of digital document, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115909356A true CN115909356A (en) 2023-04-04

Family

ID=86473052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211736986.XA Pending CN115909356A (en) 2022-12-30 2022-12-30 Method and device for determining paragraph of digital document, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115909356A (en)
WO (1) WO2024140094A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024140094A1 (en) * 2022-12-30 2024-07-04 广电运通集团股份有限公司 Paragraph determination method and apparatus for digital document, and electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012057891A1 (en) * 2010-10-26 2012-05-03 Hewlett-Packard Development Company, L.P. Transformation of a document into interactive media content
US11244203B2 (en) * 2020-02-07 2022-02-08 International Business Machines Corporation Automated generation of structured training data from unstructured documents
CN113221632A (en) * 2021-03-23 2021-08-06 奇安信科技集团股份有限公司 Document picture identification method and device and computer equipment
CN114399782B (en) * 2022-01-18 2024-03-22 腾讯科技(深圳)有限公司 Text image processing method, apparatus, device, storage medium, and program product
CN115909356A (en) * 2022-12-30 2023-04-04 广州广电运通金融电子股份有限公司 Method and device for determining paragraph of digital document, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2024140094A1 (en) 2024-07-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 510663 9, 11, science Road, science and Technology City, Guangzhou high tech Industrial Development Zone, Guangdong

Applicant after: Guangdian Yuntong Group Co.,Ltd.

Address before: 510663 9, 11, science Road, science and Technology City, Guangzhou high tech Industrial Development Zone, Guangdong

Applicant before: GRG BANKING EQUIPMENT Co.,Ltd.

Country or region before: China

TA01 Transfer of patent application right

Effective date of registration: 20240623

Address after: Room 701, No. 11, Kelin Road, Science City, Huangpu District, Guangzhou City, Guangdong Province, 510663

Applicant after: GRG BANKING IT Co.,Ltd.

Country or region after: China

Address before: 510663 9, 11, science Road, science and Technology City, Guangzhou high tech Industrial Development Zone, Guangdong

Applicant before: Guangdian Yuntong Group Co.,Ltd.

Country or region before: China