CN116580415A - Electronic form identification method, electronic form identification device, electronic equipment and storage medium


Info

Publication number
CN116580415A
Authority
CN
China
Prior art keywords
line segment
intersection points
template
identification
line segments
Legal status
Granted
Application number
CN202310556330.8A
Other languages
Chinese (zh)
Other versions
CN116580415B (en)
Inventor
匡海峰
Current Assignee
Shenzhen Sifang Zhiyuan Technology Co ltd
Original Assignee
Shenzhen Sifang Zhiyuan Technology Co ltd
Application filed by Shenzhen Sifang Zhiyuan Technology Co ltd
Priority to CN202310556330.8A
Publication of CN116580415A
Application granted
Publication of CN116580415B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18019 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19013 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device, electronic equipment and a storage medium for recognizing an electronic form. The method comprises the following steps: template construction, in which form features of the target form are extracted to form a recognition template; element arrangement, in which elements on the target form, namely line segments, are extracted and arranged; intersection point comparison, in which the line segments are traversed and it is detected whether their corresponding parameters match the recognition template; and table generation, in which the recognized table is restored to its real position according to the table features to obtain the electronic form. Because the scheme uses the ratios of the spacings between adjacent intersection points on the boundaries as features, it is more resistant to interference and improves the reliability of electronic form recognition.

Description

Electronic form identification method, electronic form identification device, electronic equipment and storage medium
Technical Field
The present invention relates to vector graphics recognition technology, and in particular, to a method and apparatus for recognizing electronic forms, an electronic device, and a storage medium.
Background
In the field of engineering design, vector graphics are an important file storage format; common formats include DWG, DWF, WMF and AI. Compared with raster image formats, vector graphic formats store complete object information such as line segments and text, can be scaled without limit, and take up little file space.
In engineering applications it is often necessary to identify and locate electronic forms in vector graphics files, for example for electronic signatures, batch printing, automatic drawing splitting and material list information extraction. Existing recognition methods use text, line segments of specific lengths and the like as features; these features are not stable and are easily lost, which causes recognition to fail. For example: when text is used as a feature, some users explode text into line segments so as not to depend on the text's font files, and the text feature is lost; in some files other background objects sit behind the table, and the line segments and text they contain interfere with the features; when a line segment of a specific length is used as a feature, the length feature may be lost because line segments are joined together; and when a grid cell of a specific aspect ratio is used as a feature, the feature is too broad and the false recognition rate may increase.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method, a device, electronic equipment and a storage medium for identifying an electronic form, which solve the problem of form identification failure caused by the instability of features such as text and line segments of specific lengths in the prior art, and improve the reliability of form identification.
The invention achieves the above purpose through the following technical solution: an electronic form identification method, comprising the following steps:
a template construction step: extracting form features of the target form to form a recognition template;
an element arrangement step: extracting and arranging elements on the target form, wherein the elements are line segments;
an intersection point comparison step: traversing the line segments and detecting whether the corresponding parameters of each line segment match the recognition template; and
a table generation step: restoring the recognized table to its real position according to the table features to obtain the electronic form.
As a further aspect of the invention, the form features extracted in the template construction step include:
quantity features: the number of intersection points on each of the four boundaries of the target form, and the ratios of the spacings between adjacent intersection points; and
layout features: the relative positions and sizes of the text and of the target form in the vector graphics file.
As a further aspect of the invention, at least one of the four boundaries satisfies the condition that it has more than 2 intersection points;
extracting the quantity features includes: recording the number of intersection points on each of the four boundaries of the target form; and establishing pitch queues, in a preset order, from the spacings (pitches) between adjacent intersection points on the four boundaries, with the pitches distinguished by boundary and denoted as:
$A_1, A_2, \ldots, A_m$
$B_1, B_2, \ldots, B_n$
$C_1, C_2, \ldots, C_q$
$D_1, D_2, \ldots, D_r$
The ratios of adjacent pitches in each queue are then calculated and denoted as:
……
……
wherein m, n, q and r are positive integers.
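The quantity features can be illustrated with a short Python sketch. It is not the patented implementation: it assumes each boundary is supplied as the 1-D positions of its intersection points measured along that boundary, sorts them in ascending order as a stand-in for the preset (e.g. counterclockwise) order, and takes each ratio as the later pitch over the earlier one ($A_{k+1}/A_k$), the direction implied by the matching condition $L_s\lambda_1 = L_{s+1} + \cdots + L_{s+y}$ used in the detailed description. The function names `pitch_queue`, `adjacent_ratios` and `build_template` are hypothetical.

```python
from typing import Dict, List


def pitch_queue(positions: List[float]) -> List[float]:
    """Spacings between adjacent intersection points along one boundary."""
    pts = sorted(positions)  # stand-in for the patent's preset ordering
    return [b - a for a, b in zip(pts, pts[1:])]


def adjacent_ratios(pitches: List[float]) -> List[float]:
    """Ratio of each pitch to the previous one: A2/A1, A3/A2, ... (assumes non-zero pitches)."""
    return [b / a for a, b in zip(pitches, pitches[1:])]


def build_template(boundaries: Dict[str, List[float]]) -> Dict[str, dict]:
    """Build hypothetical quantity features for boundaries named e.g. 'A'..'D'."""
    # The template requires at least one boundary with more than 2 intersections.
    if not any(len(p) > 2 for p in boundaries.values()):
        raise ValueError("no boundary has more than 2 intersection points")
    template = {}
    for name, positions in boundaries.items():
        pitches = pitch_queue(positions)
        template[name] = {
            "count": len(positions),
            "pitches": pitches,
            "ratios": adjacent_ratios(pitches),
        }
    return template


# Example: boundary A has column lines at 0, 10, 30 and 60.
tpl = build_template({"A": [0, 10, 30, 60], "B": [0, 40], "C": [0, 60], "D": [0, 40]})
print(tpl["A"])  # {'count': 4, 'pitches': [10, 20, 30], 'ratios': [2.0, 1.5]}
```

Ratios rather than absolute pitches are stored so that the same template matches a table drawn at any scale.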
As a further aspect of the invention, the elements further include text;
the element arrangement step comprises:
an element extraction sub-step: extracting the elements on the target form;
an element grouping sub-step: establishing separate element queues for different inclination angles, wherein the element queue for a given inclination angle contains the text inclined at that angle, the line segments inclined at that angle, and the line segments perpendicular to that angle;
an element transformation sub-step: traversing all element queues and rotating the selected element queue about a preset rotation center so that its elements become horizontal or vertical; and
an element ordering sub-step: sorting the line segments and the text of the selected element queue by their coordinates, and then sorting the line segments according to a preset rule.
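The element grouping sub-step can be sketched as follows, under stated assumptions: segments are pairs of 2-D points, each text item arrives with its own inclination angle in degrees, and angles are folded modulo 90° and rounded so that a direction and its perpendicular share one queue key. The rounding precision and the names `queue_key` and `group_elements` are hypothetical choices, not taken from the patent.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def queue_key(angle_deg: float, ndigits: int = 3) -> float:
    """Fold an angle into [0, 90) so a direction and its perpendicular share a key."""
    # The second modulo maps values that round up to 90.0 back onto 0.0.
    return round(angle_deg % 90.0, ndigits) % 90.0


def segment_angle(seg: Segment) -> float:
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def group_elements(segments: List[Segment],
                   texts: List[Tuple[float, str]]) -> Dict[float, dict]:
    """Group segments and (angle, text) pairs into per-inclination element queues."""
    queues: Dict[float, dict] = defaultdict(lambda: {"segments": [], "texts": []})
    for seg in segments:
        queues[queue_key(segment_angle(seg))]["segments"].append(seg)
    for angle, text in texts:
        queues[queue_key(angle)]["texts"].append(text)
    return dict(queues)


# Example: an axis-aligned pair and a pair tilted by 30 degrees.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
segs = [((0, 0), (10, 0)), ((0, 0), (0, 5)),
        ((0, 0), (10 * c, 10 * s)), ((0, 0), (-4 * s, 4 * c))]
queues = group_elements(segs, [(30.0, "Title")])
```

With this input, `queues` has the keys `0.0` and `30.0`, each holding two segments, and the 30° queue also holds the text "Title".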
As a further aspect of the invention, the intersection point comparison step comprises:
an intersection point extraction sub-step: traversing the line segments in the reordered element queue in a preset order to obtain the current detection line segment L and all line segments perpendicular to L, and building an intersection point queue from their intersection points $P_1, P_2, \ldots, P_x$ ordered by coordinate;
obtaining the distances between adjacent intersection points in the intersection point queue:
$L_1 = |P_1P_2|$,
$L_2 = |P_2P_3|$,
……
$L_{x-1} = |P_{x-1}P_x|$,
where x is a positive integer;
an intersection point matching sub-step: selecting from the recognition template a boundary with more than 2 intersection points, denoted as boundary A with pitches $A_1, A_2, \ldots, A_m$, and detecting whether, among all intersection points of line segment L, there are two spacings whose ratio equals the corresponding template ratio of boundary A; if not, line segment L is excluded; otherwise, the remaining intersection points of line segment L are checked in the same way in turn;
if the spacing ratios of several intersection points on line segment L match one boundary of the recognition template, the last matched intersection point on L is recorded as $L_t$ and, taking it as the starting point, it is detected whether there is a line segment L' that is perpendicular to L and passes through $L_t$,
where $1 \le t \le x-1$;
if line segment L' exists, it is further detected whether the intersection points on L' match another boundary of the recognition template; if not, line segment L is excluded; otherwise, the remaining boundaries of the recognition template continue to be matched in a preset direction until all four boundaries of the recognition template are matched.
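The intersection point extraction sub-step is straightforward once a queue has been rotated so that L is horizontal. The sketch below assumes that rotated frame and a hypothetical tuple encoding: a horizontal segment is `(x_start, x_end, y)` and a vertical one is `(x, y_start, y_end)`; the tolerance `tol` is also an assumption for floating-point CAD coordinates.

```python
from typing import List, Tuple

HSeg = Tuple[float, float, float]  # (x_start, x_end, y) of a horizontal segment
VSeg = Tuple[float, float, float]  # (x, y_start, y_end) of a vertical segment


def intersection_queue(line: HSeg, verticals: List[VSeg],
                       tol: float = 1e-6) -> List[Tuple[float, float]]:
    """Intersection points P1..Px of horizontal line L with vertical segments, sorted by x."""
    xa, xb, y = line
    xa, xb = min(xa, xb), max(xa, xb)
    points = []
    for x, ya, yb in verticals:
        # A vertical segment crosses L if its x lies on L and L's y lies on it.
        if xa - tol <= x <= xb + tol and min(ya, yb) - tol <= y <= max(ya, yb) + tol:
            points.append((x, y))
    return sorted(points)


def spacings(points: List[Tuple[float, float]]) -> List[float]:
    """L_i = |P_i P_{i+1}|; on a horizontal line this is the x difference."""
    return [b[0] - a[0] for a, b in zip(points, points[1:])]


line_L = (0, 60, 0)
verts = [(0, -5, 5), (10, 0, 20), (20, 5, 10), (30, -1, 1), (60, 0, 8), (80, 0, 5)]
pts = intersection_queue(line_L, verts)  # x = 0, 10, 30, 60
gaps = spacings(pts)                     # [10, 20, 30]
```

The segment at x = 20 does not reach L, and the one at x = 80 lies beyond L's end, so neither contributes an intersection point.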
As a further aspect of the invention, the table generation step includes:
a calculation sub-step: acquiring the text content and calculating, according to the layout features, the target position of the restored electronic form; and
a restoration sub-step: applying the inverse of the rotation transformation to the recognized table to restore its true position.
The invention also provides another technical solution: an electronic form identification device, comprising:
a template construction module, configured to extract form features of the target form to form a recognition template;
an element arrangement module, configured to extract and arrange elements on the target form, wherein the elements are line segments;
an intersection point comparison module, configured to traverse the line segments and detect whether the corresponding parameters of each line segment match the recognition template; and
a table generation module, configured to restore the recognized table to its real position according to the table features to obtain the electronic form.
As a further aspect of the invention, the table features include:
quantity features: the number of intersection points on each of the four boundaries of the target form, and the ratios of the spacings between adjacent intersection points; and
layout features: the relative positions and sizes of the text and of the target form in the vector graphics file.
The invention also provides another technical solution: an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor;
wherein the processor, when executing the program, implements the method as claimed in any one of the preceding claims.
The invention also provides another technical solution: a computer-readable storage medium storing a computer program which, when executed, implements the method according to any one of the preceding claims.
The beneficial effects of the invention are as follows:
the scheme identifies the table features of the target form in the vector graphics file to form a recognition template, using as features the number of intersection points on the four boundaries of the target form and the ratios of the spacings between adjacent intersection points. This reduces the possibility of feature loss during table recognition, effectively avoids interference from other line segments, and improves the reliability of table recognition.
Drawings
FIG. 1 is a flow chart of a form identification method according to the present invention.
Fig. 2 is a schematic diagram of a vector graphics file to be identified.
FIG. 3 is a schematic diagram of a target table.
Fig. 4 is a schematic diagram of a target table with straight-line interference.
FIG. 5 is a partial schematic diagram of a target table with interference from multiple straight lines.
Fig. 6 is a detailed flowchart of the form identification method of the present invention.
Fig. 7 is a schematic workflow diagram of the form recognition apparatus of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
As shown in Figs. 1 to 6, an embodiment of the present invention provides an electronic form identification method comprising template construction, element arrangement, intersection point comparison and table generation.
Template construction: the form features of the target electronic form in the vector graphics file are extracted, including the number of intersection points on the four boundaries of the target form, the ratios of the spacings between adjacent intersection points, the text contained in the target form, and the relative position and size of the target form in the vector graphics file; together these form the recognition template.
Specifically: as shown in Fig. 3, the number of intersection points on each of the four boundaries of the electronic form is recorded, and a form in which at least one boundary has more than 2 intersection points is selected. Pitch queues are established in a preset order, preferably counterclockwise, and the pitches of adjacent intersection points are recorded by boundary as:
$A_1, A_2, \ldots, A_m$
$B_1, B_2, \ldots, B_n$
$C_1, C_2, \ldots, C_q$
$D_1, D_2, \ldots, D_r$
The ratios of adjacent pitches in each queue are calculated and denoted as:
……
……
wherein: m, n, q and r are positive integers.
The recognition template guides the decisions made during recognition and is used to restore the target form once recognition succeeds.
This step has the following advantages: requiring at least one boundary of the form to be detected to have more than 2 intersection points excludes forms that do not meet the requirement and improves recognition efficiency; and the ratios of the spacings between adjacent intersection points are a more stable feature, which improves the reliability of table recognition.
Element arrangement: the line segments and text on the target form are extracted, and separate element queues are established for different inclination angles. All queues are traversed: the queue of each inclination angle is selected in turn and rotated about a preset rotation center, the line segments and text of the selected queue are each sorted by coordinates, and the line segments are then sorted according to a preset rule.
Specifically: the elements of the target form, including line segments and text, are extracted. Several element queues are established according to the different text inclination angles; the element queue for a given inclination angle contains the text at that angle, the line segments at that angle and the line segments perpendicular to that angle. The element queues are traversed, and the elements of one queue are rotated about a preset rotation center, preferably the origin (0, 0), until the text and the line segments at the queue's inclination angle become horizontal and the line segments perpendicular to that angle become vertical. The text of the selected queue is sorted by coordinates, horizontal line segments are sorted by their ordinate, and vertical line segments by their abscissa; the line segments are then sorted according to a preset rule, preferably by length in descending order.
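The rotation and ordering described above can be sketched in Python. This is an illustration under assumptions: segments are pairs of 2-D points, the rotation center is the origin as the description prefers, a segment counts as horizontal or vertical after rotation when its endpoints agree within a hypothetical tolerance `tol`, and the names `rotate` and `normalize_queue` are illustrative. Rotating a point back by `+theta` with the same helper is the inverse transformation used later in table generation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def rotate(pt: Point, angle_deg: float, center: Point = (0.0, 0.0)) -> Point:
    """Rotate a point counterclockwise by angle_deg about center."""
    a = math.radians(angle_deg)
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))


def normalize_queue(segments: List[Segment], theta_deg: float, tol: float = 1e-6):
    """Rotate a queue by -theta about the origin, then sort as in the description."""
    horiz, vert = [], []
    for p, q in segments:
        p2, q2 = rotate(p, -theta_deg), rotate(q, -theta_deg)
        if abs(p2[1] - q2[1]) <= tol:
            horiz.append((p2, q2))
        elif abs(p2[0] - q2[0]) <= tol:
            vert.append((p2, q2))
    horiz.sort(key=lambda seg: seg[0][1])  # horizontal segments by ordinate
    vert.sort(key=lambda seg: seg[0][0])   # vertical segments by abscissa
    # Preset rule assumed here: all segments by length, longest first.
    by_length = sorted(horiz + vert, key=lambda seg: math.dist(*seg), reverse=True)
    return horiz, vert, by_length


c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
segs = [((0, 0), (10 * c, 10 * s)), ((0, 0), (-4 * s, 4 * c))]
h, v, by_len = normalize_queue(segs, 30.0)
```

For this 30° queue, the long segment becomes horizontal (length 10) and its perpendicular becomes vertical (length 4).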
This step has the following advantages: rotating the inclined line segments and the line segments perpendicular to them into the horizontal and vertical directions makes it easy to calculate the coordinates of their intersection points; sorting the text by coordinates allows text outside the table range to be discarded, which speeds up table recognition and helps restore the table; sorting the line segments by coordinates likewise allows segments outside the table range to be discarded, which speeds up recognition; and sorting the segments by length afterwards helps find the boundaries quickly.
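The benefit of sorting by coordinates is that elements inside the table's extent can be located by binary search instead of a full scan. A minimal sketch, assuming each element carries a single sort coordinate (for example the ordinate of a text item) and that the table's extent along that axis is known; the class name `SortedByCoord` is hypothetical.

```python
import bisect
from typing import List, Tuple

Item = Tuple[float, str]  # (sort coordinate, payload)


class SortedByCoord:
    """Elements kept sorted by one coordinate for fast range queries."""

    def __init__(self, items: List[Item]):
        self.items = sorted(items, key=lambda it: it[0])
        self.keys = [k for k, _ in self.items]

    def within(self, lo: float, hi: float) -> List[Item]:
        """Return the elements whose coordinate lies in [lo, hi]."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.items[i:j]


texts = SortedByCoord([(30.0, "Qty"), (5.0, "Notes"), (12.0, "Part No."), (45.0, "Outside")])
inside = texts.within(10.0, 40.0)  # [(12.0, 'Part No.'), (30.0, 'Qty')]
```

Each query costs O(log n) to find the range boundaries plus the size of the returned slice.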
Intersection point comparison: the line segments of the current queue are traversed in a preset order, the spacings between adjacent intersection points on each line segment are recorded, and it is detected whether the ratio of two intersection-point spacings matches the recognition template. If the spacing ratios of the intersection points on a line segment match one boundary of the recognition template, the above steps are repeated, starting from the last matched intersection point on that line segment, to match the other boundaries until all four boundaries of the recognition template are matched.
Specifically: the line segments of the current queue are traversed in a preset order, preferably from longest to shortest. The current detection line segment L and all line segments perpendicular to it are obtained, and an intersection point queue is built from their intersection points $P_1, P_2, \ldots, P_x$ ordered by coordinate; the distances between adjacent intersection points in the queue are:
$L_1 = |P_1P_2|$,
$L_2 = |P_2P_3|$,
……
$L_{x-1} = |P_{x-1}P_x|$,
where x is a positive integer.
A boundary with more than 2 intersection points is selected from the recognition template, denoted as boundary A with pitches $A_1, A_2, \ldots, A_m$, and it is detected whether, among all intersection points of line segment L, the ratio of two intersection-point spacings equals the corresponding template ratio of boundary A, where $x \ge m+1$. If not, line segment L is excluded; otherwise, the comparison with the recognition template continues, taking as the starting point the later of the intersection points involved in the two spacings.
It should be noted that the two spacings need not be formed by adjacent intersection points: they may be any distances determined by three intersection points on the current detection line segment L. For example, computing the ratio after combining several pitches further eliminates misjudgments caused by redundant intersection points created by interfering straight lines.
Mathematically: as shown in Figs. 4 and 5, suppose the current detection line segment L is to be matched against boundary A of the recognition template. Multiplying each spacing between adjacent intersection points on L by the template ratio $\lambda_1$ of boundary A gives the queue:
$L_1\lambda_1, L_2\lambda_1, \ldots, L_{x-1}\lambda_1$
It is then detected whether the queue contains a term $L_s\lambda_1$ satisfying:
$L_s\lambda_1 = L_{s+1} + L_{s+2} + \cdots + L_{s+y}$
where $x \ge m+1$, $1 \le s \le x-m$ and $1 \le y \le x-m$.
If so, the segment determined by $L_s$ and the segment determined by $L_{s+1} + L_{s+2} + \cdots + L_{s+y}$ can be considered to match $A_1$ and $A_2$ of boundary A respectively, which is consistent with $\lambda_1 = A_2/A_1$. The segment determined by $L_{s+1} + L_{s+2} + \cdots + L_{s+y}$ is then used to continue detecting whether a segment matching $A_3$ exists; these steps are repeated until matching fails or the whole of boundary A has been matched on the current detection line segment.
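The first matching pass, $L_s\lambda_1 = L_{s+1} + \cdots + L_{s+y}$, can be sketched as a search over the spacing queue. Assumptions: indices are 0-based rather than the description's 1-based ones, all `s` and `y` are searched instead of the bounds $s, y \le x-m$, equality is tested with a hypothetical relative tolerance `rel_tol`, and `find_ratio_match` is an illustrative name.

```python
import math
from typing import List, Optional, Tuple


def find_ratio_match(pitches: List[float], lam: float,
                     rel_tol: float = 1e-3) -> Optional[Tuple[int, int]]:
    """Find (s, y), 0-based, with pitches[s] * lam == pitches[s+1] + ... + pitches[s+y]."""
    n = len(pitches)
    for s in range(n - 1):
        target = pitches[s] * lam
        total = 0.0
        for y in range(1, n - s):
            total += pitches[s + y]
            if math.isclose(total, target, rel_tol=rel_tol):
                return s, y
            if total > target * (1 + rel_tol):
                break  # spacings are positive, so the sum can only grow
    return None


# Template A1 = 10, A2 = 20 gives lam = 2.0. An interfering line has split
# the 20-unit spacing on L into 5 + 15, which the running sum absorbs.
match = find_ratio_match([10, 5, 15, 20, 30], 2.0)  # (0, 2)
```

Summing consecutive spacings on the right-hand side is what lets a single extra intersection point, such as one produced by a stray line, be tolerated.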
If this matching fails, the following sums are computed:
$(L_1 + L_2)\lambda_1, (L_1 + L_2 + L_3)\lambda_1, \ldots, (L_1 + L_2 + \cdots + L_{x-1})\lambda_1$
$(L_2 + L_3)\lambda_1, (L_2 + L_3 + L_4)\lambda_1, \ldots, (L_2 + L_3 + \cdots + L_{x-1})\lambda_1$
……
$(L_{x-2} + L_{x-1})\lambda_1$
and it is detected whether there is a value s satisfying:
$(L_s + L_{s+1} + \cdots + L_{s+y})\lambda_1 = L_{s+y+1} + L_{s+y+2} + \cdots + L_{s+y+z}$
where $x \ge m+1$, $1 \le s \le x-m$, $1 \le y \le x-m$ and $1 \le z \le x-m$.
If not, the current detection line segment L is excluded; otherwise, the segment determined by $L_s + L_{s+1} + \cdots + L_{s+y}$ and the segment determined by $L_{s+y+1} + L_{s+y+2} + \cdots + L_{s+y+z}$ can be considered to match $A_1$ and $A_2$ of boundary A respectively. The segment determined by $L_{s+y+1} + L_{s+y+2} + \cdots + L_{s+y+z}$ is then used to continue detecting whether a segment matching $A_3$ exists; these steps are repeated until matching fails or the whole of boundary A has been matched on the current detection line segment.
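The fallback pass, $(L_s + \cdots + L_{s+y})\lambda_1 = L_{s+y+1} + \cdots + L_{s+y+z}$, extends the previous search so that the left-hand side may also be a sum of at least two spacings. As before, this is a sketch with 0-based indices, an unbounded search, a hypothetical `rel_tol`, and an illustrative function name; the returned triple `(s, e, k)` means pitches `s..e` form the left group and `e+1..k` the right group.

```python
import math
from typing import List, Optional, Tuple


def find_combined_match(pitches: List[float], lam: float,
                        rel_tol: float = 1e-3) -> Optional[Tuple[int, int, int]]:
    """Find (s, e, k), 0-based, with (pitches[s] + ... + pitches[e]) * lam
    == pitches[e+1] + ... + pitches[k], where e > s (at least two pitches on the left)."""
    n = len(pitches)
    for s in range(n):
        left = pitches[s]
        for e in range(s + 1, n - 1):  # leave at least one pitch for the right group
            left += pitches[e]
            target = left * lam
            right = 0.0
            for k in range(e + 1, n):
                right += pitches[k]
                if math.isclose(right, target, rel_tol=rel_tol):
                    return s, e, k
                if right > target * (1 + rel_tol):
                    break
    return None


# Template A1 = 10, A2 = 20 (lam = 2.0). Interference has split A1 into 4 + 6
# and A2 into 8 + 12, so no single spacing matches, but the grouped sums do.
combined = find_combined_match([4, 6, 8, 12], 2.0)  # (0, 1, 3)
```

This pass only runs when the single-spacing search fails, so the cheaper check handles the common case.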
If several intersection points on the current detection line segment L match a boundary of the recognition template, the last matched intersection point on L is recorded as $L_t$, and it is detected whether there is a line segment L' that is perpendicular to L and passes through $L_t$,
where $1 \le t \le x-1$;
if not, line segment L is excluded and the other line segments in the queue continue to be traversed; otherwise, starting from point $L_t$, it is detected whether line segment L' matches another boundary of the recognition template. The intersection point comparison step is repeated until all four boundaries of the recognition template are matched. After recognition succeeds, the elements whose coordinates lie within the recognized boundaries are removed from the queue being traversed.
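Finding the next boundary candidate L' reduces, in the rotated frame, to finding a vertical segment that contains the last matched point. The sketch assumes the `(x, y_start, y_end)` encoding for vertical segments and a hypothetical tolerance; preferring the longest candidate is an assumption chosen to mirror the length-descending traversal, not something the description specifies.

```python
from typing import List, Optional, Tuple

VSeg = Tuple[float, float, float]  # (x, y_start, y_end) in the rotated frame


def find_perpendicular_through(point: Tuple[float, float], verticals: List[VSeg],
                               tol: float = 1e-6) -> Optional[VSeg]:
    """Return a vertical segment L' passing through the last matched point, or None."""
    px, py = point
    candidates = [
        v for v in verticals
        if abs(v[0] - px) <= tol and min(v[1], v[2]) - tol <= py <= max(v[1], v[2]) + tol
    ]
    # Assumption: prefer the longest candidate, mirroring the length-descending order.
    return max(candidates, key=lambda v: abs(v[2] - v[1]), default=None)


verts = [(60, 0, 40), (60, -2, 2), (10, 0, 40)]
next_edge = find_perpendicular_through((60, 0), verts)  # (60, 0, 40)
```

If `None` is returned, line segment L is excluded and traversal moves on to the next segment in the queue.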
This step has the following advantage: using the ratios of intersection-point spacings as features eliminates interference from other content in the vector graphics file; the features are more stable, which improves the reliability of table recognition.
Table generation: the text content is acquired; the restored electronic form and its real position are calculated proportionally from the layout features and the relative size of the recognized table; and the electronic form is restored by applying the inverse of the rotation transformation.
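The proportional layout calculation and the inverse rotation can be combined in one short sketch. Assumptions: the template stores each text item's position as fractions `(fx, fy)` of the template table's width and height, the recognized table's bounding box `(x0, y0, width, height)` is known in the rotated frame, and the queue was rotated by `-theta` about the origin, so restoration rotates by `+theta`. The function name and data layout are hypothetical.

```python
import math
from typing import List, Tuple


def restore_layout(rel_items: List[Tuple[str, float, float]],
                   bbox: Tuple[float, float, float, float],
                   theta_deg: float) -> List[Tuple[str, float, float]]:
    """Place template text at proportional positions inside the recognized table,
    then undo the queue's -theta rotation about the origin.

    rel_items: (text, fx, fy), fx/fy as fractions of the template table's width/height.
    bbox: (x0, y0, width, height) of the recognized table in the rotated frame.
    """
    x0, y0, w, h = bbox
    a = math.radians(theta_deg)
    c, s = math.cos(a), math.sin(a)
    restored = []
    for text, fx, fy in rel_items:
        x, y = x0 + fx * w, y0 + fy * h  # proportional position in the rotated frame
        restored.append((text, x * c - y * s, x * s + y * c))  # rotate back by +theta
    return restored


flat = restore_layout([("No.", 0.5, 0.25)], (100, 200, 50, 20), 0.0)
# [('No.', 125.0, 205.0)]
```

Storing fractions rather than absolute offsets lets one template serve tables drawn at different scales.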
After one element queue has been processed, the remaining queues are processed in turn; the recognition process ends once all queues have been processed.
The invention also provides a form identification device, as shown in Fig. 7, comprising a template construction module, an element arrangement module, an intersection point comparison module and a table generation module.
Template construction module: configured to extract the form features of the target electronic form in the vector graphics file, including the number of intersection points on the four boundaries of the target form, the ratios of the spacings between adjacent intersection points, the text contained in the target form, and the relative position and size of the target form in the vector graphics file; together these form the recognition template.
Element arrangement module: configured to extract the line segments and text on the target form, establish separate element queues for different inclination angles, traverse all queues, select the queue of each inclination angle in turn and rotate it about a preset rotation center, sort the line segments and text of the selected queue by coordinates, and then sort the line segments according to a preset rule.
Intersection point comparison module: configured to traverse the line segments of the current queue in a preset order, record the spacings between adjacent intersection points on each line segment, and detect whether the ratio of two intersection-point spacings matches the recognition template; if the spacing ratios of the intersection points on a line segment match one boundary of the recognition template, the above steps are repeated, starting from the last matched intersection point on that line segment, to match the other boundaries until all four boundaries of the recognition template are matched.
Table generation module: configured to acquire the text content, calculate the restored electronic form and its real position proportionally from the layout features and the relative size of the recognized table, and restore the electronic form by applying the inverse of the rotation transformation.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of the preceding claims.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed, implements the method of any of the preceding claims.
In view of the above, the invention has the advantages described, can improve on the performance and practicality of the prior art, and is of considerable practical value.
The foregoing is merely an exemplary embodiment of the present invention and should not be regarded as limiting it; those skilled in the art may modify the specific embodiments and scope of application in light of the teachings of the present invention.

Claims (10)

1. An electronic form identification method, comprising the following steps:
a template construction step: extracting form features of the target form to form a recognition template;
an element arrangement step: extracting and arranging elements on the target form, wherein the elements are line segments;
an intersection point comparison step: traversing the line segments and detecting whether the corresponding parameters of each line segment match the recognition template; and
a table generation step: restoring the recognized table to its real position according to the table features to obtain the electronic form.
2. The method according to claim 1, wherein the form features extracted in the template construction step include:
quantity features: the number of intersection points on each of the four boundaries of the target form, and the ratios of the spacings between adjacent intersection points; and
layout features: the relative positions and sizes of the text and of the target form in the vector graphics file.
3. The method of claim 2, wherein at least one of the four boundaries is eligible: the number of the intersection points is more than 2;
the extracting of the number features includes: recording the number of intersection points on four boundaries of the target table; establishing a distance queue according to a preset sequence of the distances between adjacent intersection points on four boundaries of the target table, distinguishing the adjacent intersection points according to different boundaries, and marking the distances between the adjacent intersection points as:
$A_1, A_2, \ldots, A_m$
$B_1, B_2, \ldots, B_n$
$C_1, C_2, \ldots, C_q$
$D_1, D_2, \ldots, D_r$
calculating the ratios of adjacent spacings in each queue, denoted as:
……
……
wherein m, n, q and r are positive integers.
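The number-feature extraction of claim 3 can be illustrated with a minimal Python sketch. It assumes each boundary is given as the positions of its intersection points measured along that boundary, uses ascending position as the "preset order", and computes each ratio as $A_i / A_{i+1}$; the claim's own ratio formulas are not legible in this text, so that direction is an assumption. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def spacing_queue(points: List[float]) -> List[float]:
    """Distances between adjacent intersection points on one boundary.

    Positions are de-duplicated and sorted ascending, which stands in for
    the patent's unspecified "preset order" (an assumption).
    """
    ordered = sorted(set(points))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def adjacent_ratios(spacings: List[float]) -> List[float]:
    """Ratios of consecutive spacings, assumed here to be S_i / S_(i+1)."""
    return [a / b for a, b in zip(spacings, spacings[1:])]


def number_features(boundaries: Dict[str, List[float]]) -> Dict[str, dict]:
    """Intersection count, spacing queue and ratio list per named boundary."""
    features = {}
    for name, points in boundaries.items():
        spacings = spacing_queue(points)
        features[name] = {
            "count": len(set(points)),
            "spacings": spacings,
            "ratios": adjacent_ratios(spacings),
        }
    # Claim 3 presupposes at least one boundary with more than 2 intersections.
    if not any(f["count"] > 2 for f in features.values()):
        raise ValueError("no boundary has more than 2 intersection points")
    return features


if __name__ == "__main__":
    # Hypothetical table 100 units wide with columns 20, 30 and 50 wide.
    template = number_features({
        "A": [0, 20, 50, 100],  # top boundary
        "B": [0, 40],           # right boundary
        "C": [0, 20, 50, 100],  # bottom boundary
        "D": [0, 40],           # left boundary
    })
    print(template["A"]["ratios"])
```

Here the top boundary yields the spacing queue 20, 30, 50; because only ratios are kept, the same template describes the table at any drawing scale.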
4. The electronic form identification method according to claim 1, wherein the elements further comprise text;
the element arrangement step comprises the following sub-steps:
an element extraction sub-step: extracting the elements on the target table;
an element grouping sub-step: building a separate element queue for each inclination angle, wherein the element queue for a given inclination angle contains the characters inclined at that angle, the line segments inclined at that angle, and the line segments perpendicular to that angle;
an element transformation sub-step: traversing all element queues and rotating the selected element queue about a preset rotation center so that its elements become horizontal or vertical; and
an element ordering sub-step: sorting the line segments and the characters of the selected element queue separately by their coordinates, the line segments being sorted according to a preset rule.
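The element grouping, transformation and ordering sub-steps of claim 4 can be sketched for line segments as follows. Assumptions not taken from the patent: angles are folded modulo 90° so that a direction and its perpendicular share one queue, queue keys are rounded to 0.1°, the rotation center is supplied by the caller, and the hypothetical "preset rule" orders horizontal segments by (y, x) and then vertical segments by (x, y). Characters would be grouped by their own inclination angle in the same way and are omitted for brevity.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def segment_angle(seg: Segment) -> float:
    """Direction of a segment in degrees, folded into [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def group_key(angle_deg: float, ndigits: int = 1) -> float:
    """Fold an angle into [0, 90) so a direction and its perpendicular share
    one queue; the rounding tolerance is an assumption."""
    return round(angle_deg % 90.0, ndigits) % 90.0


def group_segments(segments: List[Segment]) -> Dict[float, List[Segment]]:
    """Element grouping: one queue per inclination angle."""
    queues: Dict[float, List[Segment]] = defaultdict(list)
    for seg in segments:
        queues[group_key(segment_angle(seg))].append(seg)
    return dict(queues)


def rotate(p: Point, center: Point, angle_deg: float) -> Point:
    """Rotate point p counter-clockwise by angle_deg about center."""
    a = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))


def rotate_queue(queue: List[Segment], center: Point,
                 angle_deg: float) -> List[Segment]:
    """Element transformation: pass -key to straighten, +key to undo."""
    return [(rotate(a, center, angle_deg), rotate(b, center, angle_deg))
            for a, b in queue]


def is_horizontal(seg: Segment) -> bool:
    """Closer to 0/180 degrees than to 90 degrees."""
    ang = segment_angle(seg)
    return min(ang, 180.0 - ang) < 45.0


def midpoint(seg: Segment) -> Point:
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def sort_segments(segments: List[Segment]) -> List[Segment]:
    """Hypothetical preset rule: horizontals by (y, x), then verticals by (x, y)."""
    horiz = sorted((s for s in segments if is_horizontal(s)),
                   key=lambda s: (midpoint(s)[1], midpoint(s)[0]))
    vert = sorted((s for s in segments if not is_horizontal(s)),
                  key=lambda s: (midpoint(s)[0], midpoint(s)[1]))
    return horiz + vert
```

Calling `rotate_queue` with `+key` on the straightened queue undoes the rotation, which corresponds to the inverse transformation used in the table generation step of claim 6.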
5. The electronic form identification method according to claim 3, wherein the intersection comparison step comprises:
an intersection extraction sub-step: traversing the line segments in the reordered element queue in a preset order to take the current detection line segment $L$ and all line segments perpendicular to $L$, and building an intersection queue from the intersection points $P_1, P_2, \ldots, P_x$ of $L$ with those segments, ordered by their coordinates;
obtaining the distances between adjacent intersection points in the intersection queue:
$L_1 = \overline{P_1 P_2},\quad L_2 = \overline{P_2 P_3},\quad \ldots,\quad L_{x-1} = \overline{P_{x-1} P_x}$
wherein $x$ is a positive integer;
an intersection matching sub-step: selecting from the identification template a boundary having more than 2 intersection points, denoted as the line segment A with spacings $A_1, A_2, \ldots, A_m$, and detecting whether, among all intersection points of line segment $L$, there exist two adjacent spacings whose ratio equals ……; if not, line segment $L$ is excluded; otherwise, the remaining intersection points of $L$ are checked in turn in the same manner;
if the spacing ratios of several intersection points on line segment $L$ match one boundary of the identification template, the last matched intersection point on $L$ is denoted $L_t$; taking $L_t$ as the starting point, it is detected whether there exists a line segment $L'$ perpendicular to $L$ at $L_t$,
wherein $1 \le t \le x-1$;
if line segment $L'$ exists, it is further detected whether the intersection points on $L'$ match another boundary of the identification template; if not, line segment $L$ is excluded; otherwise, matching continues with the remaining boundaries of the identification template in a preset direction until all four boundaries have been matched.
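The intersection extraction and matching sub-steps of claim 5 can be sketched for one straightened, horizontal detection segment $L$ crossed by axis-aligned vertical segments. Because the template stores spacing ratios rather than absolute spacings, the match is scale-invariant, so a table drawn at a different size still matches. The relative tolerance `rel_tol`, the 0-based index convention and all function names are assumptions; continuing around the table would repeat the same check on $L'$ against the next template boundary.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def horizontal_intersections(line: Segment, verticals: List[Segment],
                             eps: float = 1e-6) -> List[Point]:
    """Intersection points P_1..P_x of horizontal segment L with the
    axis-aligned vertical segments that cross it, de-duplicated, sorted by x."""
    (x1, y), (x2, _) = line
    lo, hi = min(x1, x2), max(x1, x2)
    points = set()
    for (vx, vy1), (_, vy2) in verticals:
        if lo - eps <= vx <= hi + eps and min(vy1, vy2) - eps <= y <= max(vy1, vy2) + eps:
            points.add((vx, y))
    return sorted(points)


def spacings(points: List[Point]) -> List[float]:
    """L_i = |P_i P_(i+1)| for consecutive points on the same horizontal line."""
    return [b[0] - a[0] for a, b in zip(points, points[1:])]


def match_boundary(gaps: List[float], template_ratios: List[float],
                   rel_tol: float = 0.02) -> Optional[int]:
    """Return the 0-based index of the last matched intersection point, or
    None if no run of consecutive spacing ratios on L matches the template
    boundary (in which case L is excluded)."""
    if not template_ratios:
        raise ValueError("template boundary needs more than 2 intersection points")
    ratios = [a / b for a, b in zip(gaps, gaps[1:])]
    k = len(template_ratios)
    for start in range(len(ratios) - k + 1):
        # Check the first ratio, then the remaining ones in turn.
        if all(math.isclose(ratios[start + j], template_ratios[j], rel_tol=rel_tol)
               for j in range(k)):
            # Ratios start..start+k-1 involve points start..start+k+1.
            return start + k + 1
    return None


def perpendicular_from(point: Point, verticals: List[Segment],
                       eps: float = 1e-6) -> Optional[Segment]:
    """Find a segment L' perpendicular to L with an endpoint at `point`."""
    for seg in verticals:
        if any(math.isclose(p[0], point[0], abs_tol=eps) and
               math.isclose(p[1], point[1], abs_tol=eps) for p in seg):
            return seg
    return None
```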
6. The electronic form identification method according to claim 1, wherein the table generation step comprises:
a calculation sub-step: acquiring the text content and calculating the real position of the electronic form according to the layout features; and
a restoration sub-step: restoring the identified table to its real position by applying the inverse of the rotation transformation.
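The table generation step of claim 6 can be illustrated with a hypothetical sketch: characters are placed into cells of the straightened grid by binary search over the sorted column and row lines, and points are mapped back to their real position by rotating by $+\theta$ about the same center. It assumes y grows downward (so sorting by (y, x) gives reading order) and that the grid lines are already known from the matched template; neither detail is specified in the claims, and all names are illustrative.

```python
import bisect
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cell_of(pt: Point, col_xs: List[float], row_ys: List[float]) -> Tuple[int, int]:
    """(row, col) of a point given sorted column x-lines and row y-lines;
    a point lying on an inner line belongs to the cell after it."""
    row = bisect.bisect_right(row_ys, pt[1]) - 1
    col = bisect.bisect_right(col_xs, pt[0]) - 1
    return row, col


def fill_cells(chars: List[Tuple[str, Point]], col_xs: List[float],
               row_ys: List[float]) -> Dict[Tuple[int, int], str]:
    """Concatenate characters per cell in (y, x) reading order, assuming y
    grows downward; characters outside the grid are ignored."""
    cells: Dict[Tuple[int, int], str] = {}
    for text, pt in sorted(chars, key=lambda c: (c[1][1], c[1][0])):
        row, col = cell_of(pt, col_xs, row_ys)
        if 0 <= row < len(row_ys) - 1 and 0 <= col < len(col_xs) - 1:
            cells[(row, col)] = cells.get((row, col), "") + text
    return cells


def restore(points: List[Point], center: Point, angle_deg: float) -> List[Point]:
    """Inverse transformation: rotate straightened points counter-clockwise
    by angle_deg (the queue's inclination) about the same center."""
    a = math.radians(angle_deg)
    cx, cy = center
    return [(cx + (x - cx) * math.cos(a) - (y - cy) * math.sin(a),
             cy + (x - cx) * math.sin(a) + (y - cy) * math.cos(a))
            for x, y in points]
```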
7. An electronic form identification device, comprising:
a template construction module, configured to extract form features of a target table to form an identification template;
an element arrangement module, configured to extract and arrange elements on the target table, wherein the elements are line segments;
an intersection comparison module, configured to traverse the line segments and detect whether the corresponding parameters of each line segment match the identification template; and
a table generation module, configured to restore the identified table to its real position according to the form features to obtain the electronic form.
8. The electronic form identification device according to claim 7, wherein the form features comprise:
number features: the number of intersection points on each of the four boundaries of the target table, and the ratios of the spacings between adjacent intersection points; and
layout features: the relative positions and sizes of the text and the target table in the vector graphics file.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the electronic form identification method according to any one of claims 1 to 6.
10. A computer-readable storage medium, storing a computer program which, when executed, implements the electronic form identification method according to any one of claims 1 to 6.
CN202310556330.8A 2023-05-17 2023-05-17 Electronic form identification method, electronic form identification device, electronic equipment and storage medium Active CN116580415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310556330.8A CN116580415B (en) 2023-05-17 2023-05-17 Electronic form identification method, electronic form identification device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116580415A true CN116580415A (en) 2023-08-11
CN116580415B CN116580415B (en) 2023-11-28

Family

ID=87539274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310556330.8A Active CN116580415B (en) 2023-05-17 2023-05-17 Electronic form identification method, electronic form identification device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116580415B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090245645A1 (en) * 2008-03-28 2009-10-01 Smart Technologies Inc. Method and tool for recognizing a hand-drawn table
US20100100803A1 (en) * 2007-06-28 2010-04-22 Fujitsu Limited Computer product, spreadsheet generating apparatus, and spreadsheet generating method
CN101794280A (en) * 2010-03-11 2010-08-04 北京中科辅龙计算机技术股份有限公司 Form automatic generation method and system based on form template set
CN101882225A (en) * 2009-12-29 2010-11-10 北京中科辅龙计算机技术股份有限公司 Engineering drawing material information extraction method based on template
US20170147552A1 (en) * 2015-11-19 2017-05-25 Captricity, Inc. Aligning a data table with a reference table
CN111310682A (en) * 2020-02-24 2020-06-19 民生科技有限责任公司 Universal detection analysis and identification method for text file table
CN111428700A (en) * 2020-06-10 2020-07-17 上海交通大学苏州人工智能研究院 Table identification method and device, electronic equipment and storage medium
WO2021062896A1 (en) * 2019-09-30 2021-04-08 北京市商汤科技开发有限公司 Form recognition method, table extraction method, and relevant apparatus
CN112699651A (en) * 2020-12-31 2021-04-23 上海汇航捷讯网络科技有限公司 Method for restoring Excel layout based on picture

Non-Patent Citations (1)

Title
何国辉; 解正梅: "A fast and practical general table analysis method" (快速实用的通用表格分析方法), Computer Engineering and Design (计算机工程与设计), no. 19, pages 234-236 *

Also Published As

Publication number Publication date
CN116580415B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US8600164B2 (en) Method and tool for recognizing a hand-drawn table
CN111401371A (en) Text detection and identification method and system and computer equipment
US5841902A (en) System and method for unconstrained on-line alpha-numerical handwriting recognition
CN110472208A (en) The method, system of form analysis, storage medium and electronic equipment in PDF document
CN111507251A (en) Method and device for positioning answer area in test question image and electronic equipment
JPH0644408A (en) Method of recognizing hand-printed sign
JPH08510854A (en) Method and apparatus for grouping and manipulating electronic representations of handwriting, printing and drawing
CN112016551A (en) Text detection method and device, electronic equipment and computer storage medium
JPH06348896A (en) Segmenting method for character and device therefor
CN101727580A (en) Image processing apparatus, electronic medium, and image processing method
Uchiyama et al. Toward augmenting everything: Detecting and tracking geometrical features on planar objects
CN110490190A (en) A kind of structured image character recognition method and system
US7427984B2 (en) Point erasing
CN116580415B (en) Electronic form identification method, electronic form identification device, electronic equipment and storage medium
JPH07111739B2 (en) Image processing device
JP3884468B2 (en) Fast image search method
CN107480710B (en) Feature point matching result processing method and device
CN111259888A (en) Image-based information comparison method and device and computer-readable storage medium
JP3884462B2 (en) Fast image search method
CN114694159A (en) Engineering drawing BOM identification method and device, electronic equipment and storage medium
CN115719507A (en) Image identification method and device and electronic equipment
CN113763505A (en) Graph generation method and device, computer equipment and storage medium
CN113361511A (en) Method, device and equipment for establishing correction model and computer readable storage medium
JP4648084B2 (en) Symbol recognition method and apparatus
CN114926668B (en) Deformation target positioning algorithm based on SIFT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant