CN107909064B - Three-line table recognition method, electronic device and storage medium
- Publication number
- CN107909064B; CN201711445372.5A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
Abstract
The invention discloses a three-line table recognition method, an electronic device and a storage medium. The three-line table recognition method includes: extracting all path lines in the current layout page; filtering the extracted path lines to obtain horizontal path lines; grouping the horizontal path lines according to their index values to obtain at least one group of horizontal path lines; for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; and if so, determining that the rectangular area is a three-line table area and generating a table image according to the three-line table area. Because the technical solution checks the rectangular area formed by each group of horizontal path lines against the preset table feature, it can identify three-line tables quickly and accurately, and the table image generated from the identified three-line table area effectively solves problems such as table misalignment caused by file format conversion.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a three-line table recognition method, an electronic device and a storage medium.
Background
With the continuous development of electronic devices such as mobile phones and e-book readers, more and more users like to read e-books. To make an e-book suit electronic devices with different screen sizes, the prior art mostly produces the e-book as a stream-oriented file, so that the e-book content can be edited easily and the line breaks of each paragraph can be adjusted automatically to the screen width to fit the visible range of a single page. Therefore, when an e-book is made from a file in a format such as PDF, the layout file first has to be converted into a stream-oriented file. However, the prior art cannot accurately identify three-line tables. When a layout file contains a three-line table, conversion errors occur easily after the layout file is converted into a stream-oriented file; in particular, when the table is large, problems such as table misalignment tend to appear after conversion.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a three-line table recognition method, an electronic device and a storage medium that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a three-line table recognition method is provided, the method including:
extracting all path lines in the current layout page;
filtering the extracted path lines to obtain horizontal path lines;
grouping the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if so, determining that the rectangular area is a three-line table area, and generating a table image according to the three-line table area.
According to another aspect of the present invention, an electronic device is provided, including: a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
extracting all path lines in the current layout page;
filtering the extracted path lines to obtain horizontal path lines;
grouping the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if so, determining that the rectangular area is a three-line table area, and generating a table image according to the three-line table area.
According to yet another aspect of the present invention, a storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
extracting all path lines in the current layout page;
filtering the extracted path lines to obtain horizontal path lines;
grouping the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if so, determining that the rectangular area is a three-line table area, and generating a table image according to the three-line table area.
According to the technical solution provided by the present invention, all path lines in the current layout page are extracted; the extracted path lines are then filtered to obtain horizontal path lines; the horizontal path lines are grouped according to their index values to obtain at least one group of horizontal path lines; then, for each group of horizontal path lines, it is detected whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if the preset table feature is met, the rectangular area is determined to be a three-line table area, and a table image is generated according to the three-line table area. With this technical solution, the horizontal path lines are grouped based on their index values and the rectangular area formed by each group is checked against the preset table feature, so three-line tables can be identified quickly and accurately. A table image is generated from the three-line table area, so that after the layout file is converted into a stream-oriented file, the converted file can show the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 shows a flow diagram of a three-line table recognition method according to Embodiment one of the present invention;
Fig. 2 shows a flow diagram of a three-line table recognition method according to Embodiment two of the present invention;
Fig. 3 shows a structural schematic diagram of an electronic device according to Embodiment four of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Embodiment one
Fig. 1 shows a flow diagram of a three-line table recognition method according to Embodiment one of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S101: extract all path lines in the current layout page.
The objects in a layout page may include text, path lines and images. To make it easier to identify three-line tables, all path lines in the current layout page need to be extracted in step S101. The current layout page may be a PDF page. Because all path lines in the current layout page are extracted, the extracted path lines may include table lines, header and footer lines, fraction lines in formulas, annotation separator lines, decorative lines of the page background and other path lines.
Step S102: filter the extracted path lines to obtain horizontal path lines.
From the characteristics of three-line tables, it is known that a three-line table is formed by at least three horizontal table lines, whereas the path lines extracted in step S101 may include many kinds of path lines, such as table lines and header and footer lines. To determine quickly which path lines may belong to a three-line table, the extracted path lines need to be filtered so that only the horizontal path lines are retained.
Step S103: group the horizontal path lines according to their index values to obtain at least one group of horizontal path lines.
In a layout page, every object has a corresponding index value. In step S103, the horizontal path lines obtained in step S102 can therefore be grouped according to their index values to obtain at least one group of horizontal path lines. Specifically, horizontal path lines whose index values are consecutive can be placed in the same group, thereby obtaining at least one group of horizontal path lines.
Step S104: select, from the at least one group of horizontal path lines, a group that has not yet been selected.
After the at least one group of horizontal path lines has been obtained, every group needs to be checked in order to identify three-line tables effectively and completely, so a group that has not yet been selected is chosen from the at least one group of horizontal path lines.
Step S105: detect whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if so, execute step S106; if not, execute step S107.
Each group of horizontal path lines consists of multiple horizontal path lines, and the horizontal path lines in the group can form a rectangular area. It is detected whether this rectangular area meets the preset table feature. Those skilled in the art can set the preset table feature according to actual needs, and no limitation is imposed here. Specifically, the preset table feature may include a preset table title feature, a preset table column feature and the like. If the rectangular area is found to meet the preset table feature, step S106 is executed; if the rectangular area is found not to meet the preset table feature, step S107 is executed.
Step S106: determine that the rectangular area is a three-line table area, and generate a table image according to the three-line table area.
When step S105 finds that the rectangular area meets the preset table feature, the rectangular area is determined to be a three-line table area, and a table image is then generated according to the three-line table area. Specifically, the determined three-line table area can be captured as a screenshot to generate the table image. With the technical solution provided by the present invention, three-line tables can be identified quickly and accurately and a table image is generated from the identified three-line table area, so that after the layout file is converted into a stream-oriented file, the converted file can show the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
Step S107: determine that the rectangular area is not a three-line table area.
When step S105 finds that the rectangular area does not meet the preset table feature, the rectangular area is determined not to be a three-line table area.
Step S108: judge whether every group in the at least one group of horizontal path lines has been selected; if so, the method ends; if not, execute step S104.
If it is judged that every group in the at least one group of horizontal path lines has been selected, the check of whether the formed rectangular area meets the preset table feature has been completed for every group, and the method ends. If it is judged that not every group has been selected, step S104 is executed.
With the three-line table recognition method provided by this embodiment, all path lines in the current layout page are extracted; the extracted path lines are then filtered to obtain horizontal path lines; the horizontal path lines are grouped according to their index values to obtain at least one group of horizontal path lines; then, for each group of horizontal path lines, it is detected whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature; if the preset table feature is met, the rectangular area is determined to be a three-line table area, and a table image is generated according to the three-line table area. With this technical solution, the horizontal path lines are grouped based on their index values and the rectangular area formed by each group is checked against the preset table feature, so three-line tables can be identified quickly and accurately. A table image is generated from the identified three-line table area, so that after the layout file is converted into a stream-oriented file, the converted file can show the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
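The control flow of steps S104 to S108 can be sketched as a simple loop over the groups. This is a minimal illustration and not the patented implementation: the tuple layout `(x_left, x_right, y)` for a horizontal path line, the helper `bounding_rect` and the pluggable predicate `meets_table_feature` are hypothetical names introduced only for this sketch.

```python
from typing import Callable, List, Tuple

Line = Tuple[float, float, float]          # (x_left, x_right, y), assumed layout
Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)


def bounding_rect(group: List[Line]) -> Rect:
    """Rectangular area spanned by the horizontal path lines of one group."""
    return (
        min(line[0] for line in group),
        min(line[2] for line in group),
        max(line[1] for line in group),
        max(line[2] for line in group),
    )


def find_table_regions(
    groups: List[List[Line]],
    meets_table_feature: Callable[[Rect], bool],
) -> List[Rect]:
    """Visit every group once (S104/S108) and keep the areas that pass S105."""
    regions = []
    for group in groups:
        rect = bounding_rect(group)
        if meets_table_feature(rect):   # S105: check the preset table feature
            regions.append(rect)        # S106: three-line table area
        # otherwise S107: the area is not a three-line table area
    return regions
```

The predicate is passed in as a parameter because the text leaves the preset table feature to the implementer; generating the table image from each returned rectangle is outside the scope of this sketch.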
Embodiment two
Fig. 2 shows a flow diagram of a three-line table recognition method according to Embodiment two of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S201: extract all path lines in the current layout page.
When three-line tables in the current layout page need to be identified, all path lines in the current layout page are extracted in step S201. The extracted path lines may include table lines, header and footer lines and other path lines.
Step S202: obtain the length and width of each path line.
After all path lines in the current layout page have been extracted, the length and width of each path line need to be obtained in step S202 so that the horizontal path lines can be filtered out of the extracted path lines.
Step S203: according to the length and width of the path lines, filter out the path lines whose length and width meet a preset screening rule, and determine the path lines obtained by this filtering to be horizontal path lines.
Those skilled in the art can set the preset screening rule according to actual needs, and no limitation is imposed here. Specifically, the preset screening rule may include: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold. Assuming that the preset length threshold is the length of a single character, the preset width threshold is 5 pixels and the preset ratio threshold is 10, the path lines whose length is greater than the length of a single character, whose width is less than 5 pixels and whose length-to-width ratio is greater than 10 are filtered out, and the path lines obtained by this filtering are determined to be horizontal path lines.
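The screening rule of step S203 can be sketched as a three-condition filter. The `PathLine` record and the function name are hypothetical; the default thresholds of 5 pixels and a ratio of 10 come from the example in the text, while `min_length` has to be supplied by the caller because the length of a single character depends on the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathLine:
    index: int     # index value of the object in the page
    length: float  # extent along the line, in pixels
    width: float   # thickness of the line, in pixels


def filter_horizontal(lines: List[PathLine], min_length: float,
                      max_width: float = 5.0,
                      min_ratio: float = 10.0) -> List[PathLine]:
    """Keep the path lines that satisfy all three conditions of the rule."""
    return [
        line for line in lines
        if line.length > min_length
        and line.width < max_width
        # length / width > min_ratio, written without division so that
        # a zero-width line does not raise ZeroDivisionError
        and line.length > min_ratio * line.width
    ]
```

For example, with `min_length=12.0` a line of length 200 and width 1 is kept, while a line of length 40 and width 4.5 is dropped because its length-to-width ratio is below 10.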
Step S204: among the horizontal path lines, place the horizontal path lines whose index values are consecutive in one group, obtaining at least one group of horizontal path lines.
Every horizontal path line has a corresponding index value, and the horizontal path lines whose index values are consecutive are placed in one group. If the index values of certain horizontal path lines are consecutive, these horizontal path lines are strongly related and very likely belong to the same three-line table. Assume that step S203 determines 7 horizontal path lines in total, namely horizontal path line 1 to horizontal path line 7, where the index value of horizontal path line 1 is 7, that of horizontal path line 2 is 8, that of horizontal path line 3 is 9, that of horizontal path line 4 is 12, that of horizontal path line 5 is 16, that of horizontal path line 6 is 17 and that of horizontal path line 7 is 18. The index values of horizontal path lines 1 to 3 are consecutive, and the index values of horizontal path lines 5 to 7 are consecutive, so horizontal path lines 1 to 3 are placed in one group and horizontal path lines 5 to 7 are placed in another group.
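The grouping of step S204 amounts to splitting a sorted list of index values wherever two neighbours are not consecutive. The sketch below uses the index values from the example above; the function name is hypothetical. The text does not say how an isolated line such as the one with index value 12 is handled; this sketch assumes it forms a group of its own, which a later check on the number of lines in a group would reject.

```python
from typing import Iterable, List


def group_by_consecutive_index(indices: Iterable[int]) -> List[List[int]]:
    """Split index values into runs in which each value follows the previous one."""
    groups: List[List[int]] = []
    for idx in sorted(indices):
        if groups and idx == groups[-1][-1] + 1:
            groups[-1].append(idx)   # continues the current run
        else:
            groups.append([idx])     # starts a new group
    return groups


print(group_by_consecutive_index([7, 8, 9, 12, 16, 17, 18]))
# prints [[7, 8, 9], [12], [16, 17, 18]]
```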
Step S205: select, from the at least one group of horizontal path lines, a group that has not yet been selected.
Optionally, because a three-line table is formed by at least three horizontal table lines, a three-line table contains at least three horizontal path lines. Based on this feature, before step S206 it can first be detected whether the number of horizontal path lines in the group is greater than or equal to 3. If the number is greater than or equal to 3, the group may be the horizontal path lines of a three-line table, and step S206 is executed for further detection. If the number is less than 3, the group cannot be the horizontal path lines of a three-line table, and step S212 is executed. This helps to reduce the amount of data processed during three-line table recognition.
Step S206: detect whether the left endpoints of the horizontal path lines in the group are aligned with one another and whether their right endpoints are aligned with one another; if so, execute step S207; if not, execute step S212.
To judge rapidly whether the group consists of the horizontal path lines of a three-line table, it can be detected whether the horizontal path lines in the group are aligned on the left and on the right. Specifically, it can be detected whether the left endpoints of the horizontal path lines in the group are aligned and whether their right endpoints are aligned. If the left endpoints are aligned and the right endpoints are aligned, the horizontal path lines in the group are aligned on both sides, and step S207 is executed. If the left endpoints are not aligned and/or the right endpoints are not aligned, the horizontal path lines in the group are not aligned on both sides, and step S212 is executed.
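The optional count check and the endpoint alignment check of step S206 can be sketched as follows. The tuple layout `(x_left, x_right, y)` and the `tolerance` parameter are assumptions of this sketch; the text only requires that the endpoints be aligned and does not state how much deviation is acceptable.

```python
from typing import List, Tuple

Line = Tuple[float, float, float]  # (x_left, x_right, y), assumed layout


def endpoints_aligned(group: List[Line], tolerance: float = 1.0) -> bool:
    """True if all left endpoints align and all right endpoints align."""
    lefts = [line[0] for line in group]
    rights = [line[1] for line in group]
    return (max(lefts) - min(lefts) <= tolerance
            and max(rights) - min(rights) <= tolerance)


def is_candidate_group(group: List[Line], tolerance: float = 1.0) -> bool:
    """Optional count check (at least 3 lines) followed by the S206 check."""
    return len(group) >= 3 and endpoints_aligned(group, tolerance)
```

Running the cheap count check first means groups with fewer than three lines never reach the alignment comparison, in line with the goal of reducing the amount of data processed.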
When the left endpoints and the right endpoints of the horizontal path lines in the group are found to be aligned respectively, it is then detected whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature. Specifically, this can be implemented by steps S207 to S210.
Step S207: extract the first text object within the preset width range corresponding to the rectangular area in the current layout page.
To distinguish the extracted text objects, in the present invention a text object extracted from the preset width range corresponding to the rectangular area in the current layout page is called a first text object, and a text object extracted from inside the rectangular area is called a second text object. Specifically, in step S207 the first text object within the preset width range above the rectangular area in the current layout page can be extracted. Those skilled in the art can set the preset width range according to actual needs, and no limitation is imposed here. For example, the preset width range may be 3 character widths.
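Selecting the first text objects of step S207 can be sketched as a band query above the rectangular area. The `TextObject` record, the function name and the coordinate convention (origin at the top left, y growing downward) are assumptions of this sketch, as is the extra requirement that the text overlap the area horizontally; `band` stands for the preset width range, for example 3 character widths converted to page units.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), y grows downward


@dataclass(frozen=True)
class TextObject:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def first_text_objects(rect: Rect, texts: List[TextObject],
                       band: float) -> List[TextObject]:
    """Text objects whose lower edge lies in the band directly above the area."""
    rect_left, rect_top, rect_right, _ = rect
    return [
        t for t in texts
        if rect_top - band <= t.bottom <= rect_top        # inside the band above the area
        and t.left < rect_right and t.right > rect_left   # overlaps the area horizontally
    ]
```

In a coordinate system where y grows upward, as is common for PDF pages, the band comparison would have to be mirrored.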
Step S208: detect whether the first text object includes a preset table title feature character; if so, execute step S209; if not, execute step S212.
The preset table title feature characters include, but are not limited to, the "table" character, the "Tab" character, the "Table" character and other characters used to indicate a table title. If the first text object is found to include a preset table title feature character, the rectangular area is very likely a three-line table area, and step S209 is executed in order to identify the three-line table more accurately. If the first text object is found not to include a preset table title feature character, step S212 is executed.
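The title check of step S208 can be sketched as a substring test against a small set of markers. The marker tuple is an assumption: "Tab" and "Table" come from the text, and the Chinese character "表" is assumed here to be what the translated `"table" character` refers to. Plain substring matching is also a choice of this sketch; the text does not specify how the feature characters are matched.

```python
from typing import Iterable, Tuple

# Assumed marker set; "表" is a guess at the untranslated "table" character.
TITLE_MARKERS: Tuple[str, ...] = ("表", "Tab", "Table")


def has_table_title(first_texts: Iterable[str],
                    markers: Tuple[str, ...] = TITLE_MARKERS) -> bool:
    """True if any first text object contains a table title feature character."""
    return any(marker in text for text in first_texts for marker in markers)
```

Because "Table" already contains "Tab", the longer marker is redundant under substring matching; it is kept so that the marker set mirrors the list in the text.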
Step S209: extract the second text objects in the rectangular area, and project the second text objects in the vertical direction to obtain a vertical projection.
The second text objects are extracted from inside the rectangular area and then projected in the vertical direction; the projection result is called the vertical projection.
Step S210: detect whether the vertical projection meets the preset table column feature; if so, execute step S211; if not, execute step S212.
Although a three-line table has only horizontal table lines and no vertical table lines, the text objects in the table are still arranged in columns, so the projection of the text objects in the table in the vertical direction can also be divided into several regions. The preset table column feature can be set based on this characteristic. After the vertical projection is obtained in step S209, it can be detected whether the vertical projection meets the preset table column feature. If the vertical projection is found to meet the preset table column feature, step S211 is executed; if the vertical projection is found not to meet the preset table column feature, step S212 is executed.
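The vertical projection of steps S209 and S210 can be sketched by collapsing the horizontal extent of every second text object onto the x-axis and merging overlapping intervals; each merged interval is then one occupied region of the projection. The function names, the `min_gap` parameter and the criterion of at least two separated regions are assumptions of this sketch, since the text leaves the preset table column feature to the implementer.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (left, right) extent of one text object


def project_columns(spans: List[Span], min_gap: float = 2.0) -> List[Span]:
    """Merge x-extents that overlap or lie closer than min_gap to each other."""
    merged: List[List[float]] = []
    for left, right in sorted(spans):
        if merged and left - merged[-1][1] < min_gap:
            merged[-1][1] = max(merged[-1][1], right)  # extend the current region
        else:
            merged.append([left, right])               # a gap starts a new region
    return [(m[0], m[1]) for m in merged]


def meets_column_feature(spans: List[Span], min_columns: int = 2,
                         min_gap: float = 2.0) -> bool:
    """Assumed criterion: the projection splits into at least min_columns regions."""
    return len(project_columns(spans, min_gap)) >= min_columns
```

For instance, text spans at x = 10 to 40, 60 to 95 and 120 to 150 project into three regions, whereas body text that runs across the full width of the area projects into a single region and fails the check.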
Step S211: determine that the rectangular area is a three-line table area, and generate a table image according to the three-line table area.
When the first text object is found to include a preset table title feature character and the vertical projection of the second text objects is found to meet the preset table column feature, the rectangular area can be determined to be a three-line table area. The determined three-line table area can then be captured as a screenshot to generate the table image, so that after the layout file is converted into a stream-oriented file, the converted file can show the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
Step S212: determine that the rectangular area is not a three-line table area.
When step S206 finds that the left endpoints and the right endpoints of the horizontal path lines in the group are not aligned respectively, the rectangular area is determined not to be a three-line table area. In addition, when step S208 finds that the first text object does not include a preset table title feature character, or when step S210 finds that the vertical projection does not meet the preset table column feature, the rectangular area is determined not to be a three-line table area.
Step S213: judge whether every group in the at least one group of horizontal path lines has been selected; if so, the method ends; if not, execute step S205.
If it is judged that every group in the at least one group of horizontal path lines has been selected, the check of whether the formed rectangular area meets the preset table feature has been completed for every group, and the method ends. If it is judged that not every group has been selected, step S205 is executed.
With the three-line table recognition method provided by this embodiment, the horizontal path lines are grouped based on the continuity of their index values, so a group of horizontal path lines that may correspond to the same three-line table can be determined rapidly from many horizontal path lines, which effectively reduces the amount of data processed during three-line table recognition. The rectangular area formed by each group of horizontal path lines is checked against the preset table title feature characters, the preset table column feature and the like, so three-line tables can be identified quickly and accurately, which improves both the efficiency and the accuracy of three-line table recognition. A table image is generated from the identified three-line table area, so that after the layout file is converted into a stream-oriented file, the converted file can show the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
Embodiment three
Embodiment three of the present invention provides a non-volatile storage medium. The storage medium stores at least one executable instruction, and the executable instruction can execute the three-line table recognition method in any of the above method embodiments.
Specifically, the executable instruction can be used to cause a processor to perform the following operations: extracting all path lines in the current layout page; filtering the extracted path lines to obtain horizontal path lines; grouping the horizontal path lines according to their index values to obtain at least one group of horizontal path lines; for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets a preset table feature; if so, determining that the rectangular area is a three-line table area, and generating a table image according to the three-line table area.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations: obtaining the length and width of each path line; according to the length and width of the path lines, filtering out the path lines whose length and width meet a preset screening rule, and determining the path lines obtained by this filtering to be horizontal path lines.
In an optional embodiment, the preset screening rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
In an optional embodiment, the executable instruction further causes the processor to perform the following operation: among the horizontal path lines, placing the horizontal path lines whose index values are consecutive in one group.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations: detecting whether the left endpoints and the right endpoints of the horizontal path lines in the group are aligned respectively; if the left endpoints and the right endpoints of the horizontal path lines in the group are found to be aligned respectively, detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations: extracting the first text object within the preset width range corresponding to the rectangular area in the current layout page; detecting whether the first text object includes a preset table title feature character.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations: extracting the second text objects in the rectangular area; projecting the second text objects in the vertical direction to obtain a vertical projection; detecting whether the vertical projection meets the preset table column feature.
Example IV
Fig. 3 shows a schematic structural diagram of an electronic device according to Embodiment Four of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
As shown in Fig. 3, the electronic device may include: a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
Wherein:
The processor 302, the communications interface 304, and the memory 306 communicate with one another through the communication bus 308.
The communications interface 304 is configured to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute a program 310, and may specifically perform the relevant steps of the above three-line table recognition method embodiments.
Specifically, the program 310 may include program code, and the program code includes computer operation instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the electronic device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is configured to store the program 310. The memory 306 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk storage device.
The program 310 may specifically be used to cause the processor 302 to perform the following operations: extracting all path lines in the current layout page; screening the extracted path lines to obtain horizontal path lines; dividing the horizontal path lines according to their index values to obtain at least one group of horizontal path lines; for each group of horizontal path lines, detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature; and, if so, determining that the rectangular region is a three-line table region and generating a table image according to the three-line table region.
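As an illustration of the rectangular region formed by one group of horizontal path lines, the following minimal sketch computes the bounding rectangle of a group. The `HLine` structure, its field names, and the coordinate convention (y increasing downward) are assumptions made for this example only; the embodiment does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HLine:
    """Hypothetical horizontal path line: two x endpoints and one y position."""
    x0: float
    x1: float
    y: float


def bounding_rect(group: List[HLine]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the region spanned by the group."""
    left = min(min(line.x0, line.x1) for line in group)
    right = max(max(line.x0, line.x1) for line in group)
    top = min(line.y for line in group)
    bottom = max(line.y for line in group)
    return (left, top, right, bottom)


# Three rules of a three-line table: top rule, header rule, bottom rule.
group = [HLine(50, 400, 100), HLine(50, 400, 120), HLine(50, 400, 260)]
rect = bounding_rect(group)
```

The resulting rectangle is the candidate region that the later checks (table title, column layout) would be applied to, and from which a table image could be rendered.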
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: obtaining the length and width of each path line; according to the length and width of the path lines, selecting from the path lines those whose length and width meet a preset screening rule; and determining the selected path lines as horizontal path lines.
In an optional implementation, the preset screening rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operation: among the horizontal path lines, dividing the horizontal path lines whose index values are consecutive into one group of horizontal path lines.
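The grouping step can be sketched as splitting the index values of the horizontal path lines into runs of consecutive integers. This assumes that index values are integers and that "consecutive" means differing by exactly one; the function name is illustrative.

```python
from typing import List


def group_consecutive(indices: List[int]) -> List[List[int]]:
    """Split index values into groups of consecutive integers."""
    groups: List[List[int]] = []
    for i in sorted(indices):
        if groups and i == groups[-1][-1] + 1:
            groups[-1].append(i)   # continues the current run
        else:
            groups.append([i])     # starts a new group
    return groups


groups = group_consecutive([7, 3, 4, 5, 12, 13])
# [[3, 4, 5], [7], [12, 13]]
```

Each resulting group is then examined on its own, so unrelated rules elsewhere on the page do not affect the check of a given candidate table.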
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: detecting whether the left endpoints and the right endpoints of the horizontal path lines in the group are respectively aligned; and, if the left endpoints and the right endpoints of the horizontal path lines in the group are detected to be respectively aligned, detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature.
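The alignment check can be sketched as comparing the spread of the left endpoints and of the right endpoints within a group. The tolerance `tol` is an assumption added for this example to absorb small coordinate differences; the embodiment only requires that the endpoints be aligned.

```python
from typing import List, Tuple


def endpoints_aligned(group: List[Tuple[float, float]], tol: float = 1.0) -> bool:
    """Check that all left x endpoints agree and all right x endpoints agree.

    Each item is the (x0, x1) pair of one horizontal path line.
    """
    if not group:
        return False
    lefts = [min(x0, x1) for x0, x1 in group]
    rights = [max(x0, x1) for x0, x1 in group]
    return (max(lefts) - min(lefts) <= tol) and (max(rights) - min(rights) <= tol)


aligned = endpoints_aligned([(50.0, 400.0), (50.3, 399.8), (50.0, 400.0)])
misaligned = endpoints_aligned([(50.0, 400.0), (120.0, 400.0), (50.0, 400.0)])
```

Only groups that pass this check would proceed to the more expensive table-feature detection.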
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: extracting a first text object located within the preset width range corresponding to the rectangular region in the current layout page; and detecting whether the first text object contains a preset table-title feature character.
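The title check can be sketched as a pattern match on the text found near the rectangular region. The pattern below (a caption beginning with "Table", "Tab." or "表" followed by a number) is an assumed example of a table-title feature character, not one given by the embodiment, and the text objects are assumed to have already been selected from the preset range around the region.

```python
import re
from typing import Iterable

# Assumed title pattern: "Table 1", "Tab. 2", "表3", and similar.
TITLE_PATTERN = re.compile(r"^\s*(?:Table|Tab\.|表)\s*\d+", re.IGNORECASE)


def has_table_title(texts: Iterable[str]) -> bool:
    """Return True if any nearby text object looks like a table caption."""
    return any(TITLE_PATTERN.search(text) is not None for text in texts)


with_title = has_table_title(["Table 2 Results of the experiment"])
chinese_title = has_table_title(["表3 实验结果"])
no_title = has_table_title(["As shown in the figure, the curve rises."])
```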
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: extracting the second text objects in the rectangular region; projecting the second text objects in the vertical direction to obtain a vertical projection; and detecting whether the vertical projection meets a preset table column feature.
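The projection step can be sketched as accumulating the horizontal extents of the text objects onto the x axis and counting the runs separated by empty gaps. Treating "at least two columns separated by blank gaps" as the column feature is an assumption of this example; the embodiment only states that the feature is preset.

```python
from typing import List, Tuple


def vertical_projection(spans: List[Tuple[float, float]],
                        left: float, right: float) -> List[int]:
    """Count, per unit x position in [left, right), how many text spans cover it."""
    width = int(right - left)
    profile = [0] * width
    for x0, x1 in spans:
        start = max(0, int(x0 - left))
        end = min(width, int(x1 - left))
        for x in range(start, end):
            profile[x] += 1
    return profile


def count_columns(profile: List[int]) -> int:
    """Count runs of non-zero values, i.e. text columns separated by gaps."""
    columns = 0
    inside = False
    for value in profile:
        if value > 0 and not inside:
            columns += 1
            inside = True
        elif value == 0:
            inside = False
    return columns


def looks_like_columns(profile: List[int], min_columns: int = 2) -> bool:
    """Assumed column feature: at least `min_columns` separated columns."""
    return count_columns(profile) >= min_columns


# Two rows of cells in three columns, inside a region spanning x = 0..150.
cells = [(10, 40), (60, 90), (110, 140), (12, 38), (62, 88)]
table_profile = vertical_projection(cells, 0, 150)
paragraph_profile = vertical_projection([(5, 145)], 0, 150)
```

A paragraph of running text that happens to sit between two rules produces a single run in the projection, so it would be rejected under this assumed feature.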
For the specific implementation of each step in the program 310, refer to the descriptions of the corresponding steps in the above three-line table recognition embodiments, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method embodiments, and is not described again here.
With the scheme provided by this embodiment, the horizontal path lines are grouped based on their index values, and the rectangular region formed by each group of horizontal path lines is checked against the preset table feature, so that three-line tables can be identified quickly and accurately. A table image is then generated according to the three-line table region, so that after a layout file is converted into a flow-format (reflowable) file, the converted file can present the complete content of the three-line table in image form, which effectively solves problems such as table misalignment caused by file format conversion.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail, so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.
Claims (18)
1. A three-line table recognition method, comprising:
extracting all path lines in a current layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to the index values of the horizontal path lines to obtain at least one group of horizontal path lines, wherein the horizontal path lines whose index values are consecutive are divided into one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular region formed by the horizontal path lines in the group meets a preset table feature; and, if so, determining that the rectangular region is a three-line table region, and generating a table image according to the three-line table region.
2. The method according to claim 1, wherein the screening the extracted path lines to obtain horizontal path lines further comprises:
obtaining the length and width of the path lines;
according to the length and width of the path lines, selecting from the path lines those whose length and width meet a preset screening rule, and determining the selected path lines as horizontal path lines.
3. The method according to claim 2, wherein the preset screening rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
4. The method according to any one of claims 1-3, wherein before the detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature, the method further comprises:
detecting whether the left endpoints and the right endpoints of the horizontal path lines in the group are respectively aligned;
and the detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature is specifically: if the left endpoints and the right endpoints of the horizontal path lines in the group are detected to be respectively aligned, detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature.
5. The method according to any one of claims 1-3, wherein the detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature further comprises:
extracting a first text object located within the preset width range corresponding to the rectangular region in the current layout page;
detecting whether the first text object contains a preset table-title feature character.
6. The method according to any one of claims 1-3, wherein the detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature further comprises:
extracting the second text objects in the rectangular region;
projecting the second text objects in the vertical direction to obtain a vertical projection;
detecting whether the vertical projection meets a preset table column feature.
7. An electronic device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
extracting all path lines in a current layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to the index values of the horizontal path lines to obtain at least one group of horizontal path lines, wherein the horizontal path lines whose index values are consecutive are divided into one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular region formed by the horizontal path lines in the group meets a preset table feature; and, if so, determining that the rectangular region is a three-line table region, and generating a table image according to the three-line table region.
8. The electronic device according to claim 7, wherein the executable instruction further causes the processor to perform the following operations:
obtaining the length and width of the path lines;
according to the length and width of the path lines, selecting from the path lines those whose length and width meet a preset screening rule, and determining the selected path lines as horizontal path lines.
9. The electronic device according to claim 8, wherein the preset screening rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
10. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
detecting whether the left endpoints and the right endpoints of the horizontal path lines in the group are respectively aligned;
if the left endpoints and the right endpoints of the horizontal path lines in the group are detected to be respectively aligned, detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature.
11. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
extracting a first text object located within the preset width range corresponding to the rectangular region in the current layout page;
detecting whether the first text object contains a preset table-title feature character.
12. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
extracting the second text objects in the rectangular region;
projecting the second text objects in the vertical direction to obtain a vertical projection;
detecting whether the vertical projection meets a preset table column feature.
13. A storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
extracting all path lines in a current layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to the index values of the horizontal path lines to obtain at least one group of horizontal path lines, wherein the horizontal path lines whose index values are consecutive are divided into one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular region formed by the horizontal path lines in the group meets a preset table feature; and, if so, determining that the rectangular region is a three-line table region, and generating a table image according to the three-line table region.
14. The storage medium according to claim 13, wherein the executable instruction further causes the processor to perform the following operations:
obtaining the length and width of the path lines;
according to the length and width of the path lines, selecting from the path lines those whose length and width meet a preset screening rule, and determining the selected path lines as horizontal path lines.
15. The storage medium according to claim 14, wherein the preset screening rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
16. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
detecting whether the left endpoints and the right endpoints of the horizontal path lines in the group are respectively aligned;
if the left endpoints and the right endpoints of the horizontal path lines in the group are detected to be respectively aligned, detecting whether the rectangular region formed by the horizontal path lines in the group meets the preset table feature.
17. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
extracting a first text object located within the preset width range corresponding to the rectangular region in the current layout page;
detecting whether the first text object contains a preset table-title feature character.
18. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
extracting the second text objects in the rectangular region;
projecting the second text objects in the vertical direction to obtain a vertical projection;
detecting whether the vertical projection meets a preset table column feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711445372.5A CN107909064B (en) | 2017-12-27 | 2017-12-27 | Three line table recognition methods, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711445372.5A CN107909064B (en) | 2017-12-27 | 2017-12-27 | Three line table recognition methods, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107909064A CN107909064A (en) | 2018-04-13 |
CN107909064B true CN107909064B (en) | 2018-11-16 |
Family
ID=61871745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711445372.5A Active CN107909064B (en) | 2017-12-27 | 2017-12-27 | Three line table recognition methods, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107909064B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310682B (en) * | 2020-02-24 | 2023-05-12 | 民生科技有限责任公司 | Universal detection analysis and recognition method for text file forms |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101064007A (en) * | 2006-04-29 | 2007-10-31 | 北大方正集团有限公司 | Digital correction method for geometric distortion of form image |
CN101770446A (en) * | 2008-12-26 | 2010-07-07 | 北大方正集团有限公司 | Method and system for identifying form in layout file |
CN101887413A (en) * | 2009-05-14 | 2010-11-17 | 北大方正集团有限公司 | Structure processing method and system of plate type table |
CN105589841A (en) * | 2016-01-15 | 2016-05-18 | 同方知网(北京)技术有限公司 | Portable document format (PDF) document form identification method |
CN106446881A (en) * | 2016-07-29 | 2017-02-22 | 北京交通大学 | Method for extracting lab test result from medical lab sheet image |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315668A (en) * | 2008-07-01 | 2008-12-03 | 上海大学 | Automatic detection method for test paper form |
CN101676930A (en) * | 2008-09-17 | 2010-03-24 | 北大方正集团有限公司 | Method and device for recognizing table cells in scanned image |
CN101866335B (en) * | 2010-06-14 | 2012-12-12 | 深圳市万兴软件有限公司 | Form processing method and device in document conversion |
CN103377177B (en) * | 2012-04-27 | 2016-03-30 | 北大方正集团有限公司 | Method and the device of form is identified in a kind of digital layout files |
CN106156761B (en) * | 2016-08-10 | 2020-01-10 | 北京交通大学 | Image table detection and identification method for mobile terminal shooting |
CN107169486B (en) * | 2017-05-12 | 2018-06-15 | 掌阅科技股份有限公司 | The recognition methods of text type page, electronic equipment and computer storage media |
Non-Patent Citations (1)
Title |
---|
Content-based skew correction of document images (基于内容的文档图像倾斜校正); Lü Yajun (吕亚军); Computer Simulation (《计算机仿真》); 2006-12-31; Vol. 23, No. 12; pp. 192-196 * |
Also Published As
Publication number | Publication date |
---|---|
CN107909064A (en) | 2018-04-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20220701. Address after: 518054-13098, 13th floor, main tower of marine center, No. 59, Linhai Avenue, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong. Patentee after: Shenzhen ZhangYue Animation Technology Co.,Ltd. Address before: 100124 2029e, Sihui building, Chaoyang District, Beijing. Patentee before: ZHANGYUE TECHNOLOGY Co.,Ltd. |