CN107909064B - Three-line table recognition method, electronic device and storage medium - Google Patents


Info

Publication number
CN107909064B
Authority
CN
China
Legal status
Active
Application number
CN201711445372.5A
Other languages
Chinese (zh)
Other versions
CN107909064A (en)
Inventor
张恒 (Zhang Heng)
Current Assignee
Shenzhen Zhangyue Animation Technology Co ltd
Original Assignee
Zhangyue Technology Co Ltd
Application filed by Zhangyue Technology Co Ltd
Priority to CN201711445372.5A
Publication of CN107909064A
Application granted
Publication of CN107909064B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Abstract

The invention discloses a three-line table recognition method, an electronic device and a storage medium. The three-line table recognition method includes: extracting all path lines in the current fixed-layout page; screening the extracted path lines to obtain horizontal path lines; dividing the horizontal path lines into at least one group according to their index values; for each group, detecting whether the rectangular area formed by the horizontal path lines in the group meets preset table features; and, if so, determining that the rectangular area is a three-line table region and generating a table image from that region. By checking the rectangular area formed by each group of horizontal path lines against the preset table features, the technical solution identifies three-line tables quickly and accurately, and by generating a table image from each identified region it effectively solves problems such as table misalignment caused by file format conversion.

Description

Three-line table recognition method, electronic device and storage medium
Technical field
The present invention relates to the technical field of information processing, and in particular to a three-line table recognition method, an electronic device and a storage medium.
Background
With the continuous development of electronic devices such as mobile phones and e-book readers, more and more users like to read e-books. To make e-books fit devices with different screen sizes, the prior art mostly produces e-books as reflowable files: the content can then be edited easily, and paragraphs can wrap automatically according to the screen width to fit the visible area of a single page. Therefore, when an e-book is made from a fixed-layout file such as a PDF, the fixed-layout file must first be converted into a reflowable file. However, the prior art cannot accurately identify three-line tables, so when a fixed-layout file containing three-line tables is converted into a reflowable file, conversion errors occur easily; in particular, large tables are prone to misalignment after conversion.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a three-line table recognition method, an electronic device and a storage medium that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a three-line table recognition method is provided, the method comprising:
extracting all path lines in the current fixed-layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets preset table features; if so, determining that the rectangular area is a three-line table region and generating a table image from the three-line table region.
According to another aspect of the present invention, an electronic device is provided, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
extracting all path lines in the current fixed-layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets preset table features; if so, determining that the rectangular area is a three-line table region and generating a table image from the three-line table region.
According to yet another aspect of the present invention, a storage medium is provided, storing at least one executable instruction that causes a processor to perform the following operations:
extracting all path lines in the current fixed-layout page;
screening the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines according to their index values to obtain at least one group of horizontal path lines;
for each group of horizontal path lines, detecting whether the rectangular area formed by the horizontal path lines in the group meets preset table features; if so, determining that the rectangular area is a three-line table region and generating a table image from the three-line table region.
According to the technical solution provided by the present invention, all path lines in the current fixed-layout page are extracted; the extracted path lines are screened to obtain horizontal path lines; the horizontal path lines are divided according to their index values into at least one group; then, for each group, it is detected whether the rectangular area formed by the horizontal path lines in the group meets preset table features, and if it does, the rectangular area is determined to be a three-line table region and a table image is generated from it. The solution groups horizontal path lines by their index values and checks the rectangular area formed by each group against preset table features, so three-line tables can be identified quickly and accurately. Because a table image is generated from each three-line table region, the reflowable file obtained after converting the fixed-layout file can display the full content of each three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
The above is only an overview of the technical solution of the present invention. Specific embodiments are set out below so that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention become more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow chart of a three-line table recognition method according to Embodiment one of the present invention;
Fig. 2 is a flow chart of a three-line table recognition method according to Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to Embodiment four of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and so that the scope of the disclosure can be fully conveyed to those skilled in the art.
Embodiment one
Fig. 1 is a flow chart of a three-line table recognition method according to Embodiment one of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: extract all path lines in the current fixed-layout page.
The objects in a fixed-layout page may include text, path lines and images. To make three-line tables easier to identify, step S101 extracts all path lines in the current fixed-layout page, which may be a PDF page. Because every path line in the page is extracted, the result may include table lines, header and footer lines, fraction bars in formulas, separator lines for annotations, and decorative lines in the page background.
Step S102: screen the extracted path lines to obtain horizontal path lines.
A three-line table consists of at least three horizontal table lines, whereas the path lines extracted in step S101 may include many kinds of lines, such as table lines and header and footer lines. To determine quickly which path lines may belong to a three-line table, the extracted path lines are screened to obtain the horizontal path lines.
Step S103: divide the horizontal path lines according to their index values to obtain at least one group of horizontal path lines.
Each object in a fixed-layout page has a corresponding index value. In step S103, the horizontal path lines obtained in step S102 can therefore be divided according to their index values into at least one group. Specifically, horizontal path lines with consecutive index values can be placed in the same group.
Step S104: select a group of horizontal path lines that has not yet been selected from the at least one group.
After the groups have been obtained, every group must be checked so that three-line tables are identified effectively and completely; a group that has not yet been selected is therefore chosen from the at least one group.
Step S105: detect whether the rectangular area formed by the horizontal path lines in the group meets preset table features; if so, go to step S106; if not, go to step S107.
Each group consists of several horizontal path lines, and together these lines delimit a rectangular area; step S105 detects whether this rectangular area meets the preset table features. Those skilled in the art can configure the preset table features according to actual needs; no limitation is imposed here. Specifically, the preset table features may include a preset table-caption feature and a preset column-division feature. If the rectangular area meets the preset table features, step S106 is executed; otherwise, step S107 is executed.
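As a minimal sketch, the rectangular area delimited by one group of horizontal path lines can be taken as the bounding box of the lines' end points and vertical positions. The `HLine` type, its fields, and the top-left origin with y growing downward are assumptions made for illustration, not structures defined in the patent.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HLine:
    """Hypothetical horizontal path line; y grows downward (top-left origin assumed)."""
    index: int   # object index value on the page
    x0: float    # left end point
    x1: float    # right end point
    y: float     # vertical position of the rule


def bounding_rect(group: Sequence[HLine]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the area spanned by a group of lines."""
    if not group:
        raise ValueError("a group must contain at least one line")
    left = min(line.x0 for line in group)
    right = max(line.x1 for line in group)
    top = min(line.y for line in group)       # top rule of the table
    bottom = max(line.y for line in group)    # bottom rule of the table
    return (left, top, right, bottom)


group = [HLine(7, 50, 550, 100), HLine(8, 50, 550, 120), HLine(9, 50, 550, 300)]
print(bounding_rect(group))  # (50, 100, 550, 300)
```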
Step S106: determine that the rectangular area is a three-line table region, and generate a table image from the three-line table region.
When step S105 detects that the rectangular area meets the preset table features, the rectangular area is determined to be a three-line table region, and a table image is generated from it. Specifically, the identified three-line table region can be captured as a screenshot to produce the table image. With the technical solution provided by the present invention, three-line tables can be identified quickly and accurately and a table image generated from each identified region; after the fixed-layout file is converted into a reflowable file, the reflowable file can then display the complete content of each three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
Step S107: determine that the rectangular area is not a three-line table region.
When step S105 detects that the rectangular area does not meet the preset table features, the rectangular area is determined not to be a three-line table region.
Step S108: judge whether every group in the at least one group of horizontal path lines has been selected; if so, the method ends; if not, return to step S104.
If every group has been selected, the check of whether its rectangular area meets the preset table features has been completed for all groups, and the method ends; if some group has not yet been selected, step S104 is executed again.
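The select-check-repeat loop of steps S104 to S108 amounts to visiting each group exactly once. The sketch below assumes each line is an `(x0, x1, y)` tuple and leaves the feature check as a caller-supplied predicate, since the patent leaves the preset table features configurable; `find_three_line_tables` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

Line = Tuple[float, float, float]          # (x0, x1, y): left end, right end, vertical position
Rect = Tuple[float, float, float, float]   # (left, top, right, bottom), y grows downward


def find_three_line_tables(
    groups: Sequence[Sequence[Line]],
    meets_table_features: Callable[[Rect], bool],
) -> List[Rect]:
    """Visit every group once and return the rectangles accepted as table regions."""
    tables: List[Rect] = []
    for group in groups:                              # S104/S108: each group selected once
        rect = (
            min(x0 for x0, _, _ in group),
            min(y for _, _, y in group),
            max(x1 for _, x1, _ in group),
            max(y for _, _, y in group),
        )
        if meets_table_features(rect):                # S105
            tables.append(rect)                       # S106: three-line table region
        # otherwise S107: not a table region; continue with the next group
    return tables


groups = [
    [(50, 550, 100), (50, 550, 120), (50, 550, 300)],
    [(60, 200, 400)],
]
print(find_three_line_tables(groups, lambda r: r[3] - r[1] > 50))
# [(50, 100, 550, 300)]
```

The lambda above is only a placeholder check (rectangle taller than 50 units); in practice it would be replaced by the caption and column tests described in Embodiment two.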
With the three-line table recognition method provided by this embodiment, all path lines in the current fixed-layout page are extracted and screened to obtain horizontal path lines; the horizontal path lines are divided into at least one group according to their index values; and for each group, it is detected whether the rectangular area formed by its lines meets preset table features, in which case the rectangular area is determined to be a three-line table region and a table image is generated from it. Grouping the horizontal path lines by index value and checking each group's rectangular area against preset table features allows three-line tables to be identified quickly and accurately. Because a table image is generated from each identified region, the reflowable file obtained after converting the fixed-layout file can display the full content of each three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
Embodiment two
Fig. 2 is a flow chart of a three-line table recognition method according to Embodiment two of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step S201: extract all path lines in the current fixed-layout page.
When three-line tables in the current fixed-layout page need to be identified, step S201 extracts all path lines in the page; these may include table lines, header and footer lines, and other path lines.
Step S202: obtain the length and width of each path line.
After all path lines in the page have been extracted, step S202 obtains the length and width of each path line, so that horizontal path lines can be screened out of the extracted lines.
Step S203: according to the length and width of each path line, select the path lines whose length and width satisfy a preset screening rule, and determine the selected path lines to be horizontal path lines.
Those skilled in the art can configure the preset screening rule according to actual needs; no limitation is imposed here. Specifically, the preset screening rule may include: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold. Suppose the preset length threshold is the length of a single character, the preset width threshold is 5 pixels, and the preset ratio threshold is 10; then the path lines that are longer than a single character, narrower than 5 pixels, and have a length-to-width ratio greater than 10 are selected and determined to be horizontal path lines.
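A minimal sketch of the screening rule in step S203, using the example thresholds from the text. It assumes each path line is given by its bounding box `(x0, y0, x1, y1)` and that "length" means the horizontal extent and "width" the vertical extent; using the horizontal extent is what keeps vertical lines out. The function name and box layout are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) bounding box of a path line


def screen_horizontal_lines(
    paths: List[Box],
    char_length: float,        # preset length threshold: the length of one character
    max_width: float = 5.0,    # preset width threshold, in pixels
    min_ratio: float = 10.0,   # preset length/width ratio threshold
) -> List[Box]:
    kept = []
    for x0, y0, x1, y1 in paths:
        length = x1 - x0   # horizontal extent of the path
        width = y1 - y0    # vertical extent, i.e. stroke thickness for a horizontal rule
        # length > min_ratio * width is the ratio test without dividing by a zero width
        if length > char_length and width < max_width and length > min_ratio * width:
            kept.append((x0, y0, x1, y1))
    return kept


paths = [(50, 100, 550, 101), (300, 50, 301, 400), (10, 10, 15, 11)]
print(screen_horizontal_lines(paths, char_length=12))
# [(50, 100, 550, 101)]
```

The vertical rule `(300, 50, 301, 400)` and the short stroke `(10, 10, 15, 11)` both fail the length test and are dropped.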
Step S204: among the horizontal path lines, place the lines with consecutive index values into the same group, obtaining at least one group of horizontal path lines.
Each horizontal path line has a corresponding index value, and horizontal path lines whose index values are consecutive are placed in one group: if the index values of some horizontal path lines are consecutive, those lines are strongly related and very likely belong to the same three-line table. Suppose step S203 yields 7 horizontal path lines, numbered 1 to 7, whose index values are 7, 8, 9, 12, 16, 17 and 18 respectively. The index values of lines 1 to 3 are consecutive, and so are those of lines 5 to 7. Lines 1 to 3 are therefore placed in one group and lines 5 to 7 in another.
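Grouping by consecutive index values can be sketched as a single pass over the sorted indices, starting a new group whenever the next index is not exactly one more than the previous one. The function name is hypothetical; the input reproduces the worked example above.

```python
from typing import Iterable, List


def group_by_consecutive_index(indices: Iterable[int]) -> List[List[int]]:
    """Split index values into runs of consecutive integers."""
    groups: List[List[int]] = []
    for idx in sorted(indices):
        if groups and idx == groups[-1][-1] + 1:
            groups[-1].append(idx)   # continues the current run
        else:
            groups.append([idx])     # gap in the indices: start a new group
    return groups


print(group_by_consecutive_index([7, 8, 9, 12, 16, 17, 18]))
# [[7, 8, 9], [12], [16, 17, 18]]
```

The isolated line with index 12 forms a group of its own; the optional line-count check before step S206 would discard it because it has fewer than three lines.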
Step S205: select a group of horizontal path lines that has not yet been selected from the at least one group.
Optionally, because a three-line table consists of at least three horizontal table lines, it contains at least three horizontal path lines. Based on this feature, before step S206 it can first be checked whether the group contains three or more horizontal path lines. If it does, the group may correspond to a three-line table, and step S206 is executed for further checking; if it contains fewer than three, the group cannot correspond to a three-line table, and step S212 is executed. This reduces the amount of data processed during three-line table recognition.
Step S206: detect whether the left end points of the horizontal path lines in the group are aligned with one another and whether their right end points are aligned with one another; if so, go to step S207; if not, go to step S212.
To judge quickly whether the group corresponds to a three-line table, it can be checked whether the horizontal path lines in the group are aligned on both the left and the right; specifically, whether their left end points are aligned and whether their right end points are aligned. If both the left and the right end points are aligned, the lines in the group are left- and right-aligned, and step S207 is executed; if the left end points and/or the right end points are not aligned, the lines are not aligned, and step S212 is executed.
When the left and right end points of the horizontal path lines in the group are both aligned, it is then detected whether the rectangular area formed by these lines meets the preset table features. Specifically, this can be done through steps S207 to S210.
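The optional line-count check and the end-point alignment check of step S206 can be combined into one predicate. This is a sketch under two assumptions not stated in the patent: lines are `(x0, x1, y)` tuples, and "aligned" means all left (or right) end points lie within a small tolerance `tol` of each other, since real PDF coordinates rarely match exactly.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float, float]  # (x0, x1, y): left end, right end, vertical position


def is_candidate_group(lines: Sequence[Line], tol: float = 1.0, min_lines: int = 3) -> bool:
    """Optional count check before S206, then the left/right end-point alignment check."""
    if len(lines) < min_lines:      # fewer than three lines cannot form a three-line table
        return False
    lefts = [x0 for x0, _, _ in lines]
    rights = [x1 for _, x1, _ in lines]
    left_aligned = max(lefts) - min(lefts) <= tol      # hypothetical tolerance rule
    right_aligned = max(rights) - min(rights) <= tol
    return left_aligned and right_aligned


aligned = [(50, 550, 100), (50.4, 549.8, 120), (50, 550, 300)]
ragged = [(50, 550, 100), (80, 550, 120), (50, 550, 300)]
print(is_candidate_group(aligned), is_candidate_group(ragged), is_candidate_group(aligned[:2]))
# True False False
```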
Step S207: extract the first text objects within a predetermined width range corresponding to the rectangular area in the current fixed-layout page.
To distinguish the extracted text objects, the present invention calls the text objects extracted from the predetermined width range corresponding to the rectangular area the first text objects, and the text objects extracted from inside the rectangular area the second text objects. Specifically, step S207 can extract the first text objects within the predetermined width range above the rectangular area in the current fixed-layout page. Those skilled in the art can configure the predetermined width range according to actual needs; no limitation is imposed here. For example, the predetermined width range may be 3 character widths.
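One way to read "the predetermined width range above the rectangular area" is as a band of fixed height (here 3 character sizes) directly above the table's top rule. The sketch below makes that interpretation explicit; the tuple layouts, the downward-growing y axis, and the requirement that a text box overlap the table horizontally are all assumptions.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]           # (left, top, right, bottom), y grows downward
TextObj = Tuple[float, float, float, float, str]   # (x0, y0, x1, y1, text)


def first_text_objects(rect: Rect, texts: Sequence[TextObj],
                       char_size: float, n_chars: int = 3) -> List[str]:
    """Return text lying in a band n_chars * char_size tall directly above rect
    and overlapping it horizontally."""
    left, top, right, _ = rect
    band_top = top - n_chars * char_size
    found = []
    for x0, y0, x1, y1, text in texts:
        inside_band = y0 >= band_top and y1 <= top
        overlaps_horizontally = x1 > left and x0 < right
        if inside_band and overlaps_horizontally:
            found.append(text)
    return found


texts = [
    (200, 70, 400, 90, "Table 1 Results"),   # caption just above the top rule
    (60, 10, 300, 30, "Running header"),     # too far above the table
    (60, 150, 200, 165, "cell text"),        # inside the table, not above it
]
print(first_text_objects((50, 100, 550, 300), texts, char_size=12))
# ['Table 1 Results']
```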
Step S208: detect whether the first text objects contain preset table-caption feature characters; if so, go to step S209; if not, go to step S212.
The preset table-caption feature characters include, but are not limited to, characters that mark a table caption, such as "表" ("table"), "Tab" and "Table". If the first text objects contain preset table-caption feature characters, the rectangular area is probably a three-line table region, and step S209 is executed to identify the table more precisely; if they do not, step S212 is executed.
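A minimal sketch of the caption test in step S208, using the marker characters listed above. Matching only at the start of each text object is an assumed rule (captions usually begin with the marker); plain substring matching would also satisfy the patent's wording but would accept words such as "Tablet".

```python
from typing import Iterable, Tuple

# Markers from the examples in the text; "表" is the Chinese character for "table".
CAPTION_MARKERS: Tuple[str, ...] = ("表", "Tab", "Table")


def has_caption_marker(first_texts: Iterable[str],
                       markers: Tuple[str, ...] = CAPTION_MARKERS) -> bool:
    """True if any first text object starts with a caption marker (assumed rule)."""
    return any(text.lstrip().startswith(markers) for text in first_texts)


print(has_caption_marker(["Table 1 Results"]))         # True
print(has_caption_marker(["表2 样本统计"]))              # True
print(has_caption_marker(["See the results below"]))   # False
```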
Step S209: extract the second text objects in the rectangular area, and project them in the vertical direction to obtain a vertical projection.
The second text objects are extracted from inside the rectangular area and projected in the vertical direction; the resulting projection is called the vertical projection.
Step S210: detect whether the vertical projection meets the preset column-division feature; if so, go to step S211; if not, go to step S212.
Although a three-line table has only horizontal table lines and no vertical ones, the text in the table is still arranged in columns, so the projection of the table's text objects in the vertical direction also splits into several separate regions. The preset column-division feature can be set based on this property: after the vertical projection is obtained in step S209, it is detected whether the projection meets the preset column-division feature. If it does, step S211 is executed; if not, step S212 is executed.
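Steps S209 and S210 can be sketched by projecting each text box onto the x axis, merging overlapping intervals, and counting the separate runs that remain. The patent does not define the column-division feature; "at least `min_columns` runs separated by gaps wider than `min_gap`" is a hypothetical stand-in, and the box layout `(x0, y0, x1, y1)` is assumed.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) of a second text object


def vertical_projection(boxes: Sequence[Box]) -> List[Tuple[float, float]]:
    """Project text boxes onto the x axis and merge overlapping intervals."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted((x0, x1) for x0, _, x1, _ in boxes):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))   # extend current run
        else:
            merged.append((start, end))                             # a gap: new run
    return merged


def has_column_layout(boxes: Sequence[Box], min_columns: int = 2, min_gap: float = 0.0) -> bool:
    """Hypothetical column-division test on the vertical projection."""
    runs = vertical_projection(boxes)
    columns = 1 if runs else 0
    for (_, prev_end), (start, _) in zip(runs, runs[1:]):
        if start - prev_end > min_gap:   # gaps narrower than min_gap do not count as breaks
            columns += 1
    return columns >= min_columns


table_cells = [(60, 110, 150, 125), (60, 130, 140, 145), (300, 110, 400, 125), (300, 130, 420, 145)]
paragraph = [(60, 110, 500, 125), (60, 130, 480, 145)]
print(vertical_projection(table_cells))   # [(60, 150), (300, 420)]
print(has_column_layout(table_cells), has_column_layout(paragraph))   # True False
```

Two separated runs indicate two columns of text, while a paragraph projects onto a single run and is rejected.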
Step S211: determine that the rectangular area is a three-line table region, and generate a table image from the three-line table region.
When the first text objects contain preset table-caption feature characters and the vertical projection of the second text objects meets the preset column-division feature, the rectangular area can be determined to be a three-line table region. A screenshot of the region is then taken to generate the table image, so that after the fixed-layout file is converted into a reflowable file, the reflowable file can display the complete content of the three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
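Taking the screenshot requires mapping the table region from page coordinates to pixels of a rendered page. The sketch below assumes a uniform `scale` (pixels per page unit) and pads the region by a small `margin` so the stroke thickness of the top and bottom rules is not clipped; both parameters are illustrative assumptions.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (left, top, right, bottom) in page units


def crop_box(rect: Rect, scale: float, margin: float = 2.0) -> Tuple[int, int, int, int]:
    """Pixel box (left, upper, right, lower) for cropping a page rendered at
    `scale` pixels per page unit, padded by `margin` page units and clamped at 0."""
    left, top, right, bottom = rect
    return (
        max(0, math.floor((left - margin) * scale)),
        max(0, math.floor((top - margin) * scale)),
        math.ceil((right + margin) * scale),
        math.ceil((bottom + margin) * scale),
    )


print(crop_box((50, 100, 550, 300), scale=2))
# (96, 196, 1104, 604)
```

If the page is rendered to a Pillow image, a box in this `(left, upper, right, lower)` order is what `Image.crop` accepts; for PDF pages measured in points, `scale` would be the rendering DPI divided by 72.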
Step S212: determine that the rectangular area is not a three-line table region.
The rectangular area is determined not to be a three-line table region in any of the following cases: step S206 detects that the left or right end points of the horizontal path lines in the group are not aligned; step S208 detects that the first text objects do not contain preset table-caption feature characters; or step S210 detects that the vertical projection does not meet the preset column-division feature.
Step S213: judge whether every group in the at least one group of horizontal path lines has been selected; if so, the method ends; if not, return to step S205.
If every group has been selected, the check of whether its rectangular area meets the preset table features has been completed for all groups, and the method ends; if some group has not yet been selected, step S205 is executed again.
With the three-line table recognition method provided by this embodiment, the horizontal path lines are grouped by the continuity of their index values, so that the lines likely to belong to the same three-line table can be determined quickly from among many horizontal path lines, which effectively reduces the amount of data processed during recognition. Checking the rectangular area formed by each group against the preset table-caption feature characters, the preset column-division feature and similar features allows three-line tables to be identified quickly and accurately, improving recognition efficiency and accuracy. Finally, a table image is generated from each identified three-line table region, so that after the fixed-layout file is converted into a reflowable file, the reflowable file can display the complete content of each three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
Embodiment three
The embodiment of the present invention three provides a kind of non-volatile memory medium, and storage medium is stored at least one executable finger It enables, which can be performed three line table recognition methods in above-mentioned any means embodiment.
Executable instruction specifically can be used for so that processor executes following operation:It extracts in current layout page and owns Path-line;Screening Treatment is carried out to the path-line extracted, obtains horizontal route line;According to the index of horizontal route line Value, carries out division processing to horizontal route line, obtains at least one set of horizontal route line;For every group of horizontal route line, detection by Whether the rectangular area of each horizontal route line composition in this group of horizontal route line meets preset table feature;If so, really Determining rectangular area is three line table sections, and according to three line table sections, generates form image.
In a kind of optional embodiment, executable instruction further makes processor execute following operation:To acquisite approachs The length and width of line;According to the length and width of path-line, length and width is filtered out from path-line and meets default screening The path-line that screening obtains is determined as horizontal route line by the path-line of rule.
In a kind of optional embodiment, default screening rule includes:Length is greater than pre-set length threshold, width is less than Ratio between predetermined width threshold value and length and width is greater than preset ratio threshold value.
In a kind of optional embodiment, executable instruction further makes processor execute following operation:On horizontal road The consecutive horizontal route line of index value is divided into one group of horizontal route line in radial line.
In a kind of optional embodiment, executable instruction further makes processor execute following operation:Detect the group Whether the left end point and right endpoint of each horizontal route line in horizontal route line are aligned respectively;If detection obtains the horizontal road of the group The left end point and right endpoint of each horizontal route line in radial line are aligned respectively, then detection is by each item in this group of horizontal route line Whether the rectangular area of horizontal route line composition meets preset table feature.
In an optional implementation, the executable instruction further causes the processor to perform the following operations: extracting a first text object in the current layout page that lies within a preset width range corresponding to the rectangular area; and detecting whether the first text object contains a preset table-caption feature character.
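The caption check above tests the nearby text for "preset table-caption feature characters", which the text does not list. This Python sketch assumes the first text object has already been extracted as a plain string and uses a hypothetical regular expression that looks for a caption marker followed by a table number.

```python
import re

# Hypothetical caption pattern: a caption marker followed by a table number,
# e.g. "表3" or "Table 3". The actual preset characters are not specified.
CAPTION_PATTERN = re.compile(r"(表|Table|Tab\.)\s*\d+")


def has_table_caption(first_text: str) -> bool:
    """Check whether text near the rectangle contains a table-caption marker."""
    return CAPTION_PATTERN.search(first_text) is not None


print(has_table_caption("表3 各地区销售额"))      # True
print(has_table_caption("Table 2. Test results"))  # True
print(has_table_caption("见下方表格"))            # False: 表 is not followed by a number
```

Requiring a number after the marker avoids matching ordinary words that merely contain 表.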
In an optional implementation, the executable instruction further causes the processor to perform the following operations: extracting a second text object within the rectangular area; projecting the second text object in the vertical direction to obtain a vertical projection; and detecting whether the vertical projection meets a preset table column-division feature.
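A vertical projection collapses the text in the rectangle onto the x axis, so blank vertical strips between table columns show up as runs of empty bins. This Python sketch assumes each second text object is reduced to its horizontal span `(x0, x1)`, and takes the column-division feature to mean "at least two columns separated by a gap of at least `min_gap` bins". The bin width, gap size, and column count are hypothetical parameters.

```python
import math
from typing import List, Tuple


def vertical_projection(spans: List[Tuple[float, float]], left: float, right: float,
                        step: float = 1.0) -> List[int]:
    """For each x bin of width `step` in [left, right), count the text spans covering it."""
    n_bins = max(1, math.ceil((right - left) / step))
    profile = [0] * n_bins
    for x0, x1 in spans:
        first = max(0, math.floor((x0 - left) / step))
        last = min(n_bins, math.ceil((x1 - left) / step))  # exclusive
        for b in range(first, last):
            profile[b] += 1
    return profile


def count_columns(profile: List[int], min_gap: int = 3) -> int:
    """Count runs of non-empty bins; gaps narrower than `min_gap` (>= 1) do not split columns."""
    columns = 0
    gap = min_gap  # treat the area left of the first bin as a gap
    for value in profile:
        if value > 0:
            if gap >= min_gap:
                columns += 1
            gap = 0
        else:
            gap += 1
    return columns


def meets_column_feature(spans: List[Tuple[float, float]], left: float, right: float,
                         min_columns: int = 2) -> bool:
    return count_columns(vertical_projection(spans, left, right)) >= min_columns


print(meets_column_feature([(0, 40), (0, 35), (60, 100), (62, 98)], 0, 100))  # True
print(meets_column_feature([(0, 40), (41, 100)], 0, 100))  # False: 1-bin gap is too narrow
```

The minimum gap keeps ordinary word spacing inside a single column from being mistaken for a column boundary.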
Embodiment 4
Fig. 3 shows a schematic structural diagram of an electronic device according to Embodiment 4 of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
As shown in Fig. 3, the electronic device may include: a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
Wherein:
The processor 302, the communications interface 304, and the memory 306 communicate with one another via the communication bus 308.
The communications interface 304 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute a program 310, and may specifically perform the relevant steps of the three-line table recognition method embodiments described above.
Specifically, the program 310 may include program code, and the program code includes computer operation instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the electronic device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is configured to store the program 310. The memory 306 may include high-speed RAM and may also include non-volatile memory, for example at least one disk storage.
The program 310 may specifically cause the processor 302 to perform the following operations: extracting all path lines in the current layout page; filtering the extracted path lines to obtain horizontal path lines; dividing the horizontal path lines into at least one group according to their index values; for each group, detecting whether the rectangular area formed by the horizontal path lines in that group meets a preset table feature; and, if so, determining that the rectangular area is a three-line table region and generating a table image from that region.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: obtaining the length and width of each path line; and, based on those lengths and widths, selecting from the path lines those whose length and width satisfy a preset filtering rule, and taking the selected path lines as the horizontal path lines.
In an optional implementation, the preset filtering rule includes: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of length to width is greater than a preset ratio threshold.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operation: assigning horizontal path lines whose index values are consecutive to the same group of horizontal path lines.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: detecting whether the left endpoints of the horizontal path lines in the group are aligned with one another and whether their right endpoints are aligned with one another; and, only if both the left endpoints and the right endpoints are aligned, detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: extracting a first text object in the current layout page that lies within a preset width range corresponding to the rectangular area; and detecting whether the first text object contains a preset table-caption feature character.
In an optional implementation, the program 310 further causes the processor 302 to perform the following operations: extracting a second text object within the rectangular area; projecting the second text object in the vertical direction to obtain a vertical projection; and detecting whether the vertical projection meets a preset table column-division feature.
For the specific implementation of each step in the program 310, refer to the corresponding descriptions of the steps in the three-line table recognition embodiments above; details are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process descriptions in the foregoing method embodiments, and details are not repeated here.
With the solution provided by this embodiment, horizontal path lines are grouped by their index values, and the rectangular area formed by each group is checked against preset table features. This allows three-line tables to be identified quickly and accurately, and a table image is generated from each three-line table region. As a result, after a layout file is converted into a reflowable (stream) file, the converted file can display the complete contents of each three-line table as an image, which effectively solves problems such as table misalignment caused by file format conversion.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. As described above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (18)

1. A three-line table recognition method, comprising:
extracting all path lines in a current layout page;
filtering the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines into at least one group of horizontal path lines according to index values of the horizontal path lines, wherein horizontal path lines whose index values are consecutive are assigned to the same group; and
for each group of horizontal path lines, detecting whether a rectangular area formed by the horizontal path lines in the group meets a preset table feature; and if so, determining that the rectangular area is a three-line table region, and generating a table image according to the three-line table region.
2. The method according to claim 1, wherein the filtering the extracted path lines to obtain horizontal path lines further comprises:
obtaining a length and a width of each path line; and
selecting from the path lines, according to their lengths and widths, the path lines whose length and width satisfy a preset filtering rule, and determining the selected path lines as the horizontal path lines.
3. The method according to claim 2, wherein the preset filtering rule comprises: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of the length to the width is greater than a preset ratio threshold.
4. The method according to any one of claims 1-3, wherein, before the detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature, the method further comprises:
detecting whether the left endpoints of the horizontal path lines in the group are aligned with one another and whether their right endpoints are aligned with one another;
and the detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature specifically comprises: if it is detected that the left endpoints of the horizontal path lines in the group are aligned and their right endpoints are aligned, detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature.
5. The method according to any one of claims 1-3, wherein the detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature further comprises:
extracting a first text object in the current layout page that lies within a preset width range corresponding to the rectangular area; and
detecting whether the first text object contains a preset table-caption feature character.
6. The method according to any one of claims 1-3, wherein the detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature further comprises:
extracting a second text object within the rectangular area;
projecting the second text object in the vertical direction to obtain a vertical projection; and
detecting whether the vertical projection meets a preset table column-division feature.
7. An electronic device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another via the communication bus; and
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
extracting all path lines in a current layout page;
filtering the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines into at least one group of horizontal path lines according to index values of the horizontal path lines, wherein horizontal path lines whose index values are consecutive are assigned to the same group; and
for each group of horizontal path lines, detecting whether a rectangular area formed by the horizontal path lines in the group meets a preset table feature; and if so, determining that the rectangular area is a three-line table region, and generating a table image according to the three-line table region.
8. The electronic device according to claim 7, wherein the executable instruction further causes the processor to perform the following operations:
obtaining a length and a width of each path line; and
selecting from the path lines, according to their lengths and widths, the path lines whose length and width satisfy a preset filtering rule, and determining the selected path lines as the horizontal path lines.
9. The electronic device according to claim 8, wherein the preset filtering rule comprises: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of the length to the width is greater than a preset ratio threshold.
10. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
detecting whether the left endpoints of the horizontal path lines in the group are aligned with one another and whether their right endpoints are aligned with one another; and
if it is detected that the left endpoints of the horizontal path lines in the group are aligned and their right endpoints are aligned, detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature.
11. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
extracting a first text object in the current layout page that lies within a preset width range corresponding to the rectangular area; and
detecting whether the first text object contains a preset table-caption feature character.
12. The electronic device according to any one of claims 7-9, wherein the executable instruction further causes the processor to perform the following operations:
extracting a second text object within the rectangular area;
projecting the second text object in the vertical direction to obtain a vertical projection; and
detecting whether the vertical projection meets a preset table column-division feature.
13. A storage medium storing at least one executable instruction, wherein the executable instruction causes a processor to perform the following operations:
extracting all path lines in a current layout page;
filtering the extracted path lines to obtain horizontal path lines;
dividing the horizontal path lines into at least one group of horizontal path lines according to index values of the horizontal path lines, wherein horizontal path lines whose index values are consecutive are assigned to the same group; and
for each group of horizontal path lines, detecting whether a rectangular area formed by the horizontal path lines in the group meets a preset table feature; and if so, determining that the rectangular area is a three-line table region, and generating a table image according to the three-line table region.
14. The storage medium according to claim 13, wherein the executable instruction further causes the processor to perform the following operations:
obtaining a length and a width of each path line; and
selecting from the path lines, according to their lengths and widths, the path lines whose length and width satisfy a preset filtering rule, and determining the selected path lines as the horizontal path lines.
15. The storage medium according to claim 14, wherein the preset filtering rule comprises: the length is greater than a preset length threshold, the width is less than a preset width threshold, and the ratio of the length to the width is greater than a preset ratio threshold.
16. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
detecting whether the left endpoints of the horizontal path lines in the group are aligned with one another and whether their right endpoints are aligned with one another; and
if it is detected that the left endpoints of the horizontal path lines in the group are aligned and their right endpoints are aligned, detecting whether the rectangular area formed by the horizontal path lines in the group meets the preset table feature.
17. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
extracting a first text object in the current layout page that lies within a preset width range corresponding to the rectangular area; and
detecting whether the first text object contains a preset table-caption feature character.
18. The storage medium according to any one of claims 13-15, wherein the executable instruction further causes the processor to perform the following operations:
extracting a second text object within the rectangular area;
projecting the second text object in the vertical direction to obtain a vertical projection; and
detecting whether the vertical projection meets a preset table column-division feature.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711445372.5A CN107909064B (en) 2017-12-27 2017-12-27 Three line table recognition methods, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN107909064A CN107909064A (en) 2018-04-13
CN107909064B true CN107909064B (en) 2018-11-16

Family

ID=61871745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711445372.5A Active CN107909064B (en) 2017-12-27 2017-12-27 Three line table recognition methods, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN107909064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310682B (en) * 2020-02-24 2023-05-12 民生科技有限责任公司 Universal detection analysis and recognition method for text file forms

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101064007A (en) * 2006-04-29 2007-10-31 北大方正集团有限公司 Digital correction method for geometric distortion of form image
CN101770446A (en) * 2008-12-26 2010-07-07 北大方正集团有限公司 Method and system for identifying form in layout file
CN101887413A (en) * 2009-05-14 2010-11-17 北大方正集团有限公司 Structure processing method and system of plate type table
CN105589841A (en) * 2016-01-15 2016-05-18 同方知网(北京)技术有限公司 Portable document format (PDF) document form identification method
CN106446881A (en) * 2016-07-29 2017-02-22 北京交通大学 Method for extracting lab test result from medical lab sheet image

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN101315668A (en) * 2008-07-01 2008-12-03 上海大学 Automatic detection method for test paper form
CN101676930A (en) * 2008-09-17 2010-03-24 北大方正集团有限公司 Method and device for recognizing table cells in scanned image
CN101866335B (en) * 2010-06-14 2012-12-12 深圳市万兴软件有限公司 Form processing method and device in document conversion
CN103377177B (en) * 2012-04-27 2016-03-30 北大方正集团有限公司 Method and the device of form is identified in a kind of digital layout files
CN106156761B (en) * 2016-08-10 2020-01-10 北京交通大学 Image table detection and identification method for mobile terminal shooting
CN107169486B (en) * 2017-05-12 2018-06-15 掌阅科技股份有限公司 The recognition methods of text type page, electronic equipment and computer storage media

Non-Patent Citations (1)

Title
Content-based skew correction of document images (基于内容的文档图像倾斜校正); Lü Yajun (吕亚军); Computer Simulation (《计算机仿真》); 2006-12-31; Vol. 23, No. 12; pp. 192-196 *

Also Published As

Publication number Publication date
CN107909064A (en) 2018-04-13

Similar Documents

Publication Publication Date Title
CN105786930B (en) Based on the searching method and device for touching interaction
JP2016524229A (en) Search recommendation method and apparatus
CN104503957B (en) A kind of formula graphic automatic generation method and device
CN106294222A (en) A kind of method and device determining PCIE device and slot corresponding relation
CN101169787A (en) Information processing apparatus and control method for the same
CN105022757A (en) Webpage revision method and webpage revision device
CN105528332A (en) A presentation file processing method and device
CN103500158A (en) Method and device for annotating electronic document
CN105260674A (en) Screen capture processing method and apparatus and intelligent terminal
KR102018046B1 (en) Method and apparatus for extracting image feature
CN106648568B (en) Method and device for adding check box on table
CN104410790A (en) Information processing method and electronic equipment
CN107909064B (en) Three line table recognition methods, electronic equipment and storage medium
CN114816410A (en) Interface generation method, device and storage medium
CN104462452B (en) The method and device that the page is shown
CN107704341A (en) File access pattern method, apparatus and electronic equipment
CN108153731B (en) Uncommon word processing method calculates equipment and computer storage medium
CN105320406A (en) Picture management method and terminal
CN103955713A (en) Icon recognition method and device
CN109885708A (en) The searching method and device of certificate picture
CN104268545A (en) Method for table area recognition and content rasterization in electronic document layout files
CN105022746A (en) Character library generation method, server and system
CN106599275A (en) Shooting search method and device
CN105224575A (en) A kind of document display method and device
CN104598289A (en) Recognition method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220701

Address after: 518054-13098, 13th floor, main tower of marine center, No. 59, Linhai Avenue, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong

Patentee after: Shenzhen ZhangYue Animation Technology Co.,Ltd.

Address before: 100124 2029e, Sihui building, Chaoyang District, Beijing

Patentee before: ZHANGYUE TECHNOLOGY Co.,Ltd.
