CN114120302B - Method for extracting structured information from form image - Google Patents

Publication number
CN114120302B
Authority
CN
China
Prior art keywords
image
paper
fixedly connected
layer
shield
Prior art date
Legal status
Active
Application number
CN202111393543.0A
Other languages
Chinese (zh)
Other versions
CN114120302A (en
Inventor
朱宏宽
Current Assignee
Wuxi Yimaide Technology Co ltd
Original Assignee
Wuxi Yimaide Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Yimaide Technology Co ltd
Priority to CN202111393543.0A
Publication of CN114120302A
Application granted
Publication of CN114120302B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06T3/02

Abstract

The invention belongs to the technical field of computer image processing and in particular relates to a method for extracting structured information from a form image. The method comprises four steps: first, customizing a form template; second, applying an affine transformation to the form image; third, matching the header position; and fourth, recognizing the form image content. The method is reasonably designed. Once templates have been customized for different types of forms, it automatically applies an affine transformation to the form image so that the image fits the size of the form template, then automatically adjusts the position of the form template layer to locate the header in the form image, and finally uses optical character recognition to recognize all text content and fill it into the structured data table. This saves human resources, improves efficiency, helps avoid errors, and meets the use requirement.

Description

Method for extracting structured information from form image
Technical Field
The invention belongs to the technical field of computer image processing, and particularly relates to a method for extracting structured information from a form image.
Background
In a common commodity supply chain, for convenience in management, a commodity seller needs to report sales information of a commodity to an information management system, and for verification of the reported information, the seller needs to report a corresponding sales document picture (fig. 1) and a corresponding structured information table (fig. 2).
In the prior art, there is no method for extracting structured information from an image. On one hand, a salesperson must manually enter sales information into the information system; on the other hand, a supervisor must verify that the sales data is correct and reliable by comparing the structured information against the sales document pictures provided by the salesperson. This is inefficient, consumes a large amount of human resources, is prone to mistakes, and cannot effectively meet the use requirement.
To this end, the invention provides a method for extracting structured information from a form image.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art and to solve at least one of the technical problems presented in the background.
The technical scheme adopted for solving the technical problems is as follows: the invention discloses a method for extracting structured information from a form image, which comprises the following steps:
First step: customizing a form template: marking the areas of the form image that require character recognition to generate a form template corresponding to the form;
Second step: affine transformation of the form image: applying an affine transformation to the acquired form image so that it fits the form template generated in the first step;
Third step: matching the header position: matching the form template generated in the first step against the header position in the form image obtained from the affine transformation in the second step;
Fourth step: recognizing the form image content: after the header position has been matched, the other positions of the form template, taken relative to the header position found in the third step, are matched to the corresponding positions in the form image obtained from the affine transformation in the second step; optical character recognition is performed on the image content at these positions, and the recognition results, including that of the header position, are filled into the structured data table.
Preferably, in the first step, each area of the form image that requires character recognition is marked as a red frame in the form template, and the form template contains only the position information of the red frames.
Preferably, in the second step, the form image is rotated to an upright position and enlarged or reduced to fit the size of the form template generated in the first step.
Preferably, in the third step, the form image obtained from the affine transformation in the second step is used as a first layer and the form template generated in the first step is used as a second layer, with the second layer covering the first layer. Optical character recognition is performed by the optical recognition device on the part of the first layer that lies within the uppermost red frame of the second layer; meanwhile, a brush roller working with an air cylinder smooths the form paper to be recognized, which improves the optical recognition precision. If the recognition result is the same as the header name of the form image from the first step, the header position has been found. If the recognition result differs from that header name, the second layer is slid to a new position and optical character recognition is performed again, until the recognition result is the same as the header name, that is, until the correct header matching position is found.
Preferably, in the fourth step, optical character recognition is performed on the image content of the first layer that lies within each of the red frames of the second layer.
Preferably, the optical recognition device includes a body; a paper inlet plate and a paper outlet plate are arranged on the two sides of the body respectively; an identification module is slidably connected to the top of the body through a sliding rail, a camera is connected to the bottom of the identification module, a horn-shaped light shield is arranged around the camera, and a positioning groove is formed in the top of the body at the position corresponding to the light shield; a group of rubber wheels is rotatably connected to the two sides of the positioning groove, and the rubber wheels rotate in the same direction; an inclined plate is fixedly connected to the side of the paper inlet plate away from the body; a conical semipermeable membrane is arranged inside the light shield, the bottom end of the semipermeable membrane is fixedly connected to the lower edge of the light shield, and the top of the semipermeable membrane is fixedly connected to the periphery of the camera; a lamp panel is arranged between the semipermeable membrane and the light shield and is fixedly connected to the light shield through a bracket. In use, the form paper to be recognized is transported to the positioning groove by the rubber wheels, and the identification module is then lowered so that the light shield descends and covers the form paper. The light shield blocks external light, which reduces the reflections that an external light source causes on the paper and increases the clarity of the image captured by the camera. Meanwhile, the semipermeable membrane scatters the light emitted by the lamp panel so that the lamp panel illuminates the form paper evenly, further increasing the optical recognition precision of the text on the form paper.
Preferably, the two sides of the positioning groove adjacent to the paper inlet plate and the paper outlet plate transition to the top of the body through inclined surfaces; a cavity is formed in the body at the position corresponding to the positioning groove, and the cavity communicates with the bottom of the positioning groove through a group of negative pressure holes; an air suction hole and an air inlet hole are formed in the two sides of the cavity respectively, the air suction hole communicates with a vacuum pump through a pipeline, and a dense sponge plug is fixedly connected in the air inlet hole. After the paper enters the positioning groove, the vacuum pump is started to draw air out of the cavity, and the paper is sucked down through the negative pressure holes so that it spreads evenly in the positioning groove. This increases the flatness of the paper and further reduces recognition errors caused by wrinkles. Meanwhile, the dense sponge plug leaks air slowly, which lowers the vacuum degree of the cavity and reduces local denting or damage to the paper caused by excessive suction at the negative pressure holes, further increasing the optical character recognition precision.
Preferably, a shield is arranged at the top of the paper inlet plate, an air cylinder is fixedly connected to the top of the shield, and the piston rod of the air cylinder extends into the shield; a pair of connecting rods is hinged to the bottom end of the piston rod, a brush roller is arranged at the end of each connecting rod away from the piston rod, and the brush roller is rotatably connected to the end of the connecting rod through a rotating shaft; arc-shaped grooves arranged in a splayed pattern are formed in the inner wall of the shield at the positions corresponding to the rotating shafts, and the arc-shaped grooves are arranged symmetrically about the air cylinder; one end of each rotating shaft is embedded in the arc-shaped groove and is slidably connected with it; a group of bristles is uniformly distributed around the periphery of the brush roller, a winding roller is fixedly connected to one side of the brush roller, a pull rope is wound on the winding roller, and the other end of the pull rope is fixedly connected to the inner wall of the shield at the position corresponding to the bottom of the air cylinder; a tension spring is fixedly connected to the middle parts of the two connecting rods. The form paper to be recognized is placed on top of the paper inlet plate, and the air cylinder is then started to push the connecting rods downward. Guided by the arc-shaped grooves, the brush rollers slide obliquely downward toward the two sides of the air cylinder while the pull ropes drive the brush rollers to rotate in opposite directions. This smooths wrinkled or curled paper, further increases the flatness of the paper, increases the optical character recognition efficiency for the form paper, and reduces omitted information.
Preferably, a sealed water tank is fixedly connected inside the shield, a heating pipe is arranged at the bottom of the water tank, and an exhaust pipe communicates with the water tank near its top; a first hole is formed in the rotating shaft and communicates with the exhaust pipe through a hose. The heating pipe heats the water in the water tank to form water vapor, which is fed into the rotating shaft through the exhaust pipe and heats the rotating shaft and the brush roller. The heated brush roller irons the paper, which further improves the flatness of the paper, reduces the tendency of the paper edges to spring back and warp after passing the brush roller, and further improves the optical character recognition precision.
Preferably, the first hole is a blind hole; an annular groove is formed in the middle of the periphery of the brush roller, an elastic ring is sleeved in the annular groove, and the end faces at the two ends of the elastic ring are in sealing connection with the side walls of the annular groove; an annular air passage is formed between the elastic ring and the annular groove, and the air passage communicates with the bottom of the first hole through a group of second holes; a group of sliding holes is uniformly distributed around the inner circumference of the annular groove, a steel ball is embedded at the top of each sliding hole, each steel ball is fixedly connected to the bottom of its sliding hole through a spring, and the sliding holes are in sealing connection with the steel balls; the side of each steel ball away from the spring presses against the inner wall of the elastic ring. Steam is introduced into the air passage through the first hole and the second holes, which improves the efficiency with which the steam heats and irons the paper while reducing contact between the steam and the paper, and therefore reduces damage from the paper becoming damp and soft. When the steam passes the steel balls it pushes against them, so that each steel ball briefly separates from the elastic ring and then presses against it again. As the steam continues to circulate, this further improves the shaking and smoothing efficiency of the elastic ring and the bristles.
The beneficial effects of the invention are as follows:
1. According to the method for extracting structured information from a form image, once templates have been customized for different types of forms, the form image is automatically affine-transformed to fit the size of the form template, the form template layer then automatically adjusts its position to locate the header in the form image, and all text content is recognized using optical character recognition and automatically filled into the structured data table. This saves human resources, improves efficiency, helps avoid errors, and meets the use requirement.
2. According to the method for extracting structured information from a form image, the form paper to be recognized is transported to the positioning groove by the rubber wheels, and the identification module is then lowered so that the light shield descends and covers the form paper. The light shield blocks external light, which reduces the reflections that an external light source causes on the paper and improves the clarity of the image captured by the camera. Meanwhile, the semipermeable membrane scatters the light emitted by the lamp panel so that the lamp panel illuminates the form paper evenly, further improving the optical recognition precision of the text on the form paper.
Drawings
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a sales order.
Fig. 2 shows the structured document information corresponding to Fig. 1.
Fig. 3 is a schematic diagram of an embodiment of the form template corresponding to Fig. 1.
Fig. 4 is a schematic diagram of the form template of Fig. 3 when matched to the header position in the sales order picture of Fig. 1.
Fig. 5 is a flow chart of the method of the present invention.
Fig. 6 is a perspective view of the present invention.
Fig. 7 is a front view of the present invention.
Fig. 8 is an enlarged view of part A of Fig. 7.
Fig. 9 is a schematic view of the structure of a brush roller according to the present invention.
In the figures: body 1, paper inlet plate 11, paper outlet plate 12, sliding rail 13, identification module 14, camera 15, light shield 16, positioning groove 17, rubber wheel 18, inclined plate 19, semipermeable membrane 2, lamp panel 21, cavity 22, negative pressure hole 23, air suction hole 24, air inlet hole 25, sponge plug 26, shield 3, air cylinder 31, connecting rod 32, brush roller 33, rotating shaft 34, arc-shaped groove 35, bristles 36, winding roller 37, pull rope 38, tension spring 39, water tank 4, heating pipe 41, exhaust pipe 42, first hole 43, hose 44, annular groove 45, elastic ring 46, air passage 47, second hole 48, sliding hole 49 and steel ball 5.
Detailed Description
The invention is further described in connection with the following detailed description in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.
Example 1
Although the formats of sales documents differ between companies, the sales documents of any one company always share the same format, so a form template can first be customized for the sales documents of a specific company. Based on the form template, image processing and character recognition techniques are then used to extract the structured information of the form from the image, namely the structured information shown in fig. 2.
The first step of the embodiment of the invention is customizing a form template: the areas of the picture that require character recognition are marked. For example, all of the red frames in fig. 3 together constitute one form template for the sales order in fig. 1. Form templates are customized manually, and can be customized for different types of documents according to the sales orders of different manufacturers. It should be noted that the form template describes only the position information of the red frames.
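Because the template holds only frame positions, it can be represented by a very small data structure. The following is a minimal sketch, not the patent's implementation: the class names `Frame` and `FormTemplate`, the field labels, and all coordinates are hypothetical, and it assumes the first frame in the list is the header frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One marked region ("red frame"): position and size only, no content."""
    name: str    # hypothetical field label, e.g. "header"
    x: int       # left edge, in template pixels
    y: int       # top edge, in template pixels
    width: int
    height: int


@dataclass(frozen=True)
class FormTemplate:
    width: int
    height: int
    frames: tuple  # tuple of Frame; frames[0] is assumed to be the header frame

    def relative_to_header(self):
        """Return each frame's offset from the header frame's top-left corner."""
        hx, hy = self.frames[0].x, self.frames[0].y
        return {f.name: (f.x - hx, f.y - hy) for f in self.frames}


# Hypothetical template for one company's sales order.
template = FormTemplate(
    width=800,
    height=600,
    frames=(
        Frame("header", 200, 20, 400, 40),
        Frame("customer", 100, 90, 250, 30),
        Frame("total", 550, 520, 150, 30),
    ),
)
offsets = template.relative_to_header()
```

Storing offsets relative to the header frame means that, once the header has been located in a picture, every other frame position follows by simple addition.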
Second step: affine transformation of the form image. An affine transformation can rotate an image and scale it up or down. Its purpose here is to rotate and scale the image to a size that fits the form template. This is necessary because the sales order pictures uploaded by users are usually taken with mobile phones, so the pictures may be inconsistent in size, skewed, or flipped, and must be corrected with an affine transformation. For example, a picture that a user takes at a tilt with a mobile phone is straightened by an affine transformation and scaled to a size suitable for the form template.
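The rotation and scaling described here can be illustrated with a similarity transform, which is the special case of an affine transformation limited to rotation, uniform scaling, and translation. The sketch below is an assumption-laden illustration rather than the patent's method: it assumes two reference points (for example two corners of the form) are already known in both the photo and the template, it uses complex arithmetic to solve for the transform, and it does not handle flipped pictures. The function names and coordinates are hypothetical.

```python
import cmath
import math


def similarity_from_two_points(p0, p1, q0, q1):
    """Return (a, b) such that q = a * p + b maps p0 -> q0 and p1 -> q1.

    Points are (x, y) tuples treated as complex numbers. abs(a) is the
    scale factor and the phase of a is the rotation angle.
    """
    zp0, zp1 = complex(*p0), complex(*p1)
    zq0, zq1 = complex(*q0), complex(*q1)
    if zp0 == zp1:
        raise ValueError("source points must be distinct")
    a = (zq1 - zq0) / (zp1 - zp0)
    b = zq0 - a * zp0
    return a, b


def apply_transform(a, b, point):
    """Map one (x, y) point from photo coordinates to template coordinates."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical corners: one edge of the form runs from (100, 200) to
# (100, 1000) in the photo and from (0, 0) to (400, 0) in the template.
a, b = similarity_from_two_points((100, 200), (100, 1000), (0, 0), (400, 0))
scale = abs(a)                            # photo must be shrunk by this factor
angle_deg = math.degrees(cmath.phase(a))  # and rotated by this angle
midpoint = apply_transform(a, b, (100, 600))
```

In practice the same transform would be applied to every pixel of the picture by an image library; only the coordinate mapping is shown here.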
Third step: matching the header position. The header is the title at the top of the form; for example, the header in fig. 1 is "Ningxia Rui materials science trade company sales order". Specifically, the form picture (fig. 1) is taken as a first layer and the form template (fig. 3) as a second layer. The part of the first layer framed by the topmost (first) red frame is taken out for optical character recognition. If the recognition result is the same as this header name, the header position has been found; if the recognition result differs from the header name, the second layer is slid to try a new position, until the correct header matching position is found.
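The slide-and-retry search can be sketched as a loop over candidate shifts of the template layer. The example below is a simplified illustration under stated assumptions: the page is modeled as a grid of characters, `read_box` is a stand-in for a real optical character recognition call, and the search order (nearest shifts first) and the `max_shift` limit are choices made for the sketch, since the source does not specify how new positions are chosen.

```python
def read_box(page, x, y, width):
    """Stand-in for OCR: return the text under a one-row box, '' if out of bounds."""
    if y < 0 or y >= len(page) or x < 0 or x + width > len(page[y]):
        return ""
    return page[y][x:x + width]


def find_header_offset(page, header_box, header_name, max_shift=5):
    """Slide the template by (dx, dy) until the header frame reads header_name.

    header_box is (x, y, width) in template coordinates. Shifts are tried in
    order of increasing distance from the unshifted position. Returns the
    matching (dx, dy), or None if no shift within max_shift matches.
    """
    x, y, width = header_box
    shifts = [(dx, dy)
              for dx in range(-max_shift, max_shift + 1)
              for dy in range(-max_shift, max_shift + 1)]
    shifts.sort(key=lambda s: (abs(s[0]) + abs(s[1]), s))
    for dx, dy in shifts:
        if read_box(page, x + dx, y + dy, width) == header_name:
            return (dx, dy)
    return None


# Hypothetical page: the header sits two columns right and one row below
# where the template expects it.
page = [
    "                    ",
    "                    ",
    "     SALES ORDER    ",
    "  item   qty  price ",
]
offset = find_header_offset(page, header_box=(3, 1, 11), header_name="SALES ORDER")
missing = find_header_offset(page, header_box=(3, 1, 11), header_name="INVOICE")
```

A real implementation would likely tolerate small recognition errors by comparing strings approximately rather than requiring exact equality.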
Fourth step: recognizing the form image content. Once the header position is determined, the positions of the other red frames relative to it are also determined (fig. 4). The image content of the first layer within each red frame is taken for optical character recognition, and the recognition results are filled into the structured data table shown in fig. 2.
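Once the template shift is known, filling the structured table amounts to reading every frame at its shifted position. The sketch below uses the same simplifying assumptions as a text-grid model: `read_box` stands in for a real optical character recognition call, and the frame names, coordinates, and page content are hypothetical.

```python
def read_box(page, x, y, width):
    """Stand-in for OCR on a text grid; real code would call an OCR engine."""
    return page[y][x:x + width].strip()


def fill_structured_record(page, frames, offset):
    """Read every frame at its shifted position and return a field -> text dict.

    frames maps a field name to (x, y, width) in template coordinates;
    offset is the (dx, dy) shift found when matching the header.
    """
    dx, dy = offset
    return {name: read_box(page, x + dx, y + dy, width)
            for name, (x, y, width) in frames.items()}


# Hypothetical page and template frames.
page = [
    "                        ",
    "      SALES ORDER       ",
    " Customer: ACME         ",
    " Total:    1280.00      ",
]
frames = {"header": (4, 0, 11), "customer": (9, 1, 8), "total": (9, 2, 8)}
record = fill_structured_record(page, frames, offset=(2, 1))
```

The resulting dictionary corresponds to one row of the structured data table, with the header recognition result included alongside the other fields.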
In the first step, each area of the form image that requires character recognition is marked as a red frame in the form template, and the form template contains only the position information of the red frames.
In the second step, the form image is rotated to an upright position and enlarged or reduced to fit the size of the form template generated in the first step.
In the third step, the form image obtained from the affine transformation in the second step is taken as a first layer and the form template generated in the first step is taken as a second layer, with the second layer covering the first layer. Optical character recognition is performed by the optical recognition device on the part of the first layer that lies within the uppermost red frame of the second layer; meanwhile, the brush roller 33 working with the air cylinder 31 smooths the form paper to be recognized, which improves the optical recognition precision. If the recognition result is the same as the header name of the form image from the first step, the header position has been found. If the recognition result differs from that header name, the second layer is slid to a new position and optical character recognition is performed again, until the recognition result is the same as the header name, that is, until the correct header matching position is found.
In the fourth step, optical character recognition is performed on the image content of the first layer that lies within each of the red frames of the second layer.
Example 2
As shown in fig. 6, the optical recognition device includes a body 1; a paper inlet plate 11 and a paper outlet plate 12 are arranged on the two sides of the body 1 respectively; an identification module 14 is slidably connected to the top of the body 1 through a sliding rail 13, a camera 15 is connected to the bottom of the identification module 14, a horn-shaped light shield 16 is arranged around the camera 15, and a positioning groove 17 is formed in the top of the body 1 at the position corresponding to the light shield 16; a group of rubber wheels 18 is rotatably connected to the two sides of the positioning groove 17, and the rubber wheels 18 rotate in the same direction; an inclined plate 19 is fixedly connected to the side of the paper inlet plate 11 away from the body 1; a conical semipermeable membrane 2 is arranged inside the light shield 16, the bottom end of the semipermeable membrane 2 is fixedly connected to the lower edge of the light shield 16, and the top of the semipermeable membrane 2 is fixedly connected to the periphery of the camera 15; a lamp panel 21 is arranged between the semipermeable membrane 2 and the light shield 16 and is fixedly connected to the light shield 16 through a bracket. In use, the form paper to be recognized is transported to the positioning groove 17 by the rubber wheels 18, and the identification module 14 is then lowered so that the light shield 16 descends and covers the form paper. The light shield 16 blocks external light, which reduces the reflections that an external light source causes on the paper and increases the clarity of the image captured by the camera 15. Meanwhile, the semipermeable membrane 2 scatters the light emitted by the lamp panel 21 so that the lamp panel 21 illuminates the form paper evenly, further increasing the optical recognition precision of the text on the form paper.
As shown in figs. 7-8, the two sides of the positioning groove 17 adjacent to the paper inlet plate 11 and the paper outlet plate 12 transition to the top of the body 1 through inclined surfaces; a cavity 22 is formed in the body 1 at the position corresponding to the positioning groove 17, and the cavity 22 communicates with the bottom of the positioning groove 17 through a group of negative pressure holes 23; an air suction hole 24 and an air inlet hole 25 are formed in the two sides of the cavity 22 respectively, the air suction hole 24 communicates with a vacuum pump through a pipeline, and a dense sponge plug 26 is fixedly connected in the air inlet hole 25. After the paper enters the positioning groove 17, the vacuum pump is started to draw air out of the cavity 22, and the paper is sucked down through the negative pressure holes 23 so that it spreads evenly in the positioning groove 17. This increases the flatness of the paper and further reduces recognition errors caused by wrinkles. Meanwhile, the dense sponge plug 26 leaks air slowly, which lowers the vacuum degree of the cavity 22 and reduces local denting or damage to the paper caused by excessive suction at the negative pressure holes 23, further increasing the optical character recognition precision.
A shield 3 is arranged at the top of the paper inlet plate 11, an air cylinder 31 is fixedly connected to the top of the shield 3, and the piston rod of the air cylinder 31 extends into the shield 3; a pair of connecting rods 32 is hinged to the bottom end of the piston rod, a brush roller 33 is arranged at the end of each connecting rod 32 away from the piston rod, and the brush roller 33 is rotatably connected to the end of the connecting rod 32 through a rotating shaft 34; arc-shaped grooves 35 arranged in a splayed pattern are formed in the inner wall of the shield 3 at the positions corresponding to the rotating shafts 34, and the arc-shaped grooves 35 are arranged symmetrically about the air cylinder 31; one end of each rotating shaft 34 is embedded in the arc-shaped groove 35 and is slidably connected with it; a group of bristles 36 is uniformly distributed around the periphery of the brush roller 33, a winding roller 37 is fixedly connected to one side of the brush roller 33, a pull rope 38 is wound on the winding roller 37, and the other end of the pull rope 38 is fixedly connected to the inner wall of the shield 3 at the position corresponding to the bottom of the air cylinder 31; a tension spring 39 is fixedly connected to the middle parts of the two connecting rods 32. The form paper to be recognized is placed on top of the paper inlet plate 11, and the air cylinder 31 is then started to push the connecting rods 32 downward. Guided by the arc-shaped grooves 35, the brush rollers 33 slide obliquely downward toward the two sides of the air cylinder 31 while the pull ropes 38 drive the brush rollers 33 to rotate in opposite directions. This smooths wrinkled or curled paper, further increases the flatness of the paper, increases the optical character recognition efficiency for the form paper, and reduces omitted information.
As shown in fig. 9, a sealed water tank 4 is fixedly connected inside the shield 3, a heating pipe 41 is arranged at the bottom of the water tank 4, and an exhaust pipe 42 communicates with the water tank 4 near its top; a first hole 43 is formed in the rotating shaft 34 and communicates with the exhaust pipe 42 through a hose 44. The heating pipe 41 heats the water in the water tank 4 to form water vapor, which is fed into the rotating shaft 34 through the exhaust pipe 42 and heats the rotating shaft 34 and the brush roller 33. The heated brush roller 33 irons the paper, which further improves the flatness of the paper, reduces the tendency of the paper edges to spring back and warp after passing the brush roller 33, and further improves the optical character recognition precision.
The first hole 43 is a blind hole; an annular groove 45 is formed in the middle of the periphery of the brush roller 33, an elastic ring 46 is sleeved in the annular groove 45, and the end faces at the two ends of the elastic ring 46 are in sealing connection with the side walls of the annular groove 45; an annular air passage 47 is formed between the elastic ring 46 and the annular groove 45, and the air passage 47 communicates with the bottom of the first hole 43 through a group of second holes 48; a group of sliding holes 49 is uniformly distributed around the inner circumference of the annular groove 45, a steel ball 5 is embedded at the top of each sliding hole 49, each steel ball 5 is fixedly connected to the bottom of its sliding hole 49 through a spring, and the sliding holes 49 are in sealing connection with the steel balls 5; the side of each steel ball 5 away from the spring presses against the inner wall of the elastic ring 46. Steam is introduced into the air passage 47 through the first hole 43 and the second holes 48, which improves the efficiency with which the steam heats and irons the paper while reducing contact between the steam and the paper, and therefore reduces damage from the paper becoming damp and soft. When the steam passes the steel balls 5 it pushes against them, so that each steel ball 5 briefly separates from the elastic ring 46 and then presses against it again. As the steam continues to circulate, this further improves the shaking and smoothing efficiency of the elastic ring 46 and the bristles 36.
During operation, the form paper to be recognized is placed on top of the paper inlet plate 11, and the air cylinder 31 is started to push the connecting rods 32 downward. Guided by the arc-shaped grooves 35, the brush rollers 33 slide obliquely downward toward the two sides of the air cylinder 31 while the pull ropes 38 drive the brush rollers 33 to rotate in opposite directions, which smooths wrinkled or curled paper, further increases the flatness of the paper, increases the optical character recognition efficiency for the form paper, and reduces omitted information. The heating pipe 41 heats the water in the water tank 4 to form water vapor, which is fed into the rotating shaft 34 through the exhaust pipe 42 and heats the rotating shaft 34 and the brush roller 33, so that the heated brush roller 33 irons the paper; this further improves the flatness of the paper and reduces the tendency of the paper edges to spring back and warp after passing the brush roller 33. The steam is introduced into the air passage 47 through the first hole 43 and the second holes 48, which improves the efficiency with which the steam heats and irons the paper while reducing contact between the steam and the paper, and therefore reduces damage from the paper becoming damp and soft. When the steam passes the steel balls 5 it pushes against them, so that each steel ball 5 briefly separates from the elastic ring 46 and then presses against it again; as the steam continues to circulate, this further improves the shaking and smoothing efficiency of the elastic ring 46 and the bristles 36. The form paper is transported to the positioning groove 17 by the rubber wheels 18. After the paper enters the positioning groove 17, the vacuum pump is started to draw air out of the cavity 22, and the paper is sucked down through the negative pressure holes 23 so that it spreads evenly in the positioning groove 17, which increases the flatness of the paper and further reduces recognition errors caused by wrinkles. Meanwhile, the dense sponge plug 26 leaks air slowly, which lowers the vacuum degree of the cavity 22 and reduces local denting or damage to the paper caused by excessive suction at the negative pressure holes 23. The identification module 14 is then lowered so that the light shield 16 descends and covers the form paper. The light shield 16 blocks external light, which reduces the reflections that an external light source causes on the paper and improves the clarity of the image captured by the camera 15. The semipermeable membrane 2 scatters the light emitted by the lamp panel 21 so that the lamp panel 21 illuminates the form paper evenly, further improving the optical recognition precision of the text on the form paper.
The terms front, rear, left, right, up, and down are all defined with reference to Fig. 1 of the accompanying drawings and from the observer's viewing angle: the face of the device facing the observer is the front, the side to the observer's left is the left, and so on.
In the description of the present invention, it should be understood that the terms "center," "longitudinal," "lateral," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the scope of the present invention.
The foregoing has shown and described the basic principles, principal features, and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. A method of extracting structured information from a form image, characterized in that the method comprises the following steps:
a first step, customizing a form template: marking the areas of the form image that require character recognition to generate a form template corresponding to the form;
a second step, affine transformation of the form image: applying an affine transformation to the acquired form image so that it conforms to the form template generated in the first step;
a third step, matching the header position: matching the form template generated in the first step against the header position in the form image affine-transformed in the second step;
a fourth step, recognizing the form image content: after the header position has been matched, the other relative positions of the form template of the third step are matched to the corresponding positions in the form image affine-transformed in the second step, optical character recognition is performed on the image content at those positions, and the recognition results, including that of the header position, are filled into a structured data table;
wherein, in the third step, the form image affine-transformed in the second step is taken as a first layer and the form template generated in the first step is taken as a second layer, the second layer being overlaid on the first layer; optical character recognition is performed by an optical identification device on the portion of the first layer lying within the uppermost red frame of the second layer, while the optical recognition accuracy is improved by smoothing the form paper to be recognized with a brush roller driven by an air cylinder; if the recognition result is the same as the header name in the header image of the first step, the header position has been found; if it is different, the second layer is slid to a new position and optical character recognition is performed again, until the recognition result is the same as the header name, that is, until the correct header matching position is found;
the optical identification device comprises a body (1); two sides of the body (1) are respectively provided with a paper feeding plate (11) and a paper discharging plate (12); the top of the body (1) is slidably connected with an identification module (14) through a sliding rail (13), the bottom of the identification module (14) is connected with a camera (15), a horn-shaped light shield (16) is arranged on the periphery of the camera (15), and a positioning groove (17) is formed in the position, corresponding to the light shield (16), of the top of the body (1); a group of rubber wheels (18) are rotatably connected to two sides of the positioning groove (17), and the rubber wheels (18) rotate in the same direction; one side of the paper feeding plate (11) far away from the body (1) is fixedly connected with an inclined plate (19); a conical semipermeable membrane (2) is arranged in the light shield (16), the bottom end of the semipermeable membrane (2) is fixedly connected with the lower edge of the light shield (16), and the top of the semipermeable membrane (2) is fixedly connected with the periphery of the camera (15); a lamp panel (21) is arranged between the semipermeable membrane (2) and the light shield (16), and the lamp panel (21) is fixedly connected with the light shield (16) through a bracket;
the two sides of the positioning groove (17) close to the paper feeding plate (11) and the paper discharging plate (12) are in inclined transition with the top of the body (1); a cavity (22) is formed in the body (1) at a position corresponding to the positioning groove (17), and the cavity (22) is communicated with the bottom of the positioning groove (17) through a group of negative pressure holes (23); two sides of the cavity (22) are respectively provided with an air suction hole (24) and an air inlet hole (25), the air suction hole (24) is communicated with a vacuum pump through a pipeline, and a compact sponge plug (26) is fixedly connected in the air inlet hole (25);
the top of the paper feeding plate (11) is provided with a shield (3), the top of the shield (3) is fixedly connected with a cylinder (31), and a piston rod of the cylinder (31) extends into the shield (3); the bottom end of the piston rod is hinged with a pair of connecting rods (32), one end, far away from the piston rod, of the connecting rods (32) is provided with a brush roller (33), and the brush roller (33) is rotationally connected with the end part of the connecting rods (32) through a rotating shaft (34); arc grooves (35) which are arranged in a splayed manner are formed in the inner wall of the shield (3) at positions corresponding to the rotating shaft (34), and the arc grooves (35) are symmetrically arranged relative to the air cylinder (31); one end of the rotating shaft (34) is embedded into the arc-shaped groove (35) and is in sliding connection with the arc-shaped groove (35); a group of bristles (36) are uniformly distributed on the periphery of the brush roller (33), a winding roller (37) is fixedly connected to one side of the brush roller (33), a pull rope (38) is wound on the winding roller (37), and the other end of the pull rope (38) is fixedly connected to the inner wall of the shield (3) at the corresponding position of the bottom of the air cylinder (31); tension springs (39) are fixedly connected to the middle parts of the two connecting rods (32).
2. The method of extracting structured information from a form image according to claim 1, wherein: in the first step, each region of the form image requiring character recognition is marked as a red frame in the form template, and the form template contains only the position information of the red frames.
3. The method of extracting structured information from a form image according to claim 2, wherein: in the second step, the form image is rotated to an upright position and enlarged or reduced to match the size of the form template generated in the first step.
4. The method of extracting structured information from a form image according to claim 1, wherein: in the fourth step, optical character recognition is performed on the portions of the first layer lying within all of the red frames of the second layer.
5. The method of extracting structured information from a form image according to claim 1, wherein: a sealed water tank (4) is fixedly connected inside the shield (3), a heating pipe (41) is arranged at the bottom of the water tank (4), and an exhaust pipe (42) communicates with the water tank (4) at a position near its top; a first hole (43) is formed in the rotating shaft (34), and the first hole (43) communicates with the exhaust pipe (42) through a hose (44).
6. The method of extracting structured information from a form image according to claim 5, wherein: the first hole (43) is a blind hole; an annular groove (45) is formed in the middle of the periphery of the brush roller (33), an elastic ring (46) is sleeved in the annular groove (45), and the end faces at both ends of the elastic ring (46) are sealed against the side walls of the annular groove (45); an annular air passage (47) is formed between the elastic ring (46) and the annular groove (45), and the air passage (47) communicates with the bottom of the first hole (43) through a group of second holes (48); a group of sliding holes (49) is uniformly distributed around the inner circumference of the annular groove (45), steel balls (5) are embedded at the tops of the sliding holes (49), the steel balls (5) are fixedly connected to the bottoms of the sliding holes (49) by springs, and the sliding holes (49) are sealed against the steel balls (5); the side of each steel ball (5) facing away from the spring bears against the inner wall of the elastic ring (46).
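For illustration only, the header-matching and content-recognition steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the form image is assumed to have already been affine-aligned to the template, the OCR engine is replaced by a hypothetical callback `ocr(x, y, w, h)`, and the names `Frame`, `find_header_offset`, `extract`, and `make_fake_ocr` are all invented here. The template is modeled as a list of red frames whose first entry is the uppermost (header) frame, and "sliding the second layer" is modeled as trying a list of candidate pixel offsets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    """One red frame of the template: a named rectangle (template coordinates)."""
    name: str
    x: int
    y: int
    w: int
    h: int


# Hypothetical OCR callback: returns the text found in the given rectangle
# of the aligned form image (the "first layer").
OcrFn = Callable[[int, int, int, int], str]
Offset = Tuple[int, int]


def find_header_offset(header: Frame, ocr: OcrFn,
                       offsets: Iterable[Offset]) -> Optional[Offset]:
    """Slide the template until OCR inside the header frame equals the header name."""
    for dx, dy in offsets:
        text = ocr(header.x + dx, header.y + dy, header.w, header.h)
        if text.strip() == header.name:
            return dx, dy
    return None  # no candidate position produced the expected header


def extract(template: List[Frame], ocr: OcrFn,
            offsets: Iterable[Offset]) -> Optional[Dict[str, str]]:
    """Match the header first, then OCR every other frame at the same offset."""
    header, fields = template[0], template[1:]
    found = find_header_offset(header, ocr, offsets)
    if found is None:
        return None
    dx, dy = found
    record = {"header": header.name}
    for f in fields:
        record[f.name] = ocr(f.x + dx, f.y + dy, f.w, f.h).strip()
    return record


def make_fake_ocr(regions: Dict[Tuple[int, int, int, int], str]) -> OcrFn:
    """Stand-in for a real OCR engine: text is known only at exact rectangles."""
    def ocr(x: int, y: int, w: int, h: int) -> str:
        return regions.get((x, y, w, h), "")
    return ocr


# Made-up example data: the scanned page is shifted by (4, 6) pixels
# relative to the template.
template = [
    Frame("Invoice", 10, 10, 100, 20),  # uppermost red frame (header)
    Frame("date", 10, 40, 100, 20),
    Frame("total", 10, 70, 100, 20),
]
page = {
    (14, 16, 100, 20): "Invoice",
    (14, 46, 100, 20): "2021-11-23",
    (14, 76, 100, 20): "42.00",
}
candidate_offsets = [(dx, dy) for dy in range(0, 10, 2) for dx in range(0, 10, 2)]
record = extract(template, make_fake_ocr(page), candidate_offsets)
print(record)
```

In this sketch a failed header match returns `None` rather than a partially filled record, which mirrors the claim's requirement that the remaining frames are read only after the correct header position has been found. The candidate offset grid and its step size are arbitrary choices made for the example.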
CN202111393543.0A 2021-11-23 2021-11-23 Method for extracting structured information from form image Active CN114120302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111393543.0A CN114120302B (en) 2021-11-23 2021-11-23 Method for extracting structured information from form image

Publications (2)

Publication Number Publication Date
CN114120302A CN114120302A (en) 2022-03-01
CN114120302B true CN114120302B (en) 2023-04-21

Family

ID=80439923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111393543.0A Active CN114120302B (en) 2021-11-23 2021-11-23 Method for extracting structured information from form image

Country Status (1)

Country Link
CN (1) CN114120302B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798273A (en) * 2016-08-31 2018-03-13 格科微电子(上海)有限公司 The method for improving optical finger print recognition performance
CN109033282A (en) * 2018-07-11 2018-12-18 山东邦尼信息科技有限公司 A kind of Web page text extracting method and device based on extraction template
US10572725B1 (en) * 2018-03-30 2020-02-25 Intuit Inc. Form image field extraction

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335579A (en) * 2017-12-26 2018-07-27 昆山遥矽微电子科技有限公司 Text region book reading machine
CN214449596U (en) * 2020-12-24 2021-10-22 河南美图印刷有限公司 Digital printing machine with text detection function
CN113569677A (en) * 2021-07-16 2021-10-29 国网天津市电力公司 Paper test report generation method based on scanning piece

Also Published As

Publication number Publication date
CN114120302A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN114120302B (en) Method for extracting structured information from form image
CN206977532U (en) A kind of Multi-functional scanning platform
TWM353387U (en) Image capturing and transferring apparatus with uniform light
CN105469513A (en) Self-service all-in-one machine based on face detection and character recognition and using method thereof
CN109255901A (en) Billing machine
CN213518291U (en) Character and picture identification system
CN106384418A (en) Intelligent visitor machine
US10843499B2 (en) Printer for official documents
CN111680652A (en) Financial statement checking and error correcting method and device
WO2016008303A1 (en) Hanging-type self-service printing device
CN205566460U (en) Certificate shines collection equipment
CN206506577U (en) High photographing instrument with absorption flattening device
CN206775606U (en) Accounting bill scanning means based on internet intelligent terminal
CN111301003A (en) System and method for printing certificate
CN102223460A (en) Nondestructive quick file photographing instrument
CN207198546U (en) A kind of dept. of radiology read tablet device
CN205845057U (en) A kind of identity card reader carrying testimony of a witness comparison function
CN113658385A (en) Automatic page turning and certificate printing equipment for real property certificate
RU127977U1 (en) UNIVERSAL READER OF PASSPORT AND VISA DOCUMENTS
CN206442437U (en) Automatically adjust the high photographing instrument of photographing module height
CN104469073A (en) Portable file scanning device
CN202772977U (en) Scanning/shooting apparatus having projection function
CN111815865A (en) Self-service equipment for government affair service and control method thereof
CN215792543U (en) Energy-saving and environment-friendly stamping machine
CN206283576U (en) Papery test paper data inputting analysis system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant