CN105144195A - Parsing and rendering structured images - Google Patents

Publication number
CN105144195A
Authority
CN
China
Prior art keywords
data file
image
structured
bounding box
expression formula
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201480009496.7A
Other languages
Chinese (zh)
Inventor
S·吉尔瓦尼
O·波洛佐夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Application filed by Microsoft Technology Licensing LLC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods for generating a tuple of structured data files are described herein. In one example, a method includes detecting an expression that describes a structure of a structured image using a constructor. The method can also include using an inference-rule based search strategy to identify a hierarchical arrangement of bounding boxes in the structured image that match the expression. Furthermore, the method can include generating a first tuple of structured data files based on the identified hierarchical arrangement of bounding boxes in the structured image.

Description

Parsing and rendering structured images
Background
Many software editing applications allow existing images to be modified. Some of these applications can apply signal processing algorithms to natural images in order to identify objects in the images. However, signal processing algorithms may not parse some images accurately.
Summary
The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the claimed subject matter. It is intended neither to identify key or critical elements of the claimed subject matter nor to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts of the claimed subject matter in a simplified form as a prelude to the more detailed description that is presented later.
One embodiment provides a method for generating a tuple of structured data files. The method includes detecting an expression that describes a structure of a structured image. The method also includes using an inference-rule based search strategy to identify a hierarchical arrangement of bounding boxes in the structured image that match the expression. In addition, the method includes generating a first tuple of structured data files based on the identified hierarchical arrangement of bounding boxes in the structured image.
Another embodiment provides one or more computer-readable storage media for generating a tuple of structured data files, comprising a plurality of instructions. The instructions cause a processor to detect an expression that describes a structure of a structured image using a constructor. The instructions also cause the processor to use an inference-rule based search strategy to identify a hierarchical arrangement of bounding boxes in the structured image that match the expression. In addition, the instructions cause the processor to generate a first tuple of structured data files based on the identified hierarchical arrangement of bounding boxes in the structured image, wherein the first tuple of structured data files comprises a first data file comprising content values and a second data file comprising style properties related to the content values.
Yet another embodiment provides a system for generating a tuple of structured data files. The system includes a processor to execute processor executable code and a storage device that stores the processor executable code. The processor executable code causes the processor to detect an expression that describes a structure of a structured image using a constructor. The processor executable code also causes the processor to use an inference-rule based search strategy to identify a hierarchical arrangement of bounding boxes in the structured image that match the expression. In addition, the processor executable code causes the processor to generate a first tuple of structured data files based on the identified hierarchical arrangement of bounding boxes in the structured image, wherein the first tuple of structured data files comprises a first data file comprising content values and a second data file comprising style properties related to the content values.
Brief description of the drawings
The following detailed description may be better understood by referencing the accompanying drawings, which contain specific examples of numerous features of the disclosed subject matter.
Fig. 1 is a block diagram of an example of a computing system that can parse and render structured images based on expressions;
Fig. 2 is a process flow diagram of an example method for generating data files;
Fig. 3 is a process flow diagram of an example method for generating a new image based on two existing images;
Fig. 4 is a process flow diagram of an example method for generating a new image based on a modified data file;
Fig. 5 is an example chart that can be described with a programming language for manipulating structured images;
Fig. 6 shows examples of top-down inference rules and bottom-up inference rules;
Fig. 7 is an example bead design that can be described by an expression using constructors; and
Fig. 8 is a block diagram of a tangible computer-readable storage medium that can parse and render structured images.
Detailed description
Various methods for manipulating structured images are described herein. A structured image, as referred to herein, can include any image whose pixels have a hierarchical or repetitive structure. In some embodiments, a structured image is a two-dimensional array in which each pixel represents a particular color. In some examples, structured images can include Connect Four boards, bead designs, Japanese crossword puzzles, checkerboards, mathematical worksheets, crossword boards, Scrabble boards, histograms, or data tables, among others.
In one embodiment, a programming language allows various applications to implement data extraction, image editing, and image creation, among others. Data extraction, as referred to herein, can include using an expression to represent an image as a tuple of structured data. In some embodiments, the tuple of structured data can represent attributes or characteristics of each pixel of the structured image, such as color, shape, or size, among others. An expression, as referred to herein, can include any suitable number of attributes that describe a particular region or pixel of a structured image. In some embodiments, image editing can include using an expression to parse a structured image into a tuple of structured data or to render a structured image from a tuple of structured data. In some embodiments, image editing can also include modifying the tuple of structured data that represents a structured image by using an additional tuple of structured data or an additional expression, among others. Image creation, as referred to herein, can include any suitable number of expressions or any suitable tuples of structured data that can be combined to render a new image.
In some embodiments, a programming language, such as a domain-specific language, can allow the hierarchical structure of a structured image to be written as an expression using standard sequence, struct, and union type constructors. In some embodiments, the programming language can be a bidirectional language that supports operations for parsing a structured image into data files and for rendering data files as a structured image. In some examples, the parse operation can be based on inference rules and a search strategy based on dynamic programming. In one example, the parse operation can include finding any suitable hierarchical decomposition of the structured image based on rectangular regions. For example, the parse operation can search for rectangular regions using contours provided by an underlying contour detection algorithm. The parse operation can also use a combination of top-down and bottom-up inference rules to detect contours that are missing from the structured image.
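To make the sequence, struct, and union constructors more concrete, the following is a minimal Python sketch of an expression tree. The class names (`Leaf`, `Struct`, `Seq`, `OneOf`) and the helper `leaf_names` are hypothetical and chosen for illustration; they are not the notation of the language described here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    name: str                 # hypothetical label for a value to extract


@dataclass
class Struct:
    fields: List["Expr"]      # a fixed group of differently shaped parts


@dataclass
class Seq:
    item: "Expr"              # a repeated part, such as the rows of a grid


@dataclass
class OneOf:
    options: List["Expr"]     # a union: exactly one alternative applies


Expr = Union[Leaf, Struct, Seq, OneOf]


def leaf_names(expr: Expr) -> List[str]:
    """Collect the leaf labels of an expression tree in depth-first order."""
    if isinstance(expr, Leaf):
        return [expr.name]
    if isinstance(expr, Seq):
        return leaf_names(expr.item)
    children = expr.fields if isinstance(expr, Struct) else expr.options
    return [name for child in children for name in leaf_names(child)]


# A board made of rows of cells, where each cell is either empty or holds a piece.
board = Seq(Seq(OneOf([Leaf("empty"), Leaf("piece")])))
print(leaf_names(board))
```

The nesting of `Seq` inside `Seq` mirrors the hierarchical decomposition into rows and then cells; a real expression would also carry layout and style descriptions.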
As a preliminary matter, some of the figures describe concepts in the context of one or more structural components, referred to as functionalities, modules, features, elements, etc. The various components shown in the figures can be implemented in any manner, for example, by software, hardware (e.g., discrete logic components, etc.), firmware, and so on, or any combination of these implementations. In one embodiment, the various components may reflect the use of corresponding components in an actual implementation. In other embodiments, any single component illustrated in the figures may be implemented by a number of actual components. The depiction of any two or more separate components in the figures may reflect different functions performed by a single actual component. Fig. 1, discussed below, provides details regarding one system that may be used to implement the functions shown in the figures.
Other figures describe the concepts in flowchart form. In this form, certain operations are described as constituting distinct blocks performed in a certain order. Such implementations are illustrative and non-limiting. Certain blocks described herein can be grouped together and performed in a single operation, certain blocks can be broken apart into multiple component blocks, and certain blocks can be performed in an order that differs from the one illustrated herein, including performing the blocks in parallel. The blocks shown in the flowcharts can be implemented by software, hardware, firmware, manual processing, and the like, or any combination of these implementations. As used herein, hardware may include computer systems, discrete logic components such as application specific integrated circuits (ASICs), and the like, as well as any combinations thereof.
As for terminology, the phrase "configured to" encompasses any way that any kind of structural component can be constructed to perform an identified operation. The structural component can be configured to perform an operation using software, hardware, firmware, and the like, or any combination thereof.
The term "logic" encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, and the like, or any combination thereof.
As used herein, the terms "component," "system," "client," and the like are intended to refer to a computer-related entity, which can be hardware, software (e.g., in execution), and/or firmware, or a combination thereof. For example, a component can be a process running on a processor, an object, an executable, a program, a function, a library, a subroutine, and/or a computer or a combination of software and hardware. By way of illustration, both an application running on a server and the server itself can be components. One or more components can reside within a process, and a component can be localized on one computer and/or distributed between two or more computers.
Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any tangible computer-readable device or medium.
Computer-readable storage media can include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, and tape, among others), optical disks (e.g., compact disk (CD) and digital versatile disk (DVD), among others), smart cards, and flash memory devices (e.g., card, stick, and key drive, among others). In contrast, computer-readable media generally (i.e., not storage media) may additionally include communication media, such as transmission media for wireless signals and the like.
Fig. 1 is a block diagram of an example of a computing system that can parse or render structured images based on expressions. The computing system 100 can be, for example, a mobile phone, laptop computer, desktop computer, or tablet computer, among others. The computing system 100 can include a processor 102 that is configured to execute stored instructions, as well as a memory device 104 that stores instructions that are executable by the processor 102. The processor 102 can be a single-core processor, a multi-core processor, a computing cluster, or any number of other configurations. The memory device 104 can include random access memory (e.g., SRAM, DRAM, zero capacitor RAM, SONOS, eDRAM, EDO RAM, DDR RAM, RRAM, PRAM, etc.), read-only memory (e.g., Mask ROM, PROM, EPROM, EEPROM, etc.), flash memory, or any other suitable memory system. The instructions that are executed by the processor 102 can be used to parse and render structured images.
The processor 102 can also be connected through a system bus 106 (e.g., PCI, ISA, PCI-Express, NuBus, etc.) to an input/output (I/O) device interface 108 that is configured to connect the computing system 100 to one or more I/O devices 110. The I/O devices 110 can include, for example, a keyboard, a gesture recognition input device, a voice recognition device, and a pointing device, wherein the pointing device can include a touchpad or a touchscreen, among others. The I/O devices 110 can be built-in components of the computing system 100, or can be devices that are externally connected to the computing system 100.
The processor 102 can also be linked through the system bus 106 to a display device interface 112 that is configured to connect the computing system 100 to a display device 114. The display device 114 can include a display screen that is a built-in component of the computing system 100. The display device 114 can also include a computer monitor, television, or projector, among others, that is externally connected to the computing system 100. A network interface card (NIC) 116 can also be configured to connect the computing system 100 through the system bus 106 to a network (not shown).
The storage 118 can include a hard drive, an optical drive, a USB flash drive, an array of drives, or any combination thereof. The storage 118 can include a parser module 120 and a renderer module 122. The parser module 120 can identify any suitable number of bounding boxes in a structured image and identify the hierarchical structure of the bounding boxes. In some embodiments, the parser module 120 can also identify elements or regions within the bounding boxes of the structured image that match an expression. In some examples, a bounding box can represent the border around a region of the structured image. For example, a mosaic tile can include several regions whose borders are indicated by a particular color. An element can refer to an individual unit or pixel of the image that has a particular color or value, among others. In some examples, a structured image can be represented by an expression, and the expression can identify elements in the structured image. Examples of expressions are discussed in greater detail below in relation to Fig. 5. In one example, a structured image can be represented by an expression tree, which can include any suitable number of expressions.
In some embodiments, the parser module 120 can also generate a tuple of data files from the data values of any suitable number of elements in the structured image. For example, a data file can indicate the color of each pixel of the structured image or a value associated with each pixel of the structured image. Generating data files is discussed in greater detail below in relation to Fig. 2.
The renderer module 122 can accept a tuple of data files as input and generate a structured image. In some embodiments, a data file can include values of any suitable number of elements of the structured image, or a data file can include style properties of the elements of the structured image. For example, a data file can indicate the spacing between pixels in the structured image, the width of a border between particular elements in the structured image, or Cartesian coordinates that indicate the position of a pixel in a plane of any suitable number of dimensions, among others. In some embodiments, the renderer module 122 can also generate a structured image based on modified data files, modified expressions, and multiple data files. Generating structured images with the renderer module 122 is discussed in greater detail below in relation to Figs. 3-6.
It is to be understood that the block diagram of Fig. 1 is not intended to indicate that the computing system 100 is to include all of the components shown in Fig. 1. Rather, the computing system 100 can include fewer components or additional components not illustrated in Fig. 1 (e.g., additional applications, additional modules, additional memory devices, additional network interfaces, etc.). Furthermore, any of the functionalities of the parser module 120 or the renderer module 122 can be partially, or entirely, implemented in hardware and/or in the processor 102. For example, the functionality can be implemented with an application specific integrated circuit, in logic implemented in the processor 102, or in any other device.
Fig. 2 is a process flow diagram of an example method for generating data files. The method 200 can be implemented with any suitable computing device, such as the computing system 100 of Fig. 1.
At block 202, the parser module 120 can detect a hierarchical expression (also referred to herein as an expression) that describes the hierarchical arrangement of a structured image. The expression can be used to parse an image into a tuple of structured data files. In some embodiments, parsing a structured image can include identifying a hierarchical arrangement of bounding boxes that matches an expression tree. An expression tree, as referred to herein, can include union, struct, sequence, or leaf nodes, among others. As discussed above, a bounding box can include any suitable region of the structured image. A region, as referred to herein, can include any suitable number of units or pixels of the structured image. In some embodiments, the parser module 120 can identify bounding boxes based on contours in the structured image. A contour, as referred to herein, is a closed polyline formed by the edges of a region. For example, a contour can indicate a rectangular border that encloses a region in the structured image, wherein the units or pixels of the rectangular border share the same color or value. In some embodiments, contours in the structured image can be identified with an image processing technique such as contour detection. Contour detection is a method for identifying closed regions with prominent edges in a structured image. In some embodiments, the parser module 120 can match the expression against each contour in the structured image.
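The notion of a rectangular border whose pixels share one color can be illustrated with a small Python sketch. It is not a contour detection algorithm; it only checks whether a given candidate box has a uniform border. The grid encoding (one character per pixel) and the function name `border_color` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]            # each entry is a color code for one pixel
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def border_color(image: Grid, box: Box) -> Optional[str]:
    """Return the shared color of the box's border, or None if it is not uniform."""
    top, left, bottom, right = box
    border = (
        [image[top][c] for c in range(left, right + 1)]
        + [image[bottom][c] for c in range(left, right + 1)]
        + [image[r][left] for r in range(top, bottom + 1)]
        + [image[r][right] for r in range(top, bottom + 1)]
    )
    return border[0] if all(p == border[0] for p in border) else None


# 'w' = white background, 'k' = black border, 'r' = red content.
image = [
    list("wwwwww"),
    list("wkkkkw"),
    list("wkrrkw"),
    list("wkkkkw"),
    list("wwwwww"),
]

print(border_color(image, (1, 1, 3, 4)))  # the black frame around the red cells
print(border_color(image, (1, 1, 3, 3)))  # cuts through the content, so not uniform
```

A candidate box with a uniform border is only a hint that a bounding box may exist there; the parser still has to decide whether it fits the expression.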
At block 204, the parser module 120 can use an inference-rule based search strategy to identify a hierarchical arrangement of bounding boxes in the structured image that match the expression. The inference-rule based search strategy can enable the parser module 120 to detect hierarchical relationships between any suitable number of bounding boxes in the structured image. In some embodiments, the parser module 120 can use the search strategy to identify bounding boxes in the structured image more effectively than contour detection alone, because contour detection can generate a large number of false positives or false negatives. For example, if a contour detection algorithm has not been specifically tuned for a particular structured image, contour detection may not accurately delineate the perimeter of a bounding box. A false positive, as referred to herein, corresponds to a region of the structured image that is incorrectly captured as a bounding box. A false negative, as referred to herein, corresponds to a region of the structured image that is a bounding box but is not identified as one.
In some embodiments, the inference-rule based search strategy can take an expression and a bounding box and recursively match the expression against the regions within the bounding box. In some examples, the search strategy identifies bounding boxes based on element descriptions in the expression. An element description can describe a unit value or a style property associated with a unit in a bounding box. The search strategy can include parsing the structured image based on the hierarchical relationships between the units of the structured image and the corresponding bounding boxes. In some embodiments, the search strategy identifies bounding boxes based on contours and on inferences deduced from the descriptions in the expression. The inferences can include any suitable top-down or bottom-up matching rules, which are described in greater detail below.
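The recursive matching of an expression against the regions inside a bounding box can be sketched as follows. This is a deliberately simplified, top-down illustration over a grid of already-separated units, under the assumption that every sequence splits its box evenly into single rows or columns. The names `Leaf`, `Seq`, and `match` are hypothetical, and the sketch omits the contour information and inference rules that the described search strategy relies on.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str        # name under which the unit value is extracted


@dataclass
class Seq:
    child: "Expr"
    vertical: bool    # True: split the box into rows; False: into columns


Expr = Union[Leaf, Seq]


def match(expr, grid, box):
    """Match expr against box = (top, left, bottom, right), inclusive.

    Returns the extracted data tree, or None if the box does not fit expr.
    """
    top, left, bottom, right = box
    if isinstance(expr, Leaf):
        if top != bottom or left != right:
            return None  # a leaf only matches a single unit
        return {expr.label: grid[top][left]}
    if expr.vertical:
        parts = [(r, left, r, right) for r in range(top, bottom + 1)]
    else:
        parts = [(top, c, bottom, c) for c in range(left, right + 1)]
    results = [match(expr.child, grid, part) for part in parts]
    return None if any(r is None for r in results) else results


grid = [[1, 2, 3], [4, 5, 6]]
table = Seq(Seq(Leaf("v"), vertical=False), vertical=True)
print(match(table, grid, (0, 0, 1, 2)))
```

The result is a nested list that follows the shape of the expression, which is the sense in which a data file can correspond to an expression tree.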
At block 206, the parser module 120 can generate a tuple of structured data files based on the identified hierarchical arrangement of bounding boxes in the structured image. In some embodiments, a data file can include any suitable number of values or style properties that correspond to any suitable number of units or regions of the structured image. In some examples, the data files enable the structured image to be edited. For example, a structured image can represent a table or a bead design, among others. The expression can be generated based on a previous example of a structured image, or the expression can be obtained from a user. In one example, the expression can describe structured data such as a table as a vertical sequence of a table caption, a row of column headings, and table rows. In some examples, the table caption can include text in a cell that has an unspecified height and spans the width of the structured image. Each row of the table can include any suitable horizontal sequence of cells, wherein each cell can be a rectangle with an unknown background, unknown content, and a border of any type. The content of each cell, including the headings, can be marked as data to be extracted. The parser module 120 can use the expression to parse the structured image into a single data file while preserving semantic alignment.
The process flow diagram of Fig. 2 is not intended to indicate that the steps of the method 200 are to be executed in any particular order, or that all of the steps of the method 200 are to be included in every case. Further, any number of additional steps can be included within the method 200, depending on the specific application. In some examples, the parser module 120 can use an expression tree to identify data in any suitable number of images and store the data from each image in a single data file. For example, the parser module 120 can generate a single data file that includes data from any number of files that match a particular expression or expression tree.
Fig. 3 is a process flow diagram of an example method for generating a new image based on two existing images. The method 300 can be implemented with any suitable computing device, such as the computing system 100 of Fig. 1.
At block 302, the parser module 120 can generate a first data file based on certain attributes (also referred to herein as P1) of an image that are extracted during parsing with an expression. The first data file can include values of any suitable number of units or pixels of the image. For example, the first data file can include any suitable attributes P1, such as content values of the units or pixels of the image, among others. In some examples, the content values of the first data file can include data values from cells in a chart or table, among others.
In one embodiment, the parser module 120 does not generate data files by enumerating all possible bounding boxes in the structured image. Rather, the parser module 120 can search the structured image using the descriptions in the expression, with either a top-down search strategy that includes top-down inference rules or a bottom-up search strategy that includes bottom-up inference rules. The top-down and bottom-up search strategies are discussed in greater detail below in relation to Fig. 6.
At block 304, the parser module 120 can generate a second data file based on additional attributes (also referred to herein as P2) of the image that are extracted during parsing with the expression. For example, the second data file can include attributes P2 that comprise style properties of the bounding boxes, such as the color of a bounding box, the shape of a bounding box, or the alignment of the content within a bounding box. In some embodiments, the parser module 120 can generate the second data file using a data tree that corresponds to the expression tree and the data values in the first data file.
At block 306, the parser module 120 can use the expression to generate a third data file that includes the attributes P1 of a second structured image. In some examples, the attributes P1 of the second structured image can be detected with the same expression that was used to detect the attributes P1 in the first structured image. At block 308, the parser module 120 can use the expression to generate a fourth data file that includes the attributes P2 of the second structured image. In some embodiments, the attributes P2 of the second structured image can also be detected with the expression used to detect the attributes P2 in the first structured image.
At block 310, the renderer module 122 can render a third structured image from the first data file, which is based on the first structured image, and the fourth data file, which is based on the second structured image. For example, rendering the image can include combining the content values (also referred to as attributes P1) from the bounding boxes in the first structured image with the style properties (also referred to as attributes P2) related to the bounding boxes in the second structured image.
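As a rough illustration of blocks 302-310, the Python sketch below renders a third "image" from the content values (P1) of one table and the style properties (P2) of another. A text grid stands in for a pixel image, and the `render` function and the style keys `cell_width` and `align` are assumptions made for this example rather than part of the described system.

```python
def render(content, style):
    """Render a text table: content gives cell values, style gives width and alignment."""
    width, align = style["cell_width"], style["align"]
    pad = {"left": str.ljust, "right": str.rjust}[align]
    return "\n".join("|".join(pad(str(v), width) for v in row) for row in content)


# Data files that parsing the first and second images might have produced.
content_1 = [[1, 22], [333, 4]]                  # P1 of the first image
style_1 = {"cell_width": 3, "align": "left"}     # P2 of the first image
content_2 = [[9, 8], [7, 6]]                     # P1 of the second image
style_2 = {"cell_width": 4, "align": "right"}    # P2 of the second image

# Third image: content of the first image with the style of the second.
third = render(content_1, style_2)
print(third)
```

Because both images are parsed with the same expression, their data files have the same shape, which is what makes the content of one interchangeable with the style of the other.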
The process flow diagram of Fig. 3 is not intended to indicate that the steps of the method 300 are to be executed in any particular order, or that all of the steps of the method 300 are to be included in every case. Further, any number of additional steps can be included within the method 300, depending on the specific application.
Fig. 4 is a process flow diagram of an example method for generating a new image based on a modified data file. The method 400 can be implemented with any suitable computing device, such as the computing system 100 of Fig. 1.
At block 402, the parser module 120 can generate, based on different image attributes, a tuple of structured data files for a first structured image, the tuple comprising a first data file and a second data file. In some embodiments, the first data file can be generated based on textual content, such as the content values in each bounding box. As discussed above, content values can include data values in cells of a chart or table, among others. In some examples, the second data file can include style properties of each bounding box, such as the color of the bounding box, the shape of the bounding box, or the alignment of the content within the bounding box, among others.
At block 404, the parser module 120 can detect a modification to the tuple of structured data files. In some embodiments, the content values or style properties included in the tuple of structured data files can be modified. For example, the first data file can include style properties that indicate the color of each unit in the structured image. In some examples, the style properties can be modified to indicate a different color for each unit of the structured image. In some embodiments, the first data file can include content values that correspond to each cell in a table. In some examples, the content values can be modified so that a table rendered from the content values includes the new content values.
At block 406, the renderer module 122 can render a second structured image using the modified tuple of structured data files. In some embodiments, the renderer module 122 can render the second structured image using a render operation that is the inverse of the parse operation. For example, an image can be rendered from an expression tree and any suitable number of data files. In some embodiments, the render operation can map an expression and any suitable number of data files to a newly created image by using the attribute values in the data files. In some examples, the data files can be generated with the parse operation and a suitable expression. In one example, any number of the data files and/or the expression that generated the data files can be modified. Modifying the data files and expressions allows for image editing (by changing the data files or the expression) and image creation (by mixing and matching data files obtained by parsing different images that share the same expression).
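The idea that rendering inverts parsing, and that an edit to a data file shows up in the re-rendered image, can be sketched in Python as follows. The toy "image" is a list of strings in which single-character units are separated by runs of `.`; this encoding and the functions `parse` and `render` are assumptions made for the example.

```python
import re


def parse(image):
    """Split rows of units into a content file (unit values) and a style file (gap)."""
    gap = len(re.search(r"\.+", image[0]).group())
    content = [re.split(r"\.+", row) for row in image]
    return content, {"gap": gap}


def render(content, style):
    """Inverse of parse: rebuild the rows from unit values and the gap width."""
    separator = "." * style["gap"]
    return [separator.join(row) for row in content]


image = ["r..g..b", "g..b..r"]
content, style = parse(image)

# Rendering the unmodified data files reproduces the original image.
print(render(content, style) == image)

# Editing the data files (block 404) and re-rendering (block 406) gives a new image.
content[0][0] = "k"
style["gap"] = 1
print(render(content, style))
```

Keeping content and style in separate files is what lets either one be changed, or swapped with a file parsed from another image, without touching the other.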
In one example, pearl G-Design can be modified.Such as, the color of pearl G-Design can be replaced by different color-set.Pearl G-Design can use any suitable expression formula to describe.In certain embodiments, pearl G-Design can comprise any proper number unit in any proper number vertical row sequence and every a line.In some instances, each unit can comprise a kind of different colours, and each sequence can have the constant clearance between each unit.In addition, each unit can have certain and fixes but the border of the width of the unknown and color.
The parser module 120 can use an expression that describes the bead pattern design to generate a data file containing the color of each cell of the design. In some examples, the expression can be modified so that the set of colors used for cell backgrounds is restricted to a particular color set. In other examples, the expression can be modified to increase the gap between cells in order to accommodate mosaic tiles with thicker borders. The renderer module 122 can then use the modified expression and the original set of data files to generate a new mosaic design.
The process flow diagram of Fig. 4 is not intended to indicate that the steps of the method 400 are to be executed in any particular order, or that all of the steps of the method 400 are to be included in every case. Further, any number of additional steps can be included within the method 400, depending on the specific application.
Fig. 5 is an example chart that can be described with a domain-specific language for manipulating structured images. In some embodiments, the chart 500 can include a title 502, column headers 504, row headers 506, and data cells 508. In some embodiments, the domain-specific language that describes the chart 500 can include expressions (also referred to herein as element expressions) that can be used to convert a structured image into a tuple of tree-structured data files (a conversion also referred to herein as a parse operation). In some embodiments, the same element expressions can be used to convert a tuple of tree-structured data files into a structured image (also referred to herein as a render operation). In some examples, an element expression can identify any suitable number of regions or cells of a structured image. An element expression can also be defined recursively as a structure, a sequence, a union, or a leaf type, among others.
Before an example of a domain-specific language for describing structured images is discussed, an example expression that represents the chart 500 is provided below.
Struct(
  <Top, Elem(Descr(X:=0, Y:=0, Width:=$.Width, Height:=?), E1)>,
  <Head, Elem(Descr(X:=0, Y:=Top.Height, Width:=$.Width, Height:=?), E2)>,
  <Cells, Elem(Descr(X:=0, Y:=Top.Height+Head.Height, Width:=$.Width, Height:=?), E3)>),
where
  E1 = Elem(Descr(Content:=out?), Rectangle),
  E2 = Elem(Descr(Gap:=*), HSeq(?, E1)),
  E3 = Elem(Descr(Gap:=*), VSeq(?, E2))
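As an illustration, the expression for the chart 500 could be held in memory as a small tree of Python dataclasses. This is a sketch under assumptions: the class names mirror the constructors in the expression, but the field names (`assigns`, `body`, `symbols`, and so on) and the use of plain strings for attribute expressions are hypothetical choices, not part of the described language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Descr:
    # Attribute name -> attribute expression, kept as plain text in this sketch.
    assigns: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rectangle:
    pass  # leaf type


@dataclass
class Seq:
    direction: str   # "Horizontal" or "Vertical"
    count: str       # "?" marks an unknown count
    element: "Elem"


@dataclass
class Struct:
    symbols: List[Tuple[str, "Elem"]]  # (symbol name, subexpression)


@dataclass
class Elem:
    descr: Descr
    body: Union[Rectangle, Seq, Struct, "Elem"]


e1 = Elem(Descr({"Content": "out?"}), Rectangle())
e2 = Elem(Descr({"Gap": "*"}), Seq("Horizontal", "?", e1))
e3 = Elem(Descr({"Gap": "*"}), Seq("Vertical", "?", e2))

chart = Struct([
    ("Top", Elem(Descr({"X": "0", "Y": "0", "Width": "$.Width", "Height": "?"}), e1)),
    ("Head", Elem(Descr({"X": "0", "Y": "Top.Height", "Width": "$.Width", "Height": "?"}), e2)),
    ("Cells", Elem(Descr({"X": "0", "Y": "Top.Height+Head.Height",
                          "Width": "$.Width", "Height": "?"}), e3)),
])

symbol_names = [name for name, _ in chart.symbols]
```

Reading the tree from the top, the chart is a structure of three stacked symbols, and the Cells symbol is a vertical sequence of horizontal sequences of rectangles.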
In some examples, a structure expression can be represented with the constructor Struct(<S_1, E_{v,1}>, ..., <S_m, E_{v,m}>). The Struct constructor can include any suitable number of subexpressions E_{v,1}, ..., E_{v,m}, which can be called the symbols of the structure. Each symbol E_{v,j} can be labeled with a corresponding symbol name S_j. In some examples, symbol names can be referenced in the attribute expressions of a description, for example through the PropGet constructor. In some embodiments, the symbol "$" can denote the structure itself. By default, if no alignment attributes are provided in an expression, an element is assumed to occupy the entire bounding box of its parent element.
A union expression U can be represented with a constructor such as Choice(E_{v,1}, ..., E_{v,m}). A union expression can include any suitable number of subexpressions, or alternatives, such as E_{v,1}, ..., E_{v,m}. The parse operation can attempt to parse the structured image with each subexpression of the union expression. In some examples, the first subexpression that matches a bounding box in the structured image is identified as the value of the union expression.
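The first-match behavior of a union expression can be sketched as follows. The `Matcher` callables and the three toy alternatives are hypothetical stand-ins for real subexpression matching; only the "try each alternative in order and keep the first success" logic reflects the text above.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]              # (x, y, width, height)
Matcher = Callable[[Box], Optional[dict]]    # returns parsed values, or None on failure


def match_choice(alternatives: List[Matcher], box: Box) -> Optional[Tuple[int, dict]]:
    """Try each alternative in order; return the index and result of the first match."""
    for index, matcher in enumerate(alternatives):
        result = matcher(box)
        if result is not None:
            return index, result
    return None


def wide(box: Box) -> Optional[dict]:
    _, _, w, h = box
    return {"kind": "wide"} if w > 2 * h else None


def square(box: Box) -> Optional[dict]:
    _, _, w, h = box
    return {"kind": "square"} if w == h else None


def any_box(box: Box) -> Optional[dict]:
    return {"kind": "any"}


first = match_choice([wide, square, any_box], (0, 0, 10, 10))
second = match_choice([wide, square, any_box], (0, 0, 30, 10))
third = match_choice([wide, square], (0, 0, 10, 15))
```

Because alternatives are tried in order, a catch-all alternative such as `any_box` only wins when every earlier alternative fails.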
In some examples, a description, "Descr", is a set of attribute assignments that specifies the value of each attribute. The set of available attribute names can depend on the type of the element being described. In some embodiments, there are two categories of attributes: alignment attributes and data attributes. Alignment attributes, such as X, Y, width, height, center X, center Y, radius, and gap, describe the position of an element within the structured image. Data attributes, such as content, color, background, border, and border width, describe the style of an element.
In some embodiments, an attribute assignment can be tagged with a tag name. In some examples, a tagged attribute assignment can be represented with a constructor such as TaggedAssign(v_j, s, p). In one example, an untagged attribute assignment can be represented with a constructor such as Assign(s, p). In this example, s is the attribute name, p is the attribute expression being assigned, and v_j is the optional tag name. The constructor TaggedAssign(v_j, s, p) indicates that, for each instance of the assignment parsed during the parse operation, the evaluated value of p is stored in the corresponding output data file.
A data file is the output of the parse operation and part of the input to the render operation. A data file can include the values of the tagged attribute assignments in an expression. A data file can include a tree whose nodes contain mappings from attribute or field names of the expression to their corresponding content values. In some examples, the tree in a data file generated by parsing an image with an expression has the same shape as the expression tree used to parse the image.
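The way tagged assignments route values into data files can be sketched in Python. The `Assign` tuple and the `collect` function are hypothetical; the sketch assumes that each tag name corresponds to one output data file and that untagged assignments are used for matching only and are not stored.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Assign(NamedTuple):
    tag: Optional[str]   # None models an untagged Assign(s, p)
    name: str            # attribute name s


def collect(assigns: List[Assign], evaluated: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Store the evaluated value of every tagged assignment in the file named by its tag."""
    files: Dict[str, Dict[str, str]] = defaultdict(dict)
    for assign in assigns:
        if assign.tag is not None:
            files[assign.tag][assign.name] = evaluated[assign.name]
    return dict(files)


# One parsed cell: content and color are tagged, width is not.
cell = [Assign("content", "Content"), Assign("style", "Color"), Assign(None, "Width")]
files = collect(cell, {"Content": "42", "Color": "black", "Width": "80"})
```

In this sketch the cell contributes one entry to a "content" data file and one to a "style" data file, which corresponds to the split between content values and style attributes described elsewhere in this document.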
In some embodiments, attribute expressions can include arithmetic calculations, constant constructors, and enumeration constructors. An enumeration constructor enables an expression to detect a set of possible values for an attribute. In some examples, the parse operation indicates that a cell in the image is a matching cell when the cell's corresponding attribute value belongs to the set of possible attribute values in the enumeration constructor.
In some examples, unknown attributes are represented with symbols such as "?" or "*". In one example, the symbol "?" can represent an unknown variable, and the symbol "*" can represent an unknown constant that is determined at run time. A sequence expression S can be represented with a constructor such as Seq(O, C, E_v). In one example, a sequence can include any suitable number of subexpressions of the same type E_v. A subexpression can be any expression in the expression tree, and the sequence can indicate that the corresponding elements in the image are aligned along a particular direction O, which can be horizontal or vertical.
In some embodiments, the constructor Seq(Horizontal, C, E_v) can be written as HSeq(C, E_v), and the constructor Seq(Vertical, C, E_v) can be written as VSeq(C, E_v). In some examples, the number of elements in a sequence can be specified with a count expression C. In one example, the count expression can be a constant constructor, an unknown variable, or a range constructor that gives a range of possible values. In some embodiments, the parse operation can determine the number of cells in a sequence of the structured image. In one example, the number of cells in a sequence can be stored as a count attribute in the data file. In some examples, the count constructor Const(k) can be written as "k". If an attribute in a description of the expression tree or expression is specified as an unknown variable, the value of that unknown variable can be determined at run time of the parse operation. In some embodiments, the gap between the cells of a sequence in the structured image can be indicated with a form such as "Gap" in the description of the expression.
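The three kinds of count expression can be sketched as follows. `Const` and `Range` mirror the constructors named in the text; the `Unknown` class and the rule that an unknown count accepts any positive number of elements are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    k: int


@dataclass(frozen=True)
class Range:
    low: int
    high: int


@dataclass(frozen=True)
class Unknown:
    """Stands for a count written as "?" or "*"."""


Count = Union[Const, Range, Unknown]


def count_allows(count: Count, n: int) -> bool:
    """Return True if a sequence with this count expression may contain n elements."""
    if isinstance(count, Const):
        return n == count.k
    if isinstance(count, Range):
        return count.low <= n <= count.high
    return n >= 1  # assumption: an unknown count accepts any positive number


checks = [
    count_allows(Const(3), 3),
    count_allows(Const(3), 4),
    count_allows(Range(2, 5), 5),
    count_allows(Range(2, 5), 6),
    count_allows(Unknown(), 7),
]
```

Once a parse succeeds, the concrete value of n is what would be stored as the count attribute in the data file.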
A leaf can represent the smallest indivisible part of a structured image. In some examples, a leaf can include values associated with a rectangular shape, a circular shape, and so on. In some examples, a bounding box can include a rectangular region or a leaf of the image. In one example, a bounding box matches a description of an expression if the image attributes within the bounding box match the corresponding attributes of the description. In some embodiments, a region or bounding box of the image can have an inner border or an outer border made of pixels of a single color. In other words, two different colors do not occur within a given number of pixels inside the border of the bounding box, or two different colors do not occur within a given number of pixels outside the border of the bounding box.
In some embodiments, a region or bounding box in the structured image can include pixels of one particular color. In some examples, the border of a region or bounding box can include a variety of colors. In one example, a bounding box whose border has multiple colors can indicate that an object crosses the border of the bounding box, so the bounding box cannot be regarded as a separate object. For example, a table can include black-and-white text of various mathematical expressions. If a bounding box is identified as having both black and white pixels on its border, the parser module 120 can detect that the border crosses a mathematical expression. In some embodiments, the number of potential bounding boxes in an image does not exceed the combined number of contours detected in the image and bounding boxes identified by the inference rules.
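The single-color border test can be sketched in a few lines of Python. The image is modeled as a list of strings with one character per pixel, and the border is taken to be exactly one pixel thick; both are simplifying assumptions for illustration.

```python
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def border_colors(image: List[str], box: Box) -> Set[str]:
    """Collect the colors of the pixels on the one-pixel border of the box."""
    x, y, w, h = box
    colors: Set[str] = set()
    for cx in range(x, x + w):          # top and bottom edges
        colors.add(image[y][cx])
        colors.add(image[y + h - 1][cx])
    for cy in range(y, y + h):          # left and right edges
        colors.add(image[cy][x])
        colors.add(image[cy][x + w - 1])
    return colors


def is_separate_object(image: List[str], box: Box) -> bool:
    """A box whose border has a single color does not cut through another object."""
    return len(border_colors(image, box)) == 1


# "W" is a white pixel, "B" is a black pixel.
image = [
    "WWWWW",
    "WBBBW",
    "WBBBW",
    "WWWWW",
]
tight_box = is_separate_object(image, (1, 1, 3, 2))    # border is all "B"
cutting_box = is_separate_object(image, (0, 1, 3, 2))  # border mixes "W" and "B"
```

The second box is shifted one pixel to the left, so its border passes through the black object and picks up two colors.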
Fig. 6 shows examples of top-down inference rules and bottom-up inference rules. In some embodiments, the top-down inference rules and the bottom-up inference rules can be implemented by a parser module such as the parser module 120 of Fig. 1.
In some embodiments, the parse operation can accept an expression E_v and an image I, and parse the image I into a tree of bounding boxes that contains the attribute values of the tagged attribute assignments, which can be stored in any suitable number of data files d_1, ..., d_n. In some examples, the parse operation can identify a hierarchical arrangement of bounding boxes that matches the expression. In some embodiments, bounding boxes are identified with an image processing technique such as contour detection. Contour detection is a method of identifying closed regions with prominent edges in an image. As discussed above, the edge of such a region (which forms a closed polyline) is called a contour. In some embodiments, the parse operation includes searching for bounding boxes by attempting to match the expression against each identified contour. However, if the contour detection algorithm is not tuned for the specific image type, contour detection can produce a large number of false positives and false negatives. In some examples, a contour can be regarded as close to a bounding box b if the area of the symmetric difference between the contour and the bounding box does not exceed a certain percentage of the image size.
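The closeness test based on the symmetric difference can be sketched as follows. The sketch assumes that the contour is approximated by its own axis-aligned bounding rectangle and that the tolerance is 1% of the image area; neither value nor approximation comes from the text above.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def area(box: Box) -> int:
    return box[2] * box[3]


def intersection_area(a: Box, b: Box) -> int:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def symmetric_difference_area(a: Box, b: Box) -> int:
    # |A xor B| = |A| + |B| - 2 * |A and B|
    return area(a) + area(b) - 2 * intersection_area(a, b)


def is_close(contour_box: Box, box: Box, image_area: int, tolerance: float = 0.01) -> bool:
    """True if the symmetric difference is at most `tolerance` of the image area."""
    return symmetric_difference_area(contour_box, box) <= tolerance * image_area


near = is_close((0, 0, 10, 10), (1, 0, 10, 10), image_area=100 * 100)
far = is_close((0, 0, 10, 10), (50, 50, 10, 10), image_area=100 * 100)
```

Here the first pair of rectangles differs by a one-pixel shift, which gives a symmetric difference of 20 pixels against an allowance of 100, while the second pair does not overlap at all.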
In some embodiments, the parse operation uses inference rules and a search strategy based on dynamic programming to identify bounding boxes. The inference rules can include top-down inference rules and bottom-up inference rules. The search strategy based on dynamic programming can address the inefficiency that may result from a large number of false positives or from an inability to identify which contour corresponds to a particular expression.
In some embodiments, the top-down inference rules implement a top-down parsing method. The top-down inference rules can accept any expression as input and use the expression to perform a recursive parse of the image. The top-down inference rules can identify the bounding boxes that match the expression based on the identified contours and the descriptions in the expression. In some examples, if the identified bounding boxes include false negatives, the top-down inference rules may fail to match particular elements of the image to the expression. In Fig. 6, the top-down inference rules are labeled "TD". The top-down matching process TDMatch(E_v, b) 602 accepts an expression E_v and a bounding box b as input and recursively matches the expression to the bounding box. If TDMatch(E_v, b) 602 succeeds (that is, TDMatch 602 identifies a hierarchical arrangement of bounding boxes within b that matches the expression E_v), TDMatch 602 returns a tuple of data files d_{v1}, ..., d_{vn} (labeled T in Fig. 6). In some examples, the tuple of data files can be stored in a global cache M. The tuple of data files can be filled with the attribute values of the cells of the image inside the bounding box b. In one example, the function Fill(D, b, T) 604 can accept the tuple T of data files and return the tuple of data files filled with the values found in the bounding box b for the tagged attribute assignments in the description D. If the bounding box b does not match any description in the expression E_v, the process can return an error signal. In some examples, if the expression E_v does not contain enough information to determine whether the bounding box b matches it, TDMatch(E_v, b) 602 can return a signal that indicates a possible match.
In some embodiments, the top-down matching process TDMatch 602 can recursively invoke additional top-down inference rules. In some examples, the top-down matching process can detect whether a given top-down matching process has already been executed with the same parameters (such as the bounding box and the expression) and whether it succeeded. If the top-down matching process detects an earlier execution of the same process with the same parameters, the result of the earlier execution can be returned.
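The reuse of earlier results can be sketched as a memoizing wrapper around a matching function. The `MemoizedMatcher` class, its `cache` dictionary (playing the role of the global cache M), and the toy `slow_match` function are hypothetical; the sketch also caches failures as well as successes, which is an assumption.

```python
from typing import Callable, Dict, Hashable, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


class MemoizedMatcher:
    """Wrap a matching function so repeated (expression, box) queries are answered from a cache."""

    def __init__(self, match_fn: Callable[[Hashable, Box], object]) -> None:
        self._match_fn = match_fn
        self.cache: Dict[Tuple[Hashable, Box], object] = {}
        self.evaluations = 0  # how many times the underlying function actually ran

    def __call__(self, expr: Hashable, box: Box) -> object:
        key = (expr, box)
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = self._match_fn(expr, box)
        return self.cache[key]


def slow_match(expr: Hashable, box: Box) -> bool:
    # Toy stand-in for a recursive top-down match.
    return expr == "Rectangle" and box[2] > 0 and box[3] > 0


matcher = MemoizedMatcher(slow_match)
first_result = matcher("Rectangle", (0, 0, 5, 5))
second_result = matcher("Rectangle", (0, 0, 5, 5))   # answered from the cache
evaluations_after_repeat = matcher.evaluations
matcher("Rectangle", (1, 0, 5, 5))                   # new parameters, evaluated again
```

Keying the cache on both the expression and the bounding box is what lets a dynamic-programming search avoid re-deriving the same sub-match along different paths.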
In Fig. 6, each top-down inference rule is written with a horizontal line. If the conditions above the line are true, the invocation of the top-down matching process TDMatch 602 below the line succeeds, and its result is shown below the line after the "=" sign. The TD-RECT-INCOMPLETE rule 606 states that if the description D of a rectangle leaf is incomplete (attributes of the description D, such as border and color, are not specified) and a contour c is close to the bounding box b being matched against the description D, then a successful match of the leaf can be filled from the bounding box of the contour c. The TD-RECT-COMPLETE rule 608 is invoked for rectangle leaves with a complete description and fills a successful match from the given bounding box. In some examples, an off-the-shelf Tesseract OCR engine is used to parse the "content" attribute of leaf elements. The TD-STRUCT rule 610 states that if each symbol <S_i, E_i> of a structure expression matches some bounding box b_i, then the whole structure matches the smallest bounding box that contains all of the b_i. A symbol <S_i, E_i> is successfully matched to a bounding box b if a matching contour exists for the symbol (TD-SYMBOL-CONTOUR rule 612), or if b has been computed from other successfully matched symbols by using relative attributes and the recursive match succeeds (TD-SYMBOL-DEPEND rule 614). The TD-SEQ-RANGE rule 616 states that a sequence with a Range(k1, k2) count expression is successfully matched to a bounding box b if there exists a k with k1 ≤ k ≤ k2 such that b can be divided into k parts along the direction of the sequence and the element type of the sequence is successfully matched to each part.
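The idea behind the TD-SEQ-RANGE rule 616 can be sketched as a search over k. This sketch makes several assumptions that are not in the rule itself: the parts are of equal size, the gap between elements is zero, and the smallest k that works is returned. The `element_matches` callable is a hypothetical stand-in for matching the element type of the sequence.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def split(box: Box, k: int, horizontal: bool = True) -> Optional[List[Box]]:
    """Divide the box into k equal parts along the sequence direction, if it divides evenly."""
    x, y, w, h = box
    length = w if horizontal else h
    if k <= 0 or length % k != 0:
        return None
    step = length // k
    if horizontal:
        return [(x + i * step, y, step, h) for i in range(k)]
    return [(x, y + i * step, w, step) for i in range(k)]


def match_seq_range(
    box: Box,
    k1: int,
    k2: int,
    element_matches: Callable[[Box], bool],
    horizontal: bool = True,
) -> Optional[Tuple[int, List[Box]]]:
    """Find a k in [k1, k2] such that every one of the k parts matches the element type."""
    for k in range(k1, k2 + 1):
        parts = split(box, k, horizontal)
        if parts is not None and all(element_matches(part) for part in parts):
            return k, parts
    return None


def ten_wide(part: Box) -> bool:
    # Toy element type: a cell exactly ten pixels wide.
    return part[2] == 10


found = match_seq_range((0, 0, 30, 8), 2, 5, ten_wide)
not_found = match_seq_range((0, 0, 30, 8), 4, 5, ten_wide)
```

For a 30-pixel-wide box, k = 2 yields 15-pixel parts that fail the element test, and k = 3 yields three 10-pixel parts that all pass.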
In some embodiments, the parser module can implement the parse operation as a dynamic programming technique that alternates between top-down parsing and bottom-up parsing so that the two meet in the middle. In some examples, bottom-up parsing can invoke top-down parsing rules. If top-down parsing does not identify a region of the structured image that matches the expression, the parser module can perform a bottom-up parsing technique. In some examples, the bottom-up parsing technique can include matching expressions in the expression tree to previously identified contours.
Bottom-up parsing can also include using inference rules to examine any suitable number of fact matches in the cache. A fact, as referred to herein, can include a region of the image that is matched with a bounding box. If the bottom-up parsing technique identifies a region of the image that matches an expression, the new match for the expression tree can be stored in the cache M together with the corresponding bounding box. In some examples, bounding boxes can be identified from contours or suggested by the inference rules.
The bottom-up inference rules (labeled "BU" in Fig. 6) implement a bottom-up parsing method, which can identify regions of the image that match the expression even in the presence of false-negative bounding boxes. Bottom-up parsing can include matching the identified contours with leaf elements. If a subexpression of the expression tree matches a bounding box, bottom-up parsing can also include generating guesses for other bounding boxes in the expression tree. The bottom-up matching process BUMatch can be executed whenever any matching bounding box is identified. In some embodiments, BUMatch includes applying bottom-up inference rules, such as the rules 620 and 624. In one example, BUMatch invokes the corresponding bottom-up inference rule. BUMatch can accept an expression E_v and a bounding box b, and identify a match for the parent expression of E_v in the expression tree. BUMatch can also invoke some top-down inference rules to check hypotheses and find new matches. In some embodiments, a bottom-up inference rule can be defined for each type of expression except leaves.
The bottom-up inference rules can also identify expressions that describe bounding boxes in the image. For example, if a new match (shown as Fact(E, b, F) 618) is found in the cache M and the conditions above the horizontal line are true, a new Fact, shown below the horizontal line, is created. In some embodiments, the horizontal line indicates that the statements above it are the premises of a conditional and the statement below it is the conclusion; in other words, the horizontal line indicates an "if-then" relationship. The conditions above the line can invoke other inference rules and bind new variables. For example, the BU-SEQ-UNKNOWN inference rule 620 states that if some element of a sequence is successfully matched to a bounding box b and the count expression of the sequence is unknown (indicated by "?" or "*"), then the bottom-up inference rule can replicate b as far as possible in both directions (left and right for a horizontal sequence, or up and down for a vertical sequence), assuming that every element of the sequence has the same size or is clearly delimited by some contour. If the sequence element matches the replicated bounding boxes or contours, the bounding box that matches the sequence expression is obtained.
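The replication step of the BU-SEQ-UNKNOWN rule 620 can be sketched for a horizontal sequence. The sketch assumes equally sized elements separated by a known constant gap, and it uses a hypothetical `element_matches` callable in place of a real match of the sequence element; replication stops in each direction at the first position that fails to match or leaves the image.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def replicate_horizontally(
    seed: Box,
    gap: int,
    image_width: int,
    element_matches: Callable[[Box], bool],
) -> List[Box]:
    """Copy the seed box left and right as far as the copies still match an element."""
    x, y, w, h = seed
    step = w + gap
    boxes = [seed]

    nx = x - step
    while nx >= 0 and element_matches((nx, y, w, h)):
        boxes.insert(0, (nx, y, w, h))
        nx -= step

    nx = x + step
    while nx + w <= image_width and element_matches((nx, y, w, h)):
        boxes.append((nx, y, w, h))
        nx += step

    return boxes


def bounding_box_of(boxes: List[Box]) -> Box:
    """Smallest box that contains a left-to-right list of equally tall boxes."""
    first, last = boxes[0], boxes[-1]
    return (first[0], first[1], last[0] + last[2] - first[0], first[3])


# Toy image: cells start at x = 0, 12, and 24 (width 10, gap 2).
cell_starts = {0, 12, 24}
cells = replicate_horizontally(
    (12, 0, 10, 5), gap=2, image_width=50,
    element_matches=lambda b: b[0] in cell_starts,
)
sequence_box = bounding_box_of(cells)
```

Starting from the middle cell, the sketch recovers all three cells and the bounding box of the whole sequence, even though only one element was matched at the outset.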
In some embodiments, the parsing and rendering algorithms described above can be implemented in any suitable programming language (such as C#). In some examples, contour detection can be implemented with any suitable algorithm (such as the Suzuki algorithm). In addition, the OCR engine can be implemented with any suitable technology (such as Tesseract). In some embodiments, the techniques described herein do not rely on any pre-existing semantic knowledge to parse a structured image. Instead, the techniques described herein can parse a structured image by using a description of the structured image, written, for example, in a programming language such as the one described above. In some examples, the programming language can be used to define which regions of the structured image contain the data to be extracted.
The diagram of Fig. 6 is not intended to indicate that it includes all inference rules. Rather, additional inference rules can exist. For example, the TD-union inference rule 622, the BU-union inference rule 624, and the TD-SEQ-Const inference rule 626 can be used to identify the regions of the structured image that match an expression.
Fig. 7 is an example bead pattern that can be described by an expression that uses constructors. In some embodiments, the bead pattern can include cells arranged in any suitable number of rows and columns. In some examples, each cell can have any suitable number of characteristics, such as color, shape, and size.
The bead pattern 700 includes multiple rectangular cells 702. The cells 702 of the bead pattern 700 are shown in two different colors, black and white. As discussed above, the parser module 120 can detect the pattern of the cells with the help of an expression that describes the bead pattern 700. In some embodiments, the expression can be an expression tree that includes subexpressions and parent expressions. A parent expression can indicate a pattern in the structured image (such as the bead pattern 700) that includes a larger number of cells. A subexpression can indicate a pattern in the structured image that includes a smaller number of cells. For example, the pattern 704, which includes a horizontal set of cells, corresponds to a parent expression whose subexpression corresponds to the pattern 706, which matches a single cell. In some embodiments, the expression tree can indicate the pattern of the structured image (such as the bead pattern 700), where the subexpressions and parent expressions correspond to a hierarchical arrangement of cell patterns.
Fig. 8 is a block diagram of a tangible computer-readable storage medium 800 for parsing and rendering structured images. The tangible computer-readable storage medium 800 can be accessed by a processor 802 over a computer bus 804. In addition, the tangible computer-readable storage medium 800 can include code that directs the processor 802 to perform the steps of the current method.
The software components discussed herein can be stored on the tangible computer-readable storage medium 800, as shown in Fig. 8. For example, the tangible computer-readable storage medium 800 can include a parser module 806 and a renderer module 808. In some embodiments, the parser module 806 can use an expression to generate a set of data files from a structured image. The renderer module 808 can use any suitable number of expressions or expression trees and any suitable number of data files to generate a structured image.
It is to be understood that any number of additional software components not shown in Fig. 8 can be included within the tangible computer-readable storage medium 800, depending on the specific application. Although the subject matter has been described in language specific to structural features and/or methods, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific structural features or methods described above. Rather, the specific structural features and methods described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for generating a structured data file tuple, comprising:
detecting an expression that describes the structure of a structured image;
identifying, using a search strategy based on inference rules, a hierarchical arrangement of bounding boxes in the structured image that matches the expression; and
generating a first structured data file tuple based on the identified hierarchical arrangement of bounding boxes in the structured image.
2. The method of claim 1, wherein the search strategy based on inference rules comprises using top-down inference rules and bottom-up inference rules.
3. The method of claim 1, wherein the search strategy based on inference rules comprises identifying the bounding boxes by detecting contours in the structured image using an image processing technique.
4. The method of claim 1, wherein the first structured data file tuple comprises a first data file consisting of content values and a second data file consisting of style attributes related to the content values.
5. The method of claim 4, comprising:
generating a second data file tuple for a second structured image based on the expression, wherein the second data file tuple comprises a third data file consisting of content values and a fourth data file consisting of style attributes related to the content values; and
rendering a third structured image from the first data file, which is based on the first structured image, and the fourth data file, which is based on the second structured image.
6. The method of claim 1, comprising:
detecting a modification to the first structured data file tuple; and
rendering a second structured image using the modified first structured data file tuple.
7. The method of claim 2, wherein the expression is an expression tree generated based on a set of example structured images.
8. The method of claim 7, wherein the top-down inference rules identify the bounding boxes, the bounding boxes comprising a first bounding box and a second bounding box that match at least two subexpressions in the expression tree, assuming that a third bounding box matches a parent expression in the expression tree.
9. One or more computer-readable storage media for generating a structured data file tuple, the computer-readable storage media comprising a plurality of instructions that, when executed by a processor, cause the processor to:
detect an expression that uses constructors to describe the structure of a structured image;
identify, using a search strategy based on inference rules, a hierarchical arrangement of bounding boxes in the structured image that matches the expression; and
generate a first structured data file tuple based on the identified hierarchical arrangement of bounding boxes in the structured image, wherein the first structured data file tuple comprises a first data file consisting of content values and a second data file consisting of style attributes related to the content values.
10. A system for generating data files, comprising:
a processor to execute processor-executable code; and
a storage device that stores the processor-executable code, wherein the processor-executable code, when executed by the processor, causes the processor to:
detect an expression that uses constructors to describe the structure of a structured image;
identify, using a search strategy based on inference rules, a hierarchical arrangement of bounding boxes in the structured image that matches the expression; and
generate a first structured data file tuple based on the identified hierarchical arrangement of bounding boxes in the structured image, wherein the first structured data file tuple comprises a first data file consisting of content values and a second data file consisting of style attributes related to the content values.
CN201480009496.7A 2013-02-19 2014-02-14 Parsing and rendering structured images Pending CN105144195A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/769,838 2013-02-19
US13/769,838 US9031894B2 (en) 2013-02-19 2013-02-19 Parsing and rendering structured images
PCT/US2014/016335 WO2014130345A1 (en) 2013-02-19 2014-02-14 Parsing and rendering structured images

Publications (1)

Publication Number Publication Date
CN105144195A true CN105144195A (en) 2015-12-09

Family

ID=50190804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480009496.7A Pending CN105144195A (en) 2013-02-19 2014-02-14 Parsing and rendering structured images

Country Status (4)

Country Link
US (1) US9031894B2 (en)
EP (1) EP2959429A1 (en)
CN (1) CN105144195A (en)
WO (1) WO2014130345A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048356A (en) * 2022-01-12 2022-02-15 广东粤港澳大湾区硬科技创新研究院 Knowledge input method, device and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10241992B1 (en) * 2018-04-27 2019-03-26 Open Text Sa Ulc Table item information extraction with continuous machine learning through local and global models
US11074048B1 (en) 2020-04-28 2021-07-27 Microsoft Technology Licensing, Llc Autosynthesized sublanguage snippet presentation
US11327728B2 (en) 2020-05-07 2022-05-10 Microsoft Technology Licensing, Llc Source code text replacement by example
CN111814673B (en) * 2020-07-08 2023-05-26 重庆农村商业银行股份有限公司 Method, device, equipment and storage medium for correcting text detection bounding box
US11900080B2 (en) 2020-07-09 2024-02-13 Microsoft Technology Licensing, Llc Software development autocreated suggestion provenance
US11875136B2 (en) 2021-04-01 2024-01-16 Microsoft Technology Licensing, Llc Edit automation using a temporal edit pattern
US11941372B2 (en) 2021-04-01 2024-03-26 Microsoft Technology Licensing, Llc Edit automation using an anchor target list

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6557017B1 (en) 1998-02-06 2003-04-29 Xerox Corporation Image production system theme integration
US8606015B2 (en) 2002-12-17 2013-12-10 Abbyy Development Llc Multilevel image analysis
US8799234B2 (en) * 2010-07-12 2014-08-05 Microsoft Corporation Semantic entity manipulation using input-output examples
US20070192267A1 (en) * 2006-02-10 2007-08-16 Numenta, Inc. Architecture of a hierarchical temporal memory based system
US8060880B2 (en) * 2007-05-04 2011-11-15 Microsoft Corporation System using backward inter-procedural analysis for determining alternative coarser grained lock when finer grained locks exceeding threshold
US8181163B2 (en) * 2007-05-07 2012-05-15 Microsoft Corporation Program synthesis and debugging using machine learning techniques
US20080300796A1 (en) 2007-05-31 2008-12-04 Lassahn Gordon D Biological analysis methods, biological analysis devices, and articles of manufacture
US8316345B2 (en) * 2007-06-01 2012-11-20 Microsoft Corporation Program abstraction based on program control
US8266598B2 (en) * 2008-05-05 2012-09-11 Microsoft Corporation Bounding resource consumption using abstract interpretation
US8719801B2 (en) * 2008-06-25 2014-05-06 Microsoft Corporation Timing analysis of concurrent programs
US8402439B2 (en) * 2008-06-27 2013-03-19 Microsoft Corporation Program analysis as constraint solving
US8271404B2 (en) * 2008-10-02 2012-09-18 Microsoft Corporation Template based approach to discovering disjunctive and quantified invariants over predicate abstraction
US8397221B2 (en) * 2008-10-07 2013-03-12 Microsoft Corporation Calculating resource bounds of programs manipulating recursive data structures and collections
US8195582B2 (en) * 2009-01-16 2012-06-05 Numenta, Inc. Supervision based grouping of patterns in hierarchical temporal memory (HTM)
US8752029B2 (en) * 2009-09-29 2014-06-10 Microsoft Corporation Computing a symbolic bound for a procedure
US20120133664A1 (en) 2010-11-29 2012-05-31 Lotus Hill Institute For Computer Vision And Information Science System and method for painterly rendering based on image parsing
US8484550B2 (en) * 2011-01-27 2013-07-09 Microsoft Corporation Automated table transformations from examples
US8825572B2 (en) * 2011-02-01 2014-09-02 Microsoft Corporation Program synthesis with existentially and universally quantified belief propagation using probabilistic inference
US8504570B2 (en) * 2011-08-25 2013-08-06 Numenta, Inc. Automated search for detecting patterns and sequences in data using a spatial and temporal memory system
US8645291B2 (en) * 2011-08-25 2014-02-04 Numenta, Inc. Encoding of data for processing in a spatial and temporal memory system
US8825565B2 (en) * 2011-08-25 2014-09-02 Numenta, Inc. Assessing performance in a spatial and temporal memory system
US8650207B2 (en) * 2011-12-02 2014-02-11 Microsoft Corporation Inductive synthesis of table-based string transformations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BERTRAND COUASNON: "DMOS, a generic document recognition method: application to table structure analysis in a general and in a specific way", International Journal of Document Analysis and Recognition *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048356A (en) * 2022-01-12 2022-02-15 广东粤港澳大湾区硬科技创新研究院 Knowledge input method, device and storage medium

Also Published As

Publication number Publication date
WO2014130345A1 (en) 2014-08-28
US20140236991A1 (en) 2014-08-21
EP2959429A1 (en) 2015-12-30
US9031894B2 (en) 2015-05-12

Similar Documents

Publication Publication Date Title
CN105144195A (en) Parsing and rendering structured images
US10963632B2 (en) Method, apparatus, device for table extraction based on a richly formatted document and medium
US10191889B2 (en) Systems, apparatuses and methods for generating a user interface by performing computer vision and optical character recognition on a graphical representation
US10691976B2 (en) System for time-efficient assignment of data to ontological classes
JP5789525B2 (en) Document content ordering
US20190340240A1 (en) Automated extraction of unstructured tables and semantic information from arbitrary documents
Shahab et al. An open approach towards the benchmarking of table structure recognition systems
CN101206639B (en) Method for indexing complex impression based on PDF
US7853869B2 (en) Creation of semantic objects for providing logical structure to markup language representations of documents
CN101375278A (en) Strategies for processing annotations
US9286526B1 (en) Cohort-based learning from user edits
CN112597773A (en) Document structuring method, system, terminal and medium
US20220058383A1 (en) System and method to extract information from unstructured image documents
CN114187602B (en) Method, system, equipment and storage medium for identifying content of property proving material
CN112241730A (en) Form extraction method and system based on machine learning
CN109241151B (en) Data structure conversion method and device and electronic equipment
EP4068121A1 (en) Method and apparatus for acquiring character, page processing method, method for constructing knowledge graph, and medium
CN104408403A (en) Arbitration method and apparatus for inconsistent phenomenon of two pieces of entry information
CN113673294A (en) Method and device for extracting key information of document, computer equipment and storage medium
CN112084103B (en) Interface test method, device, equipment and medium
Bampis et al. A LoCATe‐based visual place recognition system for mobile robotics and GPGPUs
Lee et al. Deep learning-based digitalization of a part catalog book to generate part specification by a neutral reference data dictionary
CN115146073B (en) Test question knowledge point marking method for cross-space semantic knowledge injection and application
JP2018084952A (en) Automatic translation pattern learning device, automatic translation preprocessor and computer program
CN112541505B (en) Text recognition method, text recognition device and computer-readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151209
