CN108734089A - Method, apparatus, device and storage medium for identifying table content in a picture file - Google Patents

Method, apparatus, device and storage medium for identifying table content in a picture file

Info

Publication number
CN108734089A
Authority
CN
China
Prior art keywords
character
information
table header
coordinate
target
Legal status
Granted
Application number
CN201810285135.5A
Other languages
Chinese (zh)
Other versions
CN108734089B (en)
Inventor
王磊
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810285135.5A
Publication of CN108734089A
Application granted
Publication of CN108734089B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06V30/418 Document matching, e.g. of document images
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates to a method, apparatus, device and storage medium for identifying table content in a picture file, and belongs to the technical field of image recognition. The method includes: obtaining a target picture file to be identified; performing character recognition on the target picture file to obtain the character information in it; matching the recognized character information against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold; and determining, according to the character information corresponding to the table header characters, the table content contained in the target picture file. Tables contained in pictures are thereby identified quickly and accurately, which not only improves recognition accuracy but also reduces the time the recognition takes, effectively improving the user experience.

Description

Method, apparatus, device and storage medium for identifying table content in a picture file
Technical field
The present invention relates to the technical field of image recognition, and in particular to a method, apparatus, device and storage medium for identifying table content in a picture file.
Background
Optical Character Recognition (OCR) is a process that determines the shapes of the characters in a picture by detecting patterns of dark and bright regions, and then uses character recognition techniques to convert the character images into computer text. That is, for printed characters, the text in a picture is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the text in that image into a text format that word-processing software can edit further.
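The dark/bright detection that produces the black-and-white dot matrix can be illustrated with a minimal sketch. It assumes a grayscale image given as nested lists of 0-255 values and a fixed global threshold of 128; both are assumptions for illustration, and real OCR engines typically use more robust (often adaptive) binarization.

```python
from typing import List

def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Turn a grayscale image (0 = black ... 255 = white) into a black/white dot matrix.

    Pixels darker than `threshold` become 1 (ink); all others become 0 (background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]

print(binarize([[0, 200, 90], [255, 127, 128]]))
# [[1, 0, 1], [0, 1, 0]]
```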
With the continuous development of computer technology, users have a strong demand for conveniently inputting pictures into computer systems, in particular pictures that contain tables. Currently, in the related art, table recognition on a picture containing a table usually proceeds as follows: the document is first divided into multiple units, the table lines in each unit are then identified to obtain the table structure, and character extraction and recognition are finally performed on the picture.
However, identifying tables in pictures in this way not only requires a complex algorithm, but the recognition result is also strongly affected by picture quality, and the detection error rate is high.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, an embodiment of one aspect of the present invention provides a method for identifying table content in a picture file. The method includes: obtaining a target picture file to be identified; performing character recognition on the target picture file to obtain the character information in the target picture file; matching the recognized character information against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold; and determining, according to the character information corresponding to the table header characters, the table content contained in the target picture file.
An embodiment of another aspect of the present invention provides an apparatus for identifying table content in a picture file. The apparatus includes: a first obtaining module, configured to obtain a target picture file to be identified; a processing module, configured to perform character recognition on the target picture file to obtain the character information in the target picture file; a matching module, configured to match the recognized character information against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold; and a determining module, configured to determine, according to the character information corresponding to the table header characters, the table content contained in the target picture file.
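The four modules of the apparatus can be wired together as in the following sketch. It is only an illustration of the data flow between the modules; the class name `TableContentApparatus`, the use of injected callables and the intermediate types (picture bytes, lists of recognized strings, a dict as the resulting table) are all hypothetical choices, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class TableContentApparatus:
    """Hypothetical wiring of the four modules; each module is an injected callable."""
    first_obtaining_module: Callable[[str], bytes]             # source -> picture bytes
    processing_module: Callable[[bytes], List[str]]            # picture -> recognized strings
    matching_module: Callable[[List[str]], List[str]]          # strings -> header characters
    determining_module: Callable[[List[str], List[str]], Dict[str, List[str]]]  # -> table

    def run(self, source: str) -> Dict[str, List[str]]:
        picture = self.first_obtaining_module(source)       # obtain the target picture file
        strings = self.processing_module(picture)           # character recognition
        headers = self.matching_module(strings)             # dictionary matching
        return self.determining_module(strings, headers)    # determine table content
```

Each module can then be replaced independently, for example swapping in a different OCR engine as the processing module without touching the matching logic.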
An embodiment of yet another aspect of the present invention provides a computer device including a memory and a processor. The memory stores a computer program, and when the processor executes the program, the method for identifying table content in a picture file is implemented.
An embodiment of a further aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying table content in a picture file.
With the method, apparatus, device and storage medium for identifying table content in a picture file provided by embodiments of the present invention, a target picture file to be identified is obtained and subjected to character recognition to obtain the character information in it; the recognized character information is matched against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold; and the table content contained in the target picture file is then determined according to the character information corresponding to the table header characters. Tables contained in pictures are thereby identified quickly and accurately, which not only improves recognition accuracy but also reduces the time the recognition takes, effectively improving the user experience.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification, illustrate embodiments consistent with the present invention, and together with the specification serve to explain the principles of the present invention.
Fig. 1 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 3(a) is a schematic diagram of a table style according to an exemplary embodiment of the present invention;
Fig. 3(b) is a schematic diagram of adding a target picture according to an exemplary embodiment of the present invention;
Fig. 3(c) is a schematic diagram of determining the format of the target picture and the corresponding character content according to an exemplary embodiment of the present invention;
Fig. 3(d) is a schematic diagram of screening the recognized character content and determining that the result is numeric content according to an exemplary embodiment of the present invention;
Fig. 3(e) is a schematic diagram of drawing a corresponding trend line chart from the numeric results according to an exemplary embodiment of the present invention;
Fig. 4 is a schematic flowchart of selecting the location information and semantics of the content characters corresponding to the table header characters according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present invention.
The above drawings show specific embodiments of the present invention, which are described in more detail below. The drawings and the textual description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate the concept of the present invention to those skilled in the art by reference to specific embodiments.
Detailed description of embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as detailed in the appended claims.
Existing methods for identifying table content in picture files use complex algorithms, their recognition results are strongly affected by picture quality, and their detection error rates are high. To address these problems, the embodiments of the present invention propose a table recognition method.
In the method for identifying table content in a picture file provided by embodiments of the present invention, a target picture file to be identified is first obtained and subjected to character recognition to obtain the character information in it. The recognized character information is then matched against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold, and the table content contained in the target picture file is determined according to the character information corresponding to the obtained table header characters. Tables contained in pictures are thereby identified quickly and accurately, which not only improves recognition accuracy but also reduces the time the recognition takes, effectively improving the user experience.
The method, apparatus, device and storage medium for identifying table content in a picture file provided by the present invention are described in detail below with reference to the drawings.
First, the method for identifying table content in a picture file provided by an embodiment of the present invention is described in detail with reference to Fig. 1.
Fig. 1 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention.
As shown in Fig. 1, the method for identifying table content in a picture file may include the following steps:
Step 101: obtain a target picture file to be identified.
Optionally, the method for identifying table content in a picture file provided by the embodiments of the present invention may be executed by the computer device provided by the embodiments of the present invention. An apparatus for identifying table content in a picture file is provided in the computer device, and this apparatus manages or controls the process of identifying the table content in the target picture file. The computer device in this embodiment may be any hardware device with data processing capability, such as a computer or a personal digital assistant.
In this embodiment, the target picture file to be identified may be any picture file containing table content; this embodiment places no specific limitation on this.
In an optional implementation of this application, any picture file containing table content may be obtained from the device's local picture library as the target picture file to be identified; alternatively, a request for a picture file containing table content may be sent to a server, so that the target picture file is obtained from the server in real time. No specific limitation is imposed here.
Step 102: perform character recognition on the target picture file to obtain the character information in the target picture file.
In this embodiment, the character information may include the character shape, the character semantics, the character location information and the like; no specific limitation is imposed here.
Here, the "character shape" denotes how the character is written and presented, the "character semantics" denotes the meaning of the character, and the "character location information" denotes the position of the character in the target picture file.
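One way to hold these three kinds of character information is a small record type, sketched below. The class name `CharacterInfo`, the representation of shape as a free-text label, and of semantics as the recognized string itself are assumptions for illustration; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CharacterInfo:
    """Hypothetical record for one recognized string in the target picture file."""
    shape: str                     # how it is written/presented, e.g. "printed" or "bold"
    semantics: str                 # its meaning; here simply the recognized text
    location: Tuple[float, float]  # (x, y) position of the string in the picture

header = CharacterInfo(shape="printed", semantics="Reference value", location=(180.0, 5.0))
```

Making the record frozen keeps it hashable, so recognized strings can be collected into sets such as the target character set.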
Optionally, after obtaining the target picture file, the apparatus for identifying table content in a picture file may use an existing character recognition technique, such as OCR, to perform character recognition on the target picture file and obtain the character information in it.
Step 103: match the recognized character information against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold.
The preset dictionary contains various table header terms. It may be obtained by collecting a large number of words and analyzing them; it may be customized manually; or the words used in different fields may be processed to obtain a separate dictionary for each field. This embodiment places no specific limitation on this.
Take a physical examination report in the medical field as an example. A physical examination report generally includes table headers of types such as item, result, reference value and unit, and the headers used in the reports of different hospitals may differ. For example, item-type headers generally include "Item", "Item name", "Item full name", "Examination item", "Chinese", "Chinese name" and so on, while result-type headers generally include "Result", "Examination result", "Test result", "Measurement result", "Actual value", "Detected value", "Quantitative result" and so on. By analyzing these contents, the preset dictionary corresponding to the medical field can be obtained.
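As a minimal sketch, the synonym groups from this example can be flattened into a dictionary that maps each normalized header term to its header type. The item and result synonyms come from the example above; the "Reference value" and "Unit" entries, the lowercase/whitespace normalization, and the names `MEDICAL_HEADER_GROUPS` and `build_preset_dictionary` are assumptions.

```python
from typing import Dict, List

# Header synonyms from the physical-examination example; the "reference" and "unit"
# entries are assumed, since only the header types are named for them.
MEDICAL_HEADER_GROUPS: Dict[str, List[str]] = {
    "item": ["Item", "Item name", "Item full name", "Examination item",
             "Chinese", "Chinese name"],
    "result": ["Result", "Examination result", "Test result", "Measurement result",
               "Actual value", "Detected value", "Quantitative result"],
    "reference": ["Reference value"],
    "unit": ["Unit"],
}

def build_preset_dictionary(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten synonym groups into {normalized header term: header type}."""
    dictionary: Dict[str, str] = {}
    for header_type, terms in groups.items():
        for term in terms:
            # Normalize case and internal whitespace so lookups are tolerant of OCR spacing.
            dictionary[" ".join(term.lower().split())] = header_type
    return dictionary

MEDICAL_DICTIONARY = build_preset_dictionary(MEDICAL_HEADER_GROUPS)
```

Keeping the header type alongside each term lets later steps know not only that a string is a header, but which column (item, result, ...) it introduces.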
In this embodiment, the first threshold may be set as needed, for example to 0.90 or 0.92; no specific limitation is imposed here.
In an optional implementation of this application, after the character information in the target picture file is obtained, the apparatus for identifying table content in a picture file uses the preset dictionary to perform a matching operation on the recognized character information, so as to obtain the table header characters whose matching degree exceeds the first threshold.
For example, suppose that after recognizing the target picture file, the character information in it is determined to be "Examination item", "Albumin", "Weight" and "Reference value", and the first threshold is 0.90. If matching this character information against the preset dictionary yields matching degrees above 0.90 for "Examination item" and "Reference value", then "Examination item" and "Reference value" are determined to be table header characters.
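The patent does not define how the matching degree is computed. The sketch below assumes it is the best `difflib.SequenceMatcher` similarity ratio (a value in [0, 1]) between a recognized string and any dictionary term, compared case-insensitively; the small `PRESET_DICTIONARY` list is likewise a hypothetical stand-in. It reproduces the example above with a first threshold of 0.90.

```python
from difflib import SequenceMatcher
from typing import Iterable, List

# Hypothetical stand-in for the preset dictionary of table header terms.
PRESET_DICTIONARY = ["Item", "Examination item", "Result", "Examination result",
                     "Reference value", "Unit"]

def matching_degree(text: str, dictionary: Iterable[str]) -> float:
    """Best similarity in [0, 1] between `text` and any dictionary term (case-insensitive)."""
    key = text.strip().lower()
    return max((SequenceMatcher(None, key, term.lower()).ratio() for term in dictionary),
               default=0.0)

def header_characters(strings: List[str], dictionary: Iterable[str] = PRESET_DICTIONARY,
                      first_threshold: float = 0.90) -> List[str]:
    """Keep the recognized strings whose matching degree exceeds the first threshold."""
    terms = list(dictionary)
    return [s for s in strings if matching_degree(s, terms) > first_threshold]

print(header_characters(["Examination item", "Albumin", "Weight", "Reference value"]))
# ['Examination item', 'Reference value']
```

A similarity ratio rather than exact lookup tolerates small OCR errors, for example a header recognized as "Examinat1on item" still scores above 0.90.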
Step 104: determine, according to the character information corresponding to the table header characters, the table content contained in the target picture file.
Optionally, after the table header characters are determined, the apparatus for identifying table content in a picture file may determine the table content contained in the target picture file according to the character information corresponding to the table header characters.
In an optional implementation, the character information corresponding to the table header characters may first be analyzed to determine the field to which the table header characters belong; the table format and table content in the target picture file are then obtained according to the determined field and the character information corresponding to the table header characters.
In the method for identifying table content in a picture file provided by the embodiment of the present invention, a target picture file to be identified is obtained and subjected to character recognition to obtain the character information in it; the recognized character information is matched against a preset dictionary to obtain table header characters whose matching degree with the preset dictionary exceeds a first threshold; and the table content contained in the target picture file is then determined according to the character information corresponding to the table header characters. Tables contained in pictures are thereby identified quickly and accurately, which not only improves recognition accuracy but also reduces the time the recognition takes, effectively improving the user experience.
As the above analysis shows, the embodiments of the present invention obtain the character information in the target picture file, obtain table header characters from that character information, and then determine the table content contained in the target picture file according to the character information corresponding to the table header characters. In an optional implementation, the obtained character information may include character semantics and character location information. To determine the table header characters more accurately, this embodiment may therefore first determine a target character set according to the preset dictionary, then determine the table style corresponding to the character information according to the character semantics, determine target location information according to the table style, and finally obtain the table header characters according to the target location information and the location information corresponding to the target character set. This process is described in detail below with reference to Fig. 2.
As shown in Fig. 2, the method for identifying table content in a picture file may include the following steps:
Step 201: obtain a target picture file to be identified.
Step 202: perform character recognition on the target picture file to obtain the character information in the target picture file.
The character information includes character semantics and character location information. The character location information may include the character's first-direction coordinate and second-direction coordinate in the target picture file.
In practice, a coordinate system may first be defined for the target picture file, for example with the upper-left corner of the target picture file as the origin, the positive X axis pointing right from the origin, and the positive Y axis pointing down. Correspondingly, the first-direction coordinate may be the X coordinate and the second-direction coordinate the Y coordinate; alternatively, the first-direction coordinate may be the Y coordinate and the second-direction coordinate the X coordinate. This embodiment places no specific limitation on this.
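The sketch below maps a recognized text box to first- and second-direction coordinates in this coordinate system. It assumes the OCR engine reports each string as a pixel bounding box `(left, top, right, bottom)` and that the box centre serves as the character position; both assumptions, and the function name `direction_coords`, are illustrative only. The `first_axis` parameter covers the two alternatives the text allows.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), origin at upper-left

def direction_coords(box: Box, first_axis: str = "x") -> Tuple[float, float]:
    """Return (first-direction, second-direction) coordinates of a text box.

    X grows to the right and Y grows downward from the picture's upper-left corner;
    the box centre is used as the character position (an assumption).
    """
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    if first_axis == "x":
        return cx, cy
    if first_axis == "y":
        return cy, cx
    raise ValueError("first_axis must be 'x' or 'y'")

print(direction_coords((10, 20, 30, 40)))
# (20.0, 30.0)
```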
In this embodiment, the target picture file may be in any format, such as BMP, TIF, JPG or PDF; no specific limitation is imposed here.
In an optional implementation, an existing character recognition technique, such as OCR, is used to perform character recognition on the target picture file, so as to obtain the semantics and location information of the characters in the target picture file.
Step 203: match the recognized character information against the preset dictionary to obtain a target character set whose matching degree with the preset dictionary exceeds the first threshold.
Optionally, in this embodiment, after obtaining the character semantics and character location information in the target picture file, the apparatus for identifying table content in a picture file may use the preset dictionary to perform a matching operation on the character information, so as to obtain a target character set whose matching degree exceeds the first threshold.
In practical applications, the target picture file may relate to any field. To improve the accuracy of recognizing the target picture file, this embodiment may therefore, before matching the character information against the preset dictionary, first analyze the semantics of the recognized characters and determine the corresponding target dictionary from them. That is, the field to which the character semantics belong is determined by analyzing the character semantics, and the preset dictionary corresponding to that field is selected, which effectively improves the recognition accuracy for the target picture file.
For example, if the analysis shows that the character semantics mainly relate to the medical field, the preset dictionary corresponding to the medical field can be determined as the target dictionary; likewise, if the analysis shows that the character semantics mainly relate to the financial field, the preset dictionary corresponding to the financial field can be determined as the target dictionary.
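A simple way to pick the target dictionary is to let each recognized string vote for the field whose keywords it matches, as sketched below. Keyword voting is a swapped-in technique chosen for illustration; the patent only says the field is determined by analyzing the character semantics. The names `FIELD_KEYWORDS`, `PRESET_DICTIONARIES`, `detect_field` and `target_dictionary`, and all keyword and dictionary contents, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical keyword sets per field; a real system would use much richer semantics.
FIELD_KEYWORDS: Dict[str, Set[str]] = {
    "medical": {"albumin", "blood pressure", "examination item", "reference value", "weight"},
    "financial": {"account", "amount", "balance", "interest rate", "transaction date"},
}
# Hypothetical per-field preset dictionaries of table header terms.
PRESET_DICTIONARIES: Dict[str, List[str]] = {
    "medical": ["Examination item", "Examination result", "Reference value", "Unit"],
    "financial": ["Account", "Amount", "Balance", "Transaction date"],
}

def detect_field(strings: List[str]) -> str:
    """Vote for the field whose keywords occur most often among the recognized strings."""
    votes: Counter = Counter()
    for s in strings:
        key = " ".join(s.lower().split())
        for field, keywords in FIELD_KEYWORDS.items():
            if key in keywords:
                votes[field] += 1
    if not votes:
        raise ValueError("no known field keyword found")
    return votes.most_common(1)[0][0]

def target_dictionary(strings: List[str]) -> List[str]:
    """Select the preset dictionary of the detected field as the target dictionary."""
    return PRESET_DICTIONARIES[detect_field(strings)]
```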
Further, after the target dictionary corresponding to the character information is determined, the apparatus for identifying table content in a picture file may match each character string contained in the character information against the target dictionary to obtain its matching degree with the target dictionary, compare that matching degree with the first threshold, and take the characters whose matching degree exceeds the first threshold as the target character set.
For example, suppose that after recognizing the target picture file, the characters it contains are determined to be "Examination item", "Albumin", "Weight" and "Reference value", and the first threshold is 0.90. Analyzing the semantics of these characters shows that the characters in the target picture file relate to the medical field, so the medical-field dictionary can be obtained and the matching degree of each character with it determined. If the matching degrees of "Examination item" and "Reference value" with the dictionary exceed 0.90, "Examination item" and "Reference value" are taken as the target character set.
Step 204: determine, according to the character semantics in the character information, the table style corresponding to the character information.
Step 205: determine target location information according to the table style.
In an optional implementation, since tables in different fields have different styles, this embodiment may, to obtain the table header characters accurately and reliably, first determine the table style corresponding to the character information according to the character semantics of the character information in the target picture file, and then determine the target location information according to the determined table style.
That is, when the character semantics of the character information in the target picture file relate to different fields, the table styles contained in the target picture file also differ. For example, if the character semantics relate to the medical field, the determined table style may be as shown in Fig. 3(a), from which it can be determined that the table header characters are positioned with the same row coordinate.
Step 206: obtain the table header characters from the target character set according to the target location information and the location information corresponding to the target character set.
Optionally, after the target location information is determined, the apparatus for identifying table content in a picture file may obtain the table header characters from the target character set according to the target location information and the location information corresponding to the target character set.
As an optional implementation, when it is determined that the table header characters share the same second-direction coordinate and the second direction is the Y-axis direction, the apparatus may filter the table header characters from the target character set by the rule that their Y coordinates are identical; alternatively, when it is determined that the table header characters share the same first-direction coordinate, that is, the same X coordinate, the apparatus may filter the table header characters from the target character set by the rule that their X coordinates are identical.
For example, suppose the characters in the target character set and their location information are: "Serial number, (X1, Y1)", "Examination item, (X2, Y1)", "Blood pressure, (X2, Y2)", "Examination result, (X3, Y1)", "45, (X3, Y3)" and "Reference value, (X4, Y1)", and the determined target location information is that the Y coordinates are identical. By the rule of identical Y coordinates, the apparatus for identifying table content in a picture file can then determine that the table header characters are "Serial number, (X1, Y1)", "Examination item, (X2, Y1)", "Examination result, (X3, Y1)" and "Reference value, (X4, Y1)".
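Steps 204 to 206 can be sketched as follows: a field is mapped to a table style, the style to the coordinate that header characters share, and the headers are the target characters lying on the most common value of that coordinate. The style names, the two mapping tables and the "most common value" rule are assumptions made for illustration (the text only requires the shared coordinate to be identical), and the numeric coordinates in the usage line stand in for X1..X4 and Y1..Y3.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Tuple[str, Tuple[float, float]]  # (text, (x, y)); x = first direction, y = second

# Hypothetical mappings: field -> table style -> index of the coordinate headers share.
FIELD_TABLE_STYLE: Dict[str, str] = {"medical": "header_row"}
STYLE_SHARED_AXIS: Dict[str, int] = {"header_row": 1, "header_column": 0}  # 0 = X, 1 = Y

def pick_headers(target_set: List[Item], field: str) -> List[Item]:
    """Keep the target characters lying on the most common value of the shared coordinate."""
    if not target_set:
        return []
    axis = STYLE_SHARED_AXIS[FIELD_TABLE_STYLE[field]]
    shared_value, _ = Counter(pos[axis] for _, pos in target_set).most_common(1)[0]
    return [item for item in target_set if item[1][axis] == shared_value]

# X1..X4 = 10, 60, 120, 180 and Y1..Y3 = 5, 30, 55 (illustrative values).
target = [("Serial number", (10, 5)), ("Examination item", (60, 5)),
          ("Blood pressure", (60, 30)), ("Examination result", (120, 5)),
          ("45", (120, 55)), ("Reference value", (180, 5))]
print([text for text, _ in pick_headers(target, "medical")])
# ['Serial number', 'Examination item', 'Examination result', 'Reference value']
```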
Step 207, it according to the location information and semanteme of gauge outfit character, is chosen from character information corresponding with gauge outfit character The location information and semanteme of content character.
Specifically, after the table header characters are determined, and in order to complete the table content, the apparatus for identifying table content in a picture file may select from the character information, according to the location information and semantics of the table header characters, the characters whose position and semantics correspond to the table header characters as content characters, and obtain the location information and semantics of those content characters.
To illustrate this more clearly, the process of selecting, from the character information, the location information and semantics of the content characters corresponding to the table header characters according to the location information and semantics of the table header characters is described in detail below with reference to Fig. 4.
It should be noted that in this embodiment, the character location information includes the character's first-direction coordinate and second-direction coordinate.
As shown in Fig. 4, selecting the location information and semantics of the content characters corresponding to the table header characters may include the following steps:
Step 401: determine, according to the first-direction coordinate or second-direction coordinate of any table header character, a first-direction coordinate range or second-direction coordinate range for the content characters.
Optionally, after the apparatus for identifying table content in a picture file determines the table header characters, it may determine the first-direction coordinate range or second-direction coordinate range of the content characters according to the location information of the table header characters.
In practical applications, the location information of the table header characters determined by the apparatus may contain errors, or the content characters may have irregular lengths. To accurately obtain the content characters corresponding to the table header characters, this embodiment may therefore determine a first-direction or second-direction coordinate range for the content characters according to the location information of the table header characters. For example, an additional margin Δ may be applied to the first-direction and second-direction coordinates of a table header character: when the location information of the table header character is (X, Y), the first-direction coordinate range of the content characters may be (X - Δ, X + Δ), or the second-direction coordinate range may be (Y - Δ, Y + Δ). This embodiment places no specific limitation on this.
When determining the coordinate range of the content characters, the positional relationship between the content characters and the header characters can first be inferred from the positional relationship among the header characters, and the coordinate range of the content characters is then determined accordingly.
For example, if the positions of the header characters show that they all lie in the same row, the content characters corresponding to each header character will have an abscissa close to that of the header character. The abscissa range of each content character can then be set to (x1 - Δ, x1 + Δ), where x1 is the X coordinate of the corresponding header character.
Correspondingly, if the header characters all lie in the same column, the content characters corresponding to each header character will have an ordinate close to that of the header character, and the ordinate range of each content character can be set to (y1 - Δ, y1 + Δ), where y1 is the Y coordinate of the corresponding header character.
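The header-layout test and the coordinate window of Step 401 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `content_ranges`, the (x, y) tuple format, and the reuse of the single margin `delta` both as the same-row tolerance and as Δ are all assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: each header is (x, y), its position in a
# system where X grows rightward and Y grows downward.
Point = Tuple[float, float]


def content_ranges(headers: List[Point], delta: float) -> List[Tuple[str, float, float]]:
    """For each header, return (axis, low, high): the window in which its
    content characters are expected to lie.

    Assumption: if the headers' Y coordinates span at most `delta`, they form
    a header row, so content lies in the same column (match on X);
    otherwise they form a header column (match on Y).
    """
    ys = [y for _, y in headers]
    same_row = max(ys) - min(ys) <= delta
    ranges = []
    for x, y in headers:
        if same_row:
            ranges.append(("x", x - delta, x + delta))  # (x1 - Δ, x1 + Δ)
        else:
            ranges.append(("y", y - delta, y + delta))  # (y1 - Δ, y1 + Δ)
    return ranges
```

For two headers at (10, 5) and (50, 5) with `delta=3`, the headers form a row, so the windows are X ranges centred on 10 and 50.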
Step 402: select from the character information a candidate character set whose location information falls within the first-direction or second-direction coordinate range.
Step 403: according to the semantics of the given header character, select from the candidate character set the characters whose semantics match it, as the content characters corresponding to that header character.
Specifically, to find the content characters corresponding to a header character more precisely, the device can, after obtaining the candidate character set, analyze that set based on the semantics of the header character and select the characters whose semantics match it as that header's content characters.
In this embodiment, the coordinate range of the content characters is first determined from the location information of the header character, and the candidate characters whose location information falls within the first-direction or second-direction coordinate range are selected from the character information. The characters whose semantics match the header character are then chosen from the candidate set as its content characters. This two-stage selection effectively improves the accuracy of finding the content characters that belong to each header character.
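A minimal sketch of Steps 402 and 403, assuming each character carries its text and (x, y) coordinates. The `Char` class, the strict-inequality window check and the `matches` predicate are hypothetical; the text does not say how semantic matching is performed, so that decision is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Char:
    text: str
    x: float  # first-direction (X) coordinate
    y: float  # second-direction (Y) coordinate


def content_for_header(
    header: Char,
    chars: List[Char],
    delta: float,
    use_x: bool,
    matches: Callable[[str, str], bool],
) -> List[Char]:
    # Step 402: the candidate set holds characters whose coordinate on the
    # chosen axis lies strictly inside (center - delta, center + delta).
    center = header.x if use_x else header.y
    candidates = [
        c for c in chars
        if c is not header and abs((c.x if use_x else c.y) - center) < delta
    ]
    # Step 403: keep only candidates that the (hypothetical) semantic
    # predicate accepts for this header.
    return [c for c in candidates if matches(header.text, c.text)]
```

With a predicate that accepts only numeric strings, a "Result" header keeps "5.2" below it and drops a non-numeric neighbour in the same column.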
Step 208: generate the table contained in the target picture file according to the location information and semantics of the header characters and the location information and semantics of the content characters.
Specifically, once the location information and semantics of the header characters and of the content characters in the target picture file have been obtained, the device can generate the table contained in the target picture file from this information.
In practice, alignment within a table may spread the characters of one word far apart on the page, so that the word is easily recognized as two independent characters. For example, 项目 ("item") may be recognized as 项 and 目, and 单位 ("unit") may be recognized as 单 and 位.
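One simple way to re-join such split words is sketched below, under the assumptions that the fragments are already in reading order and that a word splits into at most two pieces; `merge_split_words` and the lexicon contents are illustrative, not taken from the patent.

```python
from typing import List, Set


def merge_split_words(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Greedily re-join adjacent fragments whose concatenation is a lexicon word."""
    merged, i = [], 0
    while i < len(tokens):
        # Assumed rule: two neighbouring fragments form one word if the
        # lexicon contains their concatenation.
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in lexicon:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Applied to the fragments 项, 目, 单, 位 with a lexicon containing 项目 and 单位, it restores the two original words.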
In addition, the table domain uses many different words with the same meaning. To unify such words into relatively consistent structured data that is easier to store and use later, the device can normalize words with the same semantics, for example by combining semantic analysis of the header characters with synonym merging. For example, if a physical examination report template has four header fields, such as "Item", "Unit" and "Reference value", then header words such as "Item", "Item name", "Item full name", "Test item" and "Chinese name" can all be normalized to "Item", and similarly for the other fields.
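The synonym merging described above can be sketched as a lookup table that maps each variant to one canonical header. The table is hypothetical and only reuses the example words from the text.

```python
# Hypothetical synonym table: every variant maps to one canonical header.
SYNONYMS = {
    "Item": ["Item", "Item name", "Item full name", "Test item", "Chinese name"],
}

# Invert once so that each lookup is a single dictionary access.
CANONICAL = {variant: canon for canon, variants in SYNONYMS.items() for variant in variants}


def normalize_header(text: str) -> str:
    """Map a recognized header to its canonical form; unknown text is kept as is."""
    return CANONICAL.get(text.strip(), text)
```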
Further, the preset lexicon, or the semantics of the merged words, can also be used to correct the header and content characters. For example, if "Item" is misrecognized as "Item and", semantic analysis shows that "and" is an erroneous character, so the string can be corrected to "Item".
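The text attributes this correction to semantic analysis or a preset lexicon. As a simpler stand-in, the sketch below uses string similarity from Python's standard `difflib` module, which is a different technique from semantic analysis; `correct_word` and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import List


def correct_word(word: str, lexicon: List[str], cutoff: float = 0.6) -> str:
    """Replace `word` with its closest lexicon entry if one is similar enough."""
    if word in lexicon:
        return word
    # get_close_matches ranks candidates by SequenceMatcher ratio and drops
    # any whose ratio is below `cutoff`.
    best = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return best[0] if best else word
```

With the lexicon ["Item", "Unit", "Reference value"], the string "Item and" is corrected to "Item", while an unrelated string is left unchanged.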
That is, before generating the table contained in the target picture file from the location information and semantics of the header and content characters, the device of this embodiment can use the preset lexicon to normalize the header and content characters and merge split words, which makes subsequent management and processing more convenient.
The above embodiment is further explained below with reference to Fig. 3(b) to Fig. 3(e):
Suppose that in this embodiment the target picture files are photos, taken with a device, of a user's paper physical examination reports from different time periods. To build an electronic physical examination archive from the paper reports, the user can add the photographed reports to an application that creates such archives, as shown in Fig. 3(b). After detecting the added report pictures, the application uses its character recognition function to determine the table style and the corresponding character content of each report, as shown in Fig. 3(c). To help the user understand the levels and trends of the indicators in the reports, the recognized content can be further filtered to select the entries whose results are numbers, as shown in Fig. 3(d). A trend line chart is then drawn from the extracted numeric results, for example for total bilirubin, as shown in Fig. 3(e), so that the user can see at a glance whether each of his or her indicators is within the normal range.
In the method for identifying table content in a picture file provided by this embodiment of the present invention, the target picture file to be recognized is first obtained, and character recognition is performed on it to obtain its character information. The recognized character information is matched against a preset lexicon to obtain a target character set whose matching degree with the lexicon exceeds a first threshold. The table style corresponding to the character information is determined from the character semantics, and target location information is determined from the table style. Header characters are then obtained from the target character set according to the target location information and the location information of the target character set. Next, the location information and semantics of the content characters corresponding to the header characters are selected from the character information according to the location information and semantics of the header characters, and the table contained in the target picture file is generated from the location information and semantics of both. The table in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and effectively improves the user experience.
As the above analysis shows, the embodiment of the present invention obtains the character location information of the target picture file, determines the header characters from it, selects the location information and semantics of the corresponding content characters from the character information according to the header characters' location information and semantics, and generates the table contained in the target picture file from both. In a specific implementation, however, the character location information recognized by the device, including the first-direction or second-direction coordinates, may contain errors. When header or content characters are then determined from these coordinates, a wrong location may lead to a wrong character type. Therefore, in this embodiment the character location information can first be corrected before the header or content characters are determined from it.
In practice, the character location information recognized by the device also includes the character's width in the first direction and its width in the second direction. Therefore, when determining whether a character is a header character, this embodiment can first correct the character location information using the first-direction and second-direction coordinates together with the first-direction and second-direction widths, and then determine the header or content characters from the corrected location information. This process of the method is described in detail below with reference to Fig. 5.
Fig. 5 is a flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention.
To explain this embodiment more clearly, a coordinate system is first defined for the target picture file: the origin is the upper-left corner of the file, the positive X axis points right, and the positive Y axis points down. Correspondingly, the first-direction coordinate is the X coordinate (abscissa) and the second-direction coordinate is the Y coordinate (ordinate). The embodiment is described in detail below using these definitions.
As shown in Fig. 5, the method for identifying table content in a picture file may include the following steps:
Step 501: obtain the target picture file to be recognized.
Step 502: perform character recognition on the target picture file to obtain the character information in it.
The character information includes character semantics and character location information. The character location information includes the character's first-direction coordinate (X coordinate) and second-direction coordinate (Y coordinate) in the target picture file, and its width in the first direction and in the second direction.
This step is implemented in the same way as in the above example and is not repeated here; see step 102 or step 202.
Step 503: traverse the character information in ascending order of first-direction coordinate and determine whether the j-th character and the i-th character have the same second-direction coordinate. If they differ, go to step 504; otherwise, go to step 508.
Here, the differences between the first-direction coordinates of adjacent characters from the i-th character to the j-th character are within a preset range; i and j are positive integers, and j is greater than i.
The preset range in this embodiment can be set according to the actual spacing between characters; for example, it can be determined from the character width or from a typical character pitch. This embodiment does not limit it.
Optionally, after obtaining the character information in the target picture file, the device can traverse it in ascending order of first-direction coordinate and determine whether the j-th character and the i-th character have the same second-direction coordinate. If they do, the characters are in the same row; if not, they appear to be in different rows.
The j-th and i-th characters can be any two different recognized characters; this embodiment does not limit this.
In other words, the X coordinate of each character is used as the independent variable, and in ascending X order it is checked whether the characters' Y coordinates are equal. Characters with equal Y coordinates are in the same row; characters with different Y coordinates are in different rows. This makes it possible to judge accurately whether the location information of each character is correct.
For example, if character A has first-direction coordinate X1 and second-direction coordinate Y1, and character B has first-direction coordinate X2 and second-direction coordinate Y2, then A and B are in the same row when Y1 = Y2 and in different rows when Y1 ≠ Y2.
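Step 503 can be sketched as sorting by X and comparing neighbours. For simplicity, this assumed version only compares characters that are consecutive in X order, treats `max_gap` as the preset range, and uses exact equality of Y as in the example above; `neighbor_pairs` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Char:
    text: str
    x: float  # first-direction coordinate
    y: float  # second-direction coordinate


def neighbor_pairs(chars: List[Char], max_gap: float) -> List[Tuple[Char, Char, bool]]:
    """Return (i-th char, j-th char, same_y) for characters that are
    consecutive in ascending X order and whose X gap is within `max_gap`."""
    ordered = sorted(chars, key=lambda c: c.x)
    pairs = []
    for ci, cj in zip(ordered, ordered[1:]):
        if cj.x - ci.x <= max_gap:
            # Equal Y means same row; unequal Y goes on to the overlap check.
            pairs.append((ci, cj, ci.y == cj.y))
    return pairs
```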
Step 504: determine the overlap degree of the j-th character and the i-th character in the second direction according to the second-direction width of the j-th character and the second-direction width of the i-th character.
In a specific implementation, the overlap degree of the j-th and i-th characters in the second direction can be determined by formula (1):
where r_i denotes the i-th character, y_i its second-direction coordinate and h_i its width in the second direction; r_j denotes the j-th character, y_j its second-direction coordinate and h_j its width in the second direction.
Step 505: determine whether the overlap degree of the j-th and i-th characters in the second direction is greater than a third threshold. If it is, go to step 506; otherwise, go to step 507.
Wherein, third threshold value can rule of thumb carry out value, and the present embodiment is not especially limited this.For example, third threshold Value could be provided as the half of the minimum value of i-th of character and j-th of character in the width of second direction.
For example, suppose the third threshold is half of the smaller second-direction width of the j-th and i-th characters, the j-th character has second-direction coordinate 2 and second-direction width 1, and the i-th character has second-direction coordinate 2.1 and second-direction width 0.9, so that the threshold is 0.45. Then, by formula (1), the overlap degree of the j-th and i-th characters exceeds the third threshold.
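Formula (1) itself is not reproduced in this text. The sketch below assumes the overlap degree is the length of the intersection of the two characters' second-direction intervals [y, y + h], which is consistent with the worked example (an overlap of about 0.9 against a threshold of 0.45); both function names are hypothetical.

```python
def overlap(y_i: float, h_i: float, y_j: float, h_j: float) -> float:
    """Length of the intersection of [y_i, y_i + h_i] and [y_j, y_j + h_j].

    Assumption: y is the top edge (Y grows downward) and h is the width in
    the second direction; non-overlapping intervals give 0.
    """
    return max(0.0, min(y_i + h_i, y_j + h_j) - max(y_i, y_j))


def same_row_by_overlap(y_i: float, h_i: float, y_j: float, h_j: float) -> bool:
    # Third threshold as in the example: half of the smaller second-direction width.
    third_threshold = min(h_i, h_j) / 2
    return overlap(y_i, h_i, y_j, h_j) > third_threshold


# Worked example from the text: i at y=2.1, h=0.9; j at y=2, h=1.
print(same_row_by_overlap(2.1, 0.9, 2.0, 1.0))  # True
```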
Step 506: correct the second-direction coordinate of the j-th character and/or the i-th character.
Specifically, when the overlap degree of the j-th and i-th characters in the second direction exceeds the third threshold, the two characters are actually in the same row. To reduce the chance of later misjudging character types, the second-direction coordinate of the j-th and/or i-th character is corrected according to an alignment method so that the two characters have the same second-direction coordinate.
To improve the correction, this embodiment can first determine a target second-direction coordinate range from the second-direction coordinates of the i-th and j-th characters, then select from the recognized characters those lying within that range, and finally correct the second-direction coordinate of the j-th and/or i-th character based on the second-direction coordinates of the selected characters.
That is: determine a target second-direction coordinate range from the second-direction coordinates of the i-th and j-th characters;
select the k characters whose second-direction coordinates fall within the target second-direction coordinate range;
correct the second-direction coordinate of the j-th and/or i-th character according to the second-direction coordinates of the k characters.
In this embodiment, correcting the second-direction coordinate of the j-th and/or i-th character may mean correcting only the j-th character's coordinate, only the i-th character's coordinate, or both; this embodiment does not limit this.
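The text leaves open how the target range is built and how the k characters determine the corrected value. A minimal sketch, assuming a symmetric `margin` around the two coordinates and the median of the k coordinates as the shared row position; `corrected_y` is a hypothetical helper.

```python
from statistics import median
from typing import List


def corrected_y(y_i: float, y_j: float, all_y: List[float], margin: float) -> float:
    """Return a shared second-direction coordinate for two row-mates.

    Assumptions: the target range is [min(y_i, y_j) - margin,
    max(y_i, y_j) + margin], and the corrected value is the median Y of the
    k characters inside it.
    """
    low, high = min(y_i, y_j) - margin, max(y_i, y_j) + margin
    in_range = [y for y in all_y if low <= y <= high]  # the k characters
    # Fall back to the i-th character's Y if no character lies in range.
    return median(in_range) if in_range else y_i
```

If most characters in a row sit at Y = 10 and one is read at Y = 11, the median pulls the outlier back to 10.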
Step 507: the j-th character and the i-th character are in different rows.
Step 508: the j-th character and the i-th character are in the same row.
In practice, recognition processing may distort the target picture file, or the character location information in the character information may be recognized with errors, so that the judgment of whether characters share the same second-direction coordinate becomes unreliable.
To reduce inaccurate judgments caused by these problems, in one possible implementation of the present invention the device can select from the character information the i-th character whose second-direction coordinate is closest to that of the j-th character and compare the two coordinates. If they are equal, the j-th and i-th characters are in the same row. If not, the overlap degree of the two characters in the second direction is determined from their second-direction widths, and it is checked whether the overlap degree exceeds the third threshold. If it does, the two characters are actually in the same row; to avoid later misjudging their character types, the second-direction coordinate of the j-th and/or i-th character is corrected so that the two coordinates become equal. Otherwise, the characters are in different rows.
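The nearest-neighbour variant can be sketched as below. It assumes interval intersection for the overlap degree and half the smaller height as the third threshold, and it corrects only the i-th character by snapping it to the j-th character's Y; `Box` and `classify_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    text: str
    y: float  # second-direction coordinate (top edge)
    h: float  # width in the second direction


def classify_row(j: Box, others: List[Box]) -> Tuple[Box, bool]:
    """Compare j with the character whose Y is closest to it; return (i, same_row).

    Note: when the overlap test succeeds, i is modified in place so that
    i.y equals j.y.
    """
    i = min(others, key=lambda b: abs(b.y - j.y))  # nearest in the second direction
    if i.y == j.y:
        return i, True
    inter = min(i.y + i.h, j.y + j.h) - max(i.y, j.y)  # assumed formula (1)
    if inter > min(i.h, j.h) / 2:                    # assumed third threshold
        i.y = j.y  # correction step: align the two coordinates
        return i, True
    return i, False
```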
Step 509: match the character information against the preset lexicon to obtain a target character set whose matching degree with the preset lexicon exceeds a first threshold.
Step 510: determine the table style corresponding to the character information according to the semantics of the characters in the character information.
Step 511: determine target location information according to the table style.
Step 512: obtain header characters from the target character set according to the target location information and the location information corresponding to the target character set.
Step 513: according to the location information and semantics of the header characters, select from the character information the location information and semantics of the content characters corresponding to the header characters.
Step 514: generate the table contained in the target picture file according to the location information and semantics of the header characters and the location information and semantics of the content characters.
For the specific implementation and principles of steps 509 to 514, refer to the detailed description of the above embodiment; details are not repeated here.
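For reference, the lexicon matching of Step 509 can be sketched as follows. The matching-degree metric is not defined in this section, so the sketch assumes `difflib.SequenceMatcher.ratio()` from Python's standard library; `match_lexicon` and the threshold value are illustrative.

```python
import difflib
from typing import List


def match_lexicon(words: List[str], lexicon: List[str], first_threshold: float) -> List[str]:
    """Keep recognized words whose best similarity to any lexicon entry
    exceeds the first threshold (the target character set)."""
    target = []
    for w in words:
        # Assumed matching degree: SequenceMatcher ratio in [0, 1].
        best = max(
            (difflib.SequenceMatcher(None, w, entry).ratio() for entry in lexicon),
            default=0.0,
        )
        if best > first_threshold:
            target.append(w)
    return target
```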
Similarly, when the device corrects a character's location information according to the character's width in the first direction and its first-direction and second-direction coordinates, the process is similar to the second-direction correction described above. The only differences are as follows:
When traversing the character information produced by recognizing the target picture file, the characters are traversed in ascending order of second-direction coordinate (Y coordinate), and it is determined whether the j-th and i-th characters have the same first-direction coordinate (X coordinate). If they do, the two characters are in the same column. If not, the overlap degree of the two characters in the first direction is determined from their first-direction widths, and it is determined whether the overlap degree exceeds the third threshold.
If the overlap degree is less than the third threshold, the j-th and i-th characters are in different columns. If it is greater than the third threshold, the two characters actually have the same first-direction coordinate; to reduce the chance of later misjudging character types, this embodiment can correct the first-direction coordinate of the j-th or i-th character so that the two coordinates are equal. Specifically, the correction may include: determining a target first-direction coordinate range from the first-direction coordinates of the j-th and i-th characters; selecting the m characters whose first-direction coordinates fall within the target first-direction coordinate range; and correcting the first-direction coordinate of the j-th and/or i-th character according to the first-direction coordinates of the m characters.
Here, the third threshold can be, for example, half of the smaller of the two characters' widths in the first direction; this embodiment does not limit it.
In this embodiment, correcting the first-direction coordinate of the j-th and/or i-th character may mean correcting only the j-th character's coordinate, only the i-th character's coordinate, or both; this embodiment does not limit this.
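Because the column check mirrors the row check, one way to reuse the same logic is to swap the two directions. A sketch, assuming boxes are (x, y, w, h) tuples and the same interval-intersection overlap assumed earlier for formula (1); all names are hypothetical.

```python
from typing import Tuple

# Each box is (x, y, w, h): coordinates plus widths in the first (w) and
# second (h) directions.
Box = Tuple[float, float, float, float]


def transpose(box: Box) -> Box:
    """Swap the two directions so that the row logic can be reused for columns."""
    x, y, w, h = box
    return (y, x, h, w)


def same_row(a: Box, b: Box) -> bool:
    """Row test: equal Y, or Y-interval overlap above half the smaller height."""
    _, ya, _, ha = a
    _, yb, _, hb = b
    if ya == yb:
        return True
    inter = min(ya + ha, yb + hb) - max(ya, yb)
    return inter > min(ha, hb) / 2


def same_column(a: Box, b: Box) -> bool:
    # Column test = row test on transposed boxes (X and w take the roles of
    # Y and h). The traversal order likewise switches to ascending Y.
    return same_row(transpose(a), transpose(b))
```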
The method for identifying table content in a picture file provided by this embodiment performs character recognition on the target picture file to obtain each character's first-direction and second-direction coordinates in the target picture and its widths in the first and second directions. The characters are traversed in ascending order of first-direction coordinate to determine whether the j-th and i-th characters have the same second-direction coordinate. If they do not, their overlap degree in the second direction is computed, and if it exceeds the threshold, the second-direction coordinate of the j-th or i-th character is corrected. The character information is then matched against the preset lexicon to obtain the target character set; the table style is determined from the character semantics, and the target location information from the table style; the header characters are obtained from the target location information and the location information of the target character set; the location information and semantics of the corresponding content characters are selected from the character information according to those of the header characters; and the table contained in the target picture file is generated from the location information and semantics of the header and content characters. The table in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, effectively improves the user experience, and facilitates the user's subsequent use of the data.
In an exemplary embodiment, a device for identifying table content in a picture file is further provided.
Fig. 6 is a schematic structural diagram of a device for identifying table content in a picture file according to an exemplary embodiment of the present invention.
Referring to Fig. 6, the device for identifying table content in a picture file of the present invention includes a first acquisition module 110, a processing module 120, a matching module 130 and a determining module 140.
The first acquisition module 110 is configured to obtain the target picture file to be recognized;
the processing module 120 is configured to perform character recognition on the target picture file to obtain the character information in the target picture file;
the matching module 130 is configured to match the recognized character information against the preset lexicon to obtain header characters whose matching degree with the preset lexicon exceeds a first threshold;
the determining module 140 is configured to determine the table content contained in the target picture file according to the character information corresponding to the header characters.
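The four modules of Fig. 6 can be mirrored by a small class skeleton. This is a hypothetical sketch: the OCR engine and the matching-degree function are injected callables, and the determining module uses a deliberately simple nearest-column rule in place of the full procedure described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CharInfo:
    text: str
    x: float  # first-direction coordinate
    y: float  # second-direction coordinate


class TableRecognizer:
    def __init__(self, ocr: Callable[[bytes], List[CharInfo]], lexicon: List[str],
                 degree: Callable[[str, str], float], first_threshold: float):
        self.ocr = ocr
        self.lexicon = lexicon
        self.degree = degree
        self.first_threshold = first_threshold

    def acquire(self, data: bytes) -> bytes:
        # First acquisition module 110: here the picture is simply passed in.
        return data

    def process(self, image: bytes) -> List[CharInfo]:
        # Processing module 120: delegate to the injected OCR engine.
        return self.ocr(image)

    def match(self, chars: List[CharInfo]) -> List[CharInfo]:
        # Matching module 130: characters whose best lexicon match degree
        # exceeds the first threshold become header characters.
        return [
            c for c in chars
            if max((self.degree(c.text, w) for w in self.lexicon), default=0.0)
            > self.first_threshold
        ]

    def determine(self, headers: List[CharInfo], chars: List[CharInfo]) -> Dict[str, List[str]]:
        # Determining module 140 (placeholder rule, an assumption): assign
        # each non-header character to the header with the nearest X.
        if not headers:
            return {}
        table: Dict[str, List[str]] = {h.text: [] for h in headers}
        for c in chars:
            if c in headers:
                continue
            nearest = min(headers, key=lambda h: abs(h.x - c.x))
            table[nearest.text].append(c.text)
        return table

    def run(self, data: bytes) -> Dict[str, List[str]]:
        chars = self.process(self.acquire(data))
        return self.determine(self.match(chars), chars)
```

Injecting the OCR engine keeps the skeleton testable with a fake recognizer that returns fixed characters.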
Note that the above explanation of the method embodiment for identifying table content in a picture file also applies to the device of this embodiment; the implementation principle is similar and is not repeated here.
The device for identifying table content in a picture file provided by this embodiment obtains the target picture file to be recognized, performs character recognition on it to obtain its character information, matches the recognized character information against the preset lexicon to obtain header characters whose matching degree with the lexicon exceeds a first threshold, and then determines the table content contained in the target picture file according to the character information corresponding to the header characters. The table in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and effectively improves the user experience.
In an exemplary embodiment, a computer device is further provided.
Fig. 7 is a schematic structural diagram of a computer device according to an exemplary embodiment. The computer device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
Referring to Fig. 7, the computer device includes a memory 210 and a processor 220. The memory 210 stores a computer program which, when executed by the processor 220, causes the processor 220 to perform the following steps: obtaining a target picture file to be recognized; performing character recognition on the target picture file to obtain the character information in it, where the character information includes character shape, semantics and character location information; matching the recognized character information against a preset lexicon to obtain header characters whose matching degree with the preset lexicon exceeds a first threshold; and determining, according to the character information corresponding to the header characters, the table content contained in the target picture file.
In one embodiment, the character information includes character semantics and character location information, and obtaining the header characters whose matching degree with the preset lexicon exceeds the first threshold includes: matching the recognized character information against the preset lexicon to obtain a target character set whose matching degree with the preset lexicon exceeds the first threshold; determining, according to the semantics of the characters in the character information, the table style corresponding to the character information; determining target location information according to the table style; and obtaining the header characters from the target character set according to the target location information and the location information corresponding to the target character set.
In one embodiment, before the table content contained in the target picture file is determined, the method further includes: performing, using the preset lexicon, normalization and word merging on the header characters and the content characters.
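Normalization and word merging can be illustrated with a small sketch. Neither operation is defined in detail here, so the following is an assumption: word merging joins adjacent recognized tokens on one line when their concatenation is a lexicon word, and normalization maps hypothetical alias spellings to a canonical header name.

```python
def normalize_and_merge(tokens: list[str], lexicon: set[str], aliases: dict[str, str], sep: str = "") -> list[str]:
    """Merge adjacent tokens into lexicon words, then map aliases to canonical forms.

    `tokens` are the texts on one line, already sorted by first-direction
    coordinate. `sep` is "" for scripts written without spaces (e.g. Chinese)
    and " " for space-separated scripts.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Greedily look for the longest run of two or more tokens starting at i
        # that forms a lexicon word.
        for end in range(len(tokens), i + 1, -1):
            candidate = sep.join(tokens[i:end])
            if candidate in lexicon:
                merged.append(candidate)
                i = end
                break
        else:
            merged.append(tokens[i])  # no merge possible: keep the token as-is
            i += 1
    # Normalization: replace known alias spellings with the canonical name.
    return [aliases.get(t, t) for t in merged]


print(normalize_and_merge(
    ["Qty", "Unit", "Price", "Amt"],
    lexicon={"Unit Price", "Quantity", "Amount"},
    aliases={"Qty": "Quantity", "Amt": "Amount"},
    sep=" ",
))  # ['Quantity', 'Unit Price', 'Amount']
print(normalize_and_merge(["金", "额"], {"金额"}, {}))  # ['金额']
```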
In one embodiment, determining, according to the character information corresponding to the header characters, the table content contained in the target picture file includes: selecting, from the character information and according to the location information and semantics of the header characters, the location information and semantics of the content characters corresponding to the header characters; and generating the table contained in the target picture file according to the location information and semantics of the header characters and the location information and semantics of the content characters.
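Once each header has its content characters, the table can be assembled row by row. The sketch below is one possible way to do this, assuming a row-style table in which cells of the same row share approximately the same second-direction coordinate; the input layout and `row_tolerance` are hypothetical.

```python
def build_table(columns: dict[str, list[tuple[float, str]]], row_tolerance: float = 5.0) -> list[dict[str, str]]:
    """Assemble table rows from per-header content characters.

    `columns` maps each header text to (second-direction coordinate, cell text)
    pairs for its content characters. A cell within `row_tolerance` of the
    current row's first cell joins that row; otherwise it starts a new row.
    """
    cells = sorted(
        (y, header, text) for header, items in columns.items() for y, text in items
    )
    rows: list[dict[str, str]] = []
    row_y = None
    for y, header, text in cells:
        if row_y is None or y - row_y > row_tolerance:
            rows.append({})  # start a new table row
            row_y = y
        rows[-1][header] = text
    return rows


columns = {"Name": [(30, "Alice"), (55, "Bob")], "Amount": [(31, "12.50"), (56, "8.00")]}
print(build_table(columns))
# [{'Name': 'Alice', 'Amount': '12.50'}, {'Name': 'Bob', 'Amount': '8.00'}]
```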
In one embodiment, the character location information includes a first-direction coordinate and a second-direction coordinate of each character, and selecting, from the character information, the location information and semantics of the content characters corresponding to the header characters includes: determining, according to the first-direction coordinate or second-direction coordinate of any header character, a first-direction coordinate range or second-direction coordinate range for the target content characters corresponding to that header character; selecting, from the character information, a preliminary character set whose location information falls within the first-direction coordinate range or second-direction coordinate range; and selecting, from the preliminary character set and according to the semantics of that header character, the characters whose semantics match that header character as its corresponding content characters.
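The three selection steps (coordinate range, preliminary character set, semantic filter) might look like this for a header row. The tolerance, the rule that content lies below its header, and the regular-expression "semantic" rules are illustrative assumptions; a real system could use a much richer semantic check.

```python
import re
from dataclasses import dataclass


@dataclass
class CharInfo:
    text: str
    x: float  # first-direction coordinate (assumed horizontal)
    y: float  # second-direction coordinate (assumed vertical)


# Hypothetical semantic rules: the content pattern each header expects.
EXPECTED_CONTENT = {
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}$"),
    "Amount": re.compile(r"-?\d+(\.\d+)?$"),
}


def content_for_header(header: CharInfo, chars: list[CharInfo], x_tolerance: float = 10.0) -> list[CharInfo]:
    """Select the content characters belonging to one (row-style) header."""
    # 1. First-direction coordinate range derived from the header's position.
    lo, hi = header.x - x_tolerance, header.x + x_tolerance
    # 2. Preliminary character set: characters below the header inside that range.
    prelim = [c for c in chars if lo <= c.x <= hi and c.y > header.y]
    # 3. Keep candidates whose semantics fit the header (no rule means keep all).
    pattern = EXPECTED_CONTENT.get(header.text)
    return [c for c in prelim if pattern is None or pattern.match(c.text)]


chars = [
    CharInfo("12.50", 102, 30),
    CharInfo("n/a", 98, 55),
    CharInfo("2018-04-02", 40, 30),
    CharInfo("8", 101, 80),
]
print([c.text for c in content_for_header(CharInfo("Amount", 100, 5), chars)])  # ['12.50', '8']
```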
In one embodiment, the character information includes character location information, which comprises the first-direction coordinate and second-direction coordinate of each character and the character's width in the second direction. After the character information in the target picture file is obtained, the method further includes: traversing the character information in ascending order of first-direction coordinate and judging whether the j-th character and the i-th character have the same second-direction coordinate, where the difference between the first-direction coordinates of each pair of adjacent characters between the i-th and j-th characters is within a preset range, i and j are positive integers, and j is greater than i; if the second-direction coordinates of the j-th and i-th characters differ, determining the degree of overlap of the two characters in the second direction according to the second-direction width of the j-th character and the second-direction width of the i-th character; judging whether that degree of overlap exceeds a third threshold; and if it does, correcting the second-direction coordinate of the j-th character and/or the i-th character.
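A simplified sketch of the overlap test follows. It assumes the first direction is horizontal (x, left edge), the second direction is vertical (y, top edge), and the second-direction width is the glyph height. It approximates the "each adjacent pair within a preset range" condition with a single first-direction window (`max_gap`), and as the correction it simply snaps the j-th character onto the i-th character's line. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical recognized character with its location information."""
    text: str
    x: float  # first-direction coordinate (assumed: left edge)
    y: float  # second-direction coordinate (assumed: top edge)
    h: float  # width in the second direction (assumed: glyph height, > 0)


def overlap_ratio(a: Box, b: Box) -> float:
    """Shared second-direction extent divided by the smaller second-direction width."""
    shared = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(shared, 0.0) / min(a.h, b.h)


def align_second_direction(boxes: list[Box], max_gap: float = 30.0, third_threshold: float = 0.5) -> list[Box]:
    """Traverse by ascending first-direction coordinate and snap near-aligned characters.

    Note: the Box objects are corrected in place; the returned list is sorted by x.
    """
    ordered = sorted(boxes, key=lambda b: b.x)
    for j in range(1, len(ordered)):
        for i in range(j - 1, -1, -1):
            if ordered[j].x - ordered[i].x > max_gap:
                break  # i-th character lies outside the preset first-direction range
            if ordered[i].y != ordered[j].y and overlap_ratio(ordered[i], ordered[j]) > third_threshold:
                ordered[j].y = ordered[i].y  # correct the j-th character onto the i-th's line
                break
    return ordered


boxes = [Box("A", 10, 100, 20), Box("B", 25, 103, 20), Box("C", 40, 101, 20), Box("D", 10, 200, 20)]
print({b.text: b.y for b in align_second_direction(boxes)})
# {'A': 100, 'D': 200, 'B': 100, 'C': 100}
```

Characters on a different line (D in the example) never overlap enough to be pulled in, so only small vertical jitter within one line is corrected.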
In one embodiment, before the second-direction coordinate of the j-th character and/or the i-th character is corrected, the method further includes: determining a target second-direction coordinate range according to the second-direction coordinate of the i-th character and the second-direction coordinate of the j-th character; selecting the k characters whose second-direction coordinates fall within the target second-direction coordinate range; and correcting the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of those k characters.
In one embodiment, the character location information further includes the character's width in the first direction, and after the character information is traversed in ascending order of first-direction coordinate, the method further includes: determining a target first-direction coordinate range according to the first-direction coordinate of the j-th character and the first-direction coordinate of the i-th character; selecting the m characters whose first-direction coordinates fall within the target first-direction coordinate range; and correcting the first-direction coordinate of the j-th character and/or the i-th character according to the first-direction coordinates of those m characters.
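The two refinements described above (the k characters in the second direction and the m characters in the first direction) reduce to the same computation on a single coordinate axis, sketched below. Using the median of the in-range coordinates as the corrected value, and the optional `margin`, are assumptions; the embodiments only state that the correction is made according to those characters' coordinates.

```python
from statistics import median


def corrected_coordinate(coords: list[float], c_i: float, c_j: float, margin: float = 0.0) -> float:
    """Corrected coordinate shared by the i-th and j-th characters on one axis.

    The target range spans c_i..c_j, optionally widened by `margin`. The
    characters whose coordinates fall inside it (the k characters for the
    second direction, the m characters for the first direction) vote via
    their median.
    """
    lo, hi = min(c_i, c_j) - margin, max(c_i, c_j) + margin
    in_range = [c for c in coords if lo <= c <= hi]
    return median(in_range) if in_range else (c_i + c_j) / 2


# Second-direction coordinates of several characters; 103 is a jittered outlier.
print(corrected_coordinate([100, 100, 100, 103, 200], c_i=100, c_j=103))  # 100.0
```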
In one embodiment, the character information includes character semantics, and before the recognized character information is matched against the preset lexicon, the method further includes: determining a target lexicon according to the character semantics. Matching the recognized character information against the preset lexicon then includes matching the recognized character information against the target lexicon.
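Choosing a target lexicon by semantics could be as simple as picking the domain vocabulary that overlaps most with the recognized text. The sketch below shows that idea; the domain names and word lists are hypothetical placeholders, and word overlap is only one possible stand-in for a semantic judgement.

```python
# Hypothetical domain lexicons; a real deployment would load its own word lists.
LEXICONS: dict[str, set[str]] = {
    "invoice": {"Item", "Quantity", "Unit Price", "Amount", "Tax"},
    "medical": {"Test", "Result", "Unit", "Reference Range"},
}


def choose_target_lexicon(texts: list[str], lexicons: dict[str, set[str]] = LEXICONS) -> tuple[str, set[str]]:
    """Pick the lexicon whose vocabulary overlaps most with the recognized texts.

    Ties go to the lexicon listed first.
    """
    words = set(texts)
    name = max(lexicons, key=lambda k: len(words & lexicons[k]))
    return name, lexicons[name]


name, target = choose_target_lexicon(["Test", "Result", "Reference Range", "5.4"])
print(name)  # medical
```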
In an optional implementation, as shown in Fig. 8, the computer device 200 may further include a memory 210, a processor 220, and a bus 230 connecting the different components (including the memory 210 and the processor 220). The memory 210 stores a computer program, and when the processor 220 executes the program, the method for identifying table content in a picture file described in the embodiments of the present invention is implemented.
Bus 230 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 200 typically includes a variety of computer-readable media. These media may be any available media that can be accessed by computer device 200, including volatile and non-volatile media and removable and non-removable media.
Memory 210 may also include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 240 and/or cache memory 250. Computer device 200 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 260 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 8, commonly referred to as a "hard disk drive"). Although not shown in Fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 230 through one or more data media interfaces. Memory 210 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 280 having a set of (at least one) program modules 270 may be stored, for example, in memory 210. Such program modules 270 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a networking environment. Program modules 270 generally perform the functions and/or methods of the embodiments described in the present invention.
Computer device 200 may also communicate with one or more external devices 290 (such as a keyboard, a pointing device, or a display 291), with one or more devices that enable a user to interact with computer device 200, and/or with any device (such as a network card or modem) that enables computer device 200 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 292. In addition, computer device 200 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 293. As shown, network adapter 293 communicates with the other modules of computer device 200 through bus 230. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with computer device 200, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that the foregoing explanation of the embodiments of the method for identifying table content in a picture file also applies to the computer device of this embodiment; the implementation principle is similar and is not repeated here.
With the computer device provided by the embodiments of the present invention, a target picture file to be recognized is obtained and character recognition is performed on it to obtain the character information in the target picture file; the recognized character information is then matched against a preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds a first threshold; and the table content contained in the target picture file is determined according to the character information corresponding to the header characters. Tables contained in pictures can thus be recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and thereby effectively improves the user experience.
In an exemplary embodiment, the present invention further provides a computer-readable storage medium.
The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method for identifying table content in a picture file described above.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality of" means two or more, unless otherwise specifically defined.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where they do not contradict one another, those skilled in the art may combine different embodiments or examples described in this specification, and the features of those different embodiments or examples.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those skilled in the art will understand that all or some of the steps carried by the methods of the above embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those skilled in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (12)

1. A method for identifying table content in a picture file, characterized by comprising:
obtaining a target picture file to be recognized;
performing character recognition on the target picture file to obtain character information in the target picture file;
matching the recognized character information against a preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds a first threshold; and
determining, according to the character information corresponding to the header characters, the table content contained in the target picture file.
2. The method according to claim 1, characterized in that the character information includes character semantics and character location information;
obtaining the header characters whose degree of match with the preset lexicon exceeds the first threshold comprises:
matching the recognized character information against the preset lexicon to obtain a target character set whose degree of match with the preset lexicon exceeds the first threshold;
determining, according to the character semantics in the character information, the table style corresponding to the character information;
determining target location information according to the table style; and
obtaining the header characters from the target character set according to the target location information and the location information corresponding to the target character set.
3. The method according to claim 2, characterized in that, before the table content contained in the target picture file is determined, the method further comprises:
performing, using the preset lexicon, normalization and word merging on the header characters and the content characters.
4. The method according to claim 2, characterized in that determining, according to the character information corresponding to the header characters, the table content contained in the target picture file comprises:
selecting, from the character information and according to the location information and semantics of the header characters, the location information and semantics of the content characters corresponding to the header characters; and
generating the table contained in the target picture file according to the location information and semantics of the header characters and the location information and semantics of the content characters.
5. The method according to claim 4, characterized in that the character location information includes a first-direction coordinate and a second-direction coordinate of each character;
selecting, from the character information, the location information and semantics of the content characters corresponding to the header characters comprises:
determining, according to the first-direction coordinate or second-direction coordinate of any header character, a first-direction coordinate range or second-direction coordinate range of the target content characters corresponding to that header character;
selecting, from the character information, a preliminary character set whose location information falls within the first-direction coordinate range or second-direction coordinate range; and
selecting, from the preliminary character set and according to the semantics of that header character, the characters whose semantics match that header character as the content characters corresponding to that header character.
6. The method according to claim 1, characterized in that the character information includes character location information, the character location information comprising a first-direction coordinate and a second-direction coordinate of each character and the character's width in the second direction;
after the character information in the target picture file is obtained, the method further comprises:
traversing the character information in ascending order of first-direction coordinate and judging whether the j-th character and the i-th character have the same second-direction coordinate, wherein the difference between the first-direction coordinates of each pair of adjacent characters between the i-th and j-th characters is within a preset range, i and j are positive integers, and j is greater than i;
if the second-direction coordinates of the j-th character and the i-th character differ, determining the degree of overlap of the j-th character and the i-th character in the second direction according to the second-direction width of the j-th character and the second-direction width of the i-th character;
judging whether the degree of overlap of the j-th character and the i-th character in the second direction exceeds a third threshold; and
if so, correcting the second-direction coordinate of the j-th character and/or the i-th character.
7. The method according to claim 6, characterized in that, before the second-direction coordinate of the j-th character and/or the i-th character is corrected, the method further comprises:
determining a target second-direction coordinate range according to the second-direction coordinate of the i-th character and the second-direction coordinate of the j-th character;
selecting k characters whose second-direction coordinates fall within the target second-direction coordinate range; and
correcting the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of the k characters.
8. The method according to claim 6, characterized in that the character location information further includes a width in the first direction;
after the character information is traversed in ascending order of first-direction coordinate, the method further comprises:
determining a target first-direction coordinate range according to the first-direction coordinate of the j-th character and the first-direction coordinate of the i-th character;
selecting m characters whose first-direction coordinates fall within the target first-direction coordinate range; and
correcting the first-direction coordinate of the j-th character and/or the i-th character according to the first-direction coordinates of the m characters.
9. The method according to any one of claims 1 to 8, characterized in that the character information includes character semantics;
before the recognized character information is matched against the preset lexicon, the method further comprises:
determining a target lexicon according to the character semantics;
matching the recognized character information against the preset lexicon comprises:
matching the recognized character information against the target lexicon.
10. An apparatus for identifying table content in a picture file, characterized by comprising:
a first acquisition module, configured to obtain a target picture file to be recognized;
a processing module, configured to perform character recognition on the target picture file to obtain character information in the target picture file;
a matching module, configured to match the recognized character information against a preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds a first threshold; and
a determining module, configured to determine, according to the character information corresponding to the header characters, the table content contained in the target picture file.
11. A computer device, characterized by comprising a memory and a processor, the memory storing a computer program, wherein when the processor executes the program, the method for identifying table content in a picture file according to any one of claims 1 to 9 is implemented.
12. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the method for identifying table content in a picture file according to any one of claims 1 to 9 is implemented.
CN201810285135.5A 2018-04-02 2018-04-02 Method, device, equipment and storage medium for identifying table content in picture file Active CN108734089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810285135.5A CN108734089B (en) 2018-04-02 2018-04-02 Method, device, equipment and storage medium for identifying table content in picture file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810285135.5A CN108734089B (en) 2018-04-02 2018-04-02 Method, device, equipment and storage medium for identifying table content in picture file

Publications (2)

Publication Number Publication Date
CN108734089A true CN108734089A (en) 2018-11-02
CN108734089B CN108734089B (en) 2023-04-18

Family

ID=63940603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810285135.5A Active CN108734089B (en) 2018-04-02 2018-04-02 Method, device, equipment and storage medium for identifying table content in picture file

Country Status (1)

Country Link
CN (1) CN108734089B (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726643A (en) * 2018-12-13 2019-05-07 北京金山数字娱乐科技有限公司 The recognition methods of form data, device, electronic equipment and storage medium in image
CN109740135A (en) * 2018-12-19 2019-05-10 平安普惠企业管理有限公司 Chart generation method and device, electronic equipment and storage medium
CN109871524A (en) * 2019-02-21 2019-06-11 腾讯科技(深圳)有限公司 A kind of chart generation method and device
CN110059688A (en) * 2019-03-19 2019-07-26 平安科技(深圳)有限公司 Pictorial information recognition methods, device, computer equipment and storage medium
CN110147774A (en) * 2019-05-23 2019-08-20 阳光保险集团股份有限公司 Sheet format picture printed page analysis method and computer storage medium
CN110287854A (en) * 2019-06-20 2019-09-27 北京百度网讯科技有限公司 Extracting method, device, computer equipment and the storage medium of table
WO2020098078A1 (en) * 2018-11-12 2020-05-22 平安科技(深圳)有限公司 Method and apparatus for generating ocr training sample, device and readable storage medium
CN111507230A (en) * 2020-04-11 2020-08-07 创景未来(北京)科技有限公司 Method and system for identifying and extracting document and table data
CN111683285A (en) * 2020-08-11 2020-09-18 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium
CN111898528A (en) * 2020-07-29 2020-11-06 腾讯科技(深圳)有限公司 Data processing method and device, computer readable medium and electronic equipment
WO2020232866A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Scanned text segmentation method and apparatus, computer device and storage medium
CN112115111A (en) * 2019-06-20 2020-12-22 上海怀若智能科技有限公司 OCR-based document version management method and system
CN112395830A (en) * 2019-07-31 2021-02-23 腾讯科技(深圳)有限公司 Form processing method based on Wan Guo code and related device
WO2021042507A1 (en) * 2019-09-02 2021-03-11 苏州朗动网络科技有限公司 Method and device for extracting table data from pdf file, and storage medium
CN112507909A (en) * 2020-12-15 2021-03-16 信号旗智能科技(上海)有限公司 Document data extraction method, device, equipment and medium based on OCR recognition
CN112509661A (en) * 2021-02-03 2021-03-16 南京吉拉福网络科技有限公司 Methods, computing devices, and media for identifying physical examination reports
WO2021072885A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for recognizing text, device and storage medium
WO2021147222A1 (en) * 2020-01-22 2021-07-29 平安科技(深圳)有限公司 Ocr-based table layout restoration method and device, electronic apparatus, and storage medium
CN113449559A (en) * 2020-03-26 2021-09-28 顺丰科技有限公司 Table identification method and device, computer equipment and storage medium
CN113504863A (en) * 2021-06-02 2021-10-15 珠海金山办公软件有限公司 Method and device for realizing picture screening, computer storage medium and terminal
CN113723301A (en) * 2021-08-31 2021-11-30 广州新丝路信息科技有限公司 Imported goods customs clearance list OCR recognition branch processing method and device
CN115019320A (en) * 2022-06-30 2022-09-06 京东方科技集团股份有限公司 Data extraction method, device, equipment and storage medium
CN116127928A (en) * 2023-04-17 2023-05-16 广东粤港澳大湾区国家纳米科技创新研究院 Table data identification method and device, storage medium and computer equipment
CN116156212A (en) * 2023-02-21 2023-05-23 广州虎牙科技有限公司 Live video processing method and system
CN115019320B (en) * 2022-06-30 2024-10-18 京东方科技集团股份有限公司 Data extraction method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105184265A (en) * 2015-09-14 2015-12-23 哈尔滨工业大学 Self-learning-based handwritten form numeric character string rapid recognition method
JP2016009223A (en) * 2014-06-23 2016-01-18 株式会社日立情報通信エンジニアリング Optica character recognition device and optical character recognition method
US20160055376A1 (en) * 2014-06-21 2016-02-25 iQG DBA iQGATEWAY LLC Method and system for identification and extraction of data from structured documents

Non-Patent Citations (1)

Title
仲小挺 (Zhong Xiaoting): "Research on a self-learning-based rapid recognition method for handwritten table numeric character strings" *

Cited By (33)

Publication number Priority date Publication date Assignee Title
WO2020098078A1 (en) * 2018-11-12 2020-05-22 平安科技(深圳)有限公司 Method and apparatus for generating ocr training sample, device and readable storage medium
CN109726643A (en) * 2018-12-13 2019-05-07 北京金山数字娱乐科技有限公司 The recognition methods of form data, device, electronic equipment and storage medium in image
CN112818812A (en) * 2018-12-13 2021-05-18 北京金山数字娱乐科技有限公司 Method and device for identifying table information in image, electronic equipment and storage medium
CN112818812B (en) * 2018-12-13 2024-03-12 北京金山数字娱乐科技有限公司 Identification method and device for table information in image, electronic equipment and storage medium
CN109740135A (en) * 2018-12-19 2019-05-10 平安普惠企业管理有限公司 Chart generation method and device, electronic equipment and storage medium
CN109871524A (en) * 2019-02-21 2019-06-11 腾讯科技(深圳)有限公司 A kind of chart generation method and device
CN110059688A (en) * 2019-03-19 2019-07-26 平安科技(深圳)有限公司 Pictorial information recognition methods, device, computer equipment and storage medium
CN110059688B (en) * 2019-03-19 2024-05-28 平安科技(深圳)有限公司 Picture information identification method, device, computer equipment and storage medium
WO2020232866A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Scanned text segmentation method and apparatus, computer device and storage medium
CN110147774A (en) * 2019-05-23 2019-08-20 阳光保险集团股份有限公司 Sheet format picture printed page analysis method and computer storage medium
CN110147774B (en) * 2019-05-23 2021-06-15 阳光保险集团股份有限公司 Table format picture layout analysis method and computer storage medium
CN110287854B (en) * 2019-06-20 2022-06-10 北京百度网讯科技有限公司 Table extraction method and device, computer equipment and storage medium
CN112115111A (en) * 2019-06-20 2020-12-22 上海怀若智能科技有限公司 OCR-based document version management method and system
CN110287854A (en) * 2019-06-20 2019-09-27 北京百度网讯科技有限公司 Table extraction method, device, computer equipment and storage medium
CN112395830A (en) * 2019-07-31 2021-02-23 腾讯科技(深圳)有限公司 Unicode-based form processing method and related device
WO2021042507A1 (en) * 2019-09-02 2021-03-11 苏州朗动网络科技有限公司 Method and device for extracting table data from pdf file, and storage medium
WO2021072885A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for recognizing text, device and storage medium
WO2021147222A1 (en) * 2020-01-22 2021-07-29 平安科技(深圳)有限公司 OCR-based table layout restoration method and device, electronic apparatus, and storage medium
CN113449559A (en) * 2020-03-26 2021-09-28 顺丰科技有限公司 Table identification method and device, computer equipment and storage medium
CN111507230A (en) * 2020-04-11 2020-08-07 创景未来(北京)科技有限公司 Method and system for identifying and extracting document and table data
CN111898528A (en) * 2020-07-29 2020-11-06 腾讯科技(深圳)有限公司 Data processing method and device, computer readable medium and electronic equipment
CN111898528B (en) * 2020-07-29 2023-11-10 腾讯科技(深圳)有限公司 Data processing method, device, computer readable medium and electronic equipment
CN111683285A (en) * 2020-08-11 2020-09-18 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium
CN112507909A (en) * 2020-12-15 2021-03-16 信号旗智能科技(上海)有限公司 Document data extraction method, device, equipment and medium based on OCR recognition
CN112509661A (en) * 2021-02-03 2021-03-16 南京吉拉福网络科技有限公司 Methods, computing devices, and media for identifying physical examination reports
CN112509661B (en) * 2021-02-03 2021-05-25 南京吉拉福网络科技有限公司 Methods, computing devices, and media for identifying physical examination reports
CN113504863A (en) * 2021-06-02 2021-10-15 珠海金山办公软件有限公司 Method and device for realizing picture screening, computer storage medium and terminal
CN113723301A (en) * 2021-08-31 2021-11-30 广州新丝路信息科技有限公司 OCR recognition and branch processing method and device for import goods customs declaration forms
CN113723301B (en) * 2021-08-31 2024-08-30 广州新丝路信息科技有限公司 OCR recognition and branch processing method and device for import goods customs declaration
CN115019320A (en) * 2022-06-30 2022-09-06 京东方科技集团股份有限公司 Data extraction method, device, equipment and storage medium
CN115019320B (en) * 2022-06-30 2024-10-18 京东方科技集团股份有限公司 Data extraction method, device, equipment and storage medium
CN116156212A (en) * 2023-02-21 2023-05-23 广州虎牙科技有限公司 Live video processing method and system
CN116127928A (en) * 2023-04-17 2023-05-16 广东粤港澳大湾区国家纳米科技创新研究院 Table data identification method and device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN108734089B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN108734089A (en) Method, apparatus, device and storage medium for identifying table content in a picture file
US11232300B2 (en) System and method for automatic detection and verification of optical character recognition data
US10482174B1 (en) Systems and methods for identifying form fields
CN112185520B (en) Text structuring processing system and method for medical pathology report picture
US10489645B2 (en) System and method for automatic detection and verification of optical character recognition data
WO2019075820A1 (en) Test paper reviewing system
US11816138B2 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
WO2023279045A1 (en) Ai-augmented auditing platform including techniques for automated document processing
US9286526B1 (en) Cohort-based learning from user edits
CN112749547A (en) Generation of text classifier training data
US20120026081A1 (en) System and method for using paper as an interface to computer applications
CN109783796A (en) Predicting pattern breaks in text content
US11315353B1 (en) Systems and methods for spatial-aware information extraction from electronic source documents
CN111090641A (en) Data processing method and device, electronic equipment and storage medium
CN112509661B (en) Methods, computing devices, and media for identifying physical examination reports
CN108170468A (en) Method and system for automatically detecting consistency between code comments and code
CN110135225A (en) Sample mask method and computer storage medium
CN108597565A (en) Collaborative verification method for clinical cohort data based on OCR and named entity extraction technology
CN106529381A (en) Information processing apparatus and information processing method
JP2019212115A (en) Inspection device, inspection method, program, and learning device
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
RU2702967C1 (en) Method and system for checking an electronic set of documents
CN111008624A (en) Optical character recognition method and method for generating training sample for optical character recognition
CN112308048B (en) Medical record integrity judging method, device and system based on small quantity of marked data
EP3640861A1 (en) Systems and methods for parsing log files using classification and a plurality of neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant