CN116311306A - Form identification method and device based on computer vision algorithm and related components - Google Patents

Form identification method and device based on computer vision algorithm and related components

Info

Publication number
CN116311306A
CN116311306A
Authority
CN
China
Prior art keywords
text
text content
target
information
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310268786.4A
Other languages
Chinese (zh)
Inventor
梁辉
刘志
朱晓宁
李嘉驹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingying Digital Technology Co Ltd
Original Assignee
Jingying Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingying Digital Technology Co Ltd filed Critical Jingying Digital Technology Co Ltd
Priority to CN202310268786.4A priority Critical patent/CN116311306A/en
Publication of CN116311306A publication Critical patent/CN116311306A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19013 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a form recognition method and device based on a computer vision algorithm, and related components, relating to the field of image processing. The method comprises: performing text recognition on a received target image to obtain all text contents, screening them, and merging those that satisfy preset merging rules; acquiring the position information and identification direction information of each field name based on received template matching information; matching a preset ordering rule according to the identification direction information corresponding to the current field name, and ordering all the merged text contents to obtain a text content set; acquiring the target text content from the text content set based on the position information of the field name and the template matching information; and traversing all field names and outputting a target form. The method automatically matches the text contents of the uploaded target image so that the output target form carries the same content as the target image, thereby improving both entry efficiency and entry accuracy.

Description

Form identification method and device based on computer vision algorithm and related components
Technical Field
The present invention relates to the field of image processing, and in particular to a form recognition method and apparatus based on a computer vision algorithm, and related components.
Background
With the current development of artificial intelligence and 5G technology, artificial intelligence has made outstanding contributions in many fields, and there is a huge market for convenient document entry. Long periods of tedious information entry can fatigue the entry personnel and lead to entry errors; this is where an OCR custom form recognition algorithm is needed, which can automatically recognize the form information in an image and automatically generate the corresponding correct information to fill in the form.
For forms of indefinite length or forms whose layout changes dynamically, however, the OCR form recognition algorithms currently on the market cannot accurately fill the recognized text content into the correct position, so much of the content in the output form is entered incorrectly.
Disclosure of Invention
The invention aims to provide a form recognition method and device based on a computer vision algorithm, and related components, so as to solve the problem that existing form recognition algorithms adapt poorly to forms of indefinite length or dynamically changing forms.
In order to solve the above technical problem, the aim of the invention is achieved by the following technical scheme: a form recognition method based on a computer vision algorithm is provided, comprising:
Receiving a target image, and performing character recognition on the target image by utilizing an OCR character recognition technology to obtain all text contents;
screening all the text contents, and combining the text contents meeting preset combining rules to obtain combined text contents;
acquiring position information and identification direction information of each field name based on the received template matching information;
based on the identification direction information corresponding to the current field name, matching a preset ordering rule;
based on the ordering rule, ordering all the combined text contents to obtain a text content set;
acquiring target text content from the text content set based on the position information of the field name and the template matching information;
and traversing all field names, acquiring corresponding target text contents, and outputting a target form.
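The steps above can be sketched as a minimal matching loop. This is an illustration only, with hypothetical names (`TextBox`, `fill_form`) and a simplified template that maps each field name to its identification direction; in the actual method the coordinates come from the OCR engine and the merging, ordering, and stop-key handling described below would also apply:

```python
from dataclasses import dataclass

@dataclass
class TextBox:
    text: str
    cx: float  # center abscissa of the recognized frame diagram
    cy: float  # center ordinate of the recognized frame diagram

def fill_form(boxes, template):
    """Hypothetical sketch: for each field name, take the nearest text content
    in its identification direction ('right' or 'down') as the target content."""
    form = {}
    for field, direction in template.items():
        anchor = next(b for b in boxes if b.text == field)
        if direction == "right":
            # same row, to the right of the field name, nearest first
            candidates = sorted((b for b in boxes if b.cy == anchor.cy and b.cx > anchor.cx),
                                key=lambda b: b.cx)
        else:
            # same column, below the field name, nearest first
            candidates = sorted((b for b in boxes if b.cx == anchor.cx and b.cy > anchor.cy),
                                key=lambda b: b.cy)
        form[field] = candidates[0].text if candidates else ""
    return form
```

The coordinates used below mirror the worked example later in the description, where "lease unit" is identified rightward and "serial number" downward.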
In addition, an embodiment of the invention provides a form recognition device based on a computer vision algorithm, which comprises:
the character recognition unit is used for receiving the target image, and performing character recognition on the target image by utilizing an OCR character recognition technology to obtain all text contents;
The merging unit is used for screening all the text contents and merging the text contents meeting the preset merging rule to obtain merged text contents;
the information acquisition unit is used for acquiring the position information and the identification direction information of each field name based on the received template matching information;
the matching unit is used for matching a preset ordering rule based on the identification direction information corresponding to the current field name;
the ordering unit is used for ordering all the combined text contents based on the ordering rule to obtain a text content set;
a target obtaining unit, configured to obtain target text content from the text content set based on the location information of the field name and the template matching information;
and the output unit is used for traversing all field names, acquiring the corresponding target text contents, and outputting a target form.
In addition, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the form recognition method based on the computer vision algorithm according to the first aspect when executing the computer program.
In addition, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to execute the form recognition method based on the computer vision algorithm described in the first aspect.
The embodiment of the invention discloses a form recognition method and device based on a computer vision algorithm, and related components, wherein the method comprises the following steps: receiving a target image, and performing character recognition on the target image by using OCR character recognition technology to obtain all text contents; screening all the text contents and merging those that satisfy preset merging rules to obtain merged text contents; acquiring the position information and identification direction information of each field name based on the received template matching information; matching a preset ordering rule based on the identification direction information corresponding to the current field name; ordering all the merged text contents based on the ordering rule to obtain a text content set; acquiring the target text content from the text content set based on the position information of the field name and the template matching information; and traversing all field names, acquiring the corresponding target text contents, and outputting a target form. The method automatically matches the text contents of the uploaded target image so that the output target form carries the same content as the target image, thereby improving both entry efficiency and entry accuracy.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a form identification method based on a computer vision algorithm according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a form in a form identification method based on a computer vision algorithm according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a target image in a form recognition method based on a computer vision algorithm according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of text content identified in a form identification method based on a computer vision algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a form recognition device based on a computer vision algorithm provided by an embodiment of the present invention;
fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a flowchart of a form recognition method based on a computer vision algorithm according to an embodiment of the present invention;
as shown in FIG. 1, the method includes steps S101-S107.
S101, receiving a target image, and performing character recognition on the target image by utilizing an OCR character recognition technology to obtain all text contents;
In this embodiment, the target image may be captured by a device with a camera function, such as a mobile phone. After receiving the target image sent by the device, the terminal recognizes the characters in the target image through OCR character recognition technology to obtain all the text contents in it. It should be added that the form layout in the target image should be consistent with the form format established in advance in the terminal; that is, template matching information for the form must have been entered into the terminal beforehand.
The template matching information includes a field name, identification direction information, identification quantity information, a stop key attribute, category information, and a table attribute value. The field name is unique within the table; for example, as shown in fig. 2, the field names include lease unit, repair unit, serial number, equipment name, specification model, manufacturer, and so on. The identification direction information corresponding to a field name gives the direction of the target text content relative to the position of the field name, either rightward or downward; for example, recognizing rightward from the position of the field name "lease unit" yields "mine equipment lease company (double willow coal mine)", and recognizing downward from the position of the field name "serial number" yields "1". The identification quantity information specifies how many results the field name matches in the corresponding direction; for example, the field name "lease unit" matches 1 result to the right based on its rightward identification information, and the field name "serial number" matches 1 result downward based on its downward identification information (that is, the quantity of target text contents is judged automatically). The stop key attribute means that text output for a field name stops once the stop key is recognized; for example, the field name "serial number" stops outputting text content after "equipment damage condition" is recognized.
The category information gives the type of the text content corresponding to the field name; for example, the content corresponding to the field name "serial number" belongs to an integer type, while the content corresponding to the field name "equipment name" belongs to a string type. The table attribute value indicates whether the current field name belongs to a table type and defines the table name; for example, serial number, equipment name, specification model, manufacturer and so on belong to the same table and can be defined under the table name table1 for subsequent use.
It should be noted that, at configuration time, the specific contents and layout of the form may differ; fig. 2 of the present application is provided only as an example and is not limiting.
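As a sketch, one record of such template matching information could be represented as follows. The class and attribute names are illustrative assumptions, not taken from the patent's actual implementation; the example record mirrors the "serial number" field described above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldTemplate:
    name: str                # field name, unique within the form, e.g. "serial number"
    direction: str           # identification direction information: "right" or "down"
    count: int               # identification quantity: how many results to match
    stop_key: Optional[str]  # stop outputting text once this content is recognized
    category: type           # category information: expected type, e.g. int or str
    table: Optional[str]     # table attribute value: table name if a table field, else None

# example record for the "serial number" field described above
serial_no = FieldTemplate("serial number", "down", 1,
                          "equipment damage condition", int, "table1")
```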
In a specific embodiment, before the step S101, the method includes the following steps:
s10, identifying all the table frames in the target image, calculating perimeter data of each table frame, and taking the table frame corresponding to the perimeter data meeting the conditions as a target contour;
s11, performing perspective conversion on the target outline.
In this embodiment, since the target image captured by the device contains a table with table frames, the target outline needs to be perspective-transformed to obtain a schematic diagram as shown in fig. 4; this facilitates the recognition of the characters in the target image by OCR character recognition technology.
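A minimal sketch of the contour-selection step S10, assuming each detected table frame is already available as a four-point quadrilateral and reading "perimeter data meeting the conditions" as "the largest perimeter" (i.e. the outer table border). The subsequent perspective conversion of step S11 would typically be done with OpenCV's `getPerspectiveTransform`/`warpPerspective`, which is omitted here:

```python
def target_contour(frames):
    """Pick the table frame with the largest perimeter as the target outline.

    frames: list of quadrilaterals, each a list of four (x, y) corner points.
    """
    def perimeter(quad):
        # sum of the four edge lengths, wrapping from the last corner back to the first
        return sum(
            ((quad[i][0] - quad[(i + 1) % 4][0]) ** 2 +
             (quad[i][1] - quad[(i + 1) % 4][1]) ** 2) ** 0.5
            for i in range(4)
        )
    return max(frames, key=perimeter)
```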
S102, screening all the text contents, and combining the text contents meeting preset combining rules to obtain combined text contents;
in this embodiment, the step S102 includes the following steps:
s20, after the coordinates of the central points of the adjacent text contents are obtained, calculating the text distance of the adjacent text contents;
It should be noted that the target image is composed of a plurality of pixels, and the position of each pixel serves as its coordinate. The text contents are frame diagrams as shown in fig. 3, each text content being an independent frame diagram. For example, the initially recognized text contents are the following 4 items: "the bevel gear of the cutting head reducer of the heading machine is damaged, the cutting head picks and the cutting head seat are seriously worn, and the cutting head needs to be replaced; the creeper treads are severely worn; the tail rollers, scraper chain wheels and the driving wheels of the intermediate conveyor are seriously worn and need to be replaced; part of the oil pipes and joints in the hydraulic system of the heading machine leak oil; the spline of the rotary disk on the side of the harrow teeth is severely worn, and the rotary disk needs to be replaced". In the actual content of the table, however, these text contents belong to the same sentence and are merely split across different rows, so they need to be merged.
In this embodiment, the coordinates of the middle pixel on the left side and the middle pixel on the right side of the current frame diagram are obtained, and the abscissas of these 2 coordinates are averaged to obtain the center point coordinate of the current frame diagram. After the center point coordinates of 2 adjacent text contents are obtained, the difference between the 2 center abscissas and the difference between the 2 center ordinates are calculated to obtain the corresponding text spacing.
S21, after judging that the text spacing is smaller than a preset text spacing threshold, judging whether the center abscissas of the 2 center point coordinates are equal; if they are equal, executing step S22; if not, acquiring the left endpoint coordinates corresponding to the adjacent text contents and executing step S23;
in this embodiment, if the text spacing is smaller than the preset text spacing threshold, the 2 text contents are relatively close to each other, so it can be determined that they belong to the same text content and merely occupy different rows. Specifically, it is judged whether the center abscissas of the 2 adjacent text contents are equal; if so, it can be directly determined that the 2 adjacent text contents belong to the same text content, and the 2 frame diagrams are merged into one frame diagram representing the same text content.
S22, merging the adjacent text contents into 1 text content, and updating the center point coordinates;
it should be noted that since 2 frame graphs are combined into 1 frame graph, that is, 2 text contents are combined into 1 text content, it is necessary to perform center point coordinate update on the combined text contents.
S23, calculating the alignment difference between the abscissas of the 2 left endpoint coordinates, and proceeding to step S24;
in actual operation, besides the center-aligned layout, adjacent text contents may also be laid out left-aligned, in which case the abscissas of the 2 center point coordinates of the same text content differ. The present application therefore obtains the abscissas of the upper-left corner endpoints of the 2 frame diagrams and calculates the alignment difference between the 2 abscissas.
S24, after judging that the alignment difference is smaller than the alignment threshold, merging the adjacent text contents into 1 text content and updating the center point coordinates.
In this embodiment, by comparing the alignment difference with the alignment threshold, it can be determined whether the 2 adjacent texts belong to the same text content; if so, the 2 frame diagrams are merged into one frame diagram and the center point coordinates are updated.
In this embodiment, adjacent text contents are judged by the following condition:

$$\left|y_{1}^{c}-y_{2}^{c}\right|<\alpha w \quad \text{and} \quad \left(x_{1}^{c}=x_{2}^{c} \;\; \text{or} \;\; \left|x_{1}^{l}-x_{2}^{l}\right|<\alpha w\right)$$

wherein $x_{1}^{l}$ represents the abscissa of the upper-left corner endpoint of one of the frame diagrams, $x_{2}^{l}$ the abscissa of the upper-left corner endpoint of the other frame diagram, $y_{1}^{c}$ and $y_{2}^{c}$ the ordinates of the center point coordinates of the two frame diagrams, $x_{1}^{c}$ and $x_{2}^{c}$ the abscissas of the center point coordinates of the two frame diagrams, $w$ the width of the target image (the difference between the abscissas of its upper-left and upper-right corners), and $\alpha$ the alignment threshold and text spacing threshold.
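The merging judgment of steps S20-S24 can be sketched as follows. The dictionary representation of a text box and the scaling of the thresholds by the image width are assumptions for illustration:

```python
def should_merge(a, b, w, alpha):
    """Decide whether two adjacent text boxes belong to the same text content.

    a, b: dicts with 'cx', 'cy' (center point) and 'lx' (upper-left abscissa).
    w: width of the target image; alpha: alignment / text-spacing threshold.
    """
    close = abs(a["cy"] - b["cy"]) < alpha * w          # text spacing small enough (S20/S21)
    centre_aligned = a["cx"] == b["cx"]                  # equal center abscissas (S22)
    left_aligned = abs(a["lx"] - b["lx"]) < alpha * w    # left-endpoint alignment (S23/S24)
    return close and (centre_aligned or left_aligned)

def merge(a, b):
    """Merge two boxes into one text content and update the center point (S22/S24)."""
    return {"text": a["text"] + b["text"],
            "cx": (a["cx"] + b["cx"]) / 2,
            "cy": (a["cy"] + b["cy"]) / 2,
            "lx": min(a["lx"], b["lx"])}
```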
In an embodiment, after the step S102:
s30, acquiring center point coordinates of adjacent text contents, wherein the center point coordinates comprise a center abscissa and a center ordinate;
s31, calculating the width difference values of 2 center point coordinates, and unifying the center abscissa coordinates in the 2 center point coordinates after judging that the width difference values are smaller than a width threshold value;
s32, calculating the height difference value of the 2 center point coordinates, and unifying the center ordinate of the 2 center point coordinates after judging that the height difference value is smaller than the height threshold value.
In this embodiment, since the picture may be tilted while the device is shooting, the text contents may be positionally offset. For example, the center point coordinate of the field name "lease unit" is (50, 50) while that of the field name "repair unit" is (200, 45), so the two field names are not at the expected same level. All text contents therefore need to be position-corrected with the following formulas:
$$\text{if } \left|x_{1}^{c}-x_{2}^{c}\right|<\beta w, \text{ then } x_{2}^{c} \leftarrow x_{1}^{c}$$
$$\text{if } \left|y_{1}^{c}-y_{2}^{c}\right|<\gamma h, \text{ then } y_{2}^{c} \leftarrow y_{1}^{c}$$

wherein $h$ represents the height of the target image (the difference between the ordinates of its upper-left and lower-left corners), $\beta$ the width threshold, and $\gamma$ the height threshold.
Through the above formulas, the center point coordinate of the field name "repair unit" is updated from (200, 45) to (200, 50). In other words, through this embodiment all text contents can be position-corrected: the center abscissas of text contents whose center abscissas differ by less than the width threshold are unified to the same value, and the center ordinates of text contents whose center ordinates differ by less than the height threshold are unified to the same value, so as to improve the accuracy of the subsequent ordering rules.
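The position correction of steps S30-S32 might be implemented as follows. The pairwise snapping to a shared coordinate and the scaling of the thresholds by the image width and height are assumptions based on the description:

```python
def unify_centres(boxes, w, h, beta, gamma):
    """Snap nearly-aligned center coordinates in place.

    If two centers differ horizontally by less than beta*w they are given the same
    abscissa; if they differ vertically by less than gamma*h, the same ordinate.
    boxes: list of dicts with 'cx', 'cy'; w, h: image width and height;
    beta, gamma: width and height thresholds (assumed names).
    """
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if abs(a["cx"] - b["cx"]) < beta * w:
                b["cx"] = a["cx"]
            if abs(a["cy"] - b["cy"]) < gamma * h:
                b["cy"] = a["cy"]
    return boxes
```

With the document's example, "repair unit" at (200, 45) is snapped to the level of "lease unit" at (50, 50), becoming (200, 50), while its abscissa is left untouched.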
S103, acquiring position information and identification direction information of each field name based on the received template matching information, wherein the identification direction information comprises rightward identification direction information and downward identification direction information;
in this embodiment, the template matching information consists of the field names, identification direction information, identification quantity information, stop key attributes, category information, and table attribute values entered in advance. Each field name includes its position information in the table, i.e. it carries the center point coordinate representing its own position. In this way, the center point coordinates of the recognized text contents can be matched against the center point coordinates in the template matching information, so that the recognized field names, such as "lease unit", "repair unit" and "serial number", can be matched to the field name positions in the target form.
S104, matching a preset ordering rule based on the identification direction information corresponding to the current field name;
s105, sorting all the combined text contents based on the sorting rule to obtain a text content set;
in this embodiment, the step S105 includes the following steps:
s40, acquiring the center point coordinates of all the text contents according to the sequence from left to right;
s41, based on the numerical values of the central point abscissas in all the central point coordinates, sequentially arranging corresponding text contents in order from small to large to obtain a rightward text content set, wherein when the central point abscissas are judged to be identical, the numerical values of the central ordinates in the corresponding central point coordinates are obtained, and the corresponding text contents are sequentially ordered in order from small to large;
in this embodiment, for ease of understanding, the center point coordinates of "lease unit branch leader" are set to (45, 250), the center point coordinates of "serial number" to (48, 80), the center point coordinates of "1" to (48, 100), the center point coordinates of "lease unit" to (50, 50), the center point coordinates of "equipment name" to (60, 80), the center point coordinates of "heading machine" to (60, 80), the center point coordinates of "mine equipment lease company (double willow coal mine)" to (70, 50), and the center point coordinates of "specification model and manufacturer" to (70, 80).
After the current field name "lease unit" is located in the form, it is known from the template matching information that the current field name "lease unit" is identified to the right, so all the remaining text contents are ordered by the value of the center abscissa in ascending order from left to right, giving: lease unit branch leader opinion, serial number/1, lease unit, equipment name/heading machine, mine equipment lease company (double willow coal mine)/specification model and manufacturer. Since the items separated by "/" have equal abscissas, they cannot be ordered by abscissa alone, so the present application orders them again by ordinate, giving the final order: lease unit branch leader opinion, serial number, 1, lease unit, equipment name, heading machine, mine equipment lease company (double willow coal mine), specification model, and manufacturer.
S42, acquiring the center point coordinates of all text contents according to the sequence from top to bottom;
S43, based on the numerical values of the ordinate of the center point in all the center point coordinates, sequentially arranging the corresponding text contents in order from small to large to obtain a downward text content set, wherein when the ordinate of the center point is judged to be identical, the numerical values of the abscissa of the center point in the corresponding center point coordinates are obtained, and the corresponding text contents are sequentially ordered in order from small to large.
Similarly, after the current field name "serial number" is located in the form, it is known from the template matching information that the current field name "serial number" is identified downward, so all the text contents are ordered by the value of the center ordinate in ascending order from top to bottom (since the equipment name above the serial number, the mine equipment lease company (double willow coal mine), the repair unit, and Shanxi Fenxi Xinhua Yi Industry Co., Ltd. have already been entered into the form, they are deleted from the text content set), giving the following order: 1/heading machine/EBZ-260H, the secondary work stored on the ground at the original white house and used in the bottom suction roadway after repair/part/1/748.3 ten thousand/150 ten thousand/8 years/equipment damage condition …. It follows that, since the ordinates of the center point coordinates of "1/heading machine/EBZ-260H" and "stored on the ground at the original white house, used in the bottom suction roadway 43 (4) 07 after repair/part/1/748.3 ten thousand/150 ten thousand/8 years" are the same, the abscissas of these center point coordinates must be judged again, so the latest downward text content set is obtained: 1, heading machine, EBZ-260H, stored on the ground at the original white house and used in the bottom suction roadway 43 (4) 07 after repair, part, 1, 748.3 ten thousand, 150 ten thousand/8 years, equipment damage condition ….
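The two ordering rules of steps S40-S43 can be expressed compactly as lexicographic sorts on the center point coordinates. This is a sketch; the `cx`/`cy` keys denoting the center abscissa and ordinate are assumed names:

```python
def sort_boxes(boxes, direction):
    """Order text boxes for rightward or downward identification.

    'right': ascending center abscissa, ties broken by ascending ordinate (S40-S41);
    'down' : ascending center ordinate, ties broken by ascending abscissa (S42-S43).
    """
    if direction == "right":
        return sorted(boxes, key=lambda b: (b["cx"], b["cy"]))
    return sorted(boxes, key=lambda b: (b["cy"], b["cx"]))
```

Using a subset of the example coordinates above, sorting rightward puts "lease unit branch leader opinion" (45, 250) first and breaks the tie between "serial number" (48, 80) and "1" (48, 100) by ordinate.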
S106, acquiring target text content from the text content set based on the position information of the field names and the template matching information;
In this embodiment, step S106 includes the following steps:
S50, acquiring the table attribute value and identification number information carried by the field name based on the template matching information;
It should be noted that the table attribute value determines whether the current field name belongs to a table. A table in the present application refers to a regular table frame, i.e. the table frames in which the serial number, equipment name, specification model, manufacturer, original use and repair use place, unit, quantity, equipment original value, repair price, used service life and last overhaul time are located, together with the corresponding table frames into which the target text content needs to be entered. The table frames corresponding to field names such as the lease unit, the repair unit and the equipment damage condition do not belong to the table.
S51, judging, based on the table attribute value, whether the current field name is a table field; if the current field name is not a table field, performing step S52; if the current field name belongs to a table, performing step S53;
In this embodiment, since the template matching information carries the table attribute value of the current field name, the table attribute value of each identified field name can be obtained to determine whether that field name belongs to a table, i.e. whether it is a table field. For example, if the table attribute value of the lease unit is 0, the lease unit does not belong to a table and is not a table field; if the table attribute value of the serial number is 1, the serial number belongs to a table and is a table field.
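The dispatch of step S51 can be sketched as a simple lookup. The dictionary layout below is an illustrative assumption about how template matching information might be stored, not the patent's actual data structure.

```python
# Sketch of step S51: decide table membership from the table attribute
# value carried by the template matching information. The dict layout is
# an illustrative assumption.
template = {
    "lease unit":    {"is_table": 0},
    "serial number": {"is_table": 1},
}

def is_table_field(field_name):
    # Attribute value 1 means the field name belongs to a table.
    return bool(template[field_name]["is_table"])

print(is_table_field("lease unit"))     # → False
print(is_table_field("serial number"))  # → True
```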
S52, acquiring the identification number information and identification direction information carried by the current field name, and outputting the target text content from the corresponding text content set based on the identification number information and the identification direction information;
In this embodiment, for example, the current field name "lease unit" carries identification number information of 1 target text content and identification direction information of rightward; a rightward text content set is therefore acquired based on the identification direction information, and from it the first text content whose center ordinate equals that of the center point of "lease unit" is selected, namely "mine equipment lease company (double willow coal mine)". Likewise, the current field name "equipment name" carries identification number information of 1 target text content and identification direction information of downward; a downward text content set is acquired based on the identification direction information, and from it the first text content whose center abscissa equals that of the center point of "equipment name" is selected, namely "heading machine".
For example, the identification number information of the current field name "serial number" is 1 and the identification direction information is downward. The text content whose center abscissa is the same as that of the field name "serial number" and whose center ordinate is nearest to that of "serial number" is therefore acquired from the downward text content set, namely the text content "1". According to the ordering rule, other text contents with the same center ordinate as the text content "1" follow it in the set, and when the target text content of the current field name "serial number" is output, the text content "1" is deleted from both the downward text content set and the rightward text content set.
Similarly, if the identification number information of the current field name "internal repair budget price" is 1 and the identification direction information is rightward, the text content whose center ordinate is the same as that of "internal repair budget price" and whose center abscissa is nearest to that of the field name is acquired from the rightward text content set, namely the text content "150 ten thousand".
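The selection in step S52 can be sketched as follows. The `"right"`/`"down"` direction codes and the data shapes are illustrative assumptions under the coordinate conventions described above, not the patent's implementation.

```python
# Sketch of step S52: select the target text content for a non-table field
# by identification direction. `field_center` is the field name's center
# point; all names and codes are illustrative assumptions.
def pick_target(field_center, direction, content_set):
    fx, fy = field_center
    for text, (cx, cy) in content_set:
        if direction == "right" and cy == fy and cx > fx:
            return text  # same row: first content to the right of the field name
        if direction == "down" and cx == fx and cy > fy:
            return text  # same column: first content below the field name
    return None

rightward = [("mine equipment lease company", (300, 50))]
print(pick_target((100, 50), "right", rightward))  # → mine equipment lease company
```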
S53, judging whether the current table field carries a stop key attribute; if so, performing step S54; if the current table field does not carry the stop key attribute, performing step S56;
S54, outputting all target text contents based on the stop key attribute and the identification direction information carried by the current table field, acquiring the coordinate in the center point coordinates of the last output target text content based on the identification direction information and taking it as the boundary threshold, and judging whether this boundary threshold is equal to the boundary threshold of the last table field; if not, performing step S55;
S55, updating the boundary threshold of the current table field based on the update rule;
In this embodiment, in order to improve recognition efficiency, the field names belonging to the same table, such as "serial number", "equipment name", "specification model", "manufacturer" and so on in the first table, are all table fields. The table field "serial number" is the first field name of the table, so it is provided with a stop key attribute (here, the center point coordinates of "equipment damage condition"). Since the identification direction information of the table field "serial number" is downward, the text contents with the same center abscissa as "serial number" are acquired from the downward text content set, and the corresponding target text content, namely the target text content "1", is output in ascending order of center ordinate. It should be noted that the center ordinate of each output target text content must be smaller than that of the stop key position, i.e. the center ordinate of the target text content "1" must be smaller than the center ordinate of the field name "equipment damage condition".
After outputting the target text content "1", the center ordinate of its center point coordinates is taken as the boundary threshold. In the identification of the next table field "equipment name", whose stop key attribute is likewise the field name "equipment damage condition", the corresponding target text content is "heading machine". Since the center ordinate of the center point coordinates of "heading machine" equals that of the previous target text content "1", the boundary thresholds of the 2 table fields are the same, and the boundary threshold does not need to be updated.
If, however, the table field "equipment name" corresponds to 2 target text contents (e.g. a "first heading machine" and a "second heading machine"), which can happen when a recognition omission occurs for the "serial number" column (only "1" is identified and no "2"), the center point coordinates of the target text content "second heading machine" are taken as the new boundary threshold, and the center point coordinates of the previous target text content "1" are discarded, so that the following table fields can still be entered quickly.
In short, when the identification direction information is rightward, it is judged whether center_x (the center abscissa) in the center point coordinates of the current candidate text content is greater than the boundary threshold (the center abscissa in the center point coordinates of the last table field); if so, output of the target text content stops. When the identification direction information is downward, it is judged whether center_y (the center ordinate) in the center point coordinates of the current candidate text content is greater than the boundary threshold (the center ordinate in the center point coordinates of the last table field); if so, output of the target text content stops.
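The stop condition summarized above can be sketched as a simple scan. This is a minimal illustration under the assumed `(text, (center_x, center_y))` data shape; the function name is not from the patent.

```python
# Sketch of the boundary-threshold stop condition: output candidate text
# contents until the relevant center coordinate passes the boundary
# threshold (the stop key position). Names are illustrative assumptions.
def collect_until(candidates, direction, threshold):
    out = []
    for text, (cx, cy) in candidates:
        coord = cx if direction == "right" else cy
        if coord > threshold:  # past the stop key / boundary threshold
            break
        out.append(text)
    return out

column = [("1", (40, 50)), ("2", (40, 90)), ("equipment damage condition", (40, 200))]
print(collect_until(column, "down", 150))  # → ['1', '2']
```

Candidates are assumed to be pre-sorted as in the downward/rightward text content sets, so the first coordinate past the threshold ends the output.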
S56, acquiring the boundary threshold of the last table field, and outputting all target text contents based on the boundary threshold of the last table field;
In this embodiment, if the current table field does not carry the stop key attribute, the boundary threshold of the last table field is obtained, and all the target text contents are output based on that boundary threshold.
S107, traversing all field names, acquiring corresponding target text contents, and outputting a target form.
In a specific embodiment, the form recognition method based on the computer vision algorithm provided by the embodiment of the invention further includes the following steps:
S108, based on the template matching information, acquiring category information of the current field name;
And S109, judging whether the target text content corresponding to the current field name accords with the category information, and if not, deleting the target text content.
In this embodiment, after the target text content corresponding to the current field name is obtained, whether it accords with the category information carried by that field name is judged. For example, if the category information is the integer type but the obtained target text content is of the character string type, the identified target text content is wrong, so the target text content is deleted when the target form is generated.
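The validation of steps S108 and S109 can be sketched as follows. The category codes (`"int"`, `"str"`) and the dictionary layout are illustrative assumptions, not the patent's actual encoding of category information.

```python
# Sketch of steps S108-S109: drop a target text content that does not match
# the category information carried by its field name. The category codes
# and dict layout are illustrative assumptions.
def matches_category(text, category):
    if category == "int":
        # An integer-typed cell must be all digits (optionally signed).
        return text.strip().lstrip("-").isdigit()
    return isinstance(text, str)

def validate(form):
    # form: {field_name: (target_text, category)}; keep only valid entries.
    return {k: v for k, (v, cat) in form.items() if matches_category(v, cat)}

form = {"serial number": ("1", "int"), "quantity": ("one", "int")}
print(validate(form))  # → {'serial number': '1'}
```

Entries that fail the check are simply omitted when the target form is generated, matching the deletion described above.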
According to the method and the device of the present application, text content matching can be carried out automatically on the uploaded target image, so that the content in the output target form is the same as that in the target image, which improves entry efficiency and reduces the cost of manual entry.
The embodiment of the invention also provides a form recognition device based on the computer vision algorithm, which is used for executing any embodiment of the form recognition method based on the computer vision algorithm. In particular, referring to fig. 5, fig. 5 is a schematic block diagram of a form recognition device based on a computer vision algorithm according to an embodiment of the present invention.
As shown in fig. 5, the form recognition apparatus 500 based on the computer vision algorithm includes:
the character recognition unit 501 is configured to receive a target image, and perform character recognition on the target image by using an OCR character recognition technology to obtain all text contents;
the merging unit 502 is configured to screen all the text contents, merge the text contents that meet a preset merging rule, and obtain merged text contents;
an information obtaining unit 503, configured to obtain location information and identification direction information of each field name based on the received template matching information;
a matching unit 504, configured to match a preset ordering rule based on the identification direction information corresponding to the current field name;
the sorting unit 505 is configured to sort all the combined text contents based on the sorting rule, so as to obtain a text content set;
a target obtaining unit 506, configured to obtain target text content from the text content set based on the location information of the field name and the template matching information;
and the output unit 507 is used for traversing all field names, acquiring corresponding target text contents and outputting a target form.
The device only requires the image to be uploaded and can automatically return the target form after information matching; its cost is lower than that of manual entry, the difficulty of use and operation is greatly reduced, and the recognition rate exceeds 90 percent.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The form recognition apparatus based on computer vision algorithms described above may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 1100 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 6, the computer device 1100 includes a processor 1102, memory, and a network interface 1105 connected through a system bus 1101, wherein the memory may include a non-volatile storage medium 1103 and an internal memory 1104.
The non-volatile storage medium 1103 may store an operating system 11031 and computer programs 11032. The computer program 11032, when executed, causes the processor 1102 to perform a form recognition method based on computer vision algorithms.
The processor 1102 is operable to provide computing and control capabilities to support the operation of the overall computer device 1100.
The internal memory 1104 provides an environment for the execution of a computer program 11032 in the non-volatile storage medium 1103, which computer program 11032, when executed by the processor 1102, causes the processor 1102 to perform a form recognition method based on computer vision algorithms.
The network interface 1105 is used for network communication, such as the transmission of data information. It will be appreciated by those skilled in the art that the architecture shown in fig. 6 is merely a block diagram of part of the architecture relevant to the present arrangement and does not limit the computer device 1100 on which the present arrangement may be implemented; a particular computer device 1100 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 6 does not limit the specific construction of the computer device; in other embodiments, the computer device may include more or fewer components than shown, certain components may be combined, or the components may be arranged differently. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 6 and are not described again.
It should be appreciated that in an embodiment of the invention, the processor 1102 may be a central processing unit (Central Processing Unit, CPU), the processor 1102 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor implements a form recognition method based on a computer vision algorithm of an embodiment of the present invention.
The storage medium is a physical, non-transitory storage medium, and may be, for example, a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A form recognition method based on a computer vision algorithm, comprising:
receiving a target image, and performing character recognition on the target image by utilizing an OCR character recognition technology to obtain all text contents;
screening all the text contents, and combining the text contents meeting preset combining rules to obtain combined text contents;
acquiring position information and identification direction information of each field name based on the received template matching information;
Based on the identification direction information corresponding to the current field name, matching a preset ordering rule;
based on the ordering rule, ordering all the combined text contents to obtain a text content set;
acquiring target text content from the text content set based on the position information of the field name and the template matching information;
and traversing all field names, acquiring corresponding target text contents, and outputting a target form.
2. The form recognition method based on the computer vision algorithm of claim 1, wherein before the target image is text-recognized by using the OCR text recognition technology, the method comprises:
identifying all the table frames in the target image, calculating perimeter data of each table frame, and taking the table frame corresponding to the perimeter data meeting the conditions as a target contour;
and performing perspective conversion on the target contour.
3. The form recognition method based on the computer vision algorithm of claim 1, wherein the character recognition of the target image by using the OCR character recognition technology, after obtaining all text contents, comprises:
Acquiring a center point coordinate of adjacent text content, wherein the center point coordinate comprises a center abscissa and a center ordinate;
calculating the width difference value of the 2 center point coordinates, and unifying the center abscissa coordinates in the 2 center point coordinates after judging that the width difference value is smaller than a width threshold value;
and calculating the height difference values of the 2 center point coordinates, and unifying the center ordinate of the 2 center point coordinates after judging that the height difference values are smaller than a height threshold value.
4. The form recognition method based on the computer vision algorithm of claim 3, wherein the filtering all the text contents, merging the text contents satisfying a preset merging rule, to obtain a merged text content, includes:
after the coordinates of the central points of the adjacent text contents are obtained, calculating the text distance of the adjacent text contents;
after judging that the text space is smaller than a preset text space threshold value, judging whether the central abscissa coordinates in the 2 central point coordinates are equal or not;
if the central abscissa coordinates in the 2 central point coordinates are equal, merging the adjacent text contents into 1 text content, and updating the central point coordinates;
If the center abscissas of the 2 center point coordinates are not equal, acquiring the left endpoint coordinates corresponding to the adjacent text contents, and calculating the alignment difference of the left endpoint abscissas between the 2 left endpoint coordinates;
after determining that the alignment difference is smaller than an alignment threshold, merging the adjacent text contents into 1 text content and updating the center point coordinates.
5. The form recognition method based on the computer vision algorithm of claim 4, wherein the sorting all the combined text contents based on the sorting rule to obtain a text content set comprises:
acquiring the coordinates of the center points of all text contents according to the sequence from left to right;
based on the numerical values of the central point abscissas in all the central point coordinates, sequentially arranging corresponding text contents in order from small to large to obtain a rightward text content set, wherein when the central point abscissas are judged to be identical, the numerical values of the central ordinates in the corresponding central point coordinates are obtained, and the corresponding text contents are sequentially ordered in order from small to large;
acquiring the coordinates of the center points of all text contents according to the sequence from top to bottom;
And sequentially arranging corresponding text contents according to the order from small to large based on the numerical values of the ordinate of the central point in all the coordinates of the central point to obtain a downward text content set, wherein when the ordinate of the central point is judged to be identical, the numerical values of the abscissa of the central point in the coordinates of the corresponding central point are obtained, and the corresponding text contents are sequentially ordered according to the order from small to large.
6. The computer vision algorithm-based form recognition method of claim 5, wherein the obtaining the target text content from the text content collection based on the field name location information and the template matching information comprises:
acquiring a table attribute value and identification number information carried by a field name based on the template matching information;
judging whether the current field name is a table field or not based on the table attribute value;
if the current field name is a table field, continuing to judge whether the current table field carries a stop key attribute; if the current table field carries the stop key attribute, outputting all target text contents based on the stop key attribute and the identification direction information carried by the current table field, acquiring the coordinate in the center point coordinates of the last output target text content based on the identification direction information and taking it as a boundary threshold, judging whether the boundary threshold is equal to the boundary threshold of the last table field, and if not, updating the boundary threshold of the current table field based on an update rule; if the current table field does not carry the stop key attribute, acquiring the boundary threshold of the last table field, and outputting all target text contents based on the boundary threshold of the last table field;
If the current field name is not the table field, acquiring identification quantity information and identification direction information carried by the current field name, and outputting target text content from the corresponding text content set based on the identification quantity information and the identification direction information.
7. The computer vision algorithm-based form recognition method of claim 6, further comprising:
based on the template matching information, obtaining category information of the current field name;
and judging whether the target text content corresponding to the field name accords with the category information or not, and deleting the target text content if the target text content does not accord with the category information.
8. A form recognition device based on a computer vision algorithm, comprising:
the character recognition unit is used for receiving the target image, and performing character recognition on the target image by utilizing an OCR character recognition technology to obtain all text contents;
the merging unit is used for screening all the text contents and merging the text contents meeting the preset merging rule to obtain merged text contents;
the information acquisition unit is used for acquiring the position information and the identification direction information of each field name based on the received template matching information;
The matching unit is used for matching a preset ordering rule based on the identification direction information corresponding to the current field name;
the ordering unit is used for ordering all the combined text contents based on the ordering rule to obtain a text content set;
a target obtaining unit, configured to obtain target text content from the text content set based on the location information of the field name and the template matching information;
and the output unit is used for traversing all field names, acquiring corresponding target text contents and outputting a target form.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the computer vision algorithm-based form recognition method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the form recognition method based on a computer vision algorithm as claimed in any one of claims 1 to 7.
CN202310268786.4A 2023-03-16 2023-03-16 Form identification method and device based on computer vision algorithm and related components Pending CN116311306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310268786.4A CN116311306A (en) 2023-03-16 2023-03-16 Form identification method and device based on computer vision algorithm and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310268786.4A CN116311306A (en) 2023-03-16 2023-03-16 Form identification method and device based on computer vision algorithm and related components

Publications (1)

Publication Number Publication Date
CN116311306A true CN116311306A (en) 2023-06-23

Family

ID=86786689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310268786.4A Pending CN116311306A (en) 2023-03-16 2023-03-16 Form identification method and device based on computer vision algorithm and related components

Country Status (1)

Country Link
CN (1) CN116311306A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113947A (en) * 2023-10-25 2023-11-24 天衣(北京)科技有限公司 Form filling system, method, electronic equipment and storage medium
CN117113947B (en) * 2023-10-25 2024-01-23 天衣(北京)科技有限公司 Form filling system, method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11410435B2 (en) Ground mark extraction method, model training METHOD, device and storage medium
CN113658133B (en) Gear surface defect detection method and system based on image processing
KR101496317B1 (en) Correcting method for the drawing image by modifing the ground features location in photographing image
CN109767445B (en) High-precision PCB defect intelligent detection method
CN116311306A (en) Form identification method and device based on computer vision algorithm and related components
WO2022134277A1 (en) Method and device for making image positions of sub-pixels of display screen, and storage medium
CN112991307B (en) Defect circle fitting method, device and medium for drilling and blasting
CN106910195B (en) Webpage layout monitoring method and device
CN114880337B (en) Map data integrated updating method, device, equipment and storage medium
CN115035297A (en) Automatic recording method, system, device and medium for drilling core RQD
CN105761256A (en) Image sub-pixel edge line acquisition method and device
CN116310424B (en) Equipment quality assessment method, device, terminal and medium based on image recognition
CN117132583A (en) Wafer defect detection method and device, electronic equipment and nonvolatile storage medium
CN110135425B (en) Sample labeling method and computer storage medium
CN114444185A (en) In-situ labeling identification method and device and electronic equipment
CN112435274B (en) Remote sensing image planar ground object extraction method based on object-oriented segmentation
CN115147422A (en) Method, device and equipment for generating crystal grains at center of wafer and storage medium
CN114299012A (en) Object surface defect detection method and system based on convolutional neural network
CN114238354A (en) Map data updating method, device and computer storage medium
CN111382645B (en) Method and system for identifying overdue building in electronic map
CN104484655B (en) A kind of license plate area localization method and system based on video image
CN110827243A (en) Method and device for detecting abnormity of coverage area of grid beam
CN116757973B (en) Automatic repair method, system, equipment and storage medium for panel products
CN112906572B (en) Identification method and identification device for vertical section in construction drawing
CN110866026B (en) Automatic updating method and device for Sqlite image tile database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination