CN115439853A - Electronic bill text recognition method and device, electronic equipment and storage medium - Google Patents



Publication number
CN115439853A
Authority
CN
China
Legal status
Pending
Application number
CN202211051283.3A
Other languages
Chinese (zh)
Inventor
赖嘉伟
窦逸辛
王锟朋
卞凯
康家梁
冀乃庚
Current Assignee
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Application filed by China Unionpay Co Ltd
Priority to CN202211051283.3A
Publication of CN115439853A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18076 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19093 Proximity measures, i.e. similarity or distance measures

Abstract

The application discloses an electronic bill text recognition method and device, electronic equipment and a storage medium. A corresponding anchor template is determined according to the category information of the electronic bill to be recognized. After the reference text region corresponding to one piece of key text information to be recognized is located in the image, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized is determined from the anchor template. Even if the positions of the target text regions shift because of differences between users and their mobile phones' screen resolutions, this relative positional relationship stays unchanged, so each target text region can be located accurately and only the text in those regions needs to be recognized, rather than the text in every text region of the image. This improves the flexibility and efficiency of electronic bill text recognition.

Description

Electronic bill text recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of text recognition technologies, and in particular, to a method and an apparatus for recognizing text in an electronic bill, an electronic device, and a storage medium.
Background
In recent years, electronic bill text recognition, such as card recognition and bill recognition, has matured in the financial field. This text recognition capability greatly helps users and enterprises improve business processing efficiency.
At present, each type of electronic bill relies either on a dedicated layout analysis model or on character recognition within a fixed area of the image. When the format of an electronic bill changes, the layout analysis model must be adjusted or a new one added, so model maintenance is complex and the recognition cycle is long.
In addition, to handle the text recognition scenarios of different electronic bills, the prior art divides the text image of an electronic bill into text regions using a corresponding layout analysis model or fixed positions, recognizes the regions item by item, and then splices the results. Electronic bill text recognition has a simple background, but the positions of the key texts vary with the user and the mobile phone's screen resolution. In this scenario, conventional schemes either take longer or cannot recognize the text accurately, and different types of electronic bills require different text region division schemes. Moreover, only the key text content usually needs to be recognized, whereas prior-art schemes recognize all of the text content based on the layout analysis model, which results in poor flexibility and low recognition efficiency.
Disclosure of Invention
The embodiments of the application provide an electronic bill text recognition method and device, electronic equipment and a storage medium, which aim to solve the poor flexibility and low text recognition efficiency of existing electronic bill text recognition methods.
The application provides an electronic bill text recognition method, which comprises the following steps:
acquiring an image of the electronic bill to be recognized, its category information, and the key text information to be recognized, and determining the corresponding anchor template according to the category information;
determining each text region in the image, identifying text information of each text region one by one, and determining the text region corresponding to the text information as a reference text region when the text information is determined to be any key text information to be identified;
and determining the relative position relation between the target text area corresponding to each key text information to be identified and the reference text area according to the anchor template, determining each target text area according to the relative position relation, and identifying the text information in each target text area.
Further, the determining each text region in the image comprises:
performing binarization and morphological opening and closing operations on the image to obtain the connected regions in the image, and taking the minimum bounding rectangle of each connected region as a text region.
Further, the determining each text region in the image comprises:
determining an effective recognition area in the image according to the anchor template, retaining the text regions inside the effective recognition area, and filtering out the text regions outside it.
Further, identifying the text information of each text region one by one and, when the text information is determined to be any piece of key text information to be recognized, determining the corresponding text region as the reference text region comprises:
acquiring the text regions one by one in a preset recognition order and recognizing the text information of the currently acquired text region; if the text information is consistent with any piece of key text information to be recognized, determining the corresponding text region as the reference text region; otherwise, acquiring the next text region and determining whether it is the reference text region.
Further, after each text region in the image is determined and before the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized is determined according to the anchor template, the method further comprises:
gridding the text regions to obtain the relative position coordinate information of each text region in the image.
Further, the identifying text information in the respective target text regions comprises:
determining, according to the anchor template, the language of the text information in each target text region, and recognizing the text information in each target text region with the recognition model corresponding to that language.
Further, the method further comprises:
for the recognized text information in each target text region, determining its similarity to the corresponding target text information in the anchor template, and if the similarity is greater than a set similarity threshold, replacing the recognized text information with that target text information.
In another aspect, the present application provides an electronic bill text recognition apparatus, including:
a first determining module, configured to acquire an image of the electronic bill to be recognized, its category information, and the key text information to be recognized, and determine the corresponding anchor template according to the category information;
a second determining module, configured to determine each text region in the image, recognize the text information of each text region one by one, and, when the text information is determined to be any piece of key text information to be recognized, determine the corresponding text region as the reference text region; and
a recognition module, configured to determine, according to the anchor template, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized, determine each target text region according to this relationship, and recognize the text information in each target text region.
Further, the second determining module is specifically configured to perform binarization and morphological opening and closing operations on the image to obtain the connected regions in the image, and take the minimum bounding rectangle of each connected region as a text region.
Further, the second determining module is specifically configured to determine an effective recognition area in the image according to the anchor template, retain the text regions inside the effective recognition area, and filter out the text regions outside it.
Further, the second determining module is specifically configured to acquire the text regions one by one in a preset recognition order and recognize the text information of the currently acquired text region; if the text information is consistent with any piece of key text information to be recognized, determine the corresponding text region as the reference text region; otherwise, acquire the next text region and determine whether it is the reference text region.
Further, the second determining module is further configured to grid the text regions to obtain the relative position coordinate information of each text region in the image.
Further, the recognition module is specifically configured to determine, according to the anchor template, the language of the text information in each target text region, and recognize that text information with the recognition model corresponding to the language.
Further, the recognition module is further configured to determine, for the recognized text information in each target text region, its similarity to the corresponding target text information in the anchor template, and if the similarity is greater than a set similarity threshold, replace the recognized text information with that target text information.
In another aspect, the present application provides an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing any of the above method steps when executing a program stored in the memory.
In another aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of any of the above.
The present application provides a method comprising: acquiring an image of the electronic bill to be recognized, its category information, and the key text information to be recognized, and determining the corresponding anchor template according to the category information; determining each text region in the image, recognizing the text information of each text region one by one, and, when the text information is determined to be any piece of key text information to be recognized, determining the corresponding text region as the reference text region; and determining, according to the anchor template, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized, determining each target text region according to this relationship, and recognizing the text information in each target text region.
The technical scheme has the following advantages or beneficial effects:
The corresponding anchor template is determined according to the category information of the electronic bill to be recognized. After the reference text region corresponding to one piece of key text information to be recognized is located in the image, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized is determined according to the anchor template. Even if the target text regions move because of differences between users and mobile phone screen resolutions, this relative positional relationship is unchanged, so each target text region can be located accurately and only the text information in those regions is recognized, instead of the text information of every text region in the image. This improves the flexibility and efficiency of electronic bill text recognition.
Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an electronic bill text recognition process provided herein;
FIG. 2 is a schematic diagram of key text information to be recognized according to the present application;
FIG. 3 is a schematic view of an electronic bill image to be identified provided by the present application;
FIG. 4 is a schematic diagram of an electronic bill image after binarization provided by the present application;
FIG. 5 is a schematic diagram of an electronic bill image after the opening and closing operations provided by the present application;
FIG. 6 is a schematic diagram of text regions provided by the present application;
FIG. 7 is a schematic diagram of an effective recognition area and ineffective recognition areas provided by the present application;
FIG. 8 is a schematic diagram of gridding text coordinate positions provided by the present application;
FIG. 9 is a schematic diagram of the data contained in an anchor template provided by the present application;
FIG. 10 is a schematic diagram of an electronic bill text recognition result provided by the present application;
FIG. 11 is a schematic structural diagram of an electronic bill text recognition apparatus provided by the present application;
FIG. 12 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
To make the objectives and embodiments of the present application clearer, exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to the elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
Fig. 1 is a schematic diagram of a text recognition process of an electronic bill provided by the present application, where the process includes the following steps:
S101: Acquire an image of the electronic bill to be recognized, its category information, and the key text information to be recognized, and determine the corresponding anchor template according to the category information.
S102: Determine each text region in the image, recognize the text information of each text region one by one, and, when the text information is determined to be any piece of key text information to be recognized, determine the corresponding text region as the reference text region.
S103: Determine, according to the anchor template, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized, determine each target text region according to this relationship, and recognize the text information in each target text region.
The electronic bill text recognition method is applied to an electronic device, which may be a PC, a tablet computer or the like, or a scanner, a digital camera or the like.
For each category of electronic bill, the electronic device builds an anchor template for that category in advance. The anchor template is obtained by analyzing and summarizing offline electronic bill data.
When electronic bill text is to be recognized, the electronic device acquires the image of the electronic bill to be recognized, together with the bill's category information and the key text information to be recognized. The image may be captured by an image acquisition device and then sent to the electronic device, and the category information and key text information may be input into the electronic device by the user. The category information is, for example, a financial category or a medical category. The key text information to be recognized is, for example, "merchant number", "merchant name" or "payment amount". As shown in FIG. 2, the key text information to be recognized may be represented as key-value pairs: "merchant number" consists of the merchant number label (key) and the specific merchant number content (value), for example a number shown as ×; "merchant name" consists of the merchant name label (key) and the specific merchant name content (value); and "payment amount" consists of the payment amount label (key) and the specific payment amount content (value). The electronic device then determines the corresponding anchor template according to the acquired category information.
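The per-category template lookup and the key-value view of the key text information can be pictured with a small data structure. This is a minimal sketch under stated assumptions: the `AnchorTemplate` fields, the `TEMPLATES` registry and its contents, and the helper names are all hypothetical, since the text does not define a template schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AnchorTemplate:
    """Hypothetical anchor template record; the field names are illustrative only."""
    category: str
    key_texts: List[str]                                     # keys this bill type can carry
    valid_band: Tuple[float, float] = (0.10, 0.90)           # vertical band kept for recognition
    languages: Dict[str, str] = field(default_factory=dict)  # key -> language of its value


# One template per bill category, built offline (contents are made up for illustration).
TEMPLATES: Dict[str, AnchorTemplate] = {
    "financial": AnchorTemplate(
        "financial",
        ["merchant number", "merchant name", "payment amount"],
        languages={"merchant name": "zh"},
    ),
    "medical": AnchorTemplate("medical", ["hospital name", "total amount"]),
}


def select_template(category: str) -> AnchorTemplate:
    """Look up the anchor template for the bill category supplied with the image."""
    if category not in TEMPLATES:
        raise ValueError(f"no anchor template for category {category!r}")
    return TEMPLATES[category]


def empty_result(template: AnchorTemplate, wanted: List[str]) -> Dict[str, Optional[str]]:
    """Key-value skeleton: each requested key maps to a value that is not yet recognized."""
    unknown = [k for k in wanted if k not in template.key_texts]
    if unknown:
        raise ValueError(f"keys not covered by the template: {unknown}")
    return {key: None for key in wanted}
```

Keeping the template as plain data means a new bill format only needs a new registry entry rather than a new layout analysis model, which is the flexibility argument the text makes.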
The electronic device determines each text region in the image of the electronic bill to be recognized and then recognizes the text information of the text regions one by one in a preset recognition order, for example from left to right and from top to bottom. After the text information of the first text region is recognized, the device judges whether it is key text information to be recognized. If so, the one-by-one recognition stops; if not, the text information of the second text region is recognized in the preset order, and so on.
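The preset recognition order and the early stop can be sketched as follows. This assumes text regions are `(x, y, w, h)` pixel boxes and that `recognize` is a hypothetical callable wrapping the recognition model; the `row_tol` rule for deciding which boxes share a line is an assumption, because the text only names the order.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def reading_order(boxes: Iterable[Box], row_tol: int = 10) -> List[Box]:
    """Preset recognition order: top to bottom, and left to right within a line.

    Boxes whose top edges differ by at most row_tol pixels count as one line
    (the tolerance is an assumption).
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def find_reference(
    boxes: Iterable[Box],
    recognize: Callable[[Box], str],
    key_texts: Sequence[str],
) -> Optional[Tuple[Box, str]]:
    """Recognize regions one by one and stop at the first one whose text is a key text."""
    keys = set(key_texts)
    for box in reading_order(boxes):
        text = recognize(box).strip()
        if text in keys:
            return box, text  # reference text region found: no further regions are recognized
    return None
```

Because the loop returns on the first match, regions after the reference region are never passed to the recognizer, which is where the efficiency gain over recognizing every region comes from.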
When the text information is determined to be any piece of key text information to be recognized, the corresponding text region is determined as the reference text region. The anchor template records the relative positional relationships between text regions, so the electronic device determines, according to the anchor template, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized, determines each target text region from that relationship, and recognizes the text information in each target text region.
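One way to turn a stored relative positional relationship into target text regions is sketched below. The encoding is an assumption: each key's target region is stored as `(dx, dy, w, h)` in multiples of the reference region's height, so that a uniformly rescaled layout (for example, a different screen resolution) maps to the same template. `Rect` and `locate_targets` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Rect:
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float
    h: float


def locate_targets(
    reference: Rect, relations: Dict[str, Tuple[float, float, float, float]]
) -> Dict[str, Rect]:
    """Place each target text region relative to the reference text region.

    relations[key] = (dx, dy, w, h), all expressed in multiples of the reference
    region's height (a hypothetical encoding). The unit is measured on the current
    image, so the same template applies when the whole layout is scaled.
    """
    unit = reference.h
    return {
        key: Rect(reference.x + dx * unit, reference.y + dy * unit, w * unit, h * unit)
        for key, (dx, dy, w, h) in relations.items()
    }
```

This only holds when the layout scales uniformly; a template that must survive reflowed layouts would need a coarser relation such as grid rows and columns.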
In this application, determining each text region in the image comprises:
performing binarization and morphological opening and closing operations on the image to obtain the connected regions in the image, and taking the minimum bounding rectangle of each connected region as a text region.
FIG. 3 shows an electronic bill image to be recognized, which typically has a gray-white background. Threshold-segmentation binarization is applied to this image to obtain the binarized image shown in FIG. 4, and opening and closing operations are then applied to the binarized image to obtain the image shown in FIG. 5. The connected regions in the image of FIG. 5 are identified, the minimum bounding rectangle of each connected region is determined, and each rectangle is mapped back onto FIG. 3 to obtain the image shown in FIG. 6, in which the area inside each rectangular frame is a text region.
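The binarize, open/close, connected-region and bounding-rectangle chain can be shown in dependency-free Python on a small grayscale grid. This is an illustrative sketch only: in practice OpenCV's `cv2.threshold`, `cv2.morphologyEx` (with `cv2.MORPH_OPEN` / `cv2.MORPH_CLOSE`) and `cv2.connectedComponentsWithStats` would do the same work. The fixed threshold of 128, the square window of radius 1, 8-connectivity and axis-aligned rectangles (rather than rotated ones such as `cv2.minAreaRect` returns) are all assumptions.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, w, h)


def binarize(gray: Image, thresh: int = 128) -> Image:
    """Threshold segmentation: dark text pixels on a light background become 1."""
    return [[1 if p < thresh else 0 for p in row] for row in gray]


def _morph(img: Image, reducer: Callable[[Iterable[int]], int], r: int) -> Image:
    """min over a (2r+1) x (2r+1) window is erosion, max is dilation; outside pixels count as 0."""
    h, w = len(img), len(img[0])
    return [
        [
            reducer(
                img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in range(y - r, y + r + 1)
                for xx in range(x - r, x + r + 1)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def open_close(img: Image, r: int = 1) -> Image:
    opened = _morph(_morph(img, min, r), max, r)   # opening: erode, then dilate (drops specks)
    return _morph(_morph(opened, max, r), min, r)  # closing: dilate, then erode (joins glyphs)


def text_regions(gray: Image, thresh: int = 128, r: int = 1) -> List[Box]:
    """Bounding rectangles of the 8-connected regions left after opening and closing."""
    img = open_close(binarize(gray, thresh), r)
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, bottom, left, right = y0, y0, x0, x0
            while queue:  # breadth-first flood fill of one connected region
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes
```

Opening first removes isolated noise pixels, and closing then bridges the narrow gaps between neighbouring characters, so a word or field ends up as one connected region and therefore one text region.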
To improve the efficiency of electronic bill text recognition, determining each text region in the image comprises:
determining an effective recognition area in the image according to the anchor template, retaining the text regions inside the effective recognition area, and filtering out the text regions outside it.
The anchor template stores the effective recognition area and the ineffective recognition areas. Optionally, as shown in FIG. 7, the band from 10% to 90% of the image height, measured from the top, is the effective recognition area, and the top 10% and bottom 10% of the image are ineffective recognition areas. The effective recognition area in the image is determined according to the anchor template; the text regions inside it are retained for the subsequent text recognition process, and the text regions outside it are filtered out. That is, no text recognition is performed on text regions in the ineffective recognition areas, which improves the efficiency of electronic bill text recognition.
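Filtering by the effective recognition area reduces to a containment test. The sketch below assumes `(x, y, w, h)` pixel boxes and that a region is kept only when it lies entirely inside the band; the text does not say how regions straddling the boundary are treated, so that rule, and the name `filter_valid`, are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def filter_valid(
    boxes: List[Box], image_height: int, band: Tuple[float, float] = (0.10, 0.90)
) -> List[Box]:
    """Keep only text regions lying entirely inside the effective recognition band.

    band gives the top and bottom of the effective area as fractions of the image
    height; the default mirrors the optional 10%-90% split.
    """
    top, bottom = band[0] * image_height, band[1] * image_height
    return [b for b in boxes if b[1] >= top and b[1] + b[3] <= bottom]
```

Storing the band as fractions rather than pixels lets the same template apply to screenshots of different resolutions.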
In this application, identifying the text information of each text region one by one and, when the text information is determined to be any piece of key text information to be recognized, determining the corresponding text region as the reference text region comprises:
acquiring the text regions one by one in a preset recognition order and recognizing the text information of the currently acquired text region; if the text information is consistent with any piece of key text information to be recognized, determining the corresponding text region as the reference text region; otherwise, acquiring the next text region and determining whether it is the reference text region.
The text regions are acquired one by one in the preset recognition order, and the text information of the currently acquired text region is recognized. The electronic device stores a pre-trained recognition model; it inputs the image of the currently acquired text region into the model to recognize the region's text information. The text information is then matched against the acquired key text information to be recognized to judge whether it is consistent with any piece of it. If so, the corresponding text region is determined as the reference text region; if not, the next text region is acquired and judged in the same way.
Once the reference text region is determined, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized is determined according to the anchor template, each target text region is determined according to this relationship, and the text information in each target text region is recognized.
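The text does not define when recognized text is "consistent" with a key text. A minimal sketch, assuming consistency means equality after light normalization: Unicode NFKC folding (which turns a full-width "：" into ":"), removal of whitespace, case folding, and dropping a trailing colon. These rules and the function names are hypothetical.

```python
import unicodedata
from typing import Iterable, Optional


def normalize_label(text: str) -> str:
    """Canonical form used to compare a recognized label with a key text (assumed rules)."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "：" becomes ":"
    text = "".join(text.split()).casefold()      # ignore whitespace and letter case
    return text.rstrip(":")                      # ignore a trailing colon after the label


def match_key(recognized: str, key_texts: Iterable[str]) -> Optional[str]:
    """Return the key text the recognized text is consistent with, or None."""
    wanted = {normalize_label(k): k for k in key_texts}
    return wanted.get(normalize_label(recognized))
```

For example, `match_key(" 商户名称： ", ["商户名称"])` returns `"商户名称"` even though the recognized label carries padding and a full-width colon.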
To determine each target text region, in this application, after each text region in the image is determined and before the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized is determined according to the anchor template, the method further comprises:
gridding the text regions to obtain the relative position coordinate information of each text region in the image.
Gridding the text regions yields a grid list made up of arrays and the elements within them: each row of the grid list is an array, and each cell of the grid list is an element of that array. The relationship between adjacent arrays represents the vertical coordinate relationship between text regions, and the relationship between adjacent elements within an array represents their horizontal coordinate relationship. The result is a two-dimensional array carrying the spatial information of the text, from which the relative position coordinate information of each text region in the image is determined.
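A grid list of this kind can be built by grouping boxes into rows and sorting each row left to right. This is a minimal sketch assuming `(x, y, w, h)` boxes; the row-grouping rule (vertical centers within `row_tol` of the first box's height), and the names `grid_list` and `cell_at`, are assumptions rather than the method fixed by the text.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def grid_list(
    boxes: List[Box], row_tol: float = 0.5
) -> Tuple[List[List[Box]], Dict[Box, Tuple[int, int]]]:
    """Group text regions into a grid list: one array per row, elements sorted left to right.

    A box joins the current row when its vertical center is within row_tol times
    the height of the row's first box (an assumed tolerance rule).
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        cy = box[1] + box[3] / 2
        if rows:
            first = rows[-1][0]
            if abs(cy - (first[1] + first[3] / 2)) <= row_tol * first[3]:
                rows[-1].append(box)
                continue
        rows.append([box])
    grid = [sorted(row, key=lambda b: b[0]) for row in rows]
    # (row index, column index) serves as each region's relative position coordinate
    coords = {box: (r, c) for r, row in enumerate(grid) for c, box in enumerate(row)}
    return grid, coords


def cell_at(grid: List[List[Box]], rc: Tuple[int, int], d_row: int, d_col: int) -> Optional[Box]:
    """Region offset by (d_row, d_col) grid cells from the region at rc, if any."""
    r, c = rc[0] + d_row, rc[1] + d_col
    if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
        return grid[r][c]
    return None
```

With grid coordinates, a relation such as "the value sits one column to the right of its label" stays valid regardless of pixel resolution.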
To recognize the text information in each target text region more accurately, in the present application, recognizing the text information in each target text region comprises:
determining, according to the anchor template, the language of the text information in each target text region, and recognizing the text information in each target text region with the recognition model corresponding to that language.
The anchor template stores the language of the text information in each target text region; for example, the text in target text region A is Chinese, the text in target text region B is English, and the text in target text region C is French. The electronic device stores a separate recognition model trained for each language: the Chinese model is trained on sample text regions and the Chinese text labeled in them, the English model on sample text regions and the English text labeled in them, and the French model on sample text regions and the French text labeled in them.
Recognizing each target text region with the model that matches the language recorded for it in the anchor template improves both the accuracy and the efficiency of text recognition.
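The per-language model selection can be sketched as a small registry keyed by language tag. This is a hedged illustration: `ModelRegistry`, `recognize_targets`, the language tags `"zh"`, `"en"`, `"fr"` and the recognizer signature (cropped region image as raw bytes, returning text) are all hypothetical; the source does not specify how models are stored or invoked.

```python
from typing import Callable, Dict

# Hypothetical recognizer signature: cropped region image (raw bytes) -> text.
Recognizer = Callable[[bytes], str]


class ModelRegistry:
    """Holds one recognition model per language tag (e.g. "zh", "en", "fr")."""

    def __init__(self) -> None:
        self._models: Dict[str, Recognizer] = {}

    def register(self, language: str, model: Recognizer) -> None:
        self._models[language] = model

    def recognize(self, language: str, region_image: bytes) -> str:
        model = self._models.get(language)
        if model is None:
            raise ValueError(f"no recognition model registered for {language!r}")
        return model(region_image)


def recognize_targets(
    registry: ModelRegistry,
    region_languages: Dict[str, str],  # target region name -> language from the template
    region_images: Dict[str, bytes],   # target region name -> cropped image
) -> Dict[str, str]:
    """Recognize every target region with the model for its template language."""
    return {
        name: registry.recognize(region_languages[name], image)
        for name, image in region_images.items()
    }
```

Raising an error for an unregistered language makes a template/model mismatch visible instead of silently falling back to a wrong-language model.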
In the present application, the method further includes:
determining, for the recognized text information in each target text region, the similarity between that text information and the corresponding target text information in the anchor template, and, if the similarity is greater than a set similarity threshold, replacing the text information with the corresponding target text information.
The similarity is computed from the Levenshtein distance, which allows the recognized text to be fuzzy-matched against the corresponding target text in the anchor template and thereby corrects occasional single-character recognition errors. Concretely, the Levenshtein distance between the two strings is converted into a similarity value: a similarity of 1 means the two strings are identical, and a similarity of 0.5 means they are half the same. For example, if the target text information is the four-character label "agency code" and the recognized text matches it in only three of those four characters, the similarity is 3/4 = 0.75, and the recognized text is updated to the target text "agency code".
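A minimal sketch of the fuzzy-matching correction. The source does not state how the distance is normalized or what threshold is used; this sketch assumes similarity = 1 − distance / (length of the longer string), which reproduces the 3/4 = 0.75 of the example for one misrecognized character in a four-character label, and a hypothetical threshold of 0.7. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(recognized: str, target: str) -> float:
    """1 - distance / longer length: 1.0 means identical, 0.5 means half the same."""
    longest = max(len(recognized), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(recognized, target) / longest


def correct(recognized: str, target: str, threshold: float = 0.7) -> str:
    """Replace the recognized text with the template text if it is similar enough."""
    if similarity(recognized, target) > threshold:
        return target
    return recognized  # not above threshold: keep as-is (caller may treat as not found)
```

For example, `correct("ABCX", "ABCD")` has similarity 0.75 and returns `"ABCD"`, while a completely different string is left unchanged.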
The following describes the process of text recognition of an electronic bill in detail with reference to the accompanying drawings.
First, an anchor template corresponding to the electronic bill is prepared by analyzing and summarizing offline electronic bill data. The anchor template includes the following elements:
1. Core recognition area: the effective recognition area of the original electronic bill image, used to discard text regions that do not need to be recognized, together with a forward/reverse recognition-order flag.
2. The content of each key text, such as "acquirer" and "merchant ticket number".
3. The theoretical relative position in the image between each key text region and the next one, such as "next row, second cell" or "next row, first cell".
4. The theoretical relative position between each key and its value, such as "current row, second cell" or "next row, second cell".
5. The theoretical relative position between parts of the same value, such as "next row, first cell"; this applies when a single value wraps across several rows.
6. The type of recognition model, such as a Chinese model or an English/numeric model.
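The six template elements listed above could be held in a simple data structure such as the following. This is a hedged sketch, not the format used by the applicant: the class and field names are hypothetical, and positions like "next row, second cell" are assumed to be encoded as `(row delta, 0-based cell index)`, so that example becomes `(1, 1)`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical encoding of "next row, second cell" etc.:
# (row delta relative to the current key, 0-based cell index in that row).
Offset = Tuple[int, int]


@dataclass
class KeyEntry:
    key: str                                      # element 2: key text content
    value_offset: Offset                          # element 4: key -> value position
    next_key_offset: Optional[Offset] = None      # element 3: key -> next key position
    continuation_offset: Optional[Offset] = None  # element 5: wrapped value lines
    model: str = "zh"                             # element 6: recognition model type


@dataclass
class AnchorTemplate:
    bill_type: str
    effective_area: Tuple[int, int, int, int]     # element 1: (x0, y0, x1, y1)
    reverse_order: bool = False                   # element 1: forward/reverse flag
    keys: List[KeyEntry] = field(default_factory=list)

    def key_names(self) -> List[str]:
        return [entry.key for entry in self.keys]


if __name__ == "__main__":
    template = AnchorTemplate(
        bill_type="pos_receipt",
        effective_area=(0, 120, 1080, 1900),
        keys=[
            KeyEntry("Acquirer", value_offset=(0, 1), next_key_offset=(1, 0)),
            KeyEntry("Merchant ticket no.", value_offset=(0, 1), model="en_num"),
        ],
    )
    print(template.key_names())
```

The order of `keys` doubles as the recognition route, so the first entry is the starting key for the anchor search.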
The present application thus provides an anchor-template-based method for recognizing electronic bill text. For electronic bills without complex layouts, the text information in the target text region corresponding to each piece of key text information to be recognized is found by searching the image under the guidance of the anchor template. Because the absolute positions of text regions are converted into relative positions between text regions, locating the target text regions no longer depends on dividing the image into fixed-position areas. This handles layout variations in bills of the same type caused by different users' phones and screen resolutions and keeps recognition compatible when text positions shift. Different anchor templates can also be configured for different bill types, which improves extensibility. In addition, a single anchor template can assign recognition models for different languages to the target text regions of one bill type, which makes multilingual recognition feasible.
In the present application, a general-purpose text detection model can be used to detect text positions in the electronic bill and obtain the position information of all text regions, which is then sorted from top to bottom and, within a row, from left to right.
Because the information in an electronic bill is aligned on the page, a gridding step is introduced: a grid list is built from the sorted text region positions. The grid list consists of arrays and the elements within them; adjacent arrays represent the vertical relationship between text regions, and adjacent elements within an array represent their horizontal relationship. The result is a two-dimensional array that encodes the spatial layout of the text.
Text regions at irrelevant positions are then removed from the gridded text list using the effective recognition area of the original image. Fig. 8 is a schematic diagram of gridding text coordinate positions provided in the present application.
Starting from the first key text to be recognized in the anchor template and its recognition model type, the gridded text regions are recognized in the order of the grid list with the corresponding model until the first reference text region is found; that region is taken as the anchor and the traversal stops.
The text region holding the value is then located directly from the key-value relative position and the model type recorded in the anchor template, and it is recognized with the corresponding model. Once one key-value pair has been obtained, fixed-point recognition of the next key is performed at the relative position stored with the previous key, and so on until all key-value pairs have been found.
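The key-to-value and key-to-next-key navigation can be sketched over the grid as follows. For brevity this hedged sketch assumes the grid already holds recognized strings; in the method described above each cell would instead be recognized on demand, at that fixed point, with the model type from the template. Offsets use a hypothetical `(row delta, 0-based cell index)` encoding, and `cell_at` and `walk_template` are illustrative names. The walk simply stops at the first key that is not where the template expects it.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[str]]      # gridded text, one inner list per row
Offset = Tuple[int, int]    # (row delta, 0-based cell index), assumed encoding
Entry = Tuple[str, Offset, Optional[Offset]]  # (key, value_offset, next_key_offset)


def cell_at(grid: Grid, row: int, offset: Offset) -> Optional[Tuple[int, str]]:
    """Return (row index, text) of the cell at `offset` from `row`, if it exists."""
    r, c = row + offset[0], offset[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
        return r, grid[r][c]
    return None


def walk_template(grid: Grid, anchor_row: int, entries: Sequence[Entry]) -> Dict[str, str]:
    """Follow key -> value and key -> next key offsets starting at the anchor row."""
    results: Dict[str, str] = {}
    row = anchor_row
    for i, (key, value_offset, next_key_offset) in enumerate(entries):
        value = cell_at(grid, row, value_offset)
        if value is not None:
            results[key] = value[1]
        if next_key_offset is None or i + 1 == len(entries):
            break
        nxt = cell_at(grid, row, next_key_offset)
        if nxt is None or nxt[1] != entries[i + 1][0]:
            break  # next key is not where the template expects it
        row = nxt[0]
    return results
```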
Text regions that do not need to be recognized, namely those outside the effective recognition area and those that do not meet the conditions, are removed from the sorted text region coordinates. Text regions whose vertical coordinates are close are grouped into one array, in which the texts are written from left to right by horizontal coordinate; text regions at different vertical positions go into separate arrays. This yields a two-dimensional array of the relative positions of the gridded texts. Regions that do not meet the conditions include, for example, the phone status bar at the top of a screenshot of the electronic bill, or an advertisement embedded in the top area of the bill. Screening the text regions with the effective recognition area reduces the number of useless regions and makes the gridded data more reliable.
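Screening by the effective recognition area can be sketched as a simple containment test. The box and area formats and the function names are hypothetical, and so are the example coordinates in the test (a status bar above the area, a bill field inside it, a footer running past it). Full containment is one design choice; keeping a box whose centre falls inside the area would tolerate regions that straddle the boundary.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height)
Area = Tuple[int, int, int, int]  # (x0, y0, x1, y1) effective recognition area


def inside(box: Box, area: Area) -> bool:
    """True if the box lies entirely within the area."""
    x, y, w, h = box
    x0, y0, x1, y1 = area
    return x >= x0 and y >= y0 and x + w <= x1 and y + h <= y1


def filter_regions(boxes: Sequence[Box], area: Area) -> List[Box]:
    """Keep only text regions inside the effective recognition area."""
    return [box for box in boxes if inside(box, area)]
```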
A reasonable optimal recognition route is obtained by recognizing, counting and analyzing a large amount of offline electronic bill data, and this order is written into the anchor template in a fixed, rule-based format. Fig. 9 is a schematic diagram of the data contained in the anchor template provided in the present application. The nomenclature used in Fig. 9 is as follows:
[Table: nomenclature used in Fig. 9]
The electronic device imports and parses the anchor template and, taking Key1 as the starting point, looks up the corresponding Value1 from the anchor. Once Key1-Value1 has been found, the next key, Key2, is recognized at the position indicated in the Key1 entry, and so on.
If Key2 cannot be found from Key1 at the position given by the anchor template, the row above Key2's expected position in the grid is searched. If Key2 is still not found, it is marked as not found, and the search returns to Key1 and looks for Key3 row by row through the grid, and so on.
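The fallback rule can be sketched as below. This reflects one reading of the paragraph above, not a confirmed implementation: the expected cell is checked first, then the same cell one row higher; if the key is still missing it is recorded as `None`, and the following key is searched row by row starting from the row of the last key that was found. The grid is assumed to hold recognized strings, offsets use a hypothetical `(row delta, 0-based cell index)` encoding, and all function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[str]]
Offset = Tuple[int, int]  # (row delta, 0-based cell index), assumed encoding


def locate_next_key(grid: Grid, key_row: int, offset: Offset, key: str) -> Optional[int]:
    """Check the expected cell, then the same cell one row above it."""
    expected_row, col = key_row + offset[0], offset[1]
    for r in (expected_row, expected_row - 1):
        if 0 <= r < len(grid) and 0 <= col < len(grid[r]) and grid[r][col] == key:
            return r
    return None


def scan_for_key(grid: Grid, from_row: int, key: str) -> Optional[int]:
    """Search row by row, starting at `from_row`, for a cell equal to `key`."""
    for r in range(from_row, len(grid)):
        if key in grid[r]:
            return r
    return None


def find_key_rows(
    grid: Grid, anchor_row: int, keys: Sequence[Tuple[str, Optional[Offset]]]
) -> Dict[str, Optional[int]]:
    """keys[0] is the anchor key; keys[i][1] is its offset from keys[i - 1]."""
    rows: Dict[str, Optional[int]] = {keys[0][0]: anchor_row}
    last_row = anchor_row  # row of the most recent key actually found
    previous_found = True
    for key, offset in keys[1:]:
        if previous_found and offset is not None:
            row = locate_next_key(grid, last_row, offset, key)
        else:
            row = scan_for_key(grid, last_row, key)  # previous key missing: scan rows
        rows[key] = row
        previous_found = row is not None
        if row is not None:
            last_row = row
    return rows
```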
After the text information in each target text region has been determined, the similarity between each recognized text and the corresponding target text information in the anchor template is computed. If the similarity is greater than the set similarity threshold, the recognized text is updated to the corresponding target text; otherwise, the text information is deemed not found. Fig. 10 is a schematic diagram of a text recognition result of an electronic bill provided in the present application.
Fig. 11 is a schematic structural diagram of an electronic bill text recognition device provided in the present application, where the device includes:
the first determining module 111 is configured to obtain an electronic bill image to be identified, category information, and key text information to be identified, and determine a corresponding anchor point template according to the category information;
a second determining module 112, configured to determine each text region in the image, recognize text information of each text region one by one, and when it is determined that the text information is any key text information to be recognized, determine a text region corresponding to the text information as a reference text region;
the identifying module 113 is configured to determine, according to the anchor template, a relative position relationship between a target text region corresponding to each piece of key text information to be identified and the reference text region, determine each target text region according to the relative position relationship, and identify text information in each target text region.
The second determining module 112 is specifically configured to binarize the image and apply morphological opening and closing operations to it to obtain the connected regions in the image, and to take the minimum bounding rectangle of each connected region as a text region.
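The detection pipeline (binarization, opening and closing, connected regions, bounding rectangles) can be sketched in pure Python so that it runs without extra dependencies. The assumptions here are hypothetical, since the source does not specify them: dark text on a light background, a fixed threshold of 128, a 3x3 square structuring element, closing applied before opening, 4-connectivity, and axis-aligned rectangles. In practice OpenCV's `cv2.threshold`, `cv2.morphologyEx` and `cv2.connectedComponentsWithStats` cover the same steps, `cv2.minAreaRect` gives rotated minimum-area rectangles, and a wider horizontal kernel would merge the characters of one text line.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) .. 255 (white)
Mask = List[List[int]]   # 1 = foreground (text), 0 = background
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def binarize(img: Image, threshold: int = 128) -> Mask:
    """Dark text on a light background becomes foreground (1)."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def _morph(mask: Mask, reduce_fn: Callable[[Iterable[int]], int]) -> Mask:
    """Apply max (dilation) or min (erosion) over each pixel's 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    return [
        [
            reduce_fn(
                mask[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def closing(mask: Mask) -> Mask:
    return _morph(_morph(mask, max), min)  # dilate then erode: bridges small gaps


def opening(mask: Mask) -> Mask:
    return _morph(_morph(mask, min), max)  # erode then dilate: removes specks


def bounding_boxes(mask: Mask) -> List[Box]:
    """Axis-aligned bounding box of each 4-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            y0 = y1 = y
            x0 = x1 = x
            while queue:
                cy, cx = queue.popleft()
                y0, y1 = min(y0, cy), max(y1, cy)
                x0, x1 = min(x0, cx), max(x1, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def detect_text_regions(img: Image, threshold: int = 128) -> List[Box]:
    """Binarize, close, open, then return one box per connected region."""
    return bounding_boxes(opening(closing(binarize(img, threshold))))
```

With this order, closing joins strokes separated by a one-pixel gap into a single region, and the subsequent opening drops isolated noise pixels.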
The second determining module 112 is specifically configured to determine an effective recognition area in the image according to the anchor template, reserve a text area in the effective recognition area, and filter out a text area outside the effective recognition area.
The second determining module 112 is specifically configured to acquire text regions one by one in a preset recognition order and recognize the text information of the currently acquired text region; if the text information matches any key text information to be identified, the text region corresponding to it is determined as the reference text region, and if it matches none of them, the next text region is acquired and checked in the same way.
The second determining module 112 is further configured to grid the text regions to obtain the relative position coordinate information of each text region in the image.
The identification module 113 is specifically configured to determine language information of the text information in each target text region according to the anchor template, and select an identification model corresponding to the language to identify the text information in the target text region according to the language information of the text information in each target text region.
The identification module 113 is further configured to determine, for the text information in each identified target text region, a similarity between the text information and the corresponding target text information in the anchor template, and if the similarity is greater than a set similarity threshold, update the text information with the corresponding target text information.
The present application also provides an electronic device, as shown in Fig. 12, including a processor 301, a communication interface 302, a memory 303 and a communication bus 304, where the processor 301, the communication interface 302 and the memory 303 communicate with one another through the communication bus 304;
the memory 303 has stored therein a computer program which, when executed by the processor 301, causes the processor 301 to perform any of the above method steps.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 302 is used for communication between the above-described electronic apparatus and other apparatuses.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The present application further provides a computer-readable storage medium having stored therein a computer program executable by an electronic device, the program, when run on the electronic device, causing the electronic device to perform any of the above method steps.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An electronic bill text recognition method, characterized in that the method comprises:
acquiring an electronic bill image to be identified, category information and key text information to be identified, and determining a corresponding anchor point template according to the category information;
determining each text region in the image, identifying text information of each text region one by one, and determining the text region corresponding to the text information as a reference text region when the text information is determined to be any key text information to be identified;
determining, according to the anchor template, the relative positional relationship between the reference text region and the target text region corresponding to each piece of key text information to be recognized, determining each target text region according to the relative positional relationship, and recognizing the text information in each target text region.
2. The method of claim 1, wherein the determining each text region in the image comprises:
carrying out binarization processing and opening and closing operation processing on the image to obtain each connected region in the image; and taking the minimum bounding rectangle of each connected region as each text region.
3. The method of claim 1 or 2, wherein the determining each text region in the image comprises:
determining an effective recognition area in the image according to the anchor template, retaining the text regions inside the effective recognition area, and filtering out the text regions outside it.
4. The method according to claim 1, wherein the identifying the text information of each text region one by one, and when the text information is determined to be any one of the key text information to be identified, determining the text region corresponding to the text information as a reference text region comprises:
acquiring text regions one by one in a preset recognition order, recognizing the text information of the currently acquired text region, determining the text region corresponding to the text information as the reference text region if the text information matches any key text information to be identified, and, if the text information matches none of the key text information to be identified, acquiring the next text region and judging whether it is the reference text region.
5. The method of claim 1, wherein after determining each text region in the image, before determining a relative positional relationship between a target text region corresponding to each key text information to be recognized and the reference text region according to the anchor template, the method further comprises:
gridding each text region to obtain the relative position coordinate information of each text region in the image.
6. The method of claim 1, wherein the identifying text information in the respective target text regions comprises:
determining language information of the text information in each target text region according to the anchor template, and selecting a recognition model corresponding to the language to recognize the text information in the target text region according to the language information of the text information in each target text region.
7. The method of claim 1, wherein the method further comprises:
determining, for the recognized text information in each target text region, the similarity between the text information and the corresponding target text information in the anchor template, and, if the similarity is greater than a set similarity threshold, updating the text information with the corresponding target text information.
8. An electronic bill text recognition apparatus, comprising:
the first determining module is used for acquiring an electronic bill image to be identified, category information and key text information to be identified, and determining a corresponding anchor point template according to the category information;
the second determining module is used for determining each text region in the image, identifying the text information of each text region one by one, and determining the text region corresponding to the text information as a reference text region when the text information is determined to be any key text information to be identified;
and the identification module is used for determining the relative position relation between the target text region corresponding to each piece of key text information to be identified and the reference text region according to the anchor template, determining each target text region according to the relative position relation, and identifying the text information in each target text region.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202211051283.3A 2022-08-30 2022-08-30 Electronic bill text recognition method and device, electronic equipment and storage medium Pending CN115439853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211051283.3A CN115439853A (en) 2022-08-30 2022-08-30 Electronic bill text recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211051283.3A CN115439853A (en) 2022-08-30 2022-08-30 Electronic bill text recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115439853A (en) 2022-12-06

Family

ID=84245457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211051283.3A Pending CN115439853A (en) 2022-08-30 2022-08-30 Electronic bill text recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115439853A (en)

Similar Documents

Publication Publication Date Title
JP6831480B2 (en) Text detection analysis methods, equipment and devices
CN111931664B (en) Mixed-pasting bill image processing method and device, computer equipment and storage medium
CN109726643B (en) Method and device for identifying table information in image, electronic equipment and storage medium
CN109685055B (en) Method and device for detecting text area in image
JP4676225B2 (en) Method and apparatus for capturing electronic forms from scanned documents
CN110909725A (en) Method, device and equipment for recognizing text and storage medium
CN111046784A (en) Document layout analysis and identification method and device, electronic equipment and storage medium
CN112597773B (en) Document structuring method, system, terminal and medium
CN111353491A (en) Character direction determining method, device, equipment and storage medium
WO2020071558A1 (en) Business form layout analysis device, and analysis program and analysis method therefor
CN111753120A (en) Method and device for searching questions, electronic equipment and storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN111914729A (en) Voucher association method and device, computer equipment and storage medium
CN115082659A (en) Image annotation method and device, electronic equipment and storage medium
CN112464927B (en) Information extraction method, device and system
CN114092948A (en) Bill identification method, device, equipment and storage medium
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN111797772A (en) Automatic invoice image classification method, system and device
CN116030469A (en) Processing method, processing device, processing equipment and computer readable storage medium
CN115880702A (en) Data processing method, device, equipment, program product and storage medium
CN113420116B (en) Medical document analysis method, device, equipment and medium
CN115546815A (en) Table identification method, device, equipment and storage medium
CN115439853A (en) Electronic bill text recognition method and device, electronic equipment and storage medium
US11335108B2 (en) System and method to recognise characters from an image
CN115063784A (en) Bill image information extraction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination