CN114202766A - Method and device for extracting text field and electronic equipment

Method and device for extracting text field and electronic equipment

Info

Publication number
CN114202766A
Authority
CN
China
Prior art keywords
image
text field
target
text
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111428606.1A
Other languages
Chinese (zh)
Inventor
徐书豪
金洪亮
闫凯
梅俊辉
王志刚
林文辉
高洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN202111428606.1A priority Critical patent/CN114202766A/en
Publication of CN114202766A publication Critical patent/CN114202766A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)

Abstract

The application provides a method for extracting a text field: an original image containing text fields is acquired and each target image area in the original image is determined; text recognition is then performed on the text fields in each target image area to obtain the text field corresponding to each area; finally, the target text fields meeting the business requirements are extracted from the recognized text fields according to preset extraction rules. Based on this method, target text fields are extracted from tax completion certificate images, the problem that the prior art cannot extract the target text fields meeting business requirements from such images is solved, and the accuracy of target text field extraction is effectively improved.

Description

Method and device for extracting text field and electronic equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a method, an apparatus, and an electronic device for extracting a text field.
Background
With the development of image processing technology, text fields in certificate images such as tax completion certificates can be recognized by techniques such as HOG (Histogram of Oriented Gradients) and LBP (Local Binary Patterns). However, these methods extract all text fields in a tax completion certificate image; they cannot extract only the target text fields that satisfy the business requirements.
Disclosure of Invention
The application provides a method, an apparatus, and an electronic device for extracting a text field. They are used to extract target text fields from tax completion certificate images, solve the problem that the prior art cannot extract the target text fields meeting business requirements from such images, and effectively improve the accuracy of target text field extraction.
In a first aspect, the present application provides a method for extracting a text field, the method comprising:
acquiring an original image containing a text field, and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
performing text recognition on the text fields in the target image areas to obtain the text fields corresponding to the target image areas;
and extracting the target text fields meeting the business requirements from the text fields according to preset extraction rules.
In one possible design, obtaining an original image containing a text field includes: acquiring an image to be processed that contains a text field; rotating the image to be processed by preset angles to obtain N rotated images with different rotation angles, where N is a positive integer greater than or equal to 1; projecting the text field of each rotated image in a given direction and superposing the projections to obtain a projection value for each rotated image, thereby determining the N projection values corresponding to the N rotated images; and selecting the rotated image corresponding to the minimum projection value as the original image.
In one possible design, after obtaining the original image containing the text field, the method further includes: dividing the original image into a plurality of image blocks; calculating the Euclidean distance between pairs of image blocks, and determining pairs whose Euclidean distance is smaller than a preset threshold as similar image blocks; identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image; and denoising each similar area to obtain the denoised original image.
In one possible design, determining each target image area in the original image includes: extracting the image features of the original image based on a target detection model, and then determining each target image area in the original image according to the image features.
In one possible design, extracting a target text field satisfying a business requirement from the text fields according to a preset extraction rule includes: obtaining the association relation between target text fields and text fields from a preset database, and extracting the target text fields meeting the business requirements from the text fields in the original image according to this association relation.
In one possible design, after extracting a target text field satisfying a business requirement from the text field, the method further includes: and sending the target text field to a front-end display interface for display.
By the above method, the target text fields can be extracted, solving the problem that the prior art cannot extract the target text fields meeting business requirements from tax completion certificate images, and the following technical effects can be achieved:
1. through image preprocessing of the original image, based on angle correction and noise filtering, the original image is corrected to a set state and the noise pixels in it are filtered out, restoring the image information to the greatest extent and helping to improve the accuracy of determining each target image area in the original image;
2. through image detection of the original image, a target detection model that casts detection as a regression task realizes end-to-end rapid text field detection: lightweight feature computation through small convolution kernels and text box prediction through a fully connected layer quickly detect the target image areas in the image;
3. through text recognition, a CTC model is introduced to combine pixel information with the time-sequence relation, aligning the character content in the image; through the one-to-one "image -> text" conversion, the text field corresponding to each target image area is recognized and the recognition accuracy is improved;
4. through the preset extraction rules, based on the processing of target field retrieval, line text alignment, and multi-line text merging, a coordinate analysis method is introduced to ensure the accuracy of the output target text fields.
In a second aspect, the present application provides an apparatus for extracting a text field, the apparatus comprising:
a module for determining a target image area, which is used for acquiring an original image containing a text field and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
a text field identification module for performing text identification on the text field in each target image area to obtain the text field corresponding to each target image area;
and the target text field extraction module is used for extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
In one possible design, the target image area determining module is specifically configured to obtain an image to be processed including a text field; rotating the image to be processed according to a preset angle to obtain N rotating images with different rotating angles corresponding to the image to be processed, wherein N is a positive integer greater than or equal to 1; projecting the text field in the rotating image in a given direction, superposing the projections of the rotating image in the given direction to obtain the projection values of the rotating image, and determining N projection values corresponding to the N rotating images; and selecting a rotating image corresponding to the minimum projection value from the N projection values, and taking the rotating image as an original image.
In one possible design, the target image area determining module is specifically configured to divide the original image into a plurality of image blocks, where the image blocks represent a partial image of the original image; calculating a Euclidean distance between two image blocks, and determining the two image blocks with the Euclidean distance smaller than a preset threshold value as similar image blocks; identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image; and denoising each similar region in the original image to obtain the denoised original image.
In one possible design, the module for determining a target image region is specifically configured to extract image features in the original image based on a target detection model; and determining each target image area in the original image according to the image characteristics.
In one possible design, the target text field extraction module is specifically configured to obtain an association relationship between a target text field and a text field based on a preset database; and extracting a target text field meeting the service requirement from the text fields in the original image according to the incidence relation.
In one possible design, the target text field extraction module is further configured to send the target text field to a front-end display interface for display after the target text field is extracted.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above method for extracting a text field when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the above-mentioned method steps of extracting a text field.
For the second to fourth aspects and their possible technical effects, please refer to the description above of the first aspect and of its possible solutions; the description is not repeated here.
Drawings
FIG. 1 is a flow chart of a method for extracting text fields provided herein;
FIG. 2 is a schematic diagram of a projection-based angle correction algorithm provided herein;
FIG. 3 is a schematic view of an angle correction provided herein;
FIG. 4 is a schematic diagram of a network architecture of a YOLO model provided in the present application;
FIG. 5 is a schematic illustration of target detection provided herein;
FIG. 6 is a diagram illustrating extraction of a target text field according to the present application;
FIG. 7 is a diagram illustrating an output target text field provided herein;
FIG. 8 is a diagram illustrating an apparatus for extracting text fields according to the present disclosure;
fig. 9 is a schematic diagram of a structure of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus or system embodiments. It should be noted that in the description of the present application, "a plurality" is understood as "at least two". "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The embodiments of the application provide a method, an apparatus, and an electronic device for extracting a text field, solving the problem that the prior art cannot extract target text fields meeting business requirements from tax completion certificate images.
It is to be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit the present application; the embodiments and the features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, an embodiment of the present application provides a method for extracting a text field, which includes the following specific processes:
step 101: acquiring an original image containing a text field, and determining each target image area in the original image;
after the original image is acquired, each target image area in it can be determined by performing image preprocessing and image detection on the original image, where the image preprocessing may include angle correction and/or noise filtering of the original image, and a target image area is an area of the original image that contains a text field.
It should be noted that the original image is not limited to tax completion certificate images; it may be any other certificate image or any image containing text fields. The tax completion certificate image is taken as the example in the description below.
The specific procedures of angle correction (S1), noise filtering (S2), and image detection (S3) are described below.
S1. Angle correction:
the angular correction of the original image can be implemented based on a projected angular correction algorithm. Such as Radon transform algorithm, that is, finding the angle at the maximum projection value by projection superposition in a given direction, thereby determining the tilt angle of the original image.
Specifically, the projection-based angle correction algorithm exploits the property that the projection of the image is longest along the normal direction and shortest along the horizontal direction. If the original image is represented as a binary function f(x, y), its projection in a given direction can be expressed as the line integral of f(x, y) along that direction. Referring to fig. 2, the projection of f(x, y) in the x direction is the line integral of f(x, y) in the vertical direction; the projection in the y direction is the line integral in the horizontal direction; and the projection in the x' direction is the line integral in the y' direction.
It is worth mentioning that f(x, y) can be projected along any given direction to obtain the corresponding line integral. Taking the projection of f(x, y) along the x' direction as an example, the line integral of f(x, y) along the y' direction is given by formula 1 below.
$R_\theta(x') = \int_{-\infty}^{+\infty} f(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta)\,dy'$  (formula 1)
where (x, y) are the original coordinates of the original image, (x', y') are the coordinates of (x, y) after rotation by the angle θ, and R_θ(x') is the line integral of the original image along the y' direction.
In addition, the projection-based angle correction algorithm above is one possible choice; other correction algorithms are also applicable and are not detailed here.
For example, it is first determined whether the original image is in a set state; if so, no angle correction is performed; if not, the original image is adjusted to the set state using the angle correction algorithm. For example, with the horizontal direction as the given direction, the set state is the state in which the text fields in the original image are parallel to the horizontal direction. As shown in fig. 3, when the original image is not in the set state, its angle is adjusted so that the adjusted image is in the set state, and the adjusted image is then taken as the original image.
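As an illustration, the angle search described above can be sketched in Python. This is a minimal sketch rather than the claimed implementation: the OpenCV/NumPy calls, the ±15° search range with 0.5° steps, Otsu binarization, and scoring a candidate by how few rows its superposed horizontal projection occupies (following the "minimum projection value" criterion) are all assumptions made for the example.

import cv2
import numpy as np

def rotate(image, angle):
    # Rotate about the image centre, filling the border with white background.
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, m, (w, h), borderValue=(255, 255, 255))

def projection_score(binary):
    # Superpose the projection of the text pixels onto the vertical axis
    # (one sum per row), then count how many rows are touched by text.
    profile = (255 - binary).sum(axis=1)
    return int((profile > 0).sum())

def correct_angle(to_be_processed):
    # to_be_processed: a BGR image containing text fields (an assumption).
    gray = cv2.cvtColor(to_be_processed, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    angles = np.arange(-15.0, 15.5, 0.5)           # N candidate rotations
    scores = [projection_score(rotate(binary, a)) for a in angles]
    best = float(angles[int(np.argmin(scores))])   # minimum projection value
    return rotate(to_be_processed, best)           # used as the original image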
The method for correcting the angle is beneficial to improving the accuracy of determining the target image area.
S2. Noise filtering:
noise filtering of the raw image can be implemented by Non-Local mean filtering (NLM) algorithm. Of course, other filtering algorithms can be used to implement noise filtering, and the process of noise filtering is described in detail below by taking the non-local mean filtering algorithm as an example.
Specifically, the non-local means algorithm exploits the non-local self-similarity of an image, i.e., textures and structures in non-local areas of the image repeat, a property that can be used to effectively preserve the edges and details of the image. For example, an image contains many image blocks with the same pixels, and the noise within them is uncorrelated, so processing these blocks together can effectively remove the noise in the image.
For example, the original image is divided into a plurality of image blocks; the Euclidean distances between the image blocks are calculated to determine similar image blocks; the similar image blocks are identified as similar areas, determining one or more similar areas in the original image; and each similar area is then subjected to weighted averaging to remove the noise in the original image.
The specific calculation for noise filtering with the non-local means algorithm is shown in formula 2 below.
$\hat{u}(i) = \sum_{j} w(i, j)\, v(j)$  (formula 2)

$w(i, j) = \frac{1}{Z(i)} \exp\!\left(-\frac{\lVert v(N_i) - v(N_j) \rVert_2^2}{h^2}\right), \qquad Z(i) = \sum_{j} \exp\!\left(-\frac{\lVert v(N_i) - v(N_j) \rVert_2^2}{h^2}\right)$
where v(j) is the noisy value at pixel j, û(i) is the denoised value at pixel i, N_i and N_j are the image blocks centred at i and j, and h is the filter parameter. w(i, j) is the weight of the weighted average, i.e., the similarity of two image blocks as determined by their Euclidean distance. Since the similarity between pixels in an image is strongly affected by physical distance, the weights must be normalized; Z(i) is the normalization factor.
It should be noted that the euclidean distance is one possible way to determine similar image blocks, and is not intended to limit the embodiments of the present application.
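For illustration, the denoising step can be sketched with OpenCV's built-in non-local means implementation standing in for formula 2; the filter strength h and the block/search window sizes below are illustrative values, not ones fixed by this application.

import cv2
import numpy as np

def nlm_weight(block_i, block_j, h):
    # Unnormalized w(i, j) from formula 2: similarity decays exponentially
    # with the squared Euclidean distance between two image blocks.
    d2 = float(np.sum((block_i.astype(np.float64) - block_j.astype(np.float64)) ** 2))
    return float(np.exp(-d2 / (h * h)))

def denoise(original_gray):
    # templateWindowSize: the image-block size compared via Euclidean distance;
    # searchWindowSize: the non-local neighbourhood searched for similar blocks.
    return cv2.fastNlMeansDenoising(original_gray, None, h=10.0,
                                    templateWindowSize=7, searchWindowSize=21)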
By the noise filtering method, the accuracy of subsequent text field extraction is improved.
S3. Image detection:
image detection of the original image may be implemented based on an object detection model algorithm. Such as a YOLO (young Only Look Once) model algorithm, based on which a target image region in an original image can be determined by recognizing a text field in the original image.
Specifically, a network architecture of the YOLO model is shown in fig. 4. The YOLO model is a one-stage target detection model that casts the text field detection task as a regression problem. In practical application, the YOLO model is an end-to-end network: from the input original image, it outputs the position and category of each target image area. In the YOLO model, image features of the original image are extracted by a Convolutional Neural Network (CNN), and the predicted probability values of the positions and categories of the target image areas are then obtained by a Fully Connected layer (FC).
For example, referring to fig. 5, after image detection processing by the YOLO model, each target image area in the original image can be determined.
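For illustration, a detection sketch using the ultralytics package as one concrete YOLO implementation. The package choice and the checkpoint name "text_fields.pt" (a model assumed to be fine-tuned on text-field boxes) are assumptions; the application does not prescribe a specific library or weights.

from ultralytics import YOLO

model = YOLO("text_fields.pt")            # hypothetical fine-tuned weights
results = model("tax_certificate.png")    # end-to-end: image in, boxes out
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # target image area coordinates
    score = float(box.conf[0])              # predicted probability value
    print(f"text region ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}) p={score:.2f}")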
It should be noted that the above YOLO model is a possible model for target detection, and other suitable target detection models are not specifically described herein.
By the image detection method, each target image area in the original image is determined, the accuracy of determining the target image area is effectively improved, and the accuracy of subsequently extracting the text field is improved.
Step 102: performing text recognition on the text fields in each target image area to obtain the text field corresponding to each target image area;
in the embodiment of the application, text recognition of the text fields in each target image area can be realized in two stages: "feature extraction" and "text prediction". Text recognition applies the corresponding processing to different text fields, i.e., it converts image information in various writing forms into uniformly represented text information.
Specifically, in the "feature extraction" stage, image features are first extracted by a convolutional neural network, and character sequence features are then extracted by a Recurrent Neural Network (RNN). Because the extracted image features are affected by writing style, character size, font, and similar factors, the same letter may be predicted multiple times in the predicted text field; for example, "Hello" may be predicted as "helloloooo".
Further, in the "text prediction" stage, the embodiment of the application introduces a CTC (Connectionist Temporal Classification) model, which performs alignment compression on the redundant features; for example, the incorrect prediction "helloloooo" is collapsed to "Hello". This ensures that the predicted text fields correspond one-to-one to the detected target image areas.
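The alignment compression performed by CTC can be illustrated with a greedy decoder: collapse repeated symbols between blanks, then drop the blanks. A minimal sketch; the blank symbol and the per-frame argmax input are illustrative assumptions.

BLANK = "-"

def ctc_collapse(frame_symbols):
    # frame_symbols: the per-frame argmax outputs of the recognition network.
    out, prev = [], None
    for s in frame_symbols:
        if s != prev and s != BLANK:   # skip repeats and blank frames
            out.append(s)
        prev = s
    return "".join(out)

print(ctc_collapse(list("HH-ee-lll-l-oo")))  # -> "Hello"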
By the text recognition mode, the accuracy of recognizing the text fields in the target image areas can be effectively improved.
Step 103: extracting the target text fields meeting the business requirements from the text fields according to the preset extraction rules.
In the embodiment of the present application, the preset extraction rules extract target text fields from the text fields corresponding to the target image areas (the target text fields may be as shown in fig. 6) and output them in a specified format (the output may be as shown in fig. 7). The extraction process can comprise three parts: target field retrieval, line text alignment, and multi-line text merging, each described below with reference to the drawings.
The first part, target field retrieval: a retrieval rule for the target fields is first set according to the characteristics of the target fields, and the text fields are then retrieved and matched based on this rule.
Here, the association relation between target text fields and text fields may be obtained from a preset database, and the target text fields meeting the business requirements are then extracted from the text fields corresponding to the target image areas according to this relation.
It should be noted that, to describe the technical solution more completely in combination with the second and third parts, the extracted target text field is referred to below as the target field. Those skilled in the art should understand that the target text field can be extracted by the processing of the first part alone; combining the first part with the second part and/or the third part is the optimized solution provided by the embodiments of the application.
Specifically, from the original image of a tax completion certificate, text fields such as "China", "tax receipt", "verification code", "taxpayer identification number", "20", and "M" may be extracted. A retrieval rule for the target fields is set according to the actual business requirements, i.e., the preset database. If fields such as "China" and "tax receipt" are not set as target fields in the retrieval rule, those text fields are not retrieved; only target fields such as "verification code", "taxpayer identification number", "20", and "M" are retrieved. After a target field is retrieved, it is matched according to a preset matching rule, such as regular-expression matching.
For example, matching rules are preset according to the actual business requirements: the specific content corresponding to the "taxpayer identification number" is "digits + capital letters"; the specific content corresponding to the "verification code" is "the first 20 digits"; and so on, so that matching of the target fields can be achieved.
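For illustration, the retrieval and matching rules can be sketched with regular expressions; the concrete patterns and the sample values below are assumptions modelled on the description, not rules fixed by this application.

import re

# Hypothetical retrieval rules: target field name -> pattern for its content.
RULES = {
    "verification code": re.compile(r"^\d{20}$"),                        # a 20-digit code
    "taxpayer identification number": re.compile(r"^[0-9A-Z]{15,20}$"),  # digits + capital letters
}

def match_fields(text_fields):
    results = {}
    for name, pattern in RULES.items():
        for value in text_fields:
            if pattern.match(value):
                results[name] = value
                break                    # keep the first matching field
    return results

# Hypothetical recognized text fields from a tax completion certificate image.
print(match_fields(["91110108MA01C8X9XX", "20210708123456789012"]))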
As shown in fig. 6, the text field "verification code" is matched with the text field "20…" according to the preset matching rule, giving the matching result "verification code: 20…"; the text field "taxpayer identification number" is matched with the text field "M…", giving the matching result "taxpayer identification number: M…"; and so on.
The second part, line text alignment: when multiple lines of target fields are matched in the first part, a coordinate analysis method is introduced to analyze them, improving the matching accuracy in multi-line target field scenarios.
Specifically, for the original image of a tax completion certificate, multiple lines may be matched for target fields such as "original certificate number", "tax type", "item name", "time to which the tax belongs", "income (withdrawal) date", and "actual payment (withdrawal) amount". With plain rule matching, rows may be matched out of alignment.
As shown in fig. 6, three rows of target fields are matched under each of "original certificate number", "tax type", "item name", "time to which the tax belongs", "income (withdrawal) date", and "actual payment (withdrawal) amount". Assume the first row "31…" under "original certificate number" corresponds to the "tax type" "local education addition", the "item name" "value-added tax local education addition", the "time to which the tax belongs" "2021-06-01 to 2021-06-30", the "income (withdrawal) date" "2021-07-08", and the "actual payment (withdrawal) amount" "1339.00"; the second row "31…" corresponds to the "tax type" "education fee addition", the "item name" "value-added tax education addition", the "time to which the tax belongs" "2021-…", and so on. The problem of staggered multi-row matching is, for example, that the second row's "education fee addition" is matched to the first row's "1339.00".
In order to solve the problem, in the line text alignment part, firstly, the target fields corresponding to a plurality of lines are determined, then, the reference text and the ordinate of the reference text are determined, and the average height of the target fields in the same line with the reference text is determined. See equation 3 for a specific calculation formula for text alignment.
$\lvert y' - y_R \rvert < 0.5\, h_{avg}$  (formula 3)
where y_R is the ordinate of the reference text, y' is the ordinate of a target field on the same line as the reference text, and h_avg is the average text height of that line.
It should be noted that "0.5" in the formula 3 is a possible preset threshold, and can be set according to the actual application requirements, and the formula 3 is described with reference to the accompanying drawings.
Referring to fig. 6, the first row "31…" under "original certificate number" on the left is first taken as the reference text, with ordinate y_R. A floating height range for the ordinate is then set according to the practical application, and the other target fields on the same line as the reference text are determined based on this range. After the relationships of the first line are determined, the process is repeated with the second row "31…" under "original certificate number" as the reference text, until the line alignment of all target fields is completed.
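For illustration, the formula 3 line test can be sketched as follows; the Field record and the data layout are assumptions made for the example.

from dataclasses import dataclass

@dataclass
class Field:
    text: str
    x: float   # abscissa of the text box (left edge)
    y: float   # ordinate of the text box
    w: float   # text box width
    h: float   # text box height

def same_line(reference, candidate, h_avg):
    # formula 3: the vertical offset is below half the average text height.
    return abs(candidate.y - reference.y) < 0.5 * h_avg

def align_line(reference, fields, h_avg):
    # Collect the fields on the reference line, in left-to-right reading order.
    row = [f for f in fields if same_line(reference, f, h_avg)]
    return sorted(row, key=lambda f: f.x)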
The third part, multi-line text merging: this mainly handles multi-line text. As shown in fig. 6, assume the target field "tax authority" corresponds to two target fields, "national tax administration and economic development area" and "tax authority first tax authority". In the actual application scenario these should be a single target field, "national tax administration and economic development area tax authority first tax authority", so a multi-line text merging operation is required.
Specifically, "tax authority" is first determined as the reference text, and the target fields to be merged, "national tax administration and economic development area" and "tax authority first tax authority", are determined through the ordinate and abscissa of the reference text. The merge criterion is given by formula 4.
$0.5\, h_{avg} < y' - y_R < 1.5\, h_{avg}, \qquad x_R \le x' \le x_R + w_R$  (formula 4)
where y_R is the ordinate of the reference text, x_R is the abscissa of the reference text, w_R is the width of the reference text, y' is the ordinate of a target field to be merged, x' is its abscissa, and h_avg is the average text height of the reference line.
It should be noted that "0.5" and "1.5" in equation 4 are all possible preset thresholds, and may be set according to actual application requirements.
In addition, the drawings and example descriptions provided in the embodiments of the application are for illustration only and serve no other purpose; their use complies with the relevant national regulations.
Thus, extraction of the target text fields is realized through the above three parts, and the processing of target field retrieval, line text alignment, and multi-line text merging effectively improves the accuracy of target text field extraction.
By the above method, the target text fields can be extracted, solving the problem that the prior art cannot extract the target text fields meeting business requirements from tax completion certificate images, and the following technical effects can be achieved:
1. through image preprocessing of the original image, based on angle correction and noise filtering, the original image is corrected to a set state and the noise pixels in it are filtered out, restoring the image information to the greatest extent and helping to improve the accuracy of determining each target image area in the original image;
2. through image detection of the original image, a target detection model that casts detection as a regression task realizes end-to-end rapid text field detection: lightweight feature computation through small convolution kernels and text box prediction through a fully connected layer quickly detect the target image areas in the image;
3. through text recognition, a CTC model is introduced to combine pixel information with the time-sequence relation, aligning the character content in the image; through the one-to-one "image -> text" conversion, the text field corresponding to each target image area is recognized and the recognition accuracy is improved;
4. through the preset extraction rules, based on the processing of target field retrieval, line text alignment, and multi-line text merging, a coordinate analysis method is introduced to ensure the accuracy of the output target text fields.
Based on the method provided by the embodiment of the application, the target text field can further be sent to a front-end display interface, providing personalized display services for users according to the actual application requirements.
Based on the same inventive concept, the present application further provides an apparatus for extracting a text field, so as to extract target text fields from tax completion certificate images, solve the problem that the prior art cannot extract the target text fields meeting business requirements from such images, and effectively improve the accuracy of target text field extraction. Referring to fig. 8, the apparatus includes:
a target image area determining module 801, configured to acquire an original image containing a text field and determine each target image area in the original image, where the target image area is an area containing the text field in the image to be processed;
a text field identification module 802, configured to perform text identification on the text field in each target image area to obtain a text field corresponding to each target image area;
and a target text field extraction module 803, configured to extract the target text fields meeting the business requirements from the text fields according to the preset extraction rules.
In one possible design, the target image area determining module 801 is specifically configured to obtain an image to be processed including a text field; rotating the image to be processed according to a preset angle to obtain N rotating images with different rotating angles corresponding to the image to be processed, wherein N is a positive integer greater than or equal to 1; projecting a text field in the rotating image in a given direction, superposing the projections of the rotating image in the given direction to obtain projection values of the rotating image, and determining N projection values corresponding to the N rotating images; and selecting a rotating image corresponding to the minimum projection value from the N projection values, and taking the rotating image as an original image.
In one possible design, the target image area determining module 801 is specifically configured to divide the original image into a plurality of image blocks, where the image blocks represent part of images in the original image; calculating a Euclidean distance between two image blocks, and determining the two image blocks with the Euclidean distance smaller than a preset threshold value as similar image blocks; identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image; and denoising each similar region in the original image to obtain the denoised original image.
In one possible design, the target image region determining module 801 is specifically configured to extract image features in the original image based on a target detection model; and determining each target image area in the original image according to the image characteristics.
In a possible design, the target text field extracting module 803 is specifically configured to obtain an association relationship between a target text field and a text field based on a preset database; and extracting a target text field meeting the service requirement from the text fields in the original image according to the incidence relation.
In one possible design, the target text field extraction module 803 is further configured to send the target text field to a front-end display interface for display after the target text field is extracted.
Based on the apparatus, extraction of target text fields from tax completion certificate images is realized, the problem that the prior art cannot extract the target text fields meeting business requirements from such images is solved, and the accuracy of target text field extraction is effectively improved.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device that can implement the functions of the foregoing apparatus for extracting a text field. Referring to fig. 9, the electronic device includes:
at least one processor 901, and a memory 902 connected to the at least one processor 901. The specific connection medium between the processor 901 and the memory 902 is not limited in this application; fig. 9 takes connection through a bus 900 as an example, with the bus 900 shown as a thick line. The connections between other components are merely illustrative and not limiting. The bus 900 may be divided into an address bus, a data bus, a control bus, and so on; only one thick line is drawn in fig. 9 for ease of illustration, but this does not mean there is only one bus or one type of bus. Alternatively, the processor 901 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 902 stores instructions executable by the at least one processor 901, and by executing the instructions stored in the memory 902, the at least one processor 901 can perform the method for extracting a text field discussed above. The processor 901 can implement the functions of the respective modules of the apparatus shown in fig. 8.
The processor 901 is the control center of the apparatus. It can connect the various parts of the entire control device through various interfaces and lines and, by running or executing the instructions stored in the memory 902 and calling the data stored in the memory 902, performs the various functions of the apparatus and processes data, thereby monitoring the apparatus as a whole.
In one possible design, the processor 901 may include one or more processing units, and the processor 901 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 901. In some embodiments, the processor 901 and the memory 902 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 901 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for extracting text fields disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 902, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 902 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 902 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation. The memory 902 of the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 901, the code corresponding to the method for extracting a text field described in the foregoing embodiments can be solidified into the chip, enabling the chip to execute the steps of the method of the embodiment shown in fig. 1 at run time. How to program the processor 901 is well known to those skilled in the art and is not described here.
Based on the same inventive concept, the embodiment of the present application further provides a storage medium storing computer instructions, which when run on a computer, cause the computer to execute the method for extracting text fields discussed above.
In some possible embodiments, the various aspects of the method for extracting a text field provided by the present application may also be implemented in the form of a program product, which includes program code for causing the control apparatus to perform the steps of the method for extracting a text field according to various exemplary embodiments of the present application described above in this specification, when the program product runs on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of extracting a text field, the method comprising:
acquiring an original image containing a text field, and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
performing text recognition on the text fields in the target image areas to obtain the text fields corresponding to the target image areas;
and extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
2. The method of claim 1, wherein said obtaining an original image containing a text field comprises:
acquiring an image to be processed containing a text field;
rotating the image to be processed according to a preset angle to obtain N rotating images with different rotating angles corresponding to the image to be processed, wherein N is a positive integer greater than or equal to 1;
projecting a text field in the rotating image in a given direction, superposing the projections of the rotating image in the given direction to obtain projection values of the rotating image, and determining N projection values corresponding to the N rotating images;
and selecting a rotating image corresponding to the minimum projection value from the N projection values, and taking the rotating image as an original image.
3. The method of any of claims 1-2, further comprising, after the obtaining an original image containing a text field:
dividing the original image into a plurality of image blocks, wherein the image blocks represent partial images in the original image;
calculating a Euclidean distance between two image blocks, and determining the image block of which the Euclidean distance is smaller than a preset threshold value as a similar image block;
identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image;
and denoising each similar region in the original image to obtain the denoised original image.
4. The method of claim 1, wherein said determining each target image region in said original image comprises:
extracting image features in the original image based on a target detection model;
and determining each target image area in the original image according to the image characteristics.
5. The method of claim 1, wherein the extracting a target text field satisfying a service requirement from the text fields according to a preset extraction rule comprises:
acquiring an incidence relation between a target text field and a text field based on a preset database;
and extracting a target text field meeting the service requirement from the text fields in the original image according to the incidence relation.
6. The method of claim 1, wherein after extracting a target text field satisfying a business requirement from the text field, further comprising:
and sending the target text field to a front-end display interface for display.
7. An apparatus for extracting a text field, the apparatus comprising:
a module for determining a target image area, which is used for acquiring an original image containing a text field and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
a text field identification module for performing text identification on the text field in each target image area to obtain the text field corresponding to each target image area;
and the target text field extraction module is used for extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
8. The apparatus according to claim 7, wherein the module for determining a target image area is specifically configured to obtain an association relationship between a target text field and a text field based on a preset database; and extracting a target text field meeting the service requirement from the text fields in the original image according to the incidence relation.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
CN202111428606.1A 2021-11-29 2021-11-29 Method and device for extracting text field and electronic equipment Pending CN114202766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111428606.1A CN114202766A (en) 2021-11-29 2021-11-29 Method and device for extracting text field and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111428606.1A CN114202766A (en) 2021-11-29 2021-11-29 Method and device for extracting text field and electronic equipment

Publications (1)

Publication Number Publication Date
CN114202766A true CN114202766A (en) 2022-03-18

Family

ID=80649291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111428606.1A Pending CN114202766A (en) 2021-11-29 2021-11-29 Method and device for extracting text field and electronic equipment

Country Status (1)

Country Link
CN (1) CN114202766A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination