CN113780267A - Method, device and equipment for character recognition and computer readable medium - Google Patents

Method, device and equipment for character recognition and computer readable medium

Info

Publication number
CN113780267A
Authority
CN
China
Prior art keywords
image
character string
corrected image
recognition
character
Prior art date
Legal status
Pending
Application number
CN202010789024.5A
Other languages
Chinese (zh)
Inventor
魏雪
何云龙
赖荣凤
梅涛
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a method, an apparatus, a device, and a computer readable medium for character recognition, relating to the field of computer technology. One embodiment of the method comprises: correcting an image to be recognized by using a template image of the image to be recognized, to obtain a corrected image; recognizing the character strings in the corrected image with a character recognition model, locating the positions of the character strings in the corrected image, and obtaining the text of the character strings in the corrected image; and determining a recognition result of the corrected image based on the recognition area in the template image, the positions of the character strings in the corrected image, and the text of those character strings, the recognition result comprising the item of the recognition area of the template image and the text of the corrected image within that area. This embodiment can improve the efficiency of character recognition across different certificates and bills.

Description

Method, device and equipment for character recognition and computer readable medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer readable medium for character recognition.
Background
Recognition of certificates and bills has wide application, including identity authentication systems, financial reimbursement systems, and information entry systems. A recognition model can usually identify only one specific kind of certificate or bill, for example: identity cards, driving licenses, or value-added-tax special invoices.
To obtain a recognition model with good performance, a large number of images of the specific kind must first be collected and their key character strings annotated; a recognition model is then trained on the annotated images, and the model outputs the recognition results for the key character strings in an image.
In the process of implementing the invention, the inventors found that the prior art has at least the following problem: an existing recognition model cannot be reused for a newly added recognition requirement, and a large number of images must be collected and annotated again to build a new recognition model, so character recognition across different certificates and bills is inefficient.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a computer readable medium for character recognition, which can improve the efficiency of character recognition across different certificates and bills.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of character recognition, including:
correcting the image to be recognized by utilizing a template image of the image to be recognized to obtain a corrected image;
recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image, and acquiring characters of the character string in the corrected image;
determining the recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image and the characters of the character string in the corrected image, wherein the recognition result comprises the items of the recognition area of the template image and the characters of the corrected image in the recognition area.
The method for correcting the image to be recognized by using the template image of the image to be recognized to obtain a corrected image comprises the following steps:
matching the character strings in the image to be recognized with the anchor character strings in the template image, to obtain the target character string in the image to be recognized with the minimum edit distance to an anchor character string in the template image;
and correcting the image to be recognized by utilizing the position relation between the anchor point character string and the target character string to obtain a corrected image.
The character strings in the image to be recognized are obtained by recognizing the image to be recognized with a character recognition model.
The correcting the image to be recognized by using the position relationship between the anchor character string and the target character string to obtain a corrected image includes:
establishing an affine transformation matrix according to the position relation between the anchor point character string and the target character string;
and transforming the image to be recognized into the correction image according to the affine transformation matrix.
The number of anchor character strings is at least four.
The determining of the recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image, and the text of the character string in the corrected image, the recognition result comprising the item of the recognition area of the template image and the text of the corrected image in the recognition area, includes:
obtaining the intersection-over-union (IoU) of the field character strings in the corrected image based on the recognition area in the template image and the areas where the field character strings in the corrected image are located, wherein the area where a field character string is located is determined by the position of that character string;
determining the recognition area of the corrected image according to the IoU of the field character strings in the corrected image;
and taking the item of the recognition area of the template image and the text of the recognition area of the corrected image as the recognition result.
The determining of the recognition area of the corrected image according to the IoU of the character strings in the corrected image includes:
taking the area with the maximum IoU in the corrected image as the recognition area of the corrected image.
According to a second aspect of the embodiments of the present invention, there is provided a device for character recognition, including:
the correction module is used for correcting the image to be recognized by utilizing the template image of the image to be recognized to obtain a corrected image;
the recognition module is used for recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image and acquiring the characters of the character string in the corrected image;
a determining module, configured to determine a recognition result of the corrected image based on a recognition area in the template image, a position of the character string in the corrected image, and a text of the character string in the corrected image, where the recognition result includes an item of the recognition area of the template image and the text of the corrected image in the recognition area.
The correction module is specifically configured to match the character strings in the image to be recognized with the anchor character strings in the template image, to obtain the target character string in the image to be recognized with the minimum edit distance to an anchor character string in the template image;
and correcting the image to be recognized by utilizing the position relation between the anchor point character string and the target character string to obtain a corrected image.
According to a third aspect of the embodiments of the present invention, there is provided an electronic device for character recognition, including:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method as described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.
One embodiment of the above invention has the following advantages or benefits: the image to be recognized is corrected using its template image to obtain a corrected image; the character strings in the corrected image are recognized with a character recognition model, their positions are located, and their text is obtained; and the recognition result of the corrected image is determined based on the recognition area in the template image, the positions of the character strings, and their text, the recognition result comprising the item of the recognition area and the text within that area. For different certificates or bills, the recognition model need not be retrained; only a corresponding template image needs to be set, which improves the efficiency of character recognition across different certificates and bills.
Further effects of the above optional implementations are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a driving license according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an identification card according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a main flow of a method of text recognition according to an embodiment of the invention;
FIG. 4 is a schematic flow chart of correcting an image to be recognized according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of a template image according to an embodiment of the invention;
FIG. 6 is a diagram illustrating an anchor string and a corresponding target string according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of obtaining a corrected image using positional relationships according to an embodiment of the present invention;
FIG. 8 is a schematic illustration of a corrected image according to an embodiment of the invention;
FIG. 9 is a schematic flow chart illustrating the determination of the recognition result of the corrected image according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the main structure of a text recognition device according to an embodiment of the present invention;
FIG. 11 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 12 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Generally, the bill body in the annotated data used to train a character recognition model is large and horizontal. In an actual image to be recognized, the bill body is not fixed and may be tilted.
Therefore, in the recognition process, the character recognition model is often used in combination with an object detection model. First, the image to be recognized is passed through the object detection model, and the bill body is extracted and corrected to be horizontal. The corrected image is then fed into the character recognition model to obtain a recognition result. Finally, the recognition result is corrected and structured for certain specific character strings, namely strings with a fixed format or enumerable content, such as the birth-date and ethnicity strings in an identity card. Like the character recognition model, the object detection model requires a large amount of annotated data, and the post-processing of specific character strings requires prior knowledge.
Developing the entire recognition pipeline takes weeks. The more data collected, the better the model generalizes and the higher the recognition accuracy, but the greater the cost in time and computing resources. Moreover, data annotation is in most cases done by a professional annotation team, so annotation rules must be unified and many annotators must be trained.
A customized recognition pipeline built on specific annotated data serves only a specific recognition requirement; for example, a pipeline for identity-card images cannot be reused for driving-license or invoice images.
Taking a driving license and an identity card as examples, refer to fig. 1, a schematic diagram of a driving license according to an embodiment of the invention, and fig. 2, a schematic diagram of an identity card according to an embodiment of the invention.
An object detection model trained on identity-card images cannot effectively detect the driving-license body: the background colors of the two documents differ, and so does the layout of their character strings.
Nor can driving-license images and identity-card images directly share one recognition model. With continued reference to fig. 1, the registration-date string in the driving license contains a special form of the digit 0 and separates year, month, and day with dashes. With continued reference to fig. 2, the birth-date string in the identity card contains no special characters and uses the three Chinese characters for year, month, and day.
When a new recognition requirement arises, technicians must collect and annotate a large amount of data again to train new detection and recognition models, while the annotated data and models obtained at great cost in time, human effort, and computing resources can no longer be reused. The long development cycle and high investment prevent quick responses to customers' varied recognition requirements in specific scenarios. Hence the technical problem of low character recognition efficiency across different certificates and bills.
In order to solve the technical problem of low character recognition efficiency in different certificates or bills, the following technical scheme in the embodiment of the invention can be adopted.
Referring to fig. 3, fig. 3 is a schematic diagram of the main flow of a character recognition method according to an embodiment of the present invention: a template image of the image to be recognized is used to obtain a corrected image, from which the positions and text of the character strings are obtained; the recognition result of the corrected image is then determined based on the recognition area in the template image combined with the positions and text of the character strings. As shown in fig. 3, the method specifically includes the following steps:
s301, correcting the image to be recognized by utilizing the template image of the image to be recognized to obtain a corrected image.
In the embodiment of the invention, the image to be recognized is the image containing the character strings that need to be recognized. As one example, it may be an identity card image including an identity number; as another example, it may be a driving license image including a registration date.
The image to be recognized can be captured with the camera of a terminal. As an example, to recognize a character string, the image may be captured by the terminal's camera and then recognized using the scheme of the embodiment of the invention.
For each image to be recognized, there is a corresponding template image, i.e., the image corresponding to each kind of certificate or bill. As an example, a template image may be set in advance for the identity card, and another for the driving license.
It is understood that for each image to be recognized, there is a corresponding template image that is preset. In the process of identifying the image to be identified, the image to be identified may be corrected using the template image of the image to be identified to obtain a corrected image.
This is because, owing to the shooting angle or other reasons, the image to be recognized may be tilted rather than horizontal, or the document body may occupy only a small portion of the image, meaning that the ratio of the area occupied by the certificate or bill to the area of the whole image is small, e.g., less than 0.5.
Applying the character recognition model directly to such an image yields low recognition accuracy, and it cannot be determined whether a recognized string is one that needs to be recognized, i.e., whether it can be output as a final recognition result.
Referring to fig. 4, fig. 4 is a schematic flow chart of correcting an image to be recognized according to an embodiment of the present invention, which specifically includes:
s401, matching the character string in the image to be recognized with the anchor character string in the template image to obtain a target character string with the minimum editing distance between the anchor character string in the image to be recognized and the anchor character string in the template image.
A plurality of character strings are included in an image to be recognized in order to obtain the character strings in the image to be recognized. Character strings in the image to be recognized can be recognized using a character recognition model. That is, the character string of the image to be recognized is obtained by recognizing the image to be recognized using the character recognition model.
In the embodiment of the invention, the character recognition model is obtained by utilizing the image training containing various character strings. It should be noted that, since the character recognition model is not specific to a specific certificate or ticket, and the training data is not limited to a specific certificate image or ticket image, the character recognition model can be continuously iteratively optimized by using images including various types of character strings as the training data. The character recognition model obtained by training has better generalization capability.
It can be understood that the character recognition model can output all character strings in the image to be recognized, but because the size and direction of the image to be recognized are different from those of the template image, and it cannot be determined whether the recognized character strings belong to the recognition area in the template image, and it cannot take all recognition results output by the character recognition model as final character recognition results.
The template image is the image corresponding to each certificate or ticket. The template image is on the clear and angle correct certificate or bill image, and the character string with relatively fixed content and position in the format is selected as the anchor character string. Recording the position of the anchor character string and the characters of the anchor character string, selecting the area where the character string to be recognized is located as a recognition area, recording the position of the recognition area, and creating a key or id for each recognition area to be used as a mark, namely, an item of the recognition area. In order to improve the accuracy of the recognition result, the type of the character string to be recognized can be recorded. As one example, the type of character string to be recognized may include one or more of the following, numbers, chinese characters, and letters.
Referring to fig. 5, fig. 5 is a schematic diagram of a template image according to an embodiment of the present invention. The black solid boxes in fig. 5 enclose the anchor character strings, which include: name, birth, social security number, and Human Resources and Social Security Bureau.
The black dotted boxes in fig. 5 are recognition areas, containing the character strings Zhang San, male, Han, 1 July 1940, and 987654321; the items of the recognition areas are: cardholder name, cardholder gender, cardholder ethnicity, cardholder birth date, and cardholder social security number.
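The template just described (anchor strings plus keyed recognition areas) can be represented as a small data structure. The following sketch is purely illustrative: the coordinates, keys, and type labels are hypothetical, not taken from the patent.

```python
# Hypothetical template for the social-security card of fig. 5.
# Boxes are (x, y, width, height) in template-image pixels; all
# values here are invented for illustration only.
template = {
    "anchors": [
        # Strings with fixed content and position, used for alignment.
        {"text": "name", "box": (40, 60, 80, 24)},
        {"text": "birth", "box": (40, 160, 80, 24)},
        {"text": "social security number", "box": (20, 300, 220, 24)},
        {"text": "Human Resources and Social Security Bureau",
         "box": (120, 420, 340, 24)},
    ],
    "recognition_areas": [
        # Each area gets a key/id -- the "item" of the recognition area --
        # and may record the expected character type.
        {"item": "cardholder_name", "box": (140, 60, 160, 24),
         "type": "chinese"},
        {"item": "cardholder_birth_date", "box": (140, 160, 160, 24),
         "type": "digits"},
    ],
}
```

Note that the sketch keeps at least four anchors, matching the at-least-four-anchors requirement of the embodiment, and the optional "type" field records the expected character classes (digits, Chinese characters, letters).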
In the embodiment of the present invention, to ensure the accuracy of correction, at least four anchor character strings are used, and they may be distributed around the template image.
With continued reference to fig. 5, the anchor string name lies on the upper side of the template image; birth in the middle; social security number on the left; and Human Resources and Social Security Bureau on the lower side.
In the embodiment of the invention, matching can be performed by the edit distance between character strings: the character strings in the image to be recognized are matched with the anchor character strings in the template image to obtain the target string with the minimum edit distance to each anchor string. The aim is to obtain the relative positional relationship between the image to be recognized and the template image.
Specifically, the edit distance quantifies the difference between two character strings as the minimum number of editing operations required to change one string into the other. There are three kinds of operations: insertion, deletion, and substitution.
As an example, converting the string kitten into sitting requires the following operations:
Step 1: replace k with s: kitten --> sitten.
Step 2: replace e with i: sitten --> sittin.
Step 3: append g: sittin --> sitting.
That is, the edit distance between kitten and sitting is 3.
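The kitten-to-sitting conversion above can be computed with the standard dynamic-programming algorithm for the Levenshtein edit distance; a minimal sketch:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                       # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j                       # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3, as in the worked example
```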
The edit distance between each character string of the image to be recognized and each anchor character string of the template image is computed, and the target string with the minimum edit distance to an anchor string is matched to it. It can be understood that the target string with the minimum edit distance to an anchor string is the string in the image to be recognized that corresponds to that anchor string.
Referring to fig. 6, fig. 6 is a schematic diagram of anchor character strings and their corresponding target character strings according to an embodiment of the present invention. The left image in fig. 6 is the image to be recognized, and the right image is the template image.
The image to be recognized includes not only the document but also its background, i.e., a white margin beyond the extent of the document. The edit distance between each character string of the image to be recognized and each anchor character string of the template image is computed.
Each anchor string is connected by a dotted line to its corresponding target string, i.e., the string with the minimum edit distance to it. For example, for the anchor string social security number, the matched string is social security number: the two strings are identical, so the edit distance is the minimum, 0.
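The matching of anchor strings to target strings can be sketched as picking, for each anchor, the recognized string with the smallest edit distance. The anchor and candidate strings below are illustrative stand-ins (with deliberate OCR-style typos), not data from the patent:

```python
def edit_distance(a: str, b: str) -> int:
    # Compact Levenshtein distance using a rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[-1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def match_anchors(anchors, recognized):
    """For each template anchor string, return the recognized string
    with the minimum edit distance to it (the 'target string')."""
    return {a: min(recognized, key=lambda s: edit_distance(a, s))
            for a in anchors}

# Recognized strings with OCR-style errors still match their anchors.
pairs = match_anchors(
    ["name", "birth", "social security number"],
    ["nane", "birtb", "social security number", "Zhang San"],
)
```

An exact match (social security number) yields distance 0, while the garbled candidates still land on the nearest anchor.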
S402, correcting the image to be recognized by utilizing the position relation between the anchor character string and the target character string to obtain a corrected image.
Since each anchor character string corresponds to a target character string, the positional relationship between the two can be obtained. As an example, the target string name is located 60 mm from the left edge and 35 mm from the bottom edge of the image to be recognized, while its position in the template image is 20 mm from the left and 5 mm from the bottom; the upper left corner of the template image is 0 mm from the left and 0 mm from the bottom.
Referring to fig. 7, fig. 7 is a schematic diagram of obtaining a corrected image by using a positional relationship according to an embodiment of the present invention, which specifically includes:
s701, establishing an affine transformation matrix according to the position relation between the anchor character string and the target character string.
Using the positional relationships between the anchor strings and the target strings, an affine transformation matrix between the two sets of positions can be computed. Applying the computed matrix to the image to be recognized extracts the bill body and yields the corrected image, whose size and orientation are consistent with those of the template image.
S702, transforming the image to be recognized into the corrected image according to the affine transformation matrix.
An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, expressible as multiplication by a matrix plus a translation vector. Affine transformation preserves the straightness and parallelism of two-dimensional figures: straightness means that a straight line remains a straight line after the transformation; parallelism means that the relative positional relationships between figures are kept unchanged, parallel lines remain parallel, and the order of points on a line does not change.
The body of the image to be recognized can be extracted using the affine transformation matrix, and the extracted body is taken as the corrected image, whose size and orientation are consistent with those of the template image.
Referring to fig. 8, fig. 8 is a schematic diagram of a corrected image according to an embodiment of the present invention. It can be understood that the direction and size of the corrected image in fig. 8 are consistent with the direction and size of the template image.
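The two correction steps (S701 and S702) can be sketched numerically once the anchor/target strings are reduced to point correspondences, e.g. box centers. This numpy-only sketch fits a 2x3 affine matrix by least squares; it is an assumed implementation detail, and in practice library routines such as OpenCV's cv2.estimateAffine2D and cv2.warpAffine would be applied to the actual image.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix M mapping src points to dst points.
    src_pts: target-string centers in the image to be recognized;
    dst_pts: anchor-string centers in the template image.
    Needs >= 3 non-collinear pairs; the embodiment uses at least 4."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params.T  # shape (2, 3): linear part plus translation column

def apply_affine(M, pts):
    """Apply the 2x3 affine matrix M to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Synthetic check: recover a known rotation/scale/translation.
src = [(0, 0), (100, 0), (0, 100), (100, 100)]
M_true = np.array([[0.5, 0.1, 20.0],
                   [-0.1, 0.5, 10.0]])
dst = apply_affine(M_true, src)
M_est = estimate_affine(src, dst)
```

Warping the whole image with the estimated matrix (e.g. via cv2.warpAffine) would then yield the corrected image whose size and orientation match the template.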
In the embodiment of fig. 4, the original image to be recognized differs from the template image in orientation and size, which would degrade character recognition accuracy. In the embodiment of the invention, no object detection model needs to be trained: template images are preset for the different types of images to be recognized, and the template image of each image is used to correct it into a corrected image.
The template approach requires no additional annotated data or computing resources, applies to a variety of certificates and bills, and can quickly respond to varied recognition requirements.
S302, recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image, and acquiring characters of the character string in the corrected image.
The corrected image is consistent with the template image in orientation and size, so a character recognition model can be used to recognize the character strings in the corrected image, locate their positions, and obtain their text.
It should be noted that the character recognition model is not specific to a particular certificate or bill and can thus be trained on images containing various kinds of character strings. In the embodiment of the present invention, the model used here may be the same as the character recognition model in S401.
The character recognition model can be used for recognizing the character strings in the corrected image, positioning the positions of the character strings in the corrected image and acquiring characters of the character strings in the corrected image.
S303, determining the recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image, and the characters of the character string in the corrected image, wherein the recognition result comprises the item of the recognition area of the template image and the characters of the corrected image in the recognition area.
The template image includes a recognition area, which is the area to which the recognition result relates. Take the template image shown in fig. 5 as an example, in which the recognition area is the black dotted frame. The recognition result of the corrected image may be determined from the recognition area in the template image and the character strings in the corrected image.
With continued reference to fig. 5, the item of the recognition area of the template image of fig. 5 is a name. Since the corrected image corresponds to the template image, the text of the corrected image that falls within the recognition area of the template image belongs to the recognition result of the corrected image.
Referring to fig. 9, fig. 9 is a schematic flow chart of determining a recognition result of a corrected image according to an embodiment of the present invention, which specifically includes:
S901, obtaining the intersection-over-union (IoU) of the character strings in the corrected image based on the recognition area in the template image and the areas where the character strings in the corrected image are located, wherein the area where a character string is located is determined by the position of the character string.
The area where a character string is located in the corrected image can be determined from the position of the character string in the corrected image. As an example, when the character recognition model recognizes a character string in the corrected image, the character string is located within a candidate box, and the candidate box may be taken as the area where the character string is located in the corrected image.
With continued reference to fig. 8, the area where the character string "Wang Wu" is located in fig. 8 is such an area.
The IoU of each character string in the corrected image is then obtained based on the recognition area in the template image and the area where the character string is located.
Illustratively, the corrected image may be overlaid on the template image, so that the recognition area in the template image overlaps the area of the corrected image where a character string is located; the IoU of that character string can then be calculated.
Intersection over Union (IoU) is a concept used in object detection. In the present embodiment, it refers to the overlap rate between the candidate box and the recognition area, i.e. the ratio of the area of their intersection to the area of their union. The optimal case is complete overlap, i.e. a ratio of 1.
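As a minimal sketch of this definition, the IoU of two axis-aligned boxes can be computed as follows. The (x1, y1, x2, y2) box format and function name are illustrative assumptions, not from the patent.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Coordinates of the overlap rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Complete overlap gives the optimal ratio of 1.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

A candidate box that merely touches the recognition area has zero intersection area and therefore an IoU of 0, so disjoint strings are never matched.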
S902, determining the recognition area of the corrected image according to the IoU of the character strings in the corrected image.
The recognition area of the corrected image may be determined from the IoU of the character strings in the corrected image. In the embodiment of the present invention, the area with the largest IoU in the corrected image is taken as the recognition area of the corrected image. It can be understood that the area with the largest IoU is the area where the character string to be recognized is located.
S903, taking the item of the recognition area of the template image and the characters of the recognition area of the corrected image as the recognition result.
The recognition area of the corrected image, i.e. the area of the character string "Wang Wu" in fig. 8, corresponds to the recognition area of the template image. After the recognition area of the corrected image is determined, since the character strings in the corrected image have already been recognized by the character recognition model, the characters belonging to the recognition area of the corrected image are known.
In the embodiment of the invention, the item of the recognition area is generated when the recognition area of the template image is established, and the item together with the recognized character string is returned as a structured result.
With continued reference to fig. 8, the recognized text corresponding to the recognition area whose item is "name" in fig. 8 is "Wang Wu".
It is understood that the recognition result includes the item of the recognition area in the template image and the characters of the recognition area of the corrected image.
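The mapping from template items to recognized text described in S902 and S903 can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the data layout (a dict of template recognition areas and a list of (box, text) pairs for the recognized strings) is an assumption made for the example.

```python
def iou(a, b):
    """IoU of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def recognition_result(template_areas, recognized_strings):
    """template_areas: {item: box}; recognized_strings: [(box, text), ...]."""
    result = {}
    for item, area in template_areas.items():
        # S902: the string whose box has the largest IoU with the template
        # recognition area is the recognition area of the corrected image.
        box, text = max(recognized_strings, key=lambda s: iou(area, s[0]))
        # S903: pair the template item with the recognized characters.
        result[item] = text
    return result

template_areas = {"name": (10, 10, 60, 30)}
recognized = [((12, 11, 58, 29), "Wang Wu"), ((12, 40, 58, 58), "ID 123")]
print(recognition_result(template_areas, recognized))  # {'name': 'Wang Wu'}
```

The returned dict is the structured result: each key is an item defined when the template's recognition areas were established, and each value is the text recognized in the corrected image.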
In the embodiment of fig. 9, the position of the character string in the corrected image and the character of the character string in the corrected image are combined with the recognition area in the template image to determine the recognition result of the corrected image.
In the embodiment, the template image of the image to be recognized is utilized to correct the image to be recognized, and a corrected image is obtained; identifying the character string in the corrected image by adopting a character identification model, positioning the position of the character string in the corrected image, and acquiring the characters of the character string in the corrected image; and determining a recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image and the characters of the character string in the corrected image, wherein the recognition result comprises the items of the recognition area and the characters of the recognition area. For different certificates or bills, the recognition models do not need to be trained again, and only corresponding template images need to be set, so that the character recognition efficiency of different certificates or bills can be improved.
Referring to fig. 10, fig. 10 is a schematic diagram of the main structure of a character recognition device according to an embodiment of the present invention. The character recognition device can implement the character recognition method. As shown in fig. 10, the device specifically includes:
the correcting module 1001 is configured to correct the image to be recognized by using the template image of the image to be recognized, so as to obtain a corrected image.
The recognition module 1002 is configured to recognize a character string in the corrected image by using a character recognition model, locate a position of the character string in the corrected image, and obtain characters of the character string in the corrected image.
A determining module 1003, configured to determine a recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image, and the text of the character string in the corrected image, where the recognition result includes an item of the recognition area of the template image and the text of the corrected image in the recognition area.
In an embodiment of the present invention, the correcting module 1001 is specifically configured to match the character strings in the image to be recognized with an anchor character string in the template image to obtain a target character string in the image to be recognized whose edit distance to the anchor character string in the template image is the smallest;
and correcting the image to be recognized by utilizing the position relation between the anchor point character string and the target character string to obtain a corrected image.
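The anchor matching step can be sketched as follows, a minimal illustration assuming plain Python strings; the helper names are hypothetical, and the edit distance is the standard dynamic-programming Levenshtein computation.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_anchor(anchor, candidates):
    """Return the recognized string closest to the template anchor string."""
    return min(candidates, key=lambda s: edit_distance(anchor, s))

# OCR noise ("Nane") still matches the anchor "Name" with distance 1.
print(match_anchor("Name", ["Nane", "Date of birth", "Address"]))  # Nane
```

Matching by minimum edit distance makes the anchor search tolerant of small OCR errors in the image to be recognized.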
In an embodiment of the present invention, the character string in the image to be recognized is obtained by recognizing the image to be recognized by using a character recognition model.
In an embodiment of the present invention, the correcting module 1001 is specifically configured to establish an affine transformation matrix according to a position relationship between the anchor character string and the target character string;
and transforming the image to be identified into the correction image according to the affine transformation matrix.
In one embodiment of the invention, the number of anchor character strings is four or more.
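With four or more matched point pairs, the six parameters of the 2×3 affine matrix are over-determined, so a least-squares fit is one natural way to establish the transformation. The following numpy-based sketch, with illustrative point values, is an assumption about how this could be done and not the patent's implementation.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src points onto dst points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Each point contributes a homogeneous [x, y, 1] row.
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T  # dst ≈ M @ [x, y, 1]

# Hypothetical target-string positions in the image to be recognized...
src = [(0, 0), (100, 0), (0, 50), (100, 50)]
# ...and the corresponding anchor-string positions in the template image.
dst = [(10, 5), (110, 5), (10, 55), (110, 55)]
M = fit_affine(src, dst)
print(np.round(M @ np.array([50, 25, 1])))  # [60. 30.]
```

The fitted matrix can then be applied to every pixel (e.g. via an image-warping routine) to transform the image to be recognized into the corrected image.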
In an embodiment of the present invention, the determining module 1003 is specifically configured to obtain the intersection-over-union (IoU) of the character strings in the corrected image based on the recognition area in the template image and the area where each character string in the corrected image is located, where the area where a character string is located is determined by the position of the character string;
determine the recognition area of the corrected image according to the IoU of the character strings in the corrected image;
and take the item of the recognition area of the template image and the characters of the recognition area of the corrected image as the recognition result.
In an embodiment of the present invention, the determining module 1003 is specifically configured to take the area with the largest IoU in the corrected image as the recognition area of the corrected image.
Fig. 11 illustrates an exemplary system architecture 1100 of a method of text recognition or a device of text recognition to which embodiments of the present invention may be applied.
As shown in fig. 11, the system architecture 1100 may include terminal devices 1101, 1102, 1103, a network 1104, and a server 1105. The network 1104 is a medium to provide communication links between the terminal devices 1101, 1102, 1103 and the server 1105. Network 1104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 1101, 1102, 1103 to interact with a server 1105 over a network 1104 to receive or send messages or the like. Various messaging client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (examples only) may be installed on the terminal devices 1101, 1102, 1103.
The terminal devices 1101, 1102, 1103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 1105 may be a server that provides various services, such as a backend management server (for example only) that provides support for shopping-like websites browsed by users using the terminal devices 1101, 1102, 1103. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the method for character recognition provided by the embodiment of the present invention is generally executed by the server 1105, and accordingly, the device for character recognition is generally disposed in the server 1105.
It should be understood that the number of terminal devices, networks, and servers in fig. 11 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 12, shown is a block diagram of a computer system 1200 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU)1201, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the system 1200 are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output portion 1207 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A driver 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is mounted into the storage section 1208 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 1201.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a correction module, a recognition module, and a determination module. The names of these modules do not in some cases constitute a limitation of the module itself; for example, the correction module may also be described as "a module that corrects the image to be recognized by using the template image".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise:
correcting the image to be recognized by utilizing a template image of the image to be recognized to obtain a corrected image;
recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image, and acquiring characters of the character string in the corrected image;
determining the recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image and the characters of the character string in the corrected image, wherein the recognition result comprises the items of the recognition area of the template image and the characters of the corrected image in the recognition area.
According to the technical scheme of the embodiment of the invention, the template image of the image to be identified is utilized to correct the image to be identified, and a corrected image is obtained; identifying the character string in the corrected image by adopting a character identification model, positioning the position of the character string in the corrected image, and acquiring the characters of the character string in the corrected image; and determining a recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image and the characters of the character string in the corrected image, wherein the recognition result comprises the items of the recognition area and the characters of the recognition area. For different certificates or bills, the recognition models do not need to be trained again, and only corresponding template images need to be set, so that the character recognition efficiency of different certificates or bills can be improved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of character recognition, comprising:
correcting the image to be recognized by utilizing a template image of the image to be recognized to obtain a corrected image;
recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image, and acquiring characters of the character string in the corrected image;
determining the recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image and the characters of the character string in the corrected image, wherein the recognition result comprises the items of the recognition area of the template image and the characters of the corrected image in the recognition area.
2. The method for character recognition according to claim 1, wherein the correcting the image to be recognized by using the template image of the image to be recognized to obtain a corrected image comprises:
matching character strings in the image to be recognized with an anchor character string in the template image to obtain a target character string in the image to be recognized whose edit distance to the anchor character string in the template image is the smallest;
and correcting the image to be recognized by utilizing the position relation between the anchor point character string and the target character string to obtain a corrected image.
3. The method of claim 2, wherein the character string in the image to be recognized is obtained by recognizing the image to be recognized by using a character recognition model.
4. The character recognition method according to claim 2, wherein the correcting the image to be recognized by using the position relationship between the anchor character string and the target character string to obtain a corrected image comprises:
establishing an affine transformation matrix according to the position relation between the anchor point character string and the target character string;
and transforming the image to be identified into the correction image according to the affine transformation matrix.
5. The method of character recognition according to claim 2, wherein the number of the anchor character strings is four or more.
6. The method for character recognition according to claim 1, wherein the determining a recognition result of the corrected image based on the recognition area in the template image, the position of the character string in the corrected image, and the character of the character string in the corrected image, the recognition result including the item of the recognition area of the template image and the character of the corrected image in the recognition area comprises:
obtaining the intersection over union of the character strings in the corrected image based on the recognition area in the template image and the area where each character string in the corrected image is located, wherein the area where a character string is located is determined by the position of the character string;
determining the recognition area of the corrected image according to the intersection over union of the character strings in the corrected image;
and taking the item of the recognition area of the template image and the characters of the recognition area of the corrected image as the recognition result.
7. The character recognition method of claim 6, wherein the determining the recognition area of the corrected image according to the intersection over union of the character strings in the corrected image comprises:
taking the area with the largest intersection over union in the corrected image as the recognition area of the corrected image.
8. An apparatus for character recognition, comprising:
the correction module is used for correcting the image to be recognized by utilizing the template image of the image to be recognized to obtain a corrected image;
the recognition module is used for recognizing the character string in the corrected image by adopting a character recognition model, positioning the position of the character string in the corrected image and acquiring the characters of the character string in the corrected image;
a determining module, configured to determine a recognition result of the corrected image based on a recognition area in the template image, a position of the character string in the corrected image, and a text of the character string in the corrected image, where the recognition result includes an item of the recognition area of the template image and the text of the corrected image in the recognition area.
9. The text recognition device according to claim 8, wherein the correction module is specifically configured to match character strings in the image to be recognized with an anchor character string in the template image to obtain a target character string in the image to be recognized whose edit distance to the anchor character string in the template image is the smallest;
and correcting the image to be recognized by utilizing the position relation between the anchor point character string and the target character string to obtain a corrected image.
10. An electronic device for character recognition, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010789024.5A 2020-08-07 2020-08-07 Method, device and equipment for character recognition and computer readable medium Pending CN113780267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789024.5A CN113780267A (en) 2020-08-07 2020-08-07 Method, device and equipment for character recognition and computer readable medium


Publications (1)

Publication Number Publication Date
CN113780267A true CN113780267A (en) 2021-12-10

Family

ID=78835107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789024.5A Pending CN113780267A (en) 2020-08-07 2020-08-07 Method, device and equipment for character recognition and computer readable medium

Country Status (1)

Country Link
CN (1) CN113780267A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination