WO2021072879A1 - Method and apparatus for extracting target text in certificate, device, and readable storage medium - Google Patents


Info

Publication number
WO2021072879A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
perspective transformation
anchor point
perspective
Prior art date
Application number
PCT/CN2019/118469
Other languages
French (fr)
Chinese (zh)
Inventor
黄文韬
刘鹏
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021072879A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/28 - Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 - Character recognition specially adapted to the type of the alphabet, of Kanji, Hiragana or Katakana characters
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • FIG. 6 is a schematic diagram of an image correction process through perspective transformation provided by an embodiment of this application.
  • The position of the fixed field on the template image, that is, the area covered by the fixed field on the template image, is called the anchor position.
  • The content of the fixed field, that is, the specific meaning the fixed field describes, such as the "name" described by the "name" field on an identity card image or the "citizen ID number" described by the "citizen ID number" field, is called the anchor text of the text anchor.
  • Fig. 5(a) and Fig. 5(b) are schematic diagrams of the perspective transformation principle provided by the embodiment of the application.
  • S206 Perform text recognition on the text at the projection position on the perspective transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
  • S703 Obtain a second anchor point position corresponding to the first anchor point position on the detection image through the second anchor point text based on the text recognition model.
  • the processor 1302 is configured to run a computer program 13032 stored in the memory, so as to implement the method for extracting the target text in the certificate in the embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)

Abstract

Provided are a method and apparatus for extracting target text in a certificate, a computer device, and a computer-readable storage medium. The embodiments of this application belong to the technical field of text recognition. The method comprises: acquiring a template image and a detection image that belong to the same certificate type, wherein the template image is marked with a text anchor and a target frame position, and the text anchor comprises first anchor text; acquiring, in a first preset manner, a feature point matching relationship between the anchor position on the template image and the anchor position on the detection image; solving a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator; performing perspective transformation on the detection image by means of the perspective transformation operator to obtain a perspective-transformed image; acquiring the projection position of the target frame position on the perspective-transformed image by means of the perspective transformation operator; and performing text recognition on the text at the projection position by means of a text recognition model to obtain the target text of the detection image.

Description

Method, apparatus, device, and readable storage medium for extracting target text in a certificate
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 15, 2019, with application number 201910979567.0 and entitled "Method, apparatus, device, and readable storage medium for extracting target text in a certificate", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of text recognition, and in particular to a method, apparatus, computer device, and computer-readable storage medium for extracting target text in a certificate.
Background
Text recognition technology has made great progress since deep learning matured: it can both locate the position of text within an image and recognize the located text. However, many deep learning models used for text recognition perform well on standard frontal images but adapt poorly to images whose viewing angle is rotated or transformed relative to a standard frontal image, and cannot recognize them well. Most pictures taken in daily life are not standard frontal images and exhibit varying degrees of perspective change. To achieve a good recognition result on such images, some screening, cropping, and rotation transformation is required. In traditional technology, screening, cropping, and rotating images are usually completed through manual preprocessing. However, users sometimes need to extract text from large batches of image data, for example extracting the owner's name, date of birth, and other information from a pile of driver's licenses. Automated batch extraction is difficult to achieve with the text recognition of traditional technology alone: even if the user manually specifies a recognition area, each picture differs somewhat in position, and the position of the target field on each picture also varies, and text recognition by itself can currently hardly eliminate the influence of these positional differences. If the data is instead preprocessed manually to eliminate the positional differences, the operation is difficult and the cost is excessive, resulting in low efficiency of image recognition.
Summary
The embodiments of this application provide a method, apparatus, computer device, and computer-readable storage medium for extracting target text in a certificate, which can solve the problem in traditional technology of low efficiency when a text recognition model is used to extract target text from a certificate.
In a first aspect, an embodiment of this application provides a method for extracting target text in a certificate. The method includes: obtaining a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text, where the template image is marked with a text anchor and a target frame position, the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate; according to the first anchor text and based on a text recognition model, obtaining, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image; solving a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and performing text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extracting the recognized text to obtain the target text of the detection image.
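The transformation-matrix solving and projection steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual code: it solves the 3x3 perspective transformation operator (a homography) from four or more matched anchor feature points via the direct linear transform, in the manner of OpenCV's cv2.findHomography, and projects target-frame corners through it in the manner of cv2.perspectiveTransform. The function names are hypothetical.

```python
import numpy as np

def solve_perspective_operator(src_pts, dst_pts):
    """Solve the 3x3 perspective transformation operator H such that
    dst ~ H @ src, from >= 4 matched feature points (direct linear transform)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Each correspondence contributes two linear constraints on the
        # flattened H (9 unknowns, defined up to scale).
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of A (last right-singular vector) is H up to scale.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Project 2D points (e.g. target-frame corners) through the operator H."""
    pts_h = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With the operator solved from anchor-region matches, projecting the template's target-frame corners through it yields the region of the perspective-transformed detection image that is handed to the text recognition model.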
In a second aspect, an embodiment of this application further provides an apparatus for extracting target text in a certificate, including: a first obtaining unit, configured to obtain a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text, where the template image is marked with a text anchor and a target frame position, the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate; a second obtaining unit, configured to obtain, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image; a solving unit, configured to solve a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; a transformation unit, configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; a projection unit, configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and a recognition unit, configured to perform text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
In a third aspect, an embodiment of this application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the method for extracting target text in a certificate.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method for extracting target text in a certificate.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an application scenario of a method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the technical feature relationships of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of feature point extraction and feature point matching in the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 5(a) and FIG. 5(b) are schematic diagrams of the perspective transformation principle provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of correcting an image through perspective transformation provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of another embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 8 is a simplified schematic flowchart of the technical feature relationships in the method for extracting target text in a certificate provided by the embodiment shown in FIG. 7;
FIG. 9 is a schematic diagram of the perspective transformation operator in the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 10(a) to FIG. 10(i) are schematic diagrams of graphic transformation in one embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 11(a) to FIG. 11(i) are schematic diagrams of graphic transformation in another embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 12 is a schematic block diagram of an apparatus for extracting target text in a certificate provided by an embodiment of this application; and
FIG. 13 is a schematic block diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Please refer to FIG. 1, a schematic diagram of an application scenario of the method for extracting target text in a certificate provided by an embodiment of this application. The application scenario includes: (1) a user, who marks the text anchor and the target frame position on the template image through an input device or an input component of a computer device; and (2) a terminal, which executes the steps of the method for extracting target text in a certificate. The terminal may be a computer device such as a smartphone, smart watch, laptop, tablet, or desktop computer.
The working process of each subject in FIG. 1 is as follows. The user marks the text anchor and the target frame position on the template image, and stores the template image or uploads it to the system for the terminal to obtain. The terminal obtains a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text; the template image is marked with a text anchor and a target frame position, where the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate. According to the first anchor text and based on a text recognition model, the terminal obtains, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in its anchor position on the detection image; solves a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; performs perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; obtains, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and performs text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extracts the recognized text to obtain the target text of the detection image.
It should be noted that FIG. 1 only shows a desktop computer as the terminal; in actual operation, the type of terminal is not limited to that shown in FIG. 1. The above application scenario of the method for extracting target text in a certificate is only used to illustrate, not to limit, the technical solution of this application.
Please refer to FIG. 2, a schematic flowchart of the method for extracting target text in a certificate provided by an embodiment of this application. The method is applied to the terminal in FIG. 1 to complete all or part of the functions of the method for extracting target text in a certificate. As shown in FIG. 2, the method includes the following steps S201-S206:
S201: Obtain a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text. The template image is marked with a text anchor and a target frame position, where the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate.
A text anchor is a fixed field defined by the user on the template image; a fixed field is a field that does not change across different samples of the same type of certificate, such as the fixed fields "name" or "citizen ID number" on an identity card. Since a fixed field on the template image is defined by its position and its content, with the position of the fixed field serving as the anchor position and the content of the fixed field serving as the anchor text, a text anchor contains an anchor position and anchor text. More specifically, the position of the fixed field on the template image, that is, the area covered by the fixed field on the template image, is called the anchor position; the content of the fixed field, that is, the specific meaning the fixed field describes, such as the "name" described by the "name" field on an identity card image or the "citizen ID number" described by the "citizen ID number" field, is called the anchor text of the text anchor.
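The definitions in the preceding paragraph (a text anchor pairing an anchor position with anchor text, plus a separate target frame position on the template) can be captured in a small data structure. The type and field names below are hypothetical, chosen only to mirror those definitions:

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x1, y1, x2, y2): upper-left and lower-right corner coordinates of a rectangle
Box = Tuple[int, int, int, int]

@dataclass
class TextAnchor:
    position: Box  # anchor position: area covered by the fixed field on the template image
    text: str      # anchor text: content of the fixed field, e.g. "name"

@dataclass
class CertificateTemplate:
    anchors: List[TextAnchor]  # fixed fields marked on the template image
    target_box: Box            # target frame position: where the text to extract sits
```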
The target frame position is the position, defined by the user on the template image, of the area covered by the text content to be extracted from the certificate. For example, the position of the "name" field on an identity card is the anchor, while the position on the template image of the specific value of the name, such as "Zhang San", is the target frame position. The position of the target frame is determined by the user according to the area covered by the text to be extracted.
Specifically, since the obtained detection image is often not a standard frontal image matching the viewing angle of the template image, before a deep learning model is applied to recognize the detection image and extract its text content, the angle of the detection image to be recognized needs to be corrected so that the image is rotated to a suitable angle, improving the deep learning model's recognition of the detection image's content.
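The correction described here amounts to resampling the detection image through a perspective transformation. As a sketch under stated assumptions (not the embodiment's implementation), a nearest-neighbour perspective warp of the kind OpenCV's cv2.warpPerspective performs can be written in NumPy: each output pixel is mapped back through the inverse of the perspective operator H and sampled from the source image.

```python
import numpy as np

def warp_perspective(img, H, out_shape):
    """Nearest-neighbour perspective warp of a single-channel image.
    H maps source coordinates to output coordinates; we invert it and,
    for each output pixel, sample the corresponding source pixel."""
    Hinv = np.linalg.inv(H)
    h, w = out_shape
    out = np.zeros((h, w), dtype=img.dtype)
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ coords                       # back-map all output pixels at once
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out.ravel()[valid] = img[sy[valid], sx[valid]]  # out-of-range pixels stay 0
    return out
```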
To accurately correct the angle of the detection image, the embodiments of this application exploit the fact that fixed fields exist in certificate images, using the content of a fixed field as an intermediate medium to obtain a perspective transformation operator for rotating the detection image, so that the image can subsequently be perspective-transformed. Therefore, it is first necessary to obtain a template image and a detection image that belong to the same certificate type; the template image carries a custom-marked text anchor and target frame position, where the text anchor includes a first text anchor position and first anchor text. For example, the user may frame-select the fixed field on the template image in advance, that is, the text anchor marked on the template image; the fixed text content of the framed area may be recognized through text recognition, or the content of the fixed field (that is, the first anchor text) may be entered by the user. The text area identical to the fixed text content selected on the template image is then found in the input detection image, and feature points are extracted from and matched between the text area found on the detection image and the corresponding framed area on the template image. By first extracting the local areas where the unchanging fields of the certificate image are located and then matching those local areas, only part of the image is matched, which effectively reduces the influence and interference of spuriously similar areas elsewhere in the whole image, improving the quality and efficiency of extracting and matching local areas of the template image and the detection image, and the accuracy of feature point extraction and matching. For example, please refer to FIG. 3, a schematic diagram of the technical feature relationships of the method for extracting target text in a certificate provided by an embodiment of this application. As shown in FIG. 3, A, C, and F are identical fixed fields in the certificate; A1, C1, and F1 are the anchor positions of the fixed fields A, C, and F; and A2, C2, and F2 are the anchor texts of the fixed fields A, C, and F. The correspondence among A1, C1, and F1 is derived through A2, C2, and F2, and feature points are extracted and matched within the areas where A1, C1, and F1 are located. Since feature points are extracted and matched only within these areas, the influence of spuriously similar areas in the whole image is effectively reduced, improving the quality and efficiency of local-area extraction and matching between the template image and the detection image.
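The anchor-region matching just described can be sketched as follows. This is a simplified, hypothetical illustration: the descriptor arrays are assumed to come from a detector such as SIFT run only inside the matched anchor regions (e.g. A1 on the template and its counterpart on the detection image), and nearest-neighbour matching with a ratio test pairs them. Because only anchor-region descriptors participate, similar-looking areas elsewhere in the image cannot contribute false matches.

```python
import numpy as np

def match_region_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with a ratio test.
    desc_a, desc_b: (N, d) descriptor arrays extracted from corresponding
    anchor regions of the template and detection images."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        # Accept only when the best match is clearly better than the runner-up.
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```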
Further, the embodiments of this application allow the user to define a custom template image for a certificate so as to perform text recognition on a designated target in the certificate image. The anchor and target frame positions on the template image can be customized by developers or users. The anchor text can be obtained directly through manual input; for example, the text content of fixed fields on an identity card such as "name", "date of birth", and "issuing authority" can be obtained directly from the fixed field content on the certificate. The anchor position and target frame position can be obtained through a custom program, such as using an OpenCV mouse event to obtain the position of the mouse pointer, yielding the anchor position and target frame position drawn manually with the mouse on the template image. For example, the positions of fixed fields such as "name" and "date of birth" on an identity card can be obtained through an OpenCV mouse event that reads the mouse pointer position, giving the coordinates of the anchor position and the target frame position drawn manually on the identity card image. A position can be described by the coordinates of the upper-left and lower-right corners of a rectangle, and the anchor and target frame positions are then defined in a programming language.
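The mouse-based marking described above can be sketched without a GUI. The handler below has the signature OpenCV passes to a cv2.setMouseCallback handler, and the event codes 1 and 4 correspond to cv2.EVENT_LBUTTONDOWN and cv2.EVENT_LBUTTONUP; the class itself is a hypothetical illustration of recording each press/release pair as a rectangle described by its upper-left and lower-right corners:

```python
# OpenCV's event codes: cv2.EVENT_LBUTTONDOWN == 1, cv2.EVENT_LBUTTONUP == 4
EVENT_LBUTTONDOWN, EVENT_LBUTTONUP = 1, 4

class BoxMarker:
    """Accumulates rectangles from mouse press/release pairs, the way a
    cv2.setMouseCallback handler would record anchor or target-frame boxes."""
    def __init__(self):
        self.start = None
        self.boxes = []   # each box is (x1, y1, x2, y2): upper-left, lower-right

    def on_mouse(self, event, x, y, flags=0, param=None):
        if event == EVENT_LBUTTONDOWN:
            self.start = (x, y)
        elif event == EVENT_LBUTTONUP and self.start is not None:
            x0, y0 = self.start
            # Normalize so the stored corners are upper-left and lower-right.
            self.boxes.append((min(x0, x), min(y0, y), max(x0, x), max(y0, y)))
            self.start = None
```

In a real marking tool, this handler would be registered with `cv2.setMouseCallback(window_name, marker.on_mouse)` over the displayed template image.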
Further, for a template image that the user is editing, if the system has a corresponding record and stored data, the template image data can be retrieved directly from back-end storage. If there is no such record, the image and its annotation information are uploaded together after the user completes the annotation process, and the computer device then obtains a template image on which the anchor point position and target frame position have been set (defined) by the user.
S202. According to the first anchor text and based on a text recognition model, obtain, in a first preset manner, the feature point matching relationship between the feature points contained in the anchor point position of the first anchor text on the template image and those contained in its anchor point position on the detection image, where an anchor point position is the position of the first anchor text on the corresponding image.
Here, a text recognition model, also called a character recognition model ("text recognition" in English), is a model that recognizes characters automatically by computer, for example OCR (Optical Character Recognition).
Specifically, according to the first anchor text and based on the text recognition model, the feature point matching relationship between the feature points contained in the anchor point position of the first anchor text on the template image and those contained in its anchor point position on the detection image is obtained in a first preset manner, where the anchor point position is the position of the first anchor text on the corresponding image: the anchor point position on the template image is the position of the first anchor text on the template image, and the anchor point position on the detection image is its position on the detection image. Two situations are possible:
1) After the computer device obtains a template image and a detection image belonging to the same certificate type, the detection image is transformed once: the detection image is subjected to perspective transformation through a perspective transformation operator to obtain a perspective-transformed image whose viewing angle matches that of the template image, and the projection position of the target frame on the perspective-transformed image is obtained through the perspective transformation operator. For example, referring again to Figure 3, anchor texts A2 and C2 are the same field. From the identity of anchor texts A2 and C2, the correspondence between anchor point positions A1 and C1 is obtained; the feature points of A1 and C1 are extracted with a feature point extraction algorithm, and the feature point matching relationship between A1 and C1 is obtained with a feature point matching algorithm. From this matching relationship, the perspective transformation operator that rotates the detection image into a standard frontal image is computed; the detection image is perspective-transformed through this operator into a perspective-transformed image whose viewing angle matches the template image; the projection position of the target frame on the perspective-transformed image is obtained through the operator; and the region at that projection position is recognized with the text recognition model to extract the target text.
2) On the basis of situation (1), the detection image is transformed a second time: the first perspective-transformed image is subjected to perspective transformation through a second perspective transformation operator to obtain a second perspective-transformed image, and the projection position of the target frame on the second perspective-transformed image is computed through the second perspective transformation operator. Referring again to Figure 3, from the identity of anchor texts A2, C2, and F2, the correspondence among anchor point positions A1, C1, and F1 is obtained. From the feature point matching relationship between A1 and C1, a first perspective transformation operator that rotates the detection image into a standard frontal image is obtained, and the detection image is converted through it into a first perspective-transformed image whose viewing angle matches the template image. Then, from the feature point matching relationship between A1 and F1, a second perspective transformation operator describing the perspective transformation between the template image and the detection image E is obtained; the target position B1 is projected onto the detection image E through the second operator, yielding the position (text area) H1 at which target text recognition is performed; text recognition is performed on area H1 with the text recognition model, and the target text H2 is extracted.
Further, refer to Figure 4, a schematic flowchart of feature point extraction and feature point matching in the method for extracting target text in a certificate provided by an embodiment of this application. As shown in Figure 4, feature point extraction and matching are performed on the template image and the detection image. Perspective transformation requires finding corresponding points in the pre- and post-transformation images in order to compute the matrix used for the perspective transformation as the transformation operator. To find this correspondence, embodiments of the present application use a feature point extraction algorithm and a feature point matching algorithm so that matching is performed automatically under the algorithm's uniform criterion. In embodiments of the present application, feature points are extracted from the corresponding anchor points in the template image and the detection image with the feature point extraction algorithm, the feature points are then matched with the feature point matching algorithm, and the perspective transformation operator is computed from the resulting matching relationship.
The feature point extraction algorithm compares each point of the image with its surrounding points and computes a feature value for each point according to the criterion the algorithm defines, where the criterion means the method of computing the feature value: for example, the SIFT algorithm (Scale-Invariant Feature Transform) or the SURF algorithm (Speeded-Up Robust Features) may be used. If a point is a maximum or minimum within its region, it can be regarded as a feature point. Each feature point is then assigned a high-dimensional orientation descriptor reflecting its gradient information in different directions; this serves as the point's feature parameter, or feature vector, i.e., the point is described with different parameters from different angles. It should be noted that whether two feature points match is not a matter of the points occupying the same position in their respective images; rather, matched feature points have similar properties (similar attributes) in local regions of their respective images, and are corresponding points that coincide after the perspective transformation. Referring again to Figure 3, suppose anchor point A contains a feature point Am and anchor point F contains a feature point Fn, with m and n integers. Am and Fn are matched feature points not because their positions in the respective images are the same (e.g., both being corresponding vertices of the rectangle where the figure is located), but because their feature values, computed under a uniform criterion such as SIFT or SURF, satisfy the matching requirement after computing the cosine similarity of the feature vectors or the distance between the two feature vectors. After the feature points are extracted, the matching relationships between them are determined with the feature point matching algorithm; for example, the cosine similarity between the feature vectors of two points, or the distance between the two feature vectors, can be used to judge whether the feature points match.
Further, when performing feature point matching, matched feature points are points whose surroundings vary similarly. For example, the cosine similarity of the feature vectors between points on the template image and points on the detection image can be computed, and the points sorted by cosine similarity. Suppose feature point A exists on the template image and, after computing similarities with the feature points on the detection image, the point with the highest cosine similarity on the detection image is A1 and the point with the second highest is A2. If the similarity between the feature vectors of A and A1 is 0.98 and the similarity between A and A2 is 0.97, such closely comparable values indicate that feature point A has no matching feature point on the detection image, and A does not participate in the subsequent computation of the perspective transformation operator. If instead the similarity between A and A1 is 0.98 while that between A and A2 is 0.68, A and A1 are judged to be matched feature points, and both are included in the subsequent computation of the perspective transformation operator. In other words, a threshold is set in this process: for the feature point being judged, the difference between the similarities of its first and second most similar points is computed. When this difference is not less than the preset threshold, a unique match for the feature point is deemed to have been found and both points are included in subsequent computation; conversely, if the difference is less than the preset threshold, the feature point is deemed to have no unique matching point and is excluded from subsequent computation.
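The uniqueness test above can be sketched as follows. This is a minimal illustration of the cosine-similarity gap rule described in the text, not the patent's implementation; the 0.2 gap threshold is an assumed placeholder value:

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_feature(query, candidates, gap_threshold=0.2):
    """Return the index of the unique best match for `query` among
    `candidates`, or None when the best and second-best cosine
    similarities are too close (difference below gap_threshold)."""
    sims = sorted(
        ((cosine_similarity(query, c), i) for i, c in enumerate(candidates)),
        reverse=True,
    )
    if len(sims) < 2:
        return sims[0][1] if sims else None
    best, second = sims[0], sims[1]
    if best[0] - second[0] >= gap_threshold:
        return best[1]
    return None  # ambiguous: excluded from operator computation
```

A clear winner is kept; two near-identical similarities (like the 0.98 vs 0.97 case in the text) yield no match.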
S203. According to the feature point matching relationship, solve through a transformation matrix to obtain the perspective transformation operator that performs the perspective transformation on the detection image.
Specifically, solving through a transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator means finding the positions of corresponding points on the input detection image and the given template image; once at least four pairs of matched feature points are found, the perspective transformation operator needed to rotate the detection image into agreement with the viewing angle of the template image can be computed.
Further, the perspective transformation operator can be computed in combination with full-text recognition. The computation has the form ax = b, where a and b are the coordinates of known feature points and x is the operator, a matrix comprising nine values.
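The system ax = b can be sketched as follows. Fixing the ninth entry of the 3x3 operator to 1 leaves eight unknowns, so the four point correspondences mentioned above give exactly the eight equations needed. This is a minimal pure-Python illustration of the standard linear formulation, not the patent's implementation:

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting for an n x n system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def solve_perspective_operator(src, dst):
    """Solve ax = b for the eight unknown entries of the 3x3 operator
    (the ninth fixed to 1) from four (x, y) -> (u, v) correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1)
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        # v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

For a unit square mapped to a square translated by (2, 3), the recovered operator is the corresponding translation matrix.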
S204. Perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image whose viewing angle matches that of the template image.
Specifically, the computer device solves through the transformation matrix to obtain the perspective transformation operator that perspective-transforms the detection image, and then applies perspective transformation technology to transform the detection image through the operator into a standard frontal image whose viewing angle matches that of the template image. The detection image is translated and rotated in three-dimensional space through the perspective transformation operator, i.e., its coordinates are moved in three-dimensional space, and its projection onto the two-dimensional plane is then taken, so that the detection image is automatically rectified, according to the template image, into a standard frontal image with the same viewing angle as the template image. Compared with the traditional technique of manually changing the viewing angle of the detection image, this greatly reduces labor and also improves the accuracy of text recognition. The perspective transformation may also proceed by converting the coordinates of the image in three-dimensional space, one by one, into coordinates on the two-dimensional plane through the perspective transformation operator, so as to obtain a standard frontal image of the detection image. Perspective transformation is a method in which a two-dimensional picture is rotated in three-dimensional space and then projected onto a two-dimensional plane to form a two-dimensional figure; more intuitively, it may be called "spatial transformation" or "three-dimensional coordinate transformation".
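Warping the whole detection image can be sketched by inverse mapping: for each output pixel, apply the operator mapping output coordinates back to source coordinates and sample the source image. This toy nearest-neighbour version works on a 2-D list of pixel values; a real system would use an optimized routine such as OpenCV's `cv2.warpPerspective` with proper interpolation:

```python
def warp_nearest(src, h, w, H_inv):
    """Toy perspective warp. `src` is a 2-D list of pixel values; `H_inv`
    is the 3x3 operator mapping output (x, y) back to source coordinates;
    pixels that map outside the source become 0."""
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map (x, y, 1) through H_inv, then divide by Z.
            X = H_inv[0][0] * x + H_inv[0][1] * y + H_inv[0][2]
            Y = H_inv[1][0] * x + H_inv[1][1] * y + H_inv[1][2]
            Z = H_inv[2][0] * x + H_inv[2][1] * y + H_inv[2][2]
            sx, sy = round(X / Z), round(Y / Z)
            if 0 <= sy < len(src) and 0 <= sx < len(src[0]):
                out[y][x] = src[sy][sx]
    return out
```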
Further, refer to Figures 5(a) and 5(b), schematic diagrams of the perspective transformation principle provided by an embodiment of the application. First, the value of every point (x, y) of the two-dimensional image in the third dimension of three-dimensional space is taken as a fixed value, e.g., z = 1, so that every two-dimensional point becomes a three-dimensional point (x, y, 1). Each point is then multiplied by a 3x3 transformation matrix to obtain the rotated point (X, Y, Z); a 3x3 matrix can describe the rigid-body transformation of the image in three-dimensional space, which is exactly the transformation required in embodiments of the present application, whereas a matrix smaller than 3x3 cannot describe this relationship. After the image is rotated in three-dimensional space, dividing each point by its z-coordinate converts every point to (X/Z, Y/Z, 1), projecting the points of the three-dimensional image back onto the plane z = 1 to obtain the point (x', y'), where x' = X/Z and y' = Y/Z. No single parameter of the 3x3 matrix has a specific meaning; the nine parameters together constitute the perspective transformation operator. The 3x3 transformation matrix has nine values, but since only the projection of the transformed three-dimensional image onto the two-dimensional plane is ultimately needed, any one of the nine values can be set to 1, leaving only eight unknowns when solving for the operator. Solving therefore requires four groups of feature points as mapping points, i.e., four matching relationships, which exactly determine one perspective transformation: at least the four matching relationships corresponding to four groups of feature points are required to obtain the eight unknowns. Although at least four matching relationships are required, in practice there are usually tens or hundreds of feature points, and the operator with minimum error is determined by minimizing an error function over the extracted feature points.
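The mapping just described can be written compactly, using the same symbols as in the text:

```latex
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{X}{Z}, \quad y' = \frac{Y}{Z}
```

With h33 fixed to 1, each point correspondence (x, y) to (x', y') contributes two equations, so four correspondences determine the eight remaining entries.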
After this transformation, the rotation of the image in three-dimensional space and the projection of the three-dimensional image back into two-dimensional space are complete, so that the image can be transformed between different viewing angles: images at non-standard viewing angles are transformed into standard-viewing-angle images matching the template image, from which text at a specified position can be extracted during text recognition. Refer to Figure 6, a schematic flowchart of rectifying an image through perspective transformation provided by an embodiment of the application. As shown in Figure 6, to realize this transformation, a 3x3 transformation matrix must be multiplied with (x, y, 1) as in Figure 5, and to find such a matrix, at least four corresponding feature points between the detection image to be transformed and the template image must be found.
S205. Obtain the projection position of the target frame on the perspective-transformed image through the perspective transformation operator.
Specifically, the computer device solves through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator that perspective-transforms the detection image, transforms the detection image through the operator into a perspective-transformed image whose viewing angle matches the template image, and can then obtain the projection position of the target frame on the perspective-transformed image through the operator. For example, referring again to Figure 3, the projection position H1 of the target frame position B1 on the perspective-transformed image is obtained through the perspective transformation operator.
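Projecting a target frame amounts to mapping its corner coordinates through the operator and taking the bounding rectangle of the results. A minimal sketch (the translation matrix used in the example is a hypothetical stand-in for an operator computed from real anchor matches):

```python
def project_point(H, x, y):
    # Map (x, y, 1) through the 3x3 operator and divide by Z.
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    Z = H[2][0] * x + H[2][1] * y + H[2][2]
    return X / Z, Y / Z

def project_box(H, box):
    """Project a target frame given as (left, top, right, bottom) and
    return the axis-aligned bounding box of its projected corners."""
    l, t, r, b = box
    corners = [project_point(H, x, y)
               for x, y in [(l, t), (r, t), (r, b), (l, b)]]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)
```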
S206. Perform text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
Text recognition here is the recognition of characters, and character recognition is a multi-class classification task. The text recognition model in embodiments of the present application is essentially a combination of two models: a position detection model that first detects the position of the text, and a character recognition model that then recognizes the characters.
Specifically, the computer device obtains the projection position of the target frame on the perspective-transformed image through the perspective transformation operator, and uses the text recognition model to recognize and extract the text inside the region identified by the target frame projected onto the transformed image, thereby obtaining the target text of the detection image. By combining the two traditional computer vision techniques of perspective transformation and feature point matching with full-text recognition, the input image is converted to the same viewing angle as the template image before text in the specified area is recognized and extracted. For example, referring again to Figure 3: in this embodiment, the anchor point position A1, anchor text A2, and target frame position B1 of the template image have been obtained, and the goal is to precisely extract the text content of the region on the detection image corresponding to target frame position B1. Since anchor texts A2, C2, and F2 are the same field, the text recognition process mainly comprises: 1) from the identity of anchor texts A2, C2, and F2, obtaining the correspondence among anchor point positions A1, C1, and F1; from the feature point matching relationship between A1 and C1, obtaining the operator D that rotates the detection image into a standard frontal image; and rotating the detection image into a standard frontal image E conforming to the template image; 2) from the feature point matching relationship between A1 and F1, obtaining the operator G of the perspective transformation between the template image and the detection image E; projecting the target position B1 through G onto the detection image E to obtain the position (text area) H1 at which target text recognition is performed; and performing text recognition on area H1 with the text recognition model to extract the target text H2.
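The two-stage flow (operator D, then operator G, then projection of B1 to H1 and OCR of H1 to get H2) can be sketched as a pipeline. All heavy steps are injected as callables with hypothetical interfaces; the dictionary-based "images" below are toy stand-ins, not the patent's data structures:

```python
def extract_target_text(detect, template, solve_operator, warp, project, ocr):
    """Sketch of the two-stage flow: operator D rectifies the detection
    image into standard frontal image E, operator G aligns E with the
    template, the target frame B1 is projected to text area H1, and
    H1 is recognized to obtain the target text H2."""
    # Stage 1: operator D from the A1 <-> C1 feature-point matches.
    D = solve_operator(template["A1"], detect["C1"])
    E = warp(detect, D)                      # standard frontal image E
    # Stage 2: operator G from the A1 <-> F1 matches against E.
    G = solve_operator(template["A1"], E["F1"])
    H1 = project(G, template["B1"])          # text area H1 on E
    return ocr(E, H1)                        # target text H2
```

With trivial placeholder callables (identity warp, pass-through projection, lookup-table OCR) the function simply routes B1 to the region holding the target text.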
An embodiment of the application provides a method for extracting target text in a certificate. By combining the two traditional computer vision techniques of perspective transformation and feature point matching with full-text recognition, the input image is converted to the same viewing angle as the template image before text in the specified area is recognized and extracted. This avoids the labor and time consumed by writing fully custom logic for the different extraction requirements of each kind of certificate, greatly reducing cost, and on the other hand avoids the imprecise extraction caused by overly generic logic, improving the accuracy and efficiency of text recognition.
Refer to Figure 7, a schematic flowchart of another embodiment of the method for extracting target text in a certificate provided by an embodiment of the application, comprising the following steps:
S701. Obtain a template image and a detection image, used for extracting the target text, belonging to the same certificate type, the template image being annotated with a text anchor and a target frame position, where the text anchor comprises a first anchor text and a first anchor point position.
Specifically, in this embodiment the text anchor further includes the first anchor point position; the user only needs to preset the first anchor text, first anchor point position, and target frame position contained in the text anchor, and the computer device obtains a template image and a detection image belonging to the same certificate type. For example, refer to Figures 3 and 8, Figure 8 being a simplified schematic flowchart of the relationships among technical features in the target text extraction method of the embodiment shown in Figure 7. As shown in Figures 3 and 8, in this embodiment the anchor point position A1, anchor text A2, and target frame position B1 of the template image are obtained, so that the text content of the region on the detection image corresponding to target frame position B1 can be precisely extracted by means of A1 and A2.
S702. Extract, through a text recognition model, a second anchor text on the detection image that is consistent with the first anchor text.
Specifically, the second anchor text on the detection image that is consistent with the first anchor text on the template image is first extracted through the text recognition model. For example, referring again to Figures 3 and 8, in this embodiment the anchor text C2 on the detection image, identical to the anchor text A2 of the template image, is obtained so that the correspondence between A1 and C1 can be derived from A2 and C2.
S703. Obtain, based on the text recognition model and through the second anchor text, a second anchor point position on the detection image corresponding to the first anchor point position.
Specifically, the second anchor point position on the detection image corresponding to the first anchor point position is obtained through the second anchor text based on the text recognition model. Referring again to Figures 3 and 8: the image to be detected is input into the text recognition model; to find the field region C1 matching the region A1 where the anchor text defined in the template image is located, the model first finds on the detection image the field C2 consistent with field A2, obtains through C2 the field region C1 where C2 is located, and thereby finds the field region C1 matching anchor point position A1, for example the regions A1 and C1 where the "Name" field is located in the template image and detection image of an ID card.
S704. Extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position.
S705. According to the first feature point set and the second feature point set, obtain, based on a feature point matching algorithm, a first feature point matching relationship between the feature points of the first feature point set and those of the second feature point set.
Specifically, the first feature point set contained in the first anchor point position and the second feature point set contained in the second anchor point position are extracted according to the feature point extraction algorithm of step S202, and the first feature point matching relationship between the feature points of the two sets is then obtained based on the feature point matching algorithm of step S202. For example, referring again to Figures 3 and 8, the first feature point set contained in first anchor point position A1 and the second feature point set contained in second anchor point position C1 are extracted with the preset feature point extraction algorithm, and the first feature point matching relationship between the feature points of the two sets is obtained with the feature point matching algorithm.
S706. According to the first feature point matching relationship, solve a transformation matrix to calculate a first perspective transformation operator for performing perspective transformation on the detection image.
Specifically, continuing with FIG. 3 and FIG. 8, the feature points of A1 and C1 are extracted, and the first perspective transformation operator D is calculated from the feature point matching relationship formed by the feature points of A1 and C1.
S707. Perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image whose viewing angle matches that of the template image.
Specifically, continuing with FIG. 3 and FIG. 8, the detection image is transformed by the first perspective transformation operator D into a standard frontal image E whose viewing angle matches the template image; the projection position of the target frame B1 on the first perspective transformation image is obtained through the operator D, and the text inside the region identified by the target frame projected onto the transformed first perspective image is recognized and extracted by the text recognition model to obtain the target text of the detection image.
Further, after the detection image has been transformed by the first perspective transformation operator, the resulting first perspective transformation image may still deviate somewhat in viewing angle from the template image. Therefore, instead of mapping the target frame position unchanged directly onto the first perspective transformation image, a second perspective transformation operator between the template image and the transformed first perspective transformation image is found, and the target frame is projected through this second operator, by perspective transformation, onto the transformed second perspective transformation image. Continuing with FIG. 3, FIG. 7 and FIG. 8, in this embodiment, after the step of performing perspective transformation on the detection image through the first perspective transformation operator to obtain the first perspective transformation image matching the viewing angle of the template image, the method further includes:
S708. Input the first perspective transformation image into the text recognition model, and obtain, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position. S709. Extract a third feature point set contained in the third anchor point position based on the feature point extraction algorithm. S710. According to the first feature point set and the third feature point set, obtain, based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and those in the third feature point set. S711. According to the second feature point matching relationship, solve the transformation matrix to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image. S712. Perform perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image.
Specifically, the transformation process of steps S708 to S712 is similar to that of steps S703 to S707. Continuing with FIG. 3 and FIG. 8, the standard frontal image E corresponding to the transformed first perspective transformation image is input into the text recognition model to find the text region F1 matching the region A1 in which the anchor text A2 of the template image is located. The third feature point set contained in the third anchor point position F1 is extracted based on the feature point extraction algorithm; A1 and F1 undergo feature point extraction and matching according to their respective feature point sets, and the second feature point matching relationship between the feature points of the first feature point set and the third feature point set is obtained based on the feature point matching algorithm. The second perspective transformation operator G is calculated from the second feature point matching relationship, and the first perspective transformation image is transformed by the operator G to obtain the second perspective transformation image, so that the viewing angle of the second perspective transformation image is as consistent as possible with the template image. The projection H1 of the target frame B1 on the second perspective transformation image is finally obtained through the operator G.
S713. Calculate, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image. S714. Input the second perspective transformation image into the text recognition model, perform text recognition on the text at the projection position on the second perspective transformation image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
Specifically, calculating the projection H1 of the target frame B1 on the transformed second perspective transformation image through the second perspective transformation operator G proceeds as follows: the projection H1' of the target frame B1 on the transformed first perspective transformation image is first calculated through the first perspective transformation operator, and perspective transformation is then applied to H1' using the second perspective transformation operator to obtain the projection H1 of the target frame B1 on the second perspective transformation image. The text inside the region H1 identified on the second perspective transformation image is recognized and extracted by the text recognition model to obtain the target text H2 of the detection image.
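Projecting a target frame such as B1 through a perspective operator amounts to mapping its four corners in homogeneous coordinates and taking their bounding box; applying the sketch below once with the first operator and once with the second reproduces the B1 to H1' to H1 chain. The 3x3 operator H is assumed to be given; this is an illustration, not the patent's code.

```python
import numpy as np

def project_box(H, box):
    """Project an axis-aligned box (x1, y1, x2, y2) through a 3x3
    perspective operator H and return the bounding box of the four
    projected corners."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1],
                        [x2, y2, 1], [x1, y2, 1]], dtype=float)
    proj = (H @ corners.T).T
    proj = proj[:, :2] / proj[:, 2:3]  # homogeneous divide
    return (proj[:, 0].min(), proj[:, 1].min(),
            proj[:, 0].max(), proj[:, 1].max())
```

Taking the bounding box keeps the projected region axis-aligned for the recognizer even when the projection is a general quadrilateral.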
In an embodiment, the step of solving a transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image includes:
solving the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator; repeating this process for each combination of four pairs among all the matched feature points of the first feature point set and the second feature point set to obtain a plurality of perspective transformation operators, and forming the plurality of perspective transformation operators into a perspective transformation operator set; and, according to a pre-constructed error function of the perspective transformation operator, taking the operator in the set that corresponds to the minimum value of the error function, found by seeking the extremum, as the first perspective transformation operator.
Specifically, for the calculation of the transformation operator: multiplying or dividing all nine values of the matrix by the same number yields a matrix that produces exactly the same transformation effect when applied to an image, so one of the nine values can be preset to 1 and the other eight solved from the matching relationships. When performing the above steps, if there are exactly four pairs of matching relationships, the unique solution of the matrix can be found; with fewer than four pairs there are infinitely many solutions, and no unique transformation can be obtained. In practice, however, the matching relationships usually number far more than four, in which case the system of equations generally has no exact solution. With more than four matching points, a solution that minimizes the total error after transformation must instead be found by seeking the extremum.
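The eight-unknown solve described above (fixing the last matrix entry to 1 and solving the remaining eight entries from exactly four point correspondences) can be sketched as a linear system: each correspondence (x, y) to (u, v) contributes the two equations u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1) and v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1), which become linear after multiplying through by the denominator. A hypothetical illustration, assuming the four source points are in general position (no three collinear) so the system is nonsingular:

```python
import numpy as np

def homography_from_4_pairs(src, dst):
    """Solve the 3x3 perspective operator with the bottom-right entry
    fixed to 1, from exactly four (x, y) -> (u, v) correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)
```

For example, mapping a unit square to the same square scaled by two recovers a pure scaling matrix.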
Seeking the extremum usually means constructing an error function of the transformation operator and locating the position of the minimum according to the function's variation. To find an operator that minimizes the total error after transformation, an error function f(D) is constructed, where D denotes the transformation operator (an unknown variable) and f(D) is the formula for the total error, a function of D. The task is to find the value of D at which f(D) attains its minimum; for example, if f(D6) is the minimum of f(D), the operator D6 is the more accurate operator selected. The construction of f(D) is as follows:
For any operator D, the total error is described by a function as follows. Suppose there are two matched feature points A1 and A11, where A11 is called the matched feature point of A1; the point corresponding to A1 computed through the operator D is A12, called the corresponding point of A1, i.e. A1 * operator D = A12. The distance d1 between A11 and A12 is then calculated. Referring to FIG. 9, a schematic diagram of the perspective transformation operator in the method for extracting target text in a certificate provided by an embodiment of this application, the smaller the distance d1 between A11 and A12, the smaller the error of the operator D. If there are 100 feature points A1, A2, A3, ..., A100, there are 100 matching relationships; the distances d1, d2, d3, ..., d100 between the matched feature point and the corresponding point of each relationship are calculated in the above manner, and the total error of the operator D over these 100 matching relationships is f(D) = d1 + d2 + d3 + ... + d100. By analogy, with n matching relationships the total error is f(D) = d1 + d2 + d3 + ... + dn. Following the above process, the error function can be described as f(D) = d1 + d2 + d3 + ... + dn; its minimum is found, and the operator corresponding to the minimum is the more accurate operator for the perspective transformation of the detection image. It should be noted that the way the error is measured is not limited to the above example; other error measures, such as mean square error, cross entropy, or log-likelihood error, may also be used and are not repeated here.
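The total-error function f(D) = d1 + d2 + ... + dn described above can be sketched directly: project each source feature point through a candidate operator and sum the Euclidean distances to its matched counterpart. The point lists and the 3x3 operator are assumed inputs for this illustration.

```python
import numpy as np

def transform_error(H, src_pts, dst_pts):
    """f(D): total reprojection error of a candidate operator H, summing
    the distance between each projected source point and its match."""
    total = 0.0
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        px, py, pw = H @ np.array([x, y, 1.0])
        total += np.hypot(px / pw - u, py / pw - v)  # d_i for this pair
    return total
```

Evaluating this function over the candidate operator set and keeping the minimizer corresponds to the selection step described above.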
Still further, when calculating the total error, values among d1, d2, d3, ..., dn that deviate too much can be removed by means of the variance: by controlling the degree of dispersion of d1, d2, d3, ..., dn, feature points that differ greatly from the rest are filtered out, so that the total error reflects, as closely as possible, the difference between the image transformed by the operator and the detection image.
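One simple reading of the variance-based filtering above is to drop distances that lie more than a chosen number of standard deviations above the mean before summing. The cutoff factor k is an assumption for the example; the patent does not specify one.

```python
import numpy as np

def filter_outlier_distances(dists, k=2.0):
    """Drop distances more than k standard deviations above the mean, so a
    few badly matched feature points do not dominate the total error."""
    d = np.asarray(dists, dtype=float)
    keep = d <= d.mean() + k * d.std()
    return d[keep]
```

The remaining distances are then summed as before to form the filtered total error.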
In an embodiment, before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the method further includes: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text, where the auxiliary matching mode includes character spacing and/or a positional relationship between characters.
Specifically, for different certificates, the necessary auxiliary matching rules can be defined for the anchor points, so that the subsequent search for anchor points in the sample to be detected is more accurate and the efficiency of anchor point recognition and extraction is improved. Different auxiliary matching rules are formulated for different certificate types; for example, the matching rules of an ID card differ from those of a marriage certificate. Formulating corresponding auxiliary matching rules for a specific certificate type serves, on the one hand, to extract anchor points more precisely and, on the other hand, to extend the search range of the input image when looking for anchor points, thereby narrowing down the target's location during target extraction. Regarding the auxiliary matching rules for anchor points: owing to the limits of text recognition capability, some auxiliary logic sometimes needs to be added when extracting anchor points. For instance, the characters of a specified anchor text are sometimes widely spaced on the image, in which case the content at that position may be recognized as multiple fields in the input image and can no longer be matched directly against the configured anchor text. For this and similar situations, auxiliary logic such as character spacing and/or positional relationships needs to be added for anchor point extraction. For example, the large gaps between the characters of the "持证人" (certificate holder) field on a marriage certificate can easily cause a general text recognition model to recognize it as three fields when searching for the anchor point on the image to be detected; in this case, certain auxiliary matching rules must be defined to splice the three recognized fields into one field to obtain the required "持证人" anchor point.
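The splicing rule described above can be sketched as merging horizontally adjacent OCR boxes on the same line whenever the gap between them is below a threshold. The box representation, the gap threshold, and the vertical tolerance are assumptions for the example.

```python
def merge_split_fields(boxes, max_gap=30):
    """boxes: list of (text, x_left, x_right, y_center) OCR results.
    Merge boxes on the same line whose horizontal gap is <= max_gap, so
    widely spaced characters of one anchor come back as one field."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])
    merged = [list(boxes[0])]
    for text, x1, x2, y in boxes[1:]:
        last = merged[-1]
        if x1 - last[2] <= max_gap and abs(y - last[3]) < 5:
            last[0] += text   # splice the texts
            last[2] = x2      # extend the right edge
        else:
            merged.append([text, x1, x2, y])
    return [tuple(m) for m in merged]
```

Applying this before the anchor-text comparison lets "持" "证" "人" match the configured anchor "持证人".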
In an embodiment, the step of presetting the auxiliary matching mode for extracting the anchor point text includes: presetting graphic anchor points so that feature points are extracted through a combination of text anchor points and graphic anchor points.
Specifically, on some certificates the image information can be exploited as an auxiliary matching rule, and feature points can be extracted in combination with graphic anchor points. Because anchor points are generally textual and provide limited image information, too few feature points may be extractable during subsequent feature point matching, which degrades the accuracy of the subsequent perspective transformation. Some certificates, however, carry fixed graphics that can supply a large number of feature points, although a general text recognition model cannot detect such non-text images. In that case, the detected anchor point information is extended with some positional relationships so as to locate these fixed-position graphics, which can then also be used as anchor points for feature point extraction in the perspective transformation. For example, a fixed graphic appears above the "持证人" (certificate holder) field on a marriage certificate; the position of the graphic can be located from the position of this fixed field, and the graphic can be extended into a graphic anchor point. More feature points are thereby extracted through the combination of text anchor points and graphic anchor points, and by matching these additional feature points, as many accurate matched feature points as possible are obtained for an accurate perspective transformation. The position of a graphic anchor point can be determined from its positional relationship relative to a text anchor point; once the position is determined, it is described, like a position anchor point, by the two vertices of one diagonal of a rectangular frame, generally the top-left and bottom-right vertices. The relative positional relationship between the graphic anchor point and the text anchor point can be determined in several ways: for example, by trial, or by first marking the position of the graphic anchor point on the template image and then computing the relative positional relationship between the graphic anchor point and the text anchor point.
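Once the relative offset of the fixed graphic to a text anchor has been measured on the template image, locating the graphic anchor on a new image reduces to translating the detected text anchor box. A minimal sketch, assuming boxes are (x1, y1, x2, y2) diagonal-vertex pairs as described above:

```python
def locate_graphic_anchor(text_box, offset):
    """Derive the graphic anchor box from a detected text anchor box.

    text_box: (x1, y1, x2, y2) of the text anchor on the current image.
    offset:   (dx1, dy1, dx2, dy2), the graphic box's displacement relative
              to the text box, measured once on the template image.
    """
    return tuple(c + d for c, d in zip(text_box, offset))
```

Feature points are then extracted from both the text anchor box and this derived graphic box.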
In an embodiment, after the step of performing text recognition on the text at the projection position on the perspective transformation image through the text recognition model and extracting the recognized text, the method further includes: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
The second preset mode refers to preset text filtering logic, which includes the type of the text content, the position logic of the location of the text content, and the length limit of the text content.
Specifically, because a general text recognition model usually detects and recognizes the entire image directly, and depending on the model's performance and training, the text it recognizes for different certificate types may deviate from the text printed on the certificate to varying degrees. For example, unexpected fields may be mixed into the recognized fields; content that should be recognized as one field may be recognized as several; or, owing to the position logic of the text recognition model, a field located later but slightly higher may be recognized first while an earlier field is recognized later. Taking the recognized text content directly as the final result is therefore usually coarse and inaccurate. Since the text results produced by the text recognition model cannot be guaranteed to be fully accurate in the situations above, filtering logic can be specified for the extracted content according to the actual characteristics of each certificate type in order to improve its accuracy. That is, preset filtering rules need to be formulated for different certificates to further filter the text content recognized and extracted by the text recognition model, for example by specifying the type of the recognized content (such as digits only, or digits plus English letters), position logic, and length limits, so that the extraction results are as close to expectation as possible and the final extracted text is more precise. Formulating filtering rules for the extracted content makes it easy to satisfy customers' different customization requirements for different certificates, compensates for the possibly imprecise extraction results produced when the user merely marks positions, and further meets customer needs.
Therefore, to extract the target text more precisely, a small amount of custom logic can be added when the extraction fields are defined, thereby improving accuracy. The step of filtering the recognized text according to the second preset mode to obtain the target text of the detection image includes: filtering, according to pre-formulated auxiliary extraction logic for the target text, the content of the target text extracted by the text recognition model, so that the target text conforming to the rules of the auxiliary extraction logic is taken as the text finally extracted from the certificate.
Specifically, auxiliary extraction logic is formulated for different contents, that is, filtering rules are formulated for the extracted content to achieve more accurate extraction for different field contents; according to the formulated auxiliary extraction logic, the extracted text content is further filtered to obtain text conforming to the formulated logical rules.
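As an illustration of such per-field filtering logic (content type plus length limits), a rule table keyed by field name could look like the following. The field names, character patterns, and length bounds are invented for the example, not taken from the patent.

```python
import re

# Hypothetical per-field rules: (allowed-character pattern, min len, max len)
FILTER_RULES = {
    "id_number": (re.compile(r"[0-9Xx]+"), 18, 18),           # digits + X
    "name":      (re.compile(r"[\u4e00-\u9fa5]+"), 2, 10),    # CJK only
}

def filter_field(field, raw_text):
    """Keep only the allowed characters for the field and enforce the
    length limit; return None when the filtered result violates the rule."""
    pattern, lo, hi = FILTER_RULES[field]
    kept = "".join(pattern.findall(raw_text))
    return kept if lo <= len(kept) <= hi else None
```

A failed rule (None) can be surfaced as a low-confidence extraction rather than silently returning noisy text.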
Further, since an auxiliary matching mode can be predefined for the anchor points, that is, auxiliary extraction logic can be formulated for anchor point recognition, the auxiliary matching mode defined for the anchor points can be combined with the auxiliary extraction logic for the target text. By defining anchor points and target frames for different certificates, the fixed-field anchor points used for the perspective transformation and the positions of the target information the customer ultimately needs to extract are determined preliminarily; combined with the respective extraction logic customized for the anchor points and the target information, both anchor point extraction and target content extraction are refined. Anchor point information as accurate as possible is obtained through the auxiliary matching mode, so that the detection image undergoes a perspective transformation that is as accurate as possible, and on that basis the filtering logic for the target text extracts the target text as precisely as possible, avoiding the imprecise extraction results that may arise when only positions are marked. Combining customized templates with auxiliary logic thus avoids, on the one hand, the labor and time cost of fully custom logic for the different extraction requirements of each certificate type and, on the other hand, the imprecise extraction caused by overly generic logic.
The above solutions of the embodiments of this application are described below through two specific embodiments:
In an embodiment, referring to FIG. 10, which comprises FIG. 10(a) to FIG. 10(i) and is a schematic diagram of graphic transformation in one embodiment of the method for extracting target text in a certificate provided by an embodiment of this application, the specific implementation process includes the following steps:
1.01) The user selects an image as the template image and frames the fixed fields on it, hereinafter referred to as anchor points; the fields marked by solid-line frames in FIG. 10(a) are the anchor points, and the perspective transformation operator is calculated through these regions;
1.02) The user frames, on the template image, the regions from which text recognition results are to be extracted, hereinafter referred to as target frames; the positions marked by dashed-line frames in FIG. 10(a) are the target frame positions, and text is to be extracted from these regions;
1.03) The text recognition model recognizes the anchor point regions selected by the user and obtains the content information of the anchor point regions, referring to FIG. 10(b);
1.04) The user inputs the detection image from which the target text is to be extracted;
1.05) The text recognition model performs full-text recognition on the detection image and thereby finds the regions matching the anchor text content selected by the user, that is, the regions containing the user-selected anchor text, referring to FIG. 10(c);
1.06) Feature points are extracted and matched over the matched anchor point regions of the template image and the detection image, so as to obtain the first perspective transformation operator that brings the detection image to the viewing angle of the template image, referring to FIG. 10(d);
1.07) Perspective transformation is performed on the detection image to obtain the transformed first perspective transformation image, referring to FIG. 10(e);
1.08) Because a certain error may exist in the feature point matching process, the computed perspective transformation operator is not necessarily fully standard, and the transformed first perspective transformation image may therefore still deviate somewhat in viewing angle from the template image. Consequently, the target frame position is not mapped unchanged directly onto the transformed first perspective transformation image; instead, a second perspective transformation operator between the template image and the transformed first perspective transformation image is found, and the target frame is projected through perspective transformation onto the transformed second perspective transformation image. To this end, the regions matching the anchor text of the template image are first detected on the transformed first perspective transformation image, referring to FIG. 10(f);
1.09) Feature points are extracted and matched between the transformed first perspective transformation image and the template image, referring to FIG. 10(g), to obtain the second perspective transformation operator from the first perspective transformation image to the viewing angle of the template image;
1.10) The target frames marked on the template image are projected through the second perspective transformation operator, by perspective transformation, onto the second perspective transformation image corresponding to the detection image, referring to FIG. 10(h). It should be noted that, in this embodiment of the application, the frame for the residence field on the transformed second perspective transformation image does not enclose the entire content of the residence field; this is because the user marked only that region on the template image, so only that small region appears after projection. The range enclosed by the target frame can be adjusted by experimenting with samples, or the range can simply be set as large as possible so that the target frame encloses all the content;
1.11)文本识别对目标框的内容进行识别,请参阅图10(i)。1.11) Text recognition recognizes the content of the target box, please refer to Figure 10(i).
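Steps 1.08 to 1.10 ultimately project the marked target frame through a 3x3 perspective transformation operator (a homography). Below is a minimal sketch of that projection step in Python with NumPy; the operator values and box coordinates are illustrative assumptions, not taken from the application:

```python
import numpy as np

def apply_homography(H, pts):
    """Project 2-D points through a 3x3 perspective transformation operator."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous coords
    mapped = homog @ H.T                                  # apply the operator
    return mapped[:, :2] / mapped[:, 2:3]                 # back to Cartesian

# Hypothetical second perspective transformation operator; identity plus a
# small translation stands in for a real estimate.
H2 = np.array([[1.0, 0.0, 5.0],
               [0.0, 1.0, 3.0],
               [0.0, 0.0, 1.0]])

# Corners of a target frame marked on the template image (illustrative).
target_box = [(100, 40), (260, 40), (260, 80), (100, 80)]
projected = apply_homography(H2, target_box)  # corners on the transformed image
```

Because this example operator has bottom row [0, 0, 1], each corner simply shifts by (5, 3); a genuine estimate would in general warp the box non-uniformly.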
In another embodiment, please refer to Figure 11, which includes Figure 11(a) to Figure 11(i), schematic diagrams of the image transformations in another embodiment of the method for extracting target text in a certificate provided by the embodiments of this application. The specific implementation process includes the following steps:
2.01) Select an image as the template image, and specify (set) on it the positions and text content of fixed, unchanging fields, hereinafter called anchor points; see the parts enclosed by the solid-line boxes in Figure 11(a);
2.02) Customize auxiliary logic for finding the anchor points, that is, auxiliary logic for the parts enclosed by the solid-line boxes;
2.03) Specify the regions containing the text recognition results to be extracted, hereinafter called target frames; see the dashed-line boxes in Figure 11(a);
2.04) Customize filtering logic for text extraction from the target frames;
2.05) The user inputs a detection image;
2.06) The text recognition model performs full-text recognition on the detection image and finds the regions containing the specified anchor text content, see Figure 11(b);
2.07) Perform feature point extraction and matching on the matched anchor regions of the template image and the detection image, see Figure 11(c), to obtain the first perspective transformation operator that transforms the detection image to the template image perspective;
2.08) Apply the first perspective transformation operator to the detection image to obtain the first perspective transformation image; the transformed image is shown in Figure 11(d);
2.09) Again, because the feature point matching process may contain some error, the computed first perspective transformation operator is not necessarily exact, so the transformed first perspective transformation image may still differ in perspective from the template image. Therefore, instead of mapping the target frame position directly and unchanged onto the transformed first perspective transformation image, the next step is to find a second perspective transformation operator between the template image and the transformed first perspective transformation image, and to project the target frame, through the second perspective transformation operator and perspective transformation, onto the transformed second perspective transformation image. To this end, first detect on the transformed first perspective transformation image the region matching the anchor text of the template image, see Figure 11(e);
2.10) Perform feature point extraction and matching on the transformed first perspective transformation image and the template image, see Figure 11(f), to obtain the second perspective transformation operator from the first perspective transformation image to the template image perspective;
2.11) Project the target frame marked on the template image, through the second perspective transformation operator and perspective transformation, onto the transformed second perspective transformation image; see the regions enclosed by the dashed-line boxes in Figure 11(g);
2.12) Text recognition is performed on the content of the target frame, see Figure 11(h).
It should be noted that, although the target frame containing the registration date in Figure 11(g) does not completely enclose all of the "X5X5X5" content, the complete "X5X5X5" content is nevertheless considered to belong to the target region because auxiliary logic has been configured.
2.13) Filter the recognized content according to the previously defined filter rules, see Figure 11(i).
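The filter rules of step 2.13 are not fixed by the application. A minimal sketch of one plausible rule, which keeps only date-like characters from the raw recognition result using a regular expression; the pattern and the sample string are assumptions for illustration:

```python
import re

def filter_recognized_text(raw, pattern=r"[0-9X]+"):
    """Apply a per-field filter rule: keep only runs of characters that can
    appear in the target field, dropping surrounding OCR noise."""
    pieces = re.findall(pattern, raw)
    return "".join(pieces)

# Raw OCR output for the registration-date target frame (illustrative).
filtered = filter_recognized_text("Date: X5X5X5 *")
```

In a real deployment each target frame would carry its own pattern (dates, ID numbers, names, and so on), mirroring the per-frame filter logic customized in step 2.04.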
It should be noted that, for the methods for extracting target text in a certificate described in the above embodiments, the technical features contained in different embodiments may be recombined as needed to obtain combined implementations, all of which fall within the scope of protection claimed by this application.
Please refer to FIG. 12, which is a schematic block diagram of an apparatus for extracting target text in a certificate provided by an embodiment of this application. Corresponding to the above method for extracting target text in a certificate, an embodiment of this application further provides an apparatus for extracting target text in a certificate. As shown in FIG. 12, the apparatus includes units for executing the above method, and may be configured in computer equipment such as a desktop computer. Specifically, the apparatus 1200 for extracting target text in a certificate includes a first acquiring unit 1201, a second acquiring unit 1202, a solving unit 1203, a transformation unit 1204, a projection unit 1205, and a recognition unit 1206.
The first acquiring unit 1201 is configured to acquire a template image and a detection image, belonging to the same certificate type, for extracting target text, where the template image is annotated with a text anchor and a target frame position; the text anchor is a fixed field annotated on the template image and includes first anchor text, the first anchor text being the content of the fixed field; and the target frame position is the location, annotated on the template image, of the target text to be extracted from the certificate. The second acquiring unit 1202 is configured to acquire, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image. The solving unit 1203 is configured to solve, according to the feature point matching relationship, through a transformation matrix, to obtain a perspective transformation operator for performing perspective transformation on the detection image. The transformation unit 1204 is configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image that matches the perspective of the template image. The projection unit 1205 is configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image. The recognition unit 1206 is configured to perform text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and to extract the recognized text to obtain the target text of the detection image.
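The second acquiring unit's first task, locating the first anchor text on the detection image, can be sketched as a lookup over the text recognition model's output. The OCR results and box coordinates below are illustrative assumptions, not values from the application:

```python
# Hypothetical full-text OCR output: (recognized text, bounding box) pairs
# that a real text recognition model would produce for the detection image.
ocr_results = [
    ("Name", (30, 20, 90, 40)),
    ("Registration Date", (30, 120, 180, 140)),
    ("X5X5X5", (200, 120, 280, 140)),
]

def find_anchor_regions(ocr_results, anchor_texts):
    """Return the boxes whose recognized text matches a specified anchor text."""
    matches = {}
    for text, box in ocr_results:
        if text in anchor_texts:
            matches[text] = box
    return matches

anchors = find_anchor_regions(ocr_results, {"Name", "Registration Date"})
```

The matched boxes then serve as the anchor positions from which feature points are extracted.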
In one embodiment, the text anchor further includes a first anchor position, and the second acquiring unit 1202 includes: a first extraction subunit, configured to extract, through the text recognition model, second anchor text on the detection image that is consistent with the first anchor text; a first obtaining subunit, configured to obtain, based on the text recognition model and through the second anchor text, a second anchor position on the detection image corresponding to the first anchor position; a second extraction subunit, configured to extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor position and a second feature point set contained in the second anchor position; and a first acquiring subunit, configured to acquire, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between feature points in the first feature point set and the second feature point set. The solving unit 1203 is configured to solve, according to the first feature point matching relationship, through the transformation matrix, to calculate a first perspective transformation operator for performing perspective transformation on the detection image. The transformation unit 1204 is configured to perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image.
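The application leaves the feature point matching algorithm open; nearest-neighbour descriptor matching with a ratio test is one common choice. A toy sketch with NumPy, using made-up two-dimensional descriptors in place of real feature descriptors:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching with a ratio test: accept a match only when
    the closest descriptor is clearly closer than the second closest."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors: rows are feature vectors from the two anchor regions.
desc_template = np.array([[1.0, 0.0], [0.0, 1.0]])
desc_detect = np.array([[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]])
pairs = match_descriptors(desc_template, desc_detect)
```

Each returned pair is one entry of the feature point matching relationship, from which the transformation matrix is then solved.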
In one embodiment, the second acquiring unit 1202 further includes: a second acquiring subunit, configured to input the first perspective transformation image into the text recognition model and obtain, through the first anchor text, a third anchor position on the first perspective transformation image corresponding to the first anchor position; a third extraction subunit, configured to extract, based on the feature point extraction algorithm, a third feature point set contained in the third anchor position; a third acquiring subunit, configured to acquire, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between feature points in the first feature point set and the third feature point set; and a first solving subunit, configured to solve, according to the second feature point matching relationship, through the transformation matrix, to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image. The transformation unit 1204 is configured to perform perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image. The projection unit 1205 is configured to calculate, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image. The recognition unit 1206 is configured to input the second perspective transformation image into the text recognition model, perform text recognition on the text at the projection position on the second perspective transformation image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
In one embodiment, the solving unit 1203 includes: a second solving subunit, configured to solve, through the transformation matrix, using the matching relationship between each group of four pairs of feature points, to obtain one perspective transformation operator; a repeating subunit, configured to repeat the above process of obtaining one perspective transformation operator from every four pairs of feature points, for each combination of four pairs among all matched feature points in the first feature point set and the second feature point set, to obtain multiple perspective transformation operators, and to form the multiple perspective transformation operators into a set as a perspective transformation operator set; and a second obtaining subunit, configured to obtain, according to a pre-built error function of the perspective transformation operator and by minimization, the perspective transformation operator in the perspective transformation operator set corresponding to the minimum value of the error function as the first perspective transformation operator.
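The per-group computation, one perspective transformation operator from four point pairs, plus an error function for choosing among candidate operators, can be sketched as follows. The direct linear solve and the mean reprojection error used here are standard constructions, assumed for illustration rather than fixed by the application:

```python
import numpy as np

def homography_from_4_pairs(src, dst):
    """Solve the 8 unknowns of a perspective transformation operator from
    exactly four point correspondences (the transformation-matrix solve)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)  # fix the scale so H[2, 2] = 1

def reprojection_error(H, src, dst):
    """Error function: mean distance between H projected src and dst."""
    src = np.asarray(src, float)
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = pts[:, :2] / pts[:, 2:3]
    return float(np.mean(np.linalg.norm(proj - np.asarray(dst, float), axis=1)))

# A pure translation (+10, +20) recovered from four correspondences.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(10, 20), (11, 20), (10, 21), (11, 21)]
H = homography_from_4_pairs(src, dst)
err = reprojection_error(H, src, dst)
```

Repeating `homography_from_4_pairs` over different four-pair combinations and keeping the candidate with the smallest `reprojection_error` mirrors the operator-set construction and error-function minimization described above.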
In one embodiment, the second acquiring unit 1202 further includes: a setting subunit, configured to preset, according to the certificate type of the certificate, an auxiliary matching manner for extracting anchor text.
In one embodiment, the setting subunit is configured to preset graphic anchors so that feature points are extracted through a combination of text anchors and graphic anchors.
In one embodiment, the auxiliary matching manner includes character spacing and/or the positional relationship between characters.
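Character spacing as an auxiliary matching manner can be sketched as comparing the gaps between consecutive character boxes: when several regions contain the anchor text, the one whose spacing profile best matches the template's is preferred. Every coordinate below is an illustrative assumption:

```python
def spacing_profile(char_boxes):
    """Gaps between consecutive character boxes, measured on left edges."""
    lefts = sorted(box[0] for box in char_boxes)
    return [b - a for a, b in zip(lefts, lefts[1:])]

def spacing_distance(profile_a, profile_b):
    """Sum of absolute gap differences between two spacing profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

# Character boxes (left, top, right, bottom) for the template anchor and
# two candidate regions on the detection image.
template_chars = [(0, 0, 10, 20), (14, 0, 24, 20), (28, 0, 38, 20)]
candidate_a = [(100, 50, 110, 70), (114, 50, 124, 70), (128, 50, 138, 70)]
candidate_b = [(100, 90, 110, 110), (130, 90, 140, 110), (160, 90, 170, 110)]

tpl = spacing_profile(template_chars)  # gaps of 14 between characters
best = min([candidate_a, candidate_b],
           key=lambda c: spacing_distance(tpl, spacing_profile(c)))
```

Here `candidate_a` reproduces the template's gap of 14 pixels and is selected; a positional-relationship check between characters could be layered on in the same way.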
In one embodiment, the apparatus 1200 for extracting target text in a certificate further includes: a filtering unit, configured to filter the recognized text in a second preset manner to obtain the target text of the detection image.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation process of the above apparatus for extracting target text in a certificate and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
Meanwhile, the division and connection of the units in the above apparatus for extracting target text in a certificate are only for illustration. In other embodiments, the apparatus may be divided into different units as needed, or the units may adopt different connection orders and manners, so as to complete all or part of the functions of the above apparatus.
The above apparatus for extracting target text in a certificate may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 13.
Please refer to FIG. 13, which is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device 1300 may be computer equipment such as a desktop computer or a server, or may be a component or part of another device.
Referring to FIG. 13, the computer device 1300 includes a processor 1302, a memory, and a network interface 1305 connected through a system bus 1301, where the memory may include a non-volatile storage medium 1303 and an internal memory 1304.
The non-volatile storage medium 1303 may store an operating system 13031 and a computer program 13032. When executed, the computer program 13032 may cause the processor 1302 to perform a method for extracting target text in a certificate as described above. The processor 1302 provides computing and control capabilities to support the operation of the entire computer device 1300.
The internal memory 1304 provides an environment for running the computer program 13032 stored in the non-volatile storage medium 1303. When the computer program 13032 is executed by the processor 1302, it may cause the processor 1302 to perform a method for extracting target text in a certificate as described above.
The network interface 1305 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in FIG. 13 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 1300 to which the solution is applied; a specific computer device 1300 may include more or fewer components than shown, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 13 and are not repeated here.
The processor 1302 is configured to run the computer program 13032 stored in the memory, so as to implement the method for extracting target text in a certificate of the embodiments of this application.
It should be understood that, in the embodiments of this application, the processor 1302 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the steps of the embodiments of the method for extracting target text in a certificate.
Therefore, an embodiment of this application further provides a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the method for extracting target text in a certificate described in the above embodiments.
The storage medium is a physical, non-transitory storage medium, for example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The above are only specific implementations of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (20)

1. A method for extracting target text in a certificate, comprising:
    acquiring a template image and a detection image, belonging to the same certificate type, for extracting target text, the template image being annotated with a text anchor and a target frame position, wherein the text anchor is a fixed field annotated on the template image and comprises first anchor text, the first anchor text being the content of the fixed field, and the target frame position being the location, annotated on the template image, of the target text to be extracted from the certificate;
    acquiring, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, wherein an anchor position is the position of the first anchor text on the corresponding image;
    solving, according to the feature point matching relationship, through a transformation matrix, to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image that matches the perspective of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
2. The method for extracting target text in a certificate according to claim 1, wherein the text anchor further comprises a first anchor position, and the step of acquiring, according to the first anchor text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image comprises:
    extracting, through the text recognition model, second anchor text on the detection image that is consistent with the first anchor text;
    obtaining, based on the text recognition model and through the second anchor text, a second anchor position on the detection image corresponding to the first anchor position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor position and a second feature point set contained in the second anchor position; and
    acquiring, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between feature points in the first feature point set and the second feature point set;
    the step of solving, according to the feature point matching relationship, through the transformation matrix, to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving, according to the first feature point matching relationship, through the transformation matrix, to calculate a first perspective transformation operator for performing perspective transformation on the detection image; and
    the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image that matches the perspective of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image.
  3. 根据权利要求2所述证件中的目标文本提取方法,其中,所述将所述检测图像通过所述第一透视变换算子进行透视变换以得到与所述模板图像视角相符的第一透视变换图像的步骤之后,还包括:2. The method for extracting target text in a certificate according to claim 2, wherein the detection image is subjected to perspective transformation through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image After the steps, it also includes:
    inputting the first perspective transformation image into the text recognition model, and obtaining, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position;
    extracting, based on the feature point extraction algorithm, a third feature point set contained in the third anchor point position;
    obtaining, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and the third feature point set;
    solving through the transformation matrix according to the second feature point matching relationship to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image; and
    performing perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image;
    wherein the step of obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image comprises:
    calculating, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image;
    and wherein the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text to obtain the target text of the detection image comprises:
    inputting the second perspective transformation image into the text recognition model, performing text recognition, through the text recognition model, on the text at the projection position on the second perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
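The projection step recited above reduces to mapping each corner of the annotated target frame through the 3×3 perspective transformation operator, dividing by the homogeneous coordinate. A minimal sketch — the operator values and frame corners below are illustrative, not taken from the disclosure:

```python
def project_point(H, x, y):
    """Map point (x, y) through a 3x3 perspective transformation matrix H."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

# Illustrative operator: identity plus a translation of (10, 20).
H = [[1, 0, 10],
     [0, 1, 20],
     [0, 0, 1]]

# Project the four corners of a hypothetical target frame onto the
# perspective-transformed image.
target_frame = [(100, 50), (300, 50), (300, 90), (100, 90)]
projected = [project_point(H, x, y) for x, y in target_frame]
print(projected)
```

The text recognizer is then run only on the image region bounded by the projected corners.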
  4. The method for extracting target text in a certificate according to claim 2, wherein the step of solving through the transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator;
    repeating, for the combinations of every four pairs of feature points among all matched feature points in the first feature point set and the second feature point set, the above process of obtaining one perspective transformation operator from each group of four pairs, to obtain a plurality of perspective transformation operators, and taking the plurality of perspective transformation operators together as a perspective transformation operator set; and
    according to a pre-constructed error function of the perspective transformation operator, obtaining, by finding the minimum of the error function, the perspective transformation operator in the perspective transformation operator set that corresponds to the minimum value of the error function as the first perspective transformation operator.
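Claim 4's construction — one operator per group of four matched pairs, then keeping the operator that minimizes a pre-built error function — can be sketched as follows. The correspondences are hypothetical; the solver fixes h22 = 1 (one common normalization) and uses total squared reprojection error as an illustrative error function, since the claim does not fix a particular one:

```python
def project(H, x, y):
    """Map (x, y) through perspective operator H (3x3 nested lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def operator_from_four_pairs(pairs):
    """Solve the 8 unknowns of a perspective operator (h22 fixed at 1)
    from four (src, dst) point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def reprojection_error(H, pairs):
    """Total squared reprojection error -- an illustrative error function."""
    return sum((px - u) ** 2 + (py - v) ** 2
               for (x, y), (u, v) in pairs
               for px, py in [project(H, x, y)])

# Illustrative correspondences: a pure shift by (5, 3).
pairs = [((0, 0), (5, 3)), ((100, 0), (105, 3)),
         ((0, 100), (5, 103)), ((100, 100), (105, 103))]
candidates = [operator_from_four_pairs(pairs)]  # one 4-pair group here
best = min(candidates, key=lambda H: reprojection_error(H, pairs))
```

With more than four matched pairs, every combination of four yields one candidate operator, and `best` is the member of the candidate set with the smallest error.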
  5. The method for extracting target text in a certificate according to claim 2, wherein before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the method further comprises: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text.
  6. The method for extracting target text in a certificate according to claim 5, wherein the step of presetting the auxiliary matching mode for extracting the anchor point text comprises: presetting graphic anchor points so as to extract feature points through a combination of text anchor points and graphic anchor points.
  7. The method for extracting target text in a certificate according to claim 5, wherein the auxiliary matching mode comprises character spacing and/or a positional relationship between characters.
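Claims 5 to 7 describe an auxiliary matching mode based on character spacing and positional relationships. One way to realize such a check — offered as an illustrative sketch, not the disclosed implementation — is a scale-invariant spacing signature over the character bounding boxes of a candidate anchor:

```python
def spacing_signature(boxes):
    """Relative gaps between consecutive character boxes (x, y, w, h),
    normalized by the first gap so the signature is scale-invariant."""
    centers = [x + w / 2 for (x, y, w, h) in boxes]
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    return [g / gaps[0] for g in gaps]

def spacing_matches(template_boxes, candidate_boxes, tol=0.2):
    """Auxiliary check: accept a candidate anchor only if its character
    spacing pattern matches the template's within a tolerance."""
    t = spacing_signature(template_boxes)
    c = spacing_signature(candidate_boxes)
    return len(t) == len(c) and all(abs(a - b) <= tol for a, b in zip(t, c))

# Hypothetical boxes: evenly spaced template characters, one candidate
# that is a clean 2x-scaled copy, and one with an outlier gap.
tpl = [(0, 0, 10, 20), (20, 0, 10, 20), (40, 0, 10, 20)]
scaled = [(0, 0, 20, 40), (40, 0, 20, 40), (80, 0, 20, 40)]
shifted = [(0, 0, 10, 20), (20, 0, 10, 20), (70, 0, 10, 20)]
```

Because the signature is normalized, a uniformly rescaled anchor still passes, while a candidate whose characters sit at the wrong relative positions is rejected before any feature points are extracted from it.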
  8. The method for extracting target text in a certificate according to claim 1, wherein after the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text, the method further comprises: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
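The "second preset mode" for filtering the recognized text is left unspecified in claim 8; a per-field regular-expression whitelist is one plausible realization. The field names and patterns below are hypothetical:

```python
import re

# Hypothetical per-field filters; the claim leaves the "second preset
# mode" abstract, so a regular-expression whitelist is one option.
FIELD_PATTERNS = {
    "id_number": re.compile(r"\d{17}[\dXx]"),            # 18-char ID style
    "date": re.compile(r"\d{4}[-.]\d{1,2}[-.]\d{1,2}"),  # yyyy-mm-dd style
}

def filter_recognized(field, raw_text):
    """Keep only the substring of the OCR output matching the field
    pattern; return None when nothing survives the filter."""
    m = FIELD_PATTERNS[field].search(raw_text)
    return m.group(0) if m else None

print(filter_recognized("date", "出生 1990-01-02 "))
```

This drops label text and stray OCR characters that the projection window may have captured along with the target value.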
  9. An apparatus for extracting target text in a certificate, comprising:
    a first acquiring unit, configured to acquire a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    a second acquiring unit, configured to obtain, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    a solving unit, configured to solve through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    a transformation unit, configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    a projection unit, configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    a recognition unit, configured to perform text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and to extract the recognized text to obtain the target text of the detection image.
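The unit structure of claim 9 maps naturally onto a small pipeline object. In the sketch below, the injected callables stand in for the text recognition model and the matrix solver, which the claim deliberately leaves abstract; all names are placeholders:

```python
class TargetTextExtractor:
    """Sketch of the claimed apparatus; each attribute mirrors one unit."""

    def __init__(self, recognize_text, match_anchor_features, solve_operator,
                 warp, project_box):
        self.recognize_text = recognize_text              # recognition unit
        self.match_anchor_features = match_anchor_features  # 2nd acquiring unit
        self.solve_operator = solve_operator              # solving unit
        self.warp = warp                                  # transformation unit
        self.project_box = project_box                    # projection unit

    def extract(self, template, detection, anchor_text, target_box):
        matches = self.match_anchor_features(template, detection, anchor_text)
        H = self.solve_operator(matches)
        warped = self.warp(detection, H)
        box = self.project_box(H, target_box)
        return self.recognize_text(warped, box)

# Wiring with trivial stand-ins just to show the data flow:
extractor = TargetTextExtractor(
    recognize_text=lambda img, box: img[box],
    match_anchor_features=lambda tpl, det, anchor: [("p", "q")],
    solve_operator=lambda matches: "H",
    warp=lambda img, H: {"roi": "ZHANG SAN"},
    project_box=lambda H, box: "roi",
)
print(extractor.extract("tpl", "det", "姓名", "box"))
```

Swapping the stand-ins for a real OCR model, feature matcher, and homography solver yields the full pipeline without changing the orchestration.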
  10. The apparatus for extracting target text in a certificate according to claim 9, wherein the text anchor point further comprises a first anchor point position, and the second acquiring unit comprises:
    a first extraction subunit, configured to extract, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    a first obtaining subunit, configured to obtain, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    a second extraction subunit, configured to extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    a first acquiring subunit, configured to obtain, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    the solving unit being configured to solve through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image; and
    the transformation unit being configured to perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
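The matching performed by the first acquiring subunit is typically a nearest-neighbour search over feature descriptors with a ratio test (Lowe's criterion). A toy sketch with hypothetical two-dimensional descriptors — the disclosure does not fix a particular matching algorithm:

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_features(desc1, desc2, ratio=0.75):
    """Nearest-neighbour matching with a ratio test: keep a match only
    when the best distance is clearly below the second best, which
    rejects ambiguous correspondences."""
    matches = []
    for i, d1 in enumerate(desc1):
        dists = sorted((euclidean(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy descriptors for the template anchor and the detection anchor.
desc_template = [[0.0, 0.0], [10.0, 10.0]]
desc_detection = [[0.1, 0.0], [10.0, 10.2], [50.0, 50.0]]
matches = match_features(desc_template, desc_detection)
print(matches)
```

The resulting index pairs are exactly the "first feature point matching relationship" the solving unit consumes to estimate the perspective transformation operator.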
  11. A computer device, comprising a memory and a processor connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory so as to perform the following steps:
    acquiring a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    obtaining, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    solving through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  12. The computer device according to claim 11, wherein the text anchor point further comprises a first anchor point position, and the step of obtaining, according to the first anchor point text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image comprises:
    extracting, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    obtaining, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    obtaining, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    wherein the step of solving through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image;
    and wherein the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image consistent with the viewing angle of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
  13. The computer device according to claim 12, wherein after the step of performing perspective transformation on the detection image through the first perspective transformation operator to obtain the first perspective transformation image consistent with the viewing angle of the template image, the steps further comprise:
    inputting the first perspective transformation image into the text recognition model, and obtaining, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position;
    extracting, based on the feature point extraction algorithm, a third feature point set contained in the third anchor point position;
    obtaining, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and the third feature point set;
    solving through the transformation matrix according to the second feature point matching relationship to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image; and
    performing perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image;
    wherein the step of obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image comprises:
    calculating, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image;
    and wherein the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text to obtain the target text of the detection image comprises:
    inputting the second perspective transformation image into the text recognition model, performing text recognition, through the text recognition model, on the text at the projection position on the second perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  14. The computer device according to claim 12, wherein the step of solving through the transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator;
    repeating, for the combinations of every four pairs of feature points among all matched feature points in the first feature point set and the second feature point set, the above process of obtaining one perspective transformation operator from each group of four pairs, to obtain a plurality of perspective transformation operators, and taking the plurality of perspective transformation operators together as a perspective transformation operator set; and
    according to a pre-constructed error function of the perspective transformation operator, obtaining, by finding the minimum of the error function, the perspective transformation operator in the perspective transformation operator set that corresponds to the minimum value of the error function as the first perspective transformation operator.
  15. The computer device according to claim 12, wherein before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the steps further comprise: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text.
  16. The computer device according to claim 15, wherein the step of presetting the auxiliary matching mode for extracting the anchor point text comprises: presetting graphic anchor points so as to extract feature points through a combination of text anchor points and graphic anchor points.
  17. The computer device according to claim 15, wherein the auxiliary matching mode comprises character spacing and/or a positional relationship between characters.
  18. The computer device according to claim 11, wherein after the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text, the steps further comprise: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
  19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the following steps:
    acquiring a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    obtaining, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    solving through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  20. The storage medium according to claim 19, wherein the text anchor point further comprises a first anchor point position, and the step of obtaining, according to the first anchor point text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image comprises:
    extracting, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    obtaining, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    obtaining, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    wherein the step of solving through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image;
    and wherein the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image consistent with the viewing angle of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
PCT/CN2019/118469 2019-10-15 2019-11-14 Method and apparatus for extracting target text in certificate, device, and readable storage medium WO2021072879A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910979567.0A CN111126125B (en) 2019-10-15 2019-10-15 Method, device, equipment and readable storage medium for extracting target text in certificate
CN201910979567.0 2019-10-15

Publications (1)

Publication Number Publication Date
WO2021072879A1

Family

ID=70495348

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118469 WO2021072879A1 (en) 2019-10-15 2019-11-14 Method and apparatus for extracting target text in certificate, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111126125B (en)
WO (1) WO2021072879A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762244A (en) * 2020-06-05 2021-12-07 北京市天元网络技术股份有限公司 Document information extraction method and device
CN111696044B (en) * 2020-06-16 2022-06-10 清华大学 Large-scene dynamic visual observation method and device
CN111898381A (en) * 2020-06-30 2020-11-06 北京来也网络科技有限公司 Text information extraction method, device, equipment and medium combining RPA and AI
CN111967347A (en) * 2020-07-28 2020-11-20 北京嘀嘀无限科技发展有限公司 Data processing method and device, readable storage medium and electronic equipment
CN111914840A (en) * 2020-07-31 2020-11-10 中国建设银行股份有限公司 Text recognition method, model training method, device and equipment
CN112001331A (en) * 2020-08-26 2020-11-27 上海高德威智能交通系统有限公司 Image recognition method, device, equipment and storage medium
CN112016561B (en) * 2020-09-01 2023-08-04 中国银行股份有限公司 Text recognition method and related equipment
CN111931771B (en) * 2020-09-16 2021-01-01 深圳壹账通智能科技有限公司 Bill content identification method, device, medium and electronic equipment
CN111931784B (en) * 2020-09-17 2021-01-01 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
CN112132016B (en) * 2020-09-22 2023-09-15 平安科技(深圳)有限公司 Bill information extraction method and device and electronic equipment
CN112613402A (en) * 2020-12-22 2021-04-06 金蝶软件(中国)有限公司 Text region detection method, text region detection device, computer equipment and storage medium
CN112668572B (en) * 2020-12-24 2023-01-31 成都新希望金融信息有限公司 Identity card image standardization method and device, electronic equipment and storage medium
CN112633279A (en) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 Text recognition method, device and system
CN112651378B (en) * 2021-01-08 2021-10-15 唐旸 Method, device and medium for identifying marking information of fastener two-dimensional drawing
CN113269126A (en) * 2021-06-10 2021-08-17 上海云扩信息科技有限公司 Key information extraction method based on coordinate transformation
CN113920512B (en) * 2021-12-08 2022-03-15 共道网络科技有限公司 Image recognition method and device
CN114577756B (en) * 2022-05-09 2022-07-15 烟台正德电子科技有限公司 Light transmission uniformity detection device and detection method
CN116740719A (en) * 2023-05-04 2023-09-12 北京和利时系统集成有限公司 Pointer type meter reading method, device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20130198615A1 (en) * 2006-08-01 2013-08-01 Abbyy Software Ltd. Creating Flexible Structure Descriptions
CN107368800A (en) * 2017-07-13 2017-11-21 上海携程商务有限公司 Order confirmation method, system, equipment and storage medium based on fax identification
CN109977935A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 A kind of text recognition method and device
CN110321895A (en) * 2019-04-30 2019-10-11 北京市商汤科技开发有限公司 Certificate recognition methods and device, electronic equipment, computer readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113177541A (en) * 2021-05-17 2021-07-27 上海云扩信息科技有限公司 Method for extracting character contents in PDF document and picture by computer program
CN113177541B (en) * 2021-05-17 2023-12-19 上海云扩信息科技有限公司 Method for extracting text content in PDF document and picture by computer program
CN113657384A (en) * 2021-09-02 2021-11-16 京东科技控股股份有限公司 Certificate image correction method and device, storage medium and electronic equipment
CN113657384B (en) * 2021-09-02 2024-04-05 京东科技控股股份有限公司 Certificate image correction method and device, storage medium and electronic equipment
CN114332865A (en) * 2022-03-11 2022-04-12 北京锐融天下科技股份有限公司 Certificate OCR recognition method and system
CN117315033A (en) * 2023-11-29 2023-12-29 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117315033B (en) * 2023-11-29 2024-03-19 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium

Also Published As

Publication number Publication date
CN111126125A (en) 2020-05-08
CN111126125B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2021072879A1 (en) Method and apparatus for extracting target text in certificate, device, and readable storage medium
US10311099B2 (en) Method and system for 3D model database retrieval
JP4594372B2 (en) Method for recognizing parameterized shape from document image
US10878173B2 (en) Object recognition and tagging based on fusion deep learning models
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
JP4623676B2 (en) Method, apparatus and storage medium for dynamic connector analysis
WO2019018063A1 (en) Fine-grained image recognition
JP5340441B2 (en) Shape parameterization for editable document generation
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
WO2021017272A1 (en) Pathology image annotation method and device, computer apparatus, and storage medium
JPWO2009060975A1 (en) Feature point arrangement collation apparatus, image collation apparatus, method and program thereof
WO2014123619A1 (en) System and method for identifying similarities in different images
US20180253852A1 (en) Method and device for locating image edge in natural background
CN111290684B (en) Image display method, image display device and terminal equipment
Wang et al. Joint head pose and facial landmark regression from depth images
CN104881657A (en) Profile face identification method and system, and profile face construction method and system
CN111209909B (en) Construction method, device, equipment and storage medium for qualification recognition template
CN112815936B (en) Rapid all-sky-domain star map identification method and system for noise robustness
WO2015068417A1 (en) Image collation system, image collation method, and program
CN109978829B (en) Detection method and system for object to be detected
CN113111687A (en) Data processing method and system and electronic equipment
GB2532537A (en) Aligning multi-view scans
WO2021151274A1 (en) Image file processing method and apparatus, electronic device, and computer readable storage medium
CN114299509A (en) Method, device, equipment and medium for acquiring information
Dantanarayana et al. Object recognition and localization from 3D point clouds by maximum-likelihood estimation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 19949273
  Country of ref document: EP
  Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 EP: PCT application non-entry in European phase
  Ref document number: 19949273
  Country of ref document: EP
  Kind code of ref document: A1