CN113096170B - Text image registration method, device, equipment and storage medium - Google Patents

Text image registration method, device, equipment and storage medium

Info

Publication number
CN113096170B
Authority
CN
China
Prior art keywords
text
image
point
image feature
points
Prior art date
Legal status
Active
Application number
CN202110639946.2A
Other languages
Chinese (zh)
Other versions
CN113096170A (en)
Inventor
李盼盼
秦勇
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202110639946.2A
Publication of CN113096170A
Application granted
Publication of CN113096170B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections, by matching or filtering
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

Embodiments of the invention provide a text image registration method, apparatus, device, storage medium and program product. The method comprises: performing image feature extraction on a first text image and a second text image to be registered, respectively, to obtain first image feature points of the first text image and second image feature points of the second text image; performing key point detection on a first text region of the first text image and on a second text region of the second text image, respectively; based on the key points of the first text region and of the second text region, screening out from the first image feature points and the second image feature points, respectively, third image feature points and fourth image feature points located within a preset range of those key points; and obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature points and of the fourth image feature points. This embodiment improves both the efficiency and the accuracy of text image registration.

Description

Text image registration method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method and an apparatus for registering a text image, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Image registration is a popular and challenging topic in image processing research. It aims to compare and fuse images of the same object acquired under different conditions, for example at different times, under different illumination or from different shooting angles. Specifically, for two images to be registered, a spatial transformation is obtained through a series of operations and one image is mapped onto the other, so that the pixel points at the same spatial position in the two images correspond one to one. Image registration technology is widely applied in target detection, model reconstruction, motion estimation, feature matching, tumor detection, lesion localization, angiography, geological exploration, aerial reconnaissance and other fields.
At present, image registration technology produces poor results on text images, especially text images with complex changes. How to effectively improve the efficiency and accuracy of text image registration has therefore become an urgent technical problem.
Disclosure of Invention
In view of the above, an embodiment of the present invention provides a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for registering a text image, so as to solve at least one of the above technical problems.
According to a first aspect of embodiments of the present invention, a method for registering text images is provided. The method comprises the following steps: respectively extracting image features of a first text image and a second text image to be registered to obtain a first image feature point of the first text image and a second image feature point of the second text image; performing key point detection on a first text region of the first text image to obtain key points of the first text region, and performing key point detection on a second text region of the second text image to obtain key points of the second text region; based on the key points of the first text region, screening out third image feature points which are located in a first preset range of the key points of the first text region from the first image feature points, and based on the key points of the second text region, screening out fourth image feature points which are located in a second preset range of the key points of the second text region from the second image feature points; and obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
According to a second aspect of the embodiments of the present invention, there is provided a registration apparatus for text images. The device comprises: the characteristic extraction module is used for respectively extracting image characteristics of a first text image and a second text image to be registered to obtain a first image characteristic point of the first text image and a second image characteristic point of the second text image; the key point detection module is used for detecting key points of a first text region of the first text image to obtain key points of the first text region, and detecting key points of a second text region of the second text image to obtain key points of the second text region; the screening module is used for screening a third image feature point which is located in a first preset range of the key point of the first text region from the first image feature point based on the key point of the first text region, and screening a fourth image feature point which is located in a second preset range of the key point of the second text region from the second image feature point based on the key point of the second text region; a registration module, configured to obtain a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus. The electronic device includes: a processor; and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to the first aspect of an embodiment of the invention.
According to a fourth aspect of embodiments of the present invention, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the first aspect of embodiments of the present invention.
According to a fifth aspect of embodiments of the present invention, there is provided a computer program product comprising a computer program, wherein the computer program realizes the method of the first aspect of embodiments of the present invention when executed by a processor.
According to the text image registration scheme provided by the embodiments of the invention, image feature extraction is performed on the first text image and the second text image to be registered, yielding first image feature points of the first text image and second image feature points of the second text image. Key point detection is performed on a first text region of the first text image and on a second text region of the second text image. Based on the resulting key points, third image feature points located within a first preset range of the key points of the first text region are screened out from the first image feature points, and fourth image feature points located within a second preset range of the key points of the second text region are screened out from the second image feature points. The first and second text images are then registered based on the feature description data of the third and fourth image feature points. On the one hand, only the subset of image feature points within the preset range of the text-region key points is used for registration; this reduces the number of feature points to match, shortens the time required for matching, and effectively improves registration efficiency. On the other hand, because the screened feature points lie within the preset range of the text-region key points, the feature points most meaningful for text image registration are retained and the useless ones are eliminated, which improves feature point matching accuracy and, in turn, registration accuracy.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings in the following description obviously show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings based on them.
Fig. 1 is a flowchart of the steps of a text image registration method according to the first embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a topic correction method according to the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a text image registration apparatus according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to the fourth embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these solutions will be described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments shall fall within the scope of protection of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
Before describing the specific implementation of the embodiments of the present invention in detail, the design idea of the technical solution is briefly explained. In methods built on manually designed feature extractors, such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and BRIEF (Binary Robust Independent Elementary Features), the feature descriptor of a detected image feature point is typically based on statistics of the pixel values near that point; in a certain sense it is only a low-level feature, not a high-level feature such as semantics with stronger representational power. Meanwhile, the homography matrix used for text image registration has only eight unknowns, so only four pairs of matched image feature points are needed to compute it; yet thousands of image feature points can be extracted from any single text image, and the subsequent matching of these points consumes a long time, making text image registration inefficient. In addition, when the homography matrix is determined from the matched image feature points, the large number of matches requires iterative optimization by random sample consensus; the selected matches, however, may not be text feature points of the text image at all, but image feature points of interference factors such as the image background, so the determined homography matrix is unstable and the accuracy of text image registration suffers. On this basis, the inventors of the present application propose optimizing the image feature point matching of the traditional methods: keep for matching only the image feature points most meaningful for text image registration and remove those that are useless. This not only improves the accuracy of feature point matching but also speeds it up, thereby improving both the efficiency and the accuracy of text image registration. The embodiments of the invention provide a specific implementation of a text image registration method as follows:
Example One
Referring to Fig. 1, a flowchart of the steps of the text image registration method according to the first embodiment of the present invention is shown.
Specifically, the registration method for the text image provided by the embodiment of the invention comprises the following steps:
In step S101, image feature extraction is performed on a first text image and a second text image to be registered, so as to obtain a first image feature point of the first text image and a second image feature point of the second text image.
In this embodiment, the first text image and the second text image can be understood as images of text characters, such as a question image, a Chinese character image or an English character image. First image feature points are representative pixel points in the first text image, and second image feature points are representative pixel points in the second text image. Image feature points are used in many computer vision tasks; typical application scenarios include camera calibration, image stitching, dense reconstruction and scene understanding. At present there are three ways to obtain image feature points: the first is a manually designed feature point detection algorithm, such as SIFT, SURF or BRIEF; the second is a deep learning based approach; the third is to use artificial marker points in the scene as image feature points. Although deep learning based methods can now achieve good results, the manually designed detection algorithms remain effective, fast and highly feasible, and are widely used in industry; the SIFT algorithm in particular has remained competitive ever since it was proposed. For a pixel point in a text image to qualify as an image feature point, it must satisfy two basic requirements: distinctiveness and repeatability. Distinctiveness means that the point stands out from its surrounding pixels with marked gray-level variation, as corner points and edge points do; repeatability means that the point reappears across different viewing angles and is invariant to rotation, photometry and scale.
In the present embodiment, the first image feature points and the second image feature points may be obtained using a classical feature description method such as SIFT, SURF or BRIEF. The inventors of the present application found that the different feature description methods differ mainly in the dimension of the feature vector characterizing each image feature point; the feature vector may, for example, be 64-dimensional or 128-dimensional. In general, the larger the dimension of the feature vector, the more accurately it characterizes the feature point. The first and second image feature points can, of course, also be obtained by manual calibration. A person skilled in the art can select a specific obtaining method according to actual needs, and the embodiment of the invention places no limitation on this.
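As an illustrative, non-limiting sketch of this step, the following Python snippet extracts image feature points and their descriptors with OpenCV's SIFT implementation (one of the classical methods named above); the file names are placeholders:

```python
# Minimal sketch: obtaining the first and second image feature points with SIFT.
# OpenCV's SIFT produces 128-dimensional descriptors, matching the dimension
# mentioned above; file names are illustrative placeholders.
import cv2

def extract_sift_features(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # keypoints carry (x, y) positions; descriptors is an (N, 128) float array
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors

kp1, des1 = extract_sift_features("first_text_image.png")   # first image feature points
kp2, des2 = extract_sift_features("second_text_image.png")  # second image feature points
```

SURF or BRIEF could be substituted here in the same way; only the descriptor dimension would change.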
In step S102, key point detection is performed on a first text region of the first text image to obtain key points of the first text region, and on a second text region of the second text image to obtain key points of the second text region.
In this embodiment, the key points of the first text region can be understood as the key representative pixel points in the first text region, and the key points of the second text region as the key representative pixel points in the second text region.
In some optional embodiments, when performing the keypoint detection on the first text region of the first text image, performing the keypoint detection on the first text region of the first text image by using a keypoint detection model, and obtaining the keypoints of the first text region. Therefore, the key point detection model is used for detecting the key points of the first text area of the first text image, and the key points of the first text area can be accurately obtained. In addition, when the key point detection is carried out on the second text region of the second text image, the key point detection model is utilized to carry out key point detection on the second text region of the second text image, and the key point of the second text region is obtained. Therefore, the key point detection model is used for detecting the key points of the second text area of the second text image, and the key points of the second text area can be accurately obtained.
In one specific example, prior to performing keypoint detection on a first text region of the first text image using a keypoint detection model, the method further comprises: performing key point detection on a text area of a text image sample through the key point detection model to be trained to obtain a detection key point of the text area of the text image sample; training the key point detection model to be trained based on the detection key points and the labeling key points of the text region of the text image sample to obtain the trained key point detection model. Therefore, the key point detection model to be trained can be effectively trained through the detection key points and the labeling key points of the text region of the text image sample.
In a specific example, the text image sample can be understood as a text image in a sample library, the detection key points can be understood as key points of text regions in the text image sample detected by the key point detection model, and the labeling key points can be understood as key points of text regions in the text image sample labeled manually or by machine. When the key point detection model to be trained is trained on the basis of the detection key points and the labeling key points of the text region of the text image sample, determining difference values of the detection key points and the labeling key points through a target loss function; and adjusting the model parameters of the key point detection model based on the difference values. The target loss function can be any loss function such as a cross entropy loss function, a softmax loss function, an L1 loss function, and an L2 loss function. In adjusting the model parameters of the keypoint detection model, a back propagation algorithm or a stochastic gradient descent algorithm may be used to adjust the model parameters of the keypoint detection model.
In a specific example, the currently obtained detection keypoints are evaluated by determining a difference value between the detection keypoints and the labeling keypoints, so as to serve as a basis for subsequently training the keypoint detection model. Specifically, the discrepancy values may be transmitted back to the keypoint detection model, thereby iteratively training the keypoint detection model. The training of the keypoint detection model is an iterative process, and this embodiment describes only one training process, but it should be understood by those skilled in the art that this training mode may be adopted for each training of the keypoint detection model until the training of the keypoint detection model is completed.
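As a hedged illustration of one such training iteration, the sketch below assumes a PyTorch model that outputs key point score maps, uses the L2 loss as the target loss function (the text equally permits cross entropy, softmax or L1), and adjusts the model parameters by back propagation with stochastic gradient descent; all names are illustrative:

```python
# One training iteration of the key point detection model (sketch).
import torch

def train_step(model, optimizer, image, labeled_heatmaps):
    optimizer.zero_grad()
    detected_heatmaps = model(image)            # detection key points as score maps
    loss = torch.nn.functional.mse_loss(        # L2 difference value between the
        detected_heatmaps, labeled_heatmaps)    # detection and labeling key points
    loss.backward()                             # transmit the difference back
    optimizer.step()                            # adjust the model parameters
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # stochastic gradient descent
```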
In some optional embodiments, when performing keypoint detection on a first text region of the first text image to obtain keypoints of the first text region, performing text feature extraction on the first text region to obtain a text feature map of the first text region; obtaining a key point feature map of the first text region based on the text feature map of the first text region; and obtaining the key points of the first text area according to the key point feature map of the first text area. Therefore, the key point feature map of the first text region can be accurately obtained through the text feature map of the first text region. In addition, the key points of the first text region can be accurately obtained through the key point feature map of the first text region.
In a specific example, when text feature extraction is performed on the first text region to obtain a text feature map of the first text region, the key point detection model is used to extract text features from the first text region, obtaining a plurality of text feature maps of different scales of the first text region; the key point detection model is then used to upsample these feature maps into a plurality of text feature maps of the same scale; and the key point detection model is used to concatenate the same-scale feature maps to obtain the text feature map of the first text region.
In a specific example, when the key point feature maps of the first text region are obtained based on the text feature map of the first text region, a convolution operation and deconvolution operations are performed on the text feature map of the first text region to obtain a text box center point feature map, a text box first corner point feature map and a text box second corner point feature map of the first text region, where the first corner point and the second corner point are corner points of the text box. In this way, the center point feature map and the two corner point feature maps of the first text region can be accurately obtained.
In a specific example, when the key points of the first text region are obtained according to the key point feature map of the first text region, the center point of the text box of the first text region is determined based on the text box center point feature map of the first text region; determining a first corner of the text box of the first text area based on a first corner feature map of the text box of the first text area; and determining a second corner of the text box of the first text region based on a second corner feature map of the text box of the first text region. The first corner may be an upper left corner of a text box of the first text region, and the second corner may be a lower right corner of the text box of the first text region. Alternatively, the first corner may be a lower left corner of a text box of the first text region, and the second corner may be an upper right corner of the text box of the first text region.
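A minimal sketch of how such key points might be read out of the predicted feature maps follows; it assumes each map is an (H, W) score map with values in [0, 1] and uses the example threshold of 0.70 that appears later in the text:

```python
# Decode key points from a score map by thresholding (sketch).
import numpy as np

def decode_keypoints(score_map, threshold=0.70):
    ys, xs = np.where(score_map > threshold)
    return list(zip(xs.tolist(), ys.tolist()))  # (x, y) pixel coordinates

# center_points        = decode_keypoints(center_score_map)
# top_left_corners     = decode_keypoints(top_left_score_map)
# bottom_right_corners = decode_keypoints(bottom_right_score_map)
```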
In some optional embodiments, when performing keypoint detection on a second text region of the second text image to obtain keypoints of the second text region, performing text feature extraction on the second text region to obtain a text feature map of the second text region; obtaining a key point feature map of the second text region based on the text feature map of the second text region; and obtaining the key points of the second text region according to the key point feature map of the second text region. Therefore, the key point feature map of the second text region can be accurately obtained through the text feature map of the second text region. In addition, the key points of the second text region can be accurately obtained through the key point feature map of the second text region.
In a specific example, when text feature extraction is performed on the second text region to obtain a text feature map of the second text region, the key point detection model is used to extract text features from the second text region, obtaining a plurality of text feature maps of different scales of the second text region; the key point detection model is then used to upsample these feature maps into a plurality of text feature maps of the same scale; and the key point detection model is used to concatenate the same-scale feature maps to obtain the text feature map of the second text region.
In a specific example, when the key point feature maps of the second text region are obtained based on the text feature map of the second text region, a convolution operation and deconvolution operations are performed on the text feature map of the second text region to obtain a text box center point feature map, a text box first corner point feature map and a text box second corner point feature map of the second text region, where the first corner point and the second corner point are corner points of the text box. In this way, the center point feature map and the two corner point feature maps of the second text region can be accurately obtained.
In a specific example, when the key points of the second text region are obtained according to the key point feature map of the second text region, the center point of the text box of the second text region is determined based on the text box center point feature map of the second text region; determining a first corner of the text box of the second text region based on the first corner feature map of the text box of the second text region; and determining a second corner of the text box of the second text region based on a second corner feature map of the text box of the second text region. The first corner may be an upper left corner of a text box of the second text region, and the second corner may be a lower right corner of the text box of the second text region. Alternatively, the first corner point may be a lower-left corner point of a text box of the second text region, and the second corner point may be an upper-right corner point of the text box of the second text region.
In one specific example, the key point detection model may be a CenterNet network. CenterNet is a regression-based method: the category of text region to be detected is set first, and the number of output channels is 1+1+1. The first channel is a score map for the center point of the text box (the value of each pixel lies between 0 and 1 and represents the probability that the pixel is the center point of a text box). The other two channels are a score map for the upper-left corner of the text box containing that center point and a score map for its lower-right corner (each value again lies between 0 and 1 and represents the probability that the pixel is the corresponding corner of the text region). Center points, upper-left corner points and lower-right corner points of the text region are then found in the score maps by thresholding. Specifically, each pixel of a score map carries a score representing the probability that the pixel is the point in question; the scores are decimals between 0 and 1, for example 0.11, 0.34, 0.52 or 0.89, which are not exhaustive here. A pixel whose score exceeds the threshold is taken as a center point. The threshold may be preset manually or set automatically by the model according to the actual situation of the text region. For example, with a manually preset threshold of 0.70, the pixels with scores 0.81, 0.79 and 0.92 in the text region are the center points of the text boxes detected by the model, while the other pixels with lower scores are not; other threshold values may also be used. The corner-point score maps are handled in the same way as the center-point score map, which is not repeated here.
The feature extraction part of the CenterNet network may be a Resnet18 network serving as its backbone. The Resnet18 network is built from four residual blocks connected in series, each containing several convolution layers. The feature maps output by the first residual block are 1/4 the size of the text image, those of the second 1/8, the third 1/16 and the fourth 1/32, and each residual block outputs 128 feature maps. The four groups of feature maps are all resized to 1/4 of the text image by interpolation and concatenated into one group of feature maps with 512 channels. One convolution operation and two deconvolution operations are then applied to the 512-channel feature maps to obtain 3 (1+1+1) output channels of the same size as the text image: the first channel represents the score map of the text box center point (the value of each pixel lies between 0 and 1 and represents the probability that the pixel is the center point of the text box), the second channel represents the score map of the upper-left corner of the text box, and the third channel represents the score map of the lower-right corner of the text box.
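The head described above can be made concrete with the following hedged PyTorch sketch: four backbone feature maps (1/4, 1/8, 1/16 and 1/32 scale, 128 channels each) are interpolated to 1/4 scale, concatenated to 512 channels, and passed through one convolution and two deconvolutions to yield three full-resolution score maps. The intermediate channel widths are assumptions; only the quoted numbers (128, 512, 3 channels) come from the text:

```python
# Sketch of the CenterNet-style detection head described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(512, 128, kernel_size=3, padding=1)
        # two stride-2 deconvolutions: 1/4 -> 1/2 -> full input resolution
        self.deconv1 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
        self.deconv2 = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

    def forward(self, feats):
        # feats: four (N, 128, ...) maps from the residual blocks at
        # 1/4, 1/8, 1/16 and 1/32 of the input size
        size = feats[0].shape[2:]                  # 1/4 scale
        fused = torch.cat(
            [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
             for f in feats], dim=1)               # (N, 512, H/4, W/4)
        x = F.relu(self.conv(fused))
        x = F.relu(self.deconv1(x))
        # three score maps in [0, 1]: center, upper-left corner, lower-right corner
        return torch.sigmoid(self.deconv2(x))
```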
In step S103, a third image feature point located within a first preset range of the key points of the first text region is screened from the first image feature points based on the key points of the first text region, and a fourth image feature point located within a second preset range of the key points of the second text region is screened from the second image feature points based on the key points of the second text region.
In this embodiment, the first preset range and the second preset range may be set by a person skilled in the art according to actual needs, and this embodiment does not limit this.
In some optional embodiments, when the third image feature points located within the first preset range of the key points of the first text region are screened out from the first image feature points based on those key points, a first circular screening region is determined with a key point of the first text region as the center and a first preset distance as the radius, and the feature points inside the first circular screening region are screened out from the first image feature points as the third image feature points. The first preset distance may be set by a person skilled in the art according to actual needs, and this embodiment does not limit it. By determining the first circular screening region, the third image feature points located around the key points of the first text region can be accurately obtained.
In some optional embodiments, a square screening region for screening the first image feature points is instead determined with a key point of the first text region as the center and a preset length as the side length, and the first image feature points located inside the square screening region are taken as the third image feature points. The preset length may be set by a person skilled in the art according to actual needs, and this embodiment does not limit it. By determining the square screening region, the third image feature points located around the key points of the first text region can likewise be accurately obtained.
In some optional embodiments, when the fourth image feature points located within the second preset range of the key points of the second text region are screened out from the second image feature points based on those key points, a second circular screening region is determined with a key point of the second text region as the center and a second preset distance as the radius, and the feature points inside the second circular screening region are screened out from the second image feature points as the fourth image feature points. The second preset distance may be set by a person skilled in the art according to actual needs, and this embodiment does not limit it. By determining the second circular screening region, the fourth image feature points located around the key points of the second text region can be accurately obtained.
In some optional embodiments, a square screening region for screening the second image feature points is instead determined with a key point of the second text region as the center and a preset length as the side length, and the second image feature points located inside the square screening region are taken as the fourth image feature points. The preset length may be set by a person skilled in the art according to actual needs, and this embodiment does not limit it. By determining the square screening region, the fourth image feature points located around the key points of the second text region can likewise be accurately obtained.
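The circular variant of this screening can be sketched in Python as follows (the square variant would instead test the Chebyshev distance against half the preset side length); the radius stands for the first or second preset distance, in pixels, and its value here is illustrative:

```python
# Keep only the image feature points inside a circle of the given radius
# around any text-region key point (sketch of the screening step).
def screen_feature_points(keypoints, descriptors, region_keypoints, radius=40):
    kept_kp, kept_des = [], []
    for kp, des in zip(keypoints, descriptors):
        x, y = kp.pt  # OpenCV KeyPoint position
        if any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
               for cx, cy in region_keypoints):
            kept_kp.append(kp)
            kept_des.append(des)
    return kept_kp, kept_des
```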
In step S104, a registration result of the first text image and the second text image is obtained based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
In this embodiment, the feature description data of the third image feature point may be a feature descriptor of the third image feature point, and the feature description data of the fourth image feature point may be a feature descriptor of the fourth image feature point. A feature descriptor can be understood as data used to describe the features of an image feature point.
In some optional embodiments, when the registration result of the first text image and the second text image is obtained based on the feature description data of the third image feature points and of the fourth image feature points, the third image feature points are matched with the fourth image feature points based on their feature description data to obtain at least one pair of mutually matched image feature points; position transformation data is determined based on the position data of the image feature points in the matched pairs; and, based on the position transformation data, the positions of the pixel points of one of the two text images are transformed so that the pixel points in the first text image and the pixel points in the second text image are mapped onto each other. In this way, by determining position transformation data and transforming one of the images accordingly, the pixel points of the first and second text images can be mapped onto each other.
In a specific example, when the third image feature point is matched with the fourth image feature point based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, when the similarity between the feature description vector of the third image feature point and the feature description vector of the fourth image feature point is greater than or equal to a preset similarity threshold, it may be determined that the third image feature point and the fourth image feature point are a matched pair of image feature points. In this way, the third image feature point and the fourth image feature point are in a corresponding relationship.
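A hedged sketch of this matching step follows; it uses a brute-force matcher with Lowe's ratio test, a common practice for SIFT descriptors, in place of the raw similarity threshold described above (which would be an equally valid criterion):

```python
# Match third image feature points against fourth image feature points
# by descriptor distance (sketch).
import numpy as np
import cv2

def match_descriptors(des3, des4, ratio=0.75):
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(np.float32(des3), np.float32(des4), k=2)
    # keep a match only when it is clearly better than the second-best candidate
    return [m for m, n in candidates if m.distance < ratio * n.distance]
```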
In a specific example, when the position conversion data is determined for each of the image feature point pairs based on the position data of each of the image feature points in the image feature point pair, the position conversion matrix is determined based on the two-dimensional coordinate data of each of the image feature points in the image feature point pair. The position transformation matrix may be a homography matrix, and determining the homography matrix based on the two-dimensional coordinate data of each image feature point in the image feature point pair is prior art and is not described herein again.
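For reference, the prior-art relation is the standard projective mapping: a homography $H$ takes a point $(x, y)$ in one image to $(x', y')$ in the other,

$$
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} \sim
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
$$

and since $H$ is defined only up to scale, $h_{33}$ can be fixed to 1, leaving the eight unknowns noted earlier; each matched image feature point pair contributes two independent equations, so four pairs in general position determine $H$.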
In a specific example, when the positions of the pixel points of one of the two text images are transformed based on the position transformation data, they are transformed based on the homography matrix, so that the pixel points in the first text image and the pixel points in the second text image are mapped onto each other.
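A minimal sketch of this final step, assuming OpenCV and the matches produced above, estimates the homography with RANSAC (the random-sampling consistency scheme mentioned in the design discussion) and warps the first text image onto the second:

```python
# Estimate the homography from matched feature point pairs and map
# the first text image onto the second (sketch).
import numpy as np
import cv2

def register(img1, img2, kp3, kp4, good_matches):
    src = np.float32([kp3[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst = np.float32([kp4[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = img2.shape[:2]
    return cv2.warpPerspective(img1, H, (w, h)), H
```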
In a specific example, a large number of text image samples are collected, and each sample is photographed several times at random under different conditions, giving a large number of text image sample pairs with different backgrounds and different shooting angles; the samples are then labeled according to the training data requirements of a text detection task. A text detection model is then built with the improved CenterNet network. Unlike the conventional CenterNet network, this embodiment uses the Resnet18 network as the backbone: the Resnet18 network is built from 4 residual blocks connected in series, each containing several convolution layers; the feature maps output by the first residual block are 1/4 the size of the original image, the second 1/8, the third 1/16 and the fourth 1/32, with 128 feature maps output per block. The 4 groups of feature maps are all interpolated to 1/4 of the original size and concatenated into one group of feature maps with 512 channels; one convolution operation and two deconvolution operations are then applied to obtain 3 output channels of the same size as the input image. The first channel represents the score map of the text box center point (the value of each pixel lies between 0 and 1 and represents the probability that the pixel is the center point of the text box), the second channel represents the score map of the upper-left corner of the text box, and the third channel represents the score map of the lower-right corner. In this embodiment the text does not actually need to be detected; the aim is only to find key representative pixel points. During training, the focal loss function used by the CenterNet network for training the center point is applied to all three channels.
When training is finished, in the use stage, for any two text images to be registered (called text image one and text image two), center point and corner point detection of the text lines is first performed on both images with the improved CenterNet network of this embodiment; center points and corner points are obtained from the three output channels by thresholding (a point on a score map is taken as a center point or corner point when its probability value exceeds the set threshold, and is discarded otherwise). Image feature points are then extracted from text image one and text image two with the SIFT algorithm, and the feature descriptor of each image feature point is obtained (SIFT serves as the example; other algorithms may be used). Next, around each center point and each corner point on each text image, a circle is drawn with a radius of 5 mm (other values may be set), and only the image feature points inside these circles are kept. Finally, the remaining image feature points are matched using their feature descriptors, a homography matrix is computed from the matching result, and text image one is mapped onto text image two, thereby realizing text image registration. The motivation for the screening is that text images carry interference. Consider background information: when a text image is photographed lying on a desk with the mobile phone held some height above it, the four corners of the photograph contain the desk background, and image feature points in those parts are very likely to harm matching accuracy. Meanwhile, a text image differs from other images in that image feature points in blank areas without characters carry little meaning.
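Putting the pieces together, the use stage described above might look like the following sketch, which reuses the helpers from the earlier snippets; detect_text_keypoints and extract_sift_features_from_array are hypothetical helpers, and the conversion of the 5 mm radius to pixels assumes a 300 dpi capture:

```python
# End-to-end sketch of the use stage for two text images (NumPy arrays).
RADIUS_PX = int(5 / 25.4 * 300)  # 5 mm at an assumed 300 dpi, about 59 px

def register_text_images(img1, img2, keypoint_model):
    pts1 = detect_text_keypoints(keypoint_model, img1)  # centers + corners (hypothetical)
    pts2 = detect_text_keypoints(keypoint_model, img2)
    kp1, des1 = extract_sift_features_from_array(img1)  # hypothetical array variant
    kp2, des2 = extract_sift_features_from_array(img2)
    kp3, des3 = screen_feature_points(kp1, des1, pts1, radius=RADIUS_PX)
    kp4, des4 = screen_feature_points(kp2, des2, pts2, radius=RADIUS_PX)
    matches = match_descriptors(des3, des4)
    warped, H = register(img1, img2, kp3, kp4, matches)
    return warped, H
```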
According to the text image registration method provided by the embodiment of the invention, image feature extraction is performed on the first text image and the second text image to be registered, yielding first image feature points of the first text image and second image feature points of the second text image. Key point detection is performed on a first text region of the first text image and on a second text region of the second text image. Third image feature points located within a first preset range of the key points of the first text region and fourth image feature points located within a second preset range of the key points of the second text region are then screened out from the first and second image feature points respectively, and the two text images are registered based on the feature description data of the third and fourth image feature points. On the one hand, only the subset of image feature points within the preset range of the text-region key points is used for registration, which reduces the number of feature points to match, shortens the time required for matching, and effectively improves registration efficiency. On the other hand, because the screened feature points lie within the preset range of the text-region key points, the feature points most meaningful for text image registration are retained and the useless ones are eliminated, which improves feature point matching accuracy and, in turn, registration accuracy.
The text image registration method provided by the present embodiment may be performed by any suitable device with data processing capability, including but not limited to: cameras, terminals, mobile terminals, PCs, servers, in-vehicle devices, entertainment devices, advertising devices, personal digital assistants (PDAs), tablet computers, notebook computers, handheld game consoles, smart glasses, smart watches, wearable devices, and virtual display or display enhancement devices (such as Google Glass, Oculus Rift, HoloLens, Gear VR).
Example Two
Before describing the specific implementation of this embodiment in detail, the design idea of its technical solution is briefly described. In primary school mathematics exercise books, for reasons such as writing habits and shooting scenes, many problems arise in the captured text images: show-through (both sides of the same sheet of paper are written on, so one side affects the other), uneven illumination (shooting under a desk lamp), and skewed printing or shooting angles. To correct all primary school mathematics questions comprehensively, a question database must be established, and how the answer area of a question image in the database is mapped to the answer area of the question image to be corrected greatly influences correction accuracy; mapping the answer areas with an image registration method and then correcting the questions can achieve a good effect. However, this approach depends heavily on the quality of image registration, and current general-purpose registration methods perform poorly on text images, especially text images with complex changes, which restricts the improvement of correction accuracy. On this basis, the inventors of the present application propose optimizing the image feature point matching of the traditional methods: keep for matching only the image feature points most meaningful for text image registration and remove those that are useless. This improves both the accuracy and the speed of feature point matching, and therefore both the efficiency and the accuracy of text image registration, which in turn improves the accuracy of topic correction. The embodiment of the invention provides a specific implementation of a topic correction method as follows:
Referring to Fig. 2, a flowchart of the steps of the topic correction method according to the second embodiment of the present invention is shown.
Specifically, the topic correction method provided by the embodiment of the present invention includes the following steps:
In step S201, image feature extraction is performed on an image of a first topic to be corrected and an image of a second topic in a preset topic database, so as to obtain a first image feature point of the image of the first topic and a second image feature point of the image of the second topic.
In this embodiment, the first topic to be corrected may be a primary school mathematics question, a middle school mathematics question, a college mathematics question, an English question, a Chinese question and the like, and so may the second topic.
Since the specific implementation of step S201 is similar to the specific implementation of step S101 in the first embodiment, it is not repeated herein.
In step S202, key point detection is performed on a first question answering area of the image of the first topic to obtain key points of the first question answering area, and on a second question answering area of the image of the second topic to obtain key points of the second question answering area.
Since the specific implementation of step S202 is similar to the specific implementation of step S102 in the first embodiment, it is not repeated herein.
In step S203, based on the key points in the first question answering area, third image feature points located in a first preset range of the key points in the first question answering area are selected from the first image feature points, and based on the key points in the second question answering area, fourth image feature points located in a second preset range of the key points in the second question answering area are selected from the second image feature points.
Since the specific implementation of step S203 is similar to the specific implementation of step S103 in the first embodiment, it is not repeated here.
In step S204, based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, a registration result of the image of the first topic and the image of the second topic is obtained.
Since the specific implementation of step S204 is similar to the specific implementation of step S104 in the first embodiment, it is not repeated here.
In step S205, based on the registration result of the image of the first topic and the image of the second topic, the first topic is corrected, and a correction result of the first topic is obtained.
In this embodiment, because the registration result of the image of the first topic and the image of the second topic is obtained by registering the two images based on the feature description data of the third image feature points and of the fourth image feature points, the registration result can be understood as mapping the pixel points of the first question answering area to the corresponding pixel points of the second question answering area, or vice versa. Once the two answering areas are mapped onto each other, they can be compared, and the first topic can be corrected according to the comparison result to obtain its correction result. Specifically, if the first and second question answering areas contain the same answer content, the first topic is answered correctly; otherwise it is answered incorrectly.
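As a hedged sketch of how this mapping could be used, the snippet below transfers an answer area from the database question image onto the image to be corrected with the homography H and crops it for comparison; answer_box and recognize_text are hypothetical placeholders for the database annotation and whatever answer-comparison logic is used:

```python
# Map a database answer area onto the image to be corrected and crop it (sketch).
import numpy as np
import cv2

def crop_mapped_answer(img_to_correct, H, answer_box):
    x, y, w, h = answer_box  # answer area in the database question image
    corners = np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])
    mapped = cv2.perspectiveTransform(corners.reshape(-1, 1, 2), H).reshape(-1, 2)
    mx, my, mw, mh = cv2.boundingRect(np.int32(mapped))
    return img_to_correct[my:my + mh, mx:mx + mw]

# answer_crop = crop_mapped_answer(img_to_correct, H, answer_box)
# correct = (recognize_text(answer_crop) == expected_answer)  # hypothetical comparison
```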
According to the topic correction method provided by the embodiment of the invention, image feature extraction is performed on the image of the first topic to be corrected and the image of the second topic, yielding first image feature points of the first topic image and second image feature points of the second topic image. Key point detection is performed on a first question answering area of the first topic image and on a second question answering area of the second topic image. Third image feature points located within a first preset range of the key points of the first answering area and fourth image feature points located within a second preset range of the key points of the second answering area are then screened out from the first and second image feature points respectively, the two topic images are registered based on the feature description data of the third and fourth image feature points, and the first topic is corrected based on the registration result. On the one hand, only the subset of image feature points around the answering-area key points is used for registration, which reduces the number of feature points to match, shortens the time required for matching, and effectively improves both topic image registration efficiency and topic correction efficiency. On the other hand, because the screened feature points lie around the answering-area key points, the feature points most meaningful for registration are retained and the useless ones are eliminated, which improves feature point matching accuracy, registration accuracy and, in turn, the accuracy of topic correction.
The topic correction method provided by this embodiment can be executed by any suitable device with data processing capability, including but not limited to: cameras, terminals, mobile terminals, PCs, servers, in-vehicle devices, entertainment devices, advertising devices, personal digital assistants (PDAs), tablet computers, notebook computers, handheld game consoles, smart glasses, smart watches, wearable devices, and virtual reality or augmented reality display devices (such as Google Glass, Oculus Rift, HoloLens, Gear VR), and the like.
Example Three
Fig. 3 is a schematic structural diagram of a text image registration apparatus according to a third embodiment of the present invention. Referring to fig. 3, the apparatus includes:
the feature extraction module 301 is configured to perform image feature extraction on a first text image and a second text image to be registered respectively, so as to obtain a first image feature point of the first text image and a second image feature point of the second text image;
a key point detection module 302, configured to perform key point detection on a first text region of the first text image to obtain key points of the first text region, and perform key point detection on a second text region of the second text image to obtain key points of the second text region;
a screening module 303, configured to screen, from the first image feature points, third image feature points located within a first preset range of the key points in the first text region based on the key points in the first text region, and screen, from the second image feature points, fourth image feature points located within a second preset range of the key points in the second text region based on the key points in the second text region;
a registration module 304, configured to obtain a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
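The patent does not prescribe a particular feature detector for the feature extraction module 301. As a hedged illustration only, the following Python sketch uses ORB from OpenCV purely as a stand-in; the detector choice and parameter values are assumptions, not the patent's method:

```python
import cv2

def extract_feature_points(first_text_image, second_text_image):
    """Sketch of the feature extraction module (301): detect image feature
    points in both text images. ORB is an assumed stand-in detector."""
    orb = cv2.ORB_create(nfeatures=5000)
    # Keypoints carry positions; descriptors are the "feature description
    # data" later consumed by the registration module.
    kp1, desc1 = orb.detectAndCompute(first_text_image, None)
    kp2, desc2 = orb.detectAndCompute(second_text_image, None)
    return (kp1, desc1), (kp2, desc2)
```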
In this embodiment of the invention, image feature extraction is performed on a first text image and a second text image to be registered, obtaining first image feature points of the first text image and second image feature points of the second text image. Key point detection is performed on a first text region of the first text image and on a second text region of the second text image. Based on the key points of the first text region, third image feature points located within a first preset range of those key points are screened out from the first image feature points; based on the key points of the second text region, fourth image feature points located within a second preset range of those key points are screened out from the second image feature points. The first text image and the second text image are then registered based on the feature description data of the third and fourth image feature points. On one hand, only the partial image feature points within the preset ranges of the text-region key points are used for text image registration, which reduces the number of feature points to be matched, shortens the time required for feature point matching, and effectively improves the efficiency of text image registration. On the other hand, because the screened feature points lie within the preset ranges of the text-region key points, the feature points most meaningful for registration are retained while those useless for registration are discarded, which improves the accuracy of feature point matching and, in turn, the accuracy of text image registration.
In a possible implementation manner, the keypoint detection module 302 is specifically configured to perform keypoint detection on a first text region of the first text image by using a keypoint detection model, so as to obtain keypoints of the first text region; and carrying out key point detection on a second text region of the second text image by using the key point detection model to obtain key points of the second text region.
In a possible implementation manner, the keypoint detection module 302 is specifically configured to perform text feature extraction on the first text region to obtain a text feature map of the first text region; obtaining a key point feature map of the first text region based on the text feature map of the first text region; and obtaining the key points of the first text area according to the key point feature map of the first text area.
In a possible implementation manner, the key point detection module 302 is specifically configured to, when obtaining the key point feature map of the first text region based on the text feature map of the first text region, perform a convolution operation and a deconvolution operation on the text feature map of the first text region to obtain a text box center point feature map, a text box first corner point feature map, and a text box second corner point feature map of the first text region, where the first corner point and the second corner point of the text box are opposite (diagonal) corner points of the text box.
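As a hedged illustration of such a convolution/deconvolution head, the following PyTorch sketch maps a text feature map to three key point feature maps (center point, first corner point, second corner point) and reads key points off a heatmap channel. All channel sizes, layer counts, and the threshold are assumptions for illustration, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Illustrative head producing three key point feature maps from a text
    feature map: text box center point, first corner point, and second
    (diagonally opposite) corner point."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Deconvolution restores spatial resolution lost in the backbone.
        self.deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
        self.heatmaps = nn.Conv2d(64, 3, kernel_size=1)  # center, corner1, corner2

    def forward(self, text_feature_map):
        x = self.conv(text_feature_map)
        x = torch.relu(self.deconv(x))
        return torch.sigmoid(self.heatmaps(x))

def peak_coordinates(heatmap, threshold=0.5):
    """Read key points off one 2-D heatmap channel as above-threshold pixels."""
    ys, xs = torch.nonzero(heatmap > threshold, as_tuple=True)
    return list(zip(xs.tolist(), ys.tolist()))
```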
In a possible implementation manner, the screening module 303 is specifically configured to determine a first circular screening area by using the key point of the first text area as a center of a circle and using a first preset distance as a radius when a third image feature point located within a first preset range of the key point of the first text area is screened from the first image feature points based on the key point of the first text area; and screening out the feature points in the first circular screening area from the first image feature points as the third image feature points.
In a possible implementation manner, the screening module 303 is specifically configured to determine a second circular screening area by using the key point of the second text area as a center of a circle and using a second preset distance as a radius when a fourth image feature point located in a second preset range of the key point of the second text area is screened from the second image feature points based on the key point of the second text area; and screening out the feature points in the second circular screening area from the second image feature points as the fourth image feature points.
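The circular screening described in the two implementations above can be illustrated with a short sketch; the coordinate representation and radius value are assumptions, and the same function would be applied once per text image (with the first or second preset distance respectively):

```python
import numpy as np

def screen_feature_points(keypoints, feature_points, radius):
    """Keep only the image feature points that fall inside a circular
    screening area of the given radius centered on any text-region key point.

    `keypoints` and `feature_points` are iterables of (x, y) coordinates;
    `radius` plays the role of the preset distance in the patent."""
    kp = np.asarray(keypoints, dtype=np.float32)        # shape (K, 2)
    fp = np.asarray(feature_points, dtype=np.float32)   # shape (N, 2)
    # Distance from every feature point to every key point.
    dists = np.linalg.norm(fp[:, None, :] - kp[None, :, :], axis=-1)  # (N, K)
    keep = (dists <= radius).any(axis=1)
    return fp[keep]
```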
In a possible implementation manner, the registration module 304 is specifically configured to, when obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, match the third image feature point with the fourth image feature point based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, and obtain at least one image feature point pair that matches each other; for each image feature point pair, determining position transformation data based on position data of each image feature point in the image feature point pair; and based on the position transformation data, transforming the position of a pixel point corresponding to one of the image feature point pairs, so that the pixel point in the first text image and the pixel point in the second text image are mapped with each other.
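As an illustration of this matching-and-transformation step, the following sketch matches the screened descriptors and estimates a RANSAC homography as the position transformation data. The patent only requires some position transformation derived from the matched pairs, so the homography model, the matcher, and the reprojection threshold here are assumptions:

```python
import cv2
import numpy as np

def register_images(desc3, pts3, desc4, pts4):
    """Sketch of the registration module (304): match the screened (third and
    fourth) image feature points by their feature description data, then
    estimate a position transformation from the matched pairs."""
    # Hamming norm suits binary (ORB-style) descriptors, per the earlier sketch.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(desc3, desc4), key=lambda m: m.distance)

    src = np.float32([pts3[m.queryIdx] for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([pts4[m.trainIdx] for m in matches]).reshape(-1, 1, 2)
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # maps pixel points of the first text image onto the second
```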
Example Four
Fig. 4 shows the hardware structure of an electronic device according to a fourth embodiment of the present invention. As shown in fig. 4, the electronic device 400 may include: a processor 402, a communication interface 408, a memory 404, and a communication bus 406.
Wherein:
the processor 402, communication interface 408, and memory 404 communicate with each other via a communication bus 406.
A communication interface 408 for communicating with other electronic devices or servers.
The processor 402 is configured to execute the program 410, and may specifically execute relevant steps in the above embodiment of the text image registration method.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 404 for storing a program 410. The memory 404 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one magnetic disk memory.
The program 410 may specifically be configured to cause the processor 402 to perform the following operations: respectively extracting image features of a first text image and a second text image to be registered to obtain a first image feature point of the first text image and a second image feature point of the second text image; performing key point detection on a first text region of the first text image to obtain key points of the first text region, and performing key point detection on a second text region of the second text image to obtain key points of the second text region; based on the key points of the first text region, screening out third image feature points which are located in a first preset range of the key points of the first text region from the first image feature points, and based on the key points of the second text region, screening out fourth image feature points which are located in a second preset range of the key points of the second text region from the second image feature points; and obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
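Composing the helper sketches above, a minimal end-to-end pipeline corresponding to these operations might look as follows; `detect_region_keypoints` stands in for the key point detection model and the radius is an assumed preset distance:

```python
import numpy as np

def registration_pipeline(img1, img2, detect_region_keypoints, radius=40.0):
    """End-to-end sketch composing the earlier sketches: extract feature
    points, detect text-region key points, screen third/fourth feature
    points, and register the two text images."""
    (kp1, d1), (kp2, d2) = extract_feature_points(img1, img2)
    rk1 = np.asarray(detect_region_keypoints(img1), dtype=np.float32)
    rk2 = np.asarray(detect_region_keypoints(img2), dtype=np.float32)

    def mask_near(kps, region_kps):
        # Boolean mask of feature points within `radius` of any key point.
        pts = np.float32([k.pt for k in kps])
        d = np.linalg.norm(pts[:, None, :] - region_kps[None, :, :], axis=-1)
        return (d <= radius).any(axis=1)

    m1, m2 = mask_near(kp1, rk1), mask_near(kp2, rk2)
    pts3 = [k.pt for k, keep in zip(kp1, m1) if keep]   # third feature points
    pts4 = [k.pt for k, keep in zip(kp2, m2) if keep]   # fourth feature points
    return register_images(d1[m1], pts3, d2[m2], pts4)
```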
In an alternative embodiment, the program 410 is further configured to enable the processor 402 to perform a keypoint detection on a first text region of the first text image by using a keypoint detection model, and obtain keypoints of the first text region; and carrying out key point detection on a second text region of the second text image by using the key point detection model to obtain key points of the second text region.
In an optional implementation, the program 410 is further configured to cause the processor 402 to, when performing keypoint detection on a first text region of the first text image to obtain keypoints of the first text region, perform text feature extraction on the first text region to obtain a text feature map of the first text region; obtaining a key point feature map of the first text region based on the text feature map of the first text region; and obtaining the key points of the first text area according to the key point feature map of the first text area.
In an optional implementation, the program 410 is further configured to cause the processor 402 to, when obtaining the key point feature map of the first text region based on the text feature map of the first text region, perform a convolution operation and a deconvolution operation on the text feature map of the first text region to obtain a text box center point feature map, a text box first corner point feature map, and a text box second corner point feature map of the first text region, where the first corner point and the second corner point of the text box are opposite corner points of the text box.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to determine a first circular screening area by taking the keypoint of the first text area as a center and taking a first preset distance as a radius when a third image feature point located within a first preset range of the keypoint of the first text area is screened from the first image feature points based on the keypoint of the first text area; and screening out the feature points in the first circular screening area from the first image feature points as the third image feature points.
In an alternative embodiment, the program 410 is further configured to enable the processor 402, when a fourth image feature point located within a second preset range of the key points of the second text region is screened from the second image feature points based on the key points of the second text region, to determine a second circular screening region by taking the key points of the second text region as a center and a second preset distance as a radius; and screening out the feature points in the second circular screening area from the second image feature points as the fourth image feature points.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to, when obtaining the registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, match the third image feature point with the fourth image feature point based on the feature description data of the third image feature point and the feature description data of the fourth image feature point, and obtain at least one matched image feature point pair; for each image feature point pair, determining position transformation data based on position data of each image feature point in the image feature point pair; and based on the position transformation data, transforming the position of a pixel point corresponding to one of the image feature point pairs, so that the pixel point in the first text image and the pixel point in the second text image are mapped with each other.
For specific implementation of each step in the program 410, reference may be made to corresponding descriptions in corresponding steps in the foregoing embodiment of the text image registration method, which is not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
With the electronic device of this embodiment, image feature extraction is performed on a first text image and a second text image to be registered, obtaining first image feature points of the first text image and second image feature points of the second text image. Key point detection is performed on a first text region of the first text image and on a second text region of the second text image. Based on the key points of the first text region, third image feature points located within a first preset range of those key points are screened out from the first image feature points; based on the key points of the second text region, fourth image feature points located within a second preset range of those key points are screened out from the second image feature points. The first text image and the second text image are then registered based on the feature description data of the third and fourth image feature points. On one hand, only the partial image feature points within the preset ranges of the text-region key points are used for text image registration, which reduces the number of feature points to be matched, shortens the time required for feature point matching, and effectively improves the efficiency of text image registration. On the other hand, because the screened feature points lie within the preset ranges of the text-region key points, the feature points most meaningful for registration are retained while those useless for registration are discarded, which improves the accuracy of feature point matching and, in turn, the accuracy of text image registration.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code configured to perform the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. When executed by a central processing unit (CPU), the computer program performs the above-described functions defined in the method of the embodiment of the present invention. It should be noted that the computer readable medium in the embodiments of the present invention may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In an embodiment of the invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave; such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, and the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations of embodiments of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions configured to implement the specified logical function(s). In the above embodiments, specific precedence relationships are provided, but these precedence relationships are only exemplary, and in particular implementations, the steps may be fewer, more, or the execution order may be modified. That is, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a feature extraction module, a key point detection module, a screening module, and a registration module. The names of these modules do not, in some cases, limit the modules themselves.
As another aspect, embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing a processor to execute the registration method of text images described in the above embodiments.
As another aspect, an embodiment of the present invention further provides a computer-readable medium, which may be included in the apparatus described in the above embodiment; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: respectively extracting image features of a first text image and a second text image to be registered to obtain a first image feature point of the first text image and a second image feature point of the second text image; performing key point detection on a first text region of the first text image to obtain key points of the first text region, and performing key point detection on a second text region of the second text image to obtain key points of the second text region; based on the key points of the first text region, screening out third image feature points which are located in a first preset range of the key points of the first text region from the first image feature points, and based on the key points of the second text region, screening out fourth image feature points which are located in a second preset range of the key points of the second text region from the second image feature points; and obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
As another aspect, the embodiment of the present invention further provides a computer program product, which includes a computer program, wherein the computer program, when executed by a processor, implements the registration method of text images according to the description in the above embodiments.
The expressions "first", "second", "said first" or "said second" used in various embodiments of the invention may modify various components without relation to order and/or importance, but these expressions do not limit the respective components. The above description is only configured for the purpose of distinguishing elements from other elements.
The foregoing description is merely illustrative of the preferred embodiments of the invention and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention according to the embodiments of the present invention is not limited to the specific combinations of the above features, and also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present invention.

Claims (10)

1. A method of registration of text images, the method comprising:
respectively extracting image features of a first text image and a second text image to be registered to obtain a first image feature point of the first text image and a second image feature point of the second text image;
performing key point detection on a first text region of the first text image to obtain key points of the first text region, and performing key point detection on a second text region of the second text image to obtain key points of the second text region;
screening out third image feature points which are located in a first preset screening area range of the key points of the first text area from the first image feature points on the basis of the key points of the first text area, and screening out fourth image feature points which are located in a second preset screening area range of the key points of the second text area from the second image feature points on the basis of the key points of the second text area;
and obtaining a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
2. The registration method of the text images according to claim 1, wherein a first text region of the first text image is subjected to key point detection by using a key point detection model to obtain key points of the first text region; and
and performing key point detection on a second text region of the second text image by using the key point detection model to obtain key points of the second text region.
3. The registration method of the text image according to claim 2, wherein the performing the keypoint detection on the first text region of the first text image to obtain the keypoint of the first text region comprises:
performing text feature extraction on the first text region to obtain a text feature map of the first text region;
obtaining a key point feature map of the first text region based on the text feature map of the first text region;
and obtaining the key points of the first text area according to the key point feature map of the first text area.
4. The registration method of the text image according to claim 3, wherein the obtaining the keypoint feature map of the first text region based on the text feature map of the first text region comprises:
and performing convolution operation and deconvolution operation on the text feature map of the first text region to obtain a text box center point feature map, a text box first corner feature map and a text box second corner feature map of the first text region, wherein the text box first corner and the text box second corner are opposite corners of the text box.
5. The method for registering text images according to claim 1, wherein the screening out third image feature points from the first image feature points based on the key points of the first text region, the third image feature points being located within a first preset screening region of the key points of the first text region, comprises:
determining a first circular screening area by taking the key point of the first text area as the circle center and taking a first preset distance as the radius;
and screening out the feature points in the first circular screening area from the first image feature points as the third image feature points.
6. The method for registering text images according to claim 1, wherein the screening out fourth image feature points from the second image feature points based on the key points of the second text region, the fourth image feature points being located within a second preset screening region of the key points of the second text region, comprises:
determining a second circular screening area by taking the key point of the second text area as the circle center and a second preset distance as the radius;
and screening out the feature points in the second circular screening area from the second image feature points as the fourth image feature points.
7. The method for registering text images according to claim 1, wherein the obtaining of the registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point comprises:
matching the third image feature point with the fourth image feature point based on the feature description data of the third image feature point and the feature description data of the fourth image feature point to obtain at least one image feature point pair matched with each other;
for each image feature point pair, determining position transformation data based on position data of each image feature point in the image feature point pair;
and based on the position transformation data, transforming the position of a pixel point corresponding to one of the image feature point pairs, so that the pixel point in the first text image and the pixel point in the second text image are mapped with each other.
8. An apparatus for registration of text images, the apparatus comprising:
the characteristic extraction module is used for respectively extracting image characteristics of a first text image and a second text image to be registered to obtain a first image characteristic point of the first text image and a second image characteristic point of the second text image;
the key point detection module is used for detecting key points of a first text region of the first text image to obtain key points of the first text region, and detecting key points of a second text region of the second text image to obtain key points of the second text region;
the screening module is used for screening a third image feature point which is located in a first preset screening area range of the key points of the first text area from the first image feature points based on the key points of the first text area, and screening a fourth image feature point which is located in a second preset screening area range of the key points of the second text area from the second image feature points based on the key points of the second text area;
a registration module, configured to obtain a registration result of the first text image and the second text image based on the feature description data of the third image feature point and the feature description data of the fourth image feature point.
9. An electronic device, comprising:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202110639946.2A 2021-06-09 2021-06-09 Text image registration method, device, equipment and storage medium Active CN113096170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639946.2A CN113096170B (en) 2021-06-09 2021-06-09 Text image registration method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113096170A CN113096170A (en) 2021-07-09
CN113096170B true CN113096170B (en) 2022-01-25

Family

ID=76664488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639946.2A Active CN113096170B (en) 2021-06-09 2021-06-09 Text image registration method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113096170B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091123A (en) * 2019-12-02 2020-05-01 上海眼控科技股份有限公司 Text region detection method and equipment
CN111968160A (en) * 2020-07-15 2020-11-20 上海联影智能医疗科技有限公司 Image matching method and storage medium
CN112001389A (en) * 2020-10-29 2020-11-27 北京淇瑀信息科技有限公司 Method and device for identifying text information in multi-scene video and electronic equipment
CN112241739A (en) * 2020-12-17 2021-01-19 北京沃东天骏信息技术有限公司 Method, device, equipment and computer readable medium for identifying text errors

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369170A (en) * 2017-07-04 2017-11-21 云南师范大学 Image registration treating method and apparatus
CN109960988A (en) * 2017-12-26 2019-07-02 浙江宇视科技有限公司 Image analysis method, device, electronic equipment and readable storage medium storing program for executing
CN111144175B (en) * 2018-11-05 2023-04-18 杭州海康威视数字技术股份有限公司 Image detection method and device
CN109961103B (en) * 2019-04-02 2020-10-27 北京迈格威科技有限公司 Training method of feature extraction model, and image feature extraction method and device
CN111768393A (en) * 2020-07-01 2020-10-13 上海商汤智能科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113096170A (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant