WO2021072879A1 - Method and apparatus for extracting target text in certificate, device, and readable storage medium - Google Patents


Info

Publication number
WO2021072879A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
perspective transformation
anchor point
perspective
Prior art date
Application number
PCT/CN2019/118469
Other languages
French (fr)
Chinese (zh)
Inventor
黄文韬
刘鹏
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021072879A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/28 - Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 - Character recognition specially adapted to the type of the alphabet, of Kanji, Hiragana or Katakana characters
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • FIG. 6 is a schematic diagram of an image correction process through perspective transformation provided by an embodiment of this application.
  • The position of the fixed field on the template image, that is, the area covered by the fixed field on the template image, is called the anchor position.
  • The content of the fixed field, that is, the specific meaning the fixed field describes, such as the "name" described by the "name" field on an identity card image or the "citizen ID number" described by the "citizen ID number" field, is called the anchor text of the text anchor.
  • Fig. 5(a) and Fig. 5(b) are schematic diagrams of the perspective transformation principle provided by the embodiment of the application.
  • S206 Perform text recognition on the text at the projection position on the perspective transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
  • S703 Obtain a second anchor point position corresponding to the first anchor point position on the detection image through the second anchor point text based on the text recognition model.
  • the processor 1302 is configured to run a computer program 13032 stored in the memory, so as to implement the method for extracting the target text in the certificate in the embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)

Abstract

Provided are a method and apparatus for extracting target text in a certificate, a computer device, and a computer-readable storage medium. The embodiments of this application belong to the technical field of text recognition. The method comprises: acquiring a template image and a detection image that belong to the same certificate type, wherein the template image is marked with a text anchor and a target frame position, and the text anchor comprises first anchor text; acquiring, in a first preset manner, a feature point matching relationship between the anchor position on the template image and the anchor position on the detection image; solving a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator; performing perspective transformation on the detection image by means of the perspective transformation operator to obtain a perspective-transformed image; acquiring the projection position of the target frame position on the perspective-transformed image by means of the perspective transformation operator; and performing text recognition on the text at the projection position by means of a text recognition model to obtain the target text of the detection image.

Description

Method, apparatus, device, and readable storage medium for extracting target text in a certificate
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 15, 2019, with application number 201910979567.0 and entitled "Method, apparatus, device, and readable storage medium for extracting target text in a certificate", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of text recognition, and in particular to a method, apparatus, computer device, and computer-readable storage medium for extracting target text in a certificate.
Background
Text recognition technology has made great progress since deep learning matured: it can both locate the position of text within an image and recognize the located text. However, many deep learning models used for text recognition perform well on standard frontal images but adapt poorly to images whose viewing angle is rotated or transformed relative to a standard frontal image, and cannot recognize them well. Most pictures taken in daily life are not standard frontal images and exhibit varying degrees of perspective change. To achieve a good recognition result on such images, some screening, cropping, and rotation transformation is required. In traditional technology, screening, cropping, and rotating images are usually completed through manual preprocessing. However, users sometimes need to extract text from large batches of image data, for example extracting the owner's name, date of birth, and other information from a pile of driver's licenses. Automated batch extraction is difficult to achieve with the text recognition of traditional technology alone: even if the user manually specifies a recognition area, each picture differs somewhat in position, and the position of the target field on each picture also varies, and text recognition by itself can currently hardly eliminate the influence of these positional differences. If the data is instead preprocessed manually to eliminate the positional differences, the operation is difficult and the cost is excessive, resulting in low efficiency of image recognition.
Summary
The embodiments of this application provide a method, apparatus, computer device, and computer-readable storage medium for extracting target text in a certificate, which can solve the problem in traditional technology of low efficiency when a text recognition model is used to extract target text from a certificate.
In a first aspect, an embodiment of this application provides a method for extracting target text in a certificate. The method includes: obtaining a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text, where the template image is marked with a text anchor and a target frame position, the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate; according to the first anchor text and based on a text recognition model, obtaining, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image; solving a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and performing text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extracting the recognized text to obtain the target text of the detection image.
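The transformation-matrix solving and projection steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual code: it solves the 3x3 perspective transformation operator (a homography) from four or more matched anchor feature points via the direct linear transform, in the manner of OpenCV's cv2.findHomography, and projects target-frame corners through it in the manner of cv2.perspectiveTransform. The function names are hypothetical.

```python
import numpy as np

def solve_perspective_operator(src_pts, dst_pts):
    """Solve the 3x3 perspective transformation operator H such that
    dst ~ H @ src, from >= 4 matched feature points (direct linear transform)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Each correspondence contributes two linear constraints on the
        # flattened H (9 unknowns, defined up to scale).
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of A (last right-singular vector) is H up to scale.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Project 2D points (e.g. target-frame corners) through the operator H."""
    pts_h = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With the operator solved from anchor-region matches, projecting the template's target-frame corners through it yields the region of the perspective-transformed detection image that is handed to the text recognition model.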
In a second aspect, an embodiment of this application further provides an apparatus for extracting target text in a certificate, including: a first obtaining unit, configured to obtain a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text, where the template image is marked with a text anchor and a target frame position, the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate; a second obtaining unit, configured to obtain, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image; a solving unit, configured to solve a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; a transformation unit, configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; a projection unit, configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and a recognition unit, configured to perform text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
In a third aspect, an embodiment of this application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the method for extracting target text in a certificate.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method for extracting target text in a certificate.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an application scenario of a method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the technical feature relationships of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of feature point extraction and feature point matching in the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 5(a) and FIG. 5(b) are schematic diagrams of the perspective transformation principle provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of correcting an image through perspective transformation provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of another embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 8 is a simplified schematic flowchart of the technical feature relationships in the method for extracting target text in a certificate provided by the embodiment shown in FIG. 7;
FIG. 9 is a schematic diagram of the perspective transformation operator in the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 10(a) to FIG. 10(i) are schematic diagrams of graphic transformation in one embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 11(a) to FIG. 11(i) are schematic diagrams of graphic transformation in another embodiment of the method for extracting target text in a certificate provided by an embodiment of this application;
FIG. 12 is a schematic block diagram of an apparatus for extracting target text in a certificate provided by an embodiment of this application; and
FIG. 13 is a schematic block diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Please refer to FIG. 1, a schematic diagram of an application scenario of the method for extracting target text in a certificate provided by an embodiment of this application. The application scenario includes: (1) a user, who marks the text anchor and the target frame position on the template image through an input device or an input component of a computer device; and (2) a terminal, which executes the steps of the method for extracting target text in a certificate. The terminal may be a computer device such as a smartphone, smart watch, laptop, tablet, or desktop computer.
The working process of each subject in FIG. 1 is as follows. The user marks the text anchor and the target frame position on the template image, and stores the template image or uploads it to the system for the terminal to obtain. The terminal obtains a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text; the template image is marked with a text anchor and a target frame position, where the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate. According to the first anchor text and based on a text recognition model, the terminal obtains, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in its anchor position on the detection image; solves a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image; performs perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image that matches the viewing angle of the template image; obtains, through the perspective transformation operator, the projection position of the target frame position on the perspective-transformed image; and performs text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extracts the recognized text to obtain the target text of the detection image.
It should be noted that FIG. 1 only shows a desktop computer as the terminal; in actual operation, the type of terminal is not limited to that shown in FIG. 1. The above application scenario of the method for extracting target text in a certificate is only used to illustrate, not to limit, the technical solution of this application.
Please refer to FIG. 2, a schematic flowchart of the method for extracting target text in a certificate provided by an embodiment of this application. The method is applied to the terminal in FIG. 1 to complete all or part of the functions of the method for extracting target text in a certificate. As shown in FIG. 2, the method includes the following steps S201-S206:
S201: Obtain a template image and a detection image that belong to the same certificate type, the detection image being used for extracting the target text. The template image is marked with a text anchor and a target frame position, where the text anchor is a fixed field marked on the template image, the text anchor includes first anchor text, the first anchor text is the content of the fixed field, and the target frame position is the position, marked on the template image, of the target text to be extracted from the certificate.
A text anchor is a fixed field defined by the user on the template image; a fixed field is a field that does not change across different samples of the same type of certificate, such as the fixed fields "name" or "citizen ID number" on an identity card. Since a fixed field on the template image is defined by its position and its content, with the position of the fixed field serving as the anchor position and the content of the fixed field serving as the anchor text, a text anchor contains an anchor position and anchor text. More specifically, the position of the fixed field on the template image, that is, the area covered by the fixed field on the template image, is called the anchor position; the content of the fixed field, that is, the specific meaning the fixed field describes, such as the "name" described by the "name" field on an identity card image or the "citizen ID number" described by the "citizen ID number" field, is called the anchor text of the text anchor.
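The definitions in the preceding paragraph (a text anchor pairing an anchor position with anchor text, plus a separate target frame position on the template) can be captured in a small data structure. The type and field names below are hypothetical, chosen only to mirror those definitions:

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x1, y1, x2, y2): upper-left and lower-right corner coordinates of a rectangle
Box = Tuple[int, int, int, int]

@dataclass
class TextAnchor:
    position: Box  # anchor position: area covered by the fixed field on the template image
    text: str      # anchor text: content of the fixed field, e.g. "name"

@dataclass
class CertificateTemplate:
    anchors: List[TextAnchor]  # fixed fields marked on the template image
    target_box: Box            # target frame position: where the text to extract sits
```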
The target frame position is the position, defined by the user on the template image, of the area covered by the text content to be extracted from the certificate. For example, the position of the "name" field on an identity card is the anchor, while the position on the template image of the specific value of the name, such as "Zhang San", is the target frame position. The position of the target frame is determined by the user according to the area covered by the text to be extracted.
Specifically, since the obtained detection image is often not a standard frontal image matching the viewing angle of the template image, before a deep learning model is applied to recognize the detection image and extract its text content, the angle of the detection image to be recognized needs to be corrected so that the image is rotated to a suitable angle, improving the deep learning model's recognition of the detection image's content.
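The correction described here amounts to resampling the detection image through a perspective transformation. As a sketch under stated assumptions (not the embodiment's implementation), a nearest-neighbour perspective warp of the kind OpenCV's cv2.warpPerspective performs can be written in NumPy: each output pixel is mapped back through the inverse of the perspective operator H and sampled from the source image.

```python
import numpy as np

def warp_perspective(img, H, out_shape):
    """Nearest-neighbour perspective warp of a single-channel image.
    H maps source coordinates to output coordinates; we invert it and,
    for each output pixel, sample the corresponding source pixel."""
    Hinv = np.linalg.inv(H)
    h, w = out_shape
    out = np.zeros((h, w), dtype=img.dtype)
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ coords                       # back-map all output pixels at once
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out.ravel()[valid] = img[sy[valid], sx[valid]]  # out-of-range pixels stay 0
    return out
```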
To accurately correct the angle of the detection image, the embodiments of this application exploit the fact that fixed fields exist in certificate images, using the content of a fixed field as an intermediate medium to obtain a perspective transformation operator for rotating the detection image, so that the image can subsequently be perspective-transformed. Therefore, it is first necessary to obtain a template image and a detection image that belong to the same certificate type; the template image carries a custom-marked text anchor and target frame position, where the text anchor includes a first text anchor position and first anchor text. For example, the user may frame-select the fixed field on the template image in advance, that is, the text anchor marked on the template image; the fixed text content of the framed area may be recognized through text recognition, or the content of the fixed field (that is, the first anchor text) may be entered by the user. The text area identical to the fixed text content selected on the template image is then found in the input detection image, and feature points are extracted from and matched between the text area found on the detection image and the corresponding framed area on the template image. By first extracting the local areas where the unchanging fields of the certificate image are located and then matching those local areas, only part of the image is matched, which effectively reduces the influence and interference of spuriously similar areas elsewhere in the whole image, improving the quality and efficiency of extracting and matching local areas of the template image and the detection image, and the accuracy of feature point extraction and matching. For example, please refer to FIG. 3, a schematic diagram of the technical feature relationships of the method for extracting target text in a certificate provided by an embodiment of this application. As shown in FIG. 3, A, C, and F are identical fixed fields in the certificate; A1, C1, and F1 are the anchor positions of the fixed fields A, C, and F; and A2, C2, and F2 are the anchor texts of the fixed fields A, C, and F. The correspondence among A1, C1, and F1 is derived through A2, C2, and F2, and feature points are extracted and matched within the areas where A1, C1, and F1 are located. Since feature points are extracted and matched only within these areas, the influence of spuriously similar areas in the whole image is effectively reduced, improving the quality and efficiency of local-area extraction and matching between the template image and the detection image.
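The anchor-region matching just described can be sketched as follows. This is a simplified, hypothetical illustration: the descriptor arrays are assumed to come from a detector such as SIFT run only inside the matched anchor regions (e.g. A1 on the template and its counterpart on the detection image), and nearest-neighbour matching with a ratio test pairs them. Because only anchor-region descriptors participate, similar-looking areas elsewhere in the image cannot contribute false matches.

```python
import numpy as np

def match_region_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with a ratio test.
    desc_a, desc_b: (N, d) descriptor arrays extracted from corresponding
    anchor regions of the template and detection images."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        # Accept only when the best match is clearly better than the runner-up.
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```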
Further, the embodiments of this application allow the user to define a custom template image for a certificate so as to perform text recognition on a designated target in the certificate image. The anchor and target frame positions on the template image can be customized by developers or users. The anchor text can be obtained directly through manual input; for example, the text content of fixed fields on an identity card such as "name", "date of birth", and "issuing authority" can be obtained directly from the fixed field content on the certificate. The anchor position and target frame position can be obtained through a custom program, such as using an OpenCV mouse event to obtain the position of the mouse pointer, yielding the anchor position and target frame position drawn manually with the mouse on the template image. For example, the positions of fixed fields such as "name" and "date of birth" on an identity card can be obtained through an OpenCV mouse event that reads the mouse pointer position, giving the coordinates of the anchor position and the target frame position drawn manually on the identity card image. A position can be described by the coordinates of the upper-left and lower-right corners of a rectangle, and the anchor and target frame positions are then defined in a programming language.
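The mouse-based marking described above can be sketched without a GUI. The handler below has the signature OpenCV passes to a cv2.setMouseCallback handler, and the event codes 1 and 4 correspond to cv2.EVENT_LBUTTONDOWN and cv2.EVENT_LBUTTONUP; the class itself is a hypothetical illustration of recording each press/release pair as a rectangle described by its upper-left and lower-right corners:

```python
# OpenCV's event codes: cv2.EVENT_LBUTTONDOWN == 1, cv2.EVENT_LBUTTONUP == 4
EVENT_LBUTTONDOWN, EVENT_LBUTTONUP = 1, 4

class BoxMarker:
    """Accumulates rectangles from mouse press/release pairs, the way a
    cv2.setMouseCallback handler would record anchor or target-frame boxes."""
    def __init__(self):
        self.start = None
        self.boxes = []   # each box is (x1, y1, x2, y2): upper-left, lower-right

    def on_mouse(self, event, x, y, flags=0, param=None):
        if event == EVENT_LBUTTONDOWN:
            self.start = (x, y)
        elif event == EVENT_LBUTTONUP and self.start is not None:
            x0, y0 = self.start
            # Normalize so the stored corners are upper-left and lower-right.
            self.boxes.append((min(x0, x), min(y0, y), max(x0, x), max(y0, y)))
            self.start = None
```

In a real marking tool, this handler would be registered with `cv2.setMouseCallback(window_name, marker.on_mouse)` over the displayed template image.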
Further, for a template image that the user is editing, if the system has a corresponding record and stored data, the template image data can be retrieved directly from back-end storage. If there is no such record, the image and its annotation information are uploaded together after the user completes the annotation process, and the computer device then obtains a template image on which the anchor point position and target frame position have been set (defined) by the user.
S202. According to the first anchor text and based on a text recognition model, obtain, in a first preset manner, the feature point matching relationship between the feature points contained in the anchor point position of the first anchor text on the template image and those contained in its anchor point position on the detection image, where an anchor point position is the position of the first anchor text on the corresponding image.
Here, a text recognition model, also called a character recognition model ("text recognition" in English), is a model that recognizes characters automatically by computer, for example OCR (Optical Character Recognition).
Specifically, according to the first anchor text and based on the text recognition model, the feature point matching relationship between the feature points contained in the anchor point position of the first anchor text on the template image and those contained in its anchor point position on the detection image is obtained in a first preset manner, where the anchor point position is the position of the first anchor text on the corresponding image: the anchor point position on the template image is the position of the first anchor text on the template image, and the anchor point position on the detection image is its position on the detection image. Two situations are possible:
1) After the computer device obtains a template image and a detection image belonging to the same certificate type, the detection image is transformed once: the detection image is subjected to perspective transformation through a perspective transformation operator to obtain a perspective-transformed image whose viewing angle matches that of the template image, and the projection position of the target frame on the perspective-transformed image is obtained through the perspective transformation operator. For example, referring again to Figure 3, anchor texts A2 and C2 are the same field. From the identity of anchor texts A2 and C2, the correspondence between anchor point positions A1 and C1 is obtained; the feature points of A1 and C1 are extracted with a feature point extraction algorithm, and the feature point matching relationship between A1 and C1 is obtained with a feature point matching algorithm. From this matching relationship, the perspective transformation operator that rotates the detection image into a standard frontal image is computed; the detection image is perspective-transformed through this operator into a perspective-transformed image whose viewing angle matches the template image; the projection position of the target frame on the perspective-transformed image is obtained through the operator; and the region at that projection position is recognized with the text recognition model to extract the target text.
2) On the basis of situation (1), the detection image is transformed a second time: the first perspective-transformed image is subjected to perspective transformation through a second perspective transformation operator to obtain a second perspective-transformed image, and the projection position of the target frame on the second perspective-transformed image is computed through the second perspective transformation operator. Referring again to Figure 3, from the identity of anchor texts A2, C2, and F2, the correspondence among anchor point positions A1, C1, and F1 is obtained. From the feature point matching relationship between A1 and C1, a first perspective transformation operator that rotates the detection image into a standard frontal image is obtained, and the detection image is converted through it into a first perspective-transformed image whose viewing angle matches the template image. Then, from the feature point matching relationship between A1 and F1, a second perspective transformation operator describing the perspective transformation between the template image and the detection image E is obtained; the target position B1 is projected onto the detection image E through the second operator, yielding the position (text area) H1 at which target text recognition is performed; text recognition is performed on area H1 with the text recognition model, and the target text H2 is extracted.
Further, refer to Figure 4, a schematic flowchart of feature point extraction and feature point matching in the method for extracting target text in a certificate provided by an embodiment of this application. As shown in Figure 4, feature point extraction and matching are performed on the template image and the detection image. Perspective transformation requires finding corresponding points in the pre- and post-transformation images in order to compute the matrix used for the perspective transformation as the transformation operator. To find this correspondence, embodiments of the present application use a feature point extraction algorithm and a feature point matching algorithm so that matching is performed automatically under the algorithm's uniform criterion. In embodiments of the present application, feature points are extracted from the corresponding anchor points in the template image and the detection image with the feature point extraction algorithm, the feature points are then matched with the feature point matching algorithm, and the perspective transformation operator is computed from the resulting matching relationship.
The feature point extraction algorithm compares each point of the image with its surrounding points and computes a feature value for each point according to the criterion the algorithm defines, where the criterion means the method of computing the feature value: for example, the SIFT algorithm (Scale-Invariant Feature Transform) or the SURF algorithm (Speeded-Up Robust Features) may be used. If a point is a maximum or minimum within its region, it can be regarded as a feature point. Each feature point is then assigned a high-dimensional orientation descriptor reflecting its gradient information in different directions; this serves as the point's feature parameter, or feature vector, i.e., the point is described with different parameters from different angles. It should be noted that whether two feature points match is not a matter of the points occupying the same position in their respective images; rather, matched feature points have similar properties (similar attributes) in local regions of their respective images, and are corresponding points that coincide after the perspective transformation. Referring again to Figure 3, suppose anchor point A contains a feature point Am and anchor point F contains a feature point Fn, with m and n integers. Am and Fn are matched feature points not because their positions in the respective images are the same (e.g., both being corresponding vertices of the rectangle where the figure is located), but because their feature values, computed under a uniform criterion such as SIFT or SURF, satisfy the matching requirement after computing the cosine similarity of the feature vectors or the distance between the two feature vectors. After the feature points are extracted, the matching relationships between them are determined with the feature point matching algorithm; for example, the cosine similarity between the feature vectors of two points, or the distance between the two feature vectors, can be used to judge whether the feature points match.
Further, when performing feature point matching, matched feature points are points whose surroundings vary similarly. For example, the cosine similarity of the feature vectors between points on the template image and points on the detection image can be computed, and the points sorted by cosine similarity. Suppose feature point A exists on the template image and, after computing similarities with the feature points on the detection image, the point with the highest cosine similarity on the detection image is A1 and the point with the second highest is A2. If the similarity between the feature vectors of A and A1 is 0.98 and the similarity between A and A2 is 0.97, such closely comparable values indicate that feature point A has no matching feature point on the detection image, and A does not participate in the subsequent computation of the perspective transformation operator. If instead the similarity between A and A1 is 0.98 while that between A and A2 is 0.68, A and A1 are judged to be matched feature points, and both are included in the subsequent computation of the perspective transformation operator. In other words, a threshold is set in this process: for the feature point being judged, the difference between the similarities of its first and second most similar points is computed. When this difference is not less than the preset threshold, a unique match for the feature point is deemed to have been found and both points are included in subsequent computation; conversely, if the difference is less than the preset threshold, the feature point is deemed to have no unique matching point and is excluded from subsequent computation.
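The uniqueness test above can be sketched as follows. This is a minimal illustration of the cosine-similarity gap rule described in the text, not the patent's implementation; the 0.2 gap threshold is an assumed placeholder value:

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_feature(query, candidates, gap_threshold=0.2):
    """Return the index of the unique best match for `query` among
    `candidates`, or None when the best and second-best cosine
    similarities are too close (difference below gap_threshold)."""
    sims = sorted(
        ((cosine_similarity(query, c), i) for i, c in enumerate(candidates)),
        reverse=True,
    )
    if len(sims) < 2:
        return sims[0][1] if sims else None
    best, second = sims[0], sims[1]
    if best[0] - second[0] >= gap_threshold:
        return best[1]
    return None  # ambiguous: excluded from operator computation
```

A clear winner is kept; two near-identical similarities (like the 0.98 vs 0.97 case in the text) yield no match.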
S203. According to the feature point matching relationship, solve through a transformation matrix to obtain the perspective transformation operator that performs the perspective transformation on the detection image.
Specifically, solving through a transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator means finding the positions of corresponding points on the input detection image and the given template image; once at least four pairs of matched feature points are found, the perspective transformation operator needed to rotate the detection image into agreement with the viewing angle of the template image can be computed.
Further, the perspective transformation operator can be computed in combination with full-text recognition. The computation has the form ax = b, where a and b are the coordinates of known feature points and x is the operator, a matrix comprising nine values.
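The system ax = b can be sketched as follows. Fixing the ninth entry of the 3x3 operator to 1 leaves eight unknowns, so the four point correspondences mentioned above give exactly the eight equations needed. This is a minimal pure-Python illustration of the standard linear formulation, not the patent's implementation:

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting for an n x n system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def solve_perspective_operator(src, dst):
    """Solve ax = b for the eight unknown entries of the 3x3 operator
    (the ninth fixed to 1) from four (x, y) -> (u, v) correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1)
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        # v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

For a unit square mapped to a square translated by (2, 3), the recovered operator is the corresponding translation matrix.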
S204. Perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective-transformed image whose viewing angle matches that of the template image.
Specifically, the computer device solves through the transformation matrix to obtain the perspective transformation operator that perspective-transforms the detection image, and then applies perspective transformation technology to transform the detection image through the operator into a standard frontal image whose viewing angle matches that of the template image. The detection image is translated and rotated in three-dimensional space through the perspective transformation operator, i.e., its coordinates are moved in three-dimensional space, and its projection onto the two-dimensional plane is then taken, so that the detection image is automatically rectified, according to the template image, into a standard frontal image with the same viewing angle as the template image. Compared with the traditional technique of manually changing the viewing angle of the detection image, this greatly reduces labor and also improves the accuracy of text recognition. The perspective transformation may also proceed by converting the coordinates of the image in three-dimensional space, one by one, into coordinates on the two-dimensional plane through the perspective transformation operator, so as to obtain a standard frontal image of the detection image. Perspective transformation is a method in which a two-dimensional picture is rotated in three-dimensional space and then projected onto a two-dimensional plane to form a two-dimensional figure; more intuitively, it may be called "spatial transformation" or "three-dimensional coordinate transformation".
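Warping the whole detection image can be sketched by inverse mapping: for each output pixel, apply the operator mapping output coordinates back to source coordinates and sample the source image. This toy nearest-neighbour version works on a 2-D list of pixel values; a real system would use an optimized routine such as OpenCV's `cv2.warpPerspective` with proper interpolation:

```python
def warp_nearest(src, h, w, H_inv):
    """Toy perspective warp. `src` is a 2-D list of pixel values; `H_inv`
    is the 3x3 operator mapping output (x, y) back to source coordinates;
    pixels that map outside the source become 0."""
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map (x, y, 1) through H_inv, then divide by Z.
            X = H_inv[0][0] * x + H_inv[0][1] * y + H_inv[0][2]
            Y = H_inv[1][0] * x + H_inv[1][1] * y + H_inv[1][2]
            Z = H_inv[2][0] * x + H_inv[2][1] * y + H_inv[2][2]
            sx, sy = round(X / Z), round(Y / Z)
            if 0 <= sy < len(src) and 0 <= sx < len(src[0]):
                out[y][x] = src[sy][sx]
    return out
```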
Further, refer to Figures 5(a) and 5(b), schematic diagrams of the perspective transformation principle provided by an embodiment of the application. First, the value of every point (x, y) of the two-dimensional image in the third dimension of three-dimensional space is taken as a fixed value, e.g., z = 1, so that every two-dimensional point becomes a three-dimensional point (x, y, 1). Each point is then multiplied by a 3x3 transformation matrix to obtain the rotated point (X, Y, Z); a 3x3 matrix can describe the rigid-body transformation of the image in three-dimensional space, which is exactly the transformation required in embodiments of the present application, whereas a matrix smaller than 3x3 cannot describe this relationship. After the image is rotated in three-dimensional space, dividing each point by its z-coordinate converts every point to (X/Z, Y/Z, 1), projecting the points of the three-dimensional image back onto the plane z = 1 to obtain the point (x', y'), where x' = X/Z and y' = Y/Z. No single parameter of the 3x3 matrix has a specific meaning; the nine parameters together constitute the perspective transformation operator. The 3x3 transformation matrix has nine values, but since only the projection of the transformed three-dimensional image onto the two-dimensional plane is ultimately needed, any one of the nine values can be set to 1, leaving only eight unknowns when solving for the operator. Solving therefore requires four groups of feature points as mapping points, i.e., four matching relationships, which exactly determine one perspective transformation: at least the four matching relationships corresponding to four groups of feature points are required to obtain the eight unknowns. Although at least four matching relationships are required, in practice there are usually tens or hundreds of feature points, and the operator with minimum error is determined by minimizing an error function over the extracted feature points.
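The mapping just described can be written compactly, using the same symbols as in the text:

```latex
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{X}{Z}, \quad y' = \frac{Y}{Z}
```

With h33 fixed to 1, each point correspondence (x, y) to (x', y') contributes two equations, so four correspondences determine the eight remaining entries.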
After this transformation, the rotation of the image in three-dimensional space and the projection of the three-dimensional image back into two-dimensional space are complete, so that the image can be transformed between different viewing angles: images at non-standard viewing angles are transformed into standard-viewing-angle images matching the template image, from which text at a specified position can be extracted during text recognition. Refer to Figure 6, a schematic flowchart of rectifying an image through perspective transformation provided by an embodiment of the application. As shown in Figure 6, to realize this transformation, a 3x3 transformation matrix must be multiplied with (x, y, 1) as in Figure 5, and to find such a matrix, at least four corresponding feature points between the detection image to be transformed and the template image must be found.
S205. Obtain the projection position of the target frame on the perspective-transformed image through the perspective transformation operator.
Specifically, the computer device solves through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator that perspective-transforms the detection image, transforms the detection image through the operator into a perspective-transformed image whose viewing angle matches the template image, and can then obtain the projection position of the target frame on the perspective-transformed image through the operator. For example, referring again to Figure 3, the projection position H1 of the target frame position B1 on the perspective-transformed image is obtained through the perspective transformation operator.
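Projecting a target frame amounts to mapping its corner coordinates through the operator and taking the bounding rectangle of the results. A minimal sketch (the translation matrix used in the example is a hypothetical stand-in for an operator computed from real anchor matches):

```python
def project_point(H, x, y):
    # Map (x, y, 1) through the 3x3 operator and divide by Z.
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    Z = H[2][0] * x + H[2][1] * y + H[2][2]
    return X / Z, Y / Z

def project_box(H, box):
    """Project a target frame given as (left, top, right, bottom) and
    return the axis-aligned bounding box of its projected corners."""
    l, t, r, b = box
    corners = [project_point(H, x, y)
               for x, y in [(l, t), (r, t), (r, b), (l, b)]]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)
```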
S206. Perform text recognition on the text at the projection position on the perspective-transformed image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
Text recognition here is the recognition of characters, and character recognition is a multi-class classification task. The text recognition model in embodiments of the present application is essentially a combination of two models: a position detection model that first detects the position of the text, and a character recognition model that then recognizes the characters.
Specifically, the computer device obtains the projection position of the target frame on the perspective-transformed image through the perspective transformation operator, and uses the text recognition model to recognize and extract the text inside the region identified by the target frame projected onto the transformed image, thereby obtaining the target text of the detection image. By combining the two traditional computer vision techniques of perspective transformation and feature point matching with full-text recognition, the input image is converted to the same viewing angle as the template image before text in the specified area is recognized and extracted. For example, referring again to Figure 3: in this embodiment, the anchor point position A1, anchor text A2, and target frame position B1 of the template image have been obtained, and the goal is to precisely extract the text content of the region on the detection image corresponding to target frame position B1. Since anchor texts A2, C2, and F2 are the same field, the text recognition process mainly comprises: 1) from the identity of anchor texts A2, C2, and F2, obtaining the correspondence among anchor point positions A1, C1, and F1; from the feature point matching relationship between A1 and C1, obtaining the operator D that rotates the detection image into a standard frontal image; and rotating the detection image into a standard frontal image E conforming to the template image; 2) from the feature point matching relationship between A1 and F1, obtaining the operator G of the perspective transformation between the template image and the detection image E; projecting the target position B1 through G onto the detection image E to obtain the position (text area) H1 at which target text recognition is performed; and performing text recognition on area H1 with the text recognition model to extract the target text H2.
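The two-stage flow (operator D, then operator G, then projection of B1 to H1 and OCR of H1 to get H2) can be sketched as a pipeline. All heavy steps are injected as callables with hypothetical interfaces; the dictionary-based "images" below are toy stand-ins, not the patent's data structures:

```python
def extract_target_text(detect, template, solve_operator, warp, project, ocr):
    """Sketch of the two-stage flow: operator D rectifies the detection
    image into standard frontal image E, operator G aligns E with the
    template, the target frame B1 is projected to text area H1, and
    H1 is recognized to obtain the target text H2."""
    # Stage 1: operator D from the A1 <-> C1 feature-point matches.
    D = solve_operator(template["A1"], detect["C1"])
    E = warp(detect, D)                      # standard frontal image E
    # Stage 2: operator G from the A1 <-> F1 matches against E.
    G = solve_operator(template["A1"], E["F1"])
    H1 = project(G, template["B1"])          # text area H1 on E
    return ocr(E, H1)                        # target text H2
```

With trivial placeholder callables (identity warp, pass-through projection, lookup-table OCR) the function simply routes B1 to the region holding the target text.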
An embodiment of the application provides a method for extracting target text in a certificate. By combining the two traditional computer vision techniques of perspective transformation and feature point matching with full-text recognition, the input image is converted to the same viewing angle as the template image before text in the specified area is recognized and extracted. This avoids the labor and time consumed by writing fully custom logic for the different extraction requirements of each kind of certificate, greatly reducing cost, and on the other hand avoids the imprecise extraction caused by overly generic logic, improving the accuracy and efficiency of text recognition.
Refer to Figure 7, a schematic flowchart of another embodiment of the method for extracting target text in a certificate provided by an embodiment of the application, comprising the following steps:
S701. Obtain a template image and a detection image, used for extracting the target text, belonging to the same certificate type, the template image being annotated with a text anchor and a target frame position, where the text anchor comprises a first anchor text and a first anchor point position.
Specifically, in this embodiment the text anchor further includes the first anchor point position; the user only needs to preset the first anchor text, first anchor point position, and target frame position contained in the text anchor, and the computer device obtains a template image and a detection image belonging to the same certificate type. For example, refer to Figures 3 and 8, Figure 8 being a simplified schematic flowchart of the relationships among technical features in the target text extraction method of the embodiment shown in Figure 7. As shown in Figures 3 and 8, in this embodiment the anchor point position A1, anchor text A2, and target frame position B1 of the template image are obtained, so that the text content of the region on the detection image corresponding to target frame position B1 can be precisely extracted by means of A1 and A2.
S702. Extract, through a text recognition model, a second anchor text on the detection image that is consistent with the first anchor text.
Specifically, the second anchor text on the detection image that is consistent with the first anchor text on the template image is first extracted through the text recognition model. For example, referring again to Figures 3 and 8, in this embodiment the anchor text C2 on the detection image, identical to the anchor text A2 of the template image, is obtained so that the correspondence between A1 and C1 can be derived from A2 and C2.
S703. Obtain, based on the text recognition model and through the second anchor text, a second anchor point position on the detection image corresponding to the first anchor point position.
Specifically, the second anchor point position on the detection image corresponding to the first anchor point position is obtained through the second anchor text based on the text recognition model. Referring again to Figures 3 and 8: the image to be detected is input into the text recognition model; to find the field region C1 matching the region A1 where the anchor text defined in the template image is located, the model first finds on the detection image the field C2 consistent with field A2, obtains through C2 the field region C1 where C2 is located, and thereby finds the field region C1 matching anchor point position A1, for example the regions A1 and C1 where the "Name" field is located in the template image and detection image of an ID card.
S704. Extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position.
S705. According to the first feature point set and the second feature point set, obtain, based on a feature point matching algorithm, a first feature point matching relationship between the feature points of the first feature point set and those of the second feature point set.
Specifically, the first feature point set contained in the first anchor point position and the second feature point set contained in the second anchor point position are extracted according to the feature point extraction algorithm of step S202, and the first feature point matching relationship between the feature points of the two sets is then obtained based on the feature point matching algorithm of step S202. For example, referring again to Figures 3 and 8, the first feature point set contained in first anchor point position A1 and the second feature point set contained in second anchor point position C1 are extracted with the preset feature point extraction algorithm, and the first feature point matching relationship between the feature points of the two sets is obtained with the feature point matching algorithm.
S706. According to the first feature point matching relationship, solve a transformation matrix to calculate a first perspective transformation operator for performing perspective transformation on the detection image.
Specifically, continuing with FIG. 3 and FIG. 8, the feature points of A1 and C1 are extracted, and the first perspective transformation operator D is calculated from the feature point matching relationship formed by the feature points of A1 and C1.
S707. Perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image whose viewing angle matches that of the template image.
Specifically, continuing with FIG. 3 and FIG. 8, the detection image is transformed by the first perspective transformation operator D into a standard frontal image E whose viewing angle matches the template image; the projection position of the target frame B1 on the first perspective transformation image is obtained through the operator D, and the text inside the region identified by the target frame projected onto the transformed first perspective image is recognized and extracted by the text recognition model to obtain the target text of the detection image.
Further, after the detection image has been transformed by the first perspective transformation operator, the resulting first perspective transformation image may still deviate somewhat in viewing angle from the template image. Therefore, instead of mapping the target frame position unchanged directly onto the first perspective transformation image, a second perspective transformation operator between the template image and the transformed first perspective transformation image is found, and the target frame is projected through this second operator, by perspective transformation, onto the transformed second perspective transformation image. Continuing with FIG. 3, FIG. 7 and FIG. 8, in this embodiment, after the step of performing perspective transformation on the detection image through the first perspective transformation operator to obtain the first perspective transformation image matching the viewing angle of the template image, the method further includes:
S708. Input the first perspective transformation image into the text recognition model, and obtain, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position. S709. Extract a third feature point set contained in the third anchor point position based on the feature point extraction algorithm. S710. According to the first feature point set and the third feature point set, obtain, based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and those in the third feature point set. S711. According to the second feature point matching relationship, solve the transformation matrix to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image. S712. Perform perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image.
Specifically, the transformation process of steps S708 to S712 is similar to that of steps S703 to S707. Continuing with FIG. 3 and FIG. 8, the standard frontal image E corresponding to the transformed first perspective transformation image is input into the text recognition model to find the text region F1 matching the region A1 in which the anchor text A2 of the template image is located. The third feature point set contained in the third anchor point position F1 is extracted based on the feature point extraction algorithm; A1 and F1 undergo feature point extraction and matching according to their respective feature point sets, and the second feature point matching relationship between the feature points of the first feature point set and the third feature point set is obtained based on the feature point matching algorithm. The second perspective transformation operator G is calculated from the second feature point matching relationship, and the first perspective transformation image is transformed by the operator G to obtain the second perspective transformation image, so that the viewing angle of the second perspective transformation image is as consistent as possible with the template image. The projection H1 of the target frame B1 on the second perspective transformation image is finally obtained through the operator G.
S713. Calculate, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image. S714. Input the second perspective transformation image into the text recognition model, perform text recognition on the text at the projection position on the second perspective transformation image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
Specifically, calculating the projection H1 of the target frame B1 on the transformed second perspective transformation image through the second perspective transformation operator G proceeds as follows: the projection H1' of the target frame B1 on the transformed first perspective transformation image is first calculated through the first perspective transformation operator, and perspective transformation is then applied to H1' using the second perspective transformation operator to obtain the projection H1 of the target frame B1 on the second perspective transformation image. The text inside the region H1 identified on the second perspective transformation image is recognized and extracted by the text recognition model to obtain the target text H2 of the detection image.
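Projecting a target frame such as B1 through a perspective operator amounts to mapping its four corners in homogeneous coordinates and taking their bounding box; applying the sketch below once with the first operator and once with the second reproduces the B1 to H1' to H1 chain. The 3x3 operator H is assumed to be given; this is an illustration, not the patent's code.

```python
import numpy as np

def project_box(H, box):
    """Project an axis-aligned box (x1, y1, x2, y2) through a 3x3
    perspective operator H and return the bounding box of the four
    projected corners."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1],
                        [x2, y2, 1], [x1, y2, 1]], dtype=float)
    proj = (H @ corners.T).T
    proj = proj[:, :2] / proj[:, 2:3]  # homogeneous divide
    return (proj[:, 0].min(), proj[:, 1].min(),
            proj[:, 0].max(), proj[:, 1].max())
```

Taking the bounding box keeps the projected region axis-aligned for the recognizer even when the projection is a general quadrilateral.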
In an embodiment, the step of solving a transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image includes:
solving the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator; repeating this process for each combination of four pairs among all the matched feature points of the first feature point set and the second feature point set to obtain a plurality of perspective transformation operators, and forming the plurality of perspective transformation operators into a perspective transformation operator set; and, according to a pre-constructed error function of the perspective transformation operator, taking the operator in the set that corresponds to the minimum value of the error function, found by seeking the extremum, as the first perspective transformation operator.
Specifically, for the calculation of the transformation operator: multiplying or dividing all nine values of the matrix by the same number yields a matrix that produces exactly the same transformation effect when applied to an image, so one of the nine values can be preset to 1 and the other eight solved from the matching relationships. When performing the above steps, if there are exactly four pairs of matching relationships, the unique solution of the matrix can be found; with fewer than four pairs there are infinitely many solutions, and no unique transformation can be obtained. In practice, however, the matching relationships usually number far more than four, in which case the system of equations generally has no exact solution. With more than four matching points, a solution that minimizes the total error after transformation must instead be found by seeking the extremum.
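The eight-unknown solve described above (fixing the last matrix entry to 1 and solving the remaining eight entries from exactly four point correspondences) can be sketched as a linear system: each correspondence (x, y) to (u, v) contributes the two equations u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1) and v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1), which become linear after multiplying through by the denominator. A hypothetical illustration, assuming the four source points are in general position (no three collinear) so the system is nonsingular:

```python
import numpy as np

def homography_from_4_pairs(src, dst):
    """Solve the 3x3 perspective operator with the bottom-right entry
    fixed to 1, from exactly four (x, y) -> (u, v) correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)
```

For example, mapping a unit square to the same square scaled by two recovers a pure scaling matrix.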
Seeking the extremum usually means constructing an error function of the transformation operator and locating the position of the minimum according to the function's variation. To find an operator that minimizes the total error after transformation, an error function f(D) is constructed, where D denotes the transformation operator (an unknown variable) and f(D) is the formula for the total error, a function of D. The task is to find the value of D at which f(D) attains its minimum; for example, if f(D6) is the minimum of f(D), the operator D6 is the more accurate operator selected. The construction of f(D) is as follows:
For any operator D, the total error is described by a function as follows. Suppose there are two matched feature points A1 and A11, where A11 is called the matched feature point of A1; the point corresponding to A1 computed through the operator D is A12, called the corresponding point of A1, i.e. A1 * operator D = A12. The distance d1 between A11 and A12 is then calculated. Referring to FIG. 9, a schematic diagram of the perspective transformation operator in the method for extracting target text in a certificate provided by an embodiment of this application, the smaller the distance d1 between A11 and A12, the smaller the error of the operator D. If there are 100 feature points A1, A2, A3, ..., A100, there are 100 matching relationships; the distances d1, d2, d3, ..., d100 between the matched feature point and the corresponding point of each relationship are calculated in the above manner, and the total error of the operator D over these 100 matching relationships is f(D) = d1 + d2 + d3 + ... + d100. By analogy, with n matching relationships the total error is f(D) = d1 + d2 + d3 + ... + dn. Following the above process, the error function can be described as f(D) = d1 + d2 + d3 + ... + dn; its minimum is found, and the operator corresponding to the minimum is the more accurate operator for the perspective transformation of the detection image. It should be noted that the way the error is measured is not limited to the above example; other error measures, such as mean square error, cross entropy, or log-likelihood error, may also be used and are not repeated here.
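The total-error function f(D) = d1 + d2 + ... + dn described above can be sketched directly: project each source feature point through a candidate operator and sum the Euclidean distances to its matched counterpart. The point lists and the 3x3 operator are assumed inputs for this illustration.

```python
import numpy as np

def transform_error(H, src_pts, dst_pts):
    """f(D): total reprojection error of a candidate operator H, summing
    the distance between each projected source point and its match."""
    total = 0.0
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        px, py, pw = H @ np.array([x, y, 1.0])
        total += np.hypot(px / pw - u, py / pw - v)  # d_i for this pair
    return total
```

Evaluating this function over the candidate operator set and keeping the minimizer corresponds to the selection step described above.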
Still further, when calculating the total error, values among d1, d2, d3, ..., dn that deviate too much can be removed by means of the variance: by controlling the degree of dispersion of d1, d2, d3, ..., dn, feature points that differ greatly from the rest are filtered out, so that the total error reflects, as closely as possible, the difference between the image transformed by the operator and the detection image.
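One simple reading of the variance-based filtering above is to drop distances that lie more than a chosen number of standard deviations above the mean before summing. The cutoff factor k is an assumption for the example; the patent does not specify one.

```python
import numpy as np

def filter_outlier_distances(dists, k=2.0):
    """Drop distances more than k standard deviations above the mean, so a
    few badly matched feature points do not dominate the total error."""
    d = np.asarray(dists, dtype=float)
    keep = d <= d.mean() + k * d.std()
    return d[keep]
```

The remaining distances are then summed as before to form the filtered total error.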
In an embodiment, before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the method further includes: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text, where the auxiliary matching mode includes character spacing and/or a positional relationship between characters.
Specifically, for different certificates, the necessary auxiliary matching rules can be defined for the anchor points, so that the subsequent search for anchor points in the sample to be detected is more accurate and the efficiency of anchor point recognition and extraction is improved. Different auxiliary matching rules are formulated for different certificate types; for example, the matching rules of an ID card differ from those of a marriage certificate. Formulating corresponding auxiliary matching rules for a specific certificate type serves, on the one hand, to extract anchor points more precisely and, on the other hand, to extend the search range of the input image when looking for anchor points, thereby narrowing down the target's location during target extraction. Regarding the auxiliary matching rules for anchor points: owing to the limits of text recognition capability, some auxiliary logic sometimes needs to be added when extracting anchor points. For instance, the characters of a specified anchor text are sometimes widely spaced on the image, in which case the content at that position may be recognized as multiple fields in the input image and can no longer be matched directly against the configured anchor text. For this and similar situations, auxiliary logic such as character spacing and/or positional relationships needs to be added for anchor point extraction. For example, the large gaps between the characters of the "持证人" (certificate holder) field on a marriage certificate can easily cause a general text recognition model to recognize it as three fields when searching for the anchor point on the image to be detected; in this case, certain auxiliary matching rules must be defined to splice the three recognized fields into one field to obtain the required "持证人" anchor point.
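The splicing rule described above can be sketched as merging horizontally adjacent OCR boxes on the same line whenever the gap between them is below a threshold. The box representation, the gap threshold, and the vertical tolerance are assumptions for the example.

```python
def merge_split_fields(boxes, max_gap=30):
    """boxes: list of (text, x_left, x_right, y_center) OCR results.
    Merge boxes on the same line whose horizontal gap is <= max_gap, so
    widely spaced characters of one anchor come back as one field."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])
    merged = [list(boxes[0])]
    for text, x1, x2, y in boxes[1:]:
        last = merged[-1]
        if x1 - last[2] <= max_gap and abs(y - last[3]) < 5:
            last[0] += text   # splice the texts
            last[2] = x2      # extend the right edge
        else:
            merged.append([text, x1, x2, y])
    return [tuple(m) for m in merged]
```

Applying this before the anchor-text comparison lets "持" "证" "人" match the configured anchor "持证人".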
In an embodiment, the step of presetting the auxiliary matching mode for extracting the anchor point text includes: presetting graphic anchor points so that feature points are extracted through a combination of text anchor points and graphic anchor points.
Specifically, on some certificates the image information can be exploited as an auxiliary matching rule, and feature points can be extracted in combination with graphic anchor points. Because anchor points are generally textual and provide limited image information, too few feature points may be extractable during subsequent feature point matching, which degrades the accuracy of the subsequent perspective transformation. Some certificates, however, carry fixed graphics that can supply a large number of feature points, although a general text recognition model cannot detect such non-text images. In that case, the detected anchor point information is extended with some positional relationships so as to locate these fixed-position graphics, which can then also be used as anchor points for feature point extraction in the perspective transformation. For example, a fixed graphic appears above the "持证人" (certificate holder) field on a marriage certificate; the position of the graphic can be located from the position of this fixed field, and the graphic can be extended into a graphic anchor point. More feature points are thereby extracted through the combination of text anchor points and graphic anchor points, and by matching these additional feature points, as many accurate matched feature points as possible are obtained for an accurate perspective transformation. The position of a graphic anchor point can be determined from its positional relationship relative to a text anchor point; once the position is determined, it is described, like a position anchor point, by the two vertices of one diagonal of a rectangular frame, generally the top-left and bottom-right vertices. The relative positional relationship between the graphic anchor point and the text anchor point can be determined in several ways: for example, by trial, or by first marking the position of the graphic anchor point on the template image and then computing the relative positional relationship between the graphic anchor point and the text anchor point.
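Once the relative offset of the fixed graphic to a text anchor has been measured on the template image, locating the graphic anchor on a new image reduces to translating the detected text anchor box. A minimal sketch, assuming boxes are (x1, y1, x2, y2) diagonal-vertex pairs as described above:

```python
def locate_graphic_anchor(text_box, offset):
    """Derive the graphic anchor box from a detected text anchor box.

    text_box: (x1, y1, x2, y2) of the text anchor on the current image.
    offset:   (dx1, dy1, dx2, dy2), the graphic box's displacement relative
              to the text box, measured once on the template image.
    """
    return tuple(c + d for c, d in zip(text_box, offset))
```

Feature points are then extracted from both the text anchor box and this derived graphic box.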
In an embodiment, after the step of performing text recognition on the text at the projection position on the perspective transformation image through the text recognition model and extracting the recognized text, the method further includes: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
The second preset mode refers to preset text filtering logic, which includes the type of the text content, the position logic of the location of the text content, and the length limit of the text content.
Specifically, because a general text recognition model usually detects and recognizes the entire image directly, and depending on the model's performance and training, the text it recognizes for different certificate types may deviate from the text printed on the certificate to varying degrees. For example, unexpected fields may be mixed into the recognized fields; content that should be recognized as one field may be recognized as several; or, owing to the position logic of the text recognition model, a field located later but slightly higher may be recognized first while an earlier field is recognized later. Taking the recognized text content directly as the final result is therefore usually coarse and inaccurate. Since the text results produced by the text recognition model cannot be guaranteed to be fully accurate in the situations above, filtering logic can be specified for the extracted content according to the actual characteristics of each certificate type in order to improve its accuracy. That is, preset filtering rules need to be formulated for different certificates to further filter the text content recognized and extracted by the text recognition model, for example by specifying the type of the recognized content (such as digits only, or digits plus English letters), position logic, and length limits, so that the extraction results are as close to expectation as possible and the final extracted text is more precise. Formulating filtering rules for the extracted content makes it easy to satisfy customers' different customization requirements for different certificates, compensates for the possibly imprecise extraction results produced when the user merely marks positions, and further meets customer needs.
Therefore, to extract the target text more precisely, a small amount of custom logic can be added when the extraction fields are defined, thereby improving accuracy. The step of filtering the recognized text according to the second preset mode to obtain the target text of the detection image includes: filtering, according to pre-formulated auxiliary extraction logic for the target text, the content of the target text extracted by the text recognition model, so that the target text conforming to the rules of the auxiliary extraction logic is taken as the text finally extracted from the certificate.
Specifically, auxiliary extraction logic is formulated for different contents, that is, filtering rules are formulated for the extracted content to achieve more accurate extraction for different field contents; according to the formulated auxiliary extraction logic, the extracted text content is further filtered to obtain text conforming to the formulated logical rules.
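As an illustration of such per-field filtering logic (content type plus length limits), a rule table keyed by field name could look like the following. The field names, character patterns, and length bounds are invented for the example, not taken from the patent.

```python
import re

# Hypothetical per-field rules: (allowed-character pattern, min len, max len)
FILTER_RULES = {
    "id_number": (re.compile(r"[0-9Xx]+"), 18, 18),           # digits + X
    "name":      (re.compile(r"[\u4e00-\u9fa5]+"), 2, 10),    # CJK only
}

def filter_field(field, raw_text):
    """Keep only the allowed characters for the field and enforce the
    length limit; return None when the filtered result violates the rule."""
    pattern, lo, hi = FILTER_RULES[field]
    kept = "".join(pattern.findall(raw_text))
    return kept if lo <= len(kept) <= hi else None
```

A failed rule (None) can be surfaced as a low-confidence extraction rather than silently returning noisy text.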
Further, since an auxiliary matching mode can be predefined for the anchor points, that is, auxiliary extraction logic can be formulated for anchor point recognition, the auxiliary matching mode defined for the anchor points can be combined with the auxiliary extraction logic for the target text. By defining anchor points and target frames for different certificates, the fixed-field anchor points used for the perspective transformation and the positions of the target information the customer ultimately needs to extract are determined preliminarily; combined with the respective extraction logic customized for the anchor points and the target information, both anchor point extraction and target content extraction are refined. Anchor point information as accurate as possible is obtained through the auxiliary matching mode, so that the detection image undergoes a perspective transformation that is as accurate as possible, and on that basis the filtering logic for the target text extracts the target text as precisely as possible, avoiding the imprecise extraction results that may arise when only positions are marked. Combining customized templates with auxiliary logic thus avoids, on the one hand, the labor and time cost of fully custom logic for the different extraction requirements of each certificate type and, on the other hand, the imprecise extraction caused by overly generic logic.
The above solutions of the embodiments of this application are described below through two specific embodiments:
In an embodiment, referring to FIG. 10, which comprises FIG. 10(a) to FIG. 10(i) and is a schematic diagram of graphic transformation in one embodiment of the method for extracting target text in a certificate provided by an embodiment of this application, the specific implementation process includes the following steps:
1.01) The user selects an image as the template image and frames the fixed fields on it, hereinafter referred to as anchor points; the fields marked by solid-line frames in FIG. 10(a) are the anchor points, and the perspective transformation operator is calculated through these regions;
1.02) The user frames, on the template image, the regions from which text recognition results are to be extracted, hereinafter referred to as target frames; the positions marked by dashed-line frames in FIG. 10(a) are the target frame positions, and text is to be extracted from these regions;
1.03) The text recognition model recognizes the anchor point regions selected by the user and obtains the content information of the anchor point regions, referring to FIG. 10(b);
1.04) The user inputs the detection image from which the target text is to be extracted;
1.05) The text recognition model performs full-text recognition on the detection image and thereby finds the regions matching the anchor text content selected by the user, that is, the regions containing the user-selected anchor text, referring to FIG. 10(c);
1.06) Feature points are extracted and matched over the matched anchor point regions of the template image and the detection image, so as to obtain the first perspective transformation operator that brings the detection image to the viewing angle of the template image, referring to FIG. 10(d);
1.07) Perspective transformation is performed on the detection image to obtain the transformed first perspective transformation image, referring to FIG. 10(e);
1.08) Because a certain error may exist in the feature point matching process, the computed perspective transformation operator is not necessarily fully standard, and the transformed first perspective transformation image may therefore still deviate somewhat in viewing angle from the template image. Consequently, the target frame position is not mapped unchanged directly onto the transformed first perspective transformation image; instead, a second perspective transformation operator between the template image and the transformed first perspective transformation image is found, and the target frame is projected through perspective transformation onto the transformed second perspective transformation image. To this end, the regions matching the anchor text of the template image are first detected on the transformed first perspective transformation image, referring to FIG. 10(f);
1.09) Feature points are extracted and matched between the transformed first perspective transformation image and the template image, referring to FIG. 10(g), to obtain the second perspective transformation operator from the first perspective transformation image to the viewing angle of the template image;
1.10) The target frames marked on the template image are projected through the second perspective transformation operator, by perspective transformation, onto the second perspective transformation image corresponding to the detection image, referring to FIG. 10(h). It should be noted that, in this embodiment of the application, the frame for the residence field on the transformed second perspective transformation image does not enclose the entire content of the residence field; this is because the user marked only that region on the template image, so only that small region appears after projection. The range enclosed by the target frame can be adjusted by experimenting with samples, or the range can simply be set as large as possible so that the target frame encloses all the content;
1.11)文本识别对目标框的内容进行识别,请参阅图10(i)。1.11) Text recognition recognizes the content of the target box, please refer to Figure 10(i).
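Steps 1.08 to 1.10 ultimately project the marked target frame through a 3x3 perspective transformation operator (a homography). Below is a minimal sketch of that projection step in Python with NumPy; the operator values and box coordinates are illustrative assumptions, not taken from the application:

```python
import numpy as np

def apply_homography(H, pts):
    """Project 2-D points through a 3x3 perspective transformation operator."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous coords
    mapped = homog @ H.T                                  # apply the operator
    return mapped[:, :2] / mapped[:, 2:3]                 # back to Cartesian

# Hypothetical second perspective transformation operator; identity plus a
# small translation stands in for a real estimate.
H2 = np.array([[1.0, 0.0, 5.0],
               [0.0, 1.0, 3.0],
               [0.0, 0.0, 1.0]])

# Corners of a target frame marked on the template image (illustrative).
target_box = [(100, 40), (260, 40), (260, 80), (100, 80)]
projected = apply_homography(H2, target_box)  # corners on the transformed image
```

Because this example operator has bottom row [0, 0, 1], each corner simply shifts by (5, 3); a genuine estimate would in general warp the box non-uniformly.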
In another embodiment, please refer to Figure 11, which includes Figure 11(a) to Figure 11(i), schematic diagrams of the image transformations in another embodiment of the method for extracting target text in a certificate provided by the embodiments of this application. The specific implementation process includes the following steps:
2.01) Select an image as the template image, and specify (set) on it the positions and text content of fixed, unchanging fields, hereinafter called anchor points; see the parts enclosed by the solid-line boxes in Figure 11(a);
2.02) Customize auxiliary logic for finding the anchor points, that is, auxiliary logic for the parts enclosed by the solid-line boxes;
2.03) Specify the regions containing the text recognition results to be extracted, hereinafter called target frames; see the dashed-line boxes in Figure 11(a);
2.04) Customize filtering logic for text extraction from the target frames;
2.05) The user inputs a detection image;
2.06) The text recognition model performs full-text recognition on the detection image and finds the regions containing the specified anchor text content, see Figure 11(b);
2.07) Perform feature point extraction and matching on the matched anchor regions of the template image and the detection image, see Figure 11(c), to obtain the first perspective transformation operator that transforms the detection image to the template image perspective;
2.08) Apply the first perspective transformation operator to the detection image to obtain the first perspective transformation image; the transformed image is shown in Figure 11(d);
2.09) Again, because the feature point matching process may contain some error, the computed first perspective transformation operator is not necessarily exact, so the transformed first perspective transformation image may still differ in perspective from the template image. Therefore, instead of mapping the target frame position directly and unchanged onto the transformed first perspective transformation image, the next step is to find a second perspective transformation operator between the template image and the transformed first perspective transformation image, and to project the target frame, through the second perspective transformation operator and perspective transformation, onto the transformed second perspective transformation image. To this end, first detect on the transformed first perspective transformation image the region matching the anchor text of the template image, see Figure 11(e);
2.10) Perform feature point extraction and matching on the transformed first perspective transformation image and the template image, see Figure 11(f), to obtain the second perspective transformation operator from the first perspective transformation image to the template image perspective;
2.11) Project the target frame marked on the template image, through the second perspective transformation operator and perspective transformation, onto the transformed second perspective transformation image; see the regions enclosed by the dashed-line boxes in Figure 11(g);
2.12) Text recognition is performed on the content of the target frame, see Figure 11(h).
It should be noted that, although the target frame containing the registration date in Figure 11(g) does not completely enclose all of the "X5X5X5" content, the complete "X5X5X5" content is nevertheless considered to belong to the target region because auxiliary logic has been configured.
2.13) Filter the recognized content according to the previously defined filter rules, see Figure 11(i).
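The filter rules of step 2.13 are not fixed by the application. A minimal sketch of one plausible rule, which keeps only date-like characters from the raw recognition result using a regular expression; the pattern and the sample string are assumptions for illustration:

```python
import re

def filter_recognized_text(raw, pattern=r"[0-9X]+"):
    """Apply a per-field filter rule: keep only runs of characters that can
    appear in the target field, dropping surrounding OCR noise."""
    pieces = re.findall(pattern, raw)
    return "".join(pieces)

# Raw OCR output for the registration-date target frame (illustrative).
filtered = filter_recognized_text("Date: X5X5X5 *")
```

In a real deployment each target frame would carry its own pattern (dates, ID numbers, names, and so on), mirroring the per-frame filter logic customized in step 2.04.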
It should be noted that, for the methods for extracting target text in a certificate described in the above embodiments, the technical features contained in different embodiments may be recombined as needed to obtain combined implementations, all of which fall within the scope of protection claimed by this application.
Please refer to FIG. 12, which is a schematic block diagram of an apparatus for extracting target text in a certificate provided by an embodiment of this application. Corresponding to the above method for extracting target text in a certificate, an embodiment of this application further provides an apparatus for extracting target text in a certificate. As shown in FIG. 12, the apparatus includes units for executing the above method, and may be configured in computer equipment such as a desktop computer. Specifically, the apparatus 1200 for extracting target text in a certificate includes a first acquiring unit 1201, a second acquiring unit 1202, a solving unit 1203, a transformation unit 1204, a projection unit 1205, and a recognition unit 1206.
The first acquiring unit 1201 is configured to acquire a template image and a detection image, belonging to the same certificate type, for extracting target text, where the template image is annotated with a text anchor and a target frame position; the text anchor is a fixed field annotated on the template image and includes first anchor text, the first anchor text being the content of the fixed field; and the target frame position is the location, annotated on the template image, of the target text to be extracted from the certificate. The second acquiring unit 1202 is configured to acquire, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, where an anchor position is the position of the first anchor text on the corresponding image. The solving unit 1203 is configured to solve, according to the feature point matching relationship, through a transformation matrix, to obtain a perspective transformation operator for performing perspective transformation on the detection image. The transformation unit 1204 is configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image that matches the perspective of the template image. The projection unit 1205 is configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image. The recognition unit 1206 is configured to perform text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and to extract the recognized text to obtain the target text of the detection image.
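The second acquiring unit's first task, locating the first anchor text on the detection image, can be sketched as a lookup over the text recognition model's output. The OCR results and box coordinates below are illustrative assumptions, not values from the application:

```python
# Hypothetical full-text OCR output: (recognized text, bounding box) pairs
# that a real text recognition model would produce for the detection image.
ocr_results = [
    ("Name", (30, 20, 90, 40)),
    ("Registration Date", (30, 120, 180, 140)),
    ("X5X5X5", (200, 120, 280, 140)),
]

def find_anchor_regions(ocr_results, anchor_texts):
    """Return the boxes whose recognized text matches a specified anchor text."""
    matches = {}
    for text, box in ocr_results:
        if text in anchor_texts:
            matches[text] = box
    return matches

anchors = find_anchor_regions(ocr_results, {"Name", "Registration Date"})
```

The matched boxes then serve as the anchor positions from which feature points are extracted.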
In one embodiment, the text anchor further includes a first anchor position, and the second acquiring unit 1202 includes: a first extraction subunit, configured to extract, through the text recognition model, second anchor text on the detection image that is consistent with the first anchor text; a first obtaining subunit, configured to obtain, based on the text recognition model and through the second anchor text, a second anchor position on the detection image corresponding to the first anchor position; a second extraction subunit, configured to extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor position and a second feature point set contained in the second anchor position; and a first acquiring subunit, configured to acquire, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between feature points in the first feature point set and the second feature point set. The solving unit 1203 is configured to solve, according to the first feature point matching relationship, through the transformation matrix, to calculate a first perspective transformation operator for performing perspective transformation on the detection image. The transformation unit 1204 is configured to perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image.
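The application leaves the feature point matching algorithm open; nearest-neighbour descriptor matching with a ratio test is one common choice. A toy sketch with NumPy, using made-up two-dimensional descriptors in place of real feature descriptors:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching with a ratio test: accept a match only when
    the closest descriptor is clearly closer than the second closest."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors: rows are feature vectors from the two anchor regions.
desc_template = np.array([[1.0, 0.0], [0.0, 1.0]])
desc_detect = np.array([[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]])
pairs = match_descriptors(desc_template, desc_detect)
```

Each returned pair is one entry of the feature point matching relationship, from which the transformation matrix is then solved.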
In one embodiment, the second acquiring unit 1202 further includes: a second acquiring subunit, configured to input the first perspective transformation image into the text recognition model and obtain, through the first anchor text, a third anchor position on the first perspective transformation image corresponding to the first anchor position; a third extraction subunit, configured to extract, based on the feature point extraction algorithm, a third feature point set contained in the third anchor position; a third acquiring subunit, configured to acquire, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between feature points in the first feature point set and the third feature point set; and a first solving subunit, configured to solve, according to the second feature point matching relationship, through the transformation matrix, to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image. The transformation unit 1204 is configured to perform perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image. The projection unit 1205 is configured to calculate, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image. The recognition unit 1206 is configured to input the second perspective transformation image into the text recognition model, perform text recognition on the text at the projection position on the second perspective transformation image through the text recognition model, and extract the recognized text to obtain the target text of the detection image.
In one embodiment, the solving unit 1203 includes: a second solving subunit, configured to solve, through the transformation matrix, using the matching relationship between each group of four pairs of feature points, to obtain one perspective transformation operator; a repeating subunit, configured to repeat the above process of obtaining one perspective transformation operator from every four pairs of feature points, for each combination of four pairs among all matched feature points in the first feature point set and the second feature point set, to obtain multiple perspective transformation operators, and to form the multiple perspective transformation operators into a set as a perspective transformation operator set; and a second obtaining subunit, configured to obtain, according to a pre-built error function of the perspective transformation operator and by minimization, the perspective transformation operator in the perspective transformation operator set corresponding to the minimum value of the error function as the first perspective transformation operator.
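The per-group computation, one perspective transformation operator from four point pairs, plus an error function for choosing among candidate operators, can be sketched as follows. The direct linear solve and the mean reprojection error used here are standard constructions, assumed for illustration rather than fixed by the application:

```python
import numpy as np

def homography_from_4_pairs(src, dst):
    """Solve the 8 unknowns of a perspective transformation operator from
    exactly four point correspondences (the transformation-matrix solve)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)  # fix the scale so H[2, 2] = 1

def reprojection_error(H, src, dst):
    """Error function: mean distance between H projected src and dst."""
    src = np.asarray(src, float)
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = pts[:, :2] / pts[:, 2:3]
    return float(np.mean(np.linalg.norm(proj - np.asarray(dst, float), axis=1)))

# A pure translation (+10, +20) recovered from four correspondences.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(10, 20), (11, 20), (10, 21), (11, 21)]
H = homography_from_4_pairs(src, dst)
err = reprojection_error(H, src, dst)
```

Repeating `homography_from_4_pairs` over different four-pair combinations and keeping the candidate with the smallest `reprojection_error` mirrors the operator-set construction and error-function minimization described above.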
In one embodiment, the second acquiring unit 1202 further includes: a setting subunit, configured to preset, according to the certificate type of the certificate, an auxiliary matching manner for extracting anchor text.
In one embodiment, the setting subunit is configured to preset graphic anchors so that feature points are extracted through a combination of text anchors and graphic anchors.
In one embodiment, the auxiliary matching manner includes character spacing and/or the positional relationship between characters.
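Character spacing as an auxiliary matching manner can be sketched as comparing the gaps between consecutive character boxes: when several regions contain the anchor text, the one whose spacing profile best matches the template's is preferred. Every coordinate below is an illustrative assumption:

```python
def spacing_profile(char_boxes):
    """Gaps between consecutive character boxes, measured on left edges."""
    lefts = sorted(box[0] for box in char_boxes)
    return [b - a for a, b in zip(lefts, lefts[1:])]

def spacing_distance(profile_a, profile_b):
    """Sum of absolute gap differences between two spacing profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

# Character boxes (left, top, right, bottom) for the template anchor and
# two candidate regions on the detection image.
template_chars = [(0, 0, 10, 20), (14, 0, 24, 20), (28, 0, 38, 20)]
candidate_a = [(100, 50, 110, 70), (114, 50, 124, 70), (128, 50, 138, 70)]
candidate_b = [(100, 90, 110, 110), (130, 90, 140, 110), (160, 90, 170, 110)]

tpl = spacing_profile(template_chars)  # gaps of 14 between characters
best = min([candidate_a, candidate_b],
           key=lambda c: spacing_distance(tpl, spacing_profile(c)))
```

Here `candidate_a` reproduces the template's gap of 14 pixels and is selected; a positional-relationship check between characters could be layered on in the same way.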
In one embodiment, the apparatus 1200 for extracting target text in a certificate further includes: a filtering unit, configured to filter the recognized text in a second preset manner to obtain the target text of the detection image.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation process of the above apparatus for extracting target text in a certificate and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
Meanwhile, the division and connection of the units in the above apparatus for extracting target text in a certificate are only for illustration. In other embodiments, the apparatus may be divided into different units as needed, or the units may adopt different connection orders and manners, so as to complete all or part of the functions of the above apparatus.
The above apparatus for extracting target text in a certificate may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 13.
Please refer to FIG. 13, which is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device 1300 may be computer equipment such as a desktop computer or a server, or may be a component or part of another device.
Referring to FIG. 13, the computer device 1300 includes a processor 1302, a memory, and a network interface 1305 connected through a system bus 1301, where the memory may include a non-volatile storage medium 1303 and an internal memory 1304.
The non-volatile storage medium 1303 may store an operating system 13031 and a computer program 13032. When executed, the computer program 13032 may cause the processor 1302 to perform a method for extracting target text in a certificate as described above. The processor 1302 provides computing and control capabilities to support the operation of the entire computer device 1300.
The internal memory 1304 provides an environment for running the computer program 13032 stored in the non-volatile storage medium 1303. When the computer program 13032 is executed by the processor 1302, it may cause the processor 1302 to perform a method for extracting target text in a certificate as described above.
The network interface 1305 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in FIG. 13 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 1300 to which the solution is applied; a specific computer device 1300 may include more or fewer components than shown, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 13 and are not repeated here.
The processor 1302 is configured to run the computer program 13032 stored in the memory, so as to implement the method for extracting target text in a certificate of the embodiments of this application.
It should be understood that, in the embodiments of this application, the processor 1302 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the steps of the embodiments of the method for extracting target text in a certificate.
Therefore, an embodiment of this application further provides a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the method for extracting target text in a certificate described in the above embodiments.
The storage medium is a physical, non-transitory storage medium, for example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The above are only specific implementations of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (20)

1. A method for extracting target text in a certificate, comprising:
    acquiring a template image and a detection image, belonging to the same certificate type, for extracting target text, the template image being annotated with a text anchor and a target frame position, wherein the text anchor is a fixed field annotated on the template image and comprises first anchor text, the first anchor text being the content of the fixed field, and the target frame position being the location, annotated on the template image, of the target text to be extracted from the certificate;
    acquiring, according to the first anchor text and based on a text recognition model, in a first preset manner, a feature point matching relationship between feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image, wherein an anchor position is the position of the first anchor text on the corresponding image;
    solving, according to the feature point matching relationship, through a transformation matrix, to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image that matches the perspective of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
2. The method for extracting target text in a certificate according to claim 1, wherein the text anchor further comprises a first anchor position, and the step of acquiring, according to the first anchor text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points contained in the anchor position of the first anchor text on the template image and those contained in the anchor position of the first anchor text on the detection image comprises:
    extracting, through the text recognition model, second anchor text on the detection image that is consistent with the first anchor text;
    obtaining, based on the text recognition model and through the second anchor text, a second anchor position on the detection image corresponding to the first anchor position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor position and a second feature point set contained in the second anchor position; and
    acquiring, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between feature points in the first feature point set and the second feature point set;
    the step of solving, according to the feature point matching relationship, through the transformation matrix, to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving, according to the first feature point matching relationship, through the transformation matrix, to calculate a first perspective transformation operator for performing perspective transformation on the detection image; and
    the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image that matches the perspective of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image.
  3. 根据权利要求2所述证件中的目标文本提取方法,其中,所述将所述检测图像通过所述第一透视变换算子进行透视变换以得到与所述模板图像视角相符的第一透视变换图像的步骤之后,还包括:2. The method for extracting target text in a certificate according to claim 2, wherein the detection image is subjected to perspective transformation through the first perspective transformation operator to obtain a first perspective transformation image that matches the perspective of the template image After the steps, it also includes:
    inputting the first perspective transformation image into the text recognition model, and obtaining, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position;
    extracting, based on the feature point extraction algorithm, a third feature point set contained in the third anchor point position;
    obtaining, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and the third feature point set;
    solving through the transformation matrix according to the second feature point matching relationship to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image; and
    performing perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image;
    wherein the step of obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image comprises:
    calculating, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image;
    and wherein the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text to obtain the target text of the detection image comprises:
    inputting the second perspective transformation image into the text recognition model, performing text recognition, through the text recognition model, on the text at the projection position on the second perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
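The projection step recited above reduces to mapping each corner of the annotated target frame through the 3×3 perspective transformation operator, dividing by the homogeneous coordinate. A minimal sketch — the operator values and frame corners below are illustrative, not taken from the disclosure:

```python
def project_point(H, x, y):
    """Map point (x, y) through a 3x3 perspective transformation matrix H."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

# Illustrative operator: identity plus a translation of (10, 20).
H = [[1, 0, 10],
     [0, 1, 20],
     [0, 0, 1]]

# Project the four corners of a hypothetical target frame onto the
# perspective-transformed image.
target_frame = [(100, 50), (300, 50), (300, 90), (100, 90)]
projected = [project_point(H, x, y) for x, y in target_frame]
print(projected)
```

The text recognizer is then run only on the image region bounded by the projected corners.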
  4. The method for extracting target text in a certificate according to claim 2, wherein the step of solving through the transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator;
    repeating, for the combinations of every four pairs of feature points among all matched feature points in the first feature point set and the second feature point set, the above process of obtaining one perspective transformation operator from each group of four pairs, to obtain a plurality of perspective transformation operators, and taking the plurality of perspective transformation operators together as a perspective transformation operator set; and
    according to a pre-constructed error function of the perspective transformation operator, obtaining, by finding the minimum of the error function, the perspective transformation operator in the perspective transformation operator set that corresponds to the minimum value of the error function as the first perspective transformation operator.
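Claim 4's construction — one operator per group of four matched pairs, then keeping the operator that minimizes a pre-built error function — can be sketched as follows. The correspondences are hypothetical; the solver fixes h22 = 1 (one common normalization) and uses total squared reprojection error as an illustrative error function, since the claim does not fix a particular one:

```python
def project(H, x, y):
    """Map (x, y) through perspective operator H (3x3 nested lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def operator_from_four_pairs(pairs):
    """Solve the 8 unknowns of a perspective operator (h22 fixed at 1)
    from four (src, dst) point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def reprojection_error(H, pairs):
    """Total squared reprojection error -- an illustrative error function."""
    return sum((px - u) ** 2 + (py - v) ** 2
               for (x, y), (u, v) in pairs
               for px, py in [project(H, x, y)])

# Illustrative correspondences: a pure shift by (5, 3).
pairs = [((0, 0), (5, 3)), ((100, 0), (105, 3)),
         ((0, 100), (5, 103)), ((100, 100), (105, 103))]
candidates = [operator_from_four_pairs(pairs)]  # one 4-pair group here
best = min(candidates, key=lambda H: reprojection_error(H, pairs))
```

With more than four matched pairs, every combination of four yields one candidate operator, and `best` is the member of the candidate set with the smallest error.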
  5. The method for extracting target text in a certificate according to claim 2, wherein before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the method further comprises: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text.
  6. The method for extracting target text in a certificate according to claim 5, wherein the step of presetting the auxiliary matching mode for extracting the anchor point text comprises: presetting graphic anchor points so as to extract feature points through a combination of text anchor points and graphic anchor points.
  7. The method for extracting target text in a certificate according to claim 5, wherein the auxiliary matching mode comprises character spacing and/or a positional relationship between characters.
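Claims 5 to 7 describe an auxiliary matching mode based on character spacing and positional relationships. One way to realize such a check — offered as an illustrative sketch, not the disclosed implementation — is a scale-invariant spacing signature over the character bounding boxes of a candidate anchor:

```python
def spacing_signature(boxes):
    """Relative gaps between consecutive character boxes (x, y, w, h),
    normalized by the first gap so the signature is scale-invariant."""
    centers = [x + w / 2 for (x, y, w, h) in boxes]
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    return [g / gaps[0] for g in gaps]

def spacing_matches(template_boxes, candidate_boxes, tol=0.2):
    """Auxiliary check: accept a candidate anchor only if its character
    spacing pattern matches the template's within a tolerance."""
    t = spacing_signature(template_boxes)
    c = spacing_signature(candidate_boxes)
    return len(t) == len(c) and all(abs(a - b) <= tol for a, b in zip(t, c))

# Hypothetical boxes: evenly spaced template characters, one candidate
# that is a clean 2x-scaled copy, and one with an outlier gap.
tpl = [(0, 0, 10, 20), (20, 0, 10, 20), (40, 0, 10, 20)]
scaled = [(0, 0, 20, 40), (40, 0, 20, 40), (80, 0, 20, 40)]
shifted = [(0, 0, 10, 20), (20, 0, 10, 20), (70, 0, 10, 20)]
```

Because the signature is normalized, a uniformly rescaled anchor still passes, while a candidate whose characters sit at the wrong relative positions is rejected before any feature points are extracted from it.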
  8. The method for extracting target text in a certificate according to claim 1, wherein after the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text, the method further comprises: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
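The "second preset mode" for filtering the recognized text is left unspecified in claim 8; a per-field regular-expression whitelist is one plausible realization. The field names and patterns below are hypothetical:

```python
import re

# Hypothetical per-field filters; the claim leaves the "second preset
# mode" abstract, so a regular-expression whitelist is one option.
FIELD_PATTERNS = {
    "id_number": re.compile(r"\d{17}[\dXx]"),            # 18-char ID style
    "date": re.compile(r"\d{4}[-.]\d{1,2}[-.]\d{1,2}"),  # yyyy-mm-dd style
}

def filter_recognized(field, raw_text):
    """Keep only the substring of the OCR output matching the field
    pattern; return None when nothing survives the filter."""
    m = FIELD_PATTERNS[field].search(raw_text)
    return m.group(0) if m else None

print(filter_recognized("date", "出生 1990-01-02 "))
```

This drops label text and stray OCR characters that the projection window may have captured along with the target value.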
  9. An apparatus for extracting target text in a certificate, comprising:
    a first acquiring unit, configured to acquire a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    a second acquiring unit, configured to obtain, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    a solving unit, configured to solve through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    a transformation unit, configured to perform perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    a projection unit, configured to obtain, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    a recognition unit, configured to perform text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and to extract the recognized text to obtain the target text of the detection image.
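The unit structure of claim 9 maps naturally onto a small pipeline object. In the sketch below, the injected callables stand in for the text recognition model and the matrix solver, which the claim deliberately leaves abstract; all names are placeholders:

```python
class TargetTextExtractor:
    """Sketch of the claimed apparatus; each attribute mirrors one unit."""

    def __init__(self, recognize_text, match_anchor_features, solve_operator,
                 warp, project_box):
        self.recognize_text = recognize_text              # recognition unit
        self.match_anchor_features = match_anchor_features  # 2nd acquiring unit
        self.solve_operator = solve_operator              # solving unit
        self.warp = warp                                  # transformation unit
        self.project_box = project_box                    # projection unit

    def extract(self, template, detection, anchor_text, target_box):
        matches = self.match_anchor_features(template, detection, anchor_text)
        H = self.solve_operator(matches)
        warped = self.warp(detection, H)
        box = self.project_box(H, target_box)
        return self.recognize_text(warped, box)

# Wiring with trivial stand-ins just to show the data flow:
extractor = TargetTextExtractor(
    recognize_text=lambda img, box: img[box],
    match_anchor_features=lambda tpl, det, anchor: [("p", "q")],
    solve_operator=lambda matches: "H",
    warp=lambda img, H: {"roi": "ZHANG SAN"},
    project_box=lambda H, box: "roi",
)
print(extractor.extract("tpl", "det", "姓名", "box"))
```

Swapping the stand-ins for a real OCR model, feature matcher, and homography solver yields the full pipeline without changing the orchestration.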
  10. The apparatus for extracting target text in a certificate according to claim 9, wherein the text anchor point further comprises a first anchor point position, and the second acquiring unit comprises:
    a first extraction subunit, configured to extract, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    a first obtaining subunit, configured to obtain, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    a second extraction subunit, configured to extract, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    a first acquiring subunit, configured to obtain, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    the solving unit being configured to solve through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image; and
    the transformation unit being configured to perform perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
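The matching performed by the first acquiring subunit is typically a nearest-neighbour search over feature descriptors with a ratio test (Lowe's criterion). A toy sketch with hypothetical two-dimensional descriptors — the disclosure does not fix a particular matching algorithm:

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_features(desc1, desc2, ratio=0.75):
    """Nearest-neighbour matching with a ratio test: keep a match only
    when the best distance is clearly below the second best, which
    rejects ambiguous correspondences."""
    matches = []
    for i, d1 in enumerate(desc1):
        dists = sorted((euclidean(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy descriptors for the template anchor and the detection anchor.
desc_template = [[0.0, 0.0], [10.0, 10.0]]
desc_detection = [[0.1, 0.0], [10.0, 10.2], [50.0, 50.0]]
matches = match_features(desc_template, desc_detection)
print(matches)
```

The resulting index pairs are exactly the "first feature point matching relationship" the solving unit consumes to estimate the perspective transformation operator.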
  11. A computer device, comprising a memory and a processor connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory so as to perform the following steps:
    acquiring a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    obtaining, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    solving through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  12. The computer device according to claim 11, wherein the text anchor point further comprises a first anchor point position, and the step of obtaining, according to the first anchor point text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image comprises:
    extracting, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    obtaining, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    obtaining, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    wherein the step of solving through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image;
    and wherein the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image consistent with the viewing angle of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
  13. The computer device according to claim 12, wherein after the step of performing perspective transformation on the detection image through the first perspective transformation operator to obtain the first perspective transformation image consistent with the viewing angle of the template image, the steps further comprise:
    inputting the first perspective transformation image into the text recognition model, and obtaining, through the first anchor point text, a third anchor point position on the first perspective transformation image corresponding to the first anchor point position;
    extracting, based on the feature point extraction algorithm, a third feature point set contained in the third anchor point position;
    obtaining, according to the first feature point set and the third feature point set and based on the feature point matching algorithm, a second feature point matching relationship between the feature points in the first feature point set and the third feature point set;
    solving through the transformation matrix according to the second feature point matching relationship to calculate a second perspective transformation operator for performing perspective transformation on the first perspective transformation image; and
    performing perspective transformation on the first perspective transformation image through the second perspective transformation operator to obtain a second perspective transformation image;
    wherein the step of obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image comprises:
    calculating, through the second perspective transformation operator, the projection position of the target frame position on the second perspective transformation image;
    and wherein the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text to obtain the target text of the detection image comprises:
    inputting the second perspective transformation image into the text recognition model, performing text recognition, through the text recognition model, on the text at the projection position on the second perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  14. The computer device according to claim 12, wherein the step of solving through the transformation matrix according to the first feature point matching relationship to calculate the first perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix using the matching relationship between each group of four pairs of feature points to obtain one perspective transformation operator;
    repeating, for the combinations of every four pairs of feature points among all matched feature points in the first feature point set and the second feature point set, the above process of obtaining one perspective transformation operator from each group of four pairs, to obtain a plurality of perspective transformation operators, and taking the plurality of perspective transformation operators together as a perspective transformation operator set; and
    according to a pre-constructed error function of the perspective transformation operator, obtaining, by finding the minimum of the error function, the perspective transformation operator in the perspective transformation operator set that corresponds to the minimum value of the error function as the first perspective transformation operator.
  15. The computer device according to claim 12, wherein before the step of extracting, through the text recognition model, the second anchor point text on the detection image that is consistent with the first anchor point text, the steps further comprise: presetting, according to the certificate type of the certificate, an auxiliary matching mode for extracting the anchor point text.
  16. The computer device according to claim 15, wherein the step of presetting the auxiliary matching mode for extracting the anchor point text comprises: presetting graphic anchor points so as to extract feature points through a combination of text anchor points and graphic anchor points.
  17. The computer device according to claim 15, wherein the auxiliary matching mode comprises character spacing and/or a positional relationship between characters.
  18. The computer device according to claim 11, wherein after the step of performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image and extracting the recognized text, the steps further comprise: filtering the recognized text according to a second preset mode to obtain the target text of the detection image.
  19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the following steps:
    acquiring a template image and a detection image for extracting target text that belong to the same certificate type, the template image being annotated with a text anchor point and a target frame position, wherein the text anchor point is a fixed field annotated on the template image, the text anchor point comprises a first anchor point text, the first anchor point text is the content of the fixed field, and the target frame position is the position, annotated on the template image, of the target text to be extracted from the certificate;
    obtaining, according to the first anchor point text and based on a text recognition model, in a first preset manner, a feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image, wherein an anchor point position is the position of the first anchor point text on the corresponding image;
    solving through a transformation matrix according to the feature point matching relationship to obtain a perspective transformation operator for performing perspective transformation on the detection image;
    performing perspective transformation on the detection image through the perspective transformation operator to obtain a perspective transformation image consistent with the viewing angle of the template image;
    obtaining, through the perspective transformation operator, the projection position of the target frame position on the perspective transformation image; and
    performing text recognition, through the text recognition model, on the text at the projection position on the perspective transformation image, and extracting the recognized text to obtain the target text of the detection image.
  20. The storage medium according to claim 19, wherein the text anchor point further comprises a first anchor point position, and the step of obtaining, according to the first anchor point text and based on the text recognition model, in the first preset manner, the feature point matching relationship between the feature points respectively contained in the anchor point position of the first anchor point text on the template image and the anchor point position of the first anchor point text on the detection image comprises:
    extracting, through the text recognition model, a second anchor point text on the detection image that is consistent with the first anchor point text;
    obtaining, based on the text recognition model and through the second anchor point text, a second anchor point position on the detection image corresponding to the first anchor point position;
    extracting, based on a preset feature point extraction algorithm, a first feature point set contained in the first anchor point position and a second feature point set contained in the second anchor point position; and
    obtaining, according to the first feature point set and the second feature point set and based on a feature point matching algorithm, a first feature point matching relationship between the feature points in the first feature point set and the second feature point set;
    wherein the step of solving through the transformation matrix according to the feature point matching relationship to obtain the perspective transformation operator for performing perspective transformation on the detection image comprises:
    solving through the transformation matrix according to the first feature point matching relationship to calculate a first perspective transformation operator for performing perspective transformation on the detection image;
    and wherein the step of performing perspective transformation on the detection image through the perspective transformation operator to obtain the perspective transformation image consistent with the viewing angle of the template image comprises:
    performing perspective transformation on the detection image through the first perspective transformation operator to obtain a first perspective transformation image consistent with the viewing angle of the template image.
PCT/CN2019/118469 2019-10-15 2019-11-14 Method and apparatus for extracting target text in certificate, device, and readable storage medium WO2021072879A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910979567.0A CN111126125B (en) 2019-10-15 2019-10-15 Method, device, equipment and readable storage medium for extracting target text in certificate
CN201910979567.0 2019-10-15

Publications (1)

Publication Number Publication Date
WO2021072879A1

Family

ID=70495348

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118469 WO2021072879A1 (en) 2019-10-15 2019-11-14 Method and apparatus for extracting target text in certificate, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111126125B (en)
WO (1) WO2021072879A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762244A (en) * 2020-06-05 2021-12-07 北京市天元网络技术股份有限公司 Document information extraction method and device
CN111696044B (en) * 2020-06-16 2022-06-10 清华大学 Large-scene dynamic visual observation method and device
CN111898381A (en) * 2020-06-30 2020-11-06 北京来也网络科技有限公司 Text information extraction method, device, equipment and medium combining RPA and AI
CN111967347A (en) * 2020-07-28 2020-11-20 北京嘀嘀无限科技发展有限公司 Data processing method and device, readable storage medium and electronic equipment
CN111914840A (en) * 2020-07-31 2020-11-10 中国建设银行股份有限公司 Text recognition method, model training method, device and equipment
CN112001331A (en) * 2020-08-26 2020-11-27 上海高德威智能交通系统有限公司 Image recognition method, device, equipment and storage medium
CN112016561B (en) * 2020-09-01 2023-08-04 中国银行股份有限公司 Text recognition method and related equipment
CN111931771B (en) * 2020-09-16 2021-01-01 深圳壹账通智能科技有限公司 Bill content identification method, device, medium and electronic equipment
CN111931784B (en) * 2020-09-17 2021-01-01 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
CN112132016B (en) * 2020-09-22 2023-09-15 平安科技(深圳)有限公司 Bill information extraction method and device and electronic equipment
CN112613402A (en) * 2020-12-22 2021-04-06 金蝶软件(中国)有限公司 Text region detection method, text region detection device, computer equipment and storage medium
CN112668572B (en) * 2020-12-24 2023-01-31 成都新希望金融信息有限公司 Identity card image standardization method and device, electronic equipment and storage medium
CN112633279A (en) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 Text recognition method, device and system
CN112651378B (en) * 2021-01-08 2021-10-15 唐旸 Method, device and medium for identifying marking information of fastener two-dimensional drawing
CN113269126A (en) * 2021-06-10 2021-08-17 上海云扩信息科技有限公司 Key information extraction method based on coordinate transformation
CN113920512B (en) * 2021-12-08 2022-03-15 共道网络科技有限公司 Image recognition method and device
CN114577756B (en) * 2022-05-09 2022-07-15 烟台正德电子科技有限公司 Light transmission uniformity detection device and detection method
CN116740719A (en) * 2023-05-04 2023-09-12 北京和利时系统集成有限公司 Pointer type meter reading method, device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20130198615A1 (en) * 2006-08-01 2013-08-01 Abbyy Software Ltd. Creating Flexible Structure Descriptions
CN107368800A (en) * 2017-07-13 2017-11-21 上海携程商务有限公司 Order confirmation method, system, equipment and storage medium based on fax identification
CN109977935A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 A kind of text recognition method and device
CN110321895A (en) * 2019-04-30 2019-10-11 北京市商汤科技开发有限公司 Certificate recognition methods and device, electronic equipment, computer readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113177541A (en) * 2021-05-17 2021-07-27 上海云扩信息科技有限公司 Method for extracting character contents in PDF document and picture by computer program
CN113177541B (en) * 2021-05-17 2023-12-19 上海云扩信息科技有限公司 Method for extracting text content in PDF document and picture by computer program
CN113657384A (en) * 2021-09-02 2021-11-16 京东科技控股股份有限公司 Certificate image correction method and device, storage medium and electronic equipment
CN113657384B (en) * 2021-09-02 2024-04-05 京东科技控股股份有限公司 Certificate image correction method and device, storage medium and electronic equipment
CN114332865A (en) * 2022-03-11 2022-04-12 北京锐融天下科技股份有限公司 Certificate OCR recognition method and system
CN117315033A (en) * 2023-11-29 2023-12-29 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117315033B (en) * 2023-11-29 2024-03-19 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium

Also Published As

Publication number Publication date
CN111126125A (en) 2020-05-08
CN111126125B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2021072879A1 (en) Method and apparatus for extracting target text in certificate, device, and readable storage medium
US10311099B2 (en) Method and system for 3D model database retrieval
JP4594372B2 (en) Method for recognizing parameterized shape from document image
US10878173B2 (en) Object recognition and tagging based on fusion deep learning models
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
JP4623676B2 (en) Method, apparatus and storage medium for dynamic connector analysis
WO2019018063A1 (en) Fine-grained image recognition
JP5340441B2 (en) Shape parameterization for editable document generation
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
WO2021017272A1 (en) Pathology image annotation method and device, computer apparatus, and storage medium
JPWO2009060975A1 (en) Feature point arrangement collation apparatus, image collation apparatus, method and program thereof
WO2014123619A1 (en) System and method for identifying similarities in different images
US20180253852A1 (en) Method and device for locating image edge in natural background
CN111290684B (en) Image display method, image display device and terminal equipment
Wang et al. Joint head pose and facial landmark regression from depth images
CN104881657A (en) Profile face identification method and system, and profile face construction method and system
CN111209909B (en) Construction method, device, equipment and storage medium for qualification recognition template
CN112815936B (en) Rapid all-sky-domain star map identification method and system for noise robustness
WO2015068417A1 (en) Image collation system, image collation method, and program
CN109978829B (en) Detection method and system for object to be detected
CN113111687A (en) Data processing method and system and electronic equipment
GB2532537A (en) Aligning multi-view scans
WO2021151274A1 (en) Image file processing method and apparatus, electronic device, and computer readable storage medium
CN114299509A (en) Method, device, equipment and medium for acquiring information
Dantanarayana et al. Object recognition and localization from 3D point clouds by maximum-likelihood estimation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 19949273
  Country of ref document: EP
  Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 EP: PCT application non-entry in European phase
  Ref document number: 19949273
  Country of ref document: EP
  Kind code of ref document: A1