WO2023005253A1 - Method, apparatus and system for training text recognition model framework - Google Patents

Method, apparatus and system for training text recognition model framework

Info

Publication number
WO2023005253A1
WO2023005253A1 PCT/CN2022/085149 CN2022085149W WO2023005253A1 WO 2023005253 A1 WO2023005253 A1 WO 2023005253A1 CN 2022085149 W CN2022085149 W CN 2022085149W WO 2023005253 A1 WO2023005253 A1 WO 2023005253A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
feature
model
fusion
features
Prior art date
Application number
PCT/CN2022/085149
Other languages
French (fr)
Chinese (zh)
Inventor
章成全
吕鹏原
李煜林
庾悦晨
姚锟
韩钧宇
刘经拓
丁二锐
吴甜
王海峰
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to KR1020237005116A (KR20230030005A)
Publication of WO2023005253A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The present disclosure relates to the technical field of artificial intelligence, and in particular to the technical fields of computer vision and deep learning, and can be applied in smart city and smart finance scenarios. Provided are a method, apparatus and system for training a text recognition model framework. The method comprises: performing feature processing on a sample image on the basis of a preset text detection model, so as to obtain at least two types of feature information related to text information in the sample image; performing fusion processing on the at least two types of feature information of the sample image on the basis of a preset feature fusion model, so as to obtain a fused feature of the sample image; and inputting the fused feature into the feature fusion model, and respectively adjusting parameters of the text detection model and the feature fusion model on the basis of the fused feature, so as to obtain a text recognition model framework. The text detection model and the feature fusion model in the text recognition model framework have a relatively high relevance, so as to realize the integrity and comprehensiveness of a training process, thereby improving the accuracy and reliability of the text recognition model framework.

Description

Training method, apparatus and system for a text recognition model framework
This disclosure claims priority to Chinese patent application No. CN202110858410.X, filed with the China Patent Office on July 28, 2021 and entitled "Training method, apparatus and system for text recognition model framework", the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and more specifically to a training method, apparatus and system for a text recognition model framework, which can be applied in smart city and smart finance scenarios.
Background
With the development of artificial intelligence technology, recognition of text information in images has evolved from manual recognition to automatic recognition. For example, a text recognition model framework (which may also be called a structured parsing framework model) is pre-trained to assist in training a text recognition model, and on the basis of this framework a text recognition model is trained to recognize the text information in an image to be recognized.
In the prior art, a text recognition model framework is usually obtained by training based on a text detection model and a feature fusion model. The two are mutually independent models, and the feature fusion model is trained on the offline recognition results of the text detection model.
However, because the text detection model and the feature fusion model are independent of each other during training, the accuracy of the resulting text recognition model framework may be low.
Summary
The present disclosure provides a training method and apparatus for a text recognition model framework that improve the accuracy of the text recognition model framework.
According to a first aspect of the present disclosure, a training method for a text recognition model framework is provided, the method comprising: performing feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image;
performing fusion processing on the at least two types of feature information of the sample image based on a preset feature fusion model to obtain a fused feature of the sample image; and
inputting the fused feature into the feature fusion model, and adjusting parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain a text recognition model framework, wherein the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
According to a second aspect of the present disclosure, a text recognition method is provided, comprising:
acquiring an image to be recognized; and
inputting the image to be recognized into a pre-trained text recognition model to obtain text information in the image to be recognized, wherein the text recognition model is generated by training on images to be trained based on a pre-trained text recognition model framework, the text recognition model framework is obtained by the training method of the first aspect, and the images to be trained include text information.
According to a third aspect of the present disclosure, a training apparatus for a text recognition model framework is provided, the apparatus comprising:
a processing unit, configured to perform feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image;
a fusion unit, configured to perform fusion processing on the at least two types of feature information of the sample image based on a preset feature fusion model to obtain a fused feature of the sample image; and
a training unit, configured to input the fused feature into the feature fusion model, and adjust parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain a text recognition model framework, wherein the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
According to a fourth aspect of the present disclosure, a text recognition apparatus is provided, comprising:
an acquisition unit, configured to acquire an image to be recognized; and
a recognition unit, configured to input the image to be recognized into a pre-trained text recognition model to obtain text information in the image to be recognized, wherein the text recognition model is generated by training on images to be trained based on a pre-trained text recognition model framework, the text recognition model framework is obtained by the training method of the first aspect, and the images to be trained include text information.
According to a fifth aspect of the present disclosure, an electronic device is provided, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the first aspect, or to enable the at least one processor to perform the method of the second aspect.
According to a sixth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions cause a computer to perform the method of the first aspect, or cause the computer to perform the method of the second aspect.
According to a seventh aspect of the present disclosure, a computer program product is provided, comprising a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the electronic device to perform the method of the first aspect, or to cause the electronic device to perform the method of the second aspect.
According to an eighth aspect of the present disclosure, a training system for a text recognition model framework is provided, the system comprising:
a text detection model, configured to perform feature processing on a sample image to obtain at least two types of feature information related to text information in the sample image; and
a feature fusion model, configured to perform fusion processing on the at least two types of feature information of the sample image to obtain a fused feature of the sample image;
wherein the feature fusion model is further configured to adjust parameters of the text detection model and of the feature fusion model respectively, to obtain a text recognition model framework, wherein the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
According to a ninth aspect, an embodiment of the present application provides a computer program comprising program code, wherein, when a computer runs the computer program, the program code performs the method of the first aspect or the second aspect.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a scenario of the training method for a text recognition model framework according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing the training method for a text recognition model framework and the text recognition method of the embodiments of the present disclosure;
FIG. 9 is a schematic diagram according to a seventh embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
Text recognition technology refers to the recognition of text information in images. It is widely used in many fields, such as education, finance, healthcare, transportation and insurance.
For example, in the medical field, text recognition technology can be used to recognize the text information in an image of a medical record. Likewise, in the insurance field, it can be used to recognize the text information in an image of an insurance policy. Further examples are not listed here.
With the development of deep learning within artificial intelligence, deep learning can be combined with other technologies. For example, it can be applied to text recognition to improve the accuracy and reliability of recognizing text information.
For example, a text recognition model for recognizing text information can be trained based on deep learning. Training a text recognition model usually relies on a text recognition model framework: in general, the text recognition model framework is trained first, and the text recognition model is then trained on the basis of that framework.
In the related art, the text recognition model framework is usually obtained by training two mutually independent models, namely a text detection model and a feature fusion model. When the framework is trained, the feature fusion model is trained on the offline recognition results of the text detection model.
Specifically, the text detection model may be an optical character recognition (OCR) model and the feature fusion model may be a Transformer model. The Transformer model is trained on the offline recognition results of the OCR model to obtain the text recognition model framework.
However, because the OCR model and the Transformer model are independent of each other during training, the accuracy of the resulting text recognition model framework may be low.
To avoid the above technical problem, the inventors of the present disclosure arrived, through creative work, at the inventive concept of the present disclosure: a fused feature is obtained based on the text detection model and the feature fusion model, and the feature fusion model, based on the fused feature, trains the text detection model and the feature fusion model as a whole to obtain the text recognition model framework.
Based on the above inventive concept, the present disclosure provides a training method, apparatus and system for a text recognition model framework, which belong to the technical fields of computer vision and deep learning within artificial intelligence and can be applied in smart city and smart finance scenarios, so as to improve the accuracy of the text recognition model framework.
Please refer to FIG. 1, which is a schematic diagram according to a first embodiment of the present disclosure.
As shown in FIG. 1, the training method for a text recognition model framework provided by this embodiment of the present disclosure includes:
S101: Perform feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image.
For example, the execution subject of this embodiment may be a training apparatus for the text recognition model framework (hereinafter, the training apparatus). The training apparatus may be a server (such as a local server or a cloud server), a terminal device, a processor, a chip or the like, which is not limited in this embodiment.
The sample image includes text information. For example, in the medical field, the sample image may be an image of a medical record that includes text information such as the patient's identity and the case details. In the insurance field, the sample image may be an image of an insurance policy that includes text information such as the insurer's identity and the insurance content.
It should be understood that the number of sample images used to train the text recognition model framework can be set by the training apparatus based on requirements, historical records, experiments and the like, which is not limited in this embodiment.
The text detection model is a model that can detect features related to text information in the sample image. For example, in the medical field, the text detection model can detect the text information about the patient's identity in an image of a medical record.
Specifically, the text detection model may be an optical character recognition model.
In this embodiment, the feature information characterizes features related to the text information in the sample image. The at least two types of feature information may include information related to the text content, information related to the visual appearance of the text, information on the spatial relationships between characters, and so on, which are not listed exhaustively here.
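As a rough illustration of the kinds of feature information named above, the features of one text line could be grouped in a small record. This is only a sketch: the class name `TextLineFeatures`, its fields and the `kinds` helper are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextLineFeatures:
    """Hypothetical container for the feature information of one text line."""
    text_feature: List[float]        # content-related information
    visual_feature: List[float]      # appearance-related information (e.g. color, texture)
    box: Tuple[int, int, int, int]   # spatial information: (x0, y0, x1, y1)

    def kinds(self) -> int:
        """Count how many of the three feature types are non-empty."""
        parts = (self.text_feature, self.visual_feature, self.box)
        return sum(1 for part in parts if len(part) > 0)


sample = TextLineFeatures(
    text_feature=[0.1, 0.7],
    visual_feature=[0.3],
    box=(10, 20, 110, 40),
)
```

A record like this makes the "at least two types" requirement easy to check before the features are passed on for fusion.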
S102: Perform fusion processing on the at least two types of feature information of the sample image based on a preset feature fusion model to obtain a fused feature of the sample image.
The feature fusion model is a model that can perform fusion processing on multiple types of feature information. For example, the feature fusion model may be a Transformer model.
The fusion processing may splice, combine or connect the multiple types of feature information, which is not limited in this embodiment. For the detailed fusion process, reference may be made to the related art; details are not repeated here.
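The simplest of the fusion options mentioned above, splicing, can be sketched as plain concatenation of feature vectors. The function name `fuse_by_concatenation` and the example values are assumptions for illustration; the disclosure does not prescribe this implementation, and a Transformer-based fusion model would do considerably more than this.

```python
from typing import List, Sequence


def fuse_by_concatenation(features: Sequence[Sequence[float]]) -> List[float]:
    """Join several feature vectors end to end into one fused vector."""
    if len(features) < 2:
        raise ValueError("fusion expects at least two types of feature information")
    fused: List[float] = []
    for vector in features:
        fused.extend(vector)
    return fused


text_feature = [0.1, 0.7]
visual_feature = [0.3, 0.2, 0.9]
fused = fuse_by_concatenation([text_feature, visual_feature])
```

The fused vector simply keeps every input value in order, so its length is the sum of the input lengths.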
S103: Input the fused feature into the feature fusion model, and adjust the parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain the text recognition model framework.
The text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
In this embodiment, the fused feature is input into the feature fusion model, and both the parameters of the text detection model and the parameters of the feature fusion model are adjusted based on the fused feature, thereby obtaining the text recognition model framework.
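The idea of adjusting both models from the same downstream signal can be sketched with a toy example in which each model is reduced to a single scalar weight. This is an assumption-laden illustration, not the disclosed implementation: the squared-error loss, the learning rate and the function name `joint_step` are all hypothetical, and real models would be neural networks trained with an automatic differentiation library.

```python
from typing import Tuple


def joint_step(a: float, b: float, x: float, target: float,
               lr: float = 0.05) -> Tuple[float, float, float]:
    """One end-to-end update of a toy detector (weight a) and a toy fusion model (weight b)."""
    feature = a * x              # stand-in for the text detection model's output
    output = b * feature         # stand-in for the feature fusion model's output
    error = output - target
    loss = error ** 2
    grad_b = 2 * error * feature   # gradient with respect to the fusion weight
    grad_a = 2 * error * b * x     # chain rule: the gradient reaches the detector through the fusion model
    return a - lr * grad_a, b - lr * grad_b, loss


a, b = 0.5, 0.5
new_a, new_b, loss_before = joint_step(a, b, x=1.0, target=1.0)
_, _, loss_after = joint_step(new_a, new_b, x=1.0, target=1.0)
```

Because the loss is computed after the fusion stage, a single update changes both weights, which is the contrast with training the fusion model on fixed offline outputs of the detector.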
It can be understood that training the text recognition model framework is an iterative process, in which the parameters of the text detection model and of the feature fusion model are adjusted repeatedly. When the number of iterations reaches a preset threshold, or the loss function value at an iteration falls below a preset loss threshold, training has met the requirement and the text recognition model framework is obtained.
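The two stopping conditions described above can be sketched as a small loop. The helper `train_until_done` and the toy `halve_loss` step are hypothetical stand-ins: in practice `step` would perform one joint parameter update and return the current loss.

```python
from typing import Callable, Tuple


def train_until_done(step: Callable[[], float], max_iterations: int,
                     loss_threshold: float) -> Tuple[int, float]:
    """Call `step` repeatedly; stop at the iteration cap or once the loss falls below the threshold."""
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations and loss >= loss_threshold:
        loss = step()
        iterations += 1
    return iterations, loss


# Toy step: the loss halves on every call (a stand-in for a real parameter update).
state = {"loss": 1.0}


def halve_loss() -> float:
    state["loss"] /= 2
    return state["loss"]


iterations, final_loss = train_until_done(halve_loss, max_iterations=100, loss_threshold=0.01)
```

With these toy values the loop ends on the loss condition well before the iteration cap; a step whose loss never improves would instead run until `max_iterations`.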
Based on the above analysis, this embodiment of the present disclosure provides a training method for a text recognition model framework. The method includes: performing feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image; performing fusion processing on the at least two types of feature information based on a preset feature fusion model to obtain a fused feature of the sample image; and inputting the fused feature into the feature fusion model and adjusting the parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain a text recognition model framework that includes the adjusted text detection model and the adjusted feature fusion model.
This embodiment thus introduces the technical feature of adjusting the parameters of both the text detection model and the feature fusion model based on the fused feature to obtain the text recognition model framework. As a result, the text detection model and the feature fusion model in the framework are closely related, and the training process is complete and comprehensive. This avoids the drawback of the related art, in which the two models are independent of each other, the framework is trained without consideration of the whole, and its accuracy is therefore low; the accuracy and reliability of the text recognition model framework are improved.
Please refer to FIG. 2, which is a schematic diagram according to a second embodiment of the present disclosure.
As shown in FIG. 2, the training method for a text recognition model framework provided by this embodiment of the present disclosure includes:
S201: Determine position information of text lines in the sample image based on the text detection model, and determine the at least two types of feature information according to the position information.
Features of this embodiment that are the same as in the first embodiment are not repeated here.
Based on the above analysis, the training method of this embodiment can be applied in different fields, such as insurance and healthcare. This embodiment is described below by way of example with the training method applied in the insurance field.
As shown in FIG. 3, the sample image is an image of an insurance policy that includes text information such as "Name: XXX", "Insurance type: XXXXXX" and "Insurance period: XXXX".
In some embodiments, the sample image may be transferred to the training apparatus by scanning, and the text detection model in the training apparatus determines the position information of the text lines in the sample image.
In other embodiments, as shown in FIG. 3, the training apparatus may be connected to an external device (such as a storage device) and receive the sample image from it, so that the text detection model in the training apparatus determines the position information of the text lines in the sample image.
A text line is a line in which text information is located. The position information of a text line is information about the position of that line, and may specifically be the coordinates of the line in the sample image.
For example, when processing the sample image, the text detection model may frame each text line in the sample image with a preset rectangular box and determine the coordinates of that box in the sample image.
In this embodiment, determining the position information of the text lines in the sample image, and then determining the at least two types of feature information from the sample image based on that position information, allows the feature information to be located with relatively high accuracy, which improves the accuracy and reliability of the at least two types of feature information.
In some embodiments, determining the at least two types of feature information according to the position information includes: cropping the sample image according to the position information to obtain a text region, and obtaining the at least two types of feature information from the text region.
For example, continuing the above embodiment, once the position information has been determined, the region framed by the rectangular box can be cropped from the sample image based on the position information. This region is the text region, and the at least two types of feature information are obtained by recognizing the text information in it.
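The cropping step can be sketched on an image stored as nested lists of pixel values. The function name `crop_text_region`, the `(x0, y0, x1, y1)` box layout and the exclusive upper bounds are assumptions made for this example; the disclosure only states that the box's coordinates in the sample image are determined.

```python
from typing import List, Tuple

Image = List[List[int]]           # rows of pixel values
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), with x1 and y1 exclusive


def crop_text_region(image: Image, box: Box) -> Image:
    """Return the part of `image` covered by the rectangular box."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("box lies outside the image")
    return [row[x0:x1] for row in image[y0:y1]]


image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
]
region = crop_text_region(image, (1, 1, 3, 3))
```

Validating the box before slicing avoids silently returning a truncated region when the detected coordinates fall outside the image.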
在本实施例中,基于位置信息从样本图像中裁剪得到文本区域,可以使得文本区域中包括几乎全量的文本信息,避免文本信息的遗漏,也使得裁剪操作具有较高的准确性,从而使得文本区域具有较高的准确性和可靠性,进而使得基于文本区域确定的至少两种特征信息具有较高的全面性和可靠性的技术效果。In this embodiment, the text area is cut out from the sample image based on the position information, which can make the text area include almost all the text information, avoid the omission of the text information, and also make the cutting operation have higher accuracy, so that the text The region has relatively high accuracy and reliability, thereby making the at least two types of feature information determined based on the text region have relatively high comprehensiveness and reliability.
In some embodiments, obtaining the at least two types of feature information from the text region includes: extracting image features of the sample image from the text region, and recognizing the image features to obtain the at least two types of feature information.
Image features can be understood along two broad dimensions: a content dimension and an appearance dimension. In this embodiment, the sample image is an image that includes text information. Image features of the content dimension are those related to the content of the text information, such as the text content itself; image features of the appearance dimension are those related to the color, texture, and similar properties of the text information.
Therefore, in this embodiment, the at least two types of feature information may include two types of feature information determined respectively from the two broad dimensions (that is, the content dimension and the appearance dimension). Of course, in view of the analysis in the first embodiment above, the two broad dimensions may also be split into finer dimensions, with three or more types of feature information determined from those finer dimensions; this embodiment imposes no limitation.
In this embodiment, because the text region is accurate and comprehensive, the image features extracted from it are also accurate and comprehensive. In addition, when the image features are recognized to obtain the feature information, they can be analyzed along multiple dimensions, yielding feature information in multiple dimensions. The accuracy, comprehensiveness, and reliability of the feature information are thereby improved.
In some embodiments, the at least two types of feature information include text features and visual features.
The text features can be understood as feature information of the content dimension, and the visual features as feature information of the appearance dimension.
S202: Fuse the text features and the visual features based on a preset feature fusion model to obtain a fused feature of the sample image.
For the implementation principle of S202, reference may be made to the first embodiment; details are not repeated here.
S203: Construct a plurality of text feature blocks for representing the text features, and construct a plurality of visual feature blocks for representing the visual features.
For example, a plurality of text feature blocks having a mapping relationship with the text features are constructed, so that the text feature blocks can be used to represent the text features.
Exemplarily, the number of text feature blocks may be determined based on requirements, historical records, experiments, and the like, and the text features are mapped into the text feature blocks, which then represent the text features.
Specifically, each text feature block may be a 2*2 (pixel) feature block, and the text features may be split according to their semantic information and stored in a plurality of 2*2 (pixel) feature blocks, thereby obtaining the plurality of text feature blocks.
The semantic information can be understood as information about the field classification of the text information, as information about the relative positions of fields in the text information, or as information about the meaning that the text information represents.
Similarly, for the principle and implementation of constructing the plurality of visual feature blocks for representing the visual features, reference may be made to the construction of the plurality of text feature blocks described above; details are not repeated here.
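The construction of 2*2 feature blocks can be illustrated with a short sketch. Note the assumption: the description splits the features according to their semantic information, whereas this sketch substitutes plain spatial tiling of a two-dimensional feature map, because the semantic splitting rule is not specified. The names split_into_blocks and FeatureMap are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # assumption: a 2-D grid of feature values


def split_into_blocks(feature_map: FeatureMap, size: int = 2) -> List[FeatureMap]:
    """Tile `feature_map` into non-overlapping size x size blocks, row by row.

    Spatial tiling stands in for the semantic splitting of the description.
    """
    height = len(feature_map)
    width = len(feature_map[0]) if height else 0
    if height % size or width % size:
        raise ValueError("feature map dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in feature_map[top:top + size]])
    return blocks


# A 4 x 4 toy text feature map yields four 2 x 2 text feature blocks.
text_feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
text_blocks = split_into_blocks(text_feature)
```

The same function could be applied to a visual feature map to obtain visual feature blocks, again only as an illustration.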
S204: The feature fusion model adjusts the parameters of the text detection model and of the feature fusion model respectively according to the fused feature and the plurality of text feature blocks; and/or the feature fusion model adjusts the parameters of the text detection model and of the feature fusion model respectively according to the fused feature and the plurality of visual feature blocks.
In one example, the parameters of the text detection model and the parameters of the feature fusion model may be adjusted by combining the fused feature with the plurality of text feature blocks.
In some embodiments, adjusting the parameters of the text detection model and the parameters of the feature fusion model by combining the fused feature with the plurality of text feature blocks may include the following steps:
First step: The feature fusion model randomly masks part of the text features in the fused feature, and predicts and completes the masked part according to the plurality of text feature blocks, obtaining predicted and completed partial text features.
As the above analysis shows, training the text recognition model framework is an iterative process. During training, therefore, the part of the text features in the fused feature that is randomly masked in the current iteration differs from the part that was randomly masked in the previous iteration.
Exemplarily, the parts of the text features that are randomly masked in different iterations are completely different.
For example, in the first iteration, the masked part is the first six percent of the text features; in the second iteration, the masked part is the text features between the first six percent and the first twelve percent; and so on, without listing every case.
As another example, in the first iteration, the masked part is six percent of the text features; in the second iteration, the masked part is six percent of the text features, excluding those masked in the first iteration.
Exemplarily, the parts of the text features that are randomly masked in different iterations are not completely the same.
For example, six percent of the text features are masked in the first iteration and six percent are masked in the second iteration, and the two masked parts share some text features.
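The masking schedules in the examples above can be sketched as index selections over the text features. This is an illustration only: it assumes the text features form a flat sequence addressed by integer index, and the function names sliding_mask and random_mask are hypothetical.

```python
import random
from typing import List, Optional, Set


def sliding_mask(num_features: int, iteration: int, ratio: float = 0.06) -> List[int]:
    """Consecutive windows: first 6 percent, then 6 to 12 percent, and so on."""
    window = max(1, round(num_features * ratio))
    start = (iteration * window) % num_features
    return [(start + k) % num_features for k in range(window)]


def random_mask(num_features: int, ratio: float, rng: random.Random,
                exclude: Optional[Set[int]] = None) -> List[int]:
    """Random selection; passing `exclude` keeps it disjoint from an earlier mask.

    Raises ValueError if fewer candidates remain than positions to mask.
    """
    window = max(1, round(num_features * ratio))
    banned = exclude or set()
    candidates = [i for i in range(num_features) if i not in banned]
    return sorted(rng.sample(candidates, window))


rng = random.Random(0)
# Completely different masks, consecutive windows.
first = sliding_mask(100, iteration=0)
second = sliding_mask(100, iteration=1)
# Completely different masks, random positions.
first_random = random_mask(100, 0.06, rng)
second_disjoint = random_mask(100, 0.06, rng, exclude=set(first_random))
# Masks that are allowed to share positions with earlier iterations.
second_overlap_allowed = random_mask(100, 0.06, rng)
```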
Based on the above analysis, the plurality of text feature blocks can represent the text features. Therefore, after part of the text features is masked, the masked part can be predicted and completed based on the plurality of text feature blocks, obtaining the predicted and completed partial text features.
For example, suppose the text feature in the fused feature is A, the masked part of text feature A is a1, and the remaining unmasked part is a2. The training apparatus can then infer the content of the masked partial text features (that is, the predicted and completed partial text features) based on the plurality of text feature blocks and the unmasked partial text features a2.
Second step: Adjust the parameters of the text detection model and of the feature fusion model respectively according to the predicted and completed partial text features and the features in the fused feature other than the masked partial text features.
In combination with the above embodiment, this step can be understood as follows: the training apparatus adjusts the parameters of the text detection model and the parameters of the feature fusion model according to the inferred content of the partial text features (that is, the predicted and completed partial text features) and the partial text features a2.
In this embodiment, part of the text features in the fused feature is masked, the masked part is predicted and completed based on the plurality of text feature blocks, and the parameters of the two models (that is, the text detection model and the feature fusion model) are adjusted respectively based on the predicted and completed partial text features. This fully takes into account the associations among the parts of the text features (both associations in textual content and associations in position), which improves the recognition and discrimination capabilities of the two models and thus the accuracy and reliability of the trained text recognition model framework.
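The two steps above (masking a1, then predicting it from the text feature blocks and a2) can be sketched as the computation of a training signal. Everything specific here is an assumption made for illustration: the description does not name a loss function, so a mean squared reconstruction error is used, and the placeholder mean_predictor stands in for the feature fusion model. All names are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a predictor maps (visible part a2, text feature blocks) to one value.
Predictor = Callable[[List[float], List[List[float]]], float]


def masked_reconstruction_loss(text_feature: Sequence[float],
                               masked_idx: Sequence[int],
                               text_blocks: List[List[float]],
                               predict: Predictor) -> float:
    """Mean squared error between predicted and true values at masked positions."""
    masked = set(masked_idx)
    if not masked:
        raise ValueError("at least one position must be masked")
    # a2: the part of text feature A that stays visible.
    visible = [v for i, v in enumerate(text_feature) if i not in masked]
    errors = []
    for i in masked:
        guess = predict(visible, text_blocks)  # fills in one element of a1
        errors.append((guess - text_feature[i]) ** 2)
    return sum(errors) / len(errors)


def mean_predictor(visible: List[float], text_blocks: List[List[float]]) -> float:
    """Placeholder predictor: average over visible values and block values."""
    pool = list(visible) + [v for block in text_blocks for v in block]
    return sum(pool) / len(pool)


A = [1.0, 2.0, 3.0, 4.0]           # text feature A
blocks = [[1.0, 2.0], [3.0, 4.0]]  # text feature blocks representing A
loss = masked_reconstruction_loss(A, [3], blocks, mean_predictor)
```

In an actual training loop, a value of this kind would be the quantity that is minimized when the parameters of the two models are adjusted; a smaller value means the masked part was completed more faithfully.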
In other embodiments, adjusting the parameters of the text detection model and the parameters of the feature fusion model by combining the fused feature with the plurality of text feature blocks may include the following steps:
First step: The feature fusion model replaces the text features in the fused feature according to at least some of the plurality of text feature blocks, obtaining replaced text features.
Second step: Adjust the parameters of the text detection model and of the feature fusion model respectively according to the visual features in the fused feature and the replaced text features.
The replacement of the text features in the fused feature in this embodiment may be a full replacement or a partial replacement; this embodiment imposes no limitation.
For the principle of replacing the text features in the fused feature, reference may be made to the principle of masking part of the text features in the fused feature in the above embodiment; details are not repeated here.
Similarly, in this embodiment, the text features in the fused feature are replaced, and the parameters of the two models (that is, the text detection model and the feature fusion model) are adjusted respectively based on the replaced text features and the visual features in the fused feature. This improves the recognition and discrimination capabilities of the two models and thus the accuracy and reliability of the trained text recognition model framework.
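Full and partial replacement can be illustrated with a short sketch. It assumes, purely for illustration, that the text features form a flat sequence and that replacement values are drawn at random from the values held in the text feature blocks; the description does not specify how replacement values are chosen. The name replace_text_features is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def replace_text_features(text_feature: Sequence[float],
                          block_values: Sequence[float],
                          ratio: float,
                          rng: random.Random) -> Tuple[List[float], List[int]]:
    """Replace a fraction `ratio` of positions with values taken from the blocks.

    ratio = 1.0 is a full replacement; 0 < ratio < 1 is a partial replacement.
    Returns the replaced feature and the sorted positions that were replaced.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    count = round(len(text_feature) * ratio)
    positions = sorted(rng.sample(range(len(text_feature)), count))
    replaced = list(text_feature)
    for pos in positions:
        replaced[pos] = rng.choice(block_values)
    return replaced, positions


rng = random.Random(0)
original = [0.1, 0.2, 0.3, 0.4]
block_values = [9.0, 8.0, 7.0]
full, full_pos = replace_text_features(original, block_values, 1.0, rng)
partial, partial_pos = replace_text_features(original, block_values, 0.5, rng)
```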
In another example, the parameters of the text detection model and the parameters of the feature fusion model may be adjusted by combining the fused feature with the plurality of visual feature blocks.
In some embodiments, adjusting the parameters of the text detection model and the parameters of the feature fusion model by combining the fused feature with the plurality of visual feature blocks may include the following steps:
First step: The feature fusion model randomly masks part of the visual features in the fused feature, and predicts and completes the masked part according to the plurality of visual feature blocks, obtaining predicted and completed partial visual features.
Second step: Adjust the parameters of the text detection model and of the feature fusion model respectively according to the predicted and completed partial visual features and the features in the fused feature other than the masked partial visual features.
For the implementation principle of this embodiment, reference may be made to the implementation principle of combining the fused feature with the plurality of text feature blocks in the above embodiments; details are not repeated here.
Similarly, in this embodiment, part of the visual features in the fused feature is masked, the masked part is predicted and completed based on the plurality of visual feature blocks, and the parameters of the two models (that is, the text detection model and the feature fusion model) are adjusted respectively based on the predicted and completed partial visual features. This improves the recognition and discrimination capabilities of the two models and thus the accuracy and reliability of the trained text recognition model framework.
In other embodiments, adjusting the parameters of the text detection model and the parameters of the feature fusion model by combining the fused feature with the plurality of visual feature blocks may include the following steps:
First step: The feature fusion model replaces the visual features in the fused feature according to at least some of the plurality of visual feature blocks, obtaining replaced visual features.
Second step: Adjust the parameters of the text detection model and of the feature fusion model respectively according to the text features in the fused feature and the replaced visual features.
For the implementation principle of this embodiment, reference may be made to the implementation principle of combining the fused feature with the plurality of text feature blocks in the above embodiments; details are not repeated here.
Similarly, in this embodiment, the visual features in the fused feature are replaced, and the parameters of the two models (that is, the text detection model and the feature fusion model) are adjusted respectively based on the replaced visual features and the text features in the fused feature. This improves the recognition and discrimination capabilities of the two models and thus the accuracy and reliability of the trained text recognition model framework.
In yet another example, the parameters of the text detection model and the parameters of the feature fusion model may be adjusted by combining the fused feature, the plurality of text feature blocks, and the plurality of visual feature blocks.
For example, this example may include the following steps:
First step: The feature fusion model determines, according to the fused feature and the plurality of text feature blocks, a first adjustment task result for adjusting the text detection model and the feature fusion model.
Second step: The feature fusion model determines, according to the fused feature and the plurality of visual feature blocks, a second adjustment task result for adjusting the text detection model and the feature fusion model.
Third step: Adjust the parameters of the text detection model and of the feature fusion model respectively according to weighted average information of the first adjustment task result and the second adjustment task result.
In combination with the above embodiments, in some embodiments, masking part of the text features in the fused feature may serve as a first training task, yielding a first training result; masking part of the visual features in the fused feature may serve as a second training task, yielding a second training result; and replacing the text features in the fused feature may serve as a third training task, yielding a third training result. The first, second, and third training results are weighted and averaged to obtain the final parameters for adjusting the text detection model and the final parameters for adjusting the feature fusion model. The parameters of the text detection model are then adjusted based on the former, and the parameters of the feature fusion model are adjusted based on the latter.
In other embodiments, masking part of the text features in the fused feature may serve as a first training task, yielding a first training result; masking part of the visual features in the fused feature may serve as a second training task, yielding a second training result; and replacing the visual features in the fused feature may serve as a third training task, yielding a third training result. The first, second, and third training results are weighted and averaged to obtain the final parameters for adjusting the text detection model and the final parameters for adjusting the feature fusion model. The parameters of the text detection model are then adjusted based on the former, and the parameters of the feature fusion model are adjusted based on the latter.
The training tasks in this embodiment may be combined as the three training tasks described above. It should be understood that these combinations of three training tasks are merely illustrative and do not limit how tasks may be combined; other combinations are also possible and are not listed one by one here. Two training tasks may also be combined; for the implementation principle, reference may be made to that of combining three training tasks, and details are not repeated here.
In this embodiment, the text recognition model framework is trained in a multi-task manner: training with the fused feature and the plurality of text feature blocks serves as one training task, and training with the fused feature and the plurality of visual feature blocks serves as another. The final parameters for adjusting the text detection model and the feature fusion model are determined from the results of both training tasks, and the parameters of the two models are adjusted accordingly. Adjusting the parameters of the text detection model and the feature fusion model through multi-task training improves the accuracy and reliability of the adjustment.
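The weighted averaging of training task results can be sketched as follows. This is an illustration under assumptions: each training result is taken to be a single scalar (for example, a per-task loss), and the weights are hypothetical, since the description does not give their values. The name weighted_average is also hypothetical.

```python
from typing import Sequence


def weighted_average(task_results: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-task results into one value using normalized weights."""
    if len(task_results) != len(weights):
        raise ValueError("one weight is required per task result")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(task_results, weights)) / total


# Hypothetical results of three tasks: masking text features, masking visual
# features, and replacing text features; the weights are assumptions.
combined = weighted_average([0.9, 0.6, 0.3], [2.0, 1.0, 1.0])
```

With two training tasks instead of three, the same function applies unchanged to two results and two weights.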
It is worth noting that, in this embodiment, adjusting the parameters of the text detection model and the feature fusion model in different ways improves the flexibility and diversity of parameter adjustment, and thus the flexibility and diversity of training the text recognition model framework.
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure. As shown in FIG. 4, the text recognition method of this embodiment includes:
S401: Acquire an image to be recognized.
S402: Input the image to be recognized into a pre-trained text recognition model to obtain the text information in the image to be recognized.
The text recognition model is generated by training on images to be trained, on the basis of a pre-trained text recognition model framework; the text recognition model framework is obtained by the training method described in any of the above embodiments; and the images to be trained include text information.
As the above analysis shows, the text recognition model framework includes the text detection model and the feature fusion model and has high accuracy and reliability. Therefore, a text recognition model trained on the basis of the text recognition model framework is itself highly accurate and reliable, and recognizing the image to be recognized with that text recognition model improves the effectiveness and reliability of recognition.
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure. As shown in FIG. 5, the apparatus 500 for training a text recognition model framework of this embodiment includes:
A processing unit 501, configured to perform feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image.
A fusion unit 502, configured to fuse the at least two types of feature information of the sample image based on a preset feature fusion model to obtain a fused feature of the sample image.
A training unit 503, configured to input the fused feature into the feature fusion model and adjust the parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain the text recognition model framework, where the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure. As shown in FIG. 6, the apparatus 600 for training a text recognition model framework of this embodiment includes:
A processing unit 601, configured to perform feature processing on a sample image based on a preset text detection model to obtain at least two types of feature information related to text information in the sample image.
As can be seen from FIG. 6, in some embodiments, the processing unit 601 includes:
A first determining subunit 6011, configured to determine position information of text lines in the sample image based on the text detection model.
A second determining subunit 6012, configured to determine the at least two types of feature information according to the position information.
In some embodiments, the second determining subunit 6012 includes:
A cropping module, configured to crop the sample image according to the position information to obtain a text region.
An obtaining module, configured to obtain the at least two types of feature information from the text region.
In some embodiments, the obtaining module is configured to extract image features of the sample image from the text region and recognize the image features to obtain the at least two types of feature information.
A fusion unit 602, configured to fuse the at least two types of feature information of the sample image based on a preset feature fusion model to obtain a fused feature of the sample image.
The at least two types of feature information include text features and visual features.
A construction unit 603, configured to construct a plurality of text feature blocks for representing the text features, and construct a plurality of visual feature blocks for representing the visual features.
A training unit 604, configured to input the fused feature into the feature fusion model and adjust the parameters of the text detection model and of the feature fusion model respectively based on the fused feature, to obtain the text recognition model framework, where the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
As can be seen from FIG. 6, in some embodiments, the training unit 604 includes:
A first masking subunit 60411, configured to randomly mask part of the text features in the fused feature.
A first prediction and completion subunit 60412, configured to predict and complete the masked partial text features according to the plurality of text feature blocks, obtaining predicted and completed partial text features.
A first adjustment subunit 60413, configured to adjust the parameters of the text detection model and of the feature fusion model respectively according to the predicted and completed partial text features and the features in the fused feature other than the masked partial text features.
In other embodiments, the training unit 604 includes:
A second masking subunit 60414, configured to randomly mask part of the visual features in the fused feature.
A second prediction and completion subunit 60415, configured to predict and complete the masked partial visual features according to the plurality of visual feature blocks, obtaining predicted and completed partial visual features.
A second adjustment subunit 60416, configured to adjust the parameters of the text detection model and of the feature fusion model respectively according to the predicted and completed partial visual features and the features in the fused feature other than the masked partial visual features.
结合图6可知,在一些实施例中,训练单元604还可以包括:As can be seen from FIG. 6, in some embodiments, the training unit 604 may further include:
第一替换子单元60417,用于根据多个文本特征块中的至少部分文本特征块,对融合特征中的文本特征进行替换处理,得到替换后的文本特征。The first replacement subunit 60417 is configured to replace the text features in the fusion feature according to at least part of the text feature blocks in the plurality of text feature blocks to obtain replaced text features.
第三调整子单元60418,用于根据融合特征中的视觉特征、以及替换后的文本特征,对文本检测模型和特征融合模型的参数分别进行调整。The third adjustment subunit 60418 is used to adjust the parameters of the text detection model and the feature fusion model respectively according to the visual features in the fused features and the replaced text features.
在另一些实施例中,训练单元604包括:In other embodiments, the training unit 604 includes:
第二替换子单元60419,用于根据多个视觉特征块中的至少部分视觉特征块,对融合特征中的视觉特征进行替换处理,得到替换后的视觉特征。The second replacement subunit 60419 is configured to replace the visual features in the fusion feature according to at least part of the visual feature blocks in the plurality of visual feature blocks to obtain the replaced visual features.
第四调整子单元60420,用于根据融合特征中的文本特征、以及替换后的视觉特征,对文本检测模型和特征融合模型的参数分别进行调整。The fourth adjustment subunit 60420 is used to adjust the parameters of the text detection model and the feature fusion model respectively according to the text features in the fusion features and the replaced visual features.
As can be seen from FIG. 6, in some embodiments, if the feature fusion model adjusts the parameters of the text detection model and the feature fusion model respectively according to the fusion feature, the plurality of text feature blocks, and the plurality of visual feature blocks, the training unit 604 may further include:
The third determination subunit 60421 is configured to determine, according to the fusion feature and the plurality of text feature blocks, a first adjustment task result for adjusting the text detection model and the feature fusion model.
The fourth determination subunit 60422 is configured to determine, according to the fusion feature and the plurality of visual feature blocks, a second adjustment task result for adjusting the text detection model and the feature fusion model.
The fifth adjustment subunit 60423 is configured to adjust the parameters of the text detection model and the feature fusion model respectively, according to weighted average information of the first adjustment task result and the second adjustment task result.
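As a small illustration of the weighted average used by the fifth adjustment subunit, the sketch below combines two scalar task results. It assumes each adjustment task result is a scalar loss and that the weights are free hyperparameters; the function name and the default weights are hypothetical.

```python
def weighted_task_loss(first_result, second_result, first_weight=0.5, second_weight=0.5):
    """Weighted average of the text-side and visual-side adjustment task results."""
    total_weight = first_weight + second_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return (first_weight * first_result + second_weight * second_result) / total_weight


# Equal weights give the plain mean of the two task results.
combined = weighted_task_loss(2.0, 4.0)
```

Dividing by the weight sum keeps the combined value on the same scale as the individual results, so changing the weights shifts emphasis between the two tasks without rescaling the overall signal.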
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in FIG. 7, the text recognition apparatus 700 of this embodiment includes:
The acquiring unit 701 is configured to acquire an image to be recognized, the image including text information.
The recognition unit 702 is configured to input the image to be recognized into a pre-trained text recognition model to obtain the text information in the image to be recognized, where the text recognition model is generated by training on images to be trained based on a pre-trained text recognition model framework, the text recognition model framework is obtained by the training method described in any one of the above embodiments, and the images to be trained include text information.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device and a readable storage medium.
According to embodiments of the present disclosure, the present disclosure further provides a computer program product. The computer program product includes a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the solution provided by any one of the above embodiments.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.
As shown in FIG. 8, the electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components of the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 executes the methods and processes described above, such as the training method of the text recognition model framework and the text recognition method. For example, in some embodiments, the training method of the text recognition model framework and the text recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the text recognition model framework and the text recognition method described above can be performed. Alternatively, in other embodiments, the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute the training method of the text recognition model framework and the text recognition method.
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes back-end components (for example, a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
According to another aspect of the embodiments of the present disclosure, the embodiments of the present disclosure further provide a training system for a text recognition model framework, including:
A text detection model, configured to perform feature processing on a sample image to obtain at least two types of feature information related to the text information in the sample image.
A feature fusion model, configured to perform fusion processing on the at least two types of feature information of the sample image to obtain a fusion feature of the sample image.
The feature fusion model is further configured to adjust the parameters of the text detection model and the feature fusion model respectively, to obtain a text recognition model framework, where the text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
Based on the above analysis, in some embodiments, the text detection model may be an optical character recognition model, and the feature fusion model may be a Transformer model.
As can be seen from FIG. 9, in some embodiments, the training system 900 for a text recognition model framework according to an embodiment of the present disclosure includes:
An optical character recognition model 901, configured to detect the text information in the sample image, obtain position information of the text lines in the sample image, and transmit the position information to the region feature extractor 902.
A region feature extractor 902, configured to crop the sample image according to the position information to obtain text regions, and transmit the text regions to the text recognizer 903 and the visual recognizer 904 respectively.
A text recognizer 903, configured to determine the text features in the text regions and transmit the text features to the Transformer model 905.
A visual recognizer 904, configured to determine the visual features in the text regions and transmit the visual features to the Transformer model 905.
A Transformer model 905, configured to fuse the text features and the visual features to obtain a fusion feature, and to adjust the parameters of the optical character recognition model 901 and the parameters of the Transformer model 905 based on the fusion feature, thereby obtaining the text recognition model framework.
The text recognition model framework includes the adjusted text detection model and the adjusted feature fusion model.
It should be understood that the components in the above embodiment may be integrated into one unit or formed independently, which is not limited in this embodiment.
For example, the optical character recognition model, the region feature extractor, the text recognizer, and the visual recognizer may be mutually independent components. As another example, the region feature extractor may be a component integrated in the optical character recognition model and independent of the text recognizer and the visual recognizer, which are themselves two independent components. Other arrangements are not enumerated here.
For the implementation principles of the above features, reference may be made to the descriptions in the above method embodiments, which are not repeated here.
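The data flow through components 901 to 905 can be sketched as a simple pipeline. This is only an illustration of how the components hand data to one another, under several assumptions: images are nested lists of pixel values, a text line position is a `(top, left, bottom, right)` box, and each component is an arbitrary callable. The class `TrainingPipeline` and the toy callables are hypothetical, and the parameter adjustment performed by the Transformer model is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (top, left, bottom, right)
Image = List[List[int]]
Features = List[List[float]]


def crop(image: Image, box: Box) -> Image:
    """Region feature extractor 902: cut one text region out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


@dataclass
class TrainingPipeline:
    detect: Callable[[Image], List[Box]]               # stands in for OCR model 901
    text_recognizer: Callable[[Image], List[float]]    # stands in for 903
    visual_recognizer: Callable[[Image], List[float]]  # stands in for 904
    fuse: Callable[[Features, Features], Features]     # stands in for 905

    def forward(self, image: Image) -> Features:
        regions = [crop(image, box) for box in self.detect(image)]
        text_features = [self.text_recognizer(r) for r in regions]
        visual_features = [self.visual_recognizer(r) for r in regions]
        return self.fuse(text_features, visual_features)


def flatten(region: Image) -> List[int]:
    return [pixel for row in region for pixel in row]


# Toy stand-ins so the sketch runs end to end.
pipeline = TrainingPipeline(
    detect=lambda image: [(0, 0, 2, 4), (2, 0, 4, 4)],
    text_recognizer=lambda region: [float(sum(flatten(region)))],
    visual_recognizer=lambda region: [sum(flatten(region)) / len(flatten(region))],
    fuse=lambda text, visual: [t + v for t, v in zip(text, visual)],
)
sample_image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
fused = pipeline.forward(sample_image)
```

Passing the components in as callables reflects the statement above that they may be integrated or independent: the region feature extractor could equally be folded into the `detect` callable without changing the rest of the pipeline.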
According to another aspect of the embodiments of the present application, the embodiments of the present application further provide a computer program including program code. When a computer runs the computer program, the program code executes the method described in any one of the above embodiments.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution provided by the present disclosure can be achieved; no limitation is imposed herein.
The specific implementations described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (27)

  1. A method for training a text recognition model framework, the method comprising:
    performing feature processing on a sample image based on a preset text detection model, to obtain at least two types of feature information related to text information in the sample image;
    performing fusion processing on the at least two types of feature information of the sample image based on a preset feature fusion model, to obtain a fusion feature of the sample image; and
    inputting the fusion feature into the feature fusion model, and adjusting parameters of the text detection model and the feature fusion model respectively based on the fusion feature, to obtain a text recognition model framework, wherein the text recognition model framework comprises the adjusted text detection model and the adjusted feature fusion model.
  2. The method according to claim 1, wherein performing feature processing on the sample image based on the preset text detection model, to obtain the at least two types of feature information related to the text information in the sample image, comprises:
    determining position information of text lines in the sample image based on the text detection model, and determining the at least two types of feature information according to the position information.
  3. The method according to claim 2, wherein determining the at least two types of feature information according to the position information comprises:
    cropping the sample image according to the position information to obtain a text region, and obtaining the at least two types of feature information from the text region.
  4. The method according to claim 3, wherein obtaining the at least two types of feature information from the text region comprises:
    extracting image features of the sample image from the text region, and recognizing the image features to obtain the at least two types of feature information.
  5. The method according to any one of claims 2 to 4, wherein the at least two types of feature information comprise a text feature and a visual feature, and after determining the at least two types of feature information according to the position information, the method further comprises:
    constructing a plurality of text feature blocks for characterizing the text feature, and constructing a plurality of visual feature blocks for characterizing the visual feature;
    and wherein inputting the fusion feature into the feature fusion model, and adjusting the parameters of the text detection model and the feature fusion model respectively based on the fusion feature, comprises: adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of text feature blocks; and/or adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of visual feature blocks.
  6. The method according to claim 5, wherein adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of text feature blocks comprises:
    randomly covering, by the feature fusion model, partial text features in the fusion feature, and performing predictive completion on the covered partial text features according to the plurality of text feature blocks, to obtain predictively completed partial text features; and
    adjusting the parameters of the text detection model and the feature fusion model respectively, according to the predictively completed partial text features and the features in the fusion feature other than the covered partial text features.
  7. The method according to claim 5, wherein adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of visual feature blocks comprises:
    randomly covering, by the feature fusion model, partial visual features in the fusion feature, and performing predictive completion on the covered partial visual features according to the plurality of visual feature blocks, to obtain predictively completed partial visual features; and
    adjusting the parameters of the text detection model and the feature fusion model respectively, according to the predictively completed partial visual features and the features in the fusion feature other than the covered partial visual features.
  8. The method according to claim 5, wherein adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of text feature blocks comprises:
    replacing, by the feature fusion model, the text features in the fusion feature according to at least some of the plurality of text feature blocks, to obtain replaced text features; and
    adjusting the parameters of the text detection model and the feature fusion model respectively, according to the visual features in the fusion feature and the replaced text features.
  9. The method according to claim 5, wherein adjusting, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of visual feature blocks comprises:
    replacing, by the feature fusion model, the visual features in the fusion feature according to at least some of the plurality of visual feature blocks, to obtain replaced visual features; and
    adjusting the parameters of the text detection model and the feature fusion model respectively, according to the text features in the fusion feature and the replaced visual features.
  10. The method according to any one of claims 5 to 9, wherein, if the feature fusion model adjusts the parameters of the text detection model and the feature fusion model respectively according to the fusion feature, the plurality of text feature blocks, and the plurality of visual feature blocks, adjusting the parameters of the text detection model and the feature fusion model respectively comprises:
    determining, by the feature fusion model according to the fusion feature and the plurality of text feature blocks, a first adjustment task result for adjusting the text detection model and the feature fusion model;
    determining, by the feature fusion model according to the fusion feature and the plurality of visual feature blocks, a second adjustment task result for adjusting the text detection model and the feature fusion model; and
    adjusting the parameters of the text detection model and the feature fusion model respectively, according to weighted average information of the first adjustment task result and the second adjustment task result.
  11. A text recognition method, comprising:
    acquiring an image to be recognized; and
    inputting the image to be recognized into a pre-trained text recognition model to obtain text information in the image to be recognized, wherein the text recognition model is generated by training on images to be trained based on a pre-trained text recognition model framework, the text recognition model framework is obtained by the training method according to any one of claims 1 to 10, and the images to be trained include text information.
  12. An apparatus for training a text recognition model framework, the apparatus comprising:
    a processing unit, configured to perform feature processing on a sample image based on a preset text detection model, to obtain at least two types of feature information related to text information in the sample image;
    a fusion unit, configured to perform fusion processing on the at least two types of feature information of the sample image based on a preset feature fusion model, to obtain a fusion feature of the sample image; and
    a training unit, configured to input the fusion feature into the feature fusion model, and adjust parameters of the text detection model and the feature fusion model respectively based on the fusion feature, to obtain a text recognition model framework, wherein the text recognition model framework comprises the adjusted text detection model and the adjusted feature fusion model.
  13. The apparatus according to claim 12, wherein the processing unit comprises:
    a first determination subunit, configured to determine position information of text lines in the sample image based on the text detection model; and
    a second determination subunit, configured to determine the at least two types of feature information according to the position information.
  14. The apparatus according to claim 13, wherein the second determination subunit comprises:
    a cropping module, configured to crop the sample image according to the position information to obtain a text region; and
    an obtaining module, configured to obtain the at least two types of feature information from the text region.
  15. The apparatus according to claim 14, wherein the obtaining module is configured to extract image features of the sample image from the text region, and recognize the image features to obtain the at least two types of feature information.
  16. The apparatus according to any one of claims 13 to 15, wherein the at least two types of feature information comprise a text feature and a visual feature, and the apparatus further comprises:
    a construction unit, configured to construct a plurality of text feature blocks for characterizing the text feature, and construct a plurality of visual feature blocks for characterizing the visual feature;
    and wherein the training unit is configured to adjust, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of text feature blocks; and/or to adjust, by the feature fusion model, the parameters of the text detection model and the feature fusion model respectively according to the fusion feature and the plurality of visual feature blocks.
  17. The apparatus according to claim 16, wherein the training unit comprises:
    a first covering subunit, configured to randomly cover a part of the text features in the fusion feature;
    a first prediction and completion subunit, configured to perform prediction and completion processing on the covered part of the text features according to the plurality of text feature blocks, so as to obtain predicted and completed text features; and
    a first adjustment subunit, configured to respectively adjust the parameters of the text detection model and the feature fusion model according to the predicted and completed text features and the features in the fusion feature other than the covered part of the text features.
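As a hedged illustration only, the random covering and prediction-and-completion steps of claim 17 may be sketched as below. The sketch assumes that each slot of the fusion feature is a (modality, feature block id) pair, that a covered slot is represented by `None`, and that the quality of completion is measured as the share of covered slots predicted incorrectly. The predictor shown is a trivial stand-in; an actual feature fusion model would produce the predictions, and a differentiable loss would drive the parameter adjustment.

```python
import random
from typing import List, Optional, Sequence, Tuple

Slot = Tuple[str, int]  # (modality, feature block id); hypothetical encoding of one slot


def cover_text_features(fused: Sequence[Slot], ratio: float,
                        rng: random.Random) -> Tuple[List[Optional[Slot]], List[int]]:
    """Randomly cover a share of the text slots; visual slots are left untouched."""
    text_slots = [i for i, (modality, _) in enumerate(fused) if modality == "text"]
    if not text_slots:
        raise ValueError("the fusion feature holds no text features to cover")
    count = min(len(text_slots), max(1, round(len(text_slots) * ratio)))
    covered = sorted(rng.sample(text_slots, count))
    masked: List[Optional[Slot]] = list(fused)
    for i in covered:
        masked[i] = None  # None stands in for a covered slot (assumption)
    return masked, covered


def completion_error(fused: Sequence[Slot], covered: Sequence[int],
                     predicted_ids: Sequence[int]) -> float:
    """Share of covered slots whose predicted block id differs from the original."""
    wrong = sum(1 for i, block_id in zip(covered, predicted_ids) if fused[i][1] != block_id)
    return wrong / len(covered)


fused = [("text", 3), ("visual", 7), ("text", 1), ("text", 4), ("visual", 2), ("text", 0)]
text_block_ids = [0, 1, 2, 3, 4]  # ids of the constructed text feature blocks
masked, covered = cover_text_features(fused, ratio=0.5, rng=random.Random(0))
# Stand-in predictor (assumption): always guesses the first text feature block.
predicted = [text_block_ids[0] for _ in covered]
error = completion_error(fused, covered, predicted)
```

The same structure would apply to claim 18 by covering the slots whose modality is "visual" and predicting from the visual feature blocks.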
  18. The apparatus according to claim 16, wherein the training unit comprises:
    a second covering subunit, configured to randomly cover a part of the visual features in the fusion feature;
    a second prediction and completion subunit, configured to perform prediction and completion processing on the covered part of the visual features according to the plurality of visual feature blocks, so as to obtain predicted and completed visual features; and
    a second adjustment subunit, configured to respectively adjust the parameters of the text detection model and the feature fusion model according to the predicted and completed visual features and the features in the fusion feature other than the covered part of the visual features.
  19. The apparatus according to claim 16, wherein the training unit comprises:
    a first replacement subunit, configured to perform replacement processing on the text features in the fusion feature according to at least some of the plurality of text feature blocks, so as to obtain replaced text features; and
    a third adjustment subunit, configured to respectively adjust the parameters of the text detection model and the feature fusion model according to the visual features in the fusion feature and the replaced text features.
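The replacement processing of claim 19 may, purely as an assumed sketch, be pictured as swapping each text feature for a randomly drawn text feature block with a given probability. The probability parameter `replace_prob`, the returned replacement flags, and the string placeholders for features are illustrative assumptions and do not appear in the claims.

```python
import random
from typing import List, Sequence, Tuple


def replace_text_features(text_features: Sequence[str], text_blocks: Sequence[str],
                          replace_prob: float,
                          rng: random.Random) -> Tuple[List[str], List[bool]]:
    """Swap each text feature for a random text feature block with probability replace_prob."""
    replaced: List[str] = []
    was_replaced: List[bool] = []
    for feature in text_features:
        swap = rng.random() < replace_prob
        replaced.append(rng.choice(list(text_blocks)) if swap else feature)
        was_replaced.append(swap)  # flags could supervise a "was this slot replaced?" task
    return replaced, was_replaced


text_features = ["t0", "t1", "t2"]  # placeholders for the text features in the fusion feature
text_blocks = ["b0", "b1"]          # placeholders for the constructed text feature blocks
rng = random.Random(0)
kept, kept_flags = replace_text_features(text_features, text_blocks, 0.0, rng)
swapped, swapped_flags = replace_text_features(text_features, text_blocks, 1.0, rng)
```

The replaced text features would then be combined with the unchanged visual features when adjusting the model parameters; claim 20 mirrors this with visual feature blocks.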
  20. The apparatus according to claim 16, wherein the training unit comprises:
    a second replacement subunit, configured to perform replacement processing on the visual features in the fusion feature according to at least some of the plurality of visual feature blocks, so as to obtain replaced visual features; and
    a fourth adjustment subunit, configured to respectively adjust the parameters of the text detection model and the feature fusion model according to the text features in the fusion feature and the replaced visual features.
  21. The apparatus according to any one of claims 16 to 20, wherein, if the feature fusion model respectively adjusts the parameters of the text detection model and the feature fusion model according to the fusion feature, the plurality of text feature blocks, and the plurality of visual feature blocks, the training unit comprises:
    a third determining subunit, configured to determine, according to the fusion feature and the plurality of text feature blocks, a first adjustment task result for adjusting the text detection model and the feature fusion model;
    a fourth determining subunit, configured to determine, according to the fusion feature and the plurality of visual feature blocks, a second adjustment task result for adjusting the text detection model and the feature fusion model; and
    a fifth adjustment subunit, configured to respectively adjust the parameters of the text detection model and the feature fusion model according to weighted average information of the first adjustment task result and the second adjustment task result.
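The weighted average information of claim 21 may be illustrated with the following minimal sketch, which assumes that each adjustment task result is a scalar (for example, a loss value) and that the weights are non-negative numbers chosen by the practitioner. The claims do not specify the weights or the form of the task results.

```python
from typing import Sequence


def weighted_average(results: Sequence[float], weights: Sequence[float]) -> float:
    """Combine scalar task results into one value using normalized weights."""
    if len(results) != len(weights):
        raise ValueError("each task result needs exactly one weight")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(results, weights)) / total


first_task_result = 0.8   # e.g. result of the text-feature-block task (illustrative value)
second_task_result = 0.4  # e.g. result of the visual-feature-block task (illustrative value)
combined = weighted_average([first_task_result, second_task_result], [3.0, 1.0])
```

With the illustrative values above, the combined result is approximately 0.7, which would then serve as the single signal for adjusting both models.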
  22. A text recognition apparatus, comprising:
    an acquisition unit, configured to acquire an image to be recognized; and
    a recognition unit, configured to input the image to be recognized into a pre-trained text recognition model to obtain text information in the image to be recognized, wherein the text recognition model is generated by training on an image to be trained based on a pre-trained text recognition model framework, the text recognition model framework is obtained by training with the training method according to any one of claims 1 to 10, and the image to be trained includes text information.
  23. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 10, or to enable the at least one processor to perform the method according to claim 11.
  24. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 10; or the computer instructions are used to cause the computer to perform the method according to claim 11.
  25. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10; or the computer program, when executed by a processor, implements the method according to claim 11.
  26. A training system for a text recognition model framework, the system comprising:
    a text detection model, configured to perform feature processing on a sample image to obtain at least two types of feature information related to text information in the sample image; and
    a feature fusion model, configured to perform fusion processing on the at least two types of feature information of the sample image to obtain a fusion feature of the sample image;
    wherein the feature fusion model is further configured to respectively adjust parameters of the text detection model and the feature fusion model to obtain a text recognition model framework, and the text recognition model framework comprises the adjusted text detection model and the adjusted feature fusion model.
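For orientation only, the data flow of the system of claim 26 may be sketched as two callables wired in sequence. The class name `TrainingSystem`, the dictionary layout of the feature information, and the toy models are assumptions made for this sketch; the parameter adjustment step is deliberately omitted because the claims do not fix how it is carried out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

FeatureMap = Dict[str, List[float]]  # feature type name -> feature values (hypothetical layout)


@dataclass
class TrainingSystem:
    """Wires a text detection model to a feature fusion model."""
    text_detection_model: Callable[[object], FeatureMap]
    feature_fusion_model: Callable[[FeatureMap], List[float]]

    def fused_feature(self, sample_image: object) -> List[float]:
        feature_info = self.text_detection_model(sample_image)
        if len(feature_info) < 2:
            raise ValueError("expected at least two types of feature information")
        return self.feature_fusion_model(feature_info)


def toy_detector(sample_image: object) -> FeatureMap:
    # Stand-in for a real detector: returns fixed text and visual features.
    return {"text": [1.0, 2.0], "visual": [3.0]}


def toy_fusion(feature_info: FeatureMap) -> List[float]:
    # Stand-in for a real fusion model: simple concatenation.
    return feature_info["text"] + feature_info["visual"]


system = TrainingSystem(toy_detector, toy_fusion)
fused = system.fused_feature("sample.png")
```

Keeping both models behind one object reflects the claim's point that they are adjusted together rather than trained in isolation.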
  27. A computer program, comprising program code, wherein, when a computer runs the computer program, the program code performs the method according to any one of claims 1 to 10; or the program code performs the method according to claim 11.
PCT/CN2022/085149 2021-07-28 2022-04-02 Method, apparatus and system for training text recognition model framework WO2023005253A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020237005116A KR20230030005A (en) 2021-07-28 2022-04-02 Training method, apparatus and system of text recognition model framework

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110858410.X 2021-07-28
CN202110858410.XA CN113591864B (en) 2021-07-28 2021-07-28 Training method, device and system for text recognition model framework

Publications (1)

Publication Number Publication Date
WO2023005253A1 (en) 2023-02-02

Family

ID=78251207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085149 WO2023005253A1 (en) 2021-07-28 2022-04-02 Method, apparatus and system for training text recognition model framework

Country Status (3)

Country Link
KR (1) KR20230030005A (en)
CN (1) CN113591864B (en)
WO (1) WO2023005253A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591864B (en) * 2021-07-28 2023-04-07 北京百度网讯科技有限公司 Training method, device and system for text recognition model framework
CN114511864B (en) * 2022-04-19 2023-01-13 腾讯科技(深圳)有限公司 Text information extraction method, target model acquisition method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271967A (en) * 2018-10-16 2019-01-25 腾讯科技(深圳)有限公司 The recognition methods of text and device, electronic equipment, storage medium in image
CN110163110A (en) * 2019-04-23 2019-08-23 中电科大数据研究院有限公司 A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic
CN111507355A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Character recognition method, device, equipment and storage medium
CN111738251A (en) * 2020-08-26 2020-10-02 北京智源人工智能研究院 Optical character recognition method and device fused with language model and electronic equipment
CN113591864A (en) * 2021-07-28 2021-11-02 北京百度网讯科技有限公司 Training method, device and system for text recognition model framework

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688784A (en) * 2017-08-23 2018-02-13 福建六壬网安股份有限公司 A kind of character identifying method and storage medium based on further feature and shallow-layer Fusion Features
CN108171700A (en) * 2018-01-12 2018-06-15 西安电子科技大学 Medical image pulmonary nodule detection method based on confrontation network
KR102161476B1 (en) * 2018-07-13 2020-10-06 동국대학교 산학협력단 Apparatus and method for identifying user using user body based on deep learning
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic shielding sample
CN110135366B (en) * 2019-05-20 2021-04-13 厦门大学 Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN110399798B (en) * 2019-06-25 2021-07-20 朱跃飞 Discrete picture file information extraction system and method based on deep learning
CN110837835B (en) * 2019-10-29 2022-11-08 华中科技大学 End-to-end scene text identification method based on boundary point detection
CN113139547B (en) * 2020-01-20 2022-04-29 阿里巴巴集团控股有限公司 Text recognition method and device, electronic equipment and storage medium
CN112329467B (en) * 2020-11-03 2022-09-30 腾讯科技(深圳)有限公司 Address recognition method and device, electronic equipment and storage medium
CN112686263B (en) * 2020-12-29 2024-04-16 科大讯飞股份有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN112836702B (en) * 2021-01-04 2022-10-18 浙江大学 Text recognition method based on multi-scale feature extraction
CN112733768B (en) * 2021-01-15 2022-09-09 中国科学技术大学 Natural scene text recognition method and device based on bidirectional characteristic language model
CN112861739B (en) * 2021-02-10 2022-09-09 中国科学技术大学 End-to-end text recognition method, model training method and device
CN112966742A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Model training method, target detection method and device and electronic equipment
CN112861782B (en) * 2021-03-07 2023-06-20 上海大学 Bill photo key information extraction system and method

Also Published As

Publication number Publication date
CN113591864B (en) 2023-04-07
CN113591864A (en) 2021-11-02
KR20230030005A (en) 2023-03-03

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE