WO2020221298A1 - Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus - Google Patents

Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus

Info

Publication number
WO2020221298A1
WO2020221298A1 · PCT/CN2020/087809 · CN2020087809W
Authority
WO
WIPO (PCT)
Prior art keywords
text
area
image
candidate
feature map
Prior art date
Application number
PCT/CN2020/087809
Other languages
French (fr)
Chinese (zh)
Inventor
苏驰
李凯
刘弘也
赵志明
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司 (Beijing Kingsoft Cloud Network Technology Co., Ltd.) and 北京金山云科技有限公司 (Beijing Kingsoft Cloud Technology Co., Ltd.)
Publication of WO2020221298A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Definitions

  • Step 34: Determine whether the updated parameters have all converged; if they have all converged, proceed to step 36; otherwise, proceed to step 38;
  • Step S1014: Determine the text content in the text area according to the arranged characters.
  • the feature extraction module 112 is configured to extract multiple initial feature maps of the target training image through the first feature extraction network, the multiple initial feature maps having different scales;
  • the detection module 122 is configured to input the image to be detected into the pre-trained text detection model and output multiple candidate regions of the text region in the image to be detected, together with the probability value of each candidate region, the text detection model being trained using the above text detection model training method;
  • the above-mentioned device further includes a region elimination module, configured to eliminate, from the multiple candidate regions, candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions;
  • the above-mentioned device further includes a text recognition model training module, configured to train the text recognition model in the following manner: determining a target training text image; inputting the target training text image into a second initial model, the model including a second feature extraction network, a second output network, and a classification function; extracting a feature map of the target training text image through the second feature extraction network; splitting the feature map into at least one sub-feature map through the second initial model; inputting the feature map into the second output network, which outputs an output matrix corresponding to each sub-feature map; inputting each output matrix into the classification function, which outputs a probability matrix corresponding to each sub-feature map; determining a second loss value of the probability matrices through a preset recognition loss function; and training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.
  • the memory 100 may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the communication connection between this system network element and at least one other network element is implemented through at least one communication interface 103 (wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.
  • the bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, etc.; for ease of presentation, only one bidirectional arrow is shown in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
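The recognition-model fragments above describe splitting a feature map into sub-feature maps, producing an output matrix per sub-feature map, and turning each output matrix into a probability matrix through a classification function. A minimal NumPy sketch of that split-and-classify step, assuming one sub-feature map per horizontal position and a softmax as the classification function (names such as `recognise_slices` and `w_out` are illustrative, not from the patent):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def recognise_slices(feature_map, w_out):
    """feature_map: (W, C), one feature column per horizontal position;
    w_out: (C, num_classes), the second output network reduced to one linear map."""
    # split the feature map into W sub-feature maps, one per position
    sub_maps = np.split(feature_map, feature_map.shape[0], axis=0)
    rows = []
    for sub in sub_maps:
        logits = sub @ w_out          # output matrix for this sub-feature map
        rows.append(softmax(logits))  # probability matrix via the classification function
    return np.vstack(rows)            # (W, num_classes)

rng = np.random.default_rng(0)
probs = recognise_slices(rng.standard_normal((12, 32)),
                         rng.standard_normal((32, 100)))
print(probs.shape)  # (12, 100); each row sums to 1
```

In a real recognition model the per-position probability matrices would then be decoded into a character sequence; here only the split-and-classify structure is shown.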

Abstract

The present application provides a text detection model training method and apparatus, a text region determination method and apparatus, and a text content determination method and apparatus. The text detection model training method comprises: extracting a plurality of initial feature maps of a target training image by means of a first feature extraction network; fusing the plurality of initial feature maps by means of a feature fusion network to obtain a fused feature map; inputting the fused feature map into a first output network, and outputting candidate regions of the text region in the target training image and the probability value of each candidate region; determining a first loss value by means of a preset detection loss function; and training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model. According to the present application, all kinds of text in an image can be detected quickly, comprehensively, and accurately across a variety of font sizes, fonts, shapes, and directions, thereby contributing to the accuracy of subsequent text recognition and improving the text recognition effect.

Description

Text detection model training method, and text region and content determination method and device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 30, 2019, with application number 201910367675.2 and invention title "Text detection model training method, text region and content determination method and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to a text detection model training method, and a text region and content determination method and device.
Background
In the related art, text detection and recognition can be realized by character segmentation. However, such methods are usually only suitable for simple scenes with a single font and font size, a simple background, and a single text arrangement direction; in complex scenes, such as those with multiple font sizes, multiple fonts, multiple shapes, multiple directions, and changing backgrounds, these text detection and recognition methods perform poorly.
Summary of the Invention
In view of this, the purpose of this application is to provide a text detection model training method, and a text region and content determination method and device, so as to improve the accuracy of text recognition.
In a first aspect, an embodiment of the present application provides a text detection model training method. The method includes: determining a target training image based on a preset training set; inputting the target training image into a first initial model, the first initial model including a first feature extraction network, a feature fusion network, and a first output network; extracting multiple initial feature maps of the target training image through the first feature extraction network, the multiple initial feature maps having different scales; fusing the multiple initial feature maps through the feature fusion network to obtain a fused feature map; inputting the fused feature map into the first output network, and outputting candidate regions of the text region in the target training image and the probability value of each candidate region; determining a first loss value of the candidate regions and their probability values through a preset detection loss function; and training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.
In a second aspect, an embodiment of the present application provides a text region determination method. The method includes: obtaining an image to be detected; inputting the image to be detected into a pre-trained text detection model, and outputting multiple candidate regions of the text region in the image to be detected, together with the probability value of each candidate region, the text detection model being trained using the above text detection model training method; and determining the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the candidate regions.
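The second aspect selects the final text region from the candidates using their probability values and their mutual overlap. A minimal sketch of one such selection, assuming axis-aligned `(x1, y1, x2, y2)` candidate boxes and greedy suppression by intersection-over-union; the thresholds and function names are illustrative, the patent does not fix them:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_text_regions(boxes, probs, prob_thresh=0.5, iou_thresh=0.3):
    """Keep high-probability candidates, greedily dropping heavily overlapping ones."""
    kept = []
    for i in np.argsort(probs)[::-1]:   # highest probability first
        if probs[i] < prob_thresh:
            break                        # remaining candidates score even lower
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
regions = select_text_regions(candidates, np.array([0.9, 0.8, 0.7]))
print(regions)  # the two non-overlapping high-probability boxes survive
```

For non-rectangular candidate regions (the patent also covers other polygons), the overlap measure would be computed on the polygons instead, but the selection logic stays the same.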
In a third aspect, an embodiment of the present application provides a text content determination method. The method includes: obtaining a text region in an image through the above text region determination method; inputting the text region into a pre-trained text recognition model, and outputting a recognition result for the text region; and determining the text content in the text region according to the recognition result.
In a fourth aspect, an embodiment of the present application provides a text detection model training device. The device includes: a training image determination module, configured to determine a target training image based on a preset training set; a training image input module, configured to input the target training image into a first initial model, the first initial model including a first feature extraction network, a feature fusion network, and a first output network; a feature extraction module, configured to extract multiple initial feature maps of the target training image through the first feature extraction network, the multiple initial feature maps having different scales; a feature fusion module, configured to fuse the multiple initial feature maps through the feature fusion network to obtain a fused feature map; an output module, configured to input the fused feature map into the first output network and output candidate regions of the text region in the target training image and the probability value of each candidate region; and a loss value determination and training module, configured to determine a first loss value of the candidate regions and their probability values through a preset detection loss function, and to train the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.
In a fifth aspect, an embodiment of the present application provides a text region determination device. The device includes: an image acquisition module, configured to acquire an image to be detected; a detection module, configured to input the image to be detected into a pre-trained text detection model and output multiple candidate regions of the text region in the image to be detected, together with the probability value of each candidate region, the text detection model being trained using the above text detection model training method; and a text region determination module, configured to determine the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the candidate regions.
In a sixth aspect, an embodiment of the present application provides a text content determination device. The device includes: a region acquisition module, configured to acquire a text region in an image through the above text region determination method; a recognition module, configured to input the text region into a pre-trained text recognition model and output a recognition result for the text region; and a text content determination module, configured to determine the text content in the text region according to the recognition result.
In a seventh aspect, an embodiment of the present application provides an electronic device, including a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the steps of the above text detection model training method, the above text region determination method, or the above text content determination method.
In an eighth aspect, an embodiment of the present application provides a machine-readable storage medium storing machine-executable instructions. When called and executed by a processor, the machine-executable instructions cause the processor to implement the steps of the above text detection model training method, the above text region determination method, or the above text content determination method.
In a ninth aspect, an embodiment of the present application provides executable program code, the executable program code being configured to be run so as to execute the steps of the above text detection model training method, the above text region determination method, or the above text content determination method.
The embodiments of the present application provide the following beneficial effects:
In the text detection model training method provided by the embodiments of this application, the feature extraction network can automatically extract features of different scales. Applying the resulting text detection model, a single input image yields candidate regions for text regions of various scales in that image, with no need to manually transform the image scale. The operation is convenient; in particular, in scenes with multiple font sizes, fonts, shapes, and directions, all kinds of text in an image can be detected quickly, comprehensively, and accurately, which also benefits the accuracy of subsequent text recognition and improves the text recognition effect.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application and the related art more clearly, the following briefly introduces the drawings used in the embodiments and the related art. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
Fig. 1 is a flowchart of a text detection model training method provided by an embodiment of the application;
Fig. 2 is a schematic structural diagram of a first feature extraction network provided by an embodiment of the application;
Fig. 3 is a schematic diagram of fusion processing of multiple initial feature maps provided by an embodiment of the application;
Fig. 4 is a flowchart of a text region determination method provided by an embodiment of the application;
Fig. 5 is a flowchart of another text region determination method provided by an embodiment of the application;
Fig. 6 is a flowchart of a text content determination method provided by an embodiment of the application;
Fig. 7 is a flowchart of a text recognition model training method provided by an embodiment of the application;
Fig. 8 is a schematic structural diagram of a second feature extraction network provided by an embodiment of the application;
Fig. 9 is a flowchart of another text content determination method provided by an embodiment of the application;
Fig. 10 is a flowchart of another text content determination method provided by an embodiment of the application;
Fig. 11 is a schematic structural diagram of a text detection model training device provided by an embodiment of the application;
Fig. 12 is a schematic structural diagram of a text region determination device provided by an embodiment of the application;
Fig. 13 is a schematic structural diagram of a text content determination device provided by an embodiment of the application;
Fig. 14 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
In related text recognition technology, text areas that may contain text are detected in a picture through manually defined rules; the detected text areas are then segmented into characters to obtain an image block for each character, and each image block is recognized by a pre-trained classifier to obtain the final text recognition result. The defects of this technology are mainly as follows. First, because the number of manually defined rules is limited, most of the detected text areas are regular-shaped regions; this limits the applicable range of the technology and makes it difficult to apply to text detection and recognition in complex scenes, such as those with multiple font sizes, multiple fonts, multiple shapes, multiple directions, and changing backgrounds. Second, the technology recognizes single characters without considering the correlation between characters, resulting in poor detection and recognition in complex scenes.
Another related technology realizes text recognition through deep learning: a recognition model is first trained using a recurrent neural network; the picture to be detected is then transformed into multiple scales and input into the recognition model one scale at a time to detect text areas and recognize text. The defects of this technology are mainly as follows. First, the image scale must be transformed manually and images of multiple scales input into the recognition model separately so that the model can recognize text of different sizes; the operation is cumbersome and hard to meet real-time recognition requirements. Second, because a recurrent neural network performs recursive operations over a time sequence, it is difficult to parallelize and the computation is slow. Third, the recognition model usually uses a rectangular detection frame to detect text areas, so it can only detect and recognize horizontal text; the recognition effect for text at arbitrary angles is poor, making it difficult to apply to text detection and recognition in complex scenes.
In summary, the text detection and recognition methods in the related art perform poorly in complex scenes. On this basis, the embodiments of the present application provide a text detection model training method, and a text region and content determination method and device. This technology can be widely applied to text detection and text recognition in various scenarios, and in particular to complex scenarios such as live webcasts, cable television broadcasts, games, and videos.
First, the text detection model training method disclosed in an embodiment of the present application is introduced in detail. The text detection model can be used for text detection, which can be understood as locating, within an image, the image regions that contain text. As shown in Figure 1, the method includes the following steps:
Step S102: Determine a target training image based on a preset training set.
In some cases, training images need to be determined multiple times while training the first initial model. In one embodiment, a target training image can be determined from the preset training set each time; in other embodiments, a new training image can also be obtained each time.
Taking determining a target training image from a preset training set as an example, the training set can contain multiple images. To broaden the applicability of the detection model, the images in the training set can cover various scenes, for example, live-broadcast scenes, game scenes, outdoor scenes, and indoor scenes; the images in the training set can also contain text lines of multiple font sizes, shapes, fonts, and languages, so that the trained detection model can detect all kinds of text lines.
Each image contains the text areas of manually annotated text lines. A text area can be annotated with a quadrilateral box such as a rectangle, or with another polygonal box; an annotated text area can usually cover the entire text line completely and fit it closely. In addition, the multiple images in the training set can be divided into a training subset and a test subset according to a preset ratio. During training, target training images can be obtained from the training subset. After training is completed, target test images can be obtained from the test subset to test the performance of the detection model.
Step S104: Input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network.
Before being input into the first initial model, the target training image can be resized to a preset size, such as 512×512 pixels.
Step S106: Extract multiple initial feature maps of the target training image through the first feature extraction network; the multiple initial feature maps have different scales.
The first feature extraction network can be implemented with multiple convolutional layers. Usually, the convolutional layers are connected in sequence (connected meaning that the input of one convolutional layer is the output of another), and each convolutional layer is configured with different convolution kernels to extract feature maps of different scales. Among the multiple initial feature maps of the target training image, each initial feature map can be obtained by the convolution computation of a corresponding convolutional layer. Taking four convolutional layers as an example, each convolutional layer can output one initial feature map; each convolutional layer can be configured with convolution kernels of different sizes, so that the initial feature maps output by the layers have different scales. For example, the convolutional layer that receives the target training image can be set to output the largest-scale initial feature map, with the scale of the initial feature map output by each subsequent convolutional layer gradually decreasing.
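The chain of convolutional layers described above can be sketched minimally in NumPy. This is a single-channel toy with a fixed averaging kernel, purely to show how successive layers yield initial feature maps of decreasing scale; a real first feature extraction network would use many learned multi-channel kernels:

```python
import numpy as np

def conv2d(x, kernel, stride=2):
    """Plain valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * kernel)
    return out

def extract_initial_feature_maps(image, num_layers=4):
    """Each 'layer' feeds the next; every layer output is kept as one initial map."""
    kernel = np.full((3, 3), 1 / 9.0)  # placeholder weights standing in for learned ones
    maps, x = [], image
    for _ in range(num_layers):
        x = conv2d(x, kernel)
        maps.append(x)
    return maps

maps = extract_initial_feature_maps(np.random.default_rng(0).random((64, 64)))
print([m.shape for m in maps])  # [(31, 31), (15, 15), (7, 7), (3, 3)]
```

The shrinking shapes mirror the description: the first layer produces the largest-scale initial feature map, and each subsequent layer a smaller one.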
Step S108: Perform fusion processing on the multiple initial feature maps through the feature fusion network to obtain a fused feature map.
Usually, a smaller convolution kernel senses high-frequency features in an image, so the initial feature map output by a convolutional layer with a smaller kernel carries small-scale text line features; a larger convolution kernel senses low-frequency features, so the initial feature map output by a convolutional layer with a larger kernel carries large-scale text line features. On this basis, the multiple initial feature maps of different scales together carry text line features of various scales, and the fused feature map obtained by fusing them also carries text line features of various scales. In this way, the detection model can detect text lines of various scales without any manual image scale transformation before detection.
In one case, because the scales of the multiple initial feature maps differ, the smaller-scale initial feature maps can be interpolated before fusion, expanding them to match the larger-scale initial feature maps. During fusion, feature points at the same position in different initial feature maps can be multiplied or added to obtain the final fused feature map.
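A minimal sketch of the fusion step just described: each smaller-scale map is expanded by nearest-neighbour interpolation to the size of the largest map, and feature points at the same position are then added (the description also allows multiplication; function names here are illustrative):

```python
import numpy as np

def upsample_nearest(fmap, target_shape):
    """Nearest-neighbour interpolation of a 2-D map to target_shape."""
    th, tw = target_shape
    rows = np.arange(th) * fmap.shape[0] // th
    cols = np.arange(tw) * fmap.shape[1] // tw
    return fmap[np.ix_(rows, cols)]

def fuse_feature_maps(initial_maps):
    """Expand every map to the largest scale, then add position-wise."""
    target = initial_maps[0].shape  # assume the first map has the largest scale
    fused = np.zeros(target)
    for m in initial_maps:
        fused += upsample_nearest(m, target)
    return fused

maps = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
fused = fuse_feature_maps(maps)
print(fused.shape, fused[0, 0])  # (8, 8) 3.0
```

With three all-ones maps, every fused position receives one contribution per scale, which is why each entry equals 3.0.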
Step S110: Input the fused feature map into the first output network, and output candidate regions of the text region in the target training image and the probability value of each candidate region.
The first output network is configured to extract the required features from the fused feature map to obtain the output results. If the detection model outputs one kind of result, the first output network usually contains a single group of networks; if the detection model outputs several kinds of results, the first output network usually contains several groups of networks arranged in parallel, each group outputting one kind of result. The first output network can be composed of convolutional layers or fully connected layers. In the above step, the first output network needs to output two kinds of results, the candidate regions and their probability values, so it can contain two groups of networks, each of which can be a convolutional network or a fully connected network.
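A minimal sketch of such a two-group first output network, realised as per-position 1×1 "convolutions" (plain matrix products over the channel axis). The choice of 8 geometry values per position, i.e. four vertex offsets of a quadrilateral candidate region, is an assumption for illustration; the description does not fix the parameterisation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_output_network(fused, w_geo, w_prob):
    """fused: (H, W, C) fused feature map.
    Branch 1 outputs 8 geometry values per position (candidate-region coordinates);
    branch 2 outputs one probability value per position (text / non-text)."""
    geometry = fused @ w_geo               # (H, W, 8) via per-position matmul
    probability = sigmoid(fused @ w_prob)  # (H, W, 1), squashed into [0, 1]
    return geometry, probability

rng = np.random.default_rng(0)
C = 16
fused = rng.standard_normal((32, 32, C))
geometry, probability = first_output_network(
    fused, rng.standard_normal((C, 8)), rng.standard_normal((C, 1)))
print(geometry.shape, probability.shape)  # (32, 32, 8) (32, 32, 1)
```

The two weight matrices stand in for the two parallel groups of networks; in practice each branch would be a stack of convolutional or fully connected layers.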
Step S112: Determine a first loss value for the candidate regions and the probability value of each candidate region through a preset detection loss function; train the first initial model according to the first loss value until the parameters of the first initial model converge, thereby obtaining the text detection model.
The target training image is pre-annotated with ground-truth text regions. Based on the positions of the annotated text regions, a coordinate matrix and a probability matrix of the text regions can be generated. The coordinate matrix may contain the vertex coordinates of the ground-truth text regions; the probability matrix contains the probability value of each text region, which may, for example, be 1.
The detection loss function compares the coordinate matrix of the candidate regions with that of the ground-truth text regions, and the probability values of the candidate regions with those of the ground-truth text regions; in general, the larger the difference, the larger the first loss value. Based on the first loss value, the parameters of each part of the first initial model can be adjusted to achieve training. When the parameters of the model converge, training ends and the detection model is obtained.
In the text detection model training method provided by the embodiments of the present application, multiple initial feature maps of the target training image, differing from one another in scale, are first extracted; the multiple initial feature maps are then fused to obtain a fused feature map; the fused feature map is input to the first output network, which outputs candidate regions for the text regions in the target training image and a probability value for each candidate region; after a first loss value is determined through the preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this method, the feature extraction network automatically extracts features of different scales, so when the text detection model is applied, a single input image yields candidate regions for text regions of various scales in that image, with no need to manually rescale the image. The operation is convenient; in scenes with multiple font sizes, typefaces, shapes, and orientations in particular, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the recognition effect.
The embodiments of the present application further provide another text detection model training method, implemented on the basis of the text detection model training method described in the above embodiment; this method focuses on the specific implementation of each step of the above training method, and includes the following steps:
Step 202: Determine a target training image based on a preset training set.
Step 204: Input the target training image to a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network.
Step 206: Extract multiple initial feature maps of the target training image through the first feature extraction network; the multiple initial feature maps differ from one another in scale.
In one embodiment, the first feature extraction network may include multiple groups of first convolutional networks connected in sequence, where each group includes a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence. Fig. 2 shows a schematic structural diagram of a first feature extraction network, using four groups of first convolutional networks as an example. The convolutional layer of each subsequent group is connected to the activation function layer of the preceding group; the activation function layer of each group outputs an initial feature map, and the initial feature map output by the activation function layer of a preceding group is also input to the convolutional layer of the following group. The first feature extraction network may also include more or fewer groups of first convolutional networks.
The batch normalization layer in the first convolutional network is configured to normalize the feature map output by the convolutional layer. This speeds up the convergence of the first feature extraction network and the detection model, and alleviates the gradient vanishing problem in multi-layer convolutional networks, making the first feature extraction network more stable. The activation function layer applies a function transformation to the normalized feature map; this transformation breaks the linearity of the convolutional layer's output and improves the expressive power of the first convolutional network. The activation function may specifically be a Sigmoid function, a tanh function, a ReLU function, or the like.
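As an illustration, the three activation functions named above can be written as follows (scalar sketches; in a real network they are applied element-wise to the feature map):

```python
import math

def sigmoid(x):
    """Sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """ReLU: passes positive inputs through, zeroes out negative ones."""
    return max(0.0, x)
```

Each of these is non-linear, which is what "breaks the linear combination" of the preceding convolutional layer's output.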
Step 208: Fuse the multiple initial feature maps through the feature fusion network to obtain a fused feature map.
Steps 02-08 below provide one specific implementation of step 208. This implementation uses pyramid features as an example, that is, the scales of the initial feature maps output by the successive convolutional layers decrease in turn:
Step 02: Arrange the multiple initial feature maps in order of scale, where the top-level initial feature map has the smallest scale and the bottom-level initial feature map has the largest scale.
Step 04: Take the top-level initial feature map as the top-level fused feature map.
Step 06: For each level other than the top level, fuse the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level.
Because the fused feature map of the level above the current level is smaller in scale than the initial feature map of the current level, before fusing the two, the fusion result of the level above may be enlarged by interpolation to the same scale as the initial feature map of the current level, after which point-wise addition or point-wise multiplication is performed to obtain the fused feature map of the current level.
Step 08: Take the bottom-level fused feature map as the final fused feature map.
The fusion result of each level is essentially the fused feature map of that level; to distinguish it from the final fused feature map, the fused feature map of each level is referred to as a fusion result. Steps 04-08 can be expressed as: in the arranged order, for each level below the top level in turn, fuse the initial feature map of that level with the fusion result of the level above it to obtain the fusion result of that level, where the fusion result of the top level is the initial feature map of the top level; and determine the fusion result of the lowest level as the fused feature map of the initial feature maps.
Fig. 3 shows a schematic diagram of fusing multiple initial feature maps. The target training image is convolved by the first feature extraction network to obtain four levels of initial feature maps. The top-level initial feature map serves as the top-level fused feature map; the top-level fused feature map is fused with the second-level initial feature map to obtain the second-level fused feature map; the second-level fused feature map is fused with the third-level initial feature map to obtain the third-level fused feature map; and the third-level fused feature map is fused with the fourth-level initial feature map to obtain the fourth-level fused feature map, which is the final fused feature map.
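The top-down loop of steps 02-08 can be sketched generically as follows. This is a minimal sketch: `upsample` stands for the interpolation that matches the previous fusion result to the current level's scale, `combine` for the point-wise fusion (e.g. addition), and both names are illustrative:

```python
def fuse_pyramid(initial_maps, upsample, combine):
    """Top-down fusion of pyramid feature maps (steps 02-08).

    initial_maps: list ordered from the top level (smallest scale)
                  to the bottom level (largest scale).
    upsample(f, ref): enlarges map f to the scale of reference map ref.
    combine(a, b):    point-wise fusion of two same-scale maps.
    """
    fused = initial_maps[0]            # step 04: top-level fusion result
    for level in initial_maps[1:]:     # step 06: every level below the top
        fused = combine(upsample(fused, level), level)
    return fused                       # step 08: bottom-level result is final
```

With four levels, the loop reproduces exactly the cascade shown in Fig. 3.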
Step 210: Input the fused feature map to the first output network, which outputs candidate regions for the text regions in the target training image and a probability value for each candidate region.
Taking a convolutional network as an example, the first output network includes a first convolutional layer and a second convolutional layer arranged in parallel, configured to output the vertex coordinates of the candidate regions and the probability values of the candidate regions, respectively. Step 210 may be implemented through the following steps 12-16:
Step 12: Input the fused feature map to the first convolutional layer and to the second convolutional layer, respectively.
Step 14: Perform a first convolution operation on the fused feature map through the first convolutional layer and output a coordinate matrix; the coordinate matrix includes the vertex coordinates of the candidate regions of the text regions in the target training image.
For example, the coordinate matrix can be expressed as n*H*W, where H and W are the height and width of the coordinate matrix and n is its dimension. When a candidate region is a quadrilateral, it is determined by four vertex coordinates, so n is 8; when a candidate region is another polygon, n is usually twice the number of sides of the region.
Step 16: Perform a second convolution operation on the fused feature map through the second convolutional layer and output a probability matrix; the probability matrix includes the probability value of each candidate region.
The probability value of each candidate region may also be called its score; it characterizes the probability that the candidate region completely contains a text line.
Step 212: Determine a first loss value for the candidate regions and the probability value of each candidate region through a preset detection loss function; train the first initial model according to the first loss value until the parameters of the first initial model converge, thereby obtaining the text detection model.
In one case, the detection loss function includes a first function and a second function, used to compute the losses on the vertex coordinates of the candidate regions and on the probability value of each candidate region, respectively. The first function is L1 = |G* - G|, where G* is the coordinate matrix of the text regions pre-annotated in the target training image, and G is the coordinate matrix of the candidate regions of the text regions output by the first output network. The second function is L2 = -Y*·log(Y) - (1 - Y*)·log(1 - Y), where Y* is the probability matrix of the text regions pre-annotated in the target training image, Y is the probability matrix of the candidate regions output by the first output network, and log denotes the logarithm. The first loss value for the vertex coordinates and the probability value of each candidate region is the sum of the first function and the second function, that is, L = L1 + L2.
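A scalar sketch of this loss, assuming the coordinate and probability matrices have been flattened into sequences of numbers (the function name is illustrative, and this is not the patent's actual implementation):

```python
import math

def detection_loss(g_true, g_pred, y_true, y_pred):
    """First loss value L = L1 + L2 over flattened matrices.

    L1: absolute coordinate regression loss |G* - G|, summed element-wise.
    L2: cross-entropy -Y*·log(Y) - (1 - Y*)·log(1 - Y), summed element-wise.
    """
    l1 = sum(abs(gt - gp) for gt, gp in zip(g_true, g_pred))
    l2 = sum(-yt * math.log(yp) - (1.0 - yt) * math.log(1.0 - yp)
             for yt, yp in zip(y_true, y_pred))
    return l1 + l2
```

Both terms shrink as the predicted coordinates and probabilities approach the annotations, which is what drives the parameter updates during training.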
Based on the above description of the first loss value, the process of training the first initial model according to the first loss value may be implemented through the following steps 22-28:
Step 22: Update the parameters of the first initial model according to the first loss value.
In one case, a functional mapping can be preset; the original parameters and the first loss value are input into this mapping to compute the updated parameters. The mappings for different parameters may be the same or different.
Specifically, the parameters to be updated can first be determined according to a preset rule; they may be all parameters of the first initial model, or a subset of parameters randomly selected from it. The derivative of the first loss value with respect to each parameter to be updated, ∂L/∂W, is then computed, where L is the first loss value, W is a parameter to be updated, and ∂ denotes the partial derivative. The parameters to be updated may also be called the weights of the neurons. This process may also be called the back-propagation algorithm: if the first loss value is large, the output of the current first initial model does not match the expected output, so the derivative of the first loss value with respect to each parameter to be updated is computed and serves as the basis for adjusting that parameter.
After the derivative of each parameter to be updated is obtained, each parameter is updated as W = W - α·∂L/∂W, where α is a preset coefficient. This process may also be called the stochastic gradient descent algorithm: the derivative with respect to each parameter indicates the direction in which the first loss value decreases fastest relative to the current parameter, so adjusting the parameter along this direction reduces the first loss value quickly and makes the parameter converge. In addition, after one round of training of the first initial model, a first loss value is obtained; at this point, one or more parameters may be randomly selected from the parameters of the first initial model for the above update, which shortens training time and speeds up the algorithm; alternatively, the above update may be performed on all parameters of the first initial model, which makes training more accurate.
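The update rule W = W - α·∂L/∂W can be sketched as follows (a minimal illustration over a flat list of parameters; a real framework applies the same rule to every weight tensor in the model):

```python
def sgd_update(params, grads, alpha=0.1):
    """One stochastic-gradient-descent step: W <- W - alpha * dL/dW.

    params: current parameter values.
    grads:  derivatives of the first loss value w.r.t. each parameter.
    alpha:  the preset coefficient (step size).
    """
    return [w - alpha * g for w, g in zip(params, grads)]
```

Repeating this step until the parameter values stop changing appreciably is what the text calls parameter convergence.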
Step 24: Judge whether all the updated parameters have converged; if so, execute step 26; otherwise, execute step 28.
Step 26: Determine the first initial model with the updated parameters as the detection model; end.
Step 28: Continue to execute the step of determining a target training image based on the preset training set, until all the updated parameters converge.
Specifically, a new image may be retrieved from the training set as the target training image, or the current target training image may continue to be used for training.
In the above manner, the feature extraction network automatically extracts feature maps of different scales, the feature maps of different scales are then fused, and candidate regions for text regions of various scales in the image are obtained based on the resulting fused feature map. With this detection model, a single input image yields candidate regions for text regions of various scales, with no need to manually rescale the image. The operation is convenient; in scenes with multiple font sizes, typefaces, shapes, and orientations in particular, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the recognition effect.
The embodiments of the present application further provide a text region determination method, implemented on the basis of the text detection model training method described in the above embodiments; as shown in Fig. 4, the method includes the following steps:
Step S402: Acquire an image to be detected; the image to be detected may be a picture, or a video frame captured from a video file or a live video stream.
Step S404: Input the image to be detected into a text detection model pre-trained with the text detection model training method provided in the above embodiments, and output multiple candidate regions for the text regions in the image to be detected, together with the probability value of each candidate region.
Step S406: Determine the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap among the multiple candidate regions.
Among the candidate regions output by the text detection model, multiple candidate regions may correspond to the same text line; to find the region that best matches a text line from among them, the candidate regions need to be filtered. In most cases, candidate regions that overlap each other heavily correspond to the same text line, so the text region for that line can be determined from the probability values of those heavily overlapping candidate regions; for example, among the heavily overlapping candidate regions, the one with the largest probability value is determined as the text region. If the image contains multiple text lines, multiple text regions can be determined.
In the text region determination method provided by the embodiments of the present application, the acquired image to be detected is input to the text detection model, which outputs multiple candidate regions for the text regions in the image and the probability value of each candidate region; the text regions in the image are then determined from the candidate regions according to their probability values and the degree of overlap among them. In this method, the text detection model automatically extracts features of different scales, so inputting a single image into the model yields candidate regions for text regions of various scales, with no need to manually rescale the image. The operation is convenient; in scenes with multiple font sizes, typefaces, shapes, and orientations in particular, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the recognition effect.
The embodiments of the present application further provide another text region determination method, implemented on the basis of the text region determination method in the above embodiment; this method focuses on the specific process of determining the text regions in the image to be detected according to the vertex coordinates and probability values of the candidate regions output by the detection network. As shown in Fig. 5, the method includes the following steps:
Step S502: Acquire an image to be detected.
Step S504: Input the image to be detected into the pre-trained text detection model, and output multiple candidate regions for the text regions in the image to be detected, together with the probability value of each candidate region.
Step S506: Among the multiple candidate regions, eliminate the candidate regions whose probability value is lower than a preset probability threshold to obtain the final multiple candidate regions.
Step S506 is optional: in step S508 below, either every candidate region output by the detection model may be arranged, or the candidate regions whose probability value is below the preset probability threshold may first be eliminated and only the remaining candidate regions arranged. The preset probability threshold may be set in advance, e.g., 0.2 or 0.1; eliminating candidate regions whose probability value is below the threshold reduces the amount of computation in the subsequent determination of the text regions and increases processing speed.
Step S508: Arrange the multiple candidate regions in order of probability value, so that the first candidate region has the largest probability value and the last candidate region has the smallest.
Step S510: Take the first candidate region as the current candidate region, and compute one by one the degree of overlap between the current candidate region and each candidate region other than the current one.
The candidate regions other than the current candidate region may be referred to simply as other candidate regions. When computing the degree of overlap between the current candidate region and another candidate region, the intersection-over-union (IoU) of the two regions may be computed, i.e., the area of the intersection of the two candidate regions divided by the area of their union. It can be understood that the larger the IoU, the greater the overlap between the two regions. For the current candidate region, another candidate region that overlaps it heavily usually represents the same text line as the current candidate region; since the probability value of the other candidate region is smaller than that of the current one, the other candidate region can be eliminated so that the text line is represented by the current candidate region.
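The intersection-over-union computation can be sketched as follows. This is a simplified illustration using axis-aligned rectangles (x1, y1, x2, y2); the patent's candidate regions are general quadrilaterals, but the ratio of intersection area to union area is the same idea:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                 # union area
    return inter / union if union else 0.0
```

The result lies in [0, 1]: 0 for disjoint regions, 1 for identical ones.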
Step S512: Among the candidate regions other than the current candidate region, eliminate those whose degree of overlap is greater than a preset overlap threshold; the overlap threshold may be set in advance, e.g., 0.5 or 0.6.
Step S514: Take the candidate region following the current candidate region as the new current candidate region, and continue the step of computing one by one the degree of overlap between the current candidate region and the candidate regions other than it, until the last candidate region is reached.
Steps S510-S514 form a loop; in each round, some candidate regions are eliminated. When the last candidate region has been traversed, the loop ends, and the finally remaining candidate regions are determined as the text regions in the image to be detected. If multiple candidate regions remain at the end, it can be determined that the image to be detected contains multiple text regions.
Steps S510-S514 can also be expressed as: in the arranged order, for each candidate region in turn, compute one by one the degree of overlap between that candidate region and each candidate region other than it; and, among the candidate regions other than that candidate region, eliminate those whose degree of overlap is greater than the preset overlap threshold.
Step S516: Determine the remaining candidate regions after elimination as the text regions in the image to be detected.
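Taken together, steps S508-S516 form a non-maximum-suppression loop, which can be sketched generically as follows (`overlap` stands for the IoU computation described above; the function and parameter names are illustrative):

```python
def nms(regions, overlap, threshold=0.5):
    """Steps S508-S516: keep one highest-probability region per text line.

    regions:   list of (region, probability) pairs.
    overlap:   function giving the degree of overlap of two regions.
    threshold: preset overlap threshold (e.g. 0.5 or 0.6).
    """
    ordered = sorted(regions, key=lambda rp: rp[1], reverse=True)  # step S508
    kept = []
    while ordered:
        current, prob = ordered.pop(0)     # steps S510/S514: current candidate
        kept.append((current, prob))       # survives elimination
        # step S512: drop remaining regions overlapping the current one heavily
        ordered = [(r, p) for r, p in ordered if overlap(current, r) <= threshold]
    return kept                            # step S516: remaining text regions
```

Because the list is processed in descending probability order, each text line ends up represented by its highest-scoring candidate region.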
In the above manner, multiple candidate regions and the probability value of each candidate region are obtained through the text detection model, and the text regions are then determined from the multiple candidate regions by non-maximum suppression. In this method, the text detection model automatically extracts features of different scales, so inputting a single image into the model yields candidate regions for text regions of various scales, with no need to manually rescale the image. The operation is convenient; in scenes with multiple font sizes, typefaces, shapes, and orientations in particular, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the recognition effect.
The embodiments of the present application further provide a text content determination method, implemented on the basis of the text region determination method described in the above embodiments; as shown in Fig. 6, the method includes the following steps:
Step S602: Acquire the text regions in an image through the above text region determination method.
Step S604: Input the text regions into a pre-trained text recognition model, and output the recognition result of each text region.
Step S606: Determine the text content in the text regions according to the recognition results.
The text recognition model can be trained in a variety of ways, e.g., as a recurrent neural network or a convolutional neural network; the recognition result of a text region can of course also be obtained by optical character recognition. The recognition result output by the text recognition model may be determined directly as the text content of the text region, or the recognition result may first be post-processed, e.g., by deleting repeated characters and blank characters, and the processed result then determined as the text content of the text region.
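The post-processing mentioned above (deleting repeated characters and blank characters) resembles a CTC-style collapse and can be sketched as follows. The blank symbol "-" and the function name are assumptions for illustration, not details specified in the text:

```python
def collapse_recognition(raw, blank="-"):
    """Merge consecutive repeated characters, then delete blank characters."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev:          # keep a character only when it changes
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != blank)  # drop the blanks
```

For example, a raw sequence "hh-ee--ll-lo" collapses to "hello"; the blank between the two "l" runs is what preserves genuinely doubled letters.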
In the text content determination method provided by the embodiment of the present application, the text region in the image is first obtained through the above text region determination method; the text region is then input into a pre-trained text recognition model, which outputs the recognition result of the text region; finally, the text content in the text region is determined according to the recognition result. In this manner, since the above text region determination method can obtain text regions of various scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, fonts, shapes, and orientations, which in turn benefits the accuracy of text recognition and improves the recognition effect.
An embodiment of the present application further provides another text content determination method, which is implemented on the basis of the method described in the foregoing embodiment. This method focuses on the training method of the text recognition model. The text recognition model can be used for text recognition, which can be understood as follows: the text region in a picture is detected to locate the picture region containing text, and the language meaning expressed by the text is then determined from that picture region. As shown in FIG. 7, the model is trained in the following manner:
Step S702: determine a target training text image based on a preset training set;

In the subsequent content, in some cases the training text image needs to be determined multiple times during the training of the second initial model. In one implementation, the target training text image can be determined from the preset training set each time; in other implementations, a new training text image can also be obtained anew each time.
Taking determining the target training text image from a preset training set as an example, the target training text image may be a separate image, or an image region annotated on an image. The training set may contain multiple images. To broaden the applicability of the text recognition model, the images in the training set may cover various scenes, for example live-streaming scenes, game scenes, outdoor scenes, and indoor scenes; the images may also contain text lines of multiple font sizes, shapes, fonts, and languages, so that the trained text recognition model can detect all kinds of text lines. Each target training text image corresponds to the manually annotated text content of a text line, such as "hello" or "awesome"; each target training text image corresponds to one annotated text content.
After the annotation is completed, a character library can also be built from the text content of all text lines corresponding to all images in the training set. Specifically, the text content of all text lines corresponding to all images in the training set is obtained, the distinct characters are extracted from it, and those mutually distinct characters form the character library. In addition, the multiple images in the training set may be divided into a training subset and a test subset according to a preset ratio. During training, target training images can be obtained from the training subset; after training is completed, target test images can be obtained from the test subset to test the performance of the text recognition model.
Step S704: input the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function;

Step S706: extract the feature map of the target training text image through the second feature extraction network;
The second feature extraction network can be implemented with multiple convolutional layers. Usually the convolutional layers are connected in sequence, each convolutional layer performs a convolution on its input data with a correspondingly configured convolution kernel, and the data output by the last convolutional layer serves as the feature map of the target training text image.
Step S708: split the feature map into at least one sub-feature map through the feature splitting network;

Since the purpose is to recognize text content, the text recognition model needs to split the feature map corresponding to a text line so that each sub-feature map contains one character or symbol, or a small number of them, which facilitates recognition of the text content. During splitting, the scale of the sub-feature maps can be preset and the feature map split based on that scale; alternatively, the number of sub-feature maps can be preset and the feature map split based on that number. Of course, if the text line is inherently short, for example containing only one character, the feature map may be split into only one sub-feature map.
Step S710: input the above sub-feature maps into the second output network respectively, and output the output matrix corresponding to each sub-feature map;

The second output network is configured to perform a further computation on the sub-feature maps. In the output matrix corresponding to each sub-feature map, each position corresponds to a preset character, and the value at that position can represent how well the sub-feature map matches the character corresponding to that position. The second output network may be a convolutional network or a fully connected network.
Step S712: input the output matrix corresponding to each sub-feature map into the classification function, and output the probability matrix corresponding to each sub-feature map;

The classification function can map each value in the output matrix to a probability value, thereby obtaining a probability matrix. The probability value at each position of the probability matrix can be used to represent the probability that the sub-feature map matches the character corresponding to that position.
Step S714: determine the second loss value of the probability matrices through a preset recognition loss function, and train the second initial model according to the second loss value until the parameters in the second initial model converge, obtaining the text recognition model.

For example, the target training text image can be pre-annotated with standard text content, which may consist of one or more standard characters; a probability matrix can be generated based on that text content. In this probability matrix, the probability value at the position corresponding to a sub-feature map's standard character can be 1, and the probability values at the other positions can be 0. The recognition loss function compares the probability matrices output by the classification function with the probability matrices of the standard text content; generally, the larger the difference, the larger the second loss value. Based on the second loss value, the parameters of each part of the second initial model can be adjusted to achieve the purpose of training. When the parameters of the model converge, training ends and the text recognition model is obtained.
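As a minimal sketch of the target probability matrix just described, the row for each sub-feature map has probability 1 at the annotated standard character's position and 0 elsewhere. The character-to-index table below is a hypothetical example, not taken from the source:

```python
def one_hot_target(char_index, num_classes):
    # Target probability row for one sub-feature map: 1 at the annotated
    # standard character's position, 0 at every other position.
    row = [0.0] * num_classes
    row[char_index] = 1.0
    return row

# Hypothetical character library; index K (here 4) is left for the empty character.
char_to_index = {"h": 0, "e": 1, "l": 2, "o": 3}
K = len(char_to_index)
targets = [one_hot_target(char_to_index[c], K + 1) for c in "hello"]
```

A loss function can then compare each predicted probability matrix against these one-hot targets.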
In the above training method for the text recognition model, the feature map of the target training text image is first extracted; the feature map is then split into at least one sub-feature map; the sub-feature maps are input into the second output network respectively, which outputs the output matrix corresponding to each sub-feature map; the probability matrix corresponding to each sub-feature map is then obtained through the classification function; and after the second loss value of the probability matrices is determined through the preset recognition loss function, the second initial model is trained according to the second loss value to obtain the text recognition model. In this manner, the model can automatically segment the feature map of an image, so the text recognition model only needs an image containing a text line as input to obtain the text content in that image. There is no need to segment the text line first: the text content of the text line is obtained directly, the operation is convenient, the computation is fast, and the recognition accuracy is high.
The following focuses on the specific implementation of each step of the above training method:
Step S702: determine a target training text image based on a preset training set;

Step S704: input the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function;

Step S706: extract the feature map of the target training text image through the second feature extraction network;
The second feature extraction network may include multiple groups of second convolutional networks connected in sequence, each group including a convolutional layer, a pooling layer, and an activation function layer connected in sequence. FIG. 8 shows a schematic structural diagram of such a second feature extraction network, taking four groups of second convolutional networks as an example, in which the convolutional layer of each subsequent group is connected to the activation function layer of the preceding group. The second feature extraction network may also contain more or fewer groups of second convolutional networks.

It can be understood that the convolutional layer in a second convolutional network is used to extract features and generate a feature map. The pooling layer may be an average pooling (mean-pooling) layer, a global average pooling layer, a max-pooling layer, etc. It can be used to compress the feature map output by the convolutional layer, retaining the main features and discarding the non-main ones to reduce the dimension of the feature map. Taking average pooling as an example, the average pooling layer averages the feature point values within a neighborhood of preset size around the current feature point and uses the average as the new value of that feature point. In addition, the pooling layer helps the feature map maintain certain invariances, such as rotation invariance, translation invariance, and scale invariance. The activation function layer applies a function transformation to the feature map processed by the pooling layer; this transformation breaks the linear combination of the convolutional layer's input and improves the feature expression capability of the second convolutional network. The activation function may specifically be the Sigmoid function, the tanh function, the ReLU function, etc.
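The average pooling just described can be sketched as follows, assuming a k×k neighborhood applied with stride k (the 4×4 map and k=2 are illustrative values, not from the source):

```python
import numpy as np

def average_pool(feature_map, k):
    # Replace each k x k neighbourhood with the mean of its feature values,
    # compressing the feature map while keeping its main features.
    h, w = feature_map.shape
    blocks = feature_map[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return blocks.mean(axis=(1, 3))

fmap = np.array([[1., 3., 2., 4.],
                 [5., 7., 6., 8.],
                 [0., 2., 1., 3.],
                 [4., 6., 5., 7.]])
pooled = average_pool(fmap, 2)  # 4x4 map compressed to a 2x2 map
```

Each output value is the mean of one 2×2 neighborhood, so the spatial dimension halves while the dominant feature values are preserved.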
Step S708: split the feature map into at least one sub-feature map through the feature splitting network;

Considering that most text lines are arranged horizontally, in order for each split sub-feature map to contain the features corresponding to one character or a small number of characters, the feature map can be split into at least one sub-feature map along its column direction; the column direction of the feature map can be understood as the direction perpendicular to the text line direction. In one case, the width of the sub-feature maps can be set according to the width of most characters, and the feature map split according to that width: for example, if the feature map is H*W*C and the preset sub-feature-map width is k, the feature map is split into W/k sub-feature maps of size H*k*C each. Alternatively, the number of sub-feature maps can be preset, for example T, in which case each sub-feature map is H*(W/T)*C.
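The column-direction split above can be sketched as follows (the values of H, W, C, and T are illustrative, not from the source):

```python
import numpy as np

H, W, C, T = 8, 32, 16, 4                     # illustrative sizes
feature_map = np.arange(H * W * C, dtype=float).reshape(H, W, C)

# Split along the column (width) direction into T sub-feature maps,
# each of shape H x (W/T) x C.
sub_maps = np.split(feature_map, T, axis=1)
```

Each sub-feature map then covers a vertical slice of the text line, roughly one or a few characters wide.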
Step S710: input the above sub-feature maps into the second output network respectively, and output the output matrix corresponding to each sub-feature map;

Taking a fully connected network as an example, the second output network includes multiple fully connected layers arranged in parallel, and the number of fully connected layers corresponds to the number of sub-feature maps. Each sub-feature map is input into its corresponding fully connected layer, and each fully connected layer outputs the output matrix corresponding to its sub-feature map.
Step S712: input the output matrix corresponding to each sub-feature map into the classification function, and output the probability matrix corresponding to each sub-feature map;
The classification function may be a Softmax function, which can be written as:

p_t^i = e^{o_t^i} / Σ_{m=1}^{K+1} e^{o_t^m}

where e is the natural constant; t indexes the t-th probability matrix; K is the number of distinct characters contained in the target training text images of the training set, with m ranging from 1 to K+1; Σ denotes the summation operation; o_t^i is the i-th element of the output matrix; and p_t^i is the i-th element of the probability matrix p_t.
Compared with an element o_t^i of the output matrix itself, its exponential value e^{o_t^i} enlarges the differences between elements. For example, for the output matrix [3, 1, -3], the matrix of exponential values is approximately [20, 2.7, 0.05]. Computing each element's probability from its exponential value therefore widens the probability gaps, giving the correct recognition result a higher probability and benefiting the accuracy of the recognition result.
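A minimal sketch of the Softmax computation and of the amplification effect of the exponential on the example output matrix [3, 1, -3]:

```python
import numpy as np

def softmax(outputs):
    # p_i = e^{o_i} / sum_m e^{o_m}: the exponential widens the gaps
    # between elements before they are normalised into probabilities.
    exp = np.exp(outputs)
    return exp / exp.sum()

o = np.array([3.0, 1.0, -3.0])
exp_values = np.exp(o)       # ≈ [20.09, 2.72, 0.05], as in the example above
probs = softmax(o)           # the first element clearly dominates
```

The probability assigned to the largest output element ends up close to 0.88, illustrating how the exponential sharpens the distribution toward the most likely character.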
Step S714: determine the second loss value of the probability matrices through a preset recognition loss function, and train the second initial model according to the second loss value until the parameters in the second initial model converge, obtaining the text recognition model.

The recognition loss function includes L = -log p(y | {p_t}_{t=1...T}), where y is the pre-annotated probability matrix of the target training text image; t indexes the t-th probability matrix; p_t is the probability matrix corresponding to each sub-feature map output by the classification function; T is the total number of probability matrices; p denotes the computed probability; and log denotes the logarithm operation. Based on this recognition loss function, the process of training the second initial model according to the second loss value in the above step can also be implemented through the following steps 32-38:
Step 32: update the parameters in the second initial model according to the second loss value;

For example, a function mapping can be preset; inputting the original parameter and the second loss value into this mapping yields the updated parameter. The function mappings for different parameters may be the same or different.
Specifically, the parameters to be updated can be determined from the second initial model according to a preset rule; they may be all parameters of the second initial model, or a subset of parameters selected at random from it. The derivative of the second loss value with respect to each parameter to be updated, ∂L'/∂W', is then computed, where L' is the loss value of the probability matrices, W' is the parameter to be updated, and ∂ denotes the partial derivative operation; the parameters to be updated may also be called the weights of the neurons. This process is also known as the back-propagation algorithm: if the second loss value is large, the output of the current second initial model deviates from the expected output, and the derivative of the second loss value with respect to each parameter to be updated serves as the basis for adjusting that parameter.
After the derivatives of the parameters to be updated are obtained, each parameter is updated as W' ← W' - α'·∂L'/∂W', where α' is a preset coefficient. This process is also known as the stochastic gradient descent algorithm: the derivative of each parameter to be updated can be understood as the direction in which the second loss value decreases fastest from the current parameter value, so adjusting the parameter along this direction reduces the second loss value quickly and makes the parameter converge. In addition, after one round of training of the second initial model yields a second loss value, one or more parameters can be randomly selected from the second initial model for the above update, which shortens the training time and speeds up the algorithm; of course, the above update can also be applied to all parameters of the second initial model, which makes the training more accurate.
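The update rule above, W' ← W' - α'·∂L'/∂W', can be sketched as a single gradient-descent step (the parameter and gradient values are illustrative):

```python
def sgd_update(params, grads, lr):
    # One stochastic-gradient-descent step: move each parameter against its
    # derivative, scaled by the preset coefficient lr (the learning rate).
    return [w - lr * g for w, g in zip(params, grads)]

weights   = [0.5, -0.2, 1.0]   # current parameters W' (illustrative)
gradients = [0.1, -0.4, 0.0]   # derivatives dL'/dW' (illustrative)
updated = sgd_update(weights, gradients, lr=0.1)   # ≈ [0.49, -0.16, 1.0]
```

Updating only a random subset of parameters corresponds to passing shorter `params`/`grads` lists; updating all parameters corresponds to passing the full lists.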
Step 34: judge whether the updated parameters have all converged; if they have all converged, execute step 36; if they have not all converged, execute step 38;

Step 36: determine the second initial model with the updated parameters as the recognition model;

Step 38: continue to execute the step of determining a target training text image based on the preset training set, until all updated parameters converge.
Specifically, a new image can be obtained from the training set as the target training text image, or the current target training text image can continue to be used for training.

In the above manner, the model can automatically segment the feature map of an image. Applying this text recognition model, an image containing a text line can be input to obtain the text content in that image. There is no need to segment the text line first: the text content of the text line is obtained directly, the operation is convenient, the computation is fast, and the recognition accuracy is high.
Based on the text content determination method provided by the foregoing embodiment, an embodiment of the present application further provides another text content determination method, implemented on the basis of the text content determination method or the text recognition model training method described above. This method focuses on the process of obtaining the text content of the text region from the recognition result after the text recognition model outputs it. As shown in FIG. 9, the method includes the following steps:
Step S902: obtain the text region in the image through the above text region determination method;

Step S904: normalize the text region according to a preset size.

The preset size may include a preset length and width. If the text region does not satisfy the preset size, it can be scaled, cropped, or padded with blank space so that the processed text region satisfies the preset size.
Step S906: input the processed text region into a pre-trained text recognition model, and output the recognition result of the text region, where the recognition result includes multiple probability matrices corresponding to the text region;

During recognition, the text recognition model needs to segment the feature map corresponding to the text region; each segmented sub-feature map is passed through its corresponding output network to produce an output matrix, and the probability matrix corresponding to each output matrix is then obtained through the classification function. The recognition result of the text region therefore includes multiple probability matrices, and each probability matrix usually corresponds to one character or a small number of characters.
Step S908: determine the position of the maximum probability value in each probability matrix;

Step S910: obtain the character corresponding to the position of the maximum probability value from the preset correspondence between positions in the probability matrix and characters; for convenience of description, the obtained characters may be called the characters to be arranged.

As described in the foregoing embodiment, the probability value at each position of a probability matrix can be used to represent the probability that the sub-feature map matches the character corresponding to that position. The character corresponding to the position of the maximum probability value can therefore be determined as the recognition result of the corresponding sub-feature map; in most cases this is one character, but it may also be multiple characters. The correspondence between positions and characters can be established as follows: characters are first collected, including text in multiple languages, punctuation marks, mathematical symbols, network emoticons, etc.; specifically, the characters can be collected while building the training set, or gathered from dictionaries, character libraries, symbol libraries, and so on.
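A minimal sketch of steps S908-S910, using a hypothetical position-to-character table (index 4 standing for the empty character "-"; the table and probability values are illustrative, not from the source):

```python
index_to_char = {0: "h", 1: "e", 2: "l", 3: "o", 4: "-"}  # hypothetical table

def best_char(prob_row):
    # Find the position of the maximum probability value in one probability
    # matrix row, then look up the character corresponding to that position.
    best = max(range(len(prob_row)), key=lambda i: prob_row[i])
    return index_to_char[best]

prob_matrices = [
    [0.10, 0.10, 0.10, 0.10, 0.60],   # empty character wins
    [0.70, 0.10, 0.10, 0.05, 0.05],   # "h" wins
    [0.10, 0.60, 0.10, 0.10, 0.10],   # "e" wins
]
chars = [best_char(row) for row in prob_matrices]
```

The resulting characters are then arranged in the order of their probability matrices, as step S912 describes.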
Step S912: arrange the obtained characters (the above characters to be arranged) according to the arrangement order of the multiple probability matrices;

The multiple probability matrices output by the text recognition model are usually ordered according to the positions, within the feature map, of the sub-feature maps they correspond to, so the order of the probability matrices is usually consistent with the order of the characters contained in those sub-feature maps. Based on this, arranging the obtained characters according to the order of the probability matrices makes the arranged characters consistent with the character order of the original text line, so the text content of the text region can be determined from the arranged characters.
Step S914: determine the text content in the text region according to the arranged characters.

For example, the arranged characters can be taken directly as the text content of the text region. However, since the characters in a text have different font sizes, the text recognition model may not split the feature map exactly one character per sub-feature map, so the final arranged characters may contain duplicates. To further optimize the recognition effect, repeated characters and empty characters can be deleted from the arranged characters according to preset rules to obtain the text content of the text region.
Specifically, a lexicon of reduplicated words can be built in advance. If the arranged characters contain a repeated character, the lexicon is consulted to check whether that repetition is legitimate; if it is not found there, the repeated character is deleted, keeping only one of the repeated characters. The semantics of the surrounding characters can also be used to judge whether the current context should contain a repeated character. For empty characters, the current context can likewise be used to decide whether to delete them: if an empty character lies between two English words, it need not be deleted and can be kept. For example, if the arranged characters are "--hh-e-l-ll-oo-", where "-" represents the empty character, the text content obtained after deleting repeated characters and empty characters is "hello".
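The worked example above can be sketched as a simple collapse step (merge consecutive duplicate characters, then drop empty characters); the fuller rule described in the text additionally consults the reduplicated-word lexicon and the surrounding context:

```python
def collapse(chars, blank="-"):
    # Merge consecutive duplicate characters, then delete empty characters.
    merged = []
    prev = None
    for c in chars:
        if c != prev:
            merged.append(c)
        prev = c
    return "".join(c for c in merged if c != blank)

text = collapse("--hh-e-l-ll-oo-")  # -> "hello"
```

Note that a repeated character separated by an empty character (such as the two "l"s in the example) survives the merge, which is what lets genuine double letters through.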
In the above manner, the obtained text region is first normalized; the recognition result of the text region is then obtained through the text recognition model; the recognized characters are determined through the probability matrices in the recognition result; and the text content of the text region is thereby obtained. Since the text recognition model can automatically segment the feature map of an image, in this manner an image containing a text line can be input to obtain its recognition result and hence its text content. There is no need to segment the text line first: the text content of the text line is obtained directly, the operation is convenient, the computation is fast, and the recognition accuracy is high.
基于上述实施例提供的文本内容确定方法,本申请实施例还提供另一种文本内容确定方法,该方法在上述方法的基础上实现;该方法重点描述得到文本区域的文本内容后,基于该文本内容判断图像中是否包含敏感词的过程。Based on the method for determining text content provided by the foregoing embodiment, the embodiment of the present application also provides another method for determining text content, which is implemented on the basis of the foregoing method; the method focuses on obtaining the text content of the text area and then based on the text The content determines whether the image contains sensitive words.
通常，需要预先建立一个敏感词库，通过该敏感词库确定图像对应的文本内容中是否包含有敏感信息；该敏感词库中包含有敏感词，如涉及色情、反动、恐怖主义的敏感词；可以逐一将文本内容中的词语与该敏感词库进行匹配，如果匹配成功，则说明当前词语为敏感词。基于此，本实施例的文本内容确定方法包括如下步骤，如图10所示：Usually, a sensitive-word lexicon needs to be established in advance, and this lexicon is used to determine whether the text content corresponding to the image contains sensitive information. The lexicon contains sensitive words, such as words involving pornography, subversion, or terrorism. The words in the text content can be matched against the lexicon one by one; if a match succeeds, the current word is a sensitive word. Based on this, the method for determining text content in this embodiment includes the following steps, as shown in FIG. 10:
步骤S1002,通过上述文本区域确定方法,获取图像中的文本区域;Step S1002: Obtain the text area in the image by the above-mentioned method for determining the text area;
步骤S1004,按照预设尺寸,对文本区域进行归一化处理。Step S1004: Normalize the text area according to the preset size.
该预设尺寸可以包含预设的长度和宽度，如果文本区域不满足该预设尺寸，可以对该文本区域进行缩放处理，也可以通过对该文本区域进行剪切或填补空白区域的方式，使处理后的文本区域满足上述预设尺寸。The preset size may include a preset length and width. If the text area does not satisfy the preset size, the text area can be scaled, or it can be cropped or padded with blank regions, so that the processed text area satisfies the preset size.
步骤S1006,将处理后的文本区域输入至预先训练完成的文本识别模型,输出文本区域的识别结果;该文本区域的识别结果包括文本区域对应的多个概率矩阵;Step S1006, input the processed text area into the pre-trained text recognition model, and output the recognition result of the text area; the recognition result of the text area includes multiple probability matrices corresponding to the text area;
步骤S1008,确定每个概率矩阵中的最大概率值的位置;Step S1008: Determine the position of the maximum probability value in each probability matrix;
步骤S1010,从预先设置的概率矩阵中各个位置与字符的对应关系中,获取最大概率值的位置对应的字符;Step S1010, obtaining the character corresponding to the position with the maximum probability value from the correspondence between each position and the character in the preset probability matrix;
步骤S1012,按照多个概率矩阵的排列顺序,排列获取到的字符;Step S1012: Arrange the acquired characters according to the arrangement order of the multiple probability matrices;
步骤S1014,根据排列后的字符确定文本区域中的文本内容。Step S1014: Determine the text content in the text area according to the arranged characters.
步骤S1016,如果图像中包含有多个文本区域,获取每个文本区域中的文本内容;Step S1016, if the image contains multiple text areas, obtain the text content in each text area;
步骤S1018,对获取到的文本内容进行分词操作;Step S1018, perform word segmentation operation on the obtained text content;
分词操作也可以称为切词操作；举例来说，可以建立一个词库，基于该词库进行分词操作；具体而言，可以从文本内容中的第一个字符开始，将该第一个字符和第二个字符作为一个组合，从词库中查找，如果找不到包含该组合对应的词，则将第一个字符划分为一个单独的词；如果可以找到包含该组合对应的词，再将第三个字符加入至该组合中，继续从词库中查找；直至找不到包含该组合对应的词，将该组合中除最后一个字符以外的字符划分为一个词，依此类推，直至完成文本内容的切词操作。The word segmentation operation can also be called a word-cutting operation. For example, a lexicon can be established and segmentation performed based on it. Specifically, starting from the first character of the text content, the first and second characters are taken as a combination and looked up in the lexicon. If no word containing the combination is found, the first character is split off as a single word; if a word containing the combination can be found, the third character is added to the combination and the lexicon is searched again, and so on, until no word containing the combination is found, at which point the characters of the combination except the last one are split off as a word. This continues until the segmentation of the text content is complete.
步骤S1020,逐一将分词操作后得到的分词与预先建立的敏感词库进行匹配;Step S1020, matching the word segmentation obtained after the word segmentation operation with the pre-established sensitive vocabulary one by one;
步骤S1022,如果至少一个分词匹配成功,确定图像对应的文本内容中包含有敏感信息。Step S1022, if at least one word segmentation is successfully matched, it is determined that the text content corresponding to the image contains sensitive information.
步骤S1024,获取匹配成功的分词所属的文本区域,在图像中标识出获取到的文本区域,或者匹配成功的分词。Step S1024: Obtain the text area to which the successfully matched word segment belongs, and identify the acquired text area or the successfully matched word segment in the image.
在实际实现时，可以以标识框的方式标识获取到的文本区域，或者匹配成功的分词；如果是在视频播放或实时直播场景下的实时检测，可以使用马赛克或模糊化的方式标识获取到的文本区域，或者匹配成功的分词，以达到过滤敏感词的目的。In actual implementation, the acquired text area, or the successfully matched word, can be marked with an identification box; for real-time detection in video playback or live-streaming scenarios, the acquired text area or the successfully matched word can instead be marked with a mosaic or blurring, so as to filter out the sensitive words.
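Steps S1018 to S1022 above can be sketched as follows. The greedy lexicon-driven segmentation and the exact-match lookup against the sensitive-word lexicon are simplifying assumptions for illustration; the patent does not prescribe a particular segmentation algorithm or lexicon format.

```python
def segment(text, lexicon):
    """Greedy forward matching per the description above: grow a combination
    while some lexicon word still starts with it; emit the longest prefix
    that is itself a word, or a single character if nothing matches."""
    words, i = [], 0
    while i < len(text):
        best, j = i + 1, i + 1
        while j <= len(text) and any(w.startswith(text[i:j]) for w in lexicon):
            if text[i:j] in lexicon:
                best = j
            j += 1
        words.append(text[i:best])
        i = best
    return words

def matched_sensitive(words, sensitive_lexicon):
    """Match each segmented word against the sensitive-word lexicon;
    a non-empty result means the text contains sensitive information."""
    return [w for w in words if w in sensitive_lexicon]

lexicon = {"free", "freedom"}
print(segment("freedoms", lexicon))                      # ['freedom', 's']
print(matched_sensitive(["hello", "spam"], {"spam"}))    # ['spam']
```

A production system would use a trie or an Aho–Corasick automaton instead of the linear lexicon scan shown here, but the control flow is the same.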
上述方式中，获取到文本区域的文本内容后，再通过敏感词库从文本内容中识别敏感词，以实现言论监管的目的；该方式可以实时获取内容并识别敏感词，有利于实现在网络直播、视频直播等场景下的言论监管，并限制敏感词传播的目的。In the above method, after the text content of the text area is obtained, sensitive words are identified in the text content through the sensitive-word lexicon, so as to achieve speech supervision. This method can obtain content and identify sensitive words in real time, which facilitates speech supervision in scenarios such as live streaming and live video, and restricts the spread of sensitive words.
需要说明的是,上述各方法实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。It should be noted that the foregoing method embodiments are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments can be referred to each other.
对应于上述方法实施例,参见图11所示的一种文本检测模型训练装置的结构示意图,该装置包括:Corresponding to the foregoing method embodiment, refer to the schematic structural diagram of a text detection model training device shown in FIG. 11, which includes:
训练图像确定模块110,设置为确定目标训练图像;The training image determining module 110 is configured to determine the target training image;
训练图像输入模块111,设置为将目标训练图像输入至第一初始模型;第一初始模型包括第一特征提取网络、特征融合网络和第一输出网络;The training image input module 111 is configured to input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network;
特征提取模块112,设置为通过第一特征提取网络提取目标训练图像的多个初始特征图;多个初始特征图之间的尺度不同;The feature extraction module 112 is configured to extract multiple initial feature maps of the target training image through the first feature extraction network; the multiple initial feature maps have different scales;
特征融合模块113,设置为通过特征融合网络对多个初始特征图进行融合处理,得到融合特征图;The feature fusion module 113 is configured to perform fusion processing on multiple initial feature maps through a feature fusion network to obtain a fusion feature map;
输出模块114,设置为将融合特征图输入至第一输出网络,输出目标训练图像中文本区域的候选区域以及每个候选区域的概率值;The output module 114 is configured to input the fusion feature map to the first output network, and output the candidate regions of the text region in the target training image and the probability value of each candidate region;
损失值确定和训练模块115，设置为通过预设的检测损失函数确定候选区域以及每个候选区域的概率值的第一损失值；根据第一损失值对第一初始模型进行训练，直至第一初始模型中的参数收敛，得到文本检测模型。The loss value determination and training module 115 is configured to determine a first loss value for the candidate regions and the probability value of each candidate region through a preset detection loss function, and to train the first initial model according to the first loss value until the parameters of the first initial model converge, obtaining the text detection model.
本申请实施例提供的文本检测模型训练装置，首先提取目标训练图像的尺度相互不同的多个初始特征图；再对多个初始特征图进行融合处理，得到融合特征图；进而将融合特征图输入至第一输出网络，输出目标训练图像中文本区域的候选区域以及每个候选区域的概率值；通过预设的检测损失函数确定第一损失值后，根据该第一损失值对第一初始模型进行训练，得到检测模型。该方式中，特征提取网络可以自动提取不同尺度的特征，因而该文本检测模型，只需要输入一张图像即可得到该图像中各种尺度的文本区域的候选区域，无需再人工变换图像尺度，操作便捷，尤其在多种字号、多种字体、多种形状、多种方向场景下，可以快速全面准确地检测出图像中的各类文本，进而也有利于后续文本识别的准确性，提高了文本识别的效果。The text detection model training device provided by the embodiment of the present application first extracts multiple initial feature maps of different scales from the target training image; it then fuses the multiple initial feature maps to obtain a fused feature map; the fused feature map is input to the first output network, which outputs the candidate regions of the text area in the target training image and the probability value of each candidate region; after the first loss value is determined by the preset detection loss function, the first initial model is trained according to that loss value to obtain the detection model. In this method, the feature extraction network can automatically extract features of different scales, so the text detection model only needs a single input image to obtain candidate regions for text areas of various scales in that image, without manually rescaling the image. The operation is convenient, and especially in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which also benefits the accuracy of subsequent text recognition and improves the recognition effect.
在一些实施例中,上述第一特征提取网络包括依次连接的多组第一卷积网络;每组第一卷积网络包括依次连接的卷积层、批归一化层和激活函数层。In some embodiments, the aforementioned first feature extraction network includes multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks includes a convolution layer, batch normalization layer, and activation function layer connected in sequence.
在一些实施例中，上述特征融合模块还设置为：根据初始特征图的尺度，将多个初始特征图依次排列；其中，最顶层级的初始特征图的尺度最小；最底层级的初始特征图的尺度最大；按照排列顺序，依次针对所述最顶层级以下的每一层级，将该层级的初始特征图和该层级的上一层级的融合结果进行融合，得到该层级的融合结果；其中，所述最顶层级的融合结果为所述最顶层级的初始特征图；将最低层级的融合结果确定为所述初始特征图的融合特征图。In some embodiments, the feature fusion module is further configured to: arrange the multiple initial feature maps in order according to their scales, where the top-level initial feature map has the smallest scale and the bottom-level initial feature map has the largest scale; following the arrangement order, for each level below the top level, fuse the initial feature map of that level with the fusion result of the level above it to obtain the fusion result of that level, where the fusion result of the top level is its initial feature map; and determine the fusion result of the lowest level as the fused feature map of the initial feature maps.
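A minimal sketch of the top-down fusion described above, under the illustrative assumptions of single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling between levels, and element-wise addition as the fusion operation (the patent does not fix any of these choices):

```python
def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2D feature map (a list of rows):
    duplicate every value horizontally and every row vertically."""
    return [[v for v in row for _ in (0, 1)] for row in fm for _ in (0, 1)]

def fuse_pyramid(pyramid):
    """pyramid[0] is the top (smallest-scale) map; each lower level is fused
    with the upsampled fusion result of the level above it, and the
    bottom-level result is taken as the fused feature map."""
    fused = pyramid[0]                      # top-level fusion result
    for fm in pyramid[1:]:
        up = upsample2x(fused)
        fused = [[a + b for a, b in zip(r_fm, r_up)]
                 for r_fm, r_up in zip(fm, up)]
    return fused

print(fuse_pyramid([[[1]], [[1, 1], [1, 1]]]))  # [[2, 2], [2, 2]]
```

Each level's scale is assumed to be exactly twice that of the level above it, so the upsampled result aligns element-for-element with the next initial feature map.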
在一些实施例中，上述第一输出网络包括第一卷积层和第二卷积层；上述输出模块还设置为：将融合特征图分别输入至第一卷积层和第二卷积层；通过第一卷积层对融合特征图进行第一卷积运算，输出坐标矩阵；坐标矩阵包括目标训练图像中文本区域的候选区域的顶点坐标；通过第二卷积层对融合特征图进行第二卷积运算，输出概率矩阵；概率矩阵包括每个候选区域的概率值。In some embodiments, the first output network includes a first convolutional layer and a second convolutional layer, and the output module is further configured to: input the fused feature map into the first convolutional layer and the second convolutional layer respectively; perform a first convolution operation on the fused feature map through the first convolutional layer and output a coordinate matrix, where the coordinate matrix includes the vertex coordinates of the candidate regions of the text area in the target training image; and perform a second convolution operation on the fused feature map through the second convolutional layer and output a probability matrix, where the probability matrix includes the probability value of each candidate region.
在一些实施例中，上述检测损失函数包括第一函数和第二函数；第一函数为L1 = |G* − G|；其中，G*为预先标注的目标训练图像中文本区域的坐标矩阵；G为第一输出网络输出的目标训练图像中文本区域的候选区域的坐标矩阵；第二函数为L2 = −Y*·log(Y) − (1 − Y*)·log(1 − Y)；其中，Y*为预先标注的目标训练图像中文本区域的概率矩阵；Y为第一输出网络输出的目标训练图像中文本区域的候选区域的概率矩阵；候选区域以及每个候选区域的概率值的第一损失值L = L1 + L2。In some embodiments, the detection loss function includes a first function and a second function. The first function is L1 = |G* − G|, where G* is the pre-labeled coordinate matrix of the text area in the target training image and G is the coordinate matrix of the candidate regions of the text area output by the first output network. The second function is L2 = −Y*·log(Y) − (1 − Y*)·log(1 − Y), where Y* is the pre-labeled probability matrix of the text area in the target training image and Y is the probability matrix of the candidate regions output by the first output network. The first loss value of the candidate regions and their probability values is L = L1 + L2.
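A numerical sketch of the loss above, treating G*, G, Y*, Y as flat lists and summing element-wise; the reduction (sum vs. mean) is an assumption the patent leaves unspecified, and the predicted probabilities must lie strictly inside (0, 1) for the logarithms to be defined:

```python
import math

def detection_loss(g_star, g, y_star, y):
    """L = L1 + L2, with L1 = |G* - G| (absolute coordinate error) and
    L2 = -Y*log(Y) - (1 - Y*)log(1 - Y) (binary cross-entropy)."""
    l1 = sum(abs(a - b) for a, b in zip(g_star, g))
    l2 = sum(-ys * math.log(p) - (1.0 - ys) * math.log(1.0 - p)
             for ys, p in zip(y_star, y))
    return l1 + l2
```

For example, with a perfect coordinate prediction and a predicted probability of 0.5 for a positive label, the loss reduces to the cross-entropy term log 2.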
在一些实施例中，上述损失值确定和训练模块还设置为：根据第一损失值更新第一初始模型中的参数；判断更新后的参数是否均收敛；如果更新后的参数均收敛，将参数更新后的第一初始模型确定为检测模型；如果更新后的参数没有均收敛，继续执行基于预设的训练集合确定目标训练图像的步骤，直至更新后的参数均收敛。In some embodiments, the loss value determination and training module is further configured to: update the parameters in the first initial model according to the first loss value; determine whether the updated parameters have all converged; if so, determine the first initial model with the updated parameters as the detection model; if not, continue to perform the step of determining a target training image based on the preset training set until the updated parameters all converge.
在一些实施例中，上述损失值确定和训练模块还设置为：按照预设规则，从第一初始模型确定待更新参数；计算第一损失值对第一初始模型中待更新参数的导数 ∂L/∂W，其中，L为第一损失值，W为待更新参数；更新待更新参数，得到更新后的待更新参数 W ← W − α·∂L/∂W，其中，α为预设系数。In some embodiments, the loss value determination and training module is further configured to: determine the parameter to be updated from the first initial model according to a preset rule; compute the derivative ∂L/∂W of the first loss value with respect to the parameter to be updated, where L is the first loss value and W is the parameter to be updated; and update the parameter to obtain the updated parameter W ← W − α·∂L/∂W, where α is a preset coefficient.
参见图12所示的一种文本区域确定装置的结构示意图;该装置包括:See FIG. 12 for a schematic structural diagram of a text area determining device; the device includes:
图像获取模块120,设置为获取待检测图像;The image acquisition module 120 is configured to acquire the image to be detected;
检测模块122，设置为将待检测图像输入至预先训练完成的文本检测模型，输出待检测图像中文本区域的多个候选区域，以及每个候选区域的概率值；文本检测模型通过上述文本检测模型的训练方法训练得到；The detection module 122 is configured to input the image to be detected into the pre-trained text detection model and output multiple candidate regions of the text area in the image to be detected, together with the probability value of each candidate region; the text detection model is obtained by training with the above text detection model training method.
文本区域确定模块124,设置为根据候选区域的概率值以及多个候选区域之间的重叠程度,从多个候选区域中确定待检测图像中的文本区域。The text area determination module 124 is configured to determine the text area in the image to be detected from the multiple candidate areas according to the probability value of the candidate area and the degree of overlap between the multiple candidate areas.
本申请实施例提供的上述文本区域确定装置，将获取到的待检测图像输入至文本检测模型，输出待检测图像中文本区域的多个候选区域以及每个候选区域的概率值；进而根据候选区域的概率值以及多个候选区域之间的重叠程度，从多个候选区域中确定待检测图像中的文本区域。该方式中，文本检测模型可以自动提取不同尺度的特征，因而只需要输入一张图像至该模型即可得到该图像中各种尺度的文本区域的候选区域，无需再人工变换图像尺度，操作便捷，尤其在多种字号、多种字体、多种形状、多种方向场景下，可以快速全面准确地检测出图像中的各类文本，进而也有利于后续文本识别的准确性，提高了文本识别的效果。In the text area determination device provided by the embodiment of the present application, the acquired image to be detected is input into the text detection model, which outputs multiple candidate regions of the text area in the image and the probability value of each candidate region; the text area in the image is then determined from the multiple candidate regions according to the probability values and the degree of overlap between them. In this method, the text detection model can automatically extract features of different scales, so only one image needs to be input to obtain candidate regions of text areas of various scales, without manually rescaling the image. The operation is convenient, and especially in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which also benefits the accuracy of subsequent text recognition and improves the recognition effect.
在一些实施例中，上述文本区域确定模块还设置为：根据候选区域的概率值，将多个候选区域依次排列；其中，第一个候选区域的概率值最大，最后一个候选区域的概率值最小；按照排列顺序，依次针对每个候选区域，逐一计算该候选区域与除该候选区域以外的候选区域的重叠程度；将除该候选区域以外的候选区域中，所述重叠程度大于预设的重叠阈值的候选区域剔除；将剔除后的剩余的候选区域确定为待检测图像中的文本区域。In some embodiments, the text area determination module is further configured to: arrange the multiple candidate regions in order according to their probability values, with the first candidate region having the largest probability value and the last the smallest; following this order, compute for each candidate region its degree of overlap with each of the other candidate regions one by one; remove, from the other candidate regions, those whose degree of overlap exceeds the preset overlap threshold; and determine the remaining candidate regions as the text areas in the image to be detected.
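The candidate-filtering procedure above is essentially non-maximum suppression. A sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) and intersection-over-union as the overlap measure; the patent's candidate regions are given by vertex coordinates and may be rotated, so a real implementation would compute polygon overlap instead:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, thr=0.5):
    """Keep the highest-scoring box, drop the others that overlap it beyond
    thr, and repeat down the score-sorted list; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box overlaps the first with IoU 0.81, so it is removed; the third box is disjoint and survives.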
在一些实施例中,上述装置还包括:区域剔除模块,设置为将多个候选区域中,概率值低于预设的概率阈值的候选区域剔除,得到最终的多个候选区域。In some embodiments, the above-mentioned device further includes: a region elimination module, configured to eliminate candidate regions whose probability value is lower than a preset probability threshold among the multiple candidate regions to obtain the final multiple candidate regions.
参见图13所示的一种文本内容确定装置的结构示意图;该装置包括:See FIG. 13 for a schematic structural diagram of a text content determination device; the device includes:
区域获取模块130,设置为通过上述任一种文本区域确定方法,获取图像中的文本区域;The area obtaining module 130 is configured to obtain the text area in the image by using any of the foregoing text area determination methods;
识别模块132,设置为将文本区域输入至预先训练完成的文本识别模型,输出文本区域的识别结果;The recognition module 132 is configured to input the text area into the pre-trained text recognition model, and output the recognition result of the text area;
文本内容确定模块134,设置为根据识别结果确定文本区域中的文本内容。The text content determination module 134 is configured to determine the text content in the text area according to the recognition result.
本申请实施例提供的文本内容确定装置，首先通过上述文本区域确定方法获取图像中的文本区域；再将该文本区域输入至预先训练完成的文本识别模型，输出文本区域的识别结果；最后根据该识别结果确定文本区域中的文本信息。该方式中，由于上述文本区域确定方法可以通过文本检测模型获取到各种尺度的文本区域，在多种字号、多种字体、多种形状、多种方向场景下，可以快速全面准确地检测出图像中的各类文本，进而也有利于文本识别的准确性，提高了文本识别的效果。The text content determination device provided by the embodiment of the present application first obtains the text area in the image through the above text area determination method; it then inputs the text area into the pre-trained text recognition model and outputs the recognition result of the text area; finally, it determines the text content in the text area according to the recognition result. In this method, since the text area determination method can obtain text areas of various scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, fonts, shapes, and orientations, which also benefits the accuracy of text recognition and improves the recognition effect.
在一些实施例中,上述装置还包括:归一化模块,设置为按照预设尺寸,对文本区域进行归一化处理,得到处理后的文本区域;In some embodiments, the above-mentioned apparatus further includes: a normalization module, configured to perform normalization processing on the text area according to a preset size to obtain a processed text area;
识别模块132具体设置为:将所述处理后的文本区域输入至预先训练完成的识别模型。The recognition module 132 is specifically configured to input the processed text area into the pre-trained recognition model.
在一些实施例中，上述装置还包括文本识别模型训练模块，设置为使文本识别模型通过下述方式训练完成：确定目标训练文本图像；将目标训练文本图像输入至第二初始模型；第二初始模型包括第二特征提取网络、第二输出网络和分类函数；通过第二特征提取网络提取目标训练文本图像的特征图；通过第二初始模型将特征图拆分成至少一个子特征图；将子特征图分别输入至第二输出网络，输出每个子特征图对应的输出矩阵；将每个子特征图对应的输出矩阵分别输入至分类函数，输出每个子特征图对应的概率矩阵；通过预设的识别损失函数确定概率矩阵的第二损失值；根据第二损失值对第二初始模型进行训练，直至第二初始模型中的参数收敛，得到文本识别模型。In some embodiments, the device further includes a text recognition model training module configured to train the text recognition model as follows: determine a target training text image; input the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a second output network, and a classification function; extract the feature map of the target training text image through the second feature extraction network; split the feature map into at least one sub-feature map through the second initial model; input the sub-feature maps into the second output network to output the output matrix corresponding to each sub-feature map; input each output matrix into the classification function to output the probability matrix corresponding to each sub-feature map; determine the second loss value of the probability matrices through a preset recognition loss function; and train the second initial model according to the second loss value until the parameters of the second initial model converge, obtaining the text recognition model.
在一些实施例中,上述第二特征提取网络包括依次连接的多组第二卷积网络;每组第二卷积网络包括依次连接的卷积层、池化层和激活函数层。In some embodiments, the above-mentioned second feature extraction network includes multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks includes a convolution layer, a pooling layer, and an activation function layer connected in sequence.
在一些实施例中,上述文本识别模型训练模块还设置为:沿着特征图的列方向,将特征图拆分成至少一个子特征图;特征图的列方向为文本行方向的垂直方向。In some embodiments, the text recognition model training module described above is further configured to split the feature map into at least one sub-feature map along the column direction of the feature map; the column direction of the feature map is the vertical direction of the text row direction.
在一些实施例中，上述第二输出网络包括多个全连接层；全连接层的数量与子特征图的数量对应；识别模型训练模块还设置为：将每个子特征图分别输入至对应的全连接层中，得到每个全连接层分别输出的子特征图对应的输出矩阵。In some embodiments, the second output network includes multiple fully connected layers, the number of which corresponds to the number of sub-feature maps; the recognition model training module is further configured to input each sub-feature map into its corresponding fully connected layer to obtain the output matrix corresponding to the sub-feature map output by each fully connected layer.
在一些实施例中，上述分类函数包括Softmax函数；Softmax函数为 p_t(i) = e^{o_t(i)} / Σ_{m=1…K+1} e^{o_t(m)}；其中，e表示自然常数；t表示第t个概率矩阵；K表示所述训练集合的目标训练文本图像所包含的不同字符的个数；m从1取到K+1；Σ表示求和运算；o_t(i)为所述输出矩阵中的第i个元素；p_t(i)为所述概率矩阵p_t中的第i个元素。In some embodiments, the classification function includes a Softmax function, given by p_t(i) = e^{o_t(i)} / Σ_{m=1…K+1} e^{o_t(m)}, where e is the natural constant, t denotes the t-th probability matrix, K is the number of distinct characters contained in the target training text images of the training set, m ranges from 1 to K+1, Σ denotes summation, o_t(i) is the i-th element of the output matrix, and p_t(i) is the i-th element of the probability matrix p_t.
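The Softmax function above can be sketched numerically as follows; subtracting the maximum before exponentiation is a standard numerical-stability trick and not part of the formula itself:

```python
import math

def softmax(o_t):
    """p_t(i) = exp(o_t(i)) / sum_{m=1..K+1} exp(o_t(m)) for an output
    vector o_t of length K+1 (K distinct characters plus one extra class)."""
    mx = max(o_t)                         # stability shift, cancels in the ratio
    exps = [math.exp(v - mx) for v in o_t]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([0.0, 0.0]))  # [0.5, 0.5]
```

The output is a valid probability distribution: all entries are positive and sum to 1, with the largest input mapped to the largest probability.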
在一些实施例中，上述识别损失函数包括 L = −log p(y | {p_t}_{t=1…T})；其中，y为预先标注的所述目标训练文本图像的概率矩阵；t表示第t个概率矩阵；p_t为所述分类函数输出的每个所述子特征图对应的概率矩阵；T为所述概率矩阵的总数量；p表示计算概率；log表示对数运算。In some embodiments, the recognition loss function includes L = −log p(y | {p_t}_{t=1…T}), where y is the pre-labeled probability matrix of the target training text image, t denotes the t-th probability matrix, p_t is the probability matrix corresponding to each sub-feature map output by the classification function, T is the total number of probability matrices, p denotes computing a probability, and log denotes the logarithm operation.
在一些实施例中，上述识别模型训练模块还设置为：根据第二损失值更新第二初始模型中的参数；判断更新后的各个参数是否均收敛；如果更新后的各个参数均收敛，将参数更新后的第二初始模型确定为文本识别模型；如果更新后的各个参数没有均收敛，继续执行基于预设的训练集合确定目标训练文本图像的步骤，直至更新后的各个参数均收敛。In some embodiments, the recognition model training module is further configured to: update the parameters in the second initial model according to the second loss value; determine whether the updated parameters have all converged; if so, determine the second initial model with the updated parameters as the text recognition model; if not, continue to perform the step of determining a target training text image based on the preset training set until the updated parameters all converge.
在一些实施例中，上述识别模型训练模块还设置为：按照预设规则，从第二初始模型确定待更新参数；计算第二损失值对待更新参数的导数 ∂L′/∂W′，其中，L′为概率矩阵的损失值，W′为待更新参数；更新待更新参数，得到更新后的待更新参数 W′ ← W′ − α′·∂L′/∂W′，其中，α′为预设系数。In some embodiments, the recognition model training module is further configured to: determine the parameter to be updated from the second initial model according to a preset rule; compute the derivative ∂L′/∂W′ of the second loss value with respect to the parameter to be updated, where L′ is the loss value of the probability matrices and W′ is the parameter to be updated; and update the parameter to obtain the updated parameter W′ ← W′ − α′·∂L′/∂W′, where α′ is a preset coefficient.
在一些实施例中，上述文本区域的识别结果包括文本区域对应的多个概率矩阵；文本内容确定模块还设置为：确定每个概率矩阵中的最大概率值的位置；从预先设置的概率矩阵中各个位置与字符的对应关系中，获取最大概率值的位置对应的字符，作为待排列字符；按照多个概率矩阵的排列顺序，排列所述待排列字符，得到排列后的字符；根据排列后的字符确定文本区域中的文本内容。In some embodiments, the recognition result of the text area includes multiple probability matrices corresponding to the text area, and the text content determination module is further configured to: determine the position of the maximum probability value in each probability matrix; obtain, from the preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value as a character to be arranged; arrange the characters to be arranged according to the arrangement order of the multiple probability matrices to obtain the arranged characters; and determine the text content in the text area according to the arranged characters.
在一些实施例中,上述文本内容确定模块还设置为:按照预设规则,删除排列后的字符中的重复字符和空字符,得到文本区域中的文本内容。In some embodiments, the above-mentioned text content determination module is further configured to delete repeated characters and empty characters in the arranged characters according to a preset rule to obtain the text content in the text area.
在一些实施例中,上述装置还包括:敏感信息确定模块,设置为通过预先建立的敏感词库确定文本内容中是否包含有敏感信息。In some embodiments, the above-mentioned apparatus further includes: a sensitive information determining module configured to determine whether the text content contains sensitive information through a pre-established sensitive vocabulary.
在一些实施例中，上述敏感信息确定模块还设置为：对获取到的文本内容进行分词操作；逐一将分词操作后得到的分词与预先建立的敏感词库进行匹配；如果至少一个分词匹配成功，确定文本内容中包含有敏感信息。In some embodiments, the sensitive information determination module is further configured to: perform a word segmentation operation on the acquired text content; match the words obtained by the segmentation operation against the pre-established sensitive-word lexicon one by one; and if at least one word is successfully matched, determine that the text content contains sensitive information.
在一些实施例中,上述装置还包括:区域标识模块,设置为确定匹配成功的分词所属的文本区域,作为待标识区域;在所述图像中标识出所述待标识区域。In some embodiments, the above-mentioned apparatus further includes: an area identification module configured to determine a text area to which the successfully matched word segment belongs as the area to be identified; and identify the area to be identified in the image.
本申请实施例所提供的装置,其实现原理及产生的技术效果和前述方法实施例相同,为简要描述,装置实施例部分未提及之处,可参考前述方法实施例中相应内容。The implementation principles and technical effects of the device provided in the embodiment of the application are the same as those of the foregoing method embodiment. For a brief description, for the parts not mentioned in the device embodiment, please refer to the corresponding content in the foregoing method embodiment.
本申请实施例还提供了一种电子设备，参见图14所示，该电子设备包括存储器100和处理器101，其中，存储器100设置为存储一条或多条计算机指令，一条或多条计算机指令被处理器101执行，以实现上述文本检测模型训练方法，文本区域确定方法，或者文本内容确定方法的步骤。An embodiment of the present application also provides an electronic device. As shown in FIG. 14, the electronic device includes a memory 100 and a processor 101, where the memory 100 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 101 to implement the steps of the above text detection model training method, text area determination method, or text content determination method.
进一步地,图14所示的电子设备还包括总线102和通信接口103,处理器101、通信接口103和存储器100通过总线102连接。Further, the electronic device shown in FIG. 14 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
其中，存储器100可能包含高速随机存取存储器（RAM，Random Access Memory），也可能还包括非不稳定的存储器（non-volatile memory），例如至少一个磁盘存储器。通过至少一个通信接口103（可以是有线或者无线）实现该系统网元与至少一个其他网元之间的通信连接，可以使用互联网，广域网，本地网，城域网等。总线102可以是ISA总线、PCI总线或EISA总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示，图14中仅用一个双向箭头表示，但并不表示仅有一根总线或一种类型的总线。The memory 100 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one bidirectional arrow is shown in FIG. 14, but this does not mean that there is only one bus or one type of bus.
处理器101可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器101中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器101可以是通用处理器,包括中央处理器(Central Processing Unit,简称CPU)、网络处理器(Network Processor,简称NP)等;还可以是数字信号处理器(Digital Signal Processing,简称DSP)、专用集成电路(Application Specific Integrated Circuit,简称ASIC)、现成可编程门阵列(Field-Programmable Gate Array,简称FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器100,处理器101读取存储器100中的信息,结合其硬件完成前述实施例的方法的步骤。The processor 101 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 101 or instructions in the form of software. The aforementioned processor 101 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processing, DSP for short). ), Application Specific Integrated Circuit (ASIC for short), Field-Programmable Gate Array (FPGA for short) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components. The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor. 
The software modules may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 100; the processor 101 reads the information in the memory 100 and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
An embodiment of the present application further provides a machine-readable storage medium storing machine-executable instructions. When called and executed by a processor, the machine-executable instructions cause the processor to implement the steps of the above text detection model training method, text region determination method, or text content determination method. For the specific implementation, reference may be made to the method embodiments, which will not be repeated here.
The computer program products of the text detection model training method, the text region and text content determination methods, the apparatuses, and the electronic device provided by the embodiments of the present application include a computer-readable storage medium storing program code. The instructions included in the program code may be configured to execute the methods described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which will not be repeated here.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes beyond the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application provides executable program code configured to be run to execute the steps of the above text detection model training method, text region determination method, or text content determination method.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (53)

  1. A text detection model training method, comprising:
    determining a target training image;
    inputting the target training image into a first initial model, the first initial model comprising a first feature extraction network, a feature fusion network, and a first output network;
    extracting a plurality of initial feature maps of the target training image through the first feature extraction network, the plurality of initial feature maps differing from one another in scale;
    performing fusion processing on the plurality of initial feature maps through the feature fusion network to obtain a fused feature map;
    inputting the fused feature map into the first output network, and outputting candidate regions of a text region in the target training image and a probability value of each candidate region; and
    determining, through a preset detection loss function, a first loss value of the candidate regions and of the probability value of each candidate region, and training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model.
  2. The method according to claim 1, wherein the first feature extraction network comprises multiple groups of first convolutional networks connected in sequence, each group of first convolutional networks comprising a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.
  3. The method according to claim 1, wherein the step of performing fusion processing on the plurality of initial feature maps through the feature fusion network to obtain a fused feature map comprises:
    arranging the plurality of initial feature maps in sequence according to their scales, wherein the initial feature map at the topmost level has the smallest scale and the initial feature map at the bottommost level has the largest scale;
    for each level below the topmost level in turn, according to the arrangement order, fusing the initial feature map of that level with the fusion result of the level above it to obtain the fusion result of that level, wherein the fusion result of the topmost level is the initial feature map of the topmost level; and
    determining the fusion result of the lowest level as the fused feature map of the initial feature maps.
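The top-down fusion described in claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: feature maps are plain 2D lists, each level is assumed to be exactly twice the scale of the level above it, the upsampling is nearest-neighbour, and the fusion operation is stood in for by element-wise addition.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise addition, standing in for the fusion operation."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse_top_down(initial_maps):
    """initial_maps: ordered from topmost (smallest scale) to bottommost (largest)."""
    fused = initial_maps[0]            # topmost level's fusion result is its own map
    for level in initial_maps[1:]:     # each level below the topmost, in order
        fused = add_maps(level, upsample2x(fused))
    return fused                       # lowest level's result is the fused feature map

top = [[1, 2], [3, 4]]                 # 2x2 map, smallest scale
bottom = [[1] * 4 for _ in range(4)]   # 4x4 map, largest scale
fused = fuse_top_down([top, bottom])
```

In a real network the fusion result would also pass through convolutions after each addition; the sketch keeps only the ordering and propagation logic of the claim.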
  4. The method according to claim 1, wherein the first output network comprises a first convolutional layer and a second convolutional layer; and
    the step of inputting the fused feature map into the first output network and outputting the candidate regions of the text region in the target training image and the probability value of each candidate region comprises:
    inputting the fused feature map into the first convolutional layer and the second convolutional layer respectively;
    performing a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, the coordinate matrix comprising vertex coordinates of the candidate regions of the text region in the target training image; and
    performing a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, the probability matrix comprising the probability value of each candidate region.
  5. The method according to claim 1, wherein the detection loss function comprises a first function and a second function;
    the first function is L1 = |G* − G|, where G* is the pre-annotated coordinate matrix of the text region in the target training image, and G is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network;
    the second function is L2 = −Y*·log Y − (1 − Y*)·log(1 − Y), where Y* is the pre-annotated probability matrix of the text region in the target training image, Y is the probability matrix of the candidate regions of the text region in the target training image output by the first output network, and log denotes the logarithm operation; and
    the first loss value of the candidate regions and of the probability value of each candidate region is L = L1 + L2.
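The detection loss of claim 5 combines an L1 regression term on coordinates with a cross-entropy term on probabilities. A minimal numeric sketch, assuming the matrices are flattened into plain lists and the matrix operations reduce to element-wise sums (function and variable names are ours, not from the patent):

```python
import math

def detection_loss(G_star, G, Y_star, Y):
    """G_star/G: annotated vs. predicted coordinates; Y_star/Y: annotated vs. predicted probabilities."""
    # First function: L1 = |G* - G|, summed element-wise
    L1 = sum(abs(gs - g) for gs, g in zip(G_star, G))
    # Second function: L2 = -Y*·log(Y) - (1 - Y*)·log(1 - Y), summed element-wise
    L2 = sum(-ys * math.log(y) - (1 - ys) * math.log(1 - y)
             for ys, y in zip(Y_star, Y))
    return L1 + L2          # L = L1 + L2

loss = detection_loss([10.0, 20.0], [9.0, 22.0], [1.0, 0.0], [0.9, 0.2])
```

Here L1 = 1 + 2 = 3 and L2 = −log 0.9 − log 0.8 ≈ 0.3285, so the combined loss is ≈ 3.3285.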
  6. The method according to claim 1, wherein the step of training the first initial model according to the first loss value until the parameters in the first initial model converge to obtain the text detection model comprises:
    updating the parameters in the first initial model according to the first loss value;
    judging whether the updated parameters have all converged;
    if the updated parameters have all converged, determining the first initial model with the updated parameters as the detection model; and
    if the updated parameters have not all converged, continuing to perform the step of determining a target training image until the updated parameters have all converged.
  7. The method according to claim 6, wherein the step of updating the parameters in the first initial model according to the first loss value comprises:
    determining, according to a preset rule, a parameter to be updated from the first initial model;
    calculating the derivative of the first loss value with respect to the parameter to be updated in the first initial model, ∂L/∂W, where L is the first loss value and W is the parameter to be updated; and
    updating the parameter to be updated to obtain the updated parameter W − α·(∂L/∂W), where α is a preset coefficient.
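The update rule in claim 7 is the standard gradient-descent step W ← W − α·∂L/∂W. A toy sketch on a single scalar parameter, with the derivative taken numerically purely for illustration (in practice it would come from backpropagation):

```python
def numerical_grad(loss_fn, w, eps=1e-6):
    """Central-difference approximation of dL/dW at w."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

def update(loss_fn, w, alpha):
    """One step of W <- W - alpha * dL/dW (claim 7's update rule)."""
    return w - alpha * numerical_grad(loss_fn, w)

loss = lambda w: (w - 3.0) ** 2   # toy loss with its minimum at w = 3
w = update(loss, 0.0, alpha=0.1)  # gradient at w=0 is -6, so w moves to 0.6
```

Repeating the step drives w toward the loss minimum, which is the convergence condition the claim checks for.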
  8. A text region determination method, comprising:
    acquiring an image to be detected;
    inputting the image to be detected into a pre-trained text detection model, and outputting a plurality of candidate regions of a text region in the image to be detected and a probability value of each candidate region, the text detection model being trained by the text detection model training method according to any one of claims 1-7; and
    determining the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap among the plurality of candidate regions.
  9. The method according to claim 8, wherein the step of determining the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap among the plurality of candidate regions comprises:
    arranging the plurality of candidate regions in sequence according to their probability values, wherein the first candidate region has the largest probability value and the last candidate region has the smallest probability value;
    for each candidate region in turn, according to the arrangement order, calculating one by one the degree of overlap between that candidate region and the candidate regions other than it, and eliminating, from the candidate regions other than it, those whose degree of overlap is greater than a preset overlap threshold; and
    determining the candidate regions remaining after the elimination as the text region in the image to be detected.
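The procedure in claim 9 follows the familiar non-maximum-suppression pattern. A minimal sketch under our own assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, and intersection-over-union serves as the "degree of overlap" (the claim itself does not fix the overlap measure):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def select_text_regions(candidates, overlap_thresh=0.5):
    """candidates: list of (box, prob). Keep high-probability boxes, eliminate
    any later box overlapping a kept one beyond the preset threshold."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept = []
    for box, prob in ordered:
        if all(iou(box, k) <= overlap_thresh for k, _ in kept):
            kept.append((box, prob))
    return kept

regions = select_text_regions([((0, 0, 10, 10), 0.9),
                               ((1, 1, 10, 10), 0.8),    # overlaps the first, eliminated
                               ((20, 20, 30, 30), 0.7)])
```

The second box overlaps the first with IoU 0.81 > 0.5 and is removed; the two remaining boxes are the determined text regions.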
  10. The method according to claim 9, wherein, before the step of arranging the plurality of candidate regions in sequence according to their probability values, the method further comprises:
    eliminating, from the plurality of candidate regions, candidate regions whose probability value is lower than a preset probability threshold.
  11. A text content determination method, comprising:
    acquiring a text region in an image through the text region determination method according to any one of claims 8-10;
    inputting the text region into a pre-trained text recognition model, and outputting a recognition result of the text region; and
    determining the text content in the text region according to the recognition result.
  12. The method according to claim 11, wherein the step of inputting the text region into the pre-trained recognition model comprises: normalizing the text region according to a preset size to obtain a processed text region; and inputting the processed text region into the pre-trained recognition model.
  13. The method according to claim 11, wherein the text recognition model is trained in the following manner:
    determining a target training text image;
    inputting the target training text image into a second initial model, the second initial model comprising a second feature extraction network, a feature splitting network, a second output network, and a classification function;
    extracting a feature map of the target training text image through the second feature extraction network;
    splitting the feature map into at least one sub feature map through the feature splitting network;
    inputting the sub feature maps into the second output network respectively, and outputting an output matrix corresponding to each sub feature map;
    inputting the output matrix corresponding to each sub feature map into the classification function respectively, and outputting a probability matrix corresponding to each sub feature map; and
    determining a second loss value of the probability matrices through a preset recognition loss function, and training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain a text recognition model.
  14. The method according to claim 13, wherein the second feature extraction network comprises multiple groups of second convolutional networks connected in sequence, each group of second convolutional networks comprising a convolutional layer, a pooling layer, and an activation function layer connected in sequence.
  15. The method according to claim 13, wherein the step of splitting the feature map into at least one sub feature map through the feature splitting network comprises:
    splitting the feature map into at least one sub feature map along the column direction of the feature map, the column direction of the feature map being perpendicular to the text line direction.
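The column-wise split in claim 15 can be pictured as slicing a feature map into vertical strips, one per time step of the recognizer. A minimal sketch, assuming a 2D list whose column count divides evenly into the number of sub-maps (the even split is our simplification):

```python
def split_columns(fmap, n):
    """Split a 2D feature map (rows x cols) into n column-wise sub feature maps."""
    cols = len(fmap[0])
    step = cols // n          # assumes cols is divisible by n, for illustration
    return [[row[i * step:(i + 1) * step] for row in fmap] for i in range(n)]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
subs = split_columns(fmap, 2)   # two sub-maps, each keeping all rows
```

Each sub-map spans the full height of the text line (the column direction is perpendicular to the line), so one sub-map roughly corresponds to one horizontal slice of characters.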
  16. The method according to claim 13, wherein the second output network comprises a plurality of fully connected layers, the number of fully connected layers corresponding to the number of sub feature maps; and
    the step of inputting the sub feature maps into the second output network respectively and outputting the output matrix corresponding to each sub feature map comprises: inputting each sub feature map into the corresponding fully connected layer to obtain the output matrix, corresponding to the sub feature map, output by each fully connected layer.
  17. The method according to claim 13, wherein the classification function comprises a Softmax function:
    p_t(i) = e^(x_t(i)) / Σ_{m=1}^{K+1} e^(x_t(m))
    where e is the natural constant; t denotes the t-th probability matrix; K is the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes summation; x_t(i) is the i-th element of the output matrix; and p_t(i) is the i-th element of the probability matrix p_t.
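The Softmax in claim 17 normalizes the K+1 scores of one output matrix into a probability distribution (the extra class beyond the K characters is typically a blank). A direct sketch of the formula:

```python
import math

def softmax(x):
    """p(i) = e^(x(i)) / sum_m e^(x(m)) over all K+1 entries of one output vector."""
    exps = [math.exp(v) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

p = softmax([1.0, 2.0, 3.0])   # toy output matrix with K + 1 = 3 entries
```

The outputs sum to 1 and preserve the ordering of the raw scores, which is what lets the decoder later pick the position of the maximum probability in each matrix.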
  18. The method according to claim 13, wherein the recognition loss function comprises L = −log p(y | {p_t}_{t=1…T}), where y is the pre-annotated probability matrix of the target training text image; t denotes the t-th probability matrix; p_t is the probability matrix corresponding to each sub feature map output by the classification function; T is the total number of probability matrices; p denotes the computed probability; and log denotes the logarithm operation.
  19. The method according to claim 13, wherein the step of training the second initial model according to the second loss value until the parameters in the second initial model converge to obtain the text recognition model comprises:
    updating the parameters in the second initial model according to the second loss value;
    judging whether the updated parameters have all converged;
    if the updated parameters have all converged, determining the second initial model with the updated parameters as the text recognition model; and
    if the updated parameters have not all converged, continuing to perform the step of determining a target training text image until all the updated parameters have converged.
  20. The method according to claim 19, wherein the step of updating each parameter in the second initial model according to the second loss value comprises:
    determining, according to a preset rule, a parameter to be updated from the second initial model;
    calculating the derivative of the second loss value with respect to the parameter to be updated, ∂L′/∂W′, where L′ is the loss value of the probability matrices and W′ is the parameter to be updated; and
    updating the parameter to be updated to obtain the updated parameter W′ − α′·(∂L′/∂W′), where α′ is a preset coefficient.
  21. The method according to claim 11, wherein the recognition result of the text region comprises a plurality of probability matrices corresponding to the text region; and
    the step of determining the text content in the text region according to the recognition result comprises:
    determining the position of the maximum probability value in each probability matrix;
    obtaining, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value as a character to be arranged;
    arranging the characters to be arranged according to the arrangement order of the plurality of probability matrices to obtain arranged characters; and
    determining the text content in the text region according to the arranged characters.
  22. The method according to claim 21, wherein the step of determining the text content in the text region according to the arranged characters comprises:
    deleting repeated characters and empty characters from the arranged characters to obtain the text content in the text region.
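Claims 21-22 together describe a greedy decode in the CTC style: take the argmax position of each probability matrix, map it to a character, then collapse consecutive repeats and drop the empty (blank) character. A minimal sketch; the alphabet and blank marker are illustrative assumptions:

```python
def decode(prob_matrices, alphabet, blank=""):
    """Map each probability matrix to its argmax character, then collapse
    consecutive repeats and drop the blank, per claims 21-22."""
    chars = []
    for p in prob_matrices:                      # in the matrices' arrangement order
        chars.append(alphabet[p.index(max(p))])  # position of the maximum probability
    collapsed = []
    for c in chars:
        if collapsed and collapsed[-1] == c:     # delete repeated characters
            continue
        collapsed.append(c)
    return "".join(c for c in collapsed if c != blank)   # delete empty characters

alphabet = ["a", "b", ""]                # last position stands for the blank
mats = [[0.9, 0.05, 0.05],               # argmax -> 'a'
        [0.8, 0.1, 0.1],                 # argmax -> 'a' (repeat, collapsed)
        [0.1, 0.1, 0.8],                 # argmax -> blank
        [0.1, 0.8, 0.1]]                 # argmax -> 'b'
text = decode(mats, alphabet)
```

Collapsing before dropping blanks is what allows genuinely doubled letters (separated by a blank) to survive while spurious per-slice repeats are removed.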
  23. The method according to claim 11, wherein, after the step of determining the text content in the text region according to the recognition result, the method further comprises:
    determining, through a pre-established sensitive word library, whether the text content contains sensitive information.
  24. The method according to claim 23, wherein the step of determining, through the pre-established sensitive word library, whether the text content contains sensitive information comprises:
    performing a word segmentation operation on the acquired text content;
    matching, one by one, the segmented words obtained by the word segmentation operation against the pre-established sensitive word library; and
    if at least one segmented word is matched successfully, determining that the text content contains sensitive information.
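The check in claim 24 can be sketched in a few lines. This is an illustrative stand-in only: real word segmentation (especially for Chinese) needs a proper segmenter, so a whitespace split and a tiny word set are our assumptions:

```python
def contains_sensitive(text, sensitive_words):
    """Segment the text, match each token against the sensitive word set,
    and report success if at least one token matches (claim 24)."""
    tokens = text.split()          # stand-in for a real word segmentation operation
    matched = [t for t in tokens if t in sensitive_words]
    return len(matched) > 0, matched

flagged, hits = contains_sensitive("buy cheap meds now", {"meds", "guns"})
```

The matched tokens are kept so that, per claim 25, the text region they belong to (or the words themselves) can be marked in the image.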
  25. The method according to claim 24, wherein, after it is determined that the text content contains sensitive information, the method further comprises:
    determining the text region to which the successfully matched segmented word belongs as a region to be marked, and marking the region to be marked in the image;
    or, marking the successfully matched segmented word in the image.
  26. A text detection model training apparatus, comprising:
    a training image determination module, configured to determine a target training image;
    a training image input module, configured to input the target training image into a first initial model, the first initial model comprising a first feature extraction network, a feature fusion network, and a first output network;
    a feature extraction module, configured to extract a plurality of initial feature maps of the target training image through the first feature extraction network, the plurality of initial feature maps differing from one another in scale;
    a feature fusion module, configured to perform fusion processing on the plurality of initial feature maps through the feature fusion network to obtain a fused feature map;
    an output module, configured to input the fused feature map into the first output network and output candidate regions of a text region in the target training image and a probability value of each candidate region; and
    a loss value determination and training module, configured to determine, through a preset detection loss function, a first loss value of the candidate regions and of the probability value of each candidate region, and to train the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model.
  27. The apparatus according to claim 26, wherein the first feature extraction network comprises multiple groups of first convolutional networks connected in sequence, each group of first convolutional networks comprising a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.
  28. The apparatus according to claim 26, wherein the feature fusion module is further configured to:
    arrange the plurality of initial feature maps in sequence according to their scales, wherein the initial feature map at the topmost level has the smallest scale and the initial feature map at the bottommost level has the largest scale;
    for each level below the topmost level in turn, according to the arrangement order, fuse the initial feature map of that level with the fusion result of the level above it to obtain the fusion result of that level, wherein the fusion result of the topmost level is the initial feature map of the topmost level; and
    determine the fusion result of the lowest level as the fused feature map of the initial feature maps.
  29. The apparatus according to claim 26, wherein the first output network comprises a first convolutional layer and a second convolutional layer; and
    the output module is further configured to:
    input the fused feature map into the first convolutional layer and the second convolutional layer respectively;
    perform a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, the coordinate matrix comprising vertex coordinates of the candidate regions of the text region in the target training image; and
    perform a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, the probability matrix comprising the probability value of each candidate region.
  30. The apparatus according to claim 26, wherein the detection loss function comprises a first function and a second function;
    the first function is L1 = |G* − G|, where G* is the pre-annotated coordinate matrix of the text region in the target training image, and G is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network;
    the second function is L2 = −Y*·log Y − (1 − Y*)·log(1 − Y), where Y* is the pre-annotated probability matrix of the text region in the target training image, Y is the probability matrix of the candidate regions of the text region in the target training image output by the first output network, and log denotes the logarithm operation; and
    the first loss value of the candidate regions and of the probability value of each candidate region is L = L1 + L2.
  31. The apparatus according to claim 26, wherein the loss value determination and training module is further configured to:
    update the parameters in the first initial model according to the first loss value;
    judge whether the updated parameters have all converged;
    if the updated parameters have all converged, determine the first initial model with the updated parameters as the detection model; and
    if the updated parameters have not all converged, continue to perform the step of determining a target training image based on the preset training set until the updated parameters have all converged.
  32. The apparatus according to claim 31, wherein the loss value determination and training module is further configured to:
    determine, according to a preset rule, a parameter to be updated from the first initial model;
    calculate the derivative of the first loss value with respect to the parameter to be updated in the first initial model, ∂L/∂W, where L is the first loss value and W is the parameter to be updated; and
    update the parameter to be updated to obtain the updated parameter W − α·(∂L/∂W), where α is a preset coefficient.
  33. A text region determination apparatus, comprising:
    an image acquisition module, configured to acquire an image to be detected;
    a detection module, configured to input the image to be detected into a pre-trained text detection model and output a plurality of candidate regions of a text region in the image to be detected and a probability value of each candidate region, the text detection model being trained by the text detection model training method according to any one of claims 1-7; and
    a text region determination module, configured to determine the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap among the plurality of candidate regions.
  34. The apparatus according to claim 33, wherein the text region determination module is further configured to:
    arrange the multiple candidate regions in sequence according to their probability values, wherein the first candidate region has the largest probability value and the last candidate region has the smallest probability value;
    for each candidate region in the arranged order, calculate one by one the degree of overlap between that candidate region and each of the other candidate regions, and remove, from the other candidate regions, those whose degree of overlap is greater than a preset overlap threshold;
    determine the candidate regions remaining after the removal as the text regions in the image to be detected.
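The sort-and-eliminate procedure of claim 34 is the familiar non-maximum suppression (NMS). A minimal Python sketch, where boxes as `(x1, y1, x2, y2)` tuples and intersection-over-union as the overlap measure are illustrative assumptions (the claim does not fix a region representation or an overlap metric):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, overlap_threshold=0.5):
    # Arrange candidate regions by probability value, largest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Remove every remaining candidate whose overlap with the
        # current best region exceeds the preset overlap threshold.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= overlap_threshold]
    return keep
```

For example, two heavily overlapping boxes collapse onto the higher-scoring one, while a distant box survives: `nms([(0,0,10,10), (1,1,11,11), (50,50,60,60)], [0.9, 0.8, 0.7])` keeps indices 0 and 2.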
  35. The apparatus according to claim 34, wherein the apparatus further comprises: a region elimination module, configured to remove, from the multiple candidate regions, the candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions.
  36. A text content determination apparatus, the apparatus comprising:
    a region acquisition module, configured to acquire a text region in an image through the text region determination method according to any one of claims 8-10;
    a recognition module, configured to input the text region into a pre-trained text recognition model, and to output a recognition result of the text region;
    a text content determination module, configured to determine the text content in the text region according to the recognition result.
  37. The apparatus according to claim 36, wherein the apparatus further comprises: a normalization module, configured to normalize the text region according to a preset size, to obtain a processed text region;
    the recognition module being specifically configured to input the processed text region into the pre-trained recognition model.
  38. The apparatus according to claim 36, wherein the apparatus further comprises a text recognition model training module, configured to train the text recognition model in the following manner:
    determining a target training text image;
    inputting the target training text image into a second initial model, the second initial model comprising a second feature extraction network, a second output network, and a classification function;
    extracting a feature map of the target training text image through the second feature extraction network;
    splitting the feature map into at least one sub-feature map through the second initial model;
    inputting the sub-feature maps into the second output network respectively, and outputting an output matrix corresponding to each sub-feature map;
    inputting the output matrix corresponding to each sub-feature map into the classification function respectively, and outputting a probability matrix corresponding to each sub-feature map;
    determining a second loss value of the probability matrices through a preset recognition loss function, and training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.
  39. The apparatus according to claim 38, wherein the second feature extraction network comprises multiple groups of second convolutional networks connected in sequence, and each group of the second convolutional networks comprises a convolutional layer, a pooling layer, and an activation function layer connected in sequence.
  40. The apparatus according to claim 38, wherein the recognition model training module is further configured to:
    split the feature map into at least one sub-feature map along the column direction of the feature map, the column direction of the feature map being perpendicular to the text row direction.
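The column-wise split of claim 40 can be sketched with NumPy; the height × width × channels layout and the concrete sizes are illustrative assumptions, not the patented dimensions:

```python
import numpy as np

# Illustrative feature map: height 4, width 6, 8 channels. For a
# horizontal text line, each of the 6 columns covers a vertical slice
# of the line (the column direction is perpendicular to the row).
feature_map = np.zeros((4, 6, 8))

# Split along the column (width) axis, yielding one sub-feature map
# per column position.
sub_maps = np.split(feature_map, feature_map.shape[1], axis=1)
```

Each resulting sub-feature map has shape `(4, 1, 8)`, and there are as many of them as the feature map has columns.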
  41. The apparatus according to claim 38, wherein the second output network comprises multiple fully connected layers, the number of the fully connected layers corresponding to the number of the sub-feature maps;
    the recognition model training module being further configured to: input each sub-feature map into the corresponding fully connected layer, to obtain the output matrix, corresponding to that sub-feature map, output by each fully connected layer.
  42. The apparatus according to claim 38, wherein the classification function comprises a Softmax function;
    the Softmax function being

    p_t^i = e^{z_t^i} / Σ_{m=1}^{K+1} e^{z_t^m}

    wherein e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes summation; z_t^i is the i-th element of the output matrix; and p_t^i is the i-th element of the probability matrix p_t.
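The Softmax of claim 42 turns each (K+1)-dimensional output matrix into a probability matrix whose entries sum to 1. A minimal NumPy sketch (subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the claim):

```python
import numpy as np

def softmax(z):
    # p_t^i = e^{z_t^i} / sum_{m=1}^{K+1} e^{z_t^m}
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

p = softmax([1.0, 2.0, 3.0])
```

Here `p` sums to 1 and its largest entry sits at the position of the largest input logit, which is exactly the property the decoding step in claim 46 relies on.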
  43. The apparatus according to claim 38, wherein the recognition loss function comprises L = −log p(y|{p_t}_{t=1…T}), wherein y is the pre-labeled probability matrix of the target training text image; t denotes the t-th probability matrix; p_t is the probability matrix, corresponding to each sub-feature map, output by the classification function; T is the total number of probability matrices; p denotes the computed probability; and log denotes the logarithmic operation.
  44. The apparatus according to claim 38, wherein the recognition model training module is further configured to:
    update the parameters in the second initial model according to the second loss value;
    determine whether each of the updated parameters has converged;
    if each of the updated parameters has converged, determine the second initial model with the updated parameters as the text recognition model;
    if not all of the updated parameters have converged, continue to perform the step of determining a target training text image based on the preset training set, until all of the updated parameters have converged.
  45. The apparatus according to claim 44, wherein the recognition model training module is further configured to:
    determine a parameter to be updated from the second initial model according to a preset rule;
    calculate the derivative of the second loss value with respect to the parameter to be updated:

    ∂L′/∂W′

    wherein L′ is the loss value of the probability matrices and W′ is the parameter to be updated;
    update the parameter to be updated to obtain the updated parameter to be updated:

    W′ ← W′ − α′·(∂L′/∂W′)

    wherein α′ is a preset coefficient.
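The update rule in claims 32 and 45 is plain gradient descent, W ← W − α·∂L/∂W. A self-contained sketch on a scalar parameter, where the quadratic loss L(W) = (W − 3)² is an illustrative stand-in for the model's actual loss:

```python
alpha = 0.1   # preset coefficient (learning rate)
W = 0.0       # parameter to be updated

def grad(w):
    # Derivative dL/dW of the illustrative loss L(W) = (W - 3)^2.
    return 2.0 * (w - 3.0)

# Repeatedly apply the claimed update: W <- W - alpha * dL/dW.
for _ in range(100):
    W = W - alpha * grad(W)
```

After enough iterations W settles at the minimizer W = 3, i.e. the point where the derivative vanishes, which is the convergence condition checked in claim 44.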
  46. The apparatus according to claim 36, wherein the recognition result of the text region comprises multiple probability matrices corresponding to the text region;
    the text content determination module being further configured to:
    determine the position of the maximum probability value in each probability matrix;
    obtain, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value, as a character to be arranged;
    arrange the characters to be arranged according to the arrangement order of the multiple probability matrices, to obtain the arranged characters;
    determine the text content in the text region according to the arranged characters.
  47. The apparatus according to claim 46, wherein the text content determination module is further configured to:
    delete repeated characters and blank characters from the arranged characters, to obtain the text content in the text region.
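Claims 46 and 47 together describe greedy decoding: take the position of the maximum probability in each probability matrix, map positions to characters, then collapse consecutive repeats and drop blanks — the usual decoding step of CTC-style recognizers. A sketch in which the position-to-character table and the use of index 0 as the blank are illustrative assumptions:

```python
def greedy_decode(prob_matrices, charset, blank=0):
    # Position of the maximum probability value in each matrix (claim 46).
    best = [max(range(len(p)), key=lambda i: p[i]) for p in prob_matrices]
    # Collapse consecutive duplicates, then drop the blank (claim 47).
    chars, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)

charset = ["", "c", "a", "t"]            # index 0 is the blank character
probs = [
    [0.1, 0.8, 0.05, 0.05],              # -> 'c'
    [0.1, 0.7, 0.1, 0.1],                # -> 'c' (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],             # -> blank (dropped)
    [0.1, 0.1, 0.7, 0.1],                # -> 'a'
    [0.1, 0.1, 0.1, 0.7],                # -> 't'
]
text = greedy_decode(probs, charset)     # "cat"
```

Note the order of operations: repeats are collapsed before blanks are removed, so "cc␣at" decodes to "cat" while a blank between two identical characters would preserve both of them.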
  48. The apparatus according to claim 36, wherein the apparatus further comprises:
    a sensitive information determination module, configured to determine, through a pre-established sensitive word lexicon, whether the text content contains sensitive information.
  49. The apparatus according to claim 48, wherein the sensitive information determination module is further configured to:
    perform a word segmentation operation on the acquired text content;
    match the segments obtained by the word segmentation operation against the pre-established sensitive word lexicon one by one;
    if at least one segment is matched successfully, determine that the text content contains sensitive information.
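The check in claim 49 reduces to: segment the recognized text, then test each segment against the sensitive lexicon. A minimal Python sketch, in which whitespace splitting stands in for a real word segmenter (Chinese text would need a proper segmenter) and the example lexicon is invented for illustration:

```python
SENSITIVE_LEXICON = {"password", "secret"}   # illustrative pre-built lexicon

def contains_sensitive(text, lexicon=SENSITIVE_LEXICON):
    # Word segmentation; whitespace split is a stand-in for a segmenter.
    tokens = text.lower().split()
    # Sensitive information is present if at least one segment matches.
    return any(tok in lexicon for tok in tokens)
```

Set membership makes each lookup O(1), so the overall cost is linear in the number of segments regardless of lexicon size.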
  50. The apparatus according to claim 49, wherein the apparatus further comprises:
    a region identification module, configured to determine the text region to which the successfully matched segment belongs, as a region to be marked, and to mark the region to be marked in the image.
  51. An electronic device, comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determination method according to any one of claims 8 to 10, or the text content determination method according to any one of claims 11 to 25.
  52. A machine-readable storage medium, storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determination method according to any one of claims 8 to 10, or the text content determination method according to any one of claims 11 to 25.
  53. An executable program code, wherein the executable program code is configured to be run to execute the steps of the text detection model training method according to any one of claims 1 to 7, the text region determination method according to any one of claims 8 to 10, or the text content determination method according to any one of claims 11 to 25.
PCT/CN2020/087809 2019-04-30 2020-04-29 Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus WO2020221298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910367675.2A CN110110715A (en) 2019-04-30 2019-04-30 Text detection model training method, text filed, content determine method and apparatus
CN201910367675.2 2019-04-30

Publications (1)

Publication Number Publication Date
WO2020221298A1 true WO2020221298A1 (en) 2020-11-05

Family

ID=67488106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087809 WO2020221298A1 (en) 2019-04-30 2020-04-29 Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus

Country Status (2)

Country Link
CN (1) CN110110715A (en)
WO (1) WO2020221298A1 (en)

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328710A (en) * 2020-11-26 2021-02-05 北京百度网讯科技有限公司 Entity information processing method, entity information processing device, electronic equipment and storage medium
CN112418209A (en) * 2020-12-15 2021-02-26 润联软件系统(深圳)有限公司 Character recognition method and device, computer equipment and storage medium
CN112417847A (en) * 2020-11-19 2021-02-26 湖南红网新媒体集团有限公司 News content safety monitoring method, system, device and storage medium
CN112434510A (en) * 2020-11-24 2021-03-02 北京字节跳动网络技术有限公司 Information processing method and device, electronic equipment and storage medium
CN112541496A (en) * 2020-12-24 2021-03-23 北京百度网讯科技有限公司 Method, device and equipment for extracting POI name and computer storage medium
CN112560476A (en) * 2020-12-09 2021-03-26 中科讯飞互联(北京)信息科技有限公司 Text completion method, electronic device and storage device
CN112613376A (en) * 2020-12-17 2021-04-06 深圳集智数字科技有限公司 Re-recognition method and device and electronic equipment
CN112651373A (en) * 2021-01-04 2021-04-13 广联达科技股份有限公司 Identification method and device for text information of construction drawing
CN112686812A (en) * 2020-12-10 2021-04-20 广州广电运通金融电子股份有限公司 Bank card inclination correction detection method and device, readable storage medium and terminal
CN112734699A (en) * 2020-12-24 2021-04-30 浙江大华技术股份有限公司 Article state warning method and device, storage medium and electronic device
CN112784692A (en) * 2020-12-31 2021-05-11 科大讯飞股份有限公司 Method, device and equipment for identifying text content of image and storage medium
CN112802139A (en) * 2021-02-05 2021-05-14 歌尔股份有限公司 Image processing method and device, electronic equipment and readable storage medium
CN112861739A (en) * 2021-02-10 2021-05-28 中国科学技术大学 End-to-end text recognition method, model training method and device
CN112927173A (en) * 2021-04-12 2021-06-08 平安科技(深圳)有限公司 Model compression method and device, computing equipment and storage medium
CN112949653A (en) * 2021-02-23 2021-06-11 科大讯飞股份有限公司 Text recognition method, electronic device and storage device
CN112966609A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Target detection method and device
CN112966690A (en) * 2021-03-03 2021-06-15 中国科学院自动化研究所 Scene character detection method based on anchor-free frame and suggestion frame
CN112989844A (en) * 2021-03-10 2021-06-18 北京奇艺世纪科技有限公司 Model training and text recognition method, device, equipment and storage medium
CN113011312A (en) * 2021-03-15 2021-06-22 中国科学技术大学 Training method of motion positioning model based on weak supervision text guidance
CN113076823A (en) * 2021-03-18 2021-07-06 深圳数联天下智能科技有限公司 Training method of age prediction model, age prediction method and related device
CN113139625A (en) * 2021-05-18 2021-07-20 北京世纪好未来教育科技有限公司 Model training method, electronic device and storage medium thereof
CN113139463A (en) * 2021-04-23 2021-07-20 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for training a model
CN113205041A (en) * 2021-04-29 2021-08-03 百度在线网络技术(北京)有限公司 Structured information extraction method, device, equipment and storage medium
CN113205047A (en) * 2021-04-30 2021-08-03 平安科技(深圳)有限公司 Drug name identification method and device, computer equipment and storage medium
CN113221718A (en) * 2021-05-06 2021-08-06 新东方教育科技集团有限公司 Formula identification method and device, storage medium and electronic equipment
CN113298079A (en) * 2021-06-28 2021-08-24 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113313022A (en) * 2021-05-27 2021-08-27 北京百度网讯科技有限公司 Training method of character recognition model and method for recognizing characters in image
CN113326887A (en) * 2021-06-16 2021-08-31 深圳思谋信息科技有限公司 Text detection method and device and computer equipment
CN113344027A (en) * 2021-05-10 2021-09-03 北京迈格威科技有限公司 Retrieval method, device, equipment and storage medium for object in image
CN113343987A (en) * 2021-06-30 2021-09-03 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN113343970A (en) * 2021-06-24 2021-09-03 中国平安人寿保险股份有限公司 Text image detection method, device, equipment and storage medium
CN113361524A (en) * 2021-06-29 2021-09-07 北京百度网讯科技有限公司 Image processing method and device
CN113379500A (en) * 2021-06-21 2021-09-10 北京沃东天骏信息技术有限公司 Sequencing model training method and device, and article sequencing method and device
CN113378832A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Text detection model training method, text prediction box method and device
CN113379592A (en) * 2021-06-23 2021-09-10 北京百度网讯科技有限公司 Method and device for processing sensitive area in picture and electronic equipment
CN113469878A (en) * 2021-09-02 2021-10-01 北京世纪好未来教育科技有限公司 Text erasing method and training method and device of model thereof, and storage medium
CN113591893A (en) * 2021-01-26 2021-11-02 腾讯医疗健康(深圳)有限公司 Image processing method and device based on artificial intelligence and computer equipment
CN113762109A (en) * 2021-08-23 2021-12-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN113780087A (en) * 2021-08-11 2021-12-10 同济大学 Postal parcel text detection method and equipment based on deep learning
CN113780131A (en) * 2021-08-31 2021-12-10 众安在线财产保险股份有限公司 Text image orientation recognition method and text content recognition method, device and equipment
CN113806589A (en) * 2021-09-29 2021-12-17 云从科技集团股份有限公司 Video clip positioning method, device and computer readable storage medium
CN114022882A (en) * 2022-01-04 2022-02-08 北京世纪好未来教育科技有限公司 Text recognition model training method, text recognition device, text recognition equipment and medium
CN114419199A (en) * 2021-12-20 2022-04-29 北京百度网讯科技有限公司 Picture labeling method and device, electronic equipment and storage medium
CN114758332A (en) * 2022-06-13 2022-07-15 北京万里红科技有限公司 Text detection method and device, computing equipment and storage medium
CN114821622A (en) * 2022-03-10 2022-07-29 北京百度网讯科技有限公司 Text extraction method, text extraction model training method, device and equipment
CN114827132A (en) * 2022-06-27 2022-07-29 河北东来工程技术服务有限公司 Ship traffic file transmission control method, system, device and storage medium
CN114842483A (en) * 2022-06-27 2022-08-02 齐鲁工业大学 Standard file information extraction method and system based on neural network and template matching
CN114937267A (en) * 2022-04-20 2022-08-23 北京世纪好未来教育科技有限公司 Training method and device for text recognition model and electronic equipment
CN115171110A (en) * 2022-06-30 2022-10-11 北京百度网讯科技有限公司 Text recognition method, apparatus, device, medium, and product
CN115601553A (en) * 2022-08-15 2023-01-13 杭州联汇科技股份有限公司(Cn) Visual model pre-training method based on multi-level picture description data
CN116226319A (en) * 2023-05-10 2023-06-06 浪潮电子信息产业股份有限公司 Hybrid heterogeneous model training method, device, equipment and readable storage medium
CN116503517A (en) * 2023-06-27 2023-07-28 江西农业大学 Method and system for generating image by long text
CN117315702A (en) * 2023-11-28 2023-12-29 山东正云信息科技有限公司 Text detection method, system and medium based on set prediction
CN117593752A (en) * 2024-01-18 2024-02-23 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment
CN113344027B (en) * 2021-05-10 2024-04-23 北京迈格威科技有限公司 Method, device, equipment and storage medium for retrieving objects in image

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110610166B (en) * 2019-09-18 2022-06-07 北京猎户星空科技有限公司 Text region detection model training method and device, electronic equipment and storage medium
CN110674804A (en) * 2019-09-24 2020-01-10 上海眼控科技股份有限公司 Text image detection method and device, computer equipment and storage medium
CN110705460B (en) * 2019-09-29 2023-06-20 北京百度网讯科技有限公司 Image category identification method and device
CN110751146B (en) * 2019-10-23 2023-06-20 北京印刷学院 Text region detection method, device, electronic terminal and computer readable storage medium
CN112749704A (en) * 2019-10-31 2021-05-04 北京金山云网络技术有限公司 Text region detection method and device and server
CN111062385A (en) * 2019-11-18 2020-04-24 上海眼控科技股份有限公司 Network model construction method and system for image text information detection
CN110929647B (en) * 2019-11-22 2023-06-02 科大讯飞股份有限公司 Text detection method, device, equipment and storage medium
CN110942067A (en) * 2019-11-29 2020-03-31 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111104934A (en) * 2019-12-22 2020-05-05 上海眼控科技股份有限公司 Engine label detection method, electronic device and computer readable storage medium
CN113033593B (en) * 2019-12-25 2023-09-01 上海智臻智能网络科技股份有限公司 Text detection training method and device based on deep learning
CN111353442A (en) * 2020-03-03 2020-06-30 Oppo广东移动通信有限公司 Image processing method, device, equipment and storage medium
CN111382740B (en) * 2020-03-13 2023-11-21 深圳前海环融联易信息科技服务有限公司 Text picture analysis method, text picture analysis device, computer equipment and storage medium
CN111784623A (en) * 2020-09-07 2020-10-16 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN112287763A (en) * 2020-09-27 2021-01-29 北京旷视科技有限公司 Image processing method, apparatus, device and medium
CN112541491B (en) * 2020-12-07 2024-02-02 沈阳雅译网络技术有限公司 End-to-end text detection and recognition method based on image character region perception
CN112818975A (en) * 2021-01-27 2021-05-18 北京金山数字娱乐科技有限公司 Text detection model training method and device and text detection method and device
CN112580656A (en) * 2021-02-23 2021-03-30 上海旻浦科技有限公司 End-to-end text detection method, system, terminal and storage medium
CN113076944A (en) * 2021-03-11 2021-07-06 国家电网有限公司 Document detection and identification method based on artificial intelligence
CN113807096A (en) * 2021-04-09 2021-12-17 京东科技控股股份有限公司 Text data processing method and device, computer equipment and storage medium
CN112801097B (en) * 2021-04-14 2021-07-16 北京世纪好未来教育科技有限公司 Training method and device of text detection model and readable storage medium
CN113112511B (en) * 2021-04-19 2024-01-05 新东方教育科技集团有限公司 Method and device for correcting test paper, storage medium and electronic equipment
CN113221711A (en) * 2021-04-30 2021-08-06 北京金山数字娱乐科技有限公司 Information extraction method and device
CN112990181B (en) * 2021-04-30 2021-08-24 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and storage medium
CN113205426A (en) * 2021-05-27 2021-08-03 中库(北京)数据系统有限公司 Method and device for predicting popularity level of social media content
CN113298156A (en) * 2021-05-28 2021-08-24 有米科技股份有限公司 Neural network training method and device for image gender classification
CN113205160B (en) * 2021-07-05 2022-03-04 北京世纪好未来教育科技有限公司 Model training method, text recognition method, model training device, text recognition device, electronic equipment and medium
CN114005019B (en) * 2021-10-29 2023-09-22 北京有竹居网络技术有限公司 Method for identifying flip image and related equipment thereof
CN114065768B (en) * 2021-12-08 2022-12-09 马上消费金融股份有限公司 Feature fusion model training and text processing method and device
CN114663594A (en) * 2022-03-25 2022-06-24 中国电信股份有限公司 Image feature point detection method, device, medium, and apparatus
CN114724144B (en) * 2022-05-16 2024-02-09 北京百度网讯科技有限公司 Text recognition method, training device, training equipment and training medium for model
CN115205562B (en) * 2022-07-22 2023-03-14 四川云数赋智教育科技有限公司 Random test paper registration method based on feature points
CN116630755B (en) * 2023-04-10 2024-04-02 雄安创新研究院 Method, system and storage medium for detecting text position in scene image
CN116311320B (en) * 2023-05-22 2023-08-22 建信金融科技有限责任公司 Training method of text image fusion layer, text image recognition method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150193667A1 (en) * 2014-01-08 2015-07-09 Qualcomm Incorporated Processing text images with shadows
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108764228A (en) * 2018-05-28 2018-11-06 嘉兴善索智能科技有限公司 Word object detection method in a kind of image
CN109299274A (en) * 2018-11-07 2019-02-01 南京大学 A kind of natural scene Method for text detection based on full convolutional neural networks
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110135248A (en) * 2019-04-03 2019-08-16 华南理工大学 A kind of natural scene Method for text detection based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108519970B (en) * 2018-02-06 2021-08-31 平安科技(深圳)有限公司 Method for identifying sensitive information in text, electronic device and readable storage medium
CN108764226B (en) * 2018-04-13 2022-05-03 顺丰科技有限公司 Image text recognition method, device, equipment and storage medium thereof
CN109086756B (en) * 2018-06-15 2021-08-03 众安信息技术服务有限公司 Text detection analysis method, device and equipment based on deep neural network
CN109447469B (en) * 2018-10-30 2022-06-24 创新先进技术有限公司 Text detection method, device and equipment
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RED STONE: "Wu Enda's CS229, someone condensed it into 6 Chinese cheat sheets!", 12 February 2019 (2019-02-12), pages 1 - 8, XP055750860, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/56534902> *
XINYU ZHOU, CONG YAO, HE WEN, YUZHI WANG, SHUCHANG ZHOU, WEIRAN HE, AND JIAJUN LIANG: "EAST: An Efficient and Accurate Scene Text Detector", COMPUTER SCIENCE, 10 July 2017 (2017-07-10), pages 5551 - 5560, XP080762096, DOI: 10.1109/CVPR.2017.283 *
YU ZHENG ,WANG QING-QING ,LYU YUE: "Scene Text Detection Based on Feature Fusion Network", COMPUTER SYSTEMS AND APPLICATIONS, vol. 27, no. 10, 15 October 2018 (2018-10-15), pages 1 - 10, XP055750846, ISSN: 1003-3254, DOI: 10.15888/j.cnki.csa.006539 *

CN113221718B (en) * 2021-05-06 2024-01-16 新东方教育科技集团有限公司 Formula identification method, device, storage medium and electronic equipment
CN113221718A (en) * 2021-05-06 2021-08-06 新东方教育科技集团有限公司 Formula identification method and device, storage medium and electronic equipment
CN113344027B (en) * 2021-05-10 2024-04-23 北京迈格威科技有限公司 Method, device, equipment and storage medium for retrieving objects in image
CN113344027A (en) * 2021-05-10 2021-09-03 北京迈格威科技有限公司 Retrieval method, device, equipment and storage medium for object in image
CN113139625B (en) * 2021-05-18 2023-12-15 北京世纪好未来教育科技有限公司 Model training method, electronic equipment and storage medium thereof
CN113139625A (en) * 2021-05-18 2021-07-20 北京世纪好未来教育科技有限公司 Model training method, electronic device and storage medium thereof
CN113313022B (en) * 2021-05-27 2023-11-10 北京百度网讯科技有限公司 Training method of character recognition model and method for recognizing characters in image
CN113313022A (en) * 2021-05-27 2021-08-27 北京百度网讯科技有限公司 Training method of character recognition model and method for recognizing characters in image
CN113326887A (en) * 2021-06-16 2021-08-31 深圳思谋信息科技有限公司 Text detection method and device and computer equipment
CN113326887B (en) * 2021-06-16 2024-03-29 深圳思谋信息科技有限公司 Text detection method, device and computer equipment
CN113379500A (en) * 2021-06-21 2021-09-10 北京沃东天骏信息技术有限公司 Sequencing model training method and device, and article sequencing method and device
CN113379592A (en) * 2021-06-23 2021-09-10 北京百度网讯科技有限公司 Method and device for processing sensitive area in picture and electronic equipment
CN113379592B (en) * 2021-06-23 2023-09-01 北京百度网讯科技有限公司 Processing method and device for sensitive area in picture and electronic equipment
CN113343970B (en) * 2021-06-24 2024-03-08 中国平安人寿保险股份有限公司 Text image detection method, device, equipment and storage medium
CN113343970A (en) * 2021-06-24 2021-09-03 中国平安人寿保险股份有限公司 Text image detection method, device, equipment and storage medium
CN113378832A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Text detection model training method, text prediction box method and device
CN113298079B (en) * 2021-06-28 2023-10-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113298079A (en) * 2021-06-28 2021-08-24 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113361524A (en) * 2021-06-29 2021-09-07 北京百度网讯科技有限公司 Image processing method and device
CN113343987A (en) * 2021-06-30 2021-09-03 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN113343987B (en) * 2021-06-30 2023-08-22 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN113780087B (en) * 2021-08-11 2024-04-26 同济大学 Postal package text detection method and equipment based on deep learning
CN113780087A (en) * 2021-08-11 2021-12-10 同济大学 Postal parcel text detection method and equipment based on deep learning
CN113762109A (en) * 2021-08-23 2021-12-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN113762109B (en) * 2021-08-23 2023-11-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN113780131A (en) * 2021-08-31 2021-12-10 众安在线财产保险股份有限公司 Text image orientation recognition method and text content recognition method, device and equipment
CN113780131B (en) * 2021-08-31 2024-04-12 众安在线财产保险股份有限公司 Text image orientation recognition method, text content recognition method, device and equipment
CN113469878B (en) * 2021-09-02 2021-11-12 北京世纪好未来教育科技有限公司 Text erasing method and training method and device of model thereof, and storage medium
CN113469878A (en) * 2021-09-02 2021-10-01 北京世纪好未来教育科技有限公司 Text erasing method and training method and device of model thereof, and storage medium
CN113806589A (en) * 2021-09-29 2021-12-17 云从科技集团股份有限公司 Video clip positioning method, device and computer readable storage medium
CN113806589B (en) * 2021-09-29 2024-03-08 云从科技集团股份有限公司 Video clip positioning method, device and computer readable storage medium
CN114419199B (en) * 2021-12-20 2023-11-07 北京百度网讯科技有限公司 Picture marking method and device, electronic equipment and storage medium
CN114419199A (en) * 2021-12-20 2022-04-29 北京百度网讯科技有限公司 Picture labeling method and device, electronic equipment and storage medium
CN114022882A (en) * 2022-01-04 2022-02-08 北京世纪好未来教育科技有限公司 Text recognition model training method, text recognition device, text recognition equipment and medium
CN114821622A (en) * 2022-03-10 2022-07-29 北京百度网讯科技有限公司 Text extraction method, text extraction model training method, device and equipment
CN114937267A (en) * 2022-04-20 2022-08-23 北京世纪好未来教育科技有限公司 Training method and device for text recognition model and electronic equipment
CN114937267B (en) * 2022-04-20 2024-04-02 北京世纪好未来教育科技有限公司 Training method and device for text recognition model and electronic equipment
CN114758332B (en) * 2022-06-13 2022-09-02 北京万里红科技有限公司 Text detection method and device, computing equipment and storage medium
CN114758332A (en) * 2022-06-13 2022-07-15 北京万里红科技有限公司 Text detection method and device, computing equipment and storage medium
CN114827132B (en) * 2022-06-27 2022-09-09 河北东来工程技术服务有限公司 Ship traffic file transmission control method, system, device and storage medium
CN114842483B (en) * 2022-06-27 2023-11-28 齐鲁工业大学 Standard file information extraction method and system based on neural network and template matching
CN114842483A (en) * 2022-06-27 2022-08-02 齐鲁工业大学 Standard file information extraction method and system based on neural network and template matching
CN114827132A (en) * 2022-06-27 2022-07-29 河北东来工程技术服务有限公司 Ship traffic file transmission control method, system, device and storage medium
CN115171110A (en) * 2022-06-30 2022-10-11 北京百度网讯科技有限公司 Text recognition method, apparatus, device, medium, and product
CN115171110B (en) * 2022-06-30 2023-08-22 北京百度网讯科技有限公司 Text recognition method and device, equipment, medium and product
CN115601553A (en) * 2022-08-15 2023-01-13 杭州联汇科技股份有限公司 Visual model pre-training method based on multi-level picture description data
CN115601553B (en) * 2022-08-15 2023-08-18 杭州联汇科技股份有限公司 Visual model pre-training method based on multi-level picture description data
CN116226319A (en) * 2023-05-10 2023-06-06 浪潮电子信息产业股份有限公司 Hybrid heterogeneous model training method, device, equipment and readable storage medium
CN116226319B (en) * 2023-05-10 2023-08-04 浪潮电子信息产业股份有限公司 Hybrid heterogeneous model training method, device, equipment and readable storage medium
CN116503517A (en) * 2023-06-27 2023-07-28 江西农业大学 Method and system for generating image by long text
CN116503517B (en) * 2023-06-27 2023-09-05 江西农业大学 Method and system for generating image by long text
CN117315702B (en) * 2023-11-28 2024-02-23 山东正云信息科技有限公司 Text detection method, system and medium based on set prediction
CN117315702A (en) * 2023-11-28 2023-12-29 山东正云信息科技有限公司 Text detection method, system and medium based on set prediction
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment
CN117593752A (en) * 2024-01-18 2024-02-23 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110110715A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN108898137B (en) Natural image character recognition method and system based on deep neural network
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN113297975B (en) Table structure identification method and device, storage medium and electronic equipment
WO2019192397A1 (en) End-to-end recognition method for scene text in any shape
US9129191B2 (en) Semantic object selection
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
US9129192B2 (en) Semantic object proposal generation and validation
CN109815770B (en) Two-dimensional code detection method, device and system
CN110647829A (en) Bill text recognition method and system
CN110032998B (en) Method, system, device and storage medium for detecting characters of natural scene picture
CN111652217A (en) Text detection method and device, electronic equipment and computer storage medium
CN109993040A (en) Text recognition method and device
CN111259940A (en) Target detection method based on space attention map
CN106372624B (en) Face recognition method and system
CN111475622A (en) Text classification method, device, terminal and storage medium
CN110909724B (en) Thumbnail generation method of multi-target image
CN112347284A (en) Combined trademark image retrieval method
CN113420669B (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN115457565A (en) OCR character recognition method, electronic equipment and storage medium
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
KR102405522B1 (en) Apparatus and method for contextual unethical detection reflecting hierarchical characteristics of text
CN116935411A (en) Radical-level ancient character recognition method based on character decomposition and reconstruction
CN108845999B (en) Trademark image retrieval method based on multi-scale regional feature comparison

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20798388

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180222)

122 Ep: pct application non-entry in european phase

Ref document number: 20798388

Country of ref document: EP

Kind code of ref document: A1