CN111242241A - A method of augmenting training samples for etched character recognition networks - Google Patents
A method of augmenting training samples for etched character recognition networks
- Publication number
- CN111242241A (application CN202010096003.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- stylized
- content
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a training-sample augmentation method for etched character recognition networks, belonging to the fields of image processing and deep learning. The method includes the following steps: collecting etched character images from a scene; generating content images and style images from the etched character images; constructing a bidirectional generative adversarial network; training the bidirectional generative adversarial network; and inputting the content images and style images into the trained bidirectional generative adversarial network to generate etched character images. By generating a large number of etched character images with a generative adversarial network, the invention obtains sufficient training samples even when the original sample size is small. Compared with manual sample collection it is faster and more efficient, the generated etched character images are more realistic, and the accuracy of recognizing etched characters with deep learning methods is improved.
Description
Technical Field
The invention belongs to the fields of image processing and deep learning, and in particular relates to a method of augmenting the training samples of an etched character recognition network.
Background Art
Etched character recognition, typical of text recognition on industrial equipment nameplates, is one of the difficult problems in scene text recognition. Industrial equipment nameplates are usually made of metal and are often installed outdoors, so nameplate images frequently suffer from degradations such as reflections, stains, blur, and scratches, which makes recognizing the etched characters difficult.
Recognizing etched characters with deep learning requires a large amount of data to train a character recognition model with adequate generalization ability; a model trained on a small sample set is prone to overfitting. In research on etched character recognition, the number of etched character images that can be collected in a given scene is small, and this severe scarcity of sample data cannot meet the needs of deep learning. In addition, collecting and curating samples consumes considerable manpower and material resources, and doing so purely by hand is very inefficient. Recognizing etched characters with deep learning therefore urgently requires solving the small-sample problem. Common image augmentation methods include flipping, rotation, scaling, cropping, shifting, and adding noise, but all of these apply a series of random changes to existing samples and can only produce images similar to the originals.
Summary of the Invention
The purpose of the present invention is to address the small scale of training samples for etched character recognition networks by providing an image sample augmentation method that quickly generates large batches of etched character images to meet the training requirements of deep learning networks.
The technical solution that realizes the purpose of the present invention is a method of augmenting training samples for an etched character recognition network, the method comprising the following steps:
Step 1: collect etched character images from the scene;
Step 2: generate content images and style images from the etched character images;
Step 3: construct a bidirectional generative adversarial network;
Step 4: train the bidirectional generative adversarial network;
Step 5: input the content images and style images into the trained bidirectional generative adversarial network to generate etched character images.
Further, generating the content image from the etched character images in step 2 specifically includes:
Step 2-1: annotate the text information of the etched character images;
Step 2-2: compile character statistics from the ground-truth labels of the annotated etched character images;
Step 2-3: generate content images in multiple fonts from the character statistics.
Further, generating the style image from the etched character images in step 2 specifically means: generating the style image according to the features of the etched character images.
Further, generating the style image from the etched character images in step 2 specifically includes:
selecting, from the collected etched character images, images whose resolution meets a first preset condition and/or whose sharpness meets a second preset condition and/or whose feature saliency meets a third preset condition, as style images.
Further, constructing the bidirectional generative adversarial network in step 3 specifically includes:
Step 3-1: construct a stylization generative adversarial network;
Step 3-2: construct a de-stylization generative adversarial network;
Step 3-3: construct the loss function.
Further, the stylization generative adversarial network in step 3-1 includes a stylization generation network and a stylization discrimination network.
The stylization generation network takes a content image and a style image as input and outputs a stylized character image. It includes: a content encoder Ex1, whose input is the content image and whose output is a content feature vector; a style encoder Ex2, whose input is the style image and whose output is a style feature vector; and a generator Gx, whose input is the content feature vector and the style feature vector and whose output is the stylized character image.
The stylization discrimination network takes the stylized character image or a real etched character image as input and outputs a number between 0 and 1 representing the probability that the input image is real.
Further, the de-stylization generative adversarial network in step 3-2 includes a de-stylization generation network and a de-stylization discrimination network.
The de-stylization generation network takes the stylized character image as input and outputs a de-stylized character image. It includes: a first encoder Ey, whose input is the stylized character image and whose output is a feature vector; and a second generator Gy, whose input is the feature vector output by the first encoder Ey and whose output is the de-stylized character image.
The de-stylization discrimination network takes the de-stylized character image or a real content image as input and outputs a number between 0 and 1 representing the probability that the input image is real.
Further, the loss function L in step 3-3 includes the content image reconstruction loss L1, the stylization generative adversarial network loss L2, and the de-stylization generative adversarial network loss L3:
L = L1 + L2 + L3
The content image reconstruction loss L1 ensures that the content encoder Ex1 can extract the core information of the content image; it is computed from the input content image x and weighted by λx, which ranges from 0 to 1.
The stylization generative adversarial network loss L2 consists of a first pixel loss Lspix and a first adversarial loss Lsadv:
L2 = λx1·Lspix + λx2·Lsadv
where λx1 and λx2 are the weights of Lspix and Lsadv, respectively, each ranging from 0 to 1.
The first pixel loss Lspix is computed from the input content image x, the input style image y, and the image y' generated by the stylization generation network.
The first adversarial loss Lsadv uses an interpolated sample ŷ, defined as a vector sampled uniformly along the straight line between the style image y and the generated image y'; λsadv is a weight parameter ranging from 0 to 1, and Dx denotes the stylization discrimination network.
The de-stylization generative adversarial network loss L3 consists of a second pixel loss Ldpix, a second adversarial loss Ldadv, and a content feature loss Ldfeat:
L3 = λy1·Ldpix + λy2·Ldadv + λy3·Ldfeat
where λy1, λy2, and λy3 are the weights of Ldpix, Ldadv, and Ldfeat, respectively, each ranging from 0 to 1; these three terms parallel the stylization losses, applied to the de-stylization networks. Plausible reconstructions of all the loss terms are sketched below.
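The individual loss formulas appear only as figure images in the original filing and were lost in extraction. The block below is a reconstruction sketch, not the patent's verbatim formulas: it assumes L1-distance pixel losses and WGAN-GP-style adversarial losses (the form implied by the interpolated sample ŷ), following the TET-GAN formulation cited by this application; the weight λdadv is introduced here by symmetry with λsadv and does not appear in the original text.

```latex
% Assumed reconstructions of the loss terms; not the patent's verbatim formulas.
\begin{aligned}
L_1 &= \lambda_x\,\mathbb{E}_x \lVert G_y(E_{x1}(x)) - x\rVert_1
  && \text{(content reconstruction through the de-stylization decoder)}\\
L_{spix} &= \mathbb{E}_{x,y} \lVert y' - y\rVert_1,
  \qquad y' = G_x\big(E_{x1}(x),\, E_{x2}(y)\big)\\
L_{sadv} &= \mathbb{E}\big[D_x(y')\big] - \mathbb{E}\big[D_x(y)\big]
  + \lambda_{sadv}\,\mathbb{E}\big[(\lVert\nabla_{\hat y} D_x(\hat y)\rVert_2 - 1)^2\big],
  \quad \hat y = \epsilon y + (1-\epsilon)y',\ \epsilon \sim U[0,1]\\
L_{dpix} &= \mathbb{E} \lVert x' - x\rVert_1,
  \qquad x' = G_y\big(E_y(y')\big)\\
L_{dadv} &= \mathbb{E}\big[D_y(x')\big] - \mathbb{E}\big[D_y(x)\big]
  + \lambda_{dadv}\,\mathbb{E}\big[(\lVert\nabla_{\hat x} D_y(\hat x)\rVert_2 - 1)^2\big]\\
L_{dfeat} &= \mathbb{E} \lVert E_y(y') - E_{x1}(x)\rVert_1
  && \text{(align de-stylization features with content features)}
\end{aligned}
```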
Further, training the bidirectional generative adversarial network in step 4 specifically includes:
Step 4-1: initialize the parameters and the iteration counter of the bidirectional generative adversarial network;
Step 4-2: input the content image to the content encoder of the stylization generative adversarial network, input the features output by the content encoder to the stylization generation network, compute the loss function, and update the content encoder's parameters by gradient descent;
Step 4-3: input the style image to the de-stylization generation network to generate a fake content image;
Step 4-4: input the real content image and the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization discrimination network's parameters by gradient descent;
Step 4-5: input the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization generation network's parameters by gradient descent;
Step 4-6: input the content image and the style image to the stylization generation network to generate a fake style image;
Step 4-7: input the real style image and the fake style image to the stylization discrimination network, compute the loss function, and update the stylization discrimination network's parameters by gradient descent;
Step 4-8: input the fake style image to the stylization discrimination network, compute the loss function, and update the stylization generation network's parameters by gradient descent;
Step 4-9: if the current iteration count is below the preset threshold, repeat steps 4-2 through 4-8; otherwise, end the training.
Further, inputting the content image and the style image into the trained bidirectional generative adversarial network to generate etched character images in step 5 specifically includes:
Step 5-1: input the content image and the style image into the trained stylization generation network to generate etched character images;
Step 5-2: screen the generated etched character images and delete those that do not meet the preset requirements.
Compared with the prior art, the significant advantages of the present invention are: 1) by generating a large number of etched character images with a generative adversarial network, sufficient training samples can be obtained even when the sample size is small; 2) generating large numbers of etched character images with a generation network is faster and more efficient than collecting samples manually; 3) the bidirectional generative adversarial network generates realistic character images, improving the accuracy of recognizing etched characters with deep learning methods.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of augmenting training samples for an etched character recognition network in one embodiment.
Fig. 2 is a schematic diagram of a content image in one embodiment.
Fig. 3 is a schematic diagram of an etched-style image in one embodiment.
Fig. 4 is a schematic diagram of the bidirectional generative adversarial network in one embodiment.
Fig. 5 is a flowchart of training the bidirectional generative adversarial network in one embodiment.
Detailed Description of Embodiments
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application, not to limit it.
In one embodiment, with reference to Fig. 1, a method of augmenting training samples for an etched character recognition network is provided, comprising the following steps:
Step 1: collect etched character images from the scene;
Step 2: generate content images and style images from the etched character images;
Step 3: construct a bidirectional generative adversarial network;
Step 4: train the bidirectional generative adversarial network;
Step 5: input the content images and style images into the trained bidirectional generative adversarial network to generate etched character images.
Further, in one embodiment, generating the content image from the etched character images in step 2 specifically includes:
Step 2-1: annotate the text information of the etched character images;
Step 2-2: compile character statistics from the ground-truth labels of the annotated etched character images;
Step 2-3: generate content images in multiple fonts from the character statistics, as shown in Fig. 2.
Here, the multiple fonts include Song (宋体), Kai (楷体), He (合体), and imitation-Song (仿宋体) typefaces.
Here, as a concrete example, the text color of the content image is black and the background color is white; a rendering sketch follows.
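A minimal sketch of step 2-3, rendering black-on-white content images in several fonts with Pillow; the font files, canvas size, and point size are assumptions of this sketch:

```python
# Sketch of step 2-3: render black-on-white content images in several fonts.
# The font files, canvas size, and point size below are assumptions.
from PIL import Image, ImageDraw, ImageFont

FONT_FILES = ["simsun.ttc", "simkai.ttf", "simfang.ttf"]  # assumed font files
CANVAS = (256, 64)

def render_content_images(text):
    """Yield one white-background, black-text image of `text` per font."""
    for path in FONT_FILES:
        font = ImageFont.truetype(path, size=48)
        img = Image.new("RGB", CANVAS, color="white")
        draw = ImageDraw.Draw(img)
        # Center the string on the canvas using its bounding box.
        left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
        xy = ((CANVAS[0] - (right - left)) // 2 - left,
              (CANVAS[1] - (bottom - top)) // 2 - top)
        draw.text(xy, text, font=font, fill="black")
        yield img

for i, img in enumerate(render_content_images("AB-1234")):
    img.save(f"content_{i}.png")
```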
Further, in one embodiment, generating the style image from the etched character images in step 2 specifically means: generating the style image according to the features of the etched character images.
Further, in one embodiment, generating the style image from the etched character images in step 2 specifically includes:
selecting, from the collected etched character images, images whose resolution meets a first preset condition and/or whose sharpness meets a second preset condition and/or whose feature saliency meets a third preset condition, as style images, as shown in Fig. 3.
Here, the generated content images and style images have the same size. A sketch of such a selection filter follows.
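A minimal sketch of the selection filter; the thresholds and the Laplacian-variance sharpness proxy are assumptions, since the patent only speaks of preset conditions:

```python
# Sketch: pick style images by resolution and sharpness. The thresholds and
# the Laplacian-variance sharpness proxy are assumptions; the patent only
# requires "preset conditions" on resolution, sharpness, and feature saliency.
import cv2

MIN_W, MIN_H = 128, 32   # assumed first preset condition (resolution)
MIN_SHARPNESS = 100.0    # assumed second preset condition (sharpness)

def is_style_candidate(path):
    img = cv2.imread(path)
    if img is None:
        return False
    h, w = img.shape[:2]
    if w < MIN_W or h < MIN_H:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness >= MIN_SHARPNESS
```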
Further, in one embodiment, the bidirectional generative adversarial network constructed in step 3 is shown in Fig. 4; the specific process includes:
Step 3-1: construct a stylization generative adversarial network;
Step 3-2: construct a de-stylization generative adversarial network;
Step 3-3: construct the loss function.
Further, in one embodiment, the stylization generative adversarial network in step 3-1 includes a stylization generation network and a stylization discrimination network.
The stylization generation network takes the content image and the style image as input and outputs a stylized character image.
Here, the input and output images are three-channel images of the same size.
The stylization generation network consists of a content encoder Ex1, a style encoder Ex2, and a generator Gx.
The content encoder Ex1 takes the content image as input and outputs a content feature vector. It first extracts features of the content image with convolutional layers, then upsamples with deconvolutional layers, fusing the upsampled features with the features output by earlier layers. An activation layer is placed before each convolutional and deconvolutional layer, and a batch normalization layer after it.
The style encoder Ex2 takes the style image as input and outputs a style feature vector; its structure mirrors the content encoder: convolutional feature extraction, deconvolutional upsampling fused with earlier features, and the same activation and batch normalization placement.
The generator Gx takes the content feature vector and the style feature vector, which have the same size, and outputs the stylized character image. It first concatenates the two feature vectors and then upsamples through several deconvolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
The stylization discrimination network takes the stylized character image or a real etched character image as input and outputs a number between 0 and 1 representing the probability that the input image is real. It consists of convolutional layers, each preceded by an activation layer and followed by a batch normalization layer. A PyTorch sketch of these modules follows.
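The patent pins down only the layer ordering (an activation layer before each convolutional or deconvolutional layer, a batch normalization layer after it) and the module roles. The sketch below fills in the rest; channel widths, kernel sizes, depth, and the LeakyReLU/Tanh choices are assumptions.

```python
# PyTorch sketch of the stylization networks under the assumptions above.
import torch
import torch.nn as nn

def down(cin, cout):  # activation -> strided conv -> batch norm
    return nn.Sequential(nn.LeakyReLU(0.2), nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout))

def up(cin, cout):    # activation -> strided deconv -> batch norm
    return nn.Sequential(nn.LeakyReLU(0.2), nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout))

class Encoder(nn.Module):
    """Shared shape for the content encoder Ex1 and the style encoder Ex2:
    convolutional feature extraction, then deconvolutional upsampling whose
    output is fused with an earlier feature map."""
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = down(3, 64), down(64, 128), down(128, 256)
        self.u1 = up(256, 128)

    def forward(self, img):
        f1 = self.d1(img)
        f2 = self.d2(f1)
        f3 = self.d3(f2)
        return torch.cat([self.u1(f3), f2], dim=1)  # fused 256-channel features

class GeneratorGx(nn.Module):
    """Concatenates content and style features, then deconvolves to an image."""
    def __init__(self):
        super().__init__()
        self.body = up(512, 128)
        self.head = nn.Sequential(nn.LeakyReLU(0.2),
                                  nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())

    def forward(self, content_feat, style_feat):
        return self.head(self.body(torch.cat([content_feat, style_feat], dim=1)))

class DiscriminatorDx(nn.Module):
    """Outputs a number in (0, 1): the probability that the input is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(down(3, 64), down(64, 128), down(128, 256),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, img):
        return self.net(img)
```

The de-stylization encoder Ey and generator Gy described next can reuse the same building blocks, with Gy consisting only of the deconvolutional upsampling stack.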
Further, in one embodiment, the de-stylization generative adversarial network in step 3-2 includes a de-stylization generation network and a de-stylization discrimination network.
The de-stylization generation network takes the stylized character image as input and outputs a de-stylized character image. It consists of a first encoder Ey and a second generator Gy.
The first encoder Ey takes the stylized character image as input and outputs a feature vector. Like the encoders above, it first extracts features with convolutional layers and then upsamples with deconvolutional layers, fusing the upsampled features with the features output by earlier layers; an activation layer precedes and a batch normalization layer follows each convolutional and deconvolutional layer.
The second generator Gy takes the feature vector output by the first encoder Ey and outputs the de-stylized character image, upsampling through several deconvolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
The de-stylization discrimination network takes the de-stylized character image or a real content image as input and outputs a number between 0 and 1 representing the probability that the input image is real. It consists of convolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
Further, in one embodiment, the loss function L in step 3-3 includes the content image reconstruction loss L1, the stylization generative adversarial network loss L2, and the de-stylization generative adversarial network loss L3:
L = L1 + L2 + L3
The content image reconstruction loss L1 ensures that the content encoder Ex1 can extract the core information of the content image (including character structure, stroke information, and the like); it is computed from the input content image x and weighted by λx, which ranges from 0 to 1.
The stylization generative adversarial network loss L2 consists of the first pixel loss Lspix and the first adversarial loss Lsadv:
L2 = λx1·Lspix + λx2·Lsadv
where λx1 and λx2 are the weights of Lspix and Lsadv, respectively, each ranging from 0 to 1. The first pixel loss Lspix is computed from the input content image x, the input style image y, and the image y' generated by the stylization generation network. The first adversarial loss Lsadv uses an interpolated sample ŷ, defined as a vector sampled uniformly along the straight line between the style image y and the generated image y'; λsadv is a weight parameter ranging from 0 to 1, and Dx denotes the stylization discrimination network.
The de-stylization generative adversarial network loss L3 consists of the second pixel loss Ldpix, the second adversarial loss Ldadv, and the content feature loss Ldfeat:
L3 = λy1·Ldpix + λy2·Ldadv + λy3·Ldfeat
where λy1, λy2, and λy3 are the weights of Ldpix, Ldadv, and Ldfeat, respectively, each ranging from 0 to 1; these terms are computed as described in the summary above.
Further, in one embodiment, the goal of training the bidirectional generative adversarial network in step 4 is to minimize the value of the loss function L by gradient descent. Gradient descent exploits a property of convex functions: moving the parameters one step in the direction opposite to the gradient decreases the function value, and iterating this repeatedly eventually reaches a local minimum. The parameter update rule is:
θ ← θ − η·∇θJ(θ)
where θ denotes the parameters, η the iteration step size, and J(θ) the loss function. In code, this update corresponds to a plain SGD optimizer, as sketched below.
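As a concrete reading of this update rule, each parameter group used in the training sketch below can be given a plain SGD optimizer; the learning rate is an assumed value, not specified in the patent.

```python
# The opts dictionary used in the training sketch below can be built from
# plain SGD optimizers, which implement exactly θ ← θ − η·∇θJ(θ).
# The learning rate 1e-4 is an assumption; the patent does not specify η.
import torch

def make_optimizers(Ex1, Ex2, Gx, Dx, Ey, Gy, Dy, lr=1e-4):
    sgd = lambda params: torch.optim.SGD(params, lr=lr)
    return {
        "enc": sgd(Ex1.parameters()),
        "gx":  sgd(list(Ex2.parameters()) + list(Gx.parameters())),
        "dx":  sgd(Dx.parameters()),
        "gy":  sgd(list(Ey.parameters()) + list(Gy.parameters())),
        "dy":  sgd(Dy.parameters()),
    }
```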
With reference to Fig. 5, the specific process of training the bidirectional generative adversarial network includes:
Step 4-1: initialize the parameters and the iteration counter of the bidirectional generative adversarial network;
Step 4-2: input the content image to the content encoder of the stylization generative adversarial network, input the features output by the content encoder to the stylization generation network, compute the loss function, and update the content encoder's parameters by gradient descent;
Step 4-3: input the style image to the de-stylization generation network to generate a fake content image;
Step 4-4: input the real content image and the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization discrimination network's parameters by gradient descent;
Step 4-5: input the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization generation network's parameters by gradient descent;
Step 4-6: input the content image and the style image to the stylization generation network to generate a fake style image;
Step 4-7: input the real style image and the fake style image to the stylization discrimination network, compute the loss function, and update the stylization discrimination network's parameters by gradient descent;
Step 4-8: input the fake style image to the stylization discrimination network, compute the loss function, and update the stylization generation network's parameters by gradient descent;
Step 4-9: if the current iteration count is below the preset threshold, repeat steps 4-2 through 4-8; otherwise, end the training. A sketch of one full training iteration follows.
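Putting steps 4-2 through 4-8 together, one training iteration might look like the following sketch. BCE adversarial terms and L1 pixel terms stand in for the loss function L defined above; pairing each style image with its ground-truth content image is an assumption of this sketch, not fixed by the patent text.

```python
# Sketch of one training iteration covering steps 4-2 through 4-8.
import torch
import torch.nn.functional as F

def train_step(Ex1, Ex2, Gx, Dx, Ey, Gy, Dy, opts, content, style):
    def update(loss, key):
        for o in opts.values():   # clear stale gradients on every module
            o.zero_grad()
        loss.backward()
        opts[key].step()          # step only the module being trained

    # Step 4-2: update the content encoder through the stylization generator.
    stylized = Gx(Ex1(content), Ex2(style))
    update(F.l1_loss(stylized, style), "enc")

    # Steps 4-3 and 4-4: de-stylization discriminator on real vs. fake content.
    fake_content = Gy(Ey(style))
    d_real, d_fake = Dy(content), Dy(fake_content.detach())
    update(F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
           F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)), "dy")

    # Step 4-5: de-stylization generator tries to fool the discriminator.
    pred = Dy(fake_content)
    update(F.binary_cross_entropy(pred, torch.ones_like(pred)) +
           F.l1_loss(fake_content, content), "gy")

    # Steps 4-6 and 4-7: stylization discriminator on real vs. fake style images.
    fake_style = Gx(Ex1(content), Ex2(style))
    d_real, d_fake = Dx(style), Dx(fake_style.detach())
    update(F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
           F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)), "dx")

    # Step 4-8: stylization generator tries to fool the discriminator.
    pred = Dx(fake_style)
    update(F.binary_cross_entropy(pred, torch.ones_like(pred)) +
           F.l1_loss(fake_style, style), "gx")
```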
Further, in one embodiment, inputting the content image and the style image into the trained bidirectional generative adversarial network to generate etched character images in step 5 specifically includes:
Step 5-1: input the content image and the style image into the trained stylization generation network to generate etched character images;
Step 5-2: screen the generated etched character images and delete those that do not meet the preset requirements. A sketch of this generation-and-screening step follows.
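A minimal sketch of steps 5-1 and 5-2, assuming the trained modules from the architecture sketch above; the quality_check predicate is hypothetical, standing in for the unspecified preset requirements:

```python
# Sketch of step 5: generate etched character images with the trained
# stylization generation network, then screen them (step 5-2). quality_check
# is a hypothetical predicate for the patent's unspecified preset requirements.
import itertools
import torch

@torch.no_grad()
def generate_samples(Ex1, Ex2, Gx, content_images, style_images, quality_check):
    samples = []
    for content, style in itertools.product(content_images, style_images):
        img = Gx(Ex1(content.unsqueeze(0)), Ex2(style.unsqueeze(0)))[0]
        if quality_check(img):  # drop images failing the preset requirements
            samples.append(img)
    return samples
```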
In summary, the present invention generates a large number of etched character images with a generative adversarial network, so that sufficient training samples can be obtained even when the original sample size is small. This is faster and more efficient than collecting samples manually, and the generated etched character images are more realistic, improving the accuracy of recognizing etched characters with deep learning methods.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010096003.5A CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010096003.5A CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111242241A true CN111242241A (en) | 2020-06-05 |
Family
ID=70879992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010096003.5A Pending CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242241A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132916A (en) * | 2020-08-18 | 2020-12-25 | 浙江大学 | Seal cutting work customized design generation device utilizing generation countermeasure network |
CN112396577A (en) * | 2020-10-22 | 2021-02-23 | 国网浙江省电力有限公司杭州供电公司 | Defect detection method of transformer based on Poisson fusion sample expansion |
CN112489165A (en) * | 2020-11-06 | 2021-03-12 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN113761831A (en) * | 2020-11-13 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device and equipment for generating style calligraphy and storage medium |
CN114781556A (en) * | 2022-06-22 | 2022-07-22 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | A Character Image Augmentation Method Based on Shape Transformation |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
- 2020-02-17: CN CN202010096003.5A patent/CN111242241A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
Non-Patent Citations (1)
Title |
---|
SHUAI YANG et al.: "TET-GAN: Text Effects Transfer via Stylization and Destylization", The Thirty-Third AAAI Conference on Artificial Intelligence *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132916A (en) * | 2020-08-18 | 2020-12-25 | 浙江大学 | Seal cutting work customized design generation device utilizing generation countermeasure network |
CN112132916B (en) * | 2020-08-18 | 2023-11-14 | 浙江大学 | Seal cutting work customized design generating device for generating countermeasure network |
CN112396577A (en) * | 2020-10-22 | 2021-02-23 | 国网浙江省电力有限公司杭州供电公司 | Defect detection method of transformer based on Poisson fusion sample expansion |
CN112489165A (en) * | 2020-11-06 | 2021-03-12 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN112489165B (en) * | 2020-11-06 | 2024-02-06 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN113761831A (en) * | 2020-11-13 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device and equipment for generating style calligraphy and storage medium |
CN113761831B (en) * | 2020-11-13 | 2024-05-21 | 北京沃东天骏信息技术有限公司 | Style handwriting generation method, device, equipment and storage medium |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | A Character Image Augmentation Method Based on Shape Transformation |
CN114781556A (en) * | 2022-06-22 | 2022-07-22 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
CN114781556B (en) * | 2022-06-22 | 2022-09-02 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242241A (en) | A method of augmenting training samples for etched character recognition networks | |
CN109726657B (en) | Deep learning scene text sequence recognition method | |
CN110533737A (en) | The method generated based on structure guidance Chinese character style | |
CN110853057B (en) | Aerial image segmentation method based on global and multi-scale fully convolutional network | |
CN111767827A (en) | A deep learning-based mesoscale vortex identification method | |
CN112950561B (en) | Optical fiber end face defect detection method, device and storage medium | |
CN111340034B (en) | Text detection and identification method and system for natural scene | |
CN107194872A (en) | Remote sensed image super-resolution reconstruction method based on perception of content deep learning network | |
CN107330364A (en) | A kind of people counting method and system based on cGAN networks | |
CN116206185A (en) | Lightweight small target detection method based on improved YOLOv7 | |
CN108053454B (en) | A method for generating graph-structured data based on deep convolutional generative adversarial networks | |
CN111325165A (en) | A Scene Classification Method of Urban Remote Sensing Imagery Considering Spatial Relationship Information | |
CN113256649B (en) | Remote sensing image station selection and line selection semantic segmentation method based on deep learning | |
CN105608454A (en) | Text structure part detection neural network based text detection method and system | |
CN112232395B (en) | Semi-supervised image classification method for generating countermeasure network based on joint training | |
CN114627290A (en) | An Image Segmentation Algorithm of Mechanical Parts Based on Improved DeepLabV3+ Network | |
CN111667008A (en) | Personalized Chinese character font picture generation method based on feature fusion | |
CN113112003A (en) | Data amplification and deep learning channel estimation performance improvement method based on self-encoder | |
Ren et al. | Context aware edge-enhanced GAN for remote sensing image super-resolution | |
CN112215241B (en) | Image feature extraction device based on small sample learning | |
CN112528803B (en) | Road feature extraction method, device, equipment and storage medium | |
CN115292538A (en) | Map line element extraction method based on deep learning | |
CN112784831A (en) | Character recognition method for enhancing attention mechanism by fusing multilayer features | |
CN112861977A (en) | Transfer learning data processing method, system, medium, device, terminal and application | |
Elagamy et al. | HACR-MDL: handwritten Arabic character recognition model using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200605 |