CN111242241A - A method of augmenting training samples for etched character recognition networks - Google Patents
A method of augmenting training samples for etched character recognition networks
- Publication number
- CN111242241A (application CN202010096003.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- stylized
- content
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a training-sample augmentation method for etched character recognition networks, belonging to the fields of image processing and deep learning. The method includes the following steps: collecting etched character images from a scene; generating content images and style images from the etched character images; constructing a bidirectional generative adversarial network; training the bidirectional generative adversarial network; and inputting the content images and style images into the trained bidirectional generative adversarial network to generate etched character images. By generating a large number of etched character images with a generative adversarial network, the invention obtains sufficient training samples even when the original sample size is small. Compared with manual sample collection it is faster and more efficient, the generated etched character images are more realistic, and the accuracy of recognizing etched characters with deep learning methods is improved.
Description
Technical Field
The invention belongs to the fields of image processing and deep learning, and in particular relates to a method of augmenting the training samples of an etched character recognition network.
Background Art
Etched character recognition, typical of text recognition on industrial equipment nameplates, is one of the difficult problems in scene text recognition. Industrial equipment nameplates are usually made of metal and are often installed outdoors, so nameplate images frequently suffer from degradations such as reflections, stains, blur, and scratches, which makes recognizing the etched characters difficult.
Recognizing etched characters with deep learning requires a large amount of data to train a character recognition model with adequate generalization ability; a model trained on a small sample set is prone to overfitting. In research on etched character recognition, the number of etched character images that can be collected in a given scene is small, and this severe scarcity of sample data cannot meet the needs of deep learning. In addition, collecting and curating samples consumes considerable manpower and material resources, and doing so purely by hand is very inefficient. Recognizing etched characters with deep learning therefore urgently requires solving the small-sample problem. Common image augmentation methods include flipping, rotation, scaling, cropping, shifting, and adding noise, but all of these apply a series of random changes to existing samples and can only produce images similar to the originals.
Summary of the Invention
The purpose of the present invention is to address the small scale of training samples for etched character recognition networks by providing an image sample augmentation method that quickly generates large batches of etched character images to meet the training requirements of deep learning networks.
The technical solution that realizes the purpose of the present invention is a method of augmenting training samples for an etched character recognition network, the method comprising the following steps:
Step 1: collect etched character images from the scene;
Step 2: generate content images and style images from the etched character images;
Step 3: construct a bidirectional generative adversarial network;
Step 4: train the bidirectional generative adversarial network;
Step 5: input the content images and style images into the trained bidirectional generative adversarial network to generate etched character images.
Further, generating the content image from the etched character images in step 2 specifically includes:
Step 2-1: annotate the text information of the etched character images;
Step 2-2: compile character statistics from the ground-truth labels of the annotated etched character images;
Step 2-3: generate content images in multiple fonts from the character statistics.
Further, generating the style image from the etched character images in step 2 specifically means: generating the style image according to the features of the etched character images.
Further, generating the style image from the etched character images in step 2 specifically includes:
selecting, from the collected etched character images, images whose resolution meets a first preset condition and/or whose sharpness meets a second preset condition and/or whose feature saliency meets a third preset condition, as style images.
Further, constructing the bidirectional generative adversarial network in step 3 specifically includes:
Step 3-1: construct a stylization generative adversarial network;
Step 3-2: construct a de-stylization generative adversarial network;
Step 3-3: construct the loss function.
Further, the stylization generative adversarial network in step 3-1 includes a stylization generation network and a stylization discrimination network.
The stylization generation network takes a content image and a style image as input and outputs a stylized character image. It includes: a content encoder Ex1, whose input is the content image and whose output is a content feature vector; a style encoder Ex2, whose input is the style image and whose output is a style feature vector; and a generator Gx, whose input is the content feature vector and the style feature vector and whose output is the stylized character image.
The stylization discrimination network takes the stylized character image or a real etched character image as input and outputs a number between 0 and 1 representing the probability that the input image is real.
Further, the de-stylization generative adversarial network in step 3-2 includes a de-stylization generation network and a de-stylization discrimination network.
The de-stylization generation network takes the stylized character image as input and outputs a de-stylized character image. It includes: a first encoder Ey, whose input is the stylized character image and whose output is a feature vector; and a second generator Gy, whose input is the feature vector output by the first encoder Ey and whose output is the de-stylized character image.
The de-stylization discrimination network takes the de-stylized character image or a real content image as input and outputs a number between 0 and 1 representing the probability that the input image is real.
Further, the loss function L in step 3-3 includes the content image reconstruction loss L1, the stylization generative adversarial network loss L2, and the de-stylization generative adversarial network loss L3:
L = L1 + L2 + L3
The content image reconstruction loss L1 ensures that the content encoder Ex1 can extract the core information of the content image; it is computed from the input content image x and weighted by λx, which ranges from 0 to 1.
The stylization generative adversarial network loss L2 consists of a first pixel loss Lspix and a first adversarial loss Lsadv:
L2 = λx1·Lspix + λx2·Lsadv
where λx1 and λx2 are the weights of Lspix and Lsadv, respectively, each ranging from 0 to 1.
The first pixel loss Lspix is computed from the input content image x, the input style image y, and the image y' generated by the stylization generation network.
The first adversarial loss Lsadv uses an interpolated sample ŷ, defined as a vector sampled uniformly along the straight line between the style image y and the generated image y'; λsadv is a weight parameter ranging from 0 to 1, and Dx denotes the stylization discrimination network.
The de-stylization generative adversarial network loss L3 consists of a second pixel loss Ldpix, a second adversarial loss Ldadv, and a content feature loss Ldfeat:
L3 = λy1·Ldpix + λy2·Ldadv + λy3·Ldfeat
where λy1, λy2, and λy3 are the weights of Ldpix, Ldadv, and Ldfeat, respectively, each ranging from 0 to 1; these three terms parallel the stylization losses, applied to the de-stylization networks. Plausible reconstructions of all the loss terms are sketched below.
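The individual loss formulas appear only as figure images in the original filing and were lost in extraction. The block below is a reconstruction sketch, not the patent's verbatim formulas: it assumes L1-distance pixel losses and WGAN-GP-style adversarial losses (the form implied by the interpolated sample ŷ), following the TET-GAN formulation cited by this application; the weight λdadv is introduced here by symmetry with λsadv and does not appear in the original text.

```latex
% Assumed reconstructions of the loss terms; not the patent's verbatim formulas.
\begin{aligned}
L_1 &= \lambda_x\,\mathbb{E}_x \lVert G_y(E_{x1}(x)) - x\rVert_1
  && \text{(content reconstruction through the de-stylization decoder)}\\
L_{spix} &= \mathbb{E}_{x,y} \lVert y' - y\rVert_1,
  \qquad y' = G_x\big(E_{x1}(x),\, E_{x2}(y)\big)\\
L_{sadv} &= \mathbb{E}\big[D_x(y')\big] - \mathbb{E}\big[D_x(y)\big]
  + \lambda_{sadv}\,\mathbb{E}\big[(\lVert\nabla_{\hat y} D_x(\hat y)\rVert_2 - 1)^2\big],
  \quad \hat y = \epsilon y + (1-\epsilon)y',\ \epsilon \sim U[0,1]\\
L_{dpix} &= \mathbb{E} \lVert x' - x\rVert_1,
  \qquad x' = G_y\big(E_y(y')\big)\\
L_{dadv} &= \mathbb{E}\big[D_y(x')\big] - \mathbb{E}\big[D_y(x)\big]
  + \lambda_{dadv}\,\mathbb{E}\big[(\lVert\nabla_{\hat x} D_y(\hat x)\rVert_2 - 1)^2\big]\\
L_{dfeat} &= \mathbb{E} \lVert E_y(y') - E_{x1}(x)\rVert_1
  && \text{(align de-stylization features with content features)}
\end{aligned}
```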
Further, training the bidirectional generative adversarial network in step 4 specifically includes:
Step 4-1: initialize the parameters and the iteration counter of the bidirectional generative adversarial network;
Step 4-2: input the content image to the content encoder of the stylization generative adversarial network, input the features output by the content encoder to the stylization generation network, compute the loss function, and update the content encoder's parameters by gradient descent;
Step 4-3: input the style image to the de-stylization generation network to generate a fake content image;
Step 4-4: input the real content image and the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization discrimination network's parameters by gradient descent;
Step 4-5: input the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization generation network's parameters by gradient descent;
Step 4-6: input the content image and the style image to the stylization generation network to generate a fake style image;
Step 4-7: input the real style image and the fake style image to the stylization discrimination network, compute the loss function, and update the stylization discrimination network's parameters by gradient descent;
Step 4-8: input the fake style image to the stylization discrimination network, compute the loss function, and update the stylization generation network's parameters by gradient descent;
Step 4-9: if the current iteration count is below the preset threshold, repeat steps 4-2 through 4-8; otherwise, end the training.
Further, inputting the content image and the style image into the trained bidirectional generative adversarial network to generate etched character images in step 5 specifically includes:
Step 5-1: input the content image and the style image into the trained stylization generation network to generate etched character images;
Step 5-2: screen the generated etched character images and delete those that do not meet the preset requirements.
Compared with the prior art, the significant advantages of the present invention are: 1) by generating a large number of etched character images with a generative adversarial network, sufficient training samples can be obtained even when the sample size is small; 2) generating large numbers of etched character images with a generation network is faster and more efficient than collecting samples manually; 3) the bidirectional generative adversarial network generates realistic character images, improving the accuracy of recognizing etched characters with deep learning methods.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of augmenting training samples for an etched character recognition network in one embodiment.
Fig. 2 is a schematic diagram of a content image in one embodiment.
Fig. 3 is a schematic diagram of an etched-style image in one embodiment.
Fig. 4 is a schematic diagram of the bidirectional generative adversarial network in one embodiment.
Fig. 5 is a flowchart of training the bidirectional generative adversarial network in one embodiment.
Detailed Description of Embodiments
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application, not to limit it.
In one embodiment, with reference to Fig. 1, a method of augmenting training samples for an etched character recognition network is provided, comprising the following steps:
Step 1: collect etched character images from the scene;
Step 2: generate content images and style images from the etched character images;
Step 3: construct a bidirectional generative adversarial network;
Step 4: train the bidirectional generative adversarial network;
Step 5: input the content images and style images into the trained bidirectional generative adversarial network to generate etched character images.
Further, in one embodiment, generating the content image from the etched character images in step 2 specifically includes:
Step 2-1: annotate the text information of the etched character images;
Step 2-2: compile character statistics from the ground-truth labels of the annotated etched character images;
Step 2-3: generate content images in multiple fonts from the character statistics, as shown in Fig. 2.
Here, the multiple fonts include Song (宋体), Kai (楷体), He (合体), and imitation-Song (仿宋体) typefaces.
Here, as a concrete example, the text color of the content image is black and the background color is white; a rendering sketch follows.
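A minimal sketch of step 2-3, rendering black-on-white content images in several fonts with Pillow; the font files, canvas size, and point size are assumptions of this sketch:

```python
# Sketch of step 2-3: render black-on-white content images in several fonts.
# The font files, canvas size, and point size below are assumptions.
from PIL import Image, ImageDraw, ImageFont

FONT_FILES = ["simsun.ttc", "simkai.ttf", "simfang.ttf"]  # assumed font files
CANVAS = (256, 64)

def render_content_images(text):
    """Yield one white-background, black-text image of `text` per font."""
    for path in FONT_FILES:
        font = ImageFont.truetype(path, size=48)
        img = Image.new("RGB", CANVAS, color="white")
        draw = ImageDraw.Draw(img)
        # Center the string on the canvas using its bounding box.
        left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
        xy = ((CANVAS[0] - (right - left)) // 2 - left,
              (CANVAS[1] - (bottom - top)) // 2 - top)
        draw.text(xy, text, font=font, fill="black")
        yield img

for i, img in enumerate(render_content_images("AB-1234")):
    img.save(f"content_{i}.png")
```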
Further, in one embodiment, generating the style image from the etched character images in step 2 specifically means: generating the style image according to the features of the etched character images.
Further, in one embodiment, generating the style image from the etched character images in step 2 specifically includes:
selecting, from the collected etched character images, images whose resolution meets a first preset condition and/or whose sharpness meets a second preset condition and/or whose feature saliency meets a third preset condition, as style images, as shown in Fig. 3.
Here, the generated content images and style images have the same size. A sketch of such a selection filter follows.
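A minimal sketch of the selection filter; the thresholds and the Laplacian-variance sharpness proxy are assumptions, since the patent only speaks of preset conditions:

```python
# Sketch: pick style images by resolution and sharpness. The thresholds and
# the Laplacian-variance sharpness proxy are assumptions; the patent only
# requires "preset conditions" on resolution, sharpness, and feature saliency.
import cv2

MIN_W, MIN_H = 128, 32   # assumed first preset condition (resolution)
MIN_SHARPNESS = 100.0    # assumed second preset condition (sharpness)

def is_style_candidate(path):
    img = cv2.imread(path)
    if img is None:
        return False
    h, w = img.shape[:2]
    if w < MIN_W or h < MIN_H:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness >= MIN_SHARPNESS
```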
Further, in one embodiment, the bidirectional generative adversarial network constructed in step 3 is shown in Fig. 4; the specific process includes:
Step 3-1: construct a stylization generative adversarial network;
Step 3-2: construct a de-stylization generative adversarial network;
Step 3-3: construct the loss function.
Further, in one embodiment, the stylization generative adversarial network in step 3-1 includes a stylization generation network and a stylization discrimination network.
The stylization generation network takes the content image and the style image as input and outputs a stylized character image.
Here, the input and output images are three-channel images of the same size.
The stylization generation network consists of a content encoder Ex1, a style encoder Ex2, and a generator Gx.
The content encoder Ex1 takes the content image as input and outputs a content feature vector. It first extracts features of the content image with convolutional layers, then upsamples with deconvolutional layers, fusing the upsampled features with the features output by earlier layers. An activation layer is placed before each convolutional and deconvolutional layer, and a batch normalization layer after it.
The style encoder Ex2 takes the style image as input and outputs a style feature vector; its structure mirrors the content encoder: convolutional feature extraction, deconvolutional upsampling fused with earlier features, and the same activation and batch normalization placement.
The generator Gx takes the content feature vector and the style feature vector, which have the same size, and outputs the stylized character image. It first concatenates the two feature vectors and then upsamples through several deconvolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
The stylization discrimination network takes the stylized character image or a real etched character image as input and outputs a number between 0 and 1 representing the probability that the input image is real. It consists of convolutional layers, each preceded by an activation layer and followed by a batch normalization layer. A PyTorch sketch of these modules follows.
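The patent pins down only the layer ordering (an activation layer before each convolutional or deconvolutional layer, a batch normalization layer after it) and the module roles. The sketch below fills in the rest; channel widths, kernel sizes, depth, and the LeakyReLU/Tanh choices are assumptions.

```python
# PyTorch sketch of the stylization networks under the assumptions above.
import torch
import torch.nn as nn

def down(cin, cout):  # activation -> strided conv -> batch norm
    return nn.Sequential(nn.LeakyReLU(0.2), nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout))

def up(cin, cout):    # activation -> strided deconv -> batch norm
    return nn.Sequential(nn.LeakyReLU(0.2), nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout))

class Encoder(nn.Module):
    """Shared shape for the content encoder Ex1 and the style encoder Ex2:
    convolutional feature extraction, then deconvolutional upsampling whose
    output is fused with an earlier feature map."""
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = down(3, 64), down(64, 128), down(128, 256)
        self.u1 = up(256, 128)

    def forward(self, img):
        f1 = self.d1(img)
        f2 = self.d2(f1)
        f3 = self.d3(f2)
        return torch.cat([self.u1(f3), f2], dim=1)  # fused 256-channel features

class GeneratorGx(nn.Module):
    """Concatenates content and style features, then deconvolves to an image."""
    def __init__(self):
        super().__init__()
        self.body = up(512, 128)
        self.head = nn.Sequential(nn.LeakyReLU(0.2),
                                  nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())

    def forward(self, content_feat, style_feat):
        return self.head(self.body(torch.cat([content_feat, style_feat], dim=1)))

class DiscriminatorDx(nn.Module):
    """Outputs a number in (0, 1): the probability that the input is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(down(3, 64), down(64, 128), down(128, 256),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, img):
        return self.net(img)
```

The de-stylization encoder Ey and generator Gy described next can reuse the same building blocks, with Gy consisting only of the deconvolutional upsampling stack.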
Further, in one embodiment, the de-stylization generative adversarial network in step 3-2 includes a de-stylization generation network and a de-stylization discrimination network.
The de-stylization generation network takes the stylized character image as input and outputs a de-stylized character image. It consists of a first encoder Ey and a second generator Gy.
The first encoder Ey takes the stylized character image as input and outputs a feature vector. Like the encoders above, it first extracts features with convolutional layers and then upsamples with deconvolutional layers, fusing the upsampled features with the features output by earlier layers; an activation layer precedes and a batch normalization layer follows each convolutional and deconvolutional layer.
The second generator Gy takes the feature vector output by the first encoder Ey and outputs the de-stylized character image, upsampling through several deconvolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
The de-stylization discrimination network takes the de-stylized character image or a real content image as input and outputs a number between 0 and 1 representing the probability that the input image is real. It consists of convolutional layers, each preceded by an activation layer and followed by a batch normalization layer.
Further, in one embodiment, the loss function L in step 3-3 includes the content image reconstruction loss L1, the stylization generative adversarial network loss L2, and the de-stylization generative adversarial network loss L3:
L = L1 + L2 + L3
The content image reconstruction loss L1 ensures that the content encoder Ex1 can extract the core information of the content image (including character structure, stroke information, and the like); it is computed from the input content image x and weighted by λx, which ranges from 0 to 1.
The stylization generative adversarial network loss L2 consists of the first pixel loss Lspix and the first adversarial loss Lsadv:
L2 = λx1·Lspix + λx2·Lsadv
where λx1 and λx2 are the weights of Lspix and Lsadv, respectively, each ranging from 0 to 1. The first pixel loss Lspix is computed from the input content image x, the input style image y, and the image y' generated by the stylization generation network. The first adversarial loss Lsadv uses an interpolated sample ŷ, defined as a vector sampled uniformly along the straight line between the style image y and the generated image y'; λsadv is a weight parameter ranging from 0 to 1, and Dx denotes the stylization discrimination network.
The de-stylization generative adversarial network loss L3 consists of the second pixel loss Ldpix, the second adversarial loss Ldadv, and the content feature loss Ldfeat:
L3 = λy1·Ldpix + λy2·Ldadv + λy3·Ldfeat
where λy1, λy2, and λy3 are the weights of Ldpix, Ldadv, and Ldfeat, respectively, each ranging from 0 to 1; these terms are computed as described in the summary above.
Further, in one embodiment, the goal of training the bidirectional generative adversarial network in step 4 is to minimize the value of the loss function L by gradient descent. Gradient descent exploits a property of convex functions: moving the parameters one step in the direction opposite to the gradient decreases the function value, and iterating this repeatedly eventually reaches a local minimum. The parameter update rule is:
θ ← θ − η·∇θJ(θ)
where θ denotes the parameters, η the iteration step size, and J(θ) the loss function. In code, this update corresponds to a plain SGD optimizer, as sketched below.
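As a concrete reading of this update rule, each parameter group used in the training sketch below can be given a plain SGD optimizer; the learning rate is an assumed value, not specified in the patent.

```python
# The opts dictionary used in the training sketch below can be built from
# plain SGD optimizers, which implement exactly θ ← θ − η·∇θJ(θ).
# The learning rate 1e-4 is an assumption; the patent does not specify η.
import torch

def make_optimizers(Ex1, Ex2, Gx, Dx, Ey, Gy, Dy, lr=1e-4):
    sgd = lambda params: torch.optim.SGD(params, lr=lr)
    return {
        "enc": sgd(Ex1.parameters()),
        "gx":  sgd(list(Ex2.parameters()) + list(Gx.parameters())),
        "dx":  sgd(Dx.parameters()),
        "gy":  sgd(list(Ey.parameters()) + list(Gy.parameters())),
        "dy":  sgd(Dy.parameters()),
    }
```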
With reference to Fig. 5, the specific process of training the bidirectional generative adversarial network includes:
Step 4-1: initialize the parameters and the iteration counter of the bidirectional generative adversarial network;
Step 4-2: input the content image to the content encoder of the stylization generative adversarial network, input the features output by the content encoder to the stylization generation network, compute the loss function, and update the content encoder's parameters by gradient descent;
Step 4-3: input the style image to the de-stylization generation network to generate a fake content image;
Step 4-4: input the real content image and the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization discrimination network's parameters by gradient descent;
Step 4-5: input the fake content image to the de-stylization discrimination network, compute the loss function, and update the de-stylization generation network's parameters by gradient descent;
Step 4-6: input the content image and the style image to the stylization generation network to generate a fake style image;
Step 4-7: input the real style image and the fake style image to the stylization discrimination network, compute the loss function, and update the stylization discrimination network's parameters by gradient descent;
Step 4-8: input the fake style image to the stylization discrimination network, compute the loss function, and update the stylization generation network's parameters by gradient descent;
Step 4-9: if the current iteration count is below the preset threshold, repeat steps 4-2 through 4-8; otherwise, end the training. A sketch of one full training iteration follows.
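Putting steps 4-2 through 4-8 together, one training iteration might look like the following sketch. BCE adversarial terms and L1 pixel terms stand in for the loss function L defined above; pairing each style image with its ground-truth content image is an assumption of this sketch, not fixed by the patent text.

```python
# Sketch of one training iteration covering steps 4-2 through 4-8.
import torch
import torch.nn.functional as F

def train_step(Ex1, Ex2, Gx, Dx, Ey, Gy, Dy, opts, content, style):
    def update(loss, key):
        for o in opts.values():   # clear stale gradients on every module
            o.zero_grad()
        loss.backward()
        opts[key].step()          # step only the module being trained

    # Step 4-2: update the content encoder through the stylization generator.
    stylized = Gx(Ex1(content), Ex2(style))
    update(F.l1_loss(stylized, style), "enc")

    # Steps 4-3 and 4-4: de-stylization discriminator on real vs. fake content.
    fake_content = Gy(Ey(style))
    d_real, d_fake = Dy(content), Dy(fake_content.detach())
    update(F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
           F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)), "dy")

    # Step 4-5: de-stylization generator tries to fool the discriminator.
    pred = Dy(fake_content)
    update(F.binary_cross_entropy(pred, torch.ones_like(pred)) +
           F.l1_loss(fake_content, content), "gy")

    # Steps 4-6 and 4-7: stylization discriminator on real vs. fake style images.
    fake_style = Gx(Ex1(content), Ex2(style))
    d_real, d_fake = Dx(style), Dx(fake_style.detach())
    update(F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
           F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)), "dx")

    # Step 4-8: stylization generator tries to fool the discriminator.
    pred = Dx(fake_style)
    update(F.binary_cross_entropy(pred, torch.ones_like(pred)) +
           F.l1_loss(fake_style, style), "gx")
```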
Further, in one embodiment, inputting the content image and the style image into the trained bidirectional generative adversarial network to generate etched character images in step 5 specifically includes:
Step 5-1: input the content image and the style image into the trained stylization generation network to generate etched character images;
Step 5-2: screen the generated etched character images and delete those that do not meet the preset requirements. A sketch of this generation-and-screening step follows.
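A minimal sketch of steps 5-1 and 5-2, assuming the trained modules from the architecture sketch above; the quality_check predicate is hypothetical, standing in for the unspecified preset requirements:

```python
# Sketch of step 5: generate etched character images with the trained
# stylization generation network, then screen them (step 5-2). quality_check
# is a hypothetical predicate for the patent's unspecified preset requirements.
import itertools
import torch

@torch.no_grad()
def generate_samples(Ex1, Ex2, Gx, content_images, style_images, quality_check):
    samples = []
    for content, style in itertools.product(content_images, style_images):
        img = Gx(Ex1(content.unsqueeze(0)), Ex2(style.unsqueeze(0)))[0]
        if quality_check(img):  # drop images failing the preset requirements
            samples.append(img)
    return samples
```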
In summary, the present invention generates a large number of etched character images with a generative adversarial network, so that sufficient training samples can be obtained even when the original sample size is small. This is faster and more efficient than collecting samples manually, and the generated etched character images are more realistic, improving the accuracy of recognizing etched characters with deep learning methods.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010096003.5A CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010096003.5A CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111242241A true CN111242241A (en) | 2020-06-05 |
Family
ID=70879992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010096003.5A Pending CN111242241A (en) | 2020-02-17 | 2020-02-17 | A method of augmenting training samples for etched character recognition networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242241A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132916A (en) * | 2020-08-18 | 2020-12-25 | 浙江大学 | Seal cutting work customized design generation device utilizing generation countermeasure network |
CN112396577A (en) * | 2020-10-22 | 2021-02-23 | 国网浙江省电力有限公司杭州供电公司 | Defect detection method of transformer based on Poisson fusion sample expansion |
CN112489165A (en) * | 2020-11-06 | 2021-03-12 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN113761831A (en) * | 2020-11-13 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device and equipment for generating style calligraphy and storage medium |
CN114781556A (en) * | 2022-06-22 | 2022-07-22 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | A Character Image Augmentation Method Based on Shape Transformation |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
- 2020-02-17: CN CN202010096003.5A patent/CN111242241A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
Non-Patent Citations (1)
Title |
---|
SHUAI YANG et al.: "TET-GAN: Text Effects Transfer via Stylization and Destylization", The Thirty-Third AAAI Conference on Artificial Intelligence *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132916A (en) * | 2020-08-18 | 2020-12-25 | 浙江大学 | Seal cutting work customized design generation device utilizing generation countermeasure network |
CN112132916B (en) * | 2020-08-18 | 2023-11-14 | 浙江大学 | Seal cutting work customized design generating device for generating countermeasure network |
CN112396577A (en) * | 2020-10-22 | 2021-02-23 | 国网浙江省电力有限公司杭州供电公司 | Defect detection method of transformer based on Poisson fusion sample expansion |
CN112489165A (en) * | 2020-11-06 | 2021-03-12 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN112489165B (en) * | 2020-11-06 | 2024-02-06 | 中科云谷科技有限公司 | Method, device and storage medium for synthesizing characters |
CN113761831A (en) * | 2020-11-13 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device and equipment for generating style calligraphy and storage medium |
CN113761831B (en) * | 2020-11-13 | 2024-05-21 | 北京沃东天骏信息技术有限公司 | Style handwriting generation method, device, equipment and storage medium |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | A Character Image Augmentation Method Based on Shape Transformation |
CN114781556A (en) * | 2022-06-22 | 2022-07-22 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
CN114781556B (en) * | 2022-06-22 | 2022-09-02 | 北京汉仪创新科技股份有限公司 | Font generation method, system, device and medium based on character part information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242241A (en) | A method of augmenting training samples for etched character recognition networks | |
CN109726657B (en) | Deep learning scene text sequence recognition method | |
CN110533737A (en) | The method generated based on structure guidance Chinese character style | |
CN110853057B (en) | Aerial image segmentation method based on global and multi-scale fully convolutional network | |
CN111767827A (en) | A deep learning-based mesoscale vortex identification method | |
CN112950561B (en) | Optical fiber end face defect detection method, device and storage medium | |
CN111340034B (en) | Text detection and identification method and system for natural scene | |
CN107194872A (en) | Remote sensed image super-resolution reconstruction method based on perception of content deep learning network | |
CN107330364A (en) | A kind of people counting method and system based on cGAN networks | |
CN116206185A (en) | Lightweight small target detection method based on improved YOLOv7 | |
CN108053454B (en) | A method for generating graph-structured data based on deep convolutional generative adversarial networks | |
CN111325165A (en) | A Scene Classification Method of Urban Remote Sensing Imagery Considering Spatial Relationship Information | |
CN113256649B (en) | Remote sensing image station selection and line selection semantic segmentation method based on deep learning | |
CN105608454A (en) | Text structure part detection neural network based text detection method and system | |
CN112232395B (en) | Semi-supervised image classification method for generating countermeasure network based on joint training | |
CN114627290A (en) | An Image Segmentation Algorithm of Mechanical Parts Based on Improved DeepLabV3+ Network | |
CN111667008A (en) | Personalized Chinese character font picture generation method based on feature fusion | |
CN113112003A (en) | Data amplification and deep learning channel estimation performance improvement method based on self-encoder | |
Ren et al. | Context aware edge-enhanced GAN for remote sensing image super-resolution | |
CN112215241B (en) | Image feature extraction device based on small sample learning | |
CN112528803B (en) | Road feature extraction method, device, equipment and storage medium | |
CN115292538A (en) | Map line element extraction method based on deep learning | |
CN112784831A (en) | Character recognition method for enhancing attention mechanism by fusing multilayer features | |
CN112861977A (en) | Transfer learning data processing method, system, medium, device, terminal and application | |
Elagamy et al. | HACR-MDL: handwritten Arabic character recognition model using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200605 |