CN113903043B - Method for identifying printed Chinese character font based on twin metric model


Info

Publication number
CN113903043B
Authority
CN
China
Prior art keywords
twin
font
embedded
metric model
feature
Prior art date
Legal status
Active
Application number
CN202111512767.9A
Other languages
Chinese (zh)
Other versions
CN113903043A (en)
Inventor
钟乐海
闫飞
李礁
王荣海
赵红军
Current Assignee
Mianyang Polytechnic
Original Assignee
Mianyang Polytechnic
Priority date
Filing date
Publication date
Application filed by Mianyang Polytechnic
Priority to CN202111512767.9A
Publication of CN113903043A
Application granted
Publication of CN113903043B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The invention discloses a method for identifying printed Chinese character fonts based on a twin metric model. The method builds a data set of printed Chinese character fonts comprising a series of images in different fonts, from which a training set and a test set are constructed. A twin metric model is trained on the training set; the model comprises two twin subnetworks with identical structure and shared weights, and a contrastive loss function is computed from the feature vectors output by the twin subnetworks. Different images from the training set are then fed one after another into either twin subnetwork to obtain a set of feature vectors, and a font space is constructed under the contrastive loss function. To classify, a font image to be identified is input into the twin metric model to obtain its feature vector, and the font type is retrieved from the font space with the K-nearest-neighbor algorithm. The method improves the accuracy and robustness of Chinese character font recognition while keeping the recognition model small and fast.

Description

Method for identifying printed Chinese character font based on twin metric model
Technical Field
The invention belongs to the technical field of font identification, and particularly relates to a method for identifying printed Chinese character fonts based on a twin metric model.
Background
An image of Chinese characters can be converted into Chinese character codes with OCR (optical character recognition) technology, so that paper documents can be entered into a computer and digitized. OCR has therefore been a focus of research in pattern recognition, image processing, and machine learning since the 1980s. After years of development, recognition of printed Chinese characters now achieves good results and has entered wide use, for example in Hanwang OCR, Baidu OCR, and Ali OCR. However, existing Chinese OCR systems can only read the characters themselves: format information is lost during input, and the original layout cannot be fully recovered. To restore the font automatically when reconstructing the layout, the font must first be identified; font identification is thus an important prerequisite for faithful 'original text reproduction'. Chinese character fonts are one of the hard problems in pattern recognition, because Chinese characters have many categories, variable styles, and many similar characters.
The methods currently used for Chinese character font identification are mainly holistic analysis and individual-character analysis. Holistic analysis takes a whole layout region as the analysis target, and therefore cannot handle a region that contains several different fonts. Individual-character analysis takes a single character as the analysis object, which avoids the multi-font problem of holistic analysis, but its hand-crafted features are hard to make optimal in both recognition accuracy and efficiency. With the continued development of deep learning, CNNs have come to the fore as the mainstream architecture in computer vision applications such as object detection and image classification. However, most existing methods focus on locating, extracting, and recognizing text regions in natural scenes, and cannot identify the font of printed Chinese characters accurately and reliably. Moreover, because Chinese characters have complex structures, and different calligraphers have evolved different styles of calligraphic fonts, existing deep learning approaches struggle with such many-class classification tasks; even when comparable accuracy can be reached, the resulting models are very large.
Disclosure of Invention
To solve these problems, the invention provides a method for identifying printed Chinese character fonts based on a twin metric model, which improves the accuracy and robustness of Chinese character font recognition while keeping the recognition model small and fast.
To this end, the invention adopts the following technical scheme. A method for identifying printed Chinese character fonts based on a twin metric model comprises the following steps:
S10, build a printed Chinese character font data set comprising a series of images in different fonts, and construct a training set and a test set;
S20, train a twin metric model with the training set, wherein the twin metric model comprises two twin subnetworks with identical structure and shared weights, and a contrastive loss function is computed from the feature vectors output by the twin subnetworks; feed different images from the training set one after another into either twin subnetwork of the twin metric model to obtain a set of feature vectors, and construct a font space under the contrastive loss function;
S30, input the font image to be identified into the twin metric model to obtain its feature vector, and retrieve the font type from the font space with the K-nearest-neighbor algorithm.
Further, when the twin metric model is trained with the training set, two font images pass simultaneously through the two twin subnetworks with identical structure and shared weights, which output a first feature vector and a second feature vector; the distance between the first and second feature vectors is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the first and second feature vectors output by the two twin subnetworks are pulled close to each other in the font space if so, and pushed apart otherwise; different images from the training set are then fed one after another into either twin subnetwork of the twin metric model, a set of feature vectors is obtained by forward propagation, and the font space is constructed.
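The pairing mechanism described above can be sketched in a few lines of NumPy (a stand-in linear map replaces the real subnetwork, and all values are illustrative): weight sharing means both "branches" are literally the same function, and the font space is just the cache of embeddings produced by one branch.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))  # a single shared weight matrix stands in for both branches

def branch(x):
    # Either "twin subnetwork" is the same function: weight sharing is what
    # makes the two output feature vectors directly comparable by distance.
    return x @ W

# After training, the font space is the cached embeddings of training images.
train_images = rng.normal(size=(5, 8))          # 5 stand-in "font images"
font_space = np.stack([branch(x) for x in train_images])
print(font_space.shape)                          # one low-dimensional vector per image
```

Because the weights are shared, any image sent through either branch lands in the same font space, which is what makes nearest-neighbor lookup meaningful.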
The twin subnetwork comprises a decomposed convolution font feature extraction network and an embedding module; the decomposed convolution font feature extraction network extracts a bottleneck feature vector from the font image and feeds it into the embedding module to obtain an embedded feature vector.
Furthermore, the decomposed convolution font feature extraction network has no fully connected layer or output layer, and is connected directly to the fully connected embedding module.
Further, the two twin subnetworks with identical structure and shared weights in the twin metric model output a first embedded feature vector and a second embedded feature vector; the distance between the first and second embedded feature vectors is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the embedded feature vectors output by the two twin subnetworks are pulled close to each other in the embedded font space if so, and pushed apart otherwise.
Further, in a second training pass, different images from the training set are fed one after another into either twin subnetwork of the twin metric model; a set of embedded feature vectors is obtained by forward propagation, and an embedded font space is constructed.
Furthermore, in the embedded font space, if two input images belong to the same font, their embedded feature vectors lie close together in the embedded space, and far apart otherwise; each embedded feature vector is thus grouped into a cluster according to the font to which it belongs.
Further, the decomposed convolution font feature extraction network comprises a normalization module, a feature extraction module, several types of Inception modules, and an average pooling layer connected in sequence; the input font image passes through the normalization module, the feature extraction module, the Inception modules, and the average pooling layer in turn to output a bottleneck feature vector.
Further, the embedding module is a fully connected neural network comprising an input layer, a hidden layer, and an output layer; the input layer receives the one-dimensional feature vector obtained by flattening the bottleneck feature vector, and the hidden layer and output layer produce the two-dimensional embedded feature vector.
Further, the contrastive loss function obtained from the distance between the feature vectors E1 and E2 output by the two twin subnetworks is:

$$L = \frac{1}{2N}\sum_{n=1}^{N}\left[\, y\,d^{2} + (1-y)\,\max(e-d,\,0)^{2} \right]$$

where N is the number of sample pairs; y = 0 indicates that the two twin subnetwork inputs are different samples and y = 1 that they are similar samples; $d = \lVert E_{1} - E_{2} \rVert_{2}$ represents the inter-sample distance; and e is a margin threshold.
The technical scheme has the following beneficial effects:
In the font identification task, the method uses a twin network to judge similarity while caching the embedded features produced by the network in an embedded space; identification then only requires measuring the Euclidean distance between the query and the cached embedded-space features. This performs very well for original-text reproduction, which must be executed at high frequency, and generalizes well, with higher accuracy and robustness than other convolutional neural networks.
The method builds a twin metric model for font identification, achieves a very good learning effect from a small sample set, and yields accurate Chinese character font recognition.
The method copes with the complex structure of Chinese characters and can identify the different types of calligraphic fonts evolved by different calligraphers; it handles many-class classification tasks and, while improving the accuracy and robustness of Chinese character font recognition, keeps the recognition model small and the run time short.
Drawings
FIG. 1 is a schematic flow chart of a method for identifying a font of a printed Chinese character based on a twin metric model according to the present invention;
FIG. 2 is a schematic diagram illustrating a method for identifying a font of a printed Chinese character based on a twin metric model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a decomposed convolutional font feature extraction network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an embedding module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described with reference to the accompanying drawings.
In this embodiment, referring to fig. 1 and fig. 2, the present invention provides a method for identifying printed Chinese character fonts based on a twin metric model, comprising the steps of:
S10, build a printed Chinese character font data set comprising a series of images in different fonts, and construct a training set and a test set;
S20, train a twin metric model with the training set, wherein the twin metric model comprises two twin subnetworks with identical structure and shared weights, and a contrastive loss function is computed from the feature vectors output by the twin subnetworks; feed different images from the training set one after another into either twin subnetwork of the twin metric model to obtain a set of feature vectors, and construct a font space under the contrastive loss function;
S30, input the font image to be identified into the twin metric model to obtain its feature vector, and retrieve the font type from the font space with the K-nearest-neighbor algorithm.
The printed Chinese character font data set comprising a series of images in different fonts is made, and the training set and test set are constructed, as follows: a printed-character picture is generated with the PIL tool for each of the 3755 common Chinese characters of the first-level character library, according to the Chinese character coding table and the font file, with an image size of 299 x 299 pixels. Printed fonts such as regular script (KaiTi), Song, bold (HeiTi), imitation Song (FangSong), clerical script (LiShu), YouYuan, and Huawen Xinwei can be used for model training and testing. 80% of the pictures in the printed Chinese character font data set are selected at random to form the training set, and the unselected pictures of every font subclass form the test set used later to evaluate algorithm performance.
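A minimal sketch of this rendering step with Pillow (PIL). The function name and layout margins are illustrative, and the default bitmap font stands in for a real Chinese .ttf file so the sketch runs anywhere; actual data-set generation would pass a Chinese font file (for example a KaiTi .ttf) and render each of the 3755 characters.

```python
from PIL import Image, ImageDraw, ImageFont

def render_glyph(ch, font_path=None, size=299):
    """Render one printed character on a white size x size grayscale canvas."""
    img = Image.new("L", (size, size), color=255)          # white background
    draw = ImageDraw.Draw(img)
    if font_path is not None:
        font = ImageFont.truetype(font_path, int(size * 0.8))  # e.g. a KaiTi .ttf
    else:
        font = ImageFont.load_default()                    # fallback so the sketch runs
    draw.text((size // 8, size // 8), ch, font=font, fill=0)   # black glyph
    return img

# With a real Chinese font file this would be a character such as '永';
# the default bitmap font only covers Latin glyphs, so 'A' is used here.
img = render_glyph("A")
print(img.size)
```

An 80/20 split then only requires shuffling the generated file list and slicing it.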
As an optimization scheme 1 of the above embodiment, when the twin metric model is trained with the training set, two font images pass simultaneously through the two twin subnetworks with identical structure and shared weights, which output a first feature vector and a second feature vector; the distance between the first and second feature vectors is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the first and second feature vectors output by the two twin subnetworks are pulled close to each other in the font space if so, and pushed apart otherwise; different images from the training set are then fed one after another into either twin subnetwork of the twin metric model, a set of feature vectors is obtained by forward propagation, and the font space is constructed.
As an optimization scheme 2 of the above embodiment, the twin subnetwork comprises a decomposed convolution font feature extraction network and an embedding module; the decomposed convolution font feature extraction network extracts a bottleneck feature vector from the font image and feeds it into the embedding module to obtain an embedded feature vector. The decomposed convolution font feature extraction network has no fully connected layer or output layer, and is connected directly to the fully connected embedding module.
The two twin subnetworks with identical structure and shared weights in the twin metric model output a first embedded feature vector and a second embedded feature vector; the distance between them is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the embedded feature vectors output by the two twin subnetworks are pulled close to each other in the embedded font space if so, and pushed apart otherwise.
In a second training pass, different images from the training set are fed one after another into either twin subnetwork of the twin metric model; a set of embedded feature vectors is obtained by forward propagation, and an embedded font space is constructed. If two input images belong to the same font, their embedded feature vectors lie close together in the embedded space, and far apart otherwise; each embedded feature vector is thus grouped into a cluster according to the font to which it belongs.
As an optimization scheme 3 of the above embodiment, as shown in fig. 3, the decomposed convolution font feature extraction network comprises a normalization module, a feature extraction module, several types of Inception modules, and an average pooling layer connected in sequence; the input font image passes through the normalization module, the feature extraction module, the Inception modules, and the average pooling layer in turn to output a bottleneck feature vector.
In the decomposed convolution font feature extraction network, the 299 x 299 x 3 image is normalized and then passes in turn through three convolution layers with kernel size/stride of 3 x 3/2, 3 x 3/1, and 3 x 3/1, followed by a 3 x 3/2 pooling operation, giving feature map C1 of size 73 x 73 x 64.
Feature map C1 then passes in turn through three convolution layers with kernel size/stride of 3 x 3/1, 3 x 3/2, and 3 x 3/1, giving feature map C2 of size 35 x 35 x 288.
Feature map C2 then passes in turn through three types of Inception modules, giving feature map C3 of size 8 x 8 x 2048.
Finally, C3 passes through the average pooling layer to give a 1 x 1 x 2048 bottleneck feature vector.
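As a consistency check on the feature-map sizes quoted in this embodiment, the standard output-size formula for convolution and pooling reproduces them; the padding of the third convolution in each group is an assumption needed to match the stated sizes:

```python
def conv_out(n, k, s, p=0):
    """Output spatial size of a k x k convolution/pooling: stride s, padding p."""
    return (n + 2 * p - k) // s + 1

n = 299                          # input image is 299 x 299 x 3
n = conv_out(n, 3, 2)            # 3x3/2 conv -> 149
n = conv_out(n, 3, 1)            # 3x3/1 conv -> 147
n = conv_out(n, 3, 1, p=1)       # 3x3/1 conv (assumed padded) -> 147
c1 = conv_out(n, 3, 2)           # 3x3/2 pool -> 73, i.e. C1 = 73 x 73 x 64
n = conv_out(c1, 3, 1)           # 3x3/1 conv -> 71
n = conv_out(n, 3, 2)            # 3x3/2 conv -> 35
c2 = conv_out(n, 3, 1, p=1)      # 3x3/1 conv (assumed padded) -> 35, C2 = 35 x 35 x 288
print(c1, c2)                    # 73 35
```

The same formula, applied to the grid reductions inside the Inception stages, takes 35 down to the stated 8 x 8 of C3.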
As an optimization scheme 4 of the above embodiment, as shown in fig. 4, the embedding module is a fully connected neural network comprising an input layer, a hidden layer, and an output layer; the input layer receives the one-dimensional feature vector obtained by flattening the bottleneck feature vector, and the hidden layer and output layer produce the two-dimensional embedded feature vector.
The embedding module is a simple three-layer fully connected neural network. The input layer has 2048 neurons and receives the one-dimensional feature vector produced by applying a flatten operation to the bottleneck feature vector; the hidden layer has 512 neurons and the output layer has 2 neurons.
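A sketch of this 2048-512-2 fully connected head in plain NumPy (weights are random, and the ReLU on the hidden layer is an assumption, since the embodiment does not name an activation):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.01, size=(2048, 512)), np.zeros(512)  # input -> hidden
W2, b2 = rng.normal(scale=0.01, size=(512, 2)), np.zeros(2)       # hidden -> output

def embed(bottleneck):
    x = bottleneck.reshape(-1)          # flatten 1x1x2048 -> 2048 (the "flatten" op)
    h = np.maximum(x @ W1 + b1, 0.0)    # 512-unit hidden layer (ReLU assumed)
    return h @ W2 + b2                  # 2-D embedded feature vector

z = embed(rng.normal(size=(1, 1, 2048)))
n_params = W1.size + b1.size + W2.size + b2.size
print(z.shape, n_params)                # (2,) 1050114
```

The two-dimensional output is what makes the later distance computation and KNN lookup cheap.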
As an optimization scheme 5 of the above embodiment, the contrastive loss function obtained from the distance between the feature vectors E1 and E2 output by the two twin subnetworks is:

$$L = \frac{1}{2N}\sum_{n=1}^{N}\left[\, y\,d^{2} + (1-y)\,\max(e-d,\,0)^{2} \right]$$

where N is the number of sample pairs; y = 0 indicates that the two twin subnetwork inputs are different samples and y = 1 that they are similar samples; the inter-sample distance d is the Euclidean distance:

$$d = \lVert E_{1} - E_{2} \rVert_{2} = \sqrt{\textstyle\sum_{i}\left(E_{1,i} - E_{2,i}\right)^{2}}$$

e is a user-defined margin threshold: only when two samples are different and their Euclidean distance is smaller than e is the pair counted in the loss, which handles the case of glyphs that are different yet similar. e can be adjusted as needed to control the strictness of discrimination.
For a better understanding of the invention, the working principle of the various parts of the invention is explained below:
The twin metric model is trained with the training set as follows:
First, a decomposed convolution font feature extraction network is constructed as the bottleneck feature extraction module, applying the strong feature extraction capability of a trained convolutional neural network model to the task of printed Chinese character font recognition. The feature extraction module converts a printed Chinese character font image into a one-dimensional feature vector. The feature vectors of printed Chinese character font images are fed into the twin network to obtain two-dimensional embedded-space features, completing the dimensionality reduction of the features. In the embedded space, a KNN classifier classifies the printed Chinese character images, finally realizing Chinese character font identification.
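The final KNN classification step can be sketched as a plain nearest-neighbor vote over the cached embeddings (pure NumPy; the value K = 3 and the font labels are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(font_space, labels, query, k=3):
    """Majority vote among the k embeddings closest (Euclidean) to the query."""
    d = np.linalg.norm(font_space - query, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Cached 2-D embeddings of training images, grouped by font.
font_space = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],    # "KaiTi" cluster
                       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])   # "SongTi" cluster
labels = ["KaiTi"] * 3 + ["SongTi"] * 3

print(knn_predict(font_space, labels, np.array([0.05, 0.05])))
```

Because the contrastive loss pulls each font into its own cluster, a small K already gives a stable vote.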
The invention adopts a twin network structure, which makes it convenient to compare the features of same-font and different-font pairs when training on paired printed Chinese character font images, provides the ability to discriminate similar fonts, and improves the recognition accuracy for Chinese character fonts. At the same time, because the two branches of the twin network interact, a very good learning effect can be obtained from a small sample set. The recognition model stays compact while guaranteeing recognition accuracy, and the run time improves.
The decomposed convolution font feature extraction network accurately extracts the bottleneck feature vector of a font image, and tying the weights of the decomposed convolution font feature extraction networks in the two branches effectively provides the ability to recognize font similarity.
Embedding the bottleneck feature within the feature extraction process effectively reduces the dimensionality of the high-dimensional feature, improves its representativeness, lowers the complexity of the metric computation, and, through weight sharing, effectively provides the ability to recognize font similarity.
The method applies metric computation to the embedded features, evaluating the similarity between Chinese character font samples by Euclidean distance and using it as the basis for font identification.
The contrastive loss function computed from the Euclidean distance between samples drives the features of same-font samples as close together as possible, and those of different-font samples as far apart as possible, during model training, which benefits clustering.
The invention transfers the feature extraction capability of a pre-trained model to the printed Chinese character font recognition task, reducing the required size of the data set. A hybrid model realizes a supervised learning scheme of feature compression plus clustering.
Once training is complete, the detection accuracy reaches 99.19%. With this high accuracy, the method can identify the font of printed Chinese characters reliably and can be applied as follows: font information can assist the recognition engine to obtain a higher recognition rate; it can be used to distinguish different typesettings and article structures; and it facilitates recovery of document format information.
The foregoing shows and describes the general principles and features of the present invention, together with its advantages. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which are given only to illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A method for identifying printed Chinese character fonts based on a twin metric model, characterized by comprising the following steps:
S10, build a printed Chinese character font data set comprising a series of images in different fonts, and construct a training set and a test set;
S20, train a twin metric model with the training set, wherein the twin metric model comprises two twin subnetworks with identical structure and shared weights, and a contrastive loss function is computed from the feature vectors output by the twin subnetworks; feed different images from the training set one after another into either twin subnetwork of the twin metric model to obtain a set of feature vectors, and construct a font space under the contrastive loss function;
S30, input the font image to be identified into the twin metric model to obtain its feature vector, and retrieve the font type from the font space with the K-nearest-neighbor algorithm;
the twin subnetwork comprises a decomposed convolution font feature extraction network and an embedding module, the decomposed convolution font feature extraction network extracting a bottleneck feature vector from the font image and feeding it into the embedding module to obtain an embedded feature vector;
the decomposed convolution font feature extraction network comprises a normalization module, a feature extraction module, several types of Inception modules, and an average pooling layer connected in sequence, the input font image passing through the normalization module, the feature extraction module, the Inception modules, and the average pooling layer in turn to output a bottleneck feature vector;
the embedding module is a fully connected neural network comprising an input layer, a hidden layer, and an output layer; the input layer receives the one-dimensional feature vector obtained by flattening the bottleneck feature vector, and the hidden layer and output layer produce the two-dimensional embedded feature vector.
2. The method for identifying printed Chinese character fonts based on a twin metric model according to claim 1, wherein the twin metric model is trained with the training set; two font images pass simultaneously through the two twin subnetworks with identical structure and shared weights, which output a first feature vector and a second feature vector; the distance between the first and second feature vectors is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the first and second feature vectors output by the two twin subnetworks are pulled close to each other in the font space if so, and pushed apart otherwise; different images from the training set are fed one after another into either twin subnetwork of the twin metric model, a set of feature vectors is obtained by forward propagation, and the font space is constructed.
3. The method according to claim 2, wherein the decomposed convolution font feature extraction network has no fully connected layer or output layer, and is connected directly to the fully connected embedding module.
4. The method for identifying printed Chinese character fonts based on a twin metric model according to claim 1, wherein the two twin subnetworks with identical structure and shared weights in the twin metric model output a first embedded feature vector and a second embedded feature vector; the distance between the first and second embedded feature vectors is computed to obtain a contrastive loss function; whether the two images are similar is judged from the contrastive loss, and the embedded feature vectors output by the two twin subnetworks are pulled close to each other in the embedded font space if so, and pushed apart otherwise.
5. The method for identifying printed Chinese character fonts based on a twin metric model according to claim 4, wherein, in a second training pass, different images from the training set are continuously fed into either twin subnetwork of the twin metric model; a set of embedded feature vectors is obtained by forward propagation, and the embedded font space is constructed.
6. The method for identifying printed Chinese character fonts based on the twin metric model as claimed in claim 5, wherein judgment in the embedded font space uses the contrastive loss function: if the input images belong to the same font, the two embedded feature vectors lie close together in the embedded space, and far apart otherwise; each embedded feature vector is grouped into a cluster according to the font to which it belongs.
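Grouping embedded feature vectors by the font they belong to, as in claims 5 and 6, amounts to nearest-neighbour lookup in the embedded font space. A minimal sketch under assumed toy embeddings; `FONT_SPACE`, `classify` and the two font names are all hypothetical stand-ins, not part of the patent:

```python
import math

def euclidean(e1, e2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

# Hypothetical embedded font space: feature vectors obtained by forward
# propagation through one twin subnetwork, clustered by font.
FONT_SPACE = {
    "SimSun": [[0.10, 0.90], [0.15, 0.85]],
    "KaiTi":  [[0.80, 0.20], [0.75, 0.25]],
}

def classify(embedding):
    """Assign the font whose stored embeddings lie nearest to the query."""
    return min(
        FONT_SPACE,
        key=lambda font: min(euclidean(embedding, e) for e in FONT_SPACE[font]),
    )

pred = classify([0.12, 0.88])  # falls inside the SimSun cluster
```

Because training pulled same-font embeddings together and pushed different-font embeddings apart, a simple distance threshold or nearest-cluster rule suffices at identification time.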
7. The method for identifying printed Chinese character fonts based on the twin metric model as claimed in claim 1, 2 or 4, characterised in that the contrastive loss function obtained from the distance between the feature vectors E1 and E2 of the two twin subnetworks is:

L = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n D_n^2 + (1 - y_n) \max(e - D_n, 0)^2 \right]

where N is the number of sample pairs; y_n = 0 indicates that the two twin subnetwork input samples are different samples, and y_n = 1 indicates that the input samples are similar samples; D_n = \| E1 - E2 \|_2 represents the distance between samples; and e is the custom threshold.
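Claim 7's contrastive loss can be written out directly over a batch of N sample pairs. A sketch assuming Euclidean distance between the twin outputs E1 and E2; the function and variable names are illustrative, not from the patent:

```python
import math

def batch_contrastive_loss(pairs, labels, e=1.0):
    """pairs: list of (E1, E2) feature-vector tuples from the two twins;
    labels: y in {0, 1}, with y = 1 for similar samples (same font)
    and y = 0 for different samples; e is the custom margin threshold."""
    n = len(pairs)
    total = 0.0
    for (e1, e2), y in zip(pairs, labels):
        # Euclidean distance between the pair of feature vectors.
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
        total += y * d ** 2 + (1 - y) * max(e - d, 0.0) ** 2
    return total / (2 * n)  # averaged over the N sample pairs
```

With y = 1 the y·D² term shrinks the distance between similar pairs; with y = 0 the hinge max(e − D, 0)² pushes dissimilar pairs apart until they are at least the threshold e apart, after which they contribute no gradient.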
CN202111512767.9A 2021-12-11 2021-12-11 Method for identifying printed Chinese character font based on twin metric model Active CN113903043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111512767.9A CN113903043B (en) 2021-12-11 2021-12-11 Method for identifying printed Chinese character font based on twin metric model


Publications (2)

Publication Number Publication Date
CN113903043A CN113903043A (en) 2022-01-07
CN113903043B true CN113903043B (en) 2022-05-06

Family

ID=79026160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111512767.9A Active CN113903043B (en) 2021-12-11 2021-12-11 Method for identifying printed Chinese character font based on twin metric model

Country Status (1)

Country Link
CN (1) CN113903043B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117727053B (en) * 2024-02-08 2024-04-19 西南科技大学 Multi-category Chinese character single sample font identification method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871851A (en) * 2019-03-06 2019-06-11 长春理工大学 A kind of Chinese-character writing normalization determination method based on convolutional neural networks algorithm
CN110175251A (en) * 2019-05-25 2019-08-27 西安电子科技大学 The zero sample Sketch Searching method based on semantic confrontation network
CN110211594A (en) * 2019-06-06 2019-09-06 杭州电子科技大学 A kind of method for distinguishing speek person based on twin network model and KNN algorithm
CN110222792A (en) * 2019-06-20 2019-09-10 杭州电子科技大学 A kind of label defects detection algorithm based on twin network
CN110533057A (en) * 2019-04-29 2019-12-03 浙江科技学院 A kind of Chinese character method for recognizing verification code under list sample and few sample scene
CN111126563A (en) * 2019-11-25 2020-05-08 中国科学院计算技术研究所 Twin network-based space-time data target identification method and system
CN111354076A (en) * 2020-02-29 2020-06-30 北京航空航天大学 Single-image three-dimensional part combined modeling method based on embedding space
CN111797760A (en) * 2020-07-02 2020-10-20 绵阳职业技术学院 Improved crop pest and disease identification method based on Retianet
CN113283584A (en) * 2021-05-21 2021-08-20 北京大学 Knowledge tracking method and system based on twin network
CN113591857A (en) * 2020-04-30 2021-11-02 阿里巴巴集团控股有限公司 Character image processing method and device and ancient Chinese book image identification method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE354834T1 (en) * 2002-03-15 2007-03-15 Computer Sciences Corp METHOD AND DEVICE FOR ANALYZING WRITING IN DOCUMENTS
CN103235947B (en) * 2013-04-27 2016-12-07 苏州大学 A kind of Handwritten Numeral Recognition Method and device
US10074042B2 (en) * 2015-10-06 2018-09-11 Adobe Systems Incorporated Font recognition using text localization
CN107229932B (en) * 2016-03-25 2021-05-28 阿里巴巴集团控股有限公司 Image text recognition method and device
CN107644006B (en) * 2017-09-29 2020-04-03 北京大学 Automatic generation method of handwritten Chinese character library based on deep neural network
EP3511868A1 (en) * 2018-01-11 2019-07-17 Onfido Ltd Document authenticity determination
CN108710866B (en) * 2018-06-04 2024-02-20 平安科技(深圳)有限公司 Chinese character model training method, chinese character recognition method, device, equipment and medium
CN109800754B (en) * 2018-12-06 2020-11-06 杭州电子科技大学 Ancient font classification method based on convolutional neural network
CN110414633B (en) * 2019-07-04 2022-10-14 东南大学 System and method for recognizing handwritten fonts


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection;Enqiang Guo et al.;《arXiv:1810.09111》;20181112;1-10 *
Similarity-based Text Recognition by Deeply Supervised Siamese Network;Ehsan Hosseini-Asl et al.;《arXiv:1511.04397》;20160705;1-11, body Section 3 line 2, Section 3.2 line 3, Section 2.1 line 2, Section 2.2 line 2, Figures 1, 2 and 4, Equations 1 and 2 *
Application of Tesseract-OCR-based character recognition technology in specific settings;Wu Ming;《Journal of Hunan City University (Natural Science Edition)》;20200930;Vol. 29, No. 5;58-61 *
Ship image enhancement algorithm based on convolutional neural networks;Yan Fei et al.;《Ship Science and Technology》;20190831;Vol. 41, No. 8A;163-165 *
Research on a transfer-learning-based convolutional neural network model for printed Chinese character font recognition;Yan Fei et al.;《Digital Printing》;20210430;No. 2;36-45 *


Similar Documents

Publication Publication Date Title
CN109840521B (en) Integrated license plate recognition method based on deep learning
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
CN112818862B (en) Face tampering detection method and system based on multi-source clues and mixed attention
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN112818951B (en) Ticket identification method
Li et al. Category dictionary guided unsupervised domain adaptation for object detection
CN107169485A (en) A kind of method for identifying mathematical formula and device
CN113239954B (en) Attention mechanism-based image semantic segmentation feature fusion method
CN112633350A (en) Multi-scale point cloud classification implementation method based on graph convolution
CN110826462A (en) Human body behavior identification method of non-local double-current convolutional neural network model
CN107704859A (en) A kind of character recognition method based on deep learning training framework
Zhang et al. Automatic discrimination of text and non-text natural images
Yuan et al. Few-shot scene classification with multi-attention deepemd network in remote sensing
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
Sun et al. Brushstroke based sparse hybrid convolutional neural networks for author classification of Chinese ink-wash paintings
CN107609509A (en) A kind of action identification method based on motion salient region detection
CN112528845A (en) Physical circuit diagram identification method based on deep learning and application thereof
CN115620312A (en) Cross-modal character handwriting verification method, system, equipment and storage medium
CN113903043B (en) Method for identifying printed Chinese character font based on twin metric model
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN115880704A (en) Automatic case cataloging method, system, equipment and storage medium
CN113657414B (en) Object identification method
CN114022703A (en) Efficient vehicle fine-grained identification method based on deep learning
CN111242114B (en) Character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant