CN113139629A - Font identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113139629A
Authority
CN
China
Prior art keywords
neural network
network model
image
recognized
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010048253.1A
Other languages
Chinese (zh)
Inventor
陆瑾
熊龙飞
陈帝光
鲜晴羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Kingsoft Office Software Co Ltd
Wuhan Kingsoft Office Software Co Ltd
Original Assignee
Zhuhai Kingsoft Office Software Co Ltd
Wuhan Kingsoft Office Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Kingsoft Office Software Co Ltd, Wuhan Kingsoft Office Software Co Ltd filed Critical Zhuhai Kingsoft Office Software Co Ltd
Priority to CN202010048253.1A
Publication of CN113139629A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/24 Character recognition characterised by the processing or recognition method
    • G06V 30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V 30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V 30/245 Font recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The embodiment of the invention provides a font identification method and device, an electronic device and a storage medium, wherein the method comprises the following steps: inputting a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model to obtain position coordinates of the characters to be recognized in the picture to be recognized; segmenting the picture to be recognized based on the position coordinates of the plurality of characters to be recognized to obtain a plurality of single character pictures; respectively converting the plurality of single character pictures into gray-scale images as gray-scale images to be processed; respectively carrying out local binarization processing on the plurality of gray-scale images to be processed to obtain binary images serving as binary images to be input; and respectively inputting the multiple binary images to be input into a pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image. By adopting the method provided by the embodiment of the invention, the accuracy of font identification is improved.

Description

Font identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a font identification method and apparatus, an electronic device, and a storage medium.
Background
In daily life, the fonts of the characters appearing on many pictures often attract interest, but because the characters are embedded in the pictures, their fonts are difficult to determine. Font recognition is therefore necessary in order to determine the fonts of these characters.
The existing identification method mainly comprises inputting an image containing characters to be identified into a font recognition model; the model extracts character features of the characters to be identified contained in the input image and compares these features with those of known fonts to realize font recognition. However, while the existing font recognition method gives good recognition results for characters contained in pictures with a simple, uniform background, for pictures with a complex background the character features become less distinct under the influence of that background. The complex picture background interferes with the extraction of character features in the picture, so the recognition result is poor and the accuracy of font recognition is low.
Disclosure of Invention
An embodiment of the present invention provides a font identification method, apparatus, electronic device and storage medium, so as to solve the problem of low accuracy of font identification.
In order to achieve the above object, an embodiment of the present invention provides a font identification method, including:
inputting a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model to obtain position coordinates of the characters to be recognized in the picture to be recognized, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: a first sample picture containing first sample characters, and position coordinates of the first sample characters in the first sample picture;
segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized;
respectively converting the single character pictures into gray level pictures as to-be-processed gray level pictures;
respectively carrying out local binarization processing on the multiple gray level images to be processed to obtain binary images serving as binary images to be input;
respectively inputting a plurality of binary images to be input into a pre-trained second neural network model to obtain the fonts of characters to be recognized contained in each binary image, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set comprises: a second sample picture containing second sample characters, and the known font of the second sample characters.
Further, the converting the multiple single character pictures into gray-scale maps respectively as to-be-processed gray-scale maps includes:
converting the single character picture into a gray scale picture aiming at each single character picture;
determining whether the gray scale map meets an image color inversion condition or not for each gray scale map;
when the gray level image meets the image color inversion condition, performing color inversion on the gray level image to obtain an image with the color inverted as a gray level image to be processed; otherwise, the gray-scale image is taken as a gray-scale image to be processed.
Further, the determining whether the gray scale map satisfies an image color inversion condition for each gray scale map includes:
calculating the mean value of pixel values of all pixel points of the gray image as a first mean value aiming at each gray image;
calculating the mean value of the pixel values of the vertex pixels of the gray level image to be used as a second mean value;
and when the first average value is larger than the second average value, determining that the gray-scale image meets the image color inversion condition.
Further, the performing local binarization processing on the multiple to-be-processed gray level maps to obtain binary maps as to-be-input binary maps includes:
performing local binarization processing on each gray level image to be processed to obtain a binary image;
cutting a black area containing the characters to be recognized in the binary image to obtain a circumscribed rectangular image containing the characters to be recognized;
and, for each circumscribed rectangular image, adding a preset number of pixel points around the pixel point matrix of the circumscribed rectangular image to obtain a new circumscribed rectangular image serving as a binary image to be input, wherein the pixel values of the preset number of pixel points are all a preset pixel value.
Further, the first neural network model is obtained by training based on the first training sample set using the following steps:
inputting the first sample picture containing the first sample characters contained in the first training sample set into a first to-be-trained neural network model to obtain an output result of the first to-be-trained neural network model;
determining whether a preset training ending condition is met;
if yes, determining the current first to-be-trained neural network model as the trained first neural network model;
and if not, adjusting the first to-be-trained neural network model to obtain a new first to-be-trained neural network model, and starting next training.
Further, the second neural network model is obtained by training based on the second training sample set using the following steps:
training a second neural network model to be trained for a preset number of times based on the second sample picture containing the second sample characters and the known font of the second sample characters contained in the second training sample set;
calculating the accuracy of a second to-be-trained neural network model by using a test set aiming at the second to-be-trained neural network model after each training, wherein the second to-be-trained neural network model after each training corresponds to one accuracy;
and selecting the trained second neural network model corresponding to the maximum accuracy as the trained second neural network model.
In order to achieve the above object, an embodiment of the present invention further provides a font identification apparatus, including:
the detection module is used for inputting a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model to obtain position coordinates of the characters to be recognized in the picture to be recognized, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: a first sample picture containing first sample characters, and position coordinates of the first sample characters in the first sample picture;
the image segmentation module is used for segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized;
the gray level conversion module is used for respectively converting the single character pictures into gray level pictures as to-be-processed gray level pictures;
the binarization module is used for respectively carrying out local binarization processing on the multiple gray level images to be processed to obtain binary images serving as binary images to be input;
the recognition module is configured to respectively input the multiple binary images to be input into a pre-trained second neural network model to obtain the fonts of the characters to be recognized contained in each binary image, where the second neural network model is obtained by training based on a second training sample set, and the second training sample set includes: a second sample picture containing second sample characters, and the known font of the second sample characters.
Further, the gray scale conversion module includes:
the first conversion sub-module is used for converting the single character picture into a gray scale image aiming at each single character picture;
the second conversion submodule is used for determining whether the gray map meets the image color inversion condition or not for each gray map; when the gray level image meets the image color inversion condition, performing color inversion on the gray level image to obtain an image with the color inverted as a gray level image to be processed; otherwise, the gray-scale image is taken as a gray-scale image to be processed.
Further, the second conversion sub-module is specifically configured to calculate, for each of the grayscale images, an average value of pixel values of all pixel points of the grayscale image, and use the average value as a first average value; calculating the mean value of the pixel values of the vertex pixels of the gray level image to be used as a second mean value; and when the first average value is larger than the second average value, determining that the gray-scale image meets the image color inversion condition.
Further, the binarization module is specifically configured to perform local binarization processing on each gray-scale map to be processed to obtain a binary map; cut out, for each binary map, the black area containing the characters to be recognized to obtain a circumscribed rectangular image containing the characters to be recognized; and, for each circumscribed rectangular image, add a preset number of pixel points around the pixel point matrix of the circumscribed rectangular image to obtain a new circumscribed rectangular image serving as a binary image to be input, wherein the pixel values of the preset number of pixel points are all a preset pixel value.
Further, the apparatus further includes:
the first model training module is used for training based on a first training sample set to obtain the first neural network model by adopting the following steps:
inputting the first sample picture containing the first sample characters contained in the first training sample set into a first to-be-trained neural network model to obtain an output result of the first to-be-trained neural network model;
determining whether a preset training ending condition is met;
if yes, determining the current first to-be-trained neural network model as the trained first neural network model;
and if not, adjusting the first to-be-trained neural network model to obtain a new first to-be-trained neural network model, and starting next training.
Further, the apparatus further includes:
the second model training module is used for training based on a second training sample set to obtain the second neural network model by adopting the following steps:
training a second neural network model to be trained for a preset number of times based on the second sample picture containing the second sample characters and the known font of the second sample characters contained in the second training sample set;
calculating the accuracy of a second to-be-trained neural network model by using a test set aiming at the second to-be-trained neural network model after each training, wherein the second to-be-trained neural network model after each training corresponds to one accuracy;
and selecting the trained second neural network model corresponding to the maximum accuracy as the trained second neural network model.
In order to achieve the above object, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the steps of the font identification method when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above-mentioned font identification method steps.
To achieve the above object, an embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any of the above-mentioned font identification method steps.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a font identification method, which comprises the steps of inputting a picture to be identified containing a plurality of characters to be identified into a first neural network model trained in advance to obtain position coordinates of the characters to be identified in the picture to be identified; dividing an image to be recognized based on the position coordinates of a plurality of characters to be recognized to obtain a plurality of single character pictures; respectively converting a plurality of single character pictures into gray level pictures as to-be-processed gray level pictures; respectively carrying out local binarization processing on a plurality of gray level images to be processed to obtain binary images serving as binary images to be input; and respectively inputting the multiple binary images to be input into a pre-trained second neural network model to obtain the fonts of the characters to be recognized contained in each binary image. By adopting the method provided by the embodiment of the invention, the position coordinates of the characters to be recognized in the picture to be recognized are determined through the pre-trained first neural network model, and then the picture to be recognized is segmented according to the determined position coordinates of the characters to be recognized to obtain a plurality of single character pictures; local binarization processing is further carried out on the gray level images of the single character images, and background interference of characters to be recognized in the images to be recognized is removed through the first neural network model and the local binarization processing; and further, inputting a plurality of binary images to be input into a pre-trained second neural network model to obtain the fonts of the characters to be recognized contained in each binary image. By combining the neural network model and the local binarization processing, the font identification is realized under the complex picture background, and the accuracy of the font identification is improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a flowchart of a font identification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for font identification according to an embodiment of the present invention;
FIG. 3 is a diagram of a picture to be recognized according to an embodiment of the present invention;
FIG. 4 is a diagram of a plurality of single character pictures according to an embodiment of the present invention;
FIG. 5 is a flowchart of determining a color inversion condition of an image according to an embodiment of the present invention;
fig. 6 is a flowchart of performing local binarization processing on a grayscale image to be processed according to an embodiment of the present invention;
FIG. 7 is a flowchart of training a first neural network model according to an embodiment of the present invention;
FIG. 8 is a flowchart of training a second neural network model according to an embodiment of the present invention;
fig. 9a is a schematic structural diagram of a font identification apparatus according to an embodiment of the present invention;
fig. 9b is a schematic structural diagram of another font identification apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a font identification method, which comprises the following steps as shown in figure 1:
step 101, inputting a to-be-recognized picture containing a plurality of to-be-recognized characters into a pre-trained first neural network model to obtain position coordinates of the plurality of to-be-recognized characters in the to-be-recognized picture, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: the first sample picture containing the first sample text, and the position coordinates of the first sample text in the first sample picture.
Step 102, segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized.
Step 103, converting the multiple single character pictures into gray-scale maps serving as gray-scale maps to be processed respectively.
Step 104, respectively carrying out local binarization processing on the multiple gray-scale maps to be processed to obtain binary images serving as binary images to be input.
Step 105, respectively inputting the multiple binary images to be input into a pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set comprises: a second sample picture containing second sample characters, and the known font of the second sample characters.
With the method provided by the embodiment of the invention, the position coordinates of the characters to be recognized in the picture to be recognized are determined by the pre-trained first neural network model, and the picture to be recognized is then segmented according to the determined position coordinates to obtain a plurality of single character pictures; local binarization processing is further carried out on the gray-scale images of the single character pictures, and the background interference around the characters to be recognized is removed through the first neural network model and the local binarization processing; the multiple binary images to be input are then input into the pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image. By combining the neural network models with the local binarization processing, font identification is realized against complex picture backgrounds, and the accuracy of font identification is improved.
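For illustration only, the overall flow of steps 101 to 105 may be sketched in Python with OpenCV as follows. The functions detect_text_boxes and classify_font are hypothetical stand-ins for the pre-trained first and second neural network models, and cv2.adaptiveThreshold is merely one possible realization of the local binarization step; the region-wise variant actually described in this embodiment is sketched later.

```python
import cv2

def recognize_fonts(image_path, detect_text_boxes, classify_font):
    """Sketch of steps 101-105 for one picture to be recognized."""
    img = cv2.imread(image_path)
    fonts = []
    # Step 101: the first model returns one bounding box per character.
    for (x1, y1, x2, y2) in detect_text_boxes(img):
        # Step 102: cut out a single character picture.
        char_img = img[y1:y2, x1:x2]
        # Step 103: convert the single character picture to gray scale.
        gray = cv2.cvtColor(char_img, cv2.COLOR_BGR2GRAY)
        # Step 104: local binarization (adaptive mean thresholding here).
        binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, 11, 0)
        # Step 105: the second model maps the binary map to a font label.
        fonts.append(classify_font(binary))
    return fonts
```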
The method and apparatus of the present invention will be described in detail with reference to the accompanying drawings using specific embodiments.
In an embodiment of the present invention, as shown in fig. 2, a font identification method provided in an embodiment of the present invention may include the following steps:
step 201, inputting a to-be-recognized picture containing a plurality of to-be-recognized characters into a pre-trained first neural network model to obtain position coordinates of the plurality of to-be-recognized characters in the to-be-recognized picture, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: the first sample picture containing the first sample text, and the position coordinates of the first sample text in the first sample picture.
In one possible embodiment, the picture to be recognized input into the first neural network model may be as shown in fig. 3, and the words to be recognized included in the picture to be recognized as shown in fig. 3 may include "happy", "open", "gate", and "red".
Step 202, segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized.
In one possible implementation, the picture to be recognized shown in fig. 3 may be divided into single character pictures, each containing one character to be recognized, as shown in fig. 4.
Step 203, converting the single character picture into a gray scale image for each single character picture.
Step 204, for each gray-scale image, determining whether the gray-scale image satisfies the image color inversion condition; if yes, executing step 205a, and if not, executing step 205b.
In the embodiment of the present invention, it may be determined, by the method shown in fig. 5, whether each gray scale map satisfies the color inversion condition:
step 501, calculating the mean value of the pixel values of all the pixel points of the gray-scale image as a first mean value, and calculating the mean value of the pixel values of all the vertex pixel points of the gray-scale image as a second mean value;
and 502, comparing the first average value with the second average value, and when the first average value is larger than the second average value, determining that the gray-scale image meets the image color inversion condition, otherwise, not meeting the image color inversion condition.
Step 205a, when the gray scale map satisfies the image color inversion condition, color inversion is performed on the gray scale map, and the obtained image after color inversion is used as the gray scale map to be processed, and the process proceeds to step 206.
In this step, for example, when gray-scale image A satisfies the image color inversion condition, color inversion is performed on gray-scale image A, and the obtained color-inverted image is used as the gray-scale image to be processed; in the inversion, each pixel value p in the pixel matrix of gray-scale image A is replaced by 255 − p (the example pixel matrices are omitted here).
Step 205b, when the gray-scale map does not satisfy the image color inversion condition, taking the gray-scale map as the gray-scale map to be processed, and proceeding to step 206.
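As a minimal sketch of steps 204 and 205 (assuming 8-bit gray-scale images, so that inversion replaces each pixel value by 255 minus that value):

```python
import numpy as np

def maybe_invert(gray: np.ndarray) -> np.ndarray:
    """Invert the gray-scale map when its overall mean exceeds the
    mean of its four vertex (corner) pixels."""
    first_mean = gray.mean()                                  # step 501: all pixels
    corners = [gray[0, 0], gray[0, -1], gray[-1, 0], gray[-1, -1]]
    second_mean = float(np.mean(corners))                     # step 501: vertex pixels
    if first_mean > second_mean:                              # step 502
        return 255 - gray                                     # step 205a: color inversion
    return gray                                               # step 205b: keep as is
```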
Step 206, respectively carrying out local binarization processing on the multiple gray-scale maps to be processed to obtain binary images serving as binary images to be input.
In this step, for each binary image, the black area containing the characters to be recognized in the binary image may be cut out to obtain a circumscribed rectangular image containing the characters to be recognized. For each circumscribed rectangular image, a preset number of pixel points are then added around the pixel point matrix of the circumscribed rectangular image to obtain a new circumscribed rectangular image serving as a binary image to be input, wherein the pixel values of the preset number of pixel points are all a preset pixel value.
For example, when the pixel matrix of the circumscribed rectangular image is a 946 × 793 matrix, two pixels may be added around the pixel point matrix of the circumscribed rectangular image, that is, [946 × 4 + (793 + 4) × 4] = 6972 additional pixel points are added, and the preset number may then be set to 6972. The preset pixel value may be 255.
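A minimal sketch of this padding step, assuming OpenCV and a two-pixel white border as in the 946 × 793 example above (950 × 797 − 946 × 793 = 6972 added pixel points):

```python
import cv2

def pad_binary(binary, border=2, value=255):
    # Add `border` pixels of the preset pixel value on every side of the
    # circumscribed rectangular image (946x793 -> 950x797 in the example).
    return cv2.copyMakeBorder(binary, border, border, border, border,
                              cv2.BORDER_CONSTANT, value=value)
```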
Step 207, respectively inputting the multiple binary images to be input into a pre-trained second neural network model to obtain the fonts of the characters to be recognized, which are contained in each binary image, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set contains: a second sample picture containing a second sample text, and a font of the second sample text that is known.
With the method provided by the embodiment of the invention, the position coordinates of the characters to be recognized in the picture to be recognized are determined by the pre-trained first neural network model, and the picture to be recognized is then segmented according to the determined position coordinates to obtain a plurality of single character pictures; local binarization processing is then carried out on the gray-scale images of the single character pictures, and by cutting out the black areas containing the characters to be recognized in the binary images obtained from the local binarization processing, the background interference around the characters to be recognized is removed through the first neural network model and the local binarization processing; the multiple binary images to be input are then input into the pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image. By combining the neural network models with the local binarization processing, font identification is realized against complex picture backgrounds, and the accuracy of font identification is improved. Moreover, by performing color inversion on gray-scale images satisfying the image color inversion condition, the accuracy of font identification for light-colored characters on dark backgrounds can also be improved.
In the embodiment of the present invention, the local binarization processing may be performed on a plurality of to-be-processed gray-scale maps by using a method as shown in fig. 6, where the method includes:
step 601, aiming at each gray-scale image to be processed, dividing the gray-scale image to be processed into a plurality of areas.
In this step, the sizes of the divided regions may be chosen according to the specific situation; for example, the gray-scale map to be processed may be divided into 10 regions of equal area.
Step 602, for each region, calculating an average value of pixel values of all pixel points in the region as a third average value.
Step 603, determining whether the pixel value of the pixel point is smaller than the third mean value for all the pixel points in the region, if so, executing step 604, and if not, executing step 605.
In step 604, when the pixel value of the pixel point is smaller than the third mean value, the pixel value of the pixel point is determined to be 0.
Step 605, when the pixel value of the pixel point is not less than the third mean value, determining the pixel value of the pixel point to be 255.
In practical applications, for example, for the divided area a, if the area a is a rectangular area, an integral of the gray scale image to be processed may be calculated by using an integral function in opencv (Open source Computer Vision Library), and then an average of the pixel values of the area a is calculated by using a formula:
I(x,y)=s(x2,y2)-s(x1,y2)-s(x2,y1)+s(x1,y1)
wherein I (x, y) represents the mean of the pixel values of the area A, (x, y) represents the coordinates of the center point of the rectangular area A, and x1And y1Coordinate value of upper left corner, x, representing rectangular area A2And y2Represents the coordinate value of the lower right corner of the rectangular area a, and s () represents an integral graph.
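A minimal sketch of this region-wise binarization, assuming OpenCV's integral image; the 2 × 5 split into 10 equal regions is only an example:

```python
import cv2
import numpy as np

def local_binarize(gray, rows=2, cols=5):
    """Fig. 6: split the gray-scale map into rows*cols regions and
    binarize each region against its own mean (the third mean)."""
    s = cv2.integral(gray)            # integral image s(), shape (H+1, W+1)
    out = np.empty_like(gray)
    h, w = gray.shape
    for i in range(rows):
        for j in range(cols):
            y1, y2 = i * h // rows, (i + 1) * h // rows
            x1, x2 = j * w // cols, (j + 1) * w // cols
            # Sum over the region from four integral-image lookups.
            total = s[y2, x2] - s[y1, x2] - s[y2, x1] + s[y1, x1]
            mean = total / ((y2 - y1) * (x2 - x1))      # step 602: third mean
            region = gray[y1:y2, x1:x2]
            # Steps 603-605: 0 below the mean, 255 otherwise.
            out[y1:y2, x1:x2] = np.where(region < mean, 0, 255)
    return out
```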
In an embodiment of the present invention, in one possible implementation, the first neural network model is obtained by training based on the first training sample set; as shown in fig. 7, the training specifically includes the following steps:
step 701, inputting a first sample picture containing first sample characters included in a first training sample set into a first to-be-trained neural network model to obtain an output result of the first to-be-trained neural network model.
In this step, the first training sample set may include a plurality of first sample pictures, where the position coordinates of the text in each first sample picture are marked.
In this step, the first sample picture is input into the first neural network model to be trained, and the parameters of the first neural network model to be trained are adjusted according to the position coordinates of the characters in each first sample picture.
In a specific application, the first to-be-trained neural network model may be an EAST (Efficient and Accurate Scene Text Detector) detection network.
Step 702, determining whether a preset training termination condition is met, if yes, executing step 703, and if no, executing step 704.
In this step, the preset training termination condition may include:
training the neural network model to be trained for a preset number of times by using the first training sample set, wherein the preset number of times may be specifically set according to the practical application; or,
inputting a test sample picture contained in the test sample set into the current neural network model to be trained, where the calculated value of the loss function is smaller than a preset threshold, and the preset threshold may be specifically set according to the practical application.
Step 703, if yes, determining the current first to-be-trained neural network model as the trained first neural network model.
Step 704, if not, adjusting the first neural network model to be trained to obtain a new first neural network model to be trained, and returning to step 701 to start the next training.
In this step, adjusting the first to-be-trained neural network model may include:
carrying out adaptive adjustment on parameters of each layer of the first neural network model to be trained;
adaptively adjusting the model structure of the first to-be-trained neural network model, for example, adding or removing parameter layers of the first to-be-trained neural network model according to the current training result, or adding or removing neural network nodes of the first to-be-trained neural network model according to the current training result.
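A schematic sketch of this training procedure is given below, assuming PyTorch-style model, loss_fn, optimizer and data-loader objects; the EAST architecture itself and the structural-adjustment strategy of step 704 are not reproduced, and the two stopping rules mirror the preset-count and loss-threshold end conditions above:

```python
import torch

def train_first_model(model, loss_fn, optimizer, train_loader,
                      val_batch=None, max_rounds=100, loss_threshold=0.01):
    for _ in range(max_rounds):                # end condition 1: preset count
        for imgs, coords in train_loader:      # first sample pictures + coordinates
            optimizer.zero_grad()
            loss = loss_fn(model(imgs), coords)
            loss.backward()
            optimizer.step()                   # adjust the model (step 704)
        if val_batch is not None:              # end condition 2: small test loss
            with torch.no_grad():
                imgs, coords = val_batch
                if loss_fn(model(imgs), coords).item() < loss_threshold:
                    break
    return model                               # step 703: trained first model
```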
In an embodiment of the present invention, in one possible implementation, the second neural network model is obtained by training based on the second training sample set; as shown in fig. 8, the training may specifically include the following steps:
step 801, training the second neural network model to be trained for a preset number of times based on a second sample picture containing second sample characters and included in the second training sample set and a known font of the second sample characters.
In this step, each second sample picture contains a second sample character, and the second sample picture is a picture subjected to binarization processing.
In a specific application, the second to-be-trained neural network model may be a residual network model, for example a ResNet50 (50-layer Residual Network) model, which may specifically be modified as follows: the dimension of the input layer of ResNet50 is modified to (128, 128, 1); the head fully connected layer of ResNet50 is removed and replaced with two fully connected layers; a Dropout mechanism is added between the two fully connected layers with its parameter set to 0.5, indicating that 50% of the outputs of the first fully connected layer are randomly masked during training; and the output dimension of the last fully connected layer is made consistent with the number of font categories to be identified.
In this step, the preset number of times may be specifically set according to the practical application, for example, 30000 times. In each training, a second sample picture is input into the second to-be-trained neural network model, and the second to-be-trained neural network model is adjusted according to the known font of the second sample characters, so as to obtain the trained second neural network model.
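A minimal sketch of these modifications using tf.keras is shown below; the pooling layer and the width of the first fully connected layer are assumptions, while the (128, 128, 1) input, the two fully connected layers, the Dropout rate of 0.5 and the num_fonts output dimension follow the description above:

```python
import tensorflow as tf

def build_font_classifier(num_fonts: int) -> tf.keras.Model:
    backbone = tf.keras.applications.ResNet50(
        include_top=False,           # remove the head fully connected layer
        weights=None,                # train from scratch on binary maps
        input_shape=(128, 128, 1))   # modified input dimension
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dense(512, activation="relu")(x)  # first FC layer (width assumed)
    x = tf.keras.layers.Dropout(0.5)(x)   # randomly mask 50% during training
    out = tf.keras.layers.Dense(num_fonts, activation="softmax")(x)
    return tf.keras.Model(backbone.input, out)
```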
Step 802, for the second neural network model to be trained after each training, calculating the accuracy of the second neural network model to be trained by using the test set, wherein the second neural network model to be trained after each training corresponds to one accuracy.
The test set includes a plurality of test pictures, each test picture contains test characters, the fonts of the test characters are known, and the number of test pictures may be determined according to the actual application.
In this step, when the number of test pictures is 200, for the second to-be-trained neural network model after each training, the 200 test pictures are respectively input into the model obtained after that training, and the accuracy of recognizing the fonts of the test characters in the 200 test pictures is calculated.
Step 803, selecting the trained second neural network model corresponding to the maximum accuracy as the trained second neural network model.
In this step, for example, the accuracies of the second to-be-trained neural network model obtained after each training, calculated using the test set, may be expressed as {A1, A2, …, An}, where A1, A2, …, An respectively represent the accuracy of the second to-be-trained neural network model obtained after each training, and n represents the number of trainings. The trained second to-be-trained neural network model corresponding to the maximum accuracy max{A1, A2, …, An} may then be selected as the trained second neural network model.
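As a trivial sketch, assuming hypothetical lists accuracies = [A1, …, An] and saved_models holding the checkpoint obtained after each training:

```python
# Step 803: pick the checkpoint whose test-set accuracy A_i is maximal.
best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
best_model = saved_models[best_index]
```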
Based on the same inventive concept, according to the font identification method provided in the above embodiment of the present invention, correspondingly, another embodiment of the present invention further provides a font identification device, a schematic structural diagram of which is shown in fig. 9a, and specifically includes:
the detection module 901 is configured to input a to-be-recognized picture including a plurality of to-be-recognized characters into a pre-trained first neural network model, so as to obtain position coordinates of the plurality of to-be-recognized characters in the to-be-recognized picture, where the first neural network model is obtained by training based on a first training sample set, and the first training sample set includes: a first sample picture containing first sample characters, and position coordinates of the first sample characters in the first sample picture;
the image segmentation module 902 is configured to segment the picture to be recognized based on the position coordinates of the plurality of characters to be recognized to obtain a plurality of single character pictures, where each single character picture includes one character to be recognized;
a gray level conversion module 903, configured to convert the multiple single character pictures into gray level images respectively, where the gray level images serve as gray level images to be processed;
a binarization module 904, configured to perform local binarization on the multiple to-be-processed grayscale images respectively to obtain binary images, which are to be input as the to-be-input binary images;
the recognition module 905 is configured to respectively input the multiple binary images to be input into a pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image, where the second neural network model is obtained by training based on a second training sample set, and the second training sample set includes: a second sample picture containing second sample characters, and the known font of the second sample characters.
With the device provided by the embodiment of the invention, the position coordinates of the characters to be recognized in the picture to be recognized are determined by the pre-trained first neural network model, and the picture to be recognized is then segmented according to the determined position coordinates to obtain a plurality of single character pictures; local binarization processing is further carried out on the gray-scale images of the single character pictures, and the background interference around the characters to be recognized is removed through the first neural network model and the local binarization processing; the multiple binary images to be input are then input into the pre-trained second neural network model to obtain the font of the characters to be recognized contained in each binary image. By combining the neural network models with the local binarization processing, font identification is realized against complex picture backgrounds, and the accuracy of font identification is improved.
Further, as shown in fig. 9b, the grayscale conversion module 903 includes:
a first conversion submodule 906, configured to convert, for each single character picture, the single character picture into a grayscale map;
a second conversion sub-module 907 for determining, for each gray scale map, whether the gray scale map satisfies an image color inversion condition; when the gray level image meets the image color inversion condition, performing color inversion on the gray level image to obtain an image with the color inverted as a gray level image to be processed; otherwise, the gray-scale image is taken as a gray-scale image to be processed.
Further, the second conversion sub-module 907 is specifically configured to calculate, for each gray scale image, an average value of pixel values of all pixel points of the gray scale image, and use the average value as the first average value; calculating the mean value of the pixel values of the vertex pixels of the gray level image to be used as a second mean value; and when the first average value is larger than the second average value, determining that the gray scale map meets the image color inversion condition.
Further, the binarization module 904 is specifically configured to perform local binarization processing on each gray-scale map to be processed to obtain a binary map; cut out, for each binary map, the black area containing the characters to be recognized to obtain a circumscribed rectangular image containing the characters to be recognized; and, for each circumscribed rectangular image, add a preset number of pixel points around the pixel point matrix of the circumscribed rectangular image to obtain a new circumscribed rectangular image serving as a binary image to be input, wherein the pixel values of the preset number of pixel points are all a preset pixel value.
Further, as shown in fig. 9b, the font identification apparatus further includes:
a first model training module 908, configured to obtain a first neural network model based on the training of the first training sample set by using the following steps:
inputting a first sample picture containing first sample characters contained in a first training sample set into a first to-be-trained neural network model to obtain an output result of the first to-be-trained neural network model;
determining whether a preset training ending condition is met;
if yes, determining the current first neural network model to be trained as the trained first neural network model;
and if not, adjusting the first to-be-trained neural network model to obtain a new first to-be-trained neural network model, and starting next training.
Further, as shown in fig. 9b, the font identification apparatus further includes:
a second model training module 909, configured to train to obtain a second neural network model based on the second training sample set by using the following steps:
and training the second neural network model to be trained for preset times based on a second sample picture containing second sample characters and a known font of the second sample characters, wherein the second sample picture is contained in the second training sample set.
Calculating the accuracy of a second to-be-trained neural network model by using a test set aiming at the second to-be-trained neural network model after each training, wherein the second to-be-trained neural network model after each training corresponds to one accuracy;
and selecting the trained second neural network model corresponding to the maximum accuracy as the trained second neural network model.
The embodiment of the present invention further provides an electronic device, as shown in fig. 10, which includes a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete mutual communication through the communication bus 1004,
a memory 1003 for storing a computer program;
the processor 1001 is configured to implement the following steps when executing the program stored in the memory 1003:
inputting a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model to obtain position coordinates of the characters to be recognized in the picture to be recognized, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: a first sample picture containing first sample characters, and position coordinates of the first sample characters in the first sample picture;
segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized;
respectively converting the single character pictures into gray level pictures as to-be-processed gray level pictures;
respectively carrying out local binarization processing on the multiple gray level images to be processed to obtain binary images serving as binary images to be input;
respectively inputting a plurality of binary images to be input into a pre-trained second neural network model to obtain the fonts of characters to be recognized contained in each binary image, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set comprises: a second sample picture containing second sample characters, and the known font of the second sample characters.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the font recognition methods described above.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the font recognition methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the device, the electronic apparatus and the storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A font recognition method, comprising:
inputting a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model to obtain position coordinates of the characters to be recognized in the picture to be recognized, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: a first sample picture containing first sample characters, and position coordinates of the first sample characters in the first sample picture;
segmenting the picture to be recognized based on the position coordinates of the characters to be recognized to obtain a plurality of single character pictures, wherein each single character picture comprises one character to be recognized;
respectively converting the single character pictures into gray level pictures as to-be-processed gray level pictures;
respectively carrying out local binarization processing on the multiple gray level images to be processed to obtain binary images serving as binary images to be input;
respectively inputting a plurality of binary images to be input into a pre-trained second neural network model to obtain the fonts of characters to be recognized contained in each binary image, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set comprises: a second sample picture containing second sample characters, and the known font of the second sample characters.
2. The method according to claim 1, wherein the converting the plurality of single character pictures into the gray-scale maps respectively as the gray-scale maps to be processed comprises:
converting the single character picture into a gray scale picture aiming at each single character picture;
determining whether the gray scale map meets an image color inversion condition or not for each gray scale map;
when the gray level image meets the image color inversion condition, performing color inversion on the gray level image to obtain an image with the color inverted as a gray level image to be processed; otherwise, the gray-scale image is taken as a gray-scale image to be processed.
3. The method of claim 2, wherein determining, for each gray-scale map, whether the gray-scale map meets the image color inversion condition comprises:
for each gray-scale map, calculating the mean of the pixel values of all pixel points of the gray-scale map as a first mean;
calculating the mean of the pixel values of the vertex pixel points of the gray-scale map as a second mean;
and when the first mean is greater than the second mean, determining that the gray-scale map meets the image color inversion condition.
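A minimal NumPy reading of the inversion condition in claims 2-3, assuming an 8-bit gray-scale map and taking the "vertex pixel points" to be the four corner pixels; the function name is illustrative:

```python
import numpy as np

def maybe_invert(gray: np.ndarray) -> np.ndarray:
    # First mean: over all pixel points of the gray-scale map.
    first_mean = gray.mean()
    # Second mean: over the four vertex (corner) pixel points.
    corners = np.array([gray[0, 0], gray[0, -1], gray[-1, 0], gray[-1, -1]],
                       dtype=np.float64)
    second_mean = corners.mean()
    # The image interior being brighter than its corners suggests light
    # text on a dark background, so invert to get dark text on light.
    if first_mean > second_mean:
        return 255 - gray
    return gray
```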
4. The method according to claim 1, wherein performing local binarization processing on each of the gray-scale maps to be processed to obtain the binary maps to be input comprises:
performing local binarization processing on each gray-scale map to be processed to obtain a binary map;
cropping the black region containing the character to be recognized out of the binary map, to obtain a circumscribed-rectangle map containing the character to be recognized;
and for each circumscribed-rectangle map, adding a preset number of pixel points around the pixel matrix of the circumscribed-rectangle map to obtain a new circumscribed-rectangle map as a binary map to be input, wherein the pixel values of the added pixel points are all a preset pixel value.
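One possible OpenCV rendering of claim 4's binarize-crop-pad sequence, with pad and pad_value standing in for the claim's unspecified "preset number" and "preset pixel value":

```python
import cv2
import numpy as np

def binarize_and_pad(gray: np.ndarray, pad: int = 4,
                     pad_value: int = 255) -> np.ndarray:
    """Illustrative sketch of claim 4: local binarization, crop to the
    black character region, then surround it with preset-value pixels."""
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
        cv2.THRESH_BINARY, 15, 5)
    # Bounding (circumscribed) rectangle of the black character pixels.
    ys, xs = np.where(binary == 0)
    if ys.size == 0:
        return binary  # no black region found; return unchanged
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Add a border of preset-value pixels around the pixel matrix.
    return cv2.copyMakeBorder(cropped, pad, pad, pad, pad,
                              cv2.BORDER_CONSTANT, value=pad_value)
```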
5. The method of any one of claims 1-4, wherein the first neural network model is trained based on the first training sample set by steps comprising:
inputting the first sample pictures containing the first sample characters in the first training sample set into a first neural network model to be trained, to obtain an output result of the first neural network model to be trained;
determining whether a preset training end condition is met;
if so, taking the current first neural network model to be trained as the trained first neural network model;
and if not, adjusting the first neural network model to be trained to obtain a new first neural network model to be trained, and starting the next round of training.
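The claim leaves the "preset training end condition" open; the following PyTorch-style sketch assumes, for illustration only, an epoch cap plus a mean-loss threshold as that condition:

```python
import torch

def train_first_model(model: torch.nn.Module, loss_fn, optimizer, loader,
                      max_epochs: int = 50,
                      loss_threshold: float = 0.05) -> torch.nn.Module:
    """Illustrative claim-5 loop; end condition is an assumption."""
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, true_positions in loader:
            output = model(images)               # output of the model in training
            loss = loss_fn(output, true_positions)
            optimizer.zero_grad()
            loss.backward()                      # "adjust" the model
            optimizer.step()
            epoch_loss += loss.item()
        # Assumed end condition: mean epoch loss below a threshold.
        if epoch_loss / len(loader) < loss_threshold:
            break
    return model  # the current model is taken as the trained first model
```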
6. The method of any one of claims 1-4, wherein the second neural network model is trained based on the second training sample set by steps comprising:
training a second neural network model to be trained a preset number of times based on the second sample pictures containing the second sample characters and the known fonts of the second sample characters in the second training sample set;
for the second neural network model to be trained after each training, calculating its accuracy using a test set, so that the model after each training corresponds to one accuracy value;
and selecting the model corresponding to the maximum accuracy as the trained second neural network model.
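A short sketch of claim 6's checkpoint-selection strategy, where train_once and evaluate are hypothetical helpers (one training pass and test-set accuracy, respectively):

```python
import copy

def train_second_model(model, train_once, evaluate, test_set,
                       rounds: int = 10):
    """Illustrative claim-6 loop: train a preset number of times, score
    each checkpoint on a test set, and keep the most accurate one."""
    best_acc, best_model = -1.0, None
    for _ in range(rounds):
        train_once(model)                # one training pass
        acc = evaluate(model, test_set)  # accuracy on the test set
        if acc > best_acc:
            best_acc = acc
            best_model = copy.deepcopy(model)  # snapshot the best-so-far
    return best_model
```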
7. A font recognition apparatus, comprising:
a detection module, configured to input a picture to be recognized containing a plurality of characters to be recognized into a pre-trained first neural network model, to obtain position coordinates of the characters to be recognized in the picture to be recognized, wherein the first neural network model is obtained by training based on a first training sample set, and the first training sample set comprises: first sample pictures containing first sample characters, and position coordinates of the first sample characters in the first sample pictures;
an image segmentation module, configured to segment the picture to be recognized based on the position coordinates of the characters to be recognized, to obtain a plurality of single-character pictures, wherein each single-character picture contains one character to be recognized;
a gray-scale conversion module, configured to convert each of the single-character pictures into a gray-scale map, to obtain gray-scale maps to be processed;
a binarization module, configured to perform local binarization processing on each of the gray-scale maps to be processed, to obtain binary maps to be input;
a recognition module, configured to input each of the binary maps to be input into a pre-trained second neural network model, to obtain the font of the character to be recognized contained in each binary map, wherein the second neural network model is obtained by training based on a second training sample set, and the second training sample set comprises: second sample pictures containing second sample characters, and the known fonts of the second sample characters.
8. The apparatus of claim 7, wherein the gray-scale conversion module comprises:
a first conversion sub-module, configured to convert, for each single-character picture, the single-character picture into a gray-scale map;
a second conversion sub-module, configured to determine, for each gray-scale map, whether the gray-scale map meets an image color inversion condition; when the gray-scale map meets the image color inversion condition, perform color inversion on the gray-scale map and take the inverted image as a gray-scale map to be processed; otherwise, take the gray-scale map itself as a gray-scale map to be processed.
9. The apparatus according to claim 8, wherein the second conversion sub-module is specifically configured to: for each gray-scale map, calculate the mean of the pixel values of all pixel points of the gray-scale map as a first mean; calculate the mean of the pixel values of the vertex pixel points of the gray-scale map as a second mean; and when the first mean is greater than the second mean, determine that the gray-scale map meets the image color inversion condition.
10. The apparatus according to claim 7, wherein the binarization module is specifically configured to: perform local binarization processing on each gray-scale map to be processed to obtain a binary map; crop the black region containing the character to be recognized out of the binary map, to obtain a circumscribed-rectangle map containing the character to be recognized; and for each circumscribed-rectangle map, add a preset number of pixel points around the pixel matrix of the circumscribed-rectangle map to obtain a new circumscribed-rectangle map as a binary map to be input, wherein the pixel values of the added pixel points are all a preset pixel value.
11. The apparatus of any one of claims 7-10, further comprising:
a first model training module, configured to obtain the first neural network model by training based on the first training sample set through the following steps:
inputting the first sample pictures containing the first sample characters in the first training sample set into a first neural network model to be trained, to obtain an output result of the first neural network model to be trained;
determining whether a preset training end condition is met;
if so, taking the current first neural network model to be trained as the trained first neural network model;
and if not, adjusting the first neural network model to be trained to obtain a new first neural network model to be trained, and starting the next round of training.
12. The apparatus of any one of claims 7-10, further comprising:
a second model training module, configured to obtain the second neural network model by training based on the second training sample set through the following steps:
training a second neural network model to be trained a preset number of times based on the second sample pictures containing the second sample characters and the known fonts of the second sample characters in the second training sample set;
for the second neural network model to be trained after each training, calculating its accuracy using a test set, so that the model after each training corresponds to one accuracy value;
and selecting the model corresponding to the maximum accuracy as the trained second neural network model.
13. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the method steps of any one of claims 1-6 when executing the program stored in the memory.
14. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-6.
CN202010048253.1A 2020-01-16 2020-01-16 Font identification method and device, electronic equipment and storage medium Pending CN113139629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010048253.1A CN113139629A (en) 2020-01-16 2020-01-16 Font identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010048253.1A CN113139629A (en) 2020-01-16 2020-01-16 Font identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113139629A true CN113139629A (en) 2021-07-20

Family

ID=76808248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010048253.1A Pending CN113139629A (en) 2020-01-16 2020-01-16 Font identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113139629A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6069981A (en) * 1997-06-16 2000-05-30 Dainippon Screen Mfg. Co., Ltd. Image conversion method and record medium
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN109308475A (en) * 2018-07-26 2019-02-05 北京百悟科技有限公司 A kind of character recognition method and device
CN109800754A (en) * 2018-12-06 2019-05-24 杭州电子科技大学 A kind of ancient character body classification method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Hanying et al., "A New Fingerprint Image Segmentation Algorithm," Computer Engineering & Science *
LIU Shang et al., "Experimental Tutorial of Computer Image and Video Processing," University of Electronic Science and Technology of China Press *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657369A (en) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 Character recognition method and related equipment thereof

Similar Documents

Publication Publication Date Title
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN108171104B (en) Character detection method and device
CN110232713B (en) Image target positioning correction method and related equipment
CN111814902A (en) Target detection model training method, target identification method, device and medium
CN109241861B (en) Mathematical formula identification method, device, equipment and storage medium
CN110321845B (en) Method and device for extracting emotion packets from video and electronic equipment
CN110503103B (en) Character segmentation method in text line based on full convolution neural network
CN110135446B (en) Text detection method and computer storage medium
CN111259878A (en) Method and equipment for detecting text
WO2019232850A1 (en) Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium
CN112464809A (en) Face key point detection method and device, electronic equipment and storage medium
CN109389110B (en) Region determination method and device
CN110570442A (en) Contour detection method under complex background, terminal device and storage medium
CN111833369A (en) Alum image processing method, system, medium and electronic device
CN111368632A (en) Signature identification method and device
CN112651953A (en) Image similarity calculation method and device, computer equipment and storage medium
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
CN110287361B (en) Figure picture screening method and device
CN113221601A (en) Character recognition method, device and computer readable storage medium
CN111145202A (en) Model generation method, image processing method, device, equipment and storage medium
CN111178310A (en) Palm feature recognition method and device, computer equipment and storage medium
CN113971644A (en) Image identification method and device based on data enhancement strategy selection
CN113129298A (en) Definition recognition method of text image
CN113139629A (en) Font identification method and device, electronic equipment and storage medium
CN111507850A (en) Authority guaranteeing method and related device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210720