CN115063818A - Method and system for distinguishing the font type of official documents

Info

Publication number
CN115063818A
CN115063818A
Authority
CN
China
Prior art keywords
image
character
font
document
official document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210656569.8A
Other languages
Chinese (zh)
Inventor
程世清
王思宇
陈仁平
Current Assignee
31511 Unit Of Chinese Pla
Original Assignee
31511 Unit Of Chinese Pla
Priority date
Filing date
Publication date
Application filed by 31511 Unit Of Chinese Pla
Priority to CN202210656569.8A
Publication of CN115063818A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1456 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields based on user interactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

An embodiment of the invention provides a method and system for distinguishing the font type of official documents, comprising the following steps: take a document image of the official document to be recognized, converted into a preset format, as the document image to be detected; binarize the document image, project the binarized image horizontally and vertically, and segment the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character; run each character image through a trained font recognition model and output the font recognition result after recognition is finished; draw a rectangular box around each character on the document image according to its position information, and label the box with the font type code of the corresponding character; when the mouse pointer slides over a font type code on the document image, automatically restore the code to the corresponding font type name and display it. Font recognition is image-based, can identify documents of various types, and is fast and accurate.

Description

Method and system for distinguishing the font type of official documents
Technical Field
The invention relates to the field of character recognition, and in particular to a method and system for distinguishing the font type of official documents.
Background
An official document is a document with legal effect and a standardized layout used in the administration of the Party, government and army; its quality reflects the level of administrative management and work style, and is an important index in the assessment of office work. One key quality criterion is whether the document's fonts conform to the national standard: for example, the title should be set in Xiaobiao Song (small-title Song), first-level headings in Heiti (boldface), second-level headings in Kaiti (regular script), and third- and fourth-level headings in Songti, and each element of the header and the imprint has its own prescribed font. When assessing office work, judging by manual inspection whether the fonts of paper, scanned or PDF documents meet the standard is error-prone, because fonts of similar style are easily confused, and it is time-consuming and labor-intensive when the volume of documents to be assessed is large.
Disclosure of Invention
An embodiment of the invention provides a method and system for distinguishing the font type of official documents, in which recognition is image-based and can identify various types of documents quickly and accurately.
To achieve the above object, in one aspect, an embodiment of the invention provides a method for distinguishing the font type of an official document, comprising:
when the official document to be recognized is a text document, converting it to PDF (Portable Document Format) and then to a document image in a preset format; when it is an image, converting it uniformly to a document image in the preset format; when it is a paper document, photographing it with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to the photograph to obtain a document image in the preset format; taking that image as the document image to be detected;
binarizing the document image to be detected, projecting the binarized image horizontally and vertically, segmenting the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character, and recording the position information of each character;
performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is finished; the result comprises the font type of the character and its position information;
drawing a rectangular box for each character on the document image to be detected according to its position information, labeling the box with the font type code of the corresponding character, and setting the font type code to be prompted when the mouse pointer slides over it;
and when the mouse pointer slides over a font type code on the document image to be detected, automatically restoring the code to the corresponding font type name and displaying it.
In another aspect, an embodiment of the invention provides a system for distinguishing the font type of an official document, comprising:
a document preprocessing unit for converting the official document to be recognized to PDF and then to a document image in a preset format when it is a text document; converting it uniformly to a document image in the preset format when it is an image; when it is a paper document, photographing it with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in the preset format; and taking that image as the document image to be detected;
a document segmentation unit for binarizing the document image to be detected, projecting the binarized image horizontally and vertically, segmenting the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character, and recording the position information of each character;
a character type recognition unit for performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is finished, the result comprising the font type of the character and its position information;
a character type labeling unit for drawing a rectangular box for each character on the document image to be detected according to its position information, labeling the box with the font type code of the corresponding character, and setting the font type code to be prompted when the mouse pointer slides over it;
and a display unit for automatically restoring a font type code to the corresponding font type name and displaying it when the mouse pointer slides over the code on the document image to be detected.
The above technical scheme has the following beneficial effect: font recognition is image-based, can identify documents of various types, and is fast and accurate.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for distinguishing the font type of official documents according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system for distinguishing the font type of official documents according to an embodiment of the present invention;
FIG. 3 is a general block diagram of an embodiment of the invention;
FIG. 4 is a block diagram of a document collection module according to an embodiment of the present invention;
FIG. 5 is a statistical chart of training indicators of the font recognition model according to the embodiment of the present invention;
FIG. 6 is a diagram of a test result of a font identification model according to an embodiment of the present invention;
FIG. 7 is a diagram of an effect of document font recognition according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the invention.
As shown in fig. 1, in combination with an embodiment of the present invention, a method for distinguishing the font type of official documents is provided, comprising:
S101: when the official document to be recognized is a text document, converting it to PDF (Portable Document Format) and then to a document image in a preset format; when it is an image, converting it uniformly to a document image in the preset format; when it is a paper document, photographing it with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in the preset format; taking that image as the document image to be detected;
S102: binarizing the document image to be detected, projecting the binarized image horizontally and vertically, segmenting the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character, and recording the position information of each character;
S103: performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is finished; the result comprises the font type of the character and its position information;
S104: drawing a rectangular box for each character on the document image to be detected according to its position information, labeling the box with the font type code of the corresponding character, and setting the font type code to be prompted when the mouse pointer slides over it;
S105: when the mouse pointer slides over a font type code on the document image to be detected, automatically restoring the code to the corresponding font type name and displaying it.
Preferably, in step 101, when the official document to be recognized is a paper document, photographing it with a document acquisition terminal, sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in a preset format, and taking that image as the document image to be detected specifically comprises:
a high-definition camera automatically photographing the paper document to be recognized on receiving an image acquisition instruction and outputting an original image P of the paper document;
creating a copy P1 of the original document image P and converting P1 to grayscale; after grayscale conversion, filtering noise with a Gaussian blur and then obtaining the edge image E of the original document image P, specifically:
determining the gradient magnitude G(x, y) and gradient direction θ of the edges of P1 with a Sobel filter, where Gx is the vertical-edge response (an abrupt change of the gradient in the x direction) and Gy is the horizontal-edge response (an abrupt change in the y direction); applying non-maximum suppression to the gradient magnitude along the gradient direction: comparing each pixel i of P1 with its 8-neighborhood values in the 3 x 3 region along the four quantized gradient directions 0°, 45°, 90° and 135°, keeping the pixel if it is the maximum and setting it to 0 otherwise; then detecting and connecting edges with a double-threshold (hysteresis) algorithm to obtain the edge image E of the original document image P;
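The Sobel-gradient, non-maximum-suppression and double-threshold steps described here are essentially the Canny edge-detection pipeline (cv2.Canny in OpenCV). Below is a minimal NumPy sketch of the first stage only, computing the gradient magnitude G(x, y) and direction θ; the tiny step-edge test image is invented for illustration:

```python
import numpy as np

# 3x3 Sobel kernels: SOBEL_X responds to vertical edges (change along x),
# SOBEL_Y to horizontal edges (change along y).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_gradients(img):
    """Return per-pixel gradient magnitude and direction (degrees)."""
    h, w = img.shape
    mag = np.zeros((h, w))
    theta = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = float(np.sum(SOBEL_X * patch))  # vertical-edge response
            gy = float(np.sum(SOBEL_Y * patch))  # horizontal-edge response
            mag[y, x] = np.hypot(gx, gy)
            theta[y, x] = np.degrees(np.arctan2(gy, gx))
    return mag, theta
```

In the full pipeline the direction θ is then quantized to 0°, 45°, 90° or 135° for non-maximum suppression, and hysteresis (double-threshold) linking keeps only connected strong edges.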
creating a copy E1 of E, obtaining the set of closed edge contours of E1, determining a quadrilateral edge from that set and taking it as the boundary of the original document image P; segmenting the target document image out of the original document image P according to the position information of the quadrilateral edge;
and mapping the target document image, via its 4 vertex coordinates, onto an image with the A4 paper aspect ratio of a standard document, correcting each character of the target document image into a visually well-proportioned shape by perspective transformation, the corrected result forming the document image in the preset format.
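The 4-vertex mapping onto an A4-proportioned page is a standard perspective (homography) transform; with OpenCV it would be cv2.getPerspectiveTransform followed by cv2.warpPerspective. As a dependency-light sketch, the 3x3 matrix can be solved directly from the four point pairs (the corner coordinates in the test are invented):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve the 3x3 homography H (with h33 fixed to 1) mapping each
    src corner (x, y) to its dst corner (u, v): 4 pairs give 8 equations."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply H to one point in homogeneous coordinates, then normalize."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

Warping every pixel with H (as cv2.warpPerspective does) then yields the rectified, A4-proportioned document image.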
Preferably, in step 102, binarizing the document image to be detected, projecting the binarized image horizontally and vertically, and segmenting the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character specifically comprises:
binarizing the document image to be detected to obtain a binarized document image; projecting the binarized image horizontally onto the y axis to obtain a binary image of each text line, projecting each line image vertically onto the x axis, determining the start and end positions of each character from the black pixels and white gap pixels of the projections, and deriving the position coordinates of each character from its start and end positions;
drawing a segmentation rectangle for each character in the document image to be detected according to its position coordinates, cutting the character out along the rectangle to obtain the image and position of each character, numbering each character image in a preset form, and recording each character's number together with its position information.
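The two-pass projection segmentation described in this step (horizontal projection onto the y axis to isolate text lines, then vertical projection onto the x axis inside each line to isolate characters) can be sketched in NumPy as follows; the toy binary page in the test is invented, with 1 marking an ink pixel:

```python
import numpy as np

def runs(profile):
    """Return (start, end) index pairs of the nonzero runs in a 1-D
    projection profile: black-pixel bands separated by white gaps."""
    spans, start = [], None
    for i, s in enumerate(profile):
        if s > 0 and start is None:
            start = i
        elif s == 0 and start is not None:
            spans.append((start, i)); start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_characters(binary):
    """Return (top, bottom, left, right) boxes, one per character."""
    boxes = []
    for top, bottom in runs(binary.sum(axis=1)):      # horizontal projection
        line = binary[top:bottom]
        for left, right in runs(line.sum(axis=0)):    # vertical projection
            boxes.append((top, bottom, left, right))
    return boxes
```

Each returned box is both the crop window for the character image and the position information recorded for later labeling.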
Preferably, in step 103, performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is finished specifically comprises:
inputting each character image into the trained font recognition model; slicing each character image into 4 equally sized sub-images with the Focus module of the Backbone network; integrating the width and height of the sub-images with Concat and increasing the number of channels of the input image to 64; applying a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pooling the second feature image at four scales with the SPP (spatial pyramid pooling) module; integrating the pooling results with a Concat layer, performing convolution and concatenation on them through the 14-layer network of the Neck and the Head, outputting the font type of each character, and outputting the font recognition result after recognition is finished;
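The Focus slicing described above takes every second pixel at each of four row/column offsets, producing four half-resolution sub-images that are stacked along the channel axis (a subsequent convolution then raises the channel count to 64). A NumPy sketch of the slicing step alone:

```python
import numpy as np

def focus_slice(img):
    """YOLOv5-style Focus: slice an (H, W, C) image into four
    half-resolution sub-images and stack them channel-wise, giving
    an (H/2, W/2, 4C) tensor that loses no pixel information."""
    return np.concatenate([
        img[0::2, 0::2],   # even rows, even cols
        img[1::2, 0::2],   # odd rows, even cols
        img[0::2, 1::2],   # even rows, odd cols
        img[1::2, 1::2],   # odd rows, odd cols
    ], axis=-1)
```

The design choice here is to trade spatial resolution for channels before the first convolution, halving the feature-map size without discarding any input pixels.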
In step 104, drawing a rectangular box for each character on the document image to be detected according to its position information, labeling the box with the font type code of the corresponding character, and prompting the font type code when the mouse pointer slides over it specifically comprises:
drawing a font position rectangle on each character in the document image to be detected, according to its position information, with the rectangle() function of the python cv2 package; and labeling the font type code inside the character's rectangle by calling the plt.text(x, y, s) function of the matplotlib package, where x and y are the abscissa and ordinate of the midpoint of the rectangle and s is the character's font type code, the font type code being prompted when the mouse pointer slides over the rectangle.
Preferably, S106: the font recognition model is trained as follows:
collecting the single Chinese characters, Arabic numerals, punctuation marks and mathematical symbols used in official documents, and producing font sample pictures, cropped with rectangular boxes, for each font type used in official documents; the background of each sample picture is pure white, the glyph is black, and the font is not bold; each sample picture contains one Chinese character, digit or symbol; the font types include at least one of: Xiaobiao Song (small-title Song, simplified), FangSong_GB2312, Heiti (boldface), KaiTi_GB2312 and Songti;
labeling the font type in each sample picture with its font type code using the labelImg annotation tool and outputting a label file in text format, each label file having the same name as its sample picture; taking the label files and their same-named sample pictures as the data set and dividing it into a training set and a validation set; the data in each label file comprise: cls, the font type; x and y, the horizontal and vertical coordinates of the center of the bounding box; and w and h, the width and height of the box;
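Label files in the YOLO text format produced by labelImg hold one "cls x y w h" line per box, with coordinates normalized to [0, 1] relative to the image size. A small parser illustrating the format (the sample line and image size in the test are invented):

```python
def parse_yolo_label(line, img_w, img_h):
    """Convert one 'cls x_center y_center width height' label line
    (values normalized to 0..1) into (cls, left, top, right, bottom)
    in pixel coordinates."""
    fields = line.split()
    cls = int(fields[0])                         # font type code
    x, y, w, h = (float(v) for v in fields[1:])  # normalized box
    left = round((x - w / 2) * img_w)
    top = round((y - h / 2) * img_h)
    right = round((x + w / 2) * img_w)
    bottom = round((y + h / 2) * img_h)
    return cls, left, top, right, bottom
```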
configuring the font recognition model: setting the depth control parameter depth_multiple to 0.33 and the width control parameter width_multiple to 0.50; setting the 3 prior box (anchor) sizes at 8x, 16x and 32x downsampling to (10,13), (16,30), (33,23); (30,61), (62,45), (59,119); and (116,90), (156,198), (373,326) respectively, and setting the weight file to yolov5s.pt; in the data configuration file VOC.yaml, setting the number of classes to 7, naming the font types by their font type codes, and configuring the paths of the training and validation sets of the font recognition model;
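The settings above correspond to a YOLOv5-style configuration split across a model file and a data file. A sketch of what the two fragments might contain (the dataset paths, and any class names beyond the five fonts listed in the training-data step, are placeholders, since the source does not enumerate all 7 class names):

```yaml
# Model configuration fragment (yolov5s-style)
depth_multiple: 0.33        # model depth multiple
width_multiple: 0.50        # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23]        # prior boxes at 8x downsampling
  - [30, 61, 62, 45, 59, 119]       # prior boxes at 16x downsampling
  - [116, 90, 156, 198, 373, 326]   # prior boxes at 32x downsampling

# VOC.yaml - data configuration fragment (paths are placeholders)
# train: <path to training set>
# val:   <path to validation set>
# nc: 7                     # number of font type classes
# names: [...]              # 7 font type code names
```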
for the training set, dividing the document images into single-character images and resizing each single-character image to a preset size as input to the YOLOv5 model; the YOLOv5 model comprises the Backbone network, the Neck and the Head; the Backbone comprises Focus, Concat, Conv blocks, SPP and BottleneckCSP;
slicing each single-character image into 4 equally sized sub-images with Focus; integrating the width and height of the sub-images with Concat and increasing the number of channels of the input image to 64; applying a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pooling the second feature image at four scales with the SPP module; integrating the pooling results with a Concat layer, performing convolution and concatenation through the 14-layer network of the Neck and the Head, and outputting the bounding box and font type of each single character;
and validating with the validation set to obtain the trained font recognition model.
As shown in fig. 2, in combination with an embodiment of the present invention, a system for distinguishing the font type of official documents is provided, comprising:
a document preprocessing unit 21 for converting the official document to be recognized to PDF and then to a document image in a preset format when it is a text document; converting it uniformly to a document image in the preset format when it is an image; when it is a paper document, photographing it with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in the preset format; and taking that image as the document image to be detected;
a document segmentation unit 22 for binarizing the document image to be detected, projecting the binarized image horizontally and vertically, segmenting the document into characters using the black-pixel bands and white gaps in the projections to obtain an image of each character, and recording the position information of each character;
a character type recognition unit 23 for performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is finished, the result comprising the font type of the character and its position information;
a character type labeling unit 24 for drawing a rectangular box for each character on the document image to be detected according to its position information, labeling the box with the font type code of the corresponding character, and setting the font type code to be prompted when the mouse pointer slides over it;
and a display unit 25 for automatically restoring a font type code to the corresponding font type name and displaying it when the mouse pointer slides over the code on the document image to be detected.
Preferably, the document preprocessing unit 21 comprises a paper document preprocessing subunit 211, which is specifically configured to:
have the high-definition camera automatically photograph the paper document to be recognized on receiving an image acquisition instruction and output an original image P of the paper document;
create a copy P1 of the original document image P and convert P1 to grayscale; after grayscale conversion, filter noise with a Gaussian blur and then obtain the edge image E of the original document image P, specifically:
determine the gradient magnitude G(x, y) and gradient direction θ of the edges of P1 with a Sobel filter, where Gx is the vertical-edge response (an abrupt change of the gradient in the x direction) and Gy is the horizontal-edge response (an abrupt change in the y direction); apply non-maximum suppression to the gradient magnitude along the gradient direction: compare each pixel i of P1 with its 8-neighborhood values in the 3 x 3 region along the four quantized gradient directions 0°, 45°, 90° and 135°, keep the pixel if it is the maximum and set it to 0 otherwise; then detect and connect edges with a double-threshold (hysteresis) algorithm to obtain the edge image E of the original document image P;
create a copy E1 of E, obtain the set of closed edge contours of E1, determine a quadrilateral edge from that set and take it as the boundary of the original document image P; segment the target document image out of the original document image P according to the position information of the quadrilateral edge;
and map the target document image, via its 4 vertex coordinates, onto an image with the A4 paper aspect ratio of a standard document, correcting each character of the target document image into a visually well-proportioned shape by perspective transformation, the corrected result forming the document image in the preset format.
Preferably, the document segmentation unit 22 is specifically configured to:
binarize the document image to be detected to obtain a binarized document image; project the binarized image horizontally onto the y axis to obtain a binary image of each text line, project each line image vertically onto the x axis, determine the start and end positions of each character from the black pixels and white gap pixels of the projections, and derive the position coordinates of each character from its start and end positions;
draw a segmentation rectangle for each character in the document image to be detected according to its position coordinates, cut the character out along the rectangle to obtain the image and position of each character, number each character image in a preset form, and record each character's number together with its position information.
Preferably, the character type recognition unit 23 is specifically configured to:
inputting each character image into the trained font recognition model, and slicing each character image through the Focus of the Backbone network to obtain 4 sub-images of the same size; integrating the width and height of the sub-images through Concat, increasing the number of channels of the input image to 64; performing a convolution operation with a convolution kernel of 3 and a stride of 2 on the Concat-integrated sub-images using a Conv convolution block, and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv convolution block 3 times to obtain a second feature image; performing maximum pooling on the second feature image at four scales through an SSP module; integrating the pooling results through a Concat connection layer, performing convolution and connection operations on the integrated pooling results through the 14-layer network of the Neck and the Head, outputting the font type of each character, and outputting a font recognition result after recognition is finished;
the character type labeling unit 24 is specifically configured to:
drawing a font position rectangular frame on each character in the document image to be detected, according to the position information of each character, using the rectangle() function of the python cv2 package; calling the plt.text(x, y, s) function of the matplotlib package to mark the font type code inside the character position rectangular frame, wherein x and y are respectively the abscissa and ordinate of the midpoint of the character position rectangular frame and s is the font type code of the character; and displaying the font name according to the font type code when the right mouse button slides over the character position rectangular frame.
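A minimal sketch of this annotation step. The helper names are hypothetical; in the real pipeline, draw_rect and draw_text would be cv2.rectangle and matplotlib's plt.text, injected here so the geometry can be checked without a display:

```python
def label_anchor(box):
    """Midpoint (x, y) of a character box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def annotate(image, chars, draw_rect=None, draw_text=None):
    """chars: list of ((x1, y1, x2, y2), font_code) pairs.

    For each character, draw its rectangle, then place the font type code
    at the rectangle's midpoint. Returns the (x, y, code) label calls so the
    placement logic is testable without cv2/matplotlib.
    """
    calls = []
    for (x1, y1, x2, y2), code in chars:
        if draw_rect is not None:            # e.g. cv2.rectangle
            draw_rect(image, (x1, y1), (x2, y2), (0, 0, 255), 1)
        x, y = label_anchor((x1, y1, x2, y2))
        if draw_text is not None:            # e.g. plt.text(x, y, s)
            draw_text(x, y, code)
        calls.append((x, y, code))
    return calls
```

The injectable drawing callbacks keep the midpoint computation separate from any particular graphics library.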
Preferably, the device further comprises a font recognition model training unit 26, wherein the font recognition model training unit 26 comprises:
acquiring the single Chinese characters, Arabic numerals, punctuation marks and mathematical symbols used in official documents, and making font sample pictures with rectangular frames for each font type used in official documents, wherein the background color of each font sample picture is pure white, the character color is black, and the font style is not bold; each font sample picture contains one Chinese character, numeral or symbol; wherein the font types include at least one of the following: XiaoBiaoSong (simplified), FangSong_GB2312, HeiTi, KaiTi_GB2312 and SongTi;
labeling the font type in each font sample picture with a font type code using the labelImg labeling software, and outputting a text-format labeling result file; each labeling result file corresponds to a font sample picture with the same name; taking each labeling result file together with its same-named font sample picture as a data set, and dividing the data set into a training set and a validation set; wherein the data in each labeling result file comprise: cls, the font type; x and y, the horizontal and vertical coordinates of the center point of the rectangular frame; and w and h, the width and height of the rectangular frame;
configuring the font recognition model, wherein the depth control parameter depth_multiple of the font recognition model is set to 0.33 and the width control parameter width_multiple is set to 0.50; the 3 prior frame sizes under 8-, 16- and 32-fold downsampling are set to (10,13), (16,30), (33,23); (30,61), (62,45), (59,119); and (116,90), (156,198), (373,326) respectively, and the weight file is set to yolov5s.pt; in the data configuration file VOC.yaml, setting the number of types to 7, setting the font type names according to the font type codes, and configuring the addresses of the training set and the validation set of the font recognition model;
for the training set, dividing the official document image into single-character images, and resizing each single-character image to a preset size for input into the YOLOv5 model; the YOLOv5 model comprises a Backbone network, a Neck and a Head; the Backbone network comprises Focus, Concat, Conv convolution blocks, SSP and BottleneckCSP;
slicing each single-character image through Focus to obtain 4 sub-images of the same size; integrating the width and height of the sub-images through Concat, increasing the number of channels of the input image to 64; performing a convolution operation with a convolution kernel of 3 and a stride of 2 on the Concat-integrated sub-images using a Conv convolution block, and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv convolution block 3 times to obtain a second feature image; performing maximum pooling on the second feature image at four scales through an SSP module; integrating the pooling results through a Concat connection layer, performing convolution and connection operations on the integrated pooling results through the 14-layer network of the Neck and the Head, and outputting the single-character frame and font type;
and verifying by adopting a verification set to obtain the trained font identification model.
The above technical solutions of the embodiments of the present invention are described in detail below with reference to specific application examples; for technical details not described in the implementation process, reference may be made to the foregoing related descriptions.
A method and a device for discriminating official document fonts based on image processing technology relate to the fields of natural language processing and computer vision, and are mainly used for quality assessment and evaluation of party, government, military and other official documents.
The method addresses the following problems in official document quality assessment and evaluation: judging whether the fonts of all elements are correct by manual, character-by-character checking is time-consuming and labor-intensive; fonts with similar shapes are easily confused; and the main image-type official documents, such as paper, scanned and PDF (Portable Document Format) documents, cannot be judged by text analysis alone. By having a machine analyze and read the fonts, official documents can be judged quickly and efficiently, providing an efficient aid to official document quality assessment and evaluation. Here, an official document is a document in the format of national administrative organs' official documents, conforming to the national standard GB-T9704-.
According to the detailed image characteristics of each font, the method performs font discrimination with an image-based deep learning target detection method, and is divided into 3 parts: a document acquisition terminal, a central processing system and a display interpretation terminal, as shown in figure 3.
1. Official document acquisition terminal
The document acquisition terminal is composed of 3 modules: an image acquisition module, a document import interface and a recognition and conversion module; its structure is shown in figure 4.
(1) Image acquisition module: 1 high-definition digital camera is set up, with resolution: not less than 1080P (1920 × 1080); video compression mode: Motion-JPEG; signal system: PAL or NTSC; frame rate: greater than 25 fps; interface: driver-free USB 3.0 high-speed interface. After receiving an image acquisition instruction, the image acquisition module automatically photographs the paper document and outputs a 1920 × 1080 original image.
(2) Document import interface: a USB data interface is configured to directly import electronic document files such as doc, docx and wps, and scanned or copied image files such as bmp, jpg, png and tiff.
(3) Recognition and conversion module: after receiving an original document image P shot by the image acquisition module, a copy P₁ of P is created, and grayscale processing is then performed on P₁ to remove image color. The grayscale calculation method is given by formula (1) (the equation image is not reproduced here):

where f(x, y) is the image after grayscale processing, and r(x, y), g(x, y) and b(x, y) represent the R, G and B values at that point.
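A per-pixel sketch of the grayscale step. The exact weighting of formula (1) is not reproduced in the text, so the unweighted mean (r + g + b) / 3 is assumed here for illustration:

```python
def gray_value(r, g, b):
    """Grayscale value of one pixel from its R, G, B components.

    Assumption: formula (1) is the unweighted mean (r + g + b) / 3; the
    patent's equation image is not reproduced, so this is illustrative only.
    """
    return (r + g + b) / 3.0

def to_gray(image):
    """Apply the per-pixel conversion to an image given as rows of (r, g, b)."""
    return [[gray_value(r, g, b) for (r, g, b) in row] for row in image]
```

In practice this would be done in one call with OpenCV or NumPy; the pure-Python form just makes formula (1)'s structure explicit.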
Gaussian blur is then applied to image P₁ to filter out noise so that the edge image E can be extracted more accurately. Specifically: the gradient G(x, y) and edge direction θ_M of image P₁ are determined using a Sobel filter, as defined in formulas (2) and (3), wherein G_x is the vertical edge, meaning an abrupt change of the gradient in the x direction, and G_y is the horizontal edge, meaning an abrupt change of the gradient in the y direction.
G(x, y) = √(G_x² + G_y²)    (2)

θ_M = arctan(G_y / G_x)    (3)
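Assuming the standard Sobel magnitude and direction definitions for formulas (2) and (3), a minimal helper:

```python
import math

def gradient(gx, gy):
    """Gradient magnitude and edge direction from the Sobel responses:
    G = sqrt(Gx^2 + Gy^2), theta_M = arctan(Gy / Gx) (atan2 keeps the
    correct quadrant and handles Gx == 0)."""
    return math.hypot(gx, gy), math.atan2(gy, gx)
```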
Non-maximum suppression is then performed on the gradient amplitude in the gradient direction: for each pixel point i of P₁, the magnitudes of the 8 surrounding neighborhood values in a 3 × 3 region are compared along the four gradient directions of 0°, 45°, 90° and 135°; if pixel i is the maximum value, the pixel point is kept, otherwise it is set to 0. Finally, a double-threshold algorithm is used to detect and connect edges, obtaining the edge image E of the original document image P.
A copy E₁ of E is created, the set of closed edge contours is found from E₁, a quadrilateral edge is found from the contour set and determined as the edge of the original document image P, and the target document image is segmented from the original document image P using the quadrilateral edge position information.
The target document image is mapped, through its 4 vertex coordinates, into an image with the A4 paper-size ratio of a standard document, with a resolution of 1120 × 790; perspective transformation is then performed using the method shown in formula (4).
[x′, y′, w′]ᵀ = T · [x, y, 1]ᵀ    (4)
The 3 × 3 matrix on the right-hand side of formula (4) is the mapping matrix, obtained via the OpenCV library, and x and y are the coordinates of the target document image before perspective transformation. The perspective-transformed image is defined by formula (5).
f_t(x, y) = f(x′/w′, y′/w′)    (5)
The perspective-transformed image f_t(x, y) is stored in JPEG format as the paper document's image to be detected. When a paper document image is photographed at an incorrect angle, the image is distorted; the purpose of the perspective transformation is to correct the distorted image back into an A4-proportioned image.
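The point-wise math of the perspective mapping can be sketched in pure Python; in practice OpenCV's getPerspectiveTransform and warpPerspective compute the 3 × 3 matrix from the 4 vertex pairs and apply it to the whole image:

```python
def apply_homography(t, x, y):
    """Map point (x, y) through a 3x3 perspective matrix t (rows of 3),
    then normalize by w' -- the per-point form of formulas (4) and (5)."""
    xp = t[0][0] * x + t[0][1] * y + t[0][2]
    yp = t[1][0] * x + t[1][1] * y + t[1][2]
    wp = t[2][0] * x + t[2][1] * y + t[2][2]
    return xp / wp, yp / wp
```

With the identity matrix the point is unchanged; a scaling matrix doubles both coordinates, which is enough to sanity-check the normalization step.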
The recognition and conversion module receives document-class files such as doc, docx and wps imported through the document import interface, converts them into PDF, and then converts the PDF into JPEG images with a resolution of 1120 × 790 for the next processing step; scanned or copied image files such as bmp, jpg, png and tiff are likewise uniformly converted into JPEG images with a resolution of 1120 × 790 for the next processing step.
2. Central processing system
(1) Building the font recognition model. The font recognition model is built on the YOLOv5 model. YOLOv5 improves on the YOLOv3 model; it is based on the PyTorch framework, is easier to configure and use in practice, trains faster and more accurately, and reaches an object recognition speed of 140 FPS. The font recognition model consists of a Backbone network, a Neck and a Head.
The Backbone network comprises modules such as Focus, Conv convolution blocks, SSP and BottleneckCSP. When the model recognizes document fonts, the 1120 × 790 document image is first divided into single-character images, and each single-character image is resized to 640 × 640 for input into the model. The model slices the single-character image through Focus into 4 sub-images of size 320 × 320, then integrates the width and height of the sub-images through Concat, increasing the number of channels of the input image to 64. A Conv convolution block performs a convolution with kernel 3 and stride 2 on the Concat-integrated image, outputting a 160 × 160 × 128 feature image, which is then passed through a BottleneckCSP module and a Conv convolution block 3 times to become a 20 × 20 × 1024 image. The SSP module performs maximum pooling on the 20 × 20 image with four kernels of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 to improve model accuracy; a Concat connection layer integrates the pooling results, the 14-layer network of the Neck and Head performs convolution and connection operations, and single-character frames and font types are output.
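The Focus slicing step can be illustrated in pure Python. YOLOv5's actual implementation does this on tensors with strided indexing before stacking along the channel axis; the ordering of the four slices below follows the common implementation and is an assumption:

```python
def focus_slice(img):
    """Split an H x W image (nested lists) into 4 sub-images of size
    H/2 x W/2 by taking every other pixel -- the Focus operation halves
    the spatial size and quadruples the channel count."""
    return [
        [row[0::2] for row in img[0::2]],  # even rows, even columns
        [row[0::2] for row in img[1::2]],  # odd rows,  even columns
        [row[1::2] for row in img[0::2]],  # even rows, odd columns
        [row[1::2] for row in img[1::2]],  # odd rows,  odd columns
    ]
```

Applied to a 640 × 640 input this yields the 4 sub-images of 320 × 320 described above.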
The main measure of the model's loss calculation, the rectangular-frame loss, is calculated with CIOU and defined in formula (6).
CIOU_Loss = 1 − IOU + ρ²(b, b_gt) / c² + αv    (6)
In formula (6), ρ²(b, b_gt) is the squared geometric distance between the center points of the predicted and ground-truth rectangular frames, c is the diagonal length of the smallest frame covering both the predicted and ground-truth rectangular frames, IOU is the intersection-over-union of the predicted and ground-truth rectangular frames, and v measures the consistency of the aspect ratios of the predicted and ground-truth rectangular frames, as defined in formula (7).
v = (4/π²) · (arctan(w_gt / h_gt) − arctan(w / h))²    (7)
α is an adjustment parameter, defined in formula (8).
α = v / ((1 − IOU) + v)    (8)
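A small reference implementation of the CIOU rectangular-frame loss, written against the standard CIoU definition; the (x1, y1, x2, y2) box format is an assumption for illustration:

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss: 1 - IoU + rho^2/c^2 + alpha*v.

    pred, gt: boxes as (x1, y1, x2, y2). rho^2 is the squared distance
    between box centers; c^2 the squared diagonal of the smallest
    enclosing box; v the aspect-ratio consistency term; alpha its weight.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # intersection-over-union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # squared center distance and squared enclosing-box diagonal
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    # aspect-ratio term (formula (7)) and its weight (formula (8))
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2
    alpha = v / ((1 - iou) + v) if v else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes every term vanishes and the loss is 0; it grows as the boxes drift apart.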
(2) Training the font recognition model. After the model is built, the training set is produced and the model is trained.
Step 1. Data set creation. All single Chinese characters, Arabic numerals, punctuation marks, mathematical symbols, etc. commonly used in official documents are collected, and font sample pictures are made for each of the 7 commonly used official document font types, including XiaoBiaoSong (simplified), FangSong_GB2312, HeiTi, KaiTi_GB2312 and SongTi, with a resolution of not less than 32 × 32; each font sample picture contains only one Chinese character, numeral or symbol, the background color is pure white, the character color is black, and the font style is not bold.
Step 2. Sample labeling. Each font sample picture is labeled with the labelImg labeling software, with output in YOLO format; the font type codes for XiaoBiaoSong (simplified), FangSong_GB2312, HeiTi, KaiTi_GB2312 and SongTi are recorded as "xiaobiaoson", "fangsong", "heiti", "kaiti", "kaiti_gb" and "songti" respectively. After labeling is completed, a text file in txt format is output, and each txt-format labeling result file corresponds to 1 same-named sample picture. The data storage format of the labeling result file is: cls, the target category; x and y, the horizontal and vertical coordinates of the center point of the labeling box; and w and h, the width and height of the labeling box.
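The YOLO-format labeling line described above ("cls x y w h", with center coordinates and box size normalized by the image dimensions) can be parsed with a one-liner:

```python
def parse_yolo_label(line):
    """Parse one YOLO label line 'cls x y w h' into typed values:
    class index plus normalized center x, center y, width and height."""
    cls, x, y, w, h = line.split()
    return int(cls), float(x), float(y), float(w), float(h)
```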
Step 3. Model configuration. The low-complexity yolov5s model type is selected, the model depth control parameter depth_multiple is set to 0.33, and the model width control parameter width_multiple is set to 0.50. The 3 prior frame sizes under 8-, 16- and 32-fold downsampling are set to (10,13), (16,30), (33,23); (30,61), (62,45), (59,119); and (116,90), (156,198), (373,326) respectively, and the weight file is set to yolov5s.pt. In VOC.yaml, the number of categories is set to 7, the category names are set according to the font type codes, and the file addresses of the model's data set and validation set are configured.
Step 4. Model training. After model configuration is completed, the data samples are input into the model for training. Considering that the overall scale of the data set is small, the sample pictures and labeling results are divided into a training set and a validation set at a ratio of 8:2, with corresponding files sharing the same file name; the sample pictures and sample labeling results are stored in images and labels folders respectively. Under both the images and labels folders, 1 train folder and 1 val folder are created, storing the training set and validation set respectively.
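The 8:2 train/validation split described in Step 4 can be sketched as a shuffled cut of the shared file names (seeded here for reproducibility; the seed and helper name are illustrative):

```python
import random

def split_dataset(names, ratio=0.8, seed=0):
    """Shuffle sample file names deterministically and split them into
    training and validation lists at the given ratio."""
    rng = random.Random(seed)
    names = sorted(names)   # canonical order before shuffling
    rng.shuffle(names)
    cut = int(len(names) * ratio)
    return names[:cut], names[cut:]
```

Each name would then be used to copy both the picture (images/) and the same-named label file (labels/) into the train or val subfolder.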
Model training is then started; after training is finished, document font recognition can begin.
(3) Document single-character segmentation. Before document font recognition, the document must be divided into single-character images. Segmentation first binarizes the document image to be recognized, then horizontally projects the binarized image on the y axis to obtain a binary image of each line, then vertically projects each line's binary image on the x axis, and finally determines the start and end positions of each character from counts of projected black pixels and white gap pixels, thereby obtaining the position coordinates of each single character, drawing a rectangular frame, performing character segmentation, and recording each character's number and position information.
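The projection-based segmentation above can be sketched on a binarized image represented as nested lists (1 for ink pixels). This is a minimal sketch of the idea, not the patent's exact implementation:

```python
def runs(profile):
    """(start, end) index pairs of contiguous nonzero runs in a projection
    profile, i.e. the start/end positions between white gaps."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def segment_chars(binary):
    """Horizontal projection on the y axis finds text lines; vertical
    projection of each line on the x axis finds character columns.
    Returns character boxes as (x1, y1, x2, y2)."""
    row_profile = [sum(row) for row in binary]
    boxes = []
    for y1, y2 in runs(row_profile):
        col_profile = [sum(row[x] for row in binary[y1:y2])
                       for x in range(len(binary[0]))]
        for x1, x2 in runs(col_profile):
            boxes.append((x1, y1, x2, y2))
    return boxes
```

Each returned box would then be cropped, numbered, and sent to the font recognition model.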
(4) Font discrimination. The segmented single-character images are input into the model to obtain the font recognition results and position information.
3. Display interpretation terminal
According to the font recognition result and position information of each single character, a character frame (font position rectangular frame) is drawn for each character on the image to be recognized using the rectangle() function of the python cv2 package, the recognized font type code is marked, and a right-click font-name auxiliary prompt is set: when the right mouse button slides over the font labeling information, the font type code is automatically expanded into the original font name (XiaoBiaoSong (simplified), FangSong_GB2312, HeiTi, KaiTi_GB2312 or SongTi), making interpretation convenient for official document appraisers.
The beneficial effects obtained by the invention are as follows:
Font recognition is performed based on image processing technology, overcoming the limitation that traditional text processing technology cannot interpret fonts in image documents such as paper scans, photos and PDFs. Automatic machine judgment replaces manual character-by-character checking, avoiding misjudgment of similar, easily confused fonts, with higher speed and accuracy, improving the efficiency of official document quality assessment. The method is not limited to official document font identification: in graphic design and multimedia production it can help users identify a font category from a rendered font image, laying a foundation for accurate font lookup and efficient creation.
This embodiment takes the 7 font styles commonly used in official documents, including XiaoBiaoSong (simplified), FangSong_GB2312, HeiTi, KaiTi_GB2312 and SongTi, as examples. The experimental environment is: Windows 10 operating environment; python 3.9.7 programming language; Anaconda3 package manager and environment manager; CPU: Intel(R) Core(TM) i7-10875H; graphics card: NVIDIA GeForce RTX 2070; memory: 32 GB.
1. First, the document acquisition terminal converts paper documents into JPEG images through operations such as photographing, edge detection, image segmentation and perspective transformation; electronic document files such as doc, docx and wps are converted into PDF (Portable Document Format) and then into JPEG images for the next processing step; and scanned or copied image-format files such as bmp, jpg, png and tiff are uniformly converted into JPEG images for the next processing step.
2. To build the font recognition model training data set, 7,889 characters commonly used in official documents are collected and stored in 1 docx document; the document is copied 7 times, each copy is named after one of the 7 fonts, and all character fonts in each document are set to the font corresponding to the document name. All 7 docx documents are set with pages 5 cm wide and 5 cm high, a pure-white background and 0.5 cm margins, font size 100, black and not bold, so that each page holds exactly 1 character.
3. The character set documents are opened with the client subpackage functions of the python win32com module, converted into pdf-format documents with the ExportAsFixedFormat() function, and the pdf documents are opened with the fitz module; each page image of the document is obtained and exported, with the image named "font name + page number", in JPEG format, at a resolution of 189 × 189.
Given the large number of character samples, to speed up sample training and make the sample text approximate real text sizes, the 189 × 189 page images are compressed to 30 × 30. For labeling, since manual labeling is time-consuming and labor-intensive while the labeling target is single and of fixed size, a program automatically generates the samples: referring to the xml output format of the labelImg labeling software, attribute tags such as <folder> <filename> <path> <source> <size> are written according to the sample pictures' attribute information, the upper-left and lower-right coordinates of the character in each sample picture are set to (5,5) and (25,25) respectively, labeling files in xml format are generated in batches, and the labeling files are converted in batches into YOLO-format labeling result files.
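For each 30 × 30 sample with a fixed character box from (5,5) to (25,25), the batch label generation reduces to one normalized YOLO line per picture; a sketch (helper name hypothetical):

```python
def yolo_line(cls, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into the YOLO label line
    'cls cx cy w h', with the box center and size normalized by the
    image dimensions."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```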
4. The 55,223 font sample pictures of the 7 types and the 55,223 YOLO-format labeling result files are stored in corresponding directories, with sample picture names corresponding one-to-one with labeling result file names; 43,563 files form the training set and 11,660 form the validation set. After the model is configured, the samples and labeling result files are input into the model with training parameters batch-size: 16, workers: 0 and epochs: 50; the training result is shown in figure 5.
TABLE 1 model training index statistical table
After 50 rounds of iterative training, the model accuracy is 96.7%, the recall is 96.2%, and mAP_0.5 is 98.9%; see table 1, where train/box_loss is the training-set frame loss, train/obj_loss the training-set confidence loss, train/cls_loss the training-set category loss, val/box_loss the validation-set frame loss, val/obj_loss the validation-set confidence loss, and val/cls_loss the validation-set category loss. The model thus achieves good index results. The trained model was used to test a group of font test pictures of 16 characters, including 13 KaiTi and 3 XiaoBiaoSong characters; as shown in fig. 6, the model accurately predicted the font types and gave confidence values for the corresponding font types.
5. 1 test document image is binarized, then projected horizontally and vertically, then character-segmented; the segmented characters are numbered in the form "c(serial number)", and each character's position coordinates (x₁, y₁, x₂, y₂) are recorded. The character images are input into the trained model for font recognition and the model outputs the result, as shown in fig. 7: the left side is the single-character images segmented from the test document image, and the right side is the model's font recognition result, where red marks recognized XiaoBiaoSong characters, light yellow recognized KaiTi characters, and pink recognized FangSong_GB2312 characters.
6. For the recognized font positions and type information, font position rectangular frames are drawn on the document image with the rectangle() function of the python cv2 package, and the plt.text(x, y, s) function of the matplotlib package is called to mark the font type information, where x and y are respectively the horizontal and vertical coordinates of the midpoint of the marking frame and s is the font's type information; the marked result document is displayed on the terminal display device, assisting official document appraisers in rapid document quality assessment.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".
Those of skill in the art will also appreciate that the various illustrative logical blocks, elements, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium; thus, software is included if it is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wirelessly, e.g., by infrared, radio, or microwave. Disk and disc, as used herein, include compact disc, laser disc, optical disc, DVD, floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above may also be included within computer-readable media.
The above-mentioned embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims (10)

1. A method for distinguishing the font types of characters in a confidential official document, characterized by comprising the following steps:
when the official document to be recognized is of text type, converting the text-type document into PDF format and then into a document image in a preset format; when the official document to be recognized is an image, converting it into a document image in the preset format; when the official document to be recognized is paper, photographing the paper document with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to the photograph to obtain a document image in the preset format; and taking the document image in the preset format as the document image to be detected;
performing binarization on the document image to be detected, projecting the binarized image horizontally and vertically, segmenting the document into characters according to the projected black pixels and white interval pixels to obtain an image of each character, and recording the position information of each character;
performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition is complete; the font recognition result comprises the font type and the position information of each character;
drawing a rectangular frame for each character on the document image to be detected according to the position information of each character, labeling the font type code of the corresponding character in the rectangular frame, and configuring the font type code to be prompted when the mouse slides over the frame;
and, when the mouse slides over a font type code on the document image to be detected, automatically restoring the code to the corresponding font type name and displaying it.
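The last two steps of claim 1 amount to a code-to-name lookup. A minimal sketch, assuming hypothetical integer codes and the five font names listed in claim 5 (the actual coding scheme is not specified in the claims):

```python
# Hypothetical font-type-code table: the claims do not specify the actual
# numeric codes, so these integers and the transliterated font names are
# assumptions based on the five font types listed in claim 5.
FONT_CODE_NAMES = {
    0: "XiaoBiaoSong Simplified",
    1: "FangSong_GB2312",
    2: "Heiti",
    3: "KaiTi_GB2312",
    4: "SongTi",
}

def restore_font_name(code: int) -> str:
    """Restore a font type code to its display name, as in the last
    step of claim 1 (hovering restores the code to a readable name)."""
    return FONT_CODE_NAMES.get(code, "unknown")
```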
2. The confidential official document font type distinguishing method according to claim 1, wherein, when the official document to be recognized is a paper document, photographing the paper document with a document acquisition terminal and sequentially applying grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in the preset format specifically comprises:
a high-definition camera automatically photographs the paper official document to be recognized according to a received image acquisition instruction and outputs an original image P of the paper official document to be recognized;
creating a copy P1 of the original document image P and performing grayscale processing on P1; after grayscale processing, filtering noise with a Gaussian blur, then obtaining the edge image E of the original document image P; specifically:
determining the gradient G(x, y) and the edge direction θM of P1 with a Sobel filter, where Gx is the vertical-edge component, indicating an abrupt gradient change in the x direction, and Gy is the horizontal-edge component, indicating an abrupt gradient change in the y direction; applying non-maximum suppression to the gradient magnitude along the gradient direction: for each pixel i of P1, comparing its value with the 8 surrounding neighbors of its 3 x 3 region along the four gradient directions 0°, 45°, 90° and 135°, keeping the pixel if it is the maximum and otherwise setting it to 0; and detecting and connecting edges with a double-threshold algorithm to obtain the edge image E of the original document image P;
creating a copy E1 of E, obtaining the set of closed edge contours of E1, determining a quadrilateral edge from the contour set and taking it as the edge of the original document image P; and segmenting the target document image out of the original document image P according to the position information of the quadrilateral edge;
and mapping the target document image, via its 4 vertex coordinates, onto an image with the A4 paper aspect ratio of a standard official document, correcting each character of the target document image into a visually well-proportioned pattern through perspective transformation, and forming the document image in the preset format after the correction.
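The rectification step of claim 2 — mapping the 4 detected quadrilateral vertices onto an A4-ratio rectangle — is a standard perspective (homography) transform. A pure-NumPy sketch of solving for that transform from the 4 point pairs; the corner coordinates below are made up for illustration, and in practice cv2.getPerspectiveTransform / cv2.warpPerspective would compute and apply the warp to the whole image:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 perspective transform H that maps the 4 source
    points src[i] onto the 4 destination points dst[i] (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, p):
    """Apply the homography H to a single (x, y) point."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return (x / w, y / w)

# Made-up corners of a skewed photographed document, and an A4-ratio
# target rectangle (210 x 297 mm scaled by 2).
quad = [(12.0, 8.0), (580.0, 30.0), (600.0, 820.0), (5.0, 790.0)]
a4 = [(0.0, 0.0), (420.0, 0.0), (420.0, 594.0), (0.0, 594.0)]
H = homography_from_points(quad, a4)
```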
3. The confidential official document font type distinguishing method according to claim 2, wherein performing binarization on the document image to be detected, projecting the binarized image horizontally and vertically, and segmenting the document into characters according to the projected black pixels and white interval pixels to obtain an image of each character specifically comprises:
binarizing the document image to be detected to obtain a binarized document image; projecting the binarized document image horizontally onto the y axis to obtain a binary image of each text line, projecting the binary image of each line vertically onto the x axis, determining the start and end positions of each character in the document image to be detected from the projected black pixels and white interval pixels, and obtaining the position coordinates of each character from its start and end positions;
drawing a segmentation rectangle for each character in the document image to be detected according to its position coordinates, cutting the character out through the rectangle to obtain the image and position of each character, numbering each character image in a preset form, and recording each character number together with the corresponding character position information.
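The projection-based segmentation of claim 3 can be sketched as follows: sum black pixels along each axis, and treat runs of nonzero sums separated by white gaps as text lines and characters. A minimal NumPy illustration (a real document would need thresholds to tolerate scanner noise):

```python
import numpy as np

def runs(profile):
    """Return [start, end) index pairs of consecutive nonzero entries,
    i.e. the black-pixel runs separated by white interval pixels."""
    idx = np.flatnonzero(profile > 0)
    if idx.size == 0:
        return []
    groups = np.split(idx, np.where(np.diff(idx) > 1)[0] + 1)
    return [(int(g[0]), int(g[-1]) + 1) for g in groups]

def segment_characters(binary):
    """binary: 2D array with 1 = black ink. The horizontal projection
    (sum over x) finds text lines; the vertical projection within each
    line (sum over y) finds characters. Returns (y0, y1, x0, x1) boxes."""
    boxes = []
    for y0, y1 in runs(binary.sum(axis=1)):
        line = binary[y0:y1]
        for x0, x1 in runs(line.sum(axis=0)):
            boxes.append((y0, y1, x0, x1))
    return boxes
```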
4. The confidential official document font type distinguishing method according to claim 3, wherein performing font recognition on each character image with the trained font recognition model and outputting a font recognition result after recognition specifically comprises:
inputting each character image into the trained font recognition model; slicing each character image through the Focus module of the Backbone network to obtain 4 sub-images of equal size; integrating the width and height of the sub-images through Concat and increasing the channel count of the input image to 64; applying a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pooling the second feature image at four scales with the SPP module; integrating the pooling results through a Concat layer, applying convolution and concatenation through the 14-layer network of the Neck and Head, outputting the font type of each character, and outputting the font recognition result after recognition is complete;
wherein drawing a rectangular frame for each character on the document image to be detected according to the position information of each character, labeling the font type code of the corresponding character in the rectangular frame, and prompting the font type code when the mouse slides over specifically comprises:
drawing a character-position rectangle on each character in the document image to be detected, according to the position information of each character, with the cv2.rectangle() function of the Python cv2 package; and labeling the font type code in the character-position rectangle by calling the plt.text(x, y, s) function of the matplotlib package, where x and y are the abscissa and ordinate of the midpoint of the character-position rectangle and s is the font type code of the character; the font type code is displayed when the mouse slides over the character-position rectangle.
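Claim 4 labels each character via cv2.rectangle() and plt.text(x, y, s) at the box midpoint. A small helper sketching how those plt.text arguments would be derived from a character box; the (x0, y0, x1, y1) box format is an assumption, since the claims do not fix a coordinate convention:

```python
def label_args(box, font_code):
    """Given a character box (x0, y0, x1, y1) — this box format is an
    assumption — return the (x, y, s) arguments that claim 4 passes to
    matplotlib's plt.text(): the box midpoint and the font type code.
    The rectangle itself would be drawn with
    cv2.rectangle(img, (x0, y0), (x1, y1), color, thickness)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, str(font_code))
```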
5. The confidential official document font type distinguishing method according to claim 1, wherein the font recognition model is trained as follows:
collecting the single Chinese characters, Arabic numerals, punctuation marks and mathematical symbols used in official documents, and producing font sample pictures with rectangular frames for each font type used in official documents, where the background of a font sample picture is pure white, the characters are black, and the font is not bold; each font sample picture contains one Chinese character, digit or symbol; and the font types include at least one of: XiaoBiaoSong Simplified, FangSong_GB2312, Heiti, KaiTi_GB2312 and SongTi;
labeling the font type in each font sample picture with a font type code using the labelImg annotation software, and outputting a text-format label file, each label file sharing its name with the corresponding font sample picture; taking the label files and their same-named font sample pictures as the data set, and dividing the data set into a training set and a validation set; where the data in each label file comprise: cls, the font type; x and y, the horizontal and vertical coordinates of the center of the rectangular frame; and w and h, the width and height of the rectangular frame;
configuring the font recognition model: setting the depth control parameter depth_multiple to 0.33 and the width control parameter width_multiple to 0.50; setting the 3 prior box sizes at each of the 8x, 16x and 32x downsampling scales to (10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198) and (373,326) respectively, and setting the weight file to yolov5s.pt; setting the number of classes to 7 in the data configuration file VOC.yaml, setting the font type names according to the font type codes, and configuring the paths of the training set and validation set of the font recognition model;
for the training set, dividing the document images into single-character images and resizing each single-character image to a preset size for input to the YOLOV5 model; the YOLOV5 model comprises a Backbone network, a Neck and a Head; the Backbone comprises Focus, Concat, Conv convolution blocks, SPP and BottleneckCSP;
slicing each single-character image through Focus to obtain 4 sub-images of equal size; integrating the width and height of the sub-images through Concat and increasing the channel count of the input image to 64; applying a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and outputting a first feature image; passing the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pooling the second feature image at four scales with the SPP module; integrating the pooling results through a Concat layer, applying convolution and concatenation through the 14-layer network of the Neck and Head, and outputting a bounding box and font type for each single character;
and validating with the validation set to obtain the trained font recognition model.
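The Focus slicing described in claims 4 and 5 takes every other pixel by row/column parity, producing 4 half-resolution sub-images stacked along the channel axis. A NumPy sketch of that rearrangement (the YOLOv5 original follows it with a convolution, omitted here):

```python
import numpy as np

def focus_slice(img):
    """YOLOv5-style Focus slicing: pick pixels by row/column parity to get
    4 half-resolution sub-images, stacked along the channel axis, so the
    spatial size halves and the channel count quadruples."""
    return np.concatenate([
        img[:, ::2, ::2],    # even rows, even cols
        img[:, 1::2, ::2],   # odd rows, even cols
        img[:, ::2, 1::2],   # even rows, odd cols
        img[:, 1::2, 1::2],  # odd rows, odd cols
    ], axis=0)

# A 3-channel 8x8 toy image: Focus yields 12 channels of 4x4.
x = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
y = focus_slice(x)
```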
6. A system for distinguishing the font types of characters in a confidential official document, characterized by comprising:
a document preprocessing unit, configured to: when the official document to be recognized is of text type, convert the text-type document into PDF format and then into a document image in a preset format; when the official document to be recognized is an image, convert it into a document image in the preset format; when the official document to be recognized is paper, photograph the paper document with a document acquisition terminal and sequentially apply grayscale conversion, denoising, edge detection, image segmentation and perspective transformation to obtain a document image in the preset format; and take the document image in the preset format as the document image to be detected;
a document segmentation unit, configured to binarize the document image to be detected, project the binarized image horizontally and vertically, segment the document into characters according to the projected black pixels and white interval pixels to obtain an image of each character, and record the position information of each character;
a character type recognition unit, configured to perform font recognition on each character image with the trained font recognition model and output a font recognition result after recognition is complete; the font recognition result comprises the font type and the position information of each character;
a character type labeling unit, configured to draw a rectangular frame for each character on the document image to be detected according to the position information of each character, label the font type code of the corresponding character in the rectangular frame, and configure the font type code to be prompted when the mouse slides over the frame;
and a display unit, configured to automatically restore a font type code to the corresponding font type name and display it when the mouse slides over the code on the document image to be detected.
7. The confidential official document font type distinguishing system according to claim 6, wherein the document preprocessing unit comprises a paper document preprocessing subunit specifically configured to:
photograph the paper official document to be recognized automatically with a high-definition camera according to a received image acquisition instruction, and output an original image P of the paper official document to be recognized;
create a copy P1 of the original document image P and perform grayscale processing on P1; after grayscale processing, filter noise with a Gaussian blur, then obtain the edge image E of the original document image P; specifically:
determine the gradient G(x, y) and the edge direction θM of P1 with a Sobel filter, where Gx is the vertical-edge component, indicating an abrupt gradient change in the x direction, and Gy is the horizontal-edge component, indicating an abrupt gradient change in the y direction; apply non-maximum suppression to the gradient magnitude along the gradient direction: for each pixel i of P1, compare its value with the 8 surrounding neighbors of its 3 x 3 region along the four gradient directions 0°, 45°, 90° and 135°, keep the pixel if it is the maximum and otherwise set it to 0; and detect and connect edges with a double-threshold algorithm to obtain the edge image E of the original document image P;
create a copy E1 of E, obtain the set of closed edge contours of E1, determine a quadrilateral edge from the contour set and take it as the edge of the original document image P; and segment the target document image out of the original document image P according to the position information of the quadrilateral edge;
and map the target document image, via its 4 vertex coordinates, onto an image with the A4 paper aspect ratio of a standard official document, correct each character of the target document image into a visually well-proportioned pattern through perspective transformation, and form the document image in the preset format after the correction.
8. The confidential official document font type distinguishing system according to claim 7, wherein the document segmentation unit is specifically configured to:
binarize the document image to be detected to obtain a binarized document image; project the binarized document image horizontally onto the y axis to obtain a binary image of each text line, project the binary image of each line vertically onto the x axis, determine the start and end positions of each character in the document image to be detected from the projected black pixels and white interval pixels, and obtain the position coordinates of each character from its start and end positions;
and draw a segmentation rectangle for each character in the document image to be detected according to its position coordinates, cut the character out through the rectangle to obtain the image and position of each character, number each character image in a preset form, and record each character number together with the corresponding character position information.
9. The confidential official document font type distinguishing system according to claim 8, wherein the character type recognition unit is specifically configured to:
input each character image into the trained font recognition model; slice each character image through the Focus module of the Backbone network to obtain 4 sub-images of equal size; integrate the width and height of the sub-images through Concat and increase the channel count of the input image to 64; apply a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and output a first feature image; pass the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pool the second feature image at four scales with the SPP module; integrate the pooling results through a Concat layer, apply convolution and concatenation through the 14-layer network of the Neck and Head, output the font type of each character, and output the font recognition result after recognition is complete;
the character type labeling unit is specifically configured to:
draw a character-position rectangle on each character in the document image to be detected, according to the position information of each character, with the cv2.rectangle() function of the Python cv2 package; and label the font type code in the character-position rectangle by calling the plt.text(x, y, s) function of the matplotlib package, where x and y are the abscissa and ordinate of the midpoint of the character-position rectangle and s is the font type code of the character; the font type code is displayed when the mouse slides over the character-position rectangle.
10. The confidential official document font type distinguishing system according to claim 6, further comprising a font recognition model training unit specifically configured to:
collect the single Chinese characters, Arabic numerals, punctuation marks and mathematical symbols used in official documents, and produce font sample pictures with rectangular frames for each font type used in official documents, where the background of a font sample picture is pure white, the characters are black, and the font is not bold; each font sample picture contains one Chinese character, digit or symbol; and the font types include at least one of: XiaoBiaoSong Simplified, FangSong_GB2312, Heiti, KaiTi_GB2312 and SongTi;
label the font type in each font sample picture with a font type code using the labelImg annotation software, and output a text-format label file, each label file sharing its name with the corresponding font sample picture; take the label files and their same-named font sample pictures as the data set, and divide the data set into a training set and a validation set; where the data in each label file comprise: cls, the font type; x and y, the horizontal and vertical coordinates of the center of the rectangular frame; and w and h, the width and height of the rectangular frame;
configure the font recognition model: set the depth control parameter depth_multiple to 0.33 and the width control parameter width_multiple to 0.50; set the 3 prior box sizes at each of the 8x, 16x and 32x downsampling scales to (10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198) and (373,326) respectively, and set the weight file to yolov5s.pt; set the number of classes to 7 in the data configuration file VOC.yaml, set the font type names according to the font type codes, and configure the paths of the training set and validation set of the font recognition model;
for the training set, divide the document images into single-character images and resize each single-character image to a preset size for input to the YOLOV5 model; the YOLOV5 model comprises a Backbone network, a Neck and a Head; the Backbone comprises Focus, Concat, Conv convolution blocks, SPP and BottleneckCSP;
slice each single-character image through Focus to obtain 4 sub-images of equal size; integrate the width and height of the sub-images through Concat and increase the channel count of the input image to 64; apply a convolution with kernel size 3 and stride 2 to the Concat-integrated sub-images with a Conv block and output a first feature image; pass the first feature image through a BottleneckCSP module and a Conv block 3 times to obtain a second feature image; max-pool the second feature image at four scales with the SPP module; integrate the pooling results through a Concat layer, apply convolution and concatenation through the 14-layer network of the Neck and Head, and output a bounding box and font type for each single character;
and validate with the validation set to obtain the trained font recognition model.
CN202210656569.8A 2022-06-10 2022-06-10 Method and system for distinguishing type of confidential documents Pending CN115063818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210656569.8A CN115063818A (en) 2022-06-10 2022-06-10 Method and system for distinguishing type of confidential documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210656569.8A CN115063818A (en) 2022-06-10 2022-06-10 Method and system for distinguishing type of confidential documents

Publications (1)

Publication Number Publication Date
CN115063818A true CN115063818A (en) 2022-09-16

Family

ID=83199560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210656569.8A Pending CN115063818A (en) 2022-06-10 2022-06-10 Method and system for distinguishing type of confidential documents

Country Status (1)

Country Link
CN (1) CN115063818A (en)

Similar Documents

Publication Publication Date Title
CN109657665B (en) Invoice batch automatic identification system based on deep learning
CN106156761B (en) Image table detection and identification method for mobile terminal shooting
JP6139396B2 (en) Method and program for compressing binary image representing document
JP4676225B2 (en) Method and apparatus for capturing electronic forms from scanned documents
CN109784342A (en) A kind of OCR recognition methods and terminal based on deep learning model
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN112989921A (en) Target image information identification method and device
CN112949471A (en) Domestic CPU-based electronic official document identification reproduction method and system
CN108052955B (en) High-precision Braille identification method and system
CN111626145A (en) Simple and effective incomplete form identification and page-crossing splicing method
CN109147002B (en) Image processing method and device
CN109726369A (en) A kind of intelligent template questions record Implementation Technology based on normative document
CN114529932A (en) Credit investigation report identification method
CN113936764A (en) Method and system for desensitizing sensitive information in medical report sheet photo
CN109508712A (en) A kind of Chinese written language recognition methods based on image
CN111008635A (en) OCR-based multi-bill automatic identification method and system
CN116758550A (en) Text recognition method and device for form image, electronic equipment and storage medium
CN101609453B (en) A kind of separator page and utilize the method and apparatus of document classification of this separator page
US9378428B2 (en) Incomplete patterns
CN101335811B (en) Printing method, and printing apparatus
CN116403233A (en) Image positioning and identifying method based on digitized archives
CN115063818A (en) Method and system for distinguishing type of confidential documents
CN113065559B (en) Image comparison method and device, electronic equipment and storage medium
CN111612045B (en) Universal method for acquiring target detection data set
KR100957508B1 (en) System and method for recognizing optical characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination