CN116543389A - Character recognition method, device, equipment and medium based on relational network - Google Patents

Character recognition method, device, equipment and medium based on relational network

Info

Publication number
CN116543389A
CN116543389A (application CN202310236026.5A)
Authority
CN
China
Prior art keywords
character
character recognition
network
relational network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310236026.5A
Other languages
Chinese (zh)
Other versions
CN116543389B (en)
Inventor
肖剑波
俞翔
谢海燕
张乔斌
楼京俊
黎恒智
张振海
胡世峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA filed Critical Naval University of Engineering PLA
Priority to CN202310236026.5A
Publication of CN116543389A
Application granted
Publication of CN116543389B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/141 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method, device, equipment and medium based on a relation network. The method comprises the following steps: acquiring an image of handwritten characters and preprocessing the image to obtain a preprocessed image data set; taking the preprocessed image data set as the input of a pre-trained relation network and acquiring the output of the relation network; performing recognition post-processing on that output with a language model, outputting character recognition results that meet a probability requirement as target characters, and returning results that do not meet the requirement to the relation network for re-recognition. Training of the relation network comprises: extracting feature maps of a support set and a query set respectively through an embedding function; performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample; and forming a sentence from the recognition results of the samples as the output of the relation network.

Description

Character recognition method, device, equipment and medium based on relational network
Technical Field
The present invention relates to a character image recognition technology in the field of computers, and in particular, to a method, an apparatus, a device, and a medium for character recognition based on a relational network.
Background
Currently, character recognition (Optical Character Recognition, OCR) technology is applied in many fields and can, on many occasions, replace the keyboard to complete character input tasks quickly and efficiently.
Handwritten characters are influenced by each writer's pen-holding style, writing habits, cultural background and other factors, so the written characters vary widely and are difficult to recognize.
In the related art, a deep-learning neural network is generally used to learn the mapping between image and text from a large amount of labelled data, thereby recognizing the characters in an image; however, the heavy iteration over label content and data severely limits extensibility to new categories. In some special fields, documents need to be annotated by hand, and the recognition accuracy of such models on the annotation characters is insufficient, making them ill-suited to character recognition in this scenario.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a character recognition method, device, equipment and medium based on a relation network, so as to solve at least one of the above technical problems.
According to an aspect of the present invention, there is provided a character recognition method based on a relational network, including:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
In this technical scheme, the simple, flexible and general framework of the relation network for small-sample tasks is used to recognize handwritten text or handwritten annotations on office files. Compared with a conventional deep-learning method, a large number of learning labels and data iterations can be saved, which facilitates extension to new categories; in addition, the recognition result of the relation network is further processed with a language model, further improving the character recognition accuracy on handwritten text.
For single-character recognition, the scheme performs recognition via embedding mapping and correlation calculation in the relation network; the recognition result is then checked with the neural-network language model GPT-3, which computes the probability that a character appears at a given position in a semantically coherent sentence. Recognition results that meet the probability requirement are output as target characters, ensuring the accuracy of the output.
As a further technical solution, performing feature map concatenation and relevance score calculation on each sample feature map of the query set further includes: concatenating one sample feature map of the query set with all sample feature maps of the support set to obtain concatenated feature maps, calculating a relevance score for each concatenated feature map, and outputting the character with the highest score as the recognition result for the current sample.
Further, the query set contains multiple preprocessed images, which are fed into the relation network one by one for recognition; each preprocessed image can be regarded as one image sample, i.e. one sample to be recognized. For a single sample to be recognized, its feature map is extracted, feature maps are extracted from all samples in the support set, and the feature map of the sample to be recognized is concatenated one-to-one with the feature maps of all support-set samples to obtain the concatenated feature maps. Relevance scores are then calculated for the concatenated feature maps one by one, i.e. the relevance between the sample to be recognized and every sample in the support set is computed, and the support sample with the highest score is output as the recognition result of the sample to be recognized.
As a further technical solution, the preprocessing includes: and sequentially carrying out graying, noise reduction, binarization, character segmentation and normalization on the image.
As a further technical scheme, after the preprocessing, the method further comprises: horizontally blurring the preprocessed character image to form a connected region; performing vertical projection on the connected region to obtain a projection curve; and calculating the angle of the slanted characters from the projection curve and applying a spatial rotation transformation to the pixel coordinates of the slanted characters to complete the slant correction.
Preferably, bilinear interpolation is applied during the correction, and the corrected character image is smoothed.
As a further technical scheme, the relational network comprises an embedding module and a correlation module;
the embedding module comprises four convolution blocks and is used to extract the feature maps of the support set and the query set from their input images respectively;
the correlation module comprises two convolution blocks and two fully connected layers; the two convolution blocks process the feature map of the query set concatenated in series with the feature map of the corresponding support-set image sample, the two fully connected layers convert the two-dimensional feature map output by the convolutions into a one-dimensional vector, and the relevance score is then calculated from that vector with a Sigmoid function.
This technical scheme makes full use of the simple, flexible and general framework of the relation network for small-sample tasks. It is an end-to-end network that, once trained, can classify samples from new classes without any updates, which solves the problem that existing character recognition based on deep-learning neural networks severely limits extension to new classes.
As a further technical scheme, the sentence recognized by the relation network is input into a GPT-3 language model to predict the probability of the sentence occurring in the language; when the predicted probability is below a set threshold, the sentence is returned to the relation network for re-recognition, otherwise the target characters are output.
Further, the threshold may be determined from an allowable error, which in turn depends on the required recognition accuracy.
According to an aspect of the present invention, there is provided a character recognition apparatus based on a relational network, comprising:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
In this technical scheme, after the acquisition module obtains the preprocessed image data set, recognition is performed by the relation network recognition module and recognition post-processing by the language model recognition post-processing module in turn; characters that fail the probability requirement in post-processing are returned to the relation network recognition module for re-recognition, which guarantees character recognition accuracy. At the same time, the scheme can classify samples from new classes without any updates, is not constrained by iterative data computation, and achieves extensibility to new classes.
According to an aspect of the present description, there is provided an electronic device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the relational network based character recognition method.
According to an aspect of the present description, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the relational network-based character recognition method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method, which utilizes a simple, flexible and universal framework of a relation network in a small sample task to identify handwriting notes of handwriting texts or office files, and compared with a deep learning method, a large number of learning labels and data iterations can be reduced, so that the expansion of new categories is facilitated; in addition, the recognition result of the relation network is further processed based on the language model, so that the character recognition accuracy of the handwritten text is further improved.
The invention provides a device, which is characterized in that after a preprocessed image data set is obtained through an acquisition module, the device sequentially carries out recognition processing through a relational network recognition module and recognition post-processing through a language model recognition post-processing module, and the character which does not meet probability requirements in the recognition post-processing is returned to the relational network recognition module for re-recognition, so that the character recognition precision is ensured; meanwhile, the technical scheme can classify a sample from a new class without any update, is not limited by data iterative computation, and realizes the expandability of the new class.
Drawings
Fig. 1 is a flowchart of a character recognition method based on a relational network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a tilt character correction flow according to an embodiment of the invention.
Fig. 3 is a schematic diagram of a relational network training process according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a network structure of a relational network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a character recognition apparatus based on a relational network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The embodiment of the invention provides a character recognition method based on a relational network, which is shown in fig. 1 and comprises the following steps:
and step 1, acquiring a handwriting image dataset of the document.
Specifically, a camera can be used to capture character images in an office environment; to obtain pictures with a high recognition rate, the sharpness of the images should be ensured as far as possible when shooting manually.
Alternatively, the image of the character to be recognized may also be scanned or otherwise acquired.
And 2, preprocessing the acquired image data set.
Most documents in an office scene carry annotations, so most pictures are captured in color; therefore, after the character images to be recognized are acquired, image preprocessing is needed. Specifically, the picture can first be converted to grayscale with a built-in MATLAB function; the picture is then denoised with MATLAB's built-in wavelet threshold denoising; the image is then binarized with an OpenCV thresholding function; finally, character segmentation and normalization are performed so that a unified algorithm can be applied subsequently.
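As a rough illustration, the grayscale/binarize/normalize chain can be sketched in NumPy. The function name, the mean-intensity threshold and the nearest-neighbour resize below are illustrative stand-ins for the MATLAB wavelet denoising and OpenCV thresholding named above, not the patent's implementation:

```python
import numpy as np

def preprocess_char_image(rgb, out_size=(32, 32)):
    """Grayscale -> global-threshold binarisation -> size normalisation.

    `rgb` is an HxWx3 uint8 array. The mean-intensity threshold is a
    simple stand-in for the wavelet-denoise + OpenCV threshold steps.
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Binarise: ink (dark) pixels -> 1, background -> 0.
    binary = (gray < gray.mean()).astype(np.float32)
    # Nearest-neighbour resize to the normalised input size.
    h, w = binary.shape
    rows = np.arange(out_size[0]) * h // out_size[0]
    cols = np.arange(out_size[1]) * w // out_size[1]
    return binary[np.ix_(rows, cols)]
```

A character segmenter would run before this on a page image; here a single character crop is assumed.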
Optionally, correction processing of the oblique character may also be performed on the preprocessed image dataset, as shown in fig. 2. The character recognition accuracy of the relational network is further guaranteed based on an OpenCV character inclination correction algorithm.
The OpenCV-based character slant correction algorithm can proceed as follows: horizontally blur the preprocessed character image to form a larger connected region; perform vertical projection to obtain a projection curve; calculate the slant angle of the characters from the projection curve; and apply a spatial rotation transformation to the pixel coordinates of the slanted characters to complete the correction.
Since the coordinates obtained after the transformation must be rounded, some image distortion is inevitable; bilinear interpolation is used here to reduce the rounding distortion, and the corrected binary image is smoothed to remove burr points introduced by the interpolation.
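A minimal sketch of projection-based slant estimation, under simplifying assumptions: the image is already binary, slant is modelled as a per-row horizontal shear, and the best angle is the one whose sheared vertical projection has the highest variance (a sharper profile). The function name and the variance criterion are illustrative, not the patent's exact OpenCV routine:

```python
import numpy as np

def estimate_skew(binary, angles=np.linspace(-15, 15, 31)):
    """Estimate the slant angle (degrees) of a binary character image."""
    h, w = binary.shape
    best_angle, best_score = 0.0, -1.0
    ys = np.arange(h)
    for a in angles:
        shift = np.tan(np.radians(a)) * ys        # per-row horizontal shift
        proj = np.zeros(w)
        for y in range(h):
            # Shear the row, then accumulate the vertical projection.
            proj += np.roll(binary[y], int(round(shift[y])))
        score = proj.var()                        # sharper profile = higher variance
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

Rotating (or un-shearing) by the returned angle with bilinear resampling would complete the correction step described above.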
And 3, extracting the characteristics of the preprocessed image dataset.
The relationship network identification step includes two parts: feature extraction and relevance score calculation. The relationship network is an end-to-end network that, once trained, classifies a sample from a new class without any updates.
The relation network uses a meta learning method, and the core idea of the relation network is to learn an embedding function, map an input space (character picture in the present invention) to a new embedding space, and have a similarity measure in the new embedding space to distinguish different classes.
The national standard contains as many as 3,500 commonly used Chinese characters. Training an image character recognition model based on the relation network involves much simpler training content than training a deep-learning character recognition model; in particular, the heavy iteration over label content and data that severely limits extensibility to new categories in deep learning is well avoided in the relation network.
As shown in fig. 3 and 4, the relation network (RN) comprises two modules: an embedding module composed of four convolution blocks, and a correlation module composed of two convolution blocks and two fully connected layers. Each convolution block consists of 3×3 convolution kernels with 64 filters. Of the two fully connected layers, one converts the two-dimensional feature map output by the convolutions into a one-dimensional vector, and the other obtains the correlation score with a Sigmoid function.
In the feature extraction step, owing to the particular structure of the relation network, a support set can be randomly sampled and input into the embedding module together with the query set; feature maps are obtained through the embedding function, and then one sample feature map of the query set is concatenated one by one with all sample feature maps of the support set by a connection function to obtain the concatenated feature maps. Note that the support set is a labelled character image data set (i.e. the standard image data set) used to train the relation network, while the query set is the preprocessed character image data set to be recognized.
And step 4, inputting the features into a relational network model to calculate the relevance score.
In the relevance score calculation step, the concatenated feature maps are input into the correlation module, which computes the relevance scores with a correlation calculation function and finally outputs a one-hot vector indicating the support-set class most similar to the query image.
The relation network is a meta-learning model in the family of metric learning; it performs well on few-shot and even zero-shot problems and has good prospects. In contrast to earlier methods with manually predefined metrics, the relation network learns a transferable metric for comparing the relationships between pictures.
As an illustration, let x_j be a sample from the query set and x_i a sample from the support set. The embedding module maps them to the image features f(x_j) and f(x_i); a connection operator C(·,·) then concatenates the two feature vectors in series; the result is fed into the correlation module to calculate a relevance score, finally producing a score between 0 and 1 indicating the similarity of x_i and x_j. In total there are N such scores, one for each support-set class.
further, when the relational network recognizes a character, the relational network compares each support set to obtain a correlation score, compares the correlation scores one item at a time, and then compares the correlation scores to a maximum value. One-dimensional vectors with a maximum term of 1 and the rest of 0 are output. And finally, looking at the support set corresponding to 1, namely the identification result.
In particular, unlike common classification tasks that employ a cross-entropy loss function, the relation network supervises the similarity scores with a mean squared error; the optimization objective function is:

min_{φ,ψ} Σ_{i=1}^{m} Σ_{j=1}^{n} ( r_{i,j} − 1(y_i = y_j) )²

In the objective function, r_{i,j} denotes the relevance score between samples, x_i denotes the i-th image sample in the support set, x_j the j-th image sample in the query set, m the number of image samples in the support set, n the number of image samples in the query set, φ the parameters of the embedding function, and ψ the parameters of the correlation calculation function; the indicator 1(y_i = y_j) is 1 when the two samples belong to the same class and 0 otherwise.
This classification problem would normally use cross-entropy, but since the final score is a relevance value between 0 and 1, the task can also be viewed as regression, so the mean squared error (MSE) is used as the loss function.
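A small NumPy sketch of this MSE objective, with regression targets of 1 for same-class support/query pairs and 0 otherwise; the score matrix passed in is hypothetical input, not the output of a trained network:

```python
import numpy as np

def relation_mse_loss(scores, query_labels, support_labels):
    """Mean squared error over relation scores, as in the objective above.

    `scores[i, j]` is the relevance score between support sample i and
    query sample j; the target is 1 when their class labels match, else 0.
    """
    targets = (np.asarray(support_labels)[:, None]
               == np.asarray(query_labels)[None, :]).astype(float)
    return float(np.mean((scores - targets) ** 2))
```

With perfectly separated scores the loss is 0; any deviation from the 0/1 targets is penalised quadratically.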
And 5, performing post-processing on the recognized characters by using a neural network language model.
After the relation network recognition result is obtained, recognition post-processing is performed with a language model; here the neural-network language model GPT-3 is used. GPT-3 is designed as a more general natural language processing model that solves problems with less domain data and without a fine-tuning step, which provides powerful support for the accuracy of character recognition.
Specifically, the recognized characters are input into the language model, and GPT-3 is used to predict the probability of the sentence occurring in the language. If the probability is too low, the recognition result can be considered erroneous and is returned to the relation network for re-recognition; otherwise the target characters are output directly.
For an output sentence composed of words in a specific order, GPT-3 computes a probability according to how plausible each word is in its position, and the quality of the relation network's character recognition is evaluated from that probability. Plausibility is quantified by the probability assigned by GPT-3: when the relation network recognizes several characters and combines them into a sentence, for example "grass is green" versus "grass is active", the first has a high probability and the second a low one, and the low-probability sentence can be regarded as implausible.
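The accept-or-retry loop described above might be sketched as follows. `sentence_prob` stands in for a GPT-3 scoring call, `candidates` for successive relation-network hypotheses, and the threshold and retry cap are illustrative values, not from the patent:

```python
def postprocess(candidates, sentence_prob, threshold=0.01, max_retries=3):
    """Accept a recognised sentence only if the language model deems it
    probable enough; otherwise request re-recognition.

    `candidates` yields recognition hypotheses from the relation network;
    `sentence_prob` maps a sentence to its language-model probability.
    """
    last = None
    for _, sentence in zip(range(max_retries), candidates):
        last = sentence
        if sentence_prob(sentence) >= threshold:   # meets probability requirement
            return sentence
    return last                                    # fall back to the last attempt
```

In practice the retry would re-run the relation network on the low-probability characters; here the generator abstracts that step away.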
As shown in fig. 5, the present invention further provides a character recognition device based on a relational network, including:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
The acquisition module is also used for carrying out graying treatment on the picture by adopting a function preset by matlab; then, carrying out noise reduction treatment on the picture by using a matlab self-carrying wavelet denoising threshold method; then carrying out binarization processing on the image by utilizing an OpenCV-based algorithm threshold; and then, character segmentation and normalization are carried out on the image, so that unified algorithm is convenient to use subsequently.
The acquisition module is also used for correcting tilted characters in the preprocessed image data set. An OpenCV-based character tilt correction algorithm may proceed as follows: horizontally blur the preprocessed character image to form a larger connected region; then project it vertically to obtain a projection curve; compute the angle of the tilted characters from the projection curve; and apply a spatial rotation transform to the pixel coordinates of the tilted characters to complete the correction. Because the transformed coordinates must be rounded, some image distortion is unavoidable; bilinear interpolation is therefore used to reduce the distortion caused by rounding, and the corrected binary image is smoothed to eliminate burr points introduced by interpolation.
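The bilinear-interpolation step emphasized above can be sketched with a minimal inverse-mapping rotation. The projection-based angle estimation is omitted here, and the 7x7 test image is an illustrative assumption:

```python
import numpy as np

def rotate_bilinear(img, angle_deg):
    """Rotate an image about its centre, sampling source pixels with
    bilinear interpolation to reduce the distortion that plain rounding
    of the transformed coordinates would cause."""
    h, w = img.shape
    a = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel to a (fractional) source coordinate.
            sx = np.cos(a) * (x - cx) + np.sin(a) * (y - cy) + cx
            sy = -np.sin(a) * (x - cx) + np.cos(a) * (y - cy) + cy
            x0, y0 = int(np.floor(sx)), int(np.floor(sy))
            if 0 <= x0 < w - 1 and 0 <= y0 < h - 1:
                fx, fy = sx - x0, sy - y0
                # Weighted average of the four neighbouring source pixels.
                out[y, x] = (img[y0, x0] * (1 - fx) * (1 - fy)
                             + img[y0, x0 + 1] * fx * (1 - fy)
                             + img[y0 + 1, x0] * (1 - fx) * fy
                             + img[y0 + 1, x0 + 1] * fx * fy)
    return out

img = np.zeros((7, 7))
img[3, 1:6] = 1.0                    # a horizontal stroke
same = rotate_bilinear(img, 0.0)     # a zero angle leaves the pixels unchanged
```

In practice OpenCV's `warpAffine` with a bilinear interpolation flag performs the same transform far more efficiently; the loop form above just makes the interpolation explicit.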
The relation network identification module is further used for, in the feature extraction step, randomly drawing a sample set and inputting it together with the query set into the embedding module, obtaining feature maps through the embedding function, and then concatenating each sample feature map of the query set one by one with all sample feature maps of the support set via a concatenation function to obtain stitched feature maps.
The relational network identification module is further used for, in the correlation score calculation step, inputting the stitched feature maps into the correlation module to compute correlation scores with a correlation calculation function, and finally outputting a one-hot vector indicating the support-set class most similar to the query-set image.
The language model recognition post-processing module is also used for inputting the recognized characters into a neural network language model and using GPT-3 to predict the probability of the sentence occurring in the language; if the probability is too low, the character recognition result is considered erroneous and is returned to the relational network for re-recognition; otherwise, the target character is output directly.
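The accept-or-retry control flow can be sketched as follows. The threshold, the `max_retries` bound, and the toy scoring and re-recognition functions are illustrative assumptions; the patent itself does not specify a retry limit, but one is added here to guarantee termination:

```python
def accept_or_retry(recognized, score_fn, recognize_again,
                    threshold=-3.0, max_retries=3):
    """Post-processing loop: accept the sentence if the language-model
    score meets the threshold, otherwise send it back for re-recognition."""
    for _ in range(max_retries):
        if score_fn(recognized) >= threshold:
            return recognized
        recognized = recognize_again(recognized)
    return recognized  # best effort after max_retries attempts

# Toy stand-ins for the language model and the relational network.
toy_scores = {"grass is green": -2.7, "grass is active": -3.4}

def score_fn(sentence):
    return toy_scores.get(sentence, -10.0)

def recognize_again(sentence):
    return "grass is green"  # pretend re-recognition corrects the sentence

result = accept_or_retry("grass is active", score_fn, recognize_again)
# -> "grass is green": the low-scoring sentence is rejected once,
# re-recognized, and the corrected sentence then passes the threshold.
```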
The invention also provides electronic equipment which can be an industrial personal computer, a server or a computer terminal.
The electronic device comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the relational network based character recognition method.
The electronic device includes a processor, a memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any of the relational network-based character recognition methods.
The processor is used to provide computing and control capabilities that support the operation of the entire electronic device. The internal memory provides an environment for executing the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the relational network-based character recognition methods.
The network interface is used for network communication, such as transmitting assigned tasks. It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention also provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the relational network based character recognition method.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; these modifications or substitutions do not depart from the essence of the corresponding technical solutions from the technical solutions of the embodiments of the present invention.

Claims (9)

1. The character recognition method based on the relational network is characterized by comprising the following steps:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
2. The method for character recognition based on a relational network according to claim 1, wherein the feature map stitching and the correlation score calculation are performed on each sample feature map of the query set, respectively, further comprising: and splicing one sample feature map in the query set with all sample feature maps in the support set to obtain spliced feature maps, carrying out correlation score calculation on the spliced feature maps, and outputting the character with the highest score as a character recognition result corresponding to the current sample.
3. The character recognition method based on the relation network according to claim 1, wherein the preprocessing includes: and sequentially carrying out graying, noise reduction, binarization, character segmentation and normalization on the image.
4. A method of character recognition based on a relational network as in claim 3, further comprising, after preprocessing: horizontally blurring the preprocessed character image to form a connected region; performing vertical projection based on the connected region to obtain a projection curve; and calculating the angle of the tilted characters based on the projection curve, and performing a spatial rotation transformation on the pixel coordinates of the tilted characters to complete correction of the tilted font.
5. The method of claim 1, wherein the relational network comprises an embedding module and a correlation module;
the embedding module comprises four convolution blocks and is used for respectively extracting feature graphs of the support set and the query set from the input images of the support set and the query set;
the correlation module comprises two convolution blocks and two full-connection layers, wherein the two convolution blocks are used for connecting the feature images of the query set and the feature images corresponding to the image samples in the support set in series, the two full-connection layers are used for converting the two-dimensional feature images output by convolution into one-dimensional vectors, and then the correlation score is calculated by using a Sigmoid function based on the one-dimensional vectors.
6. The character recognition method based on the relation network according to claim 1, wherein the recognition sentences output by the relation network are input into a GPT-3 language model to predict the probability of occurrence of a sentence in the language, and when the predicted probability is lower than a set threshold value, the relation network is returned to carry out recognition again, otherwise, the target character is output.
7. A relational network based character recognition apparatus comprising:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
8. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the relational network based character recognition method of any one of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the relational network based character recognition method according to any one of claims 1 to 6.
CN202310236026.5A 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network Active CN116543389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310236026.5A CN116543389B (en) 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network

Publications (2)

Publication Number Publication Date
CN116543389A true CN116543389A (en) 2023-08-04
CN116543389B CN116543389B (en) 2023-09-19

Family

ID=87451222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310236026.5A Active CN116543389B (en) 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network

Country Status (1)

Country Link
CN (1) CN116543389B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942302A (en) * 2014-04-16 2014-07-23 苏州大学 Method for establishment and application of inter-relevance-feedback relational network
CN110502734A (en) * 2019-07-30 2019-11-26 苏州闻道网络科技股份有限公司 A kind of document creation method and device
CN111739517A (en) * 2020-07-01 2020-10-02 腾讯科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and medium
US20200364302A1 (en) * 2019-05-15 2020-11-19 Captricity, Inc. Few-shot language model training and implementation
WO2021025290A1 (en) * 2019-08-06 2021-02-11 삼성전자 주식회사 Method and electronic device for converting handwriting input to text
WO2021073266A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Image detection-based test question checking method and related device
CN113312035A (en) * 2021-05-17 2021-08-27 南京大学 Hyperridge Fabric-oriented intelligent contract development plug-in
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN114707509A (en) * 2022-03-29 2022-07-05 中南大学 Traffic named entity recognition method and device, computer equipment and storage medium
CN114781364A (en) * 2021-11-30 2022-07-22 浙江航天恒嘉数据科技有限公司 Relation extraction method and system based on statement entity relation network
CN115496057A (en) * 2022-10-14 2022-12-20 重庆长安新能源汽车科技有限公司 Product technical data management method, device, equipment and medium
CN115527212A (en) * 2021-11-09 2022-12-27 上海曌睿信息科技有限公司 Character recognition processing system based on feature training
CN115690797A (en) * 2022-10-09 2023-02-03 中车工业研究院有限公司 Character recognition method, device, equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FLOOD SUNG等: "Learning to Compare: Relation Network for Few-Shot Learning", 《ARXIV:1711.06025V2》, pages 1 - 10 *
薛竹君;杨树强;束阳雪;: "基于实体关系网络的微博文本摘要", 计算机科学, vol. 43, no. 09, pages 77 - 81 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant