CN116543389A - Character recognition method, device, equipment and medium based on relational network - Google Patents

Character recognition method, device, equipment and medium based on relational network

Info

Publication number
CN116543389A
CN116543389A (application CN202310236026.5A)
Authority
CN
China
Prior art keywords
character
character recognition
network
relational network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310236026.5A
Other languages
Chinese (zh)
Other versions
CN116543389B (en)
Inventor
肖剑波
俞翔
谢海燕
张乔斌
楼京俊
黎恒智
张振海
胡世峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA filed Critical Naval University of Engineering PLA
Priority to CN202310236026.5A
Publication of CN116543389A
Application granted
Publication of CN116543389B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/141 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method, device, equipment and medium based on a relation network. The method comprises the following steps: acquiring an image of handwritten characters and preprocessing the image to obtain a preprocessed image data set; taking the preprocessed image data set as the input of a pre-trained relation network and acquiring the output of the relation network; performing recognition post-processing on that output with a language model, outputting character recognition results that meet a probability requirement as target characters, and returning results that do not meet the requirement to the relation network for re-recognition. Training of the relation network comprises: extracting feature maps of a support set and a query set respectively through an embedding function; performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample; and forming a sentence from the recognition results of the samples as the output of the relation network.

Description

Character recognition method, device, equipment and medium based on relational network
Technical Field
The present invention relates to a character image recognition technology in the field of computers, and in particular, to a method, an apparatus, a device, and a medium for character recognition based on a relational network.
Background
Currently, character recognition (Optical Character Recognition, OCR) technology is applied in many fields and can, on many occasions, replace the keyboard to complete character input tasks quickly and efficiently.
Handwritten characters are influenced by each writer's pen-holding style, writing habits, cultural background and other factors, so the written characters vary widely and are difficult to recognize.
In the related art, a deep-learning neural network is generally used to learn the mapping between image and text from a large amount of labelled data, thereby recognizing the characters in an image; however, the heavy iteration over label content and data severely limits extensibility to new categories. In some special fields, documents need to be annotated by hand, and the recognition accuracy of such models on the annotation characters is insufficient, making them ill-suited to character recognition in this scenario.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a character recognition method, device, equipment and medium based on a relation network, so as to solve at least one of the above technical problems.
According to an aspect of the present invention, there is provided a character recognition method based on a relational network, including:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
In this technical scheme, the simple, flexible and general framework of the relation network for small-sample tasks is used to recognize handwritten text or handwritten annotations on office files. Compared with a conventional deep-learning method, a large number of learning labels and data iterations can be saved, which facilitates extension to new categories; in addition, the recognition result of the relation network is further processed with a language model, further improving the character recognition accuracy on handwritten text.
For single-character recognition, the scheme performs recognition via embedding mapping and correlation calculation in the relation network; the recognition result is then checked with the neural-network language model GPT-3, which computes the probability that a character appears at a given position in a semantically coherent sentence. Recognition results that meet the probability requirement are output as target characters, ensuring the accuracy of the output.
As a further technical solution, performing feature map concatenation and relevance score calculation on each sample feature map of the query set further includes: concatenating one sample feature map of the query set with all sample feature maps of the support set to obtain concatenated feature maps, calculating a relevance score for each concatenated feature map, and outputting the character with the highest score as the recognition result for the current sample.
Further, the query set contains multiple preprocessed images, which are fed into the relation network one by one for recognition; each preprocessed image can be regarded as one image sample, i.e. one sample to be recognized. For a single sample to be recognized, its feature map is extracted, feature maps are extracted from all samples in the support set, and the feature map of the sample to be recognized is concatenated one-to-one with the feature maps of all support-set samples to obtain the concatenated feature maps. Relevance scores are then calculated for the concatenated feature maps one by one, i.e. the relevance between the sample to be recognized and every sample in the support set is computed, and the support sample with the highest score is output as the recognition result of the sample to be recognized.
As a further technical solution, the preprocessing includes: and sequentially carrying out graying, noise reduction, binarization, character segmentation and normalization on the image.
As a further technical scheme, after the preprocessing, the method further comprises: horizontally blurring the preprocessed character image to form a connected region; performing vertical projection on the connected region to obtain a projection curve; and calculating the angle of the slanted characters from the projection curve and applying a spatial rotation transformation to the pixel coordinates of the slanted characters to complete the slant correction.
Preferably, bilinear interpolation is applied during the correction, and the corrected character image is smoothed.
As a further technical scheme, the relational network comprises an embedding module and a correlation module;
the embedding module comprises four convolution blocks and is used to extract the feature maps of the support set and the query set from their input images respectively;
the correlation module comprises two convolution blocks and two fully connected layers; the two convolution blocks process the feature map of the query set concatenated in series with the feature map of the corresponding support-set image sample, the two fully connected layers convert the two-dimensional feature map output by the convolutions into a one-dimensional vector, and the relevance score is then calculated from that vector with a Sigmoid function.
This technical scheme makes full use of the simple, flexible and general framework of the relation network for small-sample tasks. It is an end-to-end network that, once trained, can classify samples from new classes without any updates, which solves the problem that existing character recognition based on deep-learning neural networks severely limits extension to new classes.
As a further technical scheme, the sentence recognized by the relation network is input into a GPT-3 language model to predict the probability of the sentence occurring in the language; when the predicted probability is below a set threshold, the sentence is returned to the relation network for re-recognition, otherwise the target characters are output.
Further, the threshold may be determined from an allowable error, which in turn depends on the required recognition accuracy.
According to an aspect of the present invention, there is provided a character recognition apparatus based on a relational network, comprising:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
In this technical scheme, after the acquisition module obtains the preprocessed image data set, recognition is performed by the relation network recognition module and recognition post-processing by the language model recognition post-processing module in turn; characters that fail the probability requirement in post-processing are returned to the relation network recognition module for re-recognition, which guarantees character recognition accuracy. At the same time, the scheme can classify samples from new classes without any updates, is not constrained by iterative data computation, and achieves extensibility to new classes.
According to an aspect of the present description, there is provided an electronic device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the relational network based character recognition method.
According to an aspect of the present description, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the relational network-based character recognition method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method, which utilizes a simple, flexible and universal framework of a relation network in a small sample task to identify handwriting notes of handwriting texts or office files, and compared with a deep learning method, a large number of learning labels and data iterations can be reduced, so that the expansion of new categories is facilitated; in addition, the recognition result of the relation network is further processed based on the language model, so that the character recognition accuracy of the handwritten text is further improved.
The invention provides a device, which is characterized in that after a preprocessed image data set is obtained through an acquisition module, the device sequentially carries out recognition processing through a relational network recognition module and recognition post-processing through a language model recognition post-processing module, and the character which does not meet probability requirements in the recognition post-processing is returned to the relational network recognition module for re-recognition, so that the character recognition precision is ensured; meanwhile, the technical scheme can classify a sample from a new class without any update, is not limited by data iterative computation, and realizes the expandability of the new class.
Drawings
Fig. 1 is a flowchart of a character recognition method based on a relational network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a tilt character correction flow according to an embodiment of the invention.
Fig. 3 is a schematic diagram of a relational network training process according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a network structure of a relational network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a character recognition apparatus based on a relational network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The embodiment of the invention provides a character recognition method based on a relational network, which is shown in fig. 1 and comprises the following steps:
and step 1, acquiring a handwriting image dataset of the document.
Specifically, a camera can be used to capture character images in an office environment; to obtain pictures with a high recognition rate, the sharpness of the images should be ensured as far as possible when shooting manually.
Alternatively, the image of the character to be recognized may also be scanned or otherwise acquired.
And 2, preprocessing the acquired image data set.
Most documents in an office scene carry annotations, so most pictures are captured in color; therefore, after the character images to be recognized are acquired, image preprocessing is needed. Specifically, the picture can first be converted to grayscale with a built-in MATLAB function; the picture is then denoised with MATLAB's built-in wavelet threshold denoising; the image is then binarized with an OpenCV thresholding function; finally, character segmentation and normalization are performed so that a unified algorithm can be applied subsequently.
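As a rough illustration, the grayscale/binarize/normalize chain can be sketched in NumPy. The function name, the mean-intensity threshold and the nearest-neighbour resize below are illustrative stand-ins for the MATLAB wavelet denoising and OpenCV thresholding named above, not the patent's implementation:

```python
import numpy as np

def preprocess_char_image(rgb, out_size=(32, 32)):
    """Grayscale -> global-threshold binarisation -> size normalisation.

    `rgb` is an HxWx3 uint8 array. The mean-intensity threshold is a
    simple stand-in for the wavelet-denoise + OpenCV threshold steps.
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Binarise: ink (dark) pixels -> 1, background -> 0.
    binary = (gray < gray.mean()).astype(np.float32)
    # Nearest-neighbour resize to the normalised input size.
    h, w = binary.shape
    rows = np.arange(out_size[0]) * h // out_size[0]
    cols = np.arange(out_size[1]) * w // out_size[1]
    return binary[np.ix_(rows, cols)]
```

A character segmenter would run before this on a page image; here a single character crop is assumed.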
Optionally, correction processing of the oblique character may also be performed on the preprocessed image dataset, as shown in fig. 2. The character recognition accuracy of the relational network is further guaranteed based on an OpenCV character inclination correction algorithm.
The OpenCV-based character slant correction algorithm can proceed as follows: horizontally blur the preprocessed character image to form a larger connected region; perform vertical projection to obtain a projection curve; calculate the slant angle of the characters from the projection curve; and apply a spatial rotation transformation to the pixel coordinates of the slanted characters to complete the correction.
Since the coordinates obtained after the transformation must be rounded, some image distortion is inevitable; bilinear interpolation is used here to reduce the rounding distortion, and the corrected binary image is smoothed to remove burr points introduced by the interpolation.
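A minimal sketch of projection-based slant estimation, under simplifying assumptions: the image is already binary, slant is modelled as a per-row horizontal shear, and the best angle is the one whose sheared vertical projection has the highest variance (a sharper profile). The function name and the variance criterion are illustrative, not the patent's exact OpenCV routine:

```python
import numpy as np

def estimate_skew(binary, angles=np.linspace(-15, 15, 31)):
    """Estimate the slant angle (degrees) of a binary character image."""
    h, w = binary.shape
    best_angle, best_score = 0.0, -1.0
    ys = np.arange(h)
    for a in angles:
        shift = np.tan(np.radians(a)) * ys        # per-row horizontal shift
        proj = np.zeros(w)
        for y in range(h):
            # Shear the row, then accumulate the vertical projection.
            proj += np.roll(binary[y], int(round(shift[y])))
        score = proj.var()                        # sharper profile = higher variance
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

Rotating (or un-shearing) by the returned angle with bilinear resampling would complete the correction step described above.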
And 3, extracting the characteristics of the preprocessed image dataset.
The relationship network identification step includes two parts: feature extraction and relevance score calculation. The relationship network is an end-to-end network that, once trained, classifies a sample from a new class without any updates.
The relation network uses a meta learning method, and the core idea of the relation network is to learn an embedding function, map an input space (character picture in the present invention) to a new embedding space, and have a similarity measure in the new embedding space to distinguish different classes.
The national standard contains as many as 3,500 commonly used Chinese characters. Training an image character recognition model based on the relation network involves much simpler training content than training a deep-learning character recognition model; in particular, the heavy iteration over label content and data that severely limits extensibility to new categories in deep learning is well avoided in the relation network.
As shown in fig. 3 and 4, the relation network (RN) comprises two modules: an embedding module composed of four convolution blocks, and a correlation module composed of two convolution blocks and two fully connected layers. Each convolution block consists of 3×3 convolution kernels with 64 filters. Of the two fully connected layers, one converts the two-dimensional feature map output by the convolutions into a one-dimensional vector, and the other obtains the correlation score with a Sigmoid function.
In the feature extraction step, owing to the particular structure of the relation network, a support set can be randomly sampled and input into the embedding module together with the query set; feature maps are obtained through the embedding function, and then one sample feature map of the query set is concatenated one by one with all sample feature maps of the support set by a connection function to obtain the concatenated feature maps. Note that the support set is a labelled character image data set (i.e. the standard image data set) used to train the relation network, while the query set is the preprocessed character image data set to be recognized.
And step 4, inputting the features into a relational network model to calculate the relevance score.
In the relevance score calculation step, the concatenated feature maps are input into the correlation module, which computes the relevance scores with a correlation calculation function and finally outputs a one-hot vector indicating the support-set class most similar to the query image.
The relation network is a meta-learning model in the family of metric learning; it performs well on few-shot and even zero-shot problems and has good prospects. In contrast to earlier methods with manually predefined metrics, the relation network learns a transferable metric for comparing the relationships between pictures.
As an illustration, let x_j be a sample from the query set and x_i a sample from the support set. The embedding module maps them to the image features f(x_j) and f(x_i); a connection operator C(·,·) then concatenates the two feature vectors in series; the result is fed into the correlation module to calculate a relevance score, finally producing a score between 0 and 1 indicating the similarity of x_i and x_j. In total there are N such scores, one for each support-set class.
further, when the relational network recognizes a character, the relational network compares each support set to obtain a correlation score, compares the correlation scores one item at a time, and then compares the correlation scores to a maximum value. One-dimensional vectors with a maximum term of 1 and the rest of 0 are output. And finally, looking at the support set corresponding to 1, namely the identification result.
In particular, unlike common classification tasks that employ a cross-entropy loss function, the relation network supervises the similarity scores with a mean squared error; the optimization objective function is:

min_{φ,ψ} Σ_{i=1}^{m} Σ_{j=1}^{n} ( r_{i,j} − 1(y_i = y_j) )²

In the objective function, r_{i,j} denotes the relevance score between samples, x_i denotes the i-th image sample in the support set, x_j the j-th image sample in the query set, m the number of image samples in the support set, n the number of image samples in the query set, φ the parameters of the embedding function, and ψ the parameters of the correlation calculation function; the indicator 1(y_i = y_j) is 1 when the two samples belong to the same class and 0 otherwise.
This classification problem would normally use cross-entropy, but since the final score is a relevance value between 0 and 1, the task can also be viewed as regression, so the mean squared error (MSE) is used as the loss function.
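A small NumPy sketch of this MSE objective, with regression targets of 1 for same-class support/query pairs and 0 otherwise; the score matrix passed in is hypothetical input, not the output of a trained network:

```python
import numpy as np

def relation_mse_loss(scores, query_labels, support_labels):
    """Mean squared error over relation scores, as in the objective above.

    `scores[i, j]` is the relevance score between support sample i and
    query sample j; the target is 1 when their class labels match, else 0.
    """
    targets = (np.asarray(support_labels)[:, None]
               == np.asarray(query_labels)[None, :]).astype(float)
    return float(np.mean((scores - targets) ** 2))
```

With perfectly separated scores the loss is 0; any deviation from the 0/1 targets is penalised quadratically.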
And 5, performing post-processing on the recognized characters by using a neural network language model.
After the relation network recognition result is obtained, recognition post-processing is performed with a language model; here the neural-network language model GPT-3 is used. GPT-3 is designed as a more general natural language processing model that solves problems with less domain data and without a fine-tuning step, which provides powerful support for the accuracy of character recognition.
Specifically, the recognized characters are input into the language model, and GPT-3 is used to predict the probability of the sentence occurring in the language. If the probability is too low, the recognition result can be considered erroneous and is returned to the relation network for re-recognition; otherwise the target characters are output directly.
For an output sentence composed of words in a specific order, GPT-3 computes a probability according to how plausible each word is in its position, and the quality of the relation network's character recognition is evaluated from that probability. Plausibility is quantified by the probability assigned by GPT-3: when the relation network recognizes several characters and combines them into a sentence, for example "grass is green" versus "grass is active", the first has a high probability and the second a low one, and the low-probability sentence can be regarded as implausible.
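The accept-or-retry loop described above might be sketched as follows. `sentence_prob` stands in for a GPT-3 scoring call, `candidates` for successive relation-network hypotheses, and the threshold and retry cap are illustrative values, not from the patent:

```python
def postprocess(candidates, sentence_prob, threshold=0.01, max_retries=3):
    """Accept a recognised sentence only if the language model deems it
    probable enough; otherwise request re-recognition.

    `candidates` yields recognition hypotheses from the relation network;
    `sentence_prob` maps a sentence to its language-model probability.
    """
    last = None
    for _, sentence in zip(range(max_retries), candidates):
        last = sentence
        if sentence_prob(sentence) >= threshold:   # meets probability requirement
            return sentence
    return last                                    # fall back to the last attempt
```

In practice the retry would re-run the relation network on the low-probability characters; here the generator abstracts that step away.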
As shown in fig. 5, the present invention further provides a character recognition device based on a relational network, including:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature maps of a support set and a query set respectively through an embedding function, wherein the query set is the preprocessed image data set and the support set is a standard image data set;
performing feature map concatenation and relevance score calculation for each sample feature map of the query set to obtain the character recognition result corresponding to each sample;
and forming a sentence from the recognition results of the samples as the output of the relation network.
The acquisition module is also used for carrying out graying treatment on the picture by adopting a function preset by matlab; then, carrying out noise reduction treatment on the picture by using a matlab self-carrying wavelet denoising threshold method; then carrying out binarization processing on the image by utilizing an OpenCV-based algorithm threshold; and then, character segmentation and normalization are carried out on the image, so that unified algorithm is convenient to use subsequently.
The acquisition module is also used for correcting tilted characters in the preprocessed image data set. An OpenCV-based character tilt correction algorithm may proceed as follows: horizontally blur the preprocessed character image to form a larger connected region; then project it vertically to obtain a projection curve; compute the angle of the tilted characters from the projection curve; and apply a spatial rotation transform to the pixel coordinates of the tilted characters to complete the correction. Because the transformed coordinates must be rounded, some image distortion is unavoidable; bilinear interpolation is therefore used to reduce the distortion caused by rounding, and the corrected binary image is smoothed to eliminate burr points introduced by interpolation.
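The bilinear-interpolation step emphasized above can be sketched with a minimal inverse-mapping rotation. The projection-based angle estimation is omitted here, and the 7x7 test image is an illustrative assumption:

```python
import numpy as np

def rotate_bilinear(img, angle_deg):
    """Rotate an image about its centre, sampling source pixels with
    bilinear interpolation to reduce the distortion that plain rounding
    of the transformed coordinates would cause."""
    h, w = img.shape
    a = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel to a (fractional) source coordinate.
            sx = np.cos(a) * (x - cx) + np.sin(a) * (y - cy) + cx
            sy = -np.sin(a) * (x - cx) + np.cos(a) * (y - cy) + cy
            x0, y0 = int(np.floor(sx)), int(np.floor(sy))
            if 0 <= x0 < w - 1 and 0 <= y0 < h - 1:
                fx, fy = sx - x0, sy - y0
                # Weighted average of the four neighbouring source pixels.
                out[y, x] = (img[y0, x0] * (1 - fx) * (1 - fy)
                             + img[y0, x0 + 1] * fx * (1 - fy)
                             + img[y0 + 1, x0] * (1 - fx) * fy
                             + img[y0 + 1, x0 + 1] * fx * fy)
    return out

img = np.zeros((7, 7))
img[3, 1:6] = 1.0                    # a horizontal stroke
same = rotate_bilinear(img, 0.0)     # a zero angle leaves the pixels unchanged
```

In practice OpenCV's `warpAffine` with a bilinear interpolation flag performs the same transform far more efficiently; the loop form above just makes the interpolation explicit.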
The relation network identification module is further used for, in the feature extraction step, randomly drawing a sample set and inputting it together with the query set into the embedding module, obtaining feature maps through the embedding function, and then concatenating each sample feature map of the query set one by one with all sample feature maps of the support set via a concatenation function to obtain stitched feature maps.
The relational network identification module is further used for, in the correlation score calculation step, inputting the stitched feature maps into the correlation module to compute correlation scores with a correlation calculation function, and finally outputting a one-hot vector indicating the support-set class most similar to the query-set image.
The language model recognition post-processing module is also used for inputting the recognized characters into a neural network language model and using GPT-3 to predict the probability of the sentence occurring in the language; if the probability is too low, the character recognition result is considered erroneous and is returned to the relational network for re-recognition; otherwise, the target character is output directly.
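The accept-or-retry control flow can be sketched as follows. The threshold, the `max_retries` bound, and the toy scoring and re-recognition functions are illustrative assumptions; the patent itself does not specify a retry limit, but one is added here to guarantee termination:

```python
def accept_or_retry(recognized, score_fn, recognize_again,
                    threshold=-3.0, max_retries=3):
    """Post-processing loop: accept the sentence if the language-model
    score meets the threshold, otherwise send it back for re-recognition."""
    for _ in range(max_retries):
        if score_fn(recognized) >= threshold:
            return recognized
        recognized = recognize_again(recognized)
    return recognized  # best effort after max_retries attempts

# Toy stand-ins for the language model and the relational network.
toy_scores = {"grass is green": -2.7, "grass is active": -3.4}

def score_fn(sentence):
    return toy_scores.get(sentence, -10.0)

def recognize_again(sentence):
    return "grass is green"  # pretend re-recognition corrects the sentence

result = accept_or_retry("grass is active", score_fn, recognize_again)
# -> "grass is green": the low-scoring sentence is rejected once,
# re-recognized, and the corrected sentence then passes the threshold.
```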
The invention also provides electronic equipment which can be an industrial personal computer, a server or a computer terminal.
The electronic device comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the relational network based character recognition method.
The electronic device includes a processor, a memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any of the relational network-based character recognition methods.
The processor is used to provide computing and control capabilities that support the operation of the entire electronic device. The internal memory provides an environment for executing the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the relational network-based character recognition methods.
The network interface is used for network communication, such as transmitting assigned tasks. It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention also provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the relational network based character recognition method.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; these modifications or substitutions do not depart from the essence of the corresponding technical solutions from the technical solutions of the embodiments of the present invention.

Claims (9)

1. The character recognition method based on the relational network is characterized by comprising the following steps:
acquiring an image of a handwritten character and preprocessing the image to obtain a preprocessed image data set;
taking the preprocessed image dataset as input of a pre-trained relational network, and acquiring output of the relational network;
according to the output of the relation network, performing recognition post-processing by using a language model, outputting a character recognition result meeting probability requirements as a target character, and returning the character recognition result not meeting the probability requirements to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
2. The method for character recognition based on a relational network according to claim 1, wherein the feature map stitching and the correlation score calculation are performed on each sample feature map of the query set, respectively, further comprising: and splicing one sample feature map in the query set with all sample feature maps in the support set to obtain spliced feature maps, carrying out correlation score calculation on the spliced feature maps, and outputting the character with the highest score as a character recognition result corresponding to the current sample.
3. The character recognition method based on the relation network according to claim 1, wherein the preprocessing includes: and sequentially carrying out graying, noise reduction, binarization, character segmentation and normalization on the image.
4. A method of character recognition based on a relational network as in claim 3, further comprising, after preprocessing: horizontally blurring the preprocessed character image to form a connected region; performing vertical projection based on the connected region to obtain a projection curve; and calculating the angle of the tilted characters based on the projection curve, and performing a spatial rotation transformation on the pixel coordinates of the tilted characters to complete correction of the tilted font.
5. The method of claim 1, wherein the relational network comprises an embedding module and a correlation module;
the embedding module comprises four convolution blocks and is used for respectively extracting feature graphs of the support set and the query set from the input images of the support set and the query set;
the correlation module comprises two convolution blocks and two full-connection layers, wherein the two convolution blocks are used for connecting the feature images of the query set and the feature images corresponding to the image samples in the support set in series, the two full-connection layers are used for converting the two-dimensional feature images output by convolution into one-dimensional vectors, and then the correlation score is calculated by using a Sigmoid function based on the one-dimensional vectors.
6. The character recognition method based on the relation network according to claim 1, wherein the recognition sentences output by the relation network are input into a GPT-3 language model to predict the probability of occurrence of a sentence in the language, and when the predicted probability is lower than a set threshold value, the relation network is returned to carry out recognition again, otherwise, the target character is output.
7. A relational network based character recognition apparatus comprising:
the acquisition module is used for acquiring the image of the handwritten character and preprocessing the image to obtain a preprocessed image data set;
the relation network identification module is used for taking the preprocessed image data set as input of a pre-trained relation network and obtaining output of the relation network;
the language model recognition post-processing module is used for carrying out recognition post-processing by utilizing a language model according to the output of the relation network, outputting a character recognition result meeting the probability requirement as a target character, and returning the character recognition result not meeting the probability requirement to the relation network for recognition again;
wherein the training of the relational network comprises:
extracting feature graphs of a support set and a query set respectively through an embedded function, wherein the query set is a preprocessed image data set, and the support set is a standard image data set;
respectively performing feature map splicing and relevance score calculation on each sample feature map of the query set to obtain character recognition results corresponding to each sample;
and forming a sentence by the character recognition results of the samples as the output of the relation network.
8. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the relational network based character recognition method of any one of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the relational network based character recognition method according to any one of claims 1 to 6.
CN202310236026.5A 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network Active CN116543389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310236026.5A CN116543389B (en) 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network

Publications (2)

Publication Number Publication Date
CN116543389A true CN116543389A (en) 2023-08-04
CN116543389B CN116543389B (en) 2023-09-19

Family

ID=87451222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310236026.5A Active CN116543389B (en) 2023-03-13 2023-03-13 Character recognition method, device, equipment and medium based on relational network

Country Status (1)

Country Link
CN (1) CN116543389B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942302A (en) * 2014-04-16 2014-07-23 苏州大学 Method for establishment and application of inter-relevance-feedback relational network
CN110502734A (en) * 2019-07-30 2019-11-26 苏州闻道网络科技股份有限公司 A kind of document creation method and device
CN111739517A (en) * 2020-07-01 2020-10-02 腾讯科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and medium
US20200364302A1 (en) * 2019-05-15 2020-11-19 Captricity, Inc. Few-shot language model training and implementation
WO2021025290A1 (en) * 2019-08-06 2021-02-11 삼성전자 주식회사 Method and electronic device for converting handwriting input to text
WO2021073266A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Image detection-based test question checking method and related device
CN113312035A (en) * 2021-05-17 2021-08-27 南京大学 Hyperridge Fabric-oriented intelligent contract development plug-in
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN114707509A (en) * 2022-03-29 2022-07-05 中南大学 Traffic named entity recognition method and device, computer equipment and storage medium
CN114781364A (en) * 2021-11-30 2022-07-22 浙江航天恒嘉数据科技有限公司 Relation extraction method and system based on statement entity relation network
CN115496057A (en) * 2022-10-14 2022-12-20 重庆长安新能源汽车科技有限公司 Product technical data management method, device, equipment and medium
CN115527212A (en) * 2021-11-09 2022-12-27 上海曌睿信息科技有限公司 Character recognition processing system based on feature training
CN115690797A (en) * 2022-10-09 2023-02-03 中车工业研究院有限公司 Character recognition method, device, equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FLOOD SUNG等: "Learning to Compare: Relation Network for Few-Shot Learning", 《ARXIV:1711.06025V2》, pages 1 - 10 *
薛竹君;杨树强;束阳雪;: "基于实体关系网络的微博文本摘要", 计算机科学, vol. 43, no. 09, pages 77 - 81 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant