CN115439850A - Image-text character recognition method, device, equipment and storage medium based on examination sheet - Google Patents

Image-text character recognition method, device, equipment and storage medium based on examination sheet

Info

Publication number
CN115439850A
CN115439850A (application number CN202211231705.5A)
Authority
CN
China
Prior art keywords
network
image
character recognition
character
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211231705.5A
Other languages
Chinese (zh)
Inventor
闫昊
周逸峰
刘凯
苏超
刘屹
叶颖琦
王皖麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Zhirong Supply Chain Service Co ltd
Original Assignee
China Merchants Tongshang Financial Leasing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Tongshang Financial Leasing Co ltd filed Critical China Merchants Tongshang Financial Leasing Co ltd
Priority to CN202211231705.5A priority Critical patent/CN115439850A/en
Publication of CN115439850A publication Critical patent/CN115439850A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to artificial intelligence technology, and discloses an image-text character recognition method, device, equipment and storage medium based on an examination sheet. The method comprises the following steps: acquiring a document image to be audited input by a user, and performing processing operations based on character adjustment and watermark elimination on the document image according to a preset character cleaning strategy to obtain a clean image; acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction and image cutting operations on the clean image by using an improved DB network in the character recognition model to obtain a cut image set; and performing character recognition on the cut image set by using the improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be audited. The invention can improve the accuracy of image-text character recognition based on the examination sheet.

Description

Image-text character recognition method, device, equipment and storage medium based on examination sheet
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an image-text character recognition method, device, equipment and computer-readable storage medium based on an examination sheet.
Background
With the acceleration of globalization, products move around the world increasingly frequently. As customs declaration becomes more standardized and mature, the number of declaration documents is rapidly increasing, and manual audit is gradually being replaced by intelligent audit; however, the current declaration document audit process is constrained by document complexity, such as declaration contract formats, shooting conditions, scanning conditions and document damage, so the accuracy of intelligent audit is not high.
Disclosure of Invention
The invention provides an image-text character recognition method, device, equipment and storage medium based on an examination sheet, and mainly aims to improve the accuracy of examination-sheet-based image-text character recognition.
In order to achieve the above object, the invention provides an examination-sheet-based image-text character recognition method, which comprises the following steps:
acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by using an improved DB network in the character recognition model to obtain a cut image set;
and performing character recognition on the cut image set by utilizing an improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be audited.
Optionally, the obtaining a pre-trained character recognition model according to a preset model building strategy includes:
acquiring a character detection network based on a DB network and comprising a convolutional neural network and an FPN structure, and a character recognition network based on a CRNN and comprising a convolutional neural network and a BiLSTM;
according to a preset model construction strategy, respectively carrying out batch normalization and activation function deletion operations of preset positions on a first EfficientNet network and a second EfficientNet network which are constructed in advance to obtain an improved first EfficientNet network and an improved second EfficientNet network;
replacing a residual error neural network in a pre-constructed DB network by using the improved first EfficientNet network to obtain an improved DB network;
replacing a convolutional neural network in the character recognition network by using the improved second EfficientNet network, and replacing a BiLSTM network in the character recognition network by using a pre-constructed transformer network to obtain an improved character recognition network;
constructing a character recognition model by utilizing the improved DB network and the improved character recognition network;
and acquiring pre-constructed focusing dice loss, cross entropy loss and preset auxiliary task configuration, and training the character recognition model to obtain a trained character recognition model.
Optionally, the obtaining of the pre-constructed focused dice loss includes:
obtaining a focal loss and a dice loss for the improved DB network:
focal loss = (|g_{x,y} - p_{x,y}|)^γ · l_{x,y}
dice loss = 1 - 2·Σ_{x,y}(p_{x,y}·g_{x,y}) / (Σ_{x,y} p_{x,y} + Σ_{x,y} g_{x,y})
wherein x and y are the coordinates of an image pixel, p_{x,y} is the network prediction value at (x, y), g_{x,y} is the true label value at (x, y), l_{x,y} is the original loss, and γ is the modulation factor;
weighting the focal loss and the dice loss to obtain the focused dice loss:
focal dice loss = (|g_{x,y} - p_{x,y}|)^γ · dice loss
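As a concrete illustration, the two losses can be sketched in NumPy; the toy probability map, the per-pixel base loss, and γ = 2 are illustrative values, not taken from the patent:

```python
import numpy as np

def focal_loss(pred, gt, base_loss, gamma=2.0):
    # Per-pixel focal modulation: down-weights easy pixels where the
    # prediction is already close to the ground truth.
    return (np.abs(gt - pred) ** gamma) * base_loss

def dice_loss(pred, gt, eps=1e-6):
    # Standard dice loss: 1 minus the dice overlap coefficient.
    inter = np.sum(pred * gt)
    return 1.0 - 2.0 * inter / (np.sum(pred) + np.sum(gt) + eps)

# Toy 2x2 prediction map and binary ground truth.
pred = np.array([[0.9, 0.1], [0.8, 0.2]])
gt   = np.array([[1.0, 0.0], [1.0, 0.0]])
# Illustrative per-pixel original loss (binary cross-entropy).
base = -np.log(np.clip(np.where(gt > 0.5, pred, 1 - pred), 1e-6, 1.0))

fl = focal_loss(pred, gt, base)   # per-pixel modulated loss map
dl = dice_loss(pred, gt)          # scalar region-overlap loss
```

Because most pixels of a declaration document are easy background, the modulation term (|g - p|)^γ shrinks their contribution, which is exactly the imbalance argument the description makes below.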
optionally, the training the character recognition model to obtain a trained character recognition model includes:
performing model internal task loss configuration on a first target layer of an improved DB network of the character recognition model and a second target layer of the transformer network by using cross entropy loss according to a preset auxiliary training strategy;
extracting training samples in batches from a pre-constructed training set, and carrying out network forward recognition on the training samples by using the character recognition model to obtain a final prediction result;
calculating a loss value between the final prediction result and a real label value of the training sample according to the focused dice loss;
calculating the gradient of model parameters according to the loss value reverse propagation, and performing network reverse updating on the model parameters to obtain an updated character recognition model;
calculating the model precision of the updated character recognition model by utilizing a pre-constructed test sample set;
judging whether the model precision meets a preset requirement or not;
when the model precision does not reach the preset requirement, returning to the step of extracting the training sample from the pre-constructed training set, and carrying out iterative updating on the updated character recognition model;
and when the model precision reaches a preset requirement, taking the updated character recognition model which is updated finally as the character recognition model which is trained.
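The train-evaluate-iterate loop described by these steps can be sketched generically; the stand-in "model" below is a tiny logistic regressor on synthetic data, and the precision threshold is illustrative, none of it being the patent's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable training set standing in for the
# character recognition training samples.
X = rng.normal(size=(200, 4))
w_true = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ w_true > 0).astype(float)

w = np.zeros(4)                 # model parameters to be updated
required_precision = 0.95       # preset precision requirement
lr = 0.5

for epoch in range(1000):
    # Network forward recognition on the training samples.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    # Backward propagation: gradient of the loss w.r.t. parameters,
    # followed by a network reverse update.
    w -= lr * X.T @ (p - y) / len(y)
    # Evaluate model precision; return to training if not yet met.
    accuracy = np.mean((p > 0.5) == (y > 0.5))
    if accuracy >= required_precision:
        break
```

The structure mirrors the claim: forward pass, loss-gradient update, precision check, and iteration until the preset requirement is reached.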
Optionally, the performing, by using an improved DB network in the character recognition model, an image feature extraction operation and an image cropping operation on the clean image to obtain a cropped image set includes:
sequentially performing preset times of continuous convolution operation, one batch normalization operation and one function activation operation on the clean image by utilizing an improved first EfficientNet network of an improved DB network in the character recognition model to obtain a characteristic matrix set;
performing feature fusion operation on the feature matrix set to obtain an image feature sequence set;
and carrying out full connection operation on the image feature sequence set by using a full connection layer to obtain a character region classification result, judging by using an image connection region according to the character region classification result to obtain a character region position result, and carrying out image interception on a character region in the clean image according to the character region position result to obtain a cut image set.
Optionally, the performing, according to a preset character cleaning policy, a processing operation based on character adjustment and watermark elimination on the document image to be audited to obtain a clean image includes:
according to a preset character cleaning strategy, carrying out Gaussian blurring and graying operation on the document image to be checked to obtain a gray image;
carrying out self-adaptive binarization and morphological corrosion operations on the gray level image to obtain a character corrosion image, and framing a minimum external rectangle of the character corrosion image to obtain a character area image;
identifying the integrity of the edge characters of the text area image;
when the edge characters are complete, taking the character area image as a complete character image;
when the edge characters are incomplete, performing edge supplement operation on the edge characters to obtain complete character images;
performing image feature extraction operation on the complete character image to obtain an image feature set, and identifying whether the image contains a seal or a watermark;
and according to the seal detection and watermark identification results, carrying out pixel value adjustment operation of self-adaptive color channel separation, and eliminating the seal characters and the watermark characters to obtain a clean image.
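Eliminating a red seal by color-channel separation can be sketched as below; the RGB layout, the dominance margin of 40, and the push-to-white strategy are illustrative assumptions, not values from the patent:

```python
import numpy as np

def remove_red_seal(img):
    """Suppress a red seal via color-channel separation.

    img: H x W x 3 uint8 RGB image. Pixels where the red channel
    clearly dominates green and blue are treated as seal pixels and
    pushed to white, leaving black text (all channels low) intact.
    """
    r = img[:, :, 0].astype(int)
    g = img[:, :, 1].astype(int)
    b = img[:, :, 2].astype(int)
    seal = (r - np.maximum(g, b)) > 40   # illustrative threshold
    out = img.copy()
    out[seal] = 255
    return out

# Toy image: one black text pixel, one red seal pixel, white paper.
img = np.full((2, 2, 3), 255, dtype=np.uint8)
img[0, 0] = (0, 0, 0)        # text: low in every channel, kept
img[0, 1] = (200, 40, 40)    # seal: red-dominant, removed
clean = remove_red_seal(img)
```

An adaptive variant would estimate the threshold from the detected seal region's pixel statistics rather than hard-coding it.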
In order to solve the above problems, the present invention further provides an examination-sheet-based image-text character recognition device, comprising:
the image preprocessing module is used for acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
the image feature extraction module is used for acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by utilizing an improved DB network in the character recognition model to obtain a cut image set;
and the image characteristic identification module is used for carrying out character identification on the cutting image set by utilizing an improved character identification network of the character identification model to obtain a character identification result corresponding to the document image to be audited.
Optionally, the obtaining a pre-trained character recognition model according to a preset model building strategy includes:
acquiring a character detection network based on a DB network and comprising a convolutional neural network and an FPN structure, and a character recognition network based on a CRNN and comprising a convolutional neural network and a BiLSTM;
according to a preset model construction strategy, respectively carrying out batch normalization and activation function deletion operations of preset positions on a first EfficientNet network and a second EfficientNet network which are constructed in advance to obtain an improved first EfficientNet network and an improved second EfficientNet network;
replacing a residual error neural network in a pre-constructed DB network by using the improved first EfficientNet network to obtain an improved DB network;
replacing a convolutional neural network in the character recognition network by using the improved second EfficientNet network, and replacing a BiLSTM network in the character recognition network by using a pre-constructed transformer network to obtain an improved character recognition network;
constructing a character recognition model by utilizing the improved DB network and the improved character recognition network;
and acquiring pre-constructed focusing dice loss, cross entropy loss and preset auxiliary task configuration, and training the character recognition model to obtain a trained character recognition model.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the above examination-sheet-based image-text character recognition method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the above examination-sheet-based image-text character recognition method.
The embodiment of the invention first performs character adjustment and watermark elimination operations on the document image to be audited input by a user through a character cleaning strategy, increasing the accuracy of subsequent recognition. Then, feature extraction is performed on the clean image by using the improved DB network in a pre-trained character recognition model to obtain a file feature sequence set; the improved DB network adopts an EfficientNet network to replace the original residual neural network to obtain a better neural network width-depth ratio, and batch normalization and activation functions in the network layers are reduced so that image information can be propagated backwards to a greater extent, increasing the data richness of the file feature sequence set. Then, character recognition is performed on the file feature sequence set through the improved character recognition network to obtain a character recognition result; besides adopting the EfficientNet network with reduced batch normalization and activation functions, the improved character recognition network replaces the BiLSTM network with a transformer, further strengthening the extraction of context features. Therefore, the method, device, equipment and storage medium for examination-sheet-based image-text character recognition provided by the embodiment of the invention can improve the accuracy of examination-sheet-based image-text character recognition.
Drawings
Fig. 1 is a schematic flowchart of an examination-sheet-based image-text character recognition method according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of one step of the examination-sheet-based image-text character recognition method according to an embodiment of the present invention;
Fig. 3 is a detailed flowchart of one step of the examination-sheet-based image-text character recognition method according to an embodiment of the present invention;
Fig. 4 is a detailed flowchart of one step of the examination-sheet-based image-text character recognition method according to an embodiment of the present invention;
Fig. 5 is a functional block diagram of an examination-sheet-based image-text character recognition device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing the examination-sheet-based image-text character recognition method according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides an examination-sheet-based image-text character recognition method. In the embodiment of the present application, the execution subject of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided in the embodiment of the present application. In other words, the method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flow chart of an examination-based text-character recognition method according to an embodiment of the present invention. In this embodiment, the method for identifying text characters based on an examination order includes:
s1, obtaining a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image.
In the embodiment of the invention, the document image to be audited may be an application document received by relevant departments such as customs, mostly various contracts in image form, in which conditions such as dim lighting, reflections, watermarks, seals, and incompletely captured characters may exist.
The character cleaning strategy in the embodiment of the invention first denoises and gray-transforms the whole image by conventional means, then captures the range of the character region and crops it from the image to further increase subsequent recognition accuracy; it then judges whether the edge position is complete and performs adaptive edge repair if it is incomplete; finally, it adjusts the image pixel values to remove the seal and the watermark.
Specifically, referring to fig. 2, in the embodiment of the present invention, the performing, according to a preset text cleaning policy, a processing operation based on text adjustment and watermark elimination on the document image to be audited to obtain a clean image includes:
s11, carrying out Gaussian blur and graying operation on the document image to be checked according to a preset character cleaning strategy to obtain a gray image;
s12, carrying out self-adaptive binarization and morphological corrosion operations on the gray level image to obtain a character corrosion image, and framing a minimum circumscribed rectangle of the character corrosion image to obtain a character area image;
s13, identifying the integrity of the edge characters of the character area image;
when the edge characters are complete, S14, taking the character area image as a complete character image;
when the edge characters are incomplete, S15, performing edge supplement operation on the edge characters to obtain complete character images;
s16, carrying out image feature extraction operation on the complete character image to obtain an image feature set, and identifying whether the image contains a seal or a watermark;
s17, according to the seal detection and watermark identification results, pixel value adjustment operation of self-adaptive color channel separation is carried out, and the seal characters and the watermark characters are eliminated to obtain a clean image.
In the embodiment of the invention, the document image to be audited can be denoised and grayed through Gaussian and graying formulas to obtain a grayscale image, the grayscale image is then further sharpened through binarization, and morphological erosion is performed on the image using Python's OpenCV module to connect the character regions in the image, facilitating the subsequent minimum-bounding-rectangle selection of the character region.
Then, in the embodiment of the invention, the minimum rectangle that can contain all characters is selected by the minimum bounding rectangle method to obtain the character area image, and adaptive edge supplementation is performed by comparing the size of the character area image with the size of the original image. After the complete character image is obtained, a pre-trained detection model detects whether a seal exists and, if so, its position area, so that the seal can be eliminated; a pre-trained classification model judges whether the image has a watermark and, if so, the pixel value of the watermark is determined so that the watermark is adaptively eliminated, obtaining the clean image.
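A minimal NumPy sketch of the binarize-then-frame step can illustrate the idea; a fixed threshold stands in for adaptive binarization, the morphological erosion step is omitted for brevity, and the real implementation would use OpenCV:

```python
import numpy as np

def binarize_and_box(gray, thresh=128):
    """Binarize a grayscale page and frame the ink with a minimum
    bounding rectangle.

    gray: H x W array with 0 = ink and 255 = paper. Returns the
    binary ink mask and the (top, bottom, left, right) box of the
    ink region.
    """
    mask = gray < thresh                  # True where there is ink
    ys, xs = np.nonzero(mask)
    box = (ys.min(), ys.max(), xs.min(), xs.max())
    return mask, box

# Toy 6x6 "page" with a 2x3 block of text in the middle.
page = np.full((6, 6), 255, dtype=np.uint8)
page[2:4, 1:4] = 0
mask, (top, bottom, left, right) = binarize_and_box(page)
crop = page[top:bottom + 1, left:right + 1]   # character area image
```

Comparing `crop.shape` against `page.shape` is the size check the text describes for deciding whether adaptive edge supplementation is needed.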
S2, according to a preset model construction strategy, obtaining a pre-trained character recognition model, and performing image feature extraction operation and image cutting operation on the clean image by using an improved DB network in the character recognition model to obtain a cut image set.
In the embodiment of the invention, the model construction strategy is a model construction method configured so that the model obtains a better neural network width-depth ratio, propagates image information backwards to a greater extent, and further strengthens the extraction of context features.
In detail, referring to fig. 3, in the embodiment of the present invention, the obtaining a pre-trained character recognition model according to a preset model building strategy includes:
S21, acquiring a character detection network based on a DB network and containing a convolutional neural network and an FPN structure, and a character recognition network based on a CRNN and containing a convolutional neural network and a BiLSTM;
S22, respectively carrying out batch normalization and activation function deletion operations at preset positions on the pre-constructed first and second EfficientNet networks according to a preset model construction strategy to obtain the improved first and second EfficientNet networks;
s23, replacing a residual error neural network in the pre-constructed DB network by using the improved first EfficientNet network to obtain an improved DB network;
s24, replacing a convolutional neural network in the character recognition network by using the improved second EfficientNet network, and replacing a BiLSTM network in the character recognition network by using a pre-constructed transformer network to obtain an improved character recognition network;
s25, constructing a character recognition model by utilizing the improved DB network and the improved character recognition network;
s26, obtaining pre-constructed focusing dice loss, cross entropy loss and preset auxiliary task configuration, and training the character recognition model to obtain a trained character recognition model.
It should be noted that open-source optical character recognition methods are divided into two parts, character detection and character recognition, wherein the character detection part is based on the DB network model structure and the character recognition part is based on the convolutional recurrent neural network (CRNN) structure.
The original DB network takes the convolutional neural network ResNet18 or ResNet50 as its backbone for extracting image features. The original CRNN structure consists of the convolutional neural network VGG16 and the recurrent BiLSTM network: VGG16 extracts image features and outputs an image feature sequence; the BiLSTM network extracts context-related features from the sequences in that image feature sequence and outputs a feature sequence containing both the image features and the context-related information, which is input to a subsequently connected fully connected layer to obtain the final character recognition result.
Further, the EfficientNet network improves network performance by jointly scaling the network width, the network depth, and the input resolution.
In the embodiment of the invention, because EfficientNet has a better neural network width-depth ratio, image feature information can be extracted more effectively with the same number of parameters; therefore, the residual neural network in the DB network and the VGG16 in the character recognition network are replaced by the convolutional neural networks EfficientNet-B1 and EfficientNet-B3, respectively. B1 and B3 denote width and depth expansion of the base network B0 by different coefficients, enlarging the model to improve its accuracy. Batch normalization and an activation function exist between the neural network layers, and a general convolutional neural network layer can be expressed as [convolution-batch normalization-activation function].
The activation function is commonly ReLU, which sets negative neural network unit outputs to zero, so that part of the information is lost; batch normalization reshapes the input data of the same batch into a distribution with mean 0 and variance 1, and by adding a learnable bias this process also loses part of the information. Therefore, in order to improve information propagation, in the embodiment of the present invention, EfficientNet-B1 and EfficientNet-B3 are modified by deleting the batch normalization and activation function at preset positions, obtaining the improved first and second EfficientNet networks, where an example improved network layer may be represented as [convolution].
Finally, the BiLSTM network in the character recognition network is replaced by a transformer network to obtain the improved character recognition network; the attention mechanism in the transformer network can capture richer context feature information than the BiLSTM network.
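As a hedged illustration of why attention sees more context than a recurrent pass, here is a minimal scaled dot-product self-attention in NumPy (shapes and values are illustrative, not the patent's network):

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a feature sequence.

    Every position attends to every other position directly, instead
    of passing context step by step as a BiLSTM does.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # pairwise similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # softmax per row
    return w @ x, w

rng = np.random.default_rng(1)
seq = rng.normal(size=(5, 16))  # 5 positions of an image feature sequence
out, weights = self_attention(seq)
print(out.shape)  # one context-enriched vector per position
```

Each row of `weights` is a full distribution over all sequence positions, so even distant positions contribute to every output vector in a single step.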
Further, in the embodiment of the present invention, obtaining the pre-constructed loss includes:
obtaining the focal loss and the dice loss of the DB network:
focal loss = (|g_{x,y} − p_{x,y}|)^γ · l_{x,y}

dice loss = 1 − (2 · Σ_{x,y} p_{x,y} · g_{x,y}) / (Σ_{x,y} p_{x,y} + Σ_{x,y} g_{x,y})

wherein x, y are the coordinates of an image pixel, p_{x,y} is the network prediction value at (x, y), g_{x,y} is the true label value at (x, y), l_{x,y} is the original loss, and γ is the modulation factor;
weighting the focal loss and the dice loss to obtain the focused dice loss (focal dice loss):
focal dice loss = (|g_{x,y} − p_{x,y}|)^γ · dice loss
where γ is a modulation factor that controls how strongly the final loss is adjusted relative to the original loss, and is typically set to 2.
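A numerical sketch of this weighting, assuming the standard soft-dice form for the dice term and a per-pixel focal modulation (both are assumptions about the exact formulas, not the patent's verbatim definitions):

```python
import numpy as np

def focal_dice_loss(p, g, gamma=2.0):
    """Dice loss modulated by (|g - p|)^gamma, with gamma=2 as in the text.

    p: predicted text-probability map, g: binary ground-truth mask.
    The dice term is the standard soft dice; treat this as a sketch.
    """
    inter = (p * g).sum()
    dice = 1.0 - 2.0 * inter / (p.sum() + g.sum() + 1e-6)
    modulation = np.abs(g - p) ** gamma  # near 0 for easy pixels
    return float((modulation * dice).mean())

g = np.array([[0.0, 1.0], [1.0, 0.0]])     # ground-truth text mask
easy = np.array([[0.1, 0.9], [0.9, 0.1]])  # confident prediction
hard = np.array([[0.6, 0.4], [0.4, 0.6]])  # uncertain prediction

# Uncertain ("hard") predictions are weighted up relative to easy ones.
print(focal_dice_loss(easy, g), focal_dice_loss(hard, g))
```

The modulation term is what makes the model "more concerned with learning of difficult samples": for confident, correct predictions |g − p| is small and the loss is suppressed, while uncertain predictions keep a large loss.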
It should be noted that in the text detection task, most of an input image consists of non-text regions and only a small part of text regions, so the model tends to learn the features of the non-text regions rather than those of the text regions. To address this imbalance between positive and negative samples, the dice loss corresponding to the original DB network is obtained first; the focal loss is then applied in the present invention so that the model focuses more on learning difficult samples, yielding the focused dice loss.
Further, referring to fig. 4, in the embodiment of the present invention, training the character recognition model to obtain a trained character recognition model includes:
s201, performing model internal task loss configuration on a first target layer of an improved DB network of the character recognition model and a second target layer of the transformer network by using cross entropy loss according to a preset auxiliary training strategy;
s202, extracting training samples from a pre-constructed training set in batches, and carrying out network forward recognition on the training samples by using the character recognition model to obtain a final prediction result;
s203, calculating a loss value between the final prediction result and a real label value of the training sample according to the focused dice loss;
s204, calculating the gradient of the model parameters according to the loss value reverse propagation, and performing network reverse updating on the model parameters to obtain an updated character recognition model;
s205, calculating the model precision of the updated character recognition model by using a pre-constructed test sample set;
s206, judging whether the model precision meets a preset requirement or not;
when the model precision does not reach the preset requirement, repeating the step S202, and carrying out iterative updating on the updated character recognition model;
and S207, when the model precision reaches the preset requirement, taking the finally updated character recognition model as the trained character recognition model.
In the embodiment of the invention, auxiliary losses are set in addition to the training method of gradient back-propagation. The auxiliary loss (auxiliary loss) is added at the penultimate and antepenultimate convolutional layers of the head of the improved DB network: each of these layers outputs a result and independently computes a loss, which is called the auxiliary loss. This is done only during training, not during inference; at inference time only the final layer outputs a result. By adding auxiliary losses during training, the embodiment of the invention increases the convergence rate and improves training efficiency; it also strengthens supervision and the back-propagation of gradients, so the character recognition model learns better. In addition, an auxiliary task is set at the penultimate layer of the transformer network, with the same training effect, which is not described again here.
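The auxiliary-loss bookkeeping described above can be sketched as follows; the 0.4 weight and the function names are illustrative assumptions, not values from the patent:

```python
def training_loss(main_loss, aux_losses, aux_weight=0.4):
    """Total loss during training: final head plus weighted auxiliary heads.

    aux_losses would come from the penultimate / antepenultimate
    convolutional layers of the DB head (and the transformer's
    penultimate layer); the 0.4 weight is a common but assumed choice.
    """
    return main_loss + aux_weight * sum(aux_losses)

def inference_output(main_output, aux_outputs):
    """At inference time the auxiliary heads are dropped entirely."""
    return main_output

# Two auxiliary heads contribute extra gradient signal during training
# only; the deployed model's output is unchanged.
train = training_loss(0.8, [0.5, 0.7])  # main + 0.4 * (0.5 + 0.7)
infer = inference_output("final head result", ["aux1", "aux2"])
```

Because the auxiliary terms inject gradients closer to intermediate layers, supervision reaches those layers directly instead of only through the final head, which is what speeds up convergence.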
After obtaining the character recognition model, in the embodiment of the present invention, the performing, by using an improved DB network in the character recognition model, an image feature extraction operation and an image cropping operation on the clean image to obtain a cropped image set includes:
sequentially performing preset times of continuous convolution operation, one batch normalization operation and one function activation operation on the clean image by utilizing an improved first EfficientNet network of an improved DB network in the character recognition model to obtain a characteristic matrix set;
performing feature fusion operation on the feature matrix set to obtain an image feature sequence set;
and carrying out full connection operation on the image feature sequence set by using a full connection layer to obtain a character region classification result, judging by using an image connection region according to the character region classification result to obtain a character region position result, and carrying out image interception on a character region in the clean image according to the character region position result to obtain a cut image set.
In the embodiment of the invention, image connected region judgment refers to a common, basic method in computer vision, pattern recognition, and image analysis for finding and marking each connected region in an image, where a connected region is an image region composed of adjacent foreground pixels having the same pixel value.
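A minimal sketch of such connected-region labelling (4-connectivity, BFS flood fill) over a binary text mask; this is a generic textbook version, not the patent's exact implementation:

```python
from collections import deque

def label_regions(mask):
    """Label 4-connected foreground regions of a binary mask.

    Returns the region count and a grid of labels (0 = background).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                current += 1
                labels[sy][sx] = current
                queue = deque([(sy, sx)])
                while queue:  # BFS flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
n, lab = label_regions(mask)
print(n)  # two separate text regions in this toy mask
```

In the pipeline above, each labelled region's bounding box would give one character-region position for cropping from the clean image.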
And S3, carrying out character recognition on the cut image set by utilizing an improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be audited.
In the embodiment of the invention, the image feature extraction is carried out on the cutting image set by utilizing an improved second EfficientNet network of the improved character recognition network, the context feature extraction is carried out on the extracted features by a transformer network to obtain an attention enhancement feature set, and then the attention enhancement feature set is subjected to character recognition by a full connection layer to obtain a final character recognition result.
The embodiment of the invention first performs character adjustment and watermark elimination on the document image to be audited input by the user through the character cleaning strategy, which increases the accuracy of subsequent recognition. It then performs feature extraction on the clean image using the improved DB network of the pre-trained character recognition model to obtain the feature sequence set: the improved DB network replaces the original residual neural network with an EfficientNet network to obtain a better ratio of network width to depth, and reduces the batch normalization operations and activation functions in the network layers, so that image information propagates backward to a greater extent and the data richness of the feature sequence set increases. Character recognition is then performed through the improved character recognition network to obtain the character recognition result: besides the same reduction of batch normalization and activation functions in its EfficientNet network, the improved character recognition network replaces the BiLSTM network with a transformer, further strengthening the extraction of context features. Therefore, the examination-sheet-based image-text character recognition method provided by the embodiment of the invention can improve the accuracy of image-text character recognition based on examination sheets.
Fig. 5 is a functional block diagram of an examination-sheet-based image-text character recognition device according to an embodiment of the present invention.
The examination-sheet-based image-text character recognition device 100 of the present invention may be installed in an electronic device. According to the implemented functions, the examination-sheet-based image-text character recognition device 100 may comprise an image preprocessing module 101, an image feature extraction module 102, and an image feature recognition module 103. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments stored in the memory of the electronic device that can be executed by its processor and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the image preprocessing module 101 is configured to obtain a document image to be audited input by a user, and perform processing operations based on character adjustment and watermark elimination on the document image to be audited according to a preset character cleaning policy to obtain a clean image;
the image feature extraction module 102 is configured to obtain a pre-trained character recognition model according to a preset model construction strategy, and perform image feature extraction operation and image clipping operation on the clean image by using an improved DB network in the character recognition model to obtain a clipped image set;
the image feature recognition module 103 is configured to perform text recognition on the cut image set by using an improved text recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be checked.
In detail, the modules in the examination-sheet-based image-text character recognition device 100 in the embodiment of the present application adopt, when used, the same technical means as the examination-sheet-based image-text character recognition method described with respect to fig. 1 to 4, and can produce the same technical effects, which are not described again here.
Fig. 6 is a schematic structural diagram of an electronic device 1 implementing an examination-sheet-based image-text character recognition method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may further comprise a computer program stored in the memory 11 and operable on the processor 10, such as an examination-sheet-based image-text character recognition program.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device 1: it connects the various components of the whole electronic device using various interfaces and lines, and executes the various functions of the electronic device and processes its data by running or executing programs or modules stored in the memory 11 (for example, executing the examination-sheet-based image-text character recognition program) and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. In other embodiments, the memory 11 may also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various kinds of data, such as the code of the examination-sheet-based image-text character recognition program, but also to temporarily store data that has been or will be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device 1 and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 6 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the embodiments described are illustrative only and are not to be construed as limiting the scope of the claims.
The memory 11 of the electronic device 1 stores an examination-sheet-based image-text character recognition program, which is a combination of instructions that, when executed in the processor 10, can implement:
acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by utilizing an improved DB network in the character recognition model to obtain a cut image set;
and performing character recognition on the cut image set by utilizing an improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be checked.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to the drawing, and is not repeated here.
Further, the integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor of an electronic device, implements:
acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by using an improved DB network in the character recognition model to obtain a cut image set;
and performing character recognition on the cut image set by utilizing an improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be audited.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best results.
Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An examination-order-based image-text character recognition method is characterized by comprising the following steps:
acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by utilizing an improved DB network in the character recognition model to obtain a cut image set;
and performing character recognition on the cut image set by utilizing an improved character recognition network of the character recognition model to obtain a character recognition result corresponding to the document image to be checked.
2. The examination-sheet-based image-text character recognition method of claim 1, wherein obtaining a pre-trained character recognition model according to a preset model construction strategy comprises:
acquiring a character detection network based on a DB network and comprising a convolutional neural network and an FPN structure, and a character recognition network based on a CRNN and comprising a convolutional neural network and a BiLSTM;
according to a preset model construction strategy, respectively carrying out batch normalization and activation function deletion operations of preset positions on a first EfficientNet network and a second EfficientNet network which are constructed in advance to obtain an improved first EfficientNet network and an improved second EfficientNet network;
replacing a residual error neural network in a pre-constructed DB network by using the improved first EfficientNet network to obtain an improved DB network;
replacing a convolutional neural network in the character recognition network by using the improved second EfficientNet network, and replacing a BiLSTM network in the character recognition network by using a pre-constructed transformer network to obtain an improved character recognition network;
constructing a character recognition model by utilizing the improved DB network and the improved character recognition network;
and acquiring pre-constructed focusing dice loss, cross entropy loss and preset auxiliary task configuration, and training the character recognition model to obtain a trained character recognition model.
3. The examination-sheet-based image-text character recognition method of claim 2, wherein the obtaining of the pre-constructed focused dice loss comprises:
obtaining the focal loss and the dice loss of the improved DB network:
focal loss = (|g_{x,y} − p_{x,y}|)^γ · l_{x,y}

dice loss = 1 − (2 · Σ_{x,y} p_{x,y} · g_{x,y}) / (Σ_{x,y} p_{x,y} + Σ_{x,y} g_{x,y})

wherein x, y are the coordinates of an image pixel, p_{x,y} is the network prediction value at (x, y), g_{x,y} is the true label value at (x, y), l_{x,y} is the original loss, and γ is the modulation factor;
weighting the focal loss and the dice loss to obtain the focused dice loss (focal dice loss):
focal dice loss = (|g_{x,y} − p_{x,y}|)^γ · dice loss
4. The examination-sheet-based image-text character recognition method of claim 2, wherein training the character recognition model to obtain a trained character recognition model comprises:
performing model internal task loss configuration on a first target layer of an improved DB network of the character recognition model and a second target layer of the transformer network by using cross entropy loss according to a preset auxiliary training strategy;
extracting training samples in batches from a pre-constructed training set, and carrying out network forward recognition on the training samples by using the character recognition model to obtain a final prediction result;
calculating a loss value between the final prediction result and a real label value of the training sample according to the focused dice loss;
calculating the gradient of model parameters according to the loss value reverse propagation, and performing network reverse updating on the model parameters to obtain an updated character recognition model;
calculating the model precision of the updated character recognition model by utilizing a pre-constructed test sample set;
judging whether the model precision meets a preset requirement or not;
when the model precision does not reach the preset requirement, returning to the step of extracting the training sample from the pre-constructed training set, and performing iterative updating on the updated character recognition model;
and when the model precision reaches a preset requirement, taking the updated character recognition model updated last as the trained character recognition model.
5. The examination-sheet-based image-text character recognition method of claim 2, wherein performing the image feature extraction operation and the image cropping operation on the clean image using the improved DB network in the character recognition model to obtain a cut image set comprises:
sequentially carrying out continuous convolution operation, one-time batch normalization operation and one-time function activation operation on the clean image for a preset number of times by utilizing an improved first EfficientNet network of an improved DB network in the character recognition model to obtain a characteristic matrix set;
performing feature fusion operation on the feature matrix set to obtain an image feature sequence set;
and carrying out full connection operation on the image feature sequence set by using a full connection layer to obtain a character region classification result, judging by using an image connection region according to the character region classification result to obtain a character region position result, and carrying out image interception on a character region in the clean image according to the character region position result to obtain a cut image set.
6. The examination-sheet-based image-text character recognition method of claim 1, wherein performing the processing operation based on character adjustment and watermark elimination on the document image to be audited according to the preset character cleaning strategy to obtain a clean image comprises:
performing Gaussian blurring and graying operation on the document image to be checked according to a preset character cleaning strategy to obtain a gray image;
carrying out self-adaptive binarization and morphological corrosion operations on the gray level image to obtain a character corrosion image, and framing a minimum external rectangle of the character corrosion image to obtain a character area image;
identifying the integrity of the edge characters of the text area image;
when the edge characters are complete, taking the character area image as a complete character image;
when the edge characters are incomplete, performing edge supplementing operation on the edge characters to obtain a complete character image;
performing image feature extraction operation on the complete character image to obtain an image feature set, and identifying whether the image contains a seal or a watermark;
and according to the seal detection and watermark identification results, carrying out pixel value adjustment operation of self-adaptive color channel separation, and eliminating the seal characters and the watermark characters to obtain a clean image.
7. An examination-sheet-based image-text character recognition device, the device comprising:
the image preprocessing module is used for acquiring a document image to be checked input by a user, and performing processing operation based on character adjustment and watermark elimination on the document image to be checked according to a preset character cleaning strategy to obtain a clean image;
the image feature extraction module is used for acquiring a pre-trained character recognition model according to a preset model construction strategy, and performing image feature extraction operation and image cutting operation on the clean image by utilizing an improved DB network in the character recognition model to obtain a cut image set;
and the image characteristic identification module is used for carrying out character identification on the cut image set by utilizing an improved character identification network of the character identification model to obtain a character identification result corresponding to the document image to be audited.
8. The examination-sheet-based image-text character recognition device of claim 7, wherein obtaining a pre-trained character recognition model according to a preset model construction strategy comprises:
acquiring a character detection network based on a DB network and comprising a convolutional neural network and an FPN structure, and a character recognition network based on a CRNN and comprising a convolutional neural network and a BiLSTM;
according to a preset model construction strategy, respectively carrying out batch normalization and activation function deletion operations of preset positions on a first EfficientNet network and a second EfficientNet network which are constructed in advance to obtain an improved first EfficientNet network and an improved second EfficientNet network;
replacing a residual error neural network in a pre-constructed DB network by using the improved first EfficientNet network to obtain an improved DB network;
replacing a convolutional neural network in the character recognition network by using the improved second EfficientNet network, and replacing a BiLSTM network in the character recognition network by using a pre-constructed transformer network to obtain an improved character recognition network;
constructing a character recognition model by utilizing the improved DB network and the improved character recognition network;
and acquiring pre-constructed focusing dice loss, cross entropy loss and preset auxiliary task configuration, and training the character recognition model to obtain a trained character recognition model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the examination-sheet-based image-text character recognition method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the examination-sheet-based image-text character recognition method according to any one of claims 1 to 6.
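The backbone-replacement steps of claim 8 (swapping the DB network's residual backbone for the modified EfficientNet) amount to substituting one feature-extraction module inside a fixed detector skeleton. The sketch below illustrates the pattern with placeholder classes; the class and attribute names are hypothetical and the real networks would be full convolutional models, not strings.

```python
class Backbone:
    """Placeholder feature extractor; stands in for ResNet or EfficientNet."""
    def __init__(self, name):
        self.name = name

    def extract(self, image):
        # A real backbone would return multi-scale feature maps here.
        return {"features": f"{self.name}({image})"}

class DBDetector:
    """Minimal stand-in for a DB text detector: backbone + FPN neck + DB head."""
    def __init__(self, backbone):
        self.backbone = backbone

    def detect(self, image):
        feats = self.backbone.extract(image)
        # A real detector would run FPN fusion and the differentiable
        # binarization head on the features; we just pass them through.
        return feats

# Claim 8's replacement step: the detector keeps its structure while the
# default residual backbone is swapped for the improved EfficientNet.
detector = DBDetector(Backbone("resnet"))
detector.backbone = Backbone("efficientnet_improved")
```

The same substitution pattern covers the recognition branch, where the CNN stem and the BiLSTM sequence module are each replaced (by the second EfficientNet and a Transformer, respectively) without altering the surrounding pipeline.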
CN202211231705.5A 2022-10-08 2022-10-08 Image-text character recognition method, device, equipment and storage medium based on examination sheet Pending CN115439850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211231705.5A CN115439850A (en) 2022-10-08 2022-10-08 Image-text character recognition method, device, equipment and storage medium based on examination sheet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211231705.5A CN115439850A (en) 2022-10-08 2022-10-08 Image-text character recognition method, device, equipment and storage medium based on examination sheet

Publications (1)

Publication Number Publication Date
CN115439850A true CN115439850A (en) 2022-12-06

Family

ID=84251871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211231705.5A Pending CN115439850A (en) 2022-10-08 2022-10-08 Image-text character recognition method, device, equipment and storage medium based on examination sheet

Country Status (1)

Country Link
CN (1) CN115439850A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311539A (en) * 2023-05-19 2023-06-23 亿慧云智能科技(深圳)股份有限公司 Sleep motion capturing method, device, equipment and storage medium based on millimeter waves
CN116311539B (en) * 2023-05-19 2023-07-28 亿慧云智能科技(深圳)股份有限公司 Sleep motion capturing method, device, equipment and storage medium based on millimeter waves

Similar Documents

Publication Publication Date Title
CN109902622B (en) Character detection and identification method for boarding check information verification
US10817741B2 (en) Word segmentation system, method and device
CN110705583A (en) Cell detection model training method and device, computer equipment and storage medium
CN112418216B (en) Text detection method in complex natural scene image
CN113283446B (en) Method and device for identifying object in image, electronic equipment and storage medium
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN114581646A (en) Text recognition method and device, electronic equipment and storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN115294483A (en) Small target identification method and system for complex scene of power transmission line
CN115439850A (en) Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN104966109A (en) Medical laboratory report image classification method and apparatus
CN114882204A (en) Automatic ship name recognition method
CN111414889B (en) Financial statement identification method and device based on character identification
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN106650758A (en) Identity card information decoding method based on image segmenting technology
CN113657279B (en) Bill image layout analysis method and device
CN112801960B (en) Image processing method and device, storage medium and electronic equipment
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN112541899A (en) Incomplete certificate detection method and device, electronic equipment and computer storage medium
CN113077048B (en) Seal matching method, system, equipment and storage medium based on neural network
CN115311451A (en) Image blur degree evaluation method and device, computer equipment and storage medium
Mali et al. Handwritten Equations Solver Using Convolution Neural Network
CN116740730A (en) Multi-class text sequence recognition method and device and electronic equipment
CN117058692A (en) Character recognition error correction method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231211

Address after: 519000, Room 114-845, Government Service Center, Building 2, Citizen Service Center, No. 868 Hengqin Gang'ao Avenue, Zhuhai City, Guangdong Province (centralized office area)

Applicant after: China Merchants Zhirong Supply Chain Service Co.,Ltd.

Address before: Building 2, Minghai Center, south of Chongqing Road, west of Hulunbeier Road, Tianjin Pilot Free Trade Zone (Dongjiang Bonded Port Area), 300000 Tianjin - 5,6-202

Applicant before: China Merchants Tongshang Financial Leasing Co.,Ltd.
