CN111832564A - Image character recognition method and system, electronic equipment and storage medium - Google Patents
- Publication number
- CN111832564A (application number CN202010698163.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- character recognition
- training
- text line
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The application discloses an image character recognition method, a system, an electronic device and a readable storage medium. The method comprises the following steps: acquiring a training sample comprising a text line sample image and a corresponding training label; constructing a residual block for fusing high-order features and low-order features, and a conversion module for mapping features to a high dimension and controlling the number of image channels; sequentially connecting the constructed residual blocks and conversion modules to generate a network model based on a residual network; training the network model with the training samples to generate a character recognition model; and inputting an image to be recognized into the character recognition model to obtain the corresponding character recognition result. The residual block avoids the gradient vanishing problem during network training; the conversion module simplifies operations and reduces the parameters the program requires, making the network lightweight. Because the network model is trained on text line sample images, the resulting character recognition model can recognize whole text lines without cutting out individual characters.
Description
Technical Field
The present disclosure relates to the field of pattern recognition and machine learning technologies, and in particular, to an image and text recognition method and system, an electronic device, and a computer-readable storage medium.
Background
Deep learning is an extension of machine learning: a family of new algorithms that grew out of neural network algorithms as model depth increased, accompanied by advances in big data and computing power. It supports processing of many data types, such as images, text, speech and sequences, and can perform classification, regression, prediction and the like. Common deep learning models include autoencoders, restricted Boltzmann machines, and convolutional neural networks.
Usually, the performance of a convolutional neural network (CNN) can be improved by increasing the number of neurons, but blindly enlarging the network does not produce the expected effect: the performance gained by adding network layers is not proportional to the added computational overhead. At present, theoretical analysis and quantitative research on network models are lacking, and how to optimize the structure of a network model and improve its performance is a problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide an image character recognition method, an image character recognition system, an electronic device and a computer readable storage medium, which can avoid the problem of gradient disappearance in the network training process, simplify the operation and realize the lightweight network.
In order to achieve the above object, the present application provides an image character recognition method, including:
acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images;
constructing a residual block for fusing the high-order features and the low-order features, and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
sequentially connecting the constructed residual block and the conversion module to generate a network model based on a residual network;
training the network model by using the training sample to generate a character recognition model;
and inputting the image to be recognized into the character recognition model to obtain a character recognition result corresponding to the image to be recognized.
Optionally, the obtaining a training sample includes:
acquiring a text line sample image, and preprocessing the text line sample image;
and coding the text in each text line sample image into a vector sequence to obtain a training label corresponding to each text line sample image.
Optionally, the preprocessing the text line sample image includes:
sequentially carrying out graying processing and brightness self-adaptive adjustment processing on the text line sample image to obtain an adjusted sample image;
performing enhancement processing on the adjusted sample image, wherein the enhancement processing comprises any one or a combination of distortion transformation, stretching transformation, perspective correction and noise addition;
and carrying out size adjustment operation or alignment operation on the sample image after the enhancement processing.
Optionally, the encoding the text in each text line sample image into a vector sequence to obtain the training label corresponding to each text line sample image includes:
and acquiring corresponding position indexes of all characters in the text line sample image in a preset dictionary, and generating a corresponding vector sequence as a training label.
Optionally, the training the network model by using the training samples to generate a character recognition model includes:
and training the network model by using an error back propagation algorithm according to the training sample to generate a character recognition model.
Optionally, the conversion module includes a batch normalization layer, an activation function layer, a convolution layer, and a pooling layer.
Optionally, after the training of the network model by using the training sample and generating a character recognition model, the method further includes:
obtaining a test sample, and testing the character recognition model by using the test sample to obtain a test recognition result;
and correcting the character recognition model by combining the actual character content of the test sample and the test recognition result.
Optionally, the inputting the image to be recognized into the character recognition model to obtain the character recognition result corresponding to the image to be recognized includes:
inputting the image to be recognized into the character recognition model, and deducing to obtain a corresponding feature matrix by utilizing forward operation;
and carrying out dimension transformation on the characteristic matrix, and decoding the transformed data according to a preset dictionary to obtain a corresponding character recognition result.
To achieve the above object, the present application provides an image character recognition system, comprising:
the sample acquisition module is used for acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images;
the module construction module is used for constructing a residual block for fusing high-order features and low-order features and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
the module connecting module is used for sequentially connecting the constructed residual block and the conversion module to generate a network model based on a residual network;
the model training module is used for training the network model by using the training sample to generate a character recognition model;
and the character recognition module is used for inputting the image to be recognized into the character recognition model to obtain a character recognition result corresponding to the image to be recognized.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
a processor for implementing the steps of any of the image character recognition methods disclosed above when executing the computer program.
To achieve the above object, the present application provides a computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of any one of the image character recognition methods disclosed in the foregoing.
According to the scheme, the image character recognition method provided by the application comprises the following steps: acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images; constructing a residual block for fusing high-order features and low-order features, and a conversion module for mapping features to a high dimension and controlling the number of image channels; sequentially connecting the constructed residual blocks and conversion modules to generate a network model based on a residual network; training the network model with the training samples to generate a character recognition model; and inputting an image to be recognized into the character recognition model to obtain the corresponding character recognition result. In this method, the residual block and the conversion module are constructed first. The residual block fuses high-order and low-order features, which avoids the gradient vanishing problem during network training; the conversion module maps features to a high dimension and controls the number of image channels, which simplifies operations, reduces the parameters the program requires, and makes the network lightweight. The network model built from these residual blocks and conversion modules is trained on text line sample images, so the resulting character recognition model can recognize whole text lines without cutting out individual characters.
The application also discloses an image character recognition system, an electronic device and a computer readable storage medium, which can also realize the technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an image character recognition method disclosed in an embodiment of the present application;
fig. 2 is a flowchart of a specific image character recognition method disclosed in the embodiment of the present application;
fig. 3 is a schematic structural diagram of a specific residual block structure disclosed in the embodiment of the present application;
fig. 4 is a schematic structural diagram of a specific conversion module disclosed in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a specific network model based on a residual error network disclosed in an embodiment of the present application;
fig. 6 is a structural diagram of an image character recognition system disclosed in an embodiment of the present application;
fig. 7 is a block diagram of an electronic device disclosed in an embodiment of the present application;
fig. 8 is a block diagram of another electronic device disclosed in the embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an image character recognition method disclosed in the embodiment of the present application includes:
s101: acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images;
in the embodiment of the application, a training sample is obtained first. Specifically, a text line sample image may be obtained first, and the text line sample image may be preprocessed; and coding the text in each text line sample image into a vector sequence to obtain a training label corresponding to each text line sample image. The training sample comprises a sample image and a training label corresponding to the image, a common data set can be used, a sample generator can be used for synthesizing data, and the text line data can comprise a common font and a common background.
In a specific implementation, the process of preprocessing the text line sample image may include: carrying out graying processing and brightness self-adaptive adjustment processing on the text line sample image in sequence to obtain an adjusted sample image; performing enhancement processing on the adjusted sample image, which may include but is not limited to distortion transformation, stretching transformation, perspective correction and noise addition; and carrying out a size adjustment operation or an alignment operation on the enhanced sample image.
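The graying step above can be sketched in plain Python. The patent does not specify which graying formula is used, so the standard ITU-R BT.601 luminance weights below are an assumption, and the function names are illustrative:

```python
def to_gray(pixel_rgb):
    # ITU-R BT.601 luminance weights -- an assumption; the patent
    # only says "graying processing" without giving a formula.
    r, g, b = pixel_rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def gray_image(img_rgb):
    """Convert an H x W image of (R, G, B) tuples to H x W grayscale."""
    return [[to_gray(px) for px in row] for row in img_rgb]
```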
As a possible implementation, when performing the resizing operation on a sample image, the image size may be changed to (W, H) using bilinear interpolation, where W is the adjusted image width and H is the adjusted image height. When aligning sample images, the maximum width of all sample images in the batch may be selected and blank images generated at that width; each image to be aligned is randomly pasted at some position within its blank image, and the remaining blank area is filled with the image's boundary pixel values. This keeps the sizes of a batch of sample images consistent while ensuring the images are not distorted.
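The alignment operation described above can be sketched as follows. The nested-list grayscale image representation and the `align_batch` helper name are illustrative, not from the patent:

```python
import random

def align_batch(images, seed=None):
    """Pad each grayscale image (a list of pixel rows) to the batch's
    maximum width: paste the original at a random horizontal offset and
    fill the remaining columns with the row's own border pixel values."""
    rng = random.Random(seed)
    max_w = max(len(img[0]) for img in images)
    aligned = []
    for img in images:
        w = len(img[0])
        offset = rng.randint(0, max_w - w)  # random paste position
        aligned.append([[row[0]] * offset + row
                        + [row[-1]] * (max_w - w - offset)
                        for row in img])
    return aligned
```

Filling with border values rather than a constant keeps the padded regions visually continuous with the text image, so the batch is size-consistent without distorting any sample.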
Specifically, when encoding the text in each text line sample image into a vector sequence to obtain the corresponding training label, the position index of each character of the text in a preset dictionary may be obtained, and the resulting vector sequence used as the training label. That is, the embodiment of the present application may provide a preset dictionary in advance, which may include characters collected beforehand together with a blank character. The text label corresponding to each image is encoded according to this dictionary, converting each character of the label into its position index in the dictionary. If no valid character corresponding to a field is found in the dictionary, the blank character is used in its place.
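A minimal sketch of this label encoding; the example dictionary below is hypothetical, with index 0 reserved for the blank character:

```python
def encode_label(text, dictionary, blank=0):
    """Map each character of a text-line label to its position index in
    the preset dictionary; characters missing from the dictionary are
    replaced by the blank index, as described above."""
    index = {ch: i for i, ch in enumerate(dictionary)}
    return [index.get(ch, blank) for ch in text]

# hypothetical preset dictionary; entry 0 is the blank character
dictionary = ["<blank>", "a", "b", "c"]
```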
S102: constructing a residual block for fusing the high-order features and the low-order features, and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
In this step, a residual block and a conversion module are constructed. It should be noted that the residual block is mainly used for fusing high-order features and low-order features, and may be regarded as an integrated model assembled from a set of paths; the bypass in the residual block adds a channel for information flow, so that feature information from the front end of the network is better utilized, and gradient vanishing or explosion during network training is effectively suppressed. The depth of the residual block can be freely tailored at different locations in the network.
The conversion module is used for mapping the features to a high dimension and controlling the number of channels of the image. In particular, the conversion module may include a batch normalization layer, an activation function layer, a convolution layer, and a pooling layer. The batch normalization layer is used for accelerating the convergence speed of the model and simplifying the parameter initialization process; the activation function layer can introduce nonlinear transformation to improve the expression capability of the network; the convolutional layer can reduce the number of channels of the feature map through controlling parameters, and complete deep extraction of features at the same time; the pooling layer is used for reducing feature dimensionality, reducing computing resources and accelerating computing efficiency.
S103: sequentially connecting the constructed residual block and the conversion module to generate a network model based on a residual network;
it is to be understood that after the residual block and the transformation module are constructed, the constructed residual block and transformation module can be used to connect to generate a network model based on the residual network. It should be noted that, when modules are connected, the number of the residual blocks and the number of the conversion modules can be selected according to actual requirements, that is, there may be a plurality of residual blocks and a plurality of conversion modules, and the connection rule is that a conversion module is connected above and below the residual block and ends with the residual block.
S104: training the network model by using the training sample to generate a character recognition model;
after the initial network model is constructed and generated, the network model can be trained by using the training samples to generate the finally required character recognition model. Specifically, the network model can be trained by using an error back propagation algorithm according to the training samples to generate the character recognition model.
As a preferred implementation, after the network model based on the residual network is constructed, the embodiment of the present application may further test the network model. Specifically, after the model is generated, a test sample may be obtained. Like a training sample, it comprises test sample images and labels corresponding to those images, which may come from a public data set or be newly synthesized by a sample generator. The character recognition model is tested with the test samples to obtain test recognition results, and the preliminarily generated model is then corrected by comparing the actual character content of the test samples with the test recognition results, so as to improve the recognition accuracy of the model.
S105: and inputting the image to be recognized into the character recognition model to obtain a character recognition result corresponding to the image to be recognized.
After the character recognition model is generated, the image to be recognized is input into the character recognition model to realize the recognition of the image characters, and a corresponding character recognition result is obtained.
In a specific implementation, the process of inputting the image to be recognized into the character recognition model to obtain the corresponding character recognition result may include: inputting an image to be recognized into a character recognition model, and deducing to obtain a corresponding characteristic matrix by utilizing forward operation; and carrying out dimension transformation on the characteristic matrix, and decoding the transformed data according to a preset dictionary to obtain a corresponding character recognition result.
According to the scheme, the image character recognition method provided by the application comprises the following steps: acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images; constructing a residual block for fusing high-order features and low-order features, and a conversion module for mapping features to a high dimension and controlling the number of image channels; sequentially connecting the constructed residual blocks and conversion modules to generate a network model based on a residual network; training the network model with the training samples to generate a character recognition model; and inputting an image to be recognized into the character recognition model to obtain the corresponding character recognition result. In this method, the residual block and the conversion module are constructed first. The residual block fuses high-order and low-order features, which avoids the gradient vanishing problem during network training; the conversion module maps features to a high dimension and controls the number of image channels, which simplifies operations, reduces the parameters the program requires, and makes the network lightweight. The network model built from these residual blocks and conversion modules is trained on text line sample images, so the resulting character recognition model can recognize whole text lines without cutting out individual characters.
The embodiment of the application discloses a specific image character recognition method, and compared with the previous embodiment, the embodiment further explains and optimizes the technical scheme. Referring to fig. 2, specifically:
step 1: preparing training sample images and testing sample images, and encoding text labels corresponding to the images into vector sequences as labels. Specifically, all sample data may be synthesized using a sample generator, divided into a training sample set and a test sample set on a 9:1 scale.
Step 2: the training sample images are pre-processed including, but not limited to, graying, luminance adaptation, warping, resizing, and alignment.
Step 3: Constructing the residual block structure. The depth of the residual block can be freely customized at different locations in the network; referring to fig. 3, let the depth of the residual block structure be 3.
Step 3.1: the first layer is a batch normalization layer, with num_features equal to the number m of input feature map channels;
Step 3.2: the second layer is an activation function layer, with inplace set to True;
Step 3.3: the third layer is a convolution layer, with out_channels a fixed value n, kernel_size 3×3, stride 1×1, padding 1×1, and bias False;
Step 3.4: fuse the input data of step 3.1 with the output data of step 3.3 along dim 1; the number of output channels is m + n;
Step 3.5: the fourth layer is a batch normalization layer, with num_features equal to the number m + n of output channels of step 3.4;
Step 3.6: the fifth layer is an activation function layer, with inplace set to True;
Step 3.7: the sixth layer is a convolution layer, with out_channels a fixed value n, kernel_size 3×3, stride 1×1, padding 1×1, and bias False;
Step 3.8: fuse the input data of step 3.5 with the output data of step 3.7 along dim 1; the number of output channels is m + 2n;
Step 3.9: the seventh layer is a batch normalization layer, with num_features equal to the number m + 2n of output channels of step 3.8;
Step 3.10: the eighth layer is an activation function layer, with inplace set to True;
Step 3.11: the ninth layer is a convolution layer, with out_channels a fixed value n, kernel_size 3×3, stride 1×1, padding 1×1, and bias False;
Step 3.12: fuse the input data of step 3.9 with the output data of step 3.11 along dim 1; the number of output channels is m + 3n.
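Because each fusion in steps 3.4, 3.8 and 3.12 concatenates along the channel dimension (dim 1), the channel count grows by n per convolution unit. A small sketch of this arithmetic (the function name is illustrative) also reproduces the output channel counts quoted later in step 5, e.g. 64 input channels and depth 6 giving 256:

```python
def residual_block_channels(m, n=32, depth=3):
    """Trace the channel count through a residual block: each
    BN-activation-convolution unit emits n channels, which the block
    concatenates with its input, so m -> m + n -> ... -> m + depth*n."""
    channels = m
    for _ in range(depth):
        channels += n  # concatenation adds the unit's n output channels
    return channels
```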
Step 4: Constructing the conversion module, as shown in fig. 4; it maps features to a high dimension and controls the number of image channels, reducing operation resources and improving operation efficiency.
Step 4.1: the first layer is a batch normalization layer, with num_features equal to the number m of input data channels;
Step 4.2: the second layer is an activation function layer, with inplace set to True;
Step 4.3: the third layer is a convolution layer, with out_channels half of input_channels, kernel_size 1×1, stride 1×1, and bias False;
Step 4.4: the fourth layer is a pooling layer using mean pooling, with kernel_size 2×1, stride 2×1, and padding 0×0.
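The conversion module of steps 4.1 to 4.4 halves the channel count with its 1×1 convolution and, assuming the kernel dimensions are ordered height×width, halves the feature-map height with the 2×1 mean pooling while preserving width — so the horizontal length of the text line is kept. A sketch of the shape arithmetic (function name illustrative):

```python
def conversion_module_shape(channels, height, width):
    """Output (C, H, W) of the conversion module: the 1x1 convolution
    halves the channels, and 2x1 mean pooling with stride 2x1 halves
    the height while leaving the width unchanged."""
    return channels // 2, height // 2, width
```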
Step 5: Constructing the network model based on the residual network by sequentially connecting residual blocks and conversion modules into a network, as shown in fig. 5. The embodiment of the present application uses 4 residual blocks and 3 conversion modules in total, and the depths of the residual blocks are 6, 12, 24 and 16 respectively.
step 5.1: the first layer is a convolutional layer, input _ channels is 1, output _ channels is 64, kernel _ size is 7, stride is 2, padding is 3;
step 5.2: the second layer is the batch normalization layer, num _ features equals 64;
step 5.3: the third layer is an activation function layer, and the infionce is True;
step 5.4: the fourth layer is a pooling layer, maximum pooling is adopted, kernel _ size is 3 × 3, stride is 2 × 2, padding is 1 × 1;
step 5.5: the fifth layer is a residual block with a depth of 6, the out _ channels of the convolutional layers is a fixed value of 32, and the number of final output channels is 256;
step 5.6: the sixth layer is a conversion module, the number of input channels is 256, and the number of output channels is 128;
step 5.7: the seventh layer is a residual block with a depth of 12, the out _ channels of the convolutional layers is a fixed value of 32, and the number of final output channels is 512;
step 5.8: the eighth layer is a conversion module, the number of input channels is 512, and the number of output channels is 256;
step 5.9: the ninth layer is a residual block with a depth of 24, the out _ channels of the convolutional layers is a fixed value of 32, and the final output channel number is 1024;
step 5.10: the tenth layer is a conversion module, the number of input channels is 1024, and the number of output channels is 512;
step 5.11: the eleventh layer is a residual block with a depth of 16, the out _ channels of the convolutional layers are a fixed value of 32, and the final output channel number is 1024;
step 5.12: the twelfth layer is a batch normalization layer, num _ features equals 1024;
step 5.13: the thirteenth layer is an activation function layer, and the infilace is True;
step 5.14: the fourteenth layer is a convolutional layer, out _ channels is 512, kernel _ size is 1 × 1, stride is 1 × 1, bias is False;
step 5.15: the fifteenth layer is a convolutional layer, input _ channels is 512, output _ channels is 5877, kernel _ size is 1 × 1, and stride is 1 × 1.
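The whole of step 5 can be sketched as follows. Note a labeled assumption: the application calls the blocks "residual blocks", but the stated channel arithmetic (64 + 6 × 32 = 256, then halved by each conversion module, and so on) matches a densely connected block with growth rate 32, so this sketch fuses low-order and high-order features by concatenation; the BN-ReLU-Conv 3×3 recipe inside each layer is likewise an assumption, and the conversion module is repeated here for self-containment:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer inside a block; each layer adds `growth` channels
    (assumed BN-ReLU-Conv3x3 recipe, concatenation-based fusion)."""
    def __init__(self, in_channels, growth=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # fuse the low-order input with the high-order block output
        return torch.cat([x, self.block(x)], dim=1)

def dense_block(in_channels, depth, growth=32):
    layers, c = [], in_channels
    for _ in range(depth):
        layers.append(DenseLayer(c, growth))
        c += growth
    return nn.Sequential(*layers), c

class Transition(nn.Module):
    """Conversion module of step 4: halve the channels, halve the height."""
    def __init__(self, in_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1)),
        )

    def forward(self, x):
        return self.block(x)

def build_backbone(num_classes=5877):
    layers = [
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),  # steps 5.1-5.4
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    ]
    c = 64
    for depth in (6, 12, 24):          # steps 5.5-5.10: block + conversion pairs
        block, c = dense_block(c, depth)
        layers += [block, Transition(c)]
        c //= 2
    block, c = dense_block(c, 16)      # step 5.11: final block, 1024 channels out
    layers += [
        block,
        nn.BatchNorm2d(c), nn.ReLU(inplace=True),      # steps 5.12-5.13
        nn.Conv2d(c, 512, kernel_size=1, bias=False),  # step 5.14
        nn.Conv2d(512, num_classes, kernel_size=1),    # step 5.15
    ]
    return nn.Sequential(*layers)
```

For a grayscale text-line image of height 32, the stem reduces the height to 8 and the three conversion modules reduce it further to 1, so the output is a per-column distribution over the 5877 dictionary classes along the width of the line.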
Step 6: the training labels of step 1 and the training sample images of step 2 are input into the network defined in step 5, and the network is trained by an error back-propagation algorithm. The objective function may adopt a connectionist temporal classification (CTC) loss function, which measures the discrepancy between the real labels of the input images and the model prediction results. Preferably, the optimizer uses Adadelta with a learning rate equal to 0.01 and a batch size equal to 64, iterating continuously until the character recognition model is obtained.
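A minimal training-step sketch for step 6, using PyTorch's built-in CTC loss. The function name and the (N, C, 1, W) backbone output layout are assumptions for illustration:

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, ctc, images, targets, target_lengths):
    """One error back-propagation iteration with the CTC loss (step 6).
    `images` is a batch of grayscale text-line images; `targets` holds the
    dictionary indices of each label (the step-1 training labels)."""
    logits = model(images)                        # backbone output: (N, C, 1, W)
    logits = logits.squeeze(2).permute(2, 0, 1)   # -> (T, N, C), as nn.CTCLoss expects
    log_probs = logits.log_softmax(dim=2)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the hyper-parameters stated above this would be assembled as `optimizer = torch.optim.Adadelta(model.parameters(), lr=0.01)` and `ctc = nn.CTCLoss(blank=0)`, iterating over batches of 64; the choice of blank index 0 is an assumption.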
Step 7: after training is finished, the test sample image is input into the character recognition model, and the final feature matrix is obtained by forward-operation inference.
Step 8: dimension transformation is performed on the feature matrix obtained in step 7, and decoding is performed according to a dictionary to obtain the final character recognition result;
step 8.1: acquiring the maximum value of the feature matrix with dim being 2;
step 8.2: performing dimension transformation on the maximum value matrix of step 8.1, interchanging dimensions 0 and 1;
step 8.3: decoding the data interchanged in step 8.2 according to a dictionary, and converting the vector into a corresponding character recognition result.
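Steps 8.1-8.3 can be sketched as follows. The (N, T, C) layout of the feature matrix is an assumption, as the application does not spell it out; collapsing repeated indices and dropping blanks follows standard CTC greedy decoding:

```python
import torch

def ctc_greedy_decode(features, idx_to_char, blank=0):
    """features: (N, T, C) scores per sample, time step and dictionary class.
    idx_to_char: dictionary mapping position indices back to characters."""
    best = features.argmax(dim=2)   # step 8.1: maximum over dim 2 (the classes)
    best = best.permute(1, 0)       # step 8.2: interchange dimensions 0 and 1 -> (T, N)
    results = []
    for seq in best.t():            # one index sequence per sample
        chars, prev = [], blank
        for idx in seq.tolist():    # step 8.3: dictionary lookup
            if idx != blank and idx != prev:
                chars.append(idx_to_char[idx])
            prev = idx
        results.append("".join(chars))
    return results
```

For example, a per-step argmax sequence of [1, 1, blank, 2, 2] would decode to the two-character string formed by dictionary entries 1 and 2.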
Therefore, the image character recognition method can recognize whole text lines without cutting them into single characters, and achieves a better recognition rate for characters with complex backgrounds. Furthermore, the residual network structure is applied to the field of character recognition without using a complex RNN structure, which simplifies the operation, reduces the parameters required by the program, and realizes a lightweight network. In addition, the method can directly recognize the input text line image end to end, without complicated processing and feature extraction of the image to be recognized, and its recognition speed is significantly higher than that of the CRNN character model.
In the following, an image character recognition system provided by an embodiment of the present application is introduced, and an image character recognition system described below and an image character recognition method described above may be referred to each other.
Referring to fig. 6, an image character recognition system provided in an embodiment of the present application includes:
a sample obtaining module 201, configured to obtain a training sample, where the training sample includes text line sample images and training labels corresponding to the text line sample images;
a module construction module 202 for constructing a residual block for fusing a high-order feature and a low-order feature, and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
a module connection module 203, configured to connect the constructed residual block and the conversion module in sequence, and generate a network model based on a residual network;
the model training module 204 is configured to train the network model by using the training samples to generate a character recognition model;
and the character recognition module 205 is configured to input the image to be recognized into the character recognition model, so as to obtain a character recognition result corresponding to the image to be recognized.
For the specific implementation process of the modules 201 to 205, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
The present application further provides an electronic device, and as shown in fig. 7, an electronic device provided in an embodiment of the present application includes:
a memory 100 for storing a computer program;
the processor 200, when executing the computer program, may implement the steps provided by the above embodiments.
Specifically, the memory 100 includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions. The processor 200 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip; it provides computing and control capability for the electronic device, and when executing the computer program stored in the memory 100, it may implement the steps of the image character recognition method disclosed in any of the foregoing embodiments.
On the basis of the above embodiment, as a preferred implementation, referring to fig. 8, the electronic device further includes:
and an input interface 300 connected to the processor 200, for acquiring computer programs, parameters and instructions imported from the outside, and storing the computer programs, parameters and instructions into the memory 100 under the control of the processor 200. The input interface 300 may be connected to an input device for receiving parameters or instructions manually input by a user. The input device may be a touch layer covered on a display screen, or a button, a track ball or a touch pad arranged on a terminal shell, or a keyboard, a touch pad or a mouse, etc.
And a display unit 400 connected to the processor 200 for displaying data processed by the processor 200 and for displaying a visualized user interface. The display unit 400 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like.
And a network port 500 connected to the processor 200 for establishing communication connections with external terminal devices. The communication technology adopted may be wired or wireless, such as Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High Definition Multimedia Interface (HDMI), wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy, or IEEE 802.11s-based communication technology, and the like.
While fig. 8 illustrates only an electronic device having the components 100 to 500, those skilled in the art will appreciate that the configuration illustrated in fig. 8 is not intended to limit the electronic device, which may include fewer or more components than those illustrated, may combine some components, or may arrange the components differently.
The present application also provides a computer-readable storage medium, which may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. The storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the image character recognition method disclosed in any of the foregoing embodiments.
When the character recognition model is constructed, a residual block and a conversion module are first constructed. The residual block fuses high-order features and low-order features, which avoids the problem of gradient disappearance during network training; the conversion module maps the features to a high dimension and controls the number of image channels, which simplifies the operation, reduces the parameters required by the program, and realizes a lightweight network. The network model is then constructed from the residual blocks and conversion modules and trained with the text line sample images, and the resulting character recognition model can recognize whole text lines without cutting them into single characters.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. An image character recognition method is characterized by comprising the following steps:
acquiring training samples, wherein the training samples comprise text line sample images and training labels corresponding to the text line sample images;
constructing a residual block for fusing the high-order features and the low-order features, and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
sequentially connecting the constructed residual block and the conversion module to generate a network model based on a residual network;
training the network model by using the training sample to generate a character recognition model;
and inputting the image to be recognized into the character recognition model to obtain a character recognition result corresponding to the image to be recognized.
2. The image text recognition method of claim 1, wherein the obtaining training samples comprises:
acquiring a text line sample image, and preprocessing the text line sample image;
and coding the text in each text line sample image into a vector sequence to obtain a training label corresponding to each text line sample image.
3. The image word recognition method of claim 2, wherein the pre-processing the text line sample image comprises:
sequentially carrying out graying processing and brightness self-adaptive adjustment processing on the text line sample image to obtain an adjusted sample image;
performing enhancement processing on the adjusted sample image, wherein the enhancement processing comprises any one or combination of distortion transformation, stretching transformation, perspective correction and noise point adding processing;
and carrying out size adjustment operation or alignment operation on the sample image after the enhancement processing.
4. The image word recognition method of claim 2, wherein the encoding the text in each of the text line sample images into a vector sequence to obtain the training label corresponding to each of the text line sample images comprises:
and acquiring corresponding position indexes of all characters in the text line sample image in a preset dictionary, and generating a corresponding vector sequence as a training label.
5. The image character recognition method of claim 1, wherein the training the network model by using the training samples to generate a character recognition model comprises:
and training the network model by using an error back propagation algorithm according to the training sample to generate a character recognition model.
6. The image text recognition method of claim 1, wherein the conversion module comprises a batch normalization layer, an activation function layer, a convolution layer, and a pooling layer.
7. The image character recognition method of any one of claims 1 to 6, wherein after the training of the network model by using the training samples to generate a character recognition model, the method further comprises:
obtaining a test sample, and testing the character recognition model by using the test sample to obtain a test recognition result;
and correcting the character recognition model by combining the actual character content of the test sample and the test recognition result.
8. The image character recognition method of claim 7, wherein the inputting the image to be recognized into the character recognition model to obtain the character recognition result corresponding to the image to be recognized comprises:
inputting the image to be recognized into the character recognition model, and deducing to obtain a corresponding feature matrix by utilizing forward operation;
and carrying out dimension transformation on the characteristic matrix, and decoding the transformed data according to a preset dictionary to obtain a corresponding character recognition result.
9. An image text recognition system, comprising:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring training samples, and the training samples comprise text line sample images and training labels corresponding to the text line sample images;
the module construction module is used for constructing a residual block for fusing high-order features and low-order features and a conversion module for mapping the features to a high dimension and controlling the number of image channels;
the module connecting module is used for sequentially connecting the constructed residual block and the conversion module to generate a network model based on a residual network;
the model training module is used for training the network model by using the training sample to generate a character recognition model;
and the character recognition module is used for inputting the image to be recognized into the character recognition model to obtain a character recognition result corresponding to the image to be recognized.
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the image text recognition method according to any one of claims 1 to 8 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010698163.7A CN111832564A (en) | 2020-07-20 | 2020-07-20 | Image character recognition method and system, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111832564A true CN111832564A (en) | 2020-10-27 |
Family
ID=72924157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010698163.7A Pending CN111832564A (en) | 2020-07-20 | 2020-07-20 | Image character recognition method and system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111832564A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993109A (en) * | 2019-03-29 | 2019-07-09 | 成都信息工程大学 | Image character recognition method |
CN109993164A (en) * | 2019-03-20 | 2019-07-09 | 上海电力学院 | A kind of natural scene character recognition method based on RCRNN neural network |
CN110032732A (en) * | 2019-03-12 | 2019-07-19 | 平安科技(深圳)有限公司 | A kind of text punctuate prediction technique, device, computer equipment and storage medium |
US20190272438A1 (en) * | 2018-01-30 | 2019-09-05 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for detecting text |
CN111144345A (en) * | 2019-12-30 | 2020-05-12 | 泰康保险集团股份有限公司 | Character recognition method, device, equipment and storage medium |
CN111401155A (en) * | 2020-02-28 | 2020-07-10 | 北京大学 | Image identification method of residual error neural network based on implicit Euler jump connection |
CN111428718A (en) * | 2020-03-30 | 2020-07-17 | 南京大学 | Natural scene text recognition method based on image enhancement |
2020-07-20: CN application CN202010698163.7A filed; published as CN111832564A (status: pending).
Non-Patent Citations (1)
Title |
---|
KAIMING HE, XIANGYU ZHANG, SHAOQING REN, JIAN SUN: "Deep Residual Learning for Image Recognition", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, pages 770 - 778 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112766051A (en) * | 2020-12-29 | 2021-05-07 | 有米科技股份有限公司 | Attention-based image character recognition method and device |
CN112801085A (en) * | 2021-02-09 | 2021-05-14 | 沈阳麟龙科技股份有限公司 | Method, device, medium and electronic equipment for recognizing characters in image |
CN113642480A (en) * | 2021-08-17 | 2021-11-12 | 苏州大学 | Character recognition method, device, equipment and storage medium |
CN113610838A (en) * | 2021-08-25 | 2021-11-05 | 华北电力大学(保定) | Bolt defect data set expansion method |
CN114241495A (en) * | 2022-02-28 | 2022-03-25 | 天津大学 | Data enhancement method for offline handwritten text recognition |
CN114241495B (en) * | 2022-02-28 | 2022-05-03 | 天津大学 | Data enhancement method for off-line handwritten text recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||