CN112801085A - Method, device, medium and electronic equipment for recognizing characters in image - Google Patents


Info

Publication number
CN112801085A
CN112801085A (application CN202110176821.0A)
Authority
CN
China
Prior art keywords
character
image
training
characters
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110176821.0A
Other languages
Chinese (zh)
Inventor
冯煜博
徐娇
王广普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Linlong Technology Co ltd
Original Assignee
Shenyang Linlong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Linlong Technology Co., Ltd.
Priority to CN202110176821.0A
Publication of CN112801085A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition


Abstract

The embodiment of the invention discloses a method, a device, a medium and electronic equipment for recognizing characters in an image. The method comprises the following steps: acquiring a character image area to be recognized; if the character image area to be recognized contains characters, extracting character features; inputting the character features into a pre-training language model so that the pre-training language model predicts each character, obtaining a character prediction result, wherein the pre-training language model is trained on pre-constructed covered training samples; and taking the character prediction result as the recognition result of the characters in the image. With the technical scheme provided by the application, characters can be recognized accurately even in low-quality images.

Description

Method, device, medium and electronic equipment for recognizing characters in image
Technical Field
The embodiment of the invention relates to the technical field of image recognition, in particular to a method, a device, a medium and electronic equipment for recognizing characters in an image.
Background
With the development of science and technology, image processing has become a part of many fields. In some scenes, characters in an image often need to be converted into text content, which requires enhancement processing and character recognition on the image. Enhancement processing mainly comprises means such as image denoising, image super-resolution and image deblurring; character recognition is then carried out on this basis, so that the aim of automatically recognizing the text in the image can be fulfilled. However, in some use scenes involving low-quality images, characters may be heavily blurred or even damaged, so the error rate of the character extraction process is very high; if manual verification is needed, the efficiency of character recognition is greatly reduced and its cost is increased.
Disclosure of Invention
The embodiment of the invention provides a method, a device, a medium and electronic equipment for recognizing characters in an image, which can accurately recognize characters even in low-quality images.
In a first aspect, an embodiment of the present invention provides a method for recognizing characters in an image, where the method includes:
acquiring a character image area to be identified;
if the character image area to be identified contains characters, extracting character features;
inputting the character features into a pre-training language model for predicting each character by the pre-training language model to obtain a character prediction result; the pre-training language model is obtained by training based on a pre-constructed covered training sample;
and taking the character prediction result as the recognition result of the characters in the image.
Further, extracting character features includes:
and extracting character features of the image to be recognized by using a feature extraction layer consisting of a convolutional neural network and a pooling layer.
Further, extracting the character features of the image to be recognized by using the feature extraction layer composed of the convolutional neural network and the pooling layer comprises the following steps:
performing feature extraction on the image to be identified by using a convolutional neural network to obtain feature mapping;
performing maximum pooling on the extracted feature mapping by using a pooling layer to obtain refined feature mapping;
and converting the refined feature mapping into a feature sequence.
Further, before converting the refined feature map into a feature sequence, the method further comprises:
carrying out normalization processing on the refined feature mapping to obtain a normalization result;
correspondingly, converting the refined feature mapping into a feature sequence comprises the following steps:
and converting the normalization result into a characteristic sequence.
Further, the training process of the pre-training language model includes:
obtaining a covered training sample; the covered training sample comprises partial covering and/or full covering of a single character;
dividing the training samples into a training set and a test set;
inputting training samples of the training set into an initial network model for model training, so as to predict the current character through the correlation coefficients between the context and the currently predicted character;
and if the initial network model meets the preset conditions after being tested by the training samples of the test set, determining the initial network model as a pre-training language model.
In a second aspect, an embodiment of the present invention further provides an apparatus for recognizing characters in an image, including:
the character image area acquisition module is used for acquiring a character image area to be identified;
the character feature extraction module is used for extracting character features if the character image area to be identified contains characters;
the character prediction result determining module is used for inputting the character features into a pre-training language model and predicting each character by the pre-training language model to obtain a character prediction result; the pre-training language model is obtained by training based on a pre-constructed covered training sample;
and the recognition result determining module is used for taking the character prediction result as a recognition result of characters in the image.
Further, the text feature extraction module includes:
and the feature extraction unit is used for extracting the character features of the image to be recognized by using a feature extraction layer consisting of the convolutional neural network and the pooling layer.
Further, the feature extraction unit is specifically configured to:
performing feature extraction on the image to be identified by using a convolutional neural network to obtain feature mapping;
performing maximum pooling on the extracted feature mapping by using a pooling layer to obtain refined feature mapping;
and converting the refined feature mapping into a feature sequence.
Further, the text feature extraction module further includes:
the normalization processing unit is used for performing normalization processing on the refining feature mapping to obtain a normalization result;
correspondingly, converting the refined feature mapping into a feature sequence comprises the following steps:
and converting the normalization result into a characteristic sequence.
Further, the training process of the pre-training language model includes:
obtaining a covered training sample; the covered training sample comprises partial covering and/or full covering of a single character;
dividing the training samples into a training set and a test set;
inputting training samples of the training set into an initial network model for model training, so as to predict the current character through the correlation coefficients between the context and the currently predicted character;
and if the initial network model meets the preset conditions after being tested by the training samples of the test set, determining the initial network model as a pre-training language model.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for recognizing characters in an image according to an embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor executes the computer program to implement the method for recognizing characters in an image according to the embodiment of the present application.
According to the technical scheme provided by the embodiment of the application, a character image area to be recognized is acquired; if the character image area to be recognized contains characters, character features are extracted; the character features are input into a pre-training language model so that the pre-training language model predicts each character, obtaining a character prediction result, the pre-training language model being trained on pre-constructed covered training samples; and the character prediction result is taken as the recognition result of the characters in the image. With this technical scheme, characters can be recognized accurately even in low-quality images.
Drawings
Fig. 1 is a flowchart of a method for recognizing characters in an image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a low-quality image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process of recognizing characters in an image according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a model used in the recognition process according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a device for recognizing characters in an image according to a second embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Image enhancement: during image acquisition, transmission and storage, complex real-world imaging factors (such as noise, blur and distortion) reduce the perceived visual quality of the image. To restore a low-quality image to a high-quality one, researchers have proposed many methods, among which technologies such as "image denoising", "image super-resolution" and "image deblurring" are representative.
Character Recognition (OCR): that is, optical character recognition, a technology for automatically recognizing text in images. It has a long research history and a wide application range, such as document digitization, identity authentication, digital financial systems and license plate recognition. In addition, in a factory, automatically extracting the text information of products makes it possible to manage them more conveniently. Students' offline homework or test papers can be digitized through an OCR system, making communication between teachers and students more effective.
In the traditional scheme, image enhancement can only improve the quality of images containing objects or people; it cannot perform restoration operations, such as completion, on incomplete characters in an image. Character recognition can only recognize characters of relatively high imaging quality in an image and cannot handle incomplete characters, nor text in noisy, blurred or low-resolution pictures. Therefore, with conventional schemes it is difficult to perform accurate character recognition on some low-quality images, especially images in which portions of the text are missing.
This is because language processing belongs to the category of cognitive intelligence, while image processing belongs to the category of perceptual intelligence. Deep learning driven only by increases in data volume and computing power cannot make a model evolve from perceptual intelligence to cognitive intelligence; knowledge needs to be introduced to assist the model's learning before the model can improve.
Traditional character recognition research does not consider real industrial problems such as low-resolution images or printed characters that are damaged or occluded, so research on these related problems is lacking in academia.
Some recognition models are likewise held back by the lack of practical, problem-driven motivation from industry, so current mainstream research does not address the core problem proposed herein, namely character recognition in low-quality images.
In view of this, the present scheme provides a pre-training language model: a neural network pre-trained on large-scale unsupervised text, so that the model acquires a certain natural language understanding capability. The pre-trained model is then fine-tuned on the target domain, so that it can better handle problems of the target domain.
Example one
Fig. 1 is a flowchart of a method for recognizing characters in an image according to an embodiment of the present invention. The embodiment is applicable to performing character recognition on low-quality images. The method can be executed by the device for recognizing characters in an image provided by the embodiment of the present invention; the device can be implemented in software and/or hardware and can be integrated into an electronic device of a service system.
As shown in fig. 1, the method includes:
and S110, acquiring a character image area to be recognized.
The character image area to be recognized may be an image area containing text in a low-quality image as described above. Low-quality images mostly come from scanned sources, such as book scans and newspaper scans. It can be understood that in a low-quality image the text area may be partially occluded, so that one or more characters are partially or completely covered and their features cannot be extracted.
Fig. 2 is a schematic diagram of a low-quality image according to an embodiment of the present invention; as shown in fig. 2, the low-quality image is a scanned book. Even after image enhancement processing, the characters remain so blurred that they cannot be recognized manually, and after ordinary character recognition processing the error rate of the result is extremely high. Conventional language models, for their part, cannot process pictures at all.
And S120, if the character image area to be recognized contains characters, extracting character features.
First, it is determined whether the area contains characters; if so, the character recognition scheme provided herein is adopted, and if not, the image is processed as an ordinary image.
If the area contains text, feature extraction can be performed on the text area. Specifically, a Convolutional Neural Network (CNN) may be used to extract features from the image to obtain a feature map. A basic CNN consists of three kinds of structures: convolution, activation and pooling. The output of the CNN is a specific feature space for each image. When processing an image classification task, the feature space output by the CNN is used as the input of a fully connected layer or fully connected network (FCN), and the fully connected layer completes the mapping from the input image to the label set, i.e., classification. Of course, the most important work in the whole process is how to iteratively adjust the network weights through the training data, i.e., the back-propagation algorithm. Current mainstream convolutional neural networks, such as VGG and ResNet, are combinations and adjustments of simple CNN building blocks.
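As an illustrative aside (not part of the patented scheme itself), the convolution, activation and pooling structures described above can be sketched in a few lines of NumPy. The kernel, image size and pooling window here are arbitrary toy choices:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; edge rows/columns that do not fit are trimmed."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)      # toy 6x6 "image"
edge_kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])    # toy horizontal-gradient kernel
fmap = relu(conv2d(image, edge_kernel))               # feature map, shape (5, 5)
refined = max_pool(fmap)                              # refined feature map, shape (2, 2)
```

On this constant-gradient toy image every response equals 2.0; a real CNN would of course learn its kernels by back-propagation rather than fix them by hand.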
In this scheme, optionally, extract characters characteristic, include:
and extracting character features of the image to be recognized by using a feature extraction layer consisting of a convolutional neural network and a pooling layer.
The features extracted from the image can be used directly as subsequent input data without further processing, or can be converted into a feature sequence to be used as subsequent input.
Specifically, the method for extracting the character features of the image to be recognized by using the feature extraction layer composed of the convolution layer and the pooling layer comprises the following steps:
performing feature extraction on the image to be identified by using a convolutional neural network to obtain feature mapping;
performing maximum pooling on the extracted feature mapping by using a pooling layer to obtain refined feature mapping;
and converting the refined feature mapping into a feature sequence.
A convolutional neural network (CNN) is used to perform feature extraction on the image, obtaining feature maps (Feature Maps);
a pooling layer (Pooling) is then used to perform max pooling (Max Pooling) on the extracted feature maps, obtaining refined feature maps.
After pooling is completed, normalization processing can be carried out on the refined feature mapping to obtain a normalization result;
correspondingly, converting the refined feature mapping into a feature sequence then comprises:
converting the normalization result into a feature sequence.
Performing batch normalization (Batch Normalization) on the refined feature mapping with a normalization layer can prevent gradient diffusion in the neural network and makes the obtained result more accurate.
S130, inputting the character features into a pre-training language model for predicting each character by the pre-training language model to obtain a character prediction result; the pre-training language model is obtained by training based on a pre-constructed covered training sample.
After the character features are obtained, they can be input into the pre-training language model so that the model recognizes the characters one by one and predicts each character in combination with its context, thereby obtaining a character prediction result.
In this scheme, optionally, the training process of the pre-training language model includes:
obtaining a covered training sample; the covered training sample comprises partial covering and/or full covering of a single character;
dividing the training samples into a training set and a test set;
inputting training samples of the training set into an initial network model for model training, so as to predict the current character through the correlation coefficients between the context and the currently predicted character;
and if the initial network model meets the preset conditions after being tested by the training samples of the test set, determining the initial network model as a pre-training language model.
Specifically, the training samples can be text images that already contain occlusions, or clear text images in which part of the text has been covered after manual processing. After division into a training set and a test set, the training set can be used for training, and the test set is used to determine whether the trained initial model converges or whether its character prediction accuracy reaches a preset condition. The preset condition here may be, for example, reaching an accuracy of 99.5% or even higher.
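The construction of covered training samples and the training/test split described above might be sketched as follows. The mask symbol, cover ratio and split fraction are illustrative assumptions; the patent does not fix any of them:

```python
import random

MASK = "■"  # placeholder for a fully covered character (an assumption;
            # the text does not specify how a covered character is encoded)

def make_covered_sample(text, cover_ratio=0.15, seed=None):
    """Randomly cover (mask) individual characters of a clear text line to
    imitate occluded characters in low-quality scans. Returns the covered
    text plus the covered positions, which serve as prediction targets."""
    rng = random.Random(seed)
    chars = list(text)
    n_cover = max(1, int(len(chars) * cover_ratio))
    targets = sorted(rng.sample(range(len(chars)), n_cover))
    for i in targets:
        chars[i] = MASK
    return "".join(chars), targets

def split_samples(samples, train_frac=0.8):
    """Divide the samples into a training set and a test set."""
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

covered, targets = make_covered_sample("character recognition in images", seed=1)
train_set, test_set = split_samples([covered] * 10)
```

During training, the model would be asked to predict the original character at each position in `targets` from the surrounding context, analogous to masked-language-model pre-training.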
In this scheme, a traditional character detection and recognition pipeline is used: first, the area where characters appear in the image is detected; then the image in that area is processed by a character recognition network. While the characters are being recognized, the pre-training language model predicts the characters in the current recognition area according to context. Finally, the output layer comprehensively considers the predictions of the character recognition network and of the pre-training language model, and outputs the character recognition result of the model according to implicit neural-network information such as context and image information.
And S140, taking the character prediction result as the recognition result of the characters in the image.
It can be understood that if the prediction is completed, the prediction result can be directly used as the final result of the character recognition, so that the character recognition work on the low-quality image is completed.
Fig. 3 is a schematic diagram of a process of recognizing characters in an image according to an embodiment of the present invention. As shown in fig. 3, the actual execution of the process mainly includes the following steps:
step 1: inputting a character recognition image;
step 2: performing feature extraction on the image by using a convolutional neural network (CNN) to obtain feature mapping;
step 3: performing max pooling (Max Pooling) on the extracted feature mapping by using a pooling layer (Pooling) to obtain refined feature mapping;
step 4: carrying out batch normalization (Batch Normalization) processing on the refined feature mapping by using a normalization layer to prevent gradient diffusion in the neural network;
step 5: cyclically executing step 2 to step 4, 6 times in total;
step 6: converting the feature mapping into a feature sequence (Feature Sequence) by using a Map-to-Sequence network;
step 7: inputting the feature sequence into a BERT model for prediction to obtain a character recognition result.
Fig. 4 is a schematic structural diagram of the model used in the recognition process according to an embodiment of the present invention. As shown in fig. 4, the model provided herein is composed of three parts, namely a convolutional layer, a map-to-sequence layer, and a fully connected layer.
The convolution layer is used for extracting high-dimensional latent semantic features of the image;
the map-to-sequence layer is used for converting the three-dimensional continuous tensor into a three-dimensional sequence tensor;
the fully connected layer receives the sequence features of the image and maps them to text.
Specifically, in the BERT model, [CLS] represents the tag of the classification task, Fe is a feature (Feature), E is an embedding vector, C is a classification tag, T is the contextual representation of a character, and O is a character predicted by the model.
The scheme provides a new multi-modal task combining natural language processing and computer vision, namely a character recognition task of low-quality images.
In addition, the scheme extends the character recognition task to more subdivided fields, broadening the application range of character recognition models. As related research continues to develop, artificial intelligence technology may even be used to assist cultural-relic protection work, such as the recognition and restoration of characters in ancient books, or fields such as high-altitude satellite exploration.
Example two
Fig. 5 is a schematic structural diagram of an apparatus for recognizing characters in an image according to a second embodiment of the present invention. As shown in fig. 5, the apparatus for recognizing characters in an image includes:
a text image region obtaining module 510, configured to obtain a text image region to be identified;
a text feature extraction module 520, configured to extract text features if the text image region to be identified contains text;
a character prediction result determining module 530, configured to input the character features into a pre-training language model, and enable the pre-training language model to predict each character to obtain a character prediction result; the pre-training language model is obtained by training based on a pre-constructed covered training sample;
and the recognition result determining module 540 is configured to use the character prediction result as a recognition result of characters in the image.
Further, the text feature extraction module includes:
and the feature extraction unit is used for extracting the character features of the image to be recognized by using a feature extraction layer consisting of the convolutional neural network and the pooling layer.
Further, the feature extraction unit is specifically configured to:
performing feature extraction on the image to be identified by using a convolutional neural network to obtain feature mapping;
performing maximum pooling on the extracted feature mapping by using a pooling layer to obtain refined feature mapping;
and converting the refined feature mapping into a feature sequence.
Further, the text feature extraction module further includes:
the normalization processing unit is used for performing normalization processing on the refined feature mapping to obtain a normalization result;
correspondingly, converting the refined feature mapping into a feature sequence comprises the following steps:
and converting the normalization result into a feature sequence.
Further, the training process of the pre-training language model includes:
obtaining a covered training sample; the covered training sample comprises partial covering and/or full covering of a single character;
dividing the training samples into a training set and a test set;
inputting training samples of the training set into an initial network model for model training, so as to predict the current character through the correlation coefficients between the context and the currently predicted character;
and if the initial network model meets the preset conditions after being tested by the training samples of the test set, determining the initial network model as a pre-training language model.
The product can execute the method provided by the first embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE III
Embodiments of the present application also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for recognizing characters in an image, the method including:
acquiring a character image area to be identified;
if the character image area to be identified contains characters, extracting character features;
inputting the character features into a pre-training language model for predicting each character by the pre-training language model to obtain a character prediction result; the pre-training language model is obtained by training based on a pre-constructed covered training sample;
and taking the character prediction result as the recognition result of the characters in the image.
Storage medium — any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROMs, floppy disks, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, and the like; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk or optical storage); and registers or other similar types of memory elements. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or in a different, second computer system connected to that computer system through a network (such as the Internet). The second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may thus include two or more storage media residing in different locations, for example in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the recognition operations described above, and may also perform related operations in the method for recognizing characters in an image provided in any embodiment of the present application.
Example four
An embodiment of the present application provides an electronic device. Fig. 6 is a schematic structural diagram of an electronic device according to the fourth embodiment of the present application. As shown in Fig. 6, this embodiment provides an electronic device 600, including: one or more processors 620; and a storage device 610 configured to store one or more programs which, when executed by the one or more processors 620, cause the one or more processors 620 to implement the method for recognizing characters in an image provided in the embodiments of the present application, the method including:
acquiring a character image area to be recognized;
if the character image area to be recognized contains characters, extracting character features;
inputting the character features into a pre-training language model, so that the pre-training language model predicts each character to obtain a character prediction result; the pre-training language model is obtained by training based on pre-constructed covered training samples;
and taking the character prediction result as the recognition result of the characters in the image.
The electronic device 600 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 600 includes a processor 620, a storage device 610, an input device 630, and an output device 640; the number of the processors 620 in the electronic device may be one or more, and one processor 620 is taken as an example in fig. 6; the processor 620, the storage device 610, the input device 630, and the output device 640 in the electronic apparatus may be connected by a bus or other means, and are exemplified by being connected by a bus 650 in fig. 6.
The storage device 610 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the recognition method of characters in images in the embodiment of the present application.
The storage device 610 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. In addition, the storage 610 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage 610 may further include memory located remotely from the processor 620, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 630 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 640 may include a display screen, a speaker, and other electronic devices.
The electronic device provided in the embodiments of the present application can accurately recognize characters even in low-quality images.
The device, the medium and the electronic device for recognizing characters in images provided in the above embodiments can operate the method for recognizing characters in images provided in any embodiment of the present application, and have corresponding functional modules and beneficial effects for operating the method. For the technical details not described in detail in the above embodiments, reference may be made to the method for recognizing characters in an image provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for recognizing characters in an image, the method comprising:
acquiring a character image area to be recognized;
if the character image area to be recognized contains characters, extracting character features;
inputting the character features into a pre-training language model, so that the pre-training language model predicts each character to obtain a character prediction result; the pre-training language model is obtained by training based on pre-constructed covered training samples;
and taking the character prediction result as the recognition result of the characters in the image.
2. The method of claim 1, wherein extracting character features comprises:
extracting the character features of the image to be recognized by using a feature extraction layer consisting of a convolutional neural network and a pooling layer.
3. The method of claim 2, wherein extracting the character features of the image to be recognized by using the feature extraction layer consisting of the convolutional neural network and the pooling layer comprises:
performing feature extraction on the image to be recognized by using the convolutional neural network to obtain a feature map;
performing max pooling on the extracted feature map by using the pooling layer to obtain a refined feature map;
and converting the refined feature map into a feature sequence.
4. The method of claim 3, wherein before converting the refined feature map into a feature sequence, the method further comprises:
normalizing the refined feature map to obtain a normalization result;
correspondingly, converting the refined feature map into a feature sequence comprises:
converting the normalization result into a feature sequence.
5. The method of claim 1, wherein the training process of the pre-training language model comprises:
obtaining covered training samples, wherein a covered training sample comprises partial covering and/or full covering of a single character;
dividing the training samples into a training set and a test set;
inputting the training samples of the training set into an initial network model for model training, so that the current character is predicted from the correlation coefficients between its context and the character being predicted;
and if the initial network model meets a preset condition after being tested with the training samples of the test set, determining the initial network model as the pre-training language model.
6. An apparatus for recognizing characters in an image, the apparatus comprising:
a character image area acquisition module, used for acquiring a character image area to be recognized;
a character feature extraction module, used for extracting character features if the character image area to be recognized contains characters;
a character prediction result determining module, used for inputting the character features into a pre-training language model so that the pre-training language model predicts each character to obtain a character prediction result; the pre-training language model is obtained by training based on pre-constructed covered training samples;
and a recognition result determining module, used for taking the character prediction result as the recognition result of the characters in the image.
7. The apparatus of claim 6, wherein the text feature extraction module comprises:
and the feature extraction unit is used for extracting the character features of the image to be recognized by using a feature extraction layer consisting of the convolutional neural network and the pooling layer.
8. The apparatus according to claim 7, wherein the feature extraction unit is specifically configured to:
perform feature extraction on the image to be recognized by using the convolutional neural network to obtain a feature map;
perform max pooling on the extracted feature map by using the pooling layer to obtain a refined feature map;
and convert the refined feature map into a feature sequence.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for recognizing characters in an image according to any one of claims 1 to 5.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for recognizing a character in an image according to any one of claims 1 to 5 when executing the computer program.
CN202110176821.0A 2021-02-09 2021-02-09 Method, device, medium and electronic equipment for recognizing characters in image Pending CN112801085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110176821.0A CN112801085A (en) 2021-02-09 2021-02-09 Method, device, medium and electronic equipment for recognizing characters in image


Publications (1)

Publication Number Publication Date
CN112801085A true CN112801085A (en) 2021-05-14


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035538A (en) * 2022-03-22 2022-09-09 北京百度网讯科技有限公司 Training method of text recognition model, and text recognition method and device
WO2023093361A1 (en) * 2021-11-25 2023-06-01 北京有竹居网络技术有限公司 Image character recognition model training method, and image character recognition method and apparatus
CN116612466A (en) * 2023-07-20 2023-08-18 腾讯科技(深圳)有限公司 Content identification method, device, equipment and medium based on artificial intelligence
CN117831038A (en) * 2022-01-10 2024-04-05 于胜田 Method and system for recognizing characters of big data digital archives

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654129A (en) * 2015-12-30 2016-06-08 成都数联铭品科技有限公司 Optical character sequence recognition method
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
WO2019232853A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese model training method, chinese image recognition method, device, apparatus and medium
WO2019232847A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Handwriting model training method, handwritten character recognition method and apparatus, and device and medium
CN111126068A (en) * 2019-12-25 2020-05-08 中电云脑(天津)科技有限公司 Chinese named entity recognition method and device and electronic equipment
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111275038A (en) * 2020-01-17 2020-06-12 平安医疗健康管理股份有限公司 Image text recognition method and device, computer equipment and computer storage medium
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN111832564A (en) * 2020-07-20 2020-10-27 浙江诺诺网络科技有限公司 Image character recognition method and system, electronic equipment and storage medium
CN111860525A (en) * 2020-08-06 2020-10-30 宁夏宁电电力设计有限公司 Bottom-up optical character recognition method suitable for terminal block
CN112036292A (en) * 2020-08-27 2020-12-04 平安科技(深圳)有限公司 Character recognition method and device based on neural network and readable storage medium
CN112329767A (en) * 2020-10-15 2021-02-05 方正株式(武汉)科技开发有限公司 Contract text image key information extraction system and method based on joint pre-training
CN112330569A (en) * 2020-11-27 2021-02-05 上海眼控科技股份有限公司 Model training method, text denoising method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIM S et al.: "Chinese text classification based on character-level CNN and SVM", The Institute of Internet, Broadcasting and Communication, pages 1-6 *
吕云翔 et al.: "Python Deep Learning" (Python深度学习), Beijing: China Machine Press, 30 September 2020, pages 98-101 *
周成伟: "Digit recognition in natural scenes based on convolutional neural networks" (基于卷积神经网络的自然场景中数字识别), Computer Technology and Development (计算机技术与发展), no. 11, pages 107-111 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination