CN108427953A - A kind of character recognition method and device - Google Patents
- Publication number
- CN108427953A (application CN201810162541.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- picture
- recognized
- layer
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
An embodiment of the present invention provides a character recognition method and device. The method includes: acquiring a picture to be recognized, the picture containing text information to be recognized; and taking the picture to be recognized as the input of a target neural network model created in advance, which performs character recognition on the picture to obtain the text information in it. The device is configured to execute the method. By inputting the picture to be recognized into the target neural network model, which performs character recognition on it to obtain the text information in the picture, the embodiment of the present invention improves the efficiency and accuracy of character recognition.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a character recognition method and device.
Background
Automatic character recognition by computer is an important field of pattern recognition applications. In production and daily life, people must process large volumes of documents, reports and text. To reduce this labor and improve processing efficiency, general character recognition methods began to be explored in the 1950s, and optical character readers were developed. In the 1960s, practical machines using magnetic ink and special fonts appeared. In the late 1960s, recognition machines for many typefaces and for handwritten characters emerged, whose recognition accuracy and performance could basically meet requirements, such as handwritten-digit recognition machines and printed English-and-digit recognition machines for letter sorting. In the 1970s, work focused mainly on the basic theory of character recognition and the development of high-performance recognition machines, and character recognition research received increased emphasis.
Character recognition can be applied in many fields, such as reading, translation and retrieval of document data; sorting of letters and parcels; editing and proofreading of manuscripts; collection and analysis of large numbers of statistical reports and cards; bank check processing; statistical collection of commodity invoices; commodity code recognition; commodity warehouse management; automatic processing of the large volumes of bills arising in fee-collection services such as water, electricity, gas, rent and personal insurance; and partial automation of office typing.
In the process of implementing the embodiment of the present invention, the inventor found that in the prior art, when recognizing characters in an image, each individual character in the image must be matched one by one, the matched characters are then assembled, and only then is the text information in the image obtained; this character-by-character matching is inefficient and prone to error.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a character recognition method and a character recognition device.
In a first aspect, an embodiment of the present invention provides a character recognition method, including:
acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized;
and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
Optionally, the method further comprises:
collecting a plurality of pictures with text information, acquiring the text information corresponding to the pictures, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network comprises specifying its number of layers, convolutional kernel size, and number of convolutional kernels;
training the multilayer convolutional neural network by adopting an error back propagation algorithm by taking the pictures in the training set as input and the character information corresponding to the pictures as output to obtain a target multilayer convolutional neural network;
acquiring intermediate neural network models of which the loss values generated in the training process of the target multilayer convolutional neural network meet preset conditions, and verifying each intermediate neural network model through the verification set to obtain the accuracy rate corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
Optionally, the multi-layer convolutional neural network includes a residual layer, a BN layer, an excitation layer, and an LSTM layer, and the training the multi-layer convolutional neural network using an error back propagation algorithm includes:
and adjusting the number of layers corresponding to the residual layer, the BN layer, the excitation layer and the LSTM layer respectively until the loss value corresponding to the adjusted multilayer convolutional neural network is smaller than a preset threshold value.
Optionally, the objective function of the target neural network model is a connectionist temporal classification (CTC) loss function.
In a second aspect, an embodiment of the present invention provides a character recognition apparatus, including:
the device comprises an acquisition module and a recognition module, wherein the acquisition module is used for acquiring a picture to be recognized, and the picture to be recognized comprises character information to be recognized;
and the recognition module is used for taking the picture to be recognized as the input of a target neural network model and performing character recognition on the picture to be recognized through the pre-established target neural network model so as to obtain the character information in the picture to be recognized.
Optionally, the apparatus further includes a model creation module configured to:
collecting a plurality of pictures with text information, acquiring the text information corresponding to the pictures, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network comprises specifying its number of layers, convolutional kernel size, and number of convolutional kernels;
training the multilayer convolutional neural network by adopting an error back propagation algorithm by taking the pictures in the training set as input and the character information corresponding to the pictures as output to obtain a target multilayer convolutional neural network;
acquiring intermediate neural network models of which the loss values generated in the training process of the target multilayer convolutional neural network meet preset conditions, and verifying each intermediate neural network model through the verification set to obtain the accuracy rate corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
Optionally, the multi-layer convolutional neural network includes a residual layer, a BN layer, an excitation layer, and an LSTM layer, and the model creation module is specifically configured to:
and adjusting the number of layers corresponding to the residual layer, the BN layer, the excitation layer and the LSTM layer respectively until the loss value corresponding to the adjusted multilayer convolutional neural network is smaller than a preset threshold value.
Optionally, the objective function of the target neural network model is a CTC loss function.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, and the processor, when invoking the program instructions, is capable of performing the method steps of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, including:
the non-transitory computer readable storage medium stores computer instructions that cause the computer to perform the method steps of the first aspect.
According to the character recognition method and device provided by the embodiment of the invention, the picture to be recognized is input into the target neural network model, and the target neural network model performs character recognition on the picture to be recognized to obtain character information in the picture to be recognized, so that the efficiency and the accuracy of character recognition are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a character recognition method according to an embodiment of the present invention;
fig. 2 is a picture of a substation equipment identification sign used in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a character recognition device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a text recognition method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101: acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized;
specifically, the identification device acquires a picture to be identified, wherein the picture to be identified can be a substation equipment indicator, a safety warning board, a road sign indicator and the like, and text information to be identified is arranged on the picture to be identified.
Step 102: and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
Specifically, after receiving the picture to be recognized, the recognition device inputs the picture to be recognized into the target neural network model, and the target neural network model performs character recognition on the picture to be recognized to obtain character information in the picture to be recognized.
According to the embodiment of the invention, the picture to be recognized is input into the target neural network model, and the target neural network model performs character recognition on the picture to be recognized to obtain character information in the picture to be recognized, so that the efficiency and the accuracy of character recognition are improved.
On the basis of the above embodiment, the method further includes:
collecting a plurality of pictures with text information, acquiring the text information corresponding to the pictures, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network comprises specifying its number of layers, convolutional kernel size, and number of convolutional kernels;
training the multilayer convolutional neural network by adopting an error back propagation algorithm by taking the pictures in the training set as input and the character information corresponding to the pictures as output to obtain a target multilayer convolutional neural network;
acquiring intermediate neural network models of which the loss values generated in the training process of the target multilayer convolutional neural network meet preset conditions, and verifying each intermediate neural network model through the verification set to obtain the accuracy rate corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
Specifically, the target neural network model is created in advance as follows. First, a large number of pictures bearing text information are collected and manually labelled, so that the characters on each picture are known; for each picture, a text file may be generated whose content is the picture's text and whose name is the picture's name. The collected pictures are then divided into a training set and a verification set at a preset ratio, for example: of 1000 collected pictures, 800 pictures and their corresponding text are randomly selected as the training set, and the remaining 200 pictures and their text serve as the verification set.
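The labeling and splitting step above can be sketched as follows. The 80/20 ratio and the picture-name-to-text mapping follow the example in the text; the file names and helper names are illustrative assumptions.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Split labelled pictures into a training set and a verification set.

    `samples` maps each picture name to its manually labelled text content,
    mirroring the per-picture text files described above.
    """
    names = sorted(samples)
    random.Random(seed).shuffle(names)  # random selection, reproducible here
    cut = int(len(names) * train_ratio)
    train = {n: samples[n] for n in names[:cut]}
    valid = {n: samples[n] for n in names[cut:]}
    return train, valid

# e.g. 1000 labelled pictures -> 800 for training, 200 for verification
samples = {f"plate_{i:04d}.jpg": f"text_{i}" for i in range(1000)}
train, valid = split_dataset(samples)
```

The split is disjoint by construction, so verification accuracy is measured on pictures the network never saw during training.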
A multilayer convolutional neural network is then constructed by specifying its number of layers, convolutional kernel size and number of convolutional kernels, all set in advance to empirical initial values. With the training-set pictures as input and the corresponding text as output, the network is trained with the error back-propagation algorithm; a connectionist temporal classification (CTC) loss function serves as the objective function, and the loss value it yields determines whether the structure of the current network needs to be adjusted, finally producing the target multilayer convolutional neural network.
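The convolutional-network-plus-LSTM-plus-CTC arrangement described above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the layer counts, kernel sizes and channel widths are illustrative initial values of my own choosing, not the patent's final architecture.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CNN + LSTM sketch of the kind of model described in the text."""
    def __init__(self, num_classes, img_height=32, channels=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),          # halve height and width
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        feat_h = img_height // 4
        self.lstm = nn.LSTM(channels * feat_h, 128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)  # classes include the CTC blank

    def forward(self, x):                 # x: (N, 1, H, W)
        f = self.cnn(x)                   # (N, C, H/4, W/4)
        n, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(n, w, c * h)  # one step per column
        out, _ = self.lstm(f)
        return self.fc(out).log_softmax(2)  # (N, T, num_classes)

model = CRNN(num_classes=11)              # e.g. 10 digits + CTC blank
logits = model(torch.randn(2, 1, 32, 100))
ctc = nn.CTCLoss(blank=0)                 # CTC as the training objective
```

Each column of the convolutional feature map becomes one time step for the LSTM, so the per-frame class scores can be fed to the CTC loss without segmenting the image into single characters.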
While the target multilayer convolutional neural network is being trained, its parameters are adjusted by the error back-propagation algorithm, and each adjustment yields an intermediate neural network model with an associated loss value. The intermediate models are sorted by loss value in ascending order, the first few models in this ordering are selected, and each selected model is verified against the verification set to obtain its accuracy; the intermediate model with the highest accuracy is taken as the target neural network model.
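The checkpoint-selection logic above — keep the intermediate models with the smallest losses, then pick the one with the best verification accuracy — can be sketched in a few lines. The checkpoint identifiers, loss figures and accuracy figures below are made up for illustration.

```python
def select_target_model(checkpoints, evaluate, top_k=3):
    """Pick the final model among intermediate checkpoints.

    `checkpoints` is a list of (model_id, training_loss) pairs saved during
    training; `evaluate` returns a checkpoint's accuracy on the verification
    set. Both are placeholders for what a real training loop would produce.
    """
    # keep the few checkpoints with the smallest loss values...
    best_by_loss = sorted(checkpoints, key=lambda c: c[1])[:top_k]
    # ...then choose, among those, the one with the highest accuracy
    return max(best_by_loss, key=lambda c: evaluate(c[0]))[0]

# toy example with hypothetical loss and accuracy figures
checkpoints = [("ckpt_a", 0.9), ("ckpt_b", 0.2), ("ckpt_c", 0.3), ("ckpt_d", 0.25)]
accuracy = {"ckpt_a": 0.80, "ckpt_b": 0.91, "ckpt_c": 0.95, "ckpt_d": 0.93}
target = select_target_model(checkpoints, accuracy.get)  # -> "ckpt_c"
```

Note that the lowest-loss checkpoint ("ckpt_b") is not necessarily the one chosen: verification accuracy, not training loss, makes the final decision.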
According to the embodiment of the invention, the multilayer convolutional neural network is created in advance, a large number of collected pictures are used as input, character information corresponding to the pictures is used as output for training, the target multilayer convolutional neural network is obtained, then the intermediate neural network model generated in the process of training the target multilayer convolutional neural network is obtained, and the target neural network model is determined from the intermediate neural network model through the accuracy rate, so that the accuracy rate of character recognition on the pictures is improved.
On the basis of the above embodiment, the multilayer convolutional neural network includes a residual layer, a batch normalization (BN) layer, an excitation layer, and a long short-term memory (LSTM) layer, and training the multilayer convolutional neural network with the error back propagation algorithm includes:
and adjusting the number of layers corresponding to the residual layer, the BN layer, the excitation layer and the LSTM layer respectively until the loss value corresponding to the adjusted multilayer convolutional neural network is smaller than a preset threshold value.
Specifically, the constructed multilayer convolutional neural network comprises residual layers, batch normalization (BN) layers, excitation layers and LSTM layers, each present in some number, and the ordering of the network layers can be preset according to the actual situation. While training the network on the training-set pictures and the text corresponding to each picture, the number of one or more kinds of layer can be increased or decreased; for example, one residual layer and one LSTM layer can be added to obtain a multilayer convolutional neural network with a deeper structure. It should be understood that adding the residual layer and the LSTM layer at different positions of the original network has different effects, and the placement can be adjusted according to the actual situation or the loss value after training. After adding a residual layer and an LSTM layer, the new network is trained again; if its loss value is not smaller than the preset threshold, the structure of the network continues to be adjusted until the loss value of the adjusted network is smaller than the preset threshold, at which point the resulting network is the target multilayer convolutional neural network.
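The grow-until-the-loss-is-small-enough procedure above can be sketched as a loop. The retraining call is a stand-in: `train_and_get_loss` and the mock loss figures below are hypothetical, since the real criterion and depth step are implementation choices.

```python
def grow_until_converged(train_and_get_loss, threshold, max_rounds=10):
    """Add depth (e.g. one residual layer and one LSTM layer per round)
    until the training loss drops below the preset threshold.

    `train_and_get_loss(extra_layers)` stands in for retraining the
    adjusted network and returning its loss value.
    """
    for extra_layers in range(max_rounds + 1):
        loss = train_and_get_loss(extra_layers)
        if loss < threshold:
            return extra_layers, loss
    raise RuntimeError("loss never fell below the threshold")

# hypothetical losses: deeper networks fit better until they plateau
mock_losses = [1.4, 0.9, 0.6, 0.35, 0.28, 0.27]
extra, loss = grow_until_converged(lambda k: mock_losses[min(k, 5)],
                                   threshold=0.3)
```

With these made-up figures the loop stops after four depth increases, when the loss first falls below the 0.3 threshold.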
The embodiment of the invention obtains the final target multilayer convolutional neural network by adjusting the residual layers, BN layers, excitation layers and LSTM layers, thereby treating character recognition as an end-to-end problem.
The following takes an equipment identification sign in a transformer substation as an example. Fig. 2 shows such a substation equipment sign provided by the embodiment of the present invention, and with reference to fig. 2 the key steps of the method are described in detail:
the method comprises the following steps of firstly, collecting a large number of nameplate pictures, manually marking character information on each nameplate, generating a text file for each nameplate, wherein the content of the text file is the character information on the nameplate, and the name of the text file is the name of the nameplate picture. And marking a large number of nameplate pictures, and dividing the nameplate pictures into a training set and a verification set according to the proportion.
And step two, constructing a multilayer convolutional neural network, including defining its number of layers, convolutional kernel size, number of convolutional kernels and so on. Considering the excellent performance of residual neural networks in image feature extraction, residual layers are adopted to extract features; the feature maps extracted by several residual layers are fed to the LSTM layer to learn sequence features, the output of the LSTM layer serves as the input of the CTC loss function, and the character recognition result is finally output. Preferably, a multilayer convolutional neural network of 5 residual layers, 5 BN layers and 1 LSTM layer may be employed. The convolutional layers are designed so that features can be extracted quickly and efficiently.
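One residual layer with its BN and excitation (ReLU) layers can be sketched in PyTorch as below; a stack such as the 5-residual / 5-BN / 1-LSTM network above would chain several of these before the LSTM layer. The channel count and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual layer paired with BN and ReLU excitation layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(x + y)   # skip connection keeps gradients flowing

block = ResidualBlock(64)
out = block(torch.randn(1, 64, 32, 100))
```

The skip connection is what allows the depth adjustments of steps four and five: extra residual blocks can be inserted without hampering gradient flow through the deeper network.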
And step three, training the network by adopting an error back propagation algorithm. And adopting the CTC loss function as an objective function of the multilayer convolutional neural network.
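At inference time, a CTC-trained network's per-frame outputs must be collapsed into a label sequence. A minimal greedy decoder — best class per frame, merge repeats, drop blanks — is sketched below in pure Python; the per-frame scores are made up for illustration.

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: take the best class per frame, merge
    consecutive repeats, and drop blank symbols.

    `frame_probs` is a T x C list of per-frame class scores, as produced
    by the network's final softmax layer.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], blank
    for cls in best:
        if cls != blank and cls != prev:
            decoded.append(cls)
        prev = cls
    return decoded

# frames predicting "-11-2" (with "-" the blank) collapse to [1, 2]
probs = [
    [0.9, 0.05, 0.05],   # blank
    [0.1, 0.8, 0.1],     # class 1
    [0.1, 0.8, 0.1],     # class 1 (repeat, merged)
    [0.9, 0.05, 0.05],   # blank
    [0.1, 0.1, 0.8],     # class 2
]
labels = ctc_greedy_decode(probs)
```

This collapsing rule is what lets CTC training use unsegmented text labels: the network never needs per-character position annotations.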
And step four, adding a residual error layer and an LSTM layer, designing a deeper network structure, and enabling the deeper network to learn better characteristics.
And step five, repeating step three and step four, that is, adding one residual layer at a time to obtain a new network structure and then training the new structure. When adding a network layer no longer reduces the loss value, or the loss value is already smaller than the set threshold, the current network structure is taken as the target multilayer convolutional neural network. Through repeated trials, a neural network model composed of 8 residual layers, 8 BN layers, 8 excitation layers and 2 LSTM layers was finally determined, which ensures both that the loss value is small enough and that the computation time is short.
And step six, selecting a plurality of intermediate neural network models with smaller loss values from the intermediate neural network models stored during the final network structure training, verifying the models by using a verification set, and selecting one intermediate neural network model with the highest accuracy as a final target neural network model.
(1) In July 2017, an experiment of the invention was carried out at No. 1 Anzhen Outer Street, Chaoyang District, Beijing; the experiment recognized substation equipment signs, and the characters on the signs were recognized by the method of the invention. A large number of sign pictures were first collected and their contents manually labelled, the network was trained to obtain an equipment-sign character recognition model, and then, given an input picture, the sign's character result was output directly.
(2) In November 2017, an experiment of the invention was carried out at No. 1 Anzhen Outer Street, Chaoyang District, Beijing; the experiment recognized safety warning boards, and the characters on the boards were recognized by the method of the invention. A large number of warning-board pictures were first collected and their contents manually labelled, the network was trained to obtain a warning-board character recognition model, and then, given an input picture, the board's character result was output directly.
(3) In November 2017, an experiment of the invention was carried out at No. 1 Anzhen Outer Street, Chaoyang District, Beijing; the experiment recognized equipment warning signs, and the characters on the signs were recognized by the method of the invention. A large number of equipment-warning-sign pictures were first collected and their contents manually labelled, the network was trained to obtain an equipment-warning-sign character recognition model, and then, given an input picture, the sign's character result was output directly. The whole recognition process was rapid and accurate, and successfully passed the test of the national information center software evaluation center.
According to the embodiment of the invention, the picture to be recognized is input into the target neural network model, and the target neural network model performs character recognition on the picture to be recognized to obtain character information in the picture to be recognized, so that the efficiency and the accuracy of character recognition are improved.
Fig. 3 is a schematic structural diagram of a character recognition device according to an embodiment of the present invention, as shown in fig. 3, the character recognition device includes: an obtaining module 301 and an identifying module 302, wherein:
the acquiring module 301 is configured to acquire a picture to be identified, where the picture to be identified includes text information to be identified; the recognition module 302 is configured to use the picture to be recognized as an input of a target neural network model, and perform text recognition on the picture to be recognized through the pre-created target neural network model to obtain the text information in the picture to be recognized.
Specifically, the obtaining module 301 obtains a picture to be recognized; the picture may show a substation equipment sign, a safety warning sign, a road sign, and the like, and carries the text information to be recognized. After receiving the picture to be recognized, the recognition module 302 inputs it into the target neural network model, which performs character recognition on it to obtain the text information in the picture. It should be noted that the target neural network model is created and trained in advance, and can output the corresponding text information for the picture to be recognized input by the user.
The embodiment of the apparatus provided in the present invention may be specifically configured to execute the processing flows of the above method embodiments, and the functions of the apparatus are not described herein again, and refer to the detailed description of the above method embodiments.
According to the embodiment of the invention, the picture to be recognized is input into the target neural network model, and the target neural network model performs character recognition on the picture to be recognized to obtain character information in the picture to be recognized, so that the efficiency and the accuracy of character recognition are improved.
On the basis of the above embodiment, the apparatus further includes a model creation module configured to:
collecting a plurality of pictures containing character information, acquiring the character information corresponding to each picture, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network includes specifying the number of layers, the convolution kernel size, and the number of convolution kernels;
training the multilayer convolutional neural network with the error back-propagation algorithm, taking the pictures in the training set as input and the character information corresponding to the pictures as output, to obtain a target multilayer convolutional neural network;
acquiring the intermediate neural network models whose loss values, generated during training of the target multilayer convolutional neural network, meet a preset condition, and verifying each intermediate neural network model on the verification set to obtain the accuracy corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
Specifically, the steps of creating the target neural network model are consistent with the above embodiments and are not repeated here.
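The model-selection step above (keep the intermediate models whose training loss meets the preset condition, validate each on the verification set, and take the most accurate one) can be sketched as follows. The `Checkpoint` structure, the threshold, and the numeric values are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    epoch: int
    train_loss: float
    val_accuracy: float  # accuracy measured on the verification set


LOSS_THRESHOLD = 0.05  # stand-in for the "preset condition" on the loss value


def select_target_model(checkpoints):
    # Keep only intermediate models whose loss meets the preset condition...
    candidates = [c for c in checkpoints if c.train_loss < LOSS_THRESHOLD]
    if not candidates:
        return None
    # ...then take the one with the highest verification-set accuracy.
    return max(candidates, key=lambda c: c.val_accuracy)


checkpoints = [
    Checkpoint(epoch=10, train_loss=0.120, val_accuracy=0.81),  # loss too high
    Checkpoint(epoch=20, train_loss=0.040, val_accuracy=0.92),
    Checkpoint(epoch=30, train_loss=0.030, val_accuracy=0.90),  # overfitting
]
best = select_target_model(checkpoints)
print(best.epoch)  # 20: meets the loss condition and has the best accuracy
```

Validating every qualifying checkpoint, rather than simply taking the last one, guards against choosing a model that has overfit the training set.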
On the basis of the above embodiment, the multilayer convolutional neural network includes a residual layer, a batch normalization (BN) layer, an excitation layer, and a long short-term memory (LSTM) recurrent layer, and the model creation module is specifically configured to:
adjust the number of layers of the residual layer, the BN layer, the excitation layer, and the LSTM layer, respectively, until the loss value of the adjusted multilayer convolutional neural network is smaller than a preset threshold.
On the basis of the above embodiment, the objective function of the target neural network model is a CTC loss function.
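A network trained with the CTC loss emits, at each time step, a distribution over the character set plus a reserved "blank" symbol; the simplest (greedy) decoding collapses consecutive repeated symbols and drops blanks. The following is a minimal illustration of that collapse rule only; the alphabet and the best-path indices are invented examples, not the patent's decoder.

```python
BLANK = 0  # index reserved for the CTC blank symbol


def ctc_greedy_collapse(path, alphabet):
    """Collapse a per-timestep best-path index sequence into an output string."""
    out = []
    prev = None
    for idx in path:
        # Emit a character only when it is not blank and differs from
        # the previous index (CTC merges consecutive repeats).
        if idx != BLANK and idx != prev:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


alphabet = {1: "c", 2: "a", 3: "t"}
# Best class index at each time step: c c <blank> a a t
path = [1, 1, 0, 2, 2, 3]
print(ctc_greedy_collapse(path, alphabet))  # "cat"
```

The blank symbol is what lets CTC represent genuinely doubled characters: "oo" would appear in the path as `o <blank> o` rather than `o o`, which would otherwise be merged.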
According to the embodiment of the invention, the picture to be recognized is input into the target neural network model, which performs character recognition on the picture to obtain the character information in it, thereby improving the efficiency and accuracy of character recognition.
Fig. 4 is a schematic structural diagram of an electronic device entity according to an embodiment of the present invention. As shown in Fig. 4, the electronic device includes a processor 401, a memory 402, and a bus 404, wherein:
the processor 401 and the memory 402 communicate with each other via the bus 404;
the processor 401 is configured to call the program instructions in the memory 402 to execute the methods provided by the above-mentioned method embodiments, for example, including: acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized; and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized; and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the above method embodiments, for example, including: acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized; and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be performed by hardware instructed by a program; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, or a magnetic or optical disk.
The above-described embodiments of the apparatuses and the like are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for recognizing a character, comprising:
acquiring a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized;
and taking the picture to be recognized as the input of a target neural network model, and performing character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
2. The method of claim 1, further comprising:
collecting a plurality of pictures containing character information, acquiring the character information corresponding to each picture, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network includes specifying the number of layers, the convolution kernel size, and the number of convolution kernels;
training the multilayer convolutional neural network with the error back-propagation algorithm, taking the pictures in the training set as input and the character information corresponding to the pictures as output, to obtain a target multilayer convolutional neural network;
acquiring the intermediate neural network models whose loss values, generated during training of the target multilayer convolutional neural network, meet a preset condition, and verifying each intermediate neural network model on the verification set to obtain the accuracy corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
3. The method of claim 2, wherein the multilayer convolutional neural network comprises a residual layer, a batch normalization (BN) layer, an excitation layer, and a long short-term memory (LSTM) recurrent layer, and wherein training the multilayer convolutional neural network using an error back-propagation algorithm comprises:
and adjusting the number of layers corresponding to the residual layer, the BN layer, the excitation layer and the LSTM layer respectively until the loss value corresponding to the adjusted multilayer convolutional neural network is smaller than a preset threshold value.
4. The method of any one of claims 1-3, wherein the objective function of the target neural network model is a CTC loss function.
5. A character recognition apparatus, comprising:
an acquisition module, configured to acquire a picture to be recognized, wherein the picture to be recognized comprises character information to be recognized;
and a recognition module, configured to take the picture to be recognized as the input of a target neural network model and perform character recognition on the picture to be recognized through the pre-established target neural network model to obtain the character information in the picture to be recognized.
6. The apparatus of claim 5, further comprising a model creation module to:
collecting a plurality of pictures containing character information, acquiring the character information corresponding to each picture, and dividing the pictures into a training set and a verification set according to a preset proportion;
constructing a multilayer convolutional neural network, wherein constructing the network includes specifying the number of layers, the convolution kernel size, and the number of convolution kernels;
training the multilayer convolutional neural network with the error back-propagation algorithm, taking the pictures in the training set as input and the character information corresponding to the pictures as output, to obtain a target multilayer convolutional neural network;
acquiring the intermediate neural network models whose loss values, generated during training of the target multilayer convolutional neural network, meet a preset condition, and verifying each intermediate neural network model on the verification set to obtain the accuracy corresponding to each intermediate neural network model;
and taking the intermediate neural network model with the highest accuracy as the target neural network model.
7. The apparatus of claim 6, wherein the multilayer convolutional neural network comprises a residual layer, a batch normalization (BN) layer, an excitation layer, and a long short-term memory (LSTM) recurrent layer, and wherein the model creation module is specifically configured to:
and adjusting the number of layers corresponding to the residual layer, the BN layer, the excitation layer and the LSTM layer respectively until the loss value corresponding to the adjusted multilayer convolutional neural network is smaller than a preset threshold value.
8. The apparatus of any one of claims 5-7, wherein the objective function of the target neural network model is a CTC loss function.
9. An electronic device, comprising: a processor, a memory, and a bus, wherein,
the processor and the memory communicate with each other via the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any one of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810162541.2A CN108427953A (en) | 2018-02-26 | 2018-02-26 | A kind of character recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810162541.2A CN108427953A (en) | 2018-02-26 | 2018-02-26 | A kind of character recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108427953A true CN108427953A (en) | 2018-08-21 |
Family
ID=63157317
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810162541.2A Pending CN108427953A (en) | 2018-02-26 | 2018-02-26 | A kind of character recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427953A (en) |
- 2018-02-26: CN application CN201810162541.2A filed (publication CN108427953A), status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016197381A1 (en) * | 2015-06-12 | 2016-12-15 | Sensetime Group Limited | Methods and apparatus for recognizing text in an image |
JP2017215859A (en) * | 2016-06-01 | 2017-12-07 | Nippon Telegraph and Telephone Corporation | Character string recognition device, method and program |
CN106447707A (en) * | 2016-09-08 | 2017-02-22 | Huazhong University of Science and Technology | Image real-time registration method and system |
CN107239733A (en) * | 2017-04-19 | 2017-10-10 | Shanghai Songheng Network Technology Co., Ltd. | Continuous hand-written character recognizing method and system |
CN107590774A (en) * | 2017-09-18 | 2018-01-16 | Beijing University of Posts and Telecommunications | A kind of car plate clarification method and device based on generation confrontation network |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110895924A (en) * | 2018-08-23 | 2020-03-20 | 珠海金山办公软件有限公司 | Document content reading method and device, electronic equipment and readable storage medium |
CN109389084A (en) * | 2018-10-09 | 2019-02-26 | 郑州云海信息技术有限公司 | A kind of method and device handling image information |
CN111027345A (en) * | 2018-10-09 | 2020-04-17 | 北京金山办公软件股份有限公司 | Font identification method and apparatus |
CN109598270A (en) * | 2018-12-04 | 2019-04-09 | 龙马智芯(珠海横琴)科技有限公司 | Distort recognition methods and the device, storage medium and processor of text |
CN109598270B (en) * | 2018-12-04 | 2020-05-05 | 龙马智芯(珠海横琴)科技有限公司 | Method and device for identifying distorted characters, storage medium and processor |
CN109657683A (en) * | 2018-12-19 | 2019-04-19 | 北京像素软件科技股份有限公司 | Text region modeling method and device, character recognition method and electronic equipment |
CN109740336A (en) * | 2018-12-28 | 2019-05-10 | 北京云测信息技术有限公司 | Recognition methods, device and the electronic equipment of a kind of verification information in picture |
CN109740336B (en) * | 2018-12-28 | 2020-08-18 | 北京云测信息技术有限公司 | Method and device for identifying verification information in picture and electronic equipment |
CN109902678A (en) * | 2019-02-12 | 2019-06-18 | 北京奇艺世纪科技有限公司 | Model training method, character recognition method, device, electronic equipment and computer-readable medium |
CN109993164A (en) * | 2019-03-20 | 2019-07-09 | 上海电力学院 | A kind of natural scene character recognition method based on RCRNN neural network |
CN109977950A (en) * | 2019-03-22 | 2019-07-05 | 上海电力学院 | A kind of character recognition method based on mixing CNN-LSTM network |
CN110008961A (en) * | 2019-04-01 | 2019-07-12 | 深圳市华付信息技术有限公司 | Text real-time identification method, device, computer equipment and storage medium |
CN110046574A (en) * | 2019-04-15 | 2019-07-23 | 北京易达图灵科技有限公司 | Safety cap based on deep learning wears recognition methods and equipment |
CN110059742A (en) * | 2019-04-15 | 2019-07-26 | 北京易达图灵科技有限公司 | Safety protector wearing recognition methods and equipment based on deep learning |
CN110059677A (en) * | 2019-04-15 | 2019-07-26 | 北京易达图灵科技有限公司 | Digital table recognition methods and equipment based on deep learning |
CN110070029A (en) * | 2019-04-17 | 2019-07-30 | 北京易达图灵科技有限公司 | A kind of gait recognition method and device |
CN110059617A (en) * | 2019-04-17 | 2019-07-26 | 北京易达图灵科技有限公司 | A kind of recognition methods of target object and device |
CN110070029B (en) * | 2019-04-17 | 2021-07-16 | 北京易达图灵科技有限公司 | Gait recognition method and device |
CN110070042A (en) * | 2019-04-23 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Character recognition method, device and electronic equipment |
CN110110777A (en) * | 2019-04-28 | 2019-08-09 | 网易有道信息技术(北京)有限公司 | Image processing method and training method and device, medium and calculating equipment |
CN110147791A (en) * | 2019-05-20 | 2019-08-20 | 上海联影医疗科技有限公司 | Character recognition method, device, equipment and storage medium |
CN110321892A (en) * | 2019-06-04 | 2019-10-11 | 腾讯科技(深圳)有限公司 | A kind of picture screening technique, device and electronic equipment |
CN110321892B (en) * | 2019-06-04 | 2022-12-13 | 腾讯科技(深圳)有限公司 | Picture screening method and device and electronic equipment |
CN110321884A (en) * | 2019-06-13 | 2019-10-11 | 贝式计算(天津)信息技术有限公司 | Method and device for identifying serial number |
CN110598686A (en) * | 2019-09-17 | 2019-12-20 | 携程计算机技术(上海)有限公司 | Invoice identification method, system, electronic equipment and medium |
CN115004247A (en) * | 2019-12-02 | 2022-09-02 | 尤帕斯公司 | Training optical character detection and recognition models for robotic process automation |
CN111008624A (en) * | 2019-12-05 | 2020-04-14 | 嘉兴太美医疗科技有限公司 | Optical character recognition method and method for generating training sample for optical character recognition |
CN113377644A (en) * | 2020-02-25 | 2021-09-10 | 福建天泉教育科技有限公司 | Test method based on front-end multi-system multi-language internationalized translation |
CN113377644B (en) * | 2020-02-25 | 2023-09-15 | 福建天泉教育科技有限公司 | Testing method for multi-language internationalization translation based on front-end multi-system |
CN114821029A (en) * | 2022-05-16 | 2022-07-29 | 广东电网有限责任公司广州供电局 | OCR technology-based distribution network operation security ring identification method and system |
CN118122723A (en) * | 2024-05-07 | 2024-06-04 | 温州电力建设有限公司 | Pipeline cleaning method, device and equipment based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180821 |