CN110363190A

CN110363190A - Character recognition method, apparatus and device

Info

Publication number: CN110363190A
Application number: CN201910681467.XA
Authority: CN
Inventors: 张瀚文; 张宏韬; 李兆佳; 曲建方
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2019-10-22

Abstract

Embodiments of this specification provide a character recognition method, apparatus and device. The method includes: receiving an image to be recognized; obtaining a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized; inputting the text-region sub-image into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples; and integrating the recognized text data and feeding it back. The method improves the accuracy of text recognition in images and simplifies the recognition steps, so that text recognition can be performed quickly and accurately.

Description

Character recognition method, apparatus and device

Technical field

Embodiments of this specification relate to the field of computer technology, and in particular to a character recognition method, apparatus and device.

Background

Although humans can easily recognize the text in an image, a computer cannot directly recognize text from an image. As people encounter more and more images containing text, recognizing all of that text manually is unrealistic, and character recognition technology emerged in response. Most existing text recognition techniques train on a large number of data samples to obtain a trained model that can recognize text, and then use the trained model for recognition.

However, text recognition with trained models in the prior art usually has to be executed in multiple separate stages, and the implementation process is cumbersome. When recognizing text, a character string must first be cut into single characters, each character is recognized separately, and the single characters are spliced back into normal text after recognition, so the efficiency of text recognition is low. In addition, when the image is distorted or blurred, the region where the text is located often cannot be determined well in the initial stage, which makes text recognition considerably harder. A technique that can recognize text conveniently and accurately is therefore needed.

Summary of the invention

The purpose of the embodiments of this specification is to provide a character recognition method, apparatus and device that solve the problems in the prior art that characters must be annotated individually when recognizing text and that recognition accuracy is low, so that text recognition can be performed quickly and accurately.

To solve the above problems, embodiments of this specification propose a character recognition method, apparatus and device, implemented as follows:

A character recognition method, comprising:

receiving an image to be recognized;

obtaining a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized;

inputting the text-region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples;

integrating the recognized text data and feeding it back.

A character recognition apparatus, comprising:

an image receiving module, configured to receive an image to be recognized;

a text-region sub-image obtaining module, configured to obtain a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized;

a recognized text data obtaining module, configured to input the text-region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples;

a feedback module, configured to integrate the recognized text data and feed it back.

A character recognition device, comprising:

a memory, configured to store computer instructions;

a processor, configured to execute the computer instructions to perform the following steps: receiving an image to be recognized; obtaining a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized; inputting the text-region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples; and integrating the recognized text data and feeding it back.

As can be seen from the technical solution provided by the above embodiments, after receiving the image to be recognized, the embodiments first obtain a text-region sub-image according to the feature values corresponding to the pixels in the image. Using pixel feature values to determine the text region first identifies the area that needs to be analyzed, so that the region in which text is distributed can be located accurately even when the image is blurred or distorted, which improves the robustness of the recognition method. The text-region sub-image is input directly into the text recognition model for recognition, without being cut into individual characters, which simplifies the execution steps. The text recognition model includes a machine learning model trained on annotated image data samples, which enables it to recognize the text or characters contained in the text-region sub-image. Finally, the recognized text is integrated and fed back, completing the text recognition process. The method can therefore effectively locate the region where text lies and simplifies the operation steps of text recognition, so that text can be recognized conveniently and accurately.

Brief description of the drawings

To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in this specification; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a character recognition method according to an embodiment of this specification;

Fig. 2 is a module diagram of a character recognition apparatus according to an embodiment of this specification;

Fig. 3 is a module diagram of a character recognition device according to an embodiment of this specification;

Fig. 4A is a schematic diagram of a document whose text-region sub-images are to be determined, according to an embodiment of this specification;

Fig. 4B is a schematic diagram of a document whose text-region sub-images have been determined, according to an embodiment of this specification;

Fig. 5 is a structure diagram of a CRNN (convolutional recurrent neural network) model according to an embodiment of this specification.

Detailed description

The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this specification without creative effort fall within the scope of protection of this specification.

A character recognition method according to an embodiment of this specification is described below with reference to Fig. 1. The method is executed by a computer device, which includes but is not limited to a server, an industrial control computer, a PC, and the like. The specific implementation steps of the method are as follows:

S100: Receive an image to be recognized.

The image to be recognized is the image whose text the embodiments of this specification need to recognize. It contains characters and/or character strings to be recognized. The characters include English letters, digits, symbols, Chinese characters, and the like.

The image to be recognized may be submitted directly by a user to the computer device for recognition, or it may be sent to the computer device executing the character recognition method by another computer device that, while carrying out a task, has an image whose text needs to be recognized. The computer device may also receive images to be recognized from other sources, which is not limited here.

The image to be recognized may be a document image, a paper image, a handwritten text image, or the like. Its format and specific content are not limited.

S200: Obtain a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized.

An image is composed of many pixels. A pixel is the smallest unit into which an image is divided, and each pixel has corresponding position information and a feature value in the image. In an image containing text, the color of the text generally differs from the background color of the image. By analyzing the feature values corresponding to all pixels in the image, the region in which text is distributed can be determined from the difference between the feature values of the text region and those of the image background.

However, the region in which the text is distributed is often irregular in shape. When the text recognition model is later used to recognize the text, directly inputting the values corresponding to the text may not yield the desired recognition result. Therefore, so that the subsequent process can be carried out better, the text-region sub-image finally obtained may be a rectangular region containing the text, which matches the format of the training samples of the text recognition model and gives a better recognition result.

To achieve this technical effect, the activation score of each pixel may first be calculated from the feature value corresponding to the pixel in the image to be recognized; the pixels whose activation score is greater than a score threshold are then taken as activated pixels; finally, an image contour detection algorithm is used to obtain the text-region sub-image.

A feature value is a specific numeric value that represents a feature of a pixel. For example, the color of a pixel belonging to text normally differs from the color of a pixel belonging to the image background, so the color value of a pixel can be used as its feature value. When the color of a pixel is expressed in RGB, the color value may include an R value, a G value and a B value; when it is expressed in CMY, the color value may include a C value, an M value and a Y value. When other color spaces such as HSV or HSI are used, the color value can likewise be set according to the specific situation, which is not repeated here.
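As a minimal sketch of how a color feature value could be turned into an activation score: the specification does not give a formula, so the example below assumes (hypothetically) that the background is the most frequent color and that the score is the normalized Euclidean RGB distance from that background. The function name `activation_scores` is illustrative only.

```python
from collections import Counter
from math import sqrt


def activation_scores(image):
    """image: list of rows, each row a list of (R, G, B) tuples in 0..255.

    Returns a grid of the same shape with scores in [0, 1].
    """
    # Assumption: the background is the most frequent colour in the image.
    background = Counter(px for row in image for px in row).most_common(1)[0][0]
    # Largest possible RGB distance, used to normalise scores into [0, 1].
    max_dist = sqrt(3 * 255 ** 2)
    return [
        [sqrt(sum((c - b) ** 2 for c, b in zip(px, background))) / max_dist
         for px in row]
        for row in image
    ]


white, black = (255, 255, 255), (0, 0, 0)
image = [
    [white, white, white, white],
    [white, black, black, white],
    [white, white, white, white],
]
scores = activation_scores(image)
```

Pixels that match the background score 0, and pixels that differ most from it score close to 1, so a single threshold can later separate the two groups.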

In some cases, text is usually distributed at specific positions, such as the center of the image. The position information of a pixel can then also be used as a feature value, and the corresponding activated pixels are obtained according to the position information to construct the text-region sub-image. After the feature values of all pixels have been calculated, the activated pixels that match the characteristics of a text region can be selected accordingly to construct the text-region sub-image.

The activation score is a score of the correlation between a pixel and the text region. The higher the activation score, the more likely the pixel lies in the text region. After the activation score of each pixel is obtained, it is compared with a preset score threshold. If the activation score of a pixel is greater than the score threshold, the pixel is taken as an activated pixel. Activated pixels are pixels that are roughly distributed within the text region. From these activated pixels, the text-region sub-image can be obtained using an image contour detection algorithm. The image contour detection algorithm may include edge detection algorithms such as the Sobel operator, the Laplacian operator and the Canny operator; once the edges of the image region in which text is distributed are determined, that region is determined.
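The thresholding step and the extraction of rectangular regions can be sketched as follows. This is not the Sobel, Laplacian or Canny detection named above: as a simplified stand-in, the sketch groups activated pixels by 4-connected flood fill and returns one bounding rectangle per group. The function name, the default threshold of 0.5 and the box layout `(top, left, bottom, right)` are assumptions made for illustration.

```python
from collections import deque


def text_region_boxes(scores, threshold=0.5):
    """Return inclusive bounding boxes (top, left, bottom, right),
    one per 4-connected group of activated pixels."""
    h, w = len(scores), len(scores[0])
    # A pixel is activated when its score is greater than the threshold.
    active = [[scores[y][x] > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not active[y][x] or seen[y][x]:
                continue
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill over activated pixels
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and active[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


scores = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.9],
]
boxes = text_region_boxes(scores)
```

Each returned rectangle can be cropped from the original image to give a rectangular text-region sub-image, matching the requirement that the sub-image be a rectangle containing the text.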

The above process is illustrated with a specific example. Fig. 4A shows a document image to be recognized. The activation scores of the pixels in the image are first calculated and compared with the score threshold to determine the activated pixels, which gives the approximate positions of the activated pixels in Fig. 4A. The text-region sub-images are then determined from the activated pixels using the image contour detection algorithm; as shown in Fig. 4B, the text-region sub-images are indicated by white boxes. The text-region sub-images are the images that the text recognition model recognizes in the subsequent process.

In the prior art, recognizing text in an image requires splitting a character string into single characters and recognizing each character separately. In this method embodiment, by contrast, the image to be recognized may contain a single character or a character string composed of multiple characters, and in the subsequent process an image containing multiple characters can be input directly into the text recognition model. This simplifies the operation and improves the efficiency of text recognition.

Preferably, when the image to be recognized is a document image, the document image usually contains many table entries. To perform the subsequent text recognition more conveniently and accurately, and to avoid recognition errors caused by mixing the text of different regions of the document, the table lines in the document may first be recognized using a table line detection algorithm. The document image is then cut according to the recognized table lines to obtain at least one cut image, and the corresponding text-region sub-images are obtained from each cut image in turn according to the feature values of its pixels.

To make these steps easier to understand, a specific example is given. Fig. 4A shows a document that has different regions, each with a different meaning. First, the table lines in the document are recognized. The document is then cut along the table lines, forming the multiple regions shown in the figure, each corresponding to a different cut image. These cut images are recognized separately, which reduces the confusion caused by mixing regions during recognition and improves the accuracy of text recognition.
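The cutting step can be sketched as follows. The specification does not name a particular table line detection algorithm, so this example makes a simplifying assumption: the image is already binarized (1 for a dark pixel, 0 for background), and a row or column counts as a table line when at least 90% of its pixels are dark. All function names and the `min_ratio` parameter are hypothetical.

```python
def find_lines(binary, axis, min_ratio=0.9):
    """Indices of rows (axis=0) or columns (axis=1) that are mostly dark."""
    h, w = len(binary), len(binary[0])
    if axis == 0:
        return [y for y in range(h) if sum(binary[y]) >= min_ratio * w]
    return [x for x in range(w)
            if sum(binary[y][x] for y in range(h)) >= min_ratio * h]


def spans_between(lines, size):
    """Inclusive (start, end) index ranges lying between ruling lines."""
    spans, start = [], 0
    for line in sorted(lines) + [size]:
        if line > start:
            spans.append((start, line - 1))
        start = line + 1
    return spans


def cut_cells(binary):
    """Cut the image along its table lines; returns (top, left, bottom, right) boxes."""
    h, w = len(binary), len(binary[0])
    rows = spans_between(find_lines(binary, 0), h)
    cols = spans_between(find_lines(binary, 1), w)
    return [(top, left, bottom, right)
            for top, bottom in rows for left, right in cols]


# A 5x5 form with one horizontal and one vertical ruling line.
binary = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
cells = cut_cells(binary)
```

Each cell box corresponds to one cut image, which would then go through the feature value and activation score steps independently.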

S300: Input the text-region sub-image as a whole into the text recognition model to obtain recognized text data corresponding to the text-region sub-image.

The text recognition model may be a machine learning model trained on annotated image data samples. The text corresponding to each image in the image data samples is annotated in advance, and the model is then trained by machine learning so that it can directly recognize the text in an image.

Preferably, the machine learning model trained on annotated image data samples may be a CRNN (convolutional recurrent neural network) model.

In one embodiment, the CRNN model includes convolutional layers, batch normalization layers, pooling layers, dropout layers, recurrent layers and a transcription layer. Specifically, as shown in Fig. 5, the CRNN model may comprise three parts: the convolutional layers, the recurrent layers and the transcription layer. The convolutional layers mainly extract a feature sequence from the input image; the recurrent layers predict the label distribution of the extracted feature sequence; and the transcription layer converts the label distribution into the final recognition result through operations such as de-duplication and integration.
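The de-duplication performed by the transcription layer can be sketched as below. The specification does not name the decoding rule; the sketch assumes CTC-style greedy decoding (take the most likely label per frame, collapse consecutive repeats, drop a blank label), which is how the transcription layer of the published CRNN architecture works. The function name, the alphabet and the use of index 0 as the blank are illustrative assumptions.

```python
def greedy_transcribe(label_distribution, alphabet, blank=0):
    """label_distribution: one list of per-label probabilities per frame.

    Picks the most likely label for each frame, merges consecutive
    repeats, and removes the blank label.
    """
    best = [max(range(len(frame)), key=frame.__getitem__)
            for frame in label_distribution]
    chars, previous = [], None
    for label in best:
        if label != previous and label != blank:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)


alphabet = ["-", "a", "b"]  # index 0 is the blank label (assumption)
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again: merged with the previous frame
    [0.90, 0.05, 0.05],  # blank: separates the two real a's
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]
text = greedy_transcribe(frames, alphabet)
```

The blank label is what lets a genuinely doubled character survive de-duplication: repeats are merged only when no blank lies between them.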

Specifically, the convolutional part may contain 5 modules, each of which may in turn contain smaller convolutional layers. Modules 1 and 2 each consist of a convolutional layer, a batch normalization layer and a pooling layer; module 3 consists of two convolutional layers, a batch normalization layer and a pooling layer; module 4 consists of three convolutional layers, a batch normalization layer and a pooling layer; and module 5 consists of a convolutional layer and a batch normalization layer. The convolutional layers in the modules perform convolution operations on the image to extract its features so that the subsequent steps can recognize them.
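The module composition described above can be written down as a small declarative layout. This is only a bookkeeping sketch: the order of layers inside each module and all names are assumptions, and kernel sizes, channel counts and strides are not given in the specification, so none are shown.

```python
# Layer kinds per module, following the composition stated in the text.
# The ordering inside each module is an assumption.
CONV_MODULES = [
    ("module1", ["conv", "batch_norm", "pool"]),
    ("module2", ["conv", "batch_norm", "pool"]),
    ("module3", ["conv", "conv", "batch_norm", "pool"]),
    ("module4", ["conv", "conv", "conv", "batch_norm", "pool"]),
    ("module5", ["conv", "batch_norm"]),
]


def count_layers(modules, kind):
    """Count how many layers of the given kind appear across all modules."""
    return sum(layers.count(kind) for _, layers in modules)


summary = {kind: count_layers(CONV_MODULES, kind)
           for kind in ("conv", "batch_norm", "pool")}
```

A layout like this could be walked by a model-building function in any deep learning framework; here it only makes the layer counts explicit.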

The batch normalization layer re-normalizes the data passed on by the convolutional layer so that the mean of the output data is close to 0 and the standard deviation is close to 1. This alleviates vanishing or exploding gradients during training, speeds up the convergence of machine learning, and accelerates training.
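The normalization itself can be illustrated on a flat list of values. This is a minimal sketch of the standard batch normalization formula without the learnable scale and shift parameters; `eps` is the usual small constant that avoids division by zero, and the function name is illustrative.

```python
from math import sqrt


def batch_normalize(values, eps=1e-5):
    """Shift and scale values so their mean is ~0 and standard deviation ~1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / sqrt(var + eps) for v in values]


normalized = batch_normalize([2.0, 4.0, 6.0, 8.0])
out_mean = sum(normalized) / len(normalized)
out_std = sqrt(sum((v - out_mean) ** 2 for v in normalized) / len(normalized))
```

In a real network the statistics are computed per channel over a mini-batch, and the layer additionally learns a scale and an offset.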

The pooling layer is mainly used to reduce the amount of received data, which increases computation speed and improves the robustness of the extracted features.
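How pooling reduces the data can be shown with a small example. The specification does not say which pooling operation is used; the sketch assumes 2x2 max pooling with stride 2, a common choice, and the function name is illustrative.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[y][x], feature_map[y][x + 1],
             feature_map[y + 1][x], feature_map[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 0],
]
pooled = max_pool_2x2(feature_map)
```

The 4x4 input becomes a 2x2 output, so later layers process a quarter of the values, and small shifts of a feature inside a 2x2 window do not change the result.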

To prevent the model from over-fitting, a dropout layer may be added after module 5. It temporarily and randomly removes neurons from the neural network during training, which increases the generalization ability of the model.
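The random removal of neurons can be sketched as follows. The example assumes the common "inverted dropout" formulation, in which surviving activations are scaled by 1 / (1 - rate) during training so that no rescaling is needed at inference time; the drop rate of 0.5 and the function name are illustrative, not taken from the specification.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Zero each activation with probability `rate` during training.

    Survivors are scaled by 1 / (1 - rate); at inference time the
    input is returned unchanged. `rate` must be less than 1.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# A seeded generator makes the example repeatable.
dropped = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
unchanged = dropout([1.0, 2.0], training=False)
```

Because a different random subset is removed on each training step, the network cannot rely on any single neuron, which is the source of the improved generalization.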

Because the text recognition model is trained directly on character strings, the training data does not need to be segmented and individual characters do not need to be labeled; each character string is labeled directly. This greatly reduces the work of labeling data samples in the training stage, and when the model is in use it can recognize character strings directly, which simplifies the recognition process and improves the efficiency of text recognition.

Meanwhile, in specific recognition scenarios where there is a semantic relationship between preceding and following characters in a character string, this relationship can be used for further judgment so that the recognition result is more accurate.

In practical applications, characters of different types can look alike; for example, the letter o and the digit 0 are similar and may be hard to tell apart from the image alone. To address this, an English recognition model, a digit recognition model, a Chinese character recognition model and so on can each be trained using the above method, and the model corresponding to the application scenario is used for recognition, which further increases the recognition accuracy.

In the training stage, the text recognition model is trained with data samples labeled at the character-string level, so that it can not only recognize single characters directly but also recognize character strings well. It can therefore recognize the character strings in an input image directly, which simplifies the training steps and improves the efficiency of text recognition.

S400: Integrate the recognized text data and feed it back.

After the text-region sub-images have been recognized, recognized text data corresponding to each text-region sub-image is obtained. So that the text data still keeps the layout of the original image when it is presented, the text data is integrated according to the position of each corresponding text-region sub-image in the image to be recognized, which ensures that the recognized text is displayed properly.
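Position-based integration can be sketched as follows. The specification does not fix a layout rule, so the example assumes a simple reading order: results whose boxes start at nearly the same height are treated as one line, and each line is read left to right. The function name, the `(top, left, bottom, right)` box layout and the `line_tolerance` parameter are hypothetical.

```python
def integrate(results, line_tolerance=5):
    """results: list of ((top, left, bottom, right), text) pairs.

    Groups results into lines by their top coordinate, orders each line
    left to right, and joins everything into one string.
    """
    lines = []  # each entry: (top of first box in the line, [(left, text), ...])
    for box, text in sorted(results, key=lambda r: (r[0][0], r[0][1])):
        if lines and abs(box[0] - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((box[1], text))
        else:
            lines.append((box[0], [(box[1], text)]))
    return "\n".join(" ".join(t for _, t in sorted(items)) for _, items in lines)


results = [
    ((12, 100, 30, 160), "2019-07-26"),
    ((10, 10, 30, 80), "Date"),
    ((50, 10, 70, 80), "Amount"),
    ((51, 100, 70, 150), "128.00"),
]
page_text = integrate(results)
```

The recognition order of the sub-images does not matter here, since the output order is derived entirely from the box positions.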

When the image to be recognized is a document image and the document image was previously cut along its table lines, the recognized text data is first placed in the region of the corresponding cut image, and the cut images containing recognized text data are then restored to their corresponding positions in the image to be recognized, which gives a better reading experience.

After the recognized text data has been integrated, it is fed back according to the source from which the data was received. For example, for an image to be recognized that was provided directly by a user, the recognized text data is fed back to the user; for an image to be recognized that was provided by another server, the recognized text data is fed back to that server.

The above method steps are illustrated below with a specific example scenario. At year end, a user needs to total the year's spending on office supplies. Because there are too many documents, checking and counting each one by hand would take a lot of time and is error-prone. Recognizing the relevant data in the documents solves this problem better. All documents are first collected, and their scanned images are submitted to a server that performs text recognition. After receiving the images, the server first determines, according to the feature values corresponding to the pixels in each image, the text-region sub-images in which the text is distributed. The text-region sub-images are input into the text recognition model in turn, which recognizes the text data in them, such as characters or character strings. The recognized text data is then integrated according to the positions of the text-region sub-images and fed back. After receiving all the text data, the user searches it directly for keywords related to office supplies, determines the corresponding prices, and totals the spending on office supplies, which saves resources and time.

As can be seen from the above method and example scenario, the text recognition method described in the embodiments of this specification can accurately determine the text-region sub-images using the feature values corresponding to the pixels in the image, and then recognize the text in them using a machine learning model trained on previously annotated image data samples. Text recognition can thus be performed quickly and accurately, which improves the efficiency of text recognition while ensuring accuracy and improving the robustness of the character recognition method.

An embodiment of a character recognition apparatus of this specification is described below. The apparatus is integrated in the computer device. The specific modules of the apparatus are shown in Fig. 2:

an image receiving module 210, configured to receive an image to be recognized;

a text-region sub-image obtaining module 220, configured to obtain a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized;

a recognized text data obtaining module 230, configured to input the text-region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples;

a feedback module 240, configured to integrate the recognized text data and feed it back.

As shown in Fig. 3, an embodiment of this specification provides a computer device. The computer device may include a memory and a processor.

In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid state drive, a USB flash drive, or the like. The memory may be used to store computer instructions.

In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on.

The processor may execute the computer instructions to perform the following steps: receiving an image to be recognized; obtaining a text-region sub-image according to the feature values corresponding to the pixels in the image to be recognized; inputting the text-region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text-region sub-image, the text recognition model including a machine learning model trained on annotated image data samples; and integrating the recognized text data and feeding it back.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

From the above description of the embodiments, those skilled in the art can clearly understand that this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.

The embodiments in this specification are described in a progressive manner. The same or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.

This specification can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.

This specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. This specification can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

Although this specification has been described by way of embodiments, those of ordinary skill in the art will appreciate that many variations and changes are possible without departing from its spirit, and it is intended that the appended claims cover such variations and changes.

Claims

1. A character recognition method, characterized by comprising:

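Receiving an image to be recognized;

Obtaining a text region sub-image according to the feature values corresponding to pixels in the image to be recognized;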

Inputting the text region sub-image as a whole into a text recognition model to obtain recognized text data corresponding to the text region sub-image, wherein the text recognition model includes a machine learning model trained on annotated image data samples;

Integrating the recognized text data and feeding it back.

2. The method according to claim 1, characterized in that the image to be recognized includes a document image.

3. The method according to claim 2, characterized in that, after the receiving of the image to be recognized, the method further comprises:

Cutting the document image according to the table lines in the document image to obtain at least one cut image;

Correspondingly, the obtaining of a text region sub-image according to the feature values corresponding to pixels in the image to be recognized comprises:

Obtaining a text region sub-image according to the feature values corresponding to pixels in the at least one cut image.
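
The cutting step of claim 3 can be illustrated with a minimal sketch. The claim does not say how table lines are located, so the row-fill heuristic below (a row counts as a horizontal table line when at least `min_fill` of its pixels are dark) is an assumption, as are the names `find_line_rows`, `cut_between_lines`, and the toy `page` data. Vertical table lines could be handled the same way on columns.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized image: 0 = background, 1 = dark (ink) pixel


def find_line_rows(image: Image, min_fill: float = 0.9) -> List[int]:
    """Return indices of rows dark enough to be treated as horizontal table lines.

    The min_fill heuristic is an assumption, not taken from the specification.
    """
    return [y for y, row in enumerate(image) if row and sum(row) / len(row) >= min_fill]


def cut_between_lines(image: Image, min_fill: float = 0.9) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, lying between table lines."""
    line_rows = set(find_line_rows(image, min_fill))
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the band currently being collected
    for y in range(len(image)):
        if y in line_rows:
            if start is not None:
                bands.append((start, y))
                start = None
        elif start is None:
            start = y
    if start is not None:
        bands.append((start, len(image)))
    return bands


# Hypothetical 6x6 page: rows 0, 3 and 5 are table lines.
page = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
]
bands = cut_between_lines(page)
cut_images = [page[top:bottom] for top, bottom in bands]
```

Each entry of `cut_images` would then be passed on to text region detection in place of the full document image.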

4. The method according to claim 1, characterized in that the text region sub-image contains at least one character or character string.

5. The method according to claim 1, characterized in that the obtaining of a text region sub-image according to the feature values corresponding to pixels in the image to be recognized comprises:

Calculating an activation score for each pixel according to the feature value corresponding to that pixel in the image to be recognized;

Obtaining the pixels whose activation score is greater than a score threshold as activated pixels;

Obtaining the text region sub-image from the activated pixels using an image contour detection algorithm.
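
The three steps of claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the sigmoid used to turn a feature value into an activation score, the threshold of 0.5, and all function names are hypothetical. In place of a full image contour detection algorithm, the sketch substitutes a simpler 4-connected component search that returns one bounding box per group of activated pixels.

```python
import math
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive


def activation_scores(features: List[List[float]]) -> List[List[float]]:
    """Map each per-pixel feature value to a score in (0, 1) with a sigmoid (assumed)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in features]


def activated_mask(scores: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Mark pixels whose activation score is greater than the score threshold."""
    return [[s > threshold for s in row] for row in scores]


def region_boxes(mask: List[List[bool]]) -> List[Box]:
    """Bounding boxes of 4-connected groups of activated pixels, in scan order.

    Stand-in for contour detection: breadth-first flood fill over the mask.
    """
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right + 1, bottom + 1))
    return boxes


# Hypothetical 3x5 feature map with two separate groups of positive values.
features = [
    [-3.0, 2.0, 2.5, -3.0, -3.0],
    [-3.0, 1.5, 2.0, -3.0, 4.0],
    [-3.0, -3.0, -3.0, -3.0, 3.0],
]
mask = activated_mask(activation_scores(features), threshold=0.5)
boxes = region_boxes(mask)
# Crop one sub-image per box; the feature map stands in for the source image here.
sub_images = [[row[l:r] for row in features[t:b]] for l, t, r, b in boxes]
```

Because each box encloses a whole connected region, a crop can contain a complete character string, which fits feeding the sub-image to the recognition model as a whole.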

6. The method according to claim 1, characterized in that the machine learning model trained on annotated image data samples includes a CRNN (convolutional recurrent neural network) model.

7. The method according to claim 1, characterized in that the annotated image data samples include image data samples annotated with respect to character strings.

8. The method according to claim 1, characterized in that the integrating of the recognized text data and feeding it back comprises:

Integrating the recognized text data according to its corresponding position in the image to be recognized;

Feeding back the integrated recognized text data.
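
One way to read the position-based integration of claim 8 is to restore reading order: group recognized strings into lines by vertical position, then order each line from left to right. The sketch below assumes that interpretation. The `RecognizedText` record, the half-height grouping tolerance, and the sample values are all hypothetical and are not taken from the specification.

```python
from typing import List, NamedTuple


class RecognizedText(NamedTuple):
    """Recognized string plus the position of its sub-image in the source image."""
    text: str
    left: int
    top: int
    height: int


def integrate_by_position(items: List[RecognizedText], sep: str = " ") -> str:
    """Group items into lines by vertical position, then order each line left to right."""
    lines: List[List[RecognizedText]] = []
    for item in sorted(items, key=lambda it: (it.top, it.left)):
        # Join the current line when the tops differ by at most half the height
        # of that line's first item (assumed tolerance); otherwise start a new line.
        if lines and abs(item.top - lines[-1][0].top) <= lines[-1][0].height / 2:
            lines[-1].append(item)
        else:
            lines.append([item])
    return "\n".join(
        sep.join(it.text for it in sorted(line, key=lambda it: it.left))
        for line in lines
    )


# Hypothetical recognition results, given in arbitrary order.
items = [
    RecognizedText("2019-07-26", left=120, top=42, height=20),
    RecognizedText("Amount", left=10, top=80, height=20),
    RecognizedText("Date", left=10, top=40, height=20),
    RecognizedText("100.00", left=120, top=83, height=20),
]
result = integrate_by_position(items)
```

For the sample data, `result` holds two lines, with "Date" and "Amount" each followed by the value found to their right.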

9. A character recognition device, characterized by comprising:

An image receiving module, configured to receive an image to be recognized;

A text region sub-image obtaining module, configured to obtain a text region sub-image according to the feature values corresponding to pixels in the image to be recognized;

A recognized text data obtaining module, configured to input the text region sub-image into a text recognition model to obtain recognized text data corresponding to the text region sub-image, wherein the text recognition model includes a machine learning model trained on annotated image data samples;

10. Character recognition equipment, characterized by comprising:

A memory, configured to store computer instructions;

A processor, configured to execute the computer instructions to perform the following steps: receiving an image to be recognized; obtaining a text region sub-image according to the feature values corresponding to pixels in the image to be recognized; inputting the text region sub-image into a text recognition model to obtain recognized text data corresponding to the text region sub-image, wherein the text recognition model includes a machine learning model trained on annotated image data samples; and integrating the recognized text data and feeding it back.