CN109241974A

CN109241974A - Method and system for recognizing a text image

Info

Publication number: CN109241974A
Application number: CN201810965342.5A
Authority: CN
Inventors: 康立; 齐伟; 刘燕清
Original assignee: Suzhou Yan Tu Education Technology Co Ltd
Current assignee: Suzhou Yan Tu Education Technology Co Ltd
Priority date: 2018-08-23
Filing date: 2018-08-23
Publication date: 2019-01-18
Anticipated expiration: 2038-08-23
Also published as: CN109241974B

Abstract

The present invention relates to a method for recognizing a text image, comprising: inputting an image to be recognized into an image compression and rectification network, which rotates it so that the text in the image to be recognized is horizontal, the image compression and rectification network being obtained by machine-learning training and having an image-rotation function; and recognizing text from the image output by the image compression and rectification network. The beneficial effects of the invention are: the image to be recognized is automatically compressed and rectified by a convolutional autoencoder and then recognized by a text recognition neural network, which ensures the accuracy of text recognition, eliminates manual preprocessing, saves manual labor, and provides convenience for the user.

Description

Method and system for recognizing a text image

Technical field

The invention belongs to the technical field of character recognition, and in particular relates to a method and system for recognizing a text image.

Background

OCR text recognition software is software that uses OCR (Optical Character Recognition) technology to convert the text content of pictures and photographs directly into editable text.

The existing text recognition process comprises: converting a paper document into an electronic document with an electronic device, for example obtaining an image file of the paper document with a scanner or a digital camera; the OCR text recognition software then analyzes and processes the image file to obtain the text and layout information.

In actual operation, it is difficult to ensure that the image file obtained by the electronic device is horizontally aligned, so an operator has to rotate the image manually to bring the text direction to horizontal. When the number of paper documents to be recognized is large, the operator's workload is heavy, which makes recognition inefficient; manual operation is also error-prone, so recognition accuracy is hard to guarantee.

Providing a more convenient text image recognition method is therefore a problem that those skilled in the art urgently need to solve.

Summary of the invention

To solve the problems of low text recognition efficiency and low accuracy in the prior art, the present invention provides a method and system for recognizing a text image that offer high recognition efficiency and high accuracy.

The object of the present invention is to provide a text image recognition method and recognition system that are convenient to use, save manual labor, and have higher recognition efficiency.

A method for recognizing a text image according to a specific embodiment of the present invention comprises: inputting an image to be recognized into an image compression and rectification network, which rotates it so that the text in the image to be recognized is horizontal, the image compression and rectification network being obtained by machine-learning training and having an image-rotation function; and

recognizing text from the image output by the image compression and rectification network.

Preferably, while rotating the image to be recognized, the image compression and rectification network also adds marker sites at the edges of the text image, the marker sites being used to distinguish text from blank regions in the text image;

the process of recognizing text from the image output by the image compression and rectification network is: recognizing text, according to the marker sites, from the image output by the image compression and rectification network.

Preferably, while rotating the image to be recognized, the image compression and rectification network also compresses the image to be recognized.

Preferably, the compressed and rotated image to be recognized is cut line by line and character by character according to the marker sites;

the cut image to be recognized is input into a text recognition neural network for text recognition, the text recognition neural network being obtained by machine-learning training and having a text-recognition function.

Preferably, the process of obtaining the text recognition neural network comprises:

establishing a character library;

building a multi-class convolutional neural network;

selecting characters from the character library, splicing them into a complete image, and inputting the image into the image compression and rectification network for compression;

training the convolutional neural network with the character library compressed by the image compression and rectification network;

obtaining the text recognition neural network.

Preferably, the text recognition neural network consists of the convolutional layers, pooling layers, and fully connected layers of the convolutional neural network and the corresponding network weights.

Preferably, the image compression and rectification network consists of the convolutional layers and pooling layers of the convolutional neural network.

Preferably, the method of obtaining the image compression and rectification network comprises:

obtaining training text images;

rotating the training images upright to serve as training targets, and labeling the original images as the training set;

cutting the text line by line and character by character according to the training-target samples, and adding cutting sites in the gaps between characters;

inputting the training samples and training targets into a convolutional autoencoder for training, and, after training is complete, deleting the fully connected layer in the decoder of the convolutional autoencoder to obtain the image compression and rectification network with automatic correction and compression capability.

Preferably, text recognition is processed in a distributed manner, with multiple groups of text recognition neural networks working simultaneously; the results of the distributed text recognition are then integrated in sequence to obtain the final text recognition result.

A system for recognizing a text image according to a specific embodiment of the present invention comprises:

a text image acquisition module, used to obtain the user's image to be recognized;

an image compression and rectification network, used to rotate and compress the acquired user image to be recognized;

a text cutting module, which cuts the rotated and compressed image line by line and character by character; and

a text recognition module, which recognizes the cut image and outputs the corresponding text.

The beneficial effects of the method and system for recognizing a text image provided by the embodiments of the present invention include: by combining an autoencoder with a convolutional neural network, the user no longer needs to preprocess the original image, which is convenient for the user while maintaining high text recognition accuracy; the cumbersome steps of existing text recognition are simplified, so that text recognition can be completed within a single network system.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a first flow diagram of a text image recognition method provided according to an exemplary embodiment;

Fig. 2 is a second flow diagram of a text image recognition method provided according to an exemplary embodiment;

Fig. 3 is a flow diagram of the construction of the text recognition neural network provided according to an exemplary embodiment;

Fig. 4 is a first flow diagram of the construction of the image compression and rectification network provided according to an exemplary embodiment;

Fig. 5 is a second flow diagram of the construction of the image compression and rectification network provided according to an exemplary embodiment;

Fig. 6 is a structural diagram of the overall recognition network provided according to an exemplary embodiment;

Fig. 7 is a structural diagram of the text recognition system provided according to an exemplary embodiment;

Fig. 8 is a schematic diagram of a text image after rotation and compression, provided according to an exemplary embodiment.

Detailed description of the embodiments

To make the object, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described in detail below. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Referring to Fig. 1, an embodiment of the present invention provides a method for recognizing a text image, comprising:

101. Obtain an image to be recognized.

102. Input the image to be recognized into the image compression and rectification network, which rotates it so that the text in the image to be recognized is horizontal; the image compression and rectification network is obtained by machine-learning training and has an image-rotation function.

103. Recognize text from the image output by the image compression and rectification network.

In the text image recognition method provided by this embodiment, combining an autoencoder with a convolutional neural network means that the user no longer needs to preprocess the original image, which is convenient for the user while maintaining high text recognition accuracy; the cumbersome steps of existing text recognition are simplified, so that text recognition can be completed within a single network system.

As a feasible implementation of the above embodiment, the convolutional autoencoder comprises an encoder made up of multiple convolutional layers and pooling layers, and a decoder made up of unpooling layers and deconvolution layers. A convolutional layer contains multiple convolution kernels and performs feature extraction on the input image to obtain feature maps; the activation of a convolutional layer can be written as h^k = σ(x * W^k + β^k). A pooling layer downsamples the feature maps to reduce the computational cost of the convolution operations. The deconvolution operation convolves each feature map with the transpose of its corresponding convolution kernel and sums the results; its activation can be written as y = σ(Σ_k h^k * (W^T)^k + c).
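As a rough illustration of the encoder operations described above, the following pure-Python sketch computes one feature map h^k = σ(x * W^k + β^k) and then downsamples it. The sigmoid choice for σ, the use of max pooling, the stride-1 unpadded convolution (implemented as cross-correlation, as is common in deep learning libraries), and all function names are assumptions made for illustration; the patent does not specify them.

```python
import math


def sigmoid(v):
    """Assumed choice for the activation sigma."""
    return 1.0 / (1.0 + math.exp(-v))


def conv2d_valid(image, kernel):
    """Unpadded, stride-1 2-D cross-correlation of a pixel grid with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def feature_map(image, kernel, bias):
    """h^k = sigma(x * W^k + beta^k) for a single kernel W^k with bias beta^k."""
    return [[sigmoid(v + bias) for v in row]
            for row in conv2d_valid(image, kernel)]


def max_pool(fmap, size=2):
    """Downsample a feature map by taking the maximum of each size x size block."""
    rows = range(0, len(fmap) - size + 1, size)
    cols = range(0, len(fmap[0]) - size + 1, size)
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]
```

A real encoder would stack several such layers with many kernels each; the sketch only shows the arithmetic of a single kernel followed by a single pooling step.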

The rotation and rectification operation is performed because, in most captured images, the text direction is not horizontal with respect to the top edge of the page, which makes text segmentation and recognition harder and less accurate. A convolutional autoencoder is therefore used to apply a rotation transform to the input text image so that the text direction in the output image is horizontal.

Referring to Fig. 2, in a specific embodiment of the invention, while rotating the image to be recognized, the image compression and rectification network also adds marker sites at the edges of the text image; the marker sites are used to distinguish text from blank regions in the text image. Recognizing text from the image output by the image compression and rectification network then means recognizing text from that output image according to the marker sites.

While rotating the image to be recognized, the image compression and rectification network also compresses it.

The compressed and rotated image to be recognized is cut line by line and character by character according to the marker sites.

The cut image to be recognized is input into a text recognition neural network for text recognition; the text recognition neural network is obtained by machine-learning training and has a text-recognition function.

The image is also compressed by the image compression and rectification network because the autoencoder itself has good image compression ability, and using it only for image rotation would waste resources. Compressing the image in addition to rotating it avoids this waste. An autoencoder is a three-layer neural network: an input layer, a hidden (encoding) layer, and a decoding layer. The purpose of the network is to reconstruct its input, so that its hidden layer learns a good representation of that input. The autoencoder is an unsupervised machine learning algorithm that applies backpropagation with the target values set equal to the input values. Its training objective is to copy the input to the output; internally, it has a hidden layer that describes the code used to represent its input. The convolutional autoencoder used in the present invention also incorporates the denoising autoencoder, which randomly corrupts part of the input to address the risk of learning the identity function, so that the autoencoder must restore or denoise the input. This technique yields a good representation of the input, meaning one that can be obtained robustly from a corrupted input and can be used to recover the corresponding noise-free input.
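The denoising idea in the paragraph above can be sketched as follows: the network is fed a randomly corrupted copy of an image and asked to reproduce the clean one. The corruption scheme (zeroing pixels independently with a fixed probability) and the names `corrupt` and `make_denoising_pair` are assumptions for illustration only; the patent does not state how the input is corrupted.

```python
import random


def corrupt(pixels, drop_prob, rng):
    """Return a copy of a 2-D pixel grid with each value zeroed with probability drop_prob."""
    return [[0.0 if rng.random() < drop_prob else v for v in row]
            for row in pixels]


def make_denoising_pair(clean, drop_prob=0.3, seed=0):
    """Build one (corrupted input, clean target) training pair."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return corrupt(clean, drop_prob, rng), clean
```

Because the target is the uncorrupted image, an autoencoder trained on such pairs cannot succeed by simply copying its input.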

After the image has been rotated and compressed, the processed image contains cutting sites, so it can easily be cut line by line and character by character. Referring to Fig. 8, in a specific embodiment of the invention, the cutting-site information of a rotated and compressed text image comprises three values: the first is the line number, the second is the x-axis coordinate measured from left to right, and the third is the y-axis coordinate. Every line of text is thus labeled and located, so the text can easily be cut.
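A minimal sketch of how such three-value cutting sites might drive the cut is given below. The record layout `CutSite(row, x, y)` follows the description, but the interpretation is assumed: here the x values of one line are taken as column boundaries between characters, y as the top edge of the line, and the next line's y (or the image height) as its bottom edge. The function name `cut_characters` is hypothetical.

```python
from collections import defaultdict, namedtuple

# One cutting site: line number, x coordinate (left to right), y coordinate.
CutSite = namedtuple("CutSite", ["row", "x", "y"])


def cut_characters(image, sites):
    """Split a 2-D pixel grid into per-line lists of character crops."""
    by_row = defaultdict(list)
    for site in sites:
        by_row[site.row].append(site)

    rows = sorted(by_row)
    height = len(image)
    lines = []
    for idx, row in enumerate(rows):
        top = by_row[row][0].y
        # Assumed: a line ends where the next one starts, or at the image bottom.
        bottom = by_row[rows[idx + 1]][0].y if idx + 1 < len(rows) else height
        xs = sorted(site.x for site in by_row[row])
        crops = [[pixel_row[left:right] for pixel_row in image[top:bottom]]
                 for left, right in zip(xs, xs[1:])]
        lines.append(crops)
    return lines
```

Each crop returned by `cut_characters` would then be passed to the text recognition neural network.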

Referring to Fig. 3, as a feasible implementation of the above embodiment, the text recognition neural network can be obtained by the following process:

301. Establish a character library;

302. Build a multi-class convolutional neural network;

303. Select characters from the character library and splice them into a complete image, input it into the image compression and rectification network for compression, and train the convolutional neural network with the character library compressed by the image compression and rectification network;

304. Take the trained convolutional neural network to obtain the text recognition neural network.

In a specific embodiment of the invention, the training process of a convolutional text recognition network comprises:

first collecting scanned pictures of different characters to establish a complete character library, and compressing it with the image compression and rectification network to serve as the sample set;

initializing the convolutional neural network by assigning random parameters to the network weights, which puts the network in a to-be-trained state; the convolutional layers of the network use the ReLU activation function, i.e. f(x) = max(0, x);

shuffling the collected character library and dividing it into groups, then feeding the groups batch by batch into the initialized convolutional neural network to train the network;

monitoring training progress and cross-validating the training results until the network performance converges, at which point training is complete.
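The data-handling parts of the training procedure above can be sketched in a few lines. ReLU is as given in the text; the batching helper and the k-fold split are assumptions, since the patent mentions shuffling, grouping, and cross-validation without specifying a scheme. The names `shuffled_batches` and `k_fold_splits` are hypothetical.

```python
import random


def relu(x):
    """ReLU activation, f(x) = max(0, x)."""
    return max(0.0, x)


def shuffled_batches(samples, batch_size, seed=None):
    """Shuffle the samples and split them into consecutive batches."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]
```

In a full training loop, each batch from `shuffled_batches` would be passed through the network and the weights updated, with the validation indices used to judge when performance has converged.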

The text image processed by the image compression and rectification network retains the high-level information of the original image, while non-essential information has been filtered out and cropped away.

Referring to Fig. 4, as a feasible implementation of the above embodiment, the process of obtaining the image compression and rectification network may comprise:

401. Obtain training text images;

402. Rotate the training images upright to serve as training targets, and label the original images as the training set;

403. Cut the text line by line and character by character according to the training-target samples, and add cutting sites in the gaps between characters;

404. Input the training samples and training targets into a convolutional autoencoder for training; after training is complete, delete the fully connected layer in the decoder of the convolutional autoencoder to obtain the image compression and rectification network with automatic correction and compression capability.

Referring to Fig. 5, in a specific embodiment of the invention, training a convolutional autoencoder comprises the following process:

501. Collect training samples;

502. Rotate the training samples slightly, add them to the training set, and label the original images;

503. Cut the upright (rectified) pictures in the training samples character by character;

504. Add cutting sites at the cuts and splice the pieces into a full picture as the training target;

505. Initialize the convolutional autoencoder, assigning random values to the network;

506. Shuffle the training samples and input them into the convolutional autoencoder in batches for training until convergence.
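Step 502 can be illustrated with a small pure-Python sketch that pairs slightly rotated copies of an upright image with the upright original. The nearest-neighbour rotation, the angle range, the number of copies per image, and the names `rotate_nearest` and `make_training_pairs` are all assumptions for illustration; the sketch also omits the cutting-site insertion of steps 503 and 504, so its targets are plain upright images.

```python
import math
import random


def rotate_nearest(image, degrees, fill=0):
    """Rotate a 2-D pixel grid about its centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dy, dx = r - cy, c - cx
            src_r = int(round(cy + dy * cos_t - dx * sin_t))
            src_c = int(round(cx + dy * sin_t + dx * cos_t))
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = image[src_r][src_c]
    return out


def make_training_pairs(upright_images, max_degrees=10.0, copies=2, seed=0):
    """Return (rotated input, upright target) pairs for autoencoder training."""
    rng = random.Random(seed)
    pairs = []
    for img in upright_images:
        for _ in range(copies):
            angle = rng.uniform(-max_degrees, max_degrees)
            pairs.append((rotate_nearest(img, angle), img))
    return pairs
```

Pixels that rotate in from outside the original frame are set to `fill`, which is one simple way to handle borders; a production pipeline would more likely rely on an image library for interpolation.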

Training minimizes the reconstruction error between the image reconstructed by the convolutional autoencoder and the training target. The training loss function is the minimum mean-square-error function, i.e.

[formula not reproduced in this text]

where y_i is the target value of the training target and the second symbol (not reproduced in this text) is the value of the reconstructed image. The update formula for the convolutional network parameters is:

[formula not reproduced in this text]
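The loss and update formulas are not legible in this text, so the following sketch assumes the standard forms: a mean squared error averaged over all values, and a plain gradient-descent step with a fixed learning rate. Both the exact normalization of the loss and the update rule are assumptions and may differ from the formulas in the original filing; the names `mse_loss` and `sgd_step` are hypothetical.

```python
def mse_loss(targets, reconstructions):
    """Mean squared error between target values y_i and reconstructed values."""
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, reconstructions)) / n


def sgd_step(params, grads, learning_rate):
    """One assumed gradient-descent update: p <- p - learning_rate * dL/dp."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

For example, targets `[1.0, 2.0, 3.0]` and reconstructions `[1.0, 2.0, 5.0]` differ only in the last value, by 2, so the loss under this assumed form is 4/3.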

Referring to Fig. 6, in a specific embodiment of the invention, the overall recognition network comprises: the image compression and rectification network, a program that cuts the rotated and compressed image to be recognized character by character, and the text recognition neural network. Because the output of the image compression and rectification network has clear cutting sites, conventional text-cutting methods are not needed. The input image can be scanned dynamically with the cutting sites as boundaries, and the image between adjacent cutting sites is fed to a text recognition neural network. The overall recognition network can be split into two subsystems; if the server has enough processing capacity, the text recognition neural network can instead be connected directly to the end of the convolutional autoencoder to form one complete neural network. This design avoids the heavy communication congestion between GPU and CPU that character-by-character segmentation would cause, greatly improving GPU utilization and computation speed.

In some embodiments of the invention, text recognition can be processed in a distributed manner: multiple overall recognition networks work at the same time, which can substantially increase recognition speed.
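One way to picture the distributed mode is a pool of workers that each run a recognizer on one piece of the input, with the results joined in their original order. The sketch below uses a thread pool on a single machine purely for illustration; the patent does not say how the work is distributed, and `recognize_distributed` and its `recognize` argument (a stand-in for a full recognition network) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_distributed(chunks, recognize, workers=4):
    """Recognize chunks in parallel and join the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the inputs,
        # regardless of which worker finishes first.
        return "".join(pool.map(recognize, chunks))
```

Preserving input order matters here because the recognized fragments must be reassembled into the original reading order of the document.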

An embodiment of the present invention also provides a system for text image recognition, comprising:

a text image acquisition module, for obtaining the user's image to be recognized;

an image compression and rectification network, for rotating and compressing the acquired user image to be recognized;

a text cutting module, for cutting the rotated and compressed image line by line and character by character; and

a text recognition module, for recognizing the cut image and outputting the corresponding text.
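The four modules listed above form a simple pipeline, which can be sketched as a class that chains four callables. The class name, the method names, and the convention that each stage is a plain function are assumptions for illustration; in the described system the second and fourth stages would be trained neural networks.

```python
class TextImageRecognitionSystem:
    """Chains acquisition, rotation/compression, cutting, and recognition."""

    def __init__(self, acquire, rectify_and_compress, cut, recognize):
        self.acquire = acquire                            # text image acquisition module
        self.rectify_and_compress = rectify_and_compress  # rotation and compression network
        self.cut = cut                                    # text cutting module
        self.recognize = recognize                        # text recognition module

    def run(self, source):
        """Run the full pipeline and return the recognized text."""
        image = self.acquire(source)
        processed = self.rectify_and_compress(image)
        pieces = self.cut(processed)
        return "".join(self.recognize(piece) for piece in pieces)
```

With stand-in stages (for instance string functions in place of the networks), the class can be exercised end to end before any model is trained.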

In some embodiments of the invention, the operating environment for overall text image recognition comprises multiple terminals and one server, and the server is equipped with the text image recognition system described above. A terminal may be, but is not limited to, any personal computer, laptop, personal digital assistant, smartphone, tablet computer, or portable wearable device capable of running the method for detecting mathematical formulas in images. The server may implement a single function or multiple functions, and may be an independent physical server or a cluster of physical servers. The client terminal photographs the text to be recognized, for example an examination paper, and sends it to the server over the network; the server automatically preprocesses the picture with the convolutional autoencoder and then recognizes it with the text recognition network to obtain the final result. The recognized text is returned to the client over the network, and the user obtains the recognition result.

In a specific embodiment of the invention, the computer device on the server side comprises a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capability and supports the operation of the entire terminal. The memory of the computer device comprises a non-volatile storage medium and internal memory; the non-volatile storage medium stores an operating system and a computer program which, when executed by the processor, causes the processor to implement a method for detecting mathematical formulas in images. The internal memory of the computer device can also store a computer program which, when executed by the processor, causes the processor to perform a method for overall text image recognition. The network interface of the computer device is used to communicate with the terminal. The input device of the computer device may be a touch layer covering the display screen, or an external keyboard, trackpad, or mouse; the input device can capture instructions that the user generates by operating, with a finger, the interface shown on the display screen, for example inputting an image to be detected by tapping a particular option on the terminal. The display screen can be used to show the input interface or the output text region.

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims

1. A method for recognizing a text image, characterized by comprising:

inputting an image to be recognized into an image compression and rectification network, which rotates it so that the text in the image to be recognized is horizontal, the image compression and rectification network being obtained by machine-learning training and having an image-rotation function;

2. The method according to claim 1, characterized in that, while rotating the image to be recognized, the image compression and rectification network also adds marker sites at the edges of the text image, the marker sites being used to distinguish text from blank regions in the text image;

the process of recognizing text from the image output by the image compression and rectification network is: recognizing text, according to the marker sites, from the image output by the image compression and rectification network.

3. The method according to claim 2, characterized in that, while rotating the image to be recognized, the image compression and rectification network also compresses the image to be recognized.

4. The method according to claim 3, characterized in that

the compressed and rotated image to be recognized is cut line by line and character by character according to the marker sites;

the cut image to be recognized is input into a text recognition neural network for text recognition, the text recognition neural network being obtained by machine-learning training and having a text-recognition function.

5. The method according to claim 4, characterized in that the process of obtaining the text recognition neural network comprises:

establishing a character library;

building a multi-class convolutional neural network;

selecting characters from the character library and splicing them into a complete image, inputting the image into the image compression and rectification network for compression, and training the convolutional neural network with the character library compressed by the image compression and rectification network;

obtaining the text recognition neural network.

6. The method according to claim 5, characterized in that the text recognition neural network consists of the convolutional layers, pooling layers, and fully connected layers of the convolutional neural network and the corresponding network weights.

7. The method according to claim 5, characterized in that the image compression and rectification network consists of the convolutional layers and pooling layers of the convolutional neural network.

8. The method according to claim 7, characterized in that the method of obtaining the image compression and rectification network comprises:

obtaining training text images;

inputting the training samples and training targets into a convolutional autoencoder for training, and, after training is complete, deleting the fully connected layer in the decoder of the convolutional autoencoder to obtain the image compression and rectification network with automatic correction and compression capability.

9. The method according to any one of claims 1 to 8, characterized in that text recognition is processed in a distributed manner, with multiple groups of the text recognition neural network working simultaneously; and the results of the distributed text recognition are integrated in sequence to obtain the final text recognition result.

10. A system for recognizing a text image, characterized by comprising:

an image compression and rectification network, used to rotate and compress the acquired user image to be recognized;

a text cutting module, which cuts the rotated and compressed image line by line and character by character; and

a text recognition module, which recognizes the cut image and outputs the corresponding text.