CN109241974B - Text image identification method and system - Google Patents

Text image identification method and system Download PDF

Info

Publication number
CN109241974B
Authority
CN
China
Prior art keywords
image
text
network
training
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810965342.5A
Other languages
Chinese (zh)
Other versions
CN109241974A
Inventor
康立
齐伟
刘燕清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Yantu Education Technology Co ltd
Original Assignee
Suzhou Yantu Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Yantu Education Technology Co ltd filed Critical Suzhou Yantu Education Technology Co ltd
Priority to CN201810965342.5A
Publication of CN109241974A
Application granted
Publication of CN109241974B
Active legal status: Current
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a text image recognition method comprising the following steps: inputting an image to be recognized into an image compression righting network, which rotates the image so that the text in it lies horizontally, the image compression righting network being obtained by training with a machine learning method and having an image rotation function; and recognizing text in the image output by the image compression righting network. The beneficial effects of the invention are that the image to be recognized is automatically compressed and righted by a convolutional autoencoder and then recognized by a text recognition neural network, which ensures character recognition accuracy, removes the manual preprocessing step, saves manual labor, and is convenient for users.

Description

Text image identification method and system
Technical Field
The invention belongs to the technical field of character recognition, and particularly relates to a text image recognition method and system.
Background
OCR character recognition software converts the text content of a picture or photograph directly into editable text using Optical Character Recognition (OCR) technology.
The existing character recognition process is as follows: a paper document is converted into an electronic document by an electronic device, for example by acquiring an image file of the paper document with a scanner or digital camera; OCR software then analyzes the image file to extract the characters and layout information.
In practice, the image files acquired by such devices are rarely aligned horizontally, so an operator must manually rotate the text image until the characters run horizontally. When many paper documents need to be recognized, the operator's workload is heavy, recognition efficiency is low, manual operation is error-prone, and recognition accuracy is hard to guarantee.
Providing a more convenient text image recognition method is therefore an urgent problem for those skilled in the art.
Disclosure of Invention
To solve the problems of low text recognition efficiency and accuracy in the prior art, the invention provides a text image recognition method and system characterized by high recognition efficiency and high accuracy.
The aim of the invention is to provide a text image recognition method and system that are convenient to use, save manual labor, and offer higher recognition efficiency.
The text image recognition method according to an embodiment of the invention comprises the following steps: inputting an image to be recognized into an image compression righting network, which rotates the image so that the text in it lies horizontally, the image compression righting network being obtained by training with a machine learning method and having an image rotation function;
and recognizing text in the image output by the image compression righting network.
Preferably, while rotating the image to be recognized, the image compression righting network adds marker points at the edges of the text image; the marker points are used to distinguish characters from blank areas in the text image.
Recognizing text in the image output by the image compression righting network then comprises: recognizing text in the output image according to the marker points.
Preferably, the image compression righting network compresses the image to be recognized while rotating it.
Preferably, the compressed and rotated image is cut line by line and word by word according to the marker points;
and the cut image segments are input into a text recognition neural network for text recognition, the text recognition neural network being obtained by training with a machine learning method and having a text recognition function.
Preferably, the text recognition neural network is obtained as follows:
establishing a character library;
building a multi-class convolutional neural network;
selecting characters from the character library, splicing them into a complete image, and inputting that image into the image compression righting network for compression;
training the convolutional neural network with the character library compressed by the image compression righting network;
and obtaining the text recognition neural network.
Preferably, the text recognition neural network is composed of the convolutional layers, pooling layers, fully connected layers, and corresponding network weights of the convolutional neural network.
Preferably, the image compression righting network consists of convolutional layers and pooling layers of the convolutional neural network.
Preferably, the image compression righting network is obtained as follows:
acquiring training text images;
rotating the training images into the upright position to serve as training targets, and labeling the original images to serve as the training set;
cutting the text of each training-target sample line by line and word by word, and adding cutting points at the word intervals;
inputting the training samples and training targets into a convolutional autoencoder for training, and deleting the fully connected layers in the decoder of the trained convolutional autoencoder to obtain the image compression righting network with automatic correction and compression capability.
Preferably, the text recognition process adopts distributed processing, with several groups of text recognition neural networks working simultaneously; the distributed recognition results are integrated in order to obtain the final text recognition result.
The text image recognition system according to an embodiment of the invention includes:
a text image acquisition module, which acquires the user's image to be recognized;
the image compression righting network, which rotates and compresses the acquired image;
a text cutting module, which cuts the rotated and compressed image line by line and word by word; and
a text recognition module, which recognizes the cut image segments and outputs the corresponding characters.
The text image recognition method and system have the following advantages: by combining an autoencoder with a convolutional neural network, the user does not need to preprocess the original image, which is convenient for the user while maintaining high character recognition accuracy; the complicated steps of existing character recognition are simplified, and character recognition can be completed within a single network system.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below obviously show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a first flowchart of a text image recognition method according to an exemplary embodiment;
FIG. 2 is a second flowchart of a text image recognition method according to an exemplary embodiment;
FIG. 3 is a flowchart of the construction of the text recognition neural network according to an exemplary embodiment;
FIG. 4 is a first flowchart of the construction of the image compression righting network according to an exemplary embodiment;
FIG. 5 is a second flowchart of the construction of the image compression righting network according to an exemplary embodiment;
FIG. 6 is a block diagram of the overall recognition network according to an exemplary embodiment;
FIG. 7 is a block diagram of the text recognition system according to an exemplary embodiment;
FIG. 8 is a schematic illustration of a text image after rotation and compression according to an exemplary embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Referring to FIG. 1, an embodiment of the invention provides a text image recognition method, comprising:
101. Acquiring an image to be recognized.
102. Inputting the image to be recognized into an image compression righting network, which rotates it so that the text lies horizontally; the image compression righting network is obtained by training with a machine learning method and has an image rotation function.
103. Recognizing text in the image output by the image compression righting network.
In the text image recognition method provided by this embodiment, combining an autoencoder with a convolutional neural network means the user does not need to preprocess the original image, which is convenient for the user while maintaining high character recognition accuracy; the complicated steps of existing character recognition are simplified, and character recognition can be completed within a single network system.
As a possible implementation of the above embodiment, the convolutional autoencoder includes an encoder composed of several convolutional and pooling layers, and a decoder composed of unpooling and deconvolution layers. Each convolutional layer contains several convolution kernels that extract features from the input image to produce feature maps; the activation of the k-th feature map may be written as h^k = σ(x * W^k + b^k). The pooling layers down-sample the feature maps to reduce the computational complexity of the convolution operations. The deconvolution operation convolves each feature map with the transpose of its corresponding convolution kernel and sums the results; its activation may be written as y = σ(Σ_k h^k * (W^k)^T + c).
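As an illustration of this encoder/decoder structure, the sketch below shows a minimal convolutional autoencoder, assuming PyTorch. It is not the patent's actual network: the layer counts, channel sizes, and the use of transposed convolutions in place of explicit unpooling layers are illustrative assumptions.

```python
# Minimal convolutional autoencoder sketch (PyTorch). Layer counts, channel
# sizes, and the use of transposed convolutions instead of explicit unpooling
# are illustrative assumptions, not values taken from the patent.
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: convolution + pooling, h^k = sigma(x * W^k + b^k)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Decoder: deconvolution back to the input resolution,
        # y = sigma(sum_k h^k * (W^k)^T + c)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)        # compressed feature maps
        return self.decoder(code)     # reconstructed (righted, compressed) image
```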
The rotation righting operation is needed because, in most photographed images, the characters are not aligned horizontally with the upper edge of the page, which makes character segmentation and recognition harder and reduces accuracy. The convolutional autoencoder therefore applies a rotation transformation to the input text image so that the text in the output image runs horizontally.
Referring to FIG. 2, in an embodiment of the invention, while the image compression righting network rotates the image to be recognized, marker points are added at the edges of the text image; the marker points are used to distinguish characters from blank areas in the text image. Text is then recognized in the image output by the image compression righting network according to the marker points.
The image compression righting network compresses the image to be recognized while rotating it;
the compressed and rotated image is cut line by line and word by word according to the marker points;
and the cut image segments are input into a text recognition neural network for text recognition, the text recognition neural network being obtained by training with a machine learning method and having a text recognition function.
The image is compressed as well as righted because an autoencoder inherently has good image compression capability; using it only for image rotation would waste resources, whereas compressing the image while rotating it avoids that waste. An autoencoder is a neural network with three layers: an input layer, a hidden (coding) layer, and a decoding layer. The purpose of the network is to reconstruct its input so that its hidden layer learns a good representation of the input. An autoencoder is an unsupervised machine learning algorithm that applies back-propagation and sets the target value equal to the input value; its training goal is to copy the input to the output, and internally its hidden layer holds the code used to characterize the input. The convolutional autoencoder used by the invention borrows from the denoising autoencoder: parts of the input are randomly corrupted to avoid the risk of learning the identity function, so the autoencoder must recover, or denoise, the input. This technique yields a good representation of the input, that is, one that can be obtained robustly from the corrupted input and used to recover the corresponding noise-free input.
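The denoising behaviour described above can be imitated with a simple corruption step before encoding; the sketch below is an assumption-level illustration (including the 10% corruption ratio and the function name corrupt) of randomly zeroing pixels so the autoencoder cannot learn the identity function.

```python
# Denoising-style corruption sketch: randomly zero a fraction of the input
# pixels before encoding so the autoencoder must reconstruct the clean target.
# The 10% corruption ratio is an illustrative assumption.
import torch

def corrupt(x: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    mask = (torch.rand_like(x) > drop_prob).float()  # 1 = keep, 0 = corrupt
    return x * mask
```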
After rotation and compression, the processed image contains cutting points, so it can easily be cut line by line and word by word. Referring to FIG. 8, in one embodiment of the invention the cutting point information of the rotated and compressed text image consists of three values: the first is the line number, the second the x-axis coordinate, and the third the y-axis coordinate, read from left to right. Every line of text is thus marked with a label and a position, and the text can be cut easily.
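One possible way of using these (line number, x, y) triples is sketched below. The triple format follows the description above; the fixed per-line crop height and the function name cut_words are illustrative assumptions.

```python
# Sketch of word-by-word cutting driven by the (line number, x, y) cutting
# points described above. The fixed per-line crop height is an assumption.
from collections import defaultdict
import numpy as np

def cut_words(image: np.ndarray, cut_points, row_height: int):
    """image: 2D grayscale array; cut_points: iterable of (line, x, y) triples."""
    lines = defaultdict(list)
    for line_no, x, y in cut_points:
        lines[line_no].append((x, y))
    crops = []
    for line_no in sorted(lines):
        points = sorted(lines[line_no])                     # left to right within a line
        for (x0, y0), (x1, _) in zip(points, points[1:]):   # neighbouring points bound a word
            crops.append(image[y0:y0 + row_height, x0:x1])
    return crops
```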
As a possible implementation of the above embodiment, and as shown in FIG. 3, the text recognition neural network may be obtained as follows:
301. Establishing a character library.
302. Building a multi-class convolutional neural network.
303. Selecting characters from the character library, splicing them into a complete image, inputting that image into the image compression righting network for compression, and training the convolutional neural network with the compressed character library.
304. Truncating the trained convolutional neural network to obtain the text recognition neural network.
In an embodiment of the invention, the training process of the convolutional character recognition network comprises:
first, collecting scanned pictures of different characters, establishing a complete character library, and compressing it with the image compression righting network to form the sample set;
initializing the convolutional neural network and assigning the network weights with random parameters so that the network is ready for training, with the convolutional layers using the ReLU activation function, i.e. f(x) = max(0, x);
shuffling, ordering, and grouping the collected character library, inputting it batch by batch into the initialized convolutional neural network, and training the network;
and monitoring the training progress and cross-validating the training results until the network performance converges, at which point training is complete.
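A compact training loop in the spirit of these steps is sketched below, again assuming PyTorch. The network shape, the 32x32 input crops, the batch size, the optimizer, and the epoch count are illustrative assumptions rather than the patent's settings.

```python
# Illustrative training loop for the multi-class character classifier.
# Architecture and hyper-parameters are assumptions, not the patent's values.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class CharClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(                      # ReLU: f(x) = max(0, x)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 crops

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train(model, train_set, val_set, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in DataLoader(train_set, batch_size=64, shuffle=True):  # shuffled batches
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        with torch.no_grad():                                # cross-validate each epoch
            correct = total = 0
            for x, y in DataLoader(val_set, batch_size=64):
                correct += (model(x).argmax(1) == y).sum().item()
                total += y.numel()
            print(f"validation accuracy: {correct / total:.3f}")  # stop once this converges
```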
The text image processed by the image compression righting network retains the high-level information of the original image, while unnecessary information is filtered out.
As a possible implementation of the foregoing embodiment, and as shown in FIG. 4, the image compression righting network may be obtained as follows:
401. Acquiring training text images.
402. Rotating the training images into the upright position to serve as training targets, and labeling the original images to serve as the training set.
403. Cutting the text of each training-target sample line by line and word by word, and adding cutting points at the word intervals.
404. Inputting the training samples and training targets into a convolutional autoencoder for training, then deleting the fully connected layers in the decoder of the trained convolutional autoencoder to obtain the image compression righting network with automatic correction and compression capability.
Referring to FIG. 5, in an embodiment of the invention, training the convolutional autoencoder comprises the following steps:
501. Collecting training samples.
502. Applying small rotations to the training samples to augment the training set, and labeling the original pictures.
503. Cutting the rotated, righted pictures in the training samples word by word.
504. Adding cutting points at the cutting positions and splicing the pieces back into a complete picture to serve as the training target.
505. Initializing the convolutional autoencoder and assigning the network weights with random values.
506. Shuffling the training samples and inputting them into the convolutional autoencoder in batches for training until convergence.
Training minimizes the reconstruction error between the image reconstructed by the convolutional autoencoder and the training target. The training loss uses the mean squared error function, i.e.

L = (1/n) Σ_i (y_i − ŷ_i)²,

where y_i is a value of the training target and ŷ_i is the corresponding value of the reconstructed image. The convolutional network parameters are then updated by gradient descent on this loss, in the standard form W ← W − η·∂L/∂W, where η is the learning rate.
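In code, the loss and update above might look like the sketch below, assuming PyTorch; the plain SGD optimiser, learning rate, and epoch count are illustrative assumptions.

```python
# Sketch of the autoencoder training step: minimise the mean squared
# reconstruction error against the righted/compressed training target and
# update the weights by gradient descent. Optimiser and learning rate are
# illustrative assumptions.
import torch
import torch.nn as nn

def train_autoencoder(model, loader, epochs=20, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    mse = nn.MSELoss()                     # L = (1/n) * sum_i (y_i - y_hat_i)^2
    for _ in range(epochs):
        for original, target in loader:    # target = rotated + compressed sample
            reconstruction = model(original)
            loss = mse(reconstruction, target)
            opt.zero_grad()
            loss.backward()
            opt.step()                     # W <- W - lr * dL/dW
```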
referring to fig. 6, in an embodiment of the present invention, the overall identification network includes: the image compression positioning network, a program for cutting the image to be identified word by word after rotation compression and a text identification neural network. Because the output result of the image compression righting network has obvious cutting sites, the character cutting does not need to be carried out according to the traditional character cutting mode. The input image can be dynamically scanned, the cutting sites are taken as boundaries, and the images among the cutting sites are connected with a character recognition neural network. If the processing capacity of the server side is enough, the whole recognition network does not need to be divided into two subsystems, and the character recognition neural network can be directly connected to the tail end of the convolutional self-encoder to form a complete neural network. The design avoids a large amount of communication congestion between the GPU and the CPU caused by character-by-character segmentation, and greatly improves the utilization efficiency and the calculation speed of the GPU.
In some embodiments of the invention, character recognition may use distributed processing, with several overall recognition networks working simultaneously, which can greatly increase recognition speed.
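One plausible way to realize this distributed mode is with a process pool, as in the sketch below; the worker function recognize_batch and the pool size are assumptions.

```python
# Distributed-mode sketch: split the cropped word images into batches, let
# several worker processes (each running its own recognition network) handle
# them in parallel, then merge the partial results back in order.
from multiprocessing import Pool

def recognize_distributed(crop_batches, recognize_batch, workers=4):
    with Pool(workers) as pool:
        partial_results = pool.map(recognize_batch, crop_batches)  # order is preserved
    return "".join(partial_results)
```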
An embodiment of the invention also provides a text image recognition system, comprising:
a text image acquisition module, which acquires the user's image to be recognized;
the image compression righting network, which rotates and compresses the acquired image;
a text cutting module, which cuts the rotated and compressed image line by line and word by word; and
a text recognition module, which recognizes the cut image segments and outputs the corresponding characters.
In some embodiments of the invention, the environment for overall text image recognition includes several terminals and a server, with the text image recognition system installed on the server. The terminal may be, but is not limited to, a personal computer, laptop, personal digital assistant, smart phone, tablet computer, portable wearable device, or any other device capable of running the text image recognition method. The server may implement a single function or multiple functions, and may specifically be an independent physical server or a physical server cluster. The client terminal photographs the text to be recognized, such as an examination paper, and sends it to the server over the network; the server automatically preprocesses the picture with the convolutional autoencoder and then recognizes it with the character recognition network to obtain the final result. The recognized text content is returned to the client over the network, and the user obtains the recognition result.
In an embodiment of the invention, the server-side computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor provides computation and control capability and supports the operation of the whole terminal. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program that, when executed by the processor, causes the processor to implement the text image recognition method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the overall text image recognition method. The network interface of the computer device communicates with the terminal. The input device may be a touch layer covering the display screen, or an external keyboard, touch pad, or mouse; it captures instructions generated when the user operates the interface shown on the display screen, for example an instruction to input an image to be recognized by tapping a specific option on the terminal. The display screen may be used to display the input interface or the output text regions.
The above description covers only specific embodiments of the invention, but the scope of the invention is not limited to them; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope of the invention shall be covered by the scope of the invention. The protection scope of the invention shall therefore be defined by the appended claims.

Claims (7)

1. A method for recognizing a text image, comprising:
inputting an image to be recognized into an image compression righting network, which rotates the image to be recognized so that the text in it lies horizontally, wherein the image compression righting network is obtained by training with a machine learning method and has an image rotation function;
recognizing text in the image output by the image compression righting network;
wherein, while rotating the image to be recognized, the image compression righting network adds marker points at the edges of the text image, the marker points being used to distinguish characters from blank areas in the text image;
recognizing text in the image output by the image compression righting network comprises: recognizing text in the image output by the image compression righting network according to the marker points;
the image compression righting network consists of the convolutional layers and pooling layers of a convolutional neural network;
and the image compression righting network is obtained by:
acquiring training text images;
rotating the training images into the upright position to serve as training targets, and labeling the original images to serve as the training set;
cutting the text of each training-target sample line by line and word by word, and adding cutting points at the word intervals;
inputting the training samples and training targets into a convolutional autoencoder for training, and deleting the fully connected layers in the decoder of the trained convolutional autoencoder to obtain the image compression righting network with automatic correction and compression capability.
2. The method of claim 1, wherein the image compression righting network compresses the image to be recognized while rotating it.
3. The method of claim 2, wherein:
the compressed and rotated image to be recognized is cut line by line and word by word according to the marker points;
and the cut image segments are input into a text recognition neural network for text recognition, the text recognition neural network being obtained by training with a machine learning method and having a text recognition function.
4. The method of claim 3, wherein the text recognition neural network is obtained by:
establishing a character library;
building a multi-class convolutional neural network;
selecting characters from the character library, splicing them into a complete image, and inputting that image into the image compression righting network for compression;
training the convolutional neural network with the character library compressed by the image compression righting network;
and obtaining the text recognition neural network.
5. The method of claim 4, wherein the text recognition neural network is composed of the convolutional layers, pooling layers, fully connected layers, and corresponding network weights of the convolutional neural network.
6. The method according to any one of claims 1 to 5, wherein the text recognition process adopts distributed processing, with several groups of text recognition neural networks working simultaneously; the distributed recognition results are integrated in order to obtain the final text recognition result.
7. A system for recognizing a text image, comprising:
a text image acquisition module, which acquires the user's image to be recognized;
the image compression righting network, which rotates and compresses the acquired image;
a text cutting module, which cuts the rotated and compressed image line by line and word by word; and a text recognition module, which recognizes the cut image segments and outputs the corresponding characters.
CN201810965342.5A 2018-08-23 2018-08-23 Text image identification method and system Active CN109241974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810965342.5A CN109241974B (en) 2018-08-23 2018-08-23 Text image identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810965342.5A CN109241974B (en) 2018-08-23 2018-08-23 Text image identification method and system

Publications (2)

Publication Number Publication Date
CN109241974A CN109241974A (en) 2019-01-18
CN109241974B true CN109241974B (en) 2020-12-01

Family

ID=65069329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810965342.5A Active CN109241974B (en) 2018-08-23 2018-08-23 Text image identification method and system

Country Status (1)

Country Link
CN (1) CN109241974B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695385B (en) * 2019-03-15 2023-09-26 杭州海康威视数字技术股份有限公司 Text recognition method, device and equipment
CN110674811B (en) * 2019-09-04 2022-04-29 广东浪潮大数据研究有限公司 Image recognition method and device
CN111242024A (en) * 2020-01-11 2020-06-05 北京中科辅龙科技股份有限公司 Method and system for recognizing legends and characters in drawings based on machine learning
CN111444908B (en) * 2020-03-25 2024-02-02 腾讯科技(深圳)有限公司 Image recognition method, device, terminal and storage medium
US11216960B1 (en) 2020-07-01 2022-01-04 Alipay Labs (singapore) Pte. Ltd. Image processing method and system
CN117496531B (en) * 2023-11-02 2024-05-24 四川轻化工大学 Construction method of convolution self-encoder capable of reducing Chinese character recognition resource overhead

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN105469047A (en) * 2015-11-23 2016-04-06 上海交通大学 Chinese detection method based on unsupervised learning and deep learning network and system thereof
CN107247950A (en) * 2017-06-06 2017-10-13 电子科技大学 A kind of ID Card Image text recognition method based on machine learning
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156766B (en) * 2015-03-25 2020-02-18 阿里巴巴集团控股有限公司 Method and device for generating text line classifier

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN105469047A (en) * 2015-11-23 2016-04-06 上海交通大学 Chinese detection method based on unsupervised learning and deep learning network and system thereof
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device
CN107247950A (en) * 2017-06-06 2017-10-13 电子科技大学 A kind of ID Card Image text recognition method based on machine learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Robust Scene Text Recognition with Automatic Rectification; Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai; 2016 IEEE Conference on Computer Vision and Pattern Recognition; 2016-12-12; whole document *
基于深度学习的场景文字检测与识别 (Scene text detection and recognition based on deep learning); Xiang Bai, Mingkun Yang, Baoguang Shi, Minghui Liao; 中国科学:信息科学 (Scientia Sinica Informationis); 2018-05-11; Vol. 48, No. 5; Chapter 3, Sections 3.2.1-3.2.2 *

Also Published As

Publication number Publication date
CN109241974A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109241974B (en) Text image identification method and system
WO2018010657A1 (en) Structured text detection method and system, and computing device
CN111325271B (en) Image classification method and device
CN109919077B (en) Gesture recognition method, device, medium and computing equipment
CN111597884A (en) Facial action unit identification method and device, electronic equipment and storage medium
CN108734653B (en) Image style conversion method and device
US11599727B2 (en) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN109408058B (en) Front-end auxiliary development method and device based on machine learning
CN110689658A (en) Taxi bill identification method and system based on deep learning
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN113657404B (en) Image processing method of Dongba pictograph
JP7389824B2 (en) Object identification method and device, electronic equipment and storage medium
CN111832449A (en) Engineering drawing display method and related device
CN110610180A (en) Method, device and equipment for generating recognition set of wrongly-recognized words and storage medium
CN112380566A (en) Method, apparatus, electronic device, and medium for desensitizing document image
CN112966685A (en) Attack network training method and device for scene text recognition and related equipment
CN115937546A (en) Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium
US11106908B2 (en) Techniques to determine document recognition errors
CN111325190A (en) Expression recognition method and device, computer equipment and readable storage medium
CN111833413B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2021051562A1 (en) Facial feature point positioning method and apparatus, computing device, and storage medium
CN113486171B (en) Image processing method and device and electronic equipment
WO2023130613A1 (en) Facial recognition model construction method, facial recognition method, and related device
CN114708420A (en) Visual positioning method and device based on local variance and posterior probability classifier
CN113610856A (en) Method and device for training image segmentation model and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant