CN105654127A - End-to-end-based picture character sequence continuous recognition method - Google Patents


Info

Publication number
CN105654127A
CN105654127A (application CN201511018552.6A)
Authority
CN
China
Prior art keywords
neural network
recurrent neural
data
neural networks
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511018552.6A
Other languages
Chinese (zh)
Inventor
刘世林
何宏靖
陈炳章
吴雨浓
姚佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN201511018552.6A priority Critical patent/CN105654127A/en
Publication of CN105654127A publication Critical patent/CN105654127A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of picture character recognition and relates to an end-to-end-based picture character sequence continuous recognition method. The method adopts CNN (convolutional neural network) and RNN (recurrent neural network) techniques: feature extraction is performed on a whole picture containing multiple characters by the CNN, and the same features are fed into the RNN for repeated recursive use, realizing continuous prediction of the multiple characters. The method eliminates the need to segment the picture before OCR (optical character recognition), simplifies the early-stage processing of picture character recognition, and significantly improves character recognition efficiency; and since the RNN recursively uses the output data of the previous round, the recognition accuracy of character and word sequences is improved and processing efficiency is further increased.

Description

End-to-end-based picture character sequence continuous recognition method
Technical field
The present invention relates to the field of picture character recognition, and in particular to an end-to-end-based picture character sequence continuous recognition method.
Background technology
With the development of society, a large demand has arisen for digitizing paper media such as ancient books, documents, bills, and business cards. Digitization here is not limited to "photographing" with a scanner or camera; more importantly, these paper documents must be converted into readable, editable documents for storage. Realizing this process requires picture character recognition on the scanned pictures. Traditional picture character recognition is optical character recognition (OCR), which recognizes the paper document on the basis of a scanned electronic image. Affected by scanning quality, the quality of the paper document itself (printing quality, font clarity, font standardization, and so on), and differences in content and layout (the arrangement of text; plain text versus table text and bills), the actual effect of OCR has never been satisfactory. Moreover, the recognition-accuracy requirements differ across paper documents: bill recognition, for example, demands very high accuracy, because a single misrecognized digit may lead to fatal consequences, and traditional OCR cannot meet such high-precision requirements.
A conventional OCR method includes processing steps such as picture segmentation, feature extraction, and single-character recognition, where picture segmentation involves a large amount of image preprocessing, such as slant correction, background denoising, and single-character extraction. These steps are not only tedious and time-consuming but may also cause the picture to lose much usable information. When the picture to be recognized contains a character string of multiple characters, a traditional OCR method must cut the original string into several small pictures each containing a single character and recognize them separately. The main problem of this approach is that single-character segmentation is difficult, particularly for Chinese characters with left-right radicals mixed with letters, digits, and symbols, or when there is background noise or the characters are distorted or stuck together; and once segmentation goes wrong, it is difficult to obtain an accurate recognition result. Facing the huge recognition demand, a fast and efficient picture character recognition method is urgently needed.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide an end-to-end-based picture character sequence continuous recognition method. The invention applies convolutional neural network (CNN) and recurrent neural network (RNN) techniques: feature extraction is performed on the whole picture containing multiple characters by the CNN, and the same features are then fed into the RNN and reused recursively, so as to predict the multiple characters continuously. The optical character sequence recognition realized by the method overcomes the drawback that picture segmentation must be performed before OCR, greatly improving the recognition efficiency of picture characters.
In order to achieve the foregoing purpose, the present invention provides the following technical scheme:
An end-to-end-based picture character sequence continuous recognition method, comprising the following implementation steps:
(1) building convolutional neural network and recurrent neural network models, wherein the input signal of the recurrent neural network at each moment includes: the sample feature data extracted by the convolutional neural network, the output data of the recurrent neural network at the previous moment, and the vector data converted from the word recognized by the recurrent neural network at the previous moment;
(2) using a training sample set to train the convolutional neural network and recurrent neural network models;
(3) inputting the picture character sequence to be recognized into the trained convolutional neural network and recurrent neural network: the convolutional neural network extracts the feature data of the picture to be recognized and inputs it into the recurrent neural network, and through the moment-by-moment iteration of the recurrent neural network, the complete recognition result of the picture character sequence to be recognized is output.
Specifically, the forward-algorithm formulas of the recurrent neural network used in the method are as follows:
$$a_h^t = \sum_{i=1}^{I} w_{ih} x_i^t + \sum_{h'=1}^{H} w_{h'h} b_{h'}^{t-1}$$
$$b_h^t = \theta(a_h^t)$$
$$a_k^t = \sum_{h=1}^{H} w_{hk} b_h^t$$
$$y_k^t = \frac{\exp(a_k^t)}{\sum_{k'=1}^{K} \exp(a_{k'}^t)}$$
where $I$ is the dimension of the input vector, $H$ is the number of hidden-layer neurons, $K$ is the number of output-layer neurons, and $x$ is the feature data extracted by the convolutional neural network; $a_h^t$ is the input of hidden-layer neuron $h$ of the recurrent neural network at the current moment and $b_h^t$ is its output; $w_{ih}$ and $w_{h'h}$ are the corresponding weight parameters; $a_k^t$ is the input of output-layer neuron $k$ at the current moment; $w_{hk}$ is the weight corresponding to each output-layer neuron; $y_k^t$ is the output of output-layer neuron $k$ at the current moment, a probability value representing the ratio of the corresponding neuron's output to the sum of the outputs of all output-layer neurons.
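The forward recurrence above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the dimensions, random weights, and the choice of tanh as the activation $\theta$ are assumptions.

```python
import numpy as np

# Illustrative sketch of the forward-algorithm formulas; all sizes and
# weight values are arbitrary demonstration choices.
rng = np.random.default_rng(0)
I, H, K = 4, 8, 10                     # input dim, hidden neurons, output neurons
W_ih = rng.normal(size=(H, I)) * 0.1   # w_ih, shared across time steps
W_hh = rng.normal(size=(H, H)) * 0.1   # w_h'h, shared across time steps
W_hk = rng.normal(size=(K, H)) * 0.1   # w_hk

def rnn_step(x_t, b_prev):
    """One time step: hidden input a_h^t, hidden output b_h^t, softmax y_k^t."""
    a_h = W_ih @ x_t + W_hh @ b_prev   # a_h^t = sum_i w_ih x_i^t + sum_h' w_h'h b_h'^{t-1}
    b_h = np.tanh(a_h)                 # b_h^t = theta(a_h^t), tanh assumed
    a_k = W_hk @ b_h                   # a_k^t = sum_h w_hk b_h^t
    e = np.exp(a_k - a_k.max())        # numerically stable softmax
    return b_h, e / e.sum()            # y_k^t sums to 1 over the output layer

x = rng.normal(size=I)                 # stand-in for the CNN feature vector
b0 = np.zeros(H)                       # b^0 = 0 at the first moment
b1, y1 = rnn_step(x, b0)
```

Because the softmax normalizes over all $K$ output neurons, each $y_k^t$ is directly the probability ratio described in the text.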
Further, in the method, the parameters $w_{ih}$ and $w_{h'h}$ used during forward signal transmission are shared across time steps, which avoids linear growth of model complexity and the over-fitting it could cause.
Further, the above forward algorithm propagates the operational data layer by layer through the convolutional neural network and the recurrent neural network, and the recognition (prediction) data is obtained at the output layer; when the prediction deviates from the annotation of the training sample, each weight in the neural network is adjusted by the classic error back-propagation algorithm.
Further, during neural network training, the training result is checked against a development set so that the training direction can be adjusted in time and over-fitting prevented; during model training, only the trained model with the highest recognition accuracy on the development set is retained.
Further, the neural network training process of this end-to-end-based picture character sequence continuous recognition method comprises the following implementation steps:
(2-1) inputting manually annotated training samples into the convolutional neural network;
(2-2) performing feature extraction on the input training samples with the convolutional neural network;
(2-3) inputting the feature data extracted by the convolutional neural network into the recurrent neural network at the first moment as the first data;
(2-4) outputting the first prediction data through the calculation of the recurrent neural network at the first moment, and obtaining from the first prediction data the word recognition result of this moment, defined as the first recognition result;
(2-5) converting the first recognition result into the corresponding vector data;
(2-6) taking the first data, the first prediction data, and the vectorized first recognition result as the input data of the recurrent neural network at the second moment; the calculation of the recurrent neural network outputs the second prediction data, from which the corresponding second recognition result is obtained;
(2-7) taking the first data and the second prediction data as the input data of the recurrent neural network at the third moment;
and so on recursively, until the set recursion count is reached, at which point recognition ends; recording in sequence the word (or character) predicted by the RNN at each moment finally yields the complete character-string content.
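The moment-by-moment recursion in steps (2-3) to (2-7) can be sketched as a decoding loop. Everything here is hypothetical scaffolding (the vocabulary, the dimensions, the word-vector table `E`, and the concatenation of the three input signals are illustrative assumptions, not details from the patent):

```python
import numpy as np

# Sketch of the recursion: each moment receives the CNN feature (first data),
# the previous moment's prediction data, and the vector of the previously
# recognized word; the recursion runs for the set count (25 here).
rng = np.random.default_rng(1)
vocab = ["<EOS>", "w1", "w2", "w3", "w4"]          # hypothetical vocabulary
F, H, V, D = 6, 8, len(vocab), 3                   # feature, hidden, vocab, word-vec dims
W_xh = rng.normal(size=(H, F + V + D)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
W_hk = rng.normal(size=(V, H)) * 0.1
E = rng.normal(size=(V, D))                        # vector table for recognized words

def step(feat, prev_y, prev_word, b_prev):
    x = np.concatenate([feat, prev_y, E[prev_word]])  # the three input signals
    b = np.tanh(W_xh @ x + W_hh @ b_prev)
    a = W_hk @ b
    e = np.exp(a - a.max())
    return b, e / e.sum()

feat = rng.normal(size=F)              # CNN feature, extracted once, reused every moment
b, y, word = np.zeros(H), np.zeros(V), 0
recognized = []
for _ in range(25):                    # set recursion count = max sentence length
    b, y = step(feat, y, word, b)
    word = int(np.argmax(y))           # recognition result of this moment
    recognized.append(vocab[word])
```

Recording `recognized` in order corresponds to assembling the complete character-string content from the per-moment predictions.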
Further, model training includes normalizing the training-sample pictures and manually annotating them, where normalization includes setting the maximum possible number of characters (or words) in a picture sentence, for example setting the sentence length to 25.
Further, during normalization, to avoid distorting the data, pictures are scaled in equal proportion, and any region missing relative to the target size is padded with the background color.
Further, the normalized pictures are manually annotated. The natural-language annotation may first be word-segmented, for example splitting a sentence such as "this thing is very good" into the words "this / thing / is / very / good"; if the annotated sentence contains fewer than 25 words, a special token, <EOS>, is used to pad it to length 25. Then 75% of the data is randomly selected as the training set and 25% as the development set.
Compared with the prior art, the beneficial effects of the present invention are as follows. The present invention provides an end-to-end-based picture character sequence continuous recognition method that uses a convolutional neural network to perform overall feature extraction on the character-sequence picture to be recognized and inputs the extracted feature data, as the first data, into the recurrent neural network at each moment. The picture character sequence recognition realized by the method extracts the overall picture feature with the convolutional neural network and recognizes the whole character sequence without single-character segmentation or noise filtering; compared with traditional OCR methods, the present invention avoids the irreversible recognition errors that inaccurate character segmentation may cause, greatly simplifies the early-stage processing of picture character recognition, and significantly improves the efficiency of character recognition.
In addition, the method realizes continuous recognition of the characters in a sequence with a recurrent neural network. When the recurrent neural network recognizes characters, its input signal at each moment also contains the output data of the previous moment, so at each moment the network relies both on the overall picture feature extracted by the convolutional neural network and on the output data of the previous moment. Compared with OCR methods, recognition accuracy is higher, the back-end processing of the recognized text is simplified, recognition efficiency is higher, and the recognition results are more accurate and reliable.
In short, the method simplifies the processing of picture character sequence recognition and significantly improves recognition efficiency and accuracy, letting developers focus on model tuning and data accumulation and thereby improving development efficiency. The method has high practical value and broad application prospects in the field of picture character recognition.
Brief description of the drawings:
Fig. 1 is a schematic diagram of the implementation process of the method.
Fig. 2 is a schematic diagram of the convolutional neural network structure.
Fig. 3 is a schematic diagram of the signal flow in the character-sequence recognition process of the method.
Detailed description of the invention
The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be interpreted as limiting the scope of the subject matter of the present invention to the following examples; all techniques realized based on the content of the present invention belong to the scope of the present invention.
The present invention provides an end-to-end-based picture character sequence continuous recognition method. The invention applies convolutional neural network (CNN) and recurrent neural network (RNN) techniques: feature extraction is performed on the whole picture containing multiple characters by the CNN, and the same features are then fed into the RNN and reused recursively to predict the multiple characters continuously. The optical character sequence recognition realized by the method overcomes the drawback that picture segmentation must be performed before OCR, greatly improves the recognition efficiency of picture characters, lets developers focus on model tuning and data accumulation, and improves development efficiency; moreover, because the RNN recursively uses the previous round's output data during model training and application, recognition accuracy is higher.
In order to achieve the foregoing purpose, the present invention provides the following technical scheme: an end-to-end-based picture character sequence continuous recognition method comprising the following implementation steps, as shown in Fig. 1:
(1) building the convolutional neural network and recurrent neural network models, wherein the input signal of the recurrent neural network at each moment includes: the sample feature data extracted by the convolutional neural network, the output data of the recurrent neural network at the previous moment, and the vector data converted from the word recognized by the recurrent neural network at the previous moment. As shown in Fig. 2, the convolutional neural network is mainly used for automatic learning of picture features. Each feature map (the vertical rectangles in the figure) is generated by its own convolution kernel (the small rectangular box in Fig. 2, shared within its feature map), which performs preliminary feature extraction; the subsampling layer samples the features extracted by the convolutional layer, mainly to remove their redundancy. In brief, the convolutional neural network extracts different features of the picture through convolutional layers and samples them through subsampling layers to remove redundancy (one convolutional neural network may contain multiple convolutional layers, subsampling layers, and fully connected layers); finally, the fully connected layer concatenates the different feature maps into the final full-picture feature. The method uses one convolutional neural network to perform one-pass feature extraction on the whole picture, entirely avoiding the irreversible recognition errors that picture segmentation may cause.
(2) using a training sample set to train the convolutional neural network and recurrent neural network models.
(3) inputting the picture character sequence to be recognized into the trained convolutional neural network and recurrent neural network: the convolutional neural network extracts the feature data of the picture to be recognized and inputs it into the recurrent neural network, and through the moment-by-moment iteration of the recurrent neural network, the complete recognition result of the picture character sequence to be recognized is output.
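The CNN pipeline described in step (1) — feature maps from shared kernels, a subsampling layer to remove redundancy, then concatenation into one full-picture feature — can be sketched as below. Picture size, kernel count, ReLU, and mean pooling are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

# Minimal sketch: convolution -> subsampling -> concatenated full-picture feature.
def conv2d(img, kernel):
    """Valid 2-D convolution of one picture with a single shared kernel."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap):
    """2x2 mean pooling, playing the role of the subsampling layer."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    return fmap[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

rng = np.random.default_rng(2)
picture = rng.random((12, 40))                        # whole text-line picture (dummy)
kernels = [rng.normal(size=(3, 3)) for _ in range(4)] # one kernel per feature map
maps = [subsample(np.maximum(conv2d(picture, k), 0)) for k in kernels]
feature = np.concatenate([m.ravel() for m in maps])   # final full-picture feature
```

The whole picture is processed in one pass; no single-character segmentation is performed at any point.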
Specifically, the forward-algorithm formulas of the recurrent neural network used in the method are as follows:
$$a_h^t = \sum_{i=1}^{I} w_{ih} x_i^t + \sum_{h'=1}^{H} w_{h'h} b_{h'}^{t-1}$$
$$b_h^t = \theta(a_h^t)$$
$$a_k^t = \sum_{h=1}^{H} w_{hk} b_h^t$$
$$y_k^t = \frac{\exp(a_k^t)}{\sum_{k'=1}^{K} \exp(a_{k'}^t)}$$
where $I$ is the dimension of the input vector, $H$ is the number of hidden-layer neurons, $K$ is the number of output-layer neurons, and $x$ is the feature data extracted by the convolutional neural network; $a_h^t$ is the input of hidden-layer neuron $h$ of the recurrent neural network at the current moment, $b_h^t$ is its output (with $b^0 = 0$), and $\theta(\cdot)$ is the activation function mapping $a_h^t$ to $b_h^t$; $w_{ih}$ and $w_{h'h}$ are the corresponding weight parameters. During one forward pass the parameters $w_{ih}$ and $w_{h'h}$ are shared across time steps: "shared across time steps" means that during forward signal transmission the values of $w_{ih}$ and $w_{h'h}$ are the same at every moment (it does not mean $w_{ih} = w_{h'h}$); the RNN at different moments uses identical $w_{ih}$ and $w_{h'h}$ values. This reduces the complexity of the model parameters and avoids the linear growth of model complexity that could cause over-fitting. $a_k^t$ is the input of output-layer neuron $k$ at the current moment; $w_{hk}$ is the weight corresponding to each output-layer neuron; $y_k^t$ is the output of output-layer neuron $k$ at the current moment, a probability value representing the ratio of the corresponding neuron's output to the sum of the outputs of all output-layer neurons. In general, the class corresponding to the output neuron with the largest $y$ value is selected as the recognition result of the recurrent neural network at this moment.
It can be seen from the above formulas that the input data of the hidden-layer neurons in the recurrent neural network used by the method includes the training-sample features extracted by the CNN and the output data of the previous moment's hidden layer. Therefore, when predicting the character (word) of the current moment, the recurrent neural network relies both on the features of the image and on the features output at the previous moment (a language model); the signal transmission process is shown in Fig. 3, and this improves recognition efficiency and accuracy.
Further, the above forward algorithm propagates the operational data layer by layer through the convolutional neural network and the recurrent neural network, and the recognition (prediction) data is obtained at the output layer. When the prediction deviates from the annotation of the training sample, each weight in the neural network is adjusted by the classic error back-propagation algorithm: the error is propagated backwards layer by layer and apportioned to all the neurons of each layer, yielding each layer's error signal, which is then used to revise each neuron's weight. Propagating the operational data forward with the forward algorithm and gradually revising the neuron weights with the backward algorithm constitutes the training process of the neural network; this process is repeated until the prediction accuracy reaches the set threshold, at which point training stops and the neural network model is considered trained.
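The weight-correction rule can be illustrated for the output layer alone. Assuming the softmax output above and a cross-entropy deviation against the annotated label (an assumption; the patent only says "classic error back-propagation"), the output-layer error signal is $\delta_k = y_k - t_k$. Sizes, learning rate, and iteration count are arbitrary:

```python
import numpy as np

# Sketch of gradient-based weight revision at the output layer only.
rng = np.random.default_rng(3)
H, K = 8, 5
W_hk = rng.normal(size=(K, H)) * 0.1

def forward(b_h):
    a_k = W_hk @ b_h                   # a_k^t = sum_h w_hk b_h^t
    e = np.exp(a_k - a_k.max())
    return e / e.sum()                 # y_k^t

b_h = rng.normal(size=H)               # hidden-layer output at one moment (dummy)
target = 2                             # annotated word index for this moment
losses = []
for _ in range(200):                   # repeat until the deviation is small enough
    y = forward(b_h)
    losses.append(-np.log(y[target]))  # cross-entropy deviation from the annotation
    delta = y.copy()
    delta[target] -= 1.0               # error signal delta_k = y_k - t_k
    W_hk -= 0.1 * np.outer(delta, b_h) # revise each output-layer weight
```

In a full implementation the same error signal would be propagated further back through the hidden layer and the CNN; this fragment only shows the last step of that chain.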
Further, during neural network training, the training result is checked against a development set so that the training direction can be adjusted in time and over-fitting prevented; during model training, only the trained model with the highest recognition accuracy on the development set is retained.
Further, the neural network training process of this end-to-end-based picture character sequence continuous recognition method comprises the following implementation steps:
(2-1) inputting manually annotated training samples into the convolutional neural network;
(2-2) performing feature extraction on the input training samples with the convolutional neural network;
(2-3) inputting the feature data extracted by the convolutional neural network into the recurrent neural network at the first moment as the first data;
(2-4) outputting the first prediction data through the calculation of the recurrent neural network at the first moment, and obtaining from the first prediction data the word recognition result of this moment, defined as the first recognition result;
(2-5) converting the first recognition result into the corresponding vector data;
(2-6) taking the first data, the first prediction data, and the vectorized first recognition result as the input data of the recurrent neural network at the second moment; the calculation of the recurrent neural network outputs the second prediction data, from which the corresponding second recognition result is obtained;
(2-7) taking the first data and the second prediction data as the input data of the recurrent neural network at the third moment.
In this way the recursion proceeds moment by moment: the feature data extracted by the CNN (the first data), the output data (prediction data) of the RNN at the previous moment, and the vector corresponding to the word (recognition result) identified by the RNN at the previous moment serve as the input data of the RNN at the current moment, and the RNN's prediction outputs one character (or word); this continues until the set recursion count is reached, at which point recognition ends. Recording in sequence the character (or word) predicted by the RNN at each moment finally yields the complete character-string content.
Further, model training includes normalizing the training-sample pictures and manually annotating them. Normalized samples have homogeneous basic parameters, which reduces irrelevant data complexity during model training and simplifies the training process. Normalization includes setting the maximum possible number of characters (or words) in a picture sentence, for example setting the sentence length to 25. The length of the character sequence to be recognized corresponds to the maximum recursion count of the recurrent neural network: the maximum character count set when preparing the training samples can correspond to the preset maximum recursion count, increasing the stability and predictability of the model.
Further, during normalization, to avoid distorting the data, pictures are scaled in equal proportion, and any region missing relative to the target size is padded with the background color.
Further, the normalized pictures are manually annotated. The natural-language annotation may first be word-segmented, for example splitting a sentence such as "this thing is very good" into the words "this / thing / is / very / good". If the annotated sentence contains fewer characters (or words) than the set maximum (fewer than 25), a special token is used for padding (for example, "<EOS>" pads sample annotations shorter than 25 characters (or words) up to a length of 25).
Further, after the above normalization and manual annotation, 75% of the data is randomly selected as the training sample set and 25% as the development sample set. During training, only the model with the highest recognition accuracy on the development set is saved; the development samples and the training samples share the same format, which helps improve the training efficiency of the neural network.
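The sample-preparation conventions above can be sketched as follows; the dummy annotations and the helper name `pad_annotation` are illustrative, while the <EOS> token, the fixed length 25, and the 75/25 split come from the text:

```python
import random

# Sketch: pad segmented annotations to the fixed maximum length with <EOS>,
# then randomly split 75% / 25% into training and development sets.
MAX_LEN = 25

def pad_annotation(words):
    """Pad a segmented annotation to MAX_LEN with the special token <EOS>."""
    assert len(words) <= MAX_LEN
    return words + ["<EOS>"] * (MAX_LEN - len(words))

annotations = [["word"] * (i % 20 + 1) for i in range(100)]  # dummy annotations
padded = [pad_annotation(a) for a in annotations]
random.seed(0)
random.shuffle(padded)                 # random selection of the two sets
cut = int(len(padded) * 0.75)
train_set, dev_set = padded[:cut], padded[cut:]
```

Fixing every annotation at 25 tokens keeps the sample format homogeneous and matches the maximum recursion count of the RNN.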

Claims (7)

1. An end-to-end-based picture character sequence continuous recognition method, characterized by comprising the following implementation steps:
(1) building convolutional neural network and recurrent neural network models, wherein the input signal of the recurrent neural network at each moment includes: the sample feature data extracted by the convolutional neural network and the output data of the recurrent neural network at the previous moment;
(2) using a training sample set to train the convolutional neural network and recurrent neural network models;
(3) inputting the picture character sequence to be recognized into the trained convolutional neural network and recurrent neural network, and outputting the complete recognition result of the picture character sequence to be recognized.
2. the method for claim 1, it is characterised in that: the recurrent neural networks model used in this method adopts following forward algorithm formula:
$$a_h^t = \sum_{i=1}^{I} w_{ih} x_i^t + \sum_{h'=1}^{H} w_{h'h} b_{h'}^{t-1}$$
$$b_h^t = \theta(a_h^t)$$
$$a_k^t = \sum_{h=1}^{H} w_{hk} b_h^t$$
$$y_k^t = \frac{\exp(a_k^t)}{\sum_{k'=1}^{K} \exp(a_{k'}^t)}$$
where $I$ is the dimension of the input vector, $H$ is the number of hidden-layer neurons, $K$ is the number of output-layer neurons, and $x$ is the feature data extracted by the convolutional neural network; $a_h^t$ is the input of hidden-layer neuron $h$ of the recurrent neural network at the current moment and $b_h^t$ is its output; $a_k^t$ is the input of output-layer neuron $k$ at the current moment; $y_k^t$ is the output of output-layer neuron $k$ at the current moment, a probability value representing the ratio of the corresponding neuron's output to the sum of the outputs of all output-layer neurons.
3. The method of claim 2, characterized in that the parameters $w_{ih}$ and $w_{h'h}$ are shared across time steps: the values of $w_{ih}$ and $w_{h'h}$ used at each moment during the training of one sample are identical.
4. The method of claim 3, characterized in that during neural network training the training result is checked against a development set, and only the convolutional neural network and recurrent neural network models with the highest recognition accuracy on the development set are retained.
5. The method of any one of claims 1 to 3, characterized by comprising the following implementation steps:
(2-1) inputting manually annotated training samples into the convolutional neural network;
(2-2) performing feature extraction on the input training samples with the convolutional neural network;
(2-3) inputting the feature data extracted by the convolutional neural network into the recurrent neural network at the first moment as the first data;
(2-4) outputting the first prediction data through the calculation of the recurrent neural network at the first moment, and obtaining from the first prediction data the word recognition result of this moment, defined as the first recognition result;
(2-5) converting the first recognition result into the corresponding vector data;
(2-6) taking the first data, the first prediction data, and the vectorized first recognition result as the input data of the recurrent neural network at the second moment; the calculation of the recurrent neural network outputs the second prediction data, from which the corresponding second recognition result is obtained;
(2-7) taking the first data and the second prediction data as the input data of the recurrent neural network at the third moment;
and so on recursively, until the set recursion count is reached, at which point the calculation ends.
6. The method of claim 5, characterized in that when preparing training samples and development samples, the sample pictures are normalized, the normalization including: setting the maximum number of characters or words allowed in the picture to be recognized.
7. The method of claim 6, characterized in that when the normalized samples are manually annotated, if the number of characters in a sample picture is less than the set maximum, a set marker token is used to pad the characters in the sample picture.
CN201511018552.6A 2015-12-30 2015-12-30 End-to-end-based picture character sequence continuous recognition method Pending CN105654127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511018552.6A CN105654127A (en) 2015-12-30 2015-12-30 End-to-end-based picture character sequence continuous recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511018552.6A CN105654127A (en) 2015-12-30 2015-12-30 End-to-end-based picture character sequence continuous recognition method

Publications (1)

Publication Number Publication Date
CN105654127A true CN105654127A (en) 2016-06-08

Family

ID=56477503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511018552.6A Pending CN105654127A (en) 2015-12-30 2015-12-30 End-to-end-based picture character sequence continuous recognition method

Country Status (1)

Country Link
CN (1) CN105654127A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070025137A * 2005-08-31 2007-03-08 MagnaChip Semiconductor, Ltd. Multi-frequency oscillator
CN104572892A (en) * 2014-12-24 2015-04-29 中国科学院自动化研究所 Text classification method based on cyclic convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CONG Shuang: "Intelligent Control Systems and Their Applications", 31 August 2013 *
XUAN Senyan et al.: "Traffic sign recognition based on joint convolutional and recurrent neural networks", "Transducer and Microsystem Technologies" *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446782A (en) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image identification method and device
CN107844794A (en) * 2016-09-21 2018-03-27 北京旷视科技有限公司 Image-recognizing method and device
CN106570497A (en) * 2016-10-08 2017-04-19 中国科学院深圳先进技术研究院 Text detection method and device for scene image
CN106530569A (en) * 2016-10-17 2017-03-22 北京小米移动软件有限公司 Method and device for fire monitoring
CN107784303A (en) * 2016-12-15 2018-03-09 平安科技(深圳)有限公司 Licence plate recognition method and device
WO2019071660A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Bill information identification method, electronic device, and readable storage medium
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN110147785A (en) * 2018-03-29 2019-08-20 腾讯科技(深圳)有限公司 Image-recognizing method, relevant apparatus and equipment
CN110147785B (en) * 2018-03-29 2023-01-10 腾讯科技(深圳)有限公司 Image recognition method, related device and equipment
CN108932533A (en) * 2018-07-12 2018-12-04 北京木瓜移动科技股份有限公司 Identification model construction method and device, character identifying method and device
CN109214386A (en) * 2018-09-14 2019-01-15 北京京东金融科技控股有限公司 Method and apparatus for generating image recognition model
CN110414519A (en) * 2019-06-27 2019-11-05 众安信息技术服务有限公司 A kind of recognition methods of picture character and its identification device
CN112906696A (en) * 2021-05-06 2021-06-04 北京惠朗时代科技有限公司 English image region identification method and device

Similar Documents

Publication Publication Date Title
CN105654127A (en) End-to-end-based picture character sequence continuous recognition method
CN105654135A (en) Image character sequence recognition system based on recurrent neural network
CN105654129A (en) Optical character sequence recognition method
CN105678292A (en) Complex optical text sequence identification system based on convolution and recurrent neural network
CN105678293A (en) Complex image and text sequence identification method based on CNN-RNN
CN105678300A (en) Complex image and text sequence identification method
Mathew et al. Docvqa: A dataset for vqa on document images
CN105654130A (en) Recurrent neural network-based complex image character sequence recognition system
CN109190722B (en) Font style migration transformation method based on Manchu character picture
CN104966097B (en) A kind of complex script recognition methods based on deep learning
CN108230339A (en) A kind of gastric cancer pathological section based on pseudo label iteration mark marks complementing method
CN111126386A (en) Sequence field adaptation method based on counterstudy in scene text recognition
CN108427953A (en) A kind of character recognition method and device
CN108764317A (en) A kind of residual error convolutional neural networks image classification method based on multichannel characteristic weighing
CN105045900A (en) Data extraction method and apparatus
CN106339753A (en) Method for effectively enhancing robustness of convolutional neural network
CN104008401A (en) Method and device for image character recognition
CN106980817A (en) A kind of terrified video frequency identifying method based on Caffe frameworks
CN105975497A (en) Automatic microblog topic recommendation method and device
CN112861864A (en) Topic entry method, topic entry device, electronic device and computer-readable storage medium
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN108961270B (en) Bridge crack image segmentation model based on semantic segmentation
CN109508712A (en) A kind of Chinese written language recognition methods based on image
CN111881880A (en) Bill text recognition method based on novel network
CN117011638A (en) End-to-end image mask pre-training method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160608
