CN109583328A - Deep convolutional neural network character recognition method with embedded partial connections - Google Patents

Deep convolutional neural network character recognition method with embedded partial connections

Info

Publication number
CN109583328A
CN109583328A
Authority
CN
China
Prior art keywords
convolutional neural
neural networks
text
depth convolutional
partially connected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811345088.5A
Other languages
Chinese (zh)
Other versions
CN109583328B (en)
Inventor
徐琴珍
曹钊铭
张旭帆
杨哲
潘迪
周宇
杨绿溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201811345088.5A priority Critical patent/CN109583328B/en
Publication of CN109583328A publication Critical patent/CN109583328A/en
Application granted granted Critical
Publication of CN109583328B publication Critical patent/CN109583328B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 — Document-oriented image-based pattern recognition
    • G06V30/41 — Analysis of document content
    • G06V30/413 — Classification of content, e.g. text, photographs or tables
    • G06V30/10 — Character recognition
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/24 — Aligning, centring, orientation detection or correction of the image
    • G06V10/243 — Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a deep convolutional neural network character recognition method with embedded partial connections, belonging to the field of information and communication engineering. The character recognition method provided by the invention is based on a deep convolutional neural network; unlike existing methods, the invention improves the structure of the network itself. To address the memory-overflow problem that occurs when training the original network, partially connected layers are embedded into the original deep convolutional neural network, which increases the width of the network, reduces the number of parameters, and lowers the dimensionality of the parameter space. This saves the hardware overhead required for network training and testing, reduces the memory occupied while training the network, and shortens the training time.

Description

Deep convolutional neural network character recognition method with embedded partial connections
Technical field
The present invention relates to the technical field of information and communication engineering, and in particular to a deep convolutional neural network character recognition method with embedded partial connections.
Background technique
Information technology has developed rapidly with the spread of computers. Images, as an important carrier in information technology, occupy a very important position in information dissemination and are woven into our daily lives. Artificial intelligence is now widely applied in everyday life, and using the efficient automatic processing capability of computers to assist all trades and professions has become a trend that lightens people's workload and raises working efficiency. However, a computer cannot, like a human, directly acquire the information contained in an image through vision, so converting image information into text information that a computer can understand is of great significance. At present, image recognition has been applied in a variety of scenarios, such as real-time vehicle monitoring on roads to obtain license plate numbers, and recognition and data entry of certificate photos for various government bodies. Such assisted applications greatly improve working efficiency, reduce error rates, and free up human resources to a large extent, thereby reducing labor costs.
Character recognition aims to detect and recognize text information in a picture: the text is scanned, the image file is analyzed and processed, and the text and layout information are obtained. It is mainly used in document recognition and certificate recognition. Document recognition can digitize printed documents to extract effective information quickly and accurately, while certificate recognition digitizes scans or copies of certificates to improve working efficiency and reduce working intensity. For recognizing characters in images, computers undoubtedly have the advantage over manual work when handling simple and repetitive tasks. The process simulates human visual intelligence to judge and identify a target image: an image containing text objects is acquired, and the text information in it is digitized and recognized.
Traditional methods mostly rely on manually extracted features to train detection and recognition models. Because of the semantic gap between low-level features and high-level semantics, a single hand-crafted feature is no longer adequate when coping with many font variations and complex background interference. Among existing character recognition methods, Google's open-source character recognition engine Tesseract is fruitful in recognizing English letters and Arabic numerals, but its recognition of Chinese characters is far less satisfactory, and subsequent improvement work is cumbersome and complicated. With the rapid development of deep learning, character recognition techniques based on deep learning have emerged in large numbers. An artificial neural network serves as both feature extractor and classifier: it takes a character image as input and directly outputs the recognition result, which realizes end-to-end recognition. However, performing character recognition with deep learning also has drawbacks: training a neural network requires a large amount of training data, and as the network structure deepens, the required hardware overhead must also be taken into account.
Although character recognition with deep learning methods is more efficient, training a neural network requires a large amount of training data. Nowadays, to make recognition results more accurate and precise, deep convolutional neural networks are usually used to extract features and classify them. However, as networks become deeper and deeper, the gradient vanishing problem becomes more obvious, and the required hardware overhead is also considerable.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art by providing a deep convolutional neural network character recognition method with embedded partial connections that improves the original deep convolutional neural network. The dense-connection idea is retained to guarantee the mutual transfer of gradient information, while partially connected layers are embedded in the modules to save the hardware overhead required for network training and testing.
The present invention adopts the following technical scheme to solve the above technical problem:
A deep convolutional neural network character recognition method with embedded partial connections proposed according to the present invention comprises the following steps:
Step 1: feed the training set into the partially connected deep convolutional neural network detection model used for character recognition, and train the model; after training is completed, save the trained model parameters of the embedded partial connections for character recognition;
The structure of the deep convolutional neural network detection model is as follows: the pictures in the input training set are first processed in parallel through two channels. One channel first extracts features through a convolutional layer to form feature maps, and then compresses the extracted feature maps through a pooling layer; the other channel reduces the dimensionality of the input pictures with a convolution kernel. The results of the two channels are joined by a concatenation layer. Features are then extracted by a first dense connection module, a first transition module, a second dense connection module, a second transition module, and a third dense connection module, connected in sequence. After feature extraction is finished, the output of the third dense connection module is post-processed through a normalization layer, an activation layer, a permute layer, and a flatten layer, and the result of the output layer is finally obtained;
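The two-channel front end described above can be sketched at shape level. The following NumPy sketch is only an illustration, not the patent's implementation: the "convolutions" are stand-in per-pixel linear maps, and the filter counts are assumed for the example.

```python
import numpy as np

def max_pool2x2(x):
    # x: (H, W, C) with even H and W; 2x2 max pooling compresses the feature maps
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def channel_a(img, n_filters=8):
    # Channel 1: "convolution" stand-in (per-pixel linear map to n_filters
    # feature maps) followed by the pooling-layer compression step.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((img.shape[-1], n_filters))
    return max_pool2x2(img @ w)

def channel_b(img, n_filters=8):
    # Channel 2: 1x1-convolution-style channel mixing with stride-2
    # subsampling, i.e. dimensionality reduction of the input picture.
    rng = np.random.default_rng(1)
    w = rng.standard_normal((img.shape[-1], n_filters))
    return (img @ w)[::2, ::2, :]

img = np.ones((32, 280, 3))  # input sized like the patent's training pictures
# The concatenation layer joins the two channels along the channel axis.
merged = np.concatenate([channel_a(img), channel_b(img)], axis=-1)
print(merged.shape)  # (16, 140, 16)
```

Both channels halve the spatial resolution, so their outputs line up for concatenation; the merged tensor then feeds the first dense connection module.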
Step 2: first use an existing convolutional neural network model for text detection in natural scenes to locate and extract the text information in the test pictures, then feed the extracted text information into the trained partially connected deep convolutional neural network detection model for testing.
As a further optimization of the deep convolutional neural network character recognition method with embedded partial connections of the present invention, the detailed process of training the partially connected deep convolutional neural network detection model in step 1 is as follows:
Step 1.1: uniformly crop the pictures in the data set to one size, each picture containing a fixed number of characters; at the same time, produce a dictionary of commonly used words. Each word corresponds to a unique serial number, and the sequence composed of the serial numbers of the words in a picture is that picture's label. Attach a label to every picture, and divide the pictures proportionally into a training set and a test set;
Step 1.2: load the pre-trained model weights of the partially connected deep convolutional neural network detection model, and start training the model;
Step 1.3: during training, set the early-stopping threshold to n, where n is an integer greater than 2 and less than 7. If the best test set error over the last n iterations is still changing, return to step 1.2 and continue; if the best test set error of n iterations no longer improves the precision, stop training and store the trained model parameters of the embedded partial connections; also record the memory and time used for training.
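The early-stopping rule of step 1.3 can be sketched as follows. This is a minimal illustration under the assumption that "no longer improves" means the best error has been stale for n consecutive iterations; the error sequence is stand-in data, not a result from the patent.

```python
def train_with_early_stopping(test_errors, patience=5):
    """Stop when the best test-set error has not improved for `patience`
    consecutive iterations; return the iteration at which training stops.
    `patience` plays the role of the patent's n (2 < n < 7)."""
    assert 2 < patience < 7
    best, stale = float("inf"), 0
    for it, err in enumerate(test_errors):
        if err < best:
            best, stale = err, 0   # error still improving: keep training
        else:
            stale += 1             # no improvement this iteration
        if stale >= patience:
            return it              # stop early, store the model here
    return len(test_errors) - 1

errors = [0.9, 0.5, 0.4, 0.41, 0.42, 0.40, 0.43, 0.44, 0.45]
print(train_with_early_stopping(errors))  # 7
```

With patience 5, training runs until five iterations in a row fail to beat the best error (0.4), so it stops at iteration 7.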
As a further optimization of the deep convolutional neural network character recognition method with embedded partial connections of the present invention, the specific steps of the testing in step 2 are as follows:
Step 2.1: use an existing convolutional neural network model for text detection in natural scenes to locate and extract the text information in the test pictures. Basic features are first extracted with a deep convolutional neural network to form feature maps; a k × k window is slid over the obtained feature maps, each window yielding a feature vector of length k × k × C, where C is the number of channels and k is the window size of the sliding-window operation. These feature vectors are used to determine candidate target regions. v = {v_c, v_h} are the predicted text box coordinates and v* = {v*_c, v*_h} are the actual text box coordinates, where v_c is the predicted vertical center position of the text box and v_h is the predicted text box height:

v_c = (c_y − c_y^a) / h^a,  v_h = log(h / h^a)
v*_c = (c*_y − c_y^a) / h^a,  v*_h = log(h* / h^a)

where c_y^a denotes the vertical center coordinate of the anchor text box and h^a its height, c_y denotes the vertical center coordinate of the predicted text box and h its height, and c*_y denotes the vertical center coordinate of the actual text box and h* its height;
After the predicted text box coordinates v = {v_c, v_h} are obtained, redundant prediction boxes are filtered out with a standard non-maximum suppression algorithm, and the predicted text segments are merged into text lines; this completes the localization of the text information in the test pictures;
Step 2.2: feed the located and extracted text information into the partially connected deep convolutional neural network detection model trained in step 1, and carry out the character recognition test.
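The "standard non-maximum suppression algorithm" mentioned above can be sketched as follows; box coordinates and the 0.5 IoU threshold are illustrative assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Standard non-maximum suppression over axis-aligned boxes
    (x1, y1, x2, y2): keep the highest-scoring box, drop boxes that
    overlap it too much, repeat. Returns indices of surviving boxes."""
    order = np.argsort(scores)[::-1]          # process best-scoring first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # suppress heavy overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 0.81 and is suppressed; the third is disjoint and survives.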
As a further optimization of the deep convolutional neural network character recognition method with embedded partial connections of the present invention, the first dense connection module is used to extract initial features; the first transition module reduces the number of channels output by the first dense connection module, reducing the dimensionality of its output; the second dense connection module further extracts features; the second transition module reduces the number of channels output by the second dense connection module, reducing the dimensionality of its output; and the third dense connection module extracts features once more, increasing their expressive power.
As a further optimization of the deep convolutional neural network character recognition method with embedded partial connections of the present invention, k is 3.
Compared with the prior art, the above technical scheme of the present invention has the following technical effects:
The present invention improves the original deep convolutional neural network. It retains the dense-connection idea to guarantee the mutual transfer of gradient information, and embeds partially connected layers in the modules, which increases the width of the network, reduces the number of parameters while improving detection accuracy, and lowers the dimensionality of the parameter space. This saves the hardware overhead required for network training and testing, reduces the memory occupied while training the network, and shortens the training time.
Description of the drawings
Fig. 1 is the flow chart of the deep convolutional neural network character recognition method with embedded partial connections.
Fig. 2 shows the accuracy curves on the training set and test set.
Fig. 3 shows the error curves on the training set and test set.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below in conjunction with the accompanying drawings and specific embodiments.
Fig. 1 is the flow chart of the deep convolutional neural network character recognition method with embedded partial connections; the specifics are as follows:
Step 1: improve the original deep convolutional neural network.
The idea of partial connection is combined into the original deep convolutional neural network structure, which reduces the number of parameters and the dimensionality of the parameter space, so as to save the hardware overhead required for network training and testing. The improved neural network structure is shown in Table 1; Table 2 explains the terms in Table 1.
Table 1: improved deep neural network structure
Table 2: explanation of the terms in Table 1
The structure of the deep convolutional neural network detection model is as follows: the pictures in the input training set are first processed in parallel through two channels. One channel first extracts features through a convolutional layer to form feature maps, and then compresses the extracted feature maps through a pooling layer; the other channel reduces the dimensionality of the input pictures with a small convolution kernel. The results of the two channels are joined by a concatenation layer. Features are then extracted by the first dense connection module, first transition module, second dense connection module, second transition module, and third dense connection module, connected in sequence, and are post-processed through a normalization layer, an activation layer, a permute layer, and a flatten layer, finally yielding the result of the output layer.
The first dense connection module is mainly used to extract initial features. The first transition module is placed after the first dense connection module to reduce the number of channels it outputs, lowering the dimensionality of its output so as to cut parameters and shrink the parameter space. The second dense connection module is mainly used to extract features further; the second transition module is placed after it and serves the same purpose of reducing channels and dimensionality. The third dense connection module is mainly used to extract features once more and increase their expressive power.
In more detail: the pictures in the input training set are first processed in parallel through the two channels described above, and the results of the two channels are joined by a concatenation layer; features are then extracted by the first dense connection module, first transition module, second dense connection module, second transition module, and third dense connection module, connected in sequence. The first dense connection module contains n (n between 6 and 48) pairs of convolution kernels A and B of different sizes: the output of the concatenation layer is first normalized, the normalized result is passed through an activation layer, convolution kernels A and B are then applied separately to the output of the activation layer, and the results of the A and B convolutions are spliced together by a concatenation layer. This processing is repeated n times, and the result is sent into the first transition module. In the first transition module, a convolution kernel of size C is first applied to the output of the first dense connection module, and a pooling operation is then performed on the convolved result. After pooling, the result is fed into the second dense connection module. The second dense connection module likewise contains n pairs of convolution kernels A and B of different sizes (the values of n, A, and B are the same as in the first dense connection module): the output of the first transition module is first normalized, the normalized result is passed through an activation layer, convolution kernels A and B are applied separately to the output of the activation layer, and the results are spliced by a concatenation layer. This processing is repeated n times and the result is sent into the second transition module. In the second transition module, a convolution kernel of size C (the same C as in the first transition module) is first applied to the output of the second dense connection module, and a pooling operation is performed on the convolved result. After pooling, the result is fed into the third dense connection module, which also contains n pairs of convolution kernels A and B of different sizes (again the same n, A, and B): the output of the concatenation layer is first normalized, the normalized result is passed through an activation layer, convolution kernels A and B are applied separately, and the results are spliced by a concatenation layer. This processing is repeated n times, at which point feature extraction is finished. Afterwards, the output of the third dense connection module is post-processed through a normalization layer, an activation layer, a permute layer, and a flatten layer, and the result of the output layer is finally obtained.
In a deep learning network, the gradient vanishing problem becomes more obvious as the network deepens. Dense block (dense connection module) layers efficiently solve the gradient vanishing problem. Each dense block layer consists of a normalization layer, an activation layer, a convolutional layer, and a concatenation layer. Under the premise of guaranteeing maximum information transfer between the layers of the network, all layers are connected to each other; that is, the input of each layer comes from the outputs of all preceding layers. Let x_l denote the output of layer l, H_l a nonlinear transformation, and [x_0, x_1, ..., x_{l-1}] the concatenated outputs of layers 0 through l − 1. Then:
x_l = H_l([x_0, x_1, ..., x_{l-1}])
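The dense connectivity pattern x_l = H_l([x_0, ..., x_{l-1}]) can be illustrated with a shape-level sketch. The layer count and growth rate are assumed for the example, and H_l is a stand-in linear map rather than the real BN + activation + convolution composite.

```python
import numpy as np

def dense_block(x0, n_layers=4, growth=2, seed=0):
    """Dense connectivity: layer l receives the concatenation of all
    earlier outputs, x_l = H_l([x_0, ..., x_{l-1}]), and contributes
    `growth` new channels."""
    rng = np.random.default_rng(seed)
    feats = [x0]
    for _ in range(n_layers):
        cat = np.concatenate(feats, axis=-1)          # [x_0, ..., x_{l-1}]
        w = rng.standard_normal((cat.shape[-1], growth))
        feats.append(cat @ w)                          # x_l = H_l(...)
    return np.concatenate(feats, axis=-1)              # block output

out = dense_block(np.ones((8, 8, 3)))
print(out.shape)  # (8, 8, 11)
```

The output has 3 + 4 × 2 = 11 channels: the channel count grows linearly with depth, which is why the transition modules that follow each block must reduce channels again.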
Each dense block layer contains eight 1 × 1 and 3 × 3 convolution operations. The 1 × 1 convolution is mainly used for dimensionality reduction; it can greatly reduce the number of parameters and the amount of computation.
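The parameter saving from a 1 × 1 bottleneck can be checked with simple arithmetic. The channel counts below (256 in, reduced to 64) are illustrative assumptions, not values from the patent.

```python
def conv_params(c_in, c_out, k):
    # Weight count of a k x k convolution (bias omitted):
    # each of the c_out filters has k * k * c_in weights.
    return k * k * c_in * c_out

# Direct 3x3 convolution on 256 channels vs. a 1x1 reduction to 64
# channels followed by the 3x3 convolution.
direct     = conv_params(256, 256, 3)
bottleneck = conv_params(256, 64, 1) + conv_params(64, 256, 3)
print(direct, bottleneck)  # 589824 163840
```

The bottleneck version uses roughly 3.6× fewer weights for the same output channel count, which is the saving the 1 × 1 convolutions in the dense blocks and transition layers exploit.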
Each transition layer (transition module) consists of a normalization layer, an activation layer, a convolutional layer, a dropout layer, and a pooling layer. Because the dense block layers output many channels, a transition layer is placed between two dense block layers to reduce dimensionality with a 1 × 1 convolution kernel. Since every layer in a dense block is connected to every other, a dropout layer is added to the transition layer to randomly prune branches and avoid overfitting.
Step 2: produce the data set.
The pictures in the training set are uniformly cropped to 280 × 32, each picture containing ten characters. A total of 5990 characters, including Chinese characters, English letters, digits, and punctuation, are chosen from a corpus as labels. Every picture is labeled and used as a training sample. The 3,644,007 samples produced are divided into a training set and a test set at a ratio of 99:1, giving 3,607,567 training samples and 36,440 test samples.
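The split sizes above follow directly from the 99:1 ratio, and the labeling scheme maps each of the 5990 characters to a unique index. A small check, with a deliberately tiny stand-in dictionary (the real one has 5990 entries):

```python
total = 3644007
n_train = round(total * 99 / 100)   # 99% of all samples
n_test = total - n_train            # the remaining 1%
print(n_train, n_test)              # 3607567 36440

# Each character maps to a unique serial number; a picture's label is the
# index sequence of its characters. The dictionary here is a toy stand-in.
vocab = {ch: i for i, ch in enumerate("的一是在不了")}
label = [vocab[ch] for ch in "一是不"]
print(label)                        # [1, 2, 4]
```

This reproduces the patent's stated split of 3,607,567 training and 36,440 test samples.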
Step 3: load the pre-trained model, initialize the parameters, and start training the network. The GPU (graphics processing unit) used for training is a Tesla P100; the parameter initialization values are shown in Table 3.
Table 3: parameter initialization values
Parameter  img_h           img_w          batch_size  maxlabellength    epoch   nclass
Meaning    Picture height  Picture width  Batch size  Max label length  Epochs  Number of classes
Value      32              280            64          10                30      5990
During training, the early-stopping threshold is set to 5. If the best validation set error over the last 5 iterations (epochs) is still changing, training returns to step 1.3; if 5 iterations no longer improve the precision, training stops and the model is stored. The memory and time used for training are recorded. The accuracy curves of the trained model on the training set and test set are shown in Fig. 2, and the error curves on the training set and test set in Fig. 3. Before the improvement, the network could not train normally under a 32 GB hardware memory condition and ran out of memory, whereas the improved network completes training under the same hardware condition and shortens the training time to a certain extent.
Step 4: use an existing convolutional neural network model for text detection in natural scenes to locate and extract the text information in the test pictures. That network is designed for text detection: it first extracts basic features with VGG16 (a deep convolutional neural network for large-scale image recognition) to form feature maps, slides a 3 × 3 window over the obtained feature maps, and from each window obtains a feature vector of length 3 × 3 × C (C is the number of channels) used to predict the offset between the window and the text box. The details are as follows: v = {v_c, v_h} are the relative predicted coordinates and v* = {v*_c, v*_h} are the actual text box coordinates; c_y^a denotes the vertical center coordinate of the anchor text box and h^a its height, both of which can be calculated directly from the input picture; c_y denotes the vertical center coordinate of the predicted text box and h its height; c*_y denotes the vertical center coordinate of the actual text box and h* its height.
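The vertical offset prediction described above appears to follow the CTPN-style anchor regression v_c = (c_y − c_y^a)/h^a, v_h = log(h/h^a) — an assumption based on the symbol definitions in the text. A round-trip sketch, with made-up anchor and box values:

```python
import math

def encode(cy, h, cy_a, h_a):
    """Regression targets relative to an anchor (assumed CTPN-style form):
    v_c = (c_y - c_y^a) / h^a,  v_h = log(h / h^a)."""
    return (cy - cy_a) / h_a, math.log(h / h_a)

def decode(vc, vh, cy_a, h_a):
    """Invert the encoding to recover the predicted box from the anchor."""
    return vc * h_a + cy_a, h_a * math.exp(vh)

cy, h = 50.0, 24.0        # a ground-truth box (illustrative values)
cy_a, h_a = 48.0, 16.0    # anchor centre and height (illustrative values)
vc, vh = encode(cy, h, cy_a, h_a)
cy2, h2 = decode(vc, vh, cy_a, h_a)
print(round(cy2, 6), round(h2, 6))  # 50.0 24.0
```

Normalizing by the anchor height keeps the targets scale-invariant, and the log parameterization keeps the decoded height positive.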
After obtaining the densely predicted text boxes, redundant prediction boxes are filtered out with a standard non-maximum suppression algorithm. Finally, the predicted text segments are merged into text lines, ready for output.
Step 5: feed the extracted text information into the trained model and carry out the character recognition test. The business licenses and text fragments used in the test are fed into the deep convolutional neural network and tested on a GeForce GTX 1050 Ti GPU (graphics processing unit).
The above description is merely a specific embodiment, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that those familiar with the art can easily think of within the technical scope disclosed by the present invention shall be included within the protection scope of the present invention.

Claims (5)

1. A deep convolutional neural network character recognition method with embedded partial connections, characterized by comprising the following steps:
Step 1: feed the training set into the partially connected deep convolutional neural network detection model used for character recognition, and train the model; after training is completed, save the trained model parameters of the embedded partial connections for character recognition;
The structure of the deep convolutional neural network detection model is as follows: the pictures in the input training set are first processed in parallel through two channels. One channel first extracts features through a convolutional layer to form feature maps, and then compresses the extracted feature maps through a pooling layer; the other channel reduces the dimensionality of the input pictures with a convolution kernel. The results of the two channels are joined by a concatenation layer. Features are then extracted by a first dense connection module, a first transition module, a second dense connection module, a second transition module, and a third dense connection module, connected in sequence. After feature extraction is finished, the output of the third dense connection module is post-processed through a normalization layer, an activation layer, a permute layer, and a flatten layer, and the result of the output layer is finally obtained;
Step 2: first use an existing convolutional neural network model for text detection in natural scenes to locate and extract the text information in the test pictures, then feed the extracted text information into the trained partially connected deep convolutional neural network detection model for testing.
2. a kind of depth convolutional neural networks character identifying method for being embedded in partially connected according to claim 1, special Sign is, is trained that detailed process is as follows in step 1 to the depth convolutional neural networks detection model of insertion partially connected:
Step 1.1: Crop the pictures in the data set to a uniform size, so that every picture contains a fixed number of characters. At the same time, build a dictionary of commonly used characters; each character corresponds to a unique serial number, and the sequence of serial numbers of the characters in a picture is the label of that picture. Label every picture in this way and split the pictures proportionally into a training set and a test set;
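The dictionary-and-label scheme of step 1.1 can be sketched as follows; the dictionary contents and the two-character crop are illustrative assumptions, not values from the patent:

```python
# Build a dictionary mapping each commonly used character to a unique serial
# number, then label each fixed-length picture by its characters' serial numbers.
def build_dictionary(chars):
    # each character gets a unique serial number (its index in the dictionary)
    return {ch: i for i, ch in enumerate(chars)}

def label_for_picture(text, dictionary):
    # the label of a picture is the sequence of serial numbers of its characters
    return [dictionary[ch] for ch in text]

dictionary = build_dictionary("abcde0123")   # toy "commonly used character" set
label = label_for_picture("b3", dictionary)  # a crop containing exactly 2 characters
```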
Step 1.2: Load the pre-trained model weights of the sparse-connection embedded deep convolutional neural network detection model and start training the model;
Step 1.3: During training, set the early-stopping threshold to n, where n is an integer greater than 2 and less than 7. If the best test-set errors of n consecutive iterations are not all identical, return to step 1.2; if the best test-set error shows no further improvement in accuracy over n iterations, stop training and store the model parameters of the trained sparse-connection embedded network, recording the memory and time consumed by training.
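The early-stopping rule of step 1.3 can be sketched as follows, with a made-up per-iteration error sequence; the patent only fixes the threshold n (an integer greater than 2 and less than 7):

```python
def train_with_early_stopping(test_errors, n):
    """Stop once the best test-set error has not improved for n iterations.

    `test_errors` stands in for the per-iteration test-set error of the real
    training loop; returns the iteration at which training stops.
    """
    assert 2 < n < 7  # the patent requires n to be an integer in (2, 7)
    best = float("inf")
    since_improvement = 0
    for it, err in enumerate(test_errors):
        if err < best:
            best = err
            since_improvement = 0
        else:
            since_improvement += 1
        if since_improvement >= n:
            return it  # stop training and save the model parameters here
    return len(test_errors) - 1

stop_at = train_with_early_stopping([0.9, 0.5, 0.4, 0.41, 0.4, 0.42], n=3)
```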
3. The sparse-connection embedded deep convolutional neural network character recognition method according to claim 1, characterized in that the testing in step 2 proceeds as follows:
Step 2.1: Use an existing convolutional neural network model for text detection in natural scenes to locate and extract the text information in the test picture. First extract basic features with a deep convolutional neural network to form feature maps, then slide a k × k window over the obtained feature maps; each window yields a feature vector of length k × k × C, where C is the number of channels and k is the window size of the sliding-window operation. These feature vectors are used to determine candidate target regions. Let v = {v_c, v_h} be the predicted text box coordinates and v* = {v*_c, v*_h} the actual (ground-truth) text box coordinates, where v_c encodes the vertical centre of the predicted text box and v_h its height, and v*_c and v*_h encode the vertical centre and height of the actual text box:
v_c = (c_y − c_y^a) / h^a,  v_h = log(h / h^a)
v*_c = (c*_y − c_y^a) / h^a,  v*_h = log(h* / h^a)
where c_y^a denotes the vertical centre coordinate of the anchor box, h^a the height of the anchor box, c_y the vertical centre coordinate of the predicted text box, h the height of the predicted text box, c*_y the vertical centre coordinate of the actual text box, and h* the height of the actual text box;
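The vertical-coordinate parameterisation of step 2.1 (a CTPN-style encoding v_c = (c_y − c_y^a)/h^a, v_h = log(h/h^a), with c_y^a and h^a the anchor centre and height) can be checked with a small round-trip sketch; the numeric values below are illustrative:

```python
import math

def encode(c_y, h, c_y_a, h_a):
    # v_c = (c_y - c_y^a) / h^a ;  v_h = log(h / h^a)
    return (c_y - c_y_a) / h_a, math.log(h / h_a)

def decode(v_c, v_h, c_y_a, h_a):
    # invert the encoding to recover the box centre and height
    return v_c * h_a + c_y_a, math.exp(v_h) * h_a

v = encode(c_y=120.0, h=32.0, c_y_a=112.0, h_a=16.0)
c_y, h = decode(*v, c_y_a=112.0, h_a=16.0)
```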
After obtaining the predicted text box coordinates v = {v_c, v_h}, filter out redundant prediction boxes with a standard non-maximum suppression algorithm, and merge the predicted text segments into text lines; this completes the localisation of the text information in the test picture;
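Step 2.1 invokes a standard non-maximum suppression pass; a minimal sketch of such a pass, with illustrative boxes and scores, is:

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # keep the highest-scoring box, drop remaining boxes overlapping it too much
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 0, 30, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```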
Step 2.2: Feed the located and extracted text information into the sparse-connection embedded deep convolutional neural network detection model trained in step 1 and carry out the character recognition test.
4. The sparse-connection embedded deep convolutional neural network character recognition method according to claim 1, characterized in that the first dense block extracts initial features; the first transition module reduces the number of channels output by the first dense block, performing dimensionality reduction on its output; the second dense block further extracts features; the second transition module reduces the number of channels output by the second dense block, performing dimensionality reduction on its output; and the third dense block extracts features once more, increasing the expressive power of the features.
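The channel bookkeeping implied by claim 4 — the densely connected blocks grow the channel count while the transition modules compress it — can be sketched as follows; the growth rate, layer counts and compression factor are DenseNet-style assumptions, not values from the patent:

```python
def dense_block_channels(c_in, layers, growth):
    # each densely connected layer appends `growth` channels to the concatenation
    return c_in + layers * growth

def transition_channels(c_in, compression=0.5):
    # a transition module reduces the channel count (dimensionality reduction)
    return int(c_in * compression)

c = 64                               # channels after the two-channel front end
c = dense_block_channels(c, 6, 32)   # first dense block
c = transition_channels(c)           # first transition module
c = dense_block_channels(c, 12, 32)  # second dense block
c = transition_channels(c)           # second transition module
c = dense_block_channels(c, 24, 32)  # third dense block
```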
5. The sparse-connection embedded deep convolutional neural network character recognition method according to claim 3, characterized in that k is 3.
CN201811345088.5A 2018-11-13 2018-11-13 Sparse connection embedded deep convolutional neural network character recognition method Active CN109583328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811345088.5A CN109583328B (en) 2018-11-13 2018-11-13 Sparse connection embedded deep convolutional neural network character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811345088.5A CN109583328B (en) 2018-11-13 2018-11-13 Sparse connection embedded deep convolutional neural network character recognition method

Publications (2)

Publication Number Publication Date
CN109583328A true CN109583328A (en) 2019-04-05
CN109583328B CN109583328B (en) 2021-09-03

Family

ID=65922152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811345088.5A Active CN109583328B (en) 2018-11-13 2018-11-13 Sparse connection embedded deep convolutional neural network character recognition method

Country Status (1)

Country Link
CN (1) CN109583328B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3282397A1 (en) * 2016-08-11 2018-02-14 Vivante Corporation Zero coefficient skipping convolution neural network engine
CN107977620A * 2017-11-29 2018-05-01 华中科技大学 Multi-oriented single-shot scene text detection method based on fully convolutional network
CN108229479A * 2017-08-01 2018-06-29 北京市商汤科技开发有限公司 Training method and device for semantic segmentation model, electronic device, and storage medium
CN108734284A * 2017-04-24 2018-11-02 英特尔公司 Real-time context-dependent deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832437A * 2020-06-24 2020-10-27 万翼科技有限公司 Building drawing recognition method, electronic device and related product
CN111832437B * 2020-06-24 2024-03-01 万翼科技有限公司 Building drawing recognition method, electronic device and related products

Also Published As

Publication number Publication date
CN109583328B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
US10817741B2 (en) Word segmentation system, method and device
CN107563385A License plate character recognition method based on deep convolutional generative adversarial network
CN111652332B Deep learning handwritten Chinese character recognition method and system based on binary classification
CN113011357B Deepfake face video localization method based on spatio-temporal fusion
CN106156766A Method and device for generating a text line classifier
CN105469047A Chinese text detection method and system based on unsupervised learning and deep learning networks
CN103544504B Scene character recognition method based on multi-scale graph matching kernel
CN105224951A Vehicle type classification method and classifier
CN108776774A Facial expression recognition method based on a complexity-perception classification algorithm
CN105760891A Chinese character CAPTCHA recognition method
CN108664975A Handwritten Uyghur letter recognition method, system and electronic device
CN111815609B (en) Pathological image classification method and system based on context awareness and multi-model fusion
Chaabouni et al. Static and dynamic features for writer identification based on multi-fractals.
CN116012653A Method and system for hyperspectral image classification with an attention residual unit neural network
CN106650696A Handwritten electrical component recognition method based on singular value decomposition
CN105469099A Pavement crack detection and recognition method based on sparse-representation classification
CN111242114B Character recognition method and device
CN109583328A Sparse-connection embedded deep convolutional neural network character recognition method
Xue Optical character recognition
Gandhi et al. An attempt to recognize handwritten Tamil character using Kohonen SOM
US20240013368A1 (en) Pavement nondestructive detection and identification method based on small samples
Rashid et al. Discriminative learning for script recognition
CN110598650A (en) License plate false alarm filtering method and device, electronic equipment and storage medium
CN114581780A Tunnel surface crack detection method based on an improved U-Net network structure
Álvaro et al. Page segmentation of structured documents using 2d stochastic context-free grammars

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant