CN110991435A - Express waybill key information positioning method and device based on deep learning - Google Patents

Express waybill key information positioning method and device based on deep learning

Info

Publication number
CN110991435A
CN110991435A (application CN201911182294.3A)
Authority
CN
China
Prior art keywords
neural network
key information
region
faster
model
Prior art date
Legal status
Pending
Application number
CN201911182294.3A
Other languages
Chinese (zh)
Inventor
张登银
张震
周超
丁飞
赵莎莎
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority: CN201911182294.3A
Publication: CN110991435A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention discloses an express waybill key information positioning method and device based on deep learning. The method pre-constructs and trains two neural network classification models: the first neural network model identifies the key information region in an express waybill, and the second neural network model identifies the key information within that region. An express waybill image is acquired with a shooting device; a convolutional neural network extracts the convolution feature mapping of the image to be detected, which is input into the first neural network model to position and extract the key information region. The convolutional neural network then extracts the convolution feature mapping of the key information region, which is input into the second neural network model to output the key information. Training two models in this way reduces the interference from the many background factors in express waybill images, giving the system high recognition accuracy.

Description

Express waybill key information positioning method and device based on deep learning
Technical Field
The invention relates to an express waybill key information positioning method and device based on deep learning, and belongs to the field of image processing.
Background
In recent years, the express industry in China has developed rapidly, and express business volume has increased year by year. Apart from large transfer stations that adopt expensive intelligent sorting machines, most transfer stations still sort packages manually. Because manual sorting is slow and has a certain error rate, it causes package backlogs and mis-sorting at transfer stations. To speed up sorting and distribution at transfer stations that do not use automatic package sorting, positioning and identifying the key information on express waybill images is of great significance. However, waybill photographs often suffer from insufficient brightness, blurring, excessive background, and tilted angles. In addition, complicated table lines, irrelevant patterns, and irrelevant text regions on the express waybill image make positioning and identifying the relevant information very challenging. In recent years, scholars have proposed a form positioning and extraction method based on graph representation and matching to position the key information of express waybills, but its accuracy is low on waybills that are dimly lit or partially occluded.
Disclosure of Invention
The invention aims to provide an express waybill key information positioning method based on deep learning, to solve the problem of low positioning accuracy for key information regions in express waybills.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
In one aspect, the invention provides an express waybill key information positioning method based on deep learning, comprising the following steps:
acquiring an express waybill image with a shooting device; extracting the convolution feature mapping of the image to be detected through a convolutional neural network according to the set candidate frames, inputting it into a pre-constructed and trained first neural network model, and positioning and extracting the key information region; extracting the convolution feature mapping of the key information region with the convolutional neural network, inputting it into a pre-constructed and trained second neural network model, and outputting the key information. The first neural network model is used for identifying the key information region in the express waybill, and the second neural network model is used for identifying the key information in the key information region. Further:
the first neural network model and the second neural network model adopt the same structure. Preferably both adopt a Faster R-CNN model, wherein the fast R-CNN model comprises a region suggestion network and a region-based fast convolution neural network;
constructing an express bill picture library with a labeled key information area as a first training set and a first testing set; performing feature extraction on the training set by using a convolutional neural network to obtain a convolutional feature mapping of a first training set;
inputting convolution feature mapping of a first training set, and initializing parameters of a region suggestion network and a region-based fast convolution neural network of a first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
testing the trained key information region recognition model with the first test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
performing feature extraction on the second training set by using a convolutional neural network to obtain convolutional feature mapping of the second training set; inputting convolution feature mapping of a second training set, and initializing parameters of a region suggestion network and a region-based fast convolution neural network of a second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
Further, the region suggestion network includes one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel according to the set candidate frames to obtain anchor points; the generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames; a specific number of regions with the highest foreground confidence are then screened out of the obtained rectangular candidate frames according to preset conditions to obtain the final key region set.
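As an illustration, a minimal PyTorch sketch of such a head is given below; the channel counts and anchor count are assumptions (e.g. a 512-channel VGG-16 feature map and 9 anchors per position), not values fixed by this section:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """One 3x3 conv followed by two parallel 1x1 convs:
    foreground/background scores and box position regression."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # fg/bg confidence
        self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # candidate-frame offsets

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)
```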
Still further, when the number of positive-sample candidate frames does not meet the set threshold, the positive samples are supplemented as follows:
redefine the real frame corresponding to each candidate frame in the negative samples; if the intersection-over-union between the redefined real frame and the negative-sample candidate frame is greater than a set threshold, put the candidate frame into a supplementary positive sample set, and when the number of positive samples does not meet the set threshold, randomly select candidate frames from the supplementary positive sample set to supplement the positive samples. With this supplement method, more small target frames that contain target information are marked as positive samples, so the positive samples participating in training carry more target information and the model learns more of it; this alleviates, to a certain extent, slow convergence and the possible loss of model accuracy, and reduces the probability of misjudgment.
Still further, the intersection-over-union (IOU) between a candidate frame and its redefined real frame is calculated as follows:
the position information of the i-th real frame is expressed as $gt_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2})$, where $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ respectively represent the coordinates of the upper-left and lower-right corners of $gt_i$;
the position information of the j-th candidate frame is expressed as $a_j = (x_{j1}, y_{j1}, x_{j2}, y_{j2})$, where $(x_{j1}, y_{j1})$ and $(x_{j2}, y_{j2})$ respectively represent the coordinates of the upper-left and lower-right corners of $a_j$;
according to $gt_i$ and $a_j$, the redefined real frame $gt'_{ji}$ corresponding to the j-th candidate frame is the overlap region of the two, with position information:
$gt'_{ji} = (\max(x_{i1}, x_{j1}), \max(y_{i1}, y_{j1}), \min(x_{i2}, x_{j2}), \min(y_{i2}, y_{j2}))$;
at this time, the IOU value between the real frame $gt'_{ji}$ and the candidate frame $a_j$ is expressed as:
$IOU(gt'_{ji}, a_j) = \frac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a frame. The invention provides this positive sample supplement method, which improves the robustness of the system.
In the above embodiments, the region suggestion network of the first Faster R-CNN model employs candidate frames with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256); these candidate frame sizes better match the size of the express waybill. The region suggestion network of the second Faster R-CNN model employs candidate frames with aspect ratios of (0.2, 0.5, 1) and scales of (32 × 32, 64 × 64, 128 × 128); these anchor sizes better match the sizes of the key information region and the key information target frames, reduce the computation of non-maximum suppression, help generate candidate frames with a higher overlap rate with the real frames, and increase the recall rate of the model.
In the above technical solution, preferably, the fast convolutional neural network based on the key information region comprises two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of the key information region and the candidate frame position after frame regression.
In another aspect, the invention provides an express waybill key information positioning device based on deep learning, comprising: a data collection module and an express waybill key information positioning module;
the data collection and collection module is used for collecting express bill images by using shooting equipment;
the express waybill key information positioning module is used for extracting convolution characteristic mapping of an image to be detected through a convolution neural network according to a set candidate frame, inputting the convolution characteristic mapping into a first neural network model which is constructed and trained in advance, and positioning and extracting key information areas; and extracting the convolution characteristic mapping of the key information area by utilizing the convolution neural network, inputting the convolution characteristic mapping into a pre-constructed and trained second neural network model, and outputting key information.
The pre-constructed and trained first neural network training model is used for identifying key information areas in the express waybill;
the pre-constructed and trained second neural network training model is used for identifying key information in a key information area;
further, still include: the system comprises a data set generation module and a convolutional neural network module; the first neural network training model and the second neural network training model are identical in structure, and the same Faster R-CNN model is adopted to obtain a first fast R-CNN model and a second fast R-CNN model; the first Faster R-CNN model and the second Faster R-CNN model both comprise a region suggestion network construction and training module and a region-based fast convolution neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for testing the key information region recognition model with the test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
The beneficial technical effects are as follows:
firstly, the key information of the express waybill is regarded as objects of different categories, and the problem of positioning the key information of the express waybill is solved by using a target identification technology; according to the method, two models are trained, the first neural network model is used for identifying the key information area in the express bill image, and the second neural network model is used for identifying the key information in the key information area, so that the accuracy of identifying the key information of the express bill by the neural network is improved.
Secondly, the two neural network models in the method, both Faster R-CNN models, fuse the region suggestion network with the fast convolutional neural network, which makes training and testing of the whole network very convenient and improves target detection precision;
thirdly, according to the size of the key information target frame of the express waybill, the anchor point with a specific size is adopted, and the positioning speed and the accuracy of the key information area in the express waybill and the key information in the key information area are further improved.
Fourthly, the invention provides a positive sample supplement method for the small number of positive samples in the express waybill key information positioning problem. The supplemented positive samples all contain part of a target, so the model learns more target information, false detections and missed detections are reduced to a certain extent, and the robustness of the system is improved.
Drawings
FIG. 1 is a result diagram of positioning key information of an express bill image by directly using a Faster R-CNN method;
FIG. 2 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a positive sample supplement method according to an embodiment of the present invention;
FIG. 4 is a diagram of the result of model A positioning region M according to an embodiment of the present invention;
fig. 5 is a result diagram of positioning key information of an express bill image by a model B according to an embodiment of the present invention;
fig. 6 is a network structure diagram of a model a according to a second embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention solves the express waybill key information positioning problem with target recognition technology. Prior-art methods that directly position the key information of the express waybill suffer from false detection and missed detection of key information, as shown in FIG. 1.
The invention trains two neural network classification models: model A (the first neural network model) identifies region M in the express waybill, i.e. the smallest rectangular region containing the names, telephones, and addresses of the sender and receiver, and model B (the second neural network model) identifies the names, telephones, and addresses of the sender and receiver within region M. The invention provides an express waybill key information positioning method based on deep learning, comprising the following steps:
constructing and training two neural network classification models, wherein the first neural network model is used for identifying a key information area in an express bill; the second neural network model is used for identifying key information in the key information area;
acquiring an express bill image by using shooting equipment, extracting convolution characteristic mapping of the image to be detected through a convolution neural network from the image, inputting the convolution characteristic mapping into a first neural network model, and positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing a convolution neural network, inputting the convolution characteristic mapping into a second neural network model, and outputting key information.
In a specific embodiment, the first neural network model and the second neural network model may have the same structure or different structures.
In a first embodiment (as shown in fig. 2) for implementing the present invention, the first neural network model and the second neural network model use the same fast R-CNN structure, which includes the following steps:
step 1) collecting and marking express bill photos shot manually or mechanically, and segmenting the express bill photos into a training set and a test set;
step 2) inputting the training set in the step 1) into a Convolutional Neural Network (CNN) for feature extraction to obtain Convolutional feature mapping;
step 3), constructing a model A, namely a first Faster R-CNN model:
constructing a region suggestion network (RPN): the RPN comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping is input into the 3 × 3 convolutional layer, which slides over it pixel by pixel and generates anchor points according to the candidate frame sizes at each sliding position; this embodiment selects candidate frames with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256), i.e., each pixel generates 9 anchor points of different scales. These anchor sizes better match the size of region M, reduce the computation of non-maximum suppression, and help generate candidate frames with a higher overlap rate with the real frames. The generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames; a specific number of regions with the highest foreground confidence are then screened out of the obtained rectangular candidate frames according to preset conditions to obtain the final region suggestion set D;
constructing a fast region-based convolutional neural network (Fast R-CNN) model: the Fast R-CNN model consists of two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of each region and the candidate frame position after frame regression; the convolution features are input into the Fast R-CNN model, which outputs the position, category, and confidence of the targets in the image;
step 4) modifying the parameters related to the total number of categories and the output category labels in model A according to the total number of categories in the data set; initializing the convolutional layer parameters shared by the RPN and Fast R-CNN with the weights of a downloaded pre-trained ImageNet classification model, while the layers unique to the two networks are randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.01;
step 5) constructing a cost function for training an RPN network and a cost function for training a Fast R-CNN network in the model A;
and 6) training the model with the back propagation algorithm and stochastic gradient descent, alternately training the RPN and Fast R-CNN networks; when training the RPN, if the number of positive samples is insufficient, a candidate frame is randomly selected from the supplementary positive sample set C to supplement the positive samples.
In a specific embodiment, the method of supplementing the positive samples is as follows (as shown in FIG. 3):
redefine the real frame corresponding to each candidate frame in the negative samples; if the intersection-over-union between the redefined real frame and the negative-sample candidate frame is greater than a set threshold, put the candidate frame into the supplementary positive sample set C, and when the number of positive samples still does not meet the set threshold, randomly select candidate frames from C to supplement the positive samples.
Further, the intersection-over-union between a candidate frame and its redefined real frame is calculated as follows:
the position information of the i-th real frame is expressed as $gt_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2})$, where $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ respectively represent the coordinates of the upper-left and lower-right corners of $gt_i$;
the position information of the j-th candidate frame is expressed as $a_j = (x_{j1}, y_{j1}, x_{j2}, y_{j2})$, where $(x_{j1}, y_{j1})$ and $(x_{j2}, y_{j2})$ respectively represent the coordinates of the upper-left and lower-right corners of $a_j$;
according to $gt_i$ and $a_j$, the redefined real frame $gt'_{ji}$ corresponding to the j-th candidate frame is the overlap region of the two, with position information:
$gt'_{ji} = (\max(x_{i1}, x_{j1}), \max(y_{i1}, y_{j1}), \min(x_{i2}, x_{j2}), \min(y_{i2}, y_{j2}))$;
at this time, the IOU value between the real frame $gt'_{ji}$ and the candidate frame $a_j$ is expressed as:
$IOU(gt'_{ji}, a_j) = \frac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a frame.
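For illustration, a minimal Python sketch of this supplement rule, assuming (x1, y1, x2, y2) boxes and an illustrative 0.5 threshold; since the redefined frame gt' lies inside the candidate frame a, the IOU above reduces to area(gt')/area(a):

```python
def redefined_iou(gt, a):
    """IOU between candidate frame a and the redefined real frame gt'
    (the overlap rectangle of gt and a); boxes are (x1, y1, x2, y2)."""
    # Redefined real frame gt' = overlap region of gt and a
    x1, y1 = max(gt[0], a[0]), max(gt[1], a[1])
    x2, y2 = min(gt[2], a[2]), min(gt[3], a[3])
    if x2 <= x1 or y2 <= y1:
        return 0.0  # no overlap: gt' is empty
    area_gtp = (x2 - x1) * (y2 - y1)         # area(gt'), and gt' lies inside a
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return area_gtp / area_a                 # = area(gt' n a) / area(gt' u a)

def build_supplement_set(neg_boxes, gt_boxes, thresh=0.5):
    """Collect negative candidates whose redefined-frame IOU with some real
    frame exceeds the threshold; these form the supplementary positive set C."""
    return [a for a in neg_boxes
            if any(redefined_iou(gt, a) > thresh for gt in gt_boxes)]
```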
Sequentially adjusting the weight of each layer of neural network according to preset parameters to obtain a trained model A;
and 7) testing model A with the test set from step 1) to recognize region M, saving the results, labeling the images, and splitting them into a training set and a test set, where six target frames are labeled: receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address;
step 8) extracting the characteristics of the training set in the step 7) through VGG-16 to obtain convolution characteristic mapping;
step 9) constructing a model B, constructing a cost function for training an RPN network and a cost function for training a second Fast R-CNN network in the model B, initializing parameters, and alternately training the RPN and the Fast R-CNN to obtain the trained model B for identifying the name of a receiver, the telephone of the receiver, the address of the receiver, the name of a sender, the telephone of the sender and the address of the sender in the area M;
step 10) testing model B with the test set from step 7);
and 11) preprocessing an express waybill image collected actually, inputting the image into the model A, identifying and extracting the area M, inputting the area M into the model B, identifying the name of a receiver, the telephone of the receiver, the address of the receiver, the name of a sender, the telephone of the sender and the address of the sender, and outputting the confidence coefficient and the position information of a target frame.
In step 1, an express waybill photo shot manually or by a machine is collected and marked, wherein a key information area of the express waybill is marked as an area M;
1.1) carrying out grayscale processing on the image using the weighted average method:
V(x, y) = 0.299 × RGB_R + 0.587 × RGB_G + 0.114 × RGB_B
where V(x, y) represents the gray value after converting the color image into a grayscale image, and RGB_R, RGB_G, RGB_B represent the red, green, and blue intensity values respectively.
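A one-function sketch of this conversion (NumPy; assumes an H × W × 3 RGB array):

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted-average grayscale conversion, V = 0.299R + 0.587G + 0.114B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```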
1.2) marking region M in the image with a horizontal rectangular frame to obtain the object frame category information (obj) and position information (x, y, w, h), where (x, y) is the upper-left corner coordinate and w, h are the width and height of the object frame;
1.3) making the marked images into a data set, and segmenting the data set into a training set and a testing set.
Step 2) inputting the training set in the step 1) into CNN for feature extraction to obtain convolution feature mapping; the CNN selects a network for feature extraction in the VGG-16 model, and is used for extracting features of the input image.
Step 3) constructing a model A, wherein the model integrates an improved regional suggestion network RPN and a Fast R-CNN network;
3.1) improved region suggestion network (RPN): comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping extracted by the CNN is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel and, at each sliding position, generates anchor points with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256). The generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames, where a candidate frame position comprises four parameters: the center point coordinates x and y, the width w, and the height h;
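A sketch of this anchor generation (NumPy); the stride is an assumed backbone downsampling factor, and the ratios are taken as width/height with the base area preserved, neither of which is fixed by the text:

```python
import numpy as np

RATIOS = (0.3, 0.5, 0.8)                      # assumed to be width/height ratios
SCALES = ((64, 64), (128, 128), (256, 256))   # base candidate-frame sizes

def anchors_at(cx, cy):
    """The 9 anchors (x1, y1, x2, y2) centered at one sliding position."""
    boxes = []
    for (sw, sh) in SCALES:
        for r in RATIOS:
            w, h = sw * np.sqrt(r), sh / np.sqrt(r)  # reshape, keeping area sw*sh
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

stride = 16  # assumed VGG-16 conv feature stride
all_anchors = [anchors_at(x * stride, y * stride)
               for y in range(38) for x in range(50)]  # e.g. a 600 x 800 input
```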
3.2) sorting the candidate frames output by the RPN in descending order of softmax score and keeping the first 2000; then merging the candidate frames with a non-maximum suppression algorithm and keeping the 300 candidate frames with the highest confidence, to obtain the final RPN region suggestion set D;
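This filtering step can be sketched with torchvision's NMS; the 0.7 IoU merge threshold is an assumed value, since the text fixes only the 2000/300 counts:

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes, scores, pre_nms_top_n=2000, post_nms_top_n=300,
                     iou_thresh=0.7):
    """Keep top-2000 proposals by softmax score, merge with NMS, keep top-300."""
    order = scores.argsort(descending=True)[:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_thresh)[:post_nms_top_n]  # indices kept
    return boxes[keep], scores[keep]
```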
3.3) Fast R-CNN consists of two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of each region and the candidate frame position after frame regression;
the ROI posing layer performs pooling operation on the region suggestion set D and the convolution feature mapping, maps the ROI to a corresponding position of the feature mapping according to input image, divides the mapped region into sections with the same size, and performs maximum pooling operation on each section;
the fully connected layer combines the outputs of the ROI pooling layer and finally feeds the two parallel fully connected layers, which perform region classification and frame regression on the candidate frames and output the position, category, and confidence of the targets in the image;
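The pooling step of 3.3) can be sketched with torchvision's roi_pool; the 7 × 7 output size and the stride-16 spatial scale are assumptions for a VGG-16 backbone, not values given in the text:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 38, 50)          # VGG-16 conv features (stride 16)
rois = torch.tensor([[0., 64., 64., 256., 192.]])  # (batch_idx, x1, y1, x2, y2) in image coords
pooled = roi_pool(feature_map, rois, output_size=(7, 7),
                  spatial_scale=1.0 / 16)          # map image coords onto the feature map
flat = pooled.flatten(start_dim=1)                 # feed into the fully connected layers
```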
step 4) modifying the parameters related to the total number of categories and the output category labels in model A according to the total number of categories in the data set; initializing the convolutional layer parameters shared by the RPN and Fast R-CNN with the weights of a downloaded pre-trained ImageNet classification model, while the layers unique to the two networks are randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.01;
step 5) constructing a cost function of training RPN and a cost function of Fast R-CNN:
the cost function for training the RPN in this embodiment is:
$L(\{c_j\}, \{t_j\}) = \frac{1}{N_{cls}} \sum_j L_{cls}(c_j, c_j^*) + \lambda \frac{1}{N_{reg}} \sum_j c_j^* L_{reg}(t_j, t_j^*)$
where $j$ denotes the index of a candidate frame; $c_j$ represents the predicted class probability of candidate frame $a_j$; $c_j^*$ denotes the class label of $a_j$, with $c_j^* = 1$ when $a_j$ is a positive sample and $c_j^* = 0$ otherwise; $t_j = (x, y, w, h)$ represents the predicted center coordinates, width, and height of $a_j$; $t_j^* = (x^*, y^*, w^*, h^*)$ represents the center coordinates, width, and height of the real frame corresponding to a positive-sample candidate frame. The parameter $\lambda$ is a balance weight, $N_{cls}$ is the total number of anchor points, and $N_{reg}$ is the number of positive samples. $L_{cls}$ performs target/non-target classification of the candidate frames with the cross-entropy loss, expressed as $L_{cls}(c, u) = -\log c_u$. $L_{reg}$ is the regression loss, using the smooth $L_1$ loss, expressed as:
$L_{reg}(t_j, t_j^*) = \sum_{k \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_{j,k} - t^*_{j,k})$
$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
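For illustration, a PyTorch sketch of this two-term cost; the sampling of the 256 anchors per mini-batch described in step 6 is assumed to have happened already, and the default λ = 1.0 is illustrative:

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, labels, bbox_pred, bbox_targets, lam=1.0):
    """L = (1/N_cls) * sum L_cls + lam * (1/N_reg) * sum over positives of L_reg.
    cls_scores: (N, 2), labels: (N,) long with 1 = positive, 0 = negative,
    bbox_pred / bbox_targets: (N, 4)."""
    n_cls = labels.numel()                  # total sampled anchors
    pos = labels == 1
    n_reg = pos.sum().clamp(min=1)          # number of positive samples
    l_cls = F.cross_entropy(cls_scores, labels, reduction="sum") / n_cls
    l_reg = F.smooth_l1_loss(bbox_pred[pos], bbox_targets[pos],
                             reduction="sum") / n_reg
    return l_cls + lam * l_reg
```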
the cost function for training Fast R-CNN in this embodiment is:
$L(c, u, r^u, v) = L_{cls}(c, u) + \lambda [u \geq 1] L_{reg}(r^u, v)$
where $c$ is the class prediction probability, $u$ is the u-th class, $r^u$ is the predicted frame correction value for class $u$, and $v$ is the actual correction value.
Step 6) training the model by adopting a mode of alternately training two networks of RPN and Fast R-CNN, which comprises the following steps:
6.1) The RPN is trained end-to-end by back propagation and stochastic gradient descent. Each mini-batch contains 256 candidate frames extracted by the RPN, with a 1:1 ratio of positive to negative samples, for training. If the number of positive samples is insufficient, rather than randomly selecting negative samples as padding, a candidate frame is randomly selected from the supplementary positive sample set C, so that the RPN learns more target information and proposes higher-quality candidate frames. This stage iterates a certain number of times to minimize the classification error and the positive-sample position deviation.
6.2) taking the candidate frame generated by the RPN as the input of a Fast R-CNN model, and independently training a Fast R-CNN detection network. This phase iterates a certain number of times to minimize the loss function of Fast R-CNN.
6.3) Initialize the shared convolutional layer parameters of the RPN with the shared convolutional layer parameters of the detection network trained in 6.2), then fix the shared convolutional layer parameters and fine-tune only the layers unique to the RPN. This stage iterates a certain number of times to minimize the classification error and the positive-sample position deviation.
6.4) Keep the parameters of the convolutional layers shared by the two networks fixed and fine-tune only the fully connected layers of Fast R-CNN. This stage iterates a certain number of times to obtain the trained model A.
And 7) test model A with the test set from step 1) and label the detection results as a data set. Specifically:
7.1) inputting the test image in the step 1), and extracting a characteristic diagram by using a convolutional neural network;
7.2) recognizing region M of the input image with model A trained in step 6), to obtain the region M target frame with the highest confidence and its position information (x, y, w, h);
7.3) saving a key information area image through the position information of the target frame of the area M;
7.4) labeling the key information in the image saved in the previous step with horizontal rectangular frames, marked as six classes: rev_name, rev_phone, rev_address, name, phone, and address, respectively representing the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and obtaining the position information (x, y, w, h) of each real frame;
7.5) segmenting the labeled image data set into a training set and a testing set.
Step 8) extracting the characteristics of the training set in the step 7) through CNN to obtain convolution characteristic mapping;
step 9) repeating the training steps for the region suggestion network RPN and the Fast R-CNN network from step 3) to step 6): constructing model B, constructing the cost function for training the RPN and the cost function for training the Fast R-CNN in model B, initializing the parameters, and alternately training the RPN and Fast R-CNN to obtain the trained model B, which identifies the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address in region M;
step 10) testing model B with the test set from step 7); the testing steps are the same as in step 7);
step 11) preprocessing an actually collected express waybill image, extracting the feature map of the image to be detected with the CNN, inputting it into model A, and positioning and extracting region M, as shown in FIG. 4; then extracting the feature map of region M with the CNN, inputting it into model B, identifying the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and outputting the confidence and position information of the target frames; the positioning result is shown in FIG. 5.
The method and the system treat the key information of the express waybill as objects of different categories and solve the problem of positioning and extracting the key information of the express waybill. This embodiment fully considers the characteristics of the express waybill image and trains two Faster R-CNN models: model A identifies region M in the express waybill image and model B identifies the key information within region M, giving high accuracy. On this basis, for the express waybill key information positioning problem, the models adopt anchor points of specific sizes, which further improves the positioning speed and accuracy for region M in the express waybill and for the key information within region M. Further, for the small number of positive samples during RPN training, a positive sample supplement method is provided, improving the robustness of the model.
In the second embodiment of the present invention, the first neural network model adopts a You Only Look Once (YOLO) v3 structure and the second neural network model adopts a Faster R-CNN structure. The method comprises the following steps:
step 1) collecting and labeling express waybill photos shot manually or by machine, performing batch normalization after the preprocessing of step 1) of the first embodiment, and splitting the photos into a training set and a test set;
the batch normalization method comprises the following steps:
suppose the bottom left corner and top right corner of an annotated target box are (x1, y1) and (x2, y2), respectively, and the width and height are w and h, respectively. Then, the coordinates of the normalized center point are ((x2+ x1)/2/w, (y2+ y1)/2/h), and the width and height of the normalized target box are (x2-x1)/w and (y2-y1)/h, respectively.
And 2) generating candidate frames: cluster the real target frames labeled in the training set, using the IOU value as the rating index to obtain the initial candidate frames. The specific process is as follows:
and clustering the real target frames of the training set by adopting a K-means algorithm, and when the IOU value of the candidate frame and the IOU value of the real frame are not lower than 0.5, selecting the candidate frame as an initial candidate frame and marking the initial candidate frame as a positive sample. Meanwhile, in order to solve the problem of unbalance between positive and negative samples, the positive sample supplementing method provided by the text is adopted to increase the number of positive samples. Then, the distance dis (a, gt) of the real box a from the initial candidate box gt can be expressed as:
dis(a,gt)=1-IOU(a,gt)
in order to accelerate the convergence speed of the training process, the initial candidate box is used as the initial network parameter of the model A.
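A sketch of the k-means step under this 1 − IOU distance (NumPy; boxes are normalized (w, h) pairs and k = 9 matches the per-cell frame count used below; the initialization and iteration count are illustrative):

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) boxes and (w, h) cluster centers, corners aligned."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100):
    """Cluster real target frames with distance dis(a, gt) = 1 - IOU(a, gt)."""
    centers = boxes[np.random.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centers), axis=1)
        centers = np.array([boxes[assign == c].mean(axis=0)
                            if np.any(assign == c) else centers[c]
                            for c in range(k)])  # keep empty clusters in place
    return centers
```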
And 3) constructing model A, which adopts the YOLOv3 framework and consists of two parts, Darknet-53 and a detection network, as shown in FIG. 6, used for feature extraction and multi-scale prediction respectively. Darknet-53 is a combination of successive 3 × 3 and 1 × 1 convolutional layers with shortcut connections added. The upper part of FIG. 6 shows the configuration parameters of the Darknet-53 backbone and the output sizes of a 256 × 256 input image after each layer of the backbone. The numbers on the left indicate how many times the residual operation on the right is repeated. Finally, feature maps at five scales, 128 × 128, 64 × 64, 32 × 32, 16 × 16, and 8 × 8, are obtained.
Multi-scale prediction is performed on the 32 × 32, 16 × 16, and 8 × 8 feature maps after feature fusion. The specific fusion process is as follows: first, 5 convolution operations are applied to the 8 × 8 feature map, with kernel sizes alternating 1 × 1, 3 × 3, 1 × 1 and stride 1; then a convolutional layer with kernel size 3 × 3, stride 1, and half the number of kernels is attached to reduce the dimension; the features are then upsampled by a factor of 2 and concatenated with the previous 16 × 16 features, and the same operation is repeated and concatenated with the 32 × 32 feature map. Finally, prediction results are output on the fused feature maps of sizes 32 × 32, 16 × 16, and 8 × 8.
For each of the three fused feature maps, 3 frames are predicted per pixel cell, each with predicted center coordinates $(t_x, t_y)$, width $t_w$, and height $t_h$. If the cell is offset from the top-left corner of the image by $(c_x, c_y)$ and the prior frame has width $p_w$ and height $p_h$, the predicted candidate frame parameters are:
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$
where $b_x, b_y, b_w, b_h$ respectively represent the predicted candidate frame's center point coordinates, width, and height, and $\sigma(\cdot)$ denotes the sigmoid function.
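A sketch of this decoding (NumPy; inputs are in grid units):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map network outputs (tx, ty, tw, th) to a predicted box in grid units."""
    bx = sigmoid(tx) + cx    # center x: sigmoid offset within the (cx, cy) cell
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)     # prior width scaled by exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh
```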
Step 4) constructing the cost function $L_{det}$ of model A, which includes the coordinate loss $L_{coord}$, the confidence loss $L_{conf}$, and the classification loss $L_{cls}$, and can be expressed as:
$L_{coord} = \lambda_c \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$
$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2$
$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{cls} (p_i(cls) - \hat{p}_i(cls))^2$
$L_{det} = L_{coord} + L_{conf} + L_{cls}$
where $\lambda_c$ is the weight of the coordinate loss, set to 5; $S^2$ is the number of input-image cells, set to 7 × 7; $B$ is the number of candidate frames predicted per cell, set to 9; $\mathbb{1}_{ij}^{obj}$ indicates whether a detection target exists in the j-th candidate frame of the i-th cell, equal to 1 if so and 0 otherwise; $x_i, y_i, w_i, h_i$ are the center coordinates, width, and height of the predicted candidate frame, and $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i$ those of the real target frame; $\lambda_{noobj}$ is the weight of the no-object confidence loss, set to 0.5; $C_i$ is the predicted confidence and $\hat{C}_i$ the true confidence; $cls$ denotes the class of the detection target; $p_i(cls)$ is the predicted probability that the target in cell i belongs to class $cls$, and $\hat{p}_i(cls)$ the actual probability.
And step 5) training model A. The model is optimized end-to-end, with the network parameters optimized by the multi-task loss function. The whole training process uses mini-batch stochastic gradient descent to optimize the loss function, for a total of 60000 iterations. The initial network parameters of model A are loaded, the learning rate is set to 0.01, the weight decay to 0.0005, and the batch size to 64; after 20000 and 50000 iterations the learning rate is reduced to 0.001 and 0.0001 respectively, finally giving the trained model A.
And 6) constructing model B according to steps 7) to 10) of the first embodiment and training it to obtain the trained model B.
And 7) preprocessing an actually collected express waybill image, inputting it into model A, recognizing and extracting region M, inputting region M into model B, recognizing the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and outputting the confidence and position information of the target frames.
The method and the system creatively take the key information of the express waybill as objects of different categories, and solve the problems of positioning and extracting the key information of the express waybill by using a target identification technology.
According to the method, two neural network classification models are trained according to the characteristics of the express waybill image: model A identifies region M in the express waybill image and model B identifies the key information within region M, giving high accuracy. On this basis, for the express waybill key information positioning problem, the models adopt anchor points of specific sizes, which further improves the accuracy for region M in the express waybill and for the key information within region M. Further, for the small number of positive samples during training in the first embodiment, a positive sample supplement method is provided, improving the robustness of the model.
Device embodiment: an express waybill key information positioning device based on deep learning, comprising: a first neural network model training module, a second neural network model training module, a data collection module, and an express waybill key information positioning module;
the first neural network training model is used for training the first neural network training model and identifying a key information area in the express waybill;
the second neural network training model is used for training the second neural network training model according to the key information area in the express bill output by the first neural network training model and identifying the key information in the key information area;
the data collection and collection module is used for collecting express bill images by using shooting equipment;
the express waybill key information positioning module is used for extracting convolution characteristic mapping of an image to be detected through a convolution neural network from the image, inputting the convolution characteristic mapping into a first neural network model, positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing a convolution neural network, inputting the convolution characteristic mapping into a second neural network model, and outputting key information.
On the basis of the above embodiment, the device further comprises: a data set generation module and a convolutional neural network module; the first neural network training model and the second neural network training model are identical in structure, both adopting a Faster R-CNN model, and each comprises a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for testing the key information region recognition model with the test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
the region suggestion network construction and training module and the region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize the parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; construct the cost functions of the second Faster R-CNN model's region suggestion network and key-information-region-based fast convolutional neural network; and alternately train the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An express waybill key information positioning method based on deep learning is characterized by comprising the following steps:
acquiring an express bill image by using shooting equipment, extracting convolution characteristic mapping of an image to be detected through a convolution neural network according to a set candidate frame, inputting the image to a first neural network model which is constructed and trained in advance, and positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing the convolution neural network, inputting the convolution characteristic mapping into a pre-constructed and trained second neural network model, and outputting key information.
2. The express waybill key information positioning method based on deep learning of claim 1, comprising the following steps:
the pre-constructed and trained first neural network model and the pre-constructed and trained second neural network model adopt the same Faster R-CNN model, each Faster R-CNN model comprising a region suggestion network and a region-based fast convolutional neural network; the specific training method comprises the following steps:
constructing an express waybill picture library with labeled key information regions as a first training set and a first test set; performing feature extraction on the first training set with a convolutional neural network to obtain the convolutional feature map of the first training set;
inputting the convolutional feature map of the first training set, and initializing the parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing cost functions for the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; and alternately training the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model to obtain the trained first Faster R-CNN model;
using the first test set to test the key information region recognition model, locating and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
performing feature extraction on the second training set with the convolutional neural network to obtain the convolutional feature map of the second training set; inputting the convolutional feature map of the second training set, and initializing the parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing cost functions for the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; and alternately training the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
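For illustration (not part of the claim): a toy skeleton of the alternating training schedule, following the four-step alternation familiar from Faster R-CNN; the stand-in modules, placeholder cost functions, random data, and step counts are assumptions, since the claim specifies the recipe but not the hyper-parameters.

    import torch
    import torch.nn as nn

    # Toy stand-ins for the shared feature CNN, the region suggestion network
    # and the region-based fast convolutional neural network.
    backbone  = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    rpn       = nn.Conv2d(16, 2, 1)   # foreground/background score per location
    fast_rcnn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))

    rpn_cost = lambda x: rpn(backbone(x)).pow(2).mean()        # placeholder cost functions
    det_cost = lambda x: fast_rcnn(backbone(x)).pow(2).mean()

    def train_stage(modules, cost, steps=10):
        opt = torch.optim.SGD([p for m in modules for p in m.parameters()], lr=1e-3)
        for _ in range(steps):
            x = torch.randn(2, 3, 64, 64)       # dummy batch in place of waybill images
            loss = cost(x)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Alternate: tune the RPN with the backbone, then the detector with the
    # backbone, then each head alone over the now-frozen shared features.
    train_stage([backbone, rpn], rpn_cost)
    train_stage([backbone, fast_rcnn], det_cost)
    for p in backbone.parameters():
        p.requires_grad_(False)
    train_stage([rpn], rpn_cost)
    train_stage([fast_rcnn], det_cost)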
3. The express waybill key information positioning method based on deep learning of claim 2, wherein the region suggestion network comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers; the convolutional feature map is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel according to the set candidate boxes to obtain anchor points; the generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background discrimination, which output the foreground/background confidence of each anchor point and the positions of all candidate boxes, respectively; and a set number of regions with the highest foreground confidence are screened out of the obtained rectangular candidate boxes according to preset conditions to obtain the final key region set.
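For illustration (not part of the claim): the layer structure just described, sketched in PyTorch; the channel width and the nine-anchor count (three ratios times three scales, matching claims 6 and 7 below) are assumptions.

    import torch
    import torch.nn as nn

    # One 3x3 conv slides over the feature map, then two parallel 1x1 convs
    # score each anchor as foreground/background and regress its box offsets.
    class RegionProposalHead(nn.Module):
        def __init__(self, in_channels=256, num_anchors=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
            self.cls  = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # fg/bg confidence
            self.reg  = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # box positions

        def forward(self, feat):
            h = torch.relu(self.conv(feat))
            return self.cls(h), self.reg(h)

    head = RegionProposalHead()
    scores, deltas = head(torch.randn(1, 256, 38, 50))   # e.g. a 38 x 50 feature map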
4. The express waybill key information positioning method based on deep learning of claim 1, wherein, when the number of positive samples among the candidate boxes does not meet a set threshold, positive samples are supplemented as follows:
redefining the real box corresponding to each candidate box in the negative samples; if the intersection-over-union of the redefined real box and the negative-sample candidate box is greater than a set threshold, putting that candidate box into a supplementary positive sample set; and, when the number of positive samples still does not meet the set threshold, randomly selecting candidate boxes from the key region set to supplement the positive samples.
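For illustration (not part of the claim): a minimal sketch of this supplementation rule over plain (x1, y1, x2, y2) boxes; the function names, the batch target `need`, the 0.5 threshold, and the `redefine` callback (claim 5's redefinition rule) are assumptions.

    import random

    def iou(a, b):
        # Standard intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def supplement_positives(positives, negatives, key_regions, redefine,
                             need=64, iou_thresh=0.5):
        # negatives: (real_box, candidate_box) pairs; redefine() yields gt'_ji.
        extra = [cand for gt, cand in negatives
                 if iou(redefine(gt, cand), cand) > iou_thresh]
        positives = positives + extra
        while len(positives) < need and key_regions:
            positives.append(random.choice(key_regions))  # random top-up from the key region set
        return positives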
5. The express waybill key information positioning method based on deep learning of claim 4, wherein the intersection-over-union (IoU) of a candidate box is calculated as follows:
the position information of the i-th real box is expressed as
$gt_i = (x_{gt_i}^{min},\; y_{gt_i}^{min},\; x_{gt_i}^{max},\; y_{gt_i}^{max})$
where $(x_{gt_i}^{min}, y_{gt_i}^{min})$ and $(x_{gt_i}^{max}, y_{gt_i}^{max})$ are the coordinates of the upper-left and lower-right corners of $gt_i$, respectively;
the position information of the j-th candidate box is expressed as
$a_j = (x_{a_j}^{min},\; y_{a_j}^{min},\; x_{a_j}^{max},\; y_{a_j}^{max})$
where $(x_{a_j}^{min}, y_{a_j}^{min})$ and $(x_{a_j}^{max}, y_{a_j}^{max})$ are the coordinates of the upper-left and lower-right corners of $a_j$, respectively;
the real box $gt'_{ji}$ corresponding to the j-th candidate box is redefined according to $gt_i$ and $a_j$, with position information given by formula image FDA0002291588600000035;
the intersection-over-union between the redefined real box $gt'_{ji}$ and the candidate box $a_j$ is then expressed as
$IoU(gt'_{ji}, a_j) = \dfrac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a region.
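For illustration (not part of the claim): the published text gives the $gt'_{ji}$ formula only as an image, so the sketch below adopts one plausible reading, clamping the real box to the candidate box (their overlap), as an explicit assumption rather than the patent's actual definition. The IoU then follows the equation above.

    def redefine_gt(gt, cand):
        # Hypothetical reading of gt'_ji (an assumption, not the patent's
        # published formula): clamp the real box to the candidate box, which
        # yields their overlap region.
        return (max(gt[0], cand[0]), max(gt[1], cand[1]),
                min(gt[2], cand[2]), min(gt[3], cand[3]))

    def area(b):
        # Area of an (x1, y1, x2, y2) box; zero if the box is empty.
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    # Worked example: gt' lies inside the candidate, so their intersection is
    # gt' itself and the claim-5 IoU formula reduces to a simple ratio.
    gt, cand = (10, 10, 110, 60), (30, 5, 120, 70)
    g = redefine_gt(gt, cand)
    inter = area(g)                                   # area(gt' ∩ a_j), since gt' is inside a_j
    print(inter / (area(g) + area(cand) - inter))     # IoU(gt', a_j) ≈ 0.68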
6. The method of claim 3, wherein the region suggestion network of the first Faster R-CNN model uses candidate boxes with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256).
7. The method of claim 3, wherein the region suggestion network of the second Faster R-CNN model uses candidate boxes with aspect ratios of (0.2, 0.5, 1) and scales of (32 × 32, 64 × 64, 128 × 128).
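For illustration (not part of the claims): generating the two anchor sets of claims 6 and 7 around one feature-map location. Reading "aspect ratio" as width/height, and claim 6's third scale as 256 × 256, are assumptions.

    # Area-preserving anchors: for base side s and ratio r, w * h = s^2 and w / h = r.
    def make_anchors(cx, cy, ratios, sizes):
        anchors = []
        for s in sizes:                   # base side length, e.g. 64 for a 64 x 64 anchor
            for r in ratios:              # aspect ratio, assumed to mean width / height
                w = s * r ** 0.5
                h = s / r ** 0.5
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return anchors

    region_anchors = make_anchors(0, 0, ratios=(0.3, 0.5, 0.8), sizes=(64, 128, 256))  # first model
    field_anchors  = make_anchors(0, 0, ratios=(0.2, 0.5, 1.0), sizes=(32, 64, 128))   # second model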
8. The express waybill key information positioning method based on deep learning of claim 2, wherein the region-based fast convolutional neural network comprises two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which output the confidence of the key information region and the candidate box position after bounding-box regression, respectively.
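For illustration (not part of the claim): one way to realize this head with torchvision's roi_pool, treating the two ROI pooling layers as two pooling scales concatenated before the shared fully connected layer; the pool size, feature strides, and layer widths are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_pool

    class KeyRegionHead(nn.Module):
        def __init__(self, channels=256, pool=7, num_classes=2):
            super().__init__()
            self.pool = pool
            self.fc   = nn.Linear(channels * pool * pool * 2, 1024)  # shared FC layer
            self.cls  = nn.Linear(1024, num_classes)                 # key-region confidence
            self.bbox = nn.Linear(1024, num_classes * 4)             # regressed box position

        def forward(self, feat, rois):
            # Two ROI pooling layers, modelled here as two pooling scales.
            p1 = roi_pool(feat, rois, output_size=self.pool, spatial_scale=1 / 16)
            p2 = roi_pool(feat, rois, output_size=self.pool, spatial_scale=1 / 8)
            h = torch.relu(self.fc(torch.cat([p1, p2], dim=1).flatten(1)))
            return self.cls(h), self.bbox(h)

    feat = torch.randn(1, 256, 50, 50)
    rois = [torch.tensor([[0., 0., 320., 320.]])]    # one candidate box in image coordinates
    scores, boxes = KeyRegionHead()(feat, rois)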
9. An express waybill key information positioning device based on deep learning, characterized by comprising: a data set acquisition module and an express waybill key information positioning module;
the data set acquisition module is used for acquiring express waybill images with a camera;
the express waybill key information positioning module is used for extracting the convolutional feature map of the image to be detected through a convolutional neural network according to the set candidate boxes, inputting the feature map into the pre-constructed and pre-trained first neural network model, and locating and extracting the key information region; and for extracting the convolutional feature map of the key information region with the convolutional neural network, inputting it into the pre-constructed and pre-trained second neural network model, and outputting the key information.
10. The express waybill key information positioning device based on deep learning of claim 9,
the first neural network model and the second neural network model both adopt the Faster R-CNN architecture, giving a first Faster R-CNN model and a second Faster R-CNN model, and the device further comprises: a data set generation module and a convolutional neural network module;
the first Faster R-CNN model and the second Faster R-CNN model both comprise a region suggestion network construction and training module and a region-based fast convolution neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for positioning and extracting key information areas by using a test set test key information area recognition model, marking key information in recognition results and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
CN201911182294.3A 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning Pending CN110991435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911182294.3A CN110991435A (en) 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN110991435A (en) 2020-04-10

Family

ID=70087323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182294.3A Pending CN110991435A (en) 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN110991435A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245545A (en) * 2018-09-26 2019-09-17 浙江大华技术股份有限公司 A kind of character recognition method and device

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695558A (en) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Logistics waybill picture rectification method and system based on YoloV3 model
CN111695559A (en) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Freight note picture information coding method and system based on YoloV3 model
CN111695559B (en) * 2020-04-28 2023-07-18 深圳市跨越新科技有限公司 YoloV3 model-based waybill picture information coding method and system
CN111695558B (en) * 2020-04-28 2023-08-04 深圳市跨越新科技有限公司 Logistics shipping list picture correction method and system based on YoloV3 model
CN111507738A (en) * 2020-05-04 2020-08-07 武汉积墨包装印刷有限公司 Ink tracing and recycling process method based on block chain and 5G communication
CN111709294A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Express delivery personnel identity identification method based on multi-feature information
CN111709294B (en) * 2020-05-18 2023-07-14 杭州电子科技大学 Express delivery personnel identity recognition method based on multi-feature information
CN112052853B (en) * 2020-09-09 2024-02-02 国家气象信息中心 Text positioning method of handwriting meteorological archive data based on deep learning
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning
CN112163667B (en) * 2020-09-16 2024-01-12 闽江学院 Novel Faster R-CNN network model and training method thereof
CN112163667A (en) * 2020-09-16 2021-01-01 闽江学院 Novel Faster R-CNN network model and training method thereof
CN112183374A (en) * 2020-09-29 2021-01-05 佛山科学技术学院 Automatic express sorting device and method based on raspberry group and deep learning
CN112308822A (en) * 2020-10-10 2021-02-02 杭州电子科技大学 Intervertebral disc CT image detection method based on deep convolutional neural network
CN112308688A (en) * 2020-12-02 2021-02-02 杭州微洱网络科技有限公司 Size meter detection method suitable for e-commerce platform
CN113076972A (en) * 2021-03-04 2021-07-06 山东师范大学 Two-stage Logo image detection method and system based on deep learning
CN112861800B (en) * 2021-03-16 2022-08-05 南京邮电大学 Express identification method based on improved Faster R-CNN model
CN112861800A (en) * 2021-03-16 2021-05-28 南京邮电大学 Express identification method based on improved Faster R-CNN model
CN112927217B (en) * 2021-03-23 2022-05-03 内蒙古大学 Thyroid nodule invasiveness prediction method based on target detection
CN112927217A (en) * 2021-03-23 2021-06-08 内蒙古大学 Thyroid nodule invasiveness prediction method based on target detection
CN113553948A (en) * 2021-07-23 2021-10-26 中远海运科技(北京)有限公司 Automatic recognition and counting method for tobacco insects and computer readable medium
CN113554706A (en) * 2021-07-29 2021-10-26 中科微至智能制造科技江苏股份有限公司 Trolley package position detection method based on deep learning
CN113554706B (en) * 2021-07-29 2024-02-27 中科微至科技股份有限公司 Trolley parcel position detection method based on deep learning
CN113780087A (en) * 2021-08-11 2021-12-10 同济大学 Postal parcel text detection method and equipment based on deep learning
CN113780087B (en) * 2021-08-11 2024-04-26 同济大学 Postal package text detection method and equipment based on deep learning
CN113870870B (en) * 2021-12-02 2022-04-05 自然资源部第一海洋研究所 Convolutional neural network-based real-time recognition method for marine mammal vocalization
CN113870870A (en) * 2021-12-02 2021-12-31 自然资源部第一海洋研究所 Convolutional neural network-based real-time recognition method for marine mammal vocalization

Similar Documents

Publication Publication Date Title
CN110991435A (en) Express waybill key information positioning method and device based on deep learning
CN109886359B (en) Small target detection method and detection system based on convolutional neural network
CN109829893B (en) Defect target detection method based on attention mechanism
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN109919934B (en) Liquid crystal panel defect detection method based on multi-source domain deep transfer learning
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN108960135B (en) Dense ship target accurate detection method based on high-resolution remote sensing image
CN109934293A (en) Image-recognizing method, device, medium and obscure perception convolutional neural networks
CN107403430A (en) A kind of RGBD image, semantics dividing method
CN107341523A (en) Express delivery list information identifying method and system based on deep learning
CN106022363B (en) A kind of Chinese text recognition methods suitable under natural scene
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN110766002B (en) Ship name character region detection method based on deep learning
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN110163836A (en) Based on deep learning for the excavator detection method under the inspection of high-altitude
CN110245545A (en) A kind of character recognition method and device
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN111027538A (en) Container detection method based on instance segmentation model
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN110728307A (en) Method for realizing small sample character recognition of X-ray image by self-generating data set and label
CN113887410A (en) Deep learning-based multi-category food material identification system and method
CN107133647A (en) A kind of quick Manuscripted Characters Identification Method
CN114821408A (en) Method, device, equipment and medium for detecting parcel position in real time based on rotating target detection
CN114140665A (en) Dense small target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410