CN110991435A - Express waybill key information positioning method and device based on deep learning - Google Patents

Express waybill key information positioning method and device based on deep learning

Info

Publication number
CN110991435A
CN110991435A (application CN201911182294.3A)
Authority
CN
China
Prior art keywords
neural network
key information
region
faster
model
Prior art date
Legal status
Pending
Application number
CN201911182294.3A
Other languages
Chinese (zh)
Inventor
张登银
张震
周超
丁飞
赵莎莎
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority: CN201911182294.3A
Publication: CN110991435A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention discloses an express waybill key information positioning method and device based on deep learning. The method pre-constructs and trains two neural network classification models: the first neural network model identifies the key information region in an express waybill, and the second neural network model identifies the key information within that region. An express waybill image is acquired with a shooting device; a convolutional neural network extracts the convolution feature mapping of the image to be detected, which is input into the first neural network model to position and extract the key information region. The convolutional neural network then extracts the convolution feature mapping of the key information region, which is input into the second neural network model to output the key information. Training two models in this way reduces the interference from the many background factors in express waybill images, giving the system high recognition accuracy.

Description

Express waybill key information positioning method and device based on deep learning
Technical Field
The invention relates to an express waybill key information positioning method and device based on deep learning, and belongs to the field of image processing.
Background
In recent years, the express industry in China has developed rapidly, and express business volume has increased year by year. Apart from large transfer stations that adopt expensive intelligent sorting machines, most transfer stations still sort packages manually. Because manual sorting is slow and has a certain error rate, it causes package backlogs and mis-sorting at transfer stations. To speed up sorting and distribution at transfer stations that do not use automatic package sorting, positioning and identifying the key information on express waybill images is of great significance. However, waybill photographs often suffer from insufficient brightness, blurring, excessive background, and tilted angles. In addition, complicated table lines, irrelevant patterns, and irrelevant text regions on the express waybill image make positioning and identifying the relevant information very challenging. In recent years, scholars have proposed a form positioning and extraction method based on graph representation and matching to position the key information of express waybills, but its accuracy is low on waybills that are dimly lit or partially occluded.
Disclosure of Invention
The invention aims to provide an express waybill key information positioning method based on deep learning, to solve the problem of low positioning accuracy for key information regions in express waybills.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
In one aspect, the invention provides an express waybill key information positioning method based on deep learning, comprising the following steps:
acquiring an express waybill image with a shooting device; extracting the convolution feature mapping of the image to be detected through a convolutional neural network according to the set candidate frames, inputting it into a pre-constructed and trained first neural network model, and positioning and extracting the key information region; extracting the convolution feature mapping of the key information region with the convolutional neural network, inputting it into a pre-constructed and trained second neural network model, and outputting the key information. The first neural network model is used for identifying the key information region in the express waybill, and the second neural network model is used for identifying the key information in the key information region. Further:
the first neural network model and the second neural network model adopt the same structure. Preferably both adopt a Faster R-CNN model, wherein the fast R-CNN model comprises a region suggestion network and a region-based fast convolution neural network;
constructing an express bill picture library with a labeled key information area as a first training set and a first testing set; performing feature extraction on the training set by using a convolutional neural network to obtain a convolutional feature mapping of a first training set;
inputting convolution feature mapping of a first training set, and initializing parameters of a region suggestion network and a region-based fast convolution neural network of a first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
testing the trained key information region recognition model with the first test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
performing feature extraction on the second training set by using a convolutional neural network to obtain convolutional feature mapping of the second training set; inputting convolution feature mapping of a second training set, and initializing parameters of a region suggestion network and a region-based fast convolution neural network of a second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
Further, the region suggestion network includes one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel according to the set candidate frames to obtain anchor points; the generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames; a specific number of regions with the highest foreground confidence are then screened out of the obtained rectangular candidate frames according to preset conditions to obtain the final key region set.
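As an illustration, a minimal PyTorch sketch of such a head is given below; the channel counts and anchor count are assumptions (e.g. a 512-channel VGG-16 feature map and 9 anchors per position), not values fixed by this section:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """One 3x3 conv followed by two parallel 1x1 convs:
    foreground/background scores and box position regression."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # fg/bg confidence
        self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # candidate-frame offsets

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)
```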
Still further, when the number of positive-sample candidate frames does not meet the set threshold, the positive samples are supplemented as follows:
redefine the real frame corresponding to each candidate frame in the negative samples; if the intersection-over-union between the redefined real frame and the negative-sample candidate frame is greater than a set threshold, put the candidate frame into a supplementary positive sample set, and when the number of positive samples does not meet the set threshold, randomly select candidate frames from the supplementary positive sample set to supplement the positive samples. With this supplement method, more small target frames that contain target information are marked as positive samples, so the positive samples participating in training carry more target information and the model learns more of it; this alleviates, to a certain extent, slow convergence and the possible loss of model accuracy, and reduces the probability of misjudgment.
Still further, the intersection-over-union (IOU) between a candidate frame and its redefined real frame is calculated as follows:
the position information of the i-th real frame is expressed as $gt_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2})$, where $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ respectively represent the coordinates of the upper-left and lower-right corners of $gt_i$;
the position information of the j-th candidate frame is expressed as $a_j = (x_{j1}, y_{j1}, x_{j2}, y_{j2})$, where $(x_{j1}, y_{j1})$ and $(x_{j2}, y_{j2})$ respectively represent the coordinates of the upper-left and lower-right corners of $a_j$;
according to $gt_i$ and $a_j$, the redefined real frame $gt'_{ji}$ corresponding to the j-th candidate frame is the overlap region of the two, with position information:
$gt'_{ji} = (\max(x_{i1}, x_{j1}), \max(y_{i1}, y_{j1}), \min(x_{i2}, x_{j2}), \min(y_{i2}, y_{j2}))$;
at this time, the IOU value between the real frame $gt'_{ji}$ and the candidate frame $a_j$ is expressed as:
$IOU(gt'_{ji}, a_j) = \frac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a frame. The invention provides this positive sample supplement method, which improves the robustness of the system.
In the above embodiments, the region suggestion network of the first Faster R-CNN model employs candidate frames with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256); these candidate frame sizes better match the size of the express waybill. The region suggestion network of the second Faster R-CNN model employs candidate frames with aspect ratios of (0.2, 0.5, 1) and scales of (32 × 32, 64 × 64, 128 × 128); these anchor sizes better match the sizes of the key information region and the key information target frames, reduce the computation of non-maximum suppression, help generate candidate frames with a higher overlap rate with the real frames, and increase the recall rate of the model.
In the above technical solution, preferably, the fast convolutional neural network based on the key information region comprises two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of the key information region and the candidate frame position after frame regression.
In another aspect, the invention provides an express waybill key information positioning device based on deep learning, comprising: a data collection module and an express waybill key information positioning module;
the data collection and collection module is used for collecting express bill images by using shooting equipment;
the express waybill key information positioning module is used for extracting convolution characteristic mapping of an image to be detected through a convolution neural network according to a set candidate frame, inputting the convolution characteristic mapping into a first neural network model which is constructed and trained in advance, and positioning and extracting key information areas; and extracting the convolution characteristic mapping of the key information area by utilizing the convolution neural network, inputting the convolution characteristic mapping into a pre-constructed and trained second neural network model, and outputting key information.
The pre-constructed and trained first neural network training model is used for identifying key information areas in the express waybill;
the pre-constructed and trained second neural network training model is used for identifying key information in a key information area;
further, still include: the system comprises a data set generation module and a convolutional neural network module; the first neural network training model and the second neural network training model are identical in structure, and the same Faster R-CNN model is adopted to obtain a first fast R-CNN model and a second fast R-CNN model; the first Faster R-CNN model and the second Faster R-CNN model both comprise a region suggestion network construction and training module and a region-based fast convolution neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for testing the key information region recognition model with the test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
The beneficial technical effects are as follows:
firstly, the key information of the express waybill is regarded as objects of different categories, and the problem of positioning the key information of the express waybill is solved by using a target identification technology; according to the method, two models are trained, the first neural network model is used for identifying the key information area in the express bill image, and the second neural network model is used for identifying the key information in the key information area, so that the accuracy of identifying the key information of the express bill by the neural network is improved.
Secondly, the two neural network models in the method, both Faster R-CNN models, fuse the region suggestion network with the fast convolutional neural network, which makes training and testing of the whole network very convenient and improves target detection precision;
thirdly, according to the size of the key information target frame of the express waybill, the anchor point with a specific size is adopted, and the positioning speed and the accuracy of the key information area in the express waybill and the key information in the key information area are further improved.
Fourthly, the invention provides a positive sample supplement method for the small number of positive samples in the express waybill key information positioning problem. The supplemented positive samples all contain part of a target, so the model learns more target information, false detections and missed detections are reduced to a certain extent, and the robustness of the system is improved.
Drawings
FIG. 1 is a result diagram of positioning key information of an express bill image by directly using a Faster R-CNN method;
FIG. 2 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a positive sample supplement method according to an embodiment of the present invention;
FIG. 4 is a diagram of the result of model A positioning region M according to an embodiment of the present invention;
fig. 5 is a result diagram of positioning key information of an express bill image by a model B according to an embodiment of the present invention;
fig. 6 is a network structure diagram of a model a according to a second embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention solves the express waybill key information positioning problem with target recognition technology. Prior-art methods that directly position the key information of the express waybill suffer from false detection and missed detection of key information, as shown in FIG. 1.
The invention trains two neural network classification models: model A (the first neural network model) identifies region M in the express waybill, i.e. the smallest rectangular region containing the names, telephones, and addresses of the sender and receiver, and model B (the second neural network model) identifies the names, telephones, and addresses of the sender and receiver within region M. The invention provides an express waybill key information positioning method based on deep learning, comprising the following steps:
constructing and training two neural network classification models, wherein the first neural network model is used for identifying a key information area in an express bill; the second neural network model is used for identifying key information in the key information area;
acquiring an express bill image by using shooting equipment, extracting convolution characteristic mapping of the image to be detected through a convolution neural network from the image, inputting the convolution characteristic mapping into a first neural network model, and positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing a convolution neural network, inputting the convolution characteristic mapping into a second neural network model, and outputting key information.
In a specific embodiment, the first neural network model and the second neural network model may have the same structure or different structures.
In a first embodiment (as shown in fig. 2) for implementing the present invention, the first neural network model and the second neural network model use the same fast R-CNN structure, which includes the following steps:
step 1) collecting and marking express bill photos shot manually or mechanically, and segmenting the express bill photos into a training set and a test set;
step 2) inputting the training set in the step 1) into a Convolutional Neural Network (CNN) for feature extraction to obtain Convolutional feature mapping;
step 3), constructing a model A, namely a first Faster R-CNN model:
constructing a region suggestion network (RPN): the RPN comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping is input into the 3 × 3 convolutional layer, which slides over it pixel by pixel and generates anchor points according to the candidate frame sizes at each sliding position; this embodiment selects candidate frames with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256), i.e., each pixel generates 9 anchor points of different scales. These anchor sizes better match the size of region M, reduce the computation of non-maximum suppression, and help generate candidate frames with a higher overlap rate with the real frames. The generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames; a specific number of regions with the highest foreground confidence are then screened out of the obtained rectangular candidate frames according to preset conditions to obtain the final region suggestion set D;
constructing a fast region-based convolutional neural network (Fast R-CNN) model: the Fast R-CNN model consists of two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of each region and the candidate frame position after frame regression; the convolution features are input into the Fast R-CNN model, which outputs the position, category, and confidence of the targets in the image;
step 4) modifying the parameters related to the total number of categories and the output category labels in model A according to the total number of categories in the data set; initializing the convolutional layer parameters shared by the RPN and Fast R-CNN with the weights of a downloaded pre-trained ImageNet classification model, while the layers unique to the two networks are randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.01;
step 5) constructing a cost function for training an RPN network and a cost function for training a Fast R-CNN network in the model A;
and 6) training the model with the back propagation algorithm and stochastic gradient descent, alternately training the RPN and Fast R-CNN networks; when training the RPN, if the number of positive samples is insufficient, a candidate frame is randomly selected from the supplementary positive sample set C to supplement the positive samples.
In a specific embodiment, the method of supplementing the positive samples is as follows (as shown in FIG. 3):
redefine the real frame corresponding to each candidate frame in the negative samples; if the intersection-over-union between the redefined real frame and the negative-sample candidate frame is greater than a set threshold, put the candidate frame into the supplementary positive sample set C, and when the number of positive samples still does not meet the set threshold, randomly select candidate frames from C to supplement the positive samples.
Further, the intersection-over-union between a candidate frame and its redefined real frame is calculated as follows:
the position information of the i-th real frame is expressed as $gt_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2})$, where $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ respectively represent the coordinates of the upper-left and lower-right corners of $gt_i$;
the position information of the j-th candidate frame is expressed as $a_j = (x_{j1}, y_{j1}, x_{j2}, y_{j2})$, where $(x_{j1}, y_{j1})$ and $(x_{j2}, y_{j2})$ respectively represent the coordinates of the upper-left and lower-right corners of $a_j$;
according to $gt_i$ and $a_j$, the redefined real frame $gt'_{ji}$ corresponding to the j-th candidate frame is the overlap region of the two, with position information:
$gt'_{ji} = (\max(x_{i1}, x_{j1}), \max(y_{i1}, y_{j1}), \min(x_{i2}, x_{j2}), \min(y_{i2}, y_{j2}))$;
at this time, the IOU value between the real frame $gt'_{ji}$ and the candidate frame $a_j$ is expressed as:
$IOU(gt'_{ji}, a_j) = \frac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a frame.
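For illustration, a minimal Python sketch of this supplement rule, assuming (x1, y1, x2, y2) boxes and an illustrative 0.5 threshold; since the redefined frame gt' lies inside the candidate frame a, the IOU above reduces to area(gt')/area(a):

```python
def redefined_iou(gt, a):
    """IOU between candidate frame a and the redefined real frame gt'
    (the overlap rectangle of gt and a); boxes are (x1, y1, x2, y2)."""
    # Redefined real frame gt' = overlap region of gt and a
    x1, y1 = max(gt[0], a[0]), max(gt[1], a[1])
    x2, y2 = min(gt[2], a[2]), min(gt[3], a[3])
    if x2 <= x1 or y2 <= y1:
        return 0.0  # no overlap: gt' is empty
    area_gtp = (x2 - x1) * (y2 - y1)         # area(gt'), and gt' lies inside a
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return area_gtp / area_a                 # = area(gt' n a) / area(gt' u a)

def build_supplement_set(neg_boxes, gt_boxes, thresh=0.5):
    """Collect negative candidates whose redefined-frame IOU with some real
    frame exceeds the threshold; these form the supplementary positive set C."""
    return [a for a in neg_boxes
            if any(redefined_iou(gt, a) > thresh for gt in gt_boxes)]
```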
Sequentially adjusting the weight of each layer of neural network according to preset parameters to obtain a trained model A;
and 7) testing model A with the test set from step 1) to recognize region M, saving the results, labeling the images, and splitting them into a training set and a test set, where six target frames are labeled: receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address;
step 8) extracting the characteristics of the training set in the step 7) through VGG-16 to obtain convolution characteristic mapping;
step 9) constructing a model B, constructing a cost function for training an RPN network and a cost function for training a second Fast R-CNN network in the model B, initializing parameters, and alternately training the RPN and the Fast R-CNN to obtain the trained model B for identifying the name of a receiver, the telephone of the receiver, the address of the receiver, the name of a sender, the telephone of the sender and the address of the sender in the area M;
step 10) testing model B with the test set from step 7);
and 11) preprocessing an express waybill image collected actually, inputting the image into the model A, identifying and extracting the area M, inputting the area M into the model B, identifying the name of a receiver, the telephone of the receiver, the address of the receiver, the name of a sender, the telephone of the sender and the address of the sender, and outputting the confidence coefficient and the position information of a target frame.
In step 1, an express waybill photo shot manually or by a machine is collected and marked, wherein a key information area of the express waybill is marked as an area M;
1.1) carrying out grayscale processing on the image using the weighted average method:
V(x, y) = 0.299 × RGB_R + 0.587 × RGB_G + 0.114 × RGB_B
where V(x, y) represents the gray value after converting the color image into a grayscale image, and RGB_R, RGB_G, RGB_B represent the red, green, and blue intensity values respectively.
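A one-function sketch of this conversion (NumPy; assumes an H × W × 3 RGB array):

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted-average grayscale conversion, V = 0.299R + 0.587G + 0.114B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```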
1.2) marking region M in the image with a horizontal rectangular frame to obtain the object frame category information (obj) and position information (x, y, w, h), where (x, y) is the upper-left corner coordinate and w, h are the width and height of the object frame;
1.3) making the marked images into a data set, and segmenting the data set into a training set and a testing set.
Step 2) inputting the training set in the step 1) into CNN for feature extraction to obtain convolution feature mapping; the CNN selects a network for feature extraction in the VGG-16 model, and is used for extracting features of the input image.
Step 3) constructing a model A, wherein the model integrates an improved regional suggestion network RPN and a Fast R-CNN network;
3.1) improved region suggestion network (RPN): comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers. The convolution feature mapping extracted by the CNN is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel and, at each sliding position, generates anchor points with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256). The generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background judgment, which respectively output the foreground/background confidence of each anchor point and the positions of all candidate frames, where a candidate frame position comprises four parameters: the center point coordinates x and y, the width w, and the height h;
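A sketch of this anchor generation (NumPy); the stride is an assumed backbone downsampling factor, and the ratios are taken as width/height with the base area preserved, neither of which is fixed by the text:

```python
import numpy as np

RATIOS = (0.3, 0.5, 0.8)                      # assumed to be width/height ratios
SCALES = ((64, 64), (128, 128), (256, 256))   # base candidate-frame sizes

def anchors_at(cx, cy):
    """The 9 anchors (x1, y1, x2, y2) centered at one sliding position."""
    boxes = []
    for (sw, sh) in SCALES:
        for r in RATIOS:
            w, h = sw * np.sqrt(r), sh / np.sqrt(r)  # reshape, keeping area sw*sh
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

stride = 16  # assumed VGG-16 conv feature stride
all_anchors = [anchors_at(x * stride, y * stride)
               for y in range(38) for x in range(50)]  # e.g. a 600 x 800 input
```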
3.2) sorting the candidate frames output by the RPN in descending order of softmax score and keeping the first 2000; then merging the candidate frames with a non-maximum suppression algorithm and keeping the 300 candidate frames with the highest confidence, to obtain the final RPN region suggestion set D;
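This filtering step can be sketched with torchvision's NMS; the 0.7 IoU merge threshold is an assumed value, since the text fixes only the 2000/300 counts:

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes, scores, pre_nms_top_n=2000, post_nms_top_n=300,
                     iou_thresh=0.7):
    """Keep top-2000 proposals by softmax score, merge with NMS, keep top-300."""
    order = scores.argsort(descending=True)[:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_thresh)[:post_nms_top_n]  # indices kept
    return boxes[keep], scores[keep]
```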
3.3) Fast R-CNN consists of two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which respectively output the confidence of each region and the candidate frame position after frame regression;
the ROI posing layer performs pooling operation on the region suggestion set D and the convolution feature mapping, maps the ROI to a corresponding position of the feature mapping according to input image, divides the mapped region into sections with the same size, and performs maximum pooling operation on each section;
the fully connected layer combines the outputs of the ROI pooling layer and finally feeds the two parallel fully connected layers, which perform region classification and frame regression on the candidate frames and output the position, category, and confidence of the targets in the image;
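The pooling step of 3.3) can be sketched with torchvision's roi_pool; the 7 × 7 output size and the stride-16 spatial scale are assumptions for a VGG-16 backbone, not values given in the text:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 38, 50)          # VGG-16 conv features (stride 16)
rois = torch.tensor([[0., 64., 64., 256., 192.]])  # (batch_idx, x1, y1, x2, y2) in image coords
pooled = roi_pool(feature_map, rois, output_size=(7, 7),
                  spatial_scale=1.0 / 16)          # map image coords onto the feature map
flat = pooled.flatten(start_dim=1)                 # feed into the fully connected layers
```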
step 4) modifying the parameters related to the total number of categories and the output category labels in model A according to the total number of categories in the data set; initializing the convolutional layer parameters shared by the RPN and Fast R-CNN with the weights of a downloaded pre-trained ImageNet classification model, while the layers unique to the two networks are randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.01;
step 5) constructing a cost function of training RPN and a cost function of Fast R-CNN:
the cost function for training the RPN in this embodiment is:
$L(\{c_j\}, \{t_j\}) = \frac{1}{N_{cls}} \sum_j L_{cls}(c_j, c_j^*) + \lambda \frac{1}{N_{reg}} \sum_j c_j^* L_{reg}(t_j, t_j^*)$
where $j$ denotes the index of a candidate frame; $c_j$ represents the predicted class probability of candidate frame $a_j$; $c_j^*$ denotes the class label of $a_j$, with $c_j^* = 1$ when $a_j$ is a positive sample and $c_j^* = 0$ otherwise; $t_j = (x, y, w, h)$ represents the predicted center coordinates, width, and height of $a_j$; $t_j^* = (x^*, y^*, w^*, h^*)$ represents the center coordinates, width, and height of the real frame corresponding to a positive-sample candidate frame. The parameter $\lambda$ is a balance weight, $N_{cls}$ is the total number of anchor points, and $N_{reg}$ is the number of positive samples. $L_{cls}$ performs target/non-target classification of the candidate frames with the cross-entropy loss, expressed as $L_{cls}(c, u) = -\log c_u$. $L_{reg}$ is the regression loss, using the smooth $L_1$ loss, expressed as:
$L_{reg}(t_j, t_j^*) = \sum_{k \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_{j,k} - t^*_{j,k})$
$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
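For illustration, a PyTorch sketch of this two-term cost; the sampling of the 256 anchors per mini-batch described in step 6 is assumed to have happened already, and the default λ = 1.0 is illustrative:

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, labels, bbox_pred, bbox_targets, lam=1.0):
    """L = (1/N_cls) * sum L_cls + lam * (1/N_reg) * sum over positives of L_reg.
    cls_scores: (N, 2), labels: (N,) long with 1 = positive, 0 = negative,
    bbox_pred / bbox_targets: (N, 4)."""
    n_cls = labels.numel()                  # total sampled anchors
    pos = labels == 1
    n_reg = pos.sum().clamp(min=1)          # number of positive samples
    l_cls = F.cross_entropy(cls_scores, labels, reduction="sum") / n_cls
    l_reg = F.smooth_l1_loss(bbox_pred[pos], bbox_targets[pos],
                             reduction="sum") / n_reg
    return l_cls + lam * l_reg
```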
the cost function for training Fast R-CNN in this embodiment is:
$L(c, u, r^u, v) = L_{cls}(c, u) + \lambda [u \geq 1] L_{reg}(r^u, v)$
where $c$ is the class prediction probability, $u$ is the u-th class, $r^u$ is the predicted frame correction value for class $u$, and $v$ is the actual correction value.
Step 6) training the model by adopting a mode of alternately training two networks of RPN and Fast R-CNN, which comprises the following steps:
6.1) The RPN is trained end-to-end by back propagation and stochastic gradient descent. Each mini-batch contains 256 candidate frames extracted by the RPN, with a 1:1 ratio of positive to negative samples, for training. If the number of positive samples is insufficient, rather than randomly selecting negative samples as padding, a candidate frame is randomly selected from the supplementary positive sample set C, so that the RPN learns more target information and proposes higher-quality candidate frames. This stage iterates a certain number of times to minimize the classification error and the positive-sample position deviation.
6.2) taking the candidate frame generated by the RPN as the input of a Fast R-CNN model, and independently training a Fast R-CNN detection network. This phase iterates a certain number of times to minimize the loss function of Fast R-CNN.
6.3) Initialize the shared convolutional layer parameters of the RPN with the shared convolutional layer parameters of the detection network trained in 6.2), then fix the shared convolutional layer parameters and fine-tune only the layers unique to the RPN. This stage iterates a certain number of times to minimize the classification error and the positive-sample position deviation.
6.4) Keep the parameters of the convolutional layers shared by the two networks fixed and fine-tune only the fully connected layers of Fast R-CNN. This stage iterates a certain number of times to obtain the trained model A.
And 7) test model A with the test set from step 1) and label the detection results as a data set. Specifically:
7.1) inputting the test image in the step 1), and extracting a characteristic diagram by using a convolutional neural network;
7.2) recognizing region M of the input image with model A trained in step 6), to obtain the region M target frame with the highest confidence and its position information (x, y, w, h);
7.3) saving a key information area image through the position information of the target frame of the area M;
7.4) labeling the key information in the image saved in the previous step with horizontal rectangular frames, marked as six classes: rev_name, rev_phone, rev_address, name, phone, and address, respectively representing the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and obtaining the position information (x, y, w, h) of each real frame;
7.5) segmenting the labeled image data set into a training set and a testing set.
Step 8) extracting the characteristics of the training set in the step 7) through CNN to obtain convolution characteristic mapping;
step 9) repeating the training steps for the region suggestion network RPN and the Fast R-CNN network from step 3) to step 6): constructing model B, constructing the cost function for training the RPN and the cost function for training the Fast R-CNN in model B, initializing the parameters, and alternately training the RPN and Fast R-CNN to obtain the trained model B, which identifies the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address in region M;
step 10) testing model B with the test set from step 7); the testing steps are the same as in step 7);
step 11) preprocessing an actually collected express waybill image, extracting the feature map of the image to be detected with the CNN, inputting it into model A, and positioning and extracting region M, as shown in FIG. 4; then extracting the feature map of region M with the CNN, inputting it into model B, identifying the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and outputting the confidence and position information of the target frames; the positioning result is shown in FIG. 5.
The method and the system treat the key information of the express waybill as objects of different categories and solve the problem of positioning and extracting the key information of the express waybill. This embodiment fully considers the characteristics of the express waybill image and trains two Faster R-CNN models: model A identifies region M in the express waybill image and model B identifies the key information within region M, giving high accuracy. On this basis, for the express waybill key information positioning problem, the models adopt anchor points of specific sizes, which further improves the positioning speed and accuracy for region M in the express waybill and for the key information within region M. Further, for the small number of positive samples during RPN training, a positive sample supplement method is provided, improving the robustness of the model.
In the second embodiment of the present invention, the first neural network model adopts a You Only Look Once (YOLO) v3 structure and the second neural network model adopts a Faster R-CNN structure. The method comprises the following steps:
step 1) collecting and labeling express waybill photos shot manually or by machine, performing batch normalization after the preprocessing of step 1) of the first embodiment, and splitting the photos into a training set and a test set;
the batch normalization method comprises the following steps:
suppose the bottom left corner and top right corner of an annotated target box are (x1, y1) and (x2, y2), respectively, and the width and height are w and h, respectively. Then, the coordinates of the normalized center point are ((x2+ x1)/2/w, (y2+ y1)/2/h), and the width and height of the normalized target box are (x2-x1)/w and (y2-y1)/h, respectively.
And 2) generating candidate frames: cluster the real target frames labeled in the training set, using the IOU value as the rating index to obtain the initial candidate frames. The specific process is as follows:
and clustering the real target frames of the training set by adopting a K-means algorithm, and when the IOU value of the candidate frame and the IOU value of the real frame are not lower than 0.5, selecting the candidate frame as an initial candidate frame and marking the initial candidate frame as a positive sample. Meanwhile, in order to solve the problem of unbalance between positive and negative samples, the positive sample supplementing method provided by the text is adopted to increase the number of positive samples. Then, the distance dis (a, gt) of the real box a from the initial candidate box gt can be expressed as:
dis(a,gt)=1-IOU(a,gt)
in order to accelerate the convergence speed of the training process, the initial candidate box is used as the initial network parameter of the model A.
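A sketch of the k-means step under this 1 − IOU distance (NumPy; boxes are normalized (w, h) pairs and k = 9 matches the per-cell frame count used below; the initialization and iteration count are illustrative):

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) boxes and (w, h) cluster centers, corners aligned."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100):
    """Cluster real target frames with distance dis(a, gt) = 1 - IOU(a, gt)."""
    centers = boxes[np.random.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centers), axis=1)
        centers = np.array([boxes[assign == c].mean(axis=0)
                            if np.any(assign == c) else centers[c]
                            for c in range(k)])  # keep empty clusters in place
    return centers
```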
And 3) constructing model A, which adopts the YOLOv3 framework and consists of two parts, Darknet-53 and a detection network, as shown in FIG. 6, used for feature extraction and multi-scale prediction respectively. Darknet-53 is a combination of successive 3 × 3 and 1 × 1 convolutional layers with shortcut connections added. The upper part of FIG. 6 shows the configuration parameters of the Darknet-53 backbone and the output sizes of a 256 × 256 input image after each layer of the backbone. The numbers on the left indicate how many times the residual operation on the right is repeated. Finally, feature maps at five scales, 128 × 128, 64 × 64, 32 × 32, 16 × 16, and 8 × 8, are obtained.
Multi-scale prediction is performed on the 32 × 32, 16 × 16, and 8 × 8 feature maps after feature fusion. The specific fusion process is as follows: first, 5 convolution operations are applied to the 8 × 8 feature map, with kernel sizes alternating 1 × 1, 3 × 3, 1 × 1 and stride 1; then a convolutional layer with kernel size 3 × 3, stride 1, and half the number of kernels is attached to reduce the dimension; the features are then upsampled by a factor of 2 and concatenated with the previous 16 × 16 features, and the same operation is repeated and concatenated with the 32 × 32 feature map. Finally, prediction results are output on the fused feature maps of sizes 32 × 32, 16 × 16, and 8 × 8.
For each of the three fused feature maps, 3 frames are predicted per pixel cell, each with predicted center coordinates $(t_x, t_y)$, width $t_w$, and height $t_h$. If the cell is offset from the top-left corner of the image by $(c_x, c_y)$ and the prior frame has width $p_w$ and height $p_h$, the predicted candidate frame parameters are:
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$
where $b_x, b_y, b_w, b_h$ respectively represent the predicted candidate frame's center point coordinates, width, and height, and $\sigma(\cdot)$ denotes the sigmoid function.
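A sketch of this decoding (NumPy; inputs are in grid units):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map network outputs (tx, ty, tw, th) to a predicted box in grid units."""
    bx = sigmoid(tx) + cx    # center x: sigmoid offset within the (cx, cy) cell
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)     # prior width scaled by exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh
```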
Step 4) constructing the cost function $L_{det}$ of model A, which includes the coordinate loss $L_{coord}$, the confidence loss $L_{conf}$, and the classification loss $L_{cls}$, and can be expressed as:
$L_{coord} = \lambda_c \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$
$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2$
$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{cls} (p_i(cls) - \hat{p}_i(cls))^2$
$L_{det} = L_{coord} + L_{conf} + L_{cls}$
where $\lambda_c$ is the weight of the coordinate loss, set to 5; $S^2$ is the number of input-image cells, set to 7 × 7; $B$ is the number of candidate frames predicted per cell, set to 9; $\mathbb{1}_{ij}^{obj}$ indicates whether a detection target exists in the j-th candidate frame of the i-th cell, equal to 1 if so and 0 otherwise; $x_i, y_i, w_i, h_i$ are the center coordinates, width, and height of the predicted candidate frame, and $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i$ those of the real target frame; $\lambda_{noobj}$ is the weight of the no-object confidence loss, set to 0.5; $C_i$ is the predicted confidence and $\hat{C}_i$ the true confidence; $cls$ denotes the class of the detection target; $p_i(cls)$ is the predicted probability that the target in cell i belongs to class $cls$, and $\hat{p}_i(cls)$ the actual probability.
And step 5) training model A. The model is optimized end-to-end, with the network parameters optimized by the multi-task loss function. The whole training process uses mini-batch stochastic gradient descent to optimize the loss function, for a total of 60000 iterations. The initial network parameters of model A are loaded, the learning rate is set to 0.01, the weight decay to 0.0005, and the batch size to 64; after 20000 and 50000 iterations the learning rate is reduced to 0.001 and 0.0001 respectively, finally giving the trained model A.
And 6) constructing model B according to steps 7) to 10) of the first embodiment and training it to obtain the trained model B.
And 7) preprocessing an actually collected express waybill image, inputting it into model A, recognizing and extracting region M, inputting region M into model B, recognizing the receiver name, receiver telephone, receiver address, sender name, sender telephone, and sender address, and outputting the confidence and position information of the target frames.
The method and the system creatively take the key information of the express waybill as objects of different categories, and solve the problems of positioning and extracting the key information of the express waybill by using a target identification technology.
According to the method, two neural network classification models are trained according to the characteristics of the express waybill image: model A identifies region M in the express waybill image and model B identifies the key information within region M, giving high accuracy. On this basis, for the express waybill key information positioning problem, the models adopt anchor points of specific sizes, which further improves the accuracy for region M in the express waybill and for the key information within region M. Further, for the small number of positive samples during training in the first embodiment, a positive sample supplement method is provided, improving the robustness of the model.
Device embodiment: an express waybill key information positioning device based on deep learning, comprising: a first neural network model training module, a second neural network model training module, a data collection module, and an express waybill key information positioning module;
the first neural network training model is used for training the first neural network training model and identifying a key information area in the express waybill;
the second neural network training model is used for training the second neural network training model according to the key information area in the express bill output by the first neural network training model and identifying the key information in the key information area;
the data collection and collection module is used for collecting express bill images by using shooting equipment;
the express waybill key information positioning module is used for extracting convolution characteristic mapping of an image to be detected through a convolution neural network from the image, inputting the convolution characteristic mapping into a first neural network model, positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing a convolution neural network, inputting the convolution characteristic mapping into a second neural network model, and outputting key information.
On the basis of the above embodiment, the device further comprises: a data set generation module and a convolutional neural network module; the first neural network training model and the second neural network training model are identical in structure, both adopting a Faster R-CNN model, and each comprises a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for testing the key information region recognition model with the test set, positioning and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
the region suggestion network construction and training module and the region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize the parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; construct the cost functions of the second Faster R-CNN model's region suggestion network and key-information-region-based fast convolutional neural network; and alternately train the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An express waybill key information positioning method based on deep learning is characterized by comprising the following steps:
acquiring an express bill image by using shooting equipment, extracting convolution characteristic mapping of an image to be detected through a convolution neural network according to a set candidate frame, inputting the image to a first neural network model which is constructed and trained in advance, and positioning and extracting a key information area; and extracting the convolution characteristic mapping of the key information area by utilizing the convolution neural network, inputting the convolution characteristic mapping into a pre-constructed and trained second neural network model, and outputting key information.
2. The express waybill key information positioning method based on deep learning of claim 1, comprising the following steps:
the pre-constructed and trained first neural network model and the pre-constructed and trained second neural network model adopt the same Faster R-CNN model, each Faster R-CNN model comprising a region suggestion network and a region-based fast convolutional neural network; the specific training method comprises the following steps:
constructing an express waybill picture library with labeled key information regions as a first training set and a first test set; performing feature extraction on the first training set with a convolutional neural network to obtain the convolutional feature map of the first training set;
inputting the convolutional feature map of the first training set, and initializing the parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing cost functions for the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; and alternately training the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model to obtain the trained first Faster R-CNN model;
using the first test set to test the key information region recognition model, locating and extracting key information regions, labeling the key information in the recognition results, and constructing a second training set and a second test set;
performing feature extraction on the second training set with the convolutional neural network to obtain the convolutional feature map of the second training set; inputting the convolutional feature map of the second training set, and initializing the parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing cost functions for the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; and alternately training the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
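For illustration (not part of the claim): a toy skeleton of the alternating training schedule, following the four-step alternation familiar from Faster R-CNN; the stand-in modules, placeholder cost functions, random data, and step counts are assumptions, since the claim specifies the recipe but not the hyper-parameters.

    import torch
    import torch.nn as nn

    # Toy stand-ins for the shared feature CNN, the region suggestion network
    # and the region-based fast convolutional neural network.
    backbone  = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    rpn       = nn.Conv2d(16, 2, 1)   # foreground/background score per location
    fast_rcnn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))

    rpn_cost = lambda x: rpn(backbone(x)).pow(2).mean()        # placeholder cost functions
    det_cost = lambda x: fast_rcnn(backbone(x)).pow(2).mean()

    def train_stage(modules, cost, steps=10):
        opt = torch.optim.SGD([p for m in modules for p in m.parameters()], lr=1e-3)
        for _ in range(steps):
            x = torch.randn(2, 3, 64, 64)       # dummy batch in place of waybill images
            loss = cost(x)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Alternate: tune the RPN with the backbone, then the detector with the
    # backbone, then each head alone over the now-frozen shared features.
    train_stage([backbone, rpn], rpn_cost)
    train_stage([backbone, fast_rcnn], det_cost)
    for p in backbone.parameters():
        p.requires_grad_(False)
    train_stage([rpn], rpn_cost)
    train_stage([fast_rcnn], det_cost)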
3. The express waybill key information positioning method based on deep learning of claim 2, wherein the region suggestion network comprises one 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers; the convolutional feature map is input into the 3 × 3 convolutional layer, which slides over the input feature map pixel by pixel according to the set candidate boxes to obtain anchor points; the generated anchor points are input into the two parallel 1 × 1 convolutional layers for position regression and foreground/background discrimination, which output the foreground/background confidence of each anchor point and the positions of all candidate boxes, respectively; and a set number of regions with the highest foreground confidence are screened out of the obtained rectangular candidate boxes according to preset conditions to obtain the final key region set.
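For illustration (not part of the claim): the layer structure just described, sketched in PyTorch; the channel width and the nine-anchor count (three ratios times three scales, matching claims 6 and 7 below) are assumptions.

    import torch
    import torch.nn as nn

    # One 3x3 conv slides over the feature map, then two parallel 1x1 convs
    # score each anchor as foreground/background and regress its box offsets.
    class RegionProposalHead(nn.Module):
        def __init__(self, in_channels=256, num_anchors=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
            self.cls  = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # fg/bg confidence
            self.reg  = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # box positions

        def forward(self, feat):
            h = torch.relu(self.conv(feat))
            return self.cls(h), self.reg(h)

    head = RegionProposalHead()
    scores, deltas = head(torch.randn(1, 256, 38, 50))   # e.g. a 38 x 50 feature map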
4. The express waybill key information positioning method based on deep learning of claim 1, wherein, when the number of positive samples among the candidate boxes does not meet a set threshold, positive samples are supplemented as follows:
redefining the real box corresponding to each candidate box in the negative samples; if the intersection-over-union of the redefined real box and the negative-sample candidate box is greater than a set threshold, putting that candidate box into a supplementary positive sample set; and, when the number of positive samples still does not meet the set threshold, randomly selecting candidate boxes from the key region set to supplement the positive samples.
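For illustration (not part of the claim): a minimal sketch of this supplementation rule over plain (x1, y1, x2, y2) boxes; the function names, the batch target `need`, the 0.5 threshold, and the `redefine` callback (claim 5's redefinition rule) are assumptions.

    import random

    def iou(a, b):
        # Standard intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def supplement_positives(positives, negatives, key_regions, redefine,
                             need=64, iou_thresh=0.5):
        # negatives: (real_box, candidate_box) pairs; redefine() yields gt'_ji.
        extra = [cand for gt, cand in negatives
                 if iou(redefine(gt, cand), cand) > iou_thresh]
        positives = positives + extra
        while len(positives) < need and key_regions:
            positives.append(random.choice(key_regions))  # random top-up from the key region set
        return positives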
5. The express waybill key information positioning method based on deep learning of claim 4, wherein the intersection-over-union (IoU) of a candidate box is calculated as follows:
the position information of the i-th real box is expressed as
$gt_i = (x_{gt_i}^{min},\; y_{gt_i}^{min},\; x_{gt_i}^{max},\; y_{gt_i}^{max})$
where $(x_{gt_i}^{min}, y_{gt_i}^{min})$ and $(x_{gt_i}^{max}, y_{gt_i}^{max})$ are the coordinates of the upper-left and lower-right corners of $gt_i$, respectively;
the position information of the j-th candidate box is expressed as
$a_j = (x_{a_j}^{min},\; y_{a_j}^{min},\; x_{a_j}^{max},\; y_{a_j}^{max})$
where $(x_{a_j}^{min}, y_{a_j}^{min})$ and $(x_{a_j}^{max}, y_{a_j}^{max})$ are the coordinates of the upper-left and lower-right corners of $a_j$, respectively;
the real box $gt'_{ji}$ corresponding to the j-th candidate box is redefined according to $gt_i$ and $a_j$, with position information given by formula image FDA0002291588600000035;
the intersection-over-union between the redefined real box $gt'_{ji}$ and the candidate box $a_j$ is then expressed as
$IoU(gt'_{ji}, a_j) = \dfrac{area(gt'_{ji} \cap a_j)}{area(gt'_{ji} \cup a_j)}$
where $area(\cdot)$ denotes the area of a region.
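For illustration (not part of the claim): the published text gives the $gt'_{ji}$ formula only as an image, so the sketch below adopts one plausible reading, clamping the real box to the candidate box (their overlap), as an explicit assumption rather than the patent's actual definition. The IoU then follows the equation above.

    def redefine_gt(gt, cand):
        # Hypothetical reading of gt'_ji (an assumption, not the patent's
        # published formula): clamp the real box to the candidate box, which
        # yields their overlap region.
        return (max(gt[0], cand[0]), max(gt[1], cand[1]),
                min(gt[2], cand[2]), min(gt[3], cand[3]))

    def area(b):
        # Area of an (x1, y1, x2, y2) box; zero if the box is empty.
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    # Worked example: gt' lies inside the candidate, so their intersection is
    # gt' itself and the claim-5 IoU formula reduces to a simple ratio.
    gt, cand = (10, 10, 110, 60), (30, 5, 120, 70)
    g = redefine_gt(gt, cand)
    inter = area(g)                                   # area(gt' ∩ a_j), since gt' is inside a_j
    print(inter / (area(g) + area(cand) - inter))     # IoU(gt', a_j) ≈ 0.68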
6. The method of claim 3, wherein the region suggestion network of the first Faster R-CNN model uses candidate boxes with aspect ratios of (0.3, 0.5, 0.8) and scales of (64 × 64, 128 × 128, 256 × 256).
7. The method of claim 3, wherein the region suggestion network of the second Faster R-CNN model uses candidate boxes with aspect ratios of (0.2, 0.5, 1) and scales of (32 × 32, 64 × 64, 128 × 128).
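For illustration (not part of the claims): generating the two anchor sets of claims 6 and 7 around one feature-map location. Reading "aspect ratio" as width/height, and claim 6's third scale as 256 × 256, are assumptions.

    # Area-preserving anchors: for base side s and ratio r, w * h = s^2 and w / h = r.
    def make_anchors(cx, cy, ratios, sizes):
        anchors = []
        for s in sizes:                   # base side length, e.g. 64 for a 64 x 64 anchor
            for r in ratios:              # aspect ratio, assumed to mean width / height
                w = s * r ** 0.5
                h = s / r ** 0.5
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return anchors

    region_anchors = make_anchors(0, 0, ratios=(0.3, 0.5, 0.8), sizes=(64, 128, 256))  # first model
    field_anchors  = make_anchors(0, 0, ratios=(0.2, 0.5, 1.0), sizes=(32, 64, 128))   # second model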
8. The express waybill key information positioning method based on deep learning of claim 2, wherein the region-based fast convolutional neural network comprises two ROI pooling layers, one fully connected layer, and two parallel fully connected layers, which output the confidence of the key information region and the candidate box position after bounding-box regression, respectively.
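For illustration (not part of the claim): one way to realize this head with torchvision's roi_pool, treating the two ROI pooling layers as two pooling scales concatenated before the shared fully connected layer; the pool size, feature strides, and layer widths are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_pool

    class KeyRegionHead(nn.Module):
        def __init__(self, channels=256, pool=7, num_classes=2):
            super().__init__()
            self.pool = pool
            self.fc   = nn.Linear(channels * pool * pool * 2, 1024)  # shared FC layer
            self.cls  = nn.Linear(1024, num_classes)                 # key-region confidence
            self.bbox = nn.Linear(1024, num_classes * 4)             # regressed box position

        def forward(self, feat, rois):
            # Two ROI pooling layers, modelled here as two pooling scales.
            p1 = roi_pool(feat, rois, output_size=self.pool, spatial_scale=1 / 16)
            p2 = roi_pool(feat, rois, output_size=self.pool, spatial_scale=1 / 8)
            h = torch.relu(self.fc(torch.cat([p1, p2], dim=1).flatten(1)))
            return self.cls(h), self.bbox(h)

    feat = torch.randn(1, 256, 50, 50)
    rois = [torch.tensor([[0., 0., 320., 320.]])]    # one candidate box in image coordinates
    scores, boxes = KeyRegionHead()(feat, rois)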
9. An express waybill key information positioning device based on deep learning, characterized by comprising: a data set acquisition module and an express waybill key information positioning module;
the data set acquisition module is used for acquiring express waybill images with a camera;
the express waybill key information positioning module is used for extracting the convolutional feature map of the image to be detected through a convolutional neural network according to the set candidate boxes, inputting the feature map into the pre-constructed and pre-trained first neural network model, and locating and extracting the key information region; and for extracting the convolutional feature map of the key information region with the convolutional neural network, inputting it into the pre-constructed and pre-trained second neural network model, and outputting the key information.
10. The express waybill key information positioning device based on deep learning of claim 9,
the first neural network model and the second neural network model both adopt the Faster R-CNN architecture, giving a first Faster R-CNN model and a second Faster R-CNN model, and the device further comprises: a data set generation module and a convolutional neural network module;
the first Faster R-CNN model and the second Faster R-CNN model both comprise a region suggestion network construction and training module and a region-based fast convolution neural network construction and training module;
the data set generating module is used for constructing an express bill picture library with labeled key information areas as a first training set and a first testing set;
the convolutional neural network module is also used for extracting the characteristics of the first training set to obtain a first training set convolution characteristic mapping, and the first training set convolution characteristic mapping is input to a first Faster R-CNN model;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the first Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the first Faster R-CNN model; constructing a cost function of a first Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; alternately training a region suggestion network of the first Faster R-CNN model and a region-based fast convolution neural network to obtain a trained first Faster R-CNN model;
the data set generating module is also used for positioning and extracting key information areas by using a test set test key information area recognition model, marking key information in recognition results and constructing a second training set and a second test set;
the convolutional neural network module is also used for extracting the characteristics of the second training set by using a convolutional neural network to obtain convolutional characteristic mapping of the second training set; inputting a second training set convolution feature mapping;
a region suggestion network construction and training module and a region-based fast convolutional neural network construction and training module of the second Faster R-CNN model initialize parameters of the region suggestion network and the region-based fast convolutional neural network of the second Faster R-CNN model; constructing a cost function of a second Faster R-CNN model region suggestion network and a fast convolutional neural network based on a key information region; and alternately training the area suggestion network and the area-based fast convolution neural network of the second Faster R-CNN model to obtain the trained second Faster R-CNN model.
CN201911182294.3A 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning Pending CN110991435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911182294.3A CN110991435A (en) 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN110991435A (en) 2020-04-10

Family

ID=70087323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182294.3A Pending CN110991435A (en) 2019-11-27 2019-11-27 Express waybill key information positioning method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN110991435A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245545A (en) * 2018-09-26 2019-09-17 浙江大华技术股份有限公司 A kind of character recognition method and device

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695558A (en) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Logistics waybill picture rectification method and system based on YoloV3 model
CN111695559A (en) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Freight note picture information coding method and system based on YoloV3 model
CN111695559B (en) * 2020-04-28 2023-07-18 深圳市跨越新科技有限公司 YoloV3 model-based waybill picture information coding method and system
CN111695558B (en) * 2020-04-28 2023-08-04 深圳市跨越新科技有限公司 Logistics shipping list picture correction method and system based on YoloV3 model
CN111507738A (en) * 2020-05-04 2020-08-07 武汉积墨包装印刷有限公司 Ink tracing and recycling process method based on block chain and 5G communication
CN111709294A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Express delivery personnel identity identification method based on multi-feature information
CN111709294B (en) * 2020-05-18 2023-07-14 杭州电子科技大学 Express delivery personnel identity recognition method based on multi-feature information
CN112052853B (en) * 2020-09-09 2024-02-02 国家气象信息中心 Text positioning method of handwriting meteorological archive data based on deep learning
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning
CN112163667B (en) * 2020-09-16 2024-01-12 闽江学院 Novel Faster R-CNN network model and training method thereof
CN112163667A (en) * 2020-09-16 2021-01-01 闽江学院 Novel Faster R-CNN network model and training method thereof
CN112183374A (en) * 2020-09-29 2021-01-05 佛山科学技术学院 Automatic express sorting device and method based on raspberry group and deep learning
CN112308822A (en) * 2020-10-10 2021-02-02 杭州电子科技大学 Intervertebral disc CT image detection method based on deep convolutional neural network
CN112308688A (en) * 2020-12-02 2021-02-02 杭州微洱网络科技有限公司 Size meter detection method suitable for e-commerce platform
CN113076972A (en) * 2021-03-04 2021-07-06 山东师范大学 Two-stage Logo image detection method and system based on deep learning
CN112861800B (en) * 2021-03-16 2022-08-05 南京邮电大学 Express identification method based on improved Faster R-CNN model
CN112861800A (en) * 2021-03-16 2021-05-28 南京邮电大学 Express identification method based on improved Faster R-CNN model
CN112927217B (en) * 2021-03-23 2022-05-03 内蒙古大学 Thyroid nodule invasiveness prediction method based on target detection
CN112927217A (en) * 2021-03-23 2021-06-08 内蒙古大学 Thyroid nodule invasiveness prediction method based on target detection
CN113553948A (en) * 2021-07-23 2021-10-26 中远海运科技(北京)有限公司 Automatic recognition and counting method for tobacco insects and computer readable medium
CN113554706A (en) * 2021-07-29 2021-10-26 中科微至智能制造科技江苏股份有限公司 Trolley package position detection method based on deep learning
CN113554706B (en) * 2021-07-29 2024-02-27 中科微至科技股份有限公司 Trolley parcel position detection method based on deep learning
CN113780087A (en) * 2021-08-11 2021-12-10 同济大学 Postal parcel text detection method and equipment based on deep learning
CN113780087B (en) * 2021-08-11 2024-04-26 同济大学 Postal package text detection method and equipment based on deep learning
CN113870870B (en) * 2021-12-02 2022-04-05 自然资源部第一海洋研究所 Convolutional neural network-based real-time recognition method for marine mammal vocalization
CN113870870A (en) * 2021-12-02 2021-12-31 自然资源部第一海洋研究所 Convolutional neural network-based real-time recognition method for marine mammal vocalization

Similar Documents

Publication Publication Date Title
CN110991435A (en) Express waybill key information positioning method and device based on deep learning
CN109886359B (en) Small target detection method and detection system based on convolutional neural network
CN109829893B (en) Defect target detection method based on attention mechanism
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN109919934B (en) Liquid crystal panel defect detection method based on multi-source domain deep transfer learning
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN108960135B (en) Dense ship target accurate detection method based on high-resolution remote sensing image
CN109934293A (en) Image-recognizing method, device, medium and obscure perception convolutional neural networks
CN107403430A (en) A kind of RGBD image, semantics dividing method
CN107341523A (en) Express delivery list information identifying method and system based on deep learning
CN106022363B (en) A kind of Chinese text recognition methods suitable under natural scene
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN110766002B (en) Ship name character region detection method based on deep learning
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN110163836A (en) Based on deep learning for the excavator detection method under the inspection of high-altitude
CN110245545A (en) A kind of character recognition method and device
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN111027538A (en) Container detection method based on instance segmentation model
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN110728307A (en) Method for realizing small sample character recognition of X-ray image by self-generating data set and label
CN113887410A (en) Deep learning-based multi-category food material identification system and method
CN107133647A (en) A kind of quick Manuscripted Characters Identification Method
CN114821408A (en) Method, device, equipment and medium for detecting parcel position in real time based on rotating target detection
CN114140665A (en) Dense small target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410