CN110598697A - Container number positioning method based on coarse-fine character positioning - Google Patents

Container number positioning method based on coarse-fine character positioning

Info

Publication number
CN110598697A
CN110598697A CN201910777129.6A
Authority
CN
China
Prior art keywords
positioning
container
character
container number
box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910777129.6A
Other languages
Chinese (zh)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Leveraging Network Technology Co Ltd
Original Assignee
Shanghai Leveraging Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Leveraging Network Technology Co Ltd filed Critical Shanghai Leveraging Network Technology Co Ltd
Priority to CN201910777129.6A
Publication of CN110598697A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a container number positioning method based on coarse-fine character positioning, and belongs to the technical field of image processing. The overall flow chart is shown in the figure. First, a container photo is acquired; the container number is then coarsely positioned by the deep neural network framework provided by the invention; next, the container number information inside the coarse positioning area is finely positioned with a deep neural network; the located quadrilateral character frames are perspective-transformed to correct the angle of the container number; finally, the container box number is obtained from the length information of the text boxes. The invention uses a deep neural network algorithm for coarse positioning of the box number character area, and a new single-stage fine positioning network to detect the characters inside the coarsely positioned box number character area. The method avoids the character segmentation failures of traditional algorithms and, compared with fine positioning applied directly to the whole picture, greatly improves the efficiency and accuracy of accurate text positioning within the coarsely positioned box number character area. The coarse-fine box number positioning framework therefore improves the robustness, speed and universality of container number positioning under complex natural environments, angle deviations and similar conditions.

Description

Container number positioning method based on coarse-fine character positioning
Technical Field
The invention provides an image processing method, belongs to the technical field of image processing, and particularly relates to a container number positioning method based on coarse-fine character positioning.
Background
With the rapid growth of the shipping industry, containers play an increasingly important role as the primary loading tool of shipping. The container number serves as the identification code of each container and facilitates container management and distribution. Automatic identification of container numbers is receiving more and more attention, and container number identification systems are widely applied at customs, storage yards and the like.
Technicians at ports and similar sites rely primarily on the container numbers on containers to manage and distribute them.
At present, much yard management and vessel handling management is automated. Container number identification is generally divided into container picture preprocessing, container number area positioning, container number character recognition and similar links, of which container number positioning is an important intermediate stage. Positioning of the container number is easily disturbed by other containers in the yard or by the natural environment, so that the positioning is wrong or offset. Once the box number positioning goes wrong, the subsequent recognition step is difficult to carry out, and recognition fails or produces errors.
Traditional machine-vision container number positioning mainly locates the container number from its imaging information, for example by edge detection on the container number picture, by the character features of the container number, or by building templates for template matching.
If a deep neural network is used directly for accurate character positioning over the whole image, computational efficiency drops greatly. Existing algorithms cannot solve well the problem of automatically positioning the container number region in complex natural scenes, which motivates first coarsely positioning the container number character region and then applying fine positioning to the coarsely positioned local region of interest. The following are related container number positioning algorithms from recent years.
Using an edge detection algorithm, document [1] first filters the container picture, then performs edge detection in the horizontal direction with a Sobel operator, then attempts column positioning and row positioning on the horizontal edge image using the edge texture characteristics of the container number area, and finally analyses the row positioning results comprehensively to extract the correct container number area. The method works mainly when only the container appears in the picture, free of interference from external natural factors; when other characters are present in the picture, or a threshold happens to lie close to that of the container characters, the matching and positioning result can be wrong. For inclination correction of the container area, the angle is detected only by Hough detection, which cannot remove the left-right deviation caused by the shooting angle, so the obtained positioning result cannot be used directly for container number recognition. In addition, document [1] determines the container number by counting characters, and inaccurate counts caused by character segmentation can misplace or wrongly split the container number. Document [1] does not mention how to avoid this situation, which in fact occurs frequently in experiments.
In the adaptive positioning of the container number through image entropy, the container image is first preprocessed to obtain a binary image, which is initially positioned with a projection method; a container number area is initially extracted from the original container image, and the image entropy of that area is used to judge whether the box number occupies one line or several. If the box number is one line, positioning succeeds and the box number area image is extracted; if it spans several lines, repositioning is performed: the binary image of the multi-line box number is eroded, opened and dilated with morphological methods, dividing the multi-line box number image into several connected regions; non-box-number regions are removed according to the area range and shape characteristics of the box number regions, and the box number area is re-extracted from the original container image. This method resists interference poorly and is easily affected by the environment. Container number identification generally faces much interference, such as dark surroundings and uneven illumination, which affects the colour information of the container and makes it impossible to count black-white transitions on a projection basis; rust or dirt on the container also affects the image entropy. The effect is poor in practical application.
Disclosure of Invention
In view of the above, the present invention aims to provide a container number positioning method based on coarse-fine character positioning, which can solve the problems in the prior art of inaccurate container number positioning caused by external influences and the shooting angle, poor anti-interference capability, and the low efficiency and accuracy of performing accurate text positioning directly on the whole image.
The technical scheme of the invention provides a coarse-fine method for positioning the container number, built on two deep-neural-network models: a coarse positioning model for the container character part and a fine positioning model for the characters. First, the coarse positioning model locates the character part of the container; then the fine positioning model is applied to the image obtained from coarse positioning to get the position of the container number; finally, perspective transformation corrects the angle of the obtained image. The overall flow chart is shown in fig. 1.
The deep-neural-network-based coarse positioning model for the container character part, and its training method, are as follows:
The invention provides an innovative framework for coarse positioning of container numbers; the framework flow is shown in figure 2 and the neural network structure in figure 3. First, a set of convolution layers is used to extract a feature map of the candidate image; this feature map is shared by the subsequent region candidate network layers and fully connected layers. The region candidate network is then used to generate region candidate image blocks: this layer judges through softmax whether each anchor belongs to the foreground or the background, and then corrects the anchors with bounding box regression to obtain accurate proposal regions. Next, the input feature map and the candidate target regions are collected by pooling; after the information is integrated, the feature map of each target region is extracted and sent to the subsequent fully connected layers to judge the target category. Finally, the target region feature map is used to decide whether the target region is a box number character area, while bounding box regression is applied again to obtain the final accurate position of the container character part detection box.
The training method is to take a large number of container photos, label the container number character areas with picture frames, and train the coarse positioning neural network model provided by the invention.
The character fine positioning model based on the deep neural network is as follows:
We propose an innovative framework for container number location based on a fully convolutional network (FCN) and non-maximum suppression (NMS); the framework flow is shown in fig. 7. Unlike previous research, most traditional text detection methods, and some deep-learning-based ones, are multi-stage: several stages must be optimized during training, which inevitably affects the final model and consumes time.
The network feeds the input text pictures into a multi-channel fully convolutional network part to generate pixel-level text score maps and geometry channels. As shown in fig. 8, the multi-channel convolutional network is divided into three parts: feature extraction, feature merging and output.
Feature extraction and merging:
First, a general network is used as the base layer for feature extraction. Feature maps of different levels are then extracted from the backbone feature extraction network. Their sizes are respectively 1/32, 1/16, 1/8 and 1/4 of the input picture, so feature maps of different scales are obtained; the aim is to handle severe variation in text scale, with the early stages used to predict small text lines and the later stages used to predict large text lines. Then, in the merging layer, the last feature map extracted by the feature extraction network is first sent to the unpooling layer, which enlarges the image by a factor of 2, and is then concatenated with the feature map of the previous layer.
Note: g_i is the merge base and h_i the merged feature map; the operator [·; ·] denotes concatenation along the channel axis. At each merging stage, the feature map from the previous stage is first fed to an unpooling layer to double its size, and then concatenated with the current feature map. Next, a conv_1x1 reduces the number of channels and the computation, followed by a conv_3x3 that fuses the information and finally produces the output of this merging stage. After the last merging stage, a conv_3x3 layer generates the final feature map g_4 of the merge branch and feeds it to the output layer.
Output:
1. Score map: each pixel is scored, giving a value in the interval [0, 1].
2. RBOX: rotated rectangle, five channels: the distances from a point to the four edges (edges in a fixed order) plus the rotation angle.
3. QUAD: quadrilateral, eight channels: the 8 offsets (x1, y1, ..., x4, y4) from a point to the four corner points.
The position of each character region within the container character areas is accurately labeled and converted into the rotated-rectangle data format, and the deep neural network is trained to obtain a model that accurately positions the container box number characters. Using this model, each small character region can be accurately positioned within the coarsely positioned container character area.
The character correction based on perspective transformation is as follows:
and (4) obtaining QUAD text box coordinates by taking a text positioning model, and arranging clockwise starting from the upper left point. The ith character frame with coordinates of (B)i_y1,Bi_x1,Bi_y2,Bi_x2,Bi_y3,Bi_x3,Bi_y4,Bi_x4) Taking the maximum value of x and y directions as Bi_xmin,Bi_xmgx,Bi_ymin,Bi_ymin. By perspective transformation, the original (B)i_y1,Bi_x1,Bi_y2,Bi_x2,Bi_y3,Bi_x3,Bi_y4,Bi_x4) Perspective transformation of coordinates into (B)i_ymin,Bi_xmin,Bi_ymin,Bi_xmgx,Bi_ymgx,Bi_xmgx,Bi_ymgx,Bi_xmin). Thus, the influence of the shooting angle is eliminated, and the container number identification is prepared for the subsequent container number identification.
Compared with the prior art, the container number positioning method based on coarse-fine character positioning provided by the invention has the following beneficial effects:
(1) Coarse division of the container number region by the deep neural network yields a coarsely positioned box number area, which copes well with interference from the external natural environment and with inaccurate box number positioning caused by poor shooting angles; using the coarsely positioned box number character area also avoids the low efficiency and accuracy of running the fine positioning deep neural network over the whole image.
(2) The deep neural network model for fine positioning of container characters locates the position of the container number quickly and accurately, improving on the speed and accuracy of conventional positioning;
(3) Combining the quadrilateral character boundaries obtained by the container character positioning network with perspective transformation solves the problem of subsequent container number recognition errors caused by serious character inclination due to the shooting angle.
Drawings
FIG. 1 is a flow chart of container number positioning based on coarse-fine character positioning
FIG. 2 is a flow chart of coarse positioning of container number
FIG. 3 is a schematic diagram of a coarse positioning network structure for container characters
FIG. 4 is a schematic illustration of proposal target region pooling
FIG. 5 is a diagram of a regional classification network architecture
FIG. 6 is a diagram of the effect of coarse positioning of the container
FIG. 7 is a text positioning flow chart of the container
FIG. 8 is a flow chart of the multi-channel fully convolutional network
FIG. 9 is a schematic view of the final positioning of the box number
Detailed Description
The following examples illustrate the present invention; detailed embodiments and specific procedures are given on the premise of its technical solution, but the protection scope of the present invention is not limited to the examples below.
According to the technical scheme of the invention, the implementation of the specific algorithm is described in detail following the image processing flow, divided into container picture preprocessing, container number coarse positioning, container number fine positioning, container character correction and container number positioning.
(1) Container picture preprocessing
The acquired RGB image is converted to grey scale with the following formula, to reduce the size of the image and the interference of colour on box number recognition throughout the recognition pipeline:
Grey = 0.3R + 0.59G + 0.11B, where R, G and B represent the three channel values of the image;
median filtering is then applied to the obtained grey-scale image to remove noise interference.
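A minimal sketch of this preprocessing step, assuming OpenCV/NumPy and a BGR input as OpenCV loads it; the 3×3 median window is an assumed choice, since the text does not specify the kernel size:

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    # Grey = 0.3R + 0.59G + 0.11B on the three channel values.
    b, g, r = cv2.split(image_bgr.astype(np.float32))
    grey = np.clip(0.3 * r + 0.59 * g + 0.11 * b, 0, 255).astype(np.uint8)
    # Median filtering removes noise interference; the 3x3 window is assumed.
    return cv2.medianBlur(grey, 3)
```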
(2) Coarse positioning of container number
Explanation of terms
conv convolution
Pooling pooling
relu rectified linear unit
softmax normalized exponential function
We propose an innovative framework for coarse positioning of container numbers; the framework flow is shown in fig. 2. First, a set of basic convolution layers is used to extract the feature map of the candidate image; this feature map is shared by the subsequent region candidate network layers and fully connected layers. The region candidate network is then used to generate region candidate image blocks: this layer judges through softmax whether each anchor belongs to the foreground or the background, and then corrects the anchors with bounding box regression to obtain accurate proposal regions. Next, the input feature map and the candidate target regions are collected by pooling; after the information is integrated, the feature map of each target region is extracted and sent to the subsequent fully connected layers to judge the target category. Finally, the category of the target region is computed from the target region feature map, while bounding box regression is performed again to obtain the final accurate position of the container character part detection box.
Feature extraction
A container image of any size P×Q is first scaled to a fixed size M×N and then fed into the network. The feature extraction part comprises three layer types: conv, pooling and relu. Taking the VGG16 model as an example, its Conv layers part has 13 conv layers, 13 relu layers and 4 pooling layers. In the Conv layers:
1. all conv layers have: kernel_size = 3, pad = 1, stride = 1
2. all pooling layers have: kernel_size = 2, pad = 0, stride = 2
In the convolution layers, all convolutions are padded (pad = 1, i.e., a one-pixel border of zeros is filled in), so the original image becomes (M+2) × (N+2) and the 3×3 convolution then outputs M×N. It is this setting that keeps the conv layers in the Conv layers from changing the input and output matrix sizes. The pooling layers of the Conv layers have kernel_size = 2 and stride = 2, so each M×N matrix passing through a pooling layer becomes (M/2) × (N/2). In summary, throughout the Conv layers, the conv and relu layers do not change the input/output size; only the pooling layers halve the output width and height relative to the input. An M×N matrix therefore leaves the Conv layers with the fixed size (M/16) × (N/16), so the feature maps generated by the Conv layers can be put in correspondence with the original image.
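The size bookkeeping above can be checked in a few lines; this helper simply applies the halving rule of the four pooling layers and is not part of the patented method:

```python
def vgg16_feature_map_size(m: int, n: int) -> tuple[int, int]:
    # conv layers (kernel 3, pad 1, stride 1) preserve the size; each of the
    # four pooling layers (kernel 2, stride 2) halves it: MxN -> (M/16)x(N/16).
    for _ in range(4):
        m, n = m // 2, n // 2
    return m, n

# e.g. vgg16_feature_map_size(800, 608) == (50, 38)
```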
Regional candidate network
The RPN first performs a 3×3 convolution on the input (M/16) × (N/16) feature map and then splits into two paths: one judges the category of the anchors, the other computes the bounding box predictions. The input of the proposal layer is thus 6 values per anchor: the four bbox predictions (x, y, w, h) and the anchor category (foreground and background).
1. Generate anchors and perform bbox regression on all of them (the anchors are generated in exactly the same way as during training).
2. Sort by the input positive softmax scores in descending order and extract the top pre_nms_topN (e.g. 6000) anchors, i.e., the position-corrected positive anchors.
3. Clip positive anchors that exceed the image boundary to the image boundary (preventing proposals from exceeding the image during subsequent RoI pooling).
4. Eliminate very small positive anchors (width < threshold or height < threshold).
5. Apply non-maximum suppression (NMS).
The proposal layer has 3 inputs: the positive/negative anchor classifier result rpn_cls_prob_reshape, the corresponding bbox regression offsets, and the image information; its output is the proposals.
The proposals [x1, y1, x2, y2] are then output. Since the third step maps anchors back to the original image to check whether the boundary is exceeded, the output proposals are at the scale of the M×N input image, which is useful in the subsequent networks. The detection part ends here; what follows is recognition of the container character part.
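A compact NumPy sketch of steps 2–5 above (top-N selection, boundary clipping, small-box removal, NMS); apart from pre_nms_topN = 6000, the thresholds are illustrative assumptions:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float) -> list:
    # Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def filter_proposals(boxes, scores, img_w, img_h,
                     pre_nms_top_n=6000, min_size=16, iou_thresh=0.7):
    order = scores.argsort()[::-1][:pre_nms_top_n]       # step 2: top-N by score
    boxes, scores = boxes[order], scores[order]
    boxes[:, 0::2] = boxes[:, 0::2].clip(0, img_w - 1)   # step 3: clip x to image
    boxes[:, 1::2] = boxes[:, 1::2].clip(0, img_h - 1)   # step 3: clip y to image
    w, h = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    keep = (w >= min_size) & (h >= min_size)             # step 4: drop tiny boxes
    boxes, scores = boxes[keep], scores[keep]
    return boxes[nms(boxes, scores, iou_thresh)]         # step 5: NMS
```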
Target area pooling
The target area pooling works as follows: the proposals are used to extract proposal features from the feature maps, and these are sent to the subsequent fully connected and softmax network for classification (that is, classifying whether each proposal is a container text area or background).
Target area pooling process:
since the proposals are at the corresponding M×N scale, they are first mapped back to the (M/16) × (N/16) feature map scale using the spatial_scale parameter; the feature map region corresponding to each proposal is divided horizontally and vertically into a pooled_w × pooled_h grid; and max pooling is applied to each part of the grid. After this processing, proposals of different sizes all give outputs of the fixed size pooled_w × pooled_h, realizing fixed-length output. Target area pooling is shown schematically in FIG. 4.
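A plain NumPy sketch of this pooling step, assuming a channel-first (C, H, W) feature map and pooled_w = pooled_h = 7 as used below; each grid cell is guaranteed at least one element:

```python
import numpy as np

def roi_pool(feature_map, proposal, spatial_scale=1 / 16, pooled_w=7, pooled_h=7):
    # Map the MxN-scale proposal [x1, y1, x2, y2] back to the feature map scale.
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in proposal]
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]        # (C, h, w) sub-region
    c, h, w = region.shape
    ys = np.linspace(0, h, pooled_h + 1).astype(int)     # row bounds of the grid
    xs = np.linspace(0, w, pooled_w + 1).astype(int)     # column bounds of the grid
    out = np.zeros((c, pooled_h, pooled_w), dtype=region.dtype)
    for i in range(pooled_h):                            # max pooling per grid cell
        for j in range(pooled_w):
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                             xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out                                           # fixed pooled_h x pooled_w
```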
Region classification
The region classification part uses the obtained proposal feature maps to compute, through fully connected layers and softmax, whether each proposal is a container character region, outputting the cls_prob probability vector; at the same time, bounding box regression is used again to obtain the position offset bbox_pred of each proposal, for regressing a more accurate target detection box. The network structure of the region classification part is shown in figure 5.
After the proposal feature maps of size 7×7 = 49 are obtained from RoI pooling, they are sent into the subsequent network, which does 2 things:
1. classifies the proposals through full connection and softmax, which in effect is already category recognition;
2. performs bounding box regression on the proposals again to obtain rectangle boxes of higher precision.
The container number coarse positioning model is trained through the above steps; a container picture is input and the coordinate values of the bounding rectangle of the container character area are obtained, namely the coordinates of the upper-left point and of the lower-right point. The region is then cropped from the original image to obtain the coarsely positioned container number picture; the coarse positioning effect is shown in figure 6.
(3) Container number fine positioning
We propose an innovative framework for container number location based on a fully convolutional network (FCN) and non-maximum suppression (NMS); the framework flow is shown in fig. 7. Unlike previous research, most traditional text detection methods, and some deep-learning-based ones, are multi-stage: several stages must be optimized during training, which inevitably affects the final model and consumes time.
The picture is fed into the multi-channel fully convolutional network part, which generates pixel-level text score maps and geometry channels. As shown in fig. 8, the multi-channel convolutional network is divided into three parts: feature extraction, feature merging and output.
Feature extraction and feature combination:
First, a general network is used as the base layer for feature extraction, and feature maps of different levels are extracted from the backbone feature extraction network. Their sizes are respectively 1/32, 1/16, 1/8 and 1/4 of the input picture, so feature maps of different scales are obtained; this addresses severe variation in text scale, with the early stages predicting small text lines and the later stages predicting large text lines. Then, in the merging layer, the last feature map extracted by the feature extraction network is first sent to the unpooling layer, which enlarges the image by a factor of 2, and is then concatenated with the feature map of the previous layer.
The specific process is as follows:
h_1 = f_1 (1/32 of the input)
g_1 = unpool(h_1) (1/16)
h_2 = conv_3x3(conv_1x1([g_1; f_2])) (f_2: 1/16, h_2: 1/16)
g_2 = unpool(h_2) (1/8)
h_3 = conv_3x3(conv_1x1([g_2; f_3])) (f_3: 1/8, h_3: 1/8)
g_3 = unpool(h_3) (1/4)
h_4 = conv_3x3(conv_1x1([g_3; f_4])) (f_4: 1/4, h_4: 1/4)
g_4 = conv_3x3(h_4) (1/4)
Note: g_i is the merge base and h_i the merged feature map; the operator [·; ·] denotes concatenation along the channel axis. At each merging stage, the feature map from the previous stage is first fed to an unpooling layer to double its size, and then concatenated with the current feature map. Next, a conv_1x1 reduces the number of channels and the computation, followed by a conv_3x3 that fuses the information and finally produces the output of this merging stage. After the last merging stage, a conv_3x3 layer generates the final feature map g_4 of the merge branch and feeds it to the output layer.
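A minimal PyTorch sketch of one merging stage, h_i = conv_3x3(conv_1x1([unpool(h_{i-1}); f_i])); bilinear upsampling stands in for the unpooling, and the channel widths are assumptions since the text does not fix them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergeStage(nn.Module):
    # in_channels must equal channels(h_prev) + channels(f_i).
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.conv3 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, h_prev: torch.Tensor, f_i: torch.Tensor) -> torch.Tensor:
        g = F.interpolate(h_prev, scale_factor=2, mode="bilinear",
                          align_corners=False)    # unpool: double the spatial size
        x = torch.cat([g, f_i], dim=1)            # [g; f_i] along the channel axis
        return self.conv3(F.relu(self.conv1(x)))  # conv_1x1 then conv_3x3 -> h_i
```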
Output:
1. Score map: each pixel is scored, giving a value in the interval [0, 1].
2. RBOX: rotated rectangle, five channels: the distances from a point to the four edges (edges in a fixed order) plus the rotation angle.
3. QUAD: quadrilateral, eight channels: the 8 offsets (x1, y1, ..., x4, y4) from a point to the four corner points.
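These three outputs can be read as convolutional heads over the final merged feature map g_4; the sketch below assumes simple 1×1 convolution heads, a detail the text does not specify:

```python
import torch
import torch.nn as nn

class TextOutputs(nn.Module):
    # Heads over g_4: a 1-channel score map in [0, 1], a 5-channel RBOX head
    # (4 edge distances + rotation angle) and an 8-channel QUAD head
    # (offsets from a point to the four corner points).
    def __init__(self, in_channels: int):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.rbox = nn.Conv2d(in_channels, 5, kernel_size=1)
        self.quad = nn.Conv2d(in_channels, 8, kernel_size=1)

    def forward(self, g4: torch.Tensor):
        return torch.sigmoid(self.score(g4)), self.rbox(g4), self.quad(g4)
```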
(4) Container character correction
The text positioning model yields QUAD text box coordinates, arranged clockwise starting from the upper-left point. The i-th character frame has coordinates (B_i_y1, B_i_x1, B_i_y2, B_i_x2, B_i_y3, B_i_x3, B_i_y4, B_i_x4). The extreme values in the x and y directions are taken as B_i_xmin, B_i_xmax, B_i_ymin, B_i_ymax. By perspective transformation, the original coordinates (B_i_y1, B_i_x1, ..., B_i_y4, B_i_x4) are transformed into (B_i_ymin, B_i_xmin, B_i_ymin, B_i_xmax, B_i_ymax, B_i_xmax, B_i_ymax, B_i_xmin). The influence of the shooting angle is thus eliminated, preparing for the subsequent container number recognition.
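An OpenCV sketch of this correction, mapping each QUAD onto its axis-aligned bounding rectangle; corner order is assumed clockwise from the upper-left point, as stated above:

```python
import cv2
import numpy as np

def correct_character_box(image, quad):
    # quad: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]], clockwise from top-left.
    src = np.float32(quad)
    xs, ys = src[:, 0], src[:, 1]
    # Destination: the axis-aligned rectangle (B_xmin..B_xmax, B_ymin..B_ymax).
    dst = np.float32([[xs.min(), ys.min()], [xs.max(), ys.min()],
                      [xs.max(), ys.max()], [xs.min(), ys.max()]])
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, m, (image.shape[1], image.shape[0]))
```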
(5) Positioning container number
The container number is acquired mainly by positioning the box number line through the length information of the text boxes. External environmental interference has already been eliminated by the coarse positioning, and all the container number information is obtained by the fine positioning model. The text box coordinates produced by the fine positioning model are processed as follows. First, they are sorted from small to large by the y coordinate of the upper-left point of each quadrilateral. Because the container number is distributed on the upper part of the container for any box type, interference information below the middle of the container is eliminated. Then, since the quadrilateral frame formed by the container number is the longest in the x direction, it is taken as the line where the container number is located; text boxes whose upper-left point lies within half a character height of that line in the y direction are also taken as part of the container number.
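A sketch of this selection rule; treating everything below half the image height as "below the middle of the container" is an assumption, since the text does not give the exact cut-off:

```python
def pick_box_number_line(quads, image_height):
    # quads: QUAD boxes [[x1, y1], ..., [x4, y4]], clockwise from the top-left.
    quads = sorted(quads, key=lambda q: q[0][1])             # sort by top-left y
    upper = [q for q in quads if q[0][1] < image_height / 2] or quads
    # The box longest in the x direction is the container number line.
    return max(upper, key=lambda q: max(p[0] for p in q) - min(p[0] for p in q))
```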
The invention provides a container number positioning method: a camera shoots a container picture and uploads it to the cloud, and a container body deep neural network model coarsely crops the container number character area, solving the inaccurate container number positioning caused by the natural environment. Positioning the container character region with the deep neural network character positioning model overcomes the lost characters and inaccurate positioning of traditional algorithms, while also avoiding the low efficiency and accuracy of running the accurate positioning network over the whole image. Combining perspective transformation with the quadrilateral character coordinates removes the shooting angle deviation, so the characters lie in the horizontal direction and are convenient to recognise later. The scheme markedly improves the stability, speed and universality of container number positioning.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A container number positioning method based on coarse-fine character positioning, characterized by comprising the following steps:
s1: acquiring a container photo;
s2: carrying out coarse positioning of the container number area through a target detection deep neural network to obtain a picture of the container number character area;
s3: carrying out accurate character positioning on the obtained picture of the container number character area by using a deep neural network;
s4: carrying out perspective transformation on the quadrilateral character frames of the character areas obtained by positioning, and carrying out angle correction on the container number characters;
s5: and obtaining the box number of the container by using the length information of the text box.
2. The container number positioning method based on coarse-fine character positioning as claimed in claim 1, wherein the operation procedure of step S1 is:
s11: shooting a picture of the container by using a camera;
s12: and uploading the picture to a cloud-end interface through a network.
3. The container number positioning method based on coarse-fine character positioning as claimed in claim 1, wherein the container number coarse positioning network model structure and its training and detection method in step S2 are:
S21: taking a large number of container photos and labeling picture frames for the container number character coarse positioning areas; randomly dividing a large number of pictures into a test set and a training data set; processing all test and training data uniformly and generating training files for deep neural network training; setting parameters such as the number of iterations and the learning rate for model training, and saving the trained model.
S22: loading the trained container body detection model and detecting the input container picture to obtain the position area coordinates (x_cont_1, y_cont_1, x_cont_2, y_cont_2) of the container.
S23: cropping the container picture according to S22 to obtain a picture of the container body part.
4. The container number positioning method based on coarse-fine character positioning as claimed in claim 1, wherein the operation procedure of step S3 is:
S31: collecting pictures of the container body, labeling the character areas, setting the relevant training parameters, and training an end-to-end character region positioning model into which container pictures can be input to obtain the container number;
S32: loading the end-to-end character region positioning model obtained in S31 and inputting the picture of the container body part obtained in S2 for character region positioning, obtaining a series of text region coordinate sets (x_i_1, y_i_1), (x_i_2, y_i_2), (x_i_3, y_i_3), (x_i_4, y_i_4), each representing the quadrilateral coordinates of the i-th character area.
5. The container number positioning method based on coarse-fine character positioning as claimed in claim 1, wherein the operation procedure of step S4 is:
the region enclosed by the coordinate sets is subjected to perspective transformation, so that subsequent box number recognition is not affected by the shooting angle and the box number lies in the horizontal direction.
6. The container number positioning method based on coarse-fine character positioning as claimed in claim 1, wherein the operation procedure of step S5 is:
the coordinate sets obtained in S32 are ordered vertically by their centers, and the extreme values in the left-right direction are taken as the coordinates of the minimum rectangle bounding each text box, with upper-left point (X_i_1, Y_i_1) and lower-right point (X_i_2, Y_i_2); among these coordinate points, the text box with the smallest y value and a length-width ratio greater than 3 is taken as the line where the box number is located, and that line of text boxes is positioned as the box number line, completing the positioning of the container number.
CN201910777129.6A 2019-08-23 2019-08-23 Container number positioning method based on coarse-fine character positioning Pending CN110598697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910777129.6A CN110598697A (en) 2019-08-23 Container number positioning method based on coarse-fine character positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910777129.6A CN110598697A (en) 2019-08-23 Container number positioning method based on coarse-fine character positioning

Publications (1)

Publication Number Publication Date
CN110598697A true CN110598697A (en) 2019-12-20

Family

ID=68855056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910777129.6A Pending CN110598697A (en) 2019-08-23 2019-08-23 Container number positioning method based on coarse-fine character positioning

Country Status (1)

Country Link
CN (1) CN110598697A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894255A (en) * 2010-06-13 2010-11-24 电子科技大学 Wavelet transform-based container number positioning method
CN108596166A (en) * 2018-04-13 2018-09-28 华南师范大学 A kind of container number identification method based on convolutional neural networks classification
CN109190625A (en) * 2018-07-06 2019-01-11 同济大学 A kind of container number identification method of wide-angle perspective distortion
CN109784272A (en) * 2019-01-13 2019-05-21 南京邮电大学盐城大数据研究院有限公司 A kind of container identifying system and container recognition methods

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563502A (en) * 2020-05-09 2020-08-21 腾讯科技(深圳)有限公司 Image text recognition method and device, electronic equipment and computer storage medium
CN111563502B (en) * 2020-05-09 2023-12-15 腾讯科技(深圳)有限公司 Image text recognition method and device, electronic equipment and computer storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191220