CN110942057A - Container number identification method and device and computer equipment - Google Patents

Container number identification method and device and computer equipment

Info

Publication number
CN110942057A
Authority
CN
China
Prior art keywords
container number
decoding result
image
result
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811113365.XA
Other languages
Chinese (zh)
Inventor
桂一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811113365.XA priority Critical patent/CN110942057A/en
Publication of CN110942057A publication Critical patent/CN110942057A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a container number identification method and device and computer equipment. The container number identification method comprises the following steps: positioning a target area where the container number is located from an image to be identified containing the container number, and performing spatial transformation on the target area to obtain a transformed target area; performing feature extraction on the transformed target area to obtain a first feature map; inputting the first feature map into a pre-trained container number identification model, where the container number identification model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, and outputs a decoding result after decoding the encoding result; and determining the container number in the image to be identified according to the decoding result. The container number identification method can accurately identify the container number from the image to be identified.

Description

Container number identification method and device and computer equipment
Technical Field
The application relates to the field of image recognition, in particular to a container number recognition method, a container number recognition device and computer equipment.
Background
In gate and port operations, each container is typically assigned a container number so that individual containers can be identified by their numbers. In recent years, in order to reduce manual transcription errors and labor cost, container numbers are usually identified through automatic identification technology.
The related technology discloses a container number identification method, which comprises the following steps: positioning a target area where the container number is located from the image to be identified; carrying out character segmentation on the target area; respectively identifying a plurality of characters obtained by segmentation to obtain a plurality of identification results; and combining a plurality of identification results to obtain the container number.
When this method is used to identify the container number, character segmentation must be performed on the target area where the container number is located, and the result depends heavily on that segmentation. Under conditions such as poor lighting, contamination and large inclination, the character segmentation is inaccurate, which in turn leads to low identification accuracy.
Disclosure of Invention
In view of this, the present application provides a container number identification method, apparatus and computer device, so as to provide a container number identification method with high identification accuracy.
A first aspect of the present application provides a method for identifying a container number, where the method includes:
positioning a target area where the container number is located from an image to be identified containing the container number, and performing spatial transformation on the target area to obtain a transformed target area;
performing feature extraction on the transformed target area to obtain a first feature map;
inputting the first feature map into a pre-trained container number identification model, where the container number identification model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, and outputs a decoding result after decoding the encoding result;
and determining the container number in the image to be identified according to the decoding result.
In a second aspect, the present application provides a container number identification device, which comprises a detection module, an identification module and a processing module, wherein,
the detection module is used for positioning a target area where the container number is located from the image to be identified containing the container number;
the identification module is used for carrying out spatial transformation on the target area to obtain a transformed target area;
the identification module is further used for extracting the features of the transformed target area to obtain a first feature map;
the identification module is further configured to input the first feature map into a pre-trained container number identification model, serialize the first feature map by the container number identification model to obtain a feature sequence, encode the feature sequence to obtain an encoding result, and decode the encoding result and output a decoding result;
and the processing module is used for determining the container number in the image to be identified according to the decoding result.
A third aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided by the first aspect of the present application.
A fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
According to the container number identification method, device and computer equipment provided by the present application, a target area where the container number is located is positioned from an image to be identified containing the container number, the target area is spatially transformed to obtain a transformed target area, feature extraction is performed on the transformed target area to obtain a first feature map, the first feature map is input into a pre-trained container number identification model, the container number identification model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, and outputs a decoding result after decoding the encoding result, so that the container number in the image to be identified is determined according to the decoding result. In this way, the container number can be identified based on the target area without character segmentation of the target area, and the identification accuracy is high.
Drawings
Fig. 1 is a flowchart of a first embodiment of a container number identification method provided in the present application;
FIG. 2 is a schematic view of a container number shown in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the serialization of a first feature map according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an attention model shown in an exemplary embodiment of the present application;
fig. 5 is a flowchart of a second embodiment of a container number identification method provided in the present application;
FIG. 6 is a schematic diagram of a detection network shown in an exemplary embodiment of the present application;
fig. 7 is a flowchart of a third embodiment of a container number identification method provided in the present application;
FIG. 8 is a schematic diagram of an implementation of a method for identifying a container number according to an exemplary embodiment of the present application;
FIG. 9 is a hardware block diagram of a computer device in which a container number identification device is located according to an exemplary embodiment of the present application;
fig. 10 is a schematic structural diagram of a first embodiment of a container number identification device provided in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
The application provides a container number identification method, a container number identification device and computer equipment, and aims to provide a container number identification method with high identification accuracy.
The container number identification method and device provided by the application can be applied to computer equipment. For example, it can be applied to an image capturing apparatus (the image capturing apparatus may be a camera), and for example, it can also be applied to a server. In the present application, this is not limited.
Several specific embodiments are given below to describe the technical solutions of the present application in detail, and these specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a first embodiment of a container number identification method provided in the present application. Referring to fig. 1, the method provided in this embodiment may include:
s101, positioning a target area where the container number is located from an image to be identified containing the container number, and carrying out space transformation on the target area to obtain a transformed target area.
It should be noted that the image to be identified is a snapshot frame image acquired by the image acquisition device, which may be an image acquired by the image acquisition device in real time and containing a container number, or an image stored by the image acquisition device and containing a container number.
Specifically, fig. 2 is a schematic diagram of a container number shown in an exemplary embodiment of the present application. Referring to fig. 2, container numbers may be distributed horizontally or vertically, and the composition structure of a container number may be represented as XYZ, where X is a 4-character main box number (all letters), Y is a 7-digit number, and Z is a 4-character ISO code (note that the ISO code may not appear in some cases). It should be noted that the last digit of Y is a check code, which can be calculated from the 4 letters of X and the first 6 digits of Y.
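For illustration only, this XYZ composition can be sketched in Python as a simple structure check; the regular expression, the helper name and the example number are assumptions introduced here, not part of the original disclosure:

```python
import re

# Assumed helper: checks the XYZ composition described above
# (4 letters, 7 digits, then an optional 4-character ISO code).
CONTAINER_NUMBER_RE = re.compile(r"^([A-Z]{4})(\d{7})([A-Z0-9]{4})?$")

def parse_container_number(text: str):
    """Split a candidate string into main box number X, 7-digit number Y and ISO code Z."""
    match = CONTAINER_NUMBER_RE.match(text.replace(" ", "").upper())
    if match is None:
        return None
    x, y, z = match.groups()
    return {"X_main_box_number": x, "Y_digits_with_check": y, "Z_iso_code": z}

print(parse_container_number("CSQU3054383 22G1"))
# {'X_main_box_number': 'CSQU', 'Y_digits_with_check': '3054383', 'Z_iso_code': '22G1'}
```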
Further, a related object detection algorithm may be employed to locate the target area where the container number is located from the image to be identified containing the container number; this is not limited in this embodiment. For example, in one embodiment, the target area where the container number is located may be located from the image to be identified based on a YOLO (You Only Look Once) model.
For another example, in another embodiment, the detection network may be used to locate the target area where the container number is located from the image to be identified that contains the container number.
For example, a deep convolutional neural network for detecting container numbers can be constructed, whose input is an image and whose output is the coordinate position of the container number in the image. A quadrilateral is used to represent the region coordinates of each row or column; when the container number consists of a plurality of rows or columns, a plurality of quadrilateral coordinates are output, and a quadrilateral may be inclined, indicating that the container number has a certain orientation.
The constructed deep convolutional neural network may be a modified network based on the SSD network. For example, in one embodiment, the network may comprise a 29-layer fully convolutional structure, where the first 13 layers are inherited from the VGG-16 network (the last fully connected layer of VGG-16 is converted into a convolutional layer), followed by 10 further fully convolutional layers, which are in turn followed by 6 Text-box layers. The Text-box layers are the key component of this SSD-based modified network: the 6 fully convolutional structures in the 6 Text-box layers are respectively connected to 6 feature maps of the preceding network, and at each feature-map position a Text-box layer predicts an n-dimensional vector (the n-dimensional vector may comprise 2 dimensions for the text classification, 4 dimensions for the horizontal bounding rectangle, 5 dimensions for the rotated bounding rectangle and 8 dimensions for the quadrilateral).
It should be noted that the detailed description of the detection network will be described in the following embodiments, and will not be described herein.
Further, an STN may be used to perform the spatial transformation on the target region to obtain the transformed target region. In a specific implementation, the target region is input into a pre-trained STN (Spatial Transformer Network), and the STN performs spatial transformation on the target region and outputs the transformed target region.
Specifically, the STN does not require key-point calibration and can apply spatial transformations (including, but not limited to, translation, scaling and rotation) to the target region. Performing identification on the transformed target region can therefore improve the identification accuracy.
It should be noted that, for the specific structure and the specific implementation principle of the STN network, reference may be made to the description in the related art, and details are not described herein.
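As a rough illustration of this step, a spatial transformer can be sketched in PyTorch as a small localization network that regresses affine parameters, followed by grid generation and sampling. The framework, layer sizes and input shape below are assumptions for the sketch, not the patent's STN architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSTN(nn.Module):
    """Minimal spatial-transformer sketch: localization net -> affine grid -> sampler."""
    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc_theta = nn.Linear(10 * 4 * 4, 6)
        # Start from the identity transform so training begins with "no warp".
        self.fc_theta.weight.data.zero_()
        self.fc_theta.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.fc_theta(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

warped = SimpleSTN()(torch.randn(1, 3, 64, 256))  # transformed target region
```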
And S102, performing feature extraction on the transformed target region to obtain a first feature map.
Specifically, the feature extraction may be performed on the transformed target region by using a conventional method. For example, a Scale-invariant Feature Transform (SIFT) algorithm is used to extract features of the transformed target region. Of course, the neural network may also be used to perform feature extraction on the transformed target region, for example, in an embodiment, a specific implementation process of this step may include:
inputting the transformed target area into a neural network for feature extraction, and performing feature extraction on the transformed target area by a specified layer in the neural network; the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer; and determining the output result of the specified layer as the first feature map.
Specifically, the neural network for feature extraction may include a convolutional layer for filtering the input transformed target region; in that case, the extracted first feature map is the filtering result output by the convolutional layer. The neural network for feature extraction may further include a pooling layer and/or a fully connected layer. For example, in one embodiment, the neural network for feature extraction includes a convolutional layer, a pooling layer and a fully connected layer, where the convolutional layer filters the input transformed target region, the pooling layer compresses the filtering result, and the fully connected layer aggregates the compressed result; in that case, the aggregation result output by the fully connected layer is the extracted first feature map.
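A minimal sketch of such a feature-extraction network is given below; the patent only requires a convolutional layer, optionally followed by pooling and/or fully connected layers, so the channel counts, kernel sizes and input shape here are assumptions:

```python
import torch
import torch.nn as nn

# Illustrative only: the convolutional layers filter the transformed target region
# and the pooling layers compress the filtering results.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

region = torch.randn(1, 3, 32, 256)            # transformed target region
first_feature_map = feature_extractor(region)  # shape (1, 128, 8, 64)
```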
S103, inputting the first feature map into a pre-trained container number identification model, where the container number identification model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, and decodes the encoding result and outputs a decoding result.
Specifically, fig. 3 is a schematic diagram illustrating the serialization of the first feature map according to an exemplary embodiment of the present application. Referring to fig. 3, the process of serializing the first feature map may include:
(1) sliding a preset sliding window over the first feature map according to a preset moving step, and cutting out the local feature map at each position of the sliding window;
(2) determining all the cut-out local feature maps as the feature sequence.
Specifically, in an embodiment, the container number identification model may be an attention model, and the attention model may include a convolutional network, and step (1) may be implemented by the convolutional network.
In addition, the size of the preset sliding window is adapted to the first feature map. For example, when the dimension of the first feature map is A × B × C (where A and B are the height and width of the first feature map, respectively, and C is the number of channels it contains), the size of the sliding window may be set to A × A. The preset moving step is set according to actual needs and is not limited in this embodiment; for example, in one embodiment, the preset moving step is 2.
Further, referring to fig. 3, in a specific implementation, the preset sliding window may be placed at one end of the first feature map and the local feature map at the window position is cut out; the window is then moved by the preset step and the local feature map at the new position is cut out. This process is repeated until the sliding window reaches the other end of the first feature map. Finally, all the cut-out local feature maps are determined as the feature sequence.
It should be noted that, when the first feature map is segmented using the preset sliding window and the preset moving step, if the final remaining portion cannot be fully covered by the sliding window, the first feature map may be padded. In addition, since the first feature map contains a plurality of channels, each cut-out local feature map also contains a plurality of channels.
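The sliding-window serialization can be sketched as follows; the A x A window matched to the map height, the step of 2 and the edge-replication padding are illustrative choices, since the patent does not fix the padding scheme:

```python
import numpy as np

def serialize_feature_map(feature_map: np.ndarray, step: int = 2) -> list:
    """Slide an A x A window along the width of a (C, A, B) first feature map and
    return the cut-out local feature maps as a sequence."""
    channels, height, width = feature_map.shape
    window = height                      # window size A x A, matched to the map height
    remainder = (width - window) % step
    if remainder:                        # pad so the last window is fully covered
        pad = step - remainder
        feature_map = np.pad(feature_map, ((0, 0), (0, 0), (0, pad)), mode="edge")
        width += pad
    return [feature_map[:, :, i:i + window]
            for i in range(0, width - window + 1, step)]

sequence = serialize_feature_map(np.random.rand(128, 8, 64))
print(len(sequence), sequence[0].shape)   # 29 local feature maps of shape (128, 8, 8)
```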
Further, fig. 4 is a schematic diagram of an attention model according to an exemplary embodiment of the present application. Referring to fig. 4, the attention model may further include an input layer, a hidden layer and an output layer connected in sequence, where X1, X2, X3, X4, ..., Xm denote the feature sequence input to the input layer; αt,1, αt,2, αt,3, ..., αt,m denote the weight parameters of the features in the feature sequence at time t (the dimension of each weight parameter is the same as that of the corresponding feature); Ct denotes the encoding result at time t; st-1 and st denote the context-dependent hidden states at successive times (at the initial time, the hidden state is 0); and yt, yt+1 denote the decoding results at successive times. In this example, the decoding result at each time may include the confidence of each candidate character at that time (the candidate characters are the predetermined categories of the output layer, which is composed of classifiers; in this example there are 37 categories: 0-9, A-Z and one terminator) and the character recognized at that time (the recognized character is the candidate character with the highest confidence among the candidate characters).
Referring to fig. 4, a specific implementation process of encoding the feature sequence to obtain an encoding result and outputting a decoding result after decoding the encoding result is described in detail below, and the process may include:
(1) calculating the weight parameter of each feature in the feature sequence at each time.
Specifically, this step is implemented by the input layer. The weight parameter of each feature in the feature sequence at each time can be calculated according to a first formula:
et,i = φ(W·st-1 + U·Xi)
αt,i = exp(et,i) / Σj=1..m exp(et,j)
where αt,i is the weight parameter of the i-th feature in the feature sequence at time t; Xi is the i-th feature in the feature sequence; st-1 is the hidden state at time t-1; φ is an activation function; and W and U are model parameters of the attention model.
(2) calculating the encoding result at each time according to the weight parameters of the features in the feature sequence at that time and the feature sequence itself.
Specifically, this step is implemented by the hidden layer. The implementation may include: performing a weighted summation over the feature sequence using the weight parameters of the features at that time, and determining the resulting weighted sum as the encoding result at that time.
Referring to the foregoing description, the process may be represented by a second formula:
Ct = Σi=1..m αt,i·Xi
where Ct is the encoding result at time t.
(3) calculating the context-dependent hidden state at each time according to the feature sequence and the encoding result at that time.
Specifically, this step is implemented by the hidden layer. Further, the context-dependent hidden state at each time may be calculated according to a third formula:
st = LSTM(st-1, Ct, yt-1)
That is, the hidden state at time t is related to the hidden state at time t-1, the encoding result Ct at time t, and the decoding result yt-1 output by the attention model at time t-1.
(4) obtaining the decoding result at each time according to the context-dependent hidden state at that time.
Specifically, this step is implemented by the output layer. Further, the decoding result at each time may be calculated by a fourth formula:
yt = softmax(st)
Specifically, the decoding result at each time includes the confidence of each candidate character at that time and the character recognized at that time; the recognized character is the candidate character with the highest confidence among the candidate characters.
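Putting the four formulas together, one decoding step of such an attention decoder might be sketched as follows. PyTorch is used for illustration, the local feature maps are assumed to be flattened into vectors, and the hidden size, the embedding of the previous output and the output projection are assumptions rather than details disclosed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoderStep(nn.Module):
    """One time step: weights alpha_t -> context C_t -> s_t = LSTM(...) -> y_t = softmax(...)."""
    def __init__(self, feat_dim: int, hidden: int, num_classes: int = 37):
        super().__init__()
        self.W = nn.Linear(hidden, hidden, bias=False)       # model parameter W
        self.U = nn.Linear(feat_dim, hidden, bias=False)     # model parameter U
        self.v = nn.Linear(hidden, 1, bias=False)            # scoring vector (assumed)
        self.embed = nn.Embedding(num_classes, hidden)       # embeds previous output y_{t-1}
        self.cell = nn.LSTMCell(feat_dim + hidden, hidden)
        self.out = nn.Linear(hidden, num_classes)            # 0-9, A-Z and a terminator

    def forward(self, features, prev_state, prev_cell, prev_y):
        # features: (m, feat_dim); prev_state, prev_cell: (1, hidden); prev_y: (1,) class index
        scores = self.v(torch.tanh(self.W(prev_state) + self.U(features)))  # (m, 1)
        alpha = F.softmax(scores, dim=0)                                    # weights alpha_{t,i}
        context = (alpha * features).sum(dim=0, keepdim=True)               # encoding result C_t
        lstm_in = torch.cat([context, self.embed(prev_y)], dim=1)
        state, cell = self.cell(lstm_in, (prev_state, prev_cell))           # hidden state s_t
        return F.softmax(self.out(state), dim=1), state, cell               # decoding result y_t

step = AttentionDecoderStep(feat_dim=128, hidden=256)
y_t, s_t, c_t = step(torch.randn(20, 128), torch.zeros(1, 256),
                     torch.zeros(1, 256), torch.zeros(1, dtype=torch.long))
```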
According to the method provided by this embodiment, the attention model is used to identify the spatially transformed target area, so that the container number can be identified based on the target area without character segmentation, and the accuracy is high.
And S104, determining the container number in the image to be identified according to the decoding result.
Specifically, in an embodiment, the characters identified in the decoding result can be directly and sequentially combined, and the combination result is determined as the container number in the image to be identified.
In the method provided by this embodiment, a target area where the container number is located is positioned from an image to be identified containing the container number, the target area is spatially transformed to obtain a transformed target area, feature extraction is performed on the transformed target area to obtain a first feature map, the first feature map is input into a pre-trained container number identification model, the model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, and outputs a decoding result after decoding the encoding result, so that the container number in the image to be identified is determined according to the decoding result. In this way, the container number can be identified based on the target area where the whole container number is located, without character segmentation, and the identification accuracy is high. In addition, the method provided by the present application can identify rotated, curved and inclined container numbers (that is, container numbers with large deformation) without adding manual marks, and therefore has wide applicability.
Fig. 5 is a flowchart of a second embodiment of a container number identification method provided in the present application. Referring to fig. 5, in the method provided in this embodiment, based on the above embodiment, in step S101, a specific implementation process of locating a target area where a container number is located in an image to be identified that includes the container number may include:
s501, inputting the image to be recognized into a pre-trained detection network, extracting multi-level features of the image to be recognized by the detection network to obtain a specified number of second feature maps, classifying and performing position regression on each second feature map respectively, and outputting classification results and position information of a plurality of candidate regions; and the dimensions of the specified number of second feature maps are different.
In particular, the detection network may be implemented by convolutional layers. The specified number is set according to actual needs. For example, in one embodiment, the specified number is 6.
Fig. 6 is a schematic diagram of a detection network according to an exemplary embodiment of the present application. Referring to fig. 6, the detection network may include a 29-layer full convolutional network structure, in which the first 13 layers of the full convolutional network structure are inherited from the VGG-16 network (the last fully connected layer in the VGG-16 network is converted into a convolutional layer), and are followed by 10 layers of the full convolutional network structure (such as Conv6 to Conv11_2 in fig. 6, in which Conv8_2, Conv9_2, Conv10_2, and Conv11_2 are respectively preceded by one layer of the full convolutional network structure, which is not shown in fig. 6). It should be noted that, referring to fig. 6, the 23-layer full convolution network structure is used to perform multi-layer feature extraction, so as to obtain 6 second feature maps.
Further, with continued reference to FIG. 6, the 23-layer full convolutional network structure is followed by 6 Text-box layers, each of which may be implemented by a full convolutional network structure. Referring to fig. 6, each Text-box layer is connected to the previous full convolution network structure, and is configured to perform classification and position regression on the second feature map output by the full convolution network structure, and output classification results and position information of the multiple candidate regions.
In addition, referring to fig. 6, an NMS layer is connected to the back of the Text-box layer, and is configured to perform non-maximum suppression processing on the plurality of candidate regions based on the classification result and the location information of each candidate region, so as to obtain a target region where the container number is located.
It should be noted that, in an embodiment, the classification result may be a binary classification result of the foreground and the background. In addition, the position information of a candidate region may be characterized by a 12-dimensional vector including coordinates of four points of a quadrangle containing the candidate region (8-dimensional), and coordinates of a center point of a horizontal rectangle circumscribing the quadrangle, a width, and a height (4-dimensional). Of course, in an embodiment, the position information may further include coordinates and a width (5 dimensions) of two diagonal points of the rotation boundary rectangle corresponding to the candidate region.
It should be noted that, because the dimensions of the plurality of second feature maps are different, that is, the receptive fields of the plurality of second feature maps are different, the finally obtained target region is obtained by performing classification and position regression on the feature maps of the plurality of different receptive fields, and has a strong multi-scale detection capability.
The implementation principle of the Text-box layer for classifying and position regression of the second feature map is briefly described below.
Specifically, the Text-box layer includes three parts: a candidate frame layer, a classification layer and a position regression layer. The candidate frame layer is used for generating, according to a preset rule, a plurality of candidate frames of different sizes centered at each pixel position in the second feature map, and then providing the candidate frames to the classification layer and the position regression layer for classification judgment and position refinement.
Further, the classification layer outputs a probability that each candidate box belongs to the foreground and the background. The position regression layer outputs position information of each candidate frame. It should be noted that the classification layer and the position regression layer are implemented by convolution layers. In addition, to accommodate text detection, the convolutional layer may employ a 3 x 5 convolutional kernel.
For example, in one embodiment, the dimension of one of the second feature maps is 40 × 42 × 128, where 40 × 42 represents the height and width of the second feature map, and 128 represents the number of channels included in the second feature map. The candidate frame layer generates 20 different candidate frames at the position of each pixel point in the second feature map as a center according to a preset rule (for a specific implementation principle, refer to the description in the related art, and no further description is given here).
Further, the dimension of the convolution kernel in the classification layer is 40 × 3 × 5 × 128, where 40 denotes the number of convolution kernels and 3 × 5 denotes the size of each kernel, and the stride of the convolution is 1. In this way, the classification layer performs convolution on the second feature map to obtain a first convolution processing result with dimension 40 × 40 × 42. Further, for the first target convolution processing result corresponding to each pixel position in the first convolution processing result (the first target convolution processing result includes 40 convolution values), the first target convolution processing result characterizes the classification results of the 20 candidate frames corresponding to that position (the classification result of each candidate frame includes the two classes, foreground and background, i.e. two dimensions).
Further, for example, the dimension of the convolution kernel of the position regression layer is 240 × 3 × 5 × 128, where 240 represents the number of convolution kernels and 3 × 5 represents the size of the convolution kernel. The step size of the convolution kernel shift is 1. In this way, the position regression layer performs convolution processing on the second feature map to obtain a second convolution processing result. The dimension of the second convolution processing result is 240 × 40 × 42. And for a second target convolution processing result corresponding to each pixel point in the second convolution processing result (the second target convolution processing result includes 240 convolution values), the second target convolution processing result represents position information of 20 candidate frames corresponding to the pixel point. Referring to the foregoing description, the position information of each candidate frame includes 12 dimensions. For example, the first 12 convolution values of the 240 convolution values represent the position information of the first candidate box.
The above-mentioned candidate frame is understood as a candidate region.
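The classification and regression convolutions of such a Text-box layer can be sketched with the dimensions of the example above (20 candidate frames per position, 3 x 5 kernels); the padding is an assumption made here so that the 40 x 42 spatial layout is preserved:

```python
import torch
import torch.nn as nn

num_boxes = 20                                      # candidate frames per feature-map position
second_feature_map = torch.randn(1, 128, 40, 42)    # 40 x 42 map with 128 channels

cls_conv = nn.Conv2d(128, num_boxes * 2,  kernel_size=(3, 5), padding=(1, 2))  # foreground/background
reg_conv = nn.Conv2d(128, num_boxes * 12, kernel_size=(3, 5), padding=(1, 2))  # 12-D position vector

cls_out = cls_conv(second_feature_map)   # (1, 40, 40, 42): 20 boxes x 2 classes per position
reg_out = reg_conv(second_feature_map)   # (1, 240, 40, 42): 20 boxes x 12 values per position
```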
S502, based on the classification result and the position information of each candidate area, performing non-maximum suppression processing on the plurality of candidate areas to obtain a target area where the container number is located.
Continuing to refer to FIG. 6, in the example shown in FIG. 6, this step may be implemented by detecting the NMS layer in the network.
It should be noted that, for a specific implementation principle and implementation procedure of non-maximum suppression, reference may be made to the description in the related art, and details are not described here.
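For reference, standard non-maximum suppression over axis-aligned boxes can be sketched as below; the patent applies the same idea to the quadrilateral candidate regions and their classification scores, which is not reproduced here:

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, threshold: float = 0.5) -> list:
    """Keep the highest-scoring box, drop candidates that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best, order = order[0], order[1:]
        keep.append(int(best))
        if order.size == 0:
            break
        order = order[iou(boxes[best], boxes[order]) < threshold]
    return keep
```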
In the method provided by this embodiment, an image to be recognized is input to a pre-trained detection network, the detection network performs multi-level feature extraction on the image to be recognized to obtain a specified number of second feature maps, and performs classification and position regression on each second feature map, respectively, to output classification results and position information of a plurality of candidate regions, and performs non-maximum suppression processing on the plurality of candidate regions based on the classification results and the position information of each candidate region to obtain a target region where a container number is located. In this way, the dimensions of the second feature maps are different, that is, the receptive fields of the second feature maps are different, and the finally obtained target region is obtained by performing classification and position regression on the feature maps of the different receptive fields, so that the method has strong multi-scale detection capability. Therefore, the target area where the container number is located can be accurately located.
Optionally, in a possible implementation manner of the present application, in step S101, a specific implementation process of locating a target area where a container number is located in an image to be identified that includes the container number may include:
(1) adjusting the size of the image to be identified to obtain a plurality of target images of different sizes.
For example, in an embodiment, the target images with different sizes may be obtained by performing interpolation processing or downsampling processing on the image to be recognized.
(2) Inputting the target images into a pre-trained detection network aiming at each target image, performing multi-level feature extraction on the target images by the detection network to obtain a specified number of second feature maps, performing classification and position regression on each second feature map respectively, and outputting classification results and position information of a plurality of candidate regions; wherein the specified number of second feature maps differ in dimension.
(3) performing non-maximum suppression on the plurality of candidate areas based on the classification result and the position information of each candidate area, to obtain the target area where the container number is located in the target image.
Specifically, the specific implementation process and implementation principle of steps (2) and (3) may refer to the description in the foregoing embodiments, and are not described herein again.
(4) determining the target area where the container number is located in the image to be identified according to the target areas where the container number is located in the target images.
Specifically, non-maximum suppression may be performed on the target areas where the container number is located in the plurality of target images, so as to obtain the target area where the container number is located in the image to be identified.
According to the method provided by this embodiment, the size of the image to be identified is adjusted to obtain a plurality of target images of different sizes, and the target area where the container number is located in the image to be identified is then positioned based on these target images. In this way, the positioning accuracy can be further improved.
It should be noted that the networks used in the present application are all networks trained in advance. The training process of the network may include:
(1) constructing a network;
for example, in constructing a detection network, the input is set as an image, and the output is set as position information of an area where a container number is located. For example, a quadrangle, which may be inclined to indicate that a box number has a certain direction, is used to indicate the area coordinates of each row or each column, and when a box number is composed of a plurality of rows or a plurality of columns, a plurality of quadrangle coordinates are output.
For another example, when constructing the identification network, the input is set as the area where the container number is located and the output is the container number character string, represented as one line in XYZ form, where X represents the 4-character main box number, Y represents the 7-digit number, and Z represents the 4-character ISO code.
(2) Obtaining a training sample;
for example, in this example, when training the detection network, the label information of the training sample is the position information of the area where the container number is located. It should be noted that a complete box number may be composed of multiple rows or columns, and each row or column should be labeled with coordinates of a quadrangle, which encloses all characters of the row or column, but should not leave too much space, and the quadrangle may be tilted to indicate that the box number has a certain direction.
For another example, when training the container number identification network, the label information of a training sample is the container number character string. It should be noted that a complete container number may be composed of multiple rows or multiple columns; the labeling result is written as one line in XYZ format, where X represents the 4-character main box number, Y represents the 7-digit number and Z represents the 4-character ISO code. The last digit of Y is a check digit that can be calculated from the 4 letters of X and the first 6 digits of Y, and the labeling should be verified accordingly.
(3) And training the network by adopting the training set to obtain the trained network.
Specifically, the network parameters in the network may be set to specified values, and then the network may be trained by using the obtained training samples to obtain a trained network.
Specifically, the process may include two stages of forward propagation and backward propagation: forward propagation, namely inputting a training sample, performing forward propagation on the training sample to extract data characteristics, and calculating a loss function; and backward propagation, namely performing backward propagation from the last layer of the network to the front layer of the network by using the loss function, and modifying the network parameters of the network by using a gradient descent method so as to converge the loss function.
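A generic sketch of this forward/backward training procedure is shown below; the loss function, optimiser and learning rate are assumptions, since the patent does not name them:

```python
import torch
import torch.nn as nn

def train(network: nn.Module, loader, epochs: int = 10, lr: float = 1e-3):
    """Illustrative loop: forward pass and loss, then back-propagation and gradient descent."""
    optimiser = torch.optim.SGD(network.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()                    # assumed loss function
    for _ in range(epochs):
        for samples, labels in loader:                   # loader yields (sample, label) batches
            optimiser.zero_grad()
            loss = criterion(network(samples), labels)   # forward propagation + loss
            loss.backward()                              # backward propagation
            optimiser.step()                             # gradient-descent parameter update
    return network
```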
Fig. 7 is a flowchart of a third embodiment of a container number identification method provided in the present application. Referring to fig. 7, in the method provided in this embodiment, based on the above embodiment, in step S104, the process of determining the container number in the image to be identified according to the decoding result may include:
s701, judging whether the decoding result meets a specified check rule.
Specifically, the specific implementation process of this step may include:
(1) judging whether the composition structure of the decoding result matches the composition structure of the container number.
Specifically, as described above, the composition structure of a container number can be expressed in XYZ form, where X is a 4-character main box number (all letters), Y is a 7-digit number, and Z is a 4-character ISO code (note that the ISO code may not appear in some cases). In this step, when the first 4 characters identified in the decoding result are all letters and the 5th to 11th characters are all digits, it is determined that the composition structure of the decoding result matches the composition structure of the container number; otherwise, it is determined that they do not match.
(2) If not, determining that the decoding result does not meet the check rule.
Specifically, when the component structure of the decoding result is determined not to match the component structure of the container number according to the step (1), the decoding result is determined not to satisfy the check rule. For example, in one embodiment, when there is a number in the first 4 characters identified in the decoding result, it is determined that the composition result of the decoding result does not match with the composition structure of the container number, and it is determined that the decoding result does not satisfy the check rule. For another example, when letters exist in the 5 th character to the 11 th character identified in the decoding result, it is determined that the composition result of the decoding result does not match with the composition structure of the container number, and it is determined that the decoding result does not satisfy the check rule.
(3) If yes, calculating the check value of the decoding result according to a preset rule.
(4) And judging whether the check value is equal to the check code identified in the decoding result.
(5) And if the check value is equal to the check code identified in the decoding result, determining that the decoding result meets the check rule, otherwise, determining that the decoding result does not meet the check rule.
Specifically, the check value of the decoding result can be calculated as follows: (1) the first 4 characters identified in the decoding result are converted into numbers according to the preset correspondence between letters and numbers, to obtain a converted decoding result; (2) the check value of the decoding result is then calculated according to the following formula:
S = (Σn=1..10 Cn · 2^(n-1)) mod 11
where S is the check value of the decoding result, and Cn is the n-th character in the converted decoding result.
Specifically, when the first 4 characters identified in the decoding result are converted into numbers, the conversion may be performed according to a preset correspondence relationship between the characters and the numbers. For example, table 1 shows the preset correspondence between letters and numbers for an exemplary embodiment of the present application:
TABLE 1 Preset correspondence between letters and numbers
A 10  B 12  C 13  D 14  E 15  F 16  G 17  H 18  I 19
J 20  K 21  L 23  M 24  N 25  O 26  P 27  Q 28  R 29
S 30  T 31  U 32  V 34  W 35  X 36  Y 37  Z 38
(values that are multiples of 11 are skipped)
In addition, referring to the foregoing description, the 11th character of the container number is the check code; therefore, the 11th character identified in the decoding result is the identified check code. In this step, it is judged whether the calculated check value is equal to the check code identified in the decoding result; if they are equal, it is determined that the decoding result satisfies the check rule, otherwise it is determined that the decoding result does not satisfy the check rule.
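This verification can be sketched in Python as follows; the letter values and the modulo-11 rule follow the standard ISO 6346 check-digit calculation that the scheme described above corresponds to, and the example number is a commonly cited ISO 6346 illustration rather than data from the patent:

```python
def letter_values() -> dict:
    """Letter-to-number table of the kind shown in Table 1: values 10-38 with
    multiples of 11 skipped, so A=10, B=12, ..., Z=38."""
    table, n = {}, 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if n % 11 == 0:          # skip 11, 22, 33
            n += 1
        table[ch] = n
        n += 1
    return table

def check_value(container_number: str) -> int:
    """Weighted sum of the first 10 characters, modulo 11 (a remainder of 10 counts as 0)."""
    values = letter_values()
    converted = [values[c] if c.isalpha() else int(c)
                 for c in container_number[:10].upper()]
    return sum(c * 2 ** n for n, c in enumerate(converted)) % 11 % 10

number = "CSQU3054383"                         # widely cited ISO 6346 example
print(check_value(number) == int(number[10]))  # True: the 11th character is the check code
```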
And S702, if so, determining a first combination result obtained by sequentially combining all the characters identified in the decoding result as the container number in the image to be identified.
S703, if not, correcting the decoding result to obtain a corrected decoding result, and determining a second combination result obtained by sequentially combining all characters in the corrected decoding result as the container number in the image to be identified; wherein the corrected decoding result satisfies the check rule.
Specifically, in a possible implementation manner, a specific implementation process of the step may include:
(1) when the composition structure of the decoding result does not match the composition structure of the container number, executing step (2); when the composition structure of the decoding result matches the composition structure of the container number, executing step (5).
(2) performing a first correction on the decoding result so that the composition structure of the first corrected decoding result matches the composition structure of the container number.
Specifically, if there are digits among the first 4 characters identified in the decoding result, in one embodiment it may be judged whether each such digit is a misrecognized character recorded in a pre-established different-type character misrecognition table. If so, the digit is replaced with the corresponding letter recorded in that table; otherwise, the digit is replaced with the letter having the highest confidence among the candidate characters at that time. Of course, in another embodiment, the digit may be directly replaced with the letter having the highest confidence among the candidate characters at that time.
Further, if there are letters among the 5th to 11th characters identified in the decoding result, in one embodiment it may be judged whether each such letter is a misrecognized character recorded in the pre-established different-type character misrecognition table. If so, the letter is replaced with the corresponding digit recorded in that table; otherwise, the letter is replaced with the digit having the highest confidence among the candidate characters at that time. Of course, in another embodiment, the letter may be directly replaced with the digit having the highest confidence among the candidate characters at that time.
For example, Table 2 shows a pre-established different-type character misrecognition table according to an exemplary embodiment of the present application. Referring to Table 2, during recognition 0 is easily mistaken for O and O for 0. For example, in one embodiment, when a "0" is identified among the first 4 characters of the decoding result, the "0" is replaced with "O".
TABLE 2 Different-type character misrecognition table
Digit    Confusable letter
0        O
1        I
2        Z
(3) when the first corrected decoding result satisfies the check rule, determining the combination result obtained by sequentially combining all characters in the first corrected decoding result as the container number in the image to be identified.
(4) when the first corrected decoding result does not satisfy the check rule, executing step (5).
(5) performing a second correction on the decoding result or on the first corrected decoding result to obtain a second corrected decoding result, and determining the combination result obtained by sequentially combining all characters in the second corrected decoding result as the container number in the image to be identified, wherein the second corrected decoding result satisfies the check rule.
In one possible implementation, the character with the lowest confidence among the characters of the decoding result or of the first corrected decoding result may be modified to obtain the second corrected decoding result.
Specifically, according to the check rule, the target character that would make the check rule hold at the position of the lowest-confidence character is calculated, and the lowest-confidence character is replaced with this target character to obtain the second corrected decoding result.
In another possible implementation, it may be judged, according to a pre-established same-type character misrecognition table, whether any of the first 10 characters of the decoding result or of the first corrected decoding result is a misrecognized character recorded in that table. If such a character exists, it is replaced with the corresponding character in the table to obtain a corrected decoding result, and it is then judged whether the corrected decoding result satisfies the check rule; if it does, the combination result obtained by sequentially combining the characters of the corrected decoding result is determined as the identified container number. If it does not, the digit with the lowest confidence among the characters of the decoding result or of the first corrected decoding result is modified according to the method described above to obtain the second corrected decoding result.
Further, if at least two of these characters are misrecognized characters recorded in the same-type character misrecognition table, any one of them may be replaced with its corresponding character, yielding a plurality of corrected decoding results, and it is then judged whether any of these corrected decoding results satisfies the check rule. If such a target decoding result exists, it is determined as the second corrected decoding result, that is, the combination result obtained by sequentially combining all of its characters is determined as the identified container number. If none exists, any two misrecognized characters are replaced with their corresponding characters to obtain at least one further corrected decoding result, and it is again judged whether a target decoding result satisfying the check rule exists; if so, that target decoding result is determined as the second corrected decoding result, that is, the combination result obtained by sequentially combining all of its characters is determined as the container number in the image to be identified. If not, the digit with the lowest confidence among the characters of the decoding result or of the first corrected decoding result is modified according to the method described above to obtain the second corrected decoding result.
It should be noted that, in this example, if none of the characters is a misrecognized character recorded in the same-type character misrecognition table, the digit with the lowest confidence among the characters of the decoding result or of the first corrected decoding result may be directly modified to obtain the second corrected decoding result.
For example, Table 3 shows an exemplary pre-established same-type character misrecognition table. Referring to Table 3, "M" and "N" are easily confused with each other during recognition. Therefore, if "M" appears in a decoding result that fails the check, it may be modified to "N" to obtain a corrected decoding result.
TABLE 3 Same-type character misrecognition table
Character    Easily confused character
M            N
O            D
U            J
E            F
L            I
6            8
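The table-driven correction described above can be sketched as follows. The pairs come from Table 3 and are treated here as symmetric confusion pairs (an assumption), the one-replacement-then-two-replacement search mirrors the procedure described earlier, and the helper is_valid stands in for a check such as the satisfies_check_rule sketch above; none of the names are prescribed by this application.

# Illustrative sketch of correction using the same-type character misrecognition table.
from itertools import combinations

CONFUSION_PAIRS = [("M", "N"), ("O", "D"), ("U", "J"), ("E", "F"), ("L", "I"), ("6", "8")]
CONFUSION = {a: b for a, b in CONFUSION_PAIRS}
CONFUSION.update({b: a for a, b in CONFUSION_PAIRS})

def correct_with_confusion_table(code, is_valid):
    # Replace one, then two, easily-confused characters and re-check; returns
    # the first corrected code that satisfies the check rule, or None.
    positions = [i for i, c in enumerate(code) if c in CONFUSION]
    for count in (1, 2):
        for combo in combinations(positions, count):
            trial = list(code)
            for i in combo:
                trial[i] = CONFUSION[code[i]]
            candidate = "".join(trial)
            if is_valid(candidate):
                return candidate
    return None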
In the method provided by this embodiment, it is determined whether the decoding result meets a specified check rule. When the decoding result meets the check rule, a first combination result obtained by sequentially combining the characters identified in the decoding result is determined as the container number in the image to be identified. When the decoding result does not meet the check rule, the decoding result is corrected to obtain a corrected decoding result that satisfies the check rule, and a second combination result obtained by sequentially combining the characters in the corrected decoding result is determined as the container number in the image to be identified. In this way, the recognition accuracy can be further improved.
Fig. 8 is an implementation schematic diagram of a container number identification method according to an exemplary embodiment of the present application. Referring to Fig. 8, in this example the STN network, the feature extraction network, and the container number recognition model are integrated into a single recognition network: when a target area is input into the recognition network, the network outputs a decoding result, and the container number in the image to be identified can then be determined based on the decoding result.
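A compact sketch of this integration is given below for illustration. The three sub-modules are passed in as ordinary PyTorch modules; the stand-ins used in the example wiring are deliberately trivial and are not the STN, feature extraction network or container number recognition model of this application.

# Minimal sketch of an integrated recognition network in the spirit of Fig. 8.
import torch
import torch.nn as nn

class RecognitionNetwork(nn.Module):
    def __init__(self, stn, backbone, recognizer):
        super().__init__()
        self.stn = stn                # spatial transformation of the target area
        self.backbone = backbone      # feature extraction -> first feature map
        self.recognizer = recognizer  # serialization + encoding/decoding -> decoding result

    def forward(self, target_area):
        rectified = self.stn(target_area)
        feature_map = self.backbone(rectified)
        return self.recognizer(feature_map)

# Example wiring with trivial stand-in modules:
net = RecognitionNetwork(nn.Identity(), nn.Conv2d(3, 64, 3, padding=1), nn.AdaptiveAvgPool2d(1))
decoding_result = net(torch.randn(1, 3, 48, 160))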
Corresponding to the embodiments of the container number identification method described above, the present application further provides embodiments of a container number identification apparatus.
The embodiments of the container number identification apparatus can be applied to a computer device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus is formed, as a logical device, by the processor of the computer device where it is located reading the corresponding computer program instructions into the memory and running them. From a hardware perspective, Fig. 9 is a hardware structure diagram of the computer device where the container number identification apparatus is located, according to an exemplary embodiment of the present application. In addition to the memory 910, the processor 920, the storage 930, and the network interface 940 shown in Fig. 9, the computer device where the apparatus of this embodiment is located may also include other hardware according to the actual function of the container number identification apparatus, which is not described here again.
Fig. 10 is a schematic structural diagram of a first embodiment of a container number identification device provided in the present application. Referring to fig. 10, the container number identification apparatus provided in this embodiment may include a detection module 100, an identification module 200, and a processing module 300, wherein,
the detection module 100 is configured to locate a target area where a container number is located from an image to be identified, where the image includes the container number;
the identification module 200 is configured to perform spatial transformation on the target region to obtain a transformed target region;
the identification module 200 is further configured to perform feature extraction on the transformed target region to obtain a first feature map;
the identification module 200 is further configured to input the first feature map into a pre-trained container number identification model, serialize the first feature map by the container number identification model to obtain a feature sequence, encode the feature sequence to obtain an encoding result, and decode the encoding result and output a decoding result;
and the processing module 300 is configured to determine the container number in the image to be identified according to the decoding result.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided in the first aspect of the present application.
In particular, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
With continued reference to fig. 9, the present application further provides a computer device, comprising a memory 910, a processor 920 and a computer program stored on the memory 910 and executable on the processor 920, wherein the processor 920 implements the steps of any one of the methods provided in the first aspect of the present application when executing the program.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method for identifying a container number, the method comprising:
positioning a target area where the container number is located from an image to be identified containing the container number, and carrying out spatial transformation on the target area to obtain a transformed target area;
performing feature extraction on the transformed target region to obtain a first feature map;
inputting the first feature map into a pre-trained container number identification model, serializing the first feature map by the container number identification model to obtain a feature sequence, encoding the feature sequence to obtain an encoding result, and decoding the encoding result to output a decoding result;
and determining the container number in the image to be identified according to the decoding result.
2. The method of claim 1, wherein determining the container number in the image to be identified according to the decoding result comprises:
judging whether the decoding result meets a specified check rule or not;
if so, determining a first combination result obtained by sequentially combining all the characters identified in the decoding result as the container number in the image to be identified;
if not, correcting the decoding result to obtain a corrected decoding result; wherein the corrected decoding result satisfies the check rule;
and determining a second combination result obtained by sequentially combining all characters in the corrected decoding result as the container number in the image to be identified.
3. The method of claim 2, wherein the determining whether the decoding result satisfies a specified check rule comprises:
judging whether the composition structure of the decoding result matches the composition structure of the container number;
if not, determining that the decoding result does not meet the check rule;
if so, calculating a check value of the decoding result according to a preset rule;
judging whether the check value is equal to the check code identified in the decoding result or not;
and if the check value is equal to the check code identified in the decoding result, determining that the decoding result meets the check rule, otherwise, determining that the decoding result does not meet the check rule.
4. The method of claim 1, wherein locating the target area in which the container number is located from the image to be identified containing the container number comprises:
inputting the image to be identified into a pre-trained detection network, performing multi-level feature extraction on the image to be identified by the detection network to obtain a specified number of second feature maps, performing classification and position regression on each second feature map respectively, and outputting classification results and position information of a plurality of candidate regions; wherein the dimensions of the specified number of second feature maps are different;
and performing non-maximum suppression processing on the plurality of candidate areas based on the classification result and the position information of each candidate area to obtain a target area where the container number is located.
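For illustration only, the non-maximum suppression step recited in claim 4 can be sketched as follows; here boxes are (x1, y1, x2, y2) rectangles with per-box classification scores, and the IoU threshold value is an assumption rather than something fixed by the claim.

# Illustrative non-maximum suppression over candidate regions.
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # Keep the highest-scoring boxes, dropping candidates that overlap them too much.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_threshold]
    return keep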
5. The method of claim 1, wherein locating the target area in which the container number is located from the image to be identified containing the container number comprises:
carrying out size adjustment on the image to be identified to obtain a plurality of target images with different sizes;
for each target image, inputting the target image into a pre-trained detection network, performing multi-level feature extraction on the target image by the detection network to obtain a specified number of second feature maps, performing classification and position regression on each second feature map respectively, and outputting classification results and position information of a plurality of candidate regions; wherein the dimensions of the specified number of second feature maps are different;
performing non-maximum suppression processing on the candidate areas based on the classification result and the position information of each candidate area to obtain a target area where the container number in the target image is located;
and determining the target area where the container number in the image to be identified is located according to the target area where the container number in the target image is located.
6. The method of claim 1, wherein the spatially transforming the target region to obtain a transformed target region comprises:
and inputting the target area into a pre-trained STN network, and outputting the converted target area after the target area is subjected to spatial conversion by the STN network.
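A minimal affine spatial transformer sketch in PyTorch is given below for illustration of claim 6; the tiny localization sub-network is a stand-in, and nothing about its structure is taken from this application.

# Illustrative affine spatial transformer (STN) sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSTN(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(channels * 8 * 8, 6),
        )
        # Start from the identity transform so the untrained network does not warp.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1, 0]))

    def forward(self, x):
        theta = self.localization(x).view(-1, 2, 3)            # affine parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)     # transformed target area

transformed = SimpleSTN()(torch.randn(1, 3, 48, 160))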
7. The method according to claim 1, wherein encoding the feature sequence to obtain an encoding result and decoding the encoding result to output a decoding result comprises:
calculating a weight parameter of each feature in the feature sequence at each moment;
calculating an encoding result of each moment according to the weight parameter of each feature in the feature sequence at each moment and the feature sequence;
calculating a context-dependent hidden layer state of each moment according to the feature sequence and the encoding result of each moment;
and obtaining a decoding result of each moment according to the context-dependent hidden layer state of each moment.
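For illustration, the four calculations of claim 7 can be sketched as one attention-style decoding loop; the scoring, state-update and output matrices below are random stand-ins, since the actual parameters would come from the trained container number identification model.

# Illustrative attention-style decoding loop (stand-in parameters only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_decode(features, steps, num_classes=37, seed=0):
    # features: (T, dim) feature sequence; returns one class index per time step.
    dim = features.shape[1]
    rng = np.random.default_rng(seed)
    W_score = rng.standard_normal((dim, dim)) * 0.01
    W_state = rng.standard_normal((2 * dim, dim)) * 0.01
    W_out = rng.standard_normal((dim, num_classes)) * 0.01
    state = np.zeros(dim)
    outputs = []
    for _ in range(steps):
        weights = softmax(features @ W_score @ state)                 # weight parameter of each feature
        context = weights @ features                                  # encoding result of this moment
        state = np.tanh(np.concatenate([state, context]) @ W_state)   # context-dependent hidden state
        outputs.append(int(np.argmax(state @ W_out)))                 # decoding result of this moment
    return outputs

decoded = attention_decode(np.random.randn(20, 64), steps=11)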
8. A container number identification apparatus, characterized by comprising a detection module, an identification module and a processing module, wherein,
the detection module is used for positioning a target area where the container number is located from the image to be identified containing the container number;
the identification module is used for carrying out spatial transformation on the target area to obtain a transformed target area;
the identification module is further used for extracting the features of the transformed target area to obtain a first feature map;
the identification module is further configured to input the first feature map into a pre-trained container number identification model, serialize the first feature map by the container number identification model to obtain a feature sequence, encode the feature sequence to obtain an encoding result, and decode the encoding result and output a decoding result;
and the processing module is used for determining the container number in the image to be identified according to the decoding result.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-7 are implemented when the program is executed by the processor.
CN201811113365.XA 2018-09-25 2018-09-25 Container number identification method and device and computer equipment Pending CN110942057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811113365.XA CN110942057A (en) 2018-09-25 2018-09-25 Container number identification method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811113365.XA CN110942057A (en) 2018-09-25 2018-09-25 Container number identification method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN110942057A true CN110942057A (en) 2020-03-31

Family

ID=69904808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811113365.XA Pending CN110942057A (en) 2018-09-25 2018-09-25 Container number identification method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN110942057A (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770569A (en) * 2008-12-31 2010-07-07 汉王科技股份有限公司 Dish name recognition method based on OCR
US20120308143A1 (en) * 2011-06-03 2012-12-06 Apple Inc. Integrating feature extraction via local sequential embedding for automatic handwriting recognition
CN102841928A (en) * 2012-07-18 2012-12-26 中央人民广播电台 Method and device for securely transmitting and receiving inter-network file
CN106203539A (en) * 2015-05-04 2016-12-07 杭州海康威视数字技术股份有限公司 The method and apparatus identifying container number
CN108205673A (en) * 2016-12-16 2018-06-26 塔塔顾问服务有限公司 For the method and system of container code identification
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 A kind of non-division character locating and recognition methods based on deep learning
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN107423732A (en) * 2017-07-26 2017-12-01 大连交通大学 Vehicle VIN recognition methods based on Android platform
CN107527059A (en) * 2017-08-07 2017-12-29 北京小米移动软件有限公司 Character recognition method, device and terminal
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107871126A (en) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 Model recognizing method and system based on deep-neural-network
CN108009515A (en) * 2017-12-14 2018-05-08 杭州远鉴信息科技有限公司 A kind of power transmission line positioning identifying method of the unmanned plane image based on FCN
CN108399419A (en) * 2018-01-25 2018-08-14 华南理工大学 Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks
CN108491836A (en) * 2018-01-25 2018-09-04 华南理工大学 Chinese text global recognition method in a kind of natural scene image
CN108416318A (en) * 2018-03-22 2018-08-17 电子科技大学 Diameter radar image target depth method of model identification based on data enhancing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Neural Machine Translation by Jointly Learning to Align and Translate", COMPUTER SCIENCE, pages 1 - 15 *
GHOSH,S.K,ET AL.: "Visual attention models for scene text recognition", 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, pages 994 *
M. LIAO,ET AL.: "TextBoxes++: A Single-Shot Oriented Scene Text Detector", IEEE TRANSACTION ON IMAGE PROCESSING, pages 3676 - 3690 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626982A (en) * 2020-04-13 2020-09-04 中国外运股份有限公司 Method and device for identifying batch codes of containers to be detected
CN113052156A (en) * 2021-03-12 2021-06-29 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN113052156B (en) * 2021-03-12 2023-08-04 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN113408512A (en) * 2021-06-03 2021-09-17 云从科技集团股份有限公司 Method, system, device and medium for checking container by using robot
CN114529893A (en) * 2021-12-22 2022-05-24 电子科技大学成都学院 Container code identification method and device
CN115527209A (en) * 2022-09-22 2022-12-27 宁波港信息通信有限公司 Method, device and system for identifying shore bridge box number and computer equipment
CN116229280A (en) * 2023-01-09 2023-06-06 广东省科学院广州地理研究所 Method and device for identifying collapse sentry, electronic equipment and storage medium
CN116229280B (en) * 2023-01-09 2024-06-04 广东省科学院广州地理研究所 Method and device for identifying collapse sentry, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110942057A (en) Container number identification method and device and computer equipment
CN111639646B (en) Test paper handwritten English character recognition method and system based on deep learning
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN111046859B (en) Character recognition method and device
US10810465B2 (en) Systems and methods for robust industrial optical character recognition
CN110570433B (en) Image semantic segmentation model construction method and device based on generation countermeasure network
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
KR102638370B1 (en) Explanable active learning method using Bayesian dual autoencoder for object detector and active learning device using the same
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN116258874A (en) SAR recognition database sample gesture expansion method based on depth condition diffusion network
CN116486419A (en) Handwriting word recognition method based on twin convolutional neural network
CN108805280B (en) Image retrieval method and device
CN115116074A (en) Handwritten character recognition and model training method and device
JP5028911B2 (en) Character string recognition program, method and apparatus
CN110826554B (en) Infrared target detection method
CN113657377B (en) Structured recognition method for mechanical bill image
CN110879888A (en) Virus file detection method, device and equipment
CN108694411B (en) Method for identifying similar images
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN109101984B (en) Image identification method and device based on convolutional neural network
CN108960005B (en) Method and system for establishing and displaying object visual label in intelligent visual Internet of things
CN110942073A (en) Container trailer number identification method and device and computer equipment
Huang et al. Attention after attention: Reading text in the wild with cross attention
CN113392814B (en) Method and device for updating character recognition model and storage medium
Lakshmi et al. A new hybrid algorithm for Telugu word retrieval and recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination