WO2022205018A1 - License plate character recognition method, apparatus, device, and storage medium (车牌字符识别方法、装置、设备及存储介质) - Google Patents


Info

Publication number
WO2022205018A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
tensor
license plate
character
character recognition
Prior art date
Application number
PCT/CN2021/084183
Other languages
English (en)
French (fr)
Inventor
张玉兵
Original Assignee
广州视源电子科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司
Priority to CN202180024657.XA
Priority to PCT/CN2021/084183
Publication of WO2022205018A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • The embodiments of the present application relate to the technical field of image processing, and in particular to a license plate character recognition method, apparatus, device, and storage medium.
  • A license plate is also known as a vehicle number plate.
  • License plate recognition refers to extracting and recognizing a moving license plate from a complex background. License plate recognition technology has been widely used in various fields, such as highway checkpoint detection in the transportation field and parking lot entrance/exit detection in the security field.
  • In the related art, the photographing device for the license plate is installed in a fixed position and is equipped with a corresponding photographing trigger mechanism (such as a ground-sensing trigger mechanism) to photograph the license plate.
  • After that, the image is processed by using a license plate recognition method based on character segmentation to obtain the recognition result of the license plate characters.
  • The license plate recognition method based on character segmentation divides the license plate in the image by character to obtain multiple sub-images, each sub-image containing one character, and then classifies each sub-image to determine its corresponding character.
  • FIG. 1 is a license plate image provided by the prior art, obtained by photographing the license plate with a fixed photographing device. It should be noted that the first four characters in FIG. 1 (including "Q" and "G") show the normal shooting effect, and the following characters have been blurred to avoid information leakage.
  • FIG. 2 is another license plate image provided by the prior art, which is an image obtained by photographing the license plate when the photographing device moves.
  • the license plate shown in FIG. 2 is obviously blurred.
  • Embodiments of the present application provide a license plate character recognition method, apparatus, device, and storage medium to solve the technical problem in the related art that license plate characters cannot be accurately recognized when the license plate image quality is low.
  • In a first aspect, an embodiment of the present application provides a license plate character recognition method, including:
  • acquiring at least one target image, wherein the target image displays a license plate to be recognized, and the license plate to be recognized contains a plurality of characters;
  • processing the target image by using a backbone network to obtain a feature tensor of the license plate to be recognized;
  • inputting the feature tensor into each feature fusion network, and obtaining the feature vector of each of the characters through the feature fusion networks, wherein each feature fusion network outputs one feature vector; and
  • inputting each of the feature vectors into a corresponding classifier, and using the classifier to obtain the character recognition result of each of the characters, wherein each classifier outputs one character recognition result.
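  • As an illustration of the data flow summarized above (a minimal sketch, not the patent's exact implementation), the pipeline might be expressed in PyTorch as follows; the names PlateRecognizer, fuse_nets, and classifiers are hypothetical, and PyTorch's N*C*H*W layout is used where the text writes N*H*W*C:

```python
import torch
import torch.nn as nn

class PlateRecognizer(nn.Module):
    """Hypothetical sketch of the described pipeline: one backbone plus one
    feature fusion network and one classifier per recognizable character."""

    def __init__(self, backbone, fuse_nets, classifiers):
        super().__init__()
        assert len(fuse_nets) == len(classifiers)  # one pair per character slot
        self.backbone = backbone
        self.fuse_nets = nn.ModuleList(fuse_nets)
        self.classifiers = nn.ModuleList(classifiers)

    def forward(self, image):
        # image: target image batch in N*C*H*W layout
        feat = self.backbone(image)          # feature tensor of the plate
        results = []
        for fuse, clf in zip(self.fuse_nets, self.classifiers):
            vec = fuse(feat)                 # one feature vector per character
            results.append(clf(vec))         # one recognition result per character
        return results                       # fixed-length list of results
```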
  • In a second aspect, the embodiments of the present application also provide a license plate character recognition apparatus, including:
  • an image acquisition module configured to acquire at least one target image, wherein the target image displays a license plate to be recognized, and the license plate to be recognized contains a plurality of characters;
  • a feature tensor determination module, configured to process the target image by using the backbone network to obtain the feature tensor of the license plate to be recognized;
  • a feature vector determination module, configured to input the feature tensor into each feature fusion network and obtain the feature vector of each of the characters through the feature fusion networks, wherein each feature fusion network outputs one feature vector; and
  • a recognition result determination module, configured to input each of the feature vectors into the corresponding classifier and use the classifier to obtain the character recognition result of each of the characters, wherein each classifier outputs one character recognition result.
  • In a third aspect, an embodiment of the present application also provides a license plate character recognition device, including:
  • one or more processors; and
  • a memory for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the license plate character recognition method according to the first aspect.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the license plate character recognition method as described in the first aspect is implemented.
  • The above license plate character recognition method, apparatus, device, and storage medium acquire a target image containing the license plate to be recognized, use the backbone network to obtain the feature tensor of the license plate to be recognized, use the feature fusion networks to obtain the feature vector of each character in the license plate to be recognized, and use the classifiers to obtain the character recognition result of each character, thereby solving the technical problem that license plate characters cannot be accurately recognized when the license plate image quality is low.
  • The above technical means do not involve character segmentation but directly predict each character, which avoids dependence on character segmentation and thereby reduces dependence on image quality.
  • In addition, the structures of the backbone network, the feature fusion network, and the classifier are subject to few restrictions and can be adjusted according to the actual situation, making the above scheme more flexible.
  • FIG. 1 is a license plate image provided by the prior art;
  • FIG. 2 is another license plate image provided by the prior art;
  • FIG. 3 is a flowchart of a method for recognizing license plate characters according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of a double-layer character license plate provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a license plate character recognition model provided by an embodiment of the present application.
  • FIG. 6 is a target image provided by an embodiment of the present application.
  • FIG. 7 is an example diagram of a spatial attention map generation process provided by an embodiment of the present application.
  • FIG. 8 is an example diagram of another spatial attention map generation process provided by an embodiment of the present application.
  • FIG. 9 is another target image provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another license plate character recognition model provided by an embodiment of the application.
  • FIG. 11 is another target image provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a license plate character recognition apparatus provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a license plate character recognition device according to an embodiment of the present application.
  • CTC (Connectionist Temporal Classification) is a time series classification algorithm that can be trained on sequence classification tasks without requiring labels to be aligned one by one in time, which reduces the redundant work of pre-aligning labels.
  • CTC has been widely used in speech recognition and optical character recognition.
  • CTC is also applied in the field of license plate recognition, where a deep neural network model is used to realize license plate recognition.
  • The training process of the deep neural network model may be as follows: a large number of images containing license plates are acquired, and then the coordinates of the license plate bounding box and of the key points (such as the diagonal points of the license plate) in each image are determined; for example, the coordinates of the license plate bounding box 11 and the key points 12 shown in FIG. 2 are determined.
  • Then, affine or perspective transformation is performed on the license plate in the image to align the license plate, that is, to convert the license plate in FIG. 2 into a regular rectangle.
  • After that, a large number of aligned images are used to train the deep neural network model and to calculate the loss function, wherein the deep neural network model includes a backbone network and a head network.
  • After training, the backbone network is retained for application. In the application process, the image in which the license plate needs to be recognized is acquired and input to the backbone network to extract the corresponding feature tensor. After that, the feature tensor is decoded to predict each character in the license plate, thereby obtaining the license plate recognition result.
  • However, the length of the CTC output sequence is uncertain, which results in inaccurate recognition results.
  • For example, the character length in the output recognition result may be longer or shorter than the number of characters contained in the license plate.
  • In addition, CTC cannot use the prior information of the license plate. For example, for the license plate of a car in mainland China, the first character is the Chinese abbreviation of a province (municipality, autonomous region), and its corresponding character category can be the set of such Chinese abbreviations; the second character is an uppercase English letter, and its corresponding character category can be the set of uppercase English letters.
  • When CTC is used, however, the character category corresponding to each character position is the same, and this category includes the set of Chinese abbreviations, the set of uppercase English letters, the set of digits 0 to 9, and so on.
  • CTC therefore needs to perform prediction over all of the above character categories during decoding, which increases the difficulty of predicting characters.
  • Moreover, CTC only supports the prediction of a sequence in a single direction, so it cannot recognize double-layer license plates (that is, license plates with upper and lower rows of characters).
  • In view of this, the embodiments of the present application provide a license plate character recognition method, so as to accurately recognize the license plate in an image even when the photographing device moves, to obtain a fixed-length recognition result, and to recognize double-layer license plates, with low prediction difficulty and easy implementation.
  • The license plate character recognition method provided by the embodiments of the present application may be performed by a license plate character recognition device. The license plate character recognition device may be implemented by means of software and/or hardware, and may consist of two or more physical entities or of a single physical entity.
  • In the embodiments, the license plate character recognition device is described by taking a robot as an example, wherein the robot is configured with a moving device and a photographing device.
  • The specific structures and working modes of the moving device and the photographing device are not limited in the embodiments. It is understandable that the license plate character recognition device may also be a mobile phone, a computer, or another device.
  • The license plate character recognition device may also have no moving device and/or no photographing device. When the license plate character recognition device has no photographing device, it can acquire and process images captured by an external photographing device.
  • FIG. 3 is a flowchart of a license plate character recognition method according to an embodiment of the present application.
  • the license plate character recognition method specifically includes:
  • Step 110: Acquire at least one target image.
  • the target image displays a license plate to be recognized, and the license plate to be recognized contains multiple characters.
  • The license plate to be recognized refers to the license plate that currently needs to be recognized and contains multiple characters. Taking the license plate of a car in mainland China as an example, it contains 7 or 8 characters, where an 8-character plate is a new energy license plate and a 7-character plate is an ordinary license plate.
  • the image containing the license plate to be recognized is called the target image.
  • The target image is captured by a photographing device; the mechanism used by the photographing device to trigger the capture is not limited in the embodiments.
  • There may be one or more target images.
  • In the embodiments, one target image is used as an example for description.
  • The target images captured by the photographing device may have the same resolution or different resolutions.
  • In the embodiments, target images with the same resolution are used as an example for description; in this case, the captured target images have a fixed size.
  • When an image captured by the photographing device is acquired, license plate detection is first performed on the image to determine whether it contains a license plate; if it does, the detected license plate is determined to be the license plate to be recognized.
  • The embodiments do not limit how to detect whether the image contains a license plate. For example, an edge detection method may be used to detect whether the image contains the edges of a license plate, or a color segmentation method may be used to detect whether the image contains the colors of a license plate, and thereby determine whether a license plate is present.
  • the license plate to be recognized in the photographed target image may have problems such as motion, shaking and blurring.
  • the license plate to be recognized in the target image is aligned, so that the license plate to be recognized after the alignment process is located in a rectangular pixel area.
  • In an embodiment, the method further includes: mapping the license plate to be recognized in the target image to a set pixel coordinate area.
  • Specifically, the coordinates of the bounding box of the detected license plate to be recognized and the coordinates of the key points are output, wherein the bounding box refers to a rectangular box containing the license plate to be recognized, and the coordinates of the bounding box refer to the pixel coordinates of the bounding box in the target image.
  • The key points refer to the upper-left vertex, the lower-left vertex, the upper-right vertex, and the lower-right vertex of the license plate to be recognized in the bounding box, for example, the four key points shown in FIG. 2.
  • the coordinates of the keypoints refer to the pixel coordinates of the keypoints in the target image.
  • affine or perspective transformation is performed on each pixel in the license plate to be recognized according to the coordinates of the bounding box and the coordinates of the key points.
  • the transformed license plate to be recognized has a uniform size.
  • After transformation, the pixel coordinates of the four key points of the license plate to be recognized in the target image are fixed, and the area between the four key points is used as the set pixel coordinate area of the license plate to be recognized.
  • The transformed license plate to be recognized is located in this pixel coordinate area.
  • The pixel coordinate area is a rectangular area.
  • the pixel coordinates of the four key points of the transformed license plate to be recognized in the target image are (3,9), (3,91), (29,91) and (29,9) respectively.
  • the height and width of the pixel coordinate area are 32 and 100, respectively.
  • The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, and includes translation, rotation, scaling, tilt, and flip transformations.
  • Perspective transformation refers to a transformation that, using the condition that the perspective center, the image point, and the target point are collinear, rotates the shadow-bearing surface (perspective surface) around the trace line (perspective axis) by a certain angle according to the law of perspective rotation, destroying the original projection light bundle while still preserving the projection geometry on the shadow-bearing surface.
  • The parameters used in the transformation (such as translation parameters and rotation parameters) are determined by the coordinates of the four key points before and after the transformation, as well as the coordinates of the bounding box.
  • After the pixel points are subjected to affine or perspective transformation, the license plate to be recognized is mapped to the set pixel coordinate area.
  • The target images used subsequently are all images in which the license plate to be recognized has been mapped to the set pixel coordinate area; at this time, the license plate to be recognized in the target image has a fixed size.
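  • As an illustration of this mapping step, below is a minimal sketch using OpenCV's perspective transform; the destination corners follow the (3,9)/(3,91)/(29,91)/(29,9) example above (read as (row, column)), and everything else is an assumption:

```python
import cv2
import numpy as np

def align_plate(image, keypoints):
    """Warp the detected plate into the set pixel coordinate area.

    keypoints: 4x2 array of (x, y) pixel coordinates in the order
    upper-left, lower-left, upper-right, lower-right, matching the
    four key points described above.
    """
    src = np.asarray(keypoints, dtype=np.float32)
    # Destination corners inside a 32x100 canvas, in (x, y) order.
    dst = np.float32([[9, 3], [9, 29], [91, 3], [91, 29]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (100, 32))  # width=100, height=32
```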
  • Step 120: Use the backbone network to process the target image to obtain the feature tensor of the license plate to be recognized.
  • In the embodiments, a license plate character recognition model is used to recognize the license plate characters. It can be understood that the license plate character recognition model used in the embodiments is a trained neural network model, and the embodiments do not limit the training process.
  • The license plate character recognition model includes a backbone network (Backbone), which is used to extract the features of the license plate to be recognized in the target image and to output a feature tensor.
  • The specific structure of the backbone network is not limited in the embodiments; for example, a convolutional neural network (CNN), a residual network (ResNet), or a lightweight network for mobile terminals (MobileNet) may be used.
  • After the target image is input to the backbone network, the backbone network performs feature extraction on the license plate to be recognized in the target image and outputs a feature tensor.
  • the feature tensor refers to the high-dimensional data representing the features of the license plate to be recognized, and the features of the license plate to be recognized include color features, texture features, shape features, and spatial relationship features of the license plate to be recognized. It should be noted that a value in the feature tensor may be considered as an element, and an element may represent a feature pixel, where the feature pixel refers to a pixel describing a feature.
  • When the target image is input to the backbone network, it can optionally be input as a four-dimensional tensor represented by N*H*W*C, where N represents the number of samples of the target image (that is, the number of input target images), H represents the height of the target image, W represents the width of the target image, and C represents the number of channels of the target image.
  • the height and width refer to the pixel height (the number of pixels contained in the vertical direction) and the pixel width (the number of pixels contained in the horizontal direction) of the target image, and the number of channels is determined according to the color standard of the target image.
  • the image is an RGB image, so the number of channels is 3, corresponding to R channel, G channel and B channel respectively.
  • the RGB image is an image of various colors obtained by changing the three color channels of red (R), green (G), and blue (B) and superimposing them on each other.
  • After the backbone network processes the target image, it outputs a four-dimensional feature tensor, that is, a feature tensor of N'*H'*W'*C'.
  • N' in the feature tensor corresponds to N, which represents the number of samples of the input target image
  • H', W', and C' represent the height, width, and number of channels of the feature tensor, respectively. It can be understood that the specific values of H', W', and C' can be determined during the training process of the license plate character recognition model.
  • Optionally, the value of W' is greater than or equal to the maximum number of characters contained in a license plate, so as to facilitate the processing of the subsequent classifiers.
  • For example, the license plate of a car in mainland China contains at most 8 characters; therefore, W'≥8 ensures that the subsequent classifiers can accurately identify each character.
  • the maximum number of characters contained in the license plate is taken as the total number of currently recognizable characters. For example, if the maximum number of characters contained in the license plate is 8, it means that at most 8 characters can be recognized at the same time.
  • That is, the width of the feature tensor is set to be greater than or equal to the target number, where the target number is the total number of recognizable characters.
  • Step 130: Input the feature tensor into each feature fusion network, and obtain the feature vector of each character through the feature fusion networks; each feature fusion network outputs one feature vector.
  • the license plate character recognition model further includes multiple feature fusion networks.
  • the feature fusion network is used to fuse the features contained in the feature tensor to obtain the feature vector of the corresponding character.
  • the dimension of the feature vector is smaller than the dimension of the feature tensor.
  • The feature fusion network outputs a feature vector of H"*W"*C", where H", W", and C" represent the height, width, and number of channels of the feature vector, respectively.
  • In the embodiments, H" and W" are 1, and C" is equal to C'.
  • The number of feature fusion networks is equal to the target number, and each feature fusion network outputs the feature vector of one character; that is, each feature vector represents the features of the corresponding character.
  • For example, when the target number is 8, the feature tensor is input into 8 feature fusion networks respectively, so that the 8 feature fusion networks output 8 feature vectors, each of which represents the features of the corresponding character in the license plate.
  • It should be noted that the target number is the total number of recognizable characters, not the total number of characters in the license plate to be recognized. For example, if the target number is 8 and the license plate to be recognized contains 7 characters, the 8 feature fusion networks in this step still output 8 feature vectors; at this time, the feature vector corresponding to the 8th character will be recognized as empty by the subsequent classifier.
  • In an embodiment, the feature fusion network adopts a spatial attention mechanism. Through the spatial attention mechanism, the features that need to be focused on can be determined in the feature tensor, and the corresponding feature vector is then output, wherein the focused features are those describing the corresponding character.
  • In the embodiments, a Convolutional Block Attention Module (CBAM) is used as the feature fusion network to implement the spatial attention mechanism.
  • obtaining the feature vector of each character through the feature fusion network includes steps 131-134:
  • Step 131: Perform a maximum pooling operation and an average pooling operation on the feature tensor to obtain a maximum pooling feature tensor and an average pooling feature tensor, respectively.
  • Pooling refers to the process of abstracting information, which can achieve dimensionality reduction, downsampling, removal of redundant information, and feature compression.
  • The max pooling operation refers to taking the point with the largest value in each local receptive field. For example, the size of the pooling kernel used for max pooling is determined, and the feature tensor is divided into multiple sub-regions according to that size, each sub-region being a local receptive field; after that, the maximum value of each sub-region is taken to represent that sub-region, and the obtained maximum values form a new tensor, realizing the maximum pooling operation.
  • the texture features in the feature tensor are preserved through the max pooling operation.
  • a new tensor formed by each maximum value is recorded as a maximum pooling feature tensor.
  • the average pooling operation averages all values in the local receptive field.
  • The average pooling operation is similar to the max pooling operation, except that the average value of each sub-region is taken to represent the sub-region to form a new tensor.
  • the overall features of the feature tensor are preserved through the average pooling operation.
  • the new tensor composed of each average value is recorded as the average pooling feature tensor.
  • In the embodiments, the size of the pooling kernel used in the max pooling operation and the average pooling operation is the same.
  • both the maximum pooling feature tensor and the average pooling feature tensor are feature tensors with a channel number of 1.
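  • In CBAM, these two single-channel tensors are commonly obtained by pooling along the channel dimension; a minimal PyTorch-style sketch under that assumption (N*C*H*W layout):

```python
import torch

def channel_pool(feat):
    # feat: feature tensor in N*C*H*W layout
    max_pooled = feat.max(dim=1, keepdim=True).values  # N*1*H*W, keeps texture features
    avg_pooled = feat.mean(dim=1, keepdim=True)        # N*1*H*W, keeps overall features
    return max_pooled, avg_pooled
```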
  • Step 132: Obtain a fusion feature tensor according to the maximum pooling feature tensor and the average pooling feature tensor.
  • the max-pooled feature tensor and the average-pooled feature tensor are fused to simultaneously reference texture features and overall features in subsequent processing.
  • a tensor is still obtained after the maximum pooling feature tensor and the average pooling feature tensor are fused.
  • the tensor obtained by fusion is recorded as a fusion feature tensor.
  • the method used in the fusion can be set according to the actual situation.
  • In the embodiments, the fusion of the maximum pooling feature tensor and the average pooling feature tensor is described in two ways, concatenation and summation, and the following schemes can be applied to this step:
  • Option 1: Concatenate the maximum pooling feature tensor and the average pooling feature tensor to obtain the fusion feature tensor.
  • The function of concatenation is feature association.
  • the maximum pooling feature tensor and the average pooling feature tensor are connected along the channel dimension to obtain the fusion feature tensor.
  • Since the maximum pooling feature tensor and the average pooling feature tensor are both tensors with a channel number of 1, a fusion feature tensor with a channel number of 2 is obtained, whose height and width are consistent with the height and width of the maximum pooling feature tensor and the average pooling feature tensor.
  • Option 2: Perform element-wise summation of the maximum pooling feature tensor and the average pooling feature tensor to obtain the fusion feature tensor.
  • Element-wise operations mean that each element is computed independently.
  • element-wise summation refers to summing the elements located at the same position in the maximum pooling feature tensor and the average pooling feature tensor.
  • the size of the max pooling feature tensor and the average pooling feature tensor are the same, so the elements at the same position in the max pooling feature tensor and the average pooling feature tensor correspond one-to-one.
  • the elements at the same position are added one by one to obtain the fusion feature tensor.
  • the fused feature tensor, max pooled feature tensor, and average pooled feature tensor have the same number of channels, height and width.
  • the max pooling feature tensor and the average pooling feature tensor are both feature tensors with 1 channel, then the fusion feature tensor is also a feature tensor with 1 channel.
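  • Both fusion options might be sketched as follows, assuming PyTorch with dim=1 as the channel dimension:

```python
import torch

def fuse(max_pooled, avg_pooled, mode="concat"):
    if mode == "concat":
        # Option 1: concatenate along the channel dimension -> N*2*H*W
        return torch.cat([max_pooled, avg_pooled], dim=1)
    # Option 2: element-wise summation -> N*1*H*W, same shape as the inputs
    return max_pooled + avg_pooled
```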
  • Step 133: Construct a spatial attention map according to the fusion feature tensor.
  • the spatial attention map is two-dimensional data, which is used to indicate the features that should be focused locally in the feature tensor, wherein the features that are focused locally are the features of the corresponding characters.
  • the spatial attention map is obtained after element-wise activation of the fused feature tensor.
  • Optionally, a convolution operation is first performed on the fusion feature tensor to obtain a fusion feature tensor with a channel number of 1. That is, a convolution layer whose output channel number is 1 is set, and the fusion feature tensor with a channel number of 1 is obtained after the convolution layer performs the convolution operation on the fusion feature tensor.
  • element-wise activation is performed on the fused feature tensor with channel number 1 to obtain the spatial attention map. It can be understood that when the number of channels of the fusion feature tensor itself is 1, the convolution operation can be ignored, and the fusion feature tensor can be directly activated element by element.
  • element-by-element activation means that each element in the fusion feature tensor is processed through an activation function.
  • the type of activation function can be set according to the actual situation, as long as the output value of the activation function is non-negative (that is, each value in the spatial attention map is non-negative).
  • For example, the activation function adopts the sigmoid function or the 1+tanh function, as in CBAM. It should be noted that each value in the spatial attention map represents the weight of the features of the corresponding region in the feature tensor, and keeping the values non-negative ensures that the subsequent feature weighting has a clear physical meaning.
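  • A minimal sketch of this construction, assuming PyTorch and a sigmoid activation; the 7x7 kernel size is borrowed from CBAM and is an assumption, not something the text specifies:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, in_channels=2, kernel_size=7):
        super().__init__()
        # Reduce the fused tensor to one channel; the kernel size follows CBAM.
        self.conv = nn.Conv2d(in_channels, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, fused):
        # Skip the convolution when the fused tensor already has one channel.
        x = fused if fused.shape[1] == 1 else self.conv(fused)
        return torch.sigmoid(x)  # element-wise activation, non-negative weights
```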
  • Step 134: Obtain the feature vector of the corresponding character in the license plate to be recognized according to the spatial attention map and the feature tensor.
  • the features that should be focused on in the feature tensor are determined, and then the feature vectors of the corresponding characters are obtained according to the focused features.
  • this step 134 may include steps 1341-1343:
  • Step 1341: Automatically expand the spatial attention map along the channel dimension to obtain an attention tensor map.
  • an attention map with a channel number greater than 1 can be obtained.
  • the obtained attention map is recorded as an attention tensor map.
  • the automatic expansion of the spatial attention map along the channel dimension can be understood as copying the two-dimensional spatial attention map along the channel dimension, and each copy is considered to be automatically expanded once.
  • the number of automatic expansions is related to the number of channels of the feature tensor.
  • In the embodiments, the number of automatic expansions is equal to the number of channels of the feature tensor, so that the number of channels of the attention tensor map is equal to the number of channels of the feature tensor, thereby facilitating the subsequent element-wise multiplication. Understandably, after automatic expansion, only the number of channels changes.
  • the size of the spatial attention map is h*w*1, where h and w are the height and width of the spatial attention map, respectively, and 1 is the number of channels of the spatial attention map.
  • the size of the obtained attention tensor map is h*w*c, where h and w remain unchanged, and c>1.
  • Step 1342: Perform element-wise multiplication on the feature tensor and the attention tensor map to obtain a multiplication tensor.
  • Element-wise multiplication refers to using the attention tensor map to multiply each element in the feature tensor.
  • Specifically, each element in the attention tensor map has one or more corresponding elements in the feature tensor, since each element in the spatial attention map may represent the weight of the features (that is, one or more elements) of the corresponding region in the feature tensor.
  • element-wise multiplication can be understood as multiplying each element in the feature tensor with the corresponding element in the attention tensor graph, and this process can also be considered as a process of weighting the features.
  • the tensor obtained after element-by-element multiplication is recorded as a multiplication tensor.
  • the size of the multiplication tensor is the same as the size of the feature tensor.
  • For example, the feature tensor is a 4-dimensional feature tensor denoted as X with size n*h*w*c, the expanded attention tensor map is denoted as W with size n*h*w*c (that is, the feature tensor and the attention tensor map are equal in size), and the multiplication tensor obtained after element-wise multiplication is denoted as M.
  • X[i,j,k,l] represents the element at position (i,j,k,l) in X, where the values of i, j, k, l range within n, h, w, c, respectively; W[i,j,k,l] represents the element at position (i,j,k,l) in W; and M[i,j,k,l] represents the element at position (i,j,k,l) in M.
  • Then M[i,j,k,l] = X[i,j,k,l] * W[i,j,k,l], that is, the multiplication tensor is obtained by multiplying the elements at the same positions in the feature tensor and the attention tensor map.
  • Step 1343: Sum the multiplication tensor along its width dimension and height dimension to obtain the feature vector of the corresponding character in the license plate to be recognized.
  • Specifically, the elements in each channel of the multiplication tensor are summed along the width dimension and the height dimension. This can be understood as first summing the elements in the same channel at each width position, and then adding the resulting sums along the height, that is, adding up all the elements in the same channel.
  • a vector whose height and width are both 1 and the number of channels is constant can be obtained. In the embodiment, the obtained vector is recorded as the feature vector of the corresponding character.
  • For example, the size of the multiplication tensor is h'*w'*c', where h', w', and c' represent its height, width, and number of channels, respectively; after summation, a vector of size 1*1*c' is obtained, that is, a feature vector of size 1*c'.
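  • Steps 1341-1343 amount to a weighted sum over the spatial positions; a minimal PyTorch sketch (broadcasting could replace the explicit expansion, which is written out here to mirror step 1341):

```python
import torch

def attend(feat, attn):
    # feat: N*C*H*W feature tensor; attn: N*1*H*W spatial attention map
    attn_tensor = attn.expand_as(feat)  # step 1341: copy along the channel dimension
    mul = feat * attn_tensor            # step 1342: element-wise multiplication
    return mul.sum(dim=(2, 3))          # step 1343: sum over height and width -> N*C
```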
  • the dimension of the feature vector may be reduced through a fully connected layer to reduce the number of channels of the feature vector.
  • The fully connected layer can realize the synthesis of features. It can be understood that, since the numbers of channels of the feature tensor and of the feature vector are equal, in order to avoid the number of channels in the feature vector being too large, the number of channels of the output feature tensor can also be adjusted during the training of the backbone network to obtain a feature tensor with fewer channels, and thereby a feature vector with fewer channels.
  • In summary, the feature fusion network may include a maximum pooling layer (for the maximum pooling operation), an average pooling layer (for the average pooling operation), a feature fusion layer (for feature concatenation or element-wise addition), a convolution layer (optional; it may be omitted when the channel number of the fusion feature tensor is 1), an activation function layer (for obtaining the spatial attention map), an automatic expansion part (for automatic expansion), an element-wise multiplication part (for element-wise multiplication), and a summation part (for summing along the width and height dimensions).
  • the feature vector of each character can be obtained respectively.
  • the feature fusion network may also use a learnable mask mechanism.
  • the learnable mask mechanism refers to the introduction of a mask parameter (mask).
  • the function of the mask parameter is similar to that of the spatial attention map.
  • The mask parameter can be used to occlude some features in the feature tensor (that is, the features of non-corresponding characters), so that the feature fusion network only attends to the features of the corresponding character in the feature tensor, and then obtains the feature vector.
  • When the learnable mask mechanism is adopted, inputting the feature tensor into each feature fusion network and obtaining the feature vector of each character through the feature fusion network includes steps 135-136:
  • Step 135: Obtain the mask parameter corresponding to each feature fusion network.
  • The mask parameter is a learnable parameter that is optimized during the training of the license plate character recognition model; that is, the mask parameter is introduced when the license plate character recognition model is trained and is optimized along with the training process, so that the optimized mask parameter can occlude the features of non-corresponding characters in the feature tensor.
  • Each feature fusion network corresponds to a mask parameter, and each mask parameter has the same size.
  • the size of the mask parameter is H'*W'*1, wherein the height and width of the mask parameter are equal to the height and width of the feature tensor, and the number of channels is 1.
  • the mask parameter is also a non-negative parameter.
  • Optionally, a truncation operation is performed on the mask parameter, in which negative numbers in the parameter are set to zero, to ensure that the mask parameter after each update is non-negative. It is understandable that, when the license plate character recognition model is trained, the mask parameters are updated at the same time as the model parameters.
  • Alternatively, softmax normalization is performed on the mask parameter. Softmax normalization refers to normalization through the softmax function, which limits the data to the range (0, 1), so that the mask parameter is likewise a non-negative parameter.
  • Step 136: Input the mask parameter and the feature tensor into the corresponding feature fusion network, so as to obtain the feature vector of the corresponding character through the feature fusion network.
  • the feature fusion network can focus only on the features of the corresponding characters in the feature tensor according to the mask parameters, and then output the corresponding feature vector.
  • The process by which the feature fusion network processes the mask parameter and the feature tensor is the same as the process of processing the spatial attention map and the feature tensor in the above embodiment: the mask parameter is automatically expanded along the channel dimension, the feature tensor and the expanded mask parameter are multiplied element-wise, and the resulting tensor is summed along the width dimension and the height dimension to obtain the feature vector of the corresponding character.
  • For the specific process, refer to the above embodiment.
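  • A minimal sketch of the learnable-mask variant, assuming PyTorch; for brevity the truncation is shown inside forward, whereas the text applies it when the mask parameters are updated during training:

```python
import torch
import torch.nn as nn

class MaskFuse(nn.Module):
    """Hypothetical sketch of one feature fusion network with a learnable mask."""

    def __init__(self, height, width):
        super().__init__()
        # One H'*W' mask per network, optimized together with the model.
        self.mask = nn.Parameter(torch.rand(1, 1, height, width))

    def forward(self, feat):
        # Truncation: negative values are set to zero so the mask stays non-negative.
        mask = self.mask.clamp(min=0)
        # Same processing as with the spatial attention map:
        # expand along the channel dimension, multiply element-wise, then sum.
        mul = feat * mask.expand_as(feat)
        return mul.sum(dim=(2, 3))  # feature vector of the corresponding character
```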
  • Step 140: Input each feature vector into the corresponding classifier, and use the classifier to obtain the character recognition result of each character; each classifier outputs one character recognition result.
  • the license plate character recognition model further includes a plurality of classifiers.
  • The number of classifiers is equal to the number of feature fusion networks, both being the target number.
  • each classifier corresponds to a character, which is used to determine the content of the character in the category space according to the feature vector, and then obtain the character recognition result.
  • the character recognition result refers to the specific content of the character.
  • the category space is a set of selectable contents of corresponding characters, which can be used as prior information.
  • For each character position, all of the content that can appear at that position in the license plate is collected to form the category space corresponding to the character, which is associated with the corresponding classifier.
  • For example, for the license plate of a car in mainland China, the first character is the Chinese abbreviation of a province (municipality, autonomous region); therefore, the set of Chinese abbreviations of provinces (municipalities, autonomous regions) can be used as the category space, with each Chinese abbreviation as an element in the category space. At this time, the category space has a total of 31 elements.
  • The second character is an uppercase English letter. Since the English letters I and O are not used in license plates of mainland China vehicles, the category space corresponding to the second character can consist of the remaining 24 uppercase English letters other than I and O.
  • the category space corresponding to the third character to the sixth character can include 10 Arabic numerals (numbers 0 to 9) and 24 uppercase English letters.
  • Optionally, the category space of the seventh character also includes the five Chinese characters "港" (Hong Kong), "澳" (Macao), "警" (police), "学" (driving school), and "挂" (trailer), so that the license plate character recognition model can recognize the license plates of Hong Kong vehicles, Macao vehicles, police vehicles, driving-school coach vehicles, and trailers passing through the mainland.
  • Optionally, the category space of the eighth character includes the 10 Arabic numerals, the 24 uppercase English letters, the five Chinese characters above, and "empty".
  • Here, "empty" means that the corresponding character does not exist, that is, the license plate to be recognized has 7 characters and the 8th character is absent.
  • the classifier selects the closest element in the category space as the character recognition result according to the feature vector.
  • the specific structure of the classifier can be set according to the actual situation.
  • the classifier includes a fully connected layer and an activation function layer as an example for description.
  • using the classifier to obtain the character recognition result of each of the characters includes steps 141-142:
  • Step 141: Determine the logits vector of the feature vector by using the fully connected layer of the classifier in combination with the corresponding category space, where each classifier corresponds to one category space.
  • The classifier implements classification through the softmax function, and the vector input to the softmax function during classification is a logits vector, which is obtained through the fully connected layer.
  • the number of fully connected layers may be one or more. In the embodiment, one fully connected layer is used as an example for description.
  • Specifically, the feature vector is input into the fully connected layer, and the fully connected layer determines, according to the feature vector, the logits value of each element in the category space being the specific content of the character.
  • The logits value can be understood as the value describing the probability that an element is the specific content of the character before the output of the fully connected layer passes through the softmax function; the logits values are then arranged to output the logits vector.
  • The logits vector is a 1*n vector, where the value of n is determined by the number of elements in the category space.
  • For example, the category space corresponding to the second character of a mainland China license plate contains 24 uppercase English letters (the letters I and O are not used in license plates), so the logits vector output by the fully connected layer of the corresponding classifier has size 1*24; that is, this classifier receives a 1*C" feature vector and outputs a 1*24 logits vector.
  • Step 142: Use the loss function of the classifier to predict the probability vector of the logits vector, and obtain the character recognition result of the corresponding character according to the probability vector.
  • Specifically, the loss function is the softmax function, and the 1*n logits vector is passed through the softmax function to output a 1*n vector whose values are the probabilities that the specific content of the character is the corresponding element in the category space.
  • In the embodiments, the vector output by the softmax function is recorded as the probability vector. The higher a value in the probability vector, the higher the probability that the element corresponding to that value in the category space is the character. Further, the element with the highest probability in the category space is selected, according to the probability vector, as the character recognition result of the corresponding character. It is understandable that the character recognition result can be empty; for example, if the current license plate has 7 characters in total and the license plate character recognition model can recognize 8 characters, the character recognition result corresponding to the 8th character is "empty".
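  • A minimal sketch of such a classifier, assuming PyTorch; the category list shown for the eighth character is illustrative and follows the description above:

```python
import torch
import torch.nn as nn

class CharClassifier(nn.Module):
    """One classifier per character position, with its own category space."""

    def __init__(self, feat_dim, category_space):
        super().__init__()
        self.category_space = category_space        # prior information for this position
        self.fc = nn.Linear(feat_dim, len(category_space))

    def forward(self, vec):
        logits = self.fc(vec)                       # 1*n logits vector
        probs = torch.softmax(logits, dim=-1)       # 1*n probability vector
        return self.category_space[probs.argmax().item()]

# E.g. a hypothetical space for the eighth character, with "" standing for "empty":
digits_letters = list("0123456789ABCDEFGHJKLMNPQRSTUVWXYZ")  # I and O excluded
clf8 = CharClassifier(512, digits_letters + ["港", "澳", "警", "学", "挂", ""])
```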
  • the character recognition results of each character are arranged in sequence as the license plate recognition result of the license plate to be recognized.
  • Each character in the license plate to be recognized corresponds to one feature fusion network and one classifier, which ensures a fixed-length character recognition result.
  • Moreover, since each classifier predicts only within its own category space, the difficulty of prediction is reduced and the prediction accuracy is improved.
  • Further, by adding the element "empty" to the category space, license plates with different numbers of characters can be recognized, such as 7-character and 8-character license plates, which increases the reusability of the technical solution without training an additional license plate character recognition model.
  • the license plate may include a single-layer license plate and a double-layer license plate.
  • the license plate shown in FIG. 1 is a single-layer license plate
  • the license plate shown in FIG. 4 is a double-layer license plate.
  • FIG. 4 is a schematic diagram of a double-layer character license plate provided by an embodiment of the present application. Referring to FIG. 4, the license plate has double-layer characters, including an upper layer and a lower layer. It should be noted that the first three characters of the lower layer in FIG. 4 have been blurred to avoid information leakage.
  • a single-layer license plate is recorded as a single-layer character license plate
  • a double-layer license plate is recorded as a double-layer character license plate.
  • For a double-layer character license plate, before step 130, the method further includes step 210:
  • Step 210: Perform feature selection operations on the feature tensor to obtain a first partition tensor and a second partition tensor of the feature tensor, where the first partition tensor corresponds to the upper-layer characters of the double-layer character license plate and the second partition tensor corresponds to the lower-layer characters of the double-layer character license plate.
  • It should be noted that, for a double-layer character license plate, the height of the feature tensor obtained through the backbone network should be greater than 1, to avoid confusing the features of the two layers of characters.
  • When the backbone network extracts the feature tensor of the license plate to be recognized, the position of each character's features in the feature tensor is similar to the position of the character in the license plate to be recognized; that is, the features of the upper-layer characters are at the top of the feature tensor, and the features of the lower-layer characters are at the bottom of the feature tensor.
  • Therefore, the feature tensor is divided into two areas: the top part corresponds to one partition, the feature tensor corresponding to the upper-layer characters of the double-layer character license plate, and the bottom part corresponds to another partition, the feature tensor corresponding to the lower-layer characters of the double-layer character license plate.
  • In the embodiments, the partition corresponding to the top part is recorded as the first partition tensor, and the partition corresponding to the bottom part is recorded as the second partition tensor.
  • the dividing line between the first partition tensor and the second partition tensor may be determined according to the height of the pixel coordinate area where the double-layer character license plate is located.
  • During the division, a rounding operation is performed to ensure that the heights of the first partition tensor and the second partition tensor are integers.
  • the implementation of the rounding operation can be set according to the actual situation.
  • For example, suppose the set parameter is a and the height of the feature tensor is H'.
  • The integer dividing line can be obtained as round(aH')+1, where round() means rounding to the nearest integer, that is, aH' is rounded and then 1 is added to obtain the integer dividing line; or as floor(aH')+1, where floor() means rounding down, that is, aH' is rounded down and then 1 is added to obtain the integer dividing line; or as ceil(aH'), where ceil() means rounding up, that is, aH' is rounded up to obtain the integer dividing line.
  • a feature selection operation is added to the license plate character recognition model, so as to slice the feature tensor through the feature selection operation, thereby obtaining the first partition tensor and the second partition tensor.
  • After the feature tensor is sliced through the feature selection operation, it is divided into an upper layer and a lower layer: the upper layer is the first partition tensor, the feature tensor corresponding to the characters in the first row of the license plate, and the lower layer is the second partition tensor, the feature tensor corresponding to the characters in the second row of the license plate.
  • a feature selection operation is performed on the feature tensor before the feature tensor is input into the feature fusion network.
  • the number of feature selection operations is equal to the number of feature fusion networks.
  • Part of the feature selection operations select the first partition tensor from the feature tensor and input it to the feature fusion networks corresponding to the upper-layer characters, and the other feature selection operations select the second partition tensor from the feature tensor and input it to the feature fusion networks corresponding to the lower-layer characters; the feature selection operations are performed before the feature tensor is input into the feature fusion networks.
  • Each feature selection operation may be preset to select the first partition tensor or the second partition tensor, and after slicing the feature tensor according to the dividing line, it outputs the corresponding partition tensor.
  • In practical applications, errors may cause the upper-layer characters to be positioned lower or the lower-layer characters to be positioned higher, so that the first partition tensor does not completely cover the features of the upper-layer characters or the second partition tensor does not completely cover the features of the lower-layer characters, thereby affecting the accuracy of subsequent processing.
  • To address this, an overlap between the first partition tensor and the second partition tensor is set.
  • The overlap means that some elements are the same in the first partition tensor and the second partition tensor; these elements are usually the feature pixels in the middle area between the upper-layer characters and the lower-layer characters. The height of the overlap can be set according to the actual situation. For example, if the height range of the first partition tensor is set to [0-3] and that of the second partition tensor to [2-6], the two partition tensors have overlapping elements at heights [2-3].
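  • A minimal sketch of the feature selection operation; the parameter a, the floor(aH')+1 dividing line, and the overlap of 1 are illustrative choices among the options described above:

```python
import math

def select_partitions(feat, a=0.5, overlap=1):
    # feat: N*C*H*W feature tensor; H must be greater than 1
    H = feat.shape[2]
    split = math.floor(a * H) + 1                    # integer dividing line, floor(aH')+1
    upper = feat[:, :, :min(split + overlap, H), :]  # first partition: upper-layer characters
    lower = feat[:, :, max(split - overlap, 0):, :]  # second partition: lower-layer characters
    return upper, lower
```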
  • In an embodiment, before the feature selection operation is performed, the method further includes: using the backbone network to downsample the feature tensor.
  • the manner and number of downsampling can be determined according to the height of the feature tensor. For example, when the height of the feature tensor is 48, a feature tensor with a height of 12 is obtained after downsampling twice by means of average pooling. For another example, a feature tensor with a height of 6 is obtained after downsampling three times by using the maximum pooling method. It can be understood that, at this time, the feature tensor output by the backbone network refers to the down-sampled feature tensor.
  • In this case, inputting the feature tensor into each feature fusion network includes: inputting the first partition tensor into each feature fusion network corresponding to the upper-layer characters, and inputting the second partition tensor into each feature fusion network corresponding to the lower-layer characters.
  • the feature fusion network is divided into a feature fusion network corresponding to upper-layer characters and a feature fusion network corresponding to lower-layer characters.
  • The feature fusion networks for the upper-layer characters obtain the feature vectors corresponding to the upper-layer characters, and the feature fusion networks for the lower-layer characters obtain the feature vectors corresponding to the lower-layer characters.
  • the number of feature fusion networks corresponding to the upper-layer characters is equal to the number of characters contained in the upper-layer characters.
  • For example, in the double-layer character license plate of a car in mainland China, there are 2 upper-layer characters; therefore, 2 feature fusion networks corresponding to the upper-layer characters are set to receive the first partition tensor. There are 5 lower-layer characters; therefore, 5 feature fusion networks corresponding to the lower-layer characters are set to receive the second partition tensor.
  • the recognition of the double-layer character license plate can be realized.
  • Moreover, the feature tensor is downsampled and then segmented through the feature selection operation, which ensures that the feature selection operation segments a feature tensor with a smaller height and reduces the difficulty of segmentation.
  • FIG. 5 is a schematic structural diagram of a license plate character recognition model provided by an embodiment of the application.
  • the license plate character recognition model shown in Figure 5 can recognize single-layer character license plates.
  • the license plate character recognition model includes a backbone network (Backbone), 8 feature fusion networks (FeatFuse1-FeatFuse8) and 8 classifiers (Classilier1-Classilier8). License plate, among them, the category space corresponding to the first classifier contains 31 elements, which are the abbreviations of provinces (except Taiwan province), municipalities directly under the Central Government and autonomous regions, and the category space corresponding to the second classifier contains 24 elements.
  • the category space corresponding to the third to seventh classifiers contains 34 elements, which are Arabic numerals 0-9 and 24 uppercase English letters except I and O.
  • the category space corresponding to the eighth classifier contains 35 elements; compared with the category spaces of the third to seventh classifiers, it additionally contains the Chinese character "empty" (空), so that the license plate character recognition model can recognize both 7-character and 8-character license plates.
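A hedged sketch of the per-position classifier heads described above (the 512-dimensional input restates the feature-vector size used in the examples below; the variable names are invented):

```python
import torch
import torch.nn as nn

# Category-space sizes of the 8 classifiers: province abbreviations (31), letters
# without I/O (24), five middle positions (34 each), and a final position with an
# extra "empty" class (35).
class_counts = [31, 24, 34, 34, 34, 34, 34, 35]

# Each classifier is one fully connected layer over a 1x512 feature vector, followed
# by softmax to obtain a probability vector over its own category space.
classifiers = nn.ModuleList([nn.Linear(512, n) for n in class_counts])

features = [torch.randn(1, 512) for _ in class_counts]  # dummy fusion-network outputs
predictions = [torch.softmax(clf(f), dim=1).argmax(dim=1)
               for clf, f in zip(classifiers, features)]  # index into each category space
```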
  • FIG. 6 is a target image provided by an embodiment of the application; the license plate to be recognized is "Jing P***27", with 7 characters in total. To avoid information leakage, the third to fifth characters of the license plate have been blurred. It can be understood that the license plate to be recognized in the target image has already been mapped to the set pixel coordinate area.
  • when the target image is input to the license plate character recognition model, a 1*32*100*3 tensor is input, that is, one target image with a height of 32, a width of 100 and 3 channels.
  • the backbone network processes this target image and outputs a 1*1*25*512 feature tensor.
  • FIG. 7 is an example diagram of the generation process of a spatial attention map provided by an embodiment of the present application.
  • referring to FIG. 7, the feature tensor has a height of 1, a width of 25 and 512 channels. The maximum pooling (MP) operation and the average pooling (AP) operation are performed on the feature tensor 13 respectively, yielding a maximum-pooled feature tensor 14 and an average-pooled feature tensor 15, each with a height of 1, a width of 25 and 1 channel. The two tensors are then concatenated (Cat) to obtain a fused feature tensor 16 with a height of 1, a width of 25 and 2 channels. After the convolution (Conv) and activation function (Act) operations, the spatial attention map 17 is obtained, with a height of 1, a width of 25 and 1 channel.
  • FIG. 8 is another example diagram of the generation process of a spatial attention map provided by an embodiment of the present application.
  • referring to FIG. 8, the maximum pooling (MP) and average pooling (AP) operations are performed on the feature tensor 13 respectively to obtain a maximum-pooled feature tensor 14 and an average-pooled feature tensor 15, each with a height of 1, a width of 25 and 1 channel. The two tensors are then summed element-wise (Sum) to obtain a fused feature tensor 18 with a height of 1, a width of 25 and 1 channel. After the convolution (Conv) and activation function (Act) operations, the spatial attention map 19 is obtained, with a height of 1, a width of 25 and 1 channel. Understandably, the convolution operation used here is optional.
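A minimal sketch of the two attention-map variants above (an editorial illustration, not the disclosed implementation; the NCHW layout and 1x1 convolution are assumptions, and in a trained model the convolution would be a learned module rather than created per call):

```python
import torch
import torch.nn as nn

def spatial_attention(x: torch.Tensor, fuse: str = "cat") -> torch.Tensor:
    """x: feature tensor of shape (N, C, H, W); returns a map of shape (N, 1, H, W)."""
    mp = x.amax(dim=1, keepdim=True)   # maximum pooling along the channel dimension
    ap = x.mean(dim=1, keepdim=True)   # average pooling along the channel dimension
    if fuse == "cat":                  # FIG. 7 variant: concatenate, then convolve to 1 channel
        fused = torch.cat([mp, ap], dim=1)
        conv = nn.Conv2d(2, 1, kernel_size=1)
    else:                              # FIG. 8 variant: element-wise sum (convolution optional)
        fused = mp + ap
        conv = nn.Conv2d(1, 1, kernel_size=1)
    return torch.sigmoid(conv(fused))  # non-negative activation, e.g. sigmoid

attn = spatial_attention(torch.randn(1, 512, 1, 25))  # -> shape (1, 1, 1, 25)
```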
  • afterwards, a 1×512 feature vector is obtained from the spatial attention map and the feature tensor (by expanding the attention map along the channel dimension, multiplying it element-wise with the feature tensor, and summing over the width and height dimensions).
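Continuing the sketch above (again illustrative, under the same assumptions), the attention-weighted pooling that produces the 1×512 vector could be:

```python
def fuse_features(x: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W) feature tensor; attn: (N, 1, H, W) spatial attention map."""
    weighted = x * attn.expand_as(x)  # expand along channels, multiply element-wise
    return weighted.sum(dim=(2, 3))   # sum over height and width -> (N, C)

vec = fuse_features(torch.randn(1, 512, 1, 25), attn)  # -> shape (1, 512), i.e. 1x512
```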
  • the 8 feature vectors are respectively input into the 8 classifiers, which output the corresponding character recognition results: the first classifier outputs "Jing", the second outputs "P" ... the seventh outputs "7", and the eighth outputs "empty".
  • FIG. 9 is another target image provided by an embodiment of the application; the license plate to be recognized is "Yue B***710", with 8 characters in total. To avoid information leakage, the third to fifth characters of the license plate have been blurred. It can be understood that the license plate to be recognized in the target image has already been mapped to the set pixel coordinate area.
  • when this target image is input to the license plate character recognition model, a 1*32*100*3 tensor is input, that is, one target image with a height of 32, a width of 100 and 3 channels.
  • the backbone network processes this target image and outputs a 1*1*25*512 feature tensor.
  • the feature tensor is input to the 8 feature fusion networks to obtain 8 feature vectors of size 1×512, which are then respectively input into the 8 classifiers to output the corresponding character recognition results.
  • the first classifier outputs "Yue", the second outputs "B" ... the seventh outputs "1", and the eighth outputs "0"; combining these character recognition results gives the final recognition result of the license plate.
  • FIG. 10 is a schematic structural diagram of another license plate character recognition model provided by an embodiment of the present application.
  • the license plate character recognition model shown in Figure 10 can recognize double-layer character license plates.
  • since a double-layer character license plate usually contains 7 characters, the license plate character recognition model includes a backbone network (Backbone), 7 feature fusion networks (FeatFuse1-FeatFuse7) and 7 classifiers (Classifier1-Classifier7), and feature selection operations are set in the model: the feature selection operation that outputs the first partition tensor is denoted FeatSelectA and is used by the first and second feature fusion networks, while the feature selection operation that outputs the second partition tensor is denoted FeatSelectB and is used by the third to seventh feature fusion networks.
  • the feature selection operations and the feature fusion networks are shown together in FIG. 10; this license plate character recognition model is used to recognize license plates of cars in mainland China.
  • the category space corresponding to the first classifier contains 31 elements, namely the abbreviations of the provinces (except Taiwan province), municipalities directly under the Central Government and autonomous regions; the category space corresponding to the second classifier contains 24 elements, namely the 24 uppercase English letters excluding I and O; the category spaces corresponding to the third to sixth classifiers contain 34 elements each, namely Arabic numerals 0-9 and the 24 uppercase English letters excluding I and O; and the category space corresponding to the seventh classifier contains 39 elements, including Arabic numerals 0-9, the 24 uppercase English letters excluding I and O, and the Chinese character "Gua" (挂, for trailers).
  • FIG. 11 is yet another target image provided by an embodiment of the application; the license plate to be recognized is "Jing A***3 Gua", with 7 characters in total. To avoid information leakage, the third to fifth characters of the license plate have been blurred. It can be understood that the license plate to be recognized in the target image has already been mapped to the set pixel coordinate area.
  • when this target image is input to the license plate character recognition model, a 1*48*100*3 tensor is input, that is, one target image with a height of 48, a width of 100 and 3 channels; the backbone network processes it and outputs a 1*6*25*512 feature tensor.
  • FIG. 12 is a schematic diagram of feature selection provided by an embodiment of the present application, illustrating the feature selection operation splitting the feature tensor.
  • referring to FIG. 12, according to the feature tensor height of 6 and the set parameter a, the feature tensor is split into a first partition tensor A of [0-a*6]*25*512 and a second partition tensor B of [a*6-h]*25*512, where h is 6. It can be understood that, for the processing procedure of the feature fusion networks, reference may be made to the above examples.
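A hedged sketch of this feature selection operation (the ceiling-based rounding rule and the one-row overlap are assumptions drawn from the embodiments above; NCHW layout):

```python
import math
import torch

def feature_select(x: torch.Tensor, a: float = 0.4, overlap: int = 1):
    """x: (N, C, H, W) feature tensor; returns the upper and lower partition tensors."""
    h = x.shape[2]
    cut = math.ceil(a * h)                      # integer dividing line, e.g. ceil(0.4*6) = 3
    upper = x[:, :, :cut + overlap, :]          # first partition tensor (upper-layer characters)
    lower = x[:, :, max(cut - overlap, 0):, :]  # second partition tensor, overlapping the first
    return upper, lower

A, B = feature_select(torch.randn(1, 512, 6, 25))  # covers heights [0-4) and [2-6)
```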
  • afterwards, the 7 feature vectors are respectively input into the 7 classifiers to output the corresponding character recognition results: the first classifier outputs "Jing", the second outputs "A" ... and the seventh outputs "Gua"; combining these results gives the final recognition result of the license plate.
  • FIG. 13 is a schematic structural diagram of a license plate character recognition device provided by an embodiment of the application.
  • referring to FIG. 13, the license plate character recognition device includes an image acquisition module 301, a feature tensor determination module 302, a feature vector determination module 303, and a recognition result determination module 304.
  • the image acquisition module 301 is used to acquire at least one target image, where the target image displays a license plate to be recognized that contains multiple characters;
  • the feature tensor determination module 302 is used to process the target image with the backbone network to obtain the feature tensor of the license plate to be recognized;
  • the feature vector determination module 303 is used to input the feature tensor into each feature fusion network and obtain the feature vector of each character through the feature fusion networks, each feature fusion network outputting one feature vector;
  • the recognition result determination module 304 is used to input each feature vector into the corresponding classifier and obtain the character recognition result of each character with the classifiers, each classifier outputting one character recognition result.
  • the feature vector determination module 303 includes: a tensor input unit for inputting the feature tensor into each feature fusion network; a pooling unit for performing the maximum pooling and average pooling operations on the feature tensor to obtain the maximum-pooled feature tensor and the average-pooled feature tensor, respectively; a fusion unit for obtaining the fused feature tensor from the maximum-pooled and average-pooled feature tensors; an attention map acquisition unit for constructing the spatial attention map from the fused feature tensor; and a vector determination unit for obtaining the feature vector of the corresponding character in the license plate to be recognized from the spatial attention map and the feature tensor.
  • the fusion unit is specifically used for: concatenating the maximum-pooled feature tensor and the average-pooled feature tensor to obtain the fused feature tensor; or summing the maximum-pooled feature tensor and the average-pooled feature tensor element-wise to obtain the fused feature tensor.
  • the vector determination unit includes: an automatic expansion subunit for automatically expanding the spatial attention map along the channel dimension to obtain an attention tensor map; a multiplication subunit for multiplying the feature tensor and the attention tensor map element-wise to obtain a multiplication tensor; and a summation subunit for summing the multiplication tensor along its width and height dimensions to obtain the feature vector of the corresponding character in the license plate to be recognized.
  • alternatively, the feature vector determination module 303 includes: a mask acquisition unit for acquiring the mask parameter corresponding to each feature fusion network; and a mask input unit for inputting the mask parameter and the feature tensor into the corresponding feature fusion network to obtain the feature vector of the corresponding character through the feature fusion network.
  • the mask parameter is a non-negative parameter.
  • the recognition result determination module 304 includes: a vector input unit for respectively inputting each feature vector into the corresponding classifier; a logits vector determination unit for determining the logits vector of the feature vector using the fully connected layer of the classifier in combination with the corresponding category space, each classifier corresponding to one category space; and a probability vector determination unit for predicting the probability vector of the logits vector using the loss function of the classifier and obtaining the character recognition result of the corresponding character from the probability vector.
  • when the license plate to be recognized is a double-layer character license plate, the device further includes: a partition module 305, used to perform the feature selection operations on the feature tensor before the feature tensor is input into the feature fusion networks, so as to obtain the first partition tensor and the second partition tensor of the feature tensor respectively, the first partition tensor corresponding to the upper-layer characters of the double-layer character license plate and the second partition tensor corresponding to its lower-layer characters.
  • correspondingly, the feature vector determination module 303 can be used to input the first partition tensor into each feature fusion network corresponding to the upper-layer characters and the second partition tensor into each feature fusion network corresponding to the lower-layer characters, obtaining the feature vector of each character through the feature fusion networks, each feature fusion network outputting one feature vector.
  • the feature tensor determination module 302 is further configured to perform down-sampling processing on the feature tensor by using the backbone network.
  • the width of the feature tensor is greater than or equal to the target number, and the target number is the total number of recognizable characters.
  • the device further includes: a mapping module, configured to map the license plate to be recognized in the target image to the set pixel coordinate area after the at least one target image is acquired.
  • the license plate character recognition device provided above can be used to execute the license plate character recognition method provided by any of the above embodiments, and has corresponding functions and beneficial effects.
  • it is worth noting that, in the above embodiment of the license plate character recognition device, the included units and modules are divided only according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for ease of mutual distinction and are not used to limit the protection scope of the present application.
  • FIG. 14 is a schematic structural diagram of a license plate character recognition device according to an embodiment of the present application.
  • the license plate character recognition device includes a processor 40, a memory 41, an input device 42, an output device 43, a photographing device 44 and a moving device 45; the number of processors 40 in the device may be one or more, and one processor 40 is taken as an example in FIG. 14.
  • the processor 40, the memory 41, the input device 42, the output device 43, the photographing device 44 and the moving device 45 in the license plate character recognition device can be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 14.
  • as a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the license plate character recognition method in the embodiments of the present application (for example, the image acquisition module 301, the feature tensor determination module 302, the feature vector determination module 303 and the recognition result determination module 304 in the license plate character recognition device).
  • the processor 40 executes the various functional applications and data processing of the license plate character recognition device by running the software programs, instructions and modules stored in the memory 41, i.e., realizes the above license plate character recognition method.
  • the memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the license plate character recognition device, and the like. Additionally, memory 41 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some instances, memory 41 may further include memory located remotely from processor 40, and these remote memories may be connected to the license plate character recognition device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 42 may be used to receive input numerical or character information, and to generate key signal input related to user settings and function control of the license plate character recognition device.
  • the output device 43 may include a display device such as a display screen.
  • the photographing device 44 is used for photographing the target image, and the moving device 45 is used for controlling the license plate character recognition device to move.
  • the license plate character recognition device may also include communication means for data communication with other devices.
  • the above-mentioned license plate character recognition device contains the license plate character recognition apparatus, and can be used to execute any license plate character recognition method, with the corresponding functions and beneficial effects.
  • embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the relevant operations of the license plate character recognition method provided by any embodiment of the present application, with the corresponding functions and beneficial effects.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product.
  • the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction means that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • as defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.


Abstract

A license plate character recognition method, apparatus, device and storage medium, the method comprising: acquiring at least one target image, the target image displaying a license plate to be recognized that contains multiple characters (110); processing the target image with a backbone network to obtain a feature tensor of the license plate to be recognized (120); inputting the feature tensor into each feature fusion network and obtaining a feature vector of each character through the feature fusion networks, each feature fusion network outputting one feature vector (130); and inputting each feature vector into a corresponding classifier and obtaining a character recognition result of each character with the classifiers, each classifier outputting one character recognition result (140). The above method can solve the technical problem in the related art that license plate characters cannot be accurately recognized when the license plate image quality is low.

Description

License plate character recognition method, apparatus, device and storage medium — Technical Field
The embodiments of the present application relate to the technical field of image processing, and in particular to a license plate character recognition method, apparatus, device and storage medium.
Background
A license plate, also called a vehicle number plate, is the plate carried by a vehicle. License plate recognition refers to extracting and recognizing a moving license plate from a complex background, and the technology has been widely applied in many fields, such as highway checkpoint detection in the transportation field and parking lot entrance/exit detection in the security field. In these scenarios, the photographing device is installed at a fixed position and equipped with a corresponding photographing trigger mechanism (such as a ground-sensing trigger mechanism) to photograph the license plate; a character-segmentation-based recognition method then processes the captured image: the plate is segmented by character into multiple sub-images, each containing one character, and each sub-image is classified to determine its corresponding character.
Although the above method can realize license plate recognition, it requires the photographing device to be fixed so that a clear license plate image is obtained. Once the photographing device becomes mobile (for example, mounted on a movable robot), the captured image quality degrades: images containing the plate suffer from severe motion and jitter blur, angular offset (the plate is not photographed head-on), complex illumination and similar problems. For example, FIG. 1 is a prior-art license plate image captured with a fixed device (the first four characters "Jing", "N", "Q", "G" show the normal effect; the later characters are blurred to avoid information leakage), while FIG. 2 is another prior-art image captured with a moving device, and the plate in FIG. 2 is obviously blurred by comparison. In this case the character-segmentation-based method can hardly perform accurate segmentation, the segmentation precision drops, and the subsequent character recognition process cannot be guaranteed.
In summary, how to accurately recognize license plate characters when the license plate image quality is low has become a technical problem that urgently needs to be solved.
Summary
The embodiments of the present application provide a license plate character recognition method, apparatus, device and storage medium, to solve the technical problem in the related art that license plate characters cannot be accurately recognized when the license plate image quality is low.
In a first aspect, an embodiment of the present application provides a license plate character recognition method, including:
acquiring at least one target image, the target image displaying a license plate to be recognized, the license plate to be recognized containing multiple characters;
processing the target image with a backbone network to obtain a feature tensor of the license plate to be recognized;
inputting the feature tensor into each feature fusion network, and obtaining a feature vector of each of the characters through the feature fusion networks, each feature fusion network outputting one feature vector;
inputting each feature vector into a corresponding classifier, and obtaining a character recognition result of each of the characters with the classifiers, each classifier outputting one character recognition result.
In a second aspect, an embodiment of the present application further provides a license plate character recognition apparatus, including:
an image acquisition module, configured to acquire at least one target image, the target image displaying a license plate to be recognized that contains multiple characters;
a feature tensor determination module, configured to process the target image with a backbone network to obtain a feature tensor of the license plate to be recognized;
a feature vector determination module, configured to input the feature tensor into each feature fusion network and obtain a feature vector of each of the characters through the feature fusion networks, each feature fusion network outputting one feature vector;
a recognition result determination module, configured to input each feature vector into a corresponding classifier and obtain a character recognition result of each of the characters with the classifiers, each classifier outputting one character recognition result.
In a third aspect, an embodiment of the present application further provides a license plate character recognition device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the license plate character recognition method of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the license plate character recognition method of the first aspect.
With the above license plate character recognition method, apparatus, device and storage medium, a target image containing the license plate to be recognized is acquired, a backbone network produces the plate's feature tensor, feature fusion networks produce the feature vector of each character, and classifiers produce the character recognition results, thereby solving the technical problem that license plate characters cannot be accurately recognized when the image quality is low. This technique involves no character segmentation: each character is predicted directly, which removes the dependence on character segmentation and hence reduces the dependence on image quality. Moreover, the structures of the backbone network, the feature fusion networks and the classifiers are only loosely constrained and can be adjusted to the actual situation, making the solution highly flexible.
Brief Description of the Drawings
FIG. 1 is a license plate image provided in the prior art;
FIG. 2 is another license plate image provided in the prior art;
FIG. 3 is a flowchart of a license plate character recognition method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a double-layer character license plate provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a license plate character recognition model provided by an embodiment of the present application;
FIG. 6 is a target image provided by an embodiment of the present application;
FIG. 7 is an example diagram of a spatial attention map generation process provided by an embodiment of the present application;
FIG. 8 is another example diagram of a spatial attention map generation process provided by an embodiment of the present application;
FIG. 9 is another target image provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another license plate character recognition model provided by an embodiment of the present application;
FIG. 11 is yet another target image provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of feature selection provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a license plate character recognition apparatus provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a license plate character recognition device provided by an embodiment of the present application.
Detailed Description
The present application is further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are used to explain the present application rather than to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
CTC (Connectionist temporal classification) is a sequence classification algorithm that can be trained without labels being aligned one-to-one in time, reducing the tedious work of pre-delimiting labels; it is widely used in speech recognition and optical character recognition. In some techniques, CTC is applied to license plate recognition with a deep neural network model, whose training process may be: collect a large number of images containing license plates; determine the coordinates of the plate's bounding box and key points (such as the plate's corner points) in each image, for example the bounding box 11 and key points 12 shown in FIG. 2; apply an affine or perspective transform to the plate according to these coordinates to align it, i.e. convert the plate in FIG. 2 into a regular rectangle; and train the deep neural network model, which consists of a backbone network that extracts the plate's feature tensor and a head network that computes the loss function from it, on the aligned images. After training stabilizes, the backbone network is retained for application: an image whose plate needs recognizing is input to the backbone network to extract the feature tensor, which is then decoded to predict each character and obtain the plate's recognition result.
However, the length of a CTC output sequence is uncertain, which makes the recognition result imprecise: the number of characters it outputs may be longer or shorter than the number of characters on the plate. In addition, CTC cannot exploit the prior information of the license plate. Taking mainland China car plates as an example, the first character is the one-character Chinese abbreviation of a province (municipality, autonomous region), so its character category could be the set of such abbreviations, and the second character is an uppercase English letter, so its category could be the set of uppercase letters; with CTC, however, every character shares the same category, including the abbreviation set, the uppercase letter set, the digit set 0-9 and so on, so during decoding CTC must predict within this larger category, which increases the prediction difficulty. Meanwhile, CTC only supports prediction of a sequence in a single direction and cannot recognize double-layer plates (plates with an upper and a lower row of characters).
In summary, the embodiments of the present application provide a license plate character recognition method that accurately recognizes the plate in an image while the photographing device is moving, obtains a fixed-length recognition result, can also recognize double-layer plates, and has low prediction difficulty, making it easy to implement.
The license plate character recognition method provided by the embodiments of the present application may be executed by a license plate character recognition device, which may be implemented in software and/or hardware and may be formed by two or more physical entities or by one physical entity.
In one embodiment, the license plate character recognition device is described as a robot equipped with a moving device that gives it mobility and a photographing device that lets it capture images; the embodiments do not limit the specific structures or working modes of the moving and photographing devices. It can be understood that the device may also be a mobile phone, a computer and the like, and may lack a moving device and/or a photographing device; when it has no photographing device, it can acquire and process images captured by an external photographing device.
FIG. 3 is a flowchart of a license plate character recognition method provided by an embodiment of the present application. Referring to FIG. 3, the method specifically includes:
Step 110: acquire at least one target image, the target image displaying a license plate to be recognized that contains multiple characters.
Specifically, the license plate to be recognized is the plate that currently needs recognition and contains multiple characters; a mainland China car plate, for example, contains 7 characters (an ordinary plate) or 8 characters (a new-energy plate). In the embodiments, the image containing the plate is called the target image. It is captured by the photographing device; the embodiments do not limit the triggering mechanism, e.g. capture may be triggered when an object other than the ground and walls is detected within the shooting range. There may be one or more target images; one target image is taken as the example. Optionally the captured target images may have the same or different resolutions; the same resolution, and hence a fixed size, is assumed in the description.
Exemplarily, when an image captured by the photographing device is obtained, license plate detection is first performed on it; if it contains a plate, the image is determined to be the target image and the plate it contains to be the license plate to be recognized. The embodiments do not limit the detection method: for example, edge detection may detect whether the image contains the edges of a plate, or color segmentation may detect whether it contains the colors of a plate.
Optionally, when the photographing device moves, the plate in the captured target image may suffer from motion and jitter blur. To avoid affecting subsequent processing, the plate in the target image is aligned so that it lies within a rectangular pixel region; after this step the method therefore further includes: mapping the license plate to be recognized in the target image to a set pixel coordinate area. In one embodiment, when a plate is detected, the coordinates of its bounding box (the rectangular frame containing the plate, in pixel coordinates of the target image) and of its key points are output, the key points being the top-left, bottom-left, top-right and bottom-right vertices of the plate inside the bounding box, for example the four key points shown in FIG. 2. An affine or perspective transform is then applied to the plate's pixels according to these coordinates. The transformed plate has a uniform size: the pixel coordinates of its four key points are fixed, and the region between them serves as the plate's pixel coordinate area, which is optionally rectangular. For example, the transformed key points may be at (3, 9), (3, 91), (29, 91) and (29, 9), with the area's height and width being 32 and 100. Exemplarily, an affine transform is a linear transform from 2D coordinates to 2D coordinates, including translation, rotation, scaling, shearing and flipping; a perspective transform uses the collinearity of the perspective center, image point and target point to rotate the image plane around the trace line by a certain angle according to the law of perspective rotation while keeping the projected geometry unchanged. The transform parameters (such as translation and rotation parameters) are determined from the key-point coordinates before and after the transform together with the bounding box coordinates and the pixel coordinate area, and each pixel inside the bounding box is then transformed accordingly to map the plate to the set pixel coordinate area. Note that all target images used subsequently are images whose plate has been mapped to the set pixel coordinate area, so the plate in the target image has a fixed size.
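As an illustrative sketch only (OpenCV is not named in the original; the source corner coordinates and the image path are hypothetical, while the destination corners restate the (3, 9)/(3, 91)/(29, 91)/(29, 9) example above), the perspective alignment could look like:

```python
import cv2
import numpy as np

# Hypothetical detected key points of the plate, in OpenCV (x, y) = (column, row)
# order: top-left, bottom-left, bottom-right, top-right.
src = np.float32([[152, 210], [148, 265], [455, 272], [450, 215]])
# Fixed destination corners of the set 32x100 pixel coordinate area, same order;
# the example's (row, col) pairs (3, 9), (3, 91), (29, 91), (29, 9) become (col, row).
dst = np.float32([[9, 3], [9, 29], [91, 29], [91, 3]])

image = cv2.imread("target.jpg")                    # hypothetical target image
M = cv2.getPerspectiveTransform(src, dst)           # 3x3 homography from 4 point pairs
aligned = cv2.warpPerspective(image, M, (100, 32))  # dsize is (width, height)
```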
Step 120: process the target image with the backbone network to obtain the feature tensor of the license plate to be recognized.
In one embodiment, a license plate character recognition model, a trained neural network model whose training process the embodiments do not limit, realizes the character recognition. The model includes a backbone network (Backbone) that identifies the features of the plate in the target image and outputs a feature tensor; its specific structure is not limited and may be, for example, a convolutional neural network (CNN), a residual network (ResNet) or a lightweight network for mobile devices (MobileNet). After the target image is input, the backbone network extracts the plate's features, including color, texture, shape and spatial-relationship features, and outputs the feature tensor, i.e. high-dimensional data representing those features; each value in the tensor can be considered an element, and an element can represent one feature pixel (a pixel describing a feature).
In one embodiment, a four-dimensional tensor N*H*W*C is input to the backbone network, where N is the number of target image samples, H and W the pixel height and width (the numbers of pixels in the vertical and horizontal directions), and C the number of channels determined by the image's color standard; here the target image is an RGB image (colors obtained by varying and superimposing the red (R), green (G) and blue (B) channels), so C is 3. The backbone network then outputs a four-dimensional feature tensor N'*H'*W'*C', where N' corresponds to N and H', W', C' are the feature tensor's height, width and channel count, whose specific values are determined during model training. W' is set to be greater than or equal to the maximum number of characters a plate can contain, to facilitate the later classifiers; for example, a mainland China car plate contains at most 8 characters, so W' ≥ 8 ensures each character can be recognized accurately. Taking that maximum as the total number of currently recognizable characters (e.g. a maximum of 8 means at most 8 characters can be recognized at once), the embodiments set the width of the feature tensor to be greater than or equal to the target number, the target number being the total number of recognizable characters.
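The backbone's exact architecture is left open by the text (CNN, ResNet and MobileNet are named as options); purely as a shape illustration under that freedom, a toy stand-in mapping a 1*32*100*3 image to a 1*1*25*512 feature tensor (PyTorch uses NCHW internally) might be:

```python
import torch
import torch.nn as nn

# Toy stand-in for the backbone, NOT the disclosed architecture: five strided
# convolutions reduce height 32 -> 1 while width 100 -> 25 and channels 3 -> 512.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),         # 32x100 -> 16x50
    nn.Conv2d(64, 128, 3, stride=(2, 1), padding=1), nn.ReLU(),  # 16x50 -> 8x50
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),      # 8x50  -> 4x25
    nn.Conv2d(256, 512, 3, stride=(2, 1), padding=1), nn.ReLU(), # 4x25  -> 2x25
    nn.Conv2d(512, 512, 3, stride=(2, 1), padding=1),            # 2x25  -> 1x25
)
x = torch.randn(1, 3, 32, 100)  # the text lists shapes as N*H*W*C; NCHW is used here
print(backbone(x).shape)        # torch.Size([1, 512, 1, 25])
```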
Step 130: input the feature tensor into each feature fusion network, and obtain the feature vector of each character through the feature fusion networks, each feature fusion network outputting one feature vector.
Exemplarily, the model further includes multiple feature fusion networks that fuse the features contained in the feature tensor to obtain the feature vector of the corresponding character; the dimensionality of the feature vector is smaller than that of the feature tensor. For example, after an N'*H'*W'*C' feature tensor is input, the fusion network outputs an H''*W''*C'' feature vector; in one embodiment, H'' and W'' are 1 and C'' equals C'.
In one embodiment, the number of feature fusion networks equals the target number, and each fusion network outputs the feature vector of one character, i.e. each feature vector represents the features of the corresponding character. For example, when the target number is 8, the feature tensor is input to 8 fusion networks, which output 8 feature vectors. Note that the target number is the total number of recognizable characters, not the number of characters on the plate: if the target number is 8 and the plate contains 7 characters, 8 feature vectors are still output, and the vector corresponding to the 8th character will later be classified as "empty".
The specific structure of the feature fusion network can be set according to the actual situation. In one embodiment it uses a spatial attention mechanism, through which the features deserving focus can be identified in the feature tensor, namely the features describing the corresponding character, and the corresponding feature vector is output; optionally, a convolutional block attention module (CBAM) is used as the feature fusion network to implement the mechanism. With spatial attention, obtaining the feature vector of each character through the feature fusion network includes steps 131-134:
Step 131: perform a maximum pooling operation and an average pooling operation on the feature tensor respectively, to obtain a maximum-pooled feature tensor and an average-pooled feature tensor.
Pooling is a process of abstracting information, achieving dimensionality reduction, down-sampling, removal of redundancy and feature compression. Exemplarily, maximum pooling takes the largest value in each local receptive field: the kernel size is determined, the feature tensor is divided into sub-regions accordingly, each sub-region being one local receptive field, and the maximum of each sub-region represents it; the maxima form a new tensor, recorded as the maximum-pooled feature tensor, which preserves the texture features of the feature tensor. Further, average pooling takes the mean of all values in each local receptive field; it is analogous to maximum pooling except that the mean of each sub-region represents it, and the means form a new tensor, recorded as the average-pooled feature tensor, which preserves the overall features. Optionally, the two operations use the same kernel size. In one embodiment, both pooled tensors have 1 channel.
Step 132: obtain a fused feature tensor from the maximum-pooled feature tensor and the average-pooled feature tensor.
The maximum-pooled and average-pooled feature tensors are fused so that both the texture features and the overall features are referenced in subsequent processing; the fusion still yields one tensor, recorded as the fused feature tensor.
The fusion manner can be set according to the actual situation; the embodiments describe two schemes, concatenation and summation:
Scheme 1: concatenate the maximum-pooled feature tensor and the average-pooled feature tensor to obtain the fused feature tensor.
Concatenation joins features: the two tensors are concatenated along the channel dimension, so two 1-channel tensors yield a 2-channel fused feature tensor whose height and width match those of the pooled tensors.
Scheme 2: sum the maximum-pooled feature tensor and the average-pooled feature tensor element-wise to obtain the fused feature tensor.
Element-wise operations compute independently on each element; element-wise summation adds the elements at the same positions of the two pooled tensors, which have the same size and hence correspond one-to-one. For example, if an element of the maximum-pooled tensor is 2 and the element at the same position of the average-pooled tensor is 1, the fused tensor's element at that position is 2+1=3. It can be understood that the fused, maximum-pooled and average-pooled feature tensors then have the same channel count, height and width, so two 1-channel tensors yield a 1-channel fused feature tensor.
Step 133: construct a spatial attention map from the fused feature tensor.
Exemplarily, the spatial attention map is two-dimensional data indicating which features in the feature tensor deserve local focus, those being the features of the corresponding character. In one embodiment it is obtained by element-wise activation of the fused feature tensor. In one embodiment, a convolution is first applied to the fused tensor to obtain a 1-channel fused tensor, i.e. a convolutional layer with 1 output channel processes the fused tensor, and the result is then activated element-wise to obtain the spatial attention map; when the fused tensor already has 1 channel, the convolution can be omitted and the element-wise activation applied directly.
In one embodiment, element-wise activation means passing every element of the fused feature tensor through an activation function, whose type can be set according to the actual situation provided its outputs are non-negative (i.e. all values of the spatial attention map are non-negative), e.g. the sigmoid function used in CBAM or 1+tanh. The values of the attention map can represent the weights of the features in the corresponding regions of the feature tensor, and non-negative values guarantee that the subsequent feature weighting has a clear physical meaning.
Step 134: obtain the feature vector of the corresponding character in the license plate to be recognized from the spatial attention map and the feature tensor.
According to the spatial attention map, the features of the feature tensor that deserve focus are determined, and the corresponding character's feature vector is then obtained from them.
In one embodiment, this step 134 may include steps 1341-1343:
Step 1341: automatically expand the spatial attention map along the channel dimension to obtain an attention tensor map.
Exemplarily, broadcasting the attention map along the channel dimension yields an attention map with more than 1 channel, recorded as the attention tensor map; the expansion can be understood as copying the 2D attention map along the channel dimension, each copy counting as one expansion. In one embodiment the number of expansions equals the channel count of the feature tensor, so that the attention tensor map's channel count equals the feature tensor's, facilitating the subsequent element-wise multiplication; after expansion, only the channel count changes. For example, an h*w*1 attention map (h and w being its height and width, 1 its channel count) expanded c times along the channel dimension becomes an h*w*c attention tensor map with h and w unchanged and c > 1.
Step 1342: multiply the feature tensor and the attention tensor map element-wise to obtain a multiplication tensor.
Element-wise multiplication uses the attention tensor map to multiply every element of the feature tensor. Exemplarily, each element of the attention tensor map corresponds to one or more elements of the feature tensor and represents their weight, so element-wise multiplication, which can also be viewed as weighting the features, multiplies each element of the feature tensor by the corresponding element of the attention tensor map; the result, recorded as the multiplication tensor, has the same size as the feature tensor. For a 4D feature tensor X of size n*h*w*c and an equally sized expanded attention tensor map W, the multiplication tensor M satisfies M[i,j,k,l] = X[i,j,k,l] * W[i,j,k,l], where X[i,j,k,l], W[i,j,k,l] and M[i,j,k,l] denote the elements at position (i,j,k,l) and the indices i, j, k, l range over n, h, w, c respectively, i.e. elements at the same position of the feature tensor and the attention tensor map are multiplied.
Step 1343: sum the multiplication tensor along its width and height dimensions to obtain the feature vector of the corresponding character in the license plate to be recognized.
Exemplarily, summing along the width and height dimensions adds, within each channel, the elements at the same width and then the sums at each height, i.e. all elements within one channel are added, yielding a vector of height and width 1 with unchanged channel count, recorded as the feature vector of the corresponding character. For a multiplication tensor of size h'*w'*c' (height, width and channel count), summing along the height and width dimensions gives a 1*1*c' (i.e. 1*c') feature vector.
Optionally, if the feature vector's channel count is too large, a fully connected layer, which can synthesize features, may reduce its dimensionality; alternatively, since the feature tensor and the feature vector have equal channel counts, the backbone network can be adjusted during training to output a feature tensor with fewer channels, yielding a feature vector with fewer channels.
Note that when the feature vector is obtained in the above manner, the feature fusion network may include a maximum pooling layer, an average pooling layer, a feature fusion layer (for concatenation or element-wise addition), a convolutional layer (optional; it may be omitted when the fused tensor has 1 channel), an activation function layer (for the attention map), an automatic expansion part, an element-wise multiplication part and a summation part (along the width and height dimensions). After each fusion network performs the above steps, the feature vectors of the characters are obtained respectively.
In another embodiment, besides the spatial attention mechanism, the feature fusion network may use a learnable mask mechanism, which introduces a mask parameter (mask) whose role is similar to the spatial attention map's: it occludes the features in the feature tensor that do not belong to the corresponding character, so that the fusion network attends only to that character's features and obtains the feature vector. With the learnable mask mechanism, inputting the feature tensor into the fusion networks and obtaining each character's feature vector includes steps 135-136:
Step 135: acquire the mask parameter corresponding to each feature fusion network.
The mask parameter is a learnable parameter optimized during the training of the license plate character recognition model, i.e. it is introduced at training time and optimized along with the model's training, so that the optimized mask can occlude the features of non-corresponding characters. Each fusion network corresponds to one mask parameter, and all mask parameters have the same size; in one embodiment the size is H'*W'*1, i.e. the mask's height and width equal the feature tensor's and its channel count is 1. After training, the mask parameter of each fusion network is recorded so it can be fetched directly in this step.
In one embodiment, like the spatial attention map, the mask parameter is a non-negative parameter. Optionally, to guarantee this, a truncation operation is applied each time the mask is updated during training, setting its negative values to zero, so that every updated mask is non-negative (the mask is updated whenever the model parameters are). Alternatively, softmax normalization is applied when fetching each mask parameter: normalization confines data to a certain range, and the softmax function confines it to (0, 1), so processing the mask with softmax makes it non-negative.
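Both non-negativity schemes can be sketched directly (an editorial illustration, not the original implementation; the parameter shape follows the H'*W'*1 description, here 1*25*1):

```python
import torch

# Learnable mask for one feature fusion network, shape H' x W' x 1 = 1 x 25 x 1.
mask = torch.nn.Parameter(torch.rand(1, 25, 1))

# Scheme 1: truncation after each parameter update - zero out the negative entries.
with torch.no_grad():
    mask.clamp_(min=0.0)

# Scheme 2: softmax normalization when the mask is fetched, which maps every entry
# into (0, 1) and therefore also guarantees non-negativity.
normalized = torch.softmax(mask.flatten(), dim=0).reshape(mask.shape)
```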
Step 136: input the mask parameter and the feature tensor into the corresponding feature fusion network to obtain the feature vector of the corresponding character through the feature fusion network.
Specifically, after the mask parameter and the feature tensor are input, the fusion network can, according to the mask, attend only to the corresponding character's features in the feature tensor and output the corresponding feature vector. The processing of the mask parameter and feature tensor is the same as the processing of the spatial attention map and feature tensor in the above embodiment: the mask is first automatically expanded along the channel dimension, the feature tensor and the expanded mask are multiplied element-wise, and the resulting tensor is summed along the width and height dimensions to obtain the corresponding character's feature vector; see the above embodiment for the detailed process.
After each feature fusion network processes in the above manner, the feature vector of the corresponding character is obtained.
Step 140: input each feature vector into the corresponding classifier, and obtain the character recognition result of each character with the classifiers, each classifier outputting one character recognition result.
Exemplarily, the model further includes multiple classifiers, equal in number to the feature fusion networks (both being the target number); each classifier corresponds to one character and determines the character's content within a category space according to the feature vector, producing the character recognition result, i.e. the specific content of the character. It can be understood that the category space is the set of contents the corresponding character can take, which can serve as prior information; optionally, all contents that can appear at the current position of the plate form that character's category space and are associated with the corresponding classifier. Taking mainland China car plates as an example: the first character is the one-character Chinese abbreviation of a province (municipality, autonomous region), so the set of abbreviations forms its category space, each abbreviation being one element, for 31 elements in total; the second character is an uppercase English letter, and since the letters I and O are not used on mainland car plates, its category space consists of the remaining 24 uppercase letters; the category spaces of the third to sixth characters may each include the 10 Arabic numerals (0 to 9) and the 24 uppercase letters; the seventh character's category space additionally includes the five Chinese characters "Gang" (港), "Ao" (澳), "Jing" (警), "Xue" (学) and "Gua" (挂), so that the model can recognize the plates of Hong Kong and Macao cars travelling on the mainland, police cars, driving-school training cars and trailers; and the eighth character's category space includes the 10 numerals, the 24 uppercase letters, those five characters and "empty" (空), where "empty" means the corresponding character does not exist, i.e. the plate has 7 characters and no 8th character.
Afterwards, the classifier selects the closest element in the category space according to the feature vector as the character recognition result.
Exemplarily, the classifier's specific structure can be set according to the actual situation; the embodiments describe a classifier consisting of a fully connected layer and an activation function layer. Obtaining the character recognition results with the classifiers then includes steps 141-142:
Step 141: determine the logits vector of the feature vector using the classifier's fully connected layer in combination with the corresponding category space, each classifier corresponding to one category space.
In the embodiments, the classifier performs classification via the softmax function, whose input is a logits vector obtained through one or more fully connected layers (one layer is taken as the example). Exemplarily, the feature vector is input to the fully connected layer, which determines for each element of the category space a logits value, understood as the pre-softmax value describing the probability that the element is the character's specific content; the logits values are arranged to output the logits vector, a 1*n vector where n is the number of elements in the category space. For example, the category space of the second character of a mainland plate contains 24 uppercase letters (I and O are not used on plates), so that classifier's fully connected layer outputs a 1*24 logits vector, i.e. it receives a 1*C'' feature vector and outputs a 1*24 logits vector.
Step 142: predict the probability vector of the logits vector using the classifier's loss function, and obtain the character recognition result of the corresponding character from the probability vector.
The loss function is the softmax function: the 1*n logits vector passes through softmax to output a 1*n vector whose entries represent the probabilities that the character's specific content is the corresponding element of the category space, recorded as the probability vector; the higher a value, the greater the probability that the corresponding element is the character. The element with the highest probability in the category space is then selected as the character recognition result, which may be "empty": for example, if the plate has 7 characters and the model can recognize 8, the 8th character's result is "empty".
Exemplarily, the character recognition results are arranged in order to form the license plate recognition result of the plate to be recognized.
As described above, by acquiring a target image containing the plate to be recognized, obtaining the plate's feature tensor with the backbone network, obtaining each character's feature vector with the feature fusion networks, and obtaining each character's recognition result with the classifiers, the technical problem that plate characters cannot be accurately recognized when image quality is low is solved. The technique involves no character segmentation but predicts each character directly, avoiding the dependence on segmentation. The structures of the backbone, fusion networks and classifiers are only loosely constrained and adjustable, making the scheme flexible. No CTC is introduced, which improves recognition efficiency and eases deployment. One fusion network and one classifier per character guarantee a fixed-length recognition result. Setting a category space for each character position reduces prediction difficulty and improves accuracy. And adding the "empty" character to a category space lets one model recognize plates with different character counts, e.g. both 7- and 8-character plates, increasing the reusability of the technical solution without training an extra model.
On the basis of the above embodiments, plates include single-layer and double-layer plates: the plate in FIG. 1 is single-layer, while FIG. 4 is a schematic diagram of a double-layer character license plate provided by an embodiment of the present application, with an upper and a lower row (the first 3 characters of the lower row in FIG. 4 are blurred to avoid information leakage). In the embodiments, the former is recorded as a single-layer character plate and the latter as a double-layer character plate. For a double-layer plate, to prevent the features of the two rows from affecting the accuracy of subsequent recognition, when the plate to be recognized is a double-layer character plate the method further includes, before step 130, step 210:
Step 210: perform feature selection operations on the feature tensor to obtain its first partition tensor and second partition tensor respectively, the first partition tensor corresponding to the upper-layer characters of the double-layer plate and the second to its lower-layer characters.
It can be understood that for a double-layer plate the height of the backbone's feature tensor should exceed 1, to avoid confusion between the two rows' features. The position of each character's features in the feature tensor then resembles the character's position on the plate: the upper-layer features lie at the top of the tensor and the lower-layer features at the bottom. Accordingly, the tensor is divided into two partitions, the top one being the feature tensor corresponding to the upper-layer characters (the first partition tensor) and the bottom one that of the lower-layer characters (the second partition tensor). Exemplarily, the dividing line between them can be decided by the height of the pixel coordinate area of the double-layer plate; in one embodiment, a parameter greater than 0 and less than 1 is introduced and multiplied by the tensor height to give the dividing height. For example, with parameter 0.4 and tensor height 10 the dividing height is 0.4*10=4, the first partition covering heights [0-4] and the second heights [4-10]. If the product is not an integer, a rounding operation guarantees that the partition heights are integers; its implementation can be set according to the actual situation: with parameter a and tensor height H', if aH' is not an integer, the integer dividing line can be obtained as round(aH')+1 (round to nearest, then add 1), floor(aH')+1 (round down, then add 1), or ceil(aH') (round up).
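The three integer-boundary rules named above can be checked with a short sketch (the values a = 0.4 and H' = 6 restate the FIG. 12 example, where all three rules happen to agree):

```python
import math

a, H = 0.4, 6                 # splitting parameter and feature-tensor height
print(round(a * H) + 1)       # round(aH') + 1 -> 3
print(math.floor(a * H) + 1)  # floor(aH') + 1 -> 3
print(math.ceil(a * H))       # ceil(aH')     -> 3
```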
In one embodiment, a feature selection operation is added to the license plate character recognition model to slice the feature tensor and obtain the first and second partition tensors. Exemplarily, slicing splits the tensor into an upper layer, the first partition tensor, which is the feature tensor corresponding to the characters of the plate's first row, and a lower layer, the second partition tensor, corresponding to the second row. In one embodiment, feature selection is performed before the feature tensor is input to the fusion networks; the number of selection operations equals the number of fusion networks, one part of them selecting the first partition tensor from the feature tensor and inputting it to the fusion networks of the upper-layer characters, the other part selecting the second partition tensor and inputting it to those of the lower-layer characters.
Exemplarily, a feature selection operation first determines the dividing line from the set parameter, rounding it when it is not an integer, then slices the feature tensor along it and selects the corresponding first or second partition tensor. For instance, a parameter a is set in the operation; when the operation runs, the dividing line is determined from a and the tensor height (generated as described above) and the tensor is sliced accordingly; whether the current operation selects the first or the second partition tensor can be preset, so that after slicing the corresponding partition tensor is output.
In one embodiment, mapping the plate to the set pixel coordinate area may involve errors that push the upper row of characters down or the lower row up; slicing may then leave the first partition tensor not fully covering the upper-layer features, or the second not fully covering the lower-layer features, harming the accuracy of subsequent processing. To prevent this, an overlapping partition tensor is set between the first and second partition tensors: the two share some elements, usually the feature pixels of the region between the upper-layer and lower-layer characters. The overlap height can be set according to the actual situation, e.g. first partition heights [0-3] and second partition heights [2-6], so that the two partitions share the elements in heights [2-3].
In one embodiment, to make it easier for the feature selection operation to split the feature tensor, processing the target image with the backbone network further includes: down-sampling the feature tensor with the backbone network, where the manner and number of down-sampling steps can be determined by the tensor height. For example, a height-48 tensor becomes a height-12 tensor after two average-pooling down-samplings; as another example, a height-6 tensor is obtained after three maximum-pooling down-samplings. It can be understood that the feature tensor output by the backbone then refers to the down-sampled feature tensor.
In one embodiment, since the feature tensor is divided into the first and second partition tensors, inputting the feature tensor into the fusion networks includes: inputting the first partition tensor into each fusion network corresponding to the upper-layer characters, and inputting the second partition tensor into each fusion network corresponding to the lower-layer characters. Exemplarily, the fusion networks are divided into upper-layer and lower-layer ones, which obtain the feature vectors of the upper-layer and lower-layer characters respectively; the number of upper-layer fusion networks equals the number of upper-layer characters. For example, a mainland China double-layer plate has 2 upper-layer characters, so 2 fusion networks correspond to them and receive the first partition tensor, and 5 lower-layer characters, so 5 fusion networks correspond to them and receive the second partition tensor.
As described above, the feature selection operation enables recognition of double-layer character plates; the overlapping elements between the first and second partition tensors prevent either from missing a character's complete features, guaranteeing recognition accuracy; and down-sampling the tensor before the selection operation lets the operation split a tensor of smaller height, reducing the splitting difficulty.
The technical solutions provided by the embodiments of the present application are described below by example:
Example 1. FIG. 5 is a schematic structural diagram of a license plate character recognition model provided by an embodiment of the present application; the model of FIG. 5 recognizes single-layer character plates. Referring to FIG. 5, the model includes a backbone network (Backbone), 8 feature fusion networks (FeatFuse1-FeatFuse8) and 8 classifiers (Classifier1-Classifier8), and is used to recognize mainland China car plates: the first classifier's category space contains 31 elements (the abbreviations of the provinces except Taiwan, the municipalities and the autonomous regions), the second's 24 elements (the uppercase letters excluding I and O), the third to seventh's 34 elements each (Arabic numerals 0-9 plus those 24 letters), and the eighth's 35 elements, one more than the third to seventh's, namely the Chinese character "empty", so the model recognizes both 7- and 8-character plates.
For example, FIG. 6 is a target image provided by an embodiment; the plate to be recognized is "Jing P***27" with 7 characters (the third to fifth blurred to avoid information leakage), and it has already been mapped to the set pixel coordinate area. A 1*32*100*3 tensor (one image of height 32, width 100, 3 channels) is input to the model, and the backbone outputs a 1*1*25*512 feature tensor, which is then input to the 8 fusion networks. When a fusion network uses spatial attention, FIG. 7 shows one generation flow of the spatial attention map: the feature tensor has height 1, width 25 and 512 channels; maximum pooling (MP) and average pooling (AP) are performed on feature tensor 13 to obtain the 1-channel maximum-pooled tensor 14 and average-pooled tensor 15, each of height 1 and width 25; they are concatenated (Cat) into the 2-channel fused tensor 16 of height 1 and width 25; and convolution (Conv) and an activation function (Act) yield the spatial attention map 17 of height 1, width 25 and 1 channel. Alternatively, FIG. 8 shows another flow: MP and AP yield the same pooled tensors 14 and 15, element-wise summation (Sum) yields the 1-channel fused tensor 18 of height 1 and width 25, and Conv and Act yield the attention map 19 of height 1, width 25 and 1 channel; here the convolution is optional. A 1×512 feature vector is then obtained from the attention map and the feature tensor. The 8 feature vectors are input to the 8 classifiers, which output the character recognition results: the first outputs "Jing", the second "P" ... the seventh "7" and the eighth "empty"; combining these results gives the final recognition result of the plate.
As a further example, FIG. 9 is another target image provided by an embodiment; the plate to be recognized is "Yue B***710" with 8 characters (the third to fifth blurred to avoid information leakage), already mapped to the set pixel coordinate area. A 1*32*100*3 tensor (one image of height 32, width 100, 3 channels) is input, the backbone outputs a 1*1*25*512 feature tensor, the 8 fusion networks output 8 feature vectors of size 1×512, and the 8 classifiers output the results: the first outputs "Yue", the second "B" ... the seventh "1" and the eighth "0"; combining them gives the final recognition result of the plate.
Example 2. FIG. 10 is a schematic structural diagram of another license plate character recognition model provided by an embodiment of the present application; the model of FIG. 10 recognizes double-layer character plates. Referring to FIG. 10, since a double-layer plate usually contains 7 characters, the model includes a backbone network (Backbone), 7 feature fusion networks (FeatFuse1-FeatFuse7) and 7 classifiers (Classifier1-Classifier7), and feature selection operations are set in the model: the operation that outputs the first partition tensor is denoted FeatSelectA and used by the first and second fusion networks, and the operation that outputs the second partition tensor is denoted FeatSelectB and used by the third to seventh fusion networks. FIG. 10 shows the feature selection operations and fusion networks together; the model recognizes mainland China car plates. The first classifier's category space contains 31 elements (the provincial abbreviations except Taiwan, the municipalities and the autonomous regions), the second's 24 (the uppercase letters excluding I and O), the third to sixth's 34 each (numerals 0-9 plus those letters), and the seventh's 39 elements, including numerals 0-9, the 24 uppercase letters excluding I and O, and "Gua" (挂).
For example, FIG. 11 is yet another target image provided by an embodiment; the plate to be recognized is "Jing A***3 Gua" with 7 characters (the third to fifth blurred to avoid information leakage), already mapped to the set pixel coordinate area. A 1*48*100*3 tensor (one image of height 48, width 100, 3 channels) is input, and the backbone outputs a 1*6*25*512 feature tensor, which passes through the feature selection operations and fusion networks to give 7 feature vectors of size 1×512. FIG. 12 is the feature selection schematic provided by an embodiment, showing the selection operation slicing the feature tensor: according to the tensor height 6 and the set parameter a, the tensor is split into the first partition tensor A of [0-a*6]*25*512 and the second partition tensor B of [a*6-h]*25*512, where h is 6 (the fusion network processing follows the above examples). The 7 feature vectors are input to the 7 classifiers: the first outputs "Jing", the second "A" ... the seventh "Gua"; combining the results gives the final recognition result of the plate.
FIG. 13 is a schematic structural diagram of a license plate character recognition apparatus provided by an embodiment of the present application. Referring to FIG. 13, the apparatus includes an image acquisition module 301, a feature tensor determination module 302, a feature vector determination module 303 and a recognition result determination module 304.
The image acquisition module 301 is used to acquire at least one target image displaying a license plate to be recognized that contains multiple characters; the feature tensor determination module 302 is used to process the target image with the backbone network to obtain the plate's feature tensor; the feature vector determination module 303 is used to input the feature tensor into each feature fusion network and obtain each character's feature vector through them, each fusion network outputting one feature vector; and the recognition result determination module 304 is used to input each feature vector into the corresponding classifier and obtain each character's recognition result, each classifier outputting one character recognition result.
On the basis of the above embodiments, the feature vector determination module 303 includes: a tensor input unit for inputting the feature tensor into each fusion network; a pooling unit for performing the maximum and average pooling operations on the feature tensor to obtain the maximum-pooled and average-pooled feature tensors respectively; a fusion unit for obtaining the fused feature tensor from them; an attention map acquisition unit for constructing the spatial attention map from the fused tensor; and a vector determination unit for obtaining the feature vector of the corresponding character in the plate from the attention map and the feature tensor.
On the basis of the above embodiments, the fusion unit is specifically used for: concatenating the maximum-pooled and average-pooled feature tensors to obtain the fused feature tensor; or summing them element-wise to obtain the fused feature tensor.
On the basis of the above embodiments, the vector determination unit includes: an automatic expansion subunit for expanding the attention map along the channel dimension into the attention tensor map; a multiplication subunit for multiplying the feature tensor and the attention tensor map element-wise into the multiplication tensor; and a summation subunit for summing the multiplication tensor along its width and height dimensions into the feature vector of the corresponding character.
On the basis of the above embodiments, the feature vector determination module 303 includes: a mask acquisition unit for acquiring the mask parameter corresponding to each fusion network; and a mask input unit for inputting the mask parameter and the feature tensor into the corresponding fusion network to obtain the corresponding character's feature vector. On the basis of the above embodiments, the mask parameter is a non-negative parameter.
On the basis of the above embodiments, the recognition result determination module 304 includes: a vector input unit for respectively inputting each feature vector into the corresponding classifier; a logits vector determination unit for determining the feature vector's logits vector with the classifier's fully connected layer and the corresponding category space, each classifier corresponding to one category space; and a probability vector determination unit for predicting the logits vector's probability vector with the classifier's loss function and obtaining the corresponding character's recognition result from it.
On the basis of the above embodiments, when the plate to be recognized is a double-layer character plate, the apparatus further includes a partition module 305 for performing the feature selection operations on the feature tensor before it is input into the fusion networks, obtaining the first and second partition tensors corresponding to the upper-layer and lower-layer characters respectively; correspondingly, the feature vector determination module 303 can be used to input the first partition tensor into the upper-layer fusion networks and the second into the lower-layer ones, obtaining each character's feature vector, each fusion network outputting one vector.
On the basis of the above embodiments, an overlapping partition tensor exists between the first and second partition tensors; the feature tensor determination module 302 is further used to down-sample the feature tensor with the backbone network; the width of the feature tensor is greater than or equal to the target number, the target number being the total number of recognizable characters; and the apparatus further includes a mapping module for mapping the plate in the target image to the set pixel coordinate area after the at least one target image is acquired.
The apparatus provided above can be used to execute the license plate character recognition method provided by any of the above embodiments, with the corresponding functions and beneficial effects.
It is worth noting that, in the above apparatus embodiment, the included units and modules are divided only according to functional logic and are not limited to that division, provided the corresponding functions can be realized; in addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present application.
FIG. 14 is a schematic structural diagram of a license plate character recognition device provided by an embodiment of the present application. As shown in FIG. 14, the device includes a processor 40, a memory 41, an input device 42, an output device 43, a photographing device 44 and a moving device 45; the number of processors 40 may be one or more, one being taken as the example in FIG. 14. These components can be connected by a bus or in other ways; a bus connection is taken as the example in FIG. 14.
As a computer-readable storage medium, the memory 41 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the license plate character recognition method of the embodiments (e.g. the image acquisition module 301, feature tensor determination module 302, feature vector determination module 303 and recognition result determination module 304 of the apparatus). By running the software programs, instructions and modules stored in the memory 41, the processor 40 executes the device's various functional applications and data processing, i.e. realizes the above license plate character recognition method.
The memory 41 may mainly include a program storage area, which can store an operating system and the applications required for at least one function, and a data storage area, which can store data created by use of the device; it may include high-speed random access memory and non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some instances, the memory 41 may further include memory remote from the processor 40, connected to the device through a network; examples of such networks include, without limitation, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 42 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the device; the output device 43 may include a display device such as a display screen; the photographing device 44 is used to capture the target image; and the moving device 45 is used to control the device's movement. The device may further include a communication means for data communication with other devices.
The above license plate character recognition device contains the license plate character recognition apparatus and can be used to execute any license plate character recognition method, with the corresponding functions and beneficial effects.
In addition, the embodiments of the present application further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the relevant operations of the license plate character recognition method provided by any embodiment of the present application, with the corresponding functions and beneficial effects.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, and may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products of its embodiments; it should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. They may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the stored instructions produce an article of manufacture including an instruction means that implements those functions. They may also be loaded onto a computer or other programmable device, so that a series of operational steps is executed on it to produce computer-implemented processing, the instructions executed on the computer or other programmable device providing steps for implementing those functions.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology; the information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, without limitation, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of RAM, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, and any other non-transmission media usable to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device; absent further limitation, an element defined by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising it.
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from its protection scope. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to them and may include more equivalent embodiments without departing from its concept, its scope being determined by the appended claims.

Claims (15)

  1. A license plate character recognition method, comprising:
    acquiring at least one target image, the target image displaying a license plate to be recognized, the license plate to be recognized containing multiple characters;
    processing the target image with a backbone network to obtain a feature tensor of the license plate to be recognized;
    inputting the feature tensor into each feature fusion network, and obtaining a feature vector of each of the characters through the feature fusion networks, each feature fusion network outputting one feature vector;
    inputting each feature vector into a corresponding classifier, and obtaining a character recognition result of each of the characters with the classifiers, each classifier outputting one character recognition result.
  2. The license plate character recognition method according to claim 1, wherein obtaining the feature vector of each of the characters through the feature fusion network comprises:
    performing a maximum pooling operation and an average pooling operation on the feature tensor respectively, to obtain a maximum-pooled feature tensor and an average-pooled feature tensor;
    obtaining a fused feature tensor from the maximum-pooled feature tensor and the average-pooled feature tensor;
    constructing a spatial attention map from the fused feature tensor;
    obtaining the feature vector of the corresponding character in the license plate to be recognized from the spatial attention map and the feature tensor.
  3. The license plate character recognition method according to claim 2, wherein obtaining the fused feature tensor from the maximum-pooled feature tensor and the average-pooled feature tensor comprises:
    concatenating the maximum-pooled feature tensor and the average-pooled feature tensor to obtain the fused feature tensor; or
    summing the maximum-pooled feature tensor and the average-pooled feature tensor element-wise to obtain the fused feature tensor.
  4. The license plate character recognition method according to claim 2, wherein obtaining the feature vector of the corresponding character in the license plate to be recognized from the spatial attention map and the feature tensor comprises:
    automatically expanding the spatial attention map along the channel dimension to obtain an attention tensor map;
    multiplying the feature tensor and the attention tensor map element-wise to obtain a multiplication tensor;
    summing the multiplication tensor along its width and height dimensions to obtain the feature vector of the corresponding character in the license plate to be recognized.
  5. The license plate character recognition method according to claim 1, wherein inputting the feature tensor into each feature fusion network and obtaining the feature vector of each of the characters through the feature fusion network comprises:
    acquiring a mask parameter corresponding to each feature fusion network;
    inputting the mask parameter and the feature tensor into the corresponding feature fusion network to obtain the feature vector of the corresponding character through the feature fusion network.
  6. The license plate character recognition method according to claim 5, wherein the mask parameter is a non-negative parameter.
  7. The license plate character recognition method according to claim 1, wherein obtaining the character recognition result of each of the characters with the classifier comprises:
    determining a logits vector of the feature vector using the fully connected layer of the classifier in combination with the corresponding category space, each classifier corresponding to one category space;
    predicting a probability vector of the logits vector using the loss function of the classifier, and obtaining the character recognition result of the corresponding character from the probability vector.
  8. The license plate character recognition method according to claim 1, wherein the license plate to be recognized is a double-layer character license plate;
    before inputting the feature tensor into each feature fusion network, the method further comprises:
    performing feature selection operations on the feature tensor to obtain a first partition tensor and a second partition tensor of the feature tensor respectively, the first partition tensor corresponding to upper-layer characters of the double-layer character license plate and the second partition tensor corresponding to lower-layer characters of the double-layer character license plate;
    inputting the feature tensor into each feature fusion network comprises:
    inputting the first partition tensor into each feature fusion network corresponding to the upper-layer characters, and inputting the second partition tensor into each feature fusion network corresponding to the lower-layer characters.
  9. The license plate character recognition method according to claim 8, wherein an overlapping partition tensor exists between the first partition tensor and the second partition tensor.
  10. The license plate character recognition method according to claim 8, wherein processing the target image with the backbone network to obtain the feature tensor of the license plate to be recognized further comprises:
    down-sampling the feature tensor with the backbone network.
  11. The license plate character recognition method according to claim 1, wherein the width of the feature tensor is greater than or equal to a target number, the target number being the total number of recognizable characters.
  12. The license plate character recognition method according to claim 1, further comprising, after acquiring the at least one target image:
    mapping the license plate to be recognized in the target image to a set pixel coordinate area.
  13. A license plate character recognition apparatus, comprising:
    an image acquisition module, configured to acquire at least one target image, the target image displaying a license plate to be recognized that contains multiple characters;
    a feature tensor determination module, configured to process the target image with a backbone network to obtain a feature tensor of the license plate to be recognized;
    a feature vector determination module, configured to input the feature tensor into each feature fusion network and obtain a feature vector of each of the characters through the feature fusion networks, each feature fusion network outputting one feature vector;
    a recognition result determination module, configured to input each feature vector into a corresponding classifier and obtain a character recognition result of each of the characters with the classifiers, each classifier outputting one character recognition result.
  14. A license plate character recognition device, comprising:
    one or more processors;
    a memory for storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the license plate character recognition method according to any one of claims 1-12.
  15. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the license plate character recognition method according to any one of claims 1-12.
PCT/CN2021/084183 2021-03-30 2021-03-30 License plate character recognition method, apparatus, device and storage medium WO2022205018A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180024657.XA 2021-03-30 2021-03-30 License plate character recognition method, apparatus, device and storage medium
PCT/CN2021/084183 2021-03-30 2021-03-30 License plate character recognition method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/084183 WO2022205018A1 (zh) 2021-03-30 2021-03-30 车牌字符识别方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022205018A1 true WO2022205018A1 (zh) 2022-10-06

Family

ID=83455484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084183 WO2022205018A1 (zh) 2021-03-30 2021-03-30 车牌字符识别方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115485746A (zh)
WO (1) WO2022205018A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410189A (zh) * 2022-10-31 2022-11-29 松立控股集团股份有限公司 License plate detection method for complex scenes

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180101750A1 (en) * 2016-10-11 2018-04-12 Xerox Corporation License plate recognition with low-rank, shared character classifiers
CN108564088A (zh) * 2018-04-17 2018-09-21 广东工业大学 License plate recognition method, apparatus, device and readable storage medium
CN111553205A (zh) * 2020-04-12 2020-08-18 西安电子科技大学 Vehicle re-identification method, system and medium without license plate information, and video surveillance system
CN111914838A (zh) * 2020-07-28 2020-11-10 同济大学 License plate recognition method based on text line recognition
CN112052845A (zh) * 2020-10-14 2020-12-08 腾讯科技(深圳)有限公司 Image recognition method, apparatus, device and storage medium
CN112508018A (zh) * 2020-12-14 2021-03-16 北京澎思科技有限公司 License plate recognition method, apparatus and storage medium


Also Published As

Publication number Publication date
CN115485746A (zh) 2022-12-16

Similar Documents

Publication Publication Date Title
CN110032962B (zh) Object detection method and apparatus, network device, and storage medium
CN114202672A (zh) Small-object detection method based on an attention mechanism
CN111626295B (zh) Training method and apparatus for a license plate detection model
JP2018022360A (ja) Image analysis apparatus, image analysis method, and program
CN109145747A (zh) Semantic segmentation method for panoramic water-surface images
KR102197930B1 (ko) License plate recognition method and system
CN113191204B (zh) Multi-scale occluded pedestrian detection method and system
CN111091023A (zh) Vehicle detection method and apparatus, and electronic device
CN111598065A (zh) Depth image acquisition method, living-body recognition method, device, circuit and medium
CN111127516A (zh) Target detection and tracking method and system without search box
CN114782412A (zh) Image detection method, and training method and apparatus of a target detection model
JP2011060221A (ja) Classifier generation method, computer program, classifier generation apparatus, and predetermined-object detection apparatus
CN111881984A (zh) Deep-learning-based target detection method and apparatus
WO2022205018A1 (zh) License plate character recognition method, apparatus, device and storage medium
CN115393635A (zh) Infrared small-target detection method based on superpixel segmentation and data augmentation
US20230087261A1 (en) Three-dimensional target estimation using keypoints
WO2022219402A1 (en) Semantically accurate super-resolution generative adversarial networks
CN114387346A (zh) Image recognition and prediction model processing methods, three-dimensional modeling method, and apparatus
JP7165353B2 (ja) Image feature output device, image recognition device, image feature output program, and image recognition program
CN112053407B (zh) Automatic lane-line detection method based on AI technology in traffic enforcement images
CN113284185A (zh) Rotated-target detection method for remote sensing target detection
CN111292331B (zh) Image processing method and apparatus
CN116863227A (zh) Hazardous-chemicals vehicle detection method based on improved YOLOv5
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
Song et al. Vision-based parking space detection: A mask R-CNN approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21933698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21933698

Country of ref document: EP

Kind code of ref document: A1