CN115880693A - Roadside scene OCR method based on double-attention mechanism and content perception upsampling - Google Patents

Publication number: CN115880693A
Authority: CN (China)
Legal status: Pending
Application number: CN202211432082.8A
Other languages: Chinese (zh)
Inventors: Shi Hong (石宏), Xie Honggang (谢红刚), Xiao Jinsheng (肖进胜), Hou Kaiyuan (侯凯元), Xiao Shenghua (肖胜华)
Assignees: Wuhan Wanghua Technology Co., Ltd.; Wuhan Telecom Industry Co., Ltd.
Application filed by Wuhan Wanghua Technology Co., Ltd. and Wuhan Telecom Industry Co., Ltd.
Priority to CN202211432082.8A; publication of CN115880693A; legal status: pending

Landscapes

  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a roadside scene OCR method based on a double-attention mechanism and content perception upsampling, comprising the following steps: preprocessing an image to be detected containing a roadside scene; performing text detection on the preprocessed image to be detected with a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling; and performing text recognition on the text detection result with a text recognition model that incorporates a center loss, to obtain roadside scene character data. The method improves the road-condition perception capability of a running vehicle, is suited to recognizing characters in complex roadside scenes, and can significantly reduce the false detection rate.

Description

Roadside scene OCR method based on double-attention mechanism and content perception upsampling
Technical Field
The invention relates to the technical field combining automatic driving and computer vision, and in particular to a roadside scene OCR method and system based on a double-attention mechanism and content perception upsampling, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, computer vision is widely applied to perception in the complex scenes of vehicle driving, and character detection in road conditions is an important part of that perception. A common OCR (optical character recognition) method performs binarization segmentation according to text characteristics and then detects characters from the segmentation result. Because the scene around a running vehicle is complex, roadside scene text images differ greatly in angle and size, and many factors interfere with detection.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a roadside scene OCR method, system, electronic device, and storage medium based on a double-attention mechanism and content perception upsampling, which help improve the road-condition perception capability of a running vehicle, are suited to recognizing characters in complex roadside scenes, and can significantly reduce the false detection rate.
According to a first aspect of the invention, there is provided a roadside scene OCR method based on a dual-attention mechanism and content-aware upsampling, comprising:
preprocessing an image to be detected containing a roadside scene;
performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the preprocessing the to-be-detected image including the roadside scene includes:
acquiring vehicle roadside detection data, and acquiring a to-be-detected image containing roadside scene characters in the detection data;
extracting roadside scene character images from the image to be detected and correcting the roadside scene character images;
and performing brightness enhancement and/or color enhancement on the corrected roadside scene character image.
Optionally, a feature point matching-based method is adopted to extract and correct roadside scene character images in the image to be detected.
Optionally, the brightness of the image is enhanced through an MSRCR algorithm, and the color of the image is enhanced through an ACE algorithm.
Optionally, the performing of text detection on the preprocessed image to be detected using a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling comprises:
inputting the image to be detected into the multitask convolutional neural network model for text detection to obtain a refined feature map; the multitask convolutional neural network model adopts a ResNet residual network combined with deformable convolution, and each residual block of the ResNet residual network contains a dual attention module of channel attention and spatial attention;
and downsampling the refined feature map, upsampling with a content-aware upsampling operator, fusing the feature maps of different scales generated during the ResNet residual network's sampling process, and taking the fused text feature map as the text detection result.
Optionally, the process of performing text detection on the detected image in the ResNet residual network combined with the deformable convolution includes the following steps:
inputting the image to be detected into a deformable convolution unit of the ResNet residual network, wherein the deformable convolution unit comprises, arranged in sequence, a standard convolution unit and an offset field of unchanged scale with 2N channels; the offset field is computed by the standard convolution unit and represents the offsets of each pixel of the convolution field along the x and y axes; in the deformable convolution unit, after the offsets are added to the standard convolution kernel, the size and position of the convolution kernel are adaptively adjusted according to the content of the input feature map to obtain a feature map F;
performing, with a channel attention module, max pooling of the feature map F over the spatial dimensions to obtain a detail feature vector of size 1×1×C, and average pooling to obtain a background feature vector of size 1×1×C; the detail and background feature vectors share a multilayer perceptron, are added element-wise, and pass through a sigmoid activation layer to obtain the channel attention M_c of size 1×1×C; multiplying M_c channel-wise with the feature map F to obtain a channel-refined feature map F';
performing, with a spatial attention module, max pooling of the channel-refined feature map F' over the channel dimension to obtain a detail feature map of size H×W×1, and average pooling to obtain a background feature map of size H×W×1; concatenating the two pooled maps and passing them sequentially through a convolution layer and a sigmoid activation layer to obtain the spatial attention M_s of size H×W×1; multiplying M_s pixel-wise with F' to obtain an updated refined feature map F'';
and after downsampling the refined feature map F'' multiple times, upsampling it with a content-aware upsampling operator, fusing the refined feature maps F'' of different scales obtained during downsampling and upsampling, and outputting the text box obtained by text detection.
Optionally, the performing text recognition on the text detection result by using a text recognition model considering center loss to obtain roadside scene character data includes:
inputting a text detection result into an associated sequence classification layer of a text recognition model to obtain a text transcription label;
inputting the text transcription tag into a time sequence classification layer in a text recognition model based on a convolution cyclic neural network to output a text recognition result, and reading roadside scene character data according to the text recognition result; and a central loss function is added into a loss module of the convolution cyclic neural network model.
Optionally, adding a central loss function to a loss module of the convolutional recurrent neural network model, including:
assume the classification loss $L_s$ of the fully connected classification layer in the convolutional recurrent neural network model is as shown in equation (1):

$$L_s = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i}}{\sum_{j=1}^{n} e^{W_j^{T} x_i}} \tag{1}$$

where $x_i \in R^d$ is the input feature vector; $W \in R^d$ is the network weight matrix and $R^d$ the network parameter space; $j$ is the row index of the matrix $W$ and $n$ its row dimension; $y_i$ is the predicted category, $i$ the category index, and $m$ the number of prediction categories; $T$ denotes matrix transposition;

the center loss $L_c$ is computed by equation (2):

$$L_c = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - C_{y_i} \right\|_2^2 \tag{2}$$

where $C_{y_i}$ denotes the feature-distribution center of class $y_i$, $x_i$ the features input to the fully connected classification layer, and $m$ the number of prediction categories;

during model training, following the back propagation of the training process, the center loss $L_c$ and the feature-distribution center $C_{y_i}$ are updated via equations (3) and (4):

$$\frac{\partial L_c}{\partial x_i} = x_i - C_{y_i} \tag{3}$$

$$\Delta C_k = \frac{\sum_{i=1}^{m} \delta(y_i = k)\,(C_k - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = k)} \tag{4}$$

where $\delta(y_i = k)$ equals 1 when the predicted category is $k$ and 0 otherwise; $k$ is the index of the current true label, $C_k$ the feature-distribution center of the current true label, and $\Delta C_k$ the updated true-label feature-distribution center;

the updated center loss $L_c$ is added to the classification loss $L_s$ via equation (5) to obtain the classification loss function value $L_{total}$:

$$L_{total} = L_s + \lambda L_c \tag{5}$$

where $\lambda$ is a weight chosen empirically.
According to a second aspect of the present invention, there is provided a roadside scene OCR system based on a dual-attention mechanism and content-aware upsampling, comprising:
the system comprises a preprocessing module, a data processing module and a data processing module, wherein the preprocessing module is used for preprocessing an image to be detected containing a roadside scene;
the detection module is used for performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and the recognition module is used for performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory and a processor, the processor being configured to implement the steps of the above roadside scene OCR method based on the double-attention mechanism and content perception upsampling when executing a computer program stored in the memory.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above roadside scene OCR method based on the double-attention mechanism and content perception upsampling.
Compared with traditional roadside scene character recognition methods, the roadside scene OCR method, system, electronic device, and storage medium based on the double-attention mechanism and content perception upsampling provided by the invention are better suited to roadside character detection in complex scenes and can significantly improve the accuracy of roadside scene character information detection in such scenes.
Drawings
FIG. 1 is a flow chart of a roadside scene OCR method based on a dual-attention mechanism and content-aware upsampling according to the present invention;
FIG. 2 is a flow chart of a method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the processing procedure of the tested image in the multitask convolutional neural network model in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a roadside scene OCR system based on a dual-attention mechanism and content-aware upsampling according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of a possible electronic device provided in the present invention;
fig. 6 is a schematic diagram of a hardware structure of a possible computer-readable storage medium provided in the present invention.
Detailed Description
The following detailed description of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
Fig. 1 is a flowchart of a roadside scene OCR method based on a dual-attention mechanism and content-aware upsampling according to the present invention, as shown in fig. 1, the method includes:
preprocessing an image to be detected containing a roadside scene;
performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
It can be appreciated that, addressing the drawbacks described in the background art, the embodiment of the present invention provides a roadside scene OCR method based on a double-attention mechanism and content perception upsampling. Because the scene around the vehicle is complex, the captured roadside scene text images differ greatly in angle and size, and many factors interfere with detection; the proposed method is therefore better suited to roadside text detection in complex scenes and can significantly improve the accuracy of roadside scene character information detection in such scenes.
In a possible embodiment, as shown in fig. 2, the preprocessing the image to be measured including the roadside scene includes:
s100, vehicle roadside detection data are obtained, and an image to be detected containing roadside scene characters in the detection data is obtained;
specifically, the images to be detected in the roadside detection data can be obtained by shooting through a monitoring camera.
S200, extracting roadside scene character images from the image to be detected and correcting the roadside scene character images;
the roadside scene character image in the image to be detected can be extracted by adopting a characteristic point matching-based method and corrected, such as inclination correction.
S300, performing brightness enhancement on the corrected roadside scene character images;
further, the MSRCR (multi-scale retina enhancement) algorithm can be adopted to enhance the brightness of the corrected roadside scene text image.
And S400, performing color enhancement on the roadside scene character image with enhanced brightness.
Further, color enhancement can be performed on the brightness-enhanced roadside scene character image using an ACE (Automatic Color Equalization) based method.
It can be understood that through steps S100 to S400, preprocessing of the image to be detected is realized, and the preprocessed image to be detected can be input into the multitask convolutional neural network model for text detection.
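As a rough illustration of the retinex idea behind the brightness-enhancement step, the sketch below implements a simplified single-scale retinex on a 2-D intensity grid: the patent names the multi-scale MSRCR variant with color restoration, which this does not reproduce, and the box-mean illumination estimate here stands in for the Gaussian surround used by real implementations. The function name is illustrative, not from the patent.

```python
import math

def single_scale_retinex(img, radius=1):
    """Simplified single-scale retinex: log(I) - log(local mean of I).

    `img` is a 2-D list of positive intensities. This only illustrates the
    core idea of dividing out estimated illumination; the patent's MSRCR
    additionally combines several scales and restores color.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Estimate illumination with a local box mean (a stand-in for
            # the Gaussian surround of real retinex implementations).
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            illum = total / count
            out[y][x] = math.log(img[y][x]) - math.log(illum)
    return out

# A uniform image carries no local contrast, so its retinex output is ~0.
flat = [[10.0] * 4 for _ in range(4)]
print(single_scale_retinex(flat))  # all ~0: uniform image has no detail
```

The output is large where a pixel differs from its surroundings, which is what makes dark text on an unevenly lit sign easier to detect.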
In a possible embodiment, in step S500, the preprocessed image to be detected is subjected to text detection using a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling. This step comprises:
inputting the image to be detected into the trained multitask convolutional neural network model for text detection to obtain a refined feature map; the multitask convolutional neural network model adopts a ResNet residual network combined with deformable convolution, and a dual attention module of channel attention and spatial attention is added to each residual block of the ResNet residual network;
and downsampling the refined feature map, upsampling with a content-aware upsampling operator, fusing the feature maps of different scales generated during the ResNet residual network's sampling process, and outputting the fused text feature map (e.g., a text box) as the text detection result.
The following provides a specific implementation of this step with reference to fig. 3:
step S510, inputting an image to be detected into a deformable convolution unit of a ResNet residual network, wherein the deformable convolution unit comprises a standard convolution unit and an offset domain with unchanged scale and 2N channels, which are sequentially arranged, and the offset domain is obtained by calculating the standard convolution unit and represents the offset of each layer of pixel points of a convolution field in an x axis and a y axis; and in the deformable convolution unit, after the standard convolution kernel is added with the offset, the size and the position of the convolution kernel are adaptively adjusted according to the content of the input feature map to obtain a feature map F.
In the step, the standard convolution of the ResNet residual error network part is replaced by the deformable convolution so as to adapt to the feature extraction of the text with irregular shape and changeable font.
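The offset mechanism can be sketched as follows: each tap of a 3×3 kernel is displaced by its own learned (dy, dx) pair, so the sample points land at fractional positions and are read with bilinear interpolation. This is a minimal single-channel illustration with hand-supplied offsets; in a real layer the 2N offset channels are learned and everything operates on full tensors. The function names are illustrative, not from the patent.

```python
def bilinear_sample(fmap, y, x):
    """Bilinearly sample a 2-D feature map at a fractional position,
    clamping coordinates to the border."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
    bot = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
    return top * (1 - wy) + bot * wy

def deformable_conv_point(fmap, weights, cy, cx, offsets):
    """One output value of a 3x3 deformable convolution centred at
    (cy, cx): each of the 9 taps is shifted by its own (dy, dx) offset
    before sampling, which is how the kernel adapts to the content."""
    taps = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    out = 0.0
    for (ky, kx), w, (dy, dx) in zip(taps, weights, offsets):
        out += w * bilinear_sample(fmap, cy + ky + dy, cx + kx + dx)
    return out

fmap = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [0.0] * 4 + [1.0] + [0.0] * 4   # picks out the centre tap
print(deformable_conv_point(fmap, identity, 1, 1, [(0.0, 0.0)] * 9))  # 5.0
```

With zero offsets this degenerates to standard convolution; a nonzero offset on a tap slides that tap toward the text stroke it should cover.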
Step S520, using a channel attention module, perform max pooling on the feature map F over the spatial dimensions to obtain a detail feature vector of size 1×1×C, and average pooling to obtain a background feature vector of size 1×1×C; the detail and background feature vectors share a multilayer perceptron, are added element-wise, and pass through a sigmoid activation layer to obtain the channel attention M_c of size 1×1×C; M_c is multiplied channel-wise with the feature map F to obtain the channel-refined feature map F';
Step S530, using a spatial attention module, perform max pooling on the channel-refined feature map F' over the channel dimension to obtain a detail feature map of size H×W×1, and average pooling to obtain a background feature map of size H×W×1; the two pooled maps are concatenated and passed sequentially through a convolution layer and a sigmoid activation layer to obtain the spatial attention M_s of size H×W×1; M_s is multiplied pixel-wise with F' to obtain the updated refined feature map F'';
Step S540, after the refined feature map F'' is downsampled three consecutive times, it is upsampled three times with a content-aware upsampling operator. As can be seen from FIG. 3, the feature map size changes after each downsampling or upsampling operation; the refined feature maps F'' of different scales obtained by downsampling and upsampling are fused to obtain the final output text box of text detection.
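A content-aware upsampling operator (in the spirit of CARAFE-style reassembly) computes each output pixel as a weighted sum over a neighborhood of the source map, with the weights predicted from the content rather than fixed as in bilinear interpolation. The sketch below is a minimal single-channel version: the kernel-prediction branch is omitted and the per-position kernels are assumed to be already softmax-normalized. Names and the exact operator the patent uses are assumptions.

```python
import numpy as np

def content_aware_upsample(F, kernels, scale=2):
    """CARAFE-like reassembly sketch. F: (H, W) feature map.
    kernels: (H*scale, W*scale, k, k) reassembly kernels, one per output
    position, assumed pre-normalized by a (not shown) prediction branch.
    Each output pixel is a weighted sum of the k x k source neighborhood
    it maps back to, so upsampling weights depend on content."""
    H, W = F.shape
    Ho, Wo, k, _ = kernels.shape
    r = k // 2
    out = np.zeros((Ho, Wo))
    for oy in range(Ho):
        for ox in range(Wo):
            cy, cx = oy // scale, ox // scale   # source position
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    sy = min(max(cy + dy, 0), H - 1)   # clamp to border
                    sx = min(max(cx + dx, 0), W - 1)
                    out[oy, ox] += kernels[oy, ox, dy + r, dx + r] * F[sy, sx]
    return out
```

With trivial 1×1 kernels of weight 1 this reduces to nearest-neighbor upsampling; learned kernels instead spread weight toward text-like structure, which is why the operator helps recover thin strokes lost in downsampling.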
It is understood that after steps S510 to S540 shown in fig. 2 to fig. 3, the text recognition may be performed on the output text box in the subsequent step S600 to read the text data of the roadside scene of the vehicle.
In a possible embodiment, in step S600, since the text lines on road signs are generally regular, this embodiment adds a center loss to the text recognition model to further improve character recognition accuracy. Specifically, step S600 includes:
s610, inputting a text detection result (namely the output text box obtained in the step S500) into an associated sequence classification layer of a text recognition model based on a convolution cyclic neural network to obtain a text transcription label;
the step is to avoid the situation that in the next step, the length of a predicted text obtained after the convolution cyclic neural network obtains the character posterior probability matrix is larger than the actual text label length.
S620, input the text transcription label into the temporal classification layer of the text recognition model based on a convolutional recurrent neural network to output a text recognition result, and read the roadside scene character data from the text recognition result; a center loss function is added to the loss module of the convolutional recurrent neural network model.
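The temporal classification layer described here appears to be CTC-style; assuming so, the transcription step that maps a per-frame label path to a text label can be sketched with greedy best-path decoding: merge repeated symbols, then drop blanks. This is why the transcribed text is shorter than the frame sequence, addressing the length mismatch noted above. The function name and blank index are assumptions.

```python
def ctc_collapse(path, blank=0):
    """Greedy CTC-style decoding of a best path: merge consecutive
    repeats, then remove blank symbols."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

# A blank (0) between the two 3s keeps them as distinct characters.
print(ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

Note that a blank separating two identical labels is what allows doubled characters to survive the merge, a detail any CTC-based recognizer relies on.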
Specifically, adding a central loss function to a loss module of the convolutional recurrent neural network model includes:
Assume the classification loss $L_s$ of the fully connected classification layer in the convolutional recurrent neural network model is as shown in equation (1):

$$L_s = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i}}{\sum_{j=1}^{n} e^{W_j^{T} x_i}} \tag{1}$$

where $x_i \in R^d$ is the input feature vector; $W \in R^d$ is the network weight matrix and $R^d$ the network parameter space; $j$ is the row index of the matrix $W$ and $n$ its row dimension; $y_i$ is the predicted category, $i$ the category index, and $m$ the number of prediction categories; $T$ denotes matrix transposition.

The center loss $L_c$ is computed by equation (2):

$$L_c = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - C_{y_i} \right\|_2^2 \tag{2}$$

where $C_{y_i}$ denotes the feature-distribution center of class $y_i$, $x_i$ the features input to the fully connected classification layer, and $m$ the number of prediction categories.
from the above equation, the center loss Lc is the distance between the feature of the calculated sample and the feature center, and it is desirable that the distance between the feature of the input sample in a batch and the feature center of the class is as small as possible.
Considering that the feature centers shift during training, the center loss and the feature centers must be updated as training adjusts them. During model training, following the back propagation of the training process, the center loss $L_c$ and the feature-distribution center $C_{y_i}$ are updated via equations (3) and (4):

$$\frac{\partial L_c}{\partial x_i} = x_i - C_{y_i} \tag{3}$$

$$\Delta C_k = \frac{\sum_{i=1}^{m} \delta(y_i = k)\,(C_k - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = k)} \tag{4}$$

where $\delta(y_i = k)$ equals 1 when the predicted category is $k$ and 0 otherwise; $k$ is the index of the current true label, $C_k$ the feature-distribution center of the current true label, and $\Delta C_k$ the updated true-label feature-distribution center.
as can be seen from the above equation (4), only the predicted category y i When the label k is equal to the real label k, the feature center C of the class is updated k
The updated center loss $L_c$ is added to the classification loss $L_s$ via equation (5) to obtain the classification loss function value $L_{total}$:

$$L_{total} = L_s + \lambda L_c \tag{5}$$

where $\lambda$ is a weight; this embodiment empirically takes $\lambda = 0.1$.
Then the loss module of the convolutional recurrent neural network model is iteratively updated using the classification loss function value $L_{total}$ computed in the above steps until the iteration stop condition is met.
It can be understood that this embodiment adds the center loss function to the loss module in the hope of reducing character misrecognition by widening the gaps between character feature distributions, thereby improving the accuracy of text recognition.
Fig. 4 is a structural diagram of a roadside scene OCR system based on a dual-attention mechanism and content-aware upsampling according to an embodiment of the present invention, and as shown in fig. 4, the roadside scene OCR system based on the dual-attention mechanism and content-aware upsampling includes a preprocessing module, a detection module, and an identification module, where:
the preprocessing module is used for preprocessing the image to be detected containing the roadside scene;
the detection module is used for performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and the recognition module is used for performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
It can be understood that the roadside scene OCR system based on the dual-attention mechanism and the content-aware upsampling provided by the present invention corresponds to the roadside scene OCR method based on the dual-attention mechanism and the content-aware upsampling provided by the foregoing embodiments, and the relevant technical features of the roadside scene OCR system based on the dual-attention mechanism and the content-aware upsampling may refer to the relevant technical features of the roadside scene OCR method based on the dual-attention mechanism and the content-aware upsampling, and are not described herein again.
Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 5, an embodiment of the present invention provides an electronic device 500, which includes a memory 510, a processor 520, and a computer program 511 stored in the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 511 to implement the following steps:
preprocessing an image to be detected containing a roadside scene;
performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention. As shown in fig. 6, the present embodiment provides a computer-readable storage medium 600 having a computer program 611 stored thereon, the computer program 611, when executed by a processor, implementing the steps of:
preprocessing an image to be detected containing a roadside scene;
performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
The roadside scene OCR method, system, and storage medium based on the double-attention mechanism and content perception upsampling are better suited to roadside character detection in complex scenes, for example scenes in which roadside text images differ greatly in angle and size; interfering factors are suppressed, and the accuracy of roadside scene character information detection in complex scenes can be significantly improved.
It should be noted that, in the foregoing embodiments, the description of each embodiment has an emphasis, and reference may be made to the related description of other embodiments for a part that is not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A roadside scene OCR method based on a double-attention mechanism and content perception upsampling is characterized by comprising the following steps:
preprocessing an image to be detected containing a roadside scene;
performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
2. The roadside scene OCR method based on the dual-attention mechanism and the content-aware upsampling as recited in claim 1, wherein the preprocessing the to-be-detected image including the roadside scene comprises:
acquiring vehicle roadside detection data, and acquiring an image to be detected containing roadside scene characters in the detection data;
extracting roadside scene character images from the image to be detected and correcting the roadside scene character images;
and performing brightness enhancement and/or color enhancement on the corrected roadside scene character image.
3. The roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as recited in claim 2, wherein the roadside scene character images in the image to be detected are extracted and corrected by a method based on feature point matching.
4. The roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as recited in claim 2 or 3, wherein the image is subjected to brightness enhancement by the MSRCR algorithm and to color enhancement by the ACE algorithm.
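The enhancement step of claim 4 can be illustrated with a simplified single-scale Retinex pass. This is only a sketch of the core log-ratio idea: the full MSRCR averages several Gaussian scales and adds a color-restoration term, ACE is a separate algorithm, and the `sigma` value and toy image here are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=30.0):
    """Simplified single-scale Retinex: log(image) - log(blurred image).
    MSRCR proper averages several sigmas and adds color restoration."""
    img = img.astype(np.float64) + 1.0          # avoid log(0)
    blurred = gaussian_filter(img, sigma=sigma) # estimate of the illumination
    retinex = np.log(img) - np.log(blurred)     # reflectance estimate
    # stretch the result back to the displayable [0, 255] range
    lo, hi = retinex.min(), retinex.max()
    out = (retinex - lo) / (hi - lo + 1e-8) * 255.0
    return out.astype(np.uint8)

# toy grayscale "roadside" image: dark background with one bright stripe
img = np.full((64, 64), 40, dtype=np.uint8)
img[28:36, :] = 200
enhanced = single_scale_retinex(img)
```

After the stretch, dark regions are lifted relative to the bright stripe, which is the brightness-equalizing behavior the preprocessing step relies on.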
5. The roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as recited in claim 2, wherein the text detection on the preprocessed image to be detected by the multitask convolutional neural network model based on the double-attention mechanism and content-aware upsampling comprises the following steps:
inputting the image to be detected into the multitask convolutional neural network model for text detection to obtain a refined feature map; the multitask convolutional neural network model adopts a ResNet residual network combined with deformable convolution, and each residual block of the ResNet residual network contains a dual-attention module comprising channel attention and spatial attention;
and downsampling the refined feature map, upsampling it with a content-aware upsampling operator, fusing the feature maps of different scales generated during sampling in the ResNet residual network, and taking the fused text feature map as the text detection result.
6. The roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as claimed in claim 5, wherein the process of text detection on the image to be detected in the ResNet residual network combined with deformable convolution comprises the following steps:
inputting the image to be detected into a deformable convolution unit of the ResNet residual network, the deformable convolution unit comprising, arranged in sequence, a standard convolution unit and an offset field of unchanged spatial scale with 2N channels; the offset field is computed by the standard convolution unit and represents the x-axis and y-axis offsets of each pixel in each layer of the convolution receptive field; in the deformable convolution unit, after the offsets are added to the standard convolution kernel, the sampling size and positions of the kernel are adaptively adjusted according to the content of the input feature map, yielding a feature map F;
performing, with the channel attention module, maximum pooling of the feature map F over the spatial dimensions to obtain a detail feature vector of size 1×1×C, and average pooling to obtain a background feature vector of size 1×1×C; the detail feature vector and the background feature vector share a multilayer perceptron, are added element by element, and pass through a sigmoid activation layer to obtain the channel attention M_c of size 1×1×C; the channel attention M_c is multiplied channel-wise with the feature map F to obtain the channel-refined feature map F';
performing, with the spatial attention module, maximum pooling of the channel-refined feature map F' over the channel dimension to obtain a detail feature map of size H×W×1, and average pooling to obtain a background feature map of size H×W×1; the two feature maps of size H×W×1 obtained by maximum pooling and average pooling are concatenated and passed sequentially through a convolutional layer and a sigmoid activation layer to obtain the spatial attention M_s of size H×W×1; the spatial attention M_s is multiplied pixel by pixel with the channel-refined feature map F' to obtain the updated refined feature map F'';
and after downsampling the refined feature map F'' multiple times, upsampling it with the content-aware upsampling operator, fusing the refined feature maps F'' of different scales obtained by downsampling and upsampling, and outputting the text boxes obtained by text detection.
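The channel and spatial attention of claim 6 can be sketched in NumPy. The shared MLP weights, the reduction ratio, and the 1×1 "convolution" over the two pooled maps are toy assumptions standing in for the trained layers, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Max-pool and average-pool F (H, W, C) over the spatial dims to 1x1xC,
    pass both through a shared two-layer perceptron, sum, apply sigmoid."""
    max_vec = F.max(axis=(0, 1))     # (C,) detail feature vector
    avg_vec = F.mean(axis=(0, 1))    # (C,) background feature vector
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # shared MLP with ReLU
    Mc = sigmoid(mlp(max_vec) + mlp(avg_vec))     # (C,) channel attention M_c
    return F * Mc                                 # channel-refined F'

def spatial_attention(Fp, conv_w):
    """Max-pool and average-pool F' over the channel dim to HxW, combine the
    two maps with a (here 1x1, as an assumption) convolution, apply sigmoid."""
    max_map = Fp.max(axis=2)     # (H, W) detail feature map
    avg_map = Fp.mean(axis=2)    # (H, W) background feature map
    Ms = sigmoid(conv_w[0] * max_map + conv_w[1] * avg_map)  # spatial M_s
    return Fp * Ms[..., None]    # spatially refined F''

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2
F = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((C // r, C))   # MLP reduction weights (assumed)
W2 = rng.standard_normal((C, C // r))   # MLP expansion weights (assumed)
out = spatial_attention(channel_attention(F, W1, W2), conv_w=(0.5, 0.5))
```

Because both attention maps pass through a sigmoid, every element of F'' is attenuated relative to F, which is how the module suppresses background responses.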
7. The roadside scene OCR method based on the dual-attention mechanism and the content-aware upsampling as recited in claim 1, wherein the performing text recognition on the text detection result by using a text recognition model considering center loss to obtain roadside scene character data comprises:
inputting the text detection result into the sequence classification layer of the text recognition model to obtain text transcription labels;
and inputting the text transcription labels into the temporal classification layer of the text recognition model, which is based on a convolutional recurrent neural network, to output a text recognition result, and reading the roadside scene character data from the text recognition result; a center loss function is added to the loss module of the convolutional recurrent neural network model.
8. The roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as claimed in claim 7, wherein the center loss function is added to the loss module of the convolutional recurrent neural network model by the following steps:
assuming the classification loss L_s of the fully connected classification layer in the convolutional recurrent neural network model is given by equation (1):

L_s = −∑_{i=1}^{m} log( exp(W_{y_i}^T x_i) / ∑_{j=1}^{n} exp(W_j^T x_i) )   (1)

wherein x_i is the input feature vector, x_i ∈ R^d; W is the network matrix parameter, W ∈ R^d, and R^d is the network parameter space; j is the row index of the network matrix W, and n is the row dimension of the network matrix W; y_i is the predicted class, i is the index of the predicted class, m is the number of predicted classes, and T denotes matrix transposition;
the center loss L_c is calculated by equation (2):

L_c = (1/2) ∑_{i=1}^{m} ‖x_i − C_{y_i}‖²   (2)

wherein C_{y_i} denotes the feature distribution center of class y_i, x_i denotes the feature input into the fully connected classification layer, and m denotes the number of predicted classes;
in the model training process, during back-propagation, the gradient of the center loss L_c and the feature distribution center C_k are updated by equation (3) and equation (4), respectively:

∂L_c/∂x_i = x_i − C_{y_i}   (3)

ΔC_k = ∑_{i=1}^{m} δ(y_i = k)·(C_k − x_i) / (1 + ∑_{i=1}^{m} δ(y_i = k))   (4)
wherein, δ (y) i = k) denotes δ (y) when the prediction class is k i K) is equal to 1, otherwise equal to 0; k is the serial number of the current real label, C k For the feature distribution center, Δ C, of the current real tag k Distributing centers for the updated real label features;
the updated center loss L_c is added to the classification loss L_s by equation (5) to obtain the total loss function value L_total:

L_total = L_s + λL_c   (5),

wherein λ is a weight set empirically.
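Equations (2), (4) and (5) can be checked with a small NumPy sketch. The learning rate `alpha` on the center update and the weight `lam` are assumed values (the patent only says λ is set empirically), and the classification loss L_s is passed in as a plain number rather than computed from a network:

```python
import numpy as np

def center_loss(x, y, centers):
    """L_c = 0.5 * sum_i ||x_i - C_{y_i}||^2 over a batch (equation (2))."""
    diff = x - centers[y]                 # (m, d) distances to class centers
    return 0.5 * np.sum(diff ** 2)

def update_centers(x, y, centers, alpha=0.5):
    """Per-class center update following equation (4):
    Delta C_k = sum_i d(y_i=k)(C_k - x_i) / (1 + sum_i d(y_i=k))."""
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        mask = (y == k)
        if mask.any():
            delta = np.sum(centers[k] - x[mask], axis=0) / (1 + mask.sum())
            new_centers[k] = centers[k] - alpha * delta  # alpha is assumed
    return new_centers

def total_loss(l_s, l_c, lam=0.01):
    """L_total = L_s + lambda * L_c (equation (5)); lambda set empirically."""
    return l_s + lam * l_c

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 4))           # batch of 6 features, d = 4
y = np.array([0, 1, 0, 2, 1, 0])          # predicted classes
centers = np.zeros((3, 4))                # one center per class, init at 0
lc = center_loss(x, y, centers)
centers = update_centers(x, y, centers)   # centers move toward class means
```

One update step moves each class center toward the mean of that class's features, so the center loss on the same batch decreases, which is the intra-class compaction the claim is after.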
9. A roadside scene OCR system based on a dual-attention mechanism and content-aware upsampling, comprising:
the preprocessing module is used for preprocessing the image to be detected containing the roadside scene;
the detection module is used for performing text detection on the preprocessed image to be detected by adopting a multitask convolutional neural network model based on a double-attention mechanism and content perception upsampling;
and the recognition module is used for performing text recognition on the text detection result by adopting a text recognition model considering the center loss to obtain roadside scene character data.
10. An electronic device, comprising a memory and a processor, the processor implementing the steps of the roadside scene OCR method based on the double-attention mechanism and content-aware upsampling as claimed in any one of claims 1-8 when executing a computer program stored in the memory.
CN202211432082.8A 2022-11-16 2022-11-16 Roadside scene OCR method based on double-attention mechanism and content perception upsampling Pending CN115880693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211432082.8A CN115880693A (en) 2022-11-16 2022-11-16 Roadside scene OCR method based on double-attention mechanism and content perception upsampling


Publications (1)

Publication Number Publication Date
CN115880693A 2023-03-31

Family

ID=85759961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211432082.8A Pending CN115880693A (en) 2022-11-16 2022-11-16 Roadside scene OCR method based on double-attention mechanism and content perception upsampling

Country Status (1)

Country Link
CN (1) CN115880693A (en)

Similar Documents

Publication Publication Date Title
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN110569696A (en) Neural network system, method and apparatus for vehicle component identification
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN112016467B (en) Traffic sign recognition model training method, recognition method, system, device and medium
CN110991513B (en) Image target recognition system and method with continuous learning ability of human-like
CN111461213B (en) Training method of target detection model and target rapid detection method
CN111274980B (en) Small-size traffic sign identification method based on YOLOV3 and asymmetric convolution
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN112036455B (en) Image identification method, intelligent terminal and storage medium
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN110852358A (en) Vehicle type distinguishing method based on deep learning
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN116977844A (en) Lightweight underwater target real-time detection method
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN112132753B (en) Infrared image super-resolution method and system for multi-scale structure guide image
CN115880693A (en) Roadside scene OCR method based on double-attention mechanism and content perception upsampling
CN115861595A (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN111639542A (en) License plate recognition method, device, equipment and medium
JP7117720B1 (en) image generator
CN116665113B (en) Remote sensing scene recognition method, system and medium based on uncertainty quantification
CN117253217A (en) Charging station vehicle identification method and device, electronic equipment and storage medium
CN117576369A (en) Fire detection method, fire detection device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination