CN113989604A - Tire DOT information identification method based on end-to-end deep learning - Google Patents


Info

Publication number
CN113989604A
Authority
CN
China
Legal status
Pending
Application number
CN202111370406.5A
Other languages
Chinese (zh)
Inventor
蔡念
李嘉豪
何兆泉
罗智浩
王晗
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Application filed by Guangdong University of Technology
Priority to CN202111370406.5A
Publication of CN113989604A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T3/02

Abstract

The invention discloses a tire DOT information identification method based on end-to-end deep learning, comprising the following steps: performing feature extraction on a tire image to obtain first feature maps output by N stages, and fusing the first feature maps of the N stages to obtain a second feature map; performing DOT information rough positioning on the fused second feature map by detecting whether the three characters "DOT" are present and where they are, to obtain a region map; generating a mask image from the region map, multiplying the mask image by the second feature map, and performing DOT information fine positioning on the resulting third feature map to obtain DOT information text probabilities and position information, thereby locating angled candidate text blocks; performing tire bending direction detection on the first feature map output by the last stage to obtain the character direction of the tire tread; applying an affine transformation to the candidate text blocks according to the character direction of the tire tread, converting them into horizontal, upright text blocks; and inputting the horizontal text blocks into a deep-learning-based text recognition network for DOT character recognition to obtain the final recognition result.

Description

Tire DOT information identification method based on end-to-end deep learning
Technical Field
The invention relates to the technical field of image recognition, in particular to a tire DOT information recognition method based on end-to-end deep learning.
Background
Tire DOT information carries the manufacturer's product information and is therefore very important to the manufacturer. From the DOT information of a recovered tire, the factory needs to determine its place of origin, factory code, date of manufacture and similar information. In automobile manufacturing, every tire-related process needs to read and match the tread information; if the identified information does not match the actual tire, the consequences can be serious. Manual inspection is very slow and labor-intensive, and the visual fatigue caused by long working hours also lowers accuracy. An automatic system for detecting tire DOT information is therefore needed.
Most existing methods for locating and identifying tire DOT information rely on traditional image processing to detect the target, as follows:
(1) Conventional image-processing methods. When characters are located by template matching, mold wear causes the flash around the characters to cover the character regions, so the template images are matched incorrectly and positioning fails. If the spacing between embossed characters is small, the characters cannot be segmented well because the troughs in the projection curve are not pronounced enough, which reduces recognition accuracy. Least Squares Support Vector Machine (LSSVM) training is based on single characters; because tire DOT information contains many character types, individual characters must be segmented manually to build the data set, which makes training tedious and labor-intensive. If the boundaries of the segmented characters are poor, the trained model, and hence the recognition result, is degraded.
Moreover, such conventional image-processing methods involve image preprocessing, character positioning, character segmentation and character recognition, all of which require manually set thresholds. Because of factors such as the shooting environment and the tire material, these thresholds do not generalize well, which further reduces the accuracy of tire character detection and recognition.
(2) Methods combining deep learning with traditional image processing. Concentric circles are extracted by the Hough circle transform and unrolled into rectangles, i.e., the annular tire surface is unwrapped into a rectangular tread strip. Finally, the DOT characters on the tire are detected with the deep-learning Faster R-CNN method. These methods only detect the position of the DOT information and then recognize it separately: after a positioning network detects the tire text, a separate text recognition network recognizes it, and the two networks are trained independently. Such a stepwise approach involves many cumbersome steps and accumulates errors, resulting in poor tire DOT information identification performance.
Disclosure of Invention
The invention provides a tire DOT information identification method based on end-to-end deep learning that addresses the low identification accuracy of the prior art and can effectively improve the identification accuracy of tire DOT information.
In order to achieve the purpose of the invention, the technical scheme is as follows:
A tire DOT information identification method based on end-to-end deep learning comprises the following steps:
S1: performing feature extraction on an acquired tire image containing tire DOT information to obtain first feature maps output by N stages, and fusing the first feature maps of the N stages to obtain a second feature map;
S2: performing DOT information rough positioning on the fused second feature map, detecting whether the three characters "DOT" are present and where they are, to obtain a region map;
S3: generating a mask image from the region map, multiplying the mask image by the second feature map, and performing DOT information fine positioning on the resulting third feature map to obtain DOT information text probabilities and position information, thereby locating angled candidate text blocks;
S4: performing tire bending direction detection on the first feature map output by the last stage to obtain the character direction of the tire tread;
S5: applying an affine transformation to the candidate text blocks obtained in step S3, using the character direction of the tire tread obtained in step S4, to convert them into horizontal, upright text blocks;
S6: inputting the horizontal text blocks into a deep-learning-based text recognition network for DOT character recognition to obtain the final recognition result.
Preferably, in step S1, feature extraction is performed by a feature extraction network comprising a ResNet-50 network and a feature pyramid network (FPN);
the ResNet-50 network first extracts features from the acquired tire image containing DOT information, producing first feature maps C1, C2, C3 and C4 output by 4 stages, with resolutions of 1/4, 1/8, 1/16 and 1/32 of the input tire image respectively;
the first feature maps C1, C2, C3 and C4 are input into the FPN for feature fusion, which connects low-level feature maps with high-level semantic feature maps, and second feature maps P1, P2, P3 and P4 are output respectively.
Further, the DOT information rough positioning is as follows:
S201: inputting the fused second feature map P1 into a spatial attention module to obtain an output feature map A1;
S202: performing prediction with a region proposal network (RPN), which applies softmax classification and position regression to the feature map A1 to obtain a region map containing the three characters "DOT" and their position information.
Still further, the detailed steps of the fine positioning of DOT information are as follows:
S301: establishing a text detection branch on the third feature map, whose size is 1/4 of the acquired tire image;
S302: the text detection branch outputs six channels: the first channel gives the probability that each pixel is a positive sample, the middle four channels give the distances from each positive pixel to the top, right, bottom and left boundaries of the text box, and the last channel predicts the orientation of the corresponding bounding box;
S303: DOT information text probabilities and position information are thus generated, and candidate text blocks are located in the tire image.
Still further, the tire bending direction detection is as follows: the last-layer feature map C4 of the ResNet-50 network is passed through two fully connected layers followed by softmax classification, which predicts a 4-dimensional vector giving the probabilities of the four tire bending directions (up, down, left and right), thereby yielding the character direction of the tire tread.
Still further, in step S5, the affine transformation is specifically as follows:
the character direction of the tire tread and the angled candidate text blocks are input into an ROI Rotate module as affine transformation parameters, and the text blocks are affinely transformed into horizontal, upright text blocks. The affine transformation is performed in two steps:
(1) computing the affine transformation parameters from the predicted coordinates of the text candidates obtained in step S3 and the character direction of the tire tread obtained in step S4;
(2) applying the affine transformation separately to the shared feature map of each region to obtain an upright horizontal feature map of the text region.
Furthermore, to reduce the influence of each stage's local loss on convergence, training uses a total loss function to ensure effective convergence; the total loss function is defined as:
$$L_{total} = \lambda_1 L_{dot} + \lambda_2 L_{detect} + \lambda_3 L_{cls} + \lambda_4 L_{rg}$$
where $L_{dot}$ is the loss of the DOT information rough positioning stage; $L_{detect}$ is the loss of the DOT information fine positioning stage; $L_{cls}$ is the loss of the tire bending direction detection stage; $L_{rg}$ is the loss of the DOT character recognition stage; and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are trade-off factors representing the contributions of the four losses to the total loss function.
Still further, the loss function $L_{dot}$ of the DOT information rough positioning stage consists of a classification loss and a position regression loss:
$$L_{dot} = \frac{1}{N_s}\sum_i L_s(p_i, p_i^*) + \frac{1}{N_g}\sum_i p_i^* L_g(t_i, t_i^*)$$
where $i$ is the anchor index; $p_i$ is the predicted probability that anchor $i$ is a positive sample; $p_i^*$ is the corresponding ground-truth probability; $t_i$ is the predicted candidate box; $t_i^*$ is the ground-truth box associated with a positive anchor; $N_s$ and $N_g$ are the numbers of samples of the corresponding tasks; $L_s$ is the classification loss; and $L_g$ is the position regression loss.
Still further, the loss function $L_{detect}$ of the DOT information fine positioning stage is:
$$L_{detect} = L_{dcls} + L_{dreg}$$
$$L_{dcls} = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$
where $|X \cap Y|$ is the size of the intersection of sets $X$ and $Y$, and $|X|$ and $|Y|$ are their numbers of elements; for the segmentation task, $X$ and $Y$ are the ground-truth and predicted segmentation masks respectively; $L_{dcls}$ is the classification loss;
$L_{dreg}$ is the total coordinate regression loss, the sum of an IoU loss and a cosine angle difference loss:
$$L_{dreg} = L_{iou} + L_{\theta}$$
IoU loss:
$$L_{iou} = -\log \mathrm{IoU}(\hat{R}, R^*) = -\log \frac{|\hat{R} \cap R^*|}{|\hat{R} \cup R^*|}$$
where $\hat{R}$ is the predicted geometry and $R^*$ is the corresponding ground-truth box;
cosine angle difference loss:
$$L_{\theta} = 1 - \cos(\hat{\theta} - \theta^*)$$
where $\hat{\theta}$ is the predicted rotation angle and $\theta^*$ is the annotated value.
Still further, the loss function $L_{cls}$ of the tire bending direction detection stage is:
$$S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$$
$$L_{cls} = -\sum_{j=1}^{T} y_j \log S_j$$
where $a_j$ and $a_k$ are the $j$-th and $k$-th values of the input vector; $y_j$ is the ground-truth label; $T$ is the number of categories; and $S_j$, the $j$-th value of the softmax output vector $S$, is the probability that the sample belongs to class $j$;
the loss function $L_{rg}$ of the DOT character recognition stage is:
$$L_{rg} = -\sum_{l \in \psi} \log p(l \mid y)$$
where $\psi$ is the set of ground-truth sequences; $y$ is the estimated sequence; and $l$ is the ground-truth label sequence.
The invention has the following beneficial effects:
1. Exploiting the fact that tire DOT information begins with the characters "DOT", the invention roughly positions the DOT information by locating those three characters. A mask image of the rough DOT information position is then generated and multiplied by the second feature map output by the feature extraction network, and the product is used as the input to DOT fine positioning. This removes interference from characters outside the DOT information region, extracts the region of interest, and further improves detection accuracy.
2. The invention provides a tire DOT information identification method based on end-to-end deep learning whose framework seamlessly combines a feature extraction network module, a DOT information rough positioning module, a DOT information fine positioning module, a tire bending direction detection module, an ROI Rotate module and a DOT character recognition module, each completing a different task. The parts of the framework are not independent; they are trained jointly through a total loss function, which avoids error accumulation.
3. Compared with traditional text positioning algorithms, the method uses a feature extraction network to detect the DOT information text, which extracts rich features and improves text detection accuracy. Traditional text recognition methods must first segment the characters and then recognize them; when the character spacing is small, segmentation is poor and recognition suffers. With a deep-learning-based text recognition network, the embossed DOT characters can be recognized accurately even when they are very closely spaced or touching.
Drawings
Fig. 1 is a schematic block diagram of the tire DOT information identification method based on end-to-end deep learning according to Example 1.
Fig. 2 is a tire image containing tire DOT information acquired in Example 1.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in Fig. 1, a tire DOT information identification method based on end-to-end deep learning comprises the following steps:
S1: Fig. 2 shows a tire image containing tire DOT information acquired by the image acquisition hardware system. The camera of the system captures part of the tire each time at high resolution, so the characters on the tire are clearly visible in the image. As Fig. 2 shows, the tire curves in different directions in different images, and this direction can be used to rotate the text to its upright orientation. Each group of DOT information is preceded by the three characters "DOT", which can be used to judge whether the image contains complete DOT information.
In this embodiment, feature extraction is performed on the acquired tire image containing tire DOT information to obtain the first feature maps output by the N stages, and the first feature maps of the N stages are fused to obtain the second feature map.
In a specific embodiment, in step S1, feature extraction is performed by a feature extraction network comprising the residual network ResNet-50 and a feature pyramid network (FPN); the feature extraction network is a convolutional neural network.
The ResNet-50 residual network first extracts features from the acquired tire image containing tire DOT information and, as shown in Fig. 1, produces first feature maps C1, C2, C3 and C4 output by 4 stages, with resolutions of 1/4, 1/8, 1/16 and 1/32 of the input tire image respectively.
Still further, the first feature maps C1, C2, C3 and C4 are input into the FPN for feature fusion, which connects low-level feature maps with high-level semantic feature maps, and second feature maps P1, P2, P3 and P4 are output, with resolutions of 1/4, 1/8, 1/16 and 1/32 of the tire image respectively.
In the feature extraction part, the outputs of the last 4 stages of the ResNet-50 residual network serve as the inputs to the feature fusion stage.
In the feature fusion part, low-level features have higher resolution and contain more position and detail information, but because they pass through fewer convolutions they are less semantic and noisier. High-level features carry stronger semantic information but have very low resolution and perceive details poorly. Fusing low-level and high-level feature maps combines the strengths of each layer and thus improves detection and segmentation performance.
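The top-down fusion described above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the patented network: the 1x1 lateral convolutions are written as channel projections with random, hypothetical weights, upsampling is nearest-neighbour, the 3x3 smoothing convolutions that a standard FPN applies after each merge are omitted, and the channel widths 256/512/1024/2048 are the usual ResNet-50 stage widths rather than values stated in this document.

```python
import numpy as np

def lateral_1x1(feat, weight):
    """1x1 convolution as a per-pixel channel projection: (C_in, H, W) -> (C_out, H, W)."""
    return np.tensordot(weight, feat, axes=(1, 0))

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling along the H and W axes."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(c_maps, weights):
    """Top-down fusion: P4 = lat(C4); P_k = lat(C_k) + up2x(P_{k+1}) for k = 3, 2, 1."""
    p_maps = [None] * len(c_maps)
    p_maps[-1] = lateral_1x1(c_maps[-1], weights[-1])
    for k in range(len(c_maps) - 2, -1, -1):
        p_maps[k] = lateral_1x1(c_maps[k], weights[k]) + upsample2x(p_maps[k + 1])
    return p_maps

# Hypothetical 128x128 input; C1..C4 at strides 4, 8, 16, 32 as in the text.
rng = np.random.default_rng(0)
H = W = 128
channels = [256, 512, 1024, 2048]  # usual ResNet-50 stage widths (assumed)
strides = [4, 8, 16, 32]
c_maps = [rng.standard_normal((c, H // s, W // s)) for c, s in zip(channels, strides)]
weights = [rng.standard_normal((256, c)) * 0.01 for c in channels]  # random, illustrative
p_maps = fpn_fuse(c_maps, weights)
for name, p in zip(["P1", "P2", "P3", "P4"], p_maps):
    print(name, p.shape)
```

Each P_k keeps the resolution of the corresponding C_k, which is why P1 stays at 1/4 of the input image.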
S2: DOT information rough positioning is performed on the fused second feature map by detecting whether the three characters "DOT" are present and where they are, to obtain a region map.
In a specific embodiment, the rough location of the DOT information is as follows:
S201: the fused second feature map P1 is input into a spatial attention module to obtain an output feature map A1; the spatial attention module is added so that the network focuses more on text features.
S202: prediction is performed by a region proposal network (RPN), which applies softmax classification and position regression to the feature map A1 to obtain a region map containing the three characters "DOT" and their position information. Once the position of the three DOT characters is known, the position of the DOT information region is preliminarily located.
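The document does not specify the internal design of the spatial attention module. One common choice, assumed here, is CBAM-style spatial attention: the channel-wise mean and max maps are stacked, convolved with a k x k kernel, passed through a sigmoid, and the resulting per-pixel weights rescale the input. The kernel below is random and hypothetical.

```python
import numpy as np

def spatial_attention(feat, kernel, bias=0.0):
    """CBAM-style spatial attention (assumed design, not specified in the text).

    feat:   (C, H, W) input feature map, e.g. P1
    kernel: (2, k, k) weights applied to the stacked [channel-mean; channel-max] maps
    Returns the reweighted map A1 and the (H, W) attention weights.
    """
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    _, H, W = feat.shape
    logits = np.full((H, W), float(bias))
    # k x k cross-correlation, which is what deep-learning "conv" layers compute
    for dy in range(k):
        for dx in range(k):
            logits += np.tensordot(kernel[:, dy, dx], padded[:, dy:dy + H, dx:dx + W], axes=(0, 0))
    attn = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> weights in (0, 1)
    return feat * attn[None], attn

rng = np.random.default_rng(0)
P1 = rng.standard_normal((64, 32, 32))         # stand-in for the fused map P1
kernel = rng.standard_normal((2, 7, 7)) * 0.1  # hypothetical learned 7x7 kernel
A1, attn = spatial_attention(P1, kernel)
```

Because every weight lies in (0, 1), the module can only suppress positions, so non-text regions are damped relative to text regions before the RPN sees A1.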
S3: a mask image is generated from the region map and multiplied by the second feature map, and DOT information fine positioning is performed on the resulting third feature map to obtain DOT information text probabilities and position information, thereby locating angled candidate text blocks.
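A minimal sketch of the masking in step S3, assuming that the rough-positioning result is a single axis-aligned box (x1, y1, x2, y2) in input-image pixels and that the mask is binary; the document does not state whether the mask is binary or soft, or exactly how the region is derived from the "DOT" box. The mask is built at the 1/4 resolution of P1 and broadcast over channels.

```python
import math
import numpy as np

def region_mask(box, image_hw, stride=4):
    """Binary mask at 1/stride resolution, 1 inside the box (x1, y1, x2, y2) given in image pixels."""
    h, w = image_hw[0] // stride, image_hw[1] // stride
    x1, y1, x2, y2 = box
    c0, r0 = math.floor(x1 / stride), math.floor(y1 / stride)
    c1, r1 = math.ceil(x2 / stride), math.ceil(y2 / stride)
    mask = np.zeros((h, w))
    mask[max(r0, 0):min(r1, h), max(c0, 0):min(c1, w)] = 1.0
    return mask

def apply_mask(p1, mask):
    """Third feature map: element-wise product of P1 with the mask, broadcast over channels."""
    return p1 * mask[None]

P1 = np.ones((8, 64, 64))                                     # stand-in second feature map at 1/4 scale
mask = region_mask((40, 100, 200, 140), image_hw=(256, 256))  # hypothetical rough DOT region
F3 = apply_mask(P1, mask)
```

Features outside the box become zero, so the fine-positioning branch receives no response from characters elsewhere on the tire.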
In a specific embodiment, the detailed steps of the fine positioning of DOT information are as follows:
S301: a text detection branch is established on the third feature map, whose size is 1/4 of the acquired tire image. This greatly reduces computation without a noticeable loss in positioning performance. Because the DOT information has already been roughly positioned, computation on the tire's other characters is masked out, so prediction concentrates on the DOT information region.
S302: the text detection branch outputs six channels: the first channel gives the probability that each pixel is a positive sample, the middle four channels give the distances from each positive pixel to the top, right, bottom and left boundaries of the text box, and the last channel predicts the orientation of the corresponding bounding box.
S303: DOT information text probabilities and position information are thus generated, and candidate text blocks are located in the tire image.
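The six-channel output can be turned into rotated candidate boxes roughly as follows. This sketch assumes an EAST/FOTS-style convention that the document does not spell out: distances are in input-image pixels, each map location corresponds to the image pixel at stride 4 (no half-pixel offset), the standard 2-D rotation matrix is applied in image coordinates (y axis pointing down), and the score threshold is hypothetical. Merging of the per-pixel boxes, for example by non-maximum suppression, is not shown.

```python
import numpy as np

def decode_geometry(out6, score_thresh=0.9, stride=4):
    """Decode a (6, H, W) map [score, d_top, d_right, d_bottom, d_left, theta]
    into rotated quadrilaterals (N, 4, 2) in input-image pixels, plus their scores."""
    score, d_t, d_r, d_b, d_l, theta = out6
    rows, cols = np.nonzero(score > score_thresh)
    boxes = []
    for r, c in zip(rows, cols):
        x, y = c * stride, r * stride  # map location -> image pixel (assumed convention)
        # Corners in the box's own frame, relative to the pixel: TL, TR, BR, BL
        local = np.array([[-d_l[r, c], -d_t[r, c]],
                          [ d_r[r, c], -d_t[r, c]],
                          [ d_r[r, c],  d_b[r, c]],
                          [-d_l[r, c],  d_b[r, c]]])
        cos_t, sin_t = np.cos(theta[r, c]), np.sin(theta[r, c])
        rot = np.array([[cos_t, -sin_t], [sin_t, cos_t]])
        boxes.append(local @ rot.T + np.array([x, y]))  # rotate about the pixel, then translate
    return np.array(boxes).reshape(-1, 4, 2), score[rows, cols]

# One positive pixel at map location (row 2, col 3) with an unrotated box.
out6 = np.zeros((6, 16, 16))
out6[0, 2, 3] = 0.95
out6[1:5, 2, 3] = [1, 2, 3, 4]  # d_top, d_right, d_bottom, d_left
boxes, scores = decode_geometry(out6)
```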
S4: Obtaining the correct orientation of the text box is important, because the recognition stage needs accurate text-box coordinates to recognize the text correctly. As Fig. 2 shows, the tire characters may take any orientation over 360 degrees, and the bending direction of the tire corresponds to the direction of the characters on its tread. Therefore, tire bending direction detection is performed on the first feature map output by the last stage to obtain the character direction of the tire tread.
The tire bending direction detection is as follows: the last-layer feature map C4 of the ResNet-50 network is passed through two fully connected layers followed by softmax classification, which predicts a 4-dimensional vector giving the probabilities of the four tire bending directions (up, down, left and right), thereby yielding the character direction of the tire tread.
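The direction classifier of step S4 can be sketched as below. Global average pooling of C4 before the first fully connected layer, the ReLU between the two layers, the hidden width of 256 and the order of the four classes are all assumptions; the document states only that C4 passes through two fully connected layers and a 4-way classification.

```python
import numpy as np

DIRECTIONS = ("up", "down", "left", "right")  # class order is an assumption

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def direction_head(c4, w1, b1, w2, b2):
    """c4: (C, h, w) last-stage ResNet-50 map. Pool -> FC -> ReLU -> FC -> softmax."""
    v = c4.mean(axis=(1, 2))               # global average pooling (assumed)
    hidden = np.maximum(w1 @ v + b1, 0.0)  # first fully connected layer + ReLU
    probs = softmax(w2 @ hidden + b2)      # second fully connected layer -> 4 classes
    return probs, DIRECTIONS[int(np.argmax(probs))]

rng = np.random.default_rng(0)
c4 = rng.standard_normal((2048, 8, 8))
w1, b1 = rng.standard_normal((256, 2048)) * 0.01, np.zeros(256)
w2, b2 = rng.standard_normal((4, 256)) * 0.01, np.zeros(4)
probs, direction = direction_head(c4, w1, b1, w2, b2)
```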
S5: using the character direction of the tire tread obtained in step S4, an affine transformation is applied to the candidate text blocks obtained in step S3 to convert them into horizontal, upright text blocks.
In a specific embodiment, in step S5, the affine transformation is as follows:
the character direction of the tire tread and the angled candidate text blocks are input into an ROI Rotate module as affine transformation parameters, and the text blocks are affinely transformed into horizontal, upright text blocks. The affine transformation is performed in two steps:
(1) computing the affine transformation parameters from the predicted coordinates of the text candidates obtained in step S3 and the character direction of the tire tread obtained in step S4;
(2) applying the affine transformation separately to the shared feature map of each region to obtain an upright horizontal feature map of the text region.
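The two steps of the ROI Rotate module can be illustrated with the sketch below: step (1) builds a 2x3 affine matrix from a rotated box and the direction class, and step (2) samples the shared feature map on the resulting grid with bilinear interpolation. The box parameterization (center, width, height, angle), the fixed output height, and the rule that the direction class adds a multiple of 90 degrees to the box angle (swapping width and height for odd multiples) are assumptions made for illustration; the document does not give the exact formulas.

```python
import numpy as np

def affine_params(cx, cy, w, h, angle, out_h=8):
    """Step (1): 2x3 matrix mapping output-grid (u, v) to feature-map (x, y)."""
    scale = h / out_h
    out_w = max(1, int(round(w / scale)))  # keep the box's aspect ratio
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]]) * scale
    # Translate so that the centre of the output grid lands on the box centre.
    t = np.array([cx, cy]) - rot @ np.array([(out_w - 1) / 2, (out_h - 1) / 2])
    return np.hstack([rot, t[:, None]]), out_w

def bilinear_sample(feat, xs, ys):
    """Sample feat (C, H, W) at float coordinates with bilinear interpolation and edge clamping."""
    _, H, W = feat.shape
    xs, ys = np.clip(xs, 0, W - 1), np.clip(ys, 0, H - 1)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = xs - x0, ys - y0
    return (feat[:, y0, x0] * (1 - wx) * (1 - wy) + feat[:, y0, x1] * wx * (1 - wy)
            + feat[:, y1, x0] * (1 - wx) * wy + feat[:, y1, x1] * wx * wy)

def roi_rotate(feat, cx, cy, w, h, theta, quarter_turns=0, out_h=8):
    """Step (2): warp one rotated box into a horizontal, upright (C, out_h, out_w) patch."""
    if quarter_turns % 2:  # reading direction runs along the box's other side (assumed rule)
        w, h = h, w
    angle = theta + quarter_turns * np.pi / 2
    M, out_w = affine_params(cx, cy, w, h, angle, out_h)
    v, u = np.mgrid[0:out_h, 0:out_w]
    xs = M[0, 0] * u + M[0, 1] * v + M[0, 2]
    ys = M[1, 0] * u + M[1, 1] * v + M[1, 2]
    return bilinear_sample(feat, xs, ys)

# Toy feature map whose value equals the x coordinate, so the sampling is easy to check.
feat = np.tile(np.arange(32.0), (1, 16, 1))  # shape (1, 16, 32)
patch = roi_rotate(feat, 10, 5, 8, 2, 0.0, out_h=2)
flipped = roi_rotate(feat, 10, 5, 8, 2, 0.0, quarter_turns=2, out_h=2)  # upside-down text
```

Warping the shared feature map rather than the raw image keeps the whole pipeline differentiable, which is what allows the recognition loss to be trained jointly with the detection losses.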
S6: the horizontal text blocks are input into a deep-learning-based text recognition network for DOT character recognition to obtain the final recognition result.
The text labels are predicted from the region features output by the ROI Rotate module. The deep-learning-based text recognition network consists of VGG16 layers and a BLSTM layer. Considering the length of the label sequences in text regions, the input features of the LSTM are reduced only twice along the width axis, through the convolutions shared with the original image.
A multi-task framework inevitably suffers from inconsistent convergence. The convergence of the identification method of this embodiment is affected by four stages (DOT information rough positioning, DOT information fine positioning, tire bending direction detection and DOT character recognition), so training is performed through a total loss function to reduce error accumulation. This embodiment analyzes, through theoretical derivation and experimental comparison, how the local loss of each stage contributes to convergence, which guides the composition of the total loss function and ensures effective convergence. The total loss function is defined as follows:
$$L_{total} = \lambda_1 L_{dot} + \lambda_2 L_{detect} + \lambda_3 L_{cls} + \lambda_4 L_{rg}$$
where $L_{dot}$ is the loss of DOT information rough positioning; $L_{detect}$ is the loss of DOT information fine positioning; $L_{cls}$ is the loss of tire bending direction detection; $L_{rg}$ is the loss of DOT character recognition; and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are trade-off factors representing the contributions of the four losses to the total loss function.
Generally, when the tire DOT information identification method based on end-to-end deep learning is trained, the contributions of the local losses of each stage to the total loss function should be balanced. Because of the nature of the training data and of the loss functions, the magnitudes of the local losses can differ widely. If this difference in magnitude is not handled correctly, the convergence of the framework may be biased towards one local loss during training, while the convergence of the other local losses is weakened or even ignored.
Theoretical derivation and experimental comparison show that the initial loss value of DOT information fine positioning is at least two orders of magnitude smaller than the losses of the other three stages. To keep the four stage losses relatively balanced and ensure consistent convergence, the initial trade-off factors can be configured on the principle that the contribution of the fine-positioning local loss is set two orders of magnitude larger than those of the other three local losses. In practice, $\lambda_1$, $\lambda_3$ and $\lambda_4$ are fixed and only $\lambda_2$ is adjusted, instead of adjusting all four contributions simultaneously. The contributions of the four stages can be set to $\lambda_1 = 0.01$, $\lambda_2 = 1$, $\lambda_3 = 0.01$, $\lambda_4 = 0.01$.
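With the trade-off factors given above, the total loss is a plain weighted sum. In the snippet, the four input values are purely illustrative (chosen so that the fine-positioning loss is two orders of magnitude smaller than the others, as the text describes), not measured losses; they show how $\lambda_2 = 1$ with the other factors at 0.01 brings the four weighted terms to the same scale.

```python
def total_loss(l_dot, l_detect, l_cls, l_rg, lambdas=(0.01, 1.0, 0.01, 0.01)):
    """L_total = lambda1*L_dot + lambda2*L_detect + lambda3*L_cls + lambda4*L_rg."""
    lam1, lam2, lam3, lam4 = lambdas
    return lam1 * l_dot + lam2 * l_detect + lam3 * l_cls + lam4 * l_rg

# Illustrative magnitudes only: each weighted term becomes 0.01.
example = total_loss(l_dot=1.0, l_detect=0.01, l_cls=1.0, l_rg=1.0)
print(example)
```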
DOT information coarse positioning
In this embodiment, the loss function of the DOT information coarse positioning network consists of a classification loss and a position regression loss:
$$L_{dot} = \frac{1}{N_s}\sum_i L_s(p_i, p_i^*) + \frac{1}{N_g}\sum_i p_i^* L_g(t_i, t_i^*)$$
where $i$ is the anchor index; $p_i$ is the predicted probability that anchor $i$ is a positive sample (the positive softmax probability); $p_i^*$ is the corresponding ground-truth probability; $t_i$ is the predicted candidate box; $t_i^*$ is the ground-truth box associated with a positive anchor; $N_s$ and $N_g$ are the numbers of samples of the corresponding tasks; $L_s$ is the classification loss, here the softmax loss; and $L_g$ is the position regression loss, here the smooth L1 loss.
The softmax loss trains the network to classify anchors as positive or negative.
The smooth L1 loss trains the bounding-box regression.
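The rough-positioning loss can be sketched for one batch of sampled anchors as follows. Taking $N_s$ as the number of sampled anchors and $N_g$ as the number of positive anchors is an assumption (the document only calls them the sample counts of the two tasks), and the smooth L1 threshold `beta = 1.0` is the common default rather than a stated value.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Quadratic near zero, linear beyond beta."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def rpn_loss(cls_logits, labels, box_deltas, box_targets):
    """cls_logits: (N, 2) negative/positive scores per sampled anchor;
    labels: (N,) integer labels p_i^* (1 = positive, 0 = negative);
    box_deltas, box_targets: (N, 4) predicted t_i and target t_i^*."""
    z = cls_logits - cls_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    n_s = len(labels)
    l_cls = -log_probs[np.arange(n_s), labels].sum() / n_s
    pos = labels == 1                   # regression only for positive anchors (p_i^* = 1)
    n_g = max(int(pos.sum()), 1)        # N_g = number of positives (assumed)
    l_reg = smooth_l1(box_deltas[pos] - box_targets[pos]).sum() / n_g
    return l_cls + l_reg

loss = rpn_loss(np.zeros((2, 2)), np.array([1, 0]), np.zeros((2, 4)), np.zeros((2, 4)))
```

With uninformative logits and perfect box regression, only the classification term remains, equal to log 2.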
DOT information fine positioning
DOT information occupies only a small part of the tire image. Dice loss was proposed for problems in which the foreground proportion is very small, and it copes better with class imbalance:
$$L_{dcls} = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$
where $|X \cap Y|$ is the size of the intersection of sets $X$ and $Y$, and $|X|$ and $|Y|$ are their numbers of elements; for the segmentation task, $X$ and $Y$ are the ground-truth and predicted segmentation masks respectively; $L_{dcls}$ is the classification loss.
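The Dice loss above can be computed directly on the score map. This sketch uses soft (probability) predictions for $Y$ and adds a small `eps`, an assumption not stated in the document, to avoid division by zero when both masks are empty.

```python
import numpy as np

def dice_loss(pred, gt, eps=1e-6):
    """L_dcls = 1 - 2|X ∩ Y| / (|X| + |Y|), with soft predictions in [0, 1]."""
    inter = (pred * gt).sum()
    return 1.0 - 2.0 * inter / (pred.sum() + gt.sum() + eps)

gt = np.zeros((8, 8))
gt[3:5, 3:5] = 1.0  # a tiny foreground: 4 of 64 pixels
print(dice_loss(np.zeros((8, 8)), gt))  # all-background prediction
```

Predicting "all background" is correct for about 94% of the pixels here, yet the Dice loss is still 1.0, which is why it suits a small DOT region better than a per-pixel accuracy-style loss.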
Coordinate regression uses an IoU loss plus a cosine angle difference loss.
IoU loss:
$$L_{iou} = -\log \mathrm{IoU}(\hat{R}, R^*) = -\log \frac{|\hat{R} \cap R^*|}{|\hat{R} \cup R^*|}$$
where $\hat{R}$ is the predicted geometry and $R^*$ is the corresponding ground-truth box.
Cosine angle difference loss:
$$L_{\theta} = 1 - \cos(\hat{\theta} - \theta^*)$$
where $\hat{\theta}$ is the predicted rotation angle and $\theta^*$ is the annotated value.
$L_{dreg}$, the total coordinate regression loss, is the sum of the IoU loss and the cosine angle difference loss:
$$L_{dreg} = L_{iou} + L_{\theta}$$
so the loss function of DOT information fine positioning is:
$$L_{detect} = L_{dcls} + L_{dreg}$$
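The regression terms can be computed per positive pixel from the four distance channels, as in EAST-style detectors: because the predicted and ground-truth boxes both contain the same pixel and are compared in the box's own frame, their intersection is the product of the per-side minimum distances summed along each axis. This sketch assumes the losses are averaged over positive pixels and adds a small `eps` for numerical safety; neither detail is stated in the document.

```python
import numpy as np

def iou_loss(d_pred, d_gt, eps=1e-6):
    """d_*: (..., 4) distances [top, right, bottom, left] from a pixel to the box edges."""
    t, r, b, l = np.moveaxis(d_pred, -1, 0)
    t_g, r_g, b_g, l_g = np.moveaxis(d_gt, -1, 0)
    area_p = (t + b) * (l + r)
    area_g = (t_g + b_g) * (l_g + r_g)
    w_i = np.minimum(l, l_g) + np.minimum(r, r_g)  # intersection width
    h_i = np.minimum(t, t_g) + np.minimum(b, b_g)  # intersection height
    inter = w_i * h_i
    union = area_p + area_g - inter
    return -np.log((inter + eps) / (union + eps))

def angle_loss(theta_pred, theta_gt):
    """L_theta = 1 - cos(theta_pred - theta_gt)."""
    return 1.0 - np.cos(theta_pred - theta_gt)

def dreg_loss(d_pred, d_gt, theta_pred, theta_gt):
    """L_dreg = L_iou + L_theta, each averaged over positive pixels (assumed)."""
    return iou_loss(d_pred, d_gt).mean() + angle_loss(theta_pred, theta_gt).mean()

# A 2x2 predicted box inside a 4x4 ground-truth box around the same pixel: IoU = 0.25.
print(iou_loss(np.array([1.0, 1, 1, 1]), np.array([2.0, 2, 2, 2])))
```

The cosine form penalizes small angle errors only gently (about half the squared error) and is periodic, so it does not explode when the predicted and annotated angles differ by a full turn.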
Loss function $L_{cls}$ of the tire bending direction detection stage:
$$S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$$
$$L_{cls} = -\sum_{j=1}^{T} y_j \log S_j$$
where $a_j$ and $a_k$ are the $j$-th and $k$-th values of the input vector; $y_j$ is the ground-truth label; $T$ is the number of categories; and $S_j$, the $j$-th value of the softmax output vector $S$, is the probability that the sample belongs to class $j$.
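For the four-class direction head, $L_{cls}$ can be evaluated in a numerically stable way by working with log-softmax: subtracting the largest logit leaves $S_j$ unchanged but prevents overflow in the exponentials. A minimal sketch, assuming one-hot labels $y$:

```python
import numpy as np

def softmax_cross_entropy(a, y):
    """a: (T,) logits; y: (T,) one-hot label. Returns L_cls = -sum_j y_j log S_j."""
    z = a - a.max()                   # shift by the max logit; S_j is unchanged
    log_s = z - np.log(np.exp(z).sum())
    return -(y * log_s).sum()

y = np.array([0.0, 0.0, 1.0, 0.0])  # hypothetical ground truth: third direction class
print(softmax_cross_entropy(np.zeros(4), y))  # uniform prediction over T = 4 classes
```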
DOT character recognition
The DOT character recognition stage uses the connectionist temporal classification (CTC) loss, which is widely used in deep-learning text recognition:
$$L_{rg} = -\sum_{l \in \psi} \log p(l \mid y)$$
where $\psi$ is the set of ground-truth sequences; $y$ is the estimated sequence; and $l$ is the ground-truth label sequence.
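For a single sequence, the CTC term $-\log p(l \mid y)$ can be computed with the standard forward (alpha) recursion over the label sequence padded with blanks. The sketch below works in plain probabilities for readability, which can underflow on long sequences (practical implementations use log-space or rescaling), and assumes class 0 is the blank; it illustrates the standard CTC algorithm, not the document's implementation.

```python
import numpy as np

def ctc_nll(probs, label, blank=0):
    """-log p(l | y) for one sequence.
    probs: (T, K) per-frame softmax outputs y; label: list of class ids without blanks."""
    ext = [blank]
    for c in label:
        ext += [c, blank]  # l' = blank, l1, blank, l2, ..., blank
    S, T = len(ext), probs.shape[0]
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                 # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]        # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]        # skip a blank between different labels
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    p = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(p)

# Two frames, classes {0: blank, 1: "a"}; paths "aa", "-a", "a-" all collapse to "a".
probs = np.array([[0.4, 0.6], [0.3, 0.7]])
nll = ctc_nll(probs, [1])  # p = 0.6*0.7 + 0.4*0.7 + 0.6*0.3 = 0.88
```

Summing over all alignments is what lets the recognizer learn from the label string alone, without per-character positions, which matches the text's point that touching embossed characters need no explicit segmentation.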
The identification method provided by this embodiment has the following advantages and beneficial effects:
1. The tire surface carries many characters, but this embodiment is concerned only with the DOT information, and the other characters would interfere with the detection results. Because tire DOT information begins with the characters "DOT", the RPN network is used to find these characters and roughly locate the DOT information. A mask image of the rough DOT position is then generated and multiplied by the second feature map output by the feature extraction network, and the product is used as the input for DOT fine positioning. This removes the interference of characters outside the DOT information area, extracts the region of interest, and further improves detection accuracy.
2. Existing methods for reading tire DOT information first detect the tire text with a localization network and then recognize it with a separate text recognition network, the two networks being trained independently. Such a stepwise approach involves many cumbersome steps and accumulates errors, which degrades DOT recognition performance. In the method of this embodiment, six parts (the feature extraction network, DOT information rough positioning, DOT information fine positioning, tire bending direction detection, ROI Rotate, and DOT character recognition) are joined seamlessly into a single neural network framework that completes the different tasks. The stages are not independent: the whole framework is trained through one total loss function, so errors do not accumulate between stages.
3. Multi-task frameworks based on deep learning inevitably suffer from inconsistent convergence across tasks. This embodiment analyzes, through theoretical derivation and experimental comparison, how the local loss of each part of the framework contributes to convergence; this analysis guides the composition of the total loss function of the proposed multi-task framework so that it converges effectively.
4. Compared with traditional text localization algorithms, this method detects the DOT text with a feature extraction network, which extracts rich features and improves text detection accuracy. Traditional text recognition methods must first segment the characters and then recognize them; when the character spacing is small, segmentation performs poorly and recognition suffers. Because a deep-learning text recognition network is used instead, the embossed DOT characters can be recognized accurately even when they are very closely spaced or touching.
Example 2
Based on the tire DOT information identification method based on end-to-end deep learning described in Example 1, this example also provides a tire DOT information identification device, comprising a feature extraction network module, a DOT information rough positioning module, a DOT information fine positioning module, a tire bending direction detection module, an ROI Rotate module, and a DOT character recognition module, wherein:
the feature extraction network module is configured to extract features from the acquired tire image containing the tire DOT information to obtain first feature maps output at N stages, and to fuse the first feature maps of the N stages to obtain a second feature map;
the DOT information rough positioning module is configured to perform DOT information rough positioning on the second feature map, detecting whether the three characters "DOT" are present and where they are, so as to obtain a region map;
the DOT information fine positioning module is configured to generate a mask image from the region map, multiply the mask image by the second feature map to obtain a third feature map, and perform DOT information fine positioning on the third feature map to obtain the DOT text probability and position information, thereby locating candidate text blocks with angles;
the tire bending direction detection module is configured to detect the tire bending direction from the first feature map output at the last stage, so as to obtain the character direction information of the tire tread;
the ROI Rotate module is configured to apply an affine transformation to the candidate text blocks according to the character direction information of the tire tread, converting them into upright horizontal text blocks;
and the DOT character recognition module is configured to input the horizontal text blocks into a deep-learning text recognition network to perform DOT character recognition.
Example 3
A computer system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following method steps:
S1: performing feature extraction on the acquired tire image containing the tire DOT information to obtain first feature maps output at N stages, and fusing the first feature maps of the N stages to obtain a second feature map;
S2: performing DOT information rough positioning on the fused second feature map, detecting whether the three characters "DOT" are present and where they are, so as to obtain a region map;
S3: generating a mask image from the region map, multiplying the mask image by the second feature map, and performing DOT information fine positioning on the resulting third feature map to obtain the DOT text probability and position information, thereby locating candidate text blocks with angles;
S4: detecting the tire bending direction from the first feature map output at the last stage to obtain the character direction information of the tire tread;
S5: applying an affine transformation to the candidate text blocks obtained in step S3 according to the character direction information of the tire tread obtained in step S4, converting them into upright horizontal text blocks;
S6: inputting the horizontal text blocks into a deep-learning text recognition network to perform DOT character recognition and obtain the final recognition information.
Example 4
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the following method steps:
S1: performing feature extraction on the acquired tire image containing the tire DOT information to obtain first feature maps output at N stages, and fusing the first feature maps of the N stages to obtain a second feature map;
S2: performing DOT information rough positioning on the fused second feature map, detecting whether the three characters "DOT" are present and where they are, so as to obtain a region map;
S3: generating a mask image from the region map, multiplying the mask image by the second feature map, and performing DOT information fine positioning on the resulting third feature map to obtain the DOT text probability and position information, thereby locating candidate text blocks with angles;
S4: detecting the tire bending direction from the first feature map output at the last stage to obtain the character direction information of the tire tread;
S5: applying an affine transformation to the candidate text blocks obtained in step S3 according to the character direction information of the tire tread obtained in step S4, converting them into upright horizontal text blocks;
S6: inputting the horizontal text blocks into a deep-learning text recognition network to perform DOT character recognition and obtain the final recognition information.
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A tire DOT information identification method based on end-to-end deep learning, characterized by comprising the following steps:
S1: performing feature extraction on the acquired tire image containing the tire DOT information to obtain first feature maps output at N stages, and fusing the first feature maps of the N stages to obtain a second feature map;
S2: performing DOT information rough positioning on the fused second feature map, detecting whether the three characters "DOT" are present and where they are, so as to obtain a region map;
S3: generating a mask image from the region map, multiplying the mask image by the second feature map, and performing DOT information fine positioning on the resulting third feature map to obtain the DOT text probability and position information, thereby locating candidate text blocks with angles;
S4: detecting the tire bending direction from the first feature map output at the last stage to obtain the character direction information of the tire tread;
S5: applying an affine transformation to the candidate text blocks obtained in step S3 according to the character direction information of the tire tread obtained in step S4, converting them into upright horizontal text blocks;
S6: inputting the horizontal text blocks into a deep-learning text recognition network to perform DOT character recognition and obtain the final recognition information.
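The masking in step S3 can be illustrated with a minimal sketch. It assumes the coarse DOT regions from S2 are axis-aligned boxes `(x0, y0, x1, y1)` already expressed in feature-map pixels (end-exclusive), which is a simplification; the function names are hypothetical.

```python
def region_mask(height, width, boxes):
    """Binary mask that is 1 inside any coarse DOT box and 0 elsewhere.

    boxes: (x0, y0, x1, y1) in feature-map pixels, end-exclusive (assumed format).
    """
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1.0
    return mask

def apply_mask(feature_channels, mask):
    """Third feature map = second feature map * mask, applied to every channel."""
    return [[[v * m for v, m in zip(row, mrow)] for row, mrow in zip(ch, mask)]
            for ch in feature_channels]
```

Every activation outside the coarse DOT region becomes zero, so characters elsewhere on the sidewall cannot trigger text detections in the fine-positioning branch.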
2. The tire DOT information identification method based on end-to-end deep learning according to claim 1, wherein in step S1 features are extracted by a feature extraction network comprising a ResNet-50 network and a feature pyramid network (FPN);
the ResNet-50 network first extracts features from the acquired tire image containing the DOT information to obtain first feature maps C1, C2, C3 and C4 output at 4 stages, whose resolutions are 1/4, 1/8, 1/16 and 1/32 of the input tire image, respectively;
the first feature maps C1, C2, C3 and C4 are input into the FPN for feature fusion, which connects low-level feature maps with high-level semantic feature maps, and second feature maps P1, P2, P3 and P4 are output respectively.
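The top-down fusion of claim 2 can be illustrated with a minimal sketch. It assumes single-channel maps, identity lateral connections, nearest-neighbour 2x upsampling and element-wise addition; a real FPN uses 1x1 lateral and 3x3 output convolutions, which are omitted here.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fpn_fuse(c_maps):
    """Top-down fusion: P4 = C4, P_k = C_k + upsample(P_{k+1}).

    c_maps: [C1, C2, C3, C4], each half the resolution of the previous one
    (1/4, 1/8, 1/16, 1/32 of the input image). Returns [P1, P2, P3, P4].
    """
    p_maps = [c_maps[-1]]
    for c in reversed(c_maps[:-1]):          # C3, then C2, then C1
        p_maps.insert(0, add_maps(c, upsample2x(p_maps[0])))
    return p_maps
```

The high-resolution output P1 thus carries semantic information propagated down from C4, which is why P1 (1/4 of the input) is a reasonable input for locating small embossed characters.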
3. The tire DOT information identification method based on end-to-end deep learning according to claim 2, wherein the rough positioning of the DOT information comprises:
S201: inputting the fused second feature map P1 into a spatial attention module to obtain an output feature map A1;
S202: performing prediction with a region proposal network (RPN), i.e. softmax classification and position regression on the feature map A1, to obtain a region map containing the three characters "DOT" and their position information.
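The claim does not specify the internals of the spatial attention module. The sketch below assumes a CBAM-style design: channel-wise mean and max maps are combined, passed through a sigmoid, and used to reweight every channel of P1. The 1x1 combination weights `w_avg`, `w_max` and `bias` are hypothetical stand-ins for the learned convolution (CBAM uses a 7x7 kernel).

```python
import math

def spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0):
    """CBAM-style spatial attention (assumed design, not from the patent).

    fmap: list of C channels, each an H x W list of rows.
    Returns a map of the same shape, reweighted per spatial position.
    """
    C = len(fmap)
    H, W = len(fmap[0]), len(fmap[0][0])
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [fmap[c][i][j] for c in range(C)]
            z = w_avg * sum(vals) / C + w_max * max(vals) + bias
            attn[i][j] = 1.0 / (1.0 + math.exp(-z))   # sigmoid gate in (0, 1)
    return [[[fmap[c][i][j] * attn[i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]
```

Positions with strong responses across channels keep most of their activation, while weak background positions are suppressed before the RPN sees A1.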
4. A tire DOT information identification method based on end-to-end deep learning according to claim 3, characterized in that: the detailed steps of the DOT information fine positioning are as follows:
s301: establishing a text detection branch for a third feature map, wherein the size of the third feature map is 1/4 the size of the collected tire image;
s302: the third feature map consists of six channels, the first channel calculates the probability that each pixel is a positive sample, the middle four channels calculate the distance between each positive sample pixel point and the upper, right, lower and left boundaries of the text box, and the last channel predicts the direction of the related boundary box;
s303: thereby generating DOT information text probabilities and location information and locating candidate text blocks for the tire image.
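Turning the six per-pixel outputs into a rotated candidate box can be sketched as follows. The sketch assumes the four distances are in input-image pixels, the angle is in radians, the feature-map stride is 4 (the 1/4 resolution of S301), and a score threshold of 0.5; all of these are illustrative assumptions. Merging the per-pixel boxes (e.g. by non-maximum suppression) is not shown.

```python
import math

def decode_pixel(x, y, channels, stride=4, score_thresh=0.5):
    """Decode (score, d_top, d_right, d_bottom, d_left, theta) at feature-map
    location (x, y) into the four corners TL, TR, BR, BL of a rotated box.

    Returns None when the pixel is not a positive sample.
    """
    score, d_t, d_r, d_b, d_l, theta = channels
    if score < score_thresh:
        return None
    px, py = x * stride, y * stride            # pixel position in the input image
    c, s = math.cos(theta), math.sin(theta)
    # Corners in the box's own (unrotated) frame, relative to the pixel.
    local = [(-d_l, -d_t), (d_r, -d_t), (d_r, d_b), (-d_l, d_b)]
    # Rotate each corner by theta and translate it to the pixel position.
    return [(px + u * c - v * s, py + u * s + v * c) for u, v in local]
```

For example, with $\theta = 0$ the pixel at feature location (10, 5) with distances (2, 3, 4, 1) yields the axis-aligned box with corners (39, 18), (43, 18), (43, 24) and (39, 24).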
5. The tire DOT information identification method based on end-to-end deep learning according to claim 4, wherein the tire bending direction is detected as follows: the last-layer feature map C4 output by the ResNet-50 network is taken out and passed through two fully connected layers and a softmax classifier to predict a 4-dimensional vector representing the probabilities of the four tire bending directions (up, down, left and right), thereby obtaining the character direction information of the tire tread.
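The direction head of claim 5 can be sketched in plain Python. How C4 is flattened before the fully connected layers is not stated, so global average pooling is assumed here, as is a ReLU between the two layers and the class order in `DIRECTIONS`; the weights in the usage test are hypothetical.

```python
import math

DIRECTIONS = ("up", "down", "left", "right")  # class order is an assumption

def dense(vec, weights, bias):
    """Fully connected layer: out_i = sum_j W[i][j] * vec[j] + b[i]."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def predict_direction(c4, w1, b1, w2, b2):
    """Classify the tire bending direction from the C4 feature map.

    c4: list of channels, each an H x W list of rows.
    Returns the predicted direction and the 4-dimensional probability vector.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in c4]
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]   # first FC + ReLU
    logits = dense(hidden, w2, b2)                           # second FC, 4 outputs
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return DIRECTIONS[probs.index(max(probs))], probs
```

Because the head reads C4, the deepest and most global feature map, it can judge the overall orientation of the sidewall text rather than individual characters.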
6. The tire DOT information identification method based on end-to-end deep learning according to claim 5, wherein in step S5 the affine transformation is performed as follows:
the obtained character direction information of the tire tread and the obtained candidate text blocks with angles are input into an ROI Rotate module as affine transformation parameters, and the text blocks are affine-transformed into upright horizontal text blocks; the affine transformation is carried out in two steps:
(1) calculating the affine transformation parameters from the predicted coordinates of the text candidates obtained in step S3 and the character direction information of the tire tread obtained in step S4;
(2) applying the affine transformation to the shared feature map of each region separately to obtain an upright horizontal feature map of the text region.
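The two steps of the affine transformation can be sketched as follows. Step (1) builds a 2x3 matrix that maps output-patch coordinates back to feature-map coordinates, and step (2) resamples the feature map through it. The mapping from predicted direction to an extra rotation (`DIRECTION_OFFSET`), the fixed patch height of 8, and nearest-neighbour sampling (bilinear is more common) are all assumptions made for illustration.

```python
import math

# Hypothetical mapping from the predicted direction (step S4) to an extra
# rotation; the patent does not specify this convention.
DIRECTION_OFFSET = {"up": 0.0, "right": math.pi / 2,
                    "down": math.pi, "left": -math.pi / 2}

def roi_affine(cx, cy, box_w, box_h, theta, direction, out_h=8):
    """Step (1): return (A, out_w), where the 2x3 matrix A maps patch
    coordinates (u, v) to feature-map coordinates (x, y).

    (cx, cy, box_w, box_h, theta) describe the rotated candidate box from S3.
    """
    if direction in ("left", "right"):
        box_w, box_h = box_h, box_w      # text runs along the box's other axis
    phi = theta + DIRECTION_OFFSET[direction]
    scale = box_h / out_h                # same scale on both axes keeps aspect
    out_w = max(1, round(box_w / scale))
    c, s = math.cos(phi), math.sin(phi)
    a = [[scale * c, -scale * s, 0.0],
         [scale * s, scale * c, 0.0]]
    # Translation so that the patch centre maps onto the box centre.
    a[0][2] = cx - a[0][0] * out_w / 2 - a[0][1] * out_h / 2
    a[1][2] = cy - a[1][0] * out_w / 2 - a[1][1] * out_h / 2
    return a, out_w

def roi_rotate(fmap, a, out_w, out_h=8):
    """Step (2): nearest-neighbour resampling of a single-channel map.

    Samples at patch-pixel centres (u + 0.5, v + 0.5); feature-map pixel i
    covers [i, i + 1). Positions outside the map read as 0.
    """
    H, W = len(fmap), len(fmap[0])
    patch = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            uc, vc = u + 0.5, v + 0.5
            x = a[0][0] * uc + a[0][1] * vc + a[0][2]
            y = a[1][0] * uc + a[1][1] * vc + a[1][2]
            xi, yi = math.floor(x), math.floor(y)
            row.append(fmap[yi][xi] if 0 <= xi < W and 0 <= yi < H else 0)
        patch.append(row)
    return patch
```

With direction "down", for instance, the patch is the region rotated by 180 degrees, so upside-down sidewall text reaches the recognizer upright.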
7. The tire DOT information identification method based on end-to-end deep learning according to claim 6, wherein, to reduce the influence of the local loss of each stage on convergence, a total loss function is used for training to ensure effective convergence, the total loss function being defined as:
$$L_{total} = \lambda_1 L_{dot} + \lambda_2 L_{detect} + \lambda_3 L_{cls} + \lambda_4 L_{rg}$$
where $L_{dot}$ is the loss function of the DOT information rough positioning stage; $L_{detect}$ is the loss function of the DOT information fine positioning stage; $L_{cls}$ is the loss function of the tire bending direction detection stage; $L_{rg}$ is the loss function of the DOT character recognition stage; and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are the corresponding trade-off factors representing the contributions of the four losses to the total loss function.
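The total loss is a weighted sum of the four stage losses. The claim does not give values for $\lambda_1$ to $\lambda_4$, so the equal default weights in this sketch are placeholders only.

```python
def total_loss(l_dot, l_detect, l_cls, l_rg, weights=(1.0, 1.0, 1.0, 1.0)):
    """L_total = λ1·L_dot + λ2·L_detect + λ3·L_cls + λ4·L_rg.

    weights: (λ1, λ2, λ3, λ4); the defaults are placeholders, not the
    trade-off factors used by the authors.
    """
    lam1, lam2, lam3, lam4 = weights
    return lam1 * l_dot + lam2 * l_detect + lam3 * l_cls + lam4 * l_rg
```

Since a single scalar is back-propagated through all six parts, changing one λ shifts how much each stage's gradient shapes the shared feature extractor, which is the balance the convergence analysis is meant to tune.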
8. The tire DOT information identification method based on end-to-end deep learning according to claim 7, wherein the loss function $L_{dot}$ of the DOT information rough positioning stage consists of a classification loss function and a position regression loss function:
$$L_{dot} = \frac{1}{N_s}\sum_i L_s(p_i, p_i^*) + \frac{1}{N_g}\sum_i p_i^* L_g(t_i, t_i^*)$$
where $i$ is the anchor box index; $p_i$ is the predicted probability that anchor $i$ is a positive sample; $p_i^*$ is the corresponding ground-truth value; $t_i$ is the predicted candidate box; $t_i^*$ is the ground-truth box associated with a positive anchor; $N_s$ and $N_g$ are the numbers of samples for the respective tasks; $L_s$ is the classification loss function; and $L_g$ is the position regression loss function.
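The claim leaves $L_s$ and $L_g$ unspecified. This sketch assumes the Faster R-CNN defaults, binary cross-entropy for $L_s$ and smooth L1 for $L_g$, and takes $N_s$ as the number of sampled anchors and $N_g$ as the number of positive anchors; these are assumptions, not the authors' stated choices.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def rpn_loss(p, p_star, t, t_star, eps=1e-12):
    """L_dot = (1/N_s) sum_i L_s(p_i, p_i*) + (1/N_g) sum_i p_i* L_g(t_i, t_i*).

    p: predicted objectness per anchor; p_star: 0/1 labels;
    t, t_star: 4-tuples of box offsets per anchor.
    """
    n_s = len(p)
    cls = -sum(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps)
               for pi, ps in zip(p, p_star)) / n_s
    n_g = max(1, sum(p_star))           # avoid division by zero with no positives
    # The factor p_i* restricts box regression to positive anchors.
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
              for ps, ti, tsi in zip(p_star, t, t_star)) / n_g
    return cls + reg
```

The multiplication by $p_i^*$ means negative anchors contribute only to classification, so background anchors never pull the box regressor towards arbitrary targets.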
9. The tire DOT information identification method based on end-to-end deep learning according to claim 7, wherein the loss function $L_{detect}$ of the DOT information fine positioning stage is:
$$L_{detect} = L_{dcls} + L_{dreg}$$
$$L_{dcls} = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$
where $|X \cap Y|$ denotes the intersection of sets $X$ and $Y$, and $|X|$ and $|Y|$ denote their numbers of elements; for the segmentation task, $X$ and $Y$ are the ground-truth and predicted masks, respectively; $L_{dcls}$ denotes the classification loss;
$L_{dreg}$ denotes the total coordinate-regression loss, the sum of the IoU loss and the cosine angle-difference loss:
$$L_{dreg} = L_{iou} + L_\theta$$
IoU loss:
$$L_{iou} = -\log \frac{|\hat{R} \cap R^*|}{|\hat{R} \cup R^*|}$$
where $\hat{R}$ denotes the predicted geometry and $R^*$ is its corresponding labelled box;
cosine angle-difference loss:
$$L_\theta = 1 - \cos(\hat{\theta} - \theta^*)$$
where $\hat{\theta}$ is the predicted rotation angle and $\theta^*$ is the annotated value.
10. The tire DOT information identification method based on end-to-end deep learning according to claim 7, wherein the loss function $L_{cls}$ of the tire bending direction detection stage is:
$$S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$$
$$L_{cls} = -\sum_{j=1}^{T} y_j \log S_j$$
where $a_j$ and $a_k$ are the j-th and k-th values of the input vector; $y_j$ is the ground-truth label; $T$ is the number of categories; and $S_j$ is the j-th value of the vector $S$, i.e. the probability that the sample belongs to the j-th class;
the loss function $L_{rg}$ of the DOT character recognition stage is:
$$L_{rg} = -\sum_{(y, l) \in \psi} \log p(l \mid y)$$
where $\psi$ denotes the set of ground-truth sequences; $y$ denotes the estimated sequence and $l$ the ground-truth label sequence.
CN202111370406.5A 2021-11-18 2021-11-18 Tire DOT information identification method based on end-to-end deep learning Pending CN113989604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111370406.5A CN113989604A (en) 2021-11-18 2021-11-18 Tire DOT information identification method based on end-to-end deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111370406.5A CN113989604A (en) 2021-11-18 2021-11-18 Tire DOT information identification method based on end-to-end deep learning

Publications (1)

Publication Number Publication Date
CN113989604A (en) 2022-01-28

Family

ID=79749384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111370406.5A Pending CN113989604A (en) 2021-11-18 2021-11-18 Tire DOT information identification method based on end-to-end deep learning

Country Status (1)

Country Link
CN (1) CN113989604A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824591A (en) * 2023-04-19 2023-09-29 钛玛科(北京)工业科技有限公司 Identification method for tire sidewall characters
CN116824591B (en) * 2023-04-19 2023-12-05 钛玛科(北京)工业科技有限公司 Identification method for tire sidewall characters
CN116894937A (en) * 2023-06-25 2023-10-17 德联易控科技(北京)有限公司 Method, system and electronic equipment for acquiring parameters of wheel aligner
CN116894937B (en) * 2023-06-25 2024-02-06 德联易控科技(北京)有限公司 Method, system and electronic equipment for acquiring parameters of wheel aligner

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination