CN113989604A - Tire DOT information identification method based on end-to-end deep learning - Google Patents
- Publication number: CN113989604A
- Application number: CN202111370406.5A
- Authority
- CN
- China
- Prior art keywords
- tire
- dot
- information
- feature
- dot information
- Prior art date
- Legal status: Granted (an assumption by Google Patents, not a legal conclusion)
Classifications
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/045 — Neural networks; combinations of networks
- G06T3/02 — Geometric image transformations in the plane of the image; affine transformations
Abstract
The invention discloses a tire DOT information identification method based on end-to-end deep learning, comprising the following steps: performing feature extraction on the tire image to obtain first feature maps output at N stages, and performing feature fusion on these first feature maps to obtain a second feature map; performing coarse DOT positioning on the fused second feature map, detecting whether the three characters "DOT" and their position information are present, so as to obtain a region map; generating a mask image from the region map, multiplying the mask by the second feature map, and performing fine DOT positioning on the resulting third feature map to obtain the DOT text probability and position information and locate an angled candidate text block; detecting the tire bending direction on the first feature map output at the last stage to obtain the character direction information of the tire tread; applying an affine transformation, parameterized by the candidate text block and the tread character direction, to convert the block into an upward-facing horizontal text block; and inputting the horizontal text block into a deep-learning-based text recognition network for DOT character recognition to obtain the final identification result.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a tire DOT information recognition method based on end-to-end deep learning.
Background
Tire DOT information carries product information from the tire manufacturer and is very important to it. A factory needs to determine information such as the origin, factory code, and date of manufacture from the DOT information of a recovered tire. In the automobile manufacturing industry, the tread information of each relevant tire must be read and matched; if the identified information does not match the actual tire, unpredictable consequences can result. Manual inspection is slow, requires substantial manpower, and prolonged work causes visual fatigue that lowers accuracy. There is therefore a need for a system that automatically detects tire DOT information.
Most existing methods for locating and identifying tire DOT information detect the target with traditional image processing, as follows:
(1) Traditional image processing methods. When characters are located by template matching, die wear causes flash around the characters to cover the character regions, template images fail to match, and positioning errors result. When the spacing between embossed characters is small, the projection curve shows troughs that are not distinct enough, the characters cannot be well segmented, and recognition accuracy suffers. Least-squares support vector machine (LSSVM) training operates on single characters; tire DOT information contains many character classes, so individual characters must be manually segmented to build a data set, which is tedious and labor-intensive. If the boundaries of the segmented characters are poor, the trained model, and in turn the recognition performance, is degraded.
Meanwhile, such traditional image processing pipelines involve image preprocessing, character positioning, character segmentation, and character recognition, all of which require manually set thresholds. Owing to factors such as the shooting environment and the tire material, these thresholds do not transfer well, which degrades the accuracy of tire character detection and recognition.
(2) Methods combining deep learning with traditional image processing. Concentric circles are extracted by a Hough circle transform and unrolled into rectangles, i.e., the tire surface is unrolled into a rectangular tread. Object detection of the DOT characters on the tire is then performed with the deep-learning method Faster R-CNN. Here the tire DOT information is only detected by position and then recognized: after the positioning network detects the tire text, a separate text recognition network recognizes it, and the two networks are trained independently. Such a step-by-step approach involves many cumbersome steps and accumulates errors, resulting in poor tire DOT identification performance.
Disclosure of Invention
The invention provides a tire DOT information identification method based on end-to-end deep learning, aiming to solve the problem of low identification accuracy in the prior art; it can effectively improve the identification accuracy of tire DOT information.
In order to achieve the purpose of the invention, the technical scheme is as follows:
a tire DOT information identification method based on end-to-end deep learning comprises the following steps:
s1: performing feature extraction on the tire image acquired with the tire DOT information to obtain first feature maps output at N stages, and simultaneously performing feature fusion on the first feature maps output at the N stages to obtain a second feature map;
s2: performing coarse DOT positioning on the fused second feature map, and detecting whether the three characters "DOT" and their position information are present, so as to obtain a region map;
s3: generating a mask image from the region map, multiplying the mask by the second feature map, and performing fine DOT positioning on the third feature map obtained after multiplication to obtain the DOT text probability and position information, so as to locate an angled candidate text block;
s4: detecting the tire bending direction on the first feature map output at the last stage to obtain the character direction information of the tire tread;
s5: applying an affine transformation, parameterized by the candidate text block obtained in step s3 and the tread character direction obtained in step s4, to convert the candidate text block into an upward-facing horizontal text block;
s6: inputting the horizontal text block into a deep-learning-based text recognition network for DOT character recognition, so as to obtain the final identification information.
Preferably, in step S1, a feature extraction network is used to perform feature extraction, where the feature extraction network includes a ResNet-50 network and a feature pyramid network FPN;
the ResNet-50 network first performs feature extraction on the tire image acquired with the DOT information, obtaining first feature maps C1, C2, C3, and C4 output at 4 stages, with resolutions of 1/4, 1/8, 1/16, and 1/32 of the input tire image respectively;
the first feature maps C1, C2, C3, and C4 are input into the feature pyramid network FPN for feature fusion, which connects low-level feature maps with high-level semantic feature maps; the second feature maps P1, P2, P3, and P4 are output respectively.
Further, the coarse DOT positioning is as follows:
s201: inputting the fused second feature map P1 into a spatial attention module to obtain an output feature map A1;
s202: predicting through the region proposal network RPN: softmax classification and position regression are performed on feature map A1 to obtain a region map containing the three characters "DOT" and their position information.
Still further, the detailed steps of fine DOT positioning are as follows:
s301: establishing a text detection branch on the third feature map, whose size is 1/4 of the acquired tire image;
s302: the branch outputs six channels: the first channel gives the probability that each pixel is a positive sample, the middle four channels give the distances from each positive pixel to the top, right, bottom, and left boundaries of the text box, and the last channel predicts the orientation of the corresponding bounding box;
s303: DOT text probabilities and position information are thereby generated, and candidate text blocks of the tire image are located.
Still further, the tire bending direction detection is specifically as follows: the last-stage feature map C4 output by the ResNet-50 network is taken and passed through two fully connected layers and a classifier; a 4-dimensional array is predicted, representing the probabilities of the four tire bending directions (up, down, left, right), which yields the character direction information of the tire tread.
Still further, in step s5 the affine transformation is specifically as follows:
the obtained character direction information of the tire tread and the angled candidate text block are input as affine transformation parameters into the ROI Rotate module, which applies an affine transformation to the text block to convert it into an upward-facing horizontal text block; the affine transformation proceeds in two steps:
(1) the affine transformation parameters are computed from the predicted coordinates of the text candidates obtained in step s3 and the tread character direction obtained in step s4;
(2) the affine transformation is applied separately to the shared feature map of each region, and horizontal, upright feature maps of the text regions are obtained.
Furthermore, to reduce the influence of the local loss of each stage on convergence, training uses a total loss function to ensure effective convergence; the total loss function is defined as follows:
L_total = λ1·L_dot + λ2·L_detect + λ3·L_cls + λ4·L_rg
where L_dot is the loss function of the coarse DOT positioning stage; L_detect is the loss function of the fine DOT positioning stage; L_cls is the loss function of the tire bending direction detection stage; L_rg is the loss function of the DOT character recognition stage; and λ1, λ2, λ3, λ4 are the corresponding trade-off factors giving the contribution of the four losses to the total loss function.
Still further, the loss function L_dot of the coarse DOT positioning stage consists of a classification loss and a position regression loss:
L_dot = (1/N_s)·Σ_i L_s(p_i, p_i*) + (1/N_g)·Σ_i p_i*·L_g(t_i, t_i*)
where i is the anchor box index; p_i is the predicted probability of a positive sample; p_i* is the corresponding ground-truth probability; t_i is the predicted candidate box; t_i* is the ground-truth label box corresponding to a positive anchor; N_s and N_g are the numbers of samples of the corresponding tasks; L_s is the classification loss function; and L_g is the position regression loss function.
Still further, the loss function L_detect of the fine DOT positioning stage is:
L_detect = L_dcls + L_dreg
The classification loss L_dcls is the Dice loss:
L_dcls = 1 − 2·|X ∩ Y| / (|X| + |Y|)
where |X ∩ Y| is the intersection of sets X and Y, and |X| and |Y| are their numbers of elements; for the segmentation task, X and Y denote the ground-truth and predicted mask segmentations respectively.
L_dreg is the total coordinate regression loss, using the IoU loss plus the cosine angle-difference loss:
L_dreg = L_iou + L_θ
IoU loss:
L_iou = −log IoU(R̂, R*) = −log( |R̂ ∩ R*| / |R̂ ∪ R*| )
where R̂ is the predicted geometry and R* is its corresponding label box;
cosine angle-difference loss:
L_θ = 1 − cos(θ̂ − θ*)
where θ̂ is the predicted rotation angle and θ* is the annotated value.
Still further, the loss function L_cls of the tire bending direction detection stage is:
L_cls = −Σ_j y_j · log S_j, with S_j = e^{a_j} / Σ_k e^{a_k}
where a_j is the j-th value of the input vector T; a_k is the k-th value of the input vector T; y_j is the true label; T is the number of categories; and S_j, the j-th value of the vector S, is the probability that the sample belongs to the j-th class.
The loss function L_rg of the DOT character recognition stage is:
L_rg = −(1/|ψ|)·Σ_{(y, l) ∈ ψ} log p(l | y)
where ψ is the set of ground-truth sequences, y is the estimated sequence, and l is the true label sequence.
The invention has the following beneficial effects:
1. The invention exploits the fact that tire DOT information begins with the characters "DOT", which are found by coarse DOT positioning. A mask image encoding the coarse DOT position is generated and multiplied by the second feature map output by the feature extraction network, and the product serves as the input to fine DOT positioning. This eliminates interference from character information outside the DOT region, extracts the region of interest, and further improves detection accuracy.
2. The invention provides a tire DOT information identification method based on end-to-end deep learning whose framework is seamlessly composed of a feature extraction network module, a coarse DOT positioning module, a fine DOT positioning module, a tire bending direction detection module, an ROI Rotate module, and a DOT character recognition module, each completing a different task. The parts of the framework are not independent: they are trained jointly through a total loss function, avoiding error accumulation.
3. Compared with traditional text positioning algorithms, the method uses a feature extraction network to detect the DOT text, can extract rich features, and improves text detection accuracy. Traditional text recognition must first segment the characters and then recognize them; when the character spacing is small, segmentation is poor and recognition suffers. By introducing a deep-learning-based text recognition network, the tire's embossed DOT characters can be recognized accurately even when the spacing between them is very small or the characters are joined.
Drawings
Fig. 1 is a schematic block diagram of a tire DOT information identification method based on end-to-end deep learning shown in embodiment 1.
Fig. 2 is a tire image of the tire DOT information acquired in example 1.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in fig. 1, a method for identifying DOT information of a tire based on end-to-end deep learning includes the following steps:
s1: the tire image of the tire DOT information acquired by the image acquisition hardware system is shown in fig. 2, a camera of the tire image system shoots a part of the tire each time, the resolution is high, and characters on the tire can be clearly displayed in the image. As can be seen from fig. 2, since the direction in which the tire is curved is different, the text information can be corrected in the positive direction based on this information. Each group of DOT information has three characters of 'DOT' in front, so that whether the image has complete DOT information or not can be judged according to the information.
In the embodiment, the tire image acquired with the tire DOT information is subjected to feature extraction to respectively obtain the first feature maps output in the N stages, and meanwhile, the first feature maps output in the N stages are subjected to feature fusion to obtain the second feature map.
In a specific embodiment, in step S1 a feature extraction network is used, comprising a residual network (ResNet-50) and a feature pyramid network (FPN). The feature extraction network is a convolutional neural network.
The ResNet-50 network first performs feature extraction on the tire image acquired with the tire DOT information; as shown in fig. 1, first feature maps C1, C2, C3, and C4 output at 4 stages are obtained, with resolutions of 1/4, 1/8, 1/16, and 1/32 of the input tire image respectively.
Still further, the first feature maps C1, C2, C3, and C4 are input into the feature pyramid network FPN for feature fusion, connecting low-level feature maps with high-level semantic feature maps; the second feature maps P1, P2, P3, and P4 are output, with resolutions of 1/4, 1/8, 1/16, and 1/32 of the tire image respectively.
In the feature extraction part, the outputs of the last 4 stages of the ResNet-50 network serve as the input of the feature fusion stage.
In the feature fusion part, low-level features have higher resolution and contain more position and detail information but, having passed through fewer convolutions, are less semantic and noisier. High-level features have stronger semantic information but very low resolution and poor perception of detail. Fusing the low-level and high-level feature maps extracts the advantages of each layer and improves detection and segmentation performance.
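As an illustration only, the top-down fusion step can be sketched in numpy; the 1×1 lateral convolutions and 3×3 smoothing convolutions of a real FPN are omitted, and all maps are assumed to already share a channel count (a simplifying assumption, not the invention's exact network):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(c_maps):
    """Top-down fusion in the spirit of FPN: start from the coarsest map
    and add each upsampled result to the next finer map."""
    p = [None] * len(c_maps)
    p[-1] = c_maps[-1]
    for i in range(len(c_maps) - 2, -1, -1):
        p[i] = c_maps[i] + upsample2x(p[i + 1])
    return p

# Toy maps C1..C4 at 1/4, 1/8, 1/16, 1/32 of a 64x64 input, all with 8 channels.
c = [np.ones((8, 64 // s, 64 // s)) for s in (4, 8, 16, 32)]
p = fpn_fuse(c)  # p[0]..p[3] play the role of P1..P4
```

The finest fused map accumulates contributions from every coarser stage, which is the intuition behind combining position detail with high-level semantics.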
S2: coarse DOT positioning is performed on the fused second feature map, and whether the three characters "DOT" and their position information are present is detected, so as to obtain a region map.
In a specific embodiment, the rough location of the DOT information is as follows:
s201: the fused second feature map P1 is input into a spatial attention module to obtain the output feature map A1; the spatial attention module is added so that the network focuses more on text features.
S202: prediction is performed through the region proposal network (RPN): softmax classification and position regression are applied to feature map A1 to obtain a region map containing the three characters "DOT" and their position information. Once the position of the "DOT" characters is obtained, the DOT information region is preliminarily located.
S3: a mask image is generated from the region map and multiplied by the second feature map; fine DOT positioning is performed on the resulting third feature map to obtain the DOT text probability and position information, so as to locate the angled candidate text block.
In a specific embodiment, the detailed steps of the fine positioning of DOT information are as follows:
s301: a text detection branch is established on the third feature map, whose size is 1/4 of the acquired tire image; this greatly reduces computation without a noticeable loss of positioning performance. After coarse DOT positioning, the computation over the tire's other character information is masked out, so that prediction focuses on the DOT region.
S302: the branch outputs six channels: the first channel gives the probability that each pixel is a positive sample, the middle four channels give the distances from each positive pixel to the top, right, bottom, and left boundaries of the text box, and the last channel predicts the orientation of the corresponding bounding box.
S303: DOT text probabilities and position information are thereby generated, and the candidate text blocks of the tire image are located.
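A minimal sketch of the masking and per-pixel geometry decoding described above, with toy shapes and values (the box coordinates, distances, and angle here are hypothetical, not measured data):

```python
import numpy as np

def region_mask(h, w, box):
    """Binary mask from a coarse DOT region box (x0, y0, x1, y1)."""
    m = np.zeros((h, w))
    x0, y0, x1, y1 = box
    m[y0:y1, x0:x1] = 1.0
    return m

def decode_pixel(px, py, dists, angle):
    """Decode one positive pixel of the six-channel output into a box
    (before rotation): dists = (top, right, bottom, left)."""
    t, r, b, l = dists
    return (px - l, py - t, px + r, py + b), angle

feat = np.ones((6, 32, 32))                 # toy second feature map
mask = region_mask(32, 32, (8, 8, 24, 24))  # coarse DOT region
third = feat * mask                         # suppress characters outside the DOT region
box, theta = decode_pixel(16, 16, (2, 3, 2, 3), 0.1)
```

Multiplying by the mask zeroes the features of all non-DOT characters, so the fine-positioning branch only sees the region of interest.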
S4: obtaining the correct orientation of the text box is important, because the recognition stage needs accurate text box coordinates to recognize the text correctly. As shown in fig. 2, the tire characters may appear at any angle over 360 degrees, and the tire's bending direction corresponds to the direction of the characters on its tread. Therefore, tire bending direction detection is performed on the first feature map output at the last stage, and the character direction information of the tire tread is obtained.
The tire bending direction detection is specifically as follows: the last-stage feature map C4 output by the ResNet-50 network is taken and passed through two fully connected layers and a classifier; a 4-dimensional array is predicted, representing the probabilities of the four tire bending directions (up, down, left, right), which yields the character direction information of the tire tread.
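The classification head's final step can be sketched as a 4-way softmax over hypothetical logits (the logit values and label order here are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

DIRECTIONS = ("up", "down", "left", "right")

def predict_direction(logits):
    """4-way classification head output -> tread character direction."""
    probs = softmax(np.asarray(logits, dtype=float))
    return DIRECTIONS[int(probs.argmax())], probs

d, probs = predict_direction([2.0, 0.1, -1.0, 0.3])
```

The argmax over the 4-dimensional probability array gives the bending direction used later as an affine transformation parameter.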
S5: an affine transformation, parameterized by the candidate text block obtained in step S3 and the tread character direction obtained in step S4, converts the candidate text block into an upward-facing horizontal text block;
in a specific embodiment, in step S5, the affine transformation is specifically as follows:
the obtained character direction information of the tire tread and the angled candidate text block are input as affine transformation parameters into the ROI Rotate module, which applies an affine transformation to the text block to convert it into an upward-facing horizontal text block; the affine transformation proceeds in two steps:
(1) the affine transformation parameters are computed from the predicted coordinates of the text candidates obtained in step S3 and the tread character direction obtained in step S4;
(2) the affine transformation is applied separately to the shared feature map of each region, and horizontal, upright feature maps of the text regions are obtained.
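The coordinate part of this transformation can be sketched as a rotation by the negative of the predicted angle; a real ROI Rotate additionally performs bilinear sampling on the feature maps, which is omitted here, and the block width and angle are toy values:

```python
import numpy as np

def rotate_points(points, theta):
    """Rotate 2D points by -theta, mapping an angled text block back to horizontal."""
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T

theta = np.pi / 6  # predicted rotation of the candidate block
# Two corners of a text block of width 10 lying along direction theta.
corners = np.array([[0.0, 0.0],
                    [10 * np.cos(theta), 10 * np.sin(theta)]])
flat = rotate_points(corners, theta)
```

After the rotation the block's long edge lies along the x-axis, i.e. it has become a horizontal text block; the tread direction from step S4 then determines whether an additional 90/180/270-degree flip is needed so the text faces upward.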
S6: and inputting the horizontal text block into a text recognition network based on deep learning to perform DOT character recognition, so as to obtain final recognition information.
The text label is predicted from the region features output by the ROI Rotate module. The deep-learning-based text recognition network consists of VGG16 layers and BLSTM layers. Considering the length of the label sequences in the text regions, the input features of the LSTM are reduced only twice along the width axis, relative to the original image, through the shared convolutions.
A multi-task framework inevitably converges unevenly. The convergence of the identification method described in this embodiment is influenced by four stages, coarse DOT positioning, fine DOT positioning, tire bending direction detection, and DOT character recognition, so training must be performed through a total loss function to reduce error accumulation. This embodiment analyzes, through theoretical derivation and experimental comparison, the contribution of the local loss of each stage to convergence, which guides the composition of the total loss function and ensures effective convergence. The total loss function is defined as follows:
L_total = λ1·L_dot + λ2·L_detect + λ3·L_cls + λ4·L_rg
where L_dot denotes the coarse DOT positioning loss; L_detect denotes the fine DOT positioning loss; L_cls denotes the tire bending direction detection loss; L_rg denotes the DOT character recognition loss; and λ1, λ2, λ3, λ4 are the corresponding trade-off factors giving the contribution of the four losses to the total loss function.
Generally, when training the end-to-end tire DOT information identification method, the contributions of the local losses of the stages to the total loss function should be balanced. The nature of the training data and of the loss functions means that the magnitudes of the local losses can differ widely. If this difference in magnitude is not handled correctly, convergence during training may be biased toward one local loss function while convergence of the others is weakened or even ignored.
Theoretical derivation and experimental comparison show that the initial loss value of fine DOT positioning is at least two orders of magnitude smaller than the losses of the other three stages. To keep the four stage losses in relative balance and ensure consistent convergence, the principle for initially configuring the trade-off factors is that the contribution of the fine DOT positioning loss should be set two orders of magnitude greater than the other three. In practice, the factors λ1, λ3, and λ4 are fixed and only λ2 is adjusted, rather than adjusting all four simultaneously. The contributions of the four stages can be set to λ1 = 0.01, λ2 = 1, λ3 = 0.01, λ4 = 0.01.
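With the trade-off factors above, the total loss is a plain weighted sum; the stage-loss values in the example call are hypothetical:

```python
def total_loss(l_dot, l_detect, l_cls, l_rg,
               lambdas=(0.01, 1.0, 0.01, 0.01)):
    """Weighted sum of the four stage losses. lambda2 (fine positioning)
    is two orders of magnitude larger, balancing its smaller magnitude."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_dot + l2 * l_detect + l3 * l_cls + l4 * l_rg

# Hypothetical stage losses: fine positioning (0.05) is far smaller than the rest.
lt = total_loss(5.0, 0.05, 3.0, 4.0)
```

After weighting, each stage contributes a comparable amount (0.05, 0.05, 0.03, 0.04 here), which is the balancing goal described above.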
DOT information coarse positioning
In this embodiment, the loss function of the coarse DOT positioning network consists of two parts, a classification loss and a position regression loss:
L_dot = (1/N_s)·Σ_i L_s(p_i, p_i*) + (1/N_g)·Σ_i p_i*·L_g(t_i, t_i*)
where i denotes the anchor box index; p_i denotes the predicted probability of a positive sample (positive softmax probability); p_i* denotes the corresponding ground-truth probability; t_i denotes the predicted candidate box; t_i* denotes the ground-truth label box corresponding to a positive anchor; N_s and N_g denote the numbers of samples of the corresponding tasks; L_s denotes the classification loss function, here a softmax loss; and L_g denotes the position regression loss function, here a smooth L1 loss.
The softmax loss function is computed and used to train the network that classifies anchors as positive or negative.
The smooth L1 loss function is computed and used to train the bounding box regression network.
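A minimal sketch of the smooth L1 loss used for box regression (the summed reduction and array inputs are illustrative assumptions):

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 (Huber with delta = 1): quadratic for |d| < 1, linear beyond.
    pred/target hold box regression targets; summing over elements is an
    illustrative reduction choice."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_elem = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)
    return float(per_elem.sum())
```

The quadratic region keeps gradients small near zero error, while the linear region limits the influence of outlier boxes.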
DOT information fine positioning
The DOT information occupies only a small part of the tire image. The Dice loss was proposed for exactly this problem of a too-small foreground proportion, and it handles class imbalance better:
L_dcls = 1 − 2|X ∩ Y| / (|X| + |Y|)
where |X ∩ Y| denotes the intersection of the sets X and Y, and |X| and |Y| denote their numbers of elements; for the segmentation task, X and Y denote the Ground Truth mask and the predicted mask respectively; L_dcls denotes the classification loss.
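The Dice loss can be sketched as follows for soft probability masks (the smoothing term `eps` is a common implementation detail assumed here, not stated in the document):

```python
import numpy as np

def dice_loss(pred_mask, gt_mask, eps=1.0):
    """Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), with a smoothing term
    eps (an assumed implementation detail) to avoid division by zero."""
    pred_mask = np.asarray(pred_mask, dtype=float)
    gt_mask = np.asarray(gt_mask, dtype=float)
    inter = (pred_mask * gt_mask).sum()
    return 1.0 - (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)
```

Because both numerator and denominator scale with the foreground size, a tiny DOT region is not swamped by the large background the way a plain pixel-wise loss would be.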
Coordinate regression uses IOU Loss + cosine angle difference Loss:
IoU Loss: L_iou = −ln( |R̂ ∩ R*| / |R̂ ∪ R*| )
where R̂ denotes the predicted geometry and R* is its corresponding label box.
Cosine angle-difference loss: L_θ = 1 − cos(θ̂ − θ*)
where θ̂ is the predicted rotation angle and θ* denotes the annotated value.
L_dreg denotes the total coordinate regression loss, using the IoU loss plus the cosine angle-difference loss:
L_dreg = L_iou + L_θ
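Assuming the common EAST-style definitions (L_iou as the negative log of the IoU, and the cosine form of the angle loss), L_dreg can be sketched as:

```python
import math

def iou_loss(inter_area, pred_area, gt_area):
    """EAST-style IoU loss, L_iou = -ln(IoU); the -ln form is an
    assumption, the document only names 'IOU Loss'."""
    union = pred_area + gt_area - inter_area
    return -math.log(inter_area / union)

def angle_loss(theta_pred, theta_gt):
    """Cosine angle-difference loss: 1 - cos(theta_pred - theta_gt)."""
    return 1.0 - math.cos(theta_pred - theta_gt)

def l_dreg(inter_area, pred_area, gt_area, theta_pred, theta_gt):
    """Total coordinate-regression loss L_dreg = L_iou + L_theta."""
    return iou_loss(inter_area, pred_area, gt_area) + angle_loss(theta_pred, theta_gt)
```

A perfect prediction (full overlap, identical angle) yields zero loss, and both terms grow smoothly as overlap shrinks or the predicted rotation drifts.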
So the loss function for DOT information fine positioning is:
L_detect = L_dcls + L_dreg.
Loss function L_cls of the tire bending direction detection stage:
S_j = e^{a_j} / Σ_k e^{a_k},  L_cls = −Σ_j y_j · ln(S_j)
where a_j denotes the j-th value of the input vector T; a_k denotes the k-th value of the input vector T; y_j denotes the true label; T denotes the number of categories; S_j is the j-th value of the vector S, indicating the probability that the sample belongs to the j-th class.
DOT character recognition
In the DOT character recognition stage, the connectionist temporal classification (CTC) loss function is used, a well-established loss function for text recognition with deep learning:
L_rg = −Σ_{l∈ψ} log p(l | y)
where ψ denotes the set of ground truth sequences; y denotes the estimated sequence and l the true label sequence.
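CTC computes p(l | y) by marginalising over all frame-wise alignments; the many-to-one mapping it relies on (merge consecutive repeats, then drop blanks) can be illustrated with a greedy decoder. This is a simplified sketch of the mapping, not the loss computation itself:

```python
def ctc_collapse(frame_labels, blank=0):
    """The many-to-one mapping CTC marginalises over: merge consecutive
    repeated labels, then remove blanks. This is the decoding rule,
    not the CTC loss itself."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

This alignment-free formulation is why the recognizer needs no prior character segmentation, even for tightly spaced embossed DOT characters.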
The identification method provided by this embodiment has the following advantages and beneficial effects:
1. Since the tire surface carries many characters, this embodiment focuses only on the DOT information; the other characters would disturb the detection results. Exploiting the fact that tire DOT information starts with the characters DOT, the DOT characters are found through the RPN network and the DOT information is roughly positioned. Meanwhile, a mask image marking the rough DOT position is generated and multiplied by the second feature map output by the feature extraction network; the product serves as the input of DOT fine positioning. This eliminates the interference of character information outside the DOT region, extracts the region of interest, and further improves detection accuracy.
2. Existing methods for detecting tire DOT information stop after detecting the tire's text with a localization network and then recognize it with a separate text recognition network, the two being trained independently. Such a stepwise approach involves many cumbersome steps and accumulates errors, resulting in poor tire DOT information identification performance. The tire DOT information identification method provided by this embodiment seamlessly combines six parts, namely a feature extraction network, DOT information rough positioning, DOT information fine positioning, tire bending direction detection, ROI Rotate, and DOT character recognition, into one neural network framework that completes the different tasks. The stages of the framework are not independent; the framework is trained through one total loss function, so errors do not accumulate.
3. A deep-learning-based multi-task framework inevitably suffers from inconsistent convergence. This embodiment analyzes, through theoretical derivation and experimental comparison, the contribution of each part's local loss to convergence; this guides the composition of the proposed total loss function of the multi-task framework and ensures effective convergence.
4. Compared with traditional text localization algorithms, detecting the DOT information text with a feature extraction network extracts rich features and improves text detection accuracy. Traditional text recognition methods must first segment the characters and then recognize them; when the character spacing is small, segmentation works poorly and recognition suffers. By introducing a deep-learning-based text recognition network, the tire's embossed DOT characters can be recognized accurately even when the spacing between them is very small or the characters are connected.
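The mask multiplication described in advantage 1 above can be sketched as follows (the box format and the channels-last layout are assumptions for illustration):

```python
import numpy as np

def apply_region_mask(feature_map, box):
    """Zero out features outside the rough DOT region before fine positioning.
    feature_map is (H, W, C) channels-last and box = (y0, y1, x0, x1) in
    feature-map coordinates; both conventions are illustrative assumptions."""
    y0, y1, x0, x1 = box
    mask = np.zeros(feature_map.shape[:2], dtype=feature_map.dtype)
    mask[y0:y1, x0:x1] = 1.0          # 1 inside the rough DOT region
    return feature_map * mask[..., None]  # broadcast over channels
```

Everything outside the rough region is zeroed, so the fine-positioning branch never sees the competing sidewall characters.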
Example 2
Based on the tire DOT information identification method based on end-to-end deep learning described in Embodiment 1, this embodiment further provides a tire DOT information identification device, comprising a feature extraction network module, a DOT information rough positioning module, a DOT information fine positioning module, a tire bending direction detection module, an ROI Rotate module and a DOT character recognition module;
the feature extraction network module is used for extracting features of the tire image acquired with the tire DOT information to obtain first feature maps output in N stages, and meanwhile, performing feature fusion on the first feature maps output in the N stages to obtain a second feature map;
the DOT information rough positioning module is used for carrying out DOT information rough positioning on the second characteristic diagram and detecting whether three characters of DOT and position information thereof exist or not so as to obtain a region diagram;
the DOT information fine positioning module is used for generating a mask image for the area image and multiplying the mask image by the second feature image to obtain a third feature image for DOT information fine positioning to obtain DOT information text probability and position information so as to position candidate text blocks with angles;
the tire bending direction detection module is used for detecting the tire bending direction of the first characteristic diagram output at the last stage to acquire character direction information of the tire tread;
the ROI Rotate module is used for carrying out affine transformation on the character direction information of the candidate text block and the tire tread and converting the character direction information into a horizontal text block with an upward direction;
and the DOT character recognition module is used for inputting the horizontal text block into the deep-learning-based text recognition network to perform DOT character recognition.
Example 3
A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following method steps:
s1: carrying out feature extraction on the tire image acquired with the tire DOT information to respectively obtain first feature maps output in N stages, and simultaneously carrying out feature fusion on the first feature maps output in the N stages to obtain a second feature map;
s2: performing DOT information rough positioning on the fused second feature map, and detecting whether three characters of DOT and position information thereof exist or not so as to obtain a regional map;
s3: generating a mask image from the area image, multiplying the mask image by the second feature image, and performing DOT information fine positioning on a third feature image obtained after multiplication to obtain DOT information text probability and position information so as to position a candidate text block with an angle;
s4: detecting the bending direction of the tire on the first characteristic diagram output in the last stage to acquire character direction information of the tire tread of the tire;
s5: affine transformation is carried out on the candidate text blocks obtained in the step S3 and the character direction information of the tire tread obtained in the step S4, and the candidate text blocks are converted into horizontal text blocks with upward directions;
s6: and inputting the horizontal text block into a text recognition network based on deep learning to perform DOT character recognition, so as to obtain final recognition information.
Example 4
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the following method steps:
s1: carrying out feature extraction on the tire image acquired with the tire DOT information to respectively obtain first feature maps output in N stages, and simultaneously carrying out feature fusion on the first feature maps output in the N stages to obtain a second feature map;
s2: performing DOT information rough positioning on the fused second feature map, and detecting whether three characters of DOT and position information thereof exist or not so as to obtain a regional map;
s3: generating a mask image from the area image, multiplying the mask image by the second feature image, and performing DOT information fine positioning on a third feature image obtained after multiplication to obtain DOT information text probability and position information so as to position a candidate text block with an angle;
s4: detecting the bending direction of the tire on the first characteristic diagram output in the last stage to acquire character direction information of the tire tread of the tire;
s5: affine transformation is carried out on the candidate text blocks obtained in the step S3 and the character direction information of the tire tread obtained in the step S4, and the candidate text blocks are converted into horizontal text blocks with upward directions;
s6: and inputting the horizontal text block into a text recognition network based on deep learning to perform DOT character recognition, so as to obtain final recognition information.
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk, SSD), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A tire DOT information identification method based on end-to-end deep learning is characterized in that: the method comprises the following steps:
s1: carrying out feature extraction on the tire image acquired with the tire DOT information to respectively obtain first feature maps output in N stages, and simultaneously carrying out feature fusion on the first feature maps output in the N stages to obtain a second feature map;
s2: performing DOT information rough positioning on the fused second feature map, and detecting whether three characters of DOT and position information thereof exist or not so as to obtain a regional map;
s3: generating a mask image from the area image, multiplying the mask image by the second feature image, and performing DOT information fine positioning on a third feature image obtained after multiplication to obtain DOT information text probability and position information so as to position a candidate text block with an angle;
s4: detecting the bending direction of the tire on the first characteristic diagram output in the last stage to acquire character direction information of the tire tread of the tire;
s5: affine transformation is carried out on the candidate text blocks obtained in the step S3 and the character direction information of the tire tread obtained in the step S4, and the candidate text blocks are converted into horizontal text blocks with upward directions;
s6: and inputting the horizontal text block into a text recognition network based on deep learning to perform DOT character recognition, so as to obtain final recognition information.
2. The method of identifying tire DOT information based on end-to-end deep learning of claim 1, wherein: step S1, extracting features by using a feature extraction network, wherein the feature extraction network comprises a ResNet-50 network and a feature pyramid network FPN;
the ResNet-50 network first performs feature extraction on the tire image acquired with the tire DOT information, obtaining first feature maps C1, C2, C3 and C4 output in 4 stages, whose resolutions are 1/4, 1/8, 1/16 and 1/32 of the input tire image respectively;
inputting the first feature maps C1, C2, C3 and C4 into a feature pyramid network FPN for feature fusion, wherein the feature fusion is used for connecting low-level feature mapping and high-level semantic feature mapping; and respectively outputting second characteristic maps P1, P2, P3 and P4.
3. The method of identifying tire DOT information based on end-to-end deep learning of claim 2, wherein: the rough location of the DOT information is as follows:
s201: inputting the fused second feature P1 into a spatial attention module to obtain an output feature map A1;
s202: prediction is carried out through a regional suggestion network RPN, softmax classification and position regression are carried out on the feature map A1, and a regional map containing the DOT three characters and position information thereof is obtained.
4. A tire DOT information identification method based on end-to-end deep learning according to claim 3, characterized in that: the detailed steps of the DOT information fine positioning are as follows:
s301: establishing a text detection branch for a third feature map, wherein the size of the third feature map is 1/4 the size of the collected tire image;
s302: the third feature map consists of six channels, the first channel calculates the probability that each pixel is a positive sample, the middle four channels calculate the distance between each positive sample pixel point and the upper, right, lower and left boundaries of the text box, and the last channel predicts the direction of the related boundary box;
s303: thereby generating DOT information text probabilities and location information and locating candidate text blocks for the tire image.
5. The method of identifying tire DOT information based on end-to-end deep learning of claim 4, wherein: the detection of the bending direction of the tire is as follows: the last-stage feature map C4 output by the ResNet-50 network is taken out and passed through two fully connected layers for classification, predicting a 4-dimensional array that represents the probabilities of the four bending directions of the tire, namely up, down, left and right, thereby obtaining the character direction information of the tire tread.
6. The method of identifying DOT information in a tire based on end-to-end deep learning of claim 5, wherein: in step S5, the affine transformation is specifically as follows:
inputting the obtained character direction information of the tire tread and the obtained candidate text block with an angle as affine transformation parameters into the ROI Rotate module, and performing an affine transformation on the text block to convert it into a horizontal text block with an upward direction; the affine transformation process is divided into two steps:
(1) calculating affine transformation parameters through the predicted coordinates of the text candidates obtained in step S3 and the character direction information of the tire tread obtained in step S4;
(2) an affine transformation is applied to the shared feature maps of each region separately, and a normal-case horizontal feature map of the text region is obtained.
7. The method of identifying tire DOT information based on end-to-end deep learning of claim 6, wherein: in order to reduce the influence of local loss of each stage on convergence, a total loss function is adopted for training to ensure effective convergence; wherein the total loss function is defined as follows:
L_total = λ1·L_dot + λ2·L_detect + λ3·L_cls + λ4·L_rg
where L_dot denotes the loss function of the DOT information rough positioning stage; L_detect denotes the loss function of the DOT information fine positioning stage; L_cls denotes the loss function of the tire bending direction detection stage; L_rg denotes the loss function of the DOT character recognition stage; λ1, λ2, λ3, λ4 are the corresponding trade-off factors representing the contributions of the four losses to the total loss function.
8. The method of identifying tire DOT information based on end-to-end deep learning of claim 7, wherein: the loss function L_dot of the DOT information rough positioning stage consists of a classification loss function and a position regression loss function, with the formula:
L_dot = (1/N_s) Σ_i L_s(p_i, p_i*) + (1/N_g) Σ_i p_i* · L_g(t_i, t_i*)
where i denotes the anchor box index; p_i denotes the probability that anchor i is predicted as a positive sample; p_i* denotes the corresponding ground-truth label; t_i denotes the predicted candidate box; t_i* denotes the ground-truth label box corresponding to the positive anchor; N_s and N_g denote the numbers of samples of the corresponding tasks; L_s denotes the classification loss function; L_g denotes the position regression loss function.
9. The method of identifying tire DOT information based on end-to-end deep learning of claim 7, wherein: the loss function L_detect of the DOT information fine positioning stage is:
L_detect = L_dcls + L_dreg
where L_dcls denotes the classification loss, L_dcls = 1 − 2|X ∩ Y| / (|X| + |Y|), in which |X ∩ Y| denotes the intersection of the sets X and Y and |X| and |Y| denote their numbers of elements; for the segmentation task, X and Y denote the Ground Truth mask and the predicted mask respectively;
L_dreg denotes the total coordinate regression loss, using the IoU loss plus the cosine angle-difference loss:
L_dreg = L_iou + L_θ
IoU Loss: L_iou = −ln( |R̂ ∩ R*| / |R̂ ∪ R*| )
where R̂ denotes the predicted geometry and R* is its corresponding label box;
cosine angle-difference loss: L_θ = 1 − cos(θ̂ − θ*).
10. The method of identifying tire DOT information based on end-to-end deep learning of claim 7, wherein: the loss function L_cls of the tire bending direction detection stage has the formula:
S_j = e^{a_j} / Σ_k e^{a_k},  L_cls = −Σ_j y_j · ln(S_j)
where a_j denotes the j-th value of the input vector T; a_k denotes the k-th value of the input vector T; y_j denotes the true label; T denotes the number of categories; S_j is the j-th value of the vector S, indicating the probability that the sample belongs to the j-th class;
the loss function L_rg of the DOT character recognition stage has the formula:
L_rg = −Σ_{l∈ψ} log p(l | y)
where ψ denotes the set of ground truth sequences; y denotes the estimated sequence and l the true label sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111370406.5A CN113989604B (en) | 2021-11-18 | 2021-11-18 | Tire DOT information identification method based on end-to-end deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989604A true CN113989604A (en) | 2022-01-28 |
CN113989604B CN113989604B (en) | 2024-06-25 |
Family
ID=79749384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111370406.5A Active CN113989604B (en) | 2021-11-18 | 2021-11-18 | Tire DOT information identification method based on end-to-end deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989604B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116824591A (en) * | 2023-04-19 | 2023-09-29 | 钛玛科(北京)工业科技有限公司 | Identification method for tire sidewall characters |
CN116894937A (en) * | 2023-06-25 | 2023-10-17 | 德联易控科技(北京)有限公司 | Method, system and electronic equipment for acquiring parameters of wheel aligner |
CN117173711A (en) * | 2023-08-18 | 2023-12-05 | 安徽工程大学产业创新技术研究有限公司 | Automobile tire parameter identification and detection method and service platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010022185A1 (en) * | 2008-08-19 | 2010-02-25 | Digimarc Corporation | Methods and systems for content processing |
CN108960245A (en) * | 2018-07-13 | 2018-12-07 | 广东工业大学 | The detection of tire-mold character and recognition methods, device, equipment and storage medium |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN113516123A (en) * | 2021-05-14 | 2021-10-19 | 南京工程学院 | Detection and identification method for tire embossed characters |
2021-11-18: CN202111370406.5A patent/CN113989604B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010022185A1 (en) * | 2008-08-19 | 2010-02-25 | Digimarc Corporation | Methods and systems for content processing |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN108960245A (en) * | 2018-07-13 | 2018-12-07 | 广东工业大学 | The detection of tire-mold character and recognition methods, device, equipment and storage medium |
CN113516123A (en) * | 2021-05-14 | 2021-10-19 | 南京工程学院 | Detection and identification method for tire embossed characters |
Non-Patent Citations (1)
Title |
---|
CHEN Yuchao; CAI Nian; LIU Gen; ZHANG Fu; LI Yuanshi; WANG Han; CHEN Xindu: "A machine-vision-based method for detecting characters on tire mold surfaces", Forging & Stamping Technology, no. 012, 31 December 2016 (2016-12-31) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116824591A (en) * | 2023-04-19 | 2023-09-29 | 钛玛科(北京)工业科技有限公司 | Identification method for tire sidewall characters |
CN116824591B (en) * | 2023-04-19 | 2023-12-05 | 钛玛科(北京)工业科技有限公司 | Identification method for tire sidewall characters |
CN116894937A (en) * | 2023-06-25 | 2023-10-17 | 德联易控科技(北京)有限公司 | Method, system and electronic equipment for acquiring parameters of wheel aligner |
CN116894937B (en) * | 2023-06-25 | 2024-02-06 | 德联易控科技(北京)有限公司 | Method, system and electronic equipment for acquiring parameters of wheel aligner |
CN117173711A (en) * | 2023-08-18 | 2023-12-05 | 安徽工程大学产业创新技术研究有限公司 | Automobile tire parameter identification and detection method and service platform |
Also Published As
Publication number | Publication date |
---|---|
CN113989604B (en) | 2024-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110738207B (en) | Character detection method for fusing character area edge information in character image | |
CN108961235B (en) | Defective insulator identification method based on YOLOv3 network and particle filter algorithm | |
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
CN113160192B (en) | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background | |
CN113989604A (en) | Tire DOT information identification method based on end-to-end deep learning | |
CN111950453A (en) | Optional-shape text recognition method based on selective attention mechanism | |
CN110751154B (en) | Complex environment multi-shape text detection method based on pixel-level segmentation | |
CN113435319B (en) | Classification method combining multi-target tracking and pedestrian angle recognition | |
CN112541491A (en) | End-to-end text detection and identification method based on image character region perception | |
CN110751619A (en) | Insulator defect detection method | |
CN106407978B (en) | Method for detecting salient object in unconstrained video by combining similarity degree | |
CN117437647B (en) | Oracle character detection method based on deep learning and computer vision | |
CN112364687A (en) | Improved Faster R-CNN gas station electrostatic sign identification method and system | |
CN113780040A (en) | Lip key point positioning method and device, storage medium and electronic equipment | |
CN111612802A (en) | Re-optimization training method based on existing image semantic segmentation model and application | |
CN111738264A (en) | Intelligent acquisition method for data of display panel of machine room equipment | |
CN108985216B (en) | Pedestrian head detection method based on multivariate logistic regression feature fusion | |
CN116740572A (en) | Marine vessel target detection method and system based on improved YOLOX | |
CN110889418A (en) | Gas contour identification method | |
CN116363655A (en) | Financial bill identification method and system | |
CN115953744A (en) | Vehicle identification tracking method based on deep learning | |
CN115049833A (en) | Point cloud component segmentation method based on local feature enhancement and similarity measurement | |
CN113850167A (en) | Commodity identification method and system based on edge calculation and machine deep learning | |
CN113837015A (en) | Face detection method and system based on feature pyramid | |
CN111640071A (en) | Method for obtaining panoramic foreground target based on convolutional neural network frame difference repairing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |