CN113191358B - Metal part surface text detection method and system - Google Patents

Metal part surface text detection method and system

Info

Publication number
CN113191358B
CN113191358B (application CN202110603294.7A)
Authority
CN
China
Prior art keywords
text
image
text box
corrected
saliency map
Prior art date
Legal status
Active
Application number
CN202110603294.7A
Other languages
Chinese (zh)
Other versions
CN113191358A (en)
Inventor
谷朝臣
官同坤
王臻
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110603294.7A
Publication of CN113191358A
Application granted
Publication of CN113191358B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration by the use of histogram techniques
    • G06T5/73
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention provides a method and system for detecting text on the surface of a metal part, comprising the following steps. A preprocessing step: identifying a metal-surface character image and performing image enhancement on it to obtain a preprocessed image. A foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map. A multi-scale correction step: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and re-predicting the selected text boxes through a correction feature network to obtain corrected text boxes. A post-processing step: computing an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions. The method addresses text detection against the complex backgrounds caused by metal properties and industrial environments, automatically segments character images of metal parts, outputs high-precision text bounding boxes, and improves detection accuracy.

Description

Metal part surface text detection method and system
Technical Field
The invention relates to the technical field of text detection, in particular to a method and a system for detecting a text on the surface of a metal part.
Background
As a key link of the information era, text information is applied in network electronic information, text printing, traffic signs, product trademarks and the like, and plays an increasingly important role; research on Optical Character Recognition (OCR) is therefore important in fields such as intelligent automation, information processing and AI. OCR applications incubated in enterprise resource planning business scenarios have received great attention, for example gesture recognition, package print recognition and metal surface character recognition. Among these, tracking of metal parts is the most challenging in many industrial scenarios.
Direct part marking is the main means of identifying metal parts: the relevant part information is printed directly onto the product during manufacturing, mainly by laser engraving, pin marking or inkjet marking. Applying OCR to the character marks on metal part surfaces allows the part model, production information, manufacturer and other information to be identified quickly on all kinds of machining lines, prevents errors caused by operator recognition fatigue, and improves production efficiency.
Existing text detection methods mainly study the influence of natural-scene complexity. However, character datasets of metal part surfaces are difficult to collect, and text detection on metal parts suffers from strong specular reflection, large variation in metal texture, inconsistent character arrangement, poor foreground-background contrast and complex metal-texture backgrounds. As a result, text detection boxes are not localized accurately enough, and character recognition for metal part tracking is difficult.
Patent document CN110222680A (application number CN201910416098.1) discloses a text detection method for the outer packaging of municipal solid waste articles: collect an image dataset of the outer packaging and label the text region of each image; from the text-region labels, generate a text score feature map and a multi-channel position feature map for each labelled image to form its training label; split the images in the dataset into a training set and a test set at a ratio of 9:1; construct and train a fully convolutional neural network model; use the trained model to obtain predicted text regions for an image to be detected; apply threshold screening; and apply non-maximum suppression to obtain the final text region detection result.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method and a system for detecting a text on the surface of a metal part.
The invention provides a metal part surface text detection method, comprising the following steps:
a preprocessing step: identifying a metal-surface character image and performing image enhancement on it to obtain a preprocessed image;
a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction step: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and re-predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing step: computing an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.
Preferably, the preprocessing step comprises:
an image enhancement step: applying adaptive histogram equalization to the RGB image to equalize and enhance the local contrast of the metal-surface character image, while sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, obtaining the preprocessed image.
Preferably, the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal-surface character image into a multi-level convolutional network, using a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the foreground-background difference in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing step: comparing the saliency map against a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and distinguishability are below a preset threshold, so as to obtain more foreground text features.
Preferably, the multi-scale correction step comprises:
a polygon selection step: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale proposal text boxes;
a position correction step: encoding the proposal text boxes into a fixed shape using an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain the corrected text boxes.
Preferably, the post-processing step comprises:
a re-scoring step: computing an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: applying an NMS method to filter duplicate text boxes and obtain the final text box positions.
The invention also provides a metal part surface text detection system, comprising:
a preprocessing module: identifying a metal-surface character image and performing image enhancement on it to obtain a preprocessed image;
a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction module: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and re-predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing module: computing an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.
Preferably, the preprocessing module comprises:
an image enhancement module: applying adaptive histogram equalization to the RGB image to equalize and enhance the local contrast of the metal-surface character image, while sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, obtaining the preprocessed image.
Preferably, the foreground feature focusing module comprises:
a semantic segmentation module: feeding the metal-surface character image into a multi-level convolutional network, using a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the foreground-background difference in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing module: comparing the saliency map against a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and distinguishability are below a preset threshold, so as to obtain more foreground text features.
Preferably, the multi-scale correction module comprises:
a polygon selection module: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale proposal text boxes;
a position correction module: encoding the proposal text boxes into a fixed shape using an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain the corrected text boxes.
Preferably, the post-processing module comprises:
a re-scoring module: computing an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression module: applying an NMS method to filter duplicate text boxes and obtain the final text box positions.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a metal part surface text detection method based on refined position and classification features. To address the low contrast, strong reflection, uneven characters and complex textures of metal surfaces, it enhances the character contrast of the metal surface through adaptive histogram equalization and image sharpening, and designs a foreground-focused semantic segmentation method that highlights the text features of character regions;
(2) To address inaccurate text localization on metal parts, the invention provides a fast and effective polygon selection algorithm that filters out background boxes and supplies more accurate foreground boxes to the correction network for regression, improving localization;
(3) The invention provides a re-scoring mechanism that incorporates the instance scores of the predicted position boxes, which not only yields accurately localized detection boxes but also improves the overall (F-score) index of text detection.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of text detection on a surface of a metal part.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will aid those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any manner. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment is as follows:
the method adopts a self-adaptive histogram equalization and image sharpening method to preprocess the RGB image; aiming at the characteristics of low contrast, strong light reflection, character corrosion and the like of the surface of the metal part, respectively adopting attention mechanism focusing text characteristics of high-level characteristics and low-level characteristics; then, more text region features with low contrast and difficult distinguishing are detected by adopting a foreground-based segmentation loss function, so that the segmentation effect is further improved; aiming at the problem that the prediction box of the algorithm is prone to have inaccurate positioning in the text edge area, selecting a better prediction box according to a saliency map obtained by segmentation, and then carrying out secondary correction on a suggestion box; calculating the example score of the correction frame based on the position and the score of the correction frame, and screening all the correction frames by adopting a re-scoring mechanism; and finally, obtaining a final prediction frame according to a non-maximum suppression algorithm, and establishing a text detection evaluation system by combining the indexes of calculating the accuracy, recall rate, comprehensive index, IOU and the like of the IOU from the prediction frame, thereby realizing the digital quantitative evaluation of the text detection algorithm.
The invention provides a metal part surface text detection method based on refined position and classification features, comprising:
a preprocessing step: identifying the metal-surface character image and enhancing it by preprocessing to obtain a high-quality preprocessed image;
a foreground feature focusing step: inputting the preprocessed metal-surface character image and highlighting the image features of the text region through a deep convolutional network to obtain a segmented saliency map;
a multi-scale correction step: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and feeding the selected text boxes into the correction feature network for re-evaluation;
a post-processing step: extracting instance scores for the boxes predicted by the correction network from the saliency map features, re-scoring the proposal boxes by combining the instance scores with the classification scores predicted by the correction feature network, and applying non-maximum suppression post-processing to obtain the final corrected text boxes.
Specifically, the preprocessing step comprises:
an image enhancement step: applying adaptive histogram equalization to the RGB image to equalize and enhance the local contrast of the metal part image, while sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, obtaining the preprocessed image.
Specifically, the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal part character image into a multi-level convolutional network, using a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the foreground-background difference in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing step: comparing the saliency map obtained in the semantic segmentation step against a mask map carrying label information, and setting segmentation thresholds for low-contrast, hard-to-identify regions in exchange for more foreground text features.
Specifically, the multi-scale correction step comprises:
a polygon selection step: using the binarization result of the semantically segmented saliency map as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale proposal text boxes;
a position correction step: encoding the proposal text boxes generated in the polygon selection step into a fixed shape using an RROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain the corrected text boxes.
Specifically, the post-processing step comprises:
a re-scoring step: computing an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: applying an NMS method to filter duplicate text boxes and obtain the final text box positions.
As shown in FIG. 1, the foreground-feature-focused segmentation method, the multi-scale position correction method and the text box re-scoring mechanism of the present invention process the enhanced metal part image to detect and localize the text boxes, through the following steps:
step 100, image preprocessing, including step 110 and step 120, wherein adaptive histogram equalization and image sharpening are adopted to enhance the image, improve the image quality and prepare for subsequent processing;
step 110: enhancing the contrast of the metal part image, and adopting different enhancement schemes for different parts of the image based on self-adaptive histogram equalization to enhance the contrast and simultaneously keep the image details;
step 120: realizing image sharpening based on a Laplace operator, keeping high-frequency information of the image, and highlighting detail characteristics of a text region, wherein the Laplace operator is as follows:
(The Laplacian operator is rendered only as an image in the published text.)
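A minimal sketch of the preprocessing in steps 110-120, assuming OpenCV; the CLAHE tile size, clip limit and Laplacian kernel size below are illustrative values, not parameters stated in the patent.

```python
import cv2
import numpy as np

def preprocess(bgr_image: np.ndarray) -> np.ndarray:
    """Adaptive histogram equalization followed by Laplacian sharpening (illustrative parameters)."""
    # CLAHE on the luminance channel approximates per-region contrast equalization of the RGB image.
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # Laplacian sharpening: subtracting the Laplacian response boosts high-frequency stroke detail.
    lap = cv2.Laplacian(enhanced, cv2.CV_16S, ksize=3)
    sharpened = cv2.addWeighted(enhanced.astype(np.int16), 1.0, lap, -1.0, 0)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```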
step 200: foreground feature focusing based segmentation comprises steps 210, 220 and 230, an attention mechanism is set based on pyramid network hierarchical levels of a ResNet-FPN framework, multi-level features are fused to enhance hierarchical feature relevance, high-level features and low-level features are combined to obtain a saliency map, and then a loss function focusing foreground is designed to promote segmentation of low-contrast text images.
Step 210: The low-level network carries more detail but has weak semantic perception, while the high-level network carries more semantic information but fewer details. A parallel structure is designed for the high-level network: the multi-resolution subnets of the high-level feature maps exchange information with one another, ensuring that every level of the multi-resolution subnet contains higher-resolution feature information and enriching its spatial features. Each channel of the low-level network is weighted to highlight text features and acquire more semantic information.
Specifically, let the pyramid network be divided into four levels P_2, P_3, P_4, P_5; Conv, DConv and UConv denote different convolution types, c is the channel number, j indexes a pixel on the feature map, and CA is the channel attention mechanism. P_2 is taken as the low-level network and P_3, P_4, P_5 as the high-level network. The low-level features are processed as follows:
L_map = Conv(P_2)
(The remaining low-level processing formulas are rendered only as images in the published text.)
The high-level network has multi-scale subnets, and more detailed features are obtained by the following processing:
(The high-level processing formulas are rendered only as images in the published text.)
The high-level and low-level feature maps are then fused to form a supervised saliency map S_map.
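The exact fusion formulas above are available only as images in the published text; the PyTorch sketch below only illustrates the general pattern of step 210, with a squeeze-excitation-style channel attention standing in for the CA block and a plain upsample-and-add standing in for the multi-resolution information exchange. It is not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel weighting (stand-in for the CA block of step 210)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x.mean(dim=(2, 3)))            # global average pool -> per-channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # re-weight channels to highlight text features

class SaliencyHead(nn.Module):
    """Fuse the low-level map P2 with high-level maps P3-P5 into a single saliency map S_map."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.low_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.ca = ChannelAttention(channels)
        self.out_conv = nn.Conv2d(channels, 1, 1)

    def forward(self, p2, p3, p4, p5):
        low = self.ca(self.low_conv(p2))
        size = low.shape[-2:]
        high = sum(F.interpolate(p, size=size, mode="bilinear", align_corners=False)
                   for p in (p3, p4, p5))                 # upsample high-level maps and merge
        return torch.sigmoid(self.out_conv(low + high))   # supervised saliency map S_map
```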
Step 220: Because characters on a metal surface have low contrast, removing background noise by segmentation tends to lose character features at text edges, which easily leads to incorrect regression positions; the foreground therefore needs more attention. First, Dice is introduced as a loss function L_dice to address the imbalance between positive and negative samples. The saliency map segmentation is then promoted according to two design principles: a) include as many text features as possible; b) minimize the number of false detections. Let the label be S_gt and let S_Diff be the difference between the label and S_map. The following loss terms are proposed.
L_a = (S_Diff ≥ 1/2) * (1 - F)
L_b = (-S_Diff ≥ 1/2) * F
(The Dice term and the combined loss formula are rendered only as images in the published text.)
γ denotes the balancing parameter, F denotes the binarized S_map, and Δ denotes an upper limit on the allowable false detection rate, traded for detecting more text features in low-contrast, hard-to-distinguish regions.
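The Dice term and the combined loss appear only as images in the published text; the sketch below implements the two stated penalty terms L_a and L_b together with a standard Dice loss, with γ and Δ treated as illustrative hyper-parameters and a hard threshold used for the binarized F (a soft surrogate would be needed for end-to-end training).

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Standard Dice loss on the saliency map (counters foreground/background imbalance)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def foreground_focus_loss(s_map, s_gt, gamma=0.5, delta=0.05, thresh=0.5):
    """Foreground-focused segmentation loss (sketch of step 220).

    s_map: predicted saliency map in [0, 1]; s_gt: binary ground-truth mask.
    L_a penalizes missed text pixels; L_b penalizes false detections beyond a budget delta.
    """
    f = (s_map >= thresh).float()      # binarized prediction F (hard threshold, sketch only)
    s_diff = s_gt - s_map              # S_Diff: label minus prediction
    l_a = ((s_diff >= 0.5).float() * (1.0 - f)).mean()                          # missed text pixels
    l_b = torch.clamp(((-s_diff >= 0.5).float() * f).mean() - delta, min=0.0)   # excess false detections
    return dice_loss(s_map, s_gt) + gamma * (l_a + l_b)
```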
Step 230: The complex background of metal parts easily interferes with the subsequent classification and regression networks, so the saliency map obtained by semantic segmentation is applied to each level's subnet to highlight text features in the classification and regression networks and suppress background noise, specifically as follows:
(The per-level saliency weighting formulas are rendered only as images in the published text.)
the classification network and regression network refer to two sub-networks of common structure and different parameters, which are composed of four 3 × 3 convolutional layers and one 3 × 5 convolutional layer. Each layer of the pyramid network is used for classification and regression tasks, respectively, after the saliency map is weighted. The method effectively inhibits background noise, retains more text features as far as possible, generates a prediction box with high matching precision, has the defect that the position of the prediction box cannot effectively cover the text, and is difficult to realize metal tracking subsequently, and the problem can be solved in secondary regression based on a saliency map subsequently.
Step 300 is a multi-scale, saliency-map-based correction step, comprising steps 310 and 320: the foreground-focused segmentation result is used, a polygon selection algorithm is applied, and a refinement model recalibrates the proposal boxes.
Step 310: Each level of the pyramid network generates prediction boxes through the classification and regression subnets; however, especially for oblique text, the predicted positions are not accurate enough. Therefore, based on the saliency map obtained by segmentation, k = 500 prediction boxes attached to the foreground are selected and fed into the refinement feature network, with the specific algorithm as follows:
(The polygon selection algorithm is rendered only as images in the published text.)
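The polygon selection algorithm itself appears only as images in the published text; the sketch below captures the behavior described in step 310 under simplifying assumptions: candidate boxes are treated as axis-aligned, each box is scored by its coverage of the binarized foreground, and the top k = 500 are kept (the minimum-coverage threshold is an assumption, not a stated parameter).

```python
import numpy as np

def select_foreground_boxes(boxes, scores, saliency, k=500, seg_thresh=0.5, min_cover=0.3):
    """Keep the k candidate boxes best supported by the segmented foreground (sketch of step 310).

    boxes: (M, 4) array of axis-aligned candidates [x1, y1, x2, y2] from all pyramid levels
    scores: (M,) classification scores; saliency: (H, W) map with values in [0, 1]
    """
    fg = (saliency >= seg_thresh).astype(np.float32)        # binarized saliency mask
    cover = np.zeros(len(boxes), dtype=np.float32)
    for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
        patch = fg[max(y1, 0):y2, max(x1, 0):x2]
        cover[i] = patch.mean() if patch.size else 0.0      # fraction of the box lying on foreground
    keep = np.where(cover >= min_cover)[0]                  # drop boxes lying mostly on background
    order = keep[np.argsort(-(scores[keep] * cover[keep]))][:k]
    return boxes[order], scores[order]
```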
step 320: and (3) coding the suggested text box generated in the polygon selection step into a fixed shape by applying an RROI pooling model, extracting the ROI regional characteristics from the ResNet-FPN frame, and sending the ROI regional characteristics into a classification and regression network to obtain the corrected text box position and score.
Step 400: A re-scoring mechanism, comprising steps 410 and 420, in which an instance score is computed for each corrected text box from the binarized saliency map and a new score is evaluated for each corrected text box in combination with its prediction score.
step 410: and calculating a score of a modified text box instance by combining the extracted saliency frames according to the positions and the scores of the text boxes generated by the modified network, and re-evaluating the score of each text box. Assume a classification score of S for the suggested text box c Example score of S I The specific process comprises the following steps:
P_V = {ρ_1, …, ρ_n}
(The instance-score and re-scoring formulas are rendered only as images in the published text.)
where μ is set to 1/4 and P_V is the set of pixels extracted from S_map for the proposal box.
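The instance-score and re-scoring formulas are available only as images; one hedged reading of step 410, averaging the binarized saliency values inside each corrected box as the instance score S_I and blending it with the classification score S_c using μ = 1/4, is sketched below. How exactly μ weights the two scores is an assumption.

```python
import numpy as np

def rescore_boxes(boxes, cls_scores, saliency_bin, mu=0.25):
    """Re-score corrected text boxes by combining classification and instance scores (sketch of step 410).

    boxes: (K, 4) corrected boxes [x1, y1, x2, y2]; cls_scores: (K,) classification scores S_c
    saliency_bin: (H, W) binarized saliency map; mu: blending weight (1/4 per the description)
    """
    final = np.empty(len(boxes), dtype=np.float32)
    for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
        pixels = saliency_bin[max(y1, 0):y2, max(x1, 0):x2]    # P_V: saliency pixels covered by the box
        s_i = pixels.mean() if pixels.size else 0.0            # instance score S_I
        final[i] = mu * s_i + (1.0 - mu) * cls_scores[i]       # blended score (illustrative weighting)
    return final
```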
Step 420: A non-maximum suppression (NMS) algorithm is applied to remove duplicate text boxes, and a text detection evaluation system is established from the IoU between the prediction boxes and the labels, using precision, recall and a comprehensive (F) score as indicators.
The precision calculation formula:
Precision = tp / (tp + fp)
The recall calculation formula:
Recall = tp / (tp + fn)
The comprehensive score calculation formula:
F = 2 · Precision · Recall / (Precision + Recall)
where tp, fp and fn denote the numbers of correctly detected text boxes, falsely detected text boxes and missed text boxes, respectively.
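A small sketch of the evaluation in step 420, assuming greedy one-to-one matching of predictions to labels at an IoU threshold of 0.5 (the threshold is not stated in the text).

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def evaluate(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Precision, recall and F-score from greedy IoU matching of prediction boxes to label boxes."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gt_boxes):
            iou = box_iou(p, g)
            if j not in matched and iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_thresh:   # hit: count as tp and consume the label
            matched.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp                               # predictions with no matching label
    fn = len(gt_boxes) - tp                                 # labels never matched
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_score
```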
The method solves the localization of text information on metal part surfaces in industrial environments, helping metal parts to be tracked and recorded on production lines. By recognizing and analysing the character marks on metal part surfaces, information such as the part model, size and manufacturer can be identified quickly on all kinds of machining lines, errors caused by operator recognition fatigue are prevented, and production efficiency is improved.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (2)

1. A method for detecting text on the surface of a metal part, comprising the following steps:
a preprocessing step: recognizing a metal-surface character image and performing image enhancement on it to obtain a preprocessed image;
a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction step: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and re-predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing step: computing an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions;
wherein the preprocessing step comprises:
an image enhancement step: applying adaptive histogram equalization to the RGB image to equalize and enhance the local contrast of the metal-surface character image, while sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, obtaining the preprocessed image;
the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal-surface character image into a multi-level convolutional network, using a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the foreground-background difference in the low-level features, and combining the high-level and low-level features to obtain the saliency map;
a foreground focusing step: comparing the saliency map against a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and distinguishability are below a preset threshold, so as to obtain more foreground text features;
the multi-scale correction step comprises:
a polygon selection step: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale proposal text boxes;
a position correction step: encoding the proposal text boxes into a fixed shape using an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain the corrected text boxes;
the post-processing step comprises:
a re-scoring step: computing an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: applying an NMS method to filter duplicate text boxes and obtain the final text box positions.
2. A metal part surface text detection system, comprising:
a preprocessing module: identifying a metal-surface character image and performing image enhancement on it to obtain a preprocessed image;
a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction module: filtering the background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and re-predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing module: computing an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions;
wherein the preprocessing module comprises:
an image enhancement module: applying adaptive histogram equalization to the RGB image to equalize and enhance the local contrast of the metal-surface character image, while sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, obtaining the preprocessed image;
the foreground feature focusing module comprises:
a semantic segmentation module: feeding the metal-surface character image into a multi-level convolutional network, using a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the foreground-background difference in the low-level features, and combining the high-level and low-level features to obtain the saliency map;
a foreground focusing module: comparing the saliency map against a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and distinguishability are below a preset threshold, so as to obtain more foreground text features;
the multi-scale correction module comprises:
a polygon selection module: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale proposal text boxes;
a position correction module: encoding the proposal text boxes into a fixed shape using an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain the corrected text boxes;
the post-processing module comprises:
a re-scoring module: computing an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression module: applying an NMS method to filter duplicate text boxes and obtain the final text box positions.
CN202110603294.7A 2021-05-31 2021-05-31 Metal part surface text detection method and system Active CN113191358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110603294.7A CN113191358B (en) 2021-05-31 2021-05-31 Metal part surface text detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110603294.7A CN113191358B (en) 2021-05-31 2021-05-31 Metal part surface text detection method and system

Publications (2)

Publication Number Publication Date
CN113191358A CN113191358A (en) 2021-07-30
CN113191358B true CN113191358B (en) 2023-01-24

Family

ID=76985941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110603294.7A Active CN113191358B (en) 2021-05-31 2021-05-31 Metal part surface text detection method and system

Country Status (1)

Country Link
CN (1) CN113191358B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471831B (en) * 2021-10-15 2024-01-23 中国矿业大学 Image saliency detection method based on text reinforcement learning
CN114120305B (en) * 2021-11-26 2023-07-07 北京百度网讯科技有限公司 Training method of text classification model, and text content recognition method and device
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020676A (en) * 2019-03-18 2019-07-16 华南理工大学 Method for text detection, system, equipment and medium based on more receptive field depth characteristics
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape
CN110895695B (en) * 2019-07-31 2023-02-24 上海海事大学 Deep learning network for character segmentation of text picture and segmentation method
CN111598861B (en) * 2020-05-13 2022-05-03 河北工业大学 Improved Faster R-CNN model-based non-uniform texture small defect detection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020676A (en) * 2019-03-18 2019-07-16 华南理工大学 Method for text detection, system, equipment and medium based on more receptive field depth characteristics
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium

Also Published As

Publication number Publication date
CN113191358A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN113191358B (en) Metal part surface text detection method and system
CN108596066B (en) Character recognition method based on convolutional neural network
CN110598609B (en) Weak supervision target detection method based on significance guidance
CN107729899B (en) License plate number recognition method and device
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN108334881B (en) License plate recognition method based on deep learning
Nandi et al. Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study
CN107832767A (en) Container number identification method, device and electronic equipment
Türkyılmaz et al. License plate recognition system using artificial neural networks
TW202239281A (en) Electronic substrate defect detection
CN107016394B (en) Cross fiber feature point matching method
CN112819840B (en) High-precision image instance segmentation method integrating deep learning and traditional processing
Pandya et al. Morphology based approach to recognize number plates in India
Verma et al. Automatic container code recognition via spatial transformer networks and connected component region proposals
CN113836850A (en) Model obtaining method, system and device, medium and product defect detection method
CN110956167A (en) Classification discrimination and strengthened separation method based on positioning characters
CN110796210A (en) Method and device for identifying label information
Kumar et al. D-PNR: deep license plate number recognition
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN114419006A (en) Method and system for removing watermark of gray level video characters changing along with background
CN111950556A (en) License plate printing quality detection method based on deep learning
Deb et al. A vehicle license plate detection method for intelligent transportation system applications
CN111402185A (en) Image detection method and device
CN112330659B (en) Geometric tolerance symbol segmentation method combining LSD (least squares) linear detection and connected domain marking method
CN111753842B (en) Method and device for detecting text region of bill

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant