CN114973228A - Metal part surface text recognition method and system based on contour feature enhancement - Google Patents
Metal part surface text recognition method and system based on contour feature enhancement
- Publication number
- CN114973228A (application number CN202210610169.3A)
- Authority
- CN
- China
- Prior art keywords
- character
- image
- sequence
- attention
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06V30/153—Segmentation of character regions using recognition of characters or words
- G06V30/168—Smoothing or thinning of the pattern; Skeletonisation
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/19107—Clustering techniques
- G06V30/19173—Classification techniques
- G06V30/1918—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a method and a system for recognizing text on the surface of a metal part based on contour feature enhancement, comprising the following steps: acquiring a metal surface character image and preprocessing it to obtain a preprocessed image; extracting multi-scale character features to obtain a multi-level feature map and extracting sequence features; fusing the multi-scale features from bottom to top by resolution, obtaining the character segmentation result from the mask image, and extracting attention information; constructing character outline labels from the attention information and the segmentation result to obtain the topological features of the characters; and fusing the optimized sequence features and topological features to obtain the final prediction result. The method solves the problem of recognizing unordered character strings in low-contrast metal images. It requires no extra character-level labels: the learned topological features of the metal part characters are injected into the recognition branch, enriching the semantic information for single-character recognition in unordered text and improving recognition accuracy.
Description
Technical Field
The invention relates to the technical field of text recognition in the field of artificial intelligence, in particular to a method and a system for recognizing a text on the surface of a metal part based on contour feature enhancement. In particular, the invention preferably relates to a character-driven topological feature enhanced metal part surface text recognition method.
Background
Text information is a key link in the information era. Against the background of intelligent manufacturing, research and analysis of the character marks on metal part surfaces makes text detection play an increasingly important role: on machining production lines, part types, production information, manufacturers, and other information can be identified quickly, errors caused by fatigue in manual identification are prevented, and both the assembly speed of industrial production lines and logistics efficiency in industrial scenarios are improved.
Existing research methods mainly come from Scene Text Recognition (STR), which aims to recognize regular and irregular text in multi-scene images and is widely applied to handwriting recognition, industrial print recognition, and visual understanding. Beyond the variable-length character sequences handled by document Optical Character Recognition (OCR), the STR task faces irregular structure (curved, oriented, and distorted text), low resolution, heavy occlusion, and uneven lighting. Inspired by natural language processing methods, researchers have developed attention-based encoder-decoder architectures to address these challenges. The encoder extracts visual features of the text image and outputs a semantic feature representation, and the decoder constructs an attention mechanism to capture time-dependent character features at different decoding steps. These attention mechanisms fall into one-dimensional (1D) and two-dimensional (2D) attention mechanisms.
Specifically, the encoder-decoder approach with a one-dimensional attention mechanism uses a long short-term memory (LSTM) unit to construct an implicit linguistic representation of the attention-focused feature sequence, and then sequentially outputs character classes. The one-dimensional attention mechanism generates different attention weights at different decoding steps, and thus multiplies the encoder output to indicate important items of the encoder output at the current decoding step, and finally the feature vector is used for character recognition. For regular text recognition, the attention-based approach performs sequence modeling on a one-dimensional space, where the input text images are encoded into a one-dimensional sequence via CNN, then uses the encoder BiLSTM to capture the remote dependencies of the input sequence, and then outputs their states sequentially to the decoder.
For irregular text recognition, the curved and distorted text images are compressed into one-dimensional sequence features, which inevitably bring irrelevant information, generate weak semantic representation and cause error accumulation. Therefore, constructing a two-dimensional attention mechanism is the mainstream solution. 2D attention-based methods capture the spatial features of the corresponding characters by embedding time-dependent sequences throughout the network, and various 2D attention feature representations are proposed in the decoding process.
Patent publication CN110378334A discloses a natural scene text recognition method based on a two-dimensional feature attention mechanism, comprising the following steps: 1, data acquisition: synthesize line-text pictures for training using open-source code, divide them into regular and irregular training sets by shape, and download really photographed text pictures from the internet as test data; 2, data processing: stretch the pictures so that each processed picture has size 32 × 104; 3, label preparation: train the recognition model with a supervised method, where each line-text picture has corresponding text content; 4, network training: train the recognition network with the training-set data; 5, network testing: input test data into the trained network to obtain the prediction result for the line-text picture.
With respect to the related art above, the inventors consider that 1D attention-based methods can recognize regular text quickly and robustly, but perform poorly on irregular and long text due to alignment drift in sequential attention. 2D attention-based approaches provide an effective solution for recognizing irregular text, but the extracted 2D attention features focus only coarsely on the spatial region of each character, with little consideration of the glyph's topological information (e.g., outline and pixel-level locations), resulting in degraded performance.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method and a system for recognizing the text on the surface of a metal part based on contour feature enhancement.
The invention provides a metal part surface text recognition method based on contour feature enhancement, comprising the following steps:
A preprocessing step: acquiring the metal surface character image, performing image enhancement through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
A feature encoding step: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-level feature map using an attention mechanism;
A sequence alignment step: fusing the multi-scale features from bottom to top by resolution using a convolutional network, obtaining the character segmentation result from the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
A character segmentation step: constructing character outline labels from the optimized attention information and the obtained segmentation result, then using the character outline labels to supervise the character predictions produced by feeding the multi-scale features into a temporal classification network, finally obtaining the topological features of the characters;
A feature fusion step: fusing the attention-optimized sequence features with the topological features and feeding them into parallel recognition branches to obtain the final prediction result.
Preferably, the preprocessing step comprises the following steps:
An image enhancement step: enhancing the local contrast of the metal part image via adaptive histogram equalization of the RGB image, while sharpening the image with a Laplacian operator to preserve high-frequency information and highlight text character details, obtaining the preprocessed image;
An unsupervised clustering step: obtaining the character mask result from the preprocessed image via an unsupervised clustering method.
Preferably, the feature encoding step comprises the following steps:
A feature extraction step: feeding the metal part character image into a deep convolutional network to obtain a multi-level feature map, and pooling the last-level feature map into a feature sequence whose height equals a preset value;
A sequence attention extraction step: designing an attention method and fusing the preset-height feature sequences obtained in the feature extraction step to obtain the feature sequences of the corresponding characters.
Preferably, the sequence alignment step comprises the following steps:
A text semantic segmentation step: fusing the multi-level feature maps obtained in the feature extraction step from bottom to top through a convolutional network, obtaining a character saliency map through a classification layer and a nonlinear activation function, then computing a loss between the mask map obtained in the unsupervised clustering step and the saliency map to optimize the network's segmentation result;
An attention sequence correction step: designing an optimization function to optimize the attention result obtained in the sequence attention extraction step according to the character saliency map obtained in the text semantic segmentation step.
Preferably, the character segmentation step comprises the following steps:
A character semantic segmentation step: constructing character-outline-level labels from the attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step, constructing a temporal classification network to predict character outline information, and computing a loss against the constructed character-outline-level labels to supervise and optimize the prediction result;
A topological feature generation step: cross-multiplying the character segmentation map predicted in the character semantic segmentation step with the multi-scale features obtained in the feature extraction step to obtain the topological features.
Preferably, the feature fusion step comprises the following steps:
A feature fusion step: extracting semantic information from the optimized sequence features and topological features through fusion;
A parallel recognition step: obtaining the final prediction result from the semantic information obtained in the feature fusion step through several fully connected layers.
The invention also provides a metal part surface text recognition system based on contour feature enhancement, comprising the following modules:
A preprocessing module: acquiring the metal surface character image, performing image enhancement through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
A feature encoding module: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-level feature map using an attention mechanism;
A sequence alignment module: fusing the multi-scale features from bottom to top by resolution using a convolutional network, obtaining the character segmentation result from the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
A character segmentation module: constructing character outline labels from the optimized attention information and the obtained segmentation result, then using the character outline labels to supervise the character predictions produced by feeding the multi-scale features into a temporal classification network, finally obtaining the topological features of the characters;
A feature fusion module: fusing the attention-optimized sequence features with the topological features and feeding them into parallel recognition branches to obtain the final prediction result.
Preferably, the preprocessing module comprises the following modules:
An image enhancement module: enhancing the local contrast of the metal part image via adaptive histogram equalization of the RGB image, while sharpening the image with a Laplacian operator to preserve high-frequency information and highlight text character details, obtaining the preprocessed image;
An unsupervised clustering module: obtaining the character mask result from the preprocessed image via an unsupervised clustering method.
Preferably, the feature encoding module comprises the following modules:
A feature extraction module: feeding the metal part character image into a deep convolutional network to obtain a multi-level feature map, and pooling the last-level feature map into a feature sequence whose height equals a preset value;
A sequence attention extraction module: designing an attention method and fusing the preset-height feature sequences obtained by the feature extraction module to obtain the feature sequences of the corresponding characters.
Preferably, the sequence alignment module comprises the following modules:
A text semantic segmentation module: fusing the multi-level feature maps obtained by the feature extraction module from bottom to top through a convolutional network, obtaining a character saliency map through a classification layer and a nonlinear activation function, then computing a loss between the mask map obtained by the unsupervised clustering module and the saliency map to optimize the network's segmentation result;
An attention sequence correction module: designing an optimization function to optimize the attention result obtained by the sequence attention extraction module according to the character saliency map obtained by the text semantic segmentation module.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention discloses a character-driven, topological-feature-enhanced metal part surface text recognition method aimed at the problems of alignment-drifting sequence attention and coarse character attention. It solves the problem of recognizing unordered character strings in low-contrast metal images; it requires no extra character-level labels, injects the learned topological features of the metal part characters into the recognition branch, enriches the semantic information for single-character recognition in unordered text, and improves recognition accuracy.
2. The method first obtains the character foreground with an unsupervised approach and then constructs the character outline information, which helps the recognition network discriminate between different characters, so that the prediction layer contains richer semantic information and recognition accuracy improves. Compared with a coarse attention mechanism, it carries richer semantic information and improves the effectiveness of the visual model.
3. Compared with sequence recognition schemes suited to ordered or context-dependent text, the parallel prediction scheme of the invention, designed for the unordered nature of text on metal part surfaces, handles unordered-text recognition well.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of text recognition of a surface of a metal part;
FIG. 2 is an exemplary diagram of some unaligned sequence attention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications obvious to those skilled in the art can be made without departing from the spirit of the invention; all such variations fall within the scope of the present invention.
The embodiment of the invention discloses a character-driven topological feature enhancement-based metal part surface text recognition method, which comprises the following steps of:
A preprocessing step: acquiring the metal surface character image, performing image enhancement through preprocessing to obtain a high-quality preprocessed image, and then binarizing the preprocessed image with an unsupervised clustering method to obtain a mask image.
The preprocessing step comprises the following steps:
An image enhancement step: enhancing the local contrast of the metal part image via adaptive histogram equalization of the RGB image, sharpening the image with a Laplacian operator, and preserving high-frequency information and highlighting text character details, obtaining the preprocessed image.
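The enhancement step above can be sketched in NumPy. This is a simplified stand-in: global histogram equalization replaces the adaptive (CLAHE-style) variant, and a fixed 4-neighbour Laplacian kernel is assumed rather than the patent's exact operator.

```python
import numpy as np

def hist_equalize(channel):
    # Global histogram equalization: a simplified stand-in for the
    # adaptive-histogram contrast step described in the text.
    hist, _ = np.histogram(channel.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[channel].astype(np.uint8)

def laplacian_sharpen(channel, weight=1.0):
    # Subtracting the 4-neighbour Laplacian boosts the high-frequency
    # character edges while preserving overall intensity.
    f = channel.astype(np.float32)
    lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
           np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)
    return np.clip(f - weight * lap, 0, 255).astype(np.uint8)

def preprocess(rgb):
    # Per-channel equalization followed by sharpening.
    out = np.empty_like(rgb)
    for c in range(rgb.shape[-1]):
        out[..., c] = laplacian_sharpen(hist_equalize(rgb[..., c]))
    return out
```

A production pipeline would use a windowed adaptive equalizer with clip limits; the sketch keeps only the contrast-then-sharpen ordering the text describes.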
An unsupervised clustering step: obtaining the character mask result (a mask image containing only foreground and background) from the preprocessed image via an unsupervised clustering method.
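A minimal two-cluster k-means over pixel intensities is one plausible reading of this unsupervised clustering step; choosing the brighter cluster as the character foreground is an assumption (for dark-on-light stampings the roles would be swapped).

```python
import numpy as np

def kmeans_mask(gray, iters=10):
    # Two-cluster k-means on intensities; the brighter cluster is
    # assumed to be the character foreground.
    x = gray.astype(np.float32).ravel()
    centers = np.array([x.min(), x.max()], dtype=np.float32)
    for _ in range(iters):
        assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = x[assign == k].mean()
    fg = int(centers.argmax())
    return (assign == fg).reshape(gray.shape).astype(np.uint8)
```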
A feature encoding step: inputting the preprocessed metal surface character image, extracting multi-scale character features (a multi-level feature map) through a ResNet31 network, and extracting sequence features from the last-level feature map using a 1D attention mechanism.
The feature encoding step includes the steps of:
A feature extraction step: feeding the metal part character image into a deep convolutional network to obtain a multi-level feature map, and pooling the last-level feature map into a feature sequence of height 1 to obtain its high-dimensional features.
Specifically, the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-level feature map is pooled into a feature sequence of height 1, width W, and channel count C. The detailed process is: 1. scale the input image to (32, 100, 3); 2. use ResNet31 to obtain feature maps at three levels with resolutions (16, 50, C = 128), (8, 25, C = 256), and (4, 12, C = 512), where C is the number of channels; 3. apply two convolution-pooling operations to the second-level feature map (8, 25, C = 256) to obtain the final sequence features (1, W = 26, C = 512).
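The height collapse in step 3 can be illustrated at the shape level. The real network uses learned convolution-pooling layers; this sketch simply average-pools over the height axis so each column becomes one time step of the sequence.

```python
import numpy as np

def collapse_to_sequence(feat):
    # feat: (H, W, C) map from the last backbone stage. Average-pooling
    # over the height axis turns each column into one time step of the
    # (1, W, C) sequence, mirroring the height-1 pooling described above.
    return feat.mean(axis=0, keepdims=True)

feat = np.zeros((8, 25, 256), dtype=np.float32)
seq = collapse_to_sequence(feat)  # shape (1, 25, 256)
```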
A sequence attention extraction step: designing a 1D attention method and fusing the height-1 feature sequences obtained in the feature extraction step to obtain the feature sequences of the corresponding characters (sequence features extracted with a 1D attention mechanism). In the concrete implementation, since metal surface text is non-contextual, a parallel attention extraction mechanism is adopted to obtain the sequence features; the pseudo code is as follows:
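The referenced pseudo code did not survive extraction. The following NumPy sketch shows one plausible parallel attention mechanism: the learned per-step queries and the key projection `Wk` are illustrative assumptions, not the patent's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention(seq_feat, queries, Wk):
    # seq_feat: (W, C) column features; queries: (L, C) learned positional
    # queries, one per decoding step. All L steps attend in a single matrix
    # product -- no recurrence -- which suits unordered metal-surface text.
    keys = seq_feat @ Wk                        # (W, C) projected keys
    attn = softmax(queries @ keys.T, axis=-1)   # (L, W) weights ("Attn")
    glimpses = attn @ seq_feat                  # (L, C) per-character features
    return attn, glimpses
```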
A sequence alignment step: fusing the multi-scale features from bottom to top by resolution with a convolutional network and obtaining the character segmentation result from the mask image; the segmentation result (a character saliency map containing only foreground and background) is then used to optimize the 1D sequence attention, i.e., the attention information in the sequence features, aligning the attention with the text characters. This optimization is needed because the 1D attention (Attn in the parallel attention pseudo code) may attend to misaligned image text, as shown in FIG. 2.
The sequence alignment step includes the steps of:
A text semantic segmentation step: the multi-level feature maps obtained in the feature extraction step are fused from bottom to top through a convolutional network, and a character saliency map S_m of shape (32, 100, C = 1) is obtained through a classification layer and a nonlinear activation function; a loss is then computed between the mask map obtained in the unsupervised clustering step and the saliency map to optimize the network's segmentation result, and a segmentation threshold is set for low-contrast, hard-to-recognize regions to recover more foreground text features.
An attention sequence correction step: according to the character saliency map (the optimized segmentation result) obtained in the text semantic segmentation step, an optimization function is designed to optimize the 1D attention result obtained in the sequence attention extraction step (i.e., the pseudo code part). The optimization formula is as follows:

minimize Σ_{t=1..L} Σ_{t'=1..L, t'≠t} σ(α_t · α_{t'}^τ)

where α_t ∈ Attn; σ is an activation function; minimize denotes the minimization objective, optimized through the loss function; t denotes the decoding time; L denotes the character length of the text instance; α_t is the 1D sequence attention obtained at decoding time t; α_{t'} is the 1D sequence attention obtained at decoding time t'; and α_{t'}^τ is its transpose, with τ denoting transposition.
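Under the reading above, the overlap penalty can be computed directly. This sketch treats σ as the sigmoid and sums the penalty over all pairs of distinct decoding steps; it is an interpretation of the garbled formula, not a verified reproduction of the patent's loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_overlap_loss(attn):
    # attn: (L, W) row-normalised 1D attention over W columns for L
    # decoding steps. Penalising sigma(alpha_t . alpha_t'^tau) for t != t'
    # pushes different decoding steps toward disjoint character regions.
    gram = attn @ attn.T                    # (L, L): alpha_t alpha_{t'}^tau
    mask = ~np.eye(attn.shape[0], dtype=bool)
    return sigmoid(gram[mask]).sum()        # sum over t != t' pairs only
```

Disjoint one-hot attention rows give the minimum overlap, so the loss is lower than for attention rows that share mass.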
A character segmentation step: constructing character outline labels from the optimized 1D sequence attention (the attention information) and the obtained (optimized) segmentation result, then using the character outline labels to supervise the character predictions produced by feeding the multi-scale features into a temporal classification network, finally obtaining the topological features of the characters.
The character segmentation step includes the steps of:
A character semantic segmentation step: considering the workload and difficulty of manual character labeling, character-outline-level labels S_gt are constructed from the 1D attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step; a temporal classification network is built to predict the character outline information S_cls, and a loss against the constructed labels supervises and optimizes the prediction result (the predicted character result, i.e., the character outline information S_cls). The specific optimization formula is as follows:

L_cls = −(1/T) Σ_{j=1..T} (1/N) Σ_{i=1..N} ρ_i [ ω_{j,i} log ŵ_{j,i} + (1 − ω_{j,i}) log(1 − ŵ_{j,i}) ]

where ω_{j,i} and ŵ_{j,i} are the confidence scores of pixel i in channel j of the foreground mask S_gt and the character segmentation result S_cls, respectively; ρ_i is the confidence score of pixel i in S_m; L_cls is the loss function; j denotes the channel index; N denotes the number of pixels; i denotes the pixel index; and T is the decoding time (the number of predicted character channels).
A topological feature generation step: the character segmentation map (the optimized prediction result) predicted in the character semantic segmentation step is cross-multiplied with the multi-scale features obtained in the feature extraction step to obtain the topological features. Specifically, the predicted character segmentation map already carries the outline information of each character and holds finer character information than attention that focuses only coarsely on a character's spatial region; it is then cross-multiplied with the multi-scale features from the feature extraction step to obtain the topological feature c.
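Interpreting the "cross multiplication" as channel-broadcast masking, the topological features can be sketched as follows; the per-character channel layout (one segmentation channel per predicted character) is an assumption.

```python
import numpy as np

def topological_features(char_seg, feat):
    # char_seg: (K, H, W) per-character segmentation/contour maps (S_cls);
    # feat: (H, W, C) backbone feature map. Broadcasting the mask across
    # channels zeroes features outside each character's outline.
    return char_seg[:, :, :, None] * feat[None, :, :, :]  # (K, H, W, C)
```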
A feature fusion step: fusing the sequence features (the sequence features obtained after optimizing the 1D sequence attention) with the topological features and feeding the result into the parallel recognition branches to obtain the final prediction result.
The feature fusion step comprises the following steps:
A feature fusion step: considering that the sequence features are 1D and the topological features are 2D, a 1D-2D fusion scheme is designed to extract richer semantic information. Specifically, the obtained topological features are fused with the sequence features as follows:
z_t = σ(W_z[out_t, c_{k,t}]),
q_{k,t} = z_t · out_t + (1 − z_t) · c_{k,t},
wherein c_{k,t} denotes the character information of the k-th layer topological feature at time t; z_t is an intermediate gate value between 0 and 1 obtained as an activation value; W_z is a learnable weight; out_t denotes the output at decoding time t in the pseudo code; q_{k,t} is the final fused feature.
The sequence features are then tiled into a two-dimensional space and spliced with the extracted character segmentation map so that they carry general semantic information; after several convolution layers they are pooled into a 1D feature q'_{k,t}. Finally, q_{k,t} and q'_{k,t} are added to obtain the final high-level semantic information sent to the classification network.
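The two gating equations above can be sketched directly in numpy. This is an illustrative sketch: the feature dimension, initialization, and function names are assumptions, and W_z would in practice be a trained weight:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(out_t, c_kt, W_z):
    """Fuse the 1D sequence feature out_t with the topological
    feature c_{k,t} through a learned gate:
        z_t    = sigma(W_z [out_t, c_{k,t}])
        q_{k,t} = z_t * out_t + (1 - z_t) * c_{k,t}
    """
    z_t = sigmoid(W_z @ np.concatenate([out_t, c_kt]))
    return z_t * out_t + (1.0 - z_t) * c_kt

rng = np.random.default_rng(1)
d = 8
out_t = rng.standard_normal(d)         # 1D sequence feature at time t
c_kt = rng.standard_normal(d)          # topological feature, layer k
W_z = rng.standard_normal((d, 2 * d)) * 0.1  # gate weight (random stand-in)
q = gated_fusion(out_t, c_kt, W_z)
print(q.shape)  # (8,)
```

Because z_t lies in (0, 1), each component of q_{k,t} is a convex combination of the corresponding components of out_t and c_{k,t}, so the gate smoothly interpolates between sequence context and glyph topology.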
A parallel identification step: according to the semantic information obtained in the feature fusion step, the final prediction result is obtained through several fully connected layers. Specifically, given the unordered nature of industrial text, the time-sequential prediction scheme is replaced, and the final prediction result is obtained directly through several fully connected layers.
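The parallel branch can be illustrated with a minimal numpy sketch: every character slot is classified independently by a shared fully connected layer, with no recurrent dependence between slots (shapes, the single-layer classifier, and all names are illustrative assumptions):

```python
import numpy as np

def parallel_decode(fused, W, b, charset):
    """Classify every character slot independently with a shared
    fully connected layer -- no time-sequential decoding, which
    suits unordered industrial text.

    fused: (T, D) fused semantic features, one row per character slot
    W: (D, V), b: (V,) fully connected classifier over V classes
    """
    logits = fused @ W + b          # (T, V) class scores per slot
    idx = logits.argmax(axis=1)     # independent argmax per slot
    return "".join(charset[i] for i in idx)

# Toy run: one-hot features and an identity classifier give a
# transparent slot-to-character mapping.
pred = parallel_decode(np.eye(3), np.eye(3), np.zeros(3), "ABC")
print(pred)  # ABC
```

Since each slot's prediction never conditions on the previous slot, an error at one position cannot propagate, avoiding the error accumulation of time-dependent decoders on unordered strings.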
The method first applies preprocessing to enhance the metal character image and obtain a high-quality preprocessed image. Then, addressing the low contrast, strong reflection and character corrosion typical of metal-surface characters, the character outline structure of the preprocessed image is extracted with an unsupervised method, and a segmentation network is designed to learn the character outline information. Based on the segmentation result, a one-dimensional attention and image alignment method is designed to mitigate the drift of one-dimensional attention on unordered and long texts. Character topological features are extracted by combining the sequence-aligned attention mechanism with the learned character profile map, enhancing the semantic information of the intermediate neural network layers. Finally, a parallel character recognition branch is designed for the unordered text on metal part surfaces, improving the recognition accuracy of unordered industrial text.
The method solves the problem of recognizing text information on metal part surfaces in industrial environments, helping to track and record metal parts on industrial production lines. Through analysis of the character markings on metal part surfaces, information such as part model, size and manufacturer can be quickly identified on machining production lines, preventing errors caused by recognition fatigue of manual operators and improving production efficiency.
The invention provides an unordered-text 1D attention feature extraction method, an unsupervised topological feature extraction method and a character feature fusion method, and quantitatively evaluates the text recognition results. By designing a character segmentation network, the invention learns the character outline structure obtained by the unsupervised method, helping the recognition network obtain the topological information of the characters. The constructed glyph topological information assists the network in perceiving finer character structure, strengthens the distinction between different characters in the intermediate network layers, and drives the network to learn enhanced topological features. The topological features are further fused into the final sequence to enrich semantic information and improve text recognition performance. When a single character of ordered text is difficult to recognize, the result can still be obtained from context information; on unordered text, however, methods that rely on context sequence decoding achieve low accuracy, so increasing the saliency of single characters in unordered text helps the recognition network improve accuracy.
The invention discloses a character-driven, topological-feature-enhanced metal part surface text recognition method, aimed at solving the problems of alignment-drifted sequence attention and coarse character attention. Specifically, first, in the sequence alignment module, the alignment-drift problem of sequence attention is solved by building a constraint function between the learnable attention weights (1D) and a text segmentation map optimized by an unsupervised method on text images. Second, the proposed character segmentation module generates an ordered multi-channel segmentation result (2D) over the character classes from the 1D sequence-aligned attention and the text segmentation map of the sequence alignment module, without character-level annotations; it contains more detailed glyph topology information to enrich the semantic representation. Finally, a fusion scheme is designed to inject the topological features into the attention-focused one-dimensional context features, avoiding the error accumulation caused by time-dependent decoding.
The embodiment of the invention also discloses a metal part surface text recognition system based on outline feature enhancement, which comprises the following modules as shown in figure 1:
a preprocessing module: for the metal surface character image to be recognized, image enhancement is performed through preprocessing to obtain a preprocessed image, and the preprocessed image is then classified to obtain a mask image.
The preprocessing module comprises the following modules: an image enhancement module: balance-enhancing the local contrast of the metal part image based on an adaptive histogram of the RGB image, sharpening the metal part image with the Laplace operator, and preserving high-frequency information and highlighting text character details to obtain the preprocessed image.
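A minimal numpy sketch of this enhancement stage follows. It is illustrative only: global histogram equalization stands in for the adaptive per-RGB-channel equalization described above, a 4-neighbour Laplacian kernel is assumed for the sharpening, and all function names are assumptions:

```python
import numpy as np

def equalize(gray):
    """Global histogram equalization on a uint8 image (the patent
    uses *adaptive* per-channel equalization; global shown for
    brevity)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255.0 / max(cdf.max() - cdf.min(), 1)
    return cdf.astype(np.uint8)[gray]          # lookup-table remap

def laplacian_sharpen(gray):
    """Sharpen by subtracting the Laplacian response, keeping
    high-frequency character edges."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    pad = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    lap = sum(k[i, j] * pad[i:i + h, j:j + w]
              for i in range(3) for j in range(3))
    return np.clip(gray - lap, 0, 255).astype(np.uint8)
```

On a low-contrast stamping, equalization stretches the intensity range while the Laplacian term emphasizes stroke boundaries; a constant region passes through the sharpener unchanged since its Laplacian is zero.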
Unsupervised clustering module: and obtaining a mask result of the characters by an unsupervised clustering method based on the obtained preprocessed image.
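As one concrete instance of such unsupervised clustering, a two-cluster k-means on pixel intensity can produce the character mask. This is a sketch under stated assumptions: the patent does not name the clustering algorithm, and taking the darker cluster as foreground presumes characters are darker than the polished metal background:

```python
import numpy as np

def kmeans_char_mask(gray, iters=10):
    """Two-cluster k-means on pixel intensity; the darker cluster is
    taken as the character foreground (an assumption for stamped or
    engraved characters)."""
    x = gray.astype(float).ravel()
    centers = np.array([x.min(), x.max()])      # init at extremes
    for _ in range(iters):
        # Assign each pixel to the nearest cluster center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    fg = centers.argmin()                        # darker cluster
    return (labels == fg).reshape(gray.shape).astype(np.uint8)
```

The resulting binary mask then serves as the pseudo label that supervises the text semantic segmentation network.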
A feature encoding module: inputting the preprocessed metal surface character image, extracting multi-scale features of the character to obtain a multi-level feature map, and extracting sequence features from the feature map of the last layer in the multi-level feature map by using an attention mechanism.
The feature coding module comprises the following modules: a feature extraction module: the metal part character image is sent into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence with a height of a preset value.
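The height pooling can be sketched in numpy as collapsing the height axis of the last-layer feature map so each column becomes one sequence element (a preset height of 1 and mean pooling are assumptions; the patent does not fix either):

```python
import numpy as np

def to_sequence(feat_map):
    """Pool a (C, H, W) feature map along the height axis so the
    result is a width-long sequence of C-dim feature vectors,
    i.e. height collapsed to a preset value of 1."""
    return feat_map.mean(axis=1).T   # (W, C): one vector per column

feat = np.arange(24, dtype=float).reshape(2, 3, 4)  # C=2, H=3, W=4
seq = to_sequence(feat)
print(seq.shape)  # (4, 2)
```

Each of the W sequence positions then corresponds to a horizontal slice of the image, which is what the 1D sequence attention subsequently operates on.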
A sequence attention extraction module: and designing an attention method, and fusing the feature sequences with the height of a preset value, which are obtained by the feature extraction module, to obtain the feature sequences of the corresponding characters.
A sequence alignment module: and (3) fusing the multi-scale features from bottom to top according to the resolution by using a convolution network, obtaining a segmentation result of the character according to the mask image, and then optimizing the attention information in the sequence features by using the segmentation result.
The sequence alignment module comprises the following modules: a text semantic segmentation module: the multi-level feature map obtained by the feature extraction module is subjected to bottom-up fusion through a convolution network, a saliency map of the character is obtained through a classification layer and a nonlinear activation function, then a mask map obtained by the unsupervised clustering module and the saliency map are subjected to loss building, and a network segmentation result is optimized;
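The "building loss" between the predicted saliency map and the clustering mask can be illustrated with a binary cross-entropy term (an assumption: the patent does not name the exact loss; the sigmoid plays the role of the nonlinear activation after the classification layer):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def saliency_bce_loss(logits, mask):
    """Binary cross-entropy between the predicted character saliency
    map (sigmoid of the classification-layer logits) and the
    unsupervised clustering mask used as a pseudo label."""
    p = np.clip(sigmoid(logits), 1e-7, 1 - 1e-7)  # avoid log(0)
    return float(-(mask * np.log(p) + (1 - mask) * np.log(1 - p)).mean())

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
good = saliency_bce_loss(np.where(mask > 0, 20.0, -20.0), mask)
bad = saliency_bce_loss(np.where(mask > 0, -20.0, 20.0), mask)
```

Logits that agree with the mask drive the loss toward zero, while disagreement is penalized heavily, which pushes the segmentation network toward the unsupervised character outline.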
attention sequence correction module: and designing an optimization function to optimize the 1D attention result obtained by the sequence attention extraction module according to the character saliency map obtained by the text semantic segmentation module.
A character segmentation module: constructing a character outline label using the optimized attention information and the obtained segmentation result, then using the character outline label to supervise the character prediction obtained by feeding the multi-scale features into the time-sequence classification network, and finally obtaining the topological features of the characters.
A feature fusion module: and fusing the sequence characteristics and the topology characteristics after the attention information is optimized, and sending the sequence characteristics and the topology characteristics into the parallel recognition branches to obtain a final prediction result.
The invention develops a metal part surface text recognition method based on contour feature enhancement by analyzing the characteristics of the background and characters of metal parts, and solves the text recognition problem for unordered character strings in low-contrast metal images; it requires no extra character-level labels, injects the learned topological features of metal part characters into the recognition branch, enriches the semantic information for single-character recognition in unordered text, and improves recognition accuracy.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A metal part surface text recognition method based on contour feature enhancement is characterized by comprising the following steps:
a preprocessing step: for the metal surface character image to be recognized, performing image enhancement through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a characteristic coding step: inputting a preprocessed metal surface character image, extracting multi-scale features of characters to obtain a multi-level feature map, and extracting sequence features from a feature map of a last layer in the multi-level feature map by using an attention mechanism;
and (3) sequence alignment: fusing the multi-scale features from bottom to top according to the resolution by using a convolution network, obtaining a segmentation result of the character according to the mask image, and then optimizing the attention information in the sequence features by using the segmentation result;
a character segmentation step: constructing a character outline label by using the optimized attention information and the obtained segmentation result, then using the character outline label to supervise a predicted character result obtained by the multi-scale features sent into the time sequence classification network, and finally obtaining the topological features of the characters;
a characteristic fusion step: and fusing the sequence characteristics and the topology characteristics after the attention information is optimized, and sending the sequence characteristics and the topology characteristics into the parallel recognition branches to obtain a final prediction result.
2. The method for recognizing the text on the surface of the metal part based on the enhancement of the contour features as claimed in claim 1, wherein the preprocessing step comprises the steps of:
an image enhancement step: based on an RGB image self-adaptive histogram, the local contrast of the metal part image is enhanced in a balanced mode, meanwhile, the metal part image is sharpened by adopting a Laplace operator, high-frequency information and highlight text character details are reserved, and a preprocessed image is obtained;
unsupervised clustering step: and obtaining a mask result of the characters by an unsupervised clustering method based on the obtained preprocessed image.
3. The method for recognizing the text on the surface of the metal part based on the enhancement of the outline characteristics as claimed in claim 1, wherein the characteristic encoding step comprises the steps of:
a characteristic extraction step: sending the metal part character image into a deep convolutional network to obtain a multi-level feature map, and pooling the last-layer feature map into a feature sequence with a height of a preset value;
sequence attention extraction step: and designing an attention method, and fusing the feature sequences with the height of a preset value obtained in the feature extraction step to obtain the feature sequences of the corresponding characters.
4. The method of claim 3, wherein the step of aligning the sequence comprises the steps of:
text semantic segmentation step: performing bottom-up fusion on the multi-level feature map obtained in the feature extraction step through a convolution network, obtaining a saliency map of the character through a classification layer and a nonlinear activation function, then establishing loss between a mask map obtained in the unsupervised clustering step and the saliency map, and optimizing a network segmentation result;
attention sequence correction step: and designing an optimization function to optimize the attention result obtained in the sequence attention extraction step according to the character saliency map obtained in the text semantic segmentation step.
5. The method as claimed in claim 4, wherein the character segmentation step comprises the following steps:
a character semantic segmentation step: constructing a character outline-level label from the attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step, building a time-sequence classification network to predict the character outline information, and establishing a loss with the constructed character outline-level label to supervise and optimize the prediction result;
generating topological characteristics: and performing cross multiplication on the character segmentation graph obtained by predicting according to the character semantic segmentation step and the multi-scale features obtained in the feature extraction step to obtain the topological features.
6. The method for recognizing the text on the surface of the metal part based on the enhancement of the outline features as claimed in claim 1, wherein the feature fusion step comprises the steps of:
and (3) feature fusion step: extracting semantic information from the optimized sequence features and topology features in a fusion mode;
a parallel identification step: and obtaining a final prediction result through a plurality of full connection layers according to the semantic information obtained in the characteristic fusion step.
7. A metal part surface text recognition system based on outline feature enhancement is characterized by comprising the following modules:
a preprocessing module: for the metal surface character image to be recognized, performing image enhancement through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a feature encoding module: inputting a preprocessed metal surface character image, extracting multi-scale features of characters to obtain a multi-level feature map, and extracting sequence features from a feature map of a last layer in the multi-level feature map by using an attention mechanism;
a sequence alignment module: fusing the multi-scale features from bottom to top according to the resolution by using a convolution network, obtaining a segmentation result of the character according to the mask image, and then optimizing the attention information in the sequence features by using the segmentation result;
a character segmentation module: constructing a character outline label by using the optimized attention information and the obtained segmentation result, then using the character outline label to supervise a predicted character result obtained by the multi-scale features sent into the time sequence classification network, and finally obtaining the topological features of the characters;
a feature fusion module: and fusing the sequence characteristics and the topological characteristics after the attention information is optimized, and sending the sequence characteristics and the topological characteristics into the parallel recognition branches to obtain a final prediction result.
8. The system of claim 7, wherein the preprocessing module comprises the following modules:
an image enhancement module: based on an RGB image self-adaptive histogram, the local contrast of the metal part image is enhanced in a balanced mode, meanwhile, the metal part image is sharpened by adopting a Laplace operator, high-frequency information and highlight text character details are reserved, and a preprocessed image is obtained;
an unsupervised clustering module: and obtaining a mask result of the characters by an unsupervised clustering method based on the obtained preprocessed image.
9. The system of claim 8, wherein the feature encoding module comprises:
a feature extraction module: sending the metal part character image into a deep convolutional network to obtain a multi-level feature map, and pooling the last-layer feature map into a feature sequence with a height of a preset value;
a sequence attention extraction module: and designing an attention method, and fusing the feature sequences with the height of a preset value, which are obtained by the feature extraction module, to obtain the feature sequences of the corresponding characters.
10. The contour feature enhancement based metal part surface text recognition system of claim 9, wherein the sequence alignment module comprises the following modules:
a text semantic segmentation module: the multi-level feature map obtained by the feature extraction module is subjected to bottom-up fusion through a convolution network, a saliency map of the character is obtained through a classification layer and a nonlinear activation function, then a mask map obtained by the unsupervised clustering module and the saliency map are subjected to loss building, and a network segmentation result is optimized;
attention sequence correction module: and designing an optimization function to optimize the attention result obtained by the sequence attention extraction module according to the character saliency map obtained by the text semantic segmentation module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210610169.3A CN114973228A (en) | 2022-05-31 | 2022-05-31 | Metal part surface text recognition method and system based on contour feature enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210610169.3A CN114973228A (en) | 2022-05-31 | 2022-05-31 | Metal part surface text recognition method and system based on contour feature enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114973228A true CN114973228A (en) | 2022-08-30 |
Family
ID=82957841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210610169.3A Pending CN114973228A (en) | 2022-05-31 | 2022-05-31 | Metal part surface text recognition method and system based on contour feature enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114973228A (en) |
2022-05-31: CN202210610169.3A, patent CN114973228A (en), status active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912845A (en) * | 2023-06-16 | 2023-10-20 | 广东电网有限责任公司佛山供电局 | Intelligent content identification and analysis method and device based on NLP and AI |
CN116912845B (en) * | 2023-06-16 | 2024-03-19 | 广东电网有限责任公司佛山供电局 | Intelligent content identification and analysis method and device based on NLP and AI |
CN117095423A (en) * | 2023-10-20 | 2023-11-21 | 上海银行股份有限公司 | Bank bill character recognition method and device |
CN117095423B (en) * | 2023-10-20 | 2024-01-05 | 上海银行股份有限公司 | Bank bill character recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111177366B (en) | Automatic generation method, device and system for extraction type document abstract based on query mechanism | |
CN111309971B (en) | Multi-level coding-based text-to-video cross-modal retrieval method | |
CN107632981B (en) | Neural machine translation method introducing source language chunk information coding | |
CN114973228A (en) | Metal part surface text recognition method and system based on contour feature enhancement | |
CN111581345A (en) | Document level event extraction method and device | |
CN111858843B (en) | Text classification method and device | |
CN110188827B (en) | Scene recognition method based on convolutional neural network and recursive automatic encoder model | |
CN114818721B (en) | Event joint extraction model and method combined with sequence labeling | |
CN109933682B (en) | Image hash retrieval method and system based on combination of semantics and content information | |
CN114529903A (en) | Text refinement network | |
CN115620265A (en) | Locomotive signboard information intelligent identification method and system based on deep learning | |
CN115080750A (en) | Weak supervision text classification method, system and device based on fusion prompt sequence | |
CN112966676B (en) | Document key information extraction method based on zero sample learning | |
CN116152824A (en) | Invoice information extraction method and system | |
CN116401289A (en) | Traceability link automatic recovery method based on multi-source information combination | |
CN113254575B (en) | Machine reading understanding method and system based on multi-step evidence reasoning | |
CN114255379A (en) | Mathematical formula identification method and device based on coding and decoding and readable storage medium | |
CN114595338A (en) | Entity relation joint extraction system and method based on mixed feature representation | |
CN114297408A (en) | Relation triple extraction method based on cascade binary labeling framework | |
CN114298032A (en) | Text punctuation detection method, computer device and storage medium | |
CN116311275B (en) | Text recognition method and system based on seq2seq language model | |
CN116824271B (en) | SMT chip defect detection system and method based on tri-modal vector space alignment | |
CN113361274B (en) | Intent recognition method and device based on label vector, electronic equipment and medium | |
Xamena et al. | End-to-end platform evaluation for Spanish Handwritten Text Recognition | |
CN116778556A (en) | Face attribute identification method and system based on visual language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||