CN114973228A - Metal part surface text recognition method and system based on contour feature enhancement - Google Patents

Metal part surface text recognition method and system based on contour feature enhancement Download PDF

Info

Publication number
CN114973228A
Authority
CN
China
Prior art keywords
character
image
sequence
attention
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210610169.3A
Other languages
Chinese (zh)
Inventor
谷朝臣
官同坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202210610169.3A priority Critical patent/CN114973228A/en
Publication of CN114973228A publication Critical patent/CN114973228A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/16 Image preprocessing
    • G06V 30/168 Smoothing or thinning of the pattern; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19107 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Abstract

The invention provides a method and a system for recognizing text on the surface of metal parts based on contour feature enhancement, comprising the following steps: a metal surface character image is identified and preprocessed to obtain a preprocessed image; multi-scale character features are extracted to obtain a multi-level feature map, and sequence features are extracted; the multi-scale features are fused from bottom to top according to resolution, a character segmentation result is obtained with the help of the mask image, and the attention information is refined; a character contour label is constructed from the attention information and the segmentation result to obtain the topological features of the characters; and the refined sequence features and the topological features are fused to obtain the final prediction result. The method addresses text recognition of unordered character strings in low-contrast metal images: it requires no extra character-level labels, injects the learned topological features of metal-part characters into the recognition branch, enriches the semantic information available for recognizing individual characters in unordered text, and improves recognition accuracy.

Description

Metal part surface text recognition method and system based on contour feature enhancement
Technical Field
The invention relates to the technical field of text recognition within artificial intelligence, and in particular to a method and a system for recognizing text on the surface of metal parts based on contour feature enhancement. More specifically, the invention preferably relates to a character-driven, topological-feature-enhanced method for recognizing text on metal part surfaces.
Background
Text information is a key element of the information era. Against the background of intelligent manufacturing, text detection plays an increasingly important role in studying and analyzing the character marks on the surfaces of metal parts: part types, production information, manufacturers and other information can be quickly identified on the machining lines of various machines, errors caused by operator fatigue during manual identification are prevented, and both the assembly speed of industrial production lines and the efficiency of logistics in industrial scenes are improved.
Existing research methods mainly come from Scene Text Recognition (STR), which aims to recognize regular and irregular text in multi-scene images and is widely applied to handwriting recognition, industrial print recognition and visual understanding. In addition to the indefinite-length character sequences already familiar from document Optical Character Recognition (OCR), the STR task faces irregular structure (curved, oriented and distorted text), low resolution, heavy occlusion and uneven lighting. Inspired by natural language processing, researchers have developed attention-based encoder-decoder architectures to address these challenges. The encoder extracts visual features of the text image and outputs a semantic feature representation, and the decoder constructs an attention mechanism to capture time-dependent attended character features at different decoding steps. These attention mechanisms fall into one-dimensional (1D) and two-dimensional (2D) attention mechanisms.
Specifically, the encoder-decoder approach with a one-dimensional attention mechanism uses long short-term memory (LSTM) units to build an implicit linguistic representation of the attended feature sequence and then outputs character classes sequentially. The one-dimensional attention mechanism generates different attention weights at each decoding step; these weights multiply the encoder output to indicate which items of the encoder output matter at the current decoding step, and the resulting feature vector is used for character recognition. For regular text recognition, attention-based approaches perform sequence modeling in a one-dimensional space: the input text image is encoded into a one-dimensional sequence by a CNN, a BiLSTM encoder captures the long-range dependencies of the input sequence, and its states are then output sequentially to the decoder.
For irregular text recognition, compressing curved and distorted text images into one-dimensional sequence features inevitably introduces irrelevant information, produces weak semantic representations and causes error accumulation. Constructing a two-dimensional attention mechanism is therefore the mainstream solution. 2D attention-based methods capture the spatial features of the corresponding characters by embedding time-dependent sequences throughout the network, and various 2D attention feature representations have been proposed for the decoding process.
Patent document CN110378334A discloses a natural scene text recognition method based on a two-dimensional feature attention mechanism, comprising the following steps: 1. data acquisition: line-text pictures for training are synthesized using open-source code and divided into a regular training set and an irregular training set according to shape, and real photographed text pictures are downloaded from the internet as test data; 2. data processing: the pictures are resized, the processed picture size being 32 × 104; 3. label preparation: the recognition model is trained with a supervised method, each line-text picture having its corresponding text content; 4. network training: the recognition network is trained with the data in the training set; 5. network testing: test data are input into the trained network to obtain the prediction result for the line-text picture.
With respect to the related art described above, the inventors consider that the 1D attention-based methods can recognize regular text quickly and robustly, but they do not perform well on irregular and long text because their sequential attention suffers from alignment drift. The 2D attention-based approaches provide an effective solution for recognizing irregular text, but the extracted 2D attention features only coarsely focus on the spatial regions of the characters and give little consideration to the topological information of the glyph (e.g., outline and pixel-level locations), resulting in degraded performance.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method and a system for recognizing text on the surface of metal parts based on contour feature enhancement.
The invention provides a metal part surface text recognition method based on contour feature enhancement, comprising the following steps:
a preprocessing step: identifying the metal surface character image, performing image enhancement on it through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a feature encoding step: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-layer feature map of the multi-level feature map with an attention mechanism;
a sequence alignment step: fusing the multi-scale features from bottom to top according to resolution with a convolutional network, obtaining a character segmentation result with the help of the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
a character segmentation step: constructing a character contour label from the optimized attention information and the obtained segmentation result, then using the character contour label to supervise the predicted character result produced by feeding the multi-scale features into a time-sequence classification network, and finally obtaining the topological features of the characters;
a feature fusion step: fusing the sequence features whose attention information has been optimized with the topological features, and feeding them into parallel recognition branches to obtain the final prediction result.
Preferably, the preprocessing step comprises the following steps:
an image enhancement step: the local contrast of the metal part image is enhanced by equalization based on an adaptive histogram of the RGB image; at the same time, the metal part image is sharpened with a Laplacian operator to retain high-frequency information and highlight text character details, yielding the preprocessed image;
an unsupervised clustering step: a character mask result is obtained from the preprocessed image with an unsupervised clustering method.
Preferably, the feature encoding step comprises the following steps:
a feature extraction step: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence whose height in resolution is a preset value;
a sequence attention extraction step: an attention method is designed, and the feature sequence with the preset height obtained in the feature extraction step is fused to obtain the feature sequence of the corresponding characters.
Preferably, the sequence alignment step comprises the following steps:
a text semantic segmentation step: the multi-level feature maps obtained in the feature extraction step are fused from bottom to top through a convolutional network, a character saliency map is obtained through a classification layer and a nonlinear activation function, a loss is then established between the mask map obtained in the unsupervised clustering step and the saliency map, and the network segmentation result is optimized;
an attention sequence correction step: according to the character saliency map obtained in the text semantic segmentation step, an optimization function is designed to optimize the attention result obtained in the sequence attention extraction step.
Preferably, the character segmentation step comprises the following steps:
a character semantic segmentation step: a character contour-level label is constructed from the attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step, a time-sequence classification network is built to predict the character contour information, and a loss against the constructed character contour-level label is established to supervise and optimize the prediction result;
a topological feature generation step: the character segmentation map predicted in the character semantic segmentation step is cross-multiplied with the multi-scale features obtained in the feature extraction step to obtain the topological features.
Preferably, the feature fusion step comprises the following steps:
a feature fusion step: semantic information is extracted from the optimized sequence features and the topological features through a fusion scheme;
a parallel recognition step: according to the semantic information obtained in the feature fusion step, the final prediction result is obtained through several fully connected layers.
The invention provides a metal part surface text recognition system based on contour feature enhancement, comprising the following modules:
a preprocessing module: identifying the metal surface character image, performing image enhancement on it through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a feature encoding module: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-layer feature map of the multi-level feature map with an attention mechanism;
a sequence alignment module: fusing the multi-scale features from bottom to top according to resolution with a convolutional network, obtaining a character segmentation result with the help of the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
a character segmentation module: constructing a character contour label from the optimized attention information and the obtained segmentation result, then using the character contour label to supervise the predicted character result produced by feeding the multi-scale features into a time-sequence classification network, and finally obtaining the topological features of the characters;
a feature fusion module: fusing the sequence features whose attention information has been optimized with the topological features, and feeding them into parallel recognition branches to obtain the final prediction result.
Preferably, the preprocessing module comprises the following modules:
an image enhancement module: the local contrast of the metal part image is enhanced by equalization based on an adaptive histogram of the RGB image; at the same time, the metal part image is sharpened with a Laplacian operator to retain high-frequency information and highlight text character details, yielding the preprocessed image;
an unsupervised clustering module: a character mask result is obtained from the preprocessed image with an unsupervised clustering method.
Preferably, the feature encoding module comprises the following modules:
a feature extraction module: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence whose height in resolution is a preset value;
a sequence attention extraction module: an attention method is designed, and the feature sequence with the preset height obtained by the feature extraction module is fused to obtain the feature sequence of the corresponding characters.
Preferably, the sequence alignment module comprises the following modules:
a text semantic segmentation module: the multi-level feature maps obtained by the feature extraction module are fused from bottom to top through a convolutional network, a character saliency map is obtained through a classification layer and a nonlinear activation function, a loss is then established between the mask map obtained by the unsupervised clustering module and the saliency map, and the network segmentation result is optimized;
an attention sequence correction module: according to the character saliency map obtained by the text semantic segmentation module, an optimization function is designed to optimize the attention result obtained by the sequence attention extraction module.
Compared with the prior art, the invention has the following beneficial effects:
1. The character-driven, topological-feature-enhanced metal part surface text recognition method disclosed here aims to solve the problems of alignment-drift sequence attention and coarse character attention. It solves text recognition of unordered character strings in low-contrast metal images; it requires no extra character-level labels, injects the learned topological features of metal-part characters into the recognition branch, enriches the semantic information available for recognizing individual characters in unordered text, and improves recognition accuracy;
2. The method first obtains the character foreground with an unsupervised approach and then constructs the character contour information, which helps the recognition network increase the discrimination between different characters so that richer semantic information reaches the prediction layer, improving recognition accuracy; compared with a coarse attention mechanism, it carries richer semantic information and improves the effectiveness of the visual model;
3. Compared with sequence recognition schemes suited to ordered or context-dependent text, the parallel prediction scheme of the invention targets the unordered nature of text on metal part surfaces and solves the recognition of unordered text well.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of text recognition of a surface of a metal part;
FIG. 2 is an exemplary diagram of some unaligned sequence attention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment of the invention discloses a character-driven, topological-feature-enhanced metal part surface text recognition method, comprising the following steps:
A preprocessing step: the metal surface character image is identified and enhanced through preprocessing to obtain a high-quality preprocessed image; the character image (preprocessed image) is then classified by unsupervised binary classification to obtain a mask image (that is, the preprocessed image is separated into two classes with an unsupervised clustering method to obtain the mask image).
The preprocessing step comprises the following steps:
An image enhancement step: the local contrast of the metal part image is enhanced by equalization based on an adaptive histogram of the RGB image, the metal part image is sharpened with a Laplacian operator, and high-frequency information and highlighted text character details are retained to obtain the preprocessed image.
An unsupervised clustering step: a character mask result (a mask image containing only foreground and background) is obtained from the preprocessed image with an unsupervised clustering method.
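As an illustration of the two preprocessing sub-steps above, the following Python sketch (assuming OpenCV and NumPy are available; the CLAHE clip limit, tile size and the choice of two k-means clusters are illustrative assumptions, not values prescribed by this description) shows one way the enhancement and the unsupervised mask could be realized:
import cv2
import numpy as np

def enhance_metal_image(bgr: np.ndarray) -> np.ndarray:
    # Adaptive histogram equalization (CLAHE) per channel balances local contrast.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = cv2.merge([clahe.apply(c) for c in cv2.split(bgr)])
    # Laplacian sharpening: subtracting the Laplacian keeps high-frequency
    # information and highlights character stroke details.
    lap = cv2.Laplacian(equalized, cv2.CV_16S, ksize=3)
    return cv2.convertScaleAbs(equalized.astype(np.int16) - lap)

def unsupervised_char_mask(enhanced: np.ndarray) -> np.ndarray:
    # Two-cluster k-means on gray values yields a foreground/background mask.
    gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, 2, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    # Assumption: the brighter cluster corresponds to the character foreground.
    fg = int(np.argmax(centers.ravel()))
    return (labels.reshape(gray.shape) == fg).astype(np.uint8) * 255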
A feature encoding step: the preprocessed metal surface character image is input, multi-scale character features (a multi-level feature map) are extracted through a ResNet31 network, and sequence features are extracted from the final feature map (the last-layer feature map of the multi-level feature map) with a 1D attention mechanism.
The feature encoding step includes the steps of:
A feature extraction step: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence with a height of 1 in resolution, yielding the high-dimensional features of the feature sequence.
Specifically, the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence with height 1, width W and C channels, yielding the high-dimensional features of the feature sequence. The detailed process is: 1. the input image is scaled to (32, 100, 3); 2. ResNet31 is used to obtain feature maps at three levels with resolutions (16, 50, C=128), (8, 25, C=256) and (4, 12, C=512), where C is the number of channels; 3. the second-level feature map (8, 25, C=256) then undergoes two convolution-pooling operations to obtain the final sequence features (1, W=26, C=512).
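The following PyTorch sketch illustrates this encoding step with a small stand-in backbone (the real implementation uses ResNet31; the stand-in's layer widths, strides and the resulting sequence width are illustrative assumptions):
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBackbone(nn.Module):
    # Stand-in for ResNet31: three stages that roughly reproduce the
    # described resolutions (16, 50), (8, 25), (4, 13) for a 32x100 input.
    def __init__(self):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.stages = nn.ModuleList([stage(3, 128), stage(128, 256), stage(256, 512)])

    def forward(self, x):
        feats = []
        for s in self.stages:
            x = s(x)
            feats.append(x)
        return feats  # multi-level feature maps

def to_sequence(last_map: torch.Tensor) -> torch.Tensor:
    # Pool the last-level map to height 1 so each column becomes one sequence item.
    pooled = F.adaptive_avg_pool2d(last_map, (1, last_map.shape[-1]))
    return pooled.squeeze(2).permute(0, 2, 1)  # (N, W, C) feature sequence

image = torch.randn(1, 3, 32, 100)    # preprocessed image resized to 32x100
features = ToyBackbone()(image)
sequence = to_sequence(features[-1])  # height-1 feature sequence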
A sequence attention extraction step: a 1D attention method is designed, and the feature sequence of height 1 obtained in the feature extraction step is fused to obtain the feature sequence of the corresponding characters (the sequence features extracted with the 1D attention mechanism). In the specific implementation, considering that metal surface text is non-contextual text, a parallel attention extraction mechanism is used to obtain the sequence features; the pseudocode is as follows:
(The parallel attention pseudocode appears as an image in the original filing and is not reproduced here.)
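Since the pseudocode itself is not reproduced, the sketch below shows one generic parallel attention formulation consistent with the description (all positions are decoded at once with learnable per-slot queries); the module and parameter names are assumptions, not the patent's actual pseudocode:
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    def __init__(self, channels: int = 512, max_len: int = 25):
        super().__init__()
        # One learnable query per decoding position (character slot),
        # so all positions attend to the sequence simultaneously.
        self.queries = nn.Embedding(max_len, channels)
        self.key_proj = nn.Linear(channels, channels)

    def forward(self, seq_feats: torch.Tensor):
        # seq_feats: (N, W, C) height-1 feature sequence from the encoder.
        keys = self.key_proj(seq_feats)                          # (N, W, C)
        q = self.queries.weight.unsqueeze(0)                     # (1, T, C)
        attn = torch.softmax(q @ keys.transpose(1, 2), dim=-1)   # (N, T, W), "Attn"
        out = attn @ seq_feats                                   # (N, T, C), one vector per slot
        return out, attn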
A sequence alignment step: a convolutional network fuses the multi-scale features from bottom to top according to resolution, the character segmentation result is obtained with the help of the mask image, and the segmentation result (the character saliency map, a character segmentation map containing only foreground and background) is then used to optimize the 1D sequence attention, i.e. the attention information in the sequence features, so that the attention is aligned with the text characters. The attention needs this optimization because the 1D attention, namely Attn (Attention) in the parallel attention mechanism (pseudocode), may attend to misaligned image text, as shown in FIG. 2.
The sequence alignment step includes the steps of:
A text semantic segmentation step: the multi-level feature map obtained in the feature extraction step is fused from bottom to top through a convolutional network, and a character saliency map S_m of size (32, 100, C=1) is obtained through a classification layer and a nonlinear activation function; a loss is then established between the mask map obtained in the unsupervised clustering step and the saliency map to optimize the network segmentation result, and a segmentation threshold is set for low-contrast, hard-to-recognize regions in exchange for more foreground text features.
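A minimal sketch of such a segmentation branch is given below (channel widths, the bilinear upsampling and the binary cross-entropy loss are illustrative assumptions; the description only requires bottom-up fusion, a classification layer, a nonlinear activation and a loss against the unsupervised mask):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyBranch(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), mid=128):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, mid, 1) for c in in_channels])
        self.fuse = nn.Conv2d(mid, mid, 3, padding=1)
        self.classify = nn.Conv2d(mid, 1, 1)

    def forward(self, feats, out_size=(32, 100)):
        # Fuse the multi-level maps from the deepest level upward.
        x = self.reduce[-1](feats[-1])
        for level in reversed(range(len(feats) - 1)):
            x = F.interpolate(x, size=feats[level].shape[-2:], mode="bilinear", align_corners=False)
            x = self.fuse(x + self.reduce[level](feats[level]))
        x = F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.classify(x))   # character saliency map S_m, shape (N, 1, 32, 100)

def saliency_loss(s_m: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # mask: the unsupervised clustering result in {0, 1}, resized to match S_m.
    return F.binary_cross_entropy(s_m, mask)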
An attention sequence correction step: according to the character saliency map (the optimized segmentation result) obtained in the text semantic segmentation step, an optimization function is designed to optimize the 1D attention result (the 1D sequence attention) obtained in the sequence attention extraction step (i.e., the pseudocode part). The optimization formulas appear as images in the original filing and are not reproduced here; their symbols are defined as follows: α_t ∈ Attn; σ is an activation function; minimize denotes the minimization carried out with the loss function used for this optimization; t denotes the decoding step and L the character length of the text instance; α_t is the 1D sequence attention obtained at decoding step t; α_t' is the 1D sequence attention obtained at decoding step t'; and α_t'^τ is its transpose, where τ denotes transposition.
A character segmentation step: a character contour label is constructed from the optimized 1D sequence attention (the attention information) and the obtained (optimized) segmentation result; the character contour label is then used to supervise the predicted character result produced by feeding the multi-scale features into a time-sequence classification network, and the topological features of the characters are finally obtained.
The character segmentation step includes the steps of:
A character semantic segmentation step: considering the workload and difficulty of manual character labeling, a character contour-level label S_gt is constructed from the 1D attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step; a time-sequence classification network is built to predict the character contour information S_cls, and a loss against the constructed label is established to supervise and optimize the prediction result (the predicted character result, i.e. the character contour information S_cls). The specific optimization formulas appear as images in the original filing and are not reproduced here; their symbols are defined as follows: ω_j,i and its counterpart are the confidence scores of pixel i in channel j of the foreground mask S_gt and of the character segmentation result S_cls, respectively; ρ_i is the confidence score of pixel i in S_m; the remaining image-only quantities are loss functions; j denotes the channel index; N the number of pixels; i the pixel index; and T the decoding step.
A topological feature generation step: the character segmentation map (the optimized prediction result) predicted in the character semantic segmentation step is cross-multiplied with the multi-scale features obtained in the feature extraction step to obtain the topological features. Specifically, the character segmentation map predicted in the character semantic segmentation step already carries the contour information of the characters and holds finer character information than attention that only coarsely focuses on a character's spatial region; it is then cross-multiplied with the multi-scale features from the feature extraction step to obtain the topological feature c.
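The cross multiplication can be pictured as the element-wise product of each character channel of the segmentation map with the backbone feature map, followed by spatial pooling; the shapes and the pooling choice below are illustrative assumptions:
import torch
import torch.nn.functional as F

def topological_features(char_seg: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    # char_seg: (N, T, Hs, Ws) per-character segmentation maps with values in [0, 1]
    # feat:     (N, C, Hf, Wf) multi-scale feature map from the feature extraction step
    seg = F.interpolate(char_seg, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    # Cross multiplication: every character channel weights every feature channel.
    weighted = seg.unsqueeze(2) * feat.unsqueeze(1)   # (N, T, C, H, W)
    # Pool spatially to one contour-aware (topological) vector per character slot.
    return weighted.flatten(3).mean(dim=3)            # topological feature c, (N, T, C)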
A feature fusion step: the sequence features (the sequence features obtained after optimizing the 1D sequence attention) and the topological features are fused and fed into the parallel recognition branches to obtain the final prediction result.
The feature fusion step comprises the following steps:
A feature fusion step: considering that the sequence features are 1D and the topological features are 2D, a 1D-2D fusion scheme is designed to extract richer semantic information. Specifically, the obtained topological features are fused with the sequence features as follows:
z_t = σ(W_z[out_t, c_{k,t}]),
q_{k,t} = z_t · out_t + (1 - z_t) · c_{k,t},
where c_{k,t} denotes the character information of the k-th layer topological feature at step t; z_t is an intermediate activation value between 0 and 1; W_z is a weight; out_t denotes the output at decoding step t in the pseudocode; and q_{k,t} is the fused feature.
The sequence features are then tiled into a two-dimensional space and concatenated with the extracted character segmentation map so that they carry general semantic information, and after several convolutional layers they are pooled into 1D features q'_{k,t}. Finally, q_{k,t} and q'_{k,t} are added to obtain the final high-level semantic information fed to the classification network.
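The gated part of this fusion (the two formulas above) can be sketched as follows, where the feature width is an illustrative assumption:
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int = 512):
        super().__init__()
        # W_z acts on the concatenation [out_t, c_{k,t}].
        self.w_z = nn.Linear(2 * channels, channels)

    def forward(self, out_t: torch.Tensor, c_kt: torch.Tensor) -> torch.Tensor:
        # out_t: attended sequence features; c_kt: topological character features; both (N, T, C).
        z_t = torch.sigmoid(self.w_z(torch.cat([out_t, c_kt], dim=-1)))  # gate in (0, 1)
        return z_t * out_t + (1.0 - z_t) * c_kt                          # fused feature q_{k,t}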
A parallel recognition step: the final prediction result is obtained through several fully connected layers from the semantic information produced in the feature fusion step. Specifically, given the unordered nature of industrial text, the time-sequential prediction scheme is replaced, and the final prediction result is obtained directly through several fully connected layers from the semantic information of the feature fusion step.
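A sketch of such a parallel recognition branch is shown below; every character slot is classified independently by shared fully connected layers, so no left-to-right decoding order is imposed (hidden width and class count are illustrative assumptions):
import torch
import torch.nn as nn

class ParallelRecognizer(nn.Module):
    def __init__(self, channels: int = 512, num_classes: int = 37):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(channels, channels),
                                  nn.ReLU(inplace=True),
                                  nn.Linear(channels, num_classes))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (N, T, C) high-level semantic features, one vector per character slot.
        return self.head(fused)   # (N, T, num_classes) logits, decoded slot by slot in parallel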
The method enhances the metal character image through preprocessing to obtain a high-quality preprocessed image. Given the low contrast, strong reflections and corrosion of characters on metal surfaces, the character contour structure of the preprocessed image is first extracted with an unsupervised method, and a segmentation network is then designed to learn the character contour information. Based on the segmentation result, a one-dimensional attention and image alignment method is designed to mitigate the drift of one-dimensional attention on unordered and long text. The generated sequence-aligned attention mechanism and the learned character contour map are combined to extract character topological features and enhance the semantic information of the intermediate network layers. Finally, a parallel character recognition branch is designed for the unordered text on metal part surfaces to improve the recognition accuracy of unordered industrial text.
The method solves the recognition of text information on metal part surfaces in industrial environments, helping metal parts be tracked and recorded on industrial production lines. By studying and analyzing the character marks on metal part surfaces, part model, size, manufacturer and other information can be quickly identified on the machining lines of various machines, preventing errors caused by operator fatigue and improving production efficiency.
The invention covers a 1D attention feature extraction method for unordered text, an unsupervised topological feature extraction method and a character feature fusion method, and the text recognition results are evaluated quantitatively. By designing a character segmentation network, the invention learns the character contour structure obtained by the unsupervised method and helps the recognition network obtain the topological information of the characters. The constructed glyph topology helps the network perceive finer character structure, strengthens the distinction between different characters in the intermediate network layers, and drives the network to learn enhanced topological features. The topological features are further fused into the final sequence to enrich semantic information and improve text recognition performance. When a single character of ordered text is hard to recognize, the recognition result can rely on contextual information; on unordered text, however, methods that rely on context-based sequence decoding achieve low accuracy, so increasing the saliency of individual characters in unordered text helps the recognition network improve accuracy.
The character-driven, topological-feature-enhanced metal part surface text recognition method disclosed here aims to solve the problems of alignment-drift sequence attention and coarse character attention. Specifically, first, in the sequence alignment module, the alignment-drift sequence attention problem is solved by building a constraint function between the learnable attention weights (1D) and a text segmentation map optimized by an unsupervised method on the text image. Second, the proposed character segmentation module generates an ordered multi-channel segmentation result (2D) over the character classes from the 1D sequence-aligned attention and the text segmentation map of the sequence alignment module, without character-level annotations; it contains more detailed glyph topology information to enrich the semantic representation. Finally, a fusion scheme is designed to fuse more topological features into the attended one-dimensional context features, avoiding the error accumulation caused by time-dependent decoding.
The embodiment of the invention also discloses a metal part surface text recognition system based on contour feature enhancement, which, as shown in FIG. 1, comprises the following modules:
a preprocessing module: recognizing the metal surface character image, carrying out image enhancement on the metal surface character image through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image.
The preprocessing module comprises the following modules: an image enhancement module: the method comprises the steps of carrying out balance enhancement on the local contrast of a metal part image based on an RGB image self-adaptive histogram, sharpening the metal part image by adopting a Laplace operator, and reserving high-frequency information and highlight text character details to obtain a preprocessed image.
Unsupervised clustering module: and obtaining a mask result of the characters by an unsupervised clustering method based on the obtained preprocessed image.
A feature encoding module: inputting the preprocessed metal surface character image, extracting multi-scale features of the character to obtain a multi-level feature map, and extracting sequence features from the feature map of the last layer in the multi-level feature map by using an attention mechanism.
The feature encoding module comprises the following modules: a feature extraction module: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence whose height in resolution is a preset value.
A sequence attention extraction module: and designing an attention method, and fusing the feature sequences with the height of a preset value, which are obtained by the feature extraction module, to obtain the feature sequences of the corresponding characters.
A sequence alignment module: and (3) fusing the multi-scale features from bottom to top according to the resolution by using a convolution network, obtaining a segmentation result of the character according to the mask image, and then optimizing the attention information in the sequence features by using the segmentation result.
The sequence alignment module comprises the following modules: a text semantic segmentation module: the multi-level feature map obtained by the feature extraction module is subjected to bottom-up fusion through a convolution network, a saliency map of the character is obtained through a classification layer and a nonlinear activation function, then a mask map obtained by the unsupervised clustering module and the saliency map are subjected to loss building, and a network segmentation result is optimized;
attention sequence correction module: and designing an optimization function to optimize the 1D attention result obtained by the sequence attention extraction module according to the character saliency map obtained by the text semantic segmentation module.
A character segmentation module: and constructing a character outline label by using the optimized attention information and the obtained segmentation result, then using the character outline label to supervise a predicted character result obtained by the multi-scale features sent into the time sequence classification network, and finally obtaining the topological features of the characters.
A feature fusion module: and fusing the sequence characteristics and the topology characteristics after the attention information is optimized, and sending the sequence characteristics and the topology characteristics into the parallel recognition branches to obtain a final prediction result.
By analyzing the characteristics of the characters on metal part surfaces, the invention develops a contour-feature-enhanced metal part surface text recognition method that solves the recognition of unordered character strings in low-contrast metal images; it requires no extra character-level labels, injects the learned topological features of metal-part characters into the recognition branch, enriches the semantic information available for recognizing individual characters in unordered text, and improves recognition accuracy.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A metal part surface text recognition method based on contour feature enhancement, characterized by comprising the following steps:
a preprocessing step: identifying the metal surface character image, performing image enhancement on it through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a feature encoding step: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-layer feature map of the multi-level feature map with an attention mechanism;
a sequence alignment step: fusing the multi-scale features from bottom to top according to resolution with a convolutional network, obtaining a character segmentation result with the help of the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
a character segmentation step: constructing a character contour label from the optimized attention information and the obtained segmentation result, then using the character contour label to supervise the predicted character result produced by feeding the multi-scale features into a time-sequence classification network, and finally obtaining the topological features of the characters;
a feature fusion step: fusing the sequence features whose attention information has been optimized with the topological features, and feeding them into parallel recognition branches to obtain the final prediction result.
2. The contour-feature-enhancement-based metal part surface text recognition method according to claim 1, wherein the preprocessing step comprises the following steps:
an image enhancement step: the local contrast of the metal part image is enhanced by equalization based on an adaptive histogram of the RGB image; at the same time, the metal part image is sharpened with a Laplacian operator to retain high-frequency information and highlight text character details, yielding the preprocessed image;
an unsupervised clustering step: a character mask result is obtained from the preprocessed image with an unsupervised clustering method.
3. The contour-feature-enhancement-based metal part surface text recognition method according to claim 1, wherein the feature encoding step comprises the following steps:
a feature extraction step: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence whose height in resolution is a preset value;
a sequence attention extraction step: an attention method is designed, and the feature sequence with the preset height obtained in the feature extraction step is fused to obtain the feature sequence of the corresponding characters.
4. The method according to claim 3, wherein the sequence alignment step comprises the following steps:
a text semantic segmentation step: the multi-level feature maps obtained in the feature extraction step are fused from bottom to top through a convolutional network, a character saliency map is obtained through a classification layer and a nonlinear activation function, a loss is then established between the mask map obtained in the unsupervised clustering step and the saliency map, and the network segmentation result is optimized;
an attention sequence correction step: according to the character saliency map obtained in the text semantic segmentation step, an optimization function is designed to optimize the attention result obtained in the sequence attention extraction step.
5. The method according to claim 4, wherein the character segmentation step comprises the following steps:
a character semantic segmentation step: a character contour-level label is constructed from the attention result optimized in the attention sequence correction step and the character saliency map obtained in the text semantic segmentation step, a time-sequence classification network is built to predict the character contour information, and a loss against the constructed character contour-level label is established to supervise and optimize the prediction result;
a topological feature generation step: the character segmentation map predicted in the character semantic segmentation step is cross-multiplied with the multi-scale features obtained in the feature extraction step to obtain the topological features.
6. The contour-feature-enhancement-based metal part surface text recognition method according to claim 1, wherein the feature fusion step comprises the following steps:
a feature fusion step: semantic information is extracted from the optimized sequence features and the topological features through a fusion scheme;
a parallel recognition step: according to the semantic information obtained in the feature fusion step, the final prediction result is obtained through several fully connected layers.
7. A metal part surface text recognition system based on contour feature enhancement, characterized by comprising the following modules:
a preprocessing module: identifying the metal surface character image, performing image enhancement on it through preprocessing to obtain a preprocessed image, and then classifying the preprocessed image to obtain a mask image;
a feature encoding module: inputting the preprocessed metal surface character image, extracting multi-scale character features to obtain a multi-level feature map, and extracting sequence features from the last-layer feature map of the multi-level feature map with an attention mechanism;
a sequence alignment module: fusing the multi-scale features from bottom to top according to resolution with a convolutional network, obtaining a character segmentation result with the help of the mask image, and then optimizing the attention information in the sequence features with the segmentation result;
a character segmentation module: constructing a character contour label from the optimized attention information and the obtained segmentation result, then using the character contour label to supervise the predicted character result produced by feeding the multi-scale features into a time-sequence classification network, and finally obtaining the topological features of the characters;
a feature fusion module: fusing the sequence features whose attention information has been optimized with the topological features, and feeding them into parallel recognition branches to obtain the final prediction result.
8. The system according to claim 7, wherein the preprocessing module comprises the following modules:
an image enhancement module: the local contrast of the metal part image is enhanced by equalization based on an adaptive histogram of the RGB image; at the same time, the metal part image is sharpened with a Laplacian operator to retain high-frequency information and highlight text character details, yielding the preprocessed image;
an unsupervised clustering module: a character mask result is obtained from the preprocessed image with an unsupervised clustering method.
9. The system according to claim 8, wherein the feature encoding module comprises the following modules:
a feature extraction module: the metal part character image is fed into a deep convolutional network to obtain a multi-level feature map, and the last-layer feature map is pooled into a feature sequence whose height in resolution is a preset value;
a sequence attention extraction module: an attention method is designed, and the feature sequence with the preset height obtained by the feature extraction module is fused to obtain the feature sequence of the corresponding characters.
10. The contour-feature-enhancement-based metal part surface text recognition system according to claim 9, wherein the sequence alignment module comprises the following modules:
a text semantic segmentation module: the multi-level feature maps obtained by the feature extraction module are fused from bottom to top through a convolutional network, a character saliency map is obtained through a classification layer and a nonlinear activation function, a loss is then established between the mask map obtained by the unsupervised clustering module and the saliency map, and the network segmentation result is optimized;
an attention sequence correction module: according to the character saliency map obtained by the text semantic segmentation module, an optimization function is designed to optimize the attention result obtained by the sequence attention extraction module.
CN202210610169.3A 2022-05-31 2022-05-31 Metal part surface text recognition method and system based on contour feature enhancement Pending CN114973228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210610169.3A CN114973228A (en) 2022-05-31 2022-05-31 Metal part surface text recognition method and system based on contour feature enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210610169.3A CN114973228A (en) 2022-05-31 2022-05-31 Metal part surface text recognition method and system based on contour feature enhancement

Publications (1)

Publication Number Publication Date
CN114973228A true CN114973228A (en) 2022-08-30

Family

ID=82957841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210610169.3A Pending CN114973228A (en) 2022-05-31 2022-05-31 Metal part surface text recognition method and system based on contour feature enhancement

Country Status (1)

Country Link
CN (1) CN114973228A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912845A (en) * 2023-06-16 2023-10-20 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI
CN117095423A (en) * 2023-10-20 2023-11-21 上海银行股份有限公司 Bank bill character recognition method and device
CN117095423B (en) * 2023-10-20 2024-01-05 上海银行股份有限公司 Bank bill character recognition method and device

Similar Documents

Publication Publication Date Title
CN111177366B (en) Automatic generation method, device and system for extraction type document abstract based on query mechanism
CN111309971B (en) Multi-level coding-based text-to-video cross-modal retrieval method
CN107632981B (en) Neural machine translation method introducing source language chunk information coding
CN114973228A (en) Metal part surface text recognition method and system based on contour feature enhancement
CN111581345A (en) Document level event extraction method and device
CN111858843B (en) Text classification method and device
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN114818721B (en) Event joint extraction model and method combined with sequence labeling
CN109933682B (en) Image hash retrieval method and system based on combination of semantics and content information
CN114529903A (en) Text refinement network
CN115620265A (en) Locomotive signboard information intelligent identification method and system based on deep learning
CN115080750A (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN112966676B (en) Document key information extraction method based on zero sample learning
CN116152824A (en) Invoice information extraction method and system
CN116401289A (en) Traceability link automatic recovery method based on multi-source information combination
CN113254575B (en) Machine reading understanding method and system based on multi-step evidence reasoning
CN114255379A (en) Mathematical formula identification method and device based on coding and decoding and readable storage medium
CN114595338A (en) Entity relation joint extraction system and method based on mixed feature representation
CN114297408A (en) Relation triple extraction method based on cascade binary labeling framework
CN114298032A (en) Text punctuation detection method, computer device and storage medium
CN116311275B (en) Text recognition method and system based on seq2seq language model
CN116824271B (en) SMT chip defect detection system and method based on tri-modal vector space alignment
CN113361274B (en) Intent recognition method and device based on label vector, electronic equipment and medium
Xamena et al. End-to-end platform evaluation for Spanish Handwritten Text Recognition
CN116778556A (en) Face attribute identification method and system based on visual language model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination