CN111160352B - Workpiece metal surface character recognition method and system based on image segmentation - Google Patents

Info

Publication number
CN111160352B
CN111160352B (application CN201911373220.8A)
Authority
CN
China
Prior art keywords
character
image
class
segmentation
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911373220.8A
Other languages
Chinese (zh)
Other versions
CN111160352A (en)
Inventor
徐辉 (Xu Hui)
陆强 (Lu Qiang)
袁智超 (Yuan Zhichao)
孙天齐 (Sun Tianqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alnnovation Beijing Technology Co ltd
Original Assignee
Alnnovation Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alnnovation Beijing Technology Co ltd
Priority to CN201911373220.8A
Publication of CN111160352A
Application granted
Publication of CN111160352B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The invention discloses a workpiece metal surface character recognition method and system based on image segmentation, relating to the technical field of character recognition and comprising the following steps: acquiring a text line image of the metal surface of a workpiece; extracting features of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image; processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image; processing according to the example segmentation mask image to obtain the character category of each character area; and processing according to the two-dimensional coordinate positions of the character areas and the character categories to obtain a character recognition result of the text line image. The method reliably acquires, in real time, the position and category of each character on the workpiece metal surface in the picture, obtains the recognition result of the workpiece metal surface character picture, and effectively improves operation efficiency and accuracy.

Description

Workpiece metal surface character recognition method and system based on image segmentation
Technical Field
The invention relates to the technical field of character recognition, in particular to a workpiece metal surface character recognition method and system based on image segmentation.
Background
Character recognition generally refers to the process of performing recognition and analysis on a text image to obtain the useful character information it contains. It can be broadly divided into optical character recognition, character recognition in natural scenes, and character recognition in special scenes. With the rise of deep learning, the accuracy of character recognition has improved greatly compared with earlier techniques, and character recognition is gradually being applied in actual industrial production to improve the level of industrial automation and operation efficiency.
However, most existing character recognition is applied in fields such as identification card recognition, invoice recognition, automobile VIN code recognition and license plate recognition; applications of character recognition in industrial scenes are rarer, especially recognition of characters on the metal surface of a workpiece. Compared with conventional character recognition applications, the difficulties of character recognition on the metal surface of a workpiece are as follows. (1) Difficulties caused by the imaging of characters: the operating environment of an on-site industrial workshop is generally harsher, the manufacturing processes of workpieces in an actual industrial scene are generally more complicated, and imaging of a metal surface suffers from reflections and similar conditions, so the stability and quality of the camera's imaging of on-site workpieces are inferior to those of conventional character recognition scenes. (2) Difficulties caused by character quality: unlike the printed characters of conventional character recognition scenes, characters on the metal surface of a workpiece are mainly produced by laser engraving, welding, striker etching and similar processes; the color of the resulting characters is close to the background color of the workpiece, character quality varies with the actual engraving or welding process, and characters on a workpiece are easily scratched, stained or left incomplete under industrial field conditions. (3) Difficulties caused by text length: the length and format of the characters on the metal surface of a workpiece are not fixed and differ from workpiece to workpiece.
Algorithms in the conventional character recognition application field are mainly CNN + RNN + CTC and CNN + RNN + Attention. If these algorithms are applied directly to workpiece metal surface character recognition in an industrial environment, then, because conventional character recognition scenes differ greatly from industrial scenes and because of the difficulties listed above, the recognition accuracy of prior-art workpiece metal surface character recognition cannot meet the requirements of practical application.
Disclosure of Invention
The invention aims to provide a workpiece metal surface character recognition method and system based on image segmentation.
In order to achieve the purpose, the invention adopts the following technical scheme:
the method for identifying the characters on the metal surface of the workpiece based on image segmentation specifically comprises the following steps:
S1, acquiring a text line image of the metal surface of a workpiece;
S2, extracting the characteristics of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image;
S3, processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
S4, processing according to the example segmentation mask image to obtain the character category of each character area;
and S5, processing according to the two-dimensional coordinate position of the character area and the character type to obtain a character recognition result of the text line image.
As a preferred scheme of the present invention, the method further comprises a process of generating the character recognition model in advance, specifically comprising:
a1, acquiring a plurality of text line images of the metal surface of the workpiece, and performing character-level marking on each text line image to obtain a marked image containing character marking information;
the character marking information comprises real position areas and real type information of all characters in the image;
step A2, inputting each annotated image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the annotated image;
step A3, respectively calculating a first cross entropy loss between the predicted position area of each character in the semantic segmentation mask map and the corresponding real position area, and a second cross entropy loss between the predicted category information of each character in the example segmentation mask map and the corresponding real category information;
step A4, summing the first cross entropy loss and the second cross entropy loss to obtain a total cross entropy loss of the initial recognition model, and comparing the total cross entropy loss with a preset loss threshold:
if the total cross entropy loss is not less than the loss threshold, turning to step A5;
if the total cross entropy loss is less than the loss threshold, turning to step A6;
step A5, adjusting the preset parameters, and then returning to the step A2;
and A6, storing the initial recognition model as a character recognition model.
As a preferred embodiment of the present invention, the network architecture of the character recognition model specifically includes:
the convolutional encoder comprises a common convolutional layer, wherein the common convolutional layer comprises a convolutional layer with a convolutional kernel size of 3x3 and a step length of 2, a BN layer and a RELU layer, and an output channel of the common convolutional layer is 32;
a first residual convolution block connected to the common convolution layer, where the first residual convolution block includes a depth-separable convolution whose convolution kernel size is 3x3 and step size is 1, and an output channel of the first residual convolution block is 16;
a second residual convolution block connected to the first residual convolution block, where the second residual convolution block includes a first depth separable convolution and a second depth separable convolution, a convolution kernel size of the first depth separable convolution is 3x3, a step size is 2, an output channel is 24, a convolution kernel size of the second depth separable convolution is 3x3, a step size is 1, and an output channel is 24;
a third residual convolution block connected to the second residual convolution block, where the third residual convolution block includes a third depth separable convolution, a fourth depth separable convolution and a fifth depth separable convolution, a convolution kernel size of the third depth separable convolution is 3x3, a step size is 2, an output channel is 32, a convolution kernel size of the fourth depth separable convolution is 3x3, a step size is 1, an output channel is 32, a convolution kernel size of the fifth depth separable convolution is 3x3, a step size is 1, and an output channel is 32;
a fourth residual convolution block connected to the third residual convolution block, where the fourth residual convolution block includes four sixth depth separable convolutions, and a convolution kernel size of each of the sixth depth separable convolutions is 3x3, a step size is 1, and an output channel is 64;
a fifth residual convolution block connected to the fourth residual convolution block, where the fifth residual convolution block includes three seventh depth separable convolutions, a convolution kernel size of each of the seventh depth separable convolutions is 3x3, a step size is 1, and an output channel is 96;
a sixth residual convolution block connected to the fifth residual convolution block, where the sixth residual convolution block includes three eighth depth separable convolutions, and a convolution kernel size of each of the eighth depth separable convolutions is 3x3, a step size is 1, and an output channel is 160;
a seventh residual convolution block connected to the sixth residual convolution block, where the seventh residual convolution block includes a ninth depth separable convolution, a convolution kernel of the ninth depth separable convolution is 3x3, a step size is 1, and an output channel is 320;
and an eighth residual convolution block connected to the seventh residual convolution block, where the output layer of the eighth residual convolution block is upsampled by two different convolutions to respectively generate a semantic segmentation mask map and an example segmentation mask map.
As a preferable embodiment of the present invention, the step S3 specifically includes:
step S31, obtaining semantic class numbers corresponding to all pixel points in the semantic segmentation mask map;
step S32, matching the semantic class number with a semantic class serial number in a preset semantic segmentation class dictionary to obtain a semantic segmentation class of each pixel point;
the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and semantic segmentation classes corresponding to the semantic class serial numbers;
and S33, extracting each pixel point corresponding to the semantic segmentation class representing the character information to obtain a plurality of character areas.
As a preferable embodiment of the present invention, the step S4 specifically includes:
step S41, for each character area, mapping the pixel position of each pixel in the character area into the example segmentation mask image, and extracting the example type number of each pixel position;
step S42, matching the instance class number with an instance class serial number in a preset instance segmentation class dictionary to obtain an instance segmentation class of each pixel position;
the instance segmentation class dictionary stores a plurality of instance class serial numbers and instance segmentation classes corresponding to the instance class serial numbers;
step S43, counting the number of pixels corresponding to each instance segmentation class in the character area, and sorting the pixel numbers from large to small to obtain a pixel number sequence;
and step S44, taking the example segmentation class corresponding to the first-ranked pixel number in the pixel number sequence as the character class of the character area.
As a preferable embodiment of the present invention, the step S5 specifically includes:
step S51, calculating, for each character area, the center position coordinates of the character area;
step S52, sorting the abscissas of the center position coordinates of the character areas in ascending order, according to the left-to-right character arrangement prior of the characters in the text line image, to obtain an abscissa sequence;
and step S53, arranging the character types of the corresponding character areas according to the abscissa sequence to form a corresponding character sequence, and generating a character recognition result of the text line image according to the character sequence.
A workpiece metal surface character recognition system based on image segmentation, applying the above workpiece metal surface character recognition method based on image segmentation, specifically comprises:
the data acquisition module is used for acquiring a text line image of the metal surface of the workpiece;
the feature extraction module is connected with the data acquisition module and used for extracting features of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image;
the first identification module is connected with the feature extraction module and used for processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
the second identification module is respectively connected with the feature extraction module and the first identification module and is used for obtaining the character category of each character area according to the example segmentation mask image processing;
and the third identification module is respectively connected with the first identification module and the second identification module and used for processing according to the two-dimensional coordinate position of the character area and the character category to obtain a character identification result of the text line image.
As a preferred embodiment of the present invention, the present invention further includes a model generation module, connected to the feature extraction module, where the model generation module specifically includes:
the character marking unit is used for acquiring a plurality of text line images on the metal surface of the workpiece and marking the text line images at a character level to obtain marked images containing character marking information;
the character marking information comprises real position areas and real type information of all characters in the image;
the feature extraction unit is connected with the character marking unit and is used for inputting each marked image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the marked image;
the first processing unit is connected with the feature extraction unit and used for respectively calculating a first cross entropy loss between a predicted position area of each character in the semantic segmentation mask map and the corresponding real position area and a second cross entropy loss between predicted category information of each character in the example segmentation mask map and the corresponding real category information;
a second processing unit connected to the first processing unit, the second processing unit including:
a processing subunit, configured to sum the first cross entropy loss and the second cross entropy loss to obtain a total cross entropy loss of the initial recognition model;
the comparison subunit is connected with the processing subunit and is used for comparing the total cross entropy loss with a preset loss threshold, outputting a first comparison result when the total cross entropy is not less than the loss threshold, and outputting a second comparison result when the total cross entropy is less than the loss threshold;
the parameter adjusting unit is connected with the second processing unit and used for adjusting the preset parameters according to the first comparison result;
and the model storage unit is connected with the second processing unit and used for storing the initial recognition model as a character recognition model according to the second comparison result.
As a preferred aspect of the present invention, the first identification module specifically includes:
a first obtaining unit, configured to obtain a semantic class number corresponding to each pixel point in the semantic segmentation mask map;
the first matching unit is connected with the first acquisition unit and used for matching the semantic class number with a semantic class serial number in a preset semantic segmentation class dictionary to obtain a semantic segmentation class of each pixel point;
the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and semantic segmentation classes corresponding to the semantic class serial numbers;
and the first extraction unit is connected with the first acquisition unit and used for extracting each pixel point corresponding to the semantic segmentation class representing the text information to obtain a plurality of text areas.
As a preferable scheme of the present invention, the second identification module specifically includes:
a second obtaining unit, configured to map, for each text region, a pixel position of each pixel in the text region into the example segmentation mask map, and extract an example class number of each pixel position;
the second matching unit is connected with the second acquisition unit and used for matching the instance class number with an instance class serial number in a preset instance segmentation class dictionary to obtain an instance segmentation class of each pixel position;
the instance segmentation class dictionary stores a plurality of instance class serial numbers and instance segmentation classes corresponding to the instance class serial numbers;
the data counting unit is connected with the second matching unit and used for counting the number of pixels corresponding to each instance segmentation class in the text area and sequencing the number of pixels from large to small to obtain a pixel number sequence;
and the second extraction unit is connected with the data statistics unit and is used for taking the example segmentation class corresponding to the first-ranked pixel number in the pixel number sequence as the character class of the character area.
As a preferable scheme of the present invention, the third identification module specifically includes:
a coordinate calculation unit configured to calculate, for each of the character areas, a center position coordinate of each of the character areas;
the coordinate sorting unit is connected with the coordinate calculating unit and is used for sorting the abscissas of the center position coordinates of the character areas in ascending order, according to the left-to-right character arrangement prior of the characters in the text line image, to obtain an abscissa sequence;
and the result generating unit is connected with the coordinate sorting unit and used for arranging the character categories of the corresponding character areas according to the abscissa sequence to form a corresponding character sequence and generating a character recognition result of the text line image according to the character sequence.
The invention has the beneficial effects that: the position and the category of each character on the metal surface of the workpiece in the picture are reliably acquired in real time, the identification result of the character picture on the metal surface of the workpiece is acquired, and the operation efficiency and the accuracy are effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flow chart of a workpiece metal surface character recognition method based on image segmentation according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a process of generating a character recognition model in advance according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a network architecture of a character recognition model according to an embodiment of the invention.
Fig. 4 is a flowchart illustrating a method for recognizing a text area according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating a method for recognizing a text type according to an embodiment of the present invention.
Fig. 6 is a flowchart illustrating a method for generating a text recognition result according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a workpiece metal surface character recognition system based on image segmentation according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of a text line image of a curved metal surface of a workpiece and corresponding text recognition results according to another embodiment of the invention.
Fig. 9 is a schematic diagram of a text line image of a curved metal surface of a workpiece and corresponding text recognition results according to another embodiment of the invention.
Fig. 10 is a schematic diagram of a text line image of a curved metal surface of a workpiece and corresponding text recognition results according to another embodiment of the invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
The drawings are for illustrative purposes only, show schematic rather than actual forms, and are not to be construed as limiting this patent; for a better explanation of the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product; and it will be understood by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience and simplification of description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; therefore, terms describing positional relationships in the drawings are used for illustrative purposes only, are not to be construed as limiting this patent, and their specific meanings can be understood by those skilled in the art according to the specific situation.
In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" and the like, where it indicates a connection relationship between components, is to be understood broadly: for example, as a fixed, detachable or integral connection; as a mechanical or electrical connection; as a direct connection or an indirect connection through intervening media; or as a connection through one or more other components or an interactive relationship between two components. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in specific cases.
Based on the technical problems in the prior art, a workpiece metal surface character recognition method based on image segmentation is provided, as shown in fig. 1, and specifically includes the following steps:
S1, acquiring a text line image of the metal surface of a workpiece;
S2, extracting the characteristics of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image;
S3, processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
S4, processing according to the example segmentation mask image to obtain the character category of each character area;
and S5, processing according to the two-dimensional coordinate position and the character type of the character area to obtain a character recognition result of the text line image.
Specifically, in the embodiment, the character recognition model based on image segmentation is trained according to the existing data set; inputting the obtained text line images on the metal surface of the workpiece into a trained character recognition model based on image segmentation, and obtaining a semantic segmentation mask image and an example segmentation mask image; determining whether each pixel point in the original input image is a character area or not according to the semantic segmentation mask image output by the model; determining the category of the text area according to the example segmentation mask graph output by the model; and determining a final character recognition result according to the two-dimensional coordinate position relation of the character area.
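A minimal sketch of this inference flow is given below, assuming a PyTorch implementation; the model object, tensor layout and normalization are illustrative assumptions, not the patented implementation itself:

```python
import numpy as np
import torch

def predict_masks(image: np.ndarray, model: torch.nn.Module):
    """Run the trained segmentation model on one text line image
    (H x W x 3, uint8) and return per-pixel class-index maps for the
    semantic branch and the instance branch."""
    x = torch.from_numpy(image).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        semantic_logits, instance_logits = model(x)  # (1,C1,H1,W1), (1,C2,H2,W2)
    # argmax over the channel axis turns each logits map into a class-index map
    semantic_mask = semantic_logits.argmax(dim=1).squeeze(0).cpu().numpy()
    instance_mask = instance_logits.argmax(dim=1).squeeze(0).cpu().numpy()
    return semantic_mask, instance_mask
```

The two returned maps are then consumed by the region-extraction, classification and ordering steps described below.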
More preferably, the character recognition model based on image segmentation includes: the input image size of the character recognition model is preferably 88 × 352; the convolutional network of the character recognition model is a variant network based on MobilenetV2; the output of the character recognition model has two branches, example segmentation and semantic segmentation. The semantic segmentation branch outputs a semantic segmentation mask map of size C1 × H1 × W1, where C1 is the number of semantic segmentation categories, C1 ≥ 1, H1 is the height of the semantic segmentation mask map, H1 ≥ 1, W1 is the width of the semantic segmentation mask map, and W1 ≥ 1. The example segmentation branch outputs an example segmentation mask map of size C2 × H2 × W2, where C2 is the number of example segmentation categories, C2 ≥ 1, H2 is the height of the example segmentation mask map, H2 ≥ 1, W2 is the width of the example segmentation mask map, and W2 ≥ 1.
Further preferably, the training of the character recognition model based on image segmentation according to the existing data set includes: marking the existing data set at the character level, the marking information including the position and category of each character in the text line image; analyzing the aspect ratio of the pictures in the existing data set to determine the input picture size; building labels for the data set according to the width and height of the two output branch mask maps of the image segmentation character recognition model and the existing character marking information; performing data enhancement on the original data set while correspondingly modifying the labels of the enhanced data; constructing the character recognition model based on image segmentation; respectively calculating the cross entropy losses of the two output branches, the example segmentation mask map and the semantic segmentation mask map, against their corresponding labels, the loss function being the sum of the cross entropy losses of the two branches; and training according to the sum of the cross entropy losses to obtain the character recognition model.
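As one possible reading of the label-building step, the character-level annotations can be rasterized into the two branch targets; the axis-aligned boxes and the 0/1 semantic coding below are assumptions for illustration only:

```python
import numpy as np

def build_label_masks(char_boxes, char_classes, h, w):
    """Rasterize character-level annotations into the two training targets.

    char_boxes   : list of (x0, y0, x1, y1) boxes in mask coordinates
    char_classes : list of class indices (1..N) per box; 0 is background
    Returns a semantic mask (0 = background, 1 = character) and an instance
    mask holding the character class index at every character pixel.
    """
    semantic = np.zeros((h, w), dtype=np.int64)
    instance = np.zeros((h, w), dtype=np.int64)
    for (x0, y0, x1, y1), cls in zip(char_boxes, char_classes):
        semantic[y0:y1, x0:x1] = 1
        instance[y0:y1, x0:x1] = cls
    return semantic, instance
```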
Further preferably, the inputting the obtained text line image on the metal surface of the workpiece into a trained character recognition model based on image segmentation and obtaining the semantic segmentation mask map and the example segmentation mask map includes: acquiring a text line image on the metal surface of a workpiece; the character recognition model extracts the characteristics of the input text line image and outputs two branches of an example segmentation mask image and a semantic segmentation mask image.
Further preferably, the determining whether each pixel point in the original input text line image is a text region according to the semantic segmentation mask image output by the text recognition model includes: and corresponding the value of each pixel point position in the semantic segmentation mask map with the category serial number in the semantic segmentation category dictionary to obtain the semantic segmentation category at the position.
Further preferably, the determining the type of the text region according to the example segmentation mask map output by the text model includes: mapping each pixel position of the character region to an example segmentation mask map according to the character region determined by the semantic segmentation map, and then corresponding the value of the mapped pixel position in the example segmentation mask map to the category serial number in the example segmentation category dictionary, thereby obtaining the example segmentation category at the corresponding pixel position; and counting the number of pixel points of each type in the current character area according to the type of each pixel point in the current character area, taking the type with the largest pixel point ratio as the type of the current character area, and obtaining the types of all the character areas by analogy.
Further preferably, determining the final character recognition result according to the two-dimensional coordinate position relationship of the character regions includes: calculating the center coordinate position of each character region; sorting the abscissas of the center coordinates of the character regions in ascending order according to the left-to-right character arrangement prior of the text line image; and arranging the categories of the corresponding character regions in that order into a character sequence as the final character recognition result.
As a preferred embodiment of the present invention, the method further includes a process of generating a character recognition model in advance, as shown in fig. 2, specifically including:
a1, acquiring a plurality of text line images on the metal surface of a workpiece, and performing character-level marking on each text line image to obtain a marked image containing character marking information;
the character marking information comprises real position areas and real type information of all characters in the image;
step A2, inputting each annotated image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the annotated image;
step A3, respectively calculating a first cross entropy loss between a predicted position area and a corresponding real position area of each character in the semantic segmentation mask map, and a second cross entropy loss between predicted category information and corresponding real category information of each character in the example segmentation mask map;
step A4, summing the first cross entropy loss and the second cross entropy loss to obtain a total cross entropy loss of the initial recognition model, and comparing the total cross entropy loss with a preset loss threshold value:
if the total cross entropy loss is not less than the loss threshold, turning to step A5;
if the total cross entropy loss is less than the loss threshold, turning to step A6;
step A5, adjusting the preset parameters, and then returning to the step A2;
and step A6, storing the initial recognition model as a character recognition model.
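A compact sketch of steps A2 to A6 follows. Interpreting "adjusting the preset parameters" as gradient descent, and the choice of optimizer, learning rate and loss threshold value, are assumptions on top of the text:

```python
import torch
import torch.nn.functional as F

def train_character_recognition_model(model, loader, loss_threshold=0.05):
    """Steps A2-A6: train until the total cross entropy loss (sum of the
    semantic-branch and instance-branch losses, step A4) drops below the
    preset threshold. `loader` yields (image, semantic_target, instance_target)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    while True:
        total = 0.0
        for image, sem_target, inst_target in loader:
            sem_logits, inst_logits = model(image)                   # step A2
            first_loss = F.cross_entropy(sem_logits, sem_target)     # step A3
            second_loss = F.cross_entropy(inst_logits, inst_target)  # step A3
            loss = first_loss + second_loss                          # step A4
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                         # step A5
            total += loss.item()
        if total / len(loader) < loss_threshold:                     # step A4 comparison
            return model                                             # step A6
```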
As a preferred scheme of the present invention, as shown in fig. 3, a network architecture of the character recognition model specifically includes:
a normal convolutional layer 70, the normal convolutional layer 70 including a convolutional layer having a convolutional kernel size of 3x3 and a step size of 2, a BN layer, and a RELU layer, and an output channel of the normal convolutional layer 70 being 32;
a first residual convolution block 71 connected to the normal convolution layer 70, the first residual convolution block 71 including a depth separable convolution having a convolution kernel size of 3x3 and a step size of 1, and an output channel of the first residual convolution block 71 being 16;
a second residual convolution block 72 connected to the first residual convolution block 71, the second residual convolution block 72 including a first depth separable convolution and a second depth separable convolution, and the convolution kernel size of the first depth separable convolution is 3x3, the step size is 2, the output channel is 24, the convolution kernel size of the second depth separable convolution is 3x3, the step size is 1, and the output channel is 24;
a third residual convolution block 73 connected to the second residual convolution block 72, where the third residual convolution block 73 includes a third depth separable convolution, a fourth depth separable convolution and a fifth depth separable convolution, and the convolution kernel size of the third depth separable convolution is 3x3, the step size is 2, the output channel is 32, the convolution kernel size of the fourth depth separable convolution is 3x3, the step size is 1, the output channel is 32, the convolution kernel size of the fifth depth separable convolution is 3x3, the step size is 1, and the output channel is 32;
a fourth residual convolution block 74 connected to the third residual convolution block 73, where the fourth residual convolution block 74 includes four sixth-depth separable convolutions, and a convolution kernel size of each sixth-depth separable convolution is 3x3, a step size is 1, and an output channel is 64;
a fifth residual convolution block 75 connected to the fourth residual convolution block 74, where the fifth residual convolution block 75 includes three seventh depth separable convolutions, and the convolution kernel size of each seventh depth separable convolution is 3x3, the step size is 1, and the output channel is 96;
a sixth residual convolution block 76 connected to the fifth residual convolution block 75, where the sixth residual convolution block 76 includes three eighth depth separable convolutions, and the convolution kernel size of each eighth depth separable convolution is 3x3, the step size is 1, and the output channel is 160;
a seventh residual convolution block 77 connected to the sixth residual convolution block 76, where the seventh residual convolution block 77 includes a ninth depth separable convolution with a convolution kernel size of 3x3, a step size of 1, and an output channel of 320;
the eighth residual convolution block 78 is connected to the seventh residual convolution block 77, and the output layer of the eighth residual convolution block 78 is upsampled by two different convolutions to generate a semantic segmentation mask map 791 and an example segmentation mask map 792, respectively.
Specifically, in the present embodiment, the input of the normal convolutional layer 70 is a text line image, and the input image size is preferably 88 × 352. This layer consists of a convolutional layer with a convolution kernel size of 3x3 and a step size of 2, a BN layer and a RELU layer, and its output channel is 32.
The first to eighth residual convolution blocks 71 to 78 are each composed of one or more depth separable convolutions and adopt the shortcut structure of the ResNet network; they differ in the number of depth separable convolutions they contain and in the convolution parameters, specifically:
first residual volume block 71: consists of 1 depth separable convolution, where the convolution kernel size of the depth separable convolution is 3*3, the step size is 1, and the output channel of the layer is 16.
Second residual volume block 72: consists of 2 depth-separable convolutions, wherein the convolution kernel size of the 1 st depth-separable convolution is 3*3, step size is 2, output channel is 24, the size of the 2 nd depth-separable convolution is 3*3, step size is 1, and output channel is 24.
Third residual volume block 73: the convolution device is composed of 3 depth separable convolutions, wherein the convolution kernel size of the 1 st depth separable convolution is 3*3, the step size is 2, the output channel is 32, the 2 nd and 3 rd depth separable convolution parameters are consistent, the convolution kernel size is 3*3, the step size is 1, and the output channel is 32.
Fourth residual volume block 74: the convolution is composed of 4 depth separable convolutions, wherein the 4 depth separable convolution parameters are consistent, the convolution kernel size is 3*3, the step size is 1, and the output channel is 64.
Fifth residual volume block 75: the convolution kernel is formed by 3 depth separable convolutions, wherein the 3 depth separable convolution parameters are consistent, the convolution kernel size is 3*3, the step size is 1, and the output channel is 96.
Sixth residual volume block 76: the convolution kernel is formed by 3 depth separable convolutions, wherein the 3 depth separable convolution parameters are consistent, the convolution kernel size is 3*3, the step size is 1, and the output channel is 160.
Seventh residual volume block 77: consists of 1 depth separable convolution, where the convolution kernel size of the depth separable convolution is 3*3, the step size is 1, and the output channel is 320.
The output layer of the eighth residual convolution block 78 is upsampled by two different convolutions to generate two branches, a semantic segmentation mask map 791 and an example segmentation mask map 792; the number of channels of the convolution kernel used to generate the semantic segmentation mask map 791 is determined by the number of text region categories, and the number of channels of the convolution kernel used to generate the example segmentation mask map 792 is determined by the total number of character categories to be recognized.
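The architecture just described can be sketched roughly as follows in PyTorch. The residual shortcuts and the inverted-bottleneck expansion of MobileNetV2 are omitted for brevity, and the upsampling mode and the two head channel counts (2 semantic classes; 37 instance classes for digits, letters and background) are illustrative assumptions:

```python
import torch
import torch.nn as nn

def dsconv(cin, cout, stride):
    """Depthwise-separable 3x3 convolution: depthwise conv + pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class CharSegNet(nn.Module):
    """Sketch of the described MobileNetV2-style encoder with two
    segmentation heads; shortcut connections omitted for brevity."""
    def __init__(self, n_semantic=2, n_instance=37):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1, bias=False),                     # common conv layer 70
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            dsconv(32, 16, 1),                                         # block 71
            dsconv(16, 24, 2), dsconv(24, 24, 1),                      # block 72
            dsconv(24, 32, 2), dsconv(32, 32, 1), dsconv(32, 32, 1),   # block 73
            *[dsconv(32 if i == 0 else 64, 64, 1) for i in range(4)],  # block 74
            *[dsconv(64 if i == 0 else 96, 96, 1) for i in range(3)],  # block 75
            *[dsconv(96 if i == 0 else 160, 160, 1) for i in range(3)],# block 76
            dsconv(160, 320, 1),                                       # block 77
        )
        # Two different convolutions plus upsampling produce the two branches
        # (total encoder stride is 8, so upsample by 8 to the input size).
        self.semantic_head = nn.Sequential(
            nn.Conv2d(320, n_semantic, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False))
        self.instance_head = nn.Sequential(
            nn.Conv2d(320, n_instance, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False))

    def forward(self, x):
        f = self.encoder(x)
        return self.semantic_head(f), self.instance_head(f)
```

With the preferred 88 × 352 input, the encoder produces an 11 × 44 feature map that both heads upsample back to 88 × 352.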
As a preferred embodiment of the present invention, as shown in fig. 4, step S3 specifically includes:
step S31, obtaining semantic class numbers corresponding to all pixel points in the semantic segmentation mask map;
step S32, matching the semantic class number with a semantic class serial number in a preset semantic segmentation class dictionary to obtain a semantic segmentation class of each pixel point;
the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and semantic segmentation classes corresponding to the semantic class serial numbers;
and S33, extracting all pixel points corresponding to the semantic segmentation categories representing the character information to obtain a plurality of character areas.
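A sketch of steps S31 to S33 follows, assuming a hypothetical class dictionary in which class 1 denotes character pixels; the patent does not name the grouping method, so 8-connected component labelling is used here as one natural choice:

```python
import numpy as np
from scipy import ndimage

# Hypothetical semantic segmentation class dictionary (serial number -> class).
SEMANTIC_CLASSES = {0: "background", 1: "character"}

def extract_char_regions(semantic_mask: np.ndarray) -> list:
    """Steps S31-S33: look up the semantic class of every pixel, keep the
    pixels whose class represents character information, and group them
    into per-character regions via connected components."""
    char_pixels = semantic_mask == 1  # pixels of the "character" class
    labels, n = ndimage.label(char_pixels, structure=np.ones((3, 3)))
    # Each region is an array of (row, col) pixel coordinates.
    return [np.argwhere(labels == i) for i in range(1, n + 1)]
```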
As a preferred embodiment of the present invention, as shown in fig. 5, step S4 specifically includes:
step S41, for each character area, mapping the pixel position of each pixel in the character area into the example segmentation mask map, and extracting the example class number of each pixel position;
step S42, matching the instance class number with the instance class sequence number in a preset instance segmentation class dictionary to obtain an instance segmentation class of each pixel position;
the instance segmentation class dictionary stores a plurality of instance class serial numbers and instance segmentation classes corresponding to the instance class serial numbers;
step S43, counting the number of pixels corresponding to each instance segmentation class in the character area, and sorting the pixel numbers from large to small to obtain a pixel number sequence;
step S44, taking the example segmentation class corresponding to the first-ranked pixel number in the pixel number sequence as the character class of the character area.
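Steps S41 to S44 reduce to a per-region majority vote over the instance mask; the instance class dictionary below is a hypothetical placeholder:

```python
import numpy as np

# Hypothetical instance segmentation class dictionary (serial number -> character).
INSTANCE_CLASSES = {0: "", 1: "0", 2: "1", 3: "2"}  # ... one entry per character class

def classify_region(region: np.ndarray, instance_mask: np.ndarray) -> str:
    """Steps S41-S44: map every pixel of the character region into the
    instance segmentation mask, count the pixels per instance class, and
    return the class with the largest pixel count."""
    values = instance_mask[region[:, 0], region[:, 1]]  # steps S41/S42
    counts = np.bincount(values)                        # step S43
    return INSTANCE_CLASSES[int(counts.argmax())]       # step S44
```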
As a preferred embodiment of the present invention, as shown in fig. 6, step S5 specifically includes:
step S51, calculating, for each character area, the center position coordinates of the character area;
step S52, sorting the abscissas of the center position coordinates of the character areas in ascending order, according to the left-to-right character arrangement prior of the characters in the text line image, to obtain an abscissa sequence;
and step S53, arranging the character types of the corresponding character areas according to the abscissa sequence to form a corresponding character sequence, and generating a character recognition result of the text line image according to the character sequence.
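Steps S51 to S53 then order the classified regions by the abscissa of their centers; a short sketch:

```python
import numpy as np

def order_left_to_right(regions: list, chars: list) -> str:
    """Steps S51-S53: compute each region's center abscissa, sort in
    ascending order (left-to-right reading prior), and concatenate the
    character classes into the recognition result."""
    centers_x = [region[:, 1].mean() for region in regions]  # step S51
    order = np.argsort(centers_x)                            # step S52
    return "".join(chars[i] for i in order)                  # step S53
```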
A workpiece metal surface character recognition system based on image segmentation, which applies any one of the above workpiece metal surface character recognition methods based on image segmentation, as shown in fig. 7, specifically includes:
the data acquisition module 1 is used for acquiring a text line image of the metal surface of a workpiece;
the feature extraction module 2 is connected with the data acquisition module 1 and is used for extracting features of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image;
the first recognition module 3 is connected with the feature extraction module 2 and used for processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
the second recognition module 4 is respectively connected with the feature extraction module 2 and the first recognition module 3 and is used for segmenting the mask image according to the example to obtain the character category of each character area;
and the third recognition module 5 is respectively connected with the first recognition module 3 and the second recognition module 4 and is used for processing according to the two-dimensional coordinate position and the character category of the character area to obtain a character recognition result of the text line image.
As a preferred scheme of the present invention, the present invention further includes a model generation module 6 connected to the feature extraction module 2, where the model generation module 6 specifically includes:
the character marking unit 61 is used for acquiring a plurality of text line images on the metal surface of the workpiece, and marking each text line image at a character level to obtain a marked image containing character marking information;
the character marking information comprises real position areas and real type information of all characters in the image;
a feature extraction unit 62 connected to the character labeling unit 61 and configured to input each labeled image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask map and an example segmentation mask map corresponding to the labeled image;
a first processing unit 63 connected to the feature extracting unit 62, configured to calculate a first cross entropy loss between the predicted position region and the corresponding real position region of each character in the semantic segmentation mask map, and a second cross entropy loss between the predicted category information and the corresponding real category information of each character in the example segmentation mask map, respectively;
a second processing unit 64 connected to the first processing unit 63, the second processing unit 64 including:
a processing subunit 641, configured to sum the first cross entropy loss and the second cross entropy loss to obtain a total cross entropy loss of the initial recognition model;
a comparing subunit 642, connected to the processing subunit 641, configured to compare the total cross entropy loss with a preset loss threshold, and output a first comparison result when the total cross entropy is not smaller than the loss threshold, and output a second comparison result when the total cross entropy is smaller than the loss threshold;
a parameter adjusting unit 65, connected to the second processing unit 64, for adjusting the preset parameter according to the first comparison result;
and the model storage unit 66 is connected to the second processing unit 64 and is used for storing the initial recognition model as a character recognition model according to the second comparison result.
As a preferred embodiment of the present invention, the first identification module 3 specifically includes:
a first obtaining unit 31, configured to obtain a semantic class number corresponding to each pixel point in the semantic segmentation mask map;
the first matching unit 32 is connected to the first obtaining unit 31, and is configured to match the semantic class number with a semantic class number in a preset semantic segmentation class dictionary to obtain a semantic segmentation class of each pixel point;
the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and semantic segmentation classes corresponding to the semantic class serial numbers;
the first extraction unit 33 is connected with the first obtaining unit 31 and is configured to extract each pixel point corresponding to the semantic segmentation class representing the text information, so as to obtain a plurality of text regions.
As a preferred embodiment of the present invention, the second identification module 4 specifically includes:
a second obtaining unit 41, configured to map, for each text region, a pixel position of each pixel in the text region into the example segmentation mask map, and extract an example class number of each pixel position;
the second matching unit 42 is connected to the second obtaining unit 41, and is configured to match the instance class number with an instance class sequence number in a preset instance segmentation class dictionary to obtain an instance segmentation class of each pixel position;
the instance segmentation class dictionary stores a plurality of instance class serial numbers and instance segmentation classes corresponding to the instance class serial numbers;
the data counting unit 43 is connected with the second matching unit 42 and is used for counting the number of pixels corresponding to each instance segmentation class in the text area and sequencing the number of pixels from large to small to obtain a pixel number sequence;
the second extracting unit 44 is connected to the data counting unit 43 and is configured to take the instance segmentation class corresponding to the first-ranked pixel number in the pixel number sequence as the character class of the character region.
As a preferred embodiment of the present invention, the third identification module 5 specifically includes:
a coordinate calculation unit 51 for calculating a center position coordinate of each character region for each character region;
the coordinate sorting unit 52 is connected with the coordinate calculating unit 51 and is used for sorting the abscissas of the center position coordinates of the character areas in ascending order, according to the left-to-right character arrangement prior of the characters in the text line image, to obtain an abscissa sequence;
and the result generating unit 53 is connected to the coordinate sorting unit 52, and is configured to arrange the character categories of the corresponding character areas according to the abscissa sequence to form a corresponding character sequence, and generate a character recognition result of the text line image according to the character sequence.
In another preferred embodiment of the present invention, the method can be practically applied in an on-site industrial production environment and has a high recognition rate for characters on the metal surface of a workpiece. As shown in fig. 8 to 10, characters can be recognized even when the characters on the metal surface of the workpiece are scratched, blurred, too dark, over-exposed or too shallowly engraved, and curved text lines can also be recognized. In addition, the provided recognition method not only recognizes the content of the characters but also accurately locates the position of each character in the picture.
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (9)

1. A workpiece metal surface character recognition method based on image segmentation is characterized by comprising the following steps:
S1, acquiring a text line image of the metal surface of a workpiece;
S2, extracting the characteristics of the text line image according to a pre-generated character recognition model to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the text line image;
S3, processing according to the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
S4, processing according to the example segmentation mask image to obtain the character category of each character area;
S5, processing according to the two-dimensional coordinate position of the character area and the character type to obtain a character recognition result of the text line image;
the workpiece metal surface character recognition method based on image segmentation further comprises a process of generating the character recognition model in advance, and the method specifically comprises the following steps:
a1, acquiring a plurality of text line images of the metal surface of the workpiece, and performing character-level marking on each text line image to obtain a marked image containing character marking information;
the character marking information comprises real position areas and real type information of all characters in the image;
step A2, inputting each annotated image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask image and an example segmentation mask image corresponding to the annotated image;
step A3, respectively calculating a first cross entropy loss between a predicted position area of each character in the semantic segmentation mask map and the corresponding real position area, and a second cross entropy loss between predicted category information of each character in the example segmentation mask map and the corresponding real category information;
step A4, summing the first cross entropy loss and the second cross entropy loss to obtain a total cross entropy loss of the initial recognition model, and comparing the total cross entropy loss with a preset loss threshold:
if the total cross entropy loss is not less than the loss threshold, turning to step A5;
if the total cross entropy loss is less than the loss threshold, turning to step A6;
step A5, adjusting the preset parameters, and then returning to the step A2;
and A6, storing the initial recognition model as a character recognition model.
2. The method for recognizing characters on the metal surface of a workpiece based on image segmentation as claimed in claim 1, wherein the network architecture of the character recognition model specifically comprises:
a convolutional encoder comprising a common convolution layer, wherein the common convolution layer comprises a convolution layer with a kernel size of 3x3 and a stride of 2, a BN layer, and a ReLU layer, and the common convolution layer has 32 output channels;
a first residual convolution block connected to the common convolution layer, wherein the first residual convolution block comprises a depthwise separable convolution with a kernel size of 3x3 and a stride of 1, and the first residual convolution block has 16 output channels;
a second residual convolution block connected to the first residual convolution block, wherein the second residual convolution block comprises a first depthwise separable convolution with a kernel size of 3x3, a stride of 2, and 24 output channels, and a second depthwise separable convolution with a kernel size of 3x3, a stride of 1, and 24 output channels;
a third residual convolution block connected to the second residual convolution block, wherein the third residual convolution block comprises a third depthwise separable convolution with a kernel size of 3x3, a stride of 2, and 32 output channels, a fourth depthwise separable convolution with a kernel size of 3x3, a stride of 1, and 32 output channels, and a fifth depthwise separable convolution with a kernel size of 3x3, a stride of 1, and 32 output channels;
a fourth residual convolution block connected to the third residual convolution block, wherein the fourth residual convolution block comprises four sixth depthwise separable convolutions, each with a kernel size of 3x3, a stride of 1, and 64 output channels;
a fifth residual convolution block connected to the fourth residual convolution block, wherein the fifth residual convolution block comprises three seventh depthwise separable convolutions, each with a kernel size of 3x3, a stride of 1, and 96 output channels;
a sixth residual convolution block connected to the fifth residual convolution block, wherein the sixth residual convolution block comprises three eighth depthwise separable convolutions, each with a kernel size of 3x3, a stride of 1, and 160 output channels;
a seventh residual convolution block connected to the sixth residual convolution block, wherein the seventh residual convolution block comprises a ninth depthwise separable convolution with a kernel size of 3x3, a stride of 1, and 320 output channels;
and an eighth residual convolution block connected to the seventh residual convolution block, wherein the output layer of the eighth residual convolution block is upsampled by two different convolutions to generate the semantic segmentation mask image and the instance segmentation mask image, respectively.
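The block configuration of claim 2 (a 32-channel stem followed by depthwise separable stacks of 16, 24, 32, 64, 96, 160, and 320 channels) closely resembles a MobileNetV2-style encoder. A compact sketch follows, assuming PyTorch; the residual skip connections and expansion layers inside each block are simplified away, and the class counts (2 semantic classes; 37 instance classes for 36 characters plus background) are assumptions, not values stated in the claim.

import torch.nn as nn

def depthwise_separable(c_in, c_out, stride):
    # A 3x3 depthwise convolution followed by a 1x1 pointwise projection.
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
    )

class Encoder(nn.Module):
    # (stride of the block's first convolution, output channels, repeats),
    # following the first to seventh residual convolution blocks of claim 2.
    BLOCKS = [(1, 16, 1), (2, 24, 2), (2, 32, 3), (1, 64, 4),
              (1, 96, 3), (1, 160, 3), (1, 320, 1)]

    def __init__(self, num_semantic_classes=2, num_instance_classes=37):
        super().__init__()
        # Common convolution layer: 3x3, stride 2, BN, ReLU, 32 output channels.
        layers = [nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
                  nn.BatchNorm2d(32), nn.ReLU(inplace=True)]
        c_in = 32
        for stride, c_out, repeats in self.BLOCKS:
            for i in range(repeats):
                layers.append(depthwise_separable(c_in, c_out, stride if i == 0 else 1))
                c_in = c_out
        self.backbone = nn.Sequential(*layers)
        # Eighth block: two different convolutions, upsampled back to the input
        # size, yield the semantic and instance segmentation mask images.
        self.semantic_head = nn.Conv2d(320, num_semantic_classes, 1)
        self.instance_head = nn.Conv2d(320, num_instance_classes, 1)
        self.up = nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False)

    def forward(self, x):
        features = self.backbone(x)  # total downsampling factor 8 (strides 2, 2, 2)
        return self.up(self.semantic_head(features)), self.up(self.instance_head(features))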
3. The method for recognizing characters on the metal surface of a workpiece based on image segmentation as claimed in claim 1, wherein the step S3 specifically comprises:
step S31, obtaining the semantic class number corresponding to each pixel point in the semantic segmentation mask image;
step S32, matching the semantic class number against the semantic class serial numbers in a preset semantic segmentation class dictionary to obtain the semantic segmentation class of each pixel point;
wherein the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and the semantic segmentation classes corresponding to the semantic class serial numbers;
and step S33, extracting the pixel points whose semantic segmentation class represents character information to obtain the plurality of character areas.
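A sketch of steps S31 to S33 follows, assuming NumPy and SciPy. The contents of SEMANTIC_CLASS_DICT and the use of connected-component labelling to split the character pixels into separate character areas are illustrative assumptions; the claim itself only specifies the dictionary lookup and the extraction of character pixels.

import numpy as np
from scipy import ndimage

# Assumed semantic segmentation class dictionary: class serial number -> class.
SEMANTIC_CLASS_DICT = {0: "background", 1: "character"}

def extract_character_regions(semantic_mask):
    # Step S31: semantic_mask is an (H, W) array of semantic class numbers.
    char_numbers = [n for n, cls in SEMANTIC_CLASS_DICT.items() if cls == "character"]
    # Step S32: keep only pixels whose class represents character information.
    char_pixels = np.isin(semantic_mask, char_numbers)
    # Step S33: split the character pixels into separate character areas
    # (connected-component labelling is an assumed way to do the splitting).
    labelled, n_regions = ndimage.label(char_pixels)
    return [np.argwhere(labelled == i) for i in range(1, n_regions + 1)]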
4. The method for recognizing characters on the metal surface of a workpiece based on image segmentation as claimed in claim 1, wherein the step S4 specifically comprises:
step S41, for each character area, mapping the pixel position of each pixel in the character area into the instance segmentation mask image, and extracting the instance class number at each pixel position;
step S42, matching the instance class number against the instance class serial numbers in a preset instance segmentation class dictionary to obtain the instance segmentation class of each pixel position;
wherein the instance segmentation class dictionary stores a plurality of instance class serial numbers and the instance segmentation classes corresponding to the instance class serial numbers;
step S43, counting the number of pixels corresponding to each instance segmentation class in the character area, and sorting the pixel counts from largest to smallest to obtain a pixel count sequence;
and step S44, taking the instance segmentation class corresponding to the first-ranked pixel count in the pixel count sequence as the character class of the character area.
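Steps S41 to S44 amount to a per-region majority vote over instance classes. A sketch follows, assuming NumPy; INSTANCE_CLASS_DICT is an assumed excerpt, not the patent's actual dictionary.

from collections import Counter

# Assumed excerpt of the instance segmentation class dictionary.
INSTANCE_CLASS_DICT = {0: "background", 1: "0", 2: "1", 11: "A", 12: "B"}

def classify_region(region_pixels, instance_mask):
    # Steps S41-S42: region_pixels is an (N, 2) array of (row, col) positions;
    # read the instance class number at each of those positions.
    class_numbers = instance_mask[region_pixels[:, 0], region_pixels[:, 1]]
    labels = [INSTANCE_CLASS_DICT.get(int(n), "background") for n in class_numbers]
    # Steps S43-S44: count pixels per class; the largest count wins the vote.
    counts = Counter(label for label in labels if label != "background")
    return counts.most_common(1)[0][0] if counts else None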
5. The method for recognizing characters on the metal surface of a workpiece based on image segmentation as claimed in claim 1, wherein the step S5 specifically comprises:
step S51, for each character area, calculating the center position coordinates of the character area;
step S52, according to the left-to-right character arrangement prior of the characters in the text line image, sorting the abscissas of the center position coordinates of the character areas from smallest to largest to obtain an abscissa sequence;
and step S53, arranging the character classes of the corresponding character areas according to the abscissa sequence to form a character sequence, and generating the character recognition result of the text line image from the character sequence.
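Steps S51 to S53 then read the line from left to right. A sketch follows, reusing the hypothetical helpers from the previous two sketches; like the claim's left-to-right prior, it assumes a single horizontal text line.

def recognise_text_line(regions, instance_mask):
    results = []
    for pixels in regions:
        center_x = pixels[:, 1].mean()  # step S51: abscissa of the region center
        label = classify_region(pixels, instance_mask)
        if label is not None:
            results.append((center_x, label))
    results.sort(key=lambda item: item[0])  # step S52: left-to-right order
    return "".join(label for _, label in results)  # step S53: character sequence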
6. A workpiece metal surface character recognition system based on image segmentation, characterized in that it applies the workpiece metal surface character recognition method based on image segmentation as claimed in any one of claims 1 to 5, and specifically comprises:
a data acquisition module, used for acquiring a text line image of the metal surface of a workpiece;
a feature extraction module, connected to the data acquisition module and used for performing feature extraction on the text line image with a pre-generated character recognition model to obtain a semantic segmentation mask image and an instance segmentation mask image corresponding to the text line image;
a first recognition module, connected to the feature extraction module and used for processing the semantic segmentation mask image to obtain a plurality of character areas in the text line image;
a second recognition module, connected to the feature extraction module and the first recognition module respectively and used for processing the instance segmentation mask image to obtain the character class of each character area;
a third recognition module, connected to the first recognition module and the second recognition module respectively and used for processing the two-dimensional coordinate positions of the character areas and the character classes to obtain a character recognition result of the text line image;
the workpiece metal surface character recognition system based on image segmentation further comprises a model generation module connected to the feature extraction module, and the model generation module specifically comprises:
a character annotation unit, used for acquiring a plurality of text line images of the metal surface of the workpiece and performing character-level annotation on each text line image to obtain annotated images containing character annotation information;
wherein the character annotation information comprises the real position area and the real class information of each character in the image;
a feature extraction unit, connected to the character annotation unit and used for inputting each annotated image into an initial recognition model with preset parameters for feature extraction to obtain a semantic segmentation mask image and an instance segmentation mask image corresponding to the annotated image;
a first processing unit, connected to the feature extraction unit and used for respectively calculating a first cross-entropy loss between the predicted position area of each character in the semantic segmentation mask image and the corresponding real position area, and a second cross-entropy loss between the predicted class information of each character in the instance segmentation mask image and the corresponding real class information;
a second processing unit connected to the first processing unit, the second processing unit comprising:
a processing subunit, configured to sum the first cross-entropy loss and the second cross-entropy loss to obtain the total cross-entropy loss of the initial recognition model;
a comparison subunit, connected to the processing subunit and used for comparing the total cross-entropy loss with a preset loss threshold, outputting a first comparison result when the total cross-entropy loss is not less than the loss threshold, and outputting a second comparison result when the total cross-entropy loss is less than the loss threshold;
a parameter adjustment unit, connected to the second processing unit and used for adjusting the preset parameters according to the first comparison result;
and a model saving unit, connected to the second processing unit and used for saving the initial recognition model as the character recognition model according to the second comparison result.
7. The workpiece metal surface character recognition system based on image segmentation as claimed in claim 6, wherein the first recognition module specifically comprises:
a first acquisition unit, used for obtaining the semantic class number corresponding to each pixel point in the semantic segmentation mask image;
a first matching unit, connected to the first acquisition unit and used for matching the semantic class number against the semantic class serial numbers in a preset semantic segmentation class dictionary to obtain the semantic segmentation class of each pixel point;
wherein the semantic segmentation class dictionary stores a plurality of semantic class serial numbers and the semantic segmentation classes corresponding to the semantic class serial numbers;
and a first extraction unit, connected to the first acquisition unit and used for extracting the pixel points whose semantic segmentation class represents character information to obtain the plurality of character areas.
8. The workpiece metal surface character recognition system based on image segmentation as claimed in claim 6, wherein the second recognition module specifically comprises:
a second acquisition unit, used for mapping, for each character area, the pixel position of each pixel in the character area into the instance segmentation mask image and extracting the instance class number at each pixel position;
a second matching unit, connected to the second acquisition unit and used for matching the instance class number against the instance class serial numbers in a preset instance segmentation class dictionary to obtain the instance segmentation class of each pixel position;
wherein the instance segmentation class dictionary stores a plurality of instance class serial numbers and the instance segmentation classes corresponding to the instance class serial numbers;
a data statistics unit, connected to the second matching unit and used for counting the number of pixels corresponding to each instance segmentation class in the character area and sorting the pixel counts from largest to smallest to obtain a pixel count sequence;
and a second extraction unit, connected to the data statistics unit and used for taking the instance segmentation class corresponding to the first-ranked pixel count in the pixel count sequence as the character class of the character area.
9. The workpiece metal surface character recognition system based on image segmentation as claimed in claim 6, wherein the third recognition module specifically comprises:
a coordinate calculation unit, used for calculating, for each character area, the center position coordinates of the character area;
a coordinate sorting unit, connected to the coordinate calculation unit and used for sorting, according to the left-to-right character arrangement prior of the characters in the text line image, the abscissas of the center position coordinates of the character areas from smallest to largest to obtain an abscissa sequence;
and a result generation unit, connected to the coordinate sorting unit and used for arranging the character classes of the corresponding character areas according to the abscissa sequence to form a character sequence and generating the character recognition result of the text line image from the character sequence.
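Tying the earlier sketches together, a hypothetical end-to-end run mirroring the module chain of claim 6 might look as follows; the weights file, input size, and random stand-in image are illustrative only.

import torch

model = Encoder()  # the encoder sketch from claim 2
model.load_state_dict(torch.load("char_recognition_model.pt"))  # assumed weights file
model.eval()

image = torch.rand(1, 3, 64, 256)  # stand-in for a captured text-line image
with torch.no_grad():
    semantic_pred, instance_pred = model(image)       # feature extraction module
semantic_mask = semantic_pred.argmax(1)[0].numpy()    # per-pixel semantic class numbers
instance_mask = instance_pred.argmax(1)[0].numpy()    # per-pixel instance class numbers

regions = extract_character_regions(semantic_mask)    # first recognition module
print(recognise_text_line(regions, instance_mask))    # second and third modules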
CN201911373220.8A 2019-12-27 2019-12-27 Workpiece metal surface character recognition method and system based on image segmentation Active CN111160352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911373220.8A CN111160352B (en) 2019-12-27 2019-12-27 Workpiece metal surface character recognition method and system based on image segmentation

Publications (2)

Publication Number Publication Date
CN111160352A CN111160352A (en) 2020-05-15
CN111160352B (en) 2023-04-07

Family

ID=70558415

Country Status (1)

Country Link
CN (1) CN111160352B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626284B (en) * 2020-05-26 2023-10-03 广东小天才科技有限公司 Method and device for removing handwriting fonts, electronic equipment and storage medium
CN111652204B (en) * 2020-06-03 2023-05-26 广东小天才科技有限公司 Method, device, electronic equipment and storage medium for selecting target text region
CN111652144B (en) * 2020-06-03 2023-09-26 广东小天才科技有限公司 Question segmentation method, device, equipment and medium based on target area fusion
CN111860522B (en) * 2020-07-23 2024-02-02 中国平安人寿保险股份有限公司 Identity card picture processing method, device, terminal and storage medium
CN111860506B (en) 2020-07-24 2024-03-29 北京百度网讯科技有限公司 Method and device for recognizing characters
CN112381835A (en) * 2020-10-29 2021-02-19 中国农业大学 Crop leaf segmentation method and device based on convolutional neural network
CN112733858B (en) * 2021-01-08 2021-10-26 北京匠数科技有限公司 Image character rapid identification method and device based on character region detection
CN112801911B (en) * 2021-02-08 2024-03-26 苏州长嘴鱼软件有限公司 Method and device for removing text noise in natural image and storage medium
CN113362288B (en) * 2021-05-24 2024-03-08 深圳明锐理想科技股份有限公司 Golden finger scratch detection method and device and electronic equipment
CN113657370B (en) * 2021-08-26 2024-04-23 北京有竹居网络技术有限公司 Character recognition method and related equipment thereof
CN114724133B (en) * 2022-04-18 2024-02-02 北京百度网讯科技有限公司 Text detection and model training method, device, equipment and storage medium
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706794A (en) * 2009-11-24 2010-05-12 上海显智信息科技有限公司 Information browsing and retrieval method based on semantic entity-relationship model and visualized recommendation
CN103106191A (en) * 2013-01-21 2013-05-15 天津大学 Chinese news subject collaborative segmentation method based on probabilistic graphical model
CN103810303A (en) * 2014-03-18 2014-05-21 苏州大学 Image search method and system based on focus object recognition and theme semantics
CN105354281A (en) * 2014-02-03 2016-02-24 株式会社隆创 Image inspection apparatus and image inspection procedure
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN107545902A (en) * 2017-07-14 2018-01-05 清华大学 A kind of article Material Identification method and device based on sound characteristic
CN108229428A (en) * 2018-01-30 2018-06-29 上海思愚智能科技有限公司 A kind of character recognition method, device, server and medium
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109040824A (en) * 2018-08-28 2018-12-18 百度在线网络技术(北京)有限公司 Method for processing video frequency, device, electronic equipment and readable storage medium storing program for executing
CN109446876A (en) * 2018-08-31 2019-03-08 百度在线网络技术(北京)有限公司 Sign language information processing method, device, electronic equipment and readable storage medium storing program for executing
CN109522900A (en) * 2018-10-30 2019-03-26 北京陌上花科技有限公司 Natural scene character recognition method and device
CN109933756A (en) * 2019-03-22 2019-06-25 腾讯科技(深圳)有限公司 Image based on OCR turns shelves method, apparatus, equipment and readable storage medium storing program for executing
CN109977956A (en) * 2019-04-29 2019-07-05 腾讯科技(深圳)有限公司 A kind of image processing method, device, electronic equipment and storage medium
CN110163205A (en) * 2019-05-06 2019-08-23 网易有道信息技术(北京)有限公司 Image processing method, device, medium and calculating equipment
CN110503103A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of character cutting method in line of text based on full convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100614323B1 (en) * 2004-12-30 2006-08-21 엘지.필립스 엘시디 주식회사 Liquid crystal display device and method for manufacturing the same
US8218887B2 (en) * 2007-09-28 2012-07-10 Abbyy Software, Ltd. Enhanced method of multilayer compression of PDF (image) files using OCR systems

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on Topic Models for Weighted Patent Text; Yu Yan et al.; Data Analysis and Knowledge Discovery; 2018-04-25 (No. 04); pp. 85-93 *
Research Progress on Scene Text Detection Based on Deep Learning; Yu Ruonan et al.; Journal of East China Normal University (Natural Science); 2018-09-25 (No. 05); pp. 9-24 *
A Survey of Scene Text Detection Based on Deep Learning; Jiang Wei et al.; Acta Electronica Sinica; 2019-05-15 (No. 05); pp. 178-187 *
Research on Big Data Analysis Methods Based on Neural Network Algorithms; Fang Fang; Software Engineering; 2018-09-05 (No. 09); pp. 38-40 *

Similar Documents

Publication Publication Date Title
CN111160352B (en) Workpiece metal surface character recognition method and system based on image segmentation
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN109376658B (en) OCR method based on deep learning
CN109902622B (en) Character detection and identification method for boarding check information verification
CN107133622B (en) Word segmentation method and device
CN109993160B (en) Image correction and text and position identification method and system
CN109948510B (en) Document image instance segmentation method and device
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN110969129B (en) End-to-end tax bill text detection and recognition method
CN109740515B (en) Evaluation method and device
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN109886978B (en) End-to-end alarm information identification method based on deep learning
CN112418216A (en) Method for detecting characters in complex natural scene image
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN114155527A (en) Scene text recognition method and device
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN114581646A (en) Text recognition method and device, electronic equipment and storage medium
CN114758341A (en) Intelligent contract image identification and contract element extraction method and device
CN110046618B (en) License plate recognition method based on machine learning and maximum extremum stable region
CN116912857A (en) Handwriting and printed text separation method and device
CN111666882A (en) Method for extracting answers of handwritten test questions
CN115810197A (en) Multi-mode electric power form recognition method and device
CN112330659B (en) Geometric tolerance symbol segmentation method combining LSD (least squares) linear detection and connected domain marking method
CN116704518A (en) Text recognition method and device, electronic equipment and storage medium
CN109871910B (en) Handwritten character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant