CN116452809A - Line object extraction method based on semantic segmentation - Google Patents
Line object extraction method based on semantic segmentation
- Publication number
- CN116452809A CN116452809A CN202310442676.5A CN202310442676A CN116452809A CN 116452809 A CN116452809 A CN 116452809A CN 202310442676 A CN202310442676 A CN 202310442676A CN 116452809 A CN116452809 A CN 116452809A
- Authority
- CN
- China
- Prior art keywords
- line
- point
- image
- lines
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/34—Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a line object extraction method based on semantic segmentation. The method first labels sample images, generates mask images and enhances the data set to obtain a training data set; it then constructs a semantic segmentation neural network model and trains it; the trained model is used to perform inference on the image to be detected to obtain a segmentation result; the semantic segmentation result is then post-processed, the post-processing comprising image refinement, cross-line identification and line merging; finally, the coordinate information of the points forming each line object in the image is obtained, completing the extraction of line objects from complex images.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a line object extraction method based on semantic segmentation.
Background
In industrial production, quality inspection requires defect detection of certain products, or of certain elements within products, that consist mainly of line objects. With the development of image processing and computer vision, image processing technology has been widely applied to defect detection of industrial products.
An important step in image-based defect detection is separating the object to be detected from the background, i.e. segmenting the image. Image segmentation divides an image into different regions by some method, where each region exhibits the same or similar characteristics of gray level, color, spatial texture, geometric shape or even semantics. Image segmentation is an important step between image processing and image analysis, and a classical problem in computer vision research.
In the prior art, conventional image segmentation methods are still mostly used to segment objects from images, such as the common threshold-based, edge-based and region-based segmentation algorithms; other theories such as clustering, mathematical morphology, genetic algorithms and wavelet transforms are also used. Most traditional algorithms focus on low-level visual information of the image. They work well on images whose content is simple, has few categories and a clear target, but they cannot segment using the high-level features of the target. Faced with complex segmentation tasks, conventional segmentation algorithms cannot achieve the performance required for defect detection.
With the rapid development of deep learning, convolutional neural networks have brought new solutions to the field of image processing. Ronneberger et al. proposed the U-Net model, an end-to-end encoder-decoder architecture based on the FCN (Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Springer International Publishing, 2015), adding skip connections between the convolutional part and the symmetric upsampling part, overcoming the drawback that the FCN cannot retain part of the pixel spatial-location and context information, which leads to the loss of local and global features. U-Net mainly consists of an encoder, a decoder and skip connections; the skip connections reduce the gap in context information and further improve segmentation performance. Because its structure resembles the letter U, the network is named U-Net. The network achieves high segmentation accuracy and is suitable for small-sample data.
The success of U-Net has led many other network architectures to choose such a "U"-shaped network as the backbone of the model. However, the encoder-decoder U-Net structure also has limitations. First, the optimal encoder-decoder depth is not known a priori: extensive architecture search or inefficient ensembles of models of different depths are required for testing, i.e. different data sets call for different optimal network depths, and a deeper network is not necessarily better. Second, the skip connections between encoder and decoder impose an unnecessarily restrictive fusion scheme that forces fusion only between feature maps of the encoder and decoder sub-networks at the same scale; feature maps at the same scale from the decoder and encoder are semantically different, and no reliable theory guarantees that they are the best match for feature fusion.
To solve the above problems of U-Net, Zhou et al. proposed UNet++, a modified U-Net model (Zhou Z, Siddique M R, Tajbakhsh N, et al. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. 2018). Its main characteristic is the use of nested U-Net networks of different depths to mitigate the unknown optimal depth: U-Nets of different depths partially share one encoder and learn simultaneously through deep supervision. In addition, the UNet++ model redesigns the skip connections so that the decoder sub-network can aggregate features of different semantic scales, producing a highly flexible feature fusion scheme.
Although existing semantic segmentation models can accurately segment line objects from the background, the segmentation result is a binarized image, not the coordinate information of the line objects. Accurately extracting the coordinates of each point on a line object from the image is currently an urgent problem to be solved.
Disclosure of Invention
The invention provides a line object extraction method based on semantic segmentation, which aims at solving the problem that in the prior art, line objects are difficult to extract accurately in a complex scene.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a line object extraction method based on semantic segmentation comprises the following steps:
S1, constructing a training set sample: generating a mask image by using the original picture and the marked label through program conversion to form a data set, and enhancing the data set to form a training data set; the data set enhancement mode comprises one or more of affine transformation, contrast transformation and elastic deformation;
s2, constructing a semantic segmentation neural network model: the neural network model mainly comprises an encoder, a decoder and skip connections, wherein the skip connections combine the deep, semantic, coarse-grained feature maps from the decoder sub-network with the shallow, low-level, fine-grained feature maps from the encoder sub-network to realize fine segmentation;
s3, model training: training the network model constructed in the step S2 according to the training data set obtained in the step S1 to obtain a trained network model;
s4, image segmentation: inputting the sample image to be detected into the trained network model obtained in the step S3 for reasoning and segmentation to obtain a segmentation result of the binarized mask image;
s5, image post-processing: carrying out post-processing on the segmentation result obtained in the step S4 to obtain segmentation coordinate information of the image lines, wherein the post-processing at least comprises extraction of the line object contours, refinement of the contours, extraction of the lines and identification of crossing points;
S6, combining lines: and carrying out merging operation on the lines with the endpoint distances and the endpoint vector direction angles meeting the set threshold values, merging the line parts in the set, and finally obtaining the coordinate information of points formed by all the complete line objects in the image.
As an improvement of the invention, in step S1 the Labelme annotation tool is used to label the pixel categories of the sample images in the training set samples to obtain label data files; the label data files are converted by a program to output the corresponding mask images; the sample images and the corresponding mask images are simultaneously subjected to horizontal flipping, vertical flipping and rotation transformations, and to random brightness, contrast and elastic deformation processing, to form the training data set.
As another improvement of the present invention, the affine transformation used for data set enhancement in step S1 is a linear transformation from two-dimensional coordinates to two-dimensional coordinates and enhances the sample at the level of the whole image, specifically:
x' = a·x + b·y + c
y' = d·x + e·y + f
wherein (x, y) represents the pixel coordinates on the original image, (x', y') represents the transformed pixel coordinates, and a, b, c, d, e, f are the 6 transformation parameters controlling the translation, rotation, scaling and shearing operations;
letting A = [[a, b], [d, e]] and B = [c, f]^T, the transformation can then be recorded as [x', y']^T = A·[x, y]^T + B;
the matrix A is a 2×2 matrix that controls the scaling, rotation and shearing linear transformations before and after the transformation, wherein its first column vector (a, d)^T and second column vector (b, e)^T respectively control the scaling and rotation of the x-axis and the y-axis; the vector B is a 2×1 column vector that controls the translated part after the transformation, wherein the first element c controls the amount of translation along the x-axis and the second element f controls the amount of translation along the y-axis.
As another improvement of the present invention, the contrast transformation used for data set enhancement in step S1 enhances the sample at the level of a single pixel and is specifically expressed by the formula x' = αx + β, where x' is the gray value after the pixel transformation, x is the gray value before the pixel transformation, and α and β respectively determine the degree of contrast and brightness of the transformation.
As a further improvement of the present invention, the elastic deformation used for data set enhancement in step S1 is performed on the morphological scale and is specifically expressed by the formula Trans(x + Δx(x, y), y + Δy(x, y)) = I(j, k): for a point I(j, k) on the original image, the value Trans is obtained after elastic deformation, wherein a random offset in the interval (−1, 1) is generated for each pixel (x, y) of the input image, and Δx(x, y) and Δy(x, y) respectively denote the displacement of the pixel in the x and y directions.
As a further improvement of the present invention, in the model training process of step S3 the loss function used is: L = β·FL + (1 − β)·DL·λ, where FL is the Focal Loss function, DL is the Dice Loss function, β is a hyperparameter controlling the weight relationship between Focal Loss and Dice Loss, and λ is a hyperparameter balancing the order-of-magnitude imbalance between the Focal Loss and Dice Loss values.
As a further improvement of the present invention, the step S5 specifically includes:
s51: removing noise areas generated in the step S4 through contour area screening, and completing extraction of the contours of the line objects;
s52, refining the line object binarization image obtained in the step S51 by using a Zhang-Suen algorithm to obtain a line area with the pixel width of 1, and finishing the refinement of the outline;
s53, judging whether each point in the image obtained in step S51 is a starting point of a line by traversing the neighborhood of the point, with the following specific flow: traverse the points in the image and count, among the 9 pixels formed by the point and its eight-neighborhood, how many have a non-zero value: if the number is 0, the area contains no line and is skipped directly; if the number is 1, the point is an isolated point and is skipped; if the number is 2, the point is an endpoint of a line and is recorded; if the number is 3, the point lies somewhere in the middle of a line and is skipped;
S54, a depth-first search algorithm is used to obtain the coordinates of each point on the line, and during the depth-first search it is judged whether each point is an intersection point, thereby completing the extraction of lines and the identification of intersection points; an intersection point is identified as follows: if, in the eight-neighborhood of a point, there are two or more points with a pixel gray value of 255 other than the point's predecessor, the point is considered to be an intersection point.
As a further improvement of the present invention, the basic basis for determining whether lines can be merged in step S6 is, in addition to judging the distance between the nearest endpoints of the two lines, to judge whether the direction angles of the vectors formed by the four endpoints of the two lines satisfy the set numerical relationship; that is, the distance |BC| between the two nearest endpoints must be smaller than the set distance threshold min_d, and the differences between the direction angles α, β and γ must lie within the set angle thresholds,
wherein A and B are the two endpoints of one line, C and D are the two endpoints of the other line, B and C are the two endpoints closest to each other between the two lines, B' and C' are the points n pixels inward from the B end and the C end respectively, α is the direction angle of the vector from B' to B, β is the direction angle of the vector from B to C, γ is the direction angle of the vector from C to C', and min_d is the set distance threshold.
As a further improvement of the present invention, step S6 merges two line segments into one line, wherein the gap between the two lines is filled by interpolation: the coordinate points of a fitted straight line between the two nearest endpoints of the two lines are calculated by the interpolation-based BresenhamLine method, and the three point sets, namely the two line point sets and the fitted interpolation point set, are merged into one point set in the order of line growth; the BresenhamLine method calculates, for each step between the two endpoints (x1, y1) and (x2, y2), the corresponding point on the fitted straight line, for example y = y1 + (x − x1)·(y2 − y1)/(x2 − x1), and performs rounding optimization on the calculated result.
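Since the exact recited formula is not reproduced in this text, the following is only a minimal sketch of such gap-filling interpolation with rounding, stepping along the longer axis (a DDA-style stand-in for the BresenhamLine calculation; the function name and endpoint handling are assumptions):

```python
def interpolate_gap(p1, p2):
    """Return integer pixel coordinates of a fitted straight line between
    two endpoints p1=(x1, y1) and p2=(x2, y2), endpoints excluded.
    Steps along the longer axis and rounds the other coordinate, which
    corresponds to the 'rounding optimization' described above."""
    x1, y1 = p1
    x2, y2 = p2
    steps = max(abs(x2 - x1), abs(y2 - y1))
    points = []
    for i in range(1, steps):
        t = i / steps
        points.append((round(x1 + t * (x2 - x1)), round(y1 + t * (y2 - y1))))
    return points

# e.g. interpolate_gap((10, 5), (14, 9)) -> [(11, 6), (12, 7), (13, 8)],
# the pixels bridging the gap between the two nearest endpoints
```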
Compared with the prior art, the line object extraction method based on semantic segmentation of the invention extracts line objects using a densely skip-connected semantic segmentation model, which solves the problem that traditional image segmentation cannot segment a target using semantic information, segments edges and details more finely than the UNet network, achieves accurate separation of the target to be detected, and handles the case where line objects may cross each other. The line merging method provided by the invention judges not only the distance between the endpoints of two lines but also whether the endpoint direction angles satisfy the conditions, greatly reducing the possibility of erroneous merging. The invention facilitates further analysis of line objects; on the basis of its results, information about the line objects can subsequently be analysed, including but not limited to the length of the lines, the curvature parameters at various positions, and the positional relationships between lines.
Drawings
FIG. 1 is a flow chart of steps of a line object extraction method based on semantic segmentation according to the present invention;
FIG. 2 is a diagram of the structure of a UNet++ network model used in step S2 of the method of the present invention;
Fig. 3 is an original picture 1 of a semiconductor X-Ray image provided in embodiment 2 of the present invention;
fig. 4 is a graph of the segmentation result after the method is used for the original picture 1 in embodiment 2 of the present invention;
fig. 5 is an original image 2 of a semiconductor X-Ray image provided in embodiment 2 of the present invention;
fig. 6 is a graph of the segmentation result after the method is used for the original picture 2 in embodiment 2 of the present invention;
FIG. 7 is a flow chart showing the steps of the post-processing of step S5 of the method of the present invention;
FIG. 8 is a neighborhood numbering plan defined in the present invention;
FIG. 9 is a schematic diagram of the merging operation provided in embodiment 2 of the present invention;
fig. 10 is a schematic diagram of line merging provided in embodiment 2 of the present invention.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention.
Example 1
A line object extraction method based on semantic segmentation, as shown in figure 1, comprises the following steps:
step S1, constructing a training set sample: labeling the pixel categories of the sample images with the Labelme annotation tool to obtain label data files; converting the label data files by a program to output the corresponding mask images; and simultaneously applying transformations such as horizontal flipping, vertical flipping and rotation to the sample images and the corresponding mask images, together with random brightness, contrast, elastic deformation and similar processing, to form a training data set.
When the number of original samples obtained by labeling in step S1 is small, data set enhancement using affine transformation, contrast transformation, elastic deformation and their combinations can greatly increase the number of effective samples, solving the problem that training samples are insufficient when sample acquisition is difficult in practice. Affine transformation, contrast transformation and elastic deformation enhance samples at the level of the whole image, of single pixels and of morphology respectively, and the data set enhancement in step S1 uses a combination of all three. The affine transformation can be represented by the following mathematical expression:
x' = a·x + b·y + c
y' = d·x + e·y + f
where (x, y) represents the pixel coordinates on the original image, (x', y') the transformed pixel coordinates, and a, b, c, d, e, f the 6 transformation parameters, which control operations such as translation, rotation, scaling and shearing. Letting A = [[a, b], [d, e]] and B = [c, f]^T, the transformation can be recorded as [x', y']^T = A·[x, y]^T + B.
The matrix A and the vector B represent the linear part and the translation part of the transformation respectively. Specifically, A is a 2×2 matrix that controls the linear transformations of scaling, rotation and shearing. Its first column vector (a, d)^T and second column vector (b, e)^T respectively control the scaling and rotation of the x-axis and the y-axis. When a and e are equal and b and d are zero, only a uniform scaling is performed; when a and e are equal and b and d are non-zero, rotation, shearing, scaling and similar transformations are performed. The vector B is a 2×1 column vector that controls the translation after the transformation: its first element c controls the amount of translation along the x-axis and its second element f the amount of translation along the y-axis. In practical applications, a translation of the image can be achieved by changing the value of the vector B.
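As a minimal illustrative sketch (not part of the patent), such an affine enhancement might be applied jointly to an image and its mask with OpenCV as follows; the parameter ranges are assumptions chosen only for illustration:

```python
import cv2
import numpy as np

def random_affine(image, mask):
    """Apply one random affine transform (rotation + scaling + translation)
    to an image and its mask with identical parameters, so the label stays
    aligned with the sample."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-15, 15)       # rotation in degrees (assumed range)
    scale = np.random.uniform(0.9, 1.1)      # uniform scaling (assumed range)
    tx, ty = np.random.uniform(-10, 10, 2)   # translation in pixels (assumed range)

    # 2x3 matrix [A | B]: A controls rotation/scaling/shearing, B the translation
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[0, 2] += tx
    M[1, 2] += ty

    warped_img = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    warped_mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    return warped_img, warped_mask
```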
The contrast transformation can be expressed by the formula x' = αx + β, where x' is the gray value after the pixel transformation, x is the gray value before the pixel transformation, and α and β respectively determine the degree of contrast and brightness of the transformation.
The elastic deformation can be expressed by the formula Trans(x + Δx(x, y), y + Δy(x, y)) = I(j, k): for a point I(j, k) on the original image, the value Trans is obtained after elastic deformation, where a random offset in the interval (−1, 1) is generated for each pixel (x, y) of the input image, and Δx(x, y) and Δy(x, y) respectively denote the displacement of the pixel in the x and y directions.
Step S2, constructing a semantic segmentation neural network model: the neural network model is based on the UNet++ model and mainly consists of an encoder, a decoder and skip connections, wherein the skip connections combine the deep, semantic, coarse-grained feature maps from the decoder sub-network with the shallow, low-level, fine-grained feature maps from the encoder sub-network to realize finer segmentation of edges and details;
Step S3, model training: training the network model constructed in the step S2 according to the training data set obtained in the step S1 to obtain a trained network model;
The loss function used in step S3 is a weighted mixture of the Focal Loss and Dice Loss functions. Focal Loss is a loss function for handling class imbalance among samples; its key idea is to weight the loss of each sample according to how hard the sample is to classify, i.e. easily distinguished samples receive a smaller weight α1 and hard samples receive a larger weight α2, so that the model focuses mainly on hard samples by increasing their weights. The Focal Loss function can thus be expressed as
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
The Dice coefficient is a measure of set similarity, commonly used to calculate the similarity of two samples; it can be expressed as s = 2|X∩Y| / (|X| + |Y|), and the corresponding Dice Loss is expressed as DL = 1 − 2|X∩Y| / (|X| + |Y|). In semantic segmentation, X represents the label image and Y the inferred image, so the Dice Loss function can be further expressed in per-pixel form as DL = 1 − 2·Σi(xi·yi) / (Σi xi + Σi yi).
Combining the Focal Loss and the Dice Loss, the finally defined mixed Loss function expression is
L=β·FL+(1-β)·DL·λ
Wherein β is a hyperparameter controlling the weight relationship between Focal Loss and Dice Loss, and λ is a hyperparameter balancing the order-of-magnitude imbalance between the Focal Loss and Dice Loss values. The beneficial effect of this step is: it effectively alleviates the problems in semantic segmentation of the imbalanced ratio of hard and easy samples and of imbalanced positive and negative samples.
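A minimal PyTorch sketch of such a mixed loss for the binary case is given below; the focal parameters α = 0.25 and γ = 2 are common defaults assumed here, while β = 0.7 and λ = 1 follow the values used in Example 2 below:

```python
import torch
import torch.nn as nn

class MixedLoss(nn.Module):
    """L = beta * FocalLoss + (1 - beta) * DiceLoss * lam  (binary case)."""
    def __init__(self, beta=0.7, lam=1.0, alpha=0.25, gamma=2.0, eps=1e-6):
        super().__init__()
        self.beta, self.lam = beta, lam
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def forward(self, prob, target):
        # Focal Loss: weight each pixel by (1 - p_t)^gamma
        p_t = prob * target + (1 - prob) * (1 - target)
        alpha_t = self.alpha * target + (1 - self.alpha) * (1 - target)
        focal = -(alpha_t * (1 - p_t) ** self.gamma * torch.log(p_t + self.eps)).mean()

        # Dice Loss: 1 - 2|X ∩ Y| / (|X| + |Y|)
        inter = (prob * target).sum()
        dice = 1 - 2 * inter / (prob.sum() + target.sum() + self.eps)

        return self.beta * focal + (1 - self.beta) * dice * self.lam
```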
Step S4, image segmentation: the sample image to be detected is input into the trained network model obtained in step S3 for inference and segmentation to obtain the segmentation result. The segmentation result in this step is a binarized mask image, where each pixel value of 0 or 255 represents the category of the corresponding pixel in the original image. At inference time, the PyTorch .pth file is first converted into the ONNX (Open Neural Network Exchange) format, and the onnxruntime engine is used for inference.
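A sketch of this export-and-inference flow under assumed conditions (single-channel 512×512 input, probabilities thresholded at 0.5; none of these values are specified by the patent):

```python
import torch
import numpy as np
import onnxruntime as ort

def export_to_onnx(model, onnx_path="segmenter.onnx", size=512):
    """Convert a trained PyTorch model (weights already loaded) to ONNX."""
    model.eval()
    dummy = torch.randn(1, 1, size, size)
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["input"], output_names=["output"])

def infer_mask(onnx_path, image):
    """Run onnxruntime inference on a grayscale image and return a 0/255 mask."""
    session = ort.InferenceSession(onnx_path)
    x = image.astype(np.float32)[None, None] / 255.0   # NCHW, normalized
    prob = session.run(None, {"input": x})[0][0, 0]
    return (prob > 0.5).astype(np.uint8) * 255
```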
Step S5, image post-processing: post-processing the segmentation result obtained in step S4 to obtain the coordinate information of the lines in the image; the extraction of lines here means obtaining the ordered coordinate information of the points forming each line.
The post-processing of the segmentation result in this step comprises the extraction of the line object contours, the refinement of the contours, the extraction of the lines and the identification of intersection points. The extraction of the object contours uses the findContours function provided by OpenCV; the contour refinement uses the iterative Zhang-Suen thinning algorithm; the basic method used for line extraction and intersection identification is a depth-first search algorithm, in which the presence of an intersection is judged from the number of neighborhood points of each point during the search. The method therefore comprises the following steps (a short sketch of steps S51 and S52 is given after the step list below):
S51, removing the noise area generated in the step S4 through contour area screening, and completing extraction of the contour of the line object;
s52, refining the line object binarization image obtained in the step S51 by using a Zhang-Suen algorithm to obtain a line area with the pixel width of 1, and finishing the refinement of the outline;
s53, judging whether each point in the image obtained in step S51 is a starting point of a line by traversing the neighborhood of the point;
s54, the coordinates of each point on the line are obtained by using a depth-first search algorithm, and whether the points belong to the cross points or not is judged in the depth-first search process, so that the extraction of the line and the identification of the cross points are completed.
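A minimal OpenCV sketch of steps S51 and S52 under the assumption that opencv-contrib's cv2.ximgproc.thinning (which implements Zhang-Suen) is available in place of a hand-written implementation, and with an assumed area threshold:

```python
import cv2
import numpy as np

def extract_skeleton(mask, min_area=100):
    """S51: drop small noise regions by contour area screening, then
    S52: thin the remaining line regions to 1-pixel width."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    for c in contours:
        if cv2.contourArea(c) >= min_area:        # area screening
            cv2.drawContours(clean, [c], -1, 255, thickness=-1)
    # THINNING_ZHANGSUEN applies the Zhang-Suen algorithm named in the text
    return cv2.ximgproc.thinning(clean, thinningType=cv2.ximgproc.THINNING_ZHANGSUEN)
```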
The extraction of the lines and the identification of the intersections in step S5 use a depth-first search algorithm. Specifically, all line endpoints are found in the image, and starting from each endpoint it is judged whether a point with gray value 255 exists in the eight-neighborhood of the current point. For convenience of description, let the coordinates of the current point P be (x, y); the points in the four directions up, down, left and right of P are defined as (x, y−1), (x, y+1), (x−1, y) and (x+1, y), and the points in the four diagonal directions upper-left, lower-left, upper-right and lower-right as (x−1, y−1), (x−1, y+1), (x+1, y−1) and (x+1, y+1).
If one and only one eligible point is found in the four directions up, down, left and right, that point is taken as the next point, the gray value of the current point is set to 0, and the operation is repeated with the next point as the current point. If no eligible point is found in the four directions up, down, left and right, but one and only one eligible point is found in the four diagonal directions upper-left, lower-left, upper-right and lower-right, that point is taken as the next point, the gray value of the current point is set to 0, and the operation is repeated with the next point as the current point. If more than one eligible point is found in the four straight directions or in the four diagonal directions, an intersection point is considered to have been found, and the above operation is repeated separately with each eligible point in the neighborhood as the starting point of another line. If there is no eligible point in the eight-neighborhood, the traversal is considered to have reached the end of the line and the search ends. The beneficial effect of this step is that, in step S5, the endpoints and intersection points of the lines are taken as the starting points of separate line segments, which are then merged into complete lines, so that the method remains applicable when lines cross.
The present embodiment uses an improved depth-first search algorithm, which adds the identification and handling of intersection points to the conventional search for points meeting the criterion. The line points visited in step S54 are arranged in the order of line growth, which facilitates further analysis and processing of the lines.
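A compact sketch of this endpoint-seeded tracing on a 0/255 skeleton image; the straight-before-diagonal preference follows the description above, while the function names and the exact handling of mixed straight/diagonal neighbours are one reasonable interpretation, not the patent's literal procedure:

```python
STRAIGHT = [(0, -1), (0, 1), (-1, 0), (1, 0)]        # up, down, left, right
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def next_points(skel, x, y, dirs):
    h, w = skel.shape
    return [(x + dx, y + dy) for dx, dy in dirs
            if 0 <= x + dx < w and 0 <= y + dy < h and skel[y + dy, x + dx] == 255]

def trace_line(skel, start):
    """Follow one line from an endpoint, clearing visited pixels.
    Returns the ordered point list and any intersection neighbours,
    which serve as starting points of further lines."""
    points, branch_seeds = [], []
    p = start
    while p is not None:
        x, y = p
        points.append(p)
        skel[y, x] = 0                               # mark current point as visited
        straight = next_points(skel, x, y, STRAIGHT)
        diag = next_points(skel, x, y, DIAGONAL)
        if len(straight) == 1 and not diag:          # case (1): continue straight
            p = straight[0]
        elif not straight and len(diag) == 1:        # case (2): continue diagonally
            p = diag[0]
        elif len(straight) + len(diag) > 1:          # case (3): intersection reached
            branch_seeds.extend(straight + diag)
            p = None
        else:                                        # case (4): end of line
            p = None
    return points, branch_seeds
```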
Step S6, combining lines: and carrying out merging operation on the lines with the endpoint distances and the endpoint vector direction angles meeting the set threshold values, merging the line parts in the set, and finally obtaining the coordinate information of points formed by all the complete line objects in the image.
In step S6, the merging of lines into connected domains is improved with a merging strategy based on the basic idea of the union-find method in graph theory: as long as the same line segment appears in two sets, the two sets are considered to belong to one connected domain and can be merged. With this optimization, each line segment may be used multiple times, so that a common line segment can appear in different sets.
The merging operation is performed on all lines meeting the conditions; the specific operation of merging the line parts in a set is as follows. Let the n line segments obtained in step S5 be numbered 1 to n. Every pair of the n lines is judged as to whether they belong to the same connected domain; the criterion is whether the direction angles of the vectors formed by the four endpoints of the two lines satisfy the assumed numerical relationship. Lines satisfying the connection relationship are stored together in a set. It is then judged whether different sets share a common line; if so, the two sets are merged into one, indicating that the two original sets belong to the same connected domain. After these operations, m sets are obtained, the i-th of which contains k_i lines, indicating that there are m unconnected lines in total and that the i-th line is composed of k_i sub-lines. Next, the sets whose number of sub-lines k_i is greater than 1 are merged, so that each set finally contains exactly one line. The BresenhamLine algorithm is used in the merging process; given any two points, it yields the two-dimensional coordinates of the straight line passing between them. The beneficial effect of this step is that each line segment may be reused multiple times, so that a common line segment can appear in different sets. This design is practical because in some cases the intersection after refinement is not a single point but a small common line segment, which is then shared by different connected domains.
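A small sketch of the set-merging step, assuming every segment pair that passes the merge test is recorded as an edge and grouped with a plain union-find (disjoint-set) structure:

```python
def group_segments(n_segments, mergeable_pairs):
    """Union-find grouping: segments i and j that satisfy the merge
    condition end up in the same connected domain."""
    parent = list(range(n_segments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i, j in mergeable_pairs:
        union(i, j)

    groups = {}
    for i in range(n_segments):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# e.g. with pairs [(0, 5), (5, 10)], segments 0, 5 and 10 form one line
```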
Further, the basic basis for determining whether lines can be merged in step S6 is, in addition to judging the distance between the nearest endpoints of the two lines, to judge whether the direction angles of the vectors formed by the four endpoints of the two lines satisfy the set numerical relationship; that is, the distance |BC| between the two nearest endpoints must be smaller than the set distance threshold min_d, and the differences between the direction angles α, β and γ must lie within the set angle thresholds,
wherein A and B are the two endpoints of one line, C and D are the two endpoints of the other line, B and C are the two endpoints closest to each other between the two lines, B' and C' are the points n pixels inward from the B end and the C end respectively, α is the direction angle of the vector from B' to B, β is the direction angle of the vector from B to C, γ is the direction angle of the vector from C to C', and min_d is the set distance threshold. The beneficial effect of this step is: the method not only judges whether the nearest distance between the endpoints of the two lines meets the condition, but also, on the premise that the nearest-distance condition is met, supposes that the two nearest endpoints are connected and uses the numerical relationship of the vector direction angles to judge whether such a connection follows the growth trend of the lines, which greatly reduces the false-merge rate.
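A sketch of such a merge check; the inward offset of 20 pixels and the 120°/30° angle thresholds follow Example 2 below, while min_d and the exact pairing of the angle comparisons are assumptions read from the (partly garbled) original formula:

```python
import math

def angle(p, q):
    """Direction angle (degrees) of the vector from p to q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def ang_diff(a, b):
    d = abs(a - b) % 360
    return min(d, 360 - d)

def can_merge(line1, line2, min_d=15, n=20, th_outer=120, th_inner=30):
    """line1/line2: ordered point lists. B, C are the nearest endpoints,
    B', C' the points roughly n pixels inward along each line."""
    B, C = line1[-1], line2[0]            # nearest endpoint pair (simplified)
    if math.dist(B, C) >= min_d:
        return False
    Bp = line1[-min(n, len(line1))]       # B'
    Cp = line2[min(n, len(line2)) - 1]    # C'
    a = angle(Bp, B)                      # alpha: growth direction leaving line 1
    b = angle(B, C)                       # beta:  direction of the gap
    g = angle(C, Cp)                      # gamma: growth direction entering line 2
    return (ang_diff(a, g) < th_outer and
            ang_diff(a, b) < th_inner and
            ang_diff(b, g) < th_inner)
```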
The final result of the method provided by the invention, namely the coordinate information of the points forming each line in the image, can be obtained through the above steps.
Example 2
In this embodiment, an X-Ray image of a packaged chip in the semiconductor industry is taken as an example; fig. 3 and fig. 5 are the original pictures of two semiconductor X-Ray images, and the method of the invention is used to extract the leads in the chip. The purpose of extracting the leads is to calculate parameters such as their length, curvature and positional relationships.
A line object extraction method based on semantic segmentation, the steps of the method comprising:
step S1, a training set sample is constructed, an original picture and a marked label are used for generating a mask image through conversion, and then a training data set is formed by using a certain data set enhancement algorithm;
to effectively expand the number of data set samples, affine transformation, contrast transformation, and elastic deformation are used for data set enhancement. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, and can keep the flatness of the two-dimensional graph, namely the graph after transformation can keep the original relationship of line co-points and point co-lines unchanged, and the affine transformation can be expressed by the following mathematical expression
x' = a·x + b·y + c
y' = d·x + e·y + f
where (x, y) represents the pixel coordinates on the original image, (x', y') the transformed pixel coordinates, and a, b, c, d, e, f the 6 transformation parameters, which control operations such as translation, rotation, scaling and shearing. Letting A = [[a, b], [d, e]] and B = [c, f]^T, the transformation can be recorded as [x', y']^T = A·[x, y]^T + B.
The matrix A and the vector B represent the linear part and the translation part of the transformation respectively. Specifically, A is a 2×2 matrix that controls the linear transformations of scaling, rotation and shearing. Its first column vector (a, d)^T and second column vector (b, e)^T respectively control the scaling and rotation of the x-axis and the y-axis. When a and e are equal and b and d are zero, only a uniform scaling is performed; when a and e are equal and b and d are non-zero, rotation, shearing, scaling and similar transformations are performed. The vector B is a 2×1 column vector that controls the translation after the transformation: its first element c controls the amount of translation along the x-axis and its second element f the amount of translation along the y-axis. In practical applications, a translation of the image can be achieved by changing the value of the vector B.
The elastic deformation applies a random offset to the position of each pixel of the input image, and the offset of each pixel is independent of the others, so that a certain distortion is produced on the objects of the original image, hence the name elastic deformation. The idea of the elastic deformation algorithm used for data set enhancement consists of three main steps:
1) Generate a random offset in the interval (−1, 1) for each pixel (x, y) of the input image, using Δx(x, y) and Δy(x, y) to denote the displacement of the pixel in the x and y directions respectively;
2) Convolve the random offset fields with a Gaussian kernel G(0, σ);
3) Scale the offsets by the magnification factor α and apply them to the original image.
Δx = G(σ) * (α·Rand(n, m))
Δy = G(σ) * (α·Rand(n, m))
Trans(x + Δx(x, y), y + Δy(x, y)) = I(j, k)
For a point I(j, k) on the original image, the value Trans is obtained after elastic deformation. The above formulas show that the amplitude of the elastic deformation is related to α and σ: the larger α is, the more intense the deformation; the larger σ is, the smoother and less pronounced the deformation; and when σ is small, the deformation approaches an independent random movement of each pixel.
Affine transformations and elastic deformations may produce non-integer coordinates and thus invalid pixel positions, so the pixel values at the target locations are further calculated using a bilinear interpolation algorithm. Since the leads in a semiconductor chip are deformable to a certain extent, i.e. they can bend to some degree, the new images generated by elastic deformation accord with reality and remain reliable.
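A sketch of such an elastic deformation for a single-channel image, using SciPy's Gaussian filter and bilinear map_coordinates; the library choice and the values of α and σ are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, seed=None):
    """Random (-1, 1) offsets, Gaussian-smoothed with sigma, scaled by alpha,
    then resampled with bilinear interpolation (order=1). Assumes a 2D image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = [ys + dy, xs + dx]            # row, column sampling positions
    return map_coordinates(image, coords, order=1, mode="reflect")
```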
S2, constructing a semantic segmentation neural network model, wherein the convolutional neural network mainly comprises an encoder, a decoder and skip connections;
As shown in fig. 2, the neural network model used in step S2 mainly consists of an encoder, a decoder and skip connections. The skip connections combine the deep, semantic, coarse-grained feature maps from the decoder sub-network with the shallow, low-level, fine-grained feature maps from the encoder sub-network, and it has been shown that fine-grained details of the target object can be effectively recovered even against a complex background. The skip connections in the UNet++ network, redesigned relative to UNet, can aggregate features of different semantic scales on the decoder sub-network, resulting in a highly flexible feature fusion scheme.
Step S3, training the network model constructed in the step S2 according to the training data set of the step S1;
in this step, the Loss function used is the weighted mixed Loss function of Focal Loss and Dice Loss, and the finally defined mixed Loss function expression is
L=β·FL+(1-β)·DL·λ
Where β is a hyperparameter for controlling the weight relationship between Focal Loss and Dice Loss, here β = 0.7; λ is a hyperparameter for balancing the order-of-magnitude imbalance between the Focal Loss and Dice Loss values, here λ = 1.
S4, carrying out reasoning segmentation on the sample image to be detected by utilizing the trained network model in the step S3 and obtaining a segmentation result;
S5, carrying out certain post-processing on the segmentation result in the step S4, wherein the post-processing comprises extraction of the outline of the line object, refinement of the outline, extraction of the line and identification of the intersection point, and obtaining segmentation coordinate information of the line;
as shown in fig. 7, the post-processing in the step S5 may be divided into the following steps:
s51, removing the noise area generated in the step S4 through contour area screening;
s52, refining the line object binarization image obtained in the step S51 by using a Zhang-Suen algorithm to obtain a line area with the pixel width of 1;
s53, judging whether each point in the image obtained in step S51 is a starting point of a line by traversing the neighborhood of the point, where the eight-neighborhood numbering of a point is defined as shown in fig. 8;
s54, a depth-first search algorithm is used for obtaining the coordinates of each point on the line, and whether the points belong to the cross points is judged simultaneously in the depth-first search process;
further, in step S53, it is determined whether a certain point is an endpoint, by traversing the points in the image, and determining the number of points in which 9 points are not 0 in the eight neighborhood of the point and the point itself: if the number is 0, indicating that the area does not contain lines, and directly skipping; if the number is 1, indicating that isolated points exist in the area, and skipping; if the number is 2, indicating that the point is an endpoint of a certain line, and recording the point; if the number is 3, this point is indicated as being somewhere in the middle of the line, skipped.
Further, the depth-first search algorithm in step S54 is illustrated with fig. 8, and the specific process is as follows: take the coordinates of an endpoint S obtained in step S52 and first judge whether its gray value is 255; if the gray value is 0, the endpoint has already been visited and set to 0 in a previous operation and is skipped directly; if the value is 255, the coordinates of the starting point S are assigned to the point P, and the gray values of the eight neighborhood points are judged with the point P as the center. The gray values of the eight neighborhood points fall into the following cases:
(1) If one and only one eligible point (i.e. a point with a pixel gray value of 255) is found in the four directions up, down, left and right of the point P, i.e. positions 2, 7, 5 and 4 in fig. 8, that point is selected as the next point N, the gray value of the current point P is set to 0 to indicate that it has been visited and will not be visited again, and the above operation is repeated with the next point N as the new current point P;
(2) If no eligible point is found in the four directions up, down, left and right of the point P, but one and only one eligible point is found in the four diagonal directions upper-left, lower-left, upper-right and lower-right, i.e. positions 1, 3, 6 and 8 in fig. 8, that point is selected as the next point N, the gray value of the current point P is set to 0, and the above operation is repeated with the next point N as the new current point P;
(3) If more than one eligible point is found in the four directions up, down, left and right of the point P, or in the four diagonal directions, the point P is considered an intersection point; the search for this part of the line is stopped, and the eligible points in the neighborhood are each taken as the starting point of another line, for which the above operation is repeated;
(4) If no eligible point is found in the eight-neighborhood, the traversal is considered to have reached the end of the line and the search ends.
Compared with the existing line extraction method, the method in the step S5 has the advantages that:
(1) The order of the points on the extracted lines is arranged according to the growth order of the lines; if traversing is performed according to the row-column sequence, the obtained points are likely not to be the growth sequence of the lines, which brings great trouble to the subsequent further analysis and processing of the line points;
(2) The intersection point on the line can be identified; the endpoints and the intersections of the lines are used as the line segments to be classified independently, and then the line segments are combined into complete lines, so that the method can normally play a role under the condition of line intersection.
S6, carrying out merging operation on all the lines meeting the conditions, merging the line parts in the set, and finally obtaining the coordinate information of all the complete line objects in the image;
The merging operation in step S6 is an optimization operation. Its significance is that, when the line object lies in a complex background and the segmentation result of step S4 has certain flaws, or when lines cross, the parts of a line have already been separated in step S5, and this step is required to merge those parts back into a complete line. Assume that after step S5 the line segments numbered 1 to 12 in fig. 9 are obtained. Note that the line segments processed here are lines of one-pixel width produced by the Zhang-Suen refinement algorithm; for convenience of illustration, refined lines are not drawn in fig. 9, but they can all be assumed to be refined. Next, the 12 line segments numbered 1 to 12 are judged pairwise as to whether they satisfy the merging condition (the criterion is given below). Taking fig. 9 as an example, the lines finally found to satisfy the merging condition are stored in the same sets: the elements of set A are 1 and 6, of set B are 2 and 7, of set C are 3 and 4, of set D are 4 and 5, of set E are 6 and 11, of set F are 7 and 12, of set G are 8 and 9, and of set H are 9 and 10. However, the sets obtained at this point are not the final result: sets A and E obviously share the common element line 6, which indicates that they belong to the same connected domain, so sets A and E can be further merged, and similarly for the other sets. After the final merging there are 4 sets, namely set A with elements 1, 6 and 11, set B with elements 2, 7 and 12, set C with elements 3, 4 and 5, and set D with elements 8, 9 and 10, indicating that there are 4 independent lines in the figure; line A, for example, can be obtained by combining line segments 1, 6 and 11 through the BresenhamLine algorithm. The merging method provided here considers two sets to belong to one connected domain, and thus to be mergeable, as long as the same line segment appears in both.
The method of merging sets of line segments in step S6 has the outstanding advantage over the existing method in that each line segment can be reused several times, so that a common line segment can appear in different sets. Such a design is practical because in some cases the intersection after refinement is not a single point, but a small common line segment, which is then common to different connected domains.
Further, in step S6, whether two line segments satisfy the merging condition is judged as follows, taking fig. 10 as an example. First take two lines, line 1 and line 2, whose endpoints are A, B and C, D respectively. Calculate the distances A to C, A to D, B to C and B to D; if the smallest of the four distances is smaller than the set threshold min_distance, continue checking, otherwise directly judge that the condition is not met. Once this condition is met, suppose the nearest endpoints of line 1 and line 2 are points A and C; take the points at the 20th pixel inward from each endpoint, denoted A' and C', and compute the corresponding direction vectors. If the difference between the direction angles of the vector along line 1 (from A' to A) and the vector along line 2 (from C to C') is smaller than 120°, and the differences between the direction angle of the connecting vector from A to C and the direction angles of each of these two vectors are both smaller than 30°, then line 1 containing point A and line 2 containing point C are considered to be the same line and points A and C are connected; if any of the conditions is not met, the pair is skipped. Taking fig. 10 as an example, the calculation of this algorithm determines that points A and C do not satisfy the merging condition, while points D and E on line 2 and line 3 do satisfy it.
The main innovation in judging whether two line segments satisfy the merging condition in step S6 is that it not only judges whether the nearest distance between the endpoints of the two lines meets the condition, but also, on the premise that the nearest-distance condition is met, supposes that the two nearest endpoints are connected and uses the numerical relationship of the vector direction angles to judge whether such a connection follows the growth trend of the line, which can greatly reduce the false recognition rate.
The point coordinates of each line obtained in step S6 can be further analysed and calculated; the main quantities to be calculated and judged include: whether the line position is offset, whether the line length meets the requirement, whether the curvature at any point on the line exceeds the allowed value, whether the overall bending of the line is acceptable, and whether the spacing between lines is acceptable. Since step S6 yields the specific coordinates of every point on every line, arranged strictly in the order of line growth, these items can all be obtained by basic algebraic calculation.
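For illustration only, a minimal NumPy sketch of a few of the measurements listed above, assuming each line is an (N, 2) array of point coordinates ordered along the growth direction; thresholds and tolerances are left to the caller.

```python
import numpy as np


def line_length(points):
    """Total polyline length of an (N, 2) array of ordered (x, y) points."""
    seg = np.diff(points.astype(float), axis=0)
    return float(np.linalg.norm(seg, axis=1).sum())


def max_chord_deviation(points):
    """Maximum perpendicular distance of the points from the straight chord
    joining the first and last points, a simple proxy for overall bending."""
    p = points.astype(float)
    a, b = p[0], p[-1]
    chord = b - a
    n = np.linalg.norm(chord)
    if n == 0:
        return 0.0
    # Perpendicular distance of each point from the chord a -> b.
    d = np.abs((p[:, 0] - a[0]) * chord[1] - (p[:, 1] - a[1]) * chord[0]) / n
    return float(d.max())


def min_spacing(points_a, points_b):
    """Smallest point-to-point distance between two lines (brute force)."""
    diff = points_a[:, None, :].astype(float) - points_b[None, :, :].astype(float)
    return float(np.linalg.norm(diff, axis=2).min())
```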
After the two original pictures are processed by the method of this embodiment, the segmentation result diagrams shown in fig. 4 and fig. 6 are obtained. In summary, this embodiment solves the problem of extracting the leads inside a semiconductor chip from its X-ray image. By adopting a semantic segmentation model, all leads can be accurately extracted from the image, overcoming the limitation that traditional image segmentation cannot extract the required detection target from a complex background. By combining image refinement with depth-first search, the coordinates of every point on each line are obtained from the binarized segmentation result and arranged along the growth direction of the line; this avoids the problem that, when the image is simply traversed by rows or columns, points in the middle of a line may be visited first, which would strongly affect the subsequent defect-detection calculations. The method also provides a post-processing measure for the case where the semantic segmentation result has flaws, and is equally applicable to other scenarios that require reconnecting broken lines.
It should be noted that the foregoing merely illustrates the technical idea of the present invention and is not intended to limit the scope of the present invention, and that a person skilled in the art may make several improvements and modifications without departing from the principles of the present invention, which fall within the scope of the claims of the present invention.
Claims (9)
1. The line object extraction method based on semantic segmentation is characterized by comprising the following steps of:
s1, constructing training set samples: generating mask images from the original pictures and their annotated labels by program conversion to form a data set, and enhancing the data set to form a training data set; the data set enhancement mode comprises one or more of affine transformation, contrast transformation and elastic deformation;
s2, constructing a semantic segmentation neural network model: the neural network model mainly comprises an encoder, a decoder and skip connections, wherein the skip connections combine deep, semantic, coarse-grained feature maps from the decoder sub-network with shallow, low-level, fine-grained feature maps from the encoder sub-network to realize fine segmentation;
s3, model training: training the network model constructed in the step S2 according to the training data set obtained in the step S1 to obtain a trained network model;
s4, image segmentation: inputting the sample image to be detected into the trained network model obtained in the step S3 for inference and segmentation to obtain a segmentation result in the form of a binarized mask image;
s5, image post-processing: carrying out post-processing on the segmentation result obtained in the step S4 to obtain segmentation coordinate information of the image lines, wherein the post-processing at least comprises extraction of the line object contours, refinement of the contours, extraction of the lines and identification of crossing points;
S6, combining lines: carrying out a merging operation on lines whose endpoint distances and endpoint vector direction angles conform to the set thresholds, merging the line parts in each set, and finally obtaining the coordinate information of the points making up every complete line object in the image.
2. A semantic segmentation-based line object extraction method as claimed in claim 1, wherein: in the step S1, the training set samples are constructed by labeling the pixel categories of the sample images with the Labelme labeling tool to obtain label data files; the label data files are converted by a program to output the corresponding mask images; the sample images and the corresponding mask images are simultaneously subjected to horizontal and vertical flipping and rotation transformations, as well as random brightness, contrast and elastic deformation processing, to form the training data set.
3. A semantic segmentation based line object extraction method as claimed in claim 2, wherein: the affine transformation in the data set enhancement mode in the step S1 is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, and the sample enhancement is performed from the whole image, specifically:
wherein (x, y) denotes the pixel coordinates on the original image, (x', y') denotes the transformed pixel coordinates, and a, b, c, d, e and f are the 6 transformation parameters controlling the translation, rotation, scaling and shearing operations:
x' = a·x + b·y + c
y' = d·x + e·y + f
Let A = [[a, b], [d, e]] (written row by row) and let B be the column vector (c, f); the transformation can then be written as (x', y') = A·(x, y) + B.
Matrix A is a 2 × 2 matrix that controls the scaling, rotation and shearing parts of the linear transformation; its first column vector (a, d) and second column vector (b, e) control the scaling and rotation of the x-axis and the y-axis before and after the transformation, respectively. Vector B is a 2 × 1 column vector that controls the translation part after the transformation; its first element c controls the amount of translation along the x-axis and its second element f controls the amount of translation along the y-axis.
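For illustration only (not part of the claim), a minimal Python sketch of a random affine augmentation applied identically to an image and its mask, using OpenCV's getRotationMatrix2D and warpAffine; the function name and parameter ranges are assumptions.

```python
import numpy as np
import cv2


def random_affine(image, mask, max_angle=15, max_shift=0.05, scale_range=(0.9, 1.1)):
    """Apply one random affine transform (rotation, scale, translation) to an
    image and its mask with the same parameters, following
    x' = a*x + b*y + c, y' = d*x + e*y + f."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    scale = np.random.uniform(*scale_range)
    # 2x3 matrix [[a, b, c], [d, e, f]] built from rotation and scale about the centre.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[0, 2] += np.random.uniform(-max_shift, max_shift) * w   # c: x translation
    M[1, 2] += np.random.uniform(-max_shift, max_shift) * h   # f: y translation
    img_out = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    # Nearest-neighbour interpolation keeps the mask strictly binary.
    mask_out = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    return img_out, mask_out
```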
4. A semantic segmentation based line object extraction method as claimed in claim 2, wherein: the contrast transformation in the data set enhancement mode in the step S1 is performed on individual pixels and is expressed by the formula x' = α·x + β, where x' is the gray value of the pixel after the transformation, x is the gray value before the transformation, and α and β determine the degree of contrast and brightness change respectively.
5. A semantic segmentation based line object extraction method as claimed in claim 2, wherein: the elastic deformation in the data set enhancement mode in the step S1 is performed on a morphological scale and is expressed by the formula Trans(x+Δx(x, y), y+Δy(x, y)) = I(j, k): for a point I(j, k) on the original image, the value Trans is obtained after elastic deformation, wherein a random offset in the interval (-1, 1) is generated for each pixel (x, y) of the input image, and the displacements of the pixel in the x and y directions are denoted Δx(x, y) and Δy(x, y) respectively.
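For illustration only, a minimal Python sketch of this per-pixel elastic deformation, assuming a single-channel image: the offsets are drawn from (-1, 1) and the displaced positions are sampled with bilinear interpolation. Practical pipelines often also scale and smooth the offset field; that is omitted here.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def elastic_deform(image, rng=None):
    """Elastic deformation: each pixel (x, y) receives a random offset
    (dx, dy) in (-1, 1) and the output is sampled at the displaced position
    with bilinear interpolation (order=1)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    dx = rng.uniform(-1, 1, size=(h, w))
    dy = rng.uniform(-1, 1, size=(h, w))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([ys + dy, xs + dx])     # row and column sample positions
    return map_coordinates(image, coords, order=1, mode="reflect")
```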
6. A line object extraction method based on semantic segmentation as set forth in claim 3, 4 or 5, wherein: in the model training process of step S3, the loss function used is L = β·FL + (1 − β)·λ·DL, where FL is the Focal Loss function, DL is the Dice Loss function, β is a hyperparameter controlling the weighting between Focal Loss and Dice Loss, and λ is a hyperparameter balancing the difference in order of magnitude between the Focal Loss and Dice Loss values.
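For illustration only, a minimal NumPy sketch of the combined loss L = β·FL + (1 − β)·λ·DL computed on per-pixel probabilities; the focusing parameter of the focal loss and the Dice smoothing constant are assumptions not stated in the claim.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over pixels; p = predicted probabilities,
    y = ground-truth mask in {0, 1}, gamma = focusing parameter (assumed)."""
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(y == 1, p, 1.0 - p)
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt)))


def dice_loss(p, y, smooth=1.0):
    """Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), with a smoothing constant."""
    inter = np.sum(p * y)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(p) + np.sum(y) + smooth))


def combined_loss(p, y, beta=0.5, lam=1.0):
    """L = beta * FL + (1 - beta) * lambda * DL, as in the claim."""
    return beta * focal_loss(p, y) + (1.0 - beta) * lam * dice_loss(p, y)
```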
7. The semantic segmentation-based line object extraction method as set forth in claim 6, wherein: the step S5 specifically includes:
s51: removing noise areas generated in the step S4 through contour area screening, and completing extraction of the contours of the line objects;
s52, refining the line object binarization image obtained in the step S51 by using a Zhang-Suen algorithm to obtain a line area with the pixel width of 1, and finishing the refinement of the outline;
s53, judging whether each point in the image obtained in step S52 is an end point of a line by examining its neighbourhood, specifically: for each point in the image, counting, among the 9 pixels consisting of the point itself and its eight neighbours, how many have a non-zero value: if the count is 0, the area contains no line and the point is skipped directly; if the count is 1, the point is an isolated point and is skipped; if the count is 2, the point is an end point of a line and is recorded; if the count is 3, the point lies somewhere in the middle of a line and is skipped;
S54, using a depth-first search algorithm to obtain the coordinates of each point on a line, and judging during the depth-first search whether a point is a crossing point, thereby completing the extraction of the lines and the identification of crossing points; a crossing point is identified as follows: if, apart from the predecessor of the current point, there are two or more points with a pixel gray value of 255 in its eight-neighbourhood, the point is considered a crossing point.
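For illustration only, a minimal Python sketch of the end-point test of S53 and the depth-first trace with crossing-point flagging of S54, on a binary skeleton stored as a NumPy array (foreground 255, background 0, with at least a one-pixel background border); all function names are illustrative.

```python
import numpy as np

NBRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find_endpoints(skel):
    """Points of a 1-pixel-wide skeleton whose 3x3 window contains exactly
    two foreground pixels (the point itself plus one neighbour)."""
    ends = []
    h, w = skel.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if skel[r, c] and np.count_nonzero(skel[r - 1:r + 2, c - 1:c + 2]) == 2:
                ends.append((r, c))
    return ends


def trace_line(skel, start, visited):
    """Depth-first trace from an endpoint; returns the ordered points and the
    crossing points (>= 2 foreground neighbours besides the predecessor)."""
    points, crossings = [], []
    stack = [(start, None)]
    while stack:
        (r, c), prev = stack.pop()
        if (r, c) in visited:
            continue
        visited.add((r, c))
        points.append((r, c))
        fg_nbrs = [(r + dr, c + dc) for dr, dc in NBRS
                   if skel[r + dr, c + dc] and (r + dr, c + dc) != prev]
        if len(fg_nbrs) >= 2:
            crossings.append((r, c))
        for n in fg_nbrs:
            if n not in visited:
                stack.append((n, (r, c)))
    return points, crossings
```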
8. The semantic segmentation-based line object extraction method as set forth in claim 7, wherein: the basic criterion for judging in step S6 whether two lines can be combined is, in addition to checking the nearest distance between the break points of the two lines, to check whether the direction angles of the vectors formed from the endpoints of the two lines satisfy the set numerical relationship, which can be expressed as: min{|AC|, |AD|, |BC|, |BD|} = |BC| < min_d, with the differences |α − β|, |β − γ| and |α − γ| each smaller than the corresponding set angle thresholds,
wherein A and B are the two endpoints of one line, C and D are the two endpoints of the other line, B and C are the two nearest endpoints of the two lines, B' and C' are the points n pixels inward from the B end and the C end respectively, α is the direction angle of the vector B'B, β is the direction angle of the vector BC, γ is the direction angle of the vector CC', and min_d is the set distance threshold.
9. The line object extraction method based on semantic segmentation as set forth in claim 8, wherein: in step S6, two line segments are combined into one line by using interpolation to generate a connecting line across the gap between the two lines: based on interpolation, the Bresenham line method is used to calculate the coordinate points of the fitted straight line between the two nearest endpoints of the two lines, and the two line point sets together with the fitted interpolation point set are combined into one point set in the order of line growth; the calculation formula of the Bresenham line method is y = y1 + (x − x1)·(y2 − y1)/(x2 − x1), where (x1, y1) and (x2, y2) are the two nearest endpoints, and rounding optimization is performed on the calculated result.
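For illustration only, a minimal Python sketch of the gap interpolation of claim 9: integer points are generated on the straight line between the two nearest endpoints by stepping along the longer axis and rounding (a rounding-based variant rather than the classical integer-error Bresenham formulation), and the point sets are then concatenated in growth order. Function names are illustrative.

```python
def bridge_points(p, q):
    """Integer points on the straight line from endpoint p to endpoint q
    (exclusive of both ends), obtained by stepping and rounding."""
    (x1, y1), (x2, y2) = p, q
    steps = max(abs(x2 - x1), abs(y2 - y1))
    pts = []
    for i in range(1, steps):
        t = i / steps
        pts.append((round(x1 + t * (x2 - x1)), round(y1 + t * (y2 - y1))))
    return pts


def merge_two_lines(line1, line2):
    """Concatenate line1, the interpolated bridge and line2 in growth order,
    assuming line1 ends where line2 begins (reorder beforehand if needed)."""
    return line1 + bridge_points(line1[-1], line2[0]) + line2
```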
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310442676.5A CN116452809A (en) | 2023-04-23 | 2023-04-23 | Line object extraction method based on semantic segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310442676.5A CN116452809A (en) | 2023-04-23 | 2023-04-23 | Line object extraction method based on semantic segmentation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452809A true CN116452809A (en) | 2023-07-18 |
Family
ID=87125373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310442676.5A Pending CN116452809A (en) | 2023-04-23 | 2023-04-23 | Line object extraction method based on semantic segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452809A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740062A (en) * | 2023-08-14 | 2023-09-12 | 菲特(天津)检测技术有限公司 | Defect detection method and system based on irregular rubber ring |
CN116740062B (en) * | 2023-08-14 | 2023-10-27 | 菲特(天津)检测技术有限公司 | Defect detection method and system based on irregular rubber ring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||