CN115272691A - Training method, recognition method and equipment for steel bar binding state detection model - Google Patents


Info

Publication number
CN115272691A
Authority
CN
China
Prior art keywords
sample
steel bar
image
bar binding
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210604702.5A
Other languages
Chinese (zh)
Inventor
张雷
董国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority to CN202210604702.5A priority Critical patent/CN115272691A/en
Publication of CN115272691A publication Critical patent/CN115272691A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application provides a training method, a recognition method, and equipment for a steel bar binding state detection model, and relates to the technical field of image recognition. The method includes: obtaining a sample data set comprising a plurality of sample steel bar binding point state images, where each image is annotated in advance with the state information of its corresponding binding point; performing image enhancement processing on the sample data set; and performing model training with the enhanced sample data set to obtain the steel bar binding state detection model. The model is thus trained accurately and efficiently, which improves the efficiency of steel bar binding state recognition.

Description

Training method, recognition method and equipment for steel bar binding state detection model
Technical Field
The invention relates to the field of image recognition, in particular to a training method, a recognition method and equipment for a steel bar binding state detection model.
Background
Steel bar binding is one of the key processes in reinforced concrete construction; its goal is to tie a steel bar frame in advance for reinforced concrete members such as constructional columns and cantilever beams. Because the crossing points are numerous (there can be hundreds per square meter), workers must bend over repeatedly for long periods; the labor intensity is high and serious physical strain easily results. Construction robots, which operate in place of workers in such demanding environments, offer high working efficiency, strong repeatability, and strong adaptability to the construction environment, and have become an important direction for upgrading the construction industry.
To complete the binding work, a construction robot must first accurately identify the steel bar binding points. However, the prior art suffers from problems such as low efficiency in recognizing the state of steel bar binding points.
Disclosure of Invention
The invention aims to provide a training method, a recognition method, and equipment for a steel bar binding state detection model that address the defects of the prior art, in particular the low efficiency of recognizing steel bar binding point states.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for training a reinforcement binding state detection model, where the method includes:
obtaining a sample data set, the sample data set comprising: a plurality of sample steel bar binding point state images, each of which is annotated in advance with the state information of its corresponding binding point;
performing image enhancement processing on the sample data set;
and performing model training by adopting the sample data set after image enhancement to obtain the reinforcement binding state detection model.
Optionally, the steel bar binding state detection model includes: a feature extraction network, a pooling network, a feature fusion network, and a prediction network; the performing model training with the sample data set after image enhancement processing to obtain the steel bar binding state detection model includes:
performing feature extraction on the state image of each sample steel bar binding point by adopting the feature extraction network to obtain sample image features of multiple scales;
processing the sample image characteristics of the minimum scale in the multiple scales by adopting the pooling network to obtain sample semantic information;
processing the sample image characteristics of at least two target scales in the multiple scales by adopting the characteristic fusion network to obtain sample positioning information corresponding to the at least two target scales, and fusing the sample positioning information and the sample semantic information to obtain fused sample characteristics;
processing the fusion sample characteristics by adopting the prediction network to obtain a prediction result of a binding point state corresponding to each sample steel bar binding point state image;
calculating a loss function value according to the prediction result and the annotated state information corresponding to each sample steel bar binding point state image;
and modifying the model parameters of the steel bar binding state detection model according to the loss function values, and performing model training again until a preset iteration stop condition is reached.
Optionally, the steel bar binding state detection model further includes: and a clustering network, wherein before the characteristic extraction network is adopted to extract the characteristics of the state image of each sample steel bar binding point to obtain the characteristics of the sample images with multiple scales, the method further comprises the following steps:
processing the steel bar binding point state image of each sample by adopting the clustering network to obtain a steel bar binding point region image in the steel bar binding point state image of each sample;
the method for extracting the characteristics of the state image of each sample steel bar binding point by adopting the characteristic extraction network to obtain the characteristics of the sample images with a plurality of scales comprises the following steps:
and extracting the characteristics of the regional image of the steel bar binding point in the state image of each steel bar binding point of the sample by adopting the characteristic extraction network to obtain the characteristics of the sample images of multiple scales.
Optionally, the feature extraction network comprises: the feature extraction layers of the multiple scales are connected in sequence, wherein the feature extraction layers of the at least two target scales in the multiple scales comprise: at least one feature extraction module; the output of the last feature extraction module in the at least one feature extraction module is the output of the corresponding feature extraction layer.
Optionally, in the steel bar binding state detection model, an input convolution layer is arranged at an input end of the pooling network, and the input convolution layer is a depth separable convolution;
before the step of processing the sample image features of the minimum scale in the multiple scales by using the pooling network to obtain the sample semantic information, the method further comprises the following steps:
performing convolution processing on the sample image features of the minimum scale by using the input convolution layer to obtain the sample image features of the minimum scale after convolution processing;
the processing the sample image feature of the minimum scale in the multiple scales by using the pooling network to obtain sample semantic information comprises the following steps:
and processing the sample image features of the minimum scale after convolution processing by adopting the pooling network to obtain the sample semantic information.
Optionally, in the steel bar binding state detection model, an output convolution layer is further arranged at an output end of the pooling network, and the output convolution layer is a depth separable convolution;
before the fusing the sample positioning information and the sample semantic information to obtain the fused sample characteristics, the method further includes:
processing the sample semantic information by adopting the output convolution layer to obtain the sample semantic information after convolution processing;
the fusing the sample positioning information and the sample semantic information to obtain fused sample characteristics comprises:
and fusing the sample positioning information and the sample semantic information after convolution processing to obtain the fused sample characteristics.
Optionally, the performing image enhancement processing on the sample data set includes:
performing at least one image enhancement process on the sample data set; the at least one image enhancement process comprises: at least one of horizontal flipping, rotation, adding noise information, brightness adjustment, and color adjustment.
In a second aspect, an embodiment of the present application provides a binding point state identification method, where the method includes:
acquiring an acquired image of a steel bar structure, wherein the steel bar structure is built in advance by adopting a plurality of steel bars;
and processing the collected image with a steel bar binding state detection model trained by any method of the first aspect, to obtain the binding state of each binding point in the steel bar structure.
In a third aspect, an embodiment of the present application provides a training apparatus, including: a training processor and a training storage medium connected by a bus, where the training storage medium stores program instructions executable by the training processor, and the training processor calls the stored program to execute the steps of the training method for the steel bar binding state detection model of the first aspect.
In a fourth aspect, an embodiment of the present application provides an identification device, including: an identification processor and an identification storage medium connected by a bus, where the identification storage medium stores program instructions executable by the identification processor, and the identification processor calls the stored program to execute the steps of the binding point state identification method of the second aspect.
Compared with the prior art, the method has the following beneficial effects:
the embodiment of the application provides a training method, an identification method and equipment for a steel bar binding state detection model, wherein the method comprises the following steps of obtaining a sample data set, wherein the sample data set comprises: the method comprises the following steps that a plurality of sample steel bar binding point state images are obtained, and state information of corresponding binding points is marked in each sample steel bar binding point state image in advance; carrying out image enhancement processing on the sample data set; and performing model training by adopting the sample data set after image enhancement to obtain a reinforcement binding state detection model. Therefore, the steel bar binding state detection model is accurately and efficiently trained, and the efficiency of steel bar binding state identification is improved conveniently.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a training method of a steel bar binding state detection model provided in the present application;
FIG. 2A is a schematic flow chart of a method for training a steel bar binding state detection model using a sample data set after image enhancement processing according to the present application;
FIG. 2B is a schematic structural diagram of a YOLOv4 model;
FIG. 2C is a schematic diagram of a backbone feature extraction network (CSP-Darknet53);
FIG. 2D is a schematic diagram of a spatial pyramid pooling network (SPP);
FIG. 2E is a schematic diagram of a Path Aggregation Network (PAN);
fig. 3 is a schematic flow chart of a sample data clustering method according to an embodiment of the present application;
FIG. 4 is a schematic flow diagram of an input convolution method for a pooled network;
fig. 5A is a schematic flowchart of an output convolution method of a pooling network according to an embodiment of the present application;
fig. 5B is a schematic structural diagram of a modified lightweight model based on the YOLOv4 model provided in the present application;
fig. 6 is a schematic flowchart of a binding point state identification method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training apparatus for a reinforcement bar binding state detection model according to an embodiment of the present disclosure;
fig. 8 is a schematic view of a binding point state recognition device provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a training apparatus provided in an embodiment of the present application;
fig. 10 is a schematic diagram of an identification device according to an embodiment of the present application.
Icon: 701-acquisition module, 702-enhancement module, 703-training module, 801-acquisition module, 802-recognition module, 901-training processor, 902-training storage medium, 1001-recognition processor and 1002-recognition storage medium.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are only used to distinguish one description from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
In construction work, steel bar binding is one of the key processes of reinforced concrete construction; its goal is to tie a steel bar frame in advance for reinforced concrete members such as constructional columns and cantilever beams. Because the crossing points are numerous (there can be hundreds per square meter), workers must bend over repeatedly for long periods; the labor intensity is high and serious physical strain easily results. Construction robots, which operate in place of workers in such demanding environments, offer high working efficiency, strong repeatability, and strong adaptability to the construction environment, and have become an important direction for upgrading the construction industry.
To complete the steel bar binding operation, a construction robot must first accurately identify the steel bar binding points. To this end, the application provides a training method, a recognition method, and equipment for a steel bar binding state detection model.
As follows, a method for training a reinforcement state detection model provided by the present application will be explained first by way of a specific example. Fig. 1 is a schematic flow chart of a training method of a reinforcement bar binding state detection model provided in the present application. As shown in fig. 1, the method includes:
s101, acquiring a sample data set.
When building the sample data set, a camera can be used to photograph the steel bar binding points at different angles and distances. After removing images in which the binding point is blurred or incomplete, a number of pictures are selected and resized to a uniform size (for example, 512 × 512 pixels in jpg format) to form the sample data set. Illustratively, the camera may be a monocular or binocular camera.
The sample data set comprises a plurality of sample steel bar binding point state images, each annotated in advance with the state information of the corresponding binding point. When the sample data set is produced, the states of the steel bar binding points in each image must be annotated with labels. Binding points are divided into two categories, tied points and untied points, and are annotated using annotation software (e.g., LabelImg).
Illustratively, tied points are labeled "binding" and untied points are labeled "bar_cross". The labels follow the unified PASCAL VOC2007 format and are stored as .xml files; the .xml files correspond one-to-one with the images in the data set and together form the sample data set for subsequent recognition of the steel bar binding point state.
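The PASCAL VOC (.xml) annotations described above can be read with a short parser. This is a minimal standard-library sketch following the VOC2007 field layout; the function name is illustrative, and only the label ("binding" or "bar_cross") and bounding box are extracted.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse a PASCAL VOC (.xml) annotation file and return a list of
    (label, xmin, ymin, xmax, ymax) tuples, one per annotated tie point."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")  # "binding" (tied) or "bar_cross" (untied)
        bb = obj.find("bndbox")
        boxes.append((label,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes
```

Each returned tuple pairs one tie-point label with its pixel-coordinate bounding box, matching the one-to-one correspondence between .xml files and images in the data set.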
And S102, carrying out image enhancement processing on the sample data set.
This step prevents the overfitting that a too-small sample data set causes in target detection, and improves the model's adaptability to complex on-site environments. Data enhancement is applied to the sample data set, expanding the original images into a larger set and producing the enhanced sample data set. Expanding the sample data set in multiple ways makes the subsequent model training more accurate.
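The enhancement operations listed in the first aspect (horizontal flipping, rotation, adding noise, brightness adjustment) can be sketched with NumPy as follows. The specific parameters (noise sigma 10, brightness delta 40, 90-degree rotation) are illustrative assumptions, not values from the application.

```python
import numpy as np

def augment(image, rng=None):
    """Return augmented variants of one H x W x C uint8 image:
    horizontal flip, 90-degree rotation, additive Gaussian noise, and a
    brightness shift. Parameter values are illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = {
        "hflip": image[:, ::-1].copy(),   # mirror left-right
        "rot90": np.rot90(image).copy(),  # rotate 90 degrees
    }
    noisy = image.astype(np.float32) + rng.normal(0.0, 10.0, image.shape)
    out["noise"] = np.clip(noisy, 0, 255).astype(np.uint8)
    bright = image.astype(np.float32) + 40.0
    out["bright"] = np.clip(bright, 0, 255).astype(np.uint8)
    return out
```

Applied to every sample image, these operations multiply the data set size several-fold; color adjustment would follow the same clip-and-cast pattern per channel.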
And S103, performing model training by using the sample data set subjected to image enhancement processing to obtain a reinforcement binding state detection model.
In this embodiment, a pre-trained model is used via transfer learning when recognizing the steel bar binding point state; this improves gradient descent convergence and avoids vanishing gradients. The model parameters are obtained by training repeatedly on the enhanced sample data set.
Illustratively, the model training of this embodiment may run for 100 iterations: the first 50 iterations freeze part of the pre-trained network, with the learning rate set to 0.001 and the batch size set to 16; the last 50 iterations unfreeze that part, with the learning rate set to 0.0001 and the batch size set to 8. The model training is thus completed efficiently and accurately, yielding the steel bar binding state detection model.
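The two-stage freeze/unfreeze schedule above can be expressed as a small configuration function. This is a sketch of the stated hyperparameters only, not the authors' training code; the function name and dictionary keys are illustrative.

```python
def training_config(epoch):
    """Two-stage transfer-learning schedule from the description:
    epochs 0-49 freeze the pre-trained backbone (lr 0.001, batch size 16);
    epochs 50-99 unfreeze it (lr 0.0001, batch size 8)."""
    if epoch < 50:
        return {"freeze_backbone": True, "lr": 1e-3, "batch_size": 16}
    return {"freeze_backbone": False, "lr": 1e-4, "batch_size": 8}
```

A training loop would query this per epoch, setting `requires_grad` on backbone parameters and rebuilding the data loader when the batch size changes.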
To sum up, the training method for the steel bar binding state detection model provided by this embodiment obtains a sample data set comprising a plurality of sample steel bar binding point state images, each annotated in advance with the state information of its corresponding binding point; performs image enhancement processing on the sample data set; and performs model training with the enhanced sample data set to obtain the steel bar binding state detection model. The model is thus trained accurately and efficiently, which improves the efficiency of steel bar binding state recognition.
Fig. 2A is a schematic flowchart of a method for training a reinforcement binding state detection model by using a sample data set after image enhancement processing according to the present application. The steel bar binding state detection model comprises: feature extraction networks, pooling networks, feature fusion networks, and prediction networks.
The steel bar binding state detection model in the present application will be explained below taking the YOLOv4 model as an example. Note that the application does not limit the steel bar binding state detection model; any model that can complete steel bar binding state detection falls within the protection scope of the application. The YOLOv4 model is a target detection model, and fig. 2B is a schematic structural diagram of it. As shown in fig. 2B, the YOLOv4 model includes: a backbone feature extraction network (CSP-Darknet53, which includes the model input), a Neck module (comprising a spatial pyramid pooling network (SPP) and a Path Aggregation Network (PAN)), and a prediction network (YOLO Head). When a target detection picture is input, the backbone network extracts feature information, the Neck module repeatedly extracts features from the backbone's outputs and fuses semantic information with positioning information to realize multi-scale target detection, and the prediction network predicts the position, class, and confidence of targets in the image. To accommodate various data sets, the input to YOLOv4 has two formats: 416 × 416 × 3 and 608 × 608 × 3.
Fig. 2C is a schematic structural diagram of the backbone feature extraction network (CSP-Darknet53). Referring to fig. 2C, CSP-Darknet53 builds on Darknet53 and is divided into 5 residual blocks, with the residual units in each block composed of CSP modules. A CSP module has two branches: the trunk branch contains a CBM module and residual (Res unit) modules, and the shortcut branch contains a CBM module. This realizes a cross-stage splicing of feature maps that reduces computation while improving accuracy.
The Neck module includes a spatial pyramid pooling network (SPP) and a Path Aggregation Network (PAN) so that feature extraction is more thorough. Fig. 2D is a schematic structural diagram of the SPP. As shown in fig. 2D, the SPP replaces the original single max pooling layer with a combination of pooling layers at 4 scales: the feature map undergoes spatial pyramid pooling by max pooling layers with kernels of 13 × 13, 7 × 7, 5 × 5, and 1 × 1, yielding feature-layer information at different scales and effectively enlarging the receptive field of the network model.
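The SPP structure just described can be sketched in PyTorch. This is a minimal sketch assuming stride-1, "same"-padded max pooling so the spatial size is preserved and the four branches can be concatenated channel-wise; the 1 × 1 branch is simply the identity.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel max pooling at four kernel sizes
    (13, 7, 5, 1), each with stride 1 and half-kernel padding so the
    spatial size is unchanged; outputs are concatenated along channels,
    enlarging the receptive field without shrinking the map."""
    def __init__(self, kernel_sizes=(13, 7, 5, 1)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```

Fed the smallest-scale backbone output, the concatenation quadruples the channel count, which is why a convolution typically follows to compress it again.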
Fig. 2E is a schematic structural diagram of the Path Aggregation Network (PAN). As shown in fig. 2E, the PAN adds a bottom-up feature pyramid structure on top of the FPN. The FPN module upsamples the feature map, passing deep semantic information to the shallow network. Although high-level semantic information is transmitted this way, shallow feature information is lost after passing through many network layers, which degrades the prediction of small objects. The PAN structure therefore convolves and downsamples the feature map and integrates it with the corresponding upper-level feature map of the FPN. Repeated feature extraction strengthens the network's feature extraction capability. By applying the feature pyramid operation twice, the PAN improves the utilization of semantic and positioning information and helps the network detect objects at multiple scales.
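A minimal two-level sketch of this FPN/PAN fusion follows. The channel counts, layer choices, and module name are illustrative assumptions; the full PAN of fig. 2E spans three levels and additional convolutions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniPAN(nn.Module):
    """Two-level path aggregation sketch: a top-down pass upsamples deep
    features and concatenates them with shallow features (FPN step), then
    a bottom-up pass downsamples the fused shallow map and concatenates it
    back with the deep map (PAN step)."""
    def __init__(self, shallow_ch=256, deep_ch=512):
        super().__init__()
        self.lateral = nn.Conv2d(deep_ch, shallow_ch, 1)  # channel-align before upsampling
        self.down = nn.Conv2d(shallow_ch * 2, shallow_ch, 3,
                              stride=2, padding=1)        # stride-2 downsampling conv

    def forward(self, shallow, deep):
        up = F.interpolate(self.lateral(deep), scale_factor=2, mode="nearest")
        fused_shallow = torch.cat([shallow, up], dim=1)   # FPN fusion (semantic -> shallow)
        down = self.down(fused_shallow)
        fused_deep = torch.cat([deep, down], dim=1)       # PAN fusion (positioning -> deep)
        return fused_shallow, fused_deep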
Specifically, as shown in fig. 2A, the performing model training by using the sample data set after the image enhancement processing in S103 to obtain the reinforcement state detection model includes:
s201, extracting the characteristics of the state image of each sample steel bar binding point by adopting a characteristic extraction network to obtain the characteristics of the sample images with multiple scales.
The feature extraction network may be the CSP-Darknet53 network of the YOLOv4 model. CSP-Darknet53 repeatedly extracts features from each sample steel bar binding point state image to obtain sample image features at multiple scales. The network is divided into 5 residual modules, and the residual units in each block realize a cross-stage splicing of the sample image features, reducing computation while improving accuracy.
For example, when the input sample image size is 416 × 416 × 3, feature extraction through the feature extraction network yields sample image features at multiple scales, with output sizes of 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024; there may be multiple feature maps of each size.
S202, processing the sample image characteristics of the minimum scale in the multiple scales by adopting a pooling network to obtain sample semantic information.
The pooling network may be the spatial pyramid pooling network (SPP) of the YOLOv4 model. It comprises 4 pooling layers, whose kernels may be 13 × 13, 7 × 7, 5 × 5, and 1 × 1. The sample feature map undergoes spatial pyramid pooling formed by max pooling layers at these different scales, yielding feature-layer information at different scales and effectively enlarging the receptive field of the network model.
One output of the feature extraction network is the input of the pooling network: of the sample image features at multiple scales, the features at the smallest scale are fed into the pooling network, which processes them to obtain the sample semantic information. The sample semantic information includes the color, texture, shape, and similar properties of the sample image.
S203, processing the sample image characteristics of at least two target scales in the multiple scales by adopting a characteristic fusion network to obtain sample positioning information corresponding to the at least two target scales, and fusing the sample positioning information and the sample semantic information to obtain fusion sample characteristics.
The feature fusion network may be the path aggregation network of the YOLOv4 model. The Path Aggregation Network (PAN) comprises bottom-up and top-down feature pyramid structures. It upsamples the sample features and passes the sample semantic information to the shallow network. However, shallow feature information is lost after passing through many network layers, which degrades the prediction of small objects. The path aggregation network therefore also convolves and downsamples the sample features to obtain the sample positioning information, which is then integrated with the sample semantic information. Repeated feature extraction strengthens the network's feature extraction capability. By applying the feature pyramid operation twice, the network improves the utilization of semantic and positioning information and helps detect objects at multiple scales.
For example, when the input sample image size is 416 × 416 × 3, after the pooling network, the feature fusion network, and the fusion of sample positioning information with sample semantic information, sample image features at multiple scales with output sizes of 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024 may be obtained; there may be multiple feature maps of each size.
And S204, processing the fusion sample characteristics by adopting a prediction network to obtain a prediction result of the binding point state corresponding to the reinforcing steel bar binding point state image of each sample.
The prediction network may be a prediction network in the YOLOv4 model (YOLO Head). The output of the feature fusion network is the input of the prediction network.
Exemplarily, taking feature-fusion outputs of sizes 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024, the prediction network predicts results on three feature maps of different scales. Prior boxes of 9 sizes are assigned across the grid points of the three feature maps. The prior boxes are decoded to determine whether they contain a target and, if so, its class; they are then adjusted by the position parameters to obtain prediction boxes, which are finally screened against a set confidence threshold to yield the final prediction boxes.
Specifically, each grid cell produces 3 prediction boxes, and each prediction-box result comprises position (4 dimensions), confidence (1 dimension), and class (N dimensions). The center coordinates and the width and height of a prediction box are computed by the following formula (1):
bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^(tw), bh = ph·e^(th) (1)
Here bx, by, bw, and bh are the center coordinates and the width and height of the prediction box; cx and cy are the horizontal and vertical offsets of the grid cell containing the target center relative to the top-left origin of the image; tx and ty are the predicted offsets of the box center, with σ(tx) and σ(ty) their normalized (sigmoid) values; pw and ph are the width and height of the prior box; and tw and th are the scaling factors for the prediction box's width and height.
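The decoding in formula (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the disclosure; the function name `decode_box` and the grid-unit convention are assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode one raw prediction into a box per formula (1).

    (cx, cy) is the offset of the grid cell containing the target center,
    (pw, ph) the prior-box size; results are in grid units.
    """
    bx = sigmoid(tx) + cx        # sigma(tx) keeps the center offset inside (0, 1)
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)         # prior width scaled by e^tw
    bh = ph * np.exp(th)
    return bx, by, bw, bh
```

With all raw offsets at zero the box sits half a cell past the grid offset at the prior's own size, e.g. `decode_box(0, 0, 0, 0, cx=3, cy=4, pw=2, ph=5)` gives (3.5, 4.5, 2.0, 5.0).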
The confidence expresses the probability that a target lies in the grid cell; it is computed by the following formula (2):
Confidence = Pr(Object) × IOU (2)
Here Pr(Object) indicates whether a target is in the grid cell: it is 1 if a target exists and 0 otherwise; IOU is the intersection-over-union of the ground-truth box and the prediction box.
In this embodiment, the images in the data set are 512 × 512; to increase the operation speed, the input size of the YOLOv4 model is 416 × 416 × 3. Prediction generates 52 × 52 × 3 + 26 × 26 × 3 + 13 × 13 × 3 = 10647 bounding boxes in total. Each bounding box receives a confidence score, and the prediction box with the highest score is retained to complete the identification.
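The 10647 total can be reproduced directly: each detection scale contributes (input size / stride)² grid cells with 3 prior boxes each. A minimal sketch, with the function name being an assumption:

```python
def total_boxes(input_size=416, strides=(8, 16, 32), anchors_per_cell=3):
    """Raw bounding boxes emitted across the 52x52, 26x26, and 13x13 scales."""
    return sum((input_size // s) ** 2 * anchors_per_cell for s in strides)

print(total_boxes())  # 52*52*3 + 26*26*3 + 13*13*3 = 10647
```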
And S205, calculating a loss function value according to the prediction result and the mark state information corresponding to the state image of each sample steel bar binding point.
For each sample steel bar binding point state image there is a corresponding prediction result and the original marked state information; comparing the two yields the loss function value.
S206, modifying the model parameters of the steel bar binding state detection model according to the loss function values, and performing model training again until a preset iteration stop condition is reached.
Model parameters of corresponding types are preset in the steel bar binding state detection model. And obtaining corresponding model parameter values according to the loss function values. And model training is carried out again to obtain a plurality of groups of corresponding loss function values and model parameter values. And fitting the plurality of groups of corresponding loss function values and model parameter values to obtain a steel bar binding state detection model until a preset iteration stop condition is reached. Exemplarily, the preset iteration stop condition may be setting a preset iteration number, and stopping the iteration if the iteration number exceeds the preset iteration number; a preset fitting threshold may also be set, and if the degree of fit is higher than the fitting threshold, the iteration is stopped.
In summary, in this embodiment, a feature extraction network is adopted to perform feature extraction on the state image of each sample reinforcement binding point, so as to obtain sample image features of multiple scales; processing the sample image characteristics of the minimum scale in the multiple scales by adopting a pooling network to obtain sample semantic information; processing sample image features of at least two target scales in the multiple scales by adopting a feature fusion network to obtain sample positioning information corresponding to the at least two target scales, and fusing the sample positioning information and sample semantic information to obtain fusion sample features; processing the characteristics of the fusion sample by adopting a prediction network to obtain a prediction result of a binding point state corresponding to the state image of the binding point of the reinforcing steel bar of each sample; calculating a loss function value according to the prediction result and the marking state information corresponding to the state image of each sample steel bar binding point; and modifying the model parameters of the steel bar binding state detection model according to the loss function values, and performing model training again until a preset iteration stop condition is reached. Therefore, the preset steel bar binding state detection model is trained for multiple times to obtain an accurate steel bar binding state detection model.
Fig. 3 is a schematic flowchart of a sample data clustering method according to an embodiment of the present application. As shown in fig. 3, the steel bar binding state detection model further includes: and (5) clustering the network. In S201, before the feature extraction network is used to perform feature extraction on the state image of each sample steel bar binding point to obtain sample image features of multiple scales, the method may further include:
s301, processing each sample steel bar binding point state image by adopting a clustering network to obtain a steel bar binding point region image in each sample steel bar binding point state image.
Because the detection targets in the collected sample images are small, the clustering network is used to cluster the prior boxes of the sample image data set, so that the original prior boxes do not degrade the network's detection performance.
Illustratively, the clustering network may be a K-means clustering algorithm. The partitioning method of the K-means clustering algorithm comprises the following steps:
(1) Choose any k data points in the sample image data as the initial cluster centers; (2) assign each data sample Xi to the class whose cluster center has the smallest Euclidean distance to it; (3) recalculate and correct the cluster center of each class C; (4) after repeated calculation the cluster centers stabilize, each with minimal distance to the data in its cluster, and clustering is complete. When the clustering algorithm is applied to a target detection model to obtain prior boxes, the Euclidean distance is replaced by the similarity distance between a labeled box and each cluster-center prior box, computed by the following formula (3):
d(center,other)=1-IOU(center,other) (3)
Here center denotes the cluster-center prior box, other denotes the prior box to be assigned, and IOU is the intersection-over-union between the two boxes. Recalculating a cluster center means averaging the widths and heights of the labeled boxes in each class, as in the following formula (4):
Xi = (1/Ni) Σ xij, Yi = (1/Ni) Σ yij, j = 1, …, Ni (4)
Here Xi and Yi are the width and height of the cluster center of the i-th class after re-clustering, Ni is the number of labeled boxes in the i-th class, and xij and yij are the width and height of the j-th labeled box in the i-th class.
The cluster center of each class is recalculated with the above formula, and steps (2) and (3) are repeated until the cluster centers no longer change; the widths and heights of the resulting cluster centers are the widths and heights of the prior boxes, which delimit the steel bar binding point region images.
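Steps (1)-(4), with the IOU distance of formula (3) and the averaging of formula (4), can be sketched as follows. The helper names are assumptions, and the IOU uses the usual anchor-clustering convention of aligning all boxes at a common corner:

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) pairs, with all boxes aligned at a common corner."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = (boxes[:, None, 0] * boxes[:, None, 1]
             + centers[None, :, 0] * centers[None, :, 1] - inter)
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster labeled-box (w, h) pairs with the distance d = 1 - IOU of formula (3)."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Step (2): assign each box to the nearest cluster center.
        assign = np.argmin(1.0 - iou_wh(boxes, centers), axis=1)
        # Step (3), formula (4): recompute each center as the mean width/height.
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```

On two well-separated groups of box sizes the centers converge to the per-group mean widths and heights, which then serve as the prior-box sizes.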
In S201, a feature extraction network is used to perform feature extraction on the state image of each sample reinforcement binding point to obtain sample image features of multiple scales, including:
s302, extracting the characteristics of the steel bar binding point region image in each sample steel bar binding point state image by adopting a characteristic extraction network to obtain sample image characteristics of multiple scales.
A steel bar binding point region image is obtained from each sample steel bar binding point state image through clustering. Feature extraction therefore no longer needs to be performed on the whole sample steel bar binding point state image; it is performed only on the steel bar binding point region image within each state image to obtain sample image features of multiple scales, which improves the efficiency and precision of feature extraction.
In summary, in this embodiment, a clustering network is used to process each sample steel bar binding point state image to obtain a steel bar binding point region image in each sample steel bar binding point state image; and (3) extracting the characteristics of the reinforcing steel bar binding point region image in each sample reinforcing steel bar binding point state image by adopting a characteristic extraction network to obtain sample image characteristics of multiple scales. Thus, the efficiency and accuracy of feature extraction are improved.
On the basis of the embodiment shown in fig. 2A, the feature extraction network comprises: the feature extraction layers of a plurality of scales that connect gradually, wherein, the feature extraction layer of at least two target scales in a plurality of scales includes: at least one feature extraction module; the output of the last feature extraction module in the at least one feature extraction module is the output of the corresponding feature extraction layer.
For example, when the size of the input sample image is 416 × 416 × 3, the sizes of the feature extraction layers passing through the plurality of scales may be 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024, and the number of sample images of each type of size may be plural. The feature extraction layer having the size of 52 × 52 × 256 and 26 × 26 × 512 may include: at least one feature extraction module; the output of the last feature extraction module of the plurality of feature extraction modules of size 52 x 256 may be the output of the corresponding feature extraction layer.
Furthermore, the recognition speed should be improved to suit on-site steel bar binding work while high recognition accuracy is maintained. To improve the accuracy and speed of feature extraction, this embodiment replaces the CSPDarkNet53 backbone of the YOLOv4 model with a MobileNetv2 network, making the network lightweight and reducing the number of network model parameters. For example, the feature layers at the 6th, 13th, and 17th layers of the MobileNetv2 network can replace the last three feature layers of CSPDarkNet53 as the outputs of the feature extraction network.
In the MobileNetv2 network, conventional convolution is replaced by depthwise separable convolution (Depthwise Separable Convolution): a depthwise convolution is applied channel by channel, followed by a pointwise convolution across all channels, which greatly reduces the number of network parameters.
For a conventional convolution with an input layer of size Df × Df × M, applying N convolution kernels of size Dk × Dk × M yields an output layer of size Dp × Dp × N. The computation cost is given by the following formula (5):
Dk × Dk × M × N × Dp × Dp (5)
the calculation formula of the corresponding model parameter number is shown in the following formula (6):
Dk × Dk × M × N (6)
For a depthwise separable convolution with the same Df × Df × M input layer, M depthwise kernels of size Dk × Dk × 1 are first applied channel by channel, producing a Dp × Dp × M output; then N pointwise kernels of size 1 × 1 × M integrate features across channels, finally yielding a Dp × Dp × N feature map. The computation cost is given by the following formula (7):
Dk × Dk × M × Dp × Dp + M × N × Dp × Dp (7)
the formula for calculating the corresponding model parameters is shown in the following equation (8):
Dk × Dk × M + M × N (8)
the deep separable convolution can effectively solve the problems of high calculation complexity, low recognition efficiency and the like caused by more network model channels. Deep convolution divides input into groups, each group is subjected to convolution, only the internal information of each group is subjected to feature extraction, but the information between channels is shielded, so point-by-point convolution is required to supplement, the information of each channel is exchanged, and all feature information of an input layer is extracted.
Two structures, Inverted Residuals and Linear Bottlenecks, are introduced to improve the network's representational capability, and a linear activation function replaces the ReLU activation function to avoid the information loss ReLU can cause. The convolution in the inverted residual structure first raises and then lowers the dimension, the opposite of the ordinary residual structure. Performing the dimension-raising 1 × 1 convolution first captures more image features while reducing the network's parameters and computation.
In summary, in this embodiment, the feature extraction network includes: the feature extraction layers of a plurality of scales that connect gradually, wherein, the feature extraction layer of at least two target scales in a plurality of scales includes: at least one feature extraction module; the output of the last feature extraction module in the at least one feature extraction module is the output of the corresponding feature extraction layer. Therefore, accuracy and speed of feature extraction are improved, and recognition speed is improved to adapt to reinforcement binding work while high recognition accuracy is guaranteed.
Fig. 4 is a flow chart of an input convolution method of a pooling network. In the steel bar binding state detection model, an input convolution layer is arranged at the input end of the pooling network and is a depth separable convolution.
As shown in fig. 4, before the step S202 of processing the sample image feature of the smallest scale among the multiple scales by using the pooling network to obtain the sample semantic information, the method further includes:
s401, carrying out convolution processing on the sample image features of the minimum scale by adopting the input convolution layer to obtain the sample image features of the minimum scale after the convolution processing.
The pooling network may be the spatial pyramid pooling (SPP) structure in the YOLOv4 model. An input convolutional layer is arranged at the input end of the SPP network, and the original 3 × 3 convolution in that layer is replaced with a depthwise separable convolution. This avoids the extra computation brought by the larger 3 × 3 kernel and raises the model's recognition speed by speeding up the input convolution. The depthwise separable convolution in the input convolutional layer processes the smallest-scale sample image features to obtain the convolved smallest-scale sample image features.
In S202, processing the sample image feature of the smallest scale among the multiple scales by using the pooling network to obtain sample semantic information, including:
s402, processing the sample image features of the minimum scale after convolution processing by adopting a pooling network to obtain sample semantic information.
Further, after the depth separable convolution processing is carried out on the sample image characteristic of the minimum scale, the sample image characteristic of the minimum scale after the depth separable convolution processing is obtained. And finishing the input of the sample image features of the minimum scale, and processing the sample image features of the minimum scale after convolution processing by adopting a pooling network to obtain sample semantic information.
In summary, in this embodiment, the input convolution layer is used to perform convolution processing on the sample image feature of the minimum scale, so as to obtain the sample image feature of the minimum scale after convolution processing; and processing the sample image characteristics of the minimum scale after convolution processing by adopting a pooling network to obtain sample semantic information. Therefore, the identification speed of the model is improved by improving the input convolution calculation speed.
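The SPP module discussed here can be illustrated with a minimal NumPy sketch; in YOLOv4 the SPP block concatenates the input with stride-1 max pools of kernel sizes 5, 9, and 13. The helpers below are assumptions for illustration and omit the surrounding convolutional layers:

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pool with 'same' padding over an H x W x C feature map."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.empty_like(x, dtype=np.float32)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernels=(5, 9, 13)):
    """Concatenate the input with its multi-kernel max pools along the channel axis."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in kernels], axis=-1)
```

With the three pooling branches plus the identity branch, a 13 × 13 × 512 input becomes 13 × 13 × 2048, which is why the convolution after SPP still matters for cost.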
Fig. 5A is a schematic flowchart of an output convolution method of a pooling network according to an embodiment of the present disclosure. In the steel bar binding state detection model, an output convolution layer is further arranged at the output end of the pooling network, and the output convolution layer is a depth separable convolution.
As shown in fig. 5A, before the fusing the sample positioning information and the sample semantic information in S203 to obtain the fused sample feature, the method further includes:
s501, processing the sample semantic information by adopting the output convolution layer to obtain the sample semantic information after convolution processing.
The pooling network may be the spatial pyramid pooling (SPP) structure in the YOLOv4 model. An output convolutional layer is arranged at the output end of the SPP network, and the original 3 × 3 convolution in that layer is replaced with a depthwise separable convolution. This avoids the extra computation brought by the larger 3 × 3 kernel and raises the model's recognition speed by speeding up the output convolution. The depthwise separable convolution in the output convolutional layer processes the sample semantic information to obtain the convolved sample semantic information.
In S203, the fusion of the sample positioning information and the sample semantic information to obtain a fusion sample feature includes:
s502, fusing the sample positioning information and the sample semantic information after the convolution processing to obtain fusion sample characteristics.
Further, after the sample semantic information is subjected to deep separable convolution processing to obtain the sample semantic information subjected to the deep separable convolution processing. The pooling network can output the sample semantic information after the deep separable convolution processing to the feature fusion network. And the feature fusion network fuses the sample positioning information and the sample semantic information after the convolution processing to obtain the fusion sample features.
In summary, in this embodiment, the output convolution layer is adopted to process the sample semantic information, so as to obtain the sample semantic information after convolution processing; and fusing the sample positioning information and the sample semantic information after the convolution processing to obtain the fused sample characteristics. Therefore, the identification speed of the model is improved by improving the output convolution calculation speed.
On the basis of all the above embodiments, the present application provides a lightweight model improved from the YOLOv4 model. Fig. 5B is a schematic structural diagram of this lightweight model. As shown in fig. 5B, the CSPDarkNet53 network is changed to a MobileNetv2 network, and the original 3 × 3 convolutions in the input and output convolutional layers of the spatial pyramid pooling network SPP are replaced with depthwise separable convolutions. The network is thus lightweight, and the number of network model parameters is reduced.
On the basis of the embodiment described in fig. 1, the image enhancement processing on the sample data set in S102 includes:
performing at least one image enhancement treatment on the sample data set; the at least one image enhancement process includes: at least one of horizontal flipping, rotation, adding noise information, brightness adjustment, and color adjustment.
By enhancing the images, the original data set is effectively expanded, training sample data is increased, the generalization capability of the model is increased, and the improvement of the anti-interference capability of the model is facilitated by adding noise.
By way of example, the present embodiment mainly performs image processing in the following ways:
(1) Horizontal flipping: an image can be flipped in the horizontal or the vertical direction. Vertical flipping may change the nature of the target and has certain limitations, so horizontal flipping is used to enhance the images.
(2) Rotation: rotation may be clockwise or counterclockwise; the rotation angle changes the orientation of the target and thus its position in the image, achieving the enhancement effect.
(3) Adding noise information: noise information is introduced into the image of the data set, so that the anti-interference performance of the model can be improved. For example, the noise information may be gaussian noise information.
(4) Brightness adjustment: image brightness is changed by changing the pixel values. In this embodiment the brightness is varied randomly so that the images come closer to the real lighting conditions of the construction environment, further improving the model's ability to adapt to different environmental scenes.
(5) Color adjustment: the color of the image is changed by adjusting the three parameters of the HSV color model, namely hue, saturation, and value, which further improves the model's generalization to color variation.
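The augmentations (1)-(4) above can be sketched with NumPy alone; the HSV color adjustment in (5) is omitted because it typically relies on a library such as OpenCV. Function names and parameter defaults are assumptions:

```python
import numpy as np

def hflip(img):
    """(1) Horizontal flip; vertical flip is avoided as it can change object semantics."""
    return img[:, ::-1]

def rotate90_ccw(img):
    """(2) A 90-degree counterclockwise rotation; arbitrary angles need interpolation."""
    return np.rot90(img, k=1, axes=(0, 1))

def add_gaussian_noise(img, sigma=10.0, seed=None):
    """(3) Additive Gaussian noise, clipped back to the valid 8-bit range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def adjust_brightness(img, factor):
    """(4) Scale pixel values; the factor is drawn randomly during training."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```

During training one or more of these transforms would be applied at random to each sample image, with the bounding-box labels transformed accordingly for flips and rotations.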
In summary, in this embodiment, at least one type of image enhancement processing is performed on the sample data set; at least one image enhancement process includes: horizontal flip, rotation, adding noise information, brightness adjustment, and color adjustment. Therefore, the generalization capability of the model is enhanced, and the training precision of the model is improved.
Fig. 6 is a schematic flow chart of a binding point state identification method provided in an embodiment of the present application. As shown in fig. 6, the method includes:
s601, acquiring an acquired image of a steel bar structure, wherein the steel bar structure is built by adopting a plurality of steel bars in advance.
In the steel bar structure, a plurality of steel bars are arranged and bound at intervals in a horizontally and vertically interlaced manner, forming a component framework commonly called a "steel reinforcement cage", which constrains the concrete and enhances the integrity of the member.
An image of the steel bar structure is collected by a camera to obtain the collected image used for recognizing the binding point states in the steel bar structure.
S602, processing the collected image with a steel bar binding state detection model obtained by any of the above training methods to obtain the binding state of each binding point in the steel bar structure.
On the basis of all the above embodiments, a steel bar binding state detection model is obtained through training, and the collected image is processed with this model to obtain the binding state of each binding point in the steel bar structure. Compared with existing recognition methods, recognizing the collected image with the trained steel bar binding state detection model offers higher accuracy and a faster recognition speed.
In summary, according to the binding point state identification method provided in the embodiments of the present application, a collected image of a steel bar structure is obtained, the steel bar structure is a structure built by a plurality of steel bars in advance, a steel bar binding state detection model is obtained by adopting any training in the above all embodiments, and the collected image is processed to obtain a binding state of each binding point in the steel bar structure. Therefore, the recognition accuracy and the recognition speed are improved.
The following describes a training device, equipment, a storage medium, and the like for executing the reinforcement state detection model provided by the present application, and specific implementation processes and technical effects thereof are referred to above and will not be described again below.
Fig. 7 is a schematic diagram of a training device for a reinforcement bar binding state detection model according to an embodiment of the present application, and as shown in fig. 7, the training device includes:
an acquisition module 701, configured to acquire a sample data set, where the sample data set includes: and a plurality of sample steel bar binding point state images, wherein the state image of each sample steel bar binding point is marked with the state information of the corresponding binding point in advance.
An enhancement module 702, configured to perform image enhancement processing on the sample data set.
And the training module 703 is configured to perform model training by using the sample data set after the image enhancement processing, so as to obtain a steel bar binding state detection model.
Optionally, the training module 703 is specifically configured to perform model training with the sample data set after image enhancement processing to obtain the steel bar binding state detection model, including:
performing feature extraction on the state image of each sample steel bar binding point by adopting a feature extraction network to obtain sample image features of multiple scales; processing the sample image characteristics of the minimum scale in the multiple scales by adopting a pooling network to obtain sample semantic information; processing sample image features of at least two target scales in the multiple scales by adopting a feature fusion network to obtain sample positioning information corresponding to the at least two target scales, and fusing the sample positioning information and sample semantic information to obtain fusion sample features; processing the fusion sample characteristics by adopting a prediction network to obtain a prediction result of a binding point state corresponding to each sample reinforcing steel bar binding point state image; calculating a loss function value according to the prediction result and the mark state information corresponding to the state image of each sample steel bar binding point; and modifying the model parameters of the steel bar binding state detection model according to the loss function values, and performing model training again until a preset iteration stop condition is reached.
Optionally, the steel bar binding state detection model further includes: a clustering network. Before performing feature extraction on each sample steel bar binding point state image with the feature extraction network to obtain sample image features of multiple scales, the training module 703 is further configured to: process each sample steel bar binding point state image with the clustering network to obtain the steel bar binding point region image in each sample steel bar binding point state image. Performing feature extraction on each sample steel bar binding point state image with the feature extraction network to obtain sample image features of multiple scales then comprises: performing feature extraction on the steel bar binding point region image in each sample steel bar binding point state image to obtain the sample image features of multiple scales.
Optionally, the training module 703, specifically configured to the feature extraction network, includes: the feature extraction layers of a plurality of scales that connect gradually, wherein, the feature extraction layer of at least two target scales in a plurality of scales includes: at least one feature extraction module; the output of the last feature extraction module in the at least one feature extraction module is the output of the corresponding feature extraction layer.
Optionally, the training module 703 is specifically used in the steel bar binding state detection model, where an input convolution layer is provided at an input end of the pooling network, and the input convolution layer is a depth separable convolution; before the sample image features of the minimum scale in the multiple scales are processed by adopting the pooling network to obtain the sample semantic information, the method further comprises the following steps: performing convolution processing on the sample image features of the minimum scale by adopting an input convolution layer to obtain the sample image features of the minimum scale after the convolution processing; processing the sample image characteristics of the minimum scale in the multiple scales by adopting a pooling network to obtain sample semantic information, wherein the processing comprises the following steps: and processing the sample image characteristics of the minimum scale after convolution processing by adopting a pooling network to obtain sample semantic information.
Optionally, the training module 703 is further specifically configured as follows. In the steel bar binding state detection model, an output convolution layer is further provided at the output end of the pooling network, and the output convolution layer is a depthwise separable convolution. Before the sample positioning information and the sample semantic information are fused to obtain the fused sample features, the output convolution layer processes the sample semantic information to obtain convolved sample semantic information; fusing the sample positioning information and the sample semantic information then means fusing the sample positioning information with the convolved sample semantic information, to obtain the fused sample features.
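Fusing the sample positioning information with the smaller-scale sample semantic information generally requires bringing both to a common spatial resolution first. The patent does not specify the fusion operator; the sketch below assumes nearest-neighbour upsampling followed by channel concatenation, one common choice:

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling along both spatial axes."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse(localization_feat, semantic_feat):
    """Bring the small-scale semantic map up to the localization map's
    resolution, then concatenate along the channel axis."""
    factor = localization_feat.shape[0] // semantic_feat.shape[0]
    up = upsample_nearest(semantic_feat, factor)
    return np.concatenate([localization_feat, up], axis=-1)

loc = np.zeros((16, 16, 32))     # sample positioning features (larger target scale)
sem = np.zeros((4, 4, 64))       # sample semantic features (smallest scale)
fused = fuse(loc, sem)
print(fused.shape)  # (16, 16, 96)
```

Element-wise addition (after a channel-matching 1x1 convolution) would be an equally plausible fusion; the choice is an assumption here.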
Optionally, the enhancing module 702 is specifically configured to perform at least one image enhancement process on the sample data set, the at least one image enhancement process including at least one of: horizontal flipping, rotation, adding noise information, brightness adjustment, and color adjustment.
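A minimal numpy sketch of such an augmentation pipeline, assuming uint8 RGB images; the probabilities and magnitudes are illustrative choices, not values taken from the patent:

```python
import numpy as np

def augment(image, rng):
    """Apply the listed enhancements: horizontal flip, 90-degree rotation,
    additive noise, brightness adjustment, and per-channel colour adjustment."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                                 # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))         # rotate by a multiple of 90 degrees
    out = out + rng.normal(0.0, 5.0, out.shape)            # add noise information
    out = out * rng.uniform(0.8, 1.2)                      # brightness adjustment
    out = out * rng.uniform(0.9, 1.1, size=(1, 1, 3))      # per-channel colour adjustment
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)
aug = augment(np.full((32, 32, 3), 128, np.uint8), rng)
print(aug.shape, aug.dtype)  # (32, 32, 3) uint8
```

Restricting rotation to multiples of 90 degrees keeps the image shape fixed; arbitrary-angle rotation would need interpolation and padding (e.g. via an imaging library).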
The following describes the binding point state identification apparatus, device, and storage medium provided by the present application; their specific implementation processes and technical effects have been described above and are not repeated below.
Fig. 8 is a schematic diagram of a binding point state recognition device provided in an embodiment of the present application, where the binding point state recognition device includes:
the acquiring module 801, configured to acquire a collected image of a steel bar structure, where the steel bar structure is a structure built in advance from a plurality of steel bars; and
the recognition module 802, configured to process the collected image by using the steel bar binding state detection model obtained through any one of the above training methods, to obtain the binding state of each binding point in the steel bar structure.
Fig. 9 is a schematic diagram of a training device according to an embodiment of the present application, where the training device may be any device with computing and processing capability.
The training apparatus includes: a training processor 901 and a training storage medium 902. The training processor 901 and the training storage medium 902 are connected via a bus.
The training storage medium 902 is used to store a program, and the training processor 901 calls the program stored in the training storage medium 902 to execute the above-described method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Fig. 10 is a schematic diagram of an identification device according to an embodiment of the present application, where the identification device may be any device with computing and processing capability.
The identification device includes: an identification processor 1001 and an identification storage medium 1002. The identification processor 1001 and the identification storage medium 1002 are connected via a bus.
The identification storage medium 1002 is used to store a program, and the identification processor 1001 calls the program stored in the identification storage medium 1002 to execute the above-described method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present invention further provides a program product, such as a computer-readable storage medium, comprising a program which, when executed by a processor, performs the above method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims (10)

1. A training method of a steel bar binding state detection model, characterized by comprising the following steps:
obtaining a sample data set, the sample data set comprising: a plurality of sample steel bar binding point state images, wherein each sample steel bar binding point state image is marked in advance with state information of the corresponding binding point;
performing image enhancement processing on the sample data set;
and performing model training by adopting the sample data set after image enhancement processing to obtain the steel bar binding state detection model.
2. The method of claim 1, wherein the steel bar binding state detection model comprises: a feature extraction network, a pooling network, a feature fusion network, and a prediction network; and performing model training by adopting the sample data set after image enhancement processing to obtain the steel bar binding state detection model comprises:
performing feature extraction on the state image of each sample steel bar binding point by adopting the feature extraction network to obtain sample image features of multiple scales;
processing the sample image characteristics of the minimum scale in the multiple scales by adopting the pooling network to obtain sample semantic information;
processing the sample image characteristics of at least two target scales in the multiple scales by adopting the characteristic fusion network to obtain sample positioning information corresponding to the at least two target scales, and fusing the sample positioning information and the sample semantic information to obtain fused sample characteristics;
processing the fusion sample characteristics by adopting the prediction network to obtain a prediction result of a binding point state corresponding to each sample steel bar binding point state image;
calculating a loss function value according to the prediction result and the marked state information corresponding to each sample steel bar binding point state image; and
and modifying the model parameters of the steel bar binding state detection model according to the loss function value, and performing model training again until a preset iteration stop condition is reached.
3. The method of claim 2, wherein the steel bar binding state detection model further comprises: a clustering network; and before performing feature extraction on each sample steel bar binding point state image by adopting the feature extraction network to obtain the sample image features of multiple scales, the method further comprises:
processing the steel bar binding point state image of each sample by adopting the clustering network to obtain a steel bar binding point region image in the steel bar binding point state image of each sample;
wherein performing feature extraction on each sample steel bar binding point state image by adopting the feature extraction network to obtain the sample image features of multiple scales comprises:
performing feature extraction on the steel bar binding point region image in each sample steel bar binding point state image by adopting the feature extraction network, to obtain the sample image features of multiple scales.
4. The method of claim 2, wherein the feature extraction network comprises: feature extraction layers of the multiple scales connected in sequence, wherein the feature extraction layers of the at least two target scales among the multiple scales comprise: at least one feature extraction module; and the output of the last feature extraction module in the at least one feature extraction module is the output of the corresponding feature extraction layer.
5. The method of claim 2, wherein in the steel bar binding state detection model, an input convolution layer is provided at the input end of the pooling network, and the input convolution layer is a depthwise separable convolution;
before the step of processing the sample image feature of the smallest scale in the multiple scales by using the pooling network to obtain the sample semantic information, the method further comprises:
performing convolution processing on the sample image features of the minimum scale by using the input convolution layer to obtain the sample image features of the minimum scale after convolution processing;
the processing the sample image feature of the smallest scale in the multiple scales by using the pooling network to obtain sample semantic information comprises:
and processing the sample image features of the minimum scale after convolution processing by adopting the pooling network to obtain the sample semantic information.
6. The method of claim 2, wherein in the steel bar binding state detection model, an output convolution layer is further provided at the output end of the pooling network, and the output convolution layer is a depthwise separable convolution;
before the fusing the sample positioning information and the sample semantic information to obtain the fused sample characteristics, the method further includes:
processing the sample semantic information by adopting the output convolution layer to obtain the sample semantic information after convolution processing;
the fusing the sample positioning information and the sample semantic information to obtain fused sample characteristics comprises:
and fusing the sample positioning information and the sample semantic information after convolution processing to obtain the fused sample characteristics.
7. The method of claim 1, wherein performing image enhancement processing on the sample data set comprises:
performing at least one image enhancement process on the sample data set; the at least one image enhancement process comprises: at least one of horizontal flipping, rotation, adding noise information, brightness adjustment, and color adjustment.
8. A binding point state identification method is characterized by comprising the following steps:
acquiring a collected image of a steel bar structure, wherein the steel bar structure is a structure built in advance from a plurality of steel bars; and
processing the collected image by using a steel bar binding state detection model obtained through training according to the method of any one of claims 1 to 7, to obtain the binding state of each binding point in the steel bar structure.
9. A training device, comprising: a training processor and a training storage medium, wherein the training processor and the training storage medium are connected via a bus, the training storage medium stores program instructions executable by the training processor, and the training processor calls the program stored in the training storage medium to execute the steps of the training method of the steel bar binding state detection model according to any one of claims 1 to 7.
10. An identification device, comprising: an identification processor and an identification storage medium, wherein the identification processor and the identification storage medium are connected via a bus, the identification storage medium stores program instructions executable by the identification processor, and the identification processor calls the program stored in the identification storage medium to execute the steps of the binding point state identification method according to claim 8.
CN202210604702.5A 2022-05-30 2022-05-30 Training method, recognition method and equipment for steel bar binding state detection model Pending CN115272691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604702.5A CN115272691A (en) 2022-05-30 2022-05-30 Training method, recognition method and equipment for steel bar binding state detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210604702.5A CN115272691A (en) 2022-05-30 2022-05-30 Training method, recognition method and equipment for steel bar binding state detection model

Publications (1)

Publication Number Publication Date
CN115272691A true CN115272691A (en) 2022-11-01

Family

ID=83760272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604702.5A Pending CN115272691A (en) 2022-05-30 2022-05-30 Training method, recognition method and equipment for steel bar binding state detection model

Country Status (1)

Country Link
CN (1) CN115272691A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092012A (en) * 2023-03-06 2023-05-09 安徽数智建造研究院有限公司 Video stream-based steel bar binding procedure monitoring method and monitoring device
CN116189115A (en) * 2023-04-24 2023-05-30 青岛创新奇智科技集团股份有限公司 Vehicle type recognition method, electronic device and readable storage medium


Similar Documents

Publication Publication Date Title
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN110738207B (en) Character detection method for fusing character area edge information in character image
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN107944450B (en) License plate recognition method and device
CN109840556B (en) Image classification and identification method based on twin network
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN109960742B (en) Local information searching method and device
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
WO2022033076A1 (en) Target detection method and apparatus, device, storage medium, and program product
CN115272691A (en) Training method, recognition method and equipment for steel bar binding state detection model
CN111461213B (en) Training method of target detection model and target rapid detection method
CN110991444B (en) License plate recognition method and device for complex scene
CN110222718B (en) Image processing method and device
CN109118473A (en) Angular-point detection method, storage medium and image processing system neural network based
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN112256899B (en) Image reordering method, related device and computer readable storage medium
CN112001931A (en) Image segmentation method, device, equipment and storage medium
CN115578590A (en) Image identification method and device based on convolutional neural network model and terminal equipment
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN112364709A (en) Cabinet intelligent asset checking method based on code identification
CN111738069A (en) Face detection method and device, electronic equipment and storage medium
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN111814562A (en) Vehicle identification method, vehicle identification model training method and related device
CN112989919B (en) Method and system for extracting target object from image
CN115171011A (en) Multi-class building material video counting method and system and counting equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination