CN114429578A - Method for inspecting ancient architecture ridge beast decoration - Google Patents

Method for inspecting ancient architecture ridge beast decoration

Info

Publication number
CN114429578A
CN114429578A (application CN202210109004.8A)
Authority
CN
China
Prior art keywords
module
convolution
image
backbone
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210109004.8A
Other languages
Chinese (zh)
Inventor
侯妙乐
纪宇航
董友强
栗怡豪
郝务宸
孙晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202210109004.8A
Publication of CN114429578A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for inspecting ancient architecture ridge beast decorations, which comprises the following steps: obtaining a training sample set, which comprises a large number of ridge beast images annotated with ridge beast targets; inputting the ridge beast images into a convolutional neural network model for training and obtaining from the model the optimal detection feature map corresponding to each image, wherein the model comprises an aggregation convolutional layer module for extracting fine-grained features of the ridge beast image and a multi-size convolution prediction head module, fused with an attention mechanism, for extracting semantic information of the ridge beast image; and an unmanned aerial vehicle periodically inspects the ridge beast decorations according to preset inspection waypoints and flight routes, capturing a group of ridge beast images at each waypoint, detecting the captured images with the trained convolutional neural network model, and comparing the detection results of multiple periods to judge whether a decoration has been damaged. The method achieves rapid, automatic inspection of ridge beast decorations over a large area.

Description

Method for inspecting ancient architecture ridge beast decoration
Technical Field
The invention relates to the field of detection of ancient building decorations. More specifically, the invention relates to a method for inspecting ancient architecture ridge beast decorations.
Background
Ridge beasts are decorative members mounted on the roof ridges of ancient buildings. They stabilize the ridge and, by tradition, suppress fire, ward off rain and disperse thunder; they carry rich symbolic meaning and have important artistic and historical value. Under the influence of earthquakes, rain erosion and other factors, ridge beast decorations may be lost or damaged, harming the aesthetic value of the whole building and losing precious cultural relic resources. Routine inspection of the ridge beast decorations, with counts of their types and quantities, allows damaged components to be found and repaired in time, and is of great significance for the daily management and maintenance of ancient buildings. At present, inspection of ridge beast decorations mainly relies on periodically sending workers to patrol: the types and numbers of decorations are interpreted visually and counted manually, and the positions of damaged or missing components are recorded; the database management system is then updated according to the inspection report. Because ancient building complexes contain many buildings spread over a large area, the required inspection frequency is high (for example, once per week).
The existing manual inspection method has three limitations. First, labor cost is high: many personnel must be invested in the inspection work and trained in knowledge of ancient architecture. Second, operating efficiency is low: ridge beasts sit at high elevations on the buildings and inspection is easily disturbed by sightseeing tourists, so visual inspection is inefficient. Third, information updates are slow, because statistics are compiled manually.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and/or disadvantages and to provide at least the advantages described hereinafter.
The invention also aims to provide a method for inspecting ancient architecture ridge beast decorations that achieves rapid, automatic inspection of ridge beast decorations over a large area and can automatically judge and locate missing or damaged decorations.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a method for inspecting ancient architecture ridge beast decorations, comprising:
obtaining a training sample set, wherein the training sample set comprises a large number of ridge beast images annotated with ridge beast targets;
inputting the ridge beast images into a convolutional neural network model for training, and obtaining from the model the optimal detection feature map corresponding to each ridge beast image, wherein the convolutional neural network model comprises an aggregation convolutional layer module and a multi-size convolution prediction head module; the aggregation convolutional layer module consists of n convolutional layer modules, n being an integer greater than or equal to 1, and extracts fine-grained features of the ridge beast image, while the multi-size convolution prediction head module is fused with an attention mechanism and extracts semantic information of the ridge beast image;
an unmanned aerial vehicle periodically inspects the ancient architecture ridge beast decorations according to preset inspection waypoints and flight routes, capturing a group of ridge beast images at each waypoint; the trained convolutional neural network model detects the captured images, and the detection results of multiple periods are compared to judge whether a decoration has been damaged.
Preferably, in the method for inspecting ancient architecture ridge beast decorations, within the aggregation convolutional layer module the first convolutional layer module is a basic convolutional layer, and the second through nth convolutional layer modules share a uniform structure, each comprising a convolutional layer with stride 2 and residual blocks formed by skip-connecting several deep aggregation convolutions. The deep aggregation convolution is computed as follows:
Step S1: the input feature map T_input has size H×W×M, where H is the height, W the width, and M the number of channels of the input feature map. Using a filter bank D_w containing M filters, each of size 3×3, a two-dimensional convolution is computed independently on each of the M channels of T_input (one filter per channel) to obtain the feature map T_d.
Step S2: using a K×K sliding window with stride 1, a summation function SUM() is applied around each central pixel on the M channels of T_d, computing the algebraic sum of the elements in the window to obtain the feature map T_sum; the input feature map T_input and T_sum are then concatenated along the channel dimension to obtain the feature map T_concat.
Step S3: along the depth direction, a filter P_w of size 1×1 computes the point-convolution depth-aggregation feature of T_concat.
Step S4: step S3 is repeated F times, yielding the output feature map T_output of size H×W×F.
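A minimal NumPy sketch of steps S1-S4 may help fix the shapes involved; the filter values below are random placeholders (not trained weights), and zero padding is assumed so the spatial size H×W is preserved:

```python
import numpy as np

def depthwise_conv3x3(t, filters):
    # S1: independent 2-D convolution on each of the M channels
    # (zero padding, stride 1).  t: (H, W, M); filters: (M, 3, 3).
    H, W, M = t.shape
    padded = np.pad(t, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(t, dtype=float)
    for m in range(M):
        for i in range(H):
            for j in range(W):
                out[i, j, m] = np.sum(padded[i:i + 3, j:j + 3, m] * filters[m])
    return out

def neighborhood_sum(t):
    # S2: SUM() over a K x K sliding window (K = 3, as in the preferred
    # embodiment) centred on each pixel -- a depthwise conv with all-ones filters.
    return depthwise_conv3x3(t, np.ones((t.shape[2], 3, 3)))

def deep_aggregation_conv(t_input, dw_filters, pw_filters):
    t_d = depthwise_conv3x3(t_input, dw_filters)          # S1: (H, W, M)
    t_sum = neighborhood_sum(t_d)                         # S2: (H, W, M)
    t_concat = np.concatenate([t_input, t_sum], axis=-1)  # S2: (H, W, 2M)
    # S3/S4: F point (1x1) convolutions along the depth axis -> (H, W, F).
    return np.einsum('hwc,fc->hwf', t_concat, pw_filters)

H, W, M, F = 8, 8, 4, 16
rng = np.random.default_rng(0)
t_out = deep_aggregation_conv(rng.normal(size=(H, W, M)),
                              rng.normal(size=(M, 3, 3)),
                              rng.normal(size=(F, 2 * M)))
print(t_out.shape)  # (8, 8, 16)
```

The channel count thus doubles at the concatenation and is then set freely by the number F of point filters, which is what lets each residual block change width.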
Preferably, in the method for inspecting ancient architecture ridge beast decorations, the aggregation convolutional layer module processes the image as follows: as the ridge beast image passes in turn through the first, second, third, fourth, and so on up to the nth convolutional layer module, the resolution of the input image is successively halved while the number of channels is successively doubled, yielding the fine-grained features of the ridge beast image.
Preferably, in the method for inspecting ancient architecture ridge beast decorations, the multi-size convolution prediction head module comprises an SE module, which performs a squeeze operation and an excitation operation, and a multi-size convolution structure module, which fuses information over different area ranges. The multi-size convolution structure module comprises: two parallel standard convolutions of size 1×1 and 3×3, a concatenation module that concatenates the feature maps output by the two parallel convolutions, and two serial 3×3 standard convolutions with C and 2C filters respectively.
Preferably, in the method for inspecting ancient architecture ridge beast decorations, the multi-size convolution prediction head module extracts the semantic information of the ridge beast image as follows:
the feature map output by the nth convolutional layer module is processed first by the SE module and then by the multi-size convolution structure module to obtain the first prediction output tensor: the feature map produced by the 3×3 standard convolution with 2C filters passes through a two-dimensional convolutional layer with bias to output the first prediction output tensor, while the feature map produced by the 3×3 standard convolution with C filters is sent to a first up-sampling convolution module, which expands its resolution using nearest-neighbour interpolation;
the feature map processed by the first up-sampling convolution module is concatenated with the feature map output by the (n-1)th convolutional layer module, input into the SE module, and processed by the multi-size convolution structure module to obtain the second prediction output tensor: the feature map produced by the 3×3 standard convolution with 2C filters outputs the second prediction output tensor through the two-dimensional convolutional layer with bias, while the feature map produced by the 3×3 standard convolution with C filters is sent to a second up-sampling convolution module, which expands its resolution using nearest-neighbour interpolation;
the feature map processed by the second up-sampling convolution module is concatenated with the feature map output by the (n-2)th convolutional layer module, input into the SE module, and processed by the multi-size convolution structure module to obtain the third prediction output tensor: the feature map produced by the 3×3 standard convolution with 2C filters outputs the third prediction output tensor through the two-dimensional convolutional layer with bias;
the first, second and third prediction output tensors are then processed by position back-calculation and a non-maximum suppression algorithm to obtain the optimal prediction result, i.e. the categories and positions of the ridge beasts in the image.
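A common form of the non-maximum suppression step referred to above can be sketched as follows (a generic greedy NMS; the patent does not detail its exact variant):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    # above the threshold, and repeat on the remainder.
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # [0, 2]
```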
Preferably, in the method for inspecting ancient architecture ridge beast decorations, n takes the value 6 in the aggregation convolutional layer module: the first convolutional layer module expands the number of channels of the input ridge beast image from 3 to 16, and the channel counts of the second through sixth convolutional layer modules are 32, 64, 128, 256 and 512 in turn; the second convolutional layer module contains 1 residual block, the third contains 2, the fourth and fifth contain 4 each, and the sixth contains 2; in step S2, K is 3.
Preferably, in the method for inspecting ancient architecture ridge beast decorations, the squeeze operation in the SE module is a global average pooling, and the excitation operation proceeds as follows: the feature map produced by the global average pooling passes in turn through a first fully connected layer, a ReLU activation function, a second fully connected layer and a Sigmoid activation function, and the result is multiplied with the input feature map from before the pooling, redistributing the image features.
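The squeeze-and-excitation computation described above can be sketched in NumPy; the reduction ratio r between the two fully connected layers is an assumption, as the patent does not specify the layer widths:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    # Squeeze: global average pooling over the spatial dimensions -> (C,).
    z = x.mean(axis=(0, 1))
    # Excite: FC -> ReLU -> FC -> Sigmoid gives one weight per channel.
    s = sigmoid(w2 @ relu(w1 @ z))
    # Reweight: multiply the input feature map channel-wise.
    return x * s

C, r = 8, 4   # r: channel-reduction ratio (assumed; not given in the patent)
rng = np.random.default_rng(1)
x = rng.normal(size=(16, 16, C))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = se_block(x, w1, w2)
print(y.shape)  # (16, 16, 8)
```

Because the Sigmoid output lies in (0, 1), the block can only attenuate channels relative to the input, which is the "redistribution" the text describes.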
Preferably, the training sample set comprises fixed-point aerial images of ancient architecture ridge beasts and undamaged ridge beast images obtained from internet source data; an annotation tool is used to label 14 categories of ridge beast decorations in the images: the immortal (xianren), dragon, phoenix, lion, sea horse, celestial horse (tianma), yayu, suanni, xiezhi, douniu, hangshi, chuishou, wenshou and taoshou.
Preferably, in the method for inspecting ancient architecture ridge beast decorations, the detection results of multiple periods are compared to judge whether a decoration has been damaged as follows:
if, after the group of ridge beast images captured at an inspection waypoint has been detected by the trained convolutional neural network model, more than 70% of the per-image detection results agree, the waypoint is considered stably detected and that result is retained as the waypoint's detection result for the current period;
the detection result of the waypoint for the current period is compared with that of the previous period, specifically: for each ridge beast decoration in turn, an intersection-over-union is computed between the previous period's detection and the current period's detection and used for matching; if the intersection-over-union is not less than a set threshold, the match succeeds and the detection state of the decoration at the waypoint is updated; if it is less than the threshold, the match fails and a damage warning is raised.
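The period-to-period matching can be sketched as follows; the class-aware matching rule and the tuple-based bookkeeping are illustrative assumptions:

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def compare_periods(prev, curr, thresh=0.5):
    # For every decoration detected last period, look for a current detection
    # of the same class with IoU >= thresh; unmatched decorations trigger a
    # damage warning.
    return [(cls, box) for cls, box in prev
            if not any(c == cls and iou(box, b) >= thresh for c, b in curr)]

prev = [('dragon', (0, 0, 10, 10)), ('lion', (20, 0, 30, 10))]
curr = [('dragon', (1, 0, 11, 10))]   # the lion is missing this period
print(compare_periods(prev, curr))    # [('lion', (20, 0, 30, 10))]
```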
Preferably, before the ridge beast images are input into the convolutional neural network model for training, the sizes of 9 anchor boxes for the input images are determined by a K-Means clustering algorithm; the input image size is 640 × 640, and irregular images are unified by a scaling method.
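A plain K-Means over labelled box widths and heights serves as a sketch of the anchor determination; note that YOLO-style pipelines often use a 1 − IoU distance rather than the Euclidean distance used here, and the patent does not say which is meant:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    # wh: (N, 2) array of annotated box (width, height) pairs.
    # Plain Euclidean K-Means; anchors are returned sorted by area.
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters unchanged
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]

rng = np.random.default_rng(2)
wh = rng.uniform(10, 200, size=(500, 2))     # synthetic box sizes
anchors = kmeans_anchors(wh, k=9)
print(anchors.shape)  # (9, 2)
```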
The invention at least comprises the following beneficial effects:
the utility model provides an unmanned aerial vehicle patrols and examines the point location and the airline regularly to ancient building ridge beast decoration on a large scale according to patrolling and examining of setting for, patrol and examine the point location at every and all gather many clear ground ridge beast images, the convolutional neural network model that uses the training to accomplish detects the discernment to the ridge beast image of gathering, acquire the stable testing result of ridge beast decoration, and extract the kind and the quantity of ridge beast decoration, then compare the testing result of multistage and then judge whether the ridge beast decoration takes place the damage, the quick automation of having realized ridge beast decoration on a large scale patrols and examines.
Second, deep aggregation convolution is adopted to construct the residual block structure in the aggregation convolutional layer module, so that the constructed convolutional neural network model learns more diverse features from the ridge beast images.
Third, in the multi-size convolution prediction head module, an SE module first models the channel relationships of the multi-scale feature maps; a multi-size convolution structure then follows the SE module: 1×1 and 3×3 standard convolutions in parallel fuse the multi-scale features, and after the feature maps output by the two parallel convolutions are concatenated, two serial 3×3 standard convolutions with C and 2C filters respectively further extract the semantic information.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a schematic diagram of a structural relationship of a convolutional neural network model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the method for inspecting ancient architecture ridge beast decorations according to one embodiment of the invention;
FIG. 3 is a schematic diagram of the waypoints and flight path for unmanned aerial vehicle inspection according to an embodiment of the present invention;
FIG. 4 is a diagram of the correspondence of ridge beast categories in one embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the second to sixth convolutional layer modules according to one embodiment of the present invention;
FIG. 6 is a diagram illustrating a process of calculating a deep aggregate convolution according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a process for expanding the receptive field in accordance with one embodiment of the present invention;
FIG. 8 is a diagram illustrating the relationship of a multi-size convolution structure module according to one embodiment of the present invention;
FIG. 9 is a diagram illustrating the relationship of the SE modules in accordance with one embodiment of the present invention;
FIG. 10 is a diagram illustrating the relationship of an upsampling convolution module in accordance with one embodiment of the present invention;
FIG. 11 is a graph of the detection results of using a trained convolutional neural network model in one embodiment of the present invention;
FIG. 12(a) is a diagram illustrating the results of prior-art detection using the YOLOv3 method;
FIG. 12(b) is a diagram showing the results of detection using the method according to the embodiment of the present invention;
FIG. 13(a) is a diagram illustrating the results of prior-art detection using the YOLOv3 method;
FIG. 13(b) is a diagram showing the results of detection using the method of the embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description.
As shown in FIGS. 1-2, an embodiment of the invention provides a method for inspecting ancient architecture ridge beast decorations, which comprises the following steps:
s100, obtaining a training sample set, wherein the training sample set comprises a large number of spine image labeled with a spine target.
A convolutional neural network model needs a large number of data samples as a basis for training; only by optimizing the model parameters can the best detection effect be achieved. The embodiment of the invention therefore constructs the training sample set of ridge beast images from rich data sources such as actual aerial images and internet multi-source data.
The specific steps for acquiring the data set are as follows:
a. Acquiring actual aerial images: adjust the optimal lens angle and shoot video data in segments along the direction of the black arrows shown in FIG. 3; analyse the frame count and frame rate of each video-stream segment, and select single frames at a fixed interval as candidate samples.
b. Acquiring internet source data: combine several keywords to capture related images in batches from open-source galleries and search engines, and select undamaged ancient architecture ridge beast images in a supported format (JPEG/PNG/BMP) as candidate samples.
c. Accurately annotating ridge beast targets: use an annotation tool to accurately label the ridge beast targets in each candidate sample. The categories and numbers of ridge beasts are shown in FIG. 4; 14 categories of ridge beast decorations are labelled: the immortal (xianren), dragon, phoenix, lion, sea horse, celestial horse (tianma), yayu, suanni, xiezhi, douniu, hangshi, chuishou, wenshou and taoshou.
S101, inputting the ridge beast images into a convolutional neural network model for training, and obtaining from the model the optimal detection feature map corresponding to each image, wherein the convolutional neural network model comprises an aggregation convolutional layer module composed of n convolutional layer modules and a multi-size convolution prediction head module fused with an attention mechanism, n being an integer greater than or equal to 1; the aggregation convolutional layer module extracts fine-grained features of the ridge beast image, and the multi-size convolution prediction head module extracts its semantic information.
Before the ridge beast images are input into the convolutional neural network model for training, the sizes of 9 anchor boxes are determined by a K-Means clustering algorithm. The input image size is 640 × 640, and irregular images are unified by a scaling method, as follows: taking the longest edge of the image as the reference edge, compute the scaling coefficient to 640, scale the image, and pad it into a regular square with RGB (128, 128, 128). The data set is further expanded with a combined data augmentation technique: mirror transformation; random offsets in the horizontal or vertical direction; and colour jittering within the HSV colour space.
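The scaling-and-padding ("letterbox") step can be sketched as follows; nearest-neighbour resampling is used purely for brevity, as the patent does not name the resampling kernel:

```python
import numpy as np

def letterbox(img, size=640, fill=128):
    # Scale so the longest edge becomes `size`, then pad the short edge with
    # RGB (128, 128, 128) to produce a regular square input.
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour index maps for the resize.
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size, 3), fill, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

img = np.zeros((480, 640, 3), dtype=np.uint8)   # a 4:3 test frame
out = letterbox(img)
print(out.shape)  # (640, 640, 3)
```

Centring the padded image (rather than anchoring it top-left) is a design choice of this sketch; either convention works as long as the anchor boxes are computed in the same coordinate frame.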
In a specific implementation, the convolutional neural network model is trained on the ridge beast images with the Adam optimizer, with the following parameters: initial learning rate 0.0001, beta_1 = 0.9, beta_2 = 0.999, learning rate decay rate 0.5, and 100 training epochs; the model parameters at the lowest loss are saved as the final state.
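The stated learning-rate decay rate of 0.5 implies a stepped schedule; the decay interval below (every 30 epochs) is an assumption for illustration only, since the patent gives only the base rate and the decay factor:

```python
def learning_rate(epoch, base_lr=1e-4, decay_rate=0.5, decay_every=30):
    # Stepped decay: the base learning rate is multiplied by `decay_rate`
    # once per `decay_every` epochs (interval assumed, not from the patent).
    return base_lr * decay_rate ** (epoch // decay_every)

print(learning_rate(0))   # 0.0001
print(learning_rate(60))  # 2.5e-05
```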
In a specific implementation, n preferably takes the value 6, so the aggregation convolutional layer module includes 6 convolutional layer modules. A convolutional layer module extracts image features, and stacking convolutional layer modules yields deeper feature maps, so this embodiment adopts six convolutional layer modules to extract the fine-grained features of the image.
The attention mechanism is a set of attention distribution coefficients, i.e. a set of weight parameters, that can emphasise or select the important information of the target object and suppress irrelevant detail. Prediction heads detect targets at different scales, so the multi-size convolution prediction head module with the added attention mechanism gives the image features directivity, fuses the multi-scale features, and extracts semantic information. The ridge beast image is therefore input into the convolutional neural network model for training, and the optimal detection feature map corresponding to the image is obtained from the model.
S102, an unmanned aerial vehicle periodically inspects the ancient architecture ridge beast decorations according to the preset inspection waypoints and routes, capturing a group of ridge beast images at each waypoint; the trained convolutional neural network model detects the captured images, and the detection results of multiple periods are compared to judge whether a decoration has been damaged.
In a specific implementation, the inspection waypoints and flight route must be initialised before the unmanned aerial vehicle inspection. The design of the waypoints and the corresponding flight route is shown in FIG. 3, a simple example in which the squares mark the ridge beast positions on the ancient building roof, the circles mark the unmanned aerial vehicle inspection waypoints, and the arrows give the flight direction. After surveying the site and fully understanding the airspace flight conditions, the unmanned aerial vehicle is manually flown clockwise in sequence to points P1 through P8, hovering at each, and the waypoint coordinates are recorded in sequence in the flight control system as the inspection waypoints for subsequent tasks. During automatic inspection, the unmanned aerial vehicle recalls the recorded waypoint coordinates.
The detection results of multiple periods are compared to judge whether a decoration has been damaged as follows:
a. if, after the group of ridge beast images captured at an inspection waypoint has been detected by the trained convolutional neural network model, more than 70% of the per-image detection results agree, the waypoint is considered stably detected and that result is retained as the waypoint's detection result for the current period;
b. the detection result of the waypoint for the current period is compared with that of the previous period, specifically: for each ridge beast decoration in turn, an intersection-over-union is computed between the previous period's detection and the current period's detection and used for matching; if the intersection-over-union is not less than the set threshold, the match succeeds and the detection state of the decoration at the waypoint is updated; if it is less than the threshold, the match fails and a damage warning is raised.
In a specific implementation, the detection state, category and position coordinates of the ridge beast targets in the first-period aerial images are initialised; the second-period aerial images are then acquired and detected with the trained convolutional neural network model. Starting from the second period, and based on the first-period detection state, the intersection-over-union threshold of each ridge beast decoration is set to 0.5 for matching; if a match succeeds the detection state of the waypoint is updated, otherwise a damage warning is raised, and the process repeats for the third and subsequent periods. As the detection effect diagram of the trained convolutional neural network model in FIG. 11 shows, each ridge beast decoration carries a rectangular box, so the change in the decorations at a waypoint is easily determined by computing the intersection-over-union of each decoration at the waypoint across two adjacent inspection periods.
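The "more than 70% agree" rule for a stable waypoint result can be sketched as follows; summarising each photo's detections as a tuple of class names is an illustrative simplification of the full detection record:

```python
from collections import Counter

def stable_result(detections, agreement=0.7):
    # `detections` holds one outcome per photo at an inspection waypoint,
    # summarised as a hashable value (e.g. a tuple of detected classes).
    # The majority outcome is kept only if more than 70% of photos agree.
    outcome, count = Counter(detections).most_common(1)[0]
    return outcome if count / len(detections) > agreement else None

shots = [('dragon', 'lion')] * 8 + [('dragon',)] * 2   # 8 of 10 photos agree
print(stable_result(shots))  # ('dragon', 'lion')
```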
The inspection method for ancient architecture ridge beast decorations can therefore achieve rapid, automatic inspection of ridge beast decorations over a large area.
To illustrate more clearly how the optimal detection feature map is obtained from the convolutional neural network model in the previous embodiment: in another specific embodiment of the present invention, in the aggregation convolutional layer module of step S101, the first convolutional layer module 1 is a basic convolutional layer (i.e., a conventional or standard convolutional layer) that expands the number of channels of the input image from 3 to 16 in order to obtain richer information. The second convolutional layer module 2, third convolutional layer module 3, fourth convolutional layer module 4, fifth convolutional layer module 5, and sixth convolutional layer module 6 have a uniform structure: each comprises a convolutional layer with stride 2 and residual blocks formed by skip connections over N depth aggregation convolutions, as shown in Fig. 5.
In the above embodiment, the value of N equals the number of residual blocks in the convolutional layer module: when N is 1, one residual block is set, and when N is 2, two residual blocks are set. In a specific implementation, the number of residual blocks is 1 in the second convolutional layer module, 2 in the third, 4 in each of the fourth and fifth, and 2 in the sixth, so that the network learns the residual of the latent inter-layer mapping while maintaining a linear mapping, mitigating problems such as network degradation as depth increases.
From the second convolutional layer module onward, each module downsamples the feature map, reducing its resolution to 1/2 and expanding the number of channels to 2 times per downsampling. Specifically, when the input ridge image is 416 × 416 × 3, the first convolutional layer module 1 outputs a 416 × 416 × 16 feature map, the second convolutional layer module 2 outputs 208 × 208 × 32, the third convolutional layer module 3 outputs 104 × 104 × 64, the fourth convolutional layer module 4 outputs 52 × 52 × 128, the fifth convolutional layer module 5 outputs 26 × 26 × 256, and the sixth convolutional layer module 6 outputs 13 × 13 × 512.
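The size progression above follows a simple halving/doubling rule that can be checked with a small helper; this is a sketch assuming the stated 416 × 416 × 3 input, with illustrative names.

```python
def feature_sizes(h=416, w=416, modules=6, c0=16):
    """Return (height, width, channels) after each convolutional layer module.
    Module 1 only expands channels (3 -> c0); modules 2..n halve the
    resolution and double the channels."""
    sizes = [(h, w, c0)]
    c = c0
    for _ in range(modules - 1):
        h, w, c = h // 2, w // 2, c * 2
        sizes.append((h, w, c))
    return sizes
```

Running `feature_sizes()` reproduces the sequence in the text, ending at 13 × 13 × 512 for the sixth module.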
As shown in Fig. 6, when constructing a residual block, the specific process of the depth aggregation convolution applied is as follows:
Step S201: the input feature map T_input has size H × W × M, where H is the height, W the width, and M the number of channels of the input feature map. Using a filter bank D_w, an independent two-dimensional convolution is computed on each of the M channels of T_input in one-to-one correspondence to obtain feature map T_d. The filter bank D_w contains M filters, each of size 3 × 3;
Step S202: using a 3 × 3 sliding window with stride 1, the summation function SUM() is applied at each center pixel on each of the M channels of T_d, computing the algebraic sum of the elements within the window to obtain feature map T_sum; T_input and T_sum are then concatenated along the channel dimension to obtain feature map T_concat;
Step S203: along the depth direction, a filter P_w of size 1 × 1 is used to compute the point-convolution depth aggregation feature of T_concat;
Step S204: step S203 is repeated F times to obtain the output feature map T_output of size H × W × F.
In this process, because the input feature map T_input and the feature map T_sum each have M channels, the concatenated feature map T_concat has 2M channels. When processing T_concat with the filter bank P_w, one filter P_w yields one feature map, and using several filters P_w yields several feature maps, i.e., several feature map channels. Specifically, regarding the number F of repetitions of step S203 in step S204: F is 32 in the second convolutional layer module, 64 in the third, 128 in the fourth, 256 in the fifth, and 512 in the sixth.
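Steps S201–S204 can be sketched shape-wise as follows. This is a naive NumPy illustration (explicit loops, "same" padding, stride 1) intended only to show the data flow; the parameter names and layout are assumptions, not the patent's implementation.

```python
import numpy as np

def depth_aggregation_conv(t_in, d_w, p_w):
    """Sketch of the depth aggregation convolution (DPC), steps S201-S204.
    t_in: (H, W, M) input feature map
    d_w:  (M, 3, 3) depthwise filters, one per channel
    p_w:  (F, 2M) point (1x1) filters applied along the depth direction"""
    H, W, M = t_in.shape
    pad = np.pad(t_in, ((1, 1), (1, 1), (0, 0)))
    # S201: independent 2-D convolution per channel (cross-correlation form) -> T_d
    t_d = np.zeros_like(t_in)
    for m in range(M):
        for i in range(H):
            for j in range(W):
                t_d[i, j, m] = np.sum(pad[i:i + 3, j:j + 3, m] * d_w[m])
    # S202: 3x3 local SUM over T_d -> T_sum, then concat with the input
    pad_d = np.pad(t_d, ((1, 1), (1, 1), (0, 0)))
    t_sum = np.zeros_like(t_d)
    for i in range(H):
        for j in range(W):
            t_sum[i, j, :] = pad_d[i:i + 3, j:j + 3, :].sum(axis=(0, 1))
    t_concat = np.concatenate([t_in, t_sum], axis=-1)   # (H, W, 2M)
    # S203-S204: F point convolutions along depth -> (H, W, F)
    return t_concat @ p_w.T
```

Note the output channel count F is set by the number of rows in `p_w`, matching the per-module values of F listed above.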
As can be seen from the above processing procedure, the proposed depth aggregation convolution enables the network to learn more distinctive ridge beast features. Based on the idea of feature aggregation, this embodiment improves the depthwise separable convolution with a local summation function, proposes the Depth aggregation Convolution (DPC), and uses it to build the residual structure in the aggregation convolutional layer module, thereby reducing the parameter count and improving the utilization of context information. Relative to depthwise separable convolution, it aggregates internal features with a local summation function, enlarges the convolution receptive field, and enhances feature distinctiveness; the input features are concatenated with the aggregated features, and point convolution fuses the original single-pixel features with the neighborhood features to learn the depth aggregation features of the image. In step S202, SUM() denotes the summation function over the window elements; the original specification illustrates it with a matrix example given as a formula figure.
As shown in Fig. 7, the depth aggregation convolution uses the summation function to aggregate the D_w convolution features without introducing extra parameters, while combining features over a larger area, thereby increasing the flow of information among channels and the distinctiveness of the depth features. The depth aggregation convolution enables the convolutional neural network model to learn depth aggregation features, and which local signals are combined is determined by the kernel convolution weight coefficients, so the extraction of fine-grained features is enhanced to a certain extent.
In another embodiment of the method for inspecting ancient architecture ridge beast decorations, the multi-size convolution prediction head module 7 comprises an SE module that performs a squeeze operation and an excitation operation, and a multi-size convolution structure module that fuses information over different area ranges, as shown in Fig. 8. The multi-size convolution structure module comprises: two parallel standard convolutions of 1 × 1 and 3 × 3, a concatenation module that concatenates the feature maps output by the two parallel standard convolutions, and two serially connected 3 × 3 standard convolutions whose filter counts are C and 2C, respectively.
In the above embodiment, as shown in Fig. 9, the squeeze operation in the SE (Squeeze and Excitation) module is a global average pooling operation, and the excitation operation proceeds as follows: the feature map after the global average pooling passes in turn through a first fully connected layer, a ReLU excitation function, a second fully connected layer, and a Sigmoid excitation function, and the result is then multiplied with the input feature map matrix from before the global average pooling, realizing a redistribution of the image features. The number of neurons in the first fully connected layer is 10% of the input feature dimension, and the number in the second fully connected layer equals the input feature dimension. The SE module is a form of channel-domain attention: it first pools the global channel features into channel descriptors, then passes the descriptors through a fully connected network to model the inter-channel relationships, excites them into channel weight coefficients expressing relative importance, and finally redistributes the features via matrix multiplication. The SE module uses the Sigmoid excitation function to map each weight coefficient to a scaling factor in the (0, 1) interval, which suppresses the response of interfering signals and makes the features more directional.
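The squeeze-and-excitation computation just described can be sketched in NumPy. The weight shapes and names are illustrative assumptions (a real model learns `w1`, `b1`, `w2`, `b2`); the reduction ratio follows the 10% figure in the text.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Sketch of the SE module: squeeze (global average pooling over H, W),
    excitation (FC -> ReLU -> FC -> Sigmoid), then channel-wise rescaling.
    x: (H, W, C); w1: (r, C); w2: (C, r), with r about 10% of C."""
    z = x.mean(axis=(0, 1))                      # squeeze: channel descriptor (C,)
    h = np.maximum(0.0, w1 @ z + b1)             # first FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # second FC + Sigmoid -> weights in (0, 1)
    return x * s                                 # redistribute features per channel
```

With zero weights the Sigmoid outputs 0.5, so every channel is scaled by one half, which makes the rescaling behavior easy to verify.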
To better fuse information over different area ranges, the multi-size convolution structure module is placed after the SE module: the two parallel 1 × 1 and 3 × 3 standard convolutions fuse multi-scale features, which are then concatenated, and the two subsequent 3 × 3 standard convolutions with filter counts C and 2C further extract semantic information. The standard convolution ConvBlock consists of a Conv2D convolution layer, a batch normalization layer, and a LeakyReLU excitation function layer. The feature map after the 3 × 3 standard convolution with C filters is sent to an upsampling convolution module (as shown in Fig. 10), where its resolution is expanded with a neighboring interpolation function and spliced with large-scale features to form a feature pyramid, improving feature utilization and the detection of multi-scale targets. The feature map after the 3 × 3 standard convolution with 2C filters predicts an output tensor via a two-dimensional convolution layer with bias.
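The resolution-expansion step of the upsampling convolution module can be illustrated with nearest-neighbor repetition (one common reading of the interpolation step described here); the function name is illustrative.

```python
import numpy as np

def upsample_nearest(x, scale=2):
    """Expand the spatial resolution of an (H, W, C) feature map by repeating
    each pixel `scale` times along height and width."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
```

Doubling a 26 × 26 map this way yields 52 × 52, allowing it to be concatenated with the larger-scale feature map to build the feature pyramid.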
In the convolutional neural network model, the specific processing procedure for extracting semantic information from the ridge image with the multi-size convolution prediction head module is as shown in Fig. 1:
Step S301: the feature map output by the sixth convolutional layer module is input to the SE module and then processed by the multi-size convolution structure module to obtain the first prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters outputs the first prediction output tensor through a two-dimensional convolution layer with bias, while the feature map after the 3 × 3 standard convolution with C filters is sent to the first upsampling convolution module 8, where its resolution is expanded with the neighboring interpolation function.
Step S302: the feature map processed by the first upsampling convolution module 8 is concatenated with the feature map output by the fifth convolutional layer module 5; the result is input to the SE module and then processed by the multi-size convolution structure module to obtain the second prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters outputs the second prediction output tensor through the two-dimensional convolution layer with bias, while the feature map after the 3 × 3 standard convolution with C filters is sent to the second upsampling convolution module 9 for resolution expansion with the neighboring interpolation function;
Step S303: the feature map processed by the second upsampling convolution module 9 is concatenated with the feature map output by the fourth convolutional layer module 4; the result is input to the SE module and then processed by the multi-size convolution structure module to obtain the third prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters predicts the third prediction output tensor through the two-dimensional convolution layer with bias;
Step S304: after the obtained first, second, and third prediction output tensors are processed by position back-calculation and a non-maximum suppression algorithm, the optimal prediction result is obtained, extracting the category and position of the ridge beasts in the image.
In the above embodiment, the first, second, and third prediction output tensors are predicted values of the categories and relative positions of the ridge beasts. All prediction results for the image are obtained after position back-calculation of the three tensors; all predictions are then processed by a non-maximum suppression algorithm, which filters out low-confidence and redundant predictions and retains the final prediction results. Specifically, when the feature map output by the sixth convolutional layer module is 13 × 13 × 512, the first prediction output tensor is 13 × 13 × 3 × (4+1+C); when the fifth module outputs 26 × 26 × 256, the second prediction output tensor is 26 × 26 × 3 × (4+1+C); and when the fourth module outputs 52 × 52 × 128, the third prediction output tensor is 52 × 52 × 3 × (4+1+C), where C is 14, the number of ridge beast categories, and 4+1 represents the position information. Detecting a ridge image with the convolutional neural network model thus yields the category and position information of the ridge beasts. The neighboring interpolation function in the above processing uses the cubic interpolation method.
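The filtering step described above (discard low-confidence predictions, then suppress redundant overlaps) can be sketched as a greedy non-maximum suppression; the box format `(x1, y1, x2, y2, score)` and the threshold values are illustrative assumptions.

```python
def nms(boxes, iou_thresh=0.5, score_thresh=0.3):
    """Greedy NMS sketch: drop low-confidence boxes, then keep boxes in
    descending score order, suppressing any box that overlaps a kept one."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union else 0.0

    kept = []
    for b in sorted((b for b in boxes if b[4] >= score_thresh),
                    key=lambda b: -b[4]):
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

In a full detector this would be run per category so that overlapping ridge beasts of different classes are not suppressed against each other.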
As described above, the method for inspecting ancient architecture ridge beast decorations provided by the embodiments of the invention achieves rapid, automatic inspection of ridge beast decorations over a large area, and automatically identifies and locates missing or damaged decorations.
Detection with the convolutional neural network model of the embodiments, compared with the prior-art method YOLOv3, improves mAP by 3.05% and reduces the parameter count by about 70% (see the quantitative comparison table), effectively reducing missed and false detections of small, densely packed ridge components and meeting the requirement of rapid inspection of ridge decorations. Qualitative comparison results are shown in Figs. 12(a), 12(b), 13(a), and 13(b): Figs. 12(a) and 13(a) show the detection effect of YOLOv3, and Figs. 12(b) and 13(b) that of the embodiment of the invention, from which it can be seen that the method of the embodiment detects small, densely packed ridge beast decorations without missed detections.
Table: quantitative comparison (rendered as an image in the original publication)
While embodiments of the invention have been disclosed above, they are not limited to the uses listed in the specification and embodiments, and can be applied in all fields suitable for the present invention; additional modifications will readily occur to those skilled in the art. The invention is therefore not limited to the specific details and examples shown and described, provided the general concept defined by the appended claims and their equivalents is not departed from.

Claims (10)

1. A method for inspecting ancient architecture ridge beast decorations, characterized by comprising:
obtaining a training sample set, wherein the training sample set comprises a large number of ridge images annotated with ridge beast targets;
inputting the ridge images into a convolutional neural network model for training, and obtaining from the model the optimal detection feature map corresponding to each ridge image, wherein the convolutional neural network model comprises an aggregation convolutional layer module composed of n convolutional layer modules and a multi-size convolution prediction head module incorporating an attention mechanism, n being an integer greater than or equal to 1; the aggregation convolutional layer module is used to extract fine-grained features of the ridge image, and the multi-size convolution prediction head module is used to extract semantic information of the ridge image;
using an unmanned aerial vehicle to periodically inspect the ancient architecture ridge beast decorations along set inspection points and routes, capturing a group of ridge images at each inspection point, detecting the captured group with the trained convolutional neural network model, and comparatively analyzing the detection results of multiple periods to judge whether a ridge beast decoration has been damaged.
2. The method for inspecting ancient architecture ridge beast decorations according to claim 1, wherein, in the aggregation convolutional layer module, the first convolutional layer module is a basic convolutional layer, and the second through nth convolutional layer modules have a uniform structure, each comprising a convolutional layer with stride 2 and residual blocks formed by skip connections over several depth aggregation convolutions; the specific processing procedure of the depth aggregation convolution is as follows:
step S1: the input feature map T_input has size H × W × M, where H is the height, W the width, and M the number of channels of the input feature map; using a filter bank D_w, an independent two-dimensional convolution is computed on each of the M channels of T_input in one-to-one correspondence to obtain feature map T_d, the filter bank D_w containing M filters, each of size 3 × 3;
step S2: using a K × K sliding window with stride 1, the summation function SUM() is applied at each center pixel on each of the M channels of T_d, computing the algebraic sum of the elements within the window to obtain feature map T_sum; T_input and T_sum are concatenated along the channel dimension to obtain feature map T_concat;
step S3: along the depth direction, a filter P_w of size 1 × 1 is used to compute the point-convolution depth aggregation feature of T_concat;
step S4: step S3 is repeated F times to obtain the output feature map T_output of size H × W × F.
3. The method for inspecting ancient architecture ridge beast decorations according to claim 2, wherein the specific processing procedure of the aggregation convolutional layer module is: as the ridge image passes in turn through the first, second, third, and fourth convolutional layer modules up to the nth convolutional layer module of the aggregation convolutional layer module, the resolution of the input ridge image is successively reduced to 1/2 while the number of channels is successively expanded to 2 times, obtaining the fine-grained features of the ridge image.
4. The method for inspecting ancient architecture ridge beast decorations according to claim 2, wherein the multi-size convolution prediction head module comprises an SE module performing a squeeze operation and an excitation operation, and a multi-size convolution structure module fusing information over different area ranges, the multi-size convolution structure module comprising: two parallel standard convolutions of 1 × 1 and 3 × 3, a concatenation module that concatenates the feature maps output by the two parallel standard convolutions, and two serially connected 3 × 3 standard convolutions whose filter counts are C and 2C, respectively.
5. The method for inspecting ancient architecture ridge beast decorations according to claim 4, wherein the specific processing procedure for extracting semantic information from the ridge image with the multi-size convolution prediction head module is as follows:
the feature map output by the nth convolutional layer module is input to the SE module and then processed by the multi-size convolution structure module to obtain a first prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters outputs the first prediction output tensor through a two-dimensional convolution layer with bias, and the feature map after the 3 × 3 standard convolution with C filters is input to a first upsampling convolution module for resolution expansion with a neighboring interpolation function;
the feature map processed by the first upsampling convolution module is concatenated with the feature map output by the (n-1)th convolutional layer module, input to the SE module, and then processed by the multi-size convolution structure module to obtain a second prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters outputs the second prediction output tensor through the two-dimensional convolution layer with bias, and the feature map after the 3 × 3 standard convolution with C filters is sent to a second upsampling convolution module for resolution expansion with the neighboring interpolation function;
the feature map processed by the second upsampling convolution module is concatenated with the feature map output by the (n-2)th convolutional layer module, input to the SE module, and then processed by the multi-size convolution structure module to obtain a third prediction output tensor, wherein the feature map after the 3 × 3 standard convolution with 2C filters outputs the third prediction output tensor through the two-dimensional convolution layer with bias;
after the obtained first, second, and third prediction output tensors are processed by position back-calculation and a non-maximum suppression algorithm, the optimal prediction result is obtained, i.e., the category and position of the ridge beasts in the image are extracted.
6. The method for inspecting ancient architecture ridge beast decorations according to claim 3, wherein n is 6 in the aggregation convolutional layer module; the first convolutional layer module expands the number of channels of the input ridge image from 3 to 16, and the numbers of channels processed by the second through sixth convolutional layer modules are 32, 64, 128, 256, and 512 in sequence; the number of residual blocks is 1 in the second convolutional layer module, 2 in the third, 4 in each of the fourth and fifth, and 2 in the sixth; in step S2, K takes the value 3.
7. The method for inspecting ancient architecture ridge beast decorations according to claim 4, wherein the squeeze operation in the SE module is a global average pooling operation, and the excitation operation proceeds as follows: the feature map after the global average pooling passes in turn through a first fully connected layer, a ReLU excitation function, a second fully connected layer, and a Sigmoid excitation function, and is then multiplied with the input feature map matrix from before the global average pooling, realizing a redistribution of the image features.
8. The method for inspecting ancient architecture ridge beast decorations according to claim 1, wherein the training sample set comprises ridge images captured by fixed-point aerial photography and undamaged ancient architecture ridge images obtained from internet sources; an annotation tool is used to mark 14 categories of ridge beast decorations in the ridge images, including the immortal (xianren), dragon, phoenix, lion, sea horse, heavenly horse, xiayu, suanni, xiezhi, douniu, hangshi, chuishou (drooping beast), qiangshou, and taoshou (sleeve beast).
9. The method for inspecting ancient architecture ridge beast decorations according to claim 1, wherein the specific process of comparatively analyzing the detection results of multiple periods and judging whether a ridge beast decoration is damaged is:
if more than 70% of the detection results for a group of ridge images captured at an inspection point agree after detection by the trained convolutional neural network model, the inspection point is judged to be stably detectable, and that result is retained as the detection result of the inspection point for the current period;
comparing the detection result of the inspection point for the current period with that of the previous period, specifically: computing an intersection-over-union in turn for each ridge beast decoration between the previous-period and current-period detection results; if the intersection-over-union is not less than a set threshold, matching succeeds and the detection state of the ridge beast decoration at the inspection point is updated; if it is less than the set threshold, matching fails and a damage early warning is issued.
10. The method for inspecting ancient architecture ridge beast decorations according to claim 1, wherein, before the ridge images are input into the convolutional neural network model for training, the sizes of 9 anchor boxes are determined for the input ridge images by a K-Means clustering algorithm; the input ridge image size is 640 × 640, and irregularly sized images are unified using a scaling method.
CN202210109004.8A 2022-01-28 2022-01-28 Method for inspecting ancient architecture ridge beast decoration Pending CN114429578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109004.8A CN114429578A (en) 2022-01-28 2022-01-28 Method for inspecting ancient architecture ridge beast decoration


Publications (1)

Publication Number Publication Date
CN114429578A true CN114429578A (en) 2022-05-03

Family

ID=81312448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109004.8A Pending CN114429578A (en) 2022-01-28 2022-01-28 Method for inspecting ancient architecture ridge beast decoration

Country Status (1)

Country Link
CN (1) CN114429578A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024020774A1 (en) * 2022-07-26 2024-02-01 江苏树实科技有限公司 Model generation method, object detection method, controller and electronic device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination