CN113537026A

CN113537026A - Primitive detection method, device, equipment and medium in building plan

Info

Publication number: CN113537026A
Application number: CN202110775938.0A
Authority: CN
Inventors: 崔淼; 陈成才
Original assignee: Shanghai Xiaoi Robot Technology Co Ltd
Current assignee: Shanghai Xiaoi Robot Technology Co Ltd
Priority date: 2021-07-09
Filing date: 2021-07-09
Publication date: 2021-10-22
Anticipated expiration: 2041-07-09
Also published as: CN113537026B

Abstract

The embodiment of the invention discloses a method, an apparatus, a device and a medium for detecting primitives in a building plan. The method comprises the following steps: extracting multi-channel basic image features at multiple scales from a building plan to be identified; enlarging the receptive fields of the multi-channel basic image features at each scale by means of an atrous (dilated) convolution algorithm, and then performing feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fused image features; obtaining a plurality of segmentation maps according to the multi-channel fused image features; and merging the primitives with different kernel proportions in the segmentation maps by means of a progressive scale expansion algorithm to obtain at least one primitive area, and acquiring a primitive identification result corresponding to the primitive area. In this technical scheme, the features extracted from the building plan are processed by an artificial intelligence algorithm to obtain the primitive identification result, so that the primitives in the building plan are detected accurately and the missed detections and false detections caused by occlusion or interference are avoided.

Description

Primitive detection method, device, equipment and medium in building plan

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to a method, a device, equipment and a medium for detecting a primitive in a building plan.

Background

With the rapid development of artificial intelligence technology, it has been widely applied in scenarios such as financial services, medical imaging, sequencing diagnosis, machine vision and industrial inspection.

At present, in the building industry, and especially in the review of building drawings, a large amount of information must be checked: for example, whether the components (i.e. primitives) in a building plan fully mark the overall, bay and depth dimensions of the plan, and whether each index of a component (such as area, length and perimeter) is calculated according to the specification requirements or planning conditions. However, when a building plan is analyzed, component detection is often affected by the variety of component backgrounds, occlusion by other components, and interference from auxiliary lines and text, which leads to missed detections and false detections. How to detect the primitives in a building plan accurately with an artificial intelligence algorithm, while avoiding the missed and false detections caused by occlusion or interference, is therefore a problem to be solved urgently.

Disclosure of Invention

The embodiment of the invention provides a method, an apparatus, a device and a medium for detecting primitives in a building plan, so as to detect the primitives in the building plan accurately based on an artificial intelligence algorithm and to avoid the missed detections and false detections caused by occlusion or interference.

In a first aspect, an embodiment of the present invention provides a method for detecting a primitive in a building plan, including:

extracting multi-channel basic image features under multiple scales from a building plan to be identified;

enlarging the receptive fields of the multi-channel basic image features at each scale by means of an atrous (dilated) convolution algorithm, and then performing feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fused image features;

obtaining a plurality of segmentation maps according to the multi-channel fused image features, wherein the primitives included in different segmentation maps correspond to different kernel scales;

and merging the primitives with different kernel proportions in the segmentation maps by means of a progressive scale expansion algorithm to obtain at least one primitive area, and acquiring a primitive identification result corresponding to the primitive area.

In a second aspect, an embodiment of the present invention further provides a primitive detecting apparatus in a building plan, including:

the multi-channel basic image feature extraction module is used for extracting multi-channel basic image features under multiple scales from a building plan to be identified;

the multi-channel fused image feature generation module is used for enlarging the receptive fields of the multi-channel basic image features at each scale by means of an atrous (dilated) convolution algorithm, and then performing feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fused image features;

the segmentation map generation module is used for obtaining a plurality of segmentation maps according to the multi-channel fused image features, wherein the primitives included in different segmentation maps correspond to different kernel scales;

and the primitive identification result acquisition module is used for merging the primitives with different kernel proportions in the segmentation maps by means of a progressive scale expansion algorithm to obtain at least one primitive area, and acquiring a primitive identification result corresponding to the primitive area.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the method for detecting the primitive in the building plan according to any embodiment of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting a primitive in a building plan according to any embodiment of the present invention.

In the technical scheme provided by the embodiment of the invention, multi-channel basic image features at multiple scales are extracted from the building plan to be identified. An atrous convolution algorithm is used to enlarge the receptive fields of the multi-channel basic image features at each scale, after which feature fusion is performed to obtain multi-channel fused image features. A plurality of segmentation maps are then obtained according to the multi-channel fused image features, the primitives with different kernel proportions in the segmentation maps are merged by a progressive scale expansion algorithm to obtain at least one primitive area, and a primitive identification result corresponding to the primitive area is acquired. Because the features extracted from the building plan are processed by an artificial intelligence algorithm, the primitive detection effect is effectively improved and a highly accurate primitive identification result is obtained, so that the primitives in the building plan are detected accurately and the missed detections and false detections caused by occlusion or interference are avoided.

Drawings

Fig. 1 is a schematic flowchart of a primitive detection method in a building plan according to a first embodiment of the present invention;

Fig. 2a is a schematic flowchart of a primitive detection method in a building plan according to a second embodiment of the present invention;

Fig. 2b is a schematic diagram of a model structure for obtaining a primitive recognition result of a building plan according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a primitive detecting device in a building plan according to a third embodiment of the present invention;

Fig. 4 is a schematic hardware configuration diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Embodiment one

Fig. 1 is a flowchart of a method for detecting primitives in a building plan according to the first embodiment of the present invention. This embodiment is applicable to the case of accurately detecting primitives in a building plan. The method may be executed by the apparatus for detecting primitives in a building plan provided by an embodiment of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated in a computer device.

As shown in fig. 1, the method for detecting a primitive in a building plan provided in this embodiment specifically includes:

S110: extracting multi-channel basic image features at multiple scales from the building plan to be identified.

The building plan to be identified refers to the building plan to be subjected to primitive detection. In the embodiment of the present invention, the primitive refers to a member in a building plan, that is, each element constituting a building, such as a floor, a wall, a beam, and the like.

The multi-channel basic image features refer to image features extracted from a building plan at multiple scales.

So that the extracted image features better characterize different regions of the building plan, multi-channel basic image features may be extracted at multiple scales. Multi-channel basic image features with high image resolution (i.e. at a low-dimensional scale) carry rich detail information and have a small receptive field, which suits the detection of small targets; multi-channel basic image features with low image resolution (i.e. at a high-dimensional scale) carry more semantic information and have a large receptive field, which suits the detection of large targets. In the embodiment of the present invention, the receptive field is the size of the region of the input picture onto which a pixel of the image feature output by a layer of the convolutional neural network is mapped; that is, the region of the building plan that corresponds to each point of the multi-channel basic image features.

It should be noted that, before the multi-channel basic image features at multiple scales are extracted from the building plan to be identified, any existing target detection algorithm, such as the YOLOv3 (You Only Look Once, version 3) algorithm or the SSD (Single Shot MultiBox Detector) algorithm, may be used to detect the standard building drawing and obtain the building plan to be identified; this is not specifically limited in the embodiment of the present invention. The standard building drawing refers to a construction engineering drawing that includes a plurality of drawing frames (e.g. a small drawing frame corresponding to a building plan, and auxiliary drawing frames such as a building design description frame).
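The relationship between network depth, stride and receptive field can be illustrated with a short calculation. The following is a minimal sketch using the standard layer-by-layer recurrence for convolutional networks; the function name and the two layer lists are hypothetical examples and are not taken from the embodiment.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of one output point.

    `layers` is a list of (kernel_size, stride) pairs ordered from the
    input towards the output. The layer lists used below are hypothetical.
    """
    rf = 1      # a single input pixel sees only itself
    jump = 1    # spacing, in input pixels, between adjacent feature points
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Shallow, high-resolution features: small receptive field.
print(receptive_field([(3, 1), (3, 1)]))
# Deeper, strided (low-resolution) features: larger receptive field.
print(receptive_field([(3, 2), (3, 2), (3, 2)]))
```

Under these assumptions the strided stack covers a larger region of the input than the shallow stack, which matches the description of low-resolution features being suited to large targets.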

S120: enlarging the receptive fields of the multi-channel basic image features at each scale by means of an atrous convolution algorithm, and then performing feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fused image features.

The atrous convolution algorithm, also known as the dilated convolution algorithm, injects holes into the standard convolution kernel so as to enlarge the receptive field while reducing the computational load relative to a dense kernel that covers the same region.
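As an illustration of how injecting holes enlarges the receptive field, the following is a minimal pure-Python sketch of a single-channel dilated convolution (computed as cross-correlation, without padding). The function name and the toy input are assumptions made for this example; the embodiment does not specify kernel sizes or dilation rates.

```python
def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D dilated cross-correlation on nested lists.

    A k x k kernel with dilation d touches a window of size
    (k - 1) * d + 1 while still using only k * k weights.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1            # effective window size
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[y + i * dilation][x + j * dilation]
            row.append(acc)
        out.append(row)
    return out


# Toy 5 x 5 input whose value at (y, x) is 5 * y + x.
image = [[5 * y + x for x in range(5)] for y in range(5)]
ones = [[1] * 3 for _ in range(3)]
print(dilated_conv2d(image, ones, dilation=1))  # 3 x 3 output, 3 x 3 windows
print(dilated_conv2d(image, ones, dilation=2))  # 1 x 1 output, 5 x 5 window
```

With dilation 2 the same nine weights span the whole 5 × 5 input, so each output point sees more context at no extra multiplication cost.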

The multi-channel fusion image feature refers to an image feature obtained by fusing multi-channel basic image features under multiple scales, and can represent multi-dimensional scale features in a building plan.

Image features at a low-dimensional scale have high feature resolution and contain more detail, whereas image features at a high-dimensional scale have low feature resolution and poor perception of detail. So that the extracted features better represent the building plan, the multi-channel basic image features at different scales can be fused. Before feature fusion, an atrous convolution algorithm can be used to enlarge the receptive fields of the multi-channel basic image features at each scale and thereby capture multi-scale context information of the building plan. The features processed by atrous convolution are then fused, combining feature information from different scales into the multi-channel fused image features. This yields multi-scale feature information about the primitives in the building plan, in particular about primitives occluded by auxiliary lines, text or other interference, and improves primitive detection performance on the building plan.
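One common way to fuse features of different resolutions is to upsample the coarser maps to the finest resolution and concatenate them along the channel axis. The sketch below illustrates that idea only; the embodiment does not state the fusion operator, so the concatenation, the nearest-neighbour upsampling, the integer scale ratios and the function names are all assumptions.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a C x H x W nested list."""
    out = []
    for channel in fmap:
        new_channel = []
        for row in channel:
            wide = [value for value in row for _ in range(factor)]
            new_channel.extend(list(wide) for _ in range(factor))
        out.append(new_channel)
    return out


def fuse_scales(features):
    """Upsample every C x H x W map to the largest H, then stack channels.

    Assumes square scaling and that the largest height is an integer
    multiple of every other height.
    """
    target_h = max(len(f[0]) for f in features)
    fused = []
    for f in features:
        fused.extend(upsample_nearest(f, target_h // len(f[0])))
    return fused


fine = [[[0] * 4 for _ in range(4)]]              # 1 channel, 4 x 4
coarse = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]     # 2 channels, 2 x 2
fused = fuse_scales([fine, coarse])
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```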

S130: obtaining a plurality of segmentation maps according to the multi-channel fused image features, wherein the primitives included in different segmentation maps correspond to different kernel scales.

The segmentation maps are the feature maps obtained by segmenting the multi-channel fused image features.

The kernel scale refers to the segmentation scale of the multi-channel fused image features corresponding to each segmentation map. The segmentation map corresponding to the largest kernel scale is the multi-channel fused image feature itself.

To detect primitives that cross and overlap, the edge information of each primitive must be detected accurately. Before detection, the multi-channel fused image features can therefore be divided into a plurality of segmentation maps according to different kernel scales. The segmentation maps have the same shape and center point as the multi-channel fused image features but gradually increasing scales, and the primitives in the segmentation maps are merged based on a progressive scale expansion algorithm.

S140: merging the primitives with different kernel proportions in the segmentation maps by means of a progressive scale expansion algorithm to obtain at least one primitive area, and acquiring a primitive identification result corresponding to the primitive area.

In the embodiment of the invention, the Progressive Scale Expansion (PSE) algorithm is used to locate primitives of arbitrary shape in a building plan and to distinguish effectively the boundaries of adjacent or partially overlapping primitives.

The kernel scale here refers to the shrink ratio of the primitives in each segmentation map relative to the multi-channel fused image features.

The primitive area refers to the area obtained after the primitives in the segmentation maps are merged, whose size is consistent with the segmentation map with the largest kernel scale; that is, the area where each primitive in the building plan is located.

The primitive recognition result refers to a primitive detection result, for example, an area range corresponding to each primitive in the building plan, and position information of each primitive area in the building plan.

A progressive scale expansion algorithm is used to merge the primitives with different kernel proportions in the segmentation maps into at least one complete primitive area. This detects the boundary of each primitive effectively, distinguishes adjacent primitive areas, and avoids the missed detections and false detections caused by occlusion, interference or dense primitive distribution. The progressive scale expansion algorithm also locates the primitive areas, providing the position information of each primitive area in the building plan.

As an alternative implementation, obtaining a plurality of segmentation maps according to the multi-channel fused image features may include: performing channel fusion on the multi-channel fused image features to obtain a target fusion image; and inputting the target fusion image into an image segmentation model and outputting a plurality of segmentation maps through the image segmentation model, wherein each segmentation map comprises the mask maps of all primitives at a set kernel scale.

Merging the primitives with different kernel proportions in the segmentation maps by means of a progressive scale expansion algorithm to obtain at least one primitive area may include: sequentially merging the primitives included in the segmentation maps in order of kernel proportion from small to large by a breadth-first method to obtain at least one primitive area.

Channel fusion fuses the multi-channel fused image features into image features with a uniform number of channels (i.e. the number of convolution kernels), yielding the target fusion image. For example, the target fusion image may be obtained by performing channel fusion on the multi-channel fused image features with 1 × 1 convolution kernels of a set number of channels.

The image segmentation model performs image segmentation on the input target fusion image and outputs the plurality of segmentation maps obtained by the segmentation.

The mask map is the segmentation result of the different primitives in the multi-channel fused image features at a given scale (i.e. the set kernel scale). In the mask map, pixels at positions occupied by primitives are generally black, and pixels at positions not occupied by primitives are generally white.

Breadth-First Search (BFS) is an uninformed search method that visits nodes level by level, examining all nodes in the image until a search result is obtained.
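A 1 × 1 convolution mixes channels at each pixel independently, so the number of output channels equals the number of kernels. The following minimal sketch shows this on nested lists; the function name and the weights are hypothetical, and bias and activation are omitted.

```python
def conv1x1(fmap, weights):
    """Apply a 1 x 1 convolution.

    fmap:    C_in x H x W nested list.
    weights: C_out x C_in nested list, one row per 1 x 1 kernel.
    Returns a C_out x H x W nested list.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[c][y][x] for c, w in enumerate(kernel))
          for x in range(width)]
         for y in range(height)]
        for kernel in weights
    ]


features = [[[1, 2]], [[3, 4]]]           # 2 channels, 1 x 2
merged = conv1x1(features, [[1, 1]])      # one kernel -> one output channel
print(merged)
```

Choosing the number of weight rows sets the channel count of the target fusion image regardless of how many channels enter.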

Channel fusion is performed on the multi-channel fused image features to obtain a target fusion image with a uniform number of channels, and the target fusion image is input into the image segmentation model to obtain a plurality of segmentation maps. A breadth-first method is then used to merge the segmentation map with the smallest kernel proportion into the segmentation maps with successively larger kernel proportions, expanding its regions until the segmentation map with the largest kernel proportion has been reached. In other words, the primitives in the segmentation map with the smallest kernel proportion are expanded step by step until they cover the primitives in the segmentation map with the largest kernel proportion, yielding at least one primitive area. Expanding the primitive detection area from small to large across the segmentation maps produces complete primitive areas, is robust to primitives of arbitrary shape, and can quickly and accurately separate primitive boundaries that are close together or even partially overlapping.

In the technical scheme provided by the embodiment of the invention, multi-channel basic image features at multiple scales are extracted from the building plan to be identified. An atrous convolution algorithm is used to enlarge the receptive fields of the multi-channel basic image features at each scale, after which feature fusion is performed to obtain multi-channel fused image features. A plurality of segmentation maps are then obtained according to the multi-channel fused image features, the primitives with different kernel proportions in the segmentation maps are merged by a progressive scale expansion algorithm to obtain at least one primitive area, and a primitive identification result corresponding to the primitive area is acquired. Because the features extracted from the building plan are processed by an artificial intelligence algorithm, the primitive detection effect is effectively improved and a highly accurate primitive identification result is obtained, so that the primitives in the building plan are detected accurately and the missed detections and false detections caused by occlusion or interference are avoided.
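The breadth-first merging described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: masks are binary nested lists in which 1 marks a primitive pixel (the opposite of the black/white display convention), 4-connectivity is assumed, and conflicts between neighbouring primitives are resolved first-come-first-served. All function names are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)


def label_components(mask):
    """Label 4-connected regions of a binary mask; 0 stays background."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernel_masks):
    """Grow labels from the smallest kernel mask out to the largest one.

    kernel_masks: binary masks ordered from smallest to largest kernel.
    """
    labels, count = label_components(kernel_masks[0])
    height, width = len(labels), len(labels[0])
    for mask in kernel_masks[1:]:
        # Seed the queue with every pixel that already carries a label.
        queue = deque((y, x) for y in range(height) for x in range(width)
                      if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and mask[ny][nx] and not labels[ny][nx]):
                    labels[ny][nx] = labels[y][x]  # first label to arrive wins
                    queue.append((ny, nx))
    return labels, count


# Two primitives that touch in the largest mask but are separate kernels.
masks = [
    [[0, 1, 0, 0, 0, 1, 0]],
    [[0, 1, 1, 0, 1, 1, 0]],
    [[1, 1, 1, 1, 1, 1, 1]],
]
labels, count = progressive_scale_expansion(masks)
print(count, labels)
```

In this toy case the largest mask alone would yield a single connected region, whereas starting from the smallest kernels keeps the two primitives separate while still covering every pixel.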

In an optional implementation manner of this embodiment, before extracting the multi-channel basic image features at the set low-dimensional scale in the building plan to be identified, the method may further include:

pre-identifying a standard building drawing by means of a morphological algorithm, and cropping at least one candidate drawing frame detection area from the standard building drawing according to the pre-identification result, wherein the standard building drawing comprises at least one small drawing frame whose image size is smaller than or equal to a preset standard identification size; and, for each candidate drawing frame detection area, performing the following drawing frame detection operations: extracting multi-channel basic image features in the candidate drawing frame detection area; extracting multi-channel high-dimensional image features from the basic image features without losing the basic image features, and enhancing the image feature quality of each high-dimensional image feature; performing feature fusion on the multi-channel high-dimensional image features, without losing the high-dimensional image features, to obtain multi-channel fused image features; and acquiring a drawing frame identification result of the candidate drawing frame detection area according to the multi-channel fused image features, and taking each identified drawing frame as a building plan to be identified.

The morphological algorithm refers to an algorithm that analyzes and identifies an image by measuring and extracting corresponding shapes in the image with certain morphological structuring elements. In the embodiment of the invention, the morphological algorithm is used to acquire the drawing frame information of the standard building drawing.

The pre-identification result refers to at least one small drawing frame, identified in the standard building drawing by the morphological algorithm, whose image size is smaller than or equal to the preset standard identification size. The preset standard identification size refers to a preset maximum size of a drawing frame that can be identified by the morphological algorithm.

The candidate frame detection area refers to an area cut out from a standard building drawing.

The image feature quality measures how well the image features represent the candidate frame detection area.

The frame identification result refers to a result obtained by performing frame detection on the candidate frame detection area.

The standard building drawing may include a plurality of drawing frames; for example, a standard building drawing of a residential project may include frames for the drawing catalog, the building design description, and so on. Therefore, before primitive detection is performed on a building plan, the building plan to be identified must first be detected automatically in the standard building drawing. Specifically: the standard building drawing is first pre-identified by a morphological algorithm to obtain at least one small drawing frame whose size is smaller than or equal to the preset standard identification size; at least one candidate drawing frame detection area is then cropped from the standard building drawing according to the identified small drawing frames; and drawing frame detection is performed on each area to determine whether the drawing frame is a building plan to be identified.

Drawing frame detection on a candidate drawing frame detection area specifically includes: first, convolution kernels with different numbers of channels can be used to convolve the candidate drawing frame detection area and extract multi-channel basic image features; multi-channel high-dimensional image features are then extracted from the basic image features and their image feature quality is enhanced; feature fusion is then performed on the multi-channel high-dimensional image features to obtain multi-channel fused image features; finally, the drawing frame identification result of the candidate drawing frame detection area is obtained according to the multi-channel fused image features, and each identified drawing frame is taken as a building plan to be identified.

The advantage of this arrangement is that identifying at least one small drawing frame in the standard building drawing with a morphological algorithm avoids feeding the very high-resolution standard building drawing directly into the detection model, which the model could not recognize correctly, and reduces the resolution of the image input to the detection model. At least one candidate drawing frame detection area is then cropped from the standard building drawing according to the identified small drawing frames and detected automatically to determine the building plan to be identified. No manual drawing review by professionals is needed, the building plan to be identified is found faster in the standard building drawing, and drawing review and drawing frame information lookup are automated.

On the basis of the above embodiments, pre-identifying the standard building drawing by means of the morphological algorithm and cropping at least one candidate drawing frame detection area from the standard building drawing according to the pre-identification result may include:

performing binarization on the standard building drawing to obtain a binarized image; performing erosion and/or dilation on the binarized image to smooth the boundaries of the objects in the binarized image; performing edge point detection on the processed binarized image to obtain a plurality of edge points, performing connected-domain detection according to the detected edge points, and obtaining the position coordinate range of each detected connected domain in the binarized image; and cropping the candidate drawing frame detection area corresponding to each connected domain from the standard building drawing according to the position coordinate ranges.
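The pre-identification steps can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions: the threshold value, the 3 × 3 structuring element and the 4-connectivity are chosen for the example, only dilation is applied, and the connected domains are labelled directly on the mask rather than from separately detected edge points. All function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map dark strokes (< threshold) to 1 and light paper to 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def dilate(mask):
    """3 x 3 binary dilation: every set pixel also sets its 8 neighbours."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - 1), min(height, y + 2)):
                    for nx in range(max(0, x - 1), min(width, x + 2)):
                        out[ny][nx] = 1
    return out


def component_boxes(mask):
    """Return (top, left, bottom, right), inclusive, per 4-connected region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 5 x 9 grey image: white paper (255) with two dark marks.
gray = [[255] * 9 for _ in range(5)]
gray[1][1] = gray[1][2] = 0
gray[3][7] = 0
boxes = component_boxes(dilate(binarize(gray)))
print(boxes)  # coordinate ranges used to crop candidate areas
```

Each returned box plays the role of a position coordinate range from which a candidate drawing frame detection area would be cropped.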

On the basis of the foregoing embodiments, extracting multiple channels of basic image features in the candidate frame detection region may include:

inputting the candidate drawing frame detection area into a lightweight network, and inputting the output results of a plurality of bottleneck layers of the lightweight network into a path aggregation network to obtain the multi-channel basic image features;

wherein different bottleneck layers are used for outputting basic image features with different scales.

On the basis of the foregoing embodiments, extracting multiple channels of high-dimensional image features from each basic image feature on the basis of keeping the basic image features from being missing may include:

inputting the multi-channel basic image features into a spatial pyramid pooling network, and extracting multi-channel high-dimensional image features of a standard scale from the multi-scale multi-channel basic image features through the spatial pyramid pooling network.

On the basis of the above embodiments, enhancing the image feature quality of each high-dimensional image feature may include:

inputting the multi-channel high-dimensional image features into a sub-pixel convolution network, and inserting the low-resolution high-dimensional image features into a high-resolution feature map through the sub-pixel convolution network, so as to enhance the feature quality of the high-dimensional image features.
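How pooling can turn feature maps of different sizes into an output of one standard size may be illustrated with the classic spatial pyramid pooling scheme, in which each map is max-pooled into a fixed grid of bins per pyramid level. The sketch below assumes that scheme, a single channel and pyramid levels of 1 × 1 and 2 × 2; the embodiment does not specify the network's configuration, and the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def spatial_pyramid_pool(channel, levels=(1, 2)):
    """Max-pool an H x W map into n x n bins for each n in `levels`.

    The output length is sum(n * n for n in levels), independent of H and W.
    """
    height, width = len(channel), len(channel[0])
    pooled = []
    for n in levels:
        for by in range(n):
            y0, y1 = (by * height) // n, ceil_div((by + 1) * height, n)
            for bx in range(n):
                x0, x1 = (bx * width) // n, ceil_div((bx + 1) * width, n)
                pooled.append(max(channel[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled


square = [[4 * y + x for x in range(4)] for y in range(4)]   # 4 x 4 map
wide = [[5 * y + x for x in range(5)] for y in range(3)]     # 3 x 5 map
print(spatial_pyramid_pool(square))
print(spatial_pyramid_pool(wide))
```

Both inputs produce five values, one for the 1 × 1 level and four for the 2 × 2 level, even though their spatial sizes differ.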
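The rearrangement at the heart of sub-pixel convolution, often called pixel shuffle, interleaves groups of r × r low-resolution channels into one channel that is r times larger in each spatial dimension. The following minimal sketch shows only that rearrangement; the convolution that would normally produce the input channels is omitted, and the function name and upscale factor are assumptions.

```python
def pixel_shuffle(fmap, r):
    """Rearrange a (C * r * r) x H x W nested list into C x (H * r) x (W * r).

    Channel c * r * r + i * r + j supplies the sub-pixel at offset (i, j)
    inside every r x r block of output channel c.
    """
    channels = len(fmap) // (r * r)
    height, width = len(fmap[0]), len(fmap[0][0])
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                source = fmap[c * r * r + i * r + j]
                for y in range(height):
                    for x in range(width):
                        out[c][y * r + i][x * r + j] = source[y][x]
    return out


low_res = [[[1]], [[2]], [[3]], [[4]]]   # 4 channels, 1 x 1 each
print(pixel_shuffle(low_res, 2))         # 1 channel, 2 x 2
```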

On the basis of the foregoing embodiments, on the basis of keeping the high-dimensional image features from being missing, performing feature fusion on the multi-channel high-dimensional image features to obtain multi-channel fused image features, which may include:

performing convolution on the multi-channel high-dimensional image features with a set number of 1 × 1 convolution kernels to obtain the multi-channel fused image features.

On the basis of the foregoing embodiments, obtaining the drawing frame identification result of a candidate drawing frame detection area according to the multi-channel fused image features may include:

inputting the multi-channel fused image features into a classification network and a positioning network respectively, and identifying the position coordinates of the area where the drawing frame is located in the candidate drawing frame detection area from the classification result output by the classification network and the positioning result output by the positioning network.

On the basis of the foregoing embodiments, for each candidate frame detection area, the performing of each frame detection processing operation may specifically include:

inputting each candidate drawing frame detection area into a pre-trained drawing frame recognition model, and acquiring the drawing frame identification result output by the drawing frame recognition model for each candidate drawing frame detection area;

wherein the drawing frame recognition model specifically includes: a lightweight network, a path aggregation network, a spatial pyramid pooling network, a sub-pixel convolution network, 1 × 1 convolution kernels, a classification network and a positioning network;

and the training samples used to train the drawing frame recognition model comprise: standard building drawings in which the drawing frame position of the building plan of each house has been annotated in advance.

Example two

Fig. 2a is a flowchart of a primitive detection method in a building plan according to a second embodiment of the present invention. This embodiment further refines the above embodiment, wherein extracting the multi-channel basic image features under multiple scales from the building plan to be identified may specifically be:

inputting the building plan into a residual network, and acquiring the output results of a plurality of residual blocks of the residual network as the multi-channel basic image features under a plurality of scales;

wherein each residual block is used for outputting the multi-channel basic image features under a set scale.

Further, before the hole convolution algorithm is adopted to increase the receptive field of the multi-channel basic image features under each scale, the method may further include:

performing up-sampling processing on the multi-channel basic image features under each scale respectively, so as to increase the high-dimensional features in the multi-channel basic image features.

As shown in fig. 2a, the method for detecting a primitive in a building plan provided in this embodiment specifically includes:

S210, inputting the building plan into a residual network, and obtaining the output results of a plurality of residual blocks of the residual network as the multi-channel basic image features under a plurality of scales.

The residual network (ResNet) is used to extract multi-scale image features from the building plan.

A residual block (ResBlock) is the basic structural unit of a residual network, and each residual block is used for outputting the multi-channel basic image features under a set scale.

It is understood that a plurality of residual blocks can be arranged in one residual network, and different residual blocks can output image features at different scales. When the building plan is input into the residual network, image features under different set scales are output respectively, so that the multi-channel basic image features under a plurality of different scales can be obtained.

S220, up-sampling the multi-channel basic image features under each scale respectively to increase high-dimensional features in the multi-channel basic image features.

The up-sampling is used for increasing high-dimensional characteristic information and aggregating image semantic information in the multi-channel basic image characteristics.

Illustratively, the multi-channel basic image features under each scale are first convolved with 1 × 1 convolution kernels of a preset number of channels and are then subjected to 2× up-sampling. In this way, high-dimensional features are added to the multi-channel basic image features without losing the multi-channel basic image features, so that they can contain features related to image semantics, which further improves the primitive detection effect in the building plan.
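The 1 × 1 convolution followed by 2× up-sampling can be sketched as follows. This is only an illustration under stated assumptions: nearest-neighbour interpolation is assumed because the interpolation method is not specified here, and the helper names `conv1x1` and `upsample2x_nearest` as well as the nested-list layout `[C][H][W]` are hypothetical.

```python
def conv1x1(features, weights):
    """Apply a 1 x 1 convolution: mix channels independently at each pixel.

    features: [C_in][H][W] nested lists; weights: [C_out][C_in].
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [[sum(wt * features[k][y][x] for k, wt in enumerate(row))
          for x in range(width)]
         for y in range(height)]
        for row in weights
    ]


def upsample2x_nearest(features):
    """Double H and W by repeating every value (assumed nearest-neighbour)."""
    out = []
    for channel in features:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(2)]
            rows.extend([wide, list(wide)])  # repeat the widened row
        out.append(rows)
    return out


# Two input channels of size 1 x 2 are projected to one channel, then enlarged.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
projected = conv1x1(features, [[1.0, 1.0]])
upsampled = upsample2x_nearest(projected)
```

The 1 × 1 convolution changes only the number of channels, while the up-sampling changes only the spatial size, so the two steps can be reasoned about separately.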

S230, increasing the receptive field of the multi-channel basic image features under each scale by adopting a hole convolution algorithm, and then performing feature fusion on the multi-channel basic image features under each scale to obtain multi-channel fusion image features.

Optionally, increasing the receptive field of the multi-channel basic image features under each scale by using a hole convolution algorithm may include: acquiring the hole convolution ratios respectively corresponding to the multi-channel basic image features of each scale; and respectively performing a convolution operation between the convolution kernel of each hole convolution ratio and the matched multi-channel basic image features, so as to increase the receptive field of the multi-channel basic image features under each scale.

Here, the hole convolution ratio, i.e., the dilation rate, refers to the spacing between adjacent points of the convolution kernel and indicates how much the receptive field is enlarged. A smaller hole convolution ratio gives a smaller receptive field, which is beneficial to detecting small targets; a larger hole convolution ratio gives a larger receptive field, which is beneficial to detecting large targets. For example, a hole convolution ratio of 1 indicates that the points of the convolution kernel are adjacent to each other, which corresponds to an ordinary convolution; a hole convolution ratio of 2 means that the points of the convolution kernel are separated by one pixel, that is, a 3 × 3 convolution kernel with a hole convolution ratio of 2 has the same receptive field as a 5 × 5 convolution kernel of an ordinary convolution.
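The relation between the hole convolution ratio and the receptive field can be checked with a short calculation. The sketch below uses the standard formula k + (k − 1)(d − 1) for the span of a k × k kernel with dilation rate d; the function names are hypothetical and chosen only for illustration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length of the area covered by a dilated (hole) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def tap_offsets(kernel_size, dilation):
    """Offsets of the kernel points along one axis, relative to the centre."""
    half = (kernel_size - 1) // 2
    return [(i - half) * dilation for i in range(kernel_size)]


ordinary = effective_kernel_size(3, 1)  # points adjacent: plain 3 x 3
dilated = effective_kernel_size(3, 2)   # one pixel between points
offsets = tap_offsets(3, 2)
```

With a ratio of 2 the three points along each axis sit at offsets -2, 0 and 2, so a 3 × 3 kernel spans 5 pixels per side, matching the 5 × 5 equivalence stated above while still using only 9 weights.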

In order to increase the receptive field of the multi-channel basic image features under each scale and enable the receptive field to cover the region corresponding to the whole multi-channel basic image features without blind areas, the hole convolution ratios respectively corresponding to the multi-channel basic image features of each scale can be obtained, and the convolution kernel of each hole convolution ratio and the matched multi-channel basic image features are respectively subjected to convolution operation.

Optionally, performing feature fusion on the multi-channel basic image features under each scale to obtain multi-channel fusion image features, which may include: inputting the multi-channel basic image features under all scales into a merging network together to obtain a fusion image feature with a first channel number; and performing convolution processing on the fusion image features of the first channel number by using a convolution kernel of 1 × 1 of the set channel number to obtain fusion image features of the second channel number.

The merging network refers to a network layer used for feature fusion.

The fused image features of the first channel number refer to the image features obtained by fusing the multi-channel basic image features under multiple scales.

The fused image features of the second channel number refer to image features of a specific dimensionality obtained by integrating the information of the fused image features of the first channel number without losing the fused image features of the first channel number.

Because the image features under a low-dimensional scale have higher feature resolution and contain more detailed information, while the image features under a high-dimensional scale have lower feature resolution and poorer perception of details, in the embodiment of the invention the multi-channel basic image features under different scales can be fused so that the extracted features better describe the primitives in the building plan, which improves the edge detection effect and further improves the primitive detection performance for the building plan. Convolution processing can also be performed on the fused image features of the first channel number so as to retain the image features under the low-dimensional scale or the high-dimensional scale. For example, convolution processing with 256 channels and a 1 × 1 convolution kernel is performed on the fused image features of the first channel number to obtain the fused image features of the second channel number, so that the image features under the set scale are retained and loss of the fused image features of the first channel number is avoided.
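The two fusion steps, merging along the channel axis and then reducing the channel count with a 1 × 1 convolution, can be sketched on toy data. This sketch assumes that the merging network is a plain channel concatenation and that all branches already share the same spatial size; the names `concat_channels` and `conv1x1` and the tiny weights are hypothetical.

```python
def concat_channels(branches):
    """Concatenate [C_i][H][W] feature maps along the channel axis."""
    height, width = len(branches[0][0]), len(branches[0][0][0])
    for branch in branches:
        for channel in branch:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("all branches must share the same H x W")
    return [channel for branch in branches for channel in branch]


def conv1x1(features, weights):
    """Mix channels at each pixel; weights has shape [C_out][C_in]."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [[sum(wt * features[k][y][x] for k, wt in enumerate(row))
          for x in range(width)]
         for y in range(height)]
        for row in weights
    ]


branch_a = [[[1.0, 2.0]]]                        # 1 channel, 1 x 2
branch_b = [[[3.0, 4.0]], [[5.0, 6.0]]]          # 2 channels, 1 x 2
merged = concat_channels([branch_a, branch_b])   # first channel number: 3
fused = conv1x1(merged, [[1.0, 0.0, 1.0],
                         [0.0, 1.0, 0.0]])       # second channel number: 2
```

Here the first channel number is simply the sum of the branch channel counts, and the number of rows in the 1 × 1 weight matrix plays the role of the set channel number (256 in the example above).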

S240, obtaining a plurality of segmentation maps according to the multi-channel fusion image features, wherein the primitives included in different segmentation maps correspond to different kernel scales.

S250, merging the primitives with different kernel scales in each segmentation map by adopting a progressive expansion algorithm to obtain at least one primitive area, and acquiring a primitive identification result corresponding to the primitive area.
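The progressive expansion of S250 can be illustrated with a minimal sketch. It follows the breadth-first merging from the smallest kernel scale to the largest that is described elsewhere in this application, but the details are assumptions: binary masks as nested lists, 4-connectivity, and a first-come-first-served rule for pixels reachable from two primitives. The function names are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-connectivity


def label_components(mask):
    """Label connected regions of the smallest-kernel mask; returns (labels, count)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_expansion(masks):
    """Grow labels breadth-first through masks ordered from small to large kernels."""
    labels, count = label_components(masks[0])
    height, width = len(labels), len(labels[0])
    for mask in masks[1:]:
        queue = deque((y, x) for y in range(height) for x in range(width)
                      if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < height and 0 <= nx < width
                        and mask[ny][nx] and not labels[ny][nx]):
                    labels[ny][nx] = labels[cy][cx]  # first label to arrive wins
                    queue.append((ny, nx))
    return labels, count


small = [[1, 0, 0, 0, 1]]   # two separated kernels
large = [[1, 1, 1, 1, 1]]   # the full-size masks touch each other
regions, n_regions = progressive_expansion([small, large])
```

In this toy case the two primitives touch in the largest mask, yet they remain two distinct regions because each pixel keeps the label of the kernel that reached it first.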

As a specific implementation, fig. 2b provides a schematic diagram of a model structure for obtaining a primitive recognition result of a building plan. Firstly, the building plan to be identified is input into the residual network ResNet. The output results of the first residual block ResBlock1, the second residual block ResBlock2, the fourth residual block ResBlock4 and the sixth residual block ResBlock6 in the residual network are 1/4, 1/8, 1/16 and 1/32 of the resolution of the input building plan respectively, so the output results of these four residual blocks can be obtained and used as the multi-channel basic image features under four scales. Secondly, convolution processing is performed on the multi-channel basic image features under the four scales using 1 × 1 convolution kernels with channel numbers of 16, 32, 125 and 256 respectively, and 2× up-sampling is performed on the multi-channel basic image features obtained after the convolution processing, so as to increase the high-dimensional features in the multi-channel basic image features without losing the multi-channel basic image features. Then, a 1 × 1 standard convolution and three 3 × 3 convolution kernels with hole convolution ratios of 6, 12 and 18 are respectively convolved with the matched multi-channel basic image features, which increases the receptive field of the multi-channel basic image features under each scale without losing them and acquires multi-scale information; the multi-channel basic image features under each scale are input together into the merging network Concat to obtain the fused image features of the first channel number, and the fused image features of the first channel number are convolved with 1 × 1 convolution kernels with a set channel number of 256 to obtain the fused image features of the second channel number without losing the fused image features of the first channel number. Finally, a plurality of segmentation maps are obtained according to the multi-channel fusion image features, the primitives with different kernel scales in each segmentation map are merged by adopting a progressive expansion algorithm to obtain at least one primitive area, and the primitive identification result corresponding to the primitive area is output.
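The per-scale feature shapes implied by the first two stages of this implementation can be tabulated with a short calculation. The sketch rests on assumptions that are not stated in this application: the ratios 1/4 to 1/32 are read as per-side down-sampling factors, sizes are rounded down by integer division, and the 640 × 640 input is only an example. The function name `stage_shapes` is hypothetical.

```python
def stage_shapes(height, width,
                 strides=(4, 8, 16, 32),
                 channels=(16, 32, 125, 256),
                 upsample=2):
    """Return (channels, H, W) per scale after the 1 x 1 convolution and 2x up-sampling.

    strides: assumed per-side reduction of each selected residual block.
    channels: channel numbers of the 1 x 1 convolutions as given in the text.
    """
    shapes = []
    for stride, ch in zip(strides, channels):
        h, w = height // stride, width // stride   # residual block output size
        shapes.append((ch, h * upsample, w * upsample))
    return shapes


shapes = stage_shapes(640, 640)
```

Under these assumptions each scale ends up at twice the spatial size of its residual block output, for example 320 × 320 with 16 channels for the first scale.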

For those parts of this embodiment that are not explained in detail, reference is made to the aforementioned embodiments, which are not repeated herein.

According to the technical scheme of this embodiment, the building plan is input into a residual network, and the output results of a plurality of residual blocks are used as multi-channel basic image features under a plurality of scales; the multi-channel basic image features under each scale are respectively up-sampled, a hole convolution algorithm is then adopted to increase the receptive field of the multi-channel basic image features under each scale, and feature fusion is then performed to obtain the multi-channel fusion image features. In this way, the high-dimensional features are effectively increased, more feature detail information is obtained, and the feature resolution and the primitive detection accuracy are improved. In addition, the segmentation maps obtained from the multi-channel fusion image features are merged by adopting a progressive expansion algorithm to obtain at least one primitive area, and a primitive identification result corresponding to the primitive area is obtained, so that the boundary pixels of primitives that are close to or intersect each other can be effectively separated, accurate detection of the primitives in the building plan is realized, and the problems of missed detection and false detection caused by occlusion or interference are avoided.

Example three

Fig. 3 is a schematic structural diagram of a primitive detection apparatus in a building plan according to a third embodiment of the present invention, where the third embodiment of the present invention is applicable to a situation of accurately detecting primitives in a building plan, and the apparatus may be implemented in a software and/or hardware manner, and may be generally integrated in a computer device.

As shown in fig. 3, the primitive detecting device in the building plan specifically includes: a multi-channel basic image feature extraction module 310, a multi-channel fusion image feature generation module 320, a segmentation map generation module 330 and a primitive identification result obtaining module 340, wherein:

the multi-channel basic image feature extraction module 310 is used for extracting multi-channel basic image features under multiple scales from a building plan to be identified;

the multi-channel fusion image feature generation module 320 is configured to perform feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fusion image features after increasing the receptive field of the multi-channel basic image features at each scale by using a hole convolution algorithm;

the segmentation map generation module 330 is configured to obtain a plurality of segmentation maps according to the multi-channel fusion image feature, where primitives included in different segmentation maps correspond to different kernel scales;

the primitive identification result obtaining module 340 is configured to merge the primitives with different kernel scales in each segmentation map by using a progressive expansion algorithm to obtain at least one primitive area, and obtain a primitive identification result corresponding to the primitive area.

Optionally, the multi-channel basic image feature extraction module 310 is specifically configured to input the building plan into a residual network, and obtain the output results of a plurality of residual blocks of the residual network as the multi-channel basic image features under the multiple scales, wherein each residual block is used for outputting the multi-channel basic image features under a set scale.

Optionally, the apparatus further comprises: and the up-sampling processing module is used for respectively performing up-sampling processing on the multi-channel basic image features under each scale before increasing the receptive field of the multi-channel basic image features under each scale by adopting a hole convolution algorithm so as to increase the high-dimensional features in the multi-channel basic image features.

Optionally, the multi-channel fusion image feature generation module 320 is specifically configured to obtain the hole convolution ratios respectively corresponding to the multi-channel basic image features of each scale; respectively perform a convolution operation between the convolution kernel of each hole convolution ratio and the matched multi-channel basic image features to increase the receptive field of the multi-channel basic image features under each scale; and then perform feature fusion on the multi-channel basic image features under each scale to obtain the multi-channel fusion image features.

Optionally, the multi-channel fusion image feature generation module 320 is specifically configured to, after increasing the receptive field of the multi-channel basic image features at each scale by using a hole convolution algorithm, input the multi-channel basic image features at each scale into the merging network together to obtain the fusion image features of the first channel number; and performing convolution processing on the fusion image features of the first channel number by using a convolution kernel of 1 × 1 of the set channel number to obtain fusion image features of the second channel number.

Optionally, the segmentation map generation module 330 is specifically configured to perform channel fusion on the multi-channel fusion image features to obtain a target fusion image, input the target fusion image into an image segmentation model, and output a plurality of segmentation maps through the image segmentation model, wherein each segmentation map comprises the mask maps of all primitives with a set kernel scale;

the primitive identification result obtaining module 340 is specifically configured to merge the primitives included in each segmentation map in sequence, in order of kernel scale from small to large, by using a breadth-first method to obtain at least one primitive area, and obtain a primitive identification result corresponding to the primitive area.

Optionally, the apparatus further comprises: a frame detection module for building drawings, configured to, before the multi-channel basic image features under multiple scales are extracted from the building plan to be identified, pre-identify a standard building drawing by adopting a morphological algorithm, and intercept at least one candidate frame detection area from the standard building drawing according to the pre-identification result, wherein the standard building drawing comprises at least one small frame whose image size is smaller than or equal to a preset standard identification size;

and, for each candidate frame detection area, perform the following frame detection processing operations: extracting multi-channel basic image features in the candidate frame detection area; on the basis of keeping the basic image features from being lost, extracting multi-channel high-dimensional image features from each basic image feature, and enhancing the image feature quality of each high-dimensional image feature; on the basis of keeping the high-dimensional image features from being lost, performing feature fusion on the multi-channel high-dimensional image features to obtain multi-channel fusion image features; and acquiring a frame identification result of the candidate frame detection area according to the multi-channel fusion image features, and taking each identified frame as a building plan to be identified.

The primitive detection device in the building plan can execute the primitive detection method in the building plan provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the primitive detection method in the building plan.

Example four

Fig. 4 is a schematic diagram of a hardware structure of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 4 is only one example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.

As shown in FIG. 4, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes programs stored in the system memory 28 to perform various functional applications and data processing, such as implementing a primitive detection method in a building plan provided by an embodiment of the present invention. That is, the processing unit implements, when executing the program:

Example five

An embodiment five of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for detecting a primitive in a building plan, as provided in all inventive embodiments of this application: that is, the program when executed by the processor implements:

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for detecting a primitive in a building plan is characterized by comprising the following steps:

2. The method of claim 1, wherein extracting multi-channel basis image features at multiple scales in a building plan to be identified comprises:

inputting the building plan into a residual network, and acquiring the output results of a plurality of residual blocks of the residual network as the multi-channel basic image features under the plurality of scales;

3. The method of claim 1, further comprising, before increasing the receptive field of the multi-channel base image feature at each scale using a hole convolution algorithm:

4. The method of claim 1, wherein increasing the receptive field of the multi-channel base image features at each scale using a hole convolution algorithm comprises:

acquiring the hole convolution ratios respectively corresponding to the multi-channel basic image features of each scale;

and respectively performing a convolution operation between the convolution kernel of each hole convolution ratio and the matched multi-channel basic image features, so as to increase the receptive field of the multi-channel basic image features under each scale.

5. The method according to claim 1, wherein performing feature fusion on the multi-channel basic image features at each scale to obtain multi-channel fusion image features comprises:

inputting the multi-channel basic image features under all scales into a merging network together to obtain a fusion image feature with a first channel number;

and performing convolution processing on the fusion image features of the first channel number by using a convolution kernel of 1 × 1 of the set channel number to obtain fusion image features of the second channel number.

6. The method of claim 1, wherein deriving a plurality of segmentation maps from the multi-channel fused image features comprises:

performing channel fusion on the multi-channel fusion image features to obtain a target fusion image;

inputting the target fusion image into an image segmentation model, and outputting a plurality of segmentation maps through the image segmentation model, wherein each segmentation map comprises the mask maps of all primitives with a set kernel scale;

and wherein merging the primitives with different kernel scales in each segmentation map by adopting a progressive expansion algorithm to obtain at least one primitive area comprises:

sequentially merging the primitives included in each segmentation map, in order of kernel scale from small to large, by adopting a breadth-first method to obtain at least one primitive area.

7. The method according to any one of claims 1-6, wherein before extracting the multi-channel base image features at multiple scales in the building plan to be identified, further comprising:

pre-identifying a standard building drawing by adopting a morphological algorithm, and intercepting at least one candidate frame detection area from the standard building drawing according to the pre-identification result, wherein the standard building drawing comprises at least one small frame whose image size is smaller than or equal to a preset standard identification size;

for each candidate frame detection area, performing the following frame detection processing operations:

extracting multi-channel basic image features in the candidate frame detection area;

on the basis of keeping the basic image features from being lost, extracting multi-channel high-dimensional image features from each basic image feature, and enhancing the image feature quality of each high-dimensional image feature;

on the basis of keeping the high-dimensional image features from being lost, performing feature fusion on the multi-channel high-dimensional image features to obtain multi-channel fusion image features;

and acquiring a frame identification result of the candidate frame detection area according to the multi-channel fusion image features, and taking each identified frame as a building plan to be identified.

8. A primitive detecting apparatus in a building plan, comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.