WO2022062543A1 - Image processing method, apparatus, device, and storage medium - Google Patents
Image processing method, apparatus, device, and storage medium
- Publication number
- WO2022062543A1 (PCT/CN2021/103643)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- building
- model
- roof
- area
- mentioned
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/176—Urban or other man-made structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the present application relates to the field of computer technologies, and in particular, to an image processing method, apparatus, device, and storage medium.
- because target images containing buildings are usually remote sensing images captured by satellites or aircraft, the bases of the buildings in these images may be partially occluded. This makes the visual features of the building bases inconspicuous, which reduces the precision of base extraction.
- the present application discloses at least one image processing method. The method includes: acquiring a target image containing at least one building; for each building, extracting the bounding box of the building and a target feature map of the target image from the target image; determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and transforming the roof area according to the predicted offset to obtain the base area of the building.
- extracting the bounding box of the building from the target image includes:
- determining the roof area of the building includes: determining the roof area of the building via a roof area prediction sub-model, based on the bounding box of the building and the target feature map of the target image; determining the predicted offset between the roof and the base of the building includes: determining the predicted offset of the building via an offset prediction sub-model, based on the bounding box of the building and the target feature map of the target image.
- determining the predicted offset of the building includes: using a second convolution processing unit included in the offset prediction sub-model to perform second convolution processing on the building features, to obtain the predicted offset of the building.
- transforming the roof area according to the predicted offset to obtain the base area of the building includes: determining the base area of the building via a base area prediction sub-model, based on the predicted offset and the building features of the building.
- determining the base area of the building via the base area prediction sub-model, based on the predicted offset and the building features, includes: using a spatial transformation network included in the base area prediction sub-model to perform a translation transformation on the building features corresponding to the roof area, obtaining the base features of the building, where the spatial transformation parameters of the spatial transformation network include parameters determined from the predicted offset; and using the base area prediction sub-model to perform third convolution processing on the base features to obtain the base area of the building.
- the spatial transformation network includes a sampler constructed based on interpolation, where the sampler includes a sampling grid constructed based on the predicted offset; using the spatial transformation network to perform the translation transformation on the building features corresponding to the roof area and obtain the base features of the building includes: using the sampler to take, in turn, each pixel included in the base features as the current pixel according to its coordinate information; determining, via the sampling grid, the pixel in the roof area that corresponds to the current pixel; and computing, based on interpolation, the value of the determined pixel to obtain the pixel value of the current pixel.
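The sampling procedure above can be sketched with bilinear interpolation: for each pixel of the base feature map, the sampling grid shifts its coordinates by the predicted offset and reads the corresponding sub-pixel location in the roof feature map. A minimal NumPy sketch, assuming a single-channel feature map and a per-building pixel offset `(dy, dx)`; the function name and simplifications are illustrative, not taken from the patent:

```python
import numpy as np

def translate_features(roof_feat, dy, dx):
    """Sample base features from roof features: base(y, x) = roof(y + dy, x + dx),
    using bilinear interpolation for sub-pixel offsets (zero outside the map)."""
    h, w = roof_feat.shape
    base = np.zeros_like(roof_feat, dtype=float)
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx              # sampling-grid coordinates
            y0, x0 = int(np.floor(sy)), int(np.floor(sx))
            wy, wx = sy - y0, sx - x0            # bilinear weights
            val = 0.0
            for yy, xx, wgt in ((y0, x0, (1 - wy) * (1 - wx)),
                                (y0, x0 + 1, (1 - wy) * wx),
                                (y0 + 1, x0, wy * (1 - wx)),
                                (y0 + 1, x0 + 1, wy * wx)):
                if 0 <= yy < h and 0 <= xx < w:
                    val += wgt * roof_feat[yy, xx]
            base[y, x] = val
    return base
```

With an integer offset the interpolation weights collapse to a pure translation; with a fractional offset each base pixel blends the four nearest roof pixels.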
- the image processing unit further includes a roof contour prediction sub-model that shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model; the method further includes: performing contour regression on the building features using the roof contour prediction sub-model to determine the roof contour of the building; transforming the roof contour according to the predicted offset to obtain the base contour of the building; and adjusting the base area based on the base contour to obtain the final base area of the building.
- performing contour regression on the building features using the roof contour prediction sub-model to determine the roof contour of the building includes: extracting a plurality of connection points from the building features; combining at least some of the connection points to obtain a plurality of line segments; scoring the line segments to obtain a prediction score for each segment, where the prediction score indicates the probability that the corresponding segment belongs to the roof contour; and combining the segments whose prediction scores are greater than a preset threshold to obtain the roof contour of the building.
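The segment-scoring step can be illustrated as follows: candidate segments are formed from pairs of connection points, each receives a score, and only segments scoring above the threshold are kept. A pure-Python sketch; the scoring function here is a toy stand-in for the learned predictor, and all names are illustrative:

```python
from itertools import combinations

def roof_contour(points, score_fn, threshold=0.5):
    """Combine connection points into candidate line segments, score each one,
    and keep the segments whose predicted score exceeds the threshold."""
    segments = list(combinations(points, 2))     # candidate line segments
    return [seg for seg in segments if score_fn(seg) > threshold]

# Toy scorer standing in for the learned predictor: favour axis-aligned edges.
def axis_aligned(seg):
    (x1, y1), (x2, y2) = seg
    return 1.0 if x1 == x2 or y1 == y2 else 0.0

points = [(0, 0), (0, 2), (2, 2)]
contour = roof_contour(points, axis_aligned)
```

Here only the two axis-aligned segments survive the threshold; the diagonal candidate is discarded.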
- the above method is implemented using an image processing model, where the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
- the training method of the image processing model includes: acquiring a plurality of training samples with labeling information, where the labeling information includes the building bounding box, the building roof area, the building roof contour, the offset between the roof and the base, and the base area of the building; constructing joint learning loss information from the loss information corresponding to each sub-model included in the image processing model; and jointly training the sub-models of the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
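The joint learning loss described above can be sketched as a weighted sum of the per-sub-model losses; minimizing the sum trains all sub-models jointly. The sub-model keys and the equal default weighting below are illustrative assumptions, not values from the patent:

```python
def joint_loss(losses, weights=None):
    """Combine per-sub-model losses (bounding box, roof area, offset,
    roof contour, base area) into a single joint learning loss."""
    if weights is None:
        weights = {k: 1.0 for k in losses}   # equal weighting by default
    return sum(weights[k] * v for k, v in losses.items())

# Example: losses reported by the five sub-models for one batch.
losses = {"bbox": 0.8, "roof": 0.5, "offset": 0.2, "contour": 0.4, "base": 0.6}
total = joint_loss(losses)
```

A gradient descent step on `total` then updates all sub-models at once, which is what "joint training" amounts to here.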
- the present application also proposes an image processing apparatus, including: an acquisition module for acquiring a target image containing at least one building; an extraction module for extracting, for each building, the bounding box of the building and the target feature map of the target image from the target image; a module for determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and a module for transforming the roof area according to the predicted offset to obtain the base area of the building.
- the present application also proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method of any of the above embodiments.
- the present application also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the image processing method shown in any of the foregoing embodiments.
- the present application also proposes a computer program including computer-readable code which, when executed in an electronic device, causes the processor in the electronic device to execute the image processing method of any of the foregoing embodiments.
- the roof area of the building, whose visual features are obvious, and the predicted offset between the roof and the base can be extracted from the acquired target image, and the high-precision roof area can then be transformed based on the predicted offset to obtain a high-precision building base area. The base prediction process therefore does not need to rely on the base features contained in the target image, so a high-precision building base can be obtained even when the base features in the target image are occluded.
- FIG. 2 is a schematic flowchart of base extraction by an image processing model shown in this application;
- FIG. 3 is a schematic flowchart of a method for predicting a roof area by an image processing unit shown in this application;
- FIG. 4 is a schematic flowchart of a method for performing offset prediction by an image processing unit shown in this application;
- FIG. 5 is a schematic flowchart of a method for performing offset prediction and roof area prediction by an image processing unit shown in this application;
- FIG. 6 is a schematic flowchart of a method for base prediction by a base area prediction sub-model shown in this application;
- FIG. 8 is a schematic flowchart of a method for predicting a roof contour by a bounding box analysis network shown in this application;
- FIG. 10 is a diagram of the correspondence between tasks and models shown in this application;
- FIG. 11 is a flowchart of an image processing model training method shown in this application;
- FIG. 12 is a schematic diagram of an image processing apparatus shown in this application;
- FIG. 13 is a hardware structure diagram shown in this application.
- This application aims to propose an image processing method.
- the method makes full use of information about the building's main body, roof, and base in the target image. It extracts from the acquired target image the roof area of the building, whose visual features are obvious, and the predicted offset between the roof and the base, and then transforms the high-precision roof area based on the predicted offset to obtain a high-precision base area of the building. In this way, a high-precision building base can be obtained even when the building base in the target image is occluded.
- FIG. 1 is a method flowchart of an image processing method shown in this application. As shown in Figure 1, the above method may include:
- S102 Acquire a target image including at least one building.
- S106 Transform the roof area according to the predicted offset to obtain the base area of the building.
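As an illustration of S106, a binary roof mask can be shifted pixel-by-pixel by the predicted offset to produce the base mask. A minimal NumPy sketch with an integer pixel offset (names are illustrative; the spatial transformation network described later in the application handles sub-pixel offsets via interpolation):

```python
import numpy as np

def roof_to_base(roof_mask, dy, dx):
    """Translate a binary roof mask by the predicted roof-to-base offset
    (dy, dx), in pixels, to obtain the base mask."""
    h, w = roof_mask.shape
    base = np.zeros_like(roof_mask)
    ys, xs = np.nonzero(roof_mask)               # roof pixel coordinates
    ys, xs = ys + dy, xs + dx                    # apply the offset
    keep = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)  # clip to the image
    base[ys[keep], xs[keep]] = 1
    return base
```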
- the roof area of the building is represented by attribute information of the roof area, and the attribute information at least includes coordinate information representing the roof area.
- the attribute information further includes feature information of the roof area and/or the outline of the roof area.
- the base area of the building is represented by attribute information of the base area, and the attribute information at least includes coordinate information representing the base area.
- the attribute information further includes feature information of the base area and/or the outline of the base area.
- the image processing method can be applied to an electronic device.
- the electronic device may execute the image processing method via a software system or hardware structure corresponding to the method.
- the electronic device may be a notebook computer, computer, server, mobile phone, PAD terminal, etc.; this application does not particularly limit the type.
- the image processing method may be executed by a terminal device or a server device alone, or by the terminal device and the server device in cooperation.
- the above-mentioned image processing method can be integrated in the client.
- after receiving an image processing request, the terminal device equipped with the client can provide computing power through its own hardware environment to execute the image processing method.
- the above-mentioned image processing method can be integrated into the system platform.
- the server device equipped with the system platform can provide computing power through its own hardware environment to execute the above image processing method.
- the above image processing method can be divided into two tasks: acquiring a target image and processing the target image.
- the acquisition task can be integrated in the client and carried on the terminal device.
- Processing tasks can be integrated on the server and carried on the server device.
- the above terminal device may initiate an image processing request to the above server device after acquiring the target image.
- the server device may execute the method on the target image in response to the request.
- the following description takes an electronic device (hereinafter referred to as the device) as the execution subject by way of example.
- the above-mentioned target image refers to an image including at least one building in the image.
- the above-mentioned target image may be a remote sensing image captured by a device such as an aircraft, an unmanned aerial vehicle, or a satellite.
- the following description mainly takes a building as an example.
- processing a target image that contains multiple buildings is similar to processing an image that contains one building.
- the above-mentioned device may complete the input of the target image by interacting with the user.
- the above-mentioned device can provide the user with a window for inputting the target image to be processed through its onboard interface, so that the user can input the image.
- the user can complete the input of the target image based on this window.
- the image can be input into the image processing model for calculation.
- the above-mentioned device can directly acquire the remote sensing image output by the remote sensing image acquisition system.
- the device may pre-establish a protocol with the remote sensing image acquisition system, so that after the acquisition system generates a remote sensing image, the image is sent to the device for image processing.
- the above-mentioned device may be equipped with an image processing model to perform the above-mentioned image processing.
- the device can use an image processing model to process each building in the target image, extracting from the target image the roof area of the building and the predicted offset between the roof and the base, and then transforming the roof area according to the predicted offset to obtain the base area corresponding to the building.
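The per-building flow just described can be summarized in a short sketch; the sub-model callables are placeholders standing in for the trained networks discussed below, and all names are illustrative:

```python
def process_image(target_image, backbone, bbox_model, roof_model,
                  offset_model, transform):
    """End-to-end base extraction: predict each building's bounding box,
    its roof area and roof-to-base offset, then shift the roof to the base."""
    feat = backbone(target_image)                 # target feature map
    bases = []
    for box in bbox_model(feat):                  # one bounding box per building
        roof = roof_model(feat, box)              # roof area
        offset = offset_model(feat, box)          # predicted (dy, dx)
        bases.append(transform(roof, offset))     # base area
    return bases
```

Each sub-model consumes the shared target feature map, which mirrors the sharing of the backbone and bounding box sub-model described below.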
- the above image processing model may be an end-to-end image processing model for extracting building bases based on target images.
- the image processing model may be a pre-trained neural network model.
- FIG. 2 is a schematic flowchart of a base extraction using an image processing model according to the present application.
- the above-mentioned image processing model may include an image processing unit and a region transforming unit.
- the input of the image processing unit is the target image.
- the output of the above image processing unit is the roof area of each building and the predicted offset between the roof and the plinth.
- the input of the above-mentioned area transformation unit is the output of the above-mentioned image processing unit.
- the output of the above-mentioned area transformation unit is the base area.
- the above image processing unit may include a sub-model based on a deep neural network for predicting the roof area and the predicted offset between the roof and the plinth.
- the above image processing unit may further include a building bounding box prediction sub-model and a roof area prediction sub-model.
- the building bounding box prediction sub-model is used to extract the building bounding box in the target image and provide input for other sub-models, so as to make full use of various information of buildings in the target image.
- the above-mentioned building bounding box prediction sub-model may be a neural network model obtained by training based on a plurality of training samples marked with bounding boxes.
- the above-mentioned roof area prediction sub-model performs roof area prediction for each building based on the inputted building bounding box and the area features in the target image.
- the above-mentioned roof area prediction sub-model may be a neural network model obtained by training based on a plurality of training samples marked with the roof area.
- FIG. 3 is a schematic flowchart of a method for predicting a roof area by an image processing unit according to the present application.
- the image processing unit may include a roof area prediction sub-model.
- the roof area prediction submodel may include a building bounding box prediction submodel.
- the above-mentioned building bounding box prediction sub-model may be a regression model constructed based on RPN (Region Proposal Network, candidate box generation network).
- the roof area prediction sub-model may be a regression model constructed based on a region feature extraction unit such as a RoI Align (Region of Interest Align, region-of-interest feature alignment) network or a RoI pooling (Region of Interest pooling, region-of-interest feature pooling) network.
- the above-mentioned roof area prediction sub-model includes the above-mentioned building bounding box prediction sub-model, and the above-mentioned building bounding box prediction sub-model includes a backbone network, a candidate frame generation network and an area feature extraction unit.
- FIG. 3 is only a schematic illustration, and some intermediate layers such as convolution layers, spatial pyramid layers, and fully connected layers may be added according to actual situations.
- the building bounding box prediction sub-model can be used to first perform target detection on the above target image to obtain the bounding box of the above building.
- the target feature map of the target image can be obtained.
- This application does not limit the architecture of the backbone network, which can be a common convolutional neural network (Convolutional Neural Networks, CNN) network, such as VGGNet, ResNet, HRNet, and the like.
- the information of the target feature map of the target image is related to the specific architecture of the applied backbone network. Then, the target feature map is calculated based on the RPN, and multiple candidate boxes of different sizes are obtained.
- through region feature extraction unit 1, fixed-size features can be obtained from these candidate boxes, and the bounding boxes of one or more buildings are then generated through the subsequent fully connected layers. Region feature extraction unit 1 may use a RoI Align network or a RoI pooling network.
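The role of region feature extraction unit 1 — mapping a variable-sized candidate box to a fixed-size feature — can be approximated by a simplified RoI-pooling-style sketch (nearest-bin max-pooling rather than RoI Align's bilinear sampling; purely illustrative):

```python
import numpy as np

def roi_extract(feature_map, box, out_size=2):
    """Crop the candidate box from the feature map and resample it to a
    fixed out_size x out_size grid by max-pooling each bin."""
    y0, x0, y1, x1 = box
    crop = feature_map[y0:y1, x0:x1]
    h, w = crop.shape
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Bin boundaries; each bin covers at least one pixel.
            ys = slice(i * h // out_size,
                       max((i + 1) * h // out_size, i * h // out_size + 1))
            xs = slice(j * w // out_size,
                       max((j + 1) * w // out_size, j * w // out_size + 1))
            out[i, j] = crop[ys, xs].max()        # max-pool the bin
    return out
```

Whatever the candidate box size, the output is always `out_size x out_size`, which is what lets the subsequent fully connected layers accept it.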
- the above-mentioned target image may be input into the above-mentioned building bounding box prediction sub-model to perform target detection to obtain the bounding box of the building.
- the roof area of the building included in the bounding box may be determined by the roof area prediction sub-model based on the bounding box and the target feature map of the target image.
- the attribute information of the roof area includes coordinate information of the roof area.
- the above-mentioned bounding box and the target feature map of the above-mentioned target image may be input into the area feature extraction unit 2 of the above-mentioned roof area prediction sub-model to obtain the roof area corresponding to the building included in the above-mentioned bounding box.
- the loss value of the roof area prediction sub-model can be added for supervised training, thereby improving the accuracy of the feature extraction of the backbone network.
- the roof area prediction sub-model utilizes the output of the building bounding box prediction sub-model. Therefore, when training the building bounding box prediction sub-model, the roof area can be used for supervision: the ground-truth roof area annotated in the sample images serves as supervision information, so that the building bounding box prediction sub-model learns the features relevant to roof area prediction. This improves the accuracy of building bounding box prediction and, in turn, the accuracy of roof extraction.
- the image processing unit may further include an offset prediction sub-model between the roof and the base (hereinafter, the offset prediction sub-model), for extracting from the target image the predicted offset between the roof and the base (hereinafter, the predicted offset) of each building included in the image.
- since both the offset prediction sub-model and the roof area prediction sub-model perform feature extraction on the buildings in the target image, the two sub-models can share the building bounding box prediction sub-model to reduce the model's computation.
- FIG. 4 is a schematic flowchart of a method for performing offset prediction by an image processing unit according to the present application.
- the image processing unit may include a building bounding box prediction sub-model and an offset prediction sub-model.
- the above-mentioned building bounding box prediction sub-model may be a regression model constructed based on RPN.
- the above-mentioned offset prediction sub-model may be a regression model constructed based on regional feature extraction units such as RoI Align network or RoI pooling network.
- the offset prediction sub-model and the roof area prediction sub-model share the building bounding box prediction sub-model.
- FIG. 4 is only a schematic illustration, and some intermediate layers such as convolution layers, spatial pyramid layers, and fully connected layers may be added according to actual situations.
- the offset prediction sub-model may determine the predicted offset between the roof and the base of the building included in the bounding box based on the bounding box and the target feature map of the target image.
- the bounding box of the building output by the building bounding box prediction sub-model and the target feature map of the above-mentioned target image can be input into the regional feature extraction unit 2 of the offset prediction sub-model to obtain the above-mentioned predicted offset .
- the loss value of the offset prediction sub-model can be added for supervised training, thereby improving the accuracy of the feature extraction of the backbone network.
- the offset prediction sub-model and the roof area prediction sub-model share the building bounding box prediction sub-model.
- the input of the offset prediction sub-model is the output of the building bounding box prediction sub-model.
- the offset prediction sub-model includes the building bounding box prediction sub-model; therefore, when training the building bounding box prediction sub-model, the predicted offset can be used for supervision: the ground-truth offset annotated in the sample images serves as supervision information, so that the building bounding box prediction sub-model learns the features relevant to offset prediction. This improves the prediction accuracy of the building bounding box and, in turn, the accuracy of the base area obtained by transformation.
- sharing the building bounding box prediction sub-model with the roof area prediction sub-model can reduce the amount of model computation.
- the roof area prediction sub-model and the offset prediction sub-model between the roof and the base may share the same area feature extraction unit.
- the above-mentioned regional feature extraction unit may be a regional feature extraction unit constructed based on the RoI Align unit or the RoI pooling unit.
- FIG. 5 is a schematic flowchart of a method for performing offset prediction and roof area prediction by an image processing unit according to the present application.
- the above processing flow includes two sub-branches.
- the first sub-branch is the roof area prediction sub-branch; the other sub-branch is the offset prediction sub-branch.
- the above two sub-branches may share the region feature extraction unit.
- the input may be the target feature map obtained after the target image is processed by the backbone network.
- through the region feature extraction unit, the building features corresponding to the building included in the bounding box are determined.
- the target image may contain multiple buildings. In that case, the solution described in this application extracts the bounding box of each building separately and performs the above building-feature determination step for each bounding box.
- the number of buildings included in the target image is not limited in this application.
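The per-building processing described above can be sketched as a loop over bounding boxes. The simple crop below is an illustrative stand-in for the RoI Align / RoI pooling unit; all names, shapes, and values are assumptions, not the application's actual implementation.

```python
# Sketch: extract one feature patch per building bounding box from a
# target feature map. `feature_map` stands in for the backbone output;
# the crop logic is illustrative, not the real RoI Align operation.

def crop_features(feature_map, boxes):
    """Crop one feature patch per bounding box (x0, y0, x1, y1)."""
    patches = []
    for x0, y0, x1, y1 in boxes:
        patch = [row[x0:x1] for row in feature_map[y0:y1]]
        patches.append(patch)
    return patches

feature_map = [[c + 10 * r for c in range(6)] for r in range(6)]
boxes = [(0, 0, 2, 2), (3, 3, 6, 6)]  # two buildings in one image
patches = crop_features(feature_map, boxes)
```

Each patch is then fed separately through the roof area and offset prediction sub-branches, which is why the number of buildings per image need not be fixed.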
- the first convolution processing unit included in the roof area prediction sub-model can be used to perform first convolution processing on the above building features to obtain the roof area of the above building.
- the attribute information of the roof area includes not only coordinate information of the roof area, but also feature information of the roof area.
- the building features may be input into the first convolution processing unit shown in FIG. 5 for calculation to obtain the attribute information of the roof area.
- the second convolution processing unit included in the offset prediction sub-model can also be used to perform second convolution processing on the building features to obtain the predicted offset between the roof and the base of the building included in the above bounding box.
- the building features may be input into the second convolution processing unit shown in FIG. 5 for calculation to obtain the predicted offset.
- the present application does not limit the structures of the first convolution unit and the second convolution unit.
- the structures of the first convolution unit and the second convolution unit may be set according to actual requirements.
- the model structure shown in the above-mentioned FIG. 5 is only a schematic illustration. Conventional structures such as upsampling, downsampling, and pooling operations are not shown in FIG. 5 and can be set according to the actual situation.
- the above-mentioned roof area prediction sub-model and the above-mentioned offset prediction sub-model between the roof and the base may share the same area feature extraction unit. On the one hand, when training this area feature extraction unit, both the predicted offset and the roof area can be used for supervised training: introducing the ground-truth labels of the offset and the roof area as supervision information lets the unit learn the features required to predict both, thereby improving the accuracy of building feature extraction and, further, the accuracy of base extraction. On the other hand, sharing the unit simplifies the model structure and reduces the amount of model computation.
- the obtained roof area and the predicted offset can be input into the area transformation unit to obtain the base area.
- the region transformation unit may be a type of mapping unit.
- x1 represents the predicted offset between the roof area and the base area
- x2 represents the roof area
- y represents the base area
- f is the mapping function of the base area obtained from the predicted offset and the roof area.
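The mapping y = f(x1, x2) can be sketched, for the translation-only case described below, as shifting every roof pixel coordinate by the predicted offset. The region representation (a set of pixel coordinates) and all values are illustrative assumptions, not the application's internal format.

```python
# Sketch of y = f(x1, x2) in the translation-only case: the base area y
# is the roof area x2 with every pixel coordinate shifted by the
# predicted offset x1 = (dx, dy). Representation is illustrative.

def region_transform(x1, x2):
    dx, dy = x1
    return {(px + dx, py + dy) for (px, py) in x2}

x1 = (2, -1)                   # predicted roof-to-base offset
x2 = {(5, 5), (5, 6), (6, 5)}  # roof area as a set of pixel coords
y = region_transform(x1, x2)
```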
- the above-mentioned prediction offset may include a rotation prediction offset and a translation prediction offset.
- the specific meaning of the prediction offset is not limited in this application.
- for ease of description, the following takes as an example the case where the base can be obtained by translating the roof alone.
- according to the predicted offset, the roof feature of the roof area is transformed to obtain the base feature of the base area; the base feature is then refined to obtain the base area.
- when performing translation transformation, the region transformation unit performs the translation transformation on the features of the roof area.
- the region transformation unit can use bilinear interpolation to select and map the original features and the transformed features.
- the above operation avoids errors introduced by convolution and/or upsampling when going from the roof feature to the roof area during the translation transformation, thereby improving the accuracy of base extraction.
- the roof area is obtained from the building features, and the building features are output by the area feature extraction unit. Since supervised training uses the roof area as ground truth when training this area feature extraction unit, the feature response corresponding to the roof area within the building features will be very high.
- the above-mentioned region transformation unit may be a unit constructed based on a neural network.
- This unit can be used as a base area prediction sub-model for predicting the base area, that is, the above-mentioned image processing model further includes a base area prediction sub-model constructed based on a neural network.
- when the above-mentioned roof area is transformed according to the above-mentioned predicted offset to obtain the base area corresponding to the above-mentioned building, the base area may be determined based on the predicted offset, the building features corresponding to the building, and the base area prediction sub-model.
- the above-mentioned predicted offset and the building feature corresponding to the above-mentioned building may be input into the above-mentioned base area prediction sub-model to obtain the base area corresponding to the above-mentioned building.
- the base region prediction sub-model described above may comprise a spatial transformation network.
- the spatial transformation parameters corresponding to the above-mentioned spatial transformation network include parameters determined based on the above-mentioned prediction offset.
- FIG. 6 is a schematic flowchart of a method for predicting a base by using a base area prediction sub-model according to the present application.
- the spatial transformation network included in the above-mentioned base area prediction sub-model can be used to perform spatial transformation on the building features corresponding to the above-mentioned roof area to obtain the corresponding base features.
- the above-mentioned base features can then be input into multiple convolutional layers (the third convolution unit shown in FIG. 6) included in the base area prediction sub-model for third convolution processing, to obtain the base area corresponding to the above-mentioned building.
- the attribute information of the base area includes not only coordinate information of the base area, but also feature information of the base area.
- the present application does not limit the structure of the third convolution unit.
- the structure of the above-mentioned third convolution unit can be set according to actual requirements.
- the model structure shown in the above-mentioned FIG. 6 is only a schematic illustration. Conventional structures such as upsampling, downsampling, and pooling operations are not shown in FIG. 6 and can be set according to the actual situation.
- the above-mentioned spatial transformation network may include a sampler (Sampler) constructed based on an interpolation method, wherein the above-mentioned sampler includes a sampling grid (Grid generator) constructed based on the above-mentioned predicted offset.
- the above-mentioned sampling grid is specifically a transformation function constructed based on the above-mentioned prediction offset.
- the above sampling grid may indicate the mapping relationship between each pixel included in the roof feature and each pixel included in the base feature. For example, according to the above sampling grid, it can be determined which pixel points included in the roof feature are mapped to a certain pixel point corresponding to the base feature.
- the above sampler is specifically a mapping unit constructed based on an interpolation method.
- the above-mentioned sampler maps the original feature (the building feature) to the translation-transformed feature (the base feature) based on the interpolation method, mapping both feature positions and feature values (feature scores), so as to obtain the base feature.
- the above-mentioned interpolation mode may be bilinear interpolation, linear interpolation, parabolic interpolation, and the like. In this application, bilinear interpolation may be adopted.
- the sampler can be used as follows: according to the coordinate information of the plurality of pixel points included in the base feature, each pixel point included in the base feature is taken as the current pixel point in turn; the pixel points in the roof area that correspond to the current pixel point are determined through the sampling grid; and the values of those determined pixel points are calculated based on the interpolation method to obtain the pixel value of the current pixel point.
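The grid-and-sampler procedure above can be sketched in plain Python. The feature map, offset, and function names are illustrative assumptions; a real implementation would operate on tensors (e.g., via a grid-sample operation in a deep learning framework).

```python
# Sketch of the grid generator + sampler: for each output (base) pixel,
# the grid maps it back by the predicted offset (dx, dy) to a possibly
# fractional location in the input (roof) feature, whose value is then
# bilinearly interpolated. All shapes and values are illustrative.

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat (list of rows) at fractional (x, y)."""
    h, w = len(feat), len(feat[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = feat[y0][x0] * (1 - fx) + feat[y0][x1] * fx
    bot = feat[y1][x0] * (1 - fx) + feat[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def translate_feature(feat, dx, dy):
    """Shift a feature map by (dx, dy) via the sampling grid + sampler."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for oy in range(h):
        for ox in range(w):
            sx, sy = ox - dx, oy - dy  # grid: output -> source coords
            if 0 <= sx <= w - 1 and 0 <= sy <= h - 1:
                out[oy][ox] = bilinear_sample(feat, sx, sy)
    return out

roof = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
base = translate_feature(roof, 0.5, 0.0)  # half-pixel shift to the right
```

Because the transform is expressed through interpolation rather than rounding, sub-pixel offsets are handled smoothly, which is one reason the sampler supports backpropagation-based training.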
- the above-mentioned base area prediction sub-model includes a spatial transformation network that supports backpropagation and a third convolution unit, which distinguishes it from non-neural-network approaches such as RT transformation (rotation and translation transformation).
- the ground-truth base area can therefore be used to perform supervised training on the base area prediction sub-model (including the spatial transformation network and the third convolution unit). This introduces the prediction error between the base area predicted from the roof area and the ground-truth base area as supervision information, so that the network shared between the base area prediction sub-model and the offset prediction sub-model can be trained on this error, improving the accuracy of offset prediction and, further, the prediction accuracy of the base area.
- since the base area prediction sub-model, the roof area prediction sub-model, and the offset prediction sub-model share the building features output by the area feature extraction unit, supervision information can be shared during the training of each sub-model, which speeds up model convergence while improving the performance of each sub-model.
- since the geographic coordinates of the building can be restored from the coordinate information of each pixel of the roof area on the target image, the technical solutions provided by the above embodiments can not only accurately restore the shape of the base of the building, but also restore the geographic location of the base of the building.
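Restoring geographic coordinates from pixel coordinates is commonly done with an affine geotransform of the kind used by GIS rasters. The six coefficients below are purely illustrative assumptions, not values from this application.

```python
# Sketch: recover geographic coordinates of base pixels from their pixel
# coordinates on the target image using an affine geotransform
# (origin_x, pixel_w, rot1, origin_y, rot2, pixel_h), in the style used
# by GIS rasters. All coefficient values below are illustrative.

def pixel_to_geo(gt, col, row):
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

gt = (500000.0, 0.5, 0.0, 4649776.0, 0.0, -0.5)  # 0.5 m/px, north-up
geo = pixel_to_geo(gt, 100, 200)
```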
- a roof contour that fits the real roof better than the edge of the roof area extracted by the roof area prediction sub-model may be extracted from the target image, and the roof area obtained by the roof area prediction sub-model can be corrected based on this roof contour to obtain the final base area.
- FIG. 7 is a flowchart of a method for predicting a final base area shown in the present application.
- S702 may be executed, and contour regression is performed on the building features by using the roof contour prediction sub-model to determine the roof contour of the building.
- the building features may be input into the roof profile prediction sub-model to obtain the roof profile of the building.
- the above-mentioned roof profile prediction sub-model may be a model constructed based on a wireframe parsing network. Through the wireframe parsing network, a more accurate roof outline can be extracted from the target image.
- FIG. 8 is a schematic flowchart of a method for predicting a roof outline by using a wireframe parsing network according to the present application.
- connection points can be extracted from the building features.
- the building features can be input into the fourth convolution unit (not shown in the figure) for multiple convolution operations and smoothing to obtain a heat map including multiple connection points.
- each pixel block in the heat map (for example, if the resolution of the heat map is 14×14, the heat map includes 196 pixel blocks) can be labeled with a ground-truth value (that is, a pixel block that includes a connection point is labeled 1, otherwise 0) to obtain multiple training samples. Based on these training samples, the fourth convolution unit can then be trained with cross-entropy loss as the objective function, so that it can predict connection points for each pixel block in the heat map.
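The 0/1 labeling of heat-map pixel blocks and the cross-entropy objective can be sketched as follows. The grid size, points, and predicted probabilities are illustrative (the text's heat map is 14×14 with 196 blocks; a 4×4 grid is used here for brevity).

```python
# Sketch: build 0/1 labels for each pixel block of a connection-point
# heat map (1 if the block contains a ground-truth connection point) and
# evaluate the binary cross-entropy objective over all blocks.

import math

def block_labels(size, points):
    labels = [[0] * size for _ in range(size)]
    for r, c in points:
        labels[r][c] = 1
    return labels

def cross_entropy(pred, labels, eps=1e-7):
    total, n = 0.0, 0
    for prow, lrow in zip(pred, labels):
        for p, l in zip(prow, lrow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(l * math.log(p) + (1 - l) * math.log(1 - p))
            n += 1
    return total / n

labels = block_labels(4, [(0, 0), (2, 3)])  # two connection points
pred = [[0.9 if l == 1 else 0.1 for l in row] for row in labels]
loss = cross_entropy(pred, labels)
```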
- line segment sampling can be performed. That is, a plurality of line segments are obtained by combining at least some of the connection points in pairs.
- line segment verification can be performed. That is, the above-mentioned multiple line segments are predicted to obtain a prediction score corresponding to each line segment; and a line segment with a predicted score greater than a preset threshold is screened out; wherein, the above-mentioned prediction score is used to indicate the probability that the line segment corresponding to the score belongs to the roof outline.
- the above-mentioned preset threshold may be a threshold set according to experience.
- the above-mentioned multiple line segments can be input into a line segment verification network to obtain a prediction score corresponding to each line segment, and then a line segment with a prediction score greater than the above-mentioned preset threshold can be screened out.
- the above-mentioned line segment verification network may include a line segment feature extraction network and a classification score prediction network.
- the above-mentioned line segment feature extraction network is used for extracting line segment features corresponding to the constructed line segments from the building features. After the line segment feature is obtained, the classification score corresponding to the line segment can be predicted based on the classification score prediction network and the line segment feature.
- the same number of positive samples and negative samples can be set when constructing training samples, so that the above-mentioned line segment verification network can learn the line segments corresponding to the positive samples and the negative samples respectively.
- the positive samples refer to the line segment pairs with high similarity in the image.
- Negative samples are pairs of line segments with low similarity.
- after the line segments whose prediction scores are greater than the preset threshold have been screened out, these line segments can be combined to obtain the roof outline of the building.
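The segment-sampling and filtering steps above can be sketched as below. The scoring function is a simple stand-in for the line segment verification network, and all points, scores, and the threshold are illustrative assumptions.

```python
# Sketch: combine connection points pairwise into candidate segments,
# score each segment, and keep only those above the threshold, as in the
# line-segment sampling and verification steps.

from itertools import combinations

def sample_segments(points):
    """Pairwise combination of connection points into candidate segments."""
    return list(combinations(points, 2))

def filter_segments(segments, score_fn, threshold):
    return [s for s in segments if score_fn(s) > threshold]

points = [(0, 0), (0, 4), (4, 0), (4, 4)]
segments = sample_segments(points)  # C(4, 2) = 6 candidates

# Stand-in scorer: prefer axis-aligned segments (many roof edges are).
def score(seg):
    (x0, y0), (x1, y1) = seg
    return 0.9 if x0 == x1 or y0 == y1 else 0.2

kept = filter_segments(segments, score, 0.5)
```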
- S704 may be executed to perform translation transformation on the above-mentioned roof outline according to the above-mentioned predicted offset to obtain the above-mentioned base outline of the building.
- the above-mentioned translation transformation may map the roof outline to the base outline through a preset transformation function (e.g., an RT transformation).
- in order to improve the transformation accuracy, the roof contour may be translated through a spatial transformation network to obtain the base contour.
- FIG. 9 is a schematic diagram of a base area prediction flowchart shown in the present application.
- the above-mentioned roof outline and the above-mentioned predicted offset can be input into the above-mentioned spatial transformation network for translational transformation to obtain the base outline.
- for the related introduction of the spatial transformation network, reference may be made to the foregoing content, which will not be repeated here.
- the spatial transformation network used for predicting the base outline and the one used for predicting the base area may be the same network or different networks; in some embodiments, they are the same network.
- S706 may be executed to adjust the base area based on the base outline to obtain the final base area corresponding to the building.
- the attribute information of the final base area includes coordinate information representing the base area, feature information of the base area, and an outline of the base area.
- the base outline can be fused with the preliminarily predicted base area, and the edge of the preliminarily predicted base area can be corrected through fusion to obtain a more realistic base outline.
- the base outline can then be fused with the original target image to obtain the final base area.
- the process of image fusion may refer to the related art, which will not be described in detail here.
- the roof outline is first obtained from the target image using the wireframe parsing network; the accurate base outline is then obtained from the roof outline; finally, the preliminarily predicted base area is corrected based on the base outline to obtain the final base area.
- because the above-mentioned roof outline is more accurate and closer to the real roof outline of the building, the final base area predicted after correction by the base outline will be more precise.
- the image processing models used in the building base prediction scheme may include a building bounding box prediction submodel, a roof area prediction submodel, an offset prediction submodel, a roof outline prediction submodel, and a base area prediction submodel.
- the multi-task joint training method is adopted when training the image processing model.
- FIG. 10 is a diagram showing the correspondence between tasks and models shown in this application.
- base prediction requires at least the building bounding box prediction subtask, the roof area prediction subtask, the prediction subtask for the offset between the roof and the base (hereinafter the "offset prediction subtask"), the roof outline prediction subtask, and the base area prediction subtask.
- the above building bounding box prediction subtask corresponds to the building bounding box prediction submodel.
- the above-mentioned roof area prediction subtask corresponds to the roof area prediction submodel.
- the above offset prediction subtask corresponds to the offset prediction submodel.
- the above-mentioned roof profile prediction subtask corresponds to the roof profile prediction submodel.
- the above-mentioned base area prediction subtask corresponds to the base area prediction sub-model.
- FIG. 11 is a flowchart of an image processing model training method shown in this application.
- the above image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof outline prediction sub-model and a base area prediction sub-model.
- the method includes:
- S1102 Acquire a plurality of training samples including labeling information; wherein the labeling information includes a building bounding box, a building roof area, a building roof outline, an offset between a building roof and a base, and a building base area.
- the original image can be labeled with ground truth by means of manual labeling or machine-assisted labeling.
- image annotation software can be used to label the building bounding box, building roof area, building roof outline, offset between the building roof and base, and building base area included in the original image, to obtain multiple training samples.
- one-hot encoding or other methods may be used to encode the training samples, and the present application does not limit the specific encoding method.
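As one of the possible encodings mentioned above, one-hot encoding can be sketched as follows; the class indices and class count are illustrative assumptions.

```python
# Sketch: one-hot encode categorical label values in a training sample.
# Class ids and the number of classes are illustrative.

def one_hot(index, num_classes):
    vec = [0] * num_classes
    vec[index] = 1
    return vec

encoded = [one_hot(i, 3) for i in [0, 2, 1]]
```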
- S1104 Construct joint learning loss information based on loss information corresponding to each sub-model included in the image processing model.
- the corresponding loss information of each sub-model may be determined first.
- the loss information corresponding to the above-mentioned building bounding box prediction sub-model is the Smooth L1 loss; the loss information corresponding to the roof area prediction sub-model, the roof outline prediction sub-model, and the base area prediction sub-model is the cross-entropy loss; and the loss information corresponding to the offset prediction sub-model between the roof and the base is the MSE (Mean Squared Error) loss. Five different levels of supervision information are thereby provided.
- joint learning loss information may be constructed based on the corresponding loss information of each sub-model included in the above image processing model. For example, the loss information corresponding to each sub-model can be added to obtain the above-mentioned joint learning loss information.
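The additive construction above can be sketched as follows, with Smooth L1 for the bounding box, MSE for the offset, and the cross-entropy terms for the roof area, roof outline, and base area represented as already-computed scalars. All values are illustrative assumptions.

```python
# Sketch: joint learning loss as the sum of the per-sub-model losses.
# The inputs below are illustrative scalars, not real model outputs.

def smooth_l1(pred, target, beta=1.0):
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def mse(pred, target):
    return (pred - target) ** 2

def joint_loss(bbox, offset, ce_terms):
    """Sum the per-sub-model losses into one joint objective."""
    return smooth_l1(*bbox) + mse(*offset) + sum(ce_terms)

loss = joint_loss(bbox=(2.5, 2.0),           # |d| = 0.5 -> 0.125
                  offset=(1.0, 3.0),         # (1 - 3)^2 = 4.0
                  ce_terms=[0.3, 0.2, 0.1])  # roof / outline / base CE
```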
- a regularization term may also be added to the above joint learning loss information in the present application, which is not particularly limited here.
- S1106 may be executed to jointly train the sub-models included in the image processing model based on the joint learning loss information and the training samples, until the sub-models converge.
- the above-mentioned image processing model can be supervised based on the above-mentioned training samples marked with ground truth values.
- in the supervised training process, after the forward pass of the image processing model produces its outputs, the error between the ground truth and those outputs can be evaluated based on the constructed joint learning loss information.
- the stochastic gradient descent method can be used to determine the descending gradient.
- the model parameters of the above image processing model can then be updated through backpropagation, and the above process is repeated until the sub-models converge.
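The forward-loss-update loop of S1106 can be sketched with a toy one-parameter least-squares model standing in for the image processing model; the real training uses stochastic gradient descent with backpropagation through the shared sub-models, and all names and values below are illustrative.

```python
# Sketch of the supervised loop: forward pass, loss gradient, parameter
# update, repeat until convergence. A one-parameter model pred = w * x
# stands in for the image processing model.

def train(samples, lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in samples:             # forward + loss gradient
            grad += 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
        w -= lr * grad / len(samples)    # gradient-descent update
    return w

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # ground truth w = 2
w = train(samples)
```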
- the present application does not specifically limit the conditions for model convergence.
- the method of sharing features is used for multi-task joint training to ensure the coupling between each task in the training phase.
- the five sub-models included in the image processing model can be trained at the same time, so that the sub-models can both constrain and promote each other during the training process.
- on the one hand, the convergence efficiency of the image processing model is improved; on the other hand, the backbone network shared by each sub-model is encouraged to extract features more beneficial to base area prediction, thereby improving the accuracy of the model's predictions.
- the present application further provides an image processing apparatus.
- FIG. 12 is a schematic diagram of an image processing apparatus shown in this application.
- the above-mentioned apparatus 1200 includes:
- the acquisition module 1210 is used to acquire a target image containing at least one building; the extraction module 1220 is used to extract, for each building, the bounding box of the building and the target feature map of the target image from the target image, and to determine, based on the target feature map and the bounding box, the roof area of the building and the predicted offset between the roof and the base; the transformation module 1230 is configured to transform the roof area of the building according to the predicted offset to obtain the base area of the building.
- the above-mentioned extraction module 1220 includes: a bounding box determination module, used to perform target detection on the target image through the building bounding box prediction sub-model to obtain the bounding box of the building; a roof area determination module, used to determine the roof area of the building through the roof area prediction sub-model based on the bounding box and the target feature map of the target image; and an offset determination module, used to determine the predicted offset of the building through the offset prediction sub-model based on the bounding box and the target feature map of the target image.
- the above-mentioned roof area prediction sub-model and the above-mentioned offset prediction sub-model share the same area feature extraction unit, and the above-mentioned area feature extraction unit is based on the bounding box of the building and the target feature map of the target image Determine the building features of the above-mentioned buildings;
- the above-mentioned roof area determination module includes a first convolution processing module, used to perform first convolution processing on the building features through the first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building.
- the above-mentioned offset determination module includes a second convolution processing module, used to perform second convolution processing on the building features through the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
- the transformation module 1230 is specifically configured to: determine the base area of the building by using the base area prediction sub-model based on the predicted offset and the building characteristics of the building.
- the above-mentioned transformation module 1230 includes: a first translation transformation module, configured to perform translation transformation on the building features corresponding to the roof area through the spatial transformation network included in the base area prediction sub-model to obtain the base features of the building, where the spatial transformation parameters of the spatial transformation network include parameters determined based on the predicted offset; and a third convolution processing module, used to perform third convolution processing on the base features through the base area prediction sub-model to obtain the base area of the building.
- the above-mentioned spatial transformation network includes a sampler constructed based on interpolation, and the sampler includes a sampling grid constructed based on the predicted offset. The first translation transformation module is specifically used to: using the sampler, take each pixel point included in the base feature as the current pixel point in turn according to the coordinate information of the pixel points included in the base feature; determine, through the sampling grid, the pixel points in the roof area that correspond to the current pixel point; and calculate the values of the determined pixel points based on the interpolation method to obtain the pixel value of the current pixel point.
- the image processing model further includes a roof contour prediction sub-model that shares the same area feature extraction unit with the above-mentioned roof area prediction sub-model and the above-mentioned offset prediction sub-model;
- the above-mentioned apparatus further includes: a contour regression module, used to perform contour regression on the building features through the roof contour prediction sub-model to determine the roof contour of the building; a second translation transformation module, used to transform the roof contour according to the predicted offset to obtain the base outline of the building; and a final base area determination module, used to adjust the base area based on the base outline to obtain the final base area of the building.
- the above-mentioned contour regression module is specifically used to: extract a plurality of connection points from the building features; combine at least some of the connection points in pairs to obtain a plurality of line segments; predict the line segments to obtain a prediction score for each line segment, where the prediction score indicates the probability that the corresponding line segment belongs to the roof outline; and combine the line segments whose prediction scores are greater than the preset threshold to obtain the roof outline of the building.
- the above-mentioned extraction module 1220 is specifically configured to use an image processing model to perform image processing on the target image, where the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
- the training device corresponding to the training method of the above image processing model includes:
- the training sample acquisition module is used to acquire a plurality of training samples including label information; wherein, the label information includes the building bounding box, the building roof area, the building roof outline, the offset between the building roof and the base, building base area;
- a loss information determination module configured to construct joint learning loss information based on the loss information corresponding to each sub-model included in the above image processing model
- the joint training module is configured to jointly train each sub-model included in the above-mentioned image processing model based on the above-mentioned joint learning loss information and the above-mentioned training sample, until the above-mentioned sub-models converge.
- an electronic device which may include: a processor.
- a memory used to store instructions executable by the processor.
- the above-mentioned processor is configured to invoke the executable instructions stored in the above-mentioned memory to implement the image processing method shown in any of the above-mentioned embodiments.
- FIG. 13 is a hardware structure diagram of an electronic device shown in this application.
- the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
- the embodiment of the image processing apparatus may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
- in a logical sense, the apparatus is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for execution.
- the electronic device where the apparatus is located in the embodiment may also include other hardware, which will not be detailed here.
- the corresponding instructions of the image processing apparatus may also be directly stored in the memory, which is not limited herein.
- the present application provides a computer-readable storage medium, where a computer program is stored in the storage medium, and the computer program is used to execute the image processing method shown in any of the foregoing embodiments.
- the storage medium may be a volatile or non-volatile computer-readable storage medium.
- the embodiments of the present application further provide a computer program product, which carries program code; the instructions included in the program code can be used to execute the image processing method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
- the above-mentioned computer program product can be specifically implemented by means of hardware, software or a combination thereof.
- in one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), and so on.
- one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
- Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (which can include the structures disclosed in this application and their structural equivalents), or in a combination of one or more of these.
- Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
- alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver device for execution by a data processing apparatus.
- the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
- the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
- the processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
- a computer suitable for the execution of a computer program may include, for example, a general and/or special purpose microprocessor, or any other type of central processing unit.
- the central processing unit will receive instructions and data from read only memory and/or random access memory.
- the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, in order to receive data from them, transfer data to them, or both.
- however, a computer need not have such devices.
- the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
- Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM discs.
- the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (14)
- An image processing method, characterized in that the method comprises: acquiring a target image containing at least one building; for each building, extracting a bounding box of the building and a target feature map of the target image from the target image; determining a roof area of the building and a predicted offset between the roof and the base based on the target feature map of the target image and the bounding box of the building; and transforming the roof area according to the predicted offset to obtain a base area of the building.
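The last step of claim 1, translating the predicted roof area by the predicted roof-to-base offset, can be sketched in a few lines. This is an illustrative numpy sketch under simplifying assumptions (a binary roof mask and an integer pixel offset), not the patent's implementation; the function name `roof_to_base` is ours.

```python
import numpy as np

def roof_to_base(roof_mask: np.ndarray, offset: tuple) -> np.ndarray:
    """Shift a binary roof mask by a per-building (dy, dx) offset to
    approximate the base (footprint) mask, as in claim 1."""
    dy, dx = offset
    h, w = roof_mask.shape
    base = np.zeros_like(roof_mask)
    ys, xs = np.nonzero(roof_mask)            # roof pixels
    ys2, xs2 = ys + dy, xs + dx               # translated positions
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    base[ys2[keep], xs2[keep]] = 1            # pixels still inside the image
    return base

# A 2x2 roof at the top-left corner, predicted offset of (1, 1):
roof = np.zeros((4, 4), dtype=np.uint8)
roof[0:2, 0:2] = 1
base = roof_to_base(roof, (1, 1))
```

In practice the offset is predicted per building by the offset prediction sub-model; here it is simply given.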
- The method according to claim 1, characterized in that extracting the bounding box of the building from the target image comprises: performing target detection on the target image with a building bounding box prediction sub-model to obtain the bounding box of the building; determining the roof area of the building comprises: determining the roof area of the building with a roof area prediction sub-model based on the bounding box of the building and the target feature map of the target image; and determining the predicted offset between the roof and the base of the building comprises: determining the predicted offset of the building with an offset prediction sub-model based on the bounding box of the building and the target feature map of the target image.
- The method according to claim 2, characterized in that the roof area prediction sub-model and the offset prediction sub-model share the same region feature extraction unit, and the region feature extraction unit determines building features of the building based on the bounding box of the building and the target feature map of the target image; determining the roof area of the building comprises: performing a first convolution on the building features with a first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building; and determining the predicted offset of the building comprises: performing a second convolution on the building features with a second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
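The shared-trunk arrangement of claim 3 (one region feature extraction unit feeding two convolution heads) can be illustrated with a minimal numpy sketch. The crop used for `region_features` is only a stand-in for whatever region feature extraction (e.g. RoI pooling) the model actually uses, and the random kernels stand in for learned weights; all names here are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def region_features(feature_map: np.ndarray, box: tuple) -> np.ndarray:
    """Shared region feature extraction: crop the target feature map to
    the building's bounding box (a stand-in for learned RoI extraction)."""
    y0, x0, y1, x1 = box
    return feature_map[y0:y1, x0:x1]

def conv_head(feats: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """A single 'valid' 2-D convolution standing in for a conv processing unit."""
    kh, kw = kernel.shape
    h, w = feats.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feats[i:i + kh, j:j + kw] * kernel)
    return out

feature_map = rng.standard_normal((16, 16))
shared = region_features(feature_map, (2, 2, 10, 10))        # computed once
roof_logits = conv_head(shared, rng.standard_normal((3, 3))) # first conv unit
offset_raw = conv_head(shared, rng.standard_normal((3, 3)))  # second conv unit
offset_pred = offset_raw.mean(axis=(0, 1))                   # pool to a prediction
```

The point of the shared unit is that `shared` is computed once per building and reused by both heads.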
- The method according to claim 3, characterized in that transforming the roof area according to the predicted offset to obtain the base area of the building comprises: determining the base area of the building with a base area prediction sub-model based on the predicted offset and the building features of the building.
- The method according to claim 4, characterized in that determining the base area of the building with the base area prediction sub-model based on the predicted offset and the building features comprises: applying a translation transformation to the building features corresponding to the roof area with a spatial transformer network included in the base area prediction sub-model to obtain base features of the building, wherein the spatial transformation parameters of the spatial transformer network include parameters determined based on the predicted offset; and performing a third convolution on the base features with the base area prediction sub-model to obtain the base area of the building.
- The method according to claim 5, characterized in that the spatial transformer network includes a sampler constructed on the basis of interpolation, wherein the sampler includes a sampling grid constructed based on the predicted offset; and applying the translation transformation to the building features corresponding to the roof area with the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building comprises: with the sampler, taking each pixel included in the base features in turn as the current pixel according to the coordinate information of the pixels included in the base features, determining, through the sampling grid, the pixel among the pixels included in the roof area that corresponds to the current pixel, and computing the value of the determined pixel by interpolation to obtain the pixel value corresponding to the current pixel.
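The sampler of claim 6 can be illustrated for the special case of a pure translation grid with bilinear interpolation. This is a minimal numpy sketch assuming a single-channel feature map and a 2-D (dy, dx) offset; the function name and the choice of bilinear interpolation in particular are ours, since the claim only requires an interpolation-based sampler.

```python
import numpy as np

def translate_features(roof_feat: np.ndarray, dy: float, dx: float) -> np.ndarray:
    """For each pixel of the base feature map, the sampling grid (here a
    pure translation by the predicted offset) points back into the roof
    feature map, and the value is obtained by bilinear interpolation."""
    h, w = roof_feat.shape
    base = np.zeros((h, w), dtype=float)
    for yb in range(h):
        for xb in range(w):
            ys, xs = yb - dy, xb - dx              # grid: source coordinate
            y0, x0 = int(np.floor(ys)), int(np.floor(xs))
            wy, wx = ys - y0, xs - x0              # fractional parts
            acc = 0.0
            # Weighted sum over the four neighbouring roof pixels.
            for yy, xx, wgt in ((y0, x0, (1 - wy) * (1 - wx)),
                                (y0, x0 + 1, (1 - wy) * wx),
                                (y0 + 1, x0, wy * (1 - wx)),
                                (y0 + 1, x0 + 1, wy * wx)):
                if 0 <= yy < h and 0 <= xx < w:
                    acc += wgt * roof_feat[yy, xx]
            base[yb, xb] = acc
    return base

# A single unit feature at (0, 0), translated by a fractional offset:
roof_feat = np.zeros((5, 5))
roof_feat[0, 0] = 1.0
base_feat = translate_features(roof_feat, 1.5, 1.5)
```

Because the sampler is made of interpolation weights, the translation is differentiable in the offset, which is what lets the spatial transformer be trained jointly with the rest of the model.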
- The method according to any one of claims 3 to 6, characterized in that a roof contour prediction sub-model also shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model; and the method further comprises: performing contour regression on the building features with the roof contour prediction sub-model to determine a roof contour of the building; transforming the roof contour according to the predicted offset to obtain a base contour of the building; and adjusting the base area based on the base contour to obtain a final base area of the building.
- The method according to claim 7, characterized in that performing contour regression on the building features with the roof contour prediction sub-model to determine the roof contour of the building comprises: extracting a plurality of connection points from the building features; combining at least some of the plurality of connection points to obtain a plurality of line segments; predicting the plurality of line segments to obtain a prediction score for each line segment, wherein the prediction score indicates the probability that the corresponding line segment belongs to the roof contour; and combining the line segments whose prediction scores are greater than a preset threshold to obtain the roof contour of the building.
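The segment-scoring step of claim 8 reduces to: enumerate candidate segments from connection points, score them, and keep those above a threshold. The sketch below assumes the connection points and the scorer are given; in the patent the scorer is a learned predictor, so `toy_score` (which simply favours short segments) is a hypothetical stand-in, as is the pairwise enumeration of candidates.

```python
from itertools import combinations

def roof_contour(points, score_fn, threshold=0.5):
    """Pair up connection points into candidate line segments, score
    each, and keep the segments whose predicted score exceeds the
    preset threshold, as in claim 8."""
    segments = list(combinations(points, 2))
    return [s for s in segments if score_fn(s) > threshold]

def toy_score(seg):
    """Hypothetical scorer favouring short segments; a stand-in for the
    learned prediction score of the roof contour prediction sub-model."""
    (y1, x1), (y2, x2) = seg
    length = ((y1 - y2) ** 2 + (x1 - x2) ** 2) ** 0.5
    return 1.0 / (1.0 + length)

# Four corners of a square roof: the sides score above the threshold,
# the two diagonals fall below it.
pts = [(0, 0), (0, 3), (3, 3), (3, 0)]
contour = roof_contour(pts, toy_score, threshold=0.2)
```

With this toy scorer the four sides of the square survive the threshold while the diagonals are discarded.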
- The method according to any one of claims 1 to 8, characterized in that the method is implemented with an image processing model, wherein the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
- The method according to claim 9, characterized in that a training method of the image processing model comprises: acquiring a plurality of training samples including labeling information, wherein the labeling information includes a building bounding box, a building roof area, a building roof contour, an offset between the building roof and base, and a building base area; constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
- An image processing apparatus, characterized in that the apparatus comprises: an acquisition module configured to acquire a target image containing at least one building; an extraction module configured to, for each building, extract a bounding box of the building and a target feature map of the target image from the target image, and determine a roof area of the building and a predicted offset between the roof and the base based on the target feature map of the target image and the bounding box of the building; and a transformation module configured to transform the roof area according to the predicted offset to obtain a base area of the building.
- An electronic device, characterized in that the device comprises: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method according to any one of claims 1 to 10.
- A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the image processing method according to any one of claims 1 to 10.
- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the image processing method according to any one of claims 1 to 10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011035443.6 | 2020-09-27 | ||
CN202011035443.6A CN112149585A (zh) | 2020-09-27 | 2020-09-27 | 一种图像处理方法、装置、设备和存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022062543A1 true WO2022062543A1 (zh) | 2022-03-31 |
Family
ID=73896114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/103643 WO2022062543A1 (zh) | 2020-09-27 | 2021-06-30 | 一种图像处理方法、装置、设备和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112149585A (zh) |
WO (1) | WO2022062543A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035409A (zh) * | 2022-06-20 | 2022-09-09 | 北京航空航天大学 | 一种基于相似性对比学习的弱监督遥感图像目标检测算法 |
CN117115641A (zh) * | 2023-07-20 | 2023-11-24 | 中国科学院空天信息创新研究院 | 建筑物信息提取方法、装置、电子设备及存储介质 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149585A (zh) * | 2020-09-27 | 2020-12-29 | 上海商汤智能科技有限公司 | 一种图像处理方法、装置、设备和存储介质 |
CN112949388B (zh) * | 2021-01-27 | 2024-04-16 | 上海商汤智能科技有限公司 | 一种图像处理方法、装置、电子设备和存储介质 |
CN113344180A (zh) * | 2021-05-31 | 2021-09-03 | 上海商汤智能科技有限公司 | 神经网络训练与图像处理方法、装置、设备和存储介质 |
CN113344195A (zh) * | 2021-05-31 | 2021-09-03 | 上海商汤智能科技有限公司 | 网络训练与图像处理方法、装置、设备和存储介质 |
CN114529552A (zh) * | 2022-03-03 | 2022-05-24 | 北京航空航天大学 | 一种基于几何轮廓顶点预测的遥感影像建筑物分割方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8731234B1 (en) * | 2008-10-31 | 2014-05-20 | Eagle View Technologies, Inc. | Automated roof identification systems and methods |
CN110197147A (zh) * | 2019-05-23 | 2019-09-03 | 星际空间(天津)科技发展有限公司 | 遥感影像的建筑物实例提取方法、装置、存储介质及设备 |
CN111458691A (zh) * | 2020-01-19 | 2020-07-28 | 北京建筑大学 | 建筑物信息的提取方法、装置及计算机设备 |
CN112149585A (zh) * | 2020-09-27 | 2020-12-29 | 上海商汤智能科技有限公司 | 一种图像处理方法、装置、设备和存储介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7164883B2 (en) * | 2001-02-14 | 2007-01-16 | Motorola. Inc. | Method and system for modeling and managing terrain, buildings, and infrastructure |
JP4319857B2 (ja) * | 2003-05-19 | 2009-08-26 | 株式会社日立製作所 | 地図作成方法 |
CN104240247B (zh) * | 2014-09-10 | 2017-04-12 | 无锡儒安科技有限公司 | 一种基于单张图片的建筑物俯视轮廓的快速提取方法 |
- 2020-09-27 CN CN202011035443.6A patent/CN112149585A/zh active Pending
- 2021-06-30 WO PCT/CN2021/103643 patent/WO2022062543A1/zh active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035409A (zh) * | 2022-06-20 | 2022-09-09 | 北京航空航天大学 | 一种基于相似性对比学习的弱监督遥感图像目标检测算法 |
CN115035409B (zh) * | 2022-06-20 | 2024-05-28 | 北京航空航天大学 | 一种基于相似性对比学习的弱监督遥感图像目标检测算法 |
CN117115641A (zh) * | 2023-07-20 | 2023-11-24 | 中国科学院空天信息创新研究院 | 建筑物信息提取方法、装置、电子设备及存储介质 |
CN117115641B (zh) * | 2023-07-20 | 2024-03-22 | 中国科学院空天信息创新研究院 | 建筑物信息提取方法、装置、电子设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN112149585A (zh) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022062543A1 (zh) | 一种图像处理方法、装置、设备和存储介质 | |
CN112200165A (zh) | 模型训练方法、人体姿态估计方法、装置、设备及介质 | |
CN112330664B (zh) | 路面病害检测方法、装置、电子设备及存储介质 | |
WO2022062854A1 (zh) | 一种图像处理方法、装置、设备和存储介质 | |
US11106904B2 (en) | Methods and systems for forecasting crowd dynamics | |
CN110969648B (zh) | 一种基于点云序列数据的3d目标跟踪方法及系统 | |
CN114758337B (zh) | 一种语义实例重建方法、装置、设备及介质 | |
WO2021249114A1 (zh) | 目标跟踪方法和目标跟踪装置 | |
JP2023535502A (ja) | 半教師付きキーポイントベースモデル | |
WO2024083121A1 (zh) | 一种数据处理方法及其装置 | |
CN113344195A (zh) | 网络训练与图像处理方法、装置、设备和存储介质 | |
CN115953468A (zh) | 深度和自运动轨迹的估计方法、装置、设备及存储介质 | |
WO2022252558A1 (zh) | 神经网络训练与图像处理方法、装置、设备和存储介质 | |
Sun et al. | Two-stage deep regression enhanced depth estimation from a single RGB image | |
CN113932796A (zh) | 高精地图车道线生成方法、装置和电子设备 | |
CN114077892A (zh) | 人体骨骼序列提取及训练方法、装置和存储介质 | |
JP2023036795A (ja) | 画像処理方法、モデル訓練方法、装置、電子機器、記憶媒体、コンピュータプログラム及び自動運転車両 | |
CN114677508A (zh) | 一种基于动态滤波和逐点相关的点云实例语义分割方法 | |
CN113920254A (zh) | 一种基于单目rgb的室内三维重建方法及其系统 | |
Xing et al. | ROIFormer: semantic-aware region of interest transformer for efficient self-supervised monocular depth estimation | |
Sun et al. | Accurate deep direct geo-localization from ground imagery and phone-grade gps | |
Cheng | Global-feature enhanced network for fast semantic segmentation | |
KR102613887B1 (ko) | 비디오 신원 복원 모델을 이용한 얼굴 이미지 재구성 방법 및 장치 | |
CN114926655B (zh) | 地理与视觉跨模态预训练模型的训练方法、位置确定方法 | |
CN117612200A (zh) | 一种知识迁移的点云人体姿态估计模型的训练和识别方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21870922 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022546338 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.09.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21870922 Country of ref document: EP Kind code of ref document: A1 |