WO2022062543A1 - Image processing method, apparatus, device, and storage medium - Google Patents

Image processing method, apparatus, device, and storage medium

Info

Publication number: WO2022062543A1
Application number: PCT/CN2021/103643
Authority: WO (WIPO, PCT)
Prior art keywords: building, model, roof, area
Other languages: English (en), French (fr)
Inventor: 王金旺
Original assignee: 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Application filed by 上海商汤智能科技有限公司
Publication of WO2022062543A1

Classifications

    • G06V 20/176: Scenes; Scene-specific elements; Terrestrial scenes; Urban or other man-made structures
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25: Pattern recognition; Fusion techniques
    • G06V 10/25: Image preprocessing; Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 2201/07: Indexing scheme relating to image or video recognition or understanding; Target detection

Definitions

  • the present application relates to the field of computer technologies, and in particular, to an image processing method, apparatus, device, and storage medium.
  • Since the target images containing buildings are usually remote sensing images captured by satellites or aircraft, the bases of the buildings in the images may be partially occluded. The visual features of the building bases are therefore inconspicuous, which degrades the precision of building base extraction.
  • In view of this, the present application discloses at least one image processing method. The method includes: acquiring a target image including at least one building; for each building, extracting the bounding box of the building and a target feature map of the target image from the target image; determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and a predicted offset between the roof and the base; and transforming the roof area according to the predicted offset to obtain the base area of the building.
  • Extracting the bounding box of the building from the target image includes: performing target detection on the target image using a building bounding box prediction sub-model to obtain the bounding box of the building.
  • Determining the roof area of the building includes: determining the roof area of the building through a roof area prediction sub-model based on the bounding box of the building and the target feature map of the target image. Determining the predicted offset between the roof and the base of the building includes: determining the predicted offset of the building through an offset prediction sub-model based on the bounding box of the building and the target feature map of the target image.
  • Determining the predicted offset of the building includes: performing second convolution processing on the building features using the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
  • Transforming the roof area according to the predicted offset to obtain the base area of the building includes: determining the base area of the building through a base area prediction sub-model based on the predicted offset and the building features of the building.
  • Determining the base area of the building through the base area prediction sub-model based on the predicted offset and the building features of the building includes: performing translation transformation on the building features corresponding to the roof area using a spatial transformation network included in the base area prediction sub-model to obtain the base features of the building, where the spatial transformation parameters of the spatial transformation network include parameters determined based on the predicted offset; and performing third convolution processing on the base features using the base area prediction sub-model to obtain the base area of the building.
  • The spatial transformation network includes a sampler constructed based on interpolation, where the sampler includes a sampling grid constructed based on the predicted offset. Performing translation transformation on the building features corresponding to the roof area to obtain the base features of the building includes: using the sampler, according to the coordinate information of the multiple pixels included in the base features, taking each pixel included in the base features in turn as the current pixel; determining, through the sampling grid, the pixel among those included in the roof area that corresponds to the current pixel; and computing the value of the determined pixel based on the interpolation method to obtain the pixel value of the current pixel.
  • The image processing model further includes a roof contour prediction sub-model that shares the same area feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model. The method further includes: performing contour regression on the building features using the roof contour prediction sub-model to determine the roof contour of the building; transforming the roof contour according to the predicted offset to obtain the base contour of the building; and adjusting the base area based on the base contour to obtain the final base area of the building.
  • Performing contour regression on the building features using the roof contour prediction sub-model to determine the roof contour of the building includes: extracting multiple connection points from the building features; combining at least some of the connection points in pairs to obtain multiple line segments; predicting the multiple line segments to obtain a prediction score for each line segment, where the prediction score indicates the probability that the corresponding line segment belongs to the roof contour; and combining, among the multiple line segments, the line segments whose prediction scores are greater than a preset threshold to obtain the roof contour of the building.
  • The above method is implemented using an image processing model, where the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
  • The training method of the image processing model includes: acquiring multiple training samples including annotation information, where the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and the base, and the building base area; constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
  • The present application also proposes an image processing apparatus. The apparatus includes: an acquisition module for acquiring a target image including at least one building; an extraction module for, for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and a transformation module for transforming the roof area according to the predicted offset to obtain the base area of the building.
  • The present application also proposes an electronic device. The device includes: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method shown in any of the above embodiments.
  • the present application also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the image processing method shown in any of the foregoing embodiments.
  • The present application also proposes a computer program including computer-readable code. When the computer-readable code is executed in an electronic device, a processor in the electronic device executes the code to implement the image processing method shown in any of the foregoing embodiments.
  • With the above method, the roof area of the building, which has obvious visual features, and the predicted offset between the roof and the base can be extracted from the acquired target image, and the high-precision roof area can then be transformed based on the predicted offset to obtain a high-precision building base area. The base prediction process therefore does not need to rely on the base features included in the target image, so a high-precision building base can be obtained even when the building base features in the target image are occluded.
  • FIG. 1 is a method flowchart of an image processing method shown in this application;
  • FIG. 2 is a schematic flowchart of base extraction by an image processing model shown in this application;
  • FIG. 3 is a schematic flowchart of a method for predicting a roof area by an image processing unit shown in this application;
  • FIG. 4 is a schematic flowchart of a method for performing offset prediction by an image processing unit shown in this application;
  • FIG. 5 is a schematic flowchart of a method for performing offset prediction and roof area prediction by an image processing unit shown in this application;
  • FIG. 6 is a schematic flowchart of a method for base prediction by a base area prediction sub-model shown in this application;
  • FIG. 7 is a flowchart of a method for predicting a final base area shown in this application;
  • FIG. 8 is a schematic flowchart of a method for predicting a roof contour by a wireframe parsing network shown in this application;
  • FIG. 9 is a schematic diagram of a base area prediction flow shown in this application;
  • FIG. 10 is a diagram of the correspondence between tasks and models shown in this application;
  • FIG. 11 is a method flowchart of an image processing model training method shown in this application;
  • FIG. 12 is a schematic diagram of an image processing apparatus shown in this application;
  • FIG. 13 is a hardware structure diagram of a device shown in this application.
  • This application aims to propose an image processing method.
  • The method makes full use of information about the body, roof, and base of the building in the target image. It extracts, from the acquired target image, the roof area of the building, which has obvious visual features, and the predicted offset between the roof and the base, and then transforms the high-precision roof area based on the predicted offset to obtain a high-precision base area of the building. In this way, a high-accuracy building base can be obtained even when the building base included in the target image is occluded.
  • FIG. 1 is a method flowchart of an image processing method shown in this application. As shown in FIG. 1, the method may include:
  • S102: Acquire a target image including at least one building.
  • S104: For each building, extract the bounding box of the building and the target feature map of the target image from the target image, and determine, based on the target feature map and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base.
  • S106: Transform the roof area according to the predicted offset to obtain the base area of the building.
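  • As an illustrative aid only (not part of the original disclosure), the S102-S106 flow can be sketched in Python as follows; bbox_model, roof_head, offset_head, and transform_unit are hypothetical stand-ins for the sub-models described below:

```python
# Illustrative sketch of the S102-S106 flow; every name here is a
# hypothetical stand-in for the patent's sub-models, not a real API.
import torch

def process_image(target_image: torch.Tensor,
                  bbox_model, roof_head, offset_head, transform_unit):
    # S102: target_image is an (N, C, H, W) batch containing buildings.
    boxes, feature_map = bbox_model(target_image)  # bounding boxes + target feature map
    base_areas = []
    for box in boxes:
        # S104: predict the roof area and the roof-to-base offset per building.
        roof_area = roof_head(feature_map, box)
        offset = offset_head(feature_map, box)     # predicted (dx, dy)
        # S106: translate the roof area by the predicted offset.
        base_areas.append(transform_unit(roof_area, offset))
    return base_areas
```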
  • the roof area of the building is represented by attribute information of the roof area, and the attribute information at least includes coordinate information representing the roof area.
  • the attribute information further includes feature information of the roof area and/or the outline of the roof area.
  • the base area of the building is represented by attribute information of the base area, and the attribute information at least includes coordinate information representing the base area.
  • the attribute information further includes feature information of the base area and/or the outline of the base area.
  • the above-mentioned image processing method can be applied to electronic equipment.
  • the above-mentioned electronic device may execute the above-mentioned image processing method by carrying a software system or hardware structure corresponding to the image processing method.
  • the types of the above electronic devices may be notebook computers, computers, servers, mobile phones, PAD terminals, etc., which are not particularly limited in this application.
  • The image processing method can be executed by the terminal device or the server device alone, or by the terminal device and the server device in cooperation.
  • the above-mentioned image processing method can be integrated in the client.
  • After receiving an image processing request, the terminal device equipped with the client can provide computing power through its own hardware environment to execute the image processing method.
  • the above-mentioned image processing method can be integrated into the system platform.
  • the server device equipped with the system platform can provide computing power through its own hardware environment to execute the above image processing method.
  • the above image processing method can be divided into two tasks: acquiring a target image and processing the target image.
  • the acquisition task can be integrated in the client and carried on the terminal device.
  • Processing tasks can be integrated on the server and carried on the server device.
  • the above terminal device may initiate an image processing request to the above server device after acquiring the target image.
  • the server device may execute the method on the target image in response to the request.
  • The following description takes an electronic device (hereinafter referred to as the device) as the execution subject.
  • the above-mentioned target image refers to an image including at least one building in the image.
  • the above-mentioned target image may be a remote sensing image captured by a device such as an aircraft, an unmanned aerial vehicle, or a satellite.
  • the following description mainly takes a building as an example.
  • The processing of a target image including multiple buildings is similar to that of an image including one building.
  • the above-mentioned device may complete the input of the target image by interacting with the user.
  • the above-mentioned device can provide the user with a window for inputting the target image to be processed through its onboard interface, so that the user can input the image.
  • the user can complete the input of the target image based on this window.
  • the image can be input into the image processing model for calculation.
  • the above-mentioned device can directly acquire the remote sensing image output by the remote sensing image acquisition system.
  • the above-mentioned device may pre-establish a certain protocol with the remote sensing image acquisition system. After the remote sensing image acquisition system generates the remote sensing image, it can be sent to the above-mentioned equipment for image processing.
  • the above-mentioned device may be equipped with an image processing model to perform the above-mentioned image processing.
  • The device can use an image processing model to perform image processing on each building in the target image, so as to extract, from the target image, the roof area of the building and the predicted offset between the roof and the base of the building, and then transform the roof area according to the predicted offset to obtain the base area corresponding to the building.
  • the above image processing model may be an end-to-end image processing model for extracting building bases based on target images.
  • the image processing model may be a pre-trained neural network model.
  • FIG. 2 is a schematic flowchart of a base extraction using an image processing model according to the present application.
  • the above-mentioned image processing model may include an image processing unit and a region transforming unit.
  • the input of the image processing unit is the target image.
  • The output of the image processing unit is the roof area of each building and the predicted offset between the roof and the base.
  • the input of the above-mentioned area transformation unit is the output of the above-mentioned image processing unit.
  • the output of the above-mentioned area transformation unit is the base area.
  • The image processing unit may include a sub-model based on a deep neural network for predicting the roof area and the predicted offset between the roof and the base.
  • the above image processing unit may further include a building bounding box prediction sub-model and a roof area prediction sub-model.
  • the building bounding box prediction sub-model is used to extract the building bounding box in the target image and provide input for other sub-models, so as to make full use of various information of buildings in the target image.
  • the above-mentioned building bounding box prediction sub-model may be a neural network model obtained by training based on a plurality of training samples marked with bounding boxes.
  • the above-mentioned roof area prediction sub-model performs roof area prediction for each building based on the inputted building bounding box and the area features in the target image.
  • the above-mentioned roof area prediction sub-model may be a neural network model obtained by training based on a plurality of training samples marked with the roof area.
  • FIG. 3 is a schematic flowchart of a method for predicting a roof area by an image processing unit according to the present application.
  • the image processing unit may include a roof area prediction sub-model.
  • the roof area prediction submodel may include a building bounding box prediction submodel.
  • the above-mentioned building bounding box prediction sub-model may be a regression model constructed based on RPN (Region Proposal Network, candidate box generation network).
  • The roof area prediction sub-model may be a regression model constructed based on a region feature extraction unit such as a RoI Align (Region of Interest Align, region-of-interest feature alignment) network or a RoI pooling (Region of Interest pooling, region-of-interest feature pooling) network.
  • The roof area prediction sub-model includes the building bounding box prediction sub-model, and the building bounding box prediction sub-model includes a backbone network, a candidate box generation network, and a region feature extraction unit.
  • FIG. 3 is only a schematic illustration, and some intermediate layers such as convolution layers, spatial pyramid layers, and fully connected layers may be added according to actual situations.
  • the building bounding box prediction sub-model can be used to first perform target detection on the above target image to obtain the bounding box of the above building.
  • the target feature map of the target image can be obtained.
  • This application does not limit the architecture of the backbone network, which can be a common convolutional neural network (Convolutional Neural Networks, CNN) network, such as VGGNet, ResNet, HRNet, and the like.
  • the information of the target feature map of the target image is related to the specific architecture of the applied backbone network. Then, the target feature map is calculated based on the RPN, and multiple candidate boxes of different sizes are obtained.
  • Through area feature extraction unit 1, fixed-size features corresponding to these candidate boxes can be obtained, and the bounding boxes of one or more buildings are then generated through the subsequent fully connected layers. Area feature extraction unit 1 can use the RoI Align network or the RoI pooling network.
  • the above-mentioned target image may be input into the above-mentioned building bounding box prediction sub-model to perform target detection to obtain the bounding box of the building.
  • the roof area of the building included in the bounding box may be determined by the roof area prediction sub-model based on the bounding box and the target feature map of the target image.
  • the attribute information of the roof area includes coordinate information of the roof area.
  • the above-mentioned bounding box and the target feature map of the above-mentioned target image may be input into the area feature extraction unit 2 of the above-mentioned roof area prediction sub-model to obtain the roof area corresponding to the building included in the above-mentioned bounding box.
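  • As a hedged sketch of this step, torchvision's roi_align can play the role of area feature extraction unit 2; the channel count, the 14*14 output size, and the assumed 224-pixel input resolution are illustrative choices, not taken from the patent:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class RoofAreaHead(nn.Module):
    """Illustrative roof area prediction head: RoIAlign + small conv mask head."""
    def __init__(self, in_channels: int = 256, image_size: int = 224):
        super().__init__()
        self.image_size = image_size  # assumed input resolution, not from the patent
        self.mask_head = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),  # per-pixel roof logit
        )

    def forward(self, feature_map: torch.Tensor, boxes: list) -> torch.Tensor:
        # boxes: list of (K_i, 4) tensors in (x1, y1, x2, y2) image coordinates.
        # roi_align stands in for area feature extraction unit 2.
        feats = roi_align(feature_map, boxes, output_size=(14, 14),
                          spatial_scale=feature_map.shape[-1] / self.image_size)
        return self.mask_head(feats)  # (K, 1, 14, 14) roof area logits
```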
  • During training, a loss term for the roof area prediction sub-model can be added for supervised training, thereby improving the feature extraction accuracy of the backbone network.
  • The roof area prediction sub-model utilizes the output of the building bounding box prediction sub-model. Therefore, when training the building bounding box prediction sub-model, the roof area can be used for supervised training: in the training samples, the true value of the roof area of the sample image is annotated and used as supervision information, so that the building bounding box prediction sub-model can learn the features required for predicting the roof area, thereby improving the accuracy of building bounding box prediction and, in turn, the accuracy of roof extraction.
  • The image processing unit may further include a sub-model for predicting the offset between the roof and the base (hereinafter referred to as the “offset prediction sub-model”), which extracts, from the target image, the predicted offset between the roof and the base of each building included in the image (hereinafter referred to as the “predicted offset”).
  • Since the offset prediction sub-model and the roof area prediction sub-model both perform feature extraction on the buildings included in the target image, the two sub-models can share the building bounding box prediction sub-model in order to reduce the amount of model computation.
  • FIG. 4 is a schematic flowchart of a method for performing offset prediction by an image processing unit according to the present application.
  • the image processing unit may include a building bounding box prediction sub-model and an offset prediction sub-model.
  • the above-mentioned building bounding box prediction sub-model may be a regression model constructed based on RPN.
  • the above-mentioned offset prediction sub-model may be a regression model constructed based on regional feature extraction units such as RoI Align network or RoI pooling network.
  • the offset prediction sub-model and the roof area prediction sub-model share the building bounding box prediction sub-model.
  • FIG. 4 is only a schematic illustration, and some intermediate layers such as convolution layers, spatial pyramid layers, and fully connected layers may be added according to actual situations.
  • the offset prediction sub-model may determine the predicted offset between the roof and the base of the building included in the bounding box based on the bounding box and the target feature map of the target image.
  • the bounding box of the building output by the building bounding box prediction sub-model and the target feature map of the above-mentioned target image can be input into the regional feature extraction unit 2 of the offset prediction sub-model to obtain the above-mentioned predicted offset .
  • During training, a loss term for the offset prediction sub-model can be added for supervised training, thereby improving the feature extraction accuracy of the backbone network.
  • Because the offset prediction sub-model and the roof area prediction sub-model share the building bounding box prediction sub-model, the input of the offset prediction sub-model is the output of the building bounding box prediction sub-model. In other words, the offset prediction sub-model includes the building bounding box prediction sub-model, so when training the building bounding box prediction sub-model, the predicted offset can be used for supervised training: the true value of the offset of the sample image is annotated and used as supervision information, so that the building bounding box prediction sub-model can learn the features required for predicting the offset, thereby improving the prediction accuracy of the building bounding box and, in turn, the accuracy of the base area obtained by transformation.
  • sharing the building bounding box prediction sub-model with the roof area prediction sub-model can reduce the amount of model computation.
  • the roof area prediction sub-model and the offset prediction sub-model between the roof and the base may share the same area feature extraction unit.
  • the above-mentioned regional feature extraction unit may be a regional feature extraction unit constructed based on the RoI Align unit or the RoI pooling unit.
  • FIG. 5 is a schematic flowchart of a method for performing offset prediction and roof area prediction by an image processing unit according to the present application.
  • the above processing flow includes two sub-branches.
  • the first sub-branch is the roof area prediction sub-branch; the other sub-branch is the offset prediction sub-branch.
  • the above two sub-branches may share the region feature extraction unit.
  • The target image (which can be the target feature map obtained after the target image is processed by the backbone network) and the bounding box are input into the area feature extraction unit to determine the building features corresponding to the building included in the bounding box.
  • the above-mentioned target image may include multiple buildings. It can be understood that, in the above-mentioned situation, the solution described in this application can separately extract the bounding boxes of multiple buildings, and perform the above steps of determining the building features for each building bounding box.
  • the number of buildings included in the target image is not limited in this application.
  • On one branch, the first convolution processing unit included in the roof area prediction sub-model can be used to perform first convolution processing on the building features to obtain the roof area of the building.
  • the attribute information of the roof area includes not only coordinate information of the roof area, but also feature information of the roof area.
  • the building features may be input into the first convolution processing unit shown in FIG. 5 for calculation to obtain the attribute information of the roof area.
  • On the other branch, the second convolution processing unit included in the offset prediction sub-model can be used to perform second convolution processing on the building features to obtain the predicted offset between the roof and the base of the building included in the bounding box.
  • the building features may be input into the second convolution processing unit shown in FIG. 5 for calculation to obtain the predicted offset.
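  • A minimal sketch of these two branches of FIG. 5, assuming PyTorch; the layer sizes, the global pooling in the offset branch, and the single (dx, dy) output per building are assumptions:

```python
import torch
import torch.nn as nn

class RoofAndOffsetHeads(nn.Module):
    """Two branches over shared building features (cf. FIG. 5); sizes illustrative."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        # First convolution processing unit: per-pixel roof area prediction.
        self.roof_branch = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),
        )
        # Second convolution processing unit: regress one (dx, dy) per building.
        self.offset_branch = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 2),
        )

    def forward(self, building_feats: torch.Tensor):
        return self.roof_branch(building_feats), self.offset_branch(building_feats)
```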
  • the present application does not limit the structures of the first convolution unit and the second convolution unit.
  • the structures of the first convolution unit and the second convolution unit may be set according to actual requirements.
  • the model structure shown in the above-mentioned FIG. 5 is only a schematic illustration. Conventional structures such as upsampling, downsampling, pooling operations, etc. are not shown in Figure 5. The above conventional structure can be set according to the actual situation.
  • The roof area prediction sub-model and the offset prediction sub-model between the roof and the base may share the same area feature extraction unit. On the one hand, when training this area feature extraction unit, the predicted offset and the roof area can both be used for supervised training: annotated true values such as the offset and the roof area are introduced as supervision information, so that the unit can learn the features required for predicting the offset and the roof area, thereby improving the accuracy of building feature extraction and, in turn, the accuracy of base extraction. On the other hand, the model structure is simplified and the amount of model computation is reduced.
  • the obtained roof area and the predicted offset can be input into the area transformation unit to obtain the base area.
  • the region transformation unit may be a type of mapping unit.
  • The region transformation unit can be expressed as y = f(x1, x2), where x1 represents the predicted offset between the roof area and the base area, x2 represents the roof area, y represents the base area, and f is the mapping function that obtains the base area from the predicted offset and the roof area.
  • the above-mentioned prediction offset may include a rotation prediction offset and a translation prediction offset.
  • the specific meaning of the prediction offset is not limited in this application.
  • The following description takes as an example the case where the base can be obtained simply by translating the roof.
  • After the predicted offset and the roof area are obtained, the roof features can be transformed according to the predicted offset to obtain the base features of the base area, and the base features are then refined to obtain the base area.
  • When performing the translation transformation, the region transformation unit translates the features of the roof area.
  • the region transformation unit can use bilinear interpolation to select and map the original features and the transformed features.
  • This operation can avoid errors otherwise introduced by convolution and/or upsampling from the roof features to the roof area during the translation transformation, thereby improving the accuracy of base extraction.
  • The roof area is obtained based on the building features, and the building features are output by the area feature extraction unit. Since supervised training is performed with the roof area as the true value when training this area feature extraction unit, the feature response corresponding to the roof area in the building features will be very high.
  • the above-mentioned region transformation unit may be a unit constructed based on a neural network.
  • This unit can be used as a base area prediction sub-model for predicting the base area, that is, the above-mentioned image processing model further includes a base area prediction sub-model constructed based on a neural network.
  • When transforming the roof area according to the predicted offset to obtain the base area corresponding to the building, the base area corresponding to the building can be determined based on the predicted offset, the building features corresponding to the building, and the base area prediction sub-model.
  • the above-mentioned predicted offset and the building feature corresponding to the above-mentioned building may be input into the above-mentioned base area prediction sub-model to obtain the base area corresponding to the above-mentioned building.
  • the base region prediction sub-model described above may comprise a spatial transformation network.
  • the spatial transformation parameters corresponding to the above-mentioned spatial transformation network include parameters determined based on the above-mentioned prediction offset.
  • FIG. 6 is a schematic flowchart of a method for predicting a base by using a base area prediction sub-model according to the present application.
  • the spatial transformation network included in the above-mentioned base area prediction sub-model can be used to perform spatial transformation on the building characteristics corresponding to the above-mentioned roof area to obtain the corresponding Base features.
  • The base features can then be input into the multiple convolutional layers (the third convolution unit shown in FIG. 6) included in the base area prediction sub-model for third convolution processing to obtain the base area corresponding to the building.
  • the attribute information of the base area includes not only coordinate information of the base area, but also feature information of the base area.
  • the present application does not limit the structure of the third convolution unit.
  • the structure of the above-mentioned third convolution unit can be set according to actual requirements.
  • the model structure shown in the above-mentioned FIG. 6 is only a schematic illustration. Conventional structures such as upsampling, downsampling, pooling operations, etc. are not shown in Figure 6. The above conventional structure can be set according to the actual situation.
  • the above-mentioned spatial transformation network may include a sampler (Sampler) constructed based on an interpolation method, wherein the above-mentioned sampler includes a sampling grid (Grid generator) constructed based on the above-mentioned predicted offset.
  • the above-mentioned sampling grid is specifically a transformation function constructed based on the above-mentioned prediction offset.
  • the above sampling grid may indicate the mapping relationship between each pixel included in the roof feature and each pixel included in the base feature. For example, according to the above sampling grid, it can be determined which pixel points included in the roof feature are mapped to a certain pixel point corresponding to the base feature.
  • the above sampler is specifically a mapping unit constructed based on an interpolation method.
  • The sampler can map the original features (building features) to the translation-transformed features (base features) based on the interpolation method, mapping both the feature positions and the feature values, so as to obtain the base features.
  • The interpolation method may be bilinear interpolation, linear interpolation, parabolic interpolation, and the like. In this application, bilinear interpolation can be adopted.
  • The sampler can be used to obtain the base features as follows: according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features is taken in turn as the current pixel; the pixel among those included in the roof area that corresponds to the current pixel is determined through the sampling grid; and the value of the determined pixel is computed based on the interpolation method to obtain the pixel value of the current pixel.
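  • A hedged sketch of such a translation-only sampler using PyTorch's affine_grid and grid_sample (a bilinear sampler); expressing the sampling grid as an affine matrix is one possible formulation and is not mandated by the text:

```python
import torch
import torch.nn.functional as F

def translate_roof_features(roof_feats: torch.Tensor,
                            offset_px: torch.Tensor) -> torch.Tensor:
    """Translate roof features into base features with a bilinear sampler.

    roof_feats: (N, C, H, W) building features with a strong roof response.
    offset_px:  (N, 2) predicted (dx, dy) roof-to-base offset in pixels.
    """
    n, _, h, w = roof_feats.shape
    # Convert the pixel offset into the normalized [-1, 1] coordinates used
    # by affine_grid/grid_sample; the sign may need flipping depending on
    # whether the offset is defined roof-to-base or base-to-roof.
    theta = roof_feats.new_zeros(n, 2, 3)
    theta[:, 0, 0] = 1.0
    theta[:, 1, 1] = 1.0
    theta[:, 0, 2] = 2.0 * offset_px[:, 0] / w  # pure translation, no rotation
    theta[:, 1, 2] = 2.0 * offset_px[:, 1] / h
    # The sampling grid: for each output (base) pixel it names the source
    # roof pixel, and bilinear interpolation computes its value.
    grid = F.affine_grid(theta, roof_feats.shape, align_corners=False)
    return F.grid_sample(roof_feats, grid, mode='bilinear', align_corners=False)
```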
  • Since the base area prediction sub-model includes a spatial transformation network that supports backpropagation and a third convolution unit, unlike non-neural-network transformations such as the RT (rotation-translation) transformation, the base area can be used as the ground truth to perform supervised training on the base area prediction sub-model (including the spatial transformation network and the third convolution layer). This introduces the prediction error between the base area predicted from the roof area and the true base area as supervision information, so that the network shared between the base area prediction sub-model and the offset prediction sub-model can be trained based on this prediction error, thereby improving the accuracy of offset prediction and, in turn, the prediction accuracy of the base area.
  • Since the base area prediction sub-model, the roof area prediction sub-model, and the offset prediction sub-model share the building features output by the area feature extraction unit, the supervision information can be shared during the training of each sub-model, thereby speeding up model convergence while improving the performance of each sub-model.
  • Since the geographic coordinates of the building can be restored according to the coordinate information, on the target image, of each pixel included in the roof area, the technical solutions provided by the above embodiments can accurately restore not only the shape of the building base but also its geographic location.
  • In some examples, a roof contour that fits the real roof better than the edge of the roof area extracted by the roof area prediction sub-model may be extracted from the target image, and the area obtained by the roof area prediction sub-model may be corrected based on this roof contour to obtain the final base area.
  • FIG. 7 is a flowchart of a method for predicting a final base area shown in the present application.
  • S702 may be executed, and contour regression is performed on the building features by using the roof contour prediction sub-model to determine the roof contour of the building.
  • the building features may be input into the roof profile prediction sub-model to obtain the roof profile of the building.
  • The roof contour prediction sub-model may be a model constructed based on a wireframe parsing network. Through the wireframe parsing network, a more accurate roof contour can be extracted from the target image.
  • FIG. 8 is a schematic flowchart of a method for predicting a roof contour by using a wireframe parsing network according to the present application.
  • connection points can be extracted from the building features.
  • the building features can be input into the fourth convolution unit (not shown in the figure) for multiple convolution operations and smoothing to obtain a heat map including multiple connection points.
  • When training the fourth convolution unit, each pixel block in the heat map (for example, if the resolution of the heat map is 14*14, the heat map includes 196 pixel blocks) can be annotated with a true value (that is, a pixel block is marked 1 when it includes a connection point and 0 otherwise) to obtain multiple training samples. The fourth convolution unit is then trained on these samples with the cross-entropy loss as the objective function, so that it can predict connection points for each pixel block in the heat map.
  • line segment sampling can be performed. That is, a plurality of line segments are obtained by combining at least some of the connection points in pairs.
  • line segment verification can be performed. That is, the above-mentioned multiple line segments are predicted to obtain a prediction score corresponding to each line segment; and a line segment with a predicted score greater than a preset threshold is screened out; wherein, the above-mentioned prediction score is used to indicate the probability that the line segment corresponding to the score belongs to the roof outline.
  • the above-mentioned preset threshold may be a threshold set according to experience.
  • the above-mentioned multiple line segments can be input into a line segment verification network to obtain a prediction score corresponding to each line segment, and then a line segment with a prediction score greater than the above-mentioned preset threshold can be screened out.
  • the above-mentioned line segment verification network may include a line segment feature extraction network and a classification score prediction network.
  • the above-mentioned line segment feature extraction network is used for extracting line segment features corresponding to the constructed line segments from the building features. After the line segment feature is obtained, the classification score corresponding to the line segment can be predicted based on the classification score prediction network and the line segment feature.
  • the same number of positive samples and negative samples can be set when constructing training samples, so that the above-mentioned line segment verification network can learn the line segments corresponding to the positive samples and the negative samples respectively.
  • the positive samples refer to the line segment pairs with high similarity in the image.
  • Negative samples are pairs of line segments with low similarity.
  • After the line segments whose classification scores are greater than the preset threshold have been filtered out, those line segments can be combined to obtain the roof contour of the building.
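  • The sampling and verification steps can be sketched as follows; score_fn is a hypothetical stand-in for the line segment verification network:

```python
import itertools

def roof_contour_from_junctions(junctions, score_fn, threshold: float = 0.5):
    """Sketch of line segment sampling and verification (cf. FIG. 8).

    junctions: iterable of (x, y) connection points from the junction heat map.
    score_fn:  hypothetical verification network; maps a segment to the
               probability that it belongs to the roof contour.
    """
    # Line segment sampling: combine connection points in pairs.
    segments = list(itertools.combinations(junctions, 2))
    # Line segment verification: keep segments scoring above the preset threshold.
    kept = [seg for seg in segments if score_fn(seg) > threshold]
    return kept  # combined, the kept segments form the predicted roof contour
```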
  • S704 may be executed to perform translation transformation on the above-mentioned roof outline according to the above-mentioned predicted offset to obtain the above-mentioned base outline of the building.
  • The translation transformation may map the roof contour to the base contour through a preset transformation function (e.g., an RT transformation).
  • In order to improve the transformation accuracy, the roof contour may be translated through a spatial transformation network to obtain the base contour.
  • FIG. 9 is a schematic diagram of a base area prediction flowchart shown in the present application.
  • the above-mentioned roof outline and the above-mentioned predicted offset can be input into the above-mentioned spatial transformation network for translational transformation to obtain the base outline.
  • For the related introduction of the spatial transformation network, reference may be made to the foregoing content, which will not be described in detail here.
  • The spatial transformation network used for predicting the base contour and the one used for predicting the base area may be the same network or different networks. In some examples, they are the same network.
  • S706 may be executed to adjust the base area based on the base outline to obtain the final base area corresponding to the building.
  • the attribute information of the final base area includes coordinate information representing the base area, feature information of the base area, and an outline of the base area.
  • the outline of the base can be fused with the preliminarily predicted base area, and the edge corresponding to the preliminarily predicted base area can be corrected by the fusion technology to obtain a more realistic base outline.
  • the base outline can then be fused with the original target image to obtain the final base area.
  • the process of image fusion may refer to the related art, which will not be described in detail here.
  • To summarize, the roof contour is first obtained from the target image using the wireframe parsing network; the accurate base contour is then obtained based on the roof contour; finally, the preliminarily predicted base area is corrected based on the base contour to obtain the final base area.
  • The roof contour obtained in this way is more accurate and fits the real building roof contour better, so the final base area predicted after correction with the base contour is more precise.
  • the image processing models used in the building base prediction scheme may include a building bounding box prediction submodel, a roof area prediction submodel, an offset prediction submodel, a roof outline prediction submodel, and a base area prediction submodel.
  • the multi-task joint training method is adopted when training the image processing model.
  • FIG. 10 is a diagram showing the correspondence between tasks and models shown in this application.
  • Base prediction requires at least the building bounding box prediction subtask, the roof area prediction subtask, the subtask of predicting the offset between the roof and the base (hereinafter referred to as the “offset prediction subtask”), the roof contour prediction subtask, and the base area prediction subtask.
  • the above building bounding box prediction subtask corresponds to the building bounding box prediction submodel.
  • the above-mentioned roof area prediction subtask corresponds to the roof area prediction submodel.
  • the above offset prediction subtask corresponds to the offset prediction submodel.
  • the above-mentioned roof profile prediction subtask corresponds to the roof profile prediction submodel.
  • the above-mentioned base area prediction subtask corresponds to the base area prediction sub-model.
  • FIG. 11 is a method flowchart of an image processing model training method shown in this application.
  • the above image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof outline prediction sub-model and a base area prediction sub-model.
  • the method includes:
  • S1102 Acquire a plurality of training samples including labeling information; wherein the labeling information includes a building bounding box, a building roof area, a building roof outline, an offset between a building roof and a base, and a building base area.
  • the original image can be labeled with ground truth by means of manual labeling or machine-assisted labeling.
  • image annotation software can be used to label the building bounding box, building roof area, building roof outline, offset between building roof and base, building base included in the original image Regions are labeled to obtain multiple training samples.
  • one-hot encoding or other methods may be used to encode the training samples, and the present application does not limit the specific encoding method.
  • S1104 Construct joint learning loss information based on loss information corresponding to each sub-model included in the image processing model.
  • the corresponding loss information of each sub-model may be determined first.
  • For example, the loss information corresponding to the building bounding box prediction sub-model is the Smooth L1 loss; the loss information corresponding to the roof area prediction sub-model, the roof contour prediction sub-model, and the base area prediction sub-model is the cross-entropy loss; and the loss information corresponding to the offset prediction sub-model between the roof and the base is the MSE (Mean Squared Error) loss. This provides five different levels of supervision information.
  • joint learning loss information may be constructed based on the corresponding loss information of each sub-model included in the above image processing model. For example, the loss information corresponding to each sub-model can be added to obtain the above-mentioned joint learning loss information.
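  • A minimal sketch of the joint learning loss described above, assuming PyTorch; using binary cross-entropy with logits for the mask-style outputs and an unweighted sum are illustrative assumptions:

```python
import torch.nn.functional as F

def joint_learning_loss(pred: dict, target: dict):
    """Sum of the five per-sub-model losses named above; the unweighted sum
    (and any added regularization term) is an illustrative choice."""
    loss_bbox    = F.smooth_l1_loss(pred["bbox"], target["bbox"])        # bounding box
    loss_roof    = F.binary_cross_entropy_with_logits(pred["roof"], target["roof"])
    loss_contour = F.binary_cross_entropy_with_logits(pred["contour"], target["contour"])
    loss_base    = F.binary_cross_entropy_with_logits(pred["base"], target["base"])
    loss_offset  = F.mse_loss(pred["offset"], target["offset"])          # roof-to-base offset
    return loss_bbox + loss_roof + loss_contour + loss_base + loss_offset
```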
  • a regularization term may also be added to the above joint learning loss information in the present application, which is not particularly limited here.
  • S1106 may be executed to jointly train the sub-models included in the image processing model based on the joint learning loss information and the training samples, until the sub-models converge.
  • the above-mentioned image processing model can be supervised based on the above-mentioned training samples marked with ground truth values.
  • During the supervised training process, after the calculation results are obtained by forward propagation through the image processing model, the error between the true values and the calculation results can be evaluated based on the constructed joint learning loss information.
  • the stochastic gradient descent method can be used to determine the descending gradient.
  • The model parameters corresponding to the image processing model can then be updated based on backpropagation. The above process is repeated until the sub-models converge.
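  • A hedged sketch of this loop (forward propagation, joint loss evaluation, backpropagation, stochastic gradient descent); model, loader, and the hyperparameters are illustrative:

```python
import torch

def train_jointly(model, loader, loss_fn, epochs: int = 12, lr: float = 0.02):
    """Joint supervised training loop; hyperparameters are illustrative."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:
            preds = model(images)           # forward propagation
            loss = loss_fn(preds, targets)  # joint learning loss
            opt.zero_grad()
            loss.backward()                 # backpropagation determines gradients
            opt.step()                      # update the shared model parameters
```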
  • the present application does not specifically limit the conditions for model convergence.
  • the method of sharing features is used for multi-task joint training to ensure the coupling between each task in the training phase.
  • The five sub-models included in the image processing model can be trained at the same time, so that the sub-models both constrain and promote each other during the training process.
  • On the one hand, the convergence efficiency of the image processing model is improved; on the other hand, the backbone network shared by the sub-models is encouraged to extract features that are more beneficial to base area prediction, thereby improving the accuracy of model prediction.
  • the present application further provides an image processing apparatus.
  • FIG. 12 is a schematic diagram of an image processing apparatus shown in this application.
  • the above-mentioned apparatus 1200 includes:
  • The acquisition module 1210 is configured to acquire a target image containing at least one building; the extraction module 1220 is configured to, for each building, extract the bounding box of the building and the target feature map of the target image from the target image, and determine, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; the transformation module 1230 is configured to transform the roof area according to the predicted offset to obtain the base area of the building.
  • In some examples, the extraction module 1220 includes: a bounding box determination module for performing target detection on the target image using the building bounding box prediction sub-model to obtain the bounding box of the building; a roof area determination module for determining the roof area of the building through the roof area prediction sub-model based on the bounding box and the target feature map of the target image; and an offset determination module for determining the predicted offset of the building through the offset prediction sub-model based on the bounding box and the target feature map of the target image.
  • In some examples, the roof area prediction sub-model and the offset prediction sub-model share the same area feature extraction unit, and the area feature extraction unit determines the building features of the building based on the bounding box of the building and the target feature map of the target image. The roof area determination module includes a first convolution processing module for performing first convolution processing on the building features using the first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building; the offset determination module includes a second convolution processing module for performing second convolution processing on the building features using the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
  • the transformation module 1230 is specifically configured to: determine the base area of the building by using the base area prediction sub-model based on the predicted offset and the building characteristics of the building.
  • In some examples, the transformation module 1230 includes: a first translation transformation module for performing translation transformation on the building features corresponding to the roof area using the spatial transformation network included in the base area prediction sub-model to obtain the base features of the building, where the spatial transformation parameters corresponding to the spatial transformation network include parameters determined based on the predicted offset; and a third convolution processing module for performing third convolution processing on the base features using the base area prediction sub-model to obtain the base area of the building.
  • the spatial transformer network includes a sampler constructed based on interpolation, wherein the sampler includes a sampling grid constructed based on the predicted offset; the first translation transform module is specifically used to: use the sampler to take, according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features in turn as the current pixel, determine, through the sampling grid, the pixel corresponding to the current pixel among the pixels included in the roof area, and calculate the value of the determined pixel based on interpolation to obtain the pixel value corresponding to the current pixel.
  • a roof contour prediction sub-model also shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model;
  • the apparatus further includes: a contour regression module for performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building;
  • a second translation transform module for transforming the roof contour according to the predicted offset to obtain the base contour of the building;
  • a final base area determination module for adjusting the base area based on the base contour to obtain the final base area of the building.
  • the contour regression module is specifically used to: extract multiple junction points from the building features; combine at least some of the junction points in pairs to obtain multiple line segments; predict the multiple line segments to obtain the prediction score corresponding to each line segment, wherein the prediction score indicates the probability that the line segment corresponding to the score belongs to the roof contour; and combine the line segments, among the multiple line segments, whose corresponding prediction scores are greater than a preset threshold to obtain the roof contour of the building.
  • the extraction module 1220 is specifically configured to use an image processing model to perform image processing on the target image; wherein the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
  • the training device corresponding to the training method of the above image processing model includes:
  • the training sample acquisition module is used to acquire a plurality of training samples including annotation information; wherein the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and the base, and the building base area;
  • a loss information determination module configured to construct joint learning loss information based on the loss information corresponding to each sub-model included in the above image processing model
  • the joint training module is configured to jointly train each sub-model included in the above-mentioned image processing model based on the above-mentioned joint learning loss information and the above-mentioned training sample, until the above-mentioned sub-models converge.
  • an electronic device which may include: a processor.
  • a memory used to store processor-executable instructions.
  • the above-mentioned processor is configured to invoke the executable instructions stored in the above-mentioned memory to implement the image processing method shown in any of the above-mentioned embodiments.
  • FIG. 13 is a hardware structure diagram of an electronic device shown in this application.
  • the electronic device may include a processor for executing instructions, a network interface for performing network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
  • the embodiment of the image processing apparatus may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for running.
  • besides the processor, memory, network interface and non-volatile memory shown in FIG. 13, the electronic device where the apparatus is located in the embodiment may also include other hardware according to its actual functions, which will not be detailed here.
  • the corresponding instructions of the image processing apparatus may also be directly stored in the memory, which is not limited herein.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the storage medium, and the computer program is used to execute the image processing method shown in any of the foregoing embodiments.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the embodiments of the present application further provide a computer program product, where the computer program product carries program code, and the instructions included in the program code can be used to execute the image processing method described in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.
  • the above-mentioned computer program product can be specifically implemented by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • one or more embodiments of the present application may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuits, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • the program instructions may alternatively or additionally be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver device for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • a computer suitable for the execution of a computer program may include, for example, a general and/or special purpose microprocessor, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or will be operably coupled to such mass storage devices to receive data from them, send data to them, or both.
  • the computer does not have to have such a device.
  • the computer may be embedded in another device, such as a mobile phone, personal digital assistant (PDA), mobile audio or video player, game console, global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storage of computer program instructions and data may include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method, apparatus, device and storage medium. The method includes: acquiring a target image containing at least one building; for each building, extracting the bounding box of the building and the target feature map of the target image from the target image; determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and transforming the roof area according to the predicted offset to obtain the base area of the building.

Description

Image processing method, apparatus, device and storage medium
Cross-reference to related disclosure
The present disclosure is filed on the basis of, and claims priority to, the Chinese patent application with application No. 202011035443.6, filed on September 27, 2020 and entitled "Image processing method, apparatus, device and storage medium", the entire content of which is incorporated into the present disclosure by reference.
Technical field
The present application relates to the field of computer technologies, and in particular to an image processing method, apparatus, device and storage medium.
Background
At present, in the field of image processing, buildings usually need to be extracted from images for activities such as urban planning, map drawing, and building change detection. One of the important tasks in building extraction is extracting the building base.
However, since target images containing buildings are usually remote sensing images captured by satellites or aircraft, the bases of the buildings in the images may be partially occluded, so that the visual features of the building bases are not obvious, which affects the extraction accuracy of the building bases.
Summary
In view of this, the present application discloses at least one image processing method, including: acquiring a target image containing at least one building; for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and transforming the roof area according to the predicted offset to obtain the base area of the building.
In some examples shown, extracting the bounding box of the building from the target image includes:
performing target detection on the target image by using a building bounding box prediction sub-model to obtain the bounding box of the building. Determining the roof area of the building includes: determining the roof area of the building by a roof area prediction sub-model based on the bounding box of the building and the target feature map of the target image. Determining the predicted offset between the roof and the base of the building includes: determining the predicted offset of the building by an offset prediction sub-model based on the bounding box of the building and the target feature map of the target image.
In some examples shown, the roof area prediction sub-model and the offset prediction sub-model share the same region feature extraction unit, and the region feature extraction unit determines the building features of the building based on the bounding box of the building and the target feature map of the target image. Determining the roof area of the building includes: performing first convolution processing on the building features by using the first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building. Determining the predicted offset of the building includes: performing second convolution processing on the building features by using the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
In some examples shown, transforming the roof area according to the predicted offset to obtain the base area of the building includes: determining the base area of the building by a base area prediction sub-model based on the predicted offset and the building features of the building.
In some examples shown, determining the base area of the building by the base area prediction sub-model based on the predicted offset and the building features of the building includes: performing translation transform on the building features corresponding to the roof area by using the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building, wherein the spatial transform parameters of the spatial transformer network include parameters determined based on the predicted offset; and performing third convolution processing on the base features by using the base area prediction sub-model to obtain the base area of the building.
In some examples shown, the spatial transformer network includes a sampler constructed based on interpolation, wherein the sampler includes a sampling grid constructed based on the predicted offset. Performing translation transform on the building features corresponding to the roof area by using the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building includes: using the sampler to take, according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features in turn as the current pixel, determine, through the sampling grid, the pixel corresponding to the current pixel among the pixels included in the roof area, and calculate the value of the determined pixel based on interpolation to obtain the pixel value corresponding to the current pixel.
In some examples shown, a roof contour prediction sub-model also shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model. The method further includes: performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building; transforming the roof contour according to the predicted offset to obtain the base contour of the building; and adjusting the base area based on the base contour to obtain the final base area of the building.
In some examples shown, performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building includes: extracting multiple junction points from the building features; combining at least some of the junction points in pairs to obtain multiple line segments; predicting the multiple line segments to obtain the prediction score corresponding to each line segment, where the prediction score indicates the probability that the line segment corresponding to the score belongs to the roof contour; and combining the line segments, among the multiple line segments, whose corresponding prediction scores are greater than a preset threshold to obtain the roof contour of the building.
In some examples shown, the method is implemented by an image processing model, where the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
In some examples shown, the training method of the image processing model includes: acquiring multiple training samples including annotation information, where the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and base, and the building base area; constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
The present application further proposes an image processing apparatus, including: an acquisition module for acquiring a target image containing at least one building; an extraction module for, for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and a transform module for transforming the roof area according to the predicted offset to obtain the base area of the building.
The present application further proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method shown in any of the above embodiments.
The present application further proposes a computer-readable storage medium storing a computer program for executing the image processing method shown in any of the above embodiments.
The present application further proposes a computer program, including computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device executes the image processing method shown in any of the above embodiments.
In the above solution, since the building roof area, whose visual features are relatively obvious, and the predicted offset between the roof and the base can be extracted from the acquired target image, and the relatively accurate roof area is then transformed based on the predicted offset, a building base area with high accuracy can be obtained. The building base prediction process therefore does not need to rely on the base features included in the target image, so that a building base with high accuracy can be obtained even when the building base features included in the target image are occluded.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
In order to explain the technical solutions in one or more embodiments of the present application or in the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in one or more embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of an image processing method shown in the present application;
FIG. 2 is a schematic flowchart of base extraction through an image processing model shown in the present application;
FIG. 3 is a schematic flowchart of a method for roof area prediction through an image processing unit shown in the present application;
FIG. 4 is a schematic flowchart of a method for offset prediction through an image processing unit shown in the present application;
FIG. 5 is a schematic flowchart of a method for offset prediction and roof area prediction through an image processing unit shown in the present application;
FIG. 6 is a schematic flowchart of a method for base prediction through a base area prediction sub-model shown in the present application;
FIG. 7 is a flowchart of a final base area prediction method shown in the present application;
FIG. 8 is a schematic flowchart of a method for roof contour prediction through a wireframe parsing network shown in the present application;
FIG. 9 is a schematic flowchart of base area prediction shown in the present application;
FIG. 10 is a diagram of the correspondence between tasks and models shown in the present application;
FIG. 11 is a flowchart of an image processing model training method shown in the present application;
FIG. 12 is a schematic diagram of an image processing apparatus shown in the present application;
FIG. 13 is a hardware structure diagram shown in the present application.
Detailed description
Exemplary embodiments will be described in detail below, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; on the contrary, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in the present application are for the purpose of describing particular embodiments only and are not intended to limit the present application. The singular forms "a", "the above" and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more associated listed items. It should also be understood that, depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
The present application aims to propose an image processing method. The method makes full use of the information about the building body, roof and base in the target image: the building roof area, whose visual features are relatively obvious, and the predicted offset between the roof and the base are extracted from the acquired target image, and the relatively accurate roof area is then transformed based on the predicted offset to obtain a relatively accurate building base area. In this way, a building base with high accuracy can be obtained even when the building base included in the target image is occluded.
Please refer to FIG. 1, a flowchart of an image processing method shown in the present application. As shown in FIG. 1, the method may include:
S102: acquiring a target image containing at least one building.
S104: for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base.
S106: transforming the roof area according to the predicted offset to obtain the base area of the building.
The roof area of the building is characterized by attribute information of the roof area, which includes at least coordinate information representing the roof area; in some examples, the attribute information further includes feature information of the roof area and/or the contour of the roof area. Likewise, the base area of the building is characterized by attribute information of the base area, which includes at least coordinate information representing the base area; in some examples, the attribute information further includes feature information of the base area and/or the contour of the base area.
The image processing method can be applied to an electronic device. The electronic device can execute the method by carrying a software system or a hardware structure corresponding to the image processing method. In the embodiments of the present application, the electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal, etc., which is not particularly limited in the present application.
It can be understood that the image processing method can be executed by a terminal device or a server device alone, or by a terminal device and a server device in cooperation.
For example, the method can be integrated into a client. After receiving an image processing request, the terminal device carrying the client can provide computing power through its own hardware environment to execute the method.
For another example, the method can be integrated into a system platform. After receiving an image processing request, the server device carrying the system platform can provide computing power through its own hardware environment to execute the method.
For still another example, the method can be divided into two tasks: acquiring the target image and processing the target image. The acquisition task can be integrated into a client and carried on a terminal device, and the processing task can be integrated into a server and carried on a server device. After acquiring the target image, the terminal device can initiate an image processing request to the server device, which, after receiving the request, executes the method on the target image in response.
The following description takes an electronic device (hereinafter referred to as the device) as the execution subject.
The target image refers to an image that includes at least one building, for example, a remote sensing image captured by an aircraft, an unmanned aerial vehicle, a satellite or the like.
For simplicity, the following description mainly takes one building as an example; the processing of a target image including multiple buildings is similar to that of an image including one building.
In one case, when acquiring the target image, the device can complete the input of the target image through interaction with the user. For example, the device can provide, through its interface, a window for the user to input the target image to be processed, and the user can complete the input of the target image through this window. After acquiring the target image, the device can input the image into the image processing model for computation.
In another case, the device can directly acquire the remote sensing image output by a remote sensing image acquisition system. For example, the device can establish a certain protocol with the remote sensing image acquisition system in advance, and after the remote sensing image acquisition system generates a remote sensing image, the image can be sent to the device for image processing.
In some examples, the device can carry an image processing model to perform the image processing.
Specifically, the device can use the image processing model to perform image processing on each building in the target image, so as to extract the roof area of the building and the predicted offset between the roof and the base of the building from the target image, and transform the roof area according to the predicted offset to obtain the base area corresponding to the building.
The image processing model may be an end-to-end image processing model that extracts building bases based on the target image. In some examples, the image processing model may be a pre-trained neural network model.
Please refer to FIG. 2, a schematic flowchart of base extraction through an image processing model shown in the present application. As shown in FIG. 2, the image processing model may include an image processing unit and a region transform unit. The input of the image processing unit is the target image, and its output is the roof area of each building and the predicted offset between the roof and the base. The input of the region transform unit is the output of the image processing unit, and its output is the base area.
The image processing unit may include sub-models constructed based on deep neural networks for predicting the roof area and the predicted offset between the roof and the base.
In some examples, the image processing unit may further include a building bounding box prediction sub-model and a roof area prediction sub-model.
The building bounding box prediction sub-model is used to extract building bounding boxes from the target image and provide inputs for the other sub-models, so as to make full use of the various information about buildings in the target image. It may be a neural network model trained on multiple training samples annotated with bounding boxes.
The roof area prediction sub-model performs roof area prediction for each building based on the input building bounding box and the region features in the target image. It may be a neural network model trained on multiple training samples annotated with roof areas.
Please refer to FIG. 3, a schematic flowchart of a method for roof area prediction through an image processing unit shown in the present application.
As shown in FIG. 3, the image processing unit may include a roof area prediction sub-model, which in turn may include a building bounding box prediction sub-model. The building bounding box prediction sub-model may be a regression model constructed based on an RPN (Region Proposal Network). The roof area prediction sub-model may be a regression model constructed based on a region feature extraction unit such as an RoI Align (Region of Interest Align) network or an RoI pooling (Region of Interest pooling) network. The building bounding box prediction sub-model includes a backbone network, a region proposal network and a region feature extraction unit. In the embodiments of the present application, FIG. 3 is only a schematic illustration; intermediate layers such as convolutional layers, spatial pyramid layers and fully connected layers can be added according to actual situations.
When predicting building roofs, the building bounding box prediction sub-model can first be used to perform target detection on the target image to obtain the bounding box of the building. After the target image is input into the backbone network, the target feature map of the target image can be obtained. The present application does not limit the architecture of the backbone network, which can be a common convolutional neural network (CNN) such as VGGNet, ResNet or HRNet; the information of the target feature map is related to the specific architecture of the backbone network applied. The target feature map is then processed by the RPN to obtain multiple candidate boxes of different sizes. Through region feature extraction unit 1, features of a fixed size can be obtained from these candidate boxes, and the bounding boxes of one or more buildings are then generated through subsequent fully connected layers. Region feature extraction unit 1 may use an RoI Align network or an RoI pooling network.
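To make the "region feature extraction" step concrete, the following is a minimal, illustrative sketch in PyTorch (not the patent's reference implementation): a plain ResNet-18 trunk stands in for the unspecified backbone, and RoI Align turns a detected bounding box into fixed-size per-building features for the later sub-models. The image size, box coordinates and output size are made-up values.

    import torch
    import torchvision
    from torchvision.ops import roi_align

    # Stand-in backbone: a ResNet-18 trunk (any common CNN would do here).
    backbone = torch.nn.Sequential(
        *list(torchvision.models.resnet18(weights=None).children())[:-2]
    )

    image = torch.randn(1, 3, 512, 512)        # target image (made-up size)
    feature_map = backbone(image)              # target feature map, stride 32 here

    # Pretend the bounding box prediction sub-model produced this building box
    # (x1, y1, x2, y2) in image coordinates; the numbers are invented.
    boxes = torch.tensor([[100.0, 120.0, 220.0, 260.0]])
    rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)  # prepend batch index

    # Region feature extraction unit: fixed-size per-building features via RoI Align.
    building_features = roi_align(
        feature_map, rois, output_size=(14, 14), spatial_scale=1.0 / 32
    )
    print(building_features.shape)             # torch.Size([1, 512, 14, 14])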
For example, in some examples the target image can be input into the building bounding box prediction sub-model for target detection to obtain the bounding box of the building.
After obtaining the bounding boxes of the buildings included in the target image, the roof area of the building included in a bounding box can be determined by the roof area prediction sub-model based on the bounding box and the target feature map of the target image. The attribute information of the roof area includes the coordinate information of the roof area.
For example, in some examples the bounding box and the target feature map of the target image can be input into region feature extraction unit 2 of the roof area prediction sub-model to obtain the roof area corresponding to the building included in the bounding box.
In one example, when training the backbone network, the loss value of the roof area prediction sub-model can be added for supervised training, thereby improving the accuracy of feature extraction by the backbone network.
In the above solution, the roof area prediction sub-model uses the output of the building bounding box prediction sub-model. Therefore, when training the building bounding box prediction sub-model, supervised training can be performed using the roof area: in the training samples, the roof area of the sample image is annotated with ground truth as supervision information, so that the building bounding box prediction sub-model can learn the features needed to predict the roof area, thereby improving bounding box prediction accuracy and further improving roof extraction accuracy.
The image processing unit may further include a roof-to-base offset prediction sub-model (hereinafter referred to as the "offset prediction sub-model") for extracting, from the target image, the predicted offset between the roof and the base of the building included in the image (hereinafter referred to as the "predicted offset"). When predicting the offset, the target image can be input into the offset prediction sub-model to predict the predicted offset.
In some examples, since both the offset prediction sub-model and the roof area prediction sub-model perform feature extraction for the buildings included in the target image, in order to reduce the computation of the model, the offset prediction sub-model and the roof area prediction sub-model can share the building bounding box prediction sub-model.
Please refer to FIG. 4, a schematic flowchart of a method for offset prediction through an image processing unit shown in the present application.
As shown in FIG. 4, the image processing unit may include a building bounding box prediction sub-model and an offset prediction sub-model. The building bounding box prediction sub-model may be a regression model constructed based on an RPN. The offset prediction sub-model may be a regression model constructed based on a region feature extraction unit such as an RoI Align network or an RoI pooling network, and shares the building bounding box prediction sub-model with the roof area prediction sub-model. In the embodiments of the present application, FIG. 4 is only a schematic illustration; intermediate layers such as convolutional layers, spatial pyramid layers and fully connected layers can be added according to actual situations.
When predicting the offset, the predicted offset between the roof and the base of the building included in the bounding box can be determined by the offset prediction sub-model based on the bounding box and the target feature map of the target image.
For example, in some examples the building bounding box output by the building bounding box prediction sub-model and the target feature map of the target image can be input into region feature extraction unit 2 of the offset prediction sub-model to obtain the predicted offset.
In one example, when training the backbone network, the loss value of the offset prediction sub-model can be added for supervised training, thereby improving the accuracy of feature extraction by the backbone network.
In the above solution, on the one hand, the offset prediction sub-model and the roof area prediction sub-model share the building bounding box prediction sub-model, and the input of the offset prediction sub-model is the output of the building bounding box prediction sub-model; it can be understood that the offset prediction sub-model can be considered to include the building bounding box prediction sub-model. Therefore, when training the building bounding box prediction sub-model, supervised training can be performed using the predicted offset: in the training samples, the offset of the sample image is annotated with ground truth as supervision information, so that the building bounding box prediction sub-model can learn the features needed to predict the offset, thereby improving bounding box prediction accuracy and further improving the accuracy of the base area obtained by transformation.
On the other hand, the sharing of the building bounding box prediction sub-model by the offset prediction sub-model and the roof area prediction sub-model can reduce the computation of the model.
In some embodiments, the roof area prediction sub-model and the roof-to-base offset prediction sub-model can share the same region feature extraction unit.
The region feature extraction unit may be a region feature extraction unit constructed based on an RoI Align unit or an RoI pooling unit.
Please refer to FIG. 5, a schematic flowchart of a method for offset prediction and roof area prediction through an image processing unit shown in the present application.
As shown in FIG. 5, the processing flow includes two sub-branches: the first is the roof area prediction sub-branch; the other is the offset prediction sub-branch.
The two sub-branches can share the region feature extraction unit. When performing offset prediction and roof area prediction, the building features corresponding to the building included in the bounding box can first be determined based on the bounding box, the target image (which may be the target feature map obtained after the target image is processed by the backbone network) and the region feature extraction unit.
In the embodiments of the present application, the target image may include multiple buildings. It can be understood that in this case, the solution described in the present application can extract the bounding boxes of the multiple buildings respectively and perform the above step of determining building features for each building bounding box. The number of buildings included in the target image is not limited in the present application.
After obtaining the building features corresponding to the building in the bounding box, in the roof area prediction sub-branch, the first convolution processing unit included in the roof area prediction sub-model can be used to perform first convolution processing on the building features to obtain the roof area of the building. The attribute information of the roof area includes not only the coordinate information of the roof area but also the feature information of the roof area.
When performing the first convolution processing, the building features can be input into the first convolution processing unit shown in FIG. 5 for computation to obtain the attribute information of the roof area.
After obtaining the building features corresponding to the building in the bounding box, in the offset prediction sub-branch, the second convolution processing unit included in the offset prediction sub-model can also be used to perform second convolution processing on the building features to obtain the predicted offset between the roof and the base of the building included in the bounding box.
When performing the second convolution processing, the building features can be input into the second convolution processing unit shown in FIG. 5 for computation to obtain the predicted offset.
In the embodiments of the present application, on the one hand, the present application does not limit the structures of the first convolution unit and the second convolution unit, which can be set according to actual requirements. On the other hand, the model structure shown in FIG. 5 is only a schematic illustration; conventional structures such as upsampling, downsampling and pooling operations are not shown in FIG. 5 and can be set according to actual situations.
In the above solution, the roof area prediction sub-model and the roof-to-base offset prediction sub-model can share the same region feature extraction unit. Therefore, on the one hand, when training the region feature extraction unit, supervised training can be performed using the predicted offset and the roof area, introducing the annotated ground-truth information of the offset and the roof area as supervision information, so that the region feature extraction unit can learn the features needed for predicting both the offset and the roof area, thereby improving the accuracy of building feature extraction and further improving the accuracy of base extraction. On the other hand, the model structure is simplified and the computation of the model is reduced.
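A minimal sketch of this shared-feature, two-branch design follows (PyTorch; the class name, layer sizes and head designs are assumptions rather than the patent's specification). Both heads consume the same per-building RoI features: one produces per-pixel roof-area logits via the "first convolution processing", the other regresses a single roof-to-base offset via the "second convolution processing".

    import torch
    import torch.nn as nn

    class RoofAndOffsetHeads(nn.Module):
        """Two branches over shared per-building RoI features (names hypothetical)."""

        def __init__(self, in_channels: int = 512):
            super().__init__()
            # "First convolution processing unit": per-pixel roof-area logits.
            self.roof_head = nn.Sequential(
                nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
                nn.Conv2d(256, 1, 1),
            )
            # "Second convolution processing unit": one (dx, dy) offset per building.
            self.offset_head = nn.Sequential(
                nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(256, 2),
            )

        def forward(self, building_features: torch.Tensor):
            return self.roof_head(building_features), self.offset_head(building_features)

    heads = RoofAndOffsetHeads()
    roof_logits, offset = heads(torch.randn(4, 512, 14, 14))
    print(roof_logits.shape, offset.shape)     # (4, 1, 14, 14) and (4, 2)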
Please continue to refer to FIG. 2. After the roof area and the predicted offset are obtained through the image processing unit, they can be input into the region transform unit to obtain the base area.
The region transform unit may be a mapping unit. For example, a mapping function y = f(x1, x2) can be constructed, where x1 represents the predicted offset between the roof area and the base area, x2 represents the roof area, y represents the base area, and f is the mapping function that obtains the base area from the predicted offset and the roof area. Through the constructed mapping function, the base area can be obtained based on the predicted offset and the roof area.
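In the simplest pure-translation case, f reduces to shifting the roof geometry by the predicted offset. A toy sketch, with all coordinates invented for illustration:

    import torch

    def region_transform(offset: torch.Tensor, roof_polygon: torch.Tensor) -> torch.Tensor:
        """y = f(x1, x2): translate roof coordinates by the predicted (dx, dy) offset."""
        return roof_polygon + offset           # broadcast (2,) over (num_points, 2)

    roof = torch.tensor([[10.0, 12.0], [30.0, 12.0], [30.0, 28.0], [10.0, 28.0]])
    offset = torch.tensor([-3.0, 5.0])         # predicted roof-to-base offset
    base = region_transform(offset, roof)      # base polygon coordinates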
In the embodiments of the present application, the predicted offset may include a rotation predicted offset and a translation predicted offset; the specific meaning of the predicted offset is not limited in the present application. The following description takes the case where the base can be obtained by merely translating the roof as an example.
In some embodiments, in order to improve base extraction accuracy, when transforming the roof area according to the predicted offset to obtain the base area corresponding to the building, the roof features of the roof area can be transformed according to the predicted offset to obtain the base features of the base area. After the base features are obtained, refinement processing is performed on them to obtain the base area.
In the above embodiment, the translation transform is performed on the features of the roof area. During the translation transform, the region transform unit can select and map between the original features and the transformed features by means of bilinear interpolation. Since the roof area is obtained from the roof features through convolution and/or upsampling, this operation avoids introducing, during the translation transform, the additional errors generated in the process of convolving and/or upsampling from the roof features to the roof area, thereby improving base extraction accuracy.
In the embodiments of the present application, referring to FIG. 5, the roof area is obtained based on the building features, which are obtained from the region feature extraction unit. Since the region feature extraction unit is trained in a supervised manner with the roof area as the ground truth, the feature response corresponding to the roof area in the building features will be very high.
In some embodiments, the region transform unit may be a unit constructed based on a neural network. This unit can serve as the base area prediction sub-model for predicting the base area; that is, the image processing model further includes a base area prediction sub-model constructed based on a neural network.
When transforming the roof area according to the predicted offset to obtain the base area corresponding to the building, the base area corresponding to the building can be determined based on the predicted offset, the building features corresponding to the building, and the base area prediction sub-model.
For example, in some examples the predicted offset and the building features corresponding to the building can be input into the base area prediction sub-model to obtain the base area corresponding to the building.
In some embodiments, the base area prediction sub-model may include a spatial transformer network, whose spatial transform parameters include parameters determined based on the predicted offset.
Please refer to FIG. 6, a schematic flowchart of a method for base prediction through a base area prediction sub-model shown in the present application.
As shown in FIG. 6, after the building features corresponding to the roof area are obtained, the spatial transformer network included in the base area prediction sub-model can be used to perform a spatial transform on the building features corresponding to the roof area to obtain the base features corresponding to the building.
After the base features are obtained, they can be input into the multiple convolutional layers included in the base area prediction sub-model (the third convolution unit shown in FIG. 6) for third convolution processing to obtain the base area corresponding to the building. The attribute information of the base area includes not only the coordinate information of the base area but also the feature information of the base area.
In the embodiments of the present application, on the one hand, the present application does not limit the structure of the third convolution unit, which can be set according to actual requirements. On the other hand, the model structure shown in FIG. 6 is only a schematic illustration; conventional structures such as upsampling, downsampling and pooling operations are not shown in FIG. 6 and can be set according to actual situations.
The spatial transformer network may include a sampler constructed based on interpolation, where the sampler includes a sampling grid (grid generator) constructed based on the predicted offset.
The sampling grid is specifically a transform function constructed based on the predicted offset. It indicates the mapping relationship between the pixels included in the roof features and the pixels included in the base features. For example, according to the sampling grid, it can be determined from which pixels of the roof features a certain pixel of the base features is mapped.
The sampler is specifically a mapping unit constructed based on interpolation. Based on interpolation, the sampler maps the feature position and feature score between the original features (building features) and the translated features (base features), thereby obtaining the base features. The interpolation may be bilinear interpolation, linear interpolation, parabolic interpolation, etc.; bilinear interpolation can be adopted in the present application.
In this case, when the building features corresponding to the roof area and the predicted offset are input into the spatial transformer network for translation transform to obtain the base features corresponding to the building, the sampler can be used to take, according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features in turn as the current pixel, determine, through the sampling grid, the pixel corresponding to the current pixel among the pixels included in the roof area, and calculate the value of the determined pixel based on interpolation to obtain the pixel value corresponding to the current pixel.
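A minimal sketch of such a translation-only spatial transformer in PyTorch follows: affine_grid plays the role of the sampling grid built from the predicted offset, and grid_sample is the bilinear sampler that fills each output (base-feature) pixel from the corresponding roof-feature location. The shapes and the sign convention are illustrative assumptions, not the patent's specification.

    import torch
    import torch.nn.functional as F

    def translate_features(features: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
        """Shift (N, C, H, W) feature maps by per-sample (dx, dy) pixel offsets using a
        sampling grid plus bilinear sampler, in the style of a spatial transformer."""
        n, _, h, w = features.shape
        theta = torch.zeros(n, 2, 3, dtype=features.dtype, device=features.device)
        theta[:, 0, 0] = theta[:, 1, 1] = 1.0          # identity part of the transform
        theta[:, 0, 2] = -2.0 * offset[:, 0] / w       # grid sampling is an inverse map,
        theta[:, 1, 2] = -2.0 * offset[:, 1] / h       # hence the sign flip
        grid = F.affine_grid(theta, list(features.shape), align_corners=False)
        return F.grid_sample(features, grid, mode="bilinear", align_corners=False)

    roof_features = torch.randn(1, 512, 14, 14)
    base_features = translate_features(roof_features, torch.tensor([[2.0, -3.0]]))

Because both affine_grid and grid_sample are differentiable, gradients flow back through the transform, which is exactly the back-propagation property the next paragraph relies on.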
In the above embodiment, on the one hand, since the base area prediction sub-model includes a back-propagatable spatial transformer network and the third convolution unit, compared with mapping the building features to base features in a non-neural-network form such as an RT transform (rotation and translation transform), the base area prediction sub-model (including the spatial transformer network and the third convolutional layer) can be trained in a supervised manner with the base area (base features) as the ground truth. This introduces the prediction error between the base area predicted from the roof area and the base area ground truth as supervision information, so that the network shared between the base area prediction sub-model and the offset prediction sub-model can be trained based on this prediction error, thereby improving offset prediction accuracy and further improving base area prediction accuracy.
On the other hand, since the base area prediction sub-model, the roof area prediction sub-model and the offset prediction sub-model share the building features output by the region feature extraction unit, supervision information can be shared during training of the sub-models, thereby accelerating model convergence while improving the performance of each sub-model. In addition, since the geographic coordinates of the building can be restored from the coordinate information of the pixels of the roof area on the target image, the technical solution provided by the above embodiment can not only accurately restore the shape of the building base, but also restore the geographic location of the building base.
In some embodiments, in order to further improve base area prediction accuracy, a roof contour that fits the actual roof better than the edges included in the roof area extracted by the roof area prediction sub-model can be extracted from the target image, and the roof area obtained by the roof area prediction sub-model can be corrected based on this roof contour to obtain the final base area.
Please refer to FIG. 7, a flowchart of a final base area prediction method shown in the present application.
As shown in FIG. 7, after the building features are obtained, S702 can be executed: performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building.
In some examples, the building features can be input into the roof contour prediction sub-model to obtain the roof contour of the building.
The roof contour prediction sub-model may be a model constructed based on a wireframe parsing network, through which a relatively accurate roof contour can be extracted from the target image.
Please refer to FIG. 8, a schematic flowchart of a method for roof contour prediction through a wireframe parsing network shown in the present application.
As shown in FIG. 8, after the building features are obtained, multiple junction points can be extracted from the building features.
When extracting the junction points, the building features can be input into a fourth convolution unit (not shown in the figure) for multiple convolution operations and smoothing to obtain a heatmap including multiple junction points. When training the fourth convolution unit, ground truth can be annotated for each pixel block in the heatmap (for example, if the resolution of the heatmap is 14×14, the heatmap includes 196 pixel blocks): a block is labeled 1 when it contains a junction point and 0 otherwise, yielding multiple training samples. Based on these training samples, the fourth convolution unit can then be trained with cross-entropy loss information as the objective function, so that it can perform junction prediction for each pixel block in the heatmap.
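As a small illustration of this heatmap supervision (all coordinates invented), a 14×14 ground-truth map can be built by marking blocks containing a junction with 1 and training the fourth convolution unit's output against it with cross-entropy:

    import torch
    import torch.nn.functional as F

    # Junction coordinates in 14x14 heatmap units (invented for illustration).
    junctions = torch.tensor([[3.2, 4.7], [10.1, 9.8]])

    gt_heatmap = torch.zeros(14, 14)
    cols = junctions[:, 0].long()              # column index = x
    rows = junctions[:, 1].long()              # row index = y
    gt_heatmap[rows, cols] = 1.0               # blocks containing a junction -> 1, else 0

    logits = torch.randn(14, 14)               # stand-in fourth-convolution-unit output
    loss = F.binary_cross_entropy_with_logits(logits, gt_heatmap)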
After the multiple junction points are obtained, line segment sampling can be performed, i.e., at least some of the junction points are combined in pairs to obtain multiple line segments.
After the multiple line segments are obtained, line segment verification can be performed, i.e., the multiple line segments are predicted to obtain the prediction score corresponding to each line segment, and the line segments whose prediction scores are greater than a preset threshold are screened out, where the prediction score indicates the probability that the line segment corresponding to the score belongs to the roof contour.
The preset threshold may be a threshold set according to experience.
When performing line segment verification, the multiple line segments can be input into a line segment verification network to obtain the prediction score corresponding to each line segment, and the line segments whose prediction scores are greater than the preset threshold can then be screened out.
The line segment verification network may include a line segment feature extraction network and a classification score prediction network. The line segment feature extraction network is used to extract, from the building features, the line segment features corresponding to the constructed line segments. After the line segment features are obtained, the classification score corresponding to a line segment can be predicted based on the classification score prediction network and the line segment features.
In the embodiments of the present application, in order to improve the prediction accuracy of the line segment verification network, the same number of positive samples and negative samples can be set when constructing training samples, so that the line segment verification network can learn the line segment features corresponding to positive and negative samples respectively, thereby achieving accurate prediction of line segment classification scores. Positive samples refer to line segment pairs with high similarity in the image; negative samples refer to line segment pairs with low similarity.
After the line segments whose classification scores are greater than the preset threshold are screened out, the line segments, among the multiple line segments, whose corresponding prediction scores are greater than the preset threshold can be combined to obtain the roof contour of the building.
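A compact sketch of this sample-then-verify loop follows (PyTorch; scorer and the 32-dimensional line-segment features are stand-ins for the line segment verification network, whose internals the patent leaves unspecified):

    import itertools
    import torch

    def sample_segments(junctions: torch.Tensor) -> torch.Tensor:
        """Combine junction points in pairs into candidate segments (x1, y1, x2, y2)."""
        pairs = itertools.combinations(range(len(junctions)), 2)
        return torch.stack([torch.cat([junctions[i], junctions[j]]) for i, j in pairs])

    def verify_segments(segments, segment_features, scorer, threshold=0.5):
        """Keep segments whose predicted roof-contour probability exceeds the threshold."""
        scores = torch.sigmoid(scorer(segment_features)).squeeze(-1)
        return segments[scores > threshold]

    junctions = torch.tensor([[3.0, 4.0], [10.0, 9.0], [12.0, 2.0]])
    segments = sample_segments(junctions)           # 3 candidate segments
    scorer = torch.nn.Linear(32, 1)                 # stand-in classification-score head
    features = torch.randn(len(segments), 32)       # stand-in line-segment features
    kept = verify_segments(segments, features, scorer)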
After the roof contour is obtained, S704 can be executed: performing translation transform on the roof contour according to the predicted offset to obtain the base contour of the building.
In some embodiments, the translation transform may map the roof contour to the base contour through a preset transform function (such as an RT transform).
In some embodiments, in order to improve transform accuracy, the translation transform can be performed on the roof contour through a spatial transformer network to obtain the base contour.
Please refer to FIG. 9, a schematic flowchart of base area prediction shown in the present application.
As shown in FIG. 9, after the roof contour is obtained, the roof contour and the predicted offset can be input into the spatial transformer network for translation transform to obtain the base contour. For the related introduction of the spatial transformer network, refer to the foregoing content, which will not be detailed here.
In the embodiments of the present application, the spatial transformer networks used for predicting the base contour and predicting the base area may be the same network or different networks. Of course, it can be understood that, in order to reduce the computation, they may be the same network.
After the base contour is obtained, S706 can be executed: adjusting the base area based on the base contour to obtain the final base area corresponding to the building. The attribute information of the final base area includes coordinate information representing the base area, feature information of the base area, and the contour of the base area.
In this step, the base contour can be fused with the preliminarily predicted base area, and the edges corresponding to the preliminarily predicted base area are corrected through fusion techniques to obtain a base contour that fits the actual building better. Afterwards, the base contour can be fused with the original target image to obtain the final base area. For the image fusion process, refer to the related art, which will not be detailed here.
In the above solution, the wireframe parsing network is first used to obtain the roof contour from the target image; an accurate base contour is then obtained based on the roof contour; finally, the preliminarily predicted base area is corrected based on the base contour to obtain the final base area. Since the roof contour is more accurate than the roof area predicted by the roof area prediction sub-model and fits the real building roof contour better, the final base area predicted after correction based on the base contour will be more accurate.
The above is the introduction of the building base prediction solution shown in the present application. The following introduces the training method of the image processing model.
In the present application, the image processing model used in the building base prediction solution may include a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
In order to improve the prediction accuracy of the image processing model for the base area and the generalization ability of the model, a multi-task joint training method is adopted when training the image processing model.
Please refer to FIG. 10, a diagram of the correspondence between tasks and models shown in the present application.
As shown in FIG. 10, by decomposing the base area prediction flow shown in FIG. 9, it can be seen that base prediction requires at least a building bounding box prediction sub-task, a roof area prediction sub-task, a roof-to-base offset prediction sub-task (hereinafter referred to as the "offset prediction sub-task"), a roof contour prediction sub-task, and a base area prediction sub-task, each corresponding to the sub-model of the same name.
The following introduces the training process of the image processing model shown in FIG. 10. Please refer to FIG. 11, a flowchart of an image processing model training method shown in the present application.
The image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
As shown in FIG. 11, the method includes:
S1102: acquiring multiple training samples including annotation information, where the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and base, and the building base area.
When executing this step, the original images can be annotated with ground truth by manual annotation or machine-assisted annotation. For example, after the original images are acquired, image annotation software can be used to annotate the building bounding box, building roof area, building roof contour, roof-to-base offset and building base area included in the original images, to obtain multiple training samples. In the embodiments of the present application, the training samples can be encoded by means such as one-hot encoding (see the snippet below); the specific encoding method is not limited in the present application.
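As a tiny illustration of the one-hot option mentioned above (the three-class label scheme here is an invented example, not the patent's annotation format):

    import torch
    import torch.nn.functional as F

    # Per-pixel class labels (0 = background, 1 = roof, 2 = base), invented values.
    labels = torch.tensor([[0, 1], [2, 1]])
    one_hot = F.one_hot(labels, num_classes=3)   # shape (2, 2, 3)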
S1104: constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model.
When executing this step, the loss information corresponding to each sub-model can be determined first. In order to improve sub-model prediction accuracy, in the present application the loss information corresponding to the building bounding box prediction sub-model is the Smooth L1 loss; the loss information corresponding to the roof area prediction sub-model, the roof contour prediction sub-model and the base area prediction sub-model is cross-entropy loss information; and the loss information corresponding to the roof-to-base offset prediction sub-model is MSE (Mean Squared Error) loss information. This provides five different levels of supervision information.
After the loss information corresponding to each sub-model is determined, the joint learning loss information can be constructed based on the loss information corresponding to each sub-model included in the image processing model. For example, the joint learning loss information can be obtained by adding up the loss information corresponding to each sub-model.
In the embodiments of the present application, a regularization term can also be added to the joint learning loss information, which is not particularly limited here.
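Putting the five losses together as described, a minimal sketch of the joint learning loss might look like the following (unweighted sum; the tensor shapes, dictionary keys, and the use of binary cross-entropy for the mask-style outputs are assumptions):

    import torch.nn.functional as F

    def joint_loss(pred: dict, gt: dict):
        """Unweighted sum of the five per-sub-model losses described above."""
        return (
            F.smooth_l1_loss(pred["bbox"], gt["bbox"])                            # bounding box
            + F.binary_cross_entropy_with_logits(pred["roof"], gt["roof"])        # roof area
            + F.binary_cross_entropy_with_logits(pred["contour"], gt["contour"])  # roof contour
            + F.binary_cross_entropy_with_logits(pred["base"], gt["base"])        # base area
            + F.mse_loss(pred["offset"], gt["offset"])                            # offset
        )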
After the joint learning loss information and the training samples are determined, S1106 can be executed: jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
When training the model, hyperparameters such as the learning rate and the number of training epochs can be specified first. After the hyperparameters are determined, supervised training can be performed on the image processing model based on the training samples annotated with ground truth.
During supervised training, after the computation result is obtained through forward propagation of the image processing model, the error between the ground truth and the computation result can be evaluated based on the constructed joint learning loss information. After the error is obtained, the descent gradient can be determined by stochastic gradient descent. After the descent gradient is determined, the model parameters of the image processing model can be updated through back propagation. The above process is repeated until the sub-models converge. The condition for model convergence is not particularly limited in the present application.
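A bare-bones version of this supervised joint-training loop (model and loader are assumed to exist, and joint_loss is the sketch given above):

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for images, gt in loader:          # samples annotated with the five ground truths
        pred = model(images)           # forward propagation through all sub-models
        loss = joint_loss(pred, gt)    # joint learning loss over the five tasks
        optimizer.zero_grad()
        loss.backward()                # back propagation of the joint error
        optimizer.step()               # stochastic-gradient-descent parameter update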
When training the image processing model, multi-task joint training with shared features is used to ensure the coupling between the tasks in the training phase. Meanwhile, since a supervised joint training method is adopted, the five sub-models included in the image processing model can be trained at the same time, so that the sub-models can both constrain and promote each other during training. This, on the one hand, improves the convergence efficiency of the image processing model, and on the other hand helps the backbone network shared by the sub-models extract features more beneficial to base area prediction, thereby improving model prediction accuracy.
Corresponding to any of the above embodiments, the present application further proposes an image processing apparatus.
Please refer to FIG. 12, a schematic diagram of an image processing apparatus shown in the present application.
As shown in FIG. 12, the apparatus 1200 includes:
an acquisition module 1210 for acquiring a target image containing at least one building; an extraction module 1220 for, for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and a transform module 1230 for transforming the roof area according to the predicted offset to obtain the base area of the building.
In some examples shown, a bounding box determination module is used to perform target detection on the target image by using the building bounding box prediction sub-model to obtain the bounding box of the building. The extraction module 1220 includes: a roof area determination module for determining, through the roof area prediction sub-model, the roof area of the building based on the bounding box and the target feature map of the target image; and an offset determination module for determining, through the offset prediction sub-model, the predicted offset of the building based on the bounding box and the target feature map of the target image.
In some examples shown, the roof area prediction sub-model and the offset prediction sub-model share the same region feature extraction unit, and the region feature extraction unit determines the building features of the building based on the bounding box of the building and the target feature map of the target image. The roof area determination module includes: a first convolution processing module for performing first convolution processing on the building features by using the first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building. The offset determination module includes: a second convolution processing module for performing second convolution processing on the building features by using the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building. In some examples shown, the transform module 1230 is specifically configured to determine the base area of the building through the base area prediction sub-model based on the predicted offset and the building features of the building.
In some examples shown, the transform module 1230 includes: a first translation transform module for performing translation transform on the building features corresponding to the roof area by using the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building, where the spatial transform parameters corresponding to the spatial transformer network include parameters determined based on the predicted offset; and a third convolution processing module for performing third convolution processing on the base features by using the base area prediction sub-model to obtain the base area of the building.
In some examples shown, the spatial transformer network includes a sampler constructed based on interpolation, where the sampler includes a sampling grid constructed based on the predicted offset. The first translation transform module is specifically used to: use the sampler to take, according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features in turn as the current pixel, determine, through the sampling grid, the pixel corresponding to the current pixel among the pixels included in the roof area, and calculate the value of the determined pixel based on interpolation to obtain the pixel value corresponding to the current pixel.
In some examples shown, a roof contour prediction sub-model also shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model. The apparatus further includes: a contour regression module for performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building; a second translation transform module for transforming the roof contour according to the predicted offset to obtain the base contour of the building; and a final base area determination module for adjusting the base area based on the base contour to obtain the final base area of the building.
In some examples shown, the contour regression module is specifically used to: extract multiple junction points from the building features; combine at least some of the junction points in pairs to obtain multiple line segments; predict the multiple line segments to obtain the prediction score corresponding to each line segment, where the prediction score indicates the probability that the line segment corresponding to the score belongs to the roof contour; and combine the line segments, among the multiple line segments, whose corresponding prediction scores are greater than a preset threshold to obtain the roof contour of the building.
In some examples shown, the extraction module 1220 is specifically configured to use an image processing model to perform image processing on the target image, where the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
In some examples shown, the training apparatus corresponding to the training method of the image processing model includes:
a training sample acquisition module for acquiring multiple training samples including annotation information, where the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and base, and the building base area;
a loss information determination module for constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model;
a joint training module for jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
The embodiments of the image processing apparatus shown in the present application can be applied to an electronic device. Accordingly, the present application discloses an electronic device, which may include: a processor; and a memory for storing instructions executable by the processor. The processor is configured to invoke the executable instructions stored in the memory to implement the image processing method shown in any of the above embodiments.
Please refer to FIG. 13, a hardware structure diagram of an electronic device shown in the present application.
As shown in FIG. 13, the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
The embodiments of the image processing apparatus may be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for running. In terms of hardware, in addition to the processor, memory, network interface and non-volatile memory shown in FIG. 13, the electronic device where the apparatus is located usually may also include other hardware according to the actual functions of the electronic device, which will not be detailed here.
It can be understood that, in order to improve the processing speed, the instructions corresponding to the image processing apparatus may also be directly stored in the memory, which is not limited here. The present application proposes a computer-readable storage medium storing a computer program for executing the image processing method shown in any of the above embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present application further provide a computer program product carrying program code, where the instructions included in the program code can be used to execute the image processing method described in the above method embodiments; for details, refer to the above method embodiments, which will not be repeated here.
The computer program product can be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art should understand that one or more embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include but are not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
"And/or" in the present application means having at least one of the two; for example, "A and/or B" may include three options: A, B, and "A and B".
The embodiments in the present application are described in a progressive manner; for the same or similar parts among the embodiments, reference can be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the data processing device embodiment is basically similar to the method embodiment, its description is relatively simple, and for related parts, reference can be made to the description of the method embodiment.
Specific embodiments of the present application have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in the present application can be implemented in digital electronic circuits, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in the present application and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in the present application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in the present application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), and the apparatus can also be implemented as special purpose logic circuitry.
Computers suitable for executing a computer program may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, or will be operably coupled to such mass storage devices to receive data from them, send data to them, or both. However, a computer does not have to have such devices. In addition, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated into, special purpose logic circuitry.
Although the present application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what is claimed, but are mainly used to describe the features of specific embodiments of a particular disclosure. Certain features described in multiple embodiments in the present application can also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment can also be implemented separately in multiple embodiments or in any suitable sub-combination. Furthermore, although features may function in certain combinations as described above and may even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination can be directed to a sub-combination or a variant of a sub-combination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or sequentially, or that all illustrated operations be performed, to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can usually be integrated together in a single software product or packaged into multiple software products.
Thus, specific embodiments of the subject matter have been described. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
The above are only preferred embodiments of one or more embodiments of the present application and are not intended to limit one or more embodiments of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of one or more embodiments of the present application shall fall within the protection scope of one or more embodiments of the present application.

Claims (14)

  1. An image processing method, characterized in that the method comprises:
    acquiring a target image containing at least one building;
    for each building,
    extracting the bounding box of the building and the target feature map of the target image from the target image;
    determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and
    transforming the roof area according to the predicted offset to obtain the base area of the building.
  2. The method according to claim 1, characterized in that
    extracting the bounding box of the building from the target image comprises:
    performing target detection on the target image by using a building bounding box prediction sub-model to obtain the bounding box of the building;
    determining the roof area of the building comprises:
    determining the roof area of the building by a roof area prediction sub-model based on the bounding box of the building and the target feature map of the target image; and
    determining the predicted offset between the roof and the base of the building comprises:
    determining the predicted offset of the building by an offset prediction sub-model based on the bounding box of the building and the target feature map of the target image.
  3. The method according to claim 2, characterized in that the roof area prediction sub-model and the offset prediction sub-model share the same region feature extraction unit, and the region feature extraction unit determines the building features of the building based on the bounding box of the building and the target feature map of the target image;
    determining the roof area of the building comprises: performing first convolution processing on the building features by using the first convolution processing unit included in the roof area prediction sub-model to obtain the roof area of the building; and
    determining the predicted offset of the building comprises: performing second convolution processing on the building features by using the second convolution processing unit included in the offset prediction sub-model to obtain the predicted offset of the building.
  4. The method according to claim 3, characterized in that transforming the roof area according to the predicted offset to obtain the base area of the building comprises:
    determining the base area of the building by a base area prediction sub-model based on the predicted offset and the building features of the building.
  5. The method according to claim 4, characterized in that determining the base area of the building by the base area prediction sub-model based on the predicted offset and the building features of the building comprises:
    performing translation transform on the building features corresponding to the roof area by using the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building, wherein the spatial transform parameters of the spatial transformer network include parameters determined based on the predicted offset; and
    performing third convolution processing on the base features by using the base area prediction sub-model to obtain the base area of the building.
  6. The method according to claim 5, characterized in that the spatial transformer network includes a sampler constructed based on interpolation, wherein the sampler includes a sampling grid constructed based on the predicted offset;
    performing translation transform on the building features corresponding to the roof area by using the spatial transformer network included in the base area prediction sub-model to obtain the base features of the building comprises:
    using the sampler to take, according to the coordinate information of the multiple pixels included in the base features, each pixel included in the base features in turn as the current pixel, determine, through the sampling grid, the pixel corresponding to the current pixel among the pixels included in the roof area, and calculate the value of the determined pixel based on interpolation to obtain the pixel value corresponding to the current pixel.
  7. The method according to any one of claims 3-6, characterized in that a roof contour prediction sub-model also shares the same region feature extraction unit with the roof area prediction sub-model and the offset prediction sub-model; the method further comprises:
    performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building;
    transforming the roof contour according to the predicted offset to obtain the base contour of the building; and
    adjusting the base area based on the base contour to obtain the final base area of the building.
  8. The method according to claim 7, characterized in that performing contour regression on the building features by using the roof contour prediction sub-model to determine the roof contour of the building comprises:
    extracting multiple junction points from the building features;
    combining at least some of the multiple junction points in pairs to obtain multiple line segments;
    predicting the multiple line segments to obtain the prediction score corresponding to each line segment, wherein the prediction score is used to indicate the probability that the line segment corresponding to the score belongs to the roof contour; and
    combining the line segments, among the multiple line segments, whose corresponding prediction scores are greater than a preset threshold to obtain the roof contour of the building.
  9. The method according to any one of claims 1-8, characterized in that the method is implemented by an image processing model, wherein the image processing model includes a building bounding box prediction sub-model, a roof area prediction sub-model, an offset prediction sub-model, a roof contour prediction sub-model, and a base area prediction sub-model.
  10. The method according to claim 9, characterized in that the training method of the image processing model comprises:
    acquiring multiple training samples including annotation information, wherein the annotation information includes the building bounding box, the building roof area, the building roof contour, the offset between the building roof and base, and the building base area;
    constructing joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and
    jointly training the sub-models included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
  11. An image processing apparatus, characterized in that the apparatus comprises:
    an acquisition module for acquiring a target image containing at least one building;
    an extraction module for, for each building, extracting the bounding box of the building and the target feature map of the target image from the target image, and determining, based on the target feature map of the target image and the bounding box of the building, the roof area of the building and the predicted offset between the roof and the base; and
    a transform module for transforming the roof area according to the predicted offset to obtain the base area of the building.
  12. An electronic device, characterized in that the device comprises:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method according to any one of claims 1 to 10.
  13. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the image processing method according to any one of claims 1 to 10.
  14. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the image processing method according to any one of claims 1 to 10.
PCT/CN2021/103643 2020-09-27 2021-06-30 Image processing method, apparatus, device and storage medium WO2022062543A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011035443.6 2020-09-27
CN202011035443.6A CN112149585A (zh) 2020-09-27 2020-09-27 Image processing method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022062543A1 true WO2022062543A1 (zh) 2022-03-31

Family

ID=73896114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103643 WO2022062543A1 (zh) 2020-09-27 2021-06-30 Image processing method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN112149585A (zh)
WO (1) WO2022062543A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035409A (zh) * 2022-06-20 2022-09-09 北京航空航天大学 Weakly supervised remote sensing image target detection algorithm based on similarity contrastive learning
CN117115641A (zh) * 2023-07-20 2023-11-24 中国科学院空天信息创新研究院 Building information extraction method and apparatus, electronic device and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149585A (zh) * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, apparatus, device and storage medium
CN112949388B (zh) * 2021-01-27 2024-04-16 上海商汤智能科技有限公司 Image processing method and apparatus, electronic device and storage medium
CN113344180A (zh) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Neural network training and image processing method, apparatus, device and storage medium
CN113344195A (zh) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Network training and image processing method, apparatus, device and storage medium
CN114529552A (zh) * 2022-03-03 2022-05-24 北京航空航天大学 Remote sensing image building segmentation method based on geometric contour vertex prediction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8731234B1 (en) * 2008-10-31 2014-05-20 Eagle View Technologies, Inc. Automated roof identification systems and methods
CN110197147A * 2019-05-23 2019-09-03 星际空间(天津)科技发展有限公司 Building instance extraction method, apparatus, storage medium and device for remote sensing images
CN111458691A * 2020-01-19 2020-07-28 北京建筑大学 Building information extraction method, apparatus and computer device
CN112149585A * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, apparatus, device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7164883B2 (en) * 2001-02-14 2007-01-16 Motorola. Inc. Method and system for modeling and managing terrain, buildings, and infrastructure
JP4319857B2 (ja) * 2003-05-19 2009-08-26 株式会社日立製作所 Map creation method
CN104240247B (zh) * 2014-09-10 2017-04-12 无锡儒安科技有限公司 Fast extraction method for top-view contour of a building based on a single image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8731234B1 (en) * 2008-10-31 2014-05-20 Eagle View Technologies, Inc. Automated roof identification systems and methods
CN110197147A * 2019-05-23 2019-09-03 星际空间(天津)科技发展有限公司 Building instance extraction method, apparatus, storage medium and device for remote sensing images
CN111458691A * 2020-01-19 2020-07-28 北京建筑大学 Building information extraction method, apparatus and computer device
CN112149585A * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, apparatus, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035409A (zh) * 2022-06-20 2022-09-09 北京航空航天大学 Weakly supervised remote sensing image target detection algorithm based on similarity contrastive learning
CN115035409B (zh) * 2022-06-20 2024-05-28 北京航空航天大学 Weakly supervised remote sensing image target detection algorithm based on similarity contrastive learning
CN117115641A (zh) * 2023-07-20 2023-11-24 中国科学院空天信息创新研究院 Building information extraction method and apparatus, electronic device and storage medium
CN117115641B (zh) * 2023-07-20 2024-03-22 中国科学院空天信息创新研究院 Building information extraction method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN112149585A (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
WO2022062543A1 (zh) Image processing method, apparatus, device and storage medium
CN112200165A (zh) Model training method, human body pose estimation method, apparatus, device and medium
CN112330664B (zh) Pavement disease detection method and apparatus, electronic device and storage medium
WO2022062854A1 (zh) Image processing method, apparatus, device and storage medium
US11106904B2 Methods and systems for forecasting crowd dynamics
CN110969648B (zh) 3D target tracking method and system based on point cloud sequence data
CN114758337B (zh) Semantic instance reconstruction method, apparatus, device and medium
WO2021249114A1 (zh) Target tracking method and target tracking apparatus
JP2023535502A (ja) Semi-supervised keypoint-based models
WO2024083121A1 (zh) Data processing method and apparatus
CN113344195A (zh) Network training and image processing method, apparatus, device and storage medium
CN115953468A (zh) Depth and ego-motion trajectory estimation method, apparatus, device and storage medium
WO2022252558A1 (zh) Neural network training and image processing method, apparatus, device and storage medium
Sun et al. Two-stage deep regression enhanced depth estimation from a single RGB image
CN113932796A (zh) High-precision map lane line generation method and apparatus, and electronic device
CN114077892A (zh) Human skeleton sequence extraction and training method, apparatus and storage medium
JP2023036795A (ja) Image processing method, model training method, apparatus, electronic device, storage medium, computer program and autonomous driving vehicle
CN114677508A (zh) Point cloud instance semantic segmentation method based on dynamic filtering and point-wise correlation
CN113920254A (zh) Monocular-RGB-based indoor three-dimensional reconstruction method and system
Xing et al. ROIFormer: semantic-aware region of interest transformer for efficient self-supervised monocular depth estimation
Sun et al. Accurate deep direct geo-localization from ground imagery and phone-grade gps
Cheng Global-feature enhanced network for fast semantic segmentation
KR102613887B1 (ko) Face image reconstruction method and apparatus using a video identity restoration model
CN114926655B (zh) Training method for a geographic-visual cross-modal pre-training model and position determination method
CN117612200A (zh) Training and recognition method for a knowledge-transfer point cloud human body pose estimation model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21870922

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022546338

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21870922

Country of ref document: EP

Kind code of ref document: A1