WO2022252558A1 - Methods for neural network training and image processing, apparatus, device and storage medium - Google Patents


Info

Publication number
WO2022252558A1
Authority
WO
WIPO (PCT)
Prior art keywords
area, offset, image, roof, base
Application number
PCT/CN2021/137544
Other languages
French (fr)
Chinese (zh)
Inventor
王金旺 (Wang Jinwang)
Original Assignee
上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Publication of WO2022252558A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to methods for neural network training and image processing, together with a corresponding apparatus, device, and storage medium.
  • a neural-network-based building base extraction network is mainly used to extract building bases from remote sensing images; the extracted bases are then used for building statistics.
  • the present disclosure at least discloses a neural network training method.
  • the method may include: for each of multiple regions, acquiring one or more frames of captured images corresponding to the region, where, if a region corresponds to multiple frames of captured images, at least two of those frames have different capture angles; taking one frame of the captured images corresponding to the region as the target captured image of the region and annotating it with base-area ground-truth information; determining the base-area ground-truth information annotated on the region's target captured image as the base-area ground-truth information of every frame of captured image corresponding to that region; and obtaining a training sample set based on the captured images and target captured images corresponding to the multiple regions, so that neural network training can be performed based on the training sample set.
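As an illustration of the sample-expansion step above, the following sketch propagates the single annotated target capture of each region to every capture of that region. The data layout and all names are hypothetical, not taken from the disclosure:

```python
# Sketch of the sample-expansion idea: the one annotated "target" capture
# per region donates its base-area ground truth to every other capture of
# the same region. All names and the data layout are illustrative.

def expand_samples(regions):
    """regions maps region_id -> list of image records; exactly one record
    per region (the target capture) carries a 'base_mask' annotation."""
    training_set = []
    for region_id, images in regions.items():
        # Locate the single manually annotated target capture.
        target = next(img for img in images if 'base_mask' in img)
        for img in images:
            # Every capture inherits the same ground truth, since a
            # building's base does not move between captures.
            training_set.append({'image': img['image'],
                                 'base_mask': target['base_mask']})
    return training_set

regions = {
    'region_0': [{'image': 'r0_t0.png', 'base_mask': 'mask_r0.png'},
                 {'image': 'r0_t1.png'},
                 {'image': 'r0_t2.png'}],
}
samples = expand_samples(regions)
print(len(samples))  # 3 training samples from a single annotation
```

One annotation per region thus yields as many training samples as there are captures of that region.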
  • in some examples, the method further includes: acquiring the training sample set; using the building base extraction network to obtain the roof area and offset corresponding to each captured image in the training sample set, where the offset represents the offset between the roof area and the base area; for each captured image, translating the roof area corresponding to the image by the obtained offset to obtain the base area corresponding to the image; and adjusting the network parameters of the building base extraction network based on the base-area ground-truth information corresponding to each captured image and the base area obtained for each captured image.
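The indirect extraction just described (translating the predicted roof area by the predicted offset to obtain the base area) can be sketched for a binary mask as follows. The mask and offset are toy values, and the network that would produce them is omitted:

```python
import numpy as np

# Toy sketch of the indirect extraction: shift a predicted binary roof
# mask by a predicted integer offset to obtain the base mask.

def translate_mask(mask, dx, dy):
    """Translate a binary mask by (dx, dy) pixels, zero-filling the border."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1
    return out

roof = np.zeros((8, 8), dtype=np.uint8)
roof[1:4, 1:4] = 1                       # predicted 3x3 roof region
base = translate_mask(roof, dx=2, dy=3)  # apply the predicted offset
print(int(base.sum()))  # 9: the whole block moved intact
```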
  • in some examples, obtaining the training sample set further includes: annotating base-position ground-truth information on the target captured image corresponding to each region; and, for each region, determining the base-position ground-truth information annotated on the region's target captured image as the base-position ground-truth information of every frame of captured image corresponding to that region.
  • in some examples, the method further includes: acquiring the training sample set; using the roof area extraction network, the offset extraction network, and the roof position extraction network included in the building base extraction network to obtain the roof area, offset, and roof position corresponding to each captured image in the training sample set, where the offset represents the offset between the roof area and the base area; adjusting the network parameters of the roof area extraction network based on the base-area ground-truth information corresponding to each captured image and the roof area and offset obtained for each captured image; and adjusting the network parameters of the roof position extraction network and the offset extraction network based on the base-position ground-truth information corresponding to each captured image and the roof position and offset obtained for each captured image.
  • in some examples, adjusting the network parameters of the roof area extraction network based on the base-area ground-truth information corresponding to each captured image, and on the roof area and offset obtained for each captured image, includes: for each frame of the captured images, translating the base-area ground-truth information corresponding to the image by the offset corresponding to the image to obtain first roof-area ground-truth information corresponding to the image; obtaining roof-area loss information corresponding to the image based on the first roof-area ground-truth information and the roof area obtained for the image; and adjusting the network parameters of the roof area extraction network through backpropagation based on the roof-area loss information corresponding to each captured image.
  • in some examples, adjusting the network parameters of the roof position extraction network and the offset extraction network based on the base-position ground-truth information corresponding to each captured image, and on the roof position and offset obtained for each captured image, includes: for each frame of the captured images, translating the roof position corresponding to the image by the offset corresponding to the image to obtain the base position corresponding to the image; obtaining base-position loss information corresponding to the image based on the base-position ground-truth information and the base position obtained for the image; and adjusting the network parameters of the roof position extraction network and the offset extraction network through backpropagation based on the base-position loss information corresponding to each captured image.
  • the roof area extraction network, the offset extraction network and the roof position extraction network share a feature extraction network.
  • in some examples, at least part of the captured images in the training sample set are also annotated with second roof-area ground-truth information, a real offset, and roof-position ground-truth information; the method further includes at least one of the following: adjusting the network parameters of the roof area extraction network based on the second roof-area ground-truth information annotated on the at least part of the captured images and the roof areas obtained for them; adjusting the network parameters of the offset extraction network based on the real offsets annotated on the at least part of the captured images and the offsets obtained for them; and adjusting the network parameters of the roof position extraction network based on the roof-position ground-truth information annotated on the at least part of the captured images and the roof positions obtained for them.
  • in some examples, the at least part of the captured images are also annotated with building bounding-box ground-truth information; the method further includes: using the building bounding-box extraction network included in the building base extraction network to extract the building bounding boxes corresponding to the at least part of the captured images, where the building bounding-box extraction network includes the feature extraction network; and adjusting the network parameters of the building bounding-box extraction network based on the building bounding-box ground-truth information annotated on the at least part of the captured images and the building bounding boxes obtained for them.
  • in some examples, the method further includes: pretraining the building base extraction network using the captured images in the training sample set that are annotated with the second roof-area ground-truth information, the real offset, and the roof-position ground-truth information.
  • in some examples, the captured images in the training sample set are annotated with a first real offset; the method further includes: using the offset extraction network to obtain, from multiple rotated images, second predicted offsets corresponding to various preset angles, where a second predicted offset indicates the offset between the roof and the base in the rotated image obtained by rotating the captured image by the corresponding preset angle; rotating the first real offset by each of the preset angles to obtain second real offsets corresponding to the preset angles; and adjusting the network parameters of the offset extraction network based on the second real offsets and the second predicted offsets corresponding to the preset angles.
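Rotating the first real offset by a preset angle can be sketched with a standard 2D rotation. The counter-clockwise, degree-based convention used here is an assumption for illustration, not specified by the disclosure:

```python
import math

# Sketch of rotating the annotated (first real) offset by a preset angle
# so it can supervise the offset predicted from the rotated image.

def rotate_offset(dx, dy, angle_deg):
    """Rotate the 2D offset vector (dx, dy) counter-clockwise by angle_deg."""
    theta = math.radians(angle_deg)
    return (dx * math.cos(theta) - dy * math.sin(theta),
            dx * math.sin(theta) + dy * math.cos(theta))

# Second real offsets for some example preset angles.
for angle in (90, 180, 270):
    rx, ry = rotate_offset(3.0, 4.0, angle)
    print(angle, round(rx, 6), round(ry, 6))
```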
  • in some examples, using the offset extraction network to obtain the second predicted offsets corresponding to the various preset angles from the multiple rotated images includes: for each of the preset angles, using the offset extraction network to rotate the first image feature corresponding to the captured image by the preset angle to obtain a second image feature corresponding to the preset angle, and obtaining the second predicted offset corresponding to the preset angle based on the second image feature.
  • the present disclosure also proposes an image processing method, including: receiving a remote sensing image to be processed; using a building base extraction network to extract the roof area and offset of a building in the remote sensing image, where the building base extraction network is trained by the neural network training method of any of the foregoing implementations and the offset represents the offset between the roof area and the base area; and translating the roof area by the offset to obtain the building base area corresponding to the remote sensing image to be processed.
  • the present disclosure also proposes a neural network training apparatus, including: an acquisition module configured to acquire, for each of multiple regions, one or more frames of captured images corresponding to the region, where, if a region corresponds to multiple frames of captured images, at least two of those frames have different capture angles; a first annotation module configured to take one frame of the captured images corresponding to the region as the target captured image of the region and annotate it with base-area ground-truth information; and a first determination module configured to determine the base-area ground-truth information annotated on the region's target captured image as the base-area ground-truth information of every frame of captured image corresponding to that region, and to obtain a training sample set based on the captured images and target captured images corresponding to the multiple regions, so that neural network training can be performed based on the training sample set.
  • the present disclosure also proposes an image processing apparatus, including: a receiving module configured to receive a remote sensing image to be processed; an extraction module configured to use a building base extraction network to extract the roof area and offset of a building, where the building base extraction network is trained by the neural network training method of any of the foregoing implementations and the offset represents the offset between the roof area and the base area; and a translation module configured to translate the roof area by the offset to obtain the building base area corresponding to the remote sensing image to be processed.
  • the present disclosure also proposes an electronic device, including: a processor; and a memory for storing processor-executable instructions, where the processor executes the executable instructions to implement any of the neural network training methods and/or the image processing method described above.
  • the present disclosure also proposes a computer-readable storage medium storing a computer program, where the computer program is used to cause a processor to execute any of the above neural network training methods and/or the above image processing method.
  • the area and position of a building's base are identical in every image captured of the same region. Annotating base-area ground-truth information on the target captured image of a region can therefore be regarded as annotating it for every frame captured of that region, which expands the samples: a large number of training samples is obtained through a small amount of annotation work.
  • the training sample set, expanded by exploiting the fact that the same building's base area does not change, can be used to train the building base extraction network, which helps to train a high-precision building base extraction network from a small number of labeled samples.
  • FIG. 1 is a method flowchart of a neural network training method shown in the present disclosure.
  • FIG. 2 is a schematic flowchart of a neural network training method shown in the present disclosure.
  • FIG. 3 is a schematic diagram of a building base area extraction process shown in the present disclosure.
  • FIG. 4 is a schematic diagram of a building base area extraction process shown in the present disclosure.
  • FIG. 5 is a schematic flow chart of a neural network training method shown in the present disclosure.
  • FIG. 6 is a method flowchart of a neural network training method shown in the present disclosure.
  • FIG. 7 is a schematic flowchart of a neural network training method shown in the present disclosure.
  • FIG. 8 is a schematic diagram of a building base extraction network training process shown in the present disclosure.
  • FIG. 9 is a schematic diagram of a building base extraction network training process shown in the present disclosure.
  • FIG. 10 is a schematic structural diagram of a neural network training device shown in the present disclosure.
  • FIG. 11 is a schematic diagram of a hardware structure of an electronic device shown in the present disclosure.
  • the present disclosure aims to propose a neural network training method.
  • this method exploits the fact that the base area of a building does not change, sharing base-area ground-truth information among the multiple frames of captured images corresponding to the same region so as to expand the training samples, which in turn helps to train a high-precision building base extraction network from a small number of labeled samples.
  • FIG. 1 is a method flowchart of a neural network training method shown in the present disclosure.
  • the neural network training method can be applied to electronic equipment.
  • the electronic device may implement the method by carrying a software device corresponding to the neural network training method.
  • the type of the electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal and the like.
  • the type of the electronic device is not particularly limited in the present disclosure.
  • the electronic device may be a client device or a server device.
  • the server device may be a cloud.
  • an electronic device (hereinafter referred to as device) is taken as an example for description.
  • the method may include:
  • the captured image may be captured by any image capturing device capable of capturing images of the multiple regions.
  • among the multiple frames of images captured for the same region, at least two frames have different capture angles, which enriches the information contained in the training samples and improves the adaptability of the neural network.
  • the collected images may be classified and stored in the storage medium according to regions.
  • the device can acquire the collected images from the storage medium.
  • the acquired images may include multi-temporal images acquired for the plurality of regions.
  • the multi-temporal image may refer to multiple frames of remote sensing images collected for the same area at different times.
  • the target captured image may be selected arbitrarily, from the one or more frames of captured images corresponding to the region, among those whose resolution meets the standard.
  • at least one frame may be selected from the captured images corresponding to each region as the target captured image, and the base-area ground-truth information is then annotated on it in advance.
  • the ground truth information of the base area may be pixel level ground truth information.
  • for example, the base-area ground-truth information may set the value of pixels inside the building base area of the remote sensing image to 1 and the value of pixels outside the base area to 0.
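A minimal illustration of such a pixel-level mask, with an arbitrary toy shape and coordinates:

```python
import numpy as np

# Pixel-level base-area ground truth: 1 inside the base region, 0 outside.
gt = np.zeros((6, 6), dtype=np.uint8)
gt[2:5, 1:4] = 1      # a 3x3 building base
print(int(gt.sum()))  # 9 foreground pixels
```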
  • the base region ground truth information marked for the target acquisition image corresponding to each region in S104 can be used as the truth value information corresponding to each acquisition image in each region, thereby achieving the purpose of expanding training samples.
  • the area and position of a building's base are the same in every image captured of the same region. Annotating base-area ground-truth information on any one frame of a region's images and using that frame as the region's target captured image can therefore be regarded as annotating base-area ground truth for every frame captured of that region, which expands the samples: a large number of training samples is obtained through a small amount of annotation work.
  • neural network training can be performed based on the obtained training sample set.
  • FIG. 2 is a schematic flowchart of a neural network training method shown in the present disclosure.
  • the method includes:
  • the device may execute S202 in response to the network training request.
  • the training sample set may be stored in a storage medium, so that the device can obtain the stored training sample set from the storage medium. Afterwards, the device may perform S204-S206.
  • the building base extraction network (hereinafter referred to as the base extraction network) can be used to extract the building base directly;
  • alternatively, the base extraction network can first extract the building roof together with the offset indicating the displacement between the roof and the base, and then translate the roof by the offset to obtain the base indirectly.
  • the training methods of the base extraction network corresponding to different methods are different.
  • FIG. 3 is a schematic diagram of a process of extracting a building base area shown in the present disclosure.
  • the base area can be obtained directly after the remote sensing image is input into the base extraction network.
  • the base extraction network shown in FIG. 3 may be a network constructed based on a target detection network.
  • the target detection network can be based on any of RCNN (Region Convolutional Neural Network), FAST-RCNN (Fast Region Convolutional Neural Network), FASTER-RCNN (Faster Region Convolutional Neural Network), or MASK-RCNN (Mask Region Convolutional Neural Network).
  • a MASK-RCNN with higher accuracy for region representation can be used.
  • the MASK-RCNN may include an RPN (Region Proposal Network), an RoI Align (Region of Interest Align) unit, and so on.
  • the RPN is used to generate candidate boxes for each building in the captured image; after the candidate boxes are obtained, regression and classification can be performed on them to obtain the bounding box corresponding to each building.
  • the RoI Align unit is used to extract, according to a building's bounding box, the visual features corresponding to the building from the captured image; these visual features can then be used to extract the base area, roof area, offset, and roof position according to the functional requirements of the target detection network.
  • when executing S204, the device may input each captured image in the training sample set into the base extraction network for base extraction, and obtain the base area corresponding to each captured image.
  • the preset loss function can be used to obtain the base-area loss information corresponding to each captured image from the base-area ground-truth information annotated on each captured image and the base area obtained for each captured image.
  • after the gradient is computed, backpropagation can be used to adjust the network parameters of the base extraction network.
  • the network training is completed, and the trained building base extraction network is obtained.
  • the training sample set, expanded by exploiting the fact that the same building's base area does not change, can be used to train the building base extraction network, which helps to train a high-precision building base extraction network from a small number of labeled samples.
  • FIG. 4 is a schematic diagram of a process for extracting building base regions shown in the present disclosure.
  • the roof area of the building and the offset indicating the offset between the roof and the base can be obtained first.
  • the offset can then be used to transform (for example, translate) the roof area to obtain the base area.
  • the base extraction network shown in FIG. 4 may include a roof area extraction network and an offset extraction network.
  • the roof area extraction network and the offset extraction network may be networks constructed based on a target detection network.
  • the target detection network can be any one of RCNN, FAST-RCNN, FASTER-RCNN or MASK-RCNN.
  • a MASK-RCNN with higher accuracy for region representation can be used.
  • the roof area extraction network and the offset extraction network may share a feature extraction network.
  • the shared feature extraction network can include a backbone network, regional feature extraction units, etc. This can simplify the network structure and facilitate network training.
  • the two networks can also share RPN, RoI Align units, and the like.
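A minimal architectural sketch of such sharing, with one backbone feeding a roof-mask head and an offset head. This is a toy illustration of the shared-feature idea, not the disclosure's actual MASK-RCNN-based network; all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Toy sketch: a shared convolutional feature extractor feeds two heads,
# one predicting per-pixel roof logits and one regressing a global offset.

class SharedFeatureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # shared feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.roof_head = nn.Conv2d(16, 1, 1)      # per-pixel roof logits
        self.offset_head = nn.Sequential(         # global (dx, dy) regression
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
        )

    def forward(self, x):
        feats = self.backbone(x)                  # computed once, used twice
        return self.roof_head(feats), self.offset_head(feats)

net = SharedFeatureNet()
roof_logits, offset = net(torch.randn(1, 3, 32, 32))
print(tuple(roof_logits.shape), tuple(offset.shape))  # (1, 1, 32, 32) (1, 2)
```

Because the backbone runs once per image, sharing it simplifies the structure and lets gradients from both heads train the same features.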
  • FIG. 5 is a schematic flowchart of a neural network training method shown in the present disclosure.
  • the neural network training method may include:
  • the roof area extraction network and the offset extraction network included in the building base extraction network may be used to extract the roof area and offset corresponding to the collected images respectively.
  • a translation operation may be performed on each pixel contained in the roof area to obtain the base area.
  • the preset loss function can be used to obtain the base-area loss information corresponding to each captured image from the base-area ground-truth information annotated for each captured image and the base area obtained for each captured image. Afterwards, once the gradient is computed, backpropagation can be used to adjust the network parameters of the base extraction network.
  • the network training is completed, and the trained building base extraction network is obtained.
  • by obtaining the building base indirectly, the method exploits the fact that the roof area and the offset are salient features in the captured image, which improves the accuracy of base extraction; even when the building base is occluded, a higher-precision building base can still be obtained.
  • the training sample set, expanded by exploiting the fact that the same building's base area does not change, can be used to train the building base extraction network, which helps to train a high-precision building base extraction network from a small number of labeled samples.
  • the fact that the shape and position of the same building's base area do not change can be used to share base-area ground-truth information and base-position ground-truth information among the multiple frames of captured images corresponding to the same region, so as to expand the training samples, which in turn helps to train a high-precision building base extraction network from a small number of labeled samples.
  • FIG. 6 is a method flowchart of a neural network training method shown in the present disclosure. As shown in Figure 6, the method may include:
  • the true value information of the base position may be marked in advance.
  • the base-position ground-truth information may include the coordinates of the center pixel of the base area together with the width and height of the base area.
  • cx, cy represent the horizontal and vertical coordinates of the center pixel of the base area, respectively
  • w, h represent the width and height of the base area, respectively.
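For illustration, a (cx, cy, w, h) position can be converted to corner coordinates with a small helper; the helper and its values are hypothetical, not part of the disclosure:

```python
# Convert the (cx, cy, w, h) center encoding to corner coordinates
# (x_min, y_min, x_max, y_max).

def center_to_corners(cx, cy, w, h):
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(center_to_corners(10, 20, 4, 6))  # (8.0, 17.0, 12.0, 23.0)
```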
  • the base position truth information marked for the target acquisition image corresponding to each area in S604 can be used as the truth information corresponding to each acquisition image in each area, thereby achieving the purpose of expanding training samples.
  • each acquired image in the obtained training sample set is marked with the ground truth information of the base area and the ground truth information of the base position.
  • neural network training can be performed based on the obtained training sample set.
  • FIG. 7 is a schematic flowchart of a neural network training method shown in the present disclosure.
  • the method may include S702-S708.
  • S706 and S708 do not have a strict execution sequence.
  • S706 and S708 may be executed in parallel.
  • the present disclosure does not specifically limit the execution sequence of S706 and S708.
  • the neural network training method can be applied to electronic equipment.
  • the device may execute S702 to acquire the training sample set from a storage medium in response to the network training request.
  • the device may execute S704-S708.
  • the building base extraction network (hereinafter referred to as the base extraction network) may be a network constructed based on a target detection network.
  • for the base extraction network, in order to improve the accuracy of base-area extraction, MASK-RCNN, which represents regions with higher accuracy, can be used as the target detection network.
  • the base extraction network may include a roof area extraction network, an offset extraction network, and a roof position extraction network.
  • the roof area extraction network can be used to extract building roof areas.
  • the offset extraction network can be used to extract the offset between the roof and the base.
  • the roof position extraction network may be used to extract roof positions.
  • the offset can then be used to transform (for example, translate) the roof area to obtain the base area.
  • the position of the roof can be translated to obtain the position of the base through the offset.
  • FIG. 8 is a schematic diagram of a network training process for building base extraction shown in the present disclosure.
  • the base extraction network shown in FIG. 8 includes a roof area extraction network, an offset extraction network and a roof position extraction network. Among them, the roof area and offset extracted by the roof area extraction network and the offset extraction network can be translated and transformed to obtain the base area.
  • the network can be modified to add a base-area loss determination branch and a base-position loss determination branch, so that the network parameters are updated according to the determined loss information.
  • the base area loss information may represent an error between the obtained base area and the true value information of the base area.
  • the base position loss information may represent an error between the obtained base position and the base position true value information.
  • S7062 may be executed: for each frame of the captured images, translate the base-area ground-truth information corresponding to the image by the offset corresponding to the image to obtain the first roof-area ground-truth information corresponding to the image. Then S7064 may be executed: obtain the roof-area loss information corresponding to the image based on the first roof-area ground-truth information and the roof area obtained for the image. Afterwards, S7066 may be executed: adjust the network parameters of the roof area extraction network through backpropagation based on the roof-area loss information corresponding to each captured image.
  • the size of the extracted roof area is a preset size.
  • for example, the size of the roof area may be 14×14.
  • if the predicted offset is too large, translating the roof area may move pixels of the roof area out of the matrix of the preset size, causing information loss; an accurate base area then cannot be obtained and the network may fail to converge.
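The information-loss problem can be demonstrated on a toy 14x14 mask: pixels pushed past the mask border by an overly large offset are simply dropped. All values here are illustrative:

```python
import numpy as np

# Roof pixels shifted past the border of the fixed-size mask are lost.

def shift_mask(mask, dx, dy):
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1
    return out

roof = np.zeros((14, 14), dtype=np.uint8)
roof[9:13, 9:13] = 1                     # 16 roof pixels near the border
shifted = shift_mask(roof, dx=3, dy=3)   # overly large predicted offset
print(int(roof.sum()), int(shifted.sum()))  # 16 4: 12 pixels were lost
```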
  • the truth information of the base area is pixel-level truth information, that is, 0 or 1 is marked for each pixel in the captured image.
  • the pixels marked as 1 can be considered as the pixels in the base area; the pixels marked as 0 can be considered as the pixels outside the base area.
  • the base-area ground-truth information, by contrast, is translated within the corresponding captured image, so with high probability no ground-truth information is lost; that is, the first roof-area ground-truth information obtained in S7062 will not be missing any actual roof-area ground truth.
  • accurate roof area loss information can be obtained based on the first roof area true value information and the roof area, so as to ensure smooth convergence of the network.
  • the network parameters of the roof area extraction network can be adjusted by calculating the descent gradient and using back propagation. This enables the training of the network for roof region extraction.
  • S7082 may be executed: for each frame of image in the collected images, the offset corresponding to the image is used to translate the roof position corresponding to the image, so as to obtain the base position corresponding to the image. Then S7084 may be executed to obtain base position loss information corresponding to the image based on the base position ground truth information corresponding to the image and the base position obtained for the image. Afterwards, S7086 may be executed to adjust the network parameters of the roof position extraction network and the offset extraction network through backpropagation based on the base position loss information respectively corresponding to the collected images.
  • R0(cx0, cy0, w0, h0) may be used to represent the extracted roof position, where cx0 and cy0 represent the horizontal and vertical coordinates of the center pixel of the roof area, respectively, and w0 and h0 represent the width and height of the roof area, respectively.
  • ⁇ x and ⁇ y represent the offset of the pixel point in the X-axis and Y-axis directions, respectively.
  • a preset loss function such as a cross-entropy loss function
  • the descent gradient can be calculated, and the network parameters can be updated through backpropagation. Since the roof position and offset obtained by the roof position extraction network and the offset extraction network need to be used when extracting the base position, both of these networks can be updated during the backpropagation process.
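The translation of the roof position by the offset described in S7082 can be sketched as follows (illustrative values; in practice the roof position R0 and the offset come from the roof position extraction network and the offset extraction network):

```python
def translate_roof_position(roof, offset):
    """Shift the roof box (cx, cy, w, h) by the predicted offset
    (dx, dy); the width and height stay the same."""
    cx0, cy0, w0, h0 = roof
    dx, dy = offset
    return (cx0 + dx, cy0 + dy, w0, h0)

# hypothetical extracted roof position and predicted offset
base_position = translate_roof_position((40.0, 52.0, 14.0, 10.0), (-3.5, 6.0))
```

Only the center coordinates move; the box dimensions are preserved, since the roof and base of a building have the same extent in the image.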
  • the expanded training sample set can be used to train the roof area extraction network, the roof position extraction network and the offset extraction network, so as to complete the training of the base extraction network and obtain a high-precision base extraction network.
  • the roof area extraction network, the offset extraction network and the roof position extraction network may share a feature extraction network such as a backbone network and an area feature extraction unit. This can simplify the network structure and facilitate network training.
  • the roof region extraction network and the offset extraction network are MASK-RCNN.
  • the roof area extraction network, the offset extraction network and the roof position extraction network may also share an RPN (Region Proposal Network) and an RoI Align (Region of Interest Align) unit, etc.
  • the shared feature extraction network can be adjusted by all branches during training, so that the training processes mutually constrain and mutually promote one another. On the one hand, this improves network training efficiency; on the other hand, it encourages the shared feature extraction network to extract features that are more beneficial to base area extraction, thereby improving the accuracy of base area extraction.
  • network training efficiency and network prediction accuracy can be improved through joint training.
  • At least part of the collected images of the training sample set may also be marked with at least one of the following information: ground truth information of the second roof area, real offset, and ground truth information of the roof position.
  • manual labeling may be used to label the real value information of the roof area, the real offset, and the real value information of the roof position.
  • a preset loss function (such as a cross-entropy loss function) may be used to obtain loss information according to the second roof area true value information and the obtained roof area. Gradients are then calculated based on the obtained loss information, and backpropagation is performed to adjust the network parameters of the roof region extraction network.
  • a preset loss function (such as an MSE (Mean Square Error, mean square error) loss function) may be used to obtain loss information according to the real offset and the obtained offset. Then calculate the gradient according to the obtained loss information, and perform backpropagation to update the network parameters of the offset extraction network.
  • a preset loss function (such as a Smooth L1 (smooth L1 norm) loss function) may be used to obtain loss information according to the roof position ground truth information and the obtained roof position. Then the gradient is calculated according to the obtained loss information, and the network parameters of the roof position extraction network are updated through backpropagation.
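A minimal sketch of the Smooth L1 loss mentioned above (numpy; the `beta` threshold of 1.0 is the common default and an assumption here):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for small residuals, linear for large ones,
    so a few badly mispredicted boxes do not dominate the gradient."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())
```

This behavior (MSE-like near zero, L1-like far from zero) is why Smooth L1 is a common choice for box-regression targets such as the roof position.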
  • by jointly training the roof area extraction network, the roof position extraction network, and the offset extraction network that share the feature extraction network, various kinds of supervision information can be introduced, so that the training processes mutually constrain and mutually promote one another. On the one hand, this improves network training efficiency; on the other hand, it encourages the shared feature extraction network to extract features that are more beneficial to base area extraction, thereby improving the accuracy of base area extraction.
  • the collected images in the training sample set are also marked with a first real offset; the first real offset indicates the real offset between the roof and the base in the collected images.
  • S402 may be executed: the offset extraction network is used to obtain, from multiple rotated images, second predicted offsets respectively corresponding to various preset angles. The second predicted offset indicates the offset between the roof and the base in the rotated image; the multiple rotated images are obtained by rotating the collected image by the various preset angles respectively.
  • the collected image may refer to a remote sensing image marked with the first real offset.
  • the offset refers to the offset between the roof and the base in the remote sensing image.
  • for example, if the roof includes 10 pixels, the base can be obtained by translating those 10 pixels according to the offset.
  • the first real offset may be information indicating the real offset between the roof and the base of the building in the captured image.
  • the first real offset may be information in the form of (x, y) vector.
  • x and y represent the offsets of the pixel points in the roof region and the corresponding pixel points in the base region in the x-axis and y-axis directions, respectively.
  • the offset may be marked in advance according to the real offset between the roof and the base of the building in the collected image. The present disclosure does not specifically limit the labeling manner of the offset.
  • the preset angle can be set according to business requirements.
  • the number of preset angles can be determined according to the sample size that needs to be expanded. For example, if a large number of samples need to be expanded, a large number of preset angles can be set.
  • the present disclosure does not specifically limit the value and quantity of the preset angles.
  • the multiple preset angles are used to rotate the captured image or the image features corresponding to the captured image.
  • each preset angle may be used to generate a corresponding rotation matrix. Then, for each preset angle, the rotation matrix corresponding to that angle is used to shift each pixel included in the collected image, obtaining a rotated collected image, that is, a rotated image. Afterwards, the rotated collected image can be input into the offset extraction network to extract the second predicted offset corresponding to that preset angle, thereby obtaining the second predicted offsets corresponding to the various preset angles.
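The per-angle rotation step can be sketched as follows (a numpy illustration using nearest-neighbour sampling about the image centre; the rotation convention, the zero-filling of out-of-range pixels, and the helper name are assumptions, and practical implementations usually use library routines with interpolation):

```python
import numpy as np

def rotate_image(img, angle_deg):
    """Rotate a single-channel image about its centre using a 2x2
    rotation matrix, sampling sources with nearest-neighbour lookup."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            # inverse rotation: which source pixel lands on (x, y)?
            xs = cos_t * (x - cx) + sin_t * (y - cy) + cx
            ys = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = img[yi, xi]
    return out
```

Calling `rotate_image` once per preset angle yields the multiple rotated images that are fed to the offset extraction network.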
  • in other examples, the feature extraction network included in the offset extraction network can be used to perform feature extraction on the collected image to obtain a first image feature, and the obtained first image feature is then rotated. This reduces the amount of computation in the rotation process and reduces the rotation error introduced when extracting features from a rotated image, which helps to improve the network training effect.
  • S404 may be executed to respectively rotate the first real offset by the various preset angles to obtain second real offsets respectively corresponding to the various preset angles.
  • the rotation matrices corresponding to the respective preset angles can be used to rotate the first real offset of the collected image, so as to obtain the second real offsets of the collected image respectively corresponding to the multiple preset angles.
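S404 amounts to a plain 2-D vector rotation (numpy sketch; the sign convention must match whichever convention is used to rotate the image in S402, which is an assumption here):

```python
import numpy as np

def rotate_offset(offset, angle_deg):
    """Rotate the (dx, dy) roof-to-base offset by the same preset
    angle that was applied to the collected image."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    dx, dy = offset
    return (c * dx - s * dy, s * dx + c * dy)

# a first real offset of (1, 0) rotated by 90 degrees becomes (0, 1)
dx2, dy2 = rotate_offset((1.0, 0.0), 90)
```

Because rotating the image rotates the roof-to-base displacement by exactly the same angle, this rotated vector serves directly as the second real offset supervising the rotated sample.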
  • S406 may be executed to adjust network parameters of the offset extraction network based on the second real offset corresponding to the various preset angles and the obtained second predicted offset.
  • a preset loss function (such as a cross-entropy loss function) may be used: for each preset angle, the offset loss information corresponding to the collected image rotated by that preset angle is obtained from the second real offset (obtained by rotating the first real offset of the collected image by that preset angle) and the obtained second predicted offset corresponding to that preset angle.
  • the total loss is determined by methods such as summation, product, and average, and the descent gradient is calculated using the determined total loss.
  • the network parameters of the offset extraction network are adjusted by backpropagation.
  • the offset extraction network can be used to obtain the second predicted offsets respectively corresponding to the various preset angles, and the first real offset can be rotated by the various preset angles to obtain the second real offsets respectively corresponding to the various preset angles; the network parameters of the offset extraction network are then adjusted using the second real offsets and the obtained second predicted offsets corresponding to the various preset angles.
  • when the collected image is rotated by an angle, the offset between the roof and the base also rotates by the same angle.
  • the effect of expanding the image sample with the real offset can be achieved. In this way, a small amount of labeled data with offsets can be used to train a high-precision offset extraction network.
  • the rotation process of the collected image can be placed inside the offset extraction network, so that the collected image is rotated within the offset extraction network without affecting the training of the other branches; that is, it will not affect the convergence speed of the other branches, thereby improving network training efficiency.
  • S4022 may be executed: for each preset angle among the various preset angles, the offset extraction network is used to rotate the first image feature corresponding to the collected image by that preset angle, so as to obtain the second image feature corresponding to that preset angle. Then S4024 may be executed to obtain the second predicted offset corresponding to that preset angle based on the second image feature.
  • the first image feature may refer to an image feature obtained after the collected image undergoes feature extraction processing such as several convolutional layers and pooling layers.
  • the offset extraction network can be a network constructed based on MASK-RCNN.
  • the offset extraction network can perform feature extraction on the collected image through the included backbone network and the RoI Align unit to obtain the first image feature.
  • the aforementioned image features may be characterized by a feature map.
  • the positions of the pixels in the first image feature can be transformed through the rotation matrices corresponding to the various preset angles, so as to obtain second image features respectively corresponding to the various preset angles.
  • the second image feature can then be processed by, for example, several convolutional layers, pooling layers, fully connected layers and a mapping unit (for example, softmax) to obtain the offset.
  • the collected image is only rotated within the offset extraction network, and the unrotated collected image is still used for training the roof area extraction network. In this way, the rotation of the collected image is confined to the offset extraction network, so that it does not affect the training of the other branches.
  • in order to facilitate the training of the offset extraction network, a spatial transformer network can be used to rotate the image, so that the rotation process becomes differentiable, gradients can be backpropagated normally, and the network can be trained directly.
  • building frame information can also be introduced during network training to form constraints on network training, thereby improving network training efficiency and helping the feature extraction network to extract features related to buildings.
  • the at least part of the collected images in the training sample set are also marked with true value information of the building frame.
  • the building frame information may be the coordinates of the central pixel point in the building area, and information such as the width and height of the building area.
  • the building frame extraction network included in the building base extraction network can be used to extract the building frames corresponding to the at least part of the collected images; wherein the building frame extraction network includes the feature extraction network. Then the network parameters of the building frame extraction network may be adjusted based on the building frame ground truth information marked on the at least part of the collected images and the building frames obtained for the at least part of the collected images.
  • the building frame information can be introduced during network training. Since the four extraction networks for the roof area, roof position, offset and building frame share the feature extraction network, on the one hand the four extraction networks become interrelated through the shared feature extraction network, so that the supervision information of each task can be shared and the convergence of the network can be accelerated; on the other hand, the three extraction networks for roof area, roof position and offset can perceive the features of the complete building region, which improves their extraction performance.
  • the network training efficiency can be improved through pre-training.
  • pre-training may be performed on the building base extraction network by using the collected images labeled with the ground truth information of the second roof area, the real offset and the truth information of the roof position in the training sample set.
  • the pre-training process may refer to the network training process shown in any of the foregoing implementation manners.
  • joint training may also be used in pre-training.
  • at least part of the collected images of the training sample set may include the true value information of six items of roof area, roof position, base area, base position, offset, and building frame.
  • the base extraction network may include six extraction networks, sharing a feature extraction network, for the roof area, roof position, offset, building frame, base area loss information, and base position loss information.
  • the six extraction networks can be used as six branches of the base extraction network.
  • the base area loss information may be equivalently represented as the roof area loss information.
  • At least part of the collected images of the training sample set may be input into the base extraction network to obtain the output results of the six branches. Then, the loss information can be obtained according to the aforementioned six items of true value information labeled with the at least part of the collected images, and the output results, and then the network parameters can be updated. In this way, the six branches can be jointly trained to improve the training efficiency and training effect of the base extraction network.
  • the labeled collected images and the unlabeled images in the training sample set can be mixed and randomly input into the network for training.
  • the marked captured image may refer to at least part of the captured image marked with the aforementioned six items of true value information.
  • a reasonable network training scheme can thus be formed: first, the network is systematically pre-trained through joint training using the labeled images with rich ground truth information, and then the labeled images are mixed with the unlabeled images to fine-tune the network parameters of the base extraction network. On the one hand, this helps to train a high-precision base extraction network using only a small number of labeled collected images; on the other hand, it improves the efficiency of network training.
  • Embodiments are described below in conjunction with specific training scenarios.
  • FIG. 9 is a schematic diagram of a network training process for building base extraction shown in the present disclosure.
  • the training method in this example can be deployed in any type of electronic device.
  • the base extraction network shown in FIG. 9 includes a network constructed based on MASK-RCNN.
  • the network may include six branches that respectively obtain the roof area, roof position, offset, building frame, base area loss information, and base position loss information.
  • the six branches share the backbone network, the RPN candidate frame generation network (hereinafter referred to as RPN), and the RoI Align region feature extraction unit (hereinafter referred to as RoI Align).
  • the backbone network can be a VGG (Visual Geometry Group) network, ResNet (Residual Network), HRNet (High-Resolution Network), etc., which is not specifically limited in the present disclosure.
  • the labeled image may include ground truth information of six items: roof area, roof position, base area, base position, offset, and building frame. It can be understood that, since the shape and position of the base area of the same building do not change across multi-temporal images, the unlabeled images among the multi-temporal images can share the ground truth information of the base area and base position with the labeled image.
  • the base extraction network may be pre-trained by using labeled images first through joint training.
  • the loss information corresponding to the four branches can be obtained from the ground truth information of the four items (roof area, roof position, offset, and building frame) corresponding to the labeled image, and the network parameters of the four branches can be updated through backpropagation.
  • the base area and base position loss information can be obtained through the base area loss information determination branch and the base position loss information determination branch, and the network parameters of the three branches that extract the roof area, roof position, and offset can be adjusted through backpropagation.
  • in this way, the training processes not only constrain each other but also promote each other, thereby improving network training efficiency, so that a network with a good extraction effect can be initially obtained with only a small number of labeled images.
  • labeled images and unlabeled images can be mixed, and randomly input to the base extraction network for training.
  • joint training similar to the pre-training process can be performed.
  • the base extraction network can be used to obtain the roof area, roof position, and offset corresponding to each unlabeled image. Then the base area loss information determination branch and the base position loss information determination branch, together with the shared base area and base position ground truth information, can be used to obtain the base area and base position loss information, and the network parameters of the three branches that extract the roof area, roof position, and offset can be updated through backpropagation.
  • fine-tuning the parameters of the pre-trained network can obtain a high-precision base extraction network.
  • the scheme of performing pre-training first and then performing mixed training through joint training can, first, improve network training efficiency, so that a network with a good extraction effect can be obtained with a small number of labeled images and the dependence on labeling work can be reduced; second, it can promote the shared feature extraction network (including the backbone network and the region feature extraction unit) to extract features that are more beneficial to base area extraction, thereby improving the accuracy of base area extraction; third, the three branches that extract the roof area, roof position, and offset can perceive the features of the complete building region, thereby improving their extraction performance.
  • the building base can be extracted from the remote sensing image to be processed through the network.
  • the specific implementation process may include:
  • using the building base extraction network to extract the building roof area and the offset in the remote sensing image to be processed; wherein the building base extraction network is obtained by training through the neural network training method shown in any of the aforementioned implementations, and the offset characterizes the offset between the roof area and the base area;
  • the translation transformation is performed on the roof area by using the offset to obtain the building base area corresponding to the remote sensing image to be processed.
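The inference-time translation can be sketched at pixel level (illustrative coordinates; in practice the roof pixels and the offset are produced by the trained building base extraction network):

```python
def translate_roof_to_base(roof_pixels, dx, dy):
    """Shift every roof pixel by the predicted offset (dx, dy)
    to obtain the corresponding base area pixels."""
    return [(x + dx, y + dy) for (x, y) in roof_pixels]

roof_pixels = [(10, 4), (11, 4), (10, 5), (11, 5)]  # a tiny 2x2 roof patch
base_pixels = translate_roof_to_base(roof_pixels, dx=-2, dy=3)
```

The base area is thus the roof area rigidly shifted by the predicted roof-to-base displacement; its shape is unchanged.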
  • the remote sensing image to be processed may be a remote sensing image collected by any collection device capable of collecting images of buildings.
  • the trained building base extraction network may be a network as shown in FIG. 9 .
  • a small number of labeled samples can be used to train a high-precision building base extraction network, which can reduce network training costs, improve network training efficiency, and further reduce base extraction costs.
  • the high-precision base extraction network can be used for base extraction to improve the accuracy of building base extraction, thereby improving the statistical accuracy of buildings.
  • the present disclosure also proposes a neural network training device 100 .
  • FIG. 10 is a schematic structural diagram of a neural network training device shown in the present disclosure.
  • the device 100 may include:
  • the acquisition module 101 is configured to acquire, for each of the multiple areas, one or more frames of collected images corresponding to the area; wherein, in the case where the area corresponds to multiple frames of collected images, at least two frames of the collected images have different collection angles;
  • the first labeling module 102 is configured to use a frame of the captured image corresponding to the area as a target captured image corresponding to the area to label the true value information of the base area;
  • the first determination module 103 is configured to determine the ground truth information of the base area marked in the target collected image corresponding to the area as the base area ground truth information of each frame of collected image corresponding to the area, and to obtain a training sample set based on the collected images and target collected images respectively corresponding to the multiple areas, so as to perform neural network training based on the training sample set.
  • the device 100 further includes:
  • a first training module 106 configured to acquire the training sample set, and to use the building base extraction network to obtain the roof area and offset corresponding to each collected image in the training sample set; wherein the offset represents the offset between the roof area and the base area;
  • the device 100 further includes:
  • the second labeling module 104 is configured to label the base position true value information on the target acquisition images respectively corresponding to each area;
  • the second determination module 105 is configured to, for each region, determine the true value information of the base position marked in the target captured image corresponding to the region as the true value information of the base position of each frame captured image corresponding to the region.
  • the device 100 further includes:
  • a second training module 107 configured to acquire the training sample set, and to use the roof area extraction network, offset extraction network, and roof position extraction network included in the building base extraction network to obtain the roof area, offset, and roof position corresponding to each collected image in the training sample set, wherein the offset characterizes the offset between the roof area and the base area;
  • the second training module 107 is used to:
  • the ground truth information of the base area corresponding to the image is translated to obtain the ground truth information of the first roof area corresponding to the image;
  • the network parameters of the roof area extraction network are adjusted through back propagation.
  • the second training module 107 is used to:
  • the position of the roof corresponding to the image is translated to obtain the position of the base corresponding to the image;
  • the network parameters of the roof position extraction network and the offset extraction network are adjusted through back propagation.
  • the roof area extraction network, the offset extraction network and the roof position extraction network share a feature extraction network.
  • At least part of the collected images of the training sample set are also marked with the second roof area ground truth information, the real offset and the roof position ground truth information;
  • the device 100 also includes at least one of the following:
  • a first adjustment module configured to adjust the network parameters of the roof area extraction network based on the ground truth information of the second roof area marked on the at least part of the captured image and the roof area obtained for the at least part of the captured image;
  • the second adjustment module is configured to adjust the network parameters of the offset extraction network based on the real offset marked by the at least part of the captured image and the offset obtained for the at least part of the captured image;
  • the third adjustment module is configured to adjust the network parameters of the roof position extraction network based on the roof position ground truth information marked on the at least part of the collected images and the roof position obtained for the at least part of the collected images.
  • the at least part of the collected images are also marked with the true value information of the building frame; the device 100 also includes:
  • the extraction module is configured to use the building frame extraction network included in the building base extraction network to extract the building frame corresponding to the at least part of the collected images; wherein the building frame extraction network includes the feature extraction network;
  • the fourth adjustment module is configured to adjust the network parameters of the building frame extraction network based on the true value information of the building frame marked on the at least part of the collected images and the building frame obtained for the at least part of the collected images.
  • the device 100 further includes:
  • a pre-training module configured to pre-train the building base extraction network by using the collected images in the training sample set that are marked with the second roof area ground truth information, the real offset, and the roof position ground truth information.
  • the collected images in the training sample set are marked with the first real offset; the device also includes:
  • An offset obtaining module configured to use the offset extraction network to obtain second predicted offsets respectively corresponding to various preset angles from a plurality of rotated images; the second predicted offset indicates the an offset between the roof and the base in the rotated image; the plurality of rotated images are obtained by rotating the collected images respectively through the various preset angles;
  • a selection module configured to rotate the first real offset by the multiple preset angles to obtain second real offsets respectively corresponding to the multiple preset angles;
  • the fourth adjustment module is configured to adjust the network parameters of the offset extraction network based on the second real offset corresponding to the various preset angles and the obtained second predicted offset.
  • the offset acquisition module is specifically configured to: for each preset angle of the plurality of preset angles, use the offset extraction network to rotate the first image feature corresponding to the collected image by that preset angle, so as to obtain the second image feature corresponding to that preset angle;
  • the present disclosure further proposes an image processing device.
  • the device can include:
  • a receiving module configured to receive remote sensing images to be processed
  • the extraction module is configured to use the building base extraction network to extract the building roof area and the offset in the remote sensing image to be processed; wherein the building base extraction network is trained by using the neural network training method shown in any of the foregoing implementations, and the offset characterizes the offset between the roof area and the base area;
  • a translation module configured to use the offset to perform translation transformation on the roof area to obtain the building base area corresponding to the remote sensing image to be processed.
  • an electronic device which may include: a processor.
  • Memory used to store processor-executable instructions.
  • the processor is configured to invoke the executable instructions stored in the memory to implement the aforementioned neural network training method and/or image processing method.
  • FIG. 11 is a schematic diagram of a hardware structure of an electronic device shown in the present disclosure.
  • the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the neural network training device and/or the image processing device.
  • the embodiment of the device may be implemented by software, or by hardware or a combination of software and hardware.
  • taking software implementation as an example, a device in the logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into memory for execution. In terms of hardware, in addition to the processor, memory, network interface, and non-volatile memory, the electronic device where the device of the embodiment is located may also include other hardware according to the actual function of the electronic device, which will not be detailed here.
  • the device corresponding instructions may also be directly stored in the memory, which is not limited herein.
  • the present disclosure proposes a computer-readable storage medium storing a computer program, and the computer program can be used to cause a processor to execute the aforementioned neural network training method and/or image processing method.
  • one or more embodiments of the present disclosure may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in this disclosure and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by data processing apparatus.
  • a computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • Computers suitable for the execution of a computer program may include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • a central processing unit will receive instructions and data from a read-only memory and/or a random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, to receive data from them, send data to them, or both.
  • a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media and memory devices, which may include, for example, semiconductor memory devices such as EPROM, EEPROM and flash memory devices, magnetic disks such as internal hard drives or removable disks, magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

Methods for neural network training and image processing, an apparatus, a device, and a storage medium. The method for neural network training comprises: for each area among multiple areas, acquiring one or more captured images corresponding to the area, wherein, if the area corresponds to multiple captured images, at least two captured images have different capture angles (S102); using one captured image corresponding to the area as a target captured image corresponding to the area, and performing base area ground truth information tagging (S104); for each area, determining base area ground truth information tagged in the target captured image corresponding to the area to be base area ground truth information for each captured image corresponding to the area, and obtaining a training sample set on the basis of the captured images and the target captured images corresponding to each of the multiple areas, so as to perform neural network training on the basis of the training sample set (S106).

Description

Methods for Neural Network Training and Image Processing, Apparatus, Device and Storage Medium

Cross-Reference to Related Application

This disclosure claims priority to the Chinese patent application No. 202110602248.5, filed on May 31, 2021, the entire content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to methods for neural network training and image processing, an apparatus, a device and a storage medium.
Background

With the gradual increase of the urbanization rate, timely building statistics are required for tasks such as urban planning, map drawing, and building change monitoring.

At present, a building base extraction network generated based on a neural network is mainly used to extract building bases from remote sensing images, and the obtained building bases are then used for building statistics.

However, the cost of data annotation is high, so labeled samples cannot be obtained in large quantities, and it is difficult to train a high-precision building base extraction network with only a small number of labeled samples.
Summary

In view of this, the present disclosure discloses at least a neural network training method. The method may include: for each of multiple areas, acquiring one or more frames of captured images corresponding to the area, wherein, in the case that the area corresponds to multiple frames of captured images, at least two frames of the captured images have different capture angles; using one frame of the captured images corresponding to the area as the target captured image corresponding to the area and tagging it with base area ground truth information; and determining the base area ground truth information tagged in the target captured image corresponding to the area as the base area ground truth information of each frame of captured image corresponding to the area, and obtaining a training sample set based on the captured images and the target captured images respectively corresponding to the multiple areas, so as to perform neural network training based on the training sample set.
In some illustrated implementations, the method further includes: acquiring the training sample set; using a building base extraction network to obtain the roof area and the offset corresponding to each captured image in the training sample set, wherein the offset characterizes the offset between the roof area and the base area; for each captured image, performing a translation transformation on the roof area corresponding to the captured image based on the obtained offset corresponding to the captured image, to obtain the base area corresponding to the captured image; and adjusting the network parameters of the building base extraction network based on the base area ground truth information respectively corresponding to the captured images and the base areas respectively obtained for the captured images.
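As one way to realize the parameter adjustment described above, the base mask obtained by translating the predicted roof mask can be compared with the base-area ground truth via a segmentation loss. The sketch below uses a Dice loss over binary masks; this particular loss is an illustrative assumption, as the disclosure does not name one.

```python
import numpy as np

def dice_loss(pred_base: np.ndarray, gt_base: np.ndarray, eps: float = 1e-6) -> float:
    """1 - Dice coefficient between the predicted base mask (the roof mask
    translated by the predicted offset) and the base-area ground-truth mask."""
    inter = float((pred_base * gt_base).sum())
    total = float(pred_base.sum()) + float(gt_base.sum())
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```

In an actual training loop the scalar returned here would feed backpropagation; gradient computation is omitted from this numpy sketch.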
In some illustrated implementations, the process of obtaining the training sample set further includes: tagging base position ground truth information in the target captured image corresponding to each area; and, for each area, determining the base position ground truth information tagged in the target captured image corresponding to the area as the base position ground truth information of each frame of captured image corresponding to the area.
In some illustrated implementations, the method further includes: acquiring the training sample set; using a roof area extraction network, an offset extraction network and a roof position extraction network included in the building base extraction network to obtain the roof area, the offset and the roof position corresponding to each captured image in the training sample set, wherein the offset characterizes the offset between the roof area and the base area; adjusting the network parameters of the roof area extraction network based on the base area ground truth information respectively corresponding to the captured images and the roof areas and offsets respectively obtained for the captured images; and adjusting the network parameters of the roof position extraction network and the offset extraction network based on the base position ground truth information respectively corresponding to the captured images and the roof positions and offsets respectively obtained for the captured images.
In some illustrated implementations, adjusting the network parameters of the roof area extraction network based on the base area ground truth information respectively corresponding to the captured images and the roof areas and offsets respectively obtained for the captured images includes: for each frame of the captured images, translating the base area ground truth information corresponding to the image by the offset corresponding to the image to obtain first roof area ground truth information corresponding to the image; obtaining roof area loss information corresponding to the image based on the first roof area ground truth information corresponding to the image and the roof area obtained for the image; and adjusting the network parameters of the roof area extraction network through backpropagation based on the roof area loss information respectively corresponding to the captured images.
In some illustrated implementations, adjusting the network parameters of the roof position extraction network and the offset extraction network based on the base position ground truth information respectively corresponding to the captured images and the roof positions and offsets respectively obtained for the captured images includes: for each frame of the captured images, translating the roof position corresponding to the image by the offset corresponding to the image to obtain the base position corresponding to the image; obtaining base position loss information corresponding to the image based on the base position ground truth information corresponding to the image and the base position obtained for the image; and adjusting the network parameters of the roof position extraction network and the offset extraction network through backpropagation based on the base position loss information respectively corresponding to the captured images.
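A minimal sketch of the position-branch loss just described: the predicted roof position (here assumed to be an axis-aligned box (x1, y1, x2, y2)) is translated by the predicted (dx, dy) offset and compared with the base-position ground truth using a smooth-L1 loss. Both the box parameterization and the smooth-L1 choice are illustrative assumptions, not specified by the disclosure.

```python
import numpy as np

def base_position_loss(roof_box: np.ndarray, offset: np.ndarray,
                       base_box_gt: np.ndarray) -> float:
    """Smooth-L1 loss between the roof box translated by the predicted offset
    and the ground-truth base box. Boxes are (x1, y1, x2, y2); offset is (dx, dy)."""
    pred_base = roof_box + np.tile(offset, 2)  # shift both corners by (dx, dy)
    diff = np.abs(pred_base - base_box_gt)
    return float(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean())
```

Because the loss depends on the offset through the translation, backpropagating it updates both the roof position branch and the offset branch, matching the joint adjustment described above.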
In some illustrated implementations, the roof area extraction network, the offset extraction network and the roof position extraction network share a feature extraction network.
In some illustrated implementations, at least some of the captured images in the training sample set are further tagged with second roof area ground truth information, a real offset and roof position ground truth information, and the method further includes at least one of the following: adjusting the network parameters of the roof area extraction network based on the second roof area ground truth information tagged in the at least some captured images and the roof areas obtained for the at least some captured images; adjusting the network parameters of the offset extraction network based on the real offsets tagged in the at least some captured images and the offsets obtained for the at least some captured images; and adjusting the network parameters of the roof position extraction network based on the roof position ground truth information tagged in the at least some captured images and the roof positions obtained for the at least some captured images.
In some illustrated implementations, the at least some captured images are further tagged with building bounding box ground truth information, and the method further includes: using a building bounding box extraction network included in the building base extraction network to extract the building bounding boxes corresponding to the at least some captured images, wherein the building bounding box extraction network includes the feature extraction network; and adjusting the network parameters of the building bounding box extraction network based on the building bounding box ground truth information tagged in the at least some captured images and the building bounding boxes obtained for the at least some captured images.
In some illustrated implementations, the method further includes: pre-training the building base extraction network with the captured images in the training sample set that are tagged with second roof area ground truth information, a real offset and roof position ground truth information.
In some illustrated implementations, the captured images in the training sample set are tagged with a first real offset, and the method further includes: using the offset extraction network to obtain, from multiple rotated images, second predicted offsets respectively corresponding to multiple preset angles, where a second predicted offset indicates the offset between the roof and the base in a rotated image, and the multiple rotated images are obtained by rotating the captured image by the multiple preset angles respectively; rotating the first real offset by the multiple preset angles respectively to obtain second real offsets respectively corresponding to the multiple preset angles; and adjusting the network parameters of the offset extraction network based on the second real offsets respectively corresponding to the multiple preset angles and the obtained second predicted offsets.
In some illustrated implementations, using the offset extraction network to obtain, from the multiple rotated images, the second predicted offsets respectively corresponding to the multiple preset angles includes: for each of the multiple preset angles, using the offset extraction network to rotate the first image feature corresponding to the captured image by the preset angle to obtain a second image feature corresponding to the preset angle, and obtaining the second predicted offset corresponding to the preset angle based on the second image feature.
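Generating the rotation-consistency targets described above requires rotating the annotated (dx, dy) offset by each preset angle. A sketch with a standard 2-D rotation matrix follows; note that in image coordinates (y pointing down) a counter-clockwise rotation of the image corresponds to this matrix with the angle's sign flipped, so the sign convention must be kept consistent with the image rotation actually applied.

```python
import numpy as np

def rotate_offset(offset, angle_deg: float) -> np.ndarray:
    """Rotate a 2-D (dx, dy) offset counter-clockwise by angle_deg degrees."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return rot @ np.asarray(offset, dtype=float)
```

With this helper, `[rotate_offset(first_real_offset, a) for a in preset_angles]` yields the second real offsets that are paired with the second predicted offsets when adjusting the offset extraction network.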
The present disclosure further proposes an image processing method, including: receiving a remote sensing image to be processed; using a building base extraction network to extract the building roof area and the offset from the remote sensing image to be processed, wherein the building base extraction network is trained by the neural network training method shown in any of the foregoing implementations, and the offset characterizes the offset between the roof area and the base area; and performing a translation transformation on the roof area by the offset to obtain the building base area corresponding to the remote sensing image to be processed.
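The final translation step of this image processing method — shifting the extracted roof mask by the predicted offset to obtain the base mask — can be sketched as below. The integer-pixel shift with zero padding is an illustrative assumption; a real implementation might use sub-pixel (bilinear) interpolation instead.

```python
import numpy as np

def shift_mask(mask: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate a binary mask by (dx, dy) pixels, zero-filling uncovered areas."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out
```

Given a predicted roof mask and offset, `base_mask = shift_mask(roof_mask, dx, dy)` yields the building base area for the remote sensing image.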
The present disclosure further proposes a neural network training apparatus, including: an acquisition module configured to acquire, for each of multiple areas, one or more frames of captured images corresponding to the area, wherein, in the case that the area corresponds to multiple frames of captured images, at least two frames of the captured images have different capture angles; a first tagging module configured to use one frame of the captured images corresponding to the area as the target captured image corresponding to the area and tag it with base area ground truth information; and a first determination module configured to determine the base area ground truth information tagged in the target captured image corresponding to the area as the base area ground truth information of each frame of captured image corresponding to the area, and obtain a training sample set based on the captured images and the target captured images respectively corresponding to the multiple areas, so as to perform neural network training based on the training sample set.

The present disclosure further proposes an image processing apparatus, including: a receiving module configured to receive a remote sensing image to be processed; an extraction module configured to use a building base extraction network to extract the building roof area and the offset from the remote sensing image to be processed, wherein the building base extraction network is trained by the neural network training method shown in any of the foregoing implementations, and the offset characterizes the offset between the roof area and the base area; and a translation module configured to perform a translation transformation on the roof area by the offset to obtain the building base area corresponding to the remote sensing image to be processed.

The present disclosure further proposes an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor runs the executable instructions to implement any of the above neural network training methods and/or image processing methods.

The present disclosure further proposes a computer-readable storage medium storing a computer program, and the computer program is used to cause a processor to execute any of the above neural network training methods and/or image processing methods.
In the foregoing solutions, first, since the building bases in the same area do not change, after image registration is performed on the captured images collected for the same area, the base areas and positions of the buildings in the captured images are the same. That is, tagging base area ground truth information in the target captured image of an area can be regarded as tagging base area ground truth information in each frame of captured image of that area, thereby expanding the samples, i.e., obtaining a large number of training samples with a small amount of annotation work.

Second, the training sample set obtained by expanding samples based on the property that the base area of the same building does not change can be used to train the building base extraction network, which helps to train a high-precision building base extraction network with a small number of labeled samples.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart of a neural network training method according to the present disclosure;

FIG. 2 is a schematic flowchart of a neural network training method according to the present disclosure;

FIG. 3 is a schematic diagram of a building base area extraction process according to the present disclosure;

FIG. 4 is a schematic diagram of a building base area extraction process according to the present disclosure;

FIG. 5 is a schematic flowchart of a neural network training method according to the present disclosure;

FIG. 6 is a flowchart of a neural network training method according to the present disclosure;

FIG. 7 is a schematic flowchart of a neural network training method according to the present disclosure;

FIG. 8 is a schematic diagram of a building base extraction network training process according to the present disclosure;

FIG. 9 is a schematic diagram of a building base extraction network training process according to the present disclosure;

FIG. 10 is a schematic structural diagram of a neural network training apparatus according to the present disclosure;

FIG. 11 is a schematic diagram of a hardware structure of an electronic device according to the present disclosure.
Detailed Description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining", depending on the context.

The present disclosure aims to propose a neural network training method. The method takes advantage of the property that the base area of the same building does not change, and shares base area ground truth information among multiple frames of captured images corresponding to the same area, thereby expanding the training samples, which in turn helps to train a high-precision building base extraction network with a small number of labeled samples.

Please refer to FIG. 1, which is a flowchart of a neural network training method according to the present disclosure. The neural network training method can be applied to an electronic device, and the electronic device can execute the method by carrying a software apparatus corresponding to the neural network training method. The electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal, or the like; the type of the electronic device is not particularly limited in the present disclosure. The electronic device may be a client device or a server device, and the server device may be a cloud. In the following, the execution subject is taken to be the electronic device (hereinafter referred to as the device) for description.
As shown in FIG. 1, the method may include:

S102: for each of multiple areas, acquiring one or more frames of captured images corresponding to the area, wherein, in the case that the area corresponds to multiple frames of captured images, at least two frames of the captured images have different capture angles.

The captured images may be collected by any image capture device capable of capturing images of the multiple areas. Among the multiple frames of captured images collected for the same area, there are at least two frames with different capture angles, which enriches the information contained in the training samples and improves the adaptability of the neural network.

The captured images may be stored in a storage medium classified by area, and the device may acquire the captured images from the storage medium.

In some implementations, the captured images may include multi-temporal images collected for the multiple areas. The multi-temporal images may refer to multiple frames of remote sensing images collected for the same area at different times.
S104: Use one frame of the captured images corresponding to the region as the target captured image for that region, and annotate it with base-area ground-truth information.
The target captured image may be any image of sufficient clarity selected from the one or more frames of captured images corresponding to the region.
In some implementations, at least one frame may be selected from the captured images corresponding to each region as the target captured image, and the base-area ground-truth information is then annotated on it in advance.
The base-area ground-truth information may be pixel-level ground truth: for example, pixels inside a building's base area in the remote-sensing image are set to 1, and pixels outside the base area are set to 0.
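As a minimal sketch of such pixel-level ground truth (NumPy; the 8×8 image size and rectangular base region are illustrative assumptions — real annotations would typically be rasterized polygons), a base-area mask can be built as:

```python
import numpy as np

def make_base_mask(image_shape, base_box):
    """Build a pixel-level ground-truth mask: 1 inside the base area, 0 outside.

    base_box is a hypothetical (row_min, row_max, col_min, col_max) rectangle
    standing in for an annotated building base.
    """
    mask = np.zeros(image_shape, dtype=np.uint8)
    r0, r1, c0, c1 = base_box
    mask[r0:r1, c0:c1] = 1
    return mask

mask = make_base_mask((8, 8), (2, 5, 3, 6))  # 3x3 base region inside an 8x8 image
```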
S106: For each region, take the base-area ground-truth information annotated on the region's target captured image as the base-area ground-truth information of every frame of captured images corresponding to the region; based on the captured images and target captured images corresponding to the multiple regions, obtain a training sample set and train the neural network on it.
In some implementations, the base-area ground-truth information annotated in S104 for each region's target captured image can be used as the ground-truth information of every captured image of that region, thereby expanding the training samples.
Because the base of a building in a given region does not change, once the captured images of that region are registered, the base area and position of each building are identical across those images. Annotating base-area ground truth on any one frame (the region's target captured image) is therefore equivalent to annotating every frame captured for that region, so a large number of training samples can be obtained from a small number of annotation operations.
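The label-sharing expansion above can be sketched as follows (a toy illustration; the region identifiers, frame names, and dictionary layout are assumptions, not part of the disclosure):

```python
def expand_samples(region_frames, region_annotations):
    """Share one annotation across all registered frames of the same region.

    region_frames: {region_id: [frame, ...]} - registered captured images.
    region_annotations: {region_id: base-area ground truth annotated on one
    target frame of that region}.
    Returns one (frame, ground_truth) training pair per frame.
    """
    samples = []
    for region_id, frames in region_frames.items():
        gt = region_annotations[region_id]  # annotated only once per region
        for frame in frames:
            samples.append((frame, gt))     # shared by every frame of the region
    return samples

# One annotation yields three training pairs for a region with three frames.
pairs = expand_samples({"r1": ["t0", "t1", "t2"]}, {"r1": "mask_r1"})
```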
In some implementations, neural network training may be performed on the resulting training sample set.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a neural network training method disclosed herein.
As shown in FIG. 2, the method includes:
S202: Acquire the training sample set.
S204: Use the building-base extraction network to extract the base area corresponding to each captured image in the training sample set.
S206: Adjust the network parameters of the building-base extraction network based on the base-area ground-truth information of each captured image and the base area obtained for each captured image.
In some implementations, the device may execute S202 in response to a network-training request.
In some implementations, the training sample set may be stored in a storage medium, from which the device retrieves it. The device may then execute S204-S206.
The present disclosure includes at least two ways of extracting building bases. First, a building-base extraction network (hereinafter, the base extraction network) can extract the building base directly. Second, the base extraction network can first extract the building roof and an offset indicating the displacement between roof and base, and then obtain the base indirectly by transforming the roof with the offset.
The base extraction network is trained differently for the two approaches. Embodiments of each are described below.
(1) Direct extraction of the building base.
Referring to FIG. 3, FIG. 3 is a schematic diagram of a building-base-area extraction process shown in the present disclosure.
As shown in FIG. 3, the base area can be obtained directly after the remote-sensing image is input into the base extraction network.
The base extraction network shown in FIG. 3 may be built on a target-detection network. In some implementations, the target-detection network may be built on any of RCNN (Region Convolutional Neural Network), Fast-RCNN (Fast Region Convolutional Neural Network), Faster-RCNN (Faster Region Convolutional Neural Network), or Mask-RCNN (Mask Region Convolutional Neural Network).
In some implementations, to improve base-area extraction accuracy, Mask-RCNN, which represents regions more precisely, may be used. Mask-RCNN may include an RPN (Region Proposal Network), an RoI Align (Region of Interest Align) unit, and other components.
The RPN generates candidate boxes corresponding to the buildings in a captured image. After regression and classification of the candidate boxes, a bounding box is obtained for each building. The RoI Align unit extracts the visual features corresponding to a building from the captured image according to its bounding box. These features can then be used to extract the base area, roof area, offset, and roof position, as the functional requirements of the target-detection network dictate.
After acquiring the training sample set, for the direct-extraction approach, the neural network training method may include: when executing S204, the device inputs each captured image in the training sample set into the base extraction network for base extraction, obtaining the base area corresponding to each captured image.
Then, when executing S206, a preset loss function can be applied to the base-area ground-truth information annotated on each captured image and the base area obtained for it, yielding base-area loss information for each image. Backpropagation can then be used to compute descent gradients and adjust the network parameters of the base extraction network.
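The disclosure does not fix the "preset loss function"; a common choice for a pixel-level base mask is per-pixel binary cross-entropy, sketched below (NumPy; the 2×2 maps are illustrative):

```python
import numpy as np

def base_area_bce_loss(pred_prob, gt_mask, eps=1e-7):
    """Per-pixel binary cross-entropy between a predicted base-probability map
    and the pixel-level base-area ground truth (one possible instantiation of
    the preset loss function; not mandated by the disclosure)."""
    p = np.clip(pred_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))))

gt = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = base_area_bce_loss(gt, gt)               # near-zero loss
uniform = base_area_bce_loss(np.full((2, 2), 0.5), gt)  # ln(2) for a 0.5 map
```

In a real training loop this scalar would be backpropagated through the network to update its parameters.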
After multiple rounds of training are performed in this way, network training is complete and the trained building-base extraction network is obtained.
In this scheme, the training sample set, expanded by exploiting the fact that the base area of a given building does not change, is used to train the building-base extraction network, which helps train a high-precision network from a small number of annotated samples.
(2) Indirect extraction of the building base.
Referring to FIG. 4, FIG. 4 is a schematic diagram of a building-base-area extraction process shown in the present disclosure.
As shown in FIG. 4, after the remote-sensing image is input into the base extraction network, the building's roof area and an offset indicating the displacement between roof and base are obtained first. The offset can then be used to transform the roof area (for example, by translation) to obtain the base area.
The base extraction network shown in FIG. 4 may include a roof-area extraction network and an offset extraction network, both of which may be built on a target-detection network. The target-detection network may be any of RCNN, Fast-RCNN, Faster-RCNN, or Mask-RCNN. In some implementations, to improve base-area extraction accuracy, Mask-RCNN, which represents regions more precisely, may be used.
In some implementations, the roof-area extraction network and the offset extraction network may share a feature extraction network, which may include a backbone network, a region-feature extraction unit, and so on; this simplifies the network structure and facilitates training. When the roof-area extraction network and the offset extraction network are Mask-RCNN, the two networks may also share the RPN, the RoI Align unit, and so on.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a neural network training method shown in the present disclosure.
As shown in FIG. 5, after acquiring the training sample set, for the indirect-extraction approach, the neural network training method may include:
S502: Use the building-base extraction network to obtain the roof area and offset corresponding to each captured image in the training sample set, where the offset represents the displacement between the roof area and the base area.
In some implementations, the roof-area extraction network and offset extraction network included in the building-base extraction network may be used to extract, respectively, the roof area and offset corresponding to each captured image.
S504: For each captured image, translate the roof area corresponding to the captured image by the obtained offset, obtaining the base area corresponding to the captured image.
In some implementations, a translation operation may be performed on each pixel contained in the roof area to obtain the base area.
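The per-pixel translation of S504 can be sketched as follows (NumPy; the 6×6 mask size, the 2×2 roof, and the (dx, dy) = (columns, rows) convention are illustrative assumptions):

```python
import numpy as np

def translate_mask(roof_mask, offset):
    """Translate every pixel of a binary roof mask by (dx, dy) to obtain the
    base mask. Pixels shifted outside the array bounds are dropped."""
    dx, dy = offset
    h, w = roof_mask.shape
    base = np.zeros_like(roof_mask)
    ys, xs = np.nonzero(roof_mask)          # coordinates of roof pixels
    ys2, xs2 = ys + dy, xs + dx             # translate each pixel
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    base[ys2[keep], xs2[keep]] = 1
    return base

roof = np.zeros((6, 6), dtype=np.uint8)
roof[1:3, 1:3] = 1                          # a 2x2 roof region
base = translate_mask(roof, (2, 1))         # shift right by 2, down by 1
```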
S506: Adjust the network parameters of the building-base extraction network based on the base-area ground-truth information of each captured image and the base area obtained for each captured image.
In some implementations, a preset loss function can be applied to the base-area ground-truth information annotated on each captured image and the base area obtained for it, yielding base-area loss information for each image. Backpropagation can then be used to compute descent gradients and adjust the network parameters of the base extraction network.
After multiple rounds of training are performed in this way, network training is complete and the trained building-base extraction network is obtained.
In this scheme, on the one hand, the base area is obtained indirectly by first extracting the building's roof area and offset and then transforming the roof area by the offset; because roof-area and offset features are salient in captured images, this improves base-extraction accuracy, and a high-precision base can be obtained even when the building base is occluded. On the other hand, the training sample set expanded by exploiting the invariance of a building's base area helps train a high-precision building-base extraction network from a small number of annotated samples.
In some implementations, because the shape and position of a building's base do not change, base-area ground-truth information and base-position ground-truth information can be shared among the multiple frames of captured images corresponding to the same region, expanding the training samples and thus helping train a high-precision building-base extraction network from a small number of annotated samples.
Referring to FIG. 6, FIG. 6 is a flowchart of a neural network training method shown in the present disclosure. As shown in FIG. 6, the method may include:
S604: Annotate the target captured image corresponding to each region with base-position ground-truth information.
In some implementations, the base-position ground-truth information may be annotated in advance. It may include the coordinates of the center pixel of the base area and the width and height of the base area, and may be represented as R = (cx, cy, w, h), where cx and cy are the horizontal and vertical coordinates of the base area's center pixel and w and h are the base area's width and height.
S606: For each region, take the base-position ground-truth information annotated on the region's target captured image as the base-position ground-truth information of every frame of captured images corresponding to the region.
In some implementations, the base-position ground-truth information annotated in S604 for each region's target captured image can be used as the ground-truth information of every captured image of that region, thereby expanding the training samples. Each captured image in the resulting training sample set is thus annotated with both base-area ground truth and base-position ground truth.
In some implementations, neural network training may be performed on the resulting training sample set.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of a neural network training method disclosed herein.
As shown in FIG. 7, the method may include S702-S708.
S702: Acquire the training sample set.
S704: Use the roof-area extraction network, offset extraction network, and roof-position extraction network included in the building-base extraction network to obtain the roof area, offset, and roof position corresponding to each captured image in the training sample set, where the offset represents the displacement between the roof area and the base area.
S706: Adjust the network parameters of the roof-area extraction network based on the base-area ground-truth information of each captured image and the roof area and offset obtained for each captured image.
S708: Adjust the network parameters of the roof-position extraction network and the offset extraction network based on the base-position ground-truth information of each captured image and the roof position and offset obtained for each captured image.
S706 and S708 need not be executed in a strict order; for example, they may be executed in parallel. The present disclosure does not specifically limit their execution order.
The neural network training method can be applied to an electronic device.
In some implementations, the device may execute S702 in response to a network-training request, acquiring the training sample set from a storage medium.
The device may then execute S704-S708.
The building-base extraction network (hereinafter, the base extraction network) may be built on a target-detection network. In some implementations, to improve base-area extraction accuracy, Mask-RCNN, which represents regions more precisely, may be used as the target-detection network.
The base extraction network may include a roof-area extraction network, used to extract building roof areas; an offset extraction network, used to extract the offset between roof and base; and a roof-position extraction network, used to extract roof positions. The offset can then be used to transform (for example, translate) the roof area to obtain the base area, and likewise to translate the roof position to obtain the base position.
Referring to FIG. 8, FIG. 8 is a schematic diagram of a building-base extraction network training process shown in the present disclosure.
The base extraction network shown in FIG. 8 includes a roof-area extraction network, an offset extraction network, and a roof-position extraction network. The roof area and offset extracted by the first two networks can be combined by translation to obtain the base area.
For training, the network can be extended with a base-area loss-determination branch and a base-position loss-determination branch, so that network parameters are updated according to the determined loss information. The base-area loss information represents the error between the obtained base area and the base-area ground truth; the base-position loss information represents the error between the obtained base position and the base-position ground truth.
In some implementations, executing S706 may comprise: S7062, for each frame among the captured images, translating the image's base-area ground-truth information by the image's obtained offset to obtain first roof-area ground-truth information for the image; S7064, obtaining roof-area loss information for the image from the first roof-area ground-truth information and the roof area obtained for the image; and S7066, adjusting the network parameters of the roof-area extraction network by backpropagation based on the roof-area loss information of the captured images.
In the approach of S502-S506 above, the extracted roof area must be translated by the offset to obtain the base area, and the base-area loss information is then computed from the base-area ground truth.
However, the extracted roof area usually has a preset size, for example 14×14. If the predicted offset is too large, translating the roof area may move pixels of the roof region outside the preset-size matrix, losing information; accurate base-area loss information then cannot be obtained and the network cannot converge.
In the scheme of S7062-S7066, by contrast, the base-area ground truth is pixel-level: each pixel of the captured image is labeled 0 or 1, where pixels labeled 1 are considered inside the base area and pixels labeled 0 outside it. When the base-area ground truth is translated, however large the extracted offset, the ground truth will with high probability remain inside the corresponding captured image, so no ground-truth information is lost; that is, the first roof-area ground truth obtained in S7062 will not lack any actual roof-area ground truth. S7064 can then compute accurate roof-area loss information from the first roof-area ground truth and the roof area, ensuring that the network converges smoothly.
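The contrast between the two schemes can be illustrated with a toy example (NumPy; the 4×4 crop standing in for the 14×14 roof prediction and the 16×16 full image are illustrative sizes):

```python
import numpy as np

def shift_mask(mask, dx, dy):
    """Shift a binary mask, dropping pixels that leave the array bounds."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1
    return out

# Fixed-size crop: a large predicted offset pushes the whole roof outside
# the crop, so the shifted mask is empty and the loss becomes meaningless.
crop = np.zeros((4, 4), dtype=np.uint8)
crop[1:3, 1:3] = 1
lost = shift_mask(crop, 3, 0)

# Pixel-level ground truth in the full image (S7062): the same shift keeps
# every labeled pixel inside the image, so no information is lost.
full = np.zeros((16, 16), dtype=np.uint8)
full[1:3, 1:3] = 1
kept = shift_mask(full, 3, 0)
```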
After the roof-area loss information is obtained, the network parameters of the roof-area extraction network can be adjusted by computing descent gradients and backpropagating, completing the training of the roof-area extraction network.
In some implementations, executing S708 may comprise: S7082, for each frame among the captured images, translating the image's roof position by the image's offset to obtain the image's base position; S7084, obtaining base-position loss information for the image from the base-position ground truth corresponding to the image and the obtained base position; and S7086, adjusting the network parameters of the roof-position extraction network and the offset extraction network by backpropagation based on the base-position loss information of the captured images.
In some implementations, the extracted roof position may be represented as R0 = (cx0, cy0, w0, h0), where cx0 and cy0 are the horizontal and vertical coordinates of the roof area's center pixel and w0 and h0 are the roof area's width and height, and the extracted offset as O0 = (Δx, Δy), where Δx and Δy are the pixel displacements along the X and Y axes. The base position is then obtained as F0 = (cx0 + Δx, cy0 + Δy, w0, h0). A preset loss function (for example, a cross-entropy loss) applied to the base-position ground truth and the base position yields the base-position loss information.
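The computation F0 = (cx0 + Δx, cy0 + Δy, w0, h0) is a one-line translation of the box center; a sketch (the numeric values are made up for illustration):

```python
def base_position(roof_pos, offset):
    """Translate the roof position R0 = (cx0, cy0, w0, h0) by the offset
    O0 = (dx, dy) to obtain the base position F0; the box width and height
    are unchanged, since only the center moves."""
    cx0, cy0, w0, h0 = roof_pos
    dx, dy = offset
    return (cx0 + dx, cy0 + dy, w0, h0)

pos = base_position((40.0, 30.0, 14.0, 10.0), (-3.0, 5.0))
```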
After the base-position loss information is obtained, descent gradients can be computed and the network parameters updated by backpropagation. Because extracting the base position uses the roof position and offset produced by the roof-position extraction network and the offset extraction network, both networks are updated during backpropagation.
In this embodiment, the expanded training sample set can be used to train the roof-area extraction network, roof-position extraction network, and offset extraction network, completing the training of the base extraction network and yielding a high-precision base extraction network.
In some implementations, the roof-area extraction network, offset extraction network, and roof-position extraction network may share feature extraction components such as a backbone network and a region-feature extraction unit, simplifying the network structure and facilitating training. In some implementations, when the roof-area extraction network and offset extraction network are Mask-RCNN, the three networks may also share the RPN (Region Proposal Network), the RoI Align (Region of Interest Align) unit, and so on.
Adjusting the parameters of the three extraction networks thus also adjusts the shared feature extraction network, so the training processes both constrain and reinforce one another. This improves training efficiency and, at the same time, drives the shared feature extraction network toward features that are more useful for base-area extraction, improving base-area extraction accuracy.
In some implementations, joint training can improve both training efficiency and prediction accuracy.
At least some captured images in the training sample set may additionally be annotated with at least one of the following: second roof-area ground-truth information, a real offset, and roof-position ground-truth information.
In some implementations, the roof-area ground truth, real offset, and roof-position ground truth may be annotated manually.
在通过训练样本集训练网络时,还可以包括如下至少一项:When training the network through the training sample set, at least one of the following may also be included:
S802,基于所述至少部分采集图像标注的第二屋顶区域真值信息以及针对所述至少部分采集图像获得的屋顶区域,调整所述屋顶区域提取网络的网络参数。S802. Adjust network parameters of the roof region extraction network based on the ground truth information of the second roof region marked on the at least part of the captured image and the roof region obtained for the at least part of the collected image.
S804,基于所述至少部分采集图像标注的真实偏移量以及针对所述至少部分采集图像获得的偏移量,调整所述偏移量提取网络的网络参数。S804. Adjust network parameters of the offset extraction network based on the real offset marked by the at least part of the captured image and the offset obtained for the at least part of the captured image.
S806,基于所述至少部分采集图像标注的屋顶位置真值信息以及针对所述至少部分采集图像获得的屋顶位置,调整所述屋顶位置提取网络的网络参数。S806. Adjust network parameters of the roof position extraction network based on the roof position ground truth information marked on the at least part of the collected images and the roof position obtained for the at least part of the collected images.
In some implementations, when performing S802, a preset loss function (for example, a cross-entropy loss function) may be used to obtain loss information from the second roof region ground-truth information and the obtained roof region. A gradient is then computed from the loss information and back-propagated to adjust the network parameters of the roof region extraction network.
When performing S804, a preset loss function (for example, an MSE (Mean Square Error) loss function) may be used to obtain loss information from the real offset and the obtained offset. A gradient is then computed from the loss information and back-propagated to update the network parameters of the offset extraction network.
When performing S806, a preset loss function (for example, a Smooth L1 loss function) may be used to obtain loss information from the roof position ground-truth information and the obtained roof position. A gradient is then computed from the loss information and back-propagated to update the network parameters of the roof position extraction network.
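As an illustrative sketch only (not part of the claimed method), the three exemplified losses can be written as follows. The function names and toy inputs are hypothetical; a real implementation would operate on tensors with automatic differentiation rather than on Python lists.

```python
import math

def mse_loss(pred, true):
    """Mean squared error, as exemplified for the offset branch (S804)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def smooth_l1_loss(pred, true, beta=1.0):
    """Smooth L1, as exemplified for the roof position branch (S806)."""
    total = 0.0
    for p, t in zip(pred, true):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def bce_loss(pred_probs, true_mask, eps=1e-7):
    """Per-pixel binary cross-entropy, as exemplified for the roof region branch (S802)."""
    total = 0.0
    for p, t in zip(pred_probs, true_mask):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred_probs)
```

Each branch's loss would then be back-propagated through that branch's parameters, as described above.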
In this example, the roof region extraction network, the roof position extraction network, and the offset extraction network, which share a feature extraction network, are trained jointly. This introduces learning signals from multiple tasks, which both constrain and reinforce one another. On the one hand, this improves network training efficiency; on the other hand, it drives the shared feature extraction network to extract features that are more useful for base region extraction, thereby improving the accuracy of base region extraction.
Referring again to Figure 4: when training the base extraction network shown in Figure 4, sample annotation is expensive, so a large number of annotated samples carrying real offsets cannot be obtained, and a small number of annotated samples is not sufficient to train a high-precision base extraction network.
In some implementations, the captured images in the training sample set are further annotated with a first real offset; the first real offset indicates the real offset between the roof and the base in the captured image.
When training the offset extraction network with the training sample set, S402 may be performed: using the offset extraction network, obtain from multiple rotated images the second predicted offsets corresponding to multiple preset angles. A second predicted offset indicates the offset between the roof and the base in a rotated image; the rotated images are obtained by rotating the captured image by the respective preset angles.
The captured image may be a remote sensing image annotated with the first real offset. In the embodiments of the present disclosure, the offset refers to the offset between the roof and the base in the remote sensing image. For example, if the roof consists of 10 pixels, translating those 10 pixels by the offset yields the base.
The first real offset may be information indicating the real offset between the building roof and base in the captured image, for example a vector (x, y), where x and y represent the offsets along the x-axis and y-axis between a pixel in the roof region and the corresponding pixel in the base region. In some implementations, the offset may be annotated in advance according to the real offset between the building roof and base in the captured image. The present disclosure does not specifically limit the annotation method of the offset.
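For illustration (a sketch with invented pixel coordinates), applying an annotated (x, y) offset amounts to translating each roof pixel by that vector:

```python
def translate_region(pixels, offset):
    """Shift each (x, y) roof pixel by the offset to obtain the base region."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in pixels]

roof = [(10, 10), (10, 11), (11, 10)]        # hypothetical roof pixels
base = translate_region(roof, (-3, 4))       # hypothetical annotated offset (x, y)
```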
The preset angles may be set according to business requirements, and their number may be determined by the number of samples to be generated; for example, if many additional samples are needed, many preset angles may be set. The present disclosure does not specifically limit the values or the number of the preset angles. The preset angles are used to rotate the captured image or the image features corresponding to the captured image.
In some implementations, when performing S402, a rotation matrix may first be generated for each preset angle. For each preset angle, the corresponding rotation matrix is used to shift every pixel of the captured image, producing the rotated captured image, i.e., the rotated image. The rotated image may then be fed into the offset extraction network to extract the second predicted offset corresponding to that preset angle, thereby obtaining the second predicted offsets corresponding to all preset angles. Note that, in some implementations, instead of rotating the image itself, the feature extraction network included in the offset extraction network may first extract features from the captured image to obtain a first image feature, and that feature may then be rotated. This reduces the computation of the rotation step and avoids the rotation error introduced when extracting features from a rotated image, helping to improve the training result.
S404 may then be performed: rotating the first real offset by each of the preset angles to obtain the second real offsets corresponding to the preset angles.
In some implementations, when performing S404, the rotation matrix corresponding to each preset angle may be applied to the first real offset of the captured image, yielding the second real offset that corresponds to rotating the captured image by that preset angle.
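A minimal sketch of S404 follows, assuming a standard 2D rotation matrix and a counter-clockwise angle convention (the text does not fix the convention, so these are illustrative assumptions):

```python
import math

def rotate_offset(offset, angle_deg):
    """Rotate an (x, y) offset vector by a preset angle (counter-clockwise)."""
    theta = math.radians(angle_deg)
    x, y = offset
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Second real offsets corresponding to several hypothetical preset angles.
first_real = (3.0, 0.0)
second_real = {a: rotate_offset(first_real, a) for a in (90, 180, 270)}
```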
S406 may then be performed: adjusting the network parameters of the offset extraction network based on the second real offsets corresponding to the preset angles and the obtained second predicted offsets.
In some implementations, when performing S406, a preset loss function (for example, a cross-entropy loss function) may be used: for each preset angle, the offset loss information for that angle is obtained from the second real offset (the first real offset rotated by that angle) and the second predicted offset obtained for that angle. Then, based on the per-angle offset loss information, the total loss is determined by, for example, summation, multiplication, or averaging; the descent gradient is computed from the total loss, and the network parameters of the offset extraction network are adjusted by back-propagation.
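A minimal sketch of the S406 aggregation follows. The function names are hypothetical, an MSE per-angle loss is assumed for illustration (the loss choice is only exemplified in the text), and summation is used as the reduction, which is one of the options named above:

```python
def angle_loss(pred, true):
    """Per-angle offset loss (MSE between predicted and rotated real offset)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def total_offset_loss(second_pred, second_real, reduce=sum):
    """Aggregate the per-angle offset losses into a total loss."""
    per_angle = [angle_loss(second_pred[a], second_real[a]) for a in second_pred]
    return reduce(per_angle)

# Hypothetical predictions and rotated ground truths for two preset angles.
pred = {90: (0.1, 2.9), 180: (-3.1, 0.0)}
real = {90: (0.0, 3.0), 180: (-3.0, 0.0)}
loss = total_offset_loss(pred, real)
```

The gradient of this total loss would then be back-propagated through the offset extraction network.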
In this scheme, the offset extraction network is used to obtain the second predicted offsets corresponding to the preset angles, the first real offset is rotated by each preset angle to obtain the corresponding second real offsets, and the network parameters of the offset extraction network are then adjusted using the second real offsets and the obtained second predicted offsets.
This exploits the property that when an image is rotated by some angle, its offset rotates by the same angle. Rotating the image (or its image features) together with the real offset effectively enlarges the set of image samples carrying a real offset, so a high-precision offset extraction network can be trained from a small amount of offset-annotated data.
However, rotating the captured image and the real offset also rotates all other information contained in the captured image. If the rotated captured images were used to train the base extraction network, its other branches would have to fit this rotated information, increasing training time and reducing training efficiency.
In some implementations, the image rotation step can therefore be placed inside the offset extraction network. The captured image is then rotated only within the offset extraction network, so the training, and hence the convergence speed, of the other branches is unaffected, improving network training efficiency.
When performing S402, for each of the preset angles, S4022 may be performed: using the offset extraction network, rotate the first image feature corresponding to the captured image by the preset angle to obtain the second image feature corresponding to that angle. S4024 may then be performed: obtain the second predicted offset corresponding to that angle based on the second image feature.
The first image feature may refer to the image feature obtained after the captured image passes through several convolutional layers, pooling layers, and other feature extraction stages. In some implementations, the offset extraction network may be a network built on Mask R-CNN, which can extract the first image feature from the captured image through its backbone network and RoI Align unit. In some implementations, the aforementioned image features may be represented as feature maps.
In some implementations, when performing S4022, the rotation matrices corresponding to the preset angles may be used to transform the position of each point in the first image feature, yielding the second image features corresponding to the preset angles. When performing S4024, the second image feature may then be processed by, for example, several convolutional layers, pooling layers, fully connected layers, and a mapping unit (for example, softmax) to obtain the offset extraction result, namely the second predicted offset.
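The position transform in S4022 can be sketched as follows, assuming rotation about the feature-map centre and nearest-neighbour sampling (the text does not fix these details, and the single-channel list-of-lists representation is a simplification of a real feature map):

```python
import math

def rotate_feature_map(fmap, angle_deg):
    """Rotate a 2D feature map about its centre (nearest-neighbour sampling)."""
    h, w = len(fmap), len(fmap[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Inverse rotation: find the source position of this output cell.
            sx = cos_t * (j - cx) + sin_t * (i - cy) + cx
            sy = -sin_t * (j - cx) + cos_t * (i - cy) + cy
            si, sj = round(sy), round(sx)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = fmap[si][sj]
    return out
```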
When training the network shown in Figure 4, the captured image is rotated only inside the offset extraction network, while the roof region extraction network is still trained on the unrotated captured image. The rotation of the captured image thus stays internal to the offset extraction network and does not affect the training of the other branches.
In some implementations, to facilitate training of the offset extraction network, a spatial transformer network may be used to perform the image rotation. This makes the rotation step differentiable, so gradients can back-propagate normally and the network can be trained directly.
In some implementations, building bounding-box information may also be introduced during network training as a constraint, which improves training efficiency and helps the feature extraction network extract building-related features.
The at least some captured images in the training sample set are also annotated with building bounding-box ground-truth information. The building bounding-box information may include the coordinates of the center pixel of the building region and the width and height of the building region.
When training the base extraction network, a building bounding-box extraction network included in the building base extraction network may be used to extract the building bounding boxes corresponding to the at least some captured images, where the building bounding-box extraction network includes the feature extraction network. The network parameters of the building bounding-box extraction network may then be adjusted based on the building bounding-box ground-truth information annotated on the at least some captured images and the building bounding boxes obtained for them.
Building bounding-box information is thereby introduced into network training. Because the four extraction networks for the roof region, roof position, offset, and building bounding box share the feature extraction network, on the one hand the four extraction networks become interrelated and can share the supervision signal of each task, accelerating convergence of the network; on the other hand, the three extraction networks for roof region, roof position, and offset are exposed to complete building region features, improving extraction performance.
In some implementations, network training efficiency may be improved through pre-training.
In some implementations, the building base extraction network may be pre-trained using the captured images in the training sample set that are annotated with second roof region ground-truth information, real offsets, and roof position ground-truth information.
The pre-training process may follow the network training process of any of the foregoing implementations. In some implementations, joint training may also be adopted during pre-training for the best pre-training result. At least some captured images in the training sample set may carry ground-truth information for six items: roof region, roof position, base region, base position, offset, and building bounding box. The base extraction network may include six extraction networks, for the roof region, roof position, offset, building bounding box, base region loss information, and base position loss information respectively, that share the feature extraction network; the six extraction networks serve as six branches of the base extraction network. In some implementations, since the shapes of a building's roof and base are essentially the same, the base region loss information may be equivalently expressed as roof region loss information.
During pre-training, the at least some captured images of the training sample set may be fed into the base extraction network to obtain the outputs of the six branches. Loss information is then obtained from those outputs and the aforementioned six items of ground-truth information annotated on the images, and the network parameters are updated accordingly. The six branches are thereby trained jointly, improving the training efficiency and training result of the base extraction network.
In some implementations, after pre-training is completed, the annotated captured images and the unannotated images in the training sample set may be fed into the network in random order for training. Here, an annotated captured image may refer to one of the at least some captured images annotated with the aforementioned six items of ground-truth information.
A reasonable network training scheme can thus be proposed: first systematically pre-train the network by joint training on annotated captured images rich in ground-truth information, then fine-tune the network parameters of the base extraction network on a mixture of annotated captured images and unannotated images. On the one hand, this helps train a high-precision base extraction network from a small number of annotated captured images; on the other hand, it improves network training efficiency.
An embodiment is described below with reference to a specific training scenario.
Referring to Figure 9, which is a schematic diagram of a training process of a building base extraction network according to the present disclosure. The training method in this example can be deployed on any type of electronic device.
The base extraction network shown in Figure 9 is built on Mask R-CNN. The network may include six branches that respectively extract the roof region, roof position, offset, building bounding box, base region loss information, and base position loss information. The six branches share a backbone network, an RPN candidate-box generation network (hereinafter RPN), and an RoI Align region feature extraction unit (hereinafter RoI Align). The backbone network may be a VGG (Visual Geometry Group) network, a ResNet (Residual Network), an HRNet (High-Resolution Network), or the like, which is not specifically limited in the present disclosure.
Before training the network, multiple groups of multi-temporal images (already registered) covering multiple regions may be acquired. At least one frame may then be selected from each group of multi-temporal images for manual annotation, yielding a small number of annotated images, namely the annotated captured images described above. An annotated image may carry ground-truth information for the six items: roof region, roof position, base region, base position, offset, and building bounding box. It can be understood that, since the shape and position of the base region of a given building do not change across the multi-temporal images, the unannotated images in a group can share the base region and base position ground truth of the annotated image.
During network training, the annotated images may first be used to pre-train the base extraction network by joint training.
In the pre-training process, multiple rounds of the following steps may be performed according to the number of pre-training iterations:
Feed each annotated image into the base extraction network to obtain the roof region, roof position, offset, and building bounding box corresponding to that image.
Then, from the ground-truth information of the four items (roof region, roof position, offset, and building bounding box) of each annotated image, obtain the loss information of the four corresponding branches, and update the network parameters of the four branches by back-propagation.
In addition, based on the ground-truth information of five items (roof region, roof position, offset, base region, and base position) of each annotated image, use the base region and base position loss determination branches to obtain the base region and base position loss information, and adjust, by back-propagation, the network parameters of the three branches of the roof region, roof position, and offset extraction networks.
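The pre-training update above combines losses from several branches. A minimal sketch of such a combination follows; the branch names, the numeric values, and the equal default weighting are illustrative assumptions, since the text does not specify how the per-branch losses are combined:

```python
def joint_loss(branch_losses, weights=None):
    """Weighted sum of per-branch losses for joint training."""
    if weights is None:
        weights = {name: 1.0 for name in branch_losses}  # assumed equal weights
    return sum(weights[name] * loss for name, loss in branch_losses.items())

# Hypothetical per-branch losses for one annotated image.
losses = {"roof_region": 0.7, "roof_position": 0.3, "offset": 0.2,
          "building_bbox": 0.4, "base_region": 0.5, "base_position": 0.1}
total = joint_loss(losses)
```

The gradient of the combined loss would then be back-propagated through the shared feature extraction network and the relevant branches.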
In the pre-training process, because joint training is adopted, learning signals from multiple tasks are introduced; the tasks both constrain and reinforce one another, which improves network training efficiency, so that a network with fairly good extraction performance can be obtained from only a small number of annotated images.
After pre-training is completed, the annotated images and unannotated images may be mixed and fed into the base extraction network in random order for training.
If an annotated image is fed into the network, joint training as in the pre-training process may be performed.
If an unannotated image is fed into the network, the base extraction network may be used to obtain the roof region, roof position, and offset corresponding to that image. Then, using the base region and base position loss determination branches together with the shared base region and base position ground truth, the base region and base position loss information is obtained, and the network parameters of the three branches of the roof region, roof position, and offset extraction networks are updated by back-propagation.
By fine-tuning the parameters of the pre-trained network with annotated and unannotated images in this way, a high-precision base extraction network can be obtained.
This scheme of joint training, with pre-training followed by mixed training: first, improves network training efficiency, so that a network with good extraction performance can be obtained from a small number of annotated images, reducing dependence on annotation work; second, drives the shared feature extraction network (including the backbone network and the region feature extraction unit) to extract features more useful for base region extraction, improving the accuracy of base region extraction; third, exposes the three branches of the roof region, roof position, and offset extraction networks to complete building region features, improving branch extraction performance.
After the trained building base extraction network is obtained through the above implementations, building bases can be extracted from a remote sensing image to be processed through the network. A specific implementation process may include:
receiving a remote sensing image to be processed;
using a building base extraction network to extract the building roof region and the offset in the remote sensing image to be processed, where the building base extraction network is trained by the neural network training method of any of the foregoing implementations, and the offset characterizes the offset between the roof region and the base region; and
performing a translation transformation on the roof region using the offset to obtain the building base region corresponding to the remote sensing image to be processed.
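A minimal sketch of this final step follows, assuming a binary roof mask and an integer (dx, dy) pixel offset (both simplifications; a real pipeline would likely use sub-pixel offsets and polygon representations):

```python
def shift_mask(mask, offset):
    """Translate a binary roof mask by an integer (dx, dy) offset to get the base mask."""
    dx, dy = offset
    h, w = len(mask), len(mask[0])
    base = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:  # drop pixels shifted out of frame
                    base[ny][nx] = 1
    return base

roof_mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
base_mask = shift_mask(roof_mask, (-1, 1))  # hypothetical predicted offset
```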
Here, the remote sensing image to be processed may be a remote sensing image collected by any collection device capable of capturing images of buildings. In some implementations, the trained building base extraction network may be the network shown in Figure 9.
Thus, on the one hand, a high-precision building base extraction network can be trained from a small number of annotated samples, reducing network training cost, improving network training efficiency, and in turn reducing the cost of base extraction. On the other hand, the high-precision base extraction network can be used for base extraction, improving the accuracy of building base extraction and hence the accuracy of building statistics.
Corresponding to any of the foregoing implementations, the present disclosure further proposes a neural network training apparatus 100.
Referring to Figure 10, which is a schematic structural diagram of a neural network training apparatus according to the present disclosure.
As shown in Figure 10, the apparatus 100 may include:
an acquisition module 101, configured to acquire, for each of multiple regions, one or more frames of captured images corresponding to the region, where, in the case that the region corresponds to multiple frames of captured images, at least two of those frames have different capture angles;
a first annotation module 102, configured to take one frame of the captured images corresponding to the region as the target captured image corresponding to the region and annotate it with base region ground-truth information; and
a first determination module 103, configured to determine the base region ground-truth information annotated on the target captured image corresponding to the region as the base region ground-truth information of each frame of captured image corresponding to the region, and to obtain, based on the captured images and the target captured images corresponding to the multiple regions, a training sample set for neural network training based on the training sample set.
In some illustrated implementations, the apparatus 100 further includes:
a first training module 106, configured to: acquire the training sample set;
obtain, using a building base extraction network, the roof region and offset corresponding to each captured image in the training sample set, where the offset characterizes the offset between the roof region and the base region;
for each captured image, perform a translation transformation on the roof region corresponding to the captured image based on the obtained offset corresponding to the captured image, to obtain the base region corresponding to the captured image; and
adjust the network parameters of the building base extraction network based on the base region ground-truth information corresponding to each captured image and the base region obtained for each captured image.
In some illustrated implementations, the apparatus 100 further includes:
a second annotation module 104, configured to annotate the target captured image corresponding to each region with base position ground-truth information; and
a second determination module 105, configured to determine, for each region, the base position ground-truth information annotated on the target captured image corresponding to the region as the base position ground-truth information of each frame of captured image corresponding to the region.
In some illustrated implementations, the apparatus 100 further includes:
a second training module 107, configured to: acquire the training sample set;
obtain, using the roof region extraction network, offset extraction network, and roof position extraction network included in a building base extraction network, the roof region, offset, and roof position corresponding to each captured image in the training sample set, where the offset characterizes the offset between the roof region and the base region;
adjust the network parameters of the roof region extraction network based on the base region ground-truth information corresponding to each captured image and the roof region and offset obtained for each captured image; and
adjust the network parameters of the roof position extraction network and the offset extraction network based on the base position ground-truth information corresponding to each captured image and the roof position and offset obtained for each captured image.
在示出的一些实现方式中,所述第二训练模块107,用于:In some implementations shown, the second training module 107 is used to:
针对所述各采集图像中的每帧图像,利用所述图像对应的偏移量,对所述图像对应的底座区域真值信息进行平移,得到所述图像对应的第一屋顶区域真值信息;For each frame of image in the collected images, using the offset corresponding to the image, the ground truth information of the base area corresponding to the image is translated to obtain the ground truth information of the first roof area corresponding to the image;
基于所述图像对应的所述第一屋顶区域真值信息与针对所述图像获得的屋顶区域,得到所述图像对应的屋顶区域损失信息;Obtaining roof area loss information corresponding to the image based on the ground truth information of the first roof area corresponding to the image and the roof area obtained for the image;
基于所述各采集图像分别对应的屋顶区域损失信息,通过反向传播调整所述屋顶区域提取网络的网络参数。Based on the roof area loss information corresponding to each of the collected images, the network parameters of the roof area extraction network are adjusted through back propagation.
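The loss construction above, which translates the base-area ground truth by the offset to obtain the first roof-area ground truth and compares it with the predicted roof area, can be sketched as follows. The sketch assumes binary masks and an integer pixel offset, and uses binary cross-entropy purely as an example loss; none of these choices are mandated by the disclosure.

```python
import numpy as np

def shift_mask(mask, offset):
    """Translate a binary mask by integer (dy, dx); shifted-in pixels are zero."""
    dy, dx = offset
    shifted = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    # zero out rows/columns that wrapped around during the roll
    if dy > 0: shifted[:dy, :] = 0
    elif dy < 0: shifted[dy:, :] = 0
    if dx > 0: shifted[:, :dx] = 0
    elif dx < 0: shifted[:, dx:] = 0
    return shifted

def roof_area_loss(base_truth, pred_roof, offset, eps=1e-7):
    """BCE between the translated base ground truth (serving as the first
    roof-area ground truth) and the predicted roof probability map."""
    roof_truth = shift_mask(base_truth, offset)
    p = np.clip(pred_roof, eps, 1 - eps)
    return float(-np.mean(roof_truth * np.log(p) + (1 - roof_truth) * np.log(1 - p)))

base = np.zeros((8, 8)); base[4:6, 4:6] = 1    # annotated base footprint
pred = np.zeros((8, 8)); pred[2:4, 3:5] = 0.9  # predicted roof probabilities
loss = roof_area_loss(base, pred, offset=(-2, -1))
```

A correct offset aligns the translated base truth with the predicted roof, so the loss is small; a wrong offset leaves them misaligned and the loss large, which is what drives the adjustment by back propagation.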
在示出的一些实现方式中,所述第二训练模块107,用于:In some implementations shown, the second training module 107 is used to:
针对所述各采集图像中的每帧图像,利用所述图像对应的偏移量,对所述图像对应的屋顶位置进行平移,获得所述图像对应的底座位置;For each frame of image in each of the collected images, using the offset corresponding to the image, the position of the roof corresponding to the image is translated to obtain the position of the base corresponding to the image;
基于所述图像对应的底座位置真值信息以及针对所述图像获得的底座位置,得到所述图像对应的底座位置损失信息;Obtaining base position loss information corresponding to the image based on the base position truth information corresponding to the image and the base position obtained for the image;
基于所述各采集图像分别对应的底座位置损失信息,通过反向传播调整所述屋顶位置提取网络和所述偏移量提取网络的网络参数。Based on the base position loss information corresponding to each of the collected images, the network parameters of the roof position extraction network and the offset extraction network are adjusted through back propagation.
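The base-position supervision above can be illustrated with a minimal sketch: the roof position is translated by the offset, and the result is compared with the base-position ground truth. The coordinate representation and the L1 loss are illustrative assumptions, not the disclosure's specification.

```python
import numpy as np

def base_position_loss(roof_pos, offset, base_truth):
    """roof_pos, offset, base_truth: arrays of shape (N, 2) of (x, y) points.

    The predicted base position is the roof position translated by the
    predicted offset; an L1 loss (example choice) measures the gap to the
    annotated base position ground truth.
    """
    pred_base = roof_pos + offset
    return float(np.mean(np.abs(pred_base - base_truth)))

roof = np.array([[12.0, 30.0]])   # predicted roof position
off = np.array([[3.0, -4.0]])     # predicted roof-to-base offset
truth = np.array([[15.0, 26.0]])  # annotated base position
loss = base_position_loss(roof, off, truth)
```

Because the predicted base depends on both the roof position and the offset, this single loss term supervises the roof position extraction network and the offset extraction network jointly.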
在示出的一些实现方式中,所述屋顶区域提取网络、偏移量提取网络与所述屋顶位置提取网络共享特征提取网络。In some illustrated implementations, the roof area extraction network, the offset extraction network and the roof position extraction network share a feature extraction network.
在示出的一些实现方式中,所述训练样本集的至少部分采集图像还标注了第二屋顶区域真值信息,真实偏移量以及屋顶位置真值信息;In some implementations shown, at least part of the collected images of the training sample set are also marked with the second roof area ground truth information, the real offset and the roof position ground truth information;
所述装置100还包括如下至少一项:The device 100 also includes at least one of the following:
第一调整模块,用于基于所述至少部分采集图像标注的第二屋顶区域真值信息以及针对所述至少部分采集图像获得的屋顶区域,调整所述屋顶区域提取网络的网络参数;A first adjustment module, configured to adjust the network parameters of the roof area extraction network based on the ground truth information of the second roof area marked on the at least part of the captured image and the roof area obtained for the at least part of the captured image;
第二调整模块,用于基于所述至少部分采集图像标注的真实偏移量以及针对所述至少部分采集图像获得的偏移量,调整所述偏移量提取网络的网络参数;The second adjustment module is configured to adjust the network parameters of the offset extraction network based on the real offset marked by the at least part of the captured image and the offset obtained for the at least part of the captured image;
第三调整模块,用于基于所述至少部分采集图像标注的屋顶位置真值信息以及针对所述至少部分采集图像获得的屋顶位置,调整所述屋顶位置提取网络的网络参数。The third adjustment module is configured to adjust the network parameters of the roof position extraction network based on the roof position ground truth information marked on the at least part of the collected images and the roof position obtained for the at least part of the collected images.
在示出的一些实现方式中,所述至少部分采集图像还标注了建筑物边框真值信息;所述装置100还包括:In some implementations shown, the at least part of the collected images are also marked with the true value information of the building frame; the device 100 also includes:
提取模块,用于利用所述建筑物底座提取网络包括的建筑物边框提取网络,提取所述至少部分采集图像对应的建筑物边框;其中,所述建筑物边框提取网络包括所述特征提取网络;The extraction module is configured to use the building frame extraction network included in the building base extraction network to extract the building frame corresponding to the at least part of the collected images; wherein the building frame extraction network includes the feature extraction network;
第四调整模块,用于基于所述至少部分采集图像标注的建筑物边框真值信息与针对所述至少部分采集图像获得的所述建筑物边框,调整所述建筑物边框提取网络的网络参数。The fourth adjustment module is configured to adjust the network parameters of the building frame extraction network based on the true value information of the building frame marked on the at least part of the collected images and the building frame obtained for the at least part of the collected images.
在示出的一些实现方式中,所述装置100还包括:In some implementations shown, the device 100 further includes:
预训练模块，用于利用所述训练样本集中标注了第二屋顶区域真值信息，真实偏移量以及屋顶位置真值信息的采集图像，对所述建筑物底座提取网络进行预训练。The pre-training module is configured to pre-train the building base extraction network using the collected images in the training sample set that are annotated with second roof area ground truth information, real offsets, and roof position ground truth information.
在示出的一些实现方式中,所述训练样本集中的采集图像标注有第一真实偏移量;所述装置还包括:In some implementations shown, the collected images in the training sample set are marked with the first real offset; the device also includes:
偏移量获得模块，用于利用所述偏移量提取网络从多个旋转图像，获得与多种预设角度分别对应的第二预测偏移量；所述第二预测偏移量指示所述旋转图像中屋顶与底座之间的偏移量；所述多个旋转图像通过将所述采集图像分别旋转所述多种预设角度而得到；An offset obtaining module, configured to use the offset extraction network to obtain, from a plurality of rotated images, second predicted offsets respectively corresponding to a plurality of preset angles; the second predicted offset indicates the offset between the roof and the base in the rotated image; the plurality of rotated images are obtained by rotating the collected image by the respective preset angles;
选择模块,用于将所述第一真实偏移量分别旋转所述多种预设角度,得到与所述多种预设角度分别对应的第二真实偏移量;A selection module, configured to rotate the first real offset by the multiple preset angles to obtain second real offsets respectively corresponding to the multiple preset angles;
第四调整模块,用于基于与所述多种预设角度分别对应的所述第二真实偏移量和获得的第二预测偏移量,调整所述偏移量提取网络的网络参数。The fourth adjustment module is configured to adjust the network parameters of the offset extraction network based on the second real offset corresponding to the various preset angles and the obtained second predicted offset.
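The construction of the second real offsets in the selection module above amounts to rotating the annotated first real offset by each preset angle. A minimal sketch follows, assuming a standard counter-clockwise 2-D rotation; the disclosure does not fix the angle convention, so the sign convention here is an assumption.

```python
import numpy as np

def rotate_offset(offset, angle_deg):
    """Rotate a 2-D offset vector (dx, dy) counter-clockwise by angle_deg,
    using the standard 2-D rotation matrix."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return rot @ np.asarray(offset, dtype=float)

first_offset = np.array([3.0, 0.0])  # annotated first real offset
# second real offsets, one per preset angle
targets = {a: rotate_offset(first_offset, a) for a in (90, 180, 270)}
```

Each second real offset then serves as the supervision target for the second predicted offset obtained from the correspondingly rotated image, encouraging the offset extraction network to be consistent under rotation.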
在示出的一些实现方式中，所述偏移量获得模块，具体用于：针对所述多种预设角度中的每一预设角度，利用偏移量提取网络，将所述采集图像对应的第一图像特征旋转所述预设角度，得到与所述预设角度对应的第二图像特征；In some of the illustrated implementations, the offset obtaining module is specifically configured to: for each of the plurality of preset angles, use the offset extraction network to rotate the first image feature corresponding to the collected image by the preset angle, obtaining a second image feature corresponding to the preset angle;
基于所述第二图像特征,得到与所述预设角度对应的第二预测偏移量。Based on the second image feature, a second predicted offset corresponding to the preset angle is obtained.
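The feature-rotation step above can be illustrated as follows, under the assumption that the preset angles are multiples of 90 degrees so that a lossless `np.rot90` can stand in for the rotation applied to the first image feature; a real implementation would typically rotate feature maps with an interpolating warp, which the disclosure does not specify.

```python
import numpy as np

def second_feature(first_feature, angle_deg):
    """Rotate an (H, W, C) feature map counter-clockwise by a multiple of 90
    degrees, producing the second image feature for that preset angle."""
    assert angle_deg % 90 == 0, "sketch assumes 90-degree preset angles"
    return np.rot90(first_feature, k=(angle_deg // 90) % 4, axes=(0, 1))

feat = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # toy 2x3 single-channel feature
rotated = second_feature(feat, 90)
```

Rotating the shared features instead of re-running the backbone on each rotated image avoids repeated feature extraction; the second predicted offset is then regressed from each rotated feature map.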
与所述任一实现方式相对应的,本公开还提出一种图像处理装置。该装置可以包括:Corresponding to any of the above implementation manners, the present disclosure further proposes an image processing device. The device can include:
接收模块,用于接收待处理遥感图像;A receiving module, configured to receive remote sensing images to be processed;
提取模块，用于利用建筑物底座提取网络，提取所述待处理遥感图像中的建筑物屋顶区域以及偏移量；其中，所述建筑物底座提取网络通过如前述任一实现方式示出的神经网络训练方法训练得到，所述偏移量表征屋顶区域与底座区域之间的偏移量；The extraction module is configured to use a building base extraction network to extract the building roof area and the offset from the remote sensing image to be processed; wherein the building base extraction network is trained by the neural network training method shown in any of the foregoing implementations, and the offset characterizes the offset between the roof area and the base area;
平移模块,用于利用所述偏移量对所述屋顶区域进行平移变换,得到所述待处理遥感图像对应的建筑物底座区域。A translation module, configured to use the offset to perform translation transformation on the roof area to obtain the building base area corresponding to the remote sensing image to be processed.
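The translation module's inference step, shifting the extracted roof region by the predicted offset to obtain the building base region, can be sketched as follows. The binary-mask representation and integer offset are illustrative assumptions; the trained network that would produce them is not shown.

```python
import numpy as np

def roof_to_base(roof_mask, offset):
    """Translate a binary roof mask by integer (dy, dx) to the base region;
    pixels shifted in from outside the image are zero."""
    dy, dx = offset
    base = np.roll(roof_mask, shift=(dy, dx), axis=(0, 1))
    if dy > 0: base[:dy, :] = 0
    elif dy < 0: base[dy:, :] = 0
    if dx > 0: base[:, :dx] = 0
    elif dx < 0: base[:, dx:] = 0
    return base

# toy roof mask; in practice both mask and offset come from the network
roof = np.zeros((6, 6), dtype=int); roof[1:3, 1:3] = 1
base = roof_to_base(roof, (2, 1))  # shift the roof down-right onto the base
```

This mirrors training: the network learns roof regions (which are usually fully visible in off-nadir imagery) plus a roof-to-base offset, and the base region, often occluded by the building itself, is recovered by this translation.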
本公开示出的神经网络训练装置和/或图像处理装置的实施例可以应用于电子设备上。相应地,本公开公开了一种电子设备,该设备可以包括:处理器。The embodiments of the neural network training device and/or image processing device shown in the present disclosure can be applied to electronic equipment. Accordingly, the present disclosure discloses an electronic device, which may include: a processor.
用于存储处理器可执行指令的存储器。Memory used to store processor-executable instructions.
其中,所述处理器被配置为调用所述存储器中存储的可执行指令,实现前述神经网络训练方法和/或图像处理方法。Wherein, the processor is configured to invoke the executable instructions stored in the memory to implement the aforementioned neural network training method and/or image processing method.
请参见图11,图11为本公开示出的一种电子设备的硬件结构示意图。Please refer to FIG. 11 , which is a schematic diagram of a hardware structure of an electronic device shown in the present disclosure.
如图11所示，该电子设备可以包括用于执行指令的处理器，用于进行网络连接的网络接口，用于为处理器存储运行数据的内存，以及用于存储神经网络训练装置和/或图像处理装置对应指令的非易失性存储器。As shown in Figure 11, the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the neural network training apparatus and/or the image processing apparatus.
其中，装置的实施例可以通过软件实现，也可以通过硬件或者软硬件结合的方式实现。以软件实现为例，作为一个逻辑意义上的装置，是通过其所在电子设备的处理器将非易失性存储器中对应的计算机程序指令读取到内存中运行形成的。从硬件层面而言，除了图11所示的处理器、内存、网络接口、以及非易失性存储器之外，实施例中装置所在的电子设备通常根据该电子设备的实际功能，还可以包括其他硬件，对此不再赘述。The apparatus embodiments may be implemented by software, by hardware, or by a combination of software and hardware. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for execution. At the hardware level, in addition to the processor, memory, network interface, and non-volatile memory shown in Figure 11, the electronic device in which the apparatus of an embodiment is located may also include other hardware according to its actual functions, which will not be detailed here.
可以理解的是,为了提升处理速度,装置对应指令也可以直接存储于内存中,在此不作限定。It can be understood that, in order to increase the processing speed, the device corresponding instructions may also be directly stored in the memory, which is not limited herein.
本公开提出一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序可以用于使处理器执行前述神经网络训练方法和/或图像处理方法。The present disclosure proposes a computer-readable storage medium, the storage medium stores a computer program, and the computer program can be used to cause a processor to execute the aforementioned neural network training method and/or image processing method.
本领域技术人员应明白，本公开一个或多个实施例可提供为方法、系统或计算机程序产品。因此，本公开一个或多实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本公开一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（可以包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。Those skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本公开中的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”可以包括三种方案:A、B、以及“A和B”。"And/or" in the present disclosure means at least one of the two, for example, "A and/or B" may include three options: A, B, and "A and B".
本公开中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in the present disclosure is described in a progressive manner, the same and similar parts of the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
以上对本公开特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实现方式中,多任务处理和并行处理也是可以的或者可能是有利的。The specific embodiments of the present disclosure have been described above. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
本公开中描述的主题及功能操作的实施例可以在以下中实现：数字电子电路、有形体现的计算机软件或固件、可以包括本公开中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本公开中描述的主题的实施例可以实现为一个或多个计算机程序，即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地，程序指令可以被编码在人工生成的传播信号上，例如机器生成的电、光或电磁信号，该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。Embodiments of the subject matter and the functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this disclosure and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
本公开中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
适合用于执行计算机程序的计算机可以包括，例如通用和/或专用微处理器，或任何其他类型的中央处理单元。通常，中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件可以包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常，计算机还将可以包括用于存储数据的一个或多个大容量存储设备，例如磁盘、磁光盘或光盘等，或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据，抑或两种情况兼而有之。然而，计算机不是必须具有这样的设备。此外，计算机可以嵌入在另一设备中，例如移动电话、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备，仅举几例。Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
适合于存储计算机程序指令和数据的计算机可读介质可以包括所有形式的非易失性存储器、媒介和存储器设备，例如可以包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及CD ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
虽然本公开包含许多具体实施细节，但是这些不应被解释为限制任何公开的范围或所要求保护的范围，而是主要用于描述特定公开的具体实施例的特征。本公开内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面，在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外，虽然特征可以如所述在某些组合中起作用并且甚至最初如此要求保护，但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除，并且所要求保护的组合可以指向子组合或子组合的变型。While this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as describing features of particular embodiments of a particular disclosure. Certain features that are described in this disclosure in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,所述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。Similarly, while operations are depicted in the figures in a particular order, this should not be construed as requiring that those operations be performed in the particular order shown, or sequentially, or that all illustrated operations be performed, to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of the various system modules and components in the described embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems can often be integrated together in a single software product, or packaged into multiple software products.
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。Thus, certain embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
以上仅为本公开一个或多个实施例的较佳实施例而已,并不用以限制本公开一个或多个实施例,凡在本公开一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开一个或多个实施例保护的范围之内。The above are only preferred embodiments of one or more embodiments of the present disclosure, and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, etc. should be included in the protection scope of one or more embodiments of the present disclosure.

Claims (17)

  1. 一种神经网络训练方法,包括:A neural network training method, comprising:
    针对多个区域中的每个区域,For each of the multiple regions,
    获取与所述区域对应的一帧或多帧采集图像;其中,在所述区域对应多帧采集图像的情况下,存在至少两帧所述采集图像具有不同的采集角度;Acquiring one or more frames of captured images corresponding to the region; wherein, in the case where the region corresponds to multiple frames of captured images, there are at least two frames of the captured images with different capture angles;
    将所述区域对应的一帧所述采集图像作为所述区域对应的目标采集图像进行底座区域真值信息标注;Using one frame of the captured image corresponding to the area as the target captured image corresponding to the area to mark the true value information of the base area;
    将所述区域对应的所述目标采集图像所标注的底座区域真值信息,确定为所述区域对应的各帧所述采集图像的底座区域真值信息;determining the true value information of the base area marked in the target acquisition image corresponding to the area as the true value information of the base area of each frame of the acquisition image corresponding to the area;
    基于所述多个区域分别对应的所述采集图像和所述目标采集图像,得到训练样本集以基于所述训练样本集进行神经网络训练。A training sample set is obtained based on the collected images and the target collected images respectively corresponding to the multiple regions, so as to perform neural network training based on the training sample set.
  2. 根据权利要求1所述的方法,还包括:The method according to claim 1, further comprising:
    获取所述训练样本集;Obtain the training sample set;
    利用建筑物底座提取网络,Using the building base to extract the network,
    获得所述训练样本集中各所述采集图像分别对应的屋顶区域与偏移量;其中,所述偏移量表征屋顶区域与底座区域之间的偏移量;Obtain the roof area and offset corresponding to each of the collected images in the training sample set; wherein, the offset represents the offset between the roof area and the base area;
    针对各所述采集图像,基于获得的所述采集图像对应的所述偏移量,对与所述采集图像对应的所述屋顶区域进行平移变换,获得所述采集图像对应的底座区域;For each of the collected images, based on the obtained offset corresponding to the collected images, performing translation transformation on the roof area corresponding to the collected images to obtain a base area corresponding to the collected images;
    基于各所述采集图像分别对应的底座区域真值信息以及针对各所述采集图像分别获得的底座区域,调整所述建筑物底座提取网络的网络参数。Based on the ground truth information of the base area corresponding to each of the collected images and the base area obtained for each of the collected images, the network parameters of the building base extraction network are adjusted.
  3. 根据权利要求1所述的方法,所述训练样本集的获得过程还包括:According to the method according to claim 1, the obtaining process of the training sample set also includes:
    对所述每个区域分别对应的所述目标采集图像进行底座位置真值信息标注;Annotating the true value information of the base position on the target acquisition images corresponding to each of the regions;
    针对每个区域,将所述区域对应的目标采集图像所标注的底座位置真值信息,确定为所述区域对应的各帧所述采集图像的底座位置真值信息。For each area, the base position truth information marked in the target acquisition image corresponding to the area is determined as the base position truth information of each frame of the acquisition image corresponding to the area.
  4. 根据权利要求3所述的方法,还包括:The method according to claim 3, further comprising:
    获取所述训练样本集;Obtain the training sample set;
    利用建筑物底座提取网络包括的屋顶区域提取网络、偏移量提取网络，以及屋顶位置提取网络，获得所述训练样本集中各采集图像分别对应的屋顶区域、偏移量与屋顶位置，其中，所述偏移量表征屋顶区域与底座区域之间的偏移量；Using the roof area extraction network, the offset extraction network, and the roof position extraction network included in the building base extraction network, obtain the roof area, offset, and roof position corresponding to each collected image in the training sample set, wherein the offset characterizes the offset between the roof area and the base area;
    基于所述各采集图像分别对应的底座区域真值信息,以及针对所述各采集图像分别获得的所述屋顶区域与所述偏移量,调整所述屋顶区域提取网络的网络参数;Adjusting the network parameters of the roof area extraction network based on the ground truth information of the base area corresponding to the collected images, and the roof area and the offset respectively obtained for the collected images;
    基于所述各采集图像分别对应的底座位置真值信息,以及针对所述各采集图像分别获得的所述屋顶位置与所述偏移量,调整所述屋顶位置提取网络和所述偏移量提取网络的网络参数。Adjusting the roof position extraction network and the offset extraction based on the true value information of the base positions corresponding to the collected images, and the roof positions and the offsets respectively obtained for the collected images The network parameters of the network.
  5. 根据权利要求4所述的方法，所述基于所述各采集图像分别对应的底座区域真值信息，以及针对所述各采集图像分别获得的所述屋顶区域与所述偏移量，调整所述屋顶区域提取网络的网络参数，包括：The method according to claim 4, wherein the adjusting the network parameters of the roof area extraction network based on the base area ground truth information corresponding to each of the collected images, and the roof area and the offset respectively obtained for each of the collected images, comprises:
    针对所述各采集图像中的每帧图像，利用所述图像对应的所述偏移量，对所述图像对应的底座区域真值信息进行平移，得到所述图像对应的第一屋顶区域真值信息；For each frame of image in the collected images, using the offset corresponding to the image, the base area ground truth information corresponding to the image is translated to obtain the first roof area ground truth information corresponding to the image;
    基于所述图像对应的所述第一屋顶区域真值信息与针对所述图像获得的所述屋顶区域,得到所述图像对应的屋顶区域损失信息;Obtaining roof area loss information corresponding to the image based on the ground truth information of the first roof area corresponding to the image and the roof area obtained for the image;
    基于所述各采集图像分别对应的所述屋顶区域损失信息,通过反向传播调整所述屋顶区域提取网络的网络参数。Based on the roof area loss information respectively corresponding to the collected images, network parameters of the roof area extraction network are adjusted through back propagation.
  6. 根据权利要求4或5所述的方法，所述基于所述各采集图像分别对应的底座位置真值信息，以及针对所述各采集图像分别获得的所述屋顶位置与所述偏移量，调整所述屋顶位置提取网络和所述偏移量提取网络的网络参数，包括：The method according to claim 4 or 5, wherein the adjusting the network parameters of the roof position extraction network and the offset extraction network based on the base position ground truth information corresponding to each of the collected images, and the roof position and the offset respectively obtained for each of the collected images, comprises:
    针对所述各采集图像中的每帧图像,利用所述图像对应的所述偏移量,对所述图像对应的所述屋顶位置进行平移,获得所述图像对应的底座位置;For each frame of image in the collected images, using the offset corresponding to the image, the position of the roof corresponding to the image is translated to obtain the position of the base corresponding to the image;
    基于所述图像对应的底座位置真值信息以及针对所述图像获得的所述底座位置,得到所述图像对应的底座位置损失信息;Obtaining base position loss information corresponding to the image based on the base position truth information corresponding to the image and the base position obtained for the image;
    基于所述各采集图像分别对应的所述底座位置损失信息,通过反向传播调整所述屋顶位置提取网络和所述偏移量提取网络的网络参数。Based on the base position loss information respectively corresponding to the acquired images, network parameters of the roof position extraction network and the offset extraction network are adjusted through back propagation.
  7. 根据权利要求4至6任一所述的方法,所述屋顶区域提取网络、偏移量提取网络与所述屋顶位置提取网络共享特征提取网络。The method according to any one of claims 4 to 6, wherein the roof area extraction network, the offset extraction network and the roof position extraction network share a feature extraction network.
  8. 根据权利要求7所述的方法,所述训练样本集的至少部分采集图像还标注了第二屋顶区域真值信息,真实偏移量以及屋顶位置真值信息;According to the method according to claim 7, at least part of the collected images of the training sample set are also marked with the second roof area true value information, the real offset and the roof position true value information;
    所述方法还包括如下至少一项:The method also includes at least one of the following:
    基于所述至少部分采集图像标注的所述第二屋顶区域真值信息以及针对所述至少部分采集图像获得的屋顶区域,调整所述屋顶区域提取网络的网络参数;adjusting network parameters of the roof region extraction network based on the ground truth information of the second roof region marked on the at least part of the captured image and the roof region obtained for the at least part of the collected image;
    基于所述至少部分采集图像标注的所述真实偏移量以及针对所述至少部分采集图像获得的偏移量,调整所述偏移量提取网络的网络参数;adjusting network parameters of the offset extraction network based on the real offset marked by the at least partially captured image and the offset obtained for the at least partially captured image;
    基于所述至少部分采集图像标注的所述屋顶位置真值信息以及针对所述至少部分采集图像获得的屋顶位置,调整所述屋顶位置提取网络的网络参数。Adjusting network parameters of the roof position extraction network based on the roof position ground truth information marked on the at least part of the captured image and the roof position obtained for the at least part of the captured image.
  9. 根据权利要求8所述的方法,所述至少部分采集图像还标注了建筑物边框真值信息;所述方法还包括:According to the method according to claim 8, said at least part of the captured image is also labeled with the true value information of the building border; said method also includes:
    利用所述建筑物底座提取网络包括的建筑物边框提取网络,提取所述至少部分采集图像对应的建筑物边框;其中,所述建筑物边框提取网络包括所述特征提取网络;Using the building frame extraction network included in the building base extraction network to extract the building frame corresponding to the at least part of the captured image; wherein the building frame extraction network includes the feature extraction network;
    基于所述至少部分采集图像标注的建筑物边框真值信息与针对所述至少部分采集图像获得的所述建筑物边框,调整所述建筑物边框提取网络的网络参数。Adjusting network parameters of the building frame extraction network based on the ground truth information of the building frame marked on the at least part of the captured image and the building frame obtained for the at least part of the captured image.
  10. 根据权利要求4至9任一所述的方法,还包括:The method according to any one of claims 4 to 9, further comprising:
    利用所述训练样本集中标注了第二屋顶区域真值信息,真实偏移量以及屋顶位置真值信息的采集图像,对所述建筑物底座提取网络进行预训练。Pre-training is performed on the building base extraction network by using the collected images marked with the true value information of the second roof area, the real offset and the true value information of the roof position in the training sample set.
  11. 根据权利要求4至10任一所述的方法,所述训练样本集中的采集图像标注有第一真实偏移量;所述方法还包括:According to the method according to any one of claims 4 to 10, the collected images in the training sample set are marked with the first real offset; the method also includes:
    利用所述偏移量提取网络从多个旋转图像，获得与多种预设角度分别对应的第二预测偏移量；所述第二预测偏移量指示所述旋转图像中屋顶区域与底座区域之间的偏移量；所述多个旋转图像通过将所述采集图像分别旋转所述多种预设角度而得到；Using the offset extraction network to obtain, from a plurality of rotated images, second predicted offsets respectively corresponding to a plurality of preset angles; the second predicted offset indicates the offset between the roof area and the base area in the rotated image; the plurality of rotated images are obtained by rotating the collected image by the respective preset angles;
    将所述第一真实偏移量分别旋转所述多种预设角度,得到与所述多种预设角度分别对应的第二真实偏移量;Rotating the first real offset by the multiple preset angles to obtain second real offsets respectively corresponding to the multiple preset angles;
    基于与所述多种预设角度分别对应的所述第二真实偏移量和所述第二预测偏移量,调整所述偏移量提取网络的网络参数。Adjusting network parameters of the offset extraction network based on the second real offset and the second predicted offset respectively corresponding to the various preset angles.
  12. 根据权利要求11所述的方法,所述利用偏移量提取网络从多个旋转图像,获得与多种预设角度分别对应的第二预测偏移量,包括:The method according to claim 11, said using the offset extraction network to obtain the second predicted offset corresponding to various preset angles respectively from multiple rotated images, comprising:
    针对所述多种预设角度中的每一预设角度,For each preset angle in the plurality of preset angles,
    利用所述偏移量提取网络,将所述采集图像对应的第一图像特征旋转所述预设角度,得到与所述预设角度对应的第二图像特征;Using the offset extraction network to rotate the first image feature corresponding to the acquired image by the preset angle to obtain a second image feature corresponding to the preset angle;
    基于所述第二图像特征,得到与所述预设角度对应的第二预测偏移量。Based on the second image feature, a second predicted offset corresponding to the preset angle is obtained.
  13. 一种图像处理方法,包括:An image processing method, comprising:
    接收待处理遥感图像;Receive remote sensing images to be processed;
    利用建筑物底座提取网络，提取所述待处理遥感图像中的建筑物屋顶区域以及偏移量；其中，所述建筑物底座提取网络通过如权利要求1至12任一所述的神经网络训练方法训练得到，所述偏移量表征屋顶区域与底座区域之间的偏移量；Using a building base extraction network to extract the building roof area and the offset from the remote sensing image to be processed; wherein the building base extraction network is trained by the neural network training method according to any one of claims 1 to 12, and the offset characterizes the offset between the roof area and the base area;
    利用所述偏移量对所述屋顶区域进行平移变换,得到所述待处理遥感图像对应的建筑物底座区域。The translation transformation is performed on the roof area by using the offset to obtain the building base area corresponding to the remote sensing image to be processed.
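The final translation step can be sketched as shifting a binary roof mask by the predicted offset. A minimal illustration, assuming an integer pixel offset `(dy, dx)`; pixels shifted outside the image are dropped and vacated pixels stay zero. `translate_mask` is a hypothetical helper, not the network's actual post-processing.

```python
import numpy as np

def translate_mask(mask, offset):
    """Translate a binary roof mask by (dy, dx) pixels to approximate
    the building base area (claim 13 sketch, integer offsets assumed)."""
    dy, dx = offset
    out = np.zeros_like(mask)
    h, w = mask.shape
    ys, xs = np.nonzero(mask)                     # roof pixel coordinates
    ys2, xs2 = ys + dy, xs + dx                   # shifted coordinates
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1                 # base pixels inside image
    return out

roof = np.zeros((5, 5), dtype=np.uint8)
roof[1:3, 1:3] = 1                       # 2x2 roof patch
base = translate_mask(roof, (1, 1))      # roof-to-base offset of (1, 1)
```

In practice the predicted offset would be continuous and the mask shift implemented with a differentiable or subpixel warp, but the geometric idea is the same.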
  14. A neural network training apparatus, comprising:
    an acquisition module, configured to acquire, for each of a plurality of areas, one or more frames of collected images corresponding to the area, wherein, in a case where the area corresponds to multiple frames of collected images, at least two frames of the collected images have different collection angles;
    a first labeling module, configured to label base area ground-truth information on one frame of the collected images corresponding to the area, the frame serving as a target collected image corresponding to the area; and
    a first determination module, configured to determine the base area ground-truth information labeled on the target collected image corresponding to the area as the base area ground-truth information of each frame of the collected images corresponding to the area, and to obtain a training sample set based on the collected images and the target collected images respectively corresponding to the plurality of areas, so as to perform neural network training based on the training sample set.
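The annotation-sharing idea in the apparatus above can be sketched as propagating one base-area label per area to every frame of that area. All names here are hypothetical; the data structures simply stand in for the modules' inputs and outputs.

```python
def build_training_samples(area_images, area_annotations):
    """Propagate the single base-area annotation of each area's target
    image to all collected frames of that area (claim 14 sketch).

    area_images: dict mapping area id -> list of frame ids, where frames
        of one area may have different collection angles.
    area_annotations: dict mapping area id -> base-area ground truth,
        annotated once on the area's target collected image.
    """
    samples = []
    for area_id, frames in area_images.items():
        ground_truth = area_annotations[area_id]   # annotated once, reused
        samples.extend((frame, ground_truth) for frame in frames)
    return samples

samples = build_training_samples(
    {"area1": ["area1_view0", "area1_view1"]},
    {"area1": "base_polygon_area1"},
)
```

Because the base does not move between viewing angles, one annotation can supervise every frame of the same area, which is what makes this labeling scheme cheap.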
  15. An image processing apparatus, comprising:
    a receiving module, configured to receive a remote sensing image to be processed;
    an extraction module, configured to extract, using a building base extraction network, a building roof area and an offset in the remote sensing image to be processed, wherein the building base extraction network is trained by the neural network training method according to any one of claims 1 to 12, and the offset characterizes the offset between the roof area and the base area; and
    a translation module, configured to perform a translation transformation on the roof area using the offset to obtain a building base area corresponding to the remote sensing image to be processed.
  16. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor runs the executable instructions to implement the neural network training method according to any one of claims 1 to 12 and/or the image processing method according to claim 13.
  17. A computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the neural network training method according to any one of claims 1 to 12 and/or the image processing method according to claim 13.
PCT/CN2021/137544 2021-05-31 2021-12-13 Methods for neural network training and image processing, apparatus, device and storage medium WO2022252558A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110602248.5A CN113344180A (en) 2021-05-31 2021-05-31 Neural network training and image processing method, device, equipment and storage medium
CN202110602248.5 2021-05-31

Publications (1)

Publication Number Publication Date
WO2022252558A1 true WO2022252558A1 (en) 2022-12-08

Family

ID=77473204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137544 WO2022252558A1 (en) 2021-05-31 2021-12-13 Methods for neural network training and image processing, apparatus, device and storage medium

Country Status (3)

Country Link
CN (1) CN113344180A (en)
TW (1) TW202248910A (en)
WO (1) WO2022252558A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911501A (en) * 2024-03-20 2024-04-19 陕西中铁华博实业发展有限公司 High-precision positioning method for metal processing drilling
CN117911501B (en) * 2024-03-20 2024-06-04 陕西中铁华博实业发展有限公司 High-precision positioning method for metal processing drilling

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344180A (en) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Neural network training and image processing method, device, equipment and storage medium
CN115096375B (en) * 2022-08-22 2022-11-04 启东亦大通自动化设备有限公司 Carrier roller running state monitoring method and device based on carrier roller carrying trolley detection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200098141A1 (en) * 2018-09-21 2020-03-26 Revive AI, Inc. Systems and methods for home improvement visualization
CN111931836A (en) * 2020-07-31 2020-11-13 上海商米科技集团股份有限公司 Method and device for acquiring neural network training image
CN112149585A (en) * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, device, equipment and storage medium
CN112232425A (en) * 2020-10-21 2021-01-15 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN112329559A (en) * 2020-10-22 2021-02-05 空间信息产业发展股份有限公司 Method for detecting homestead target based on deep convolutional neural network
CN113344180A (en) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Neural network training and image processing method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991491A (en) * 2019-11-12 2020-04-10 苏州智加科技有限公司 Image labeling method, device, equipment and storage medium


Also Published As

Publication number Publication date
TW202248910A (en) 2022-12-16
CN113344180A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
Zhang et al. Jaguar: Low latency mobile augmented reality with flexible tracking
Brachmann et al. Visual camera re-localization from RGB and RGB-D images using DSAC
WO2022252558A1 (en) Methods for neural network training and image processing, apparatus, device and storage medium
Cheng et al. Panoptic-deeplab
WO2022252557A1 (en) Neural network training method and apparatus, image processing method and apparatus, device, and storage medium
WO2022062543A1 (en) Image processing method and apparatus, device and storage medium
CN109785298B (en) Multi-angle object detection method and system
US20120011119A1 (en) Object recognition system with database pruning and querying
US9984301B2 (en) Non-matching feature-based visual motion estimation for pose determination
CN102959946A (en) Augmenting image data based on related 3d point cloud data
JP2023516500A (en) Systems and methods for image-based location determination
WO2022141718A1 (en) Method and system for assisting point cloud-based object detection
CN112434618A (en) Video target detection method based on sparse foreground prior, storage medium and equipment
CN116935332A (en) Fishing boat target detection and tracking method based on dynamic video
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
Liao et al. Se-calib: Semantic edges based lidar-camera boresight online calibration in urban scenes
Tang et al. Fast multidirectional vehicle detection on aerial images using region based convolutional neural networks
Di et al. A unified framework for piecewise semantic reconstruction in dynamic scenes via exploiting superpixel relations
CN112200303B (en) Laser radar point cloud 3D target detection method based on context-dependent encoder
Wang et al. Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
US20230089845A1 (en) Visual Localization Method and Apparatus
Chaturvedi et al. Small object detection using retinanet with hybrid anchor box hyper tuning using interface of Bayesian mathematics
CN114077892A (en) Human body skeleton sequence extraction and training method, device and storage medium
GB2592583A (en) Aligning images
Paul et al. Machine learning advances aiding recognition and classification of Indian monuments and landmarks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21943910

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21943910

Country of ref document: EP

Kind code of ref document: A1