WO2022252557A1 - Neural network training method and apparatus, image processing method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2022252557A1
Authority
WO
WIPO (PCT)
Prior art keywords
offset
image
extraction network
network
preset angles
Prior art date
Application number
PCT/CN2021/137532
Other languages
French (fr)
Chinese (zh)
Inventor
王金旺
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2022252557A1 publication Critical patent/WO2022252557A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a neural network training method and apparatus, an image processing method and apparatus, a device, and a storage medium.
  • using the offset extraction network to obtain, from the multiple second sample images, second predicted offsets respectively corresponding to multiple preset angles includes: for each of the multiple preset angles, using the offset extraction network to rotate a first image feature corresponding to the first sample image by the preset angle to obtain a second image feature corresponding to the preset angle; and obtaining, based on the second image feature, a second predicted offset corresponding to the preset angle.
  • the spatial transformation network includes a sampler that performs image rotation based on interpolation; the sampler includes a sampling grid determined based on the preset angle corresponding to the spatial transformation network; the sampling grid characterizes the pixel correspondence between the first image feature and the second image feature;
  • using the spatial transformation network to rotate the first image feature by a preset angle to obtain the second image feature corresponding to the preset angle includes: using the sampler and the sampling grid to determine, in the first image feature, a plurality of pixels corresponding to each pixel in the second image feature, and mapping the pixel values of the plurality of pixels based on interpolation to obtain the pixel value of each pixel in the second image feature.
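As a concrete illustration of the sampler described above, the sketch below rotates a small feature map with a precomputed sampling grid and bilinear interpolation. This is a hedged toy example in NumPy, not the patented implementation; all function names are hypothetical.

```python
import numpy as np

def make_rotation_grid(h, w, angle_deg):
    """For each output pixel, compute the (y, x) source coordinate in the
    input feature, rotating about the feature-map center (the per-angle
    sampling grid is fixed and can be precomputed)."""
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: each output coordinate is traced back into the input.
    src_y = cy + (ys - cy) * np.cos(theta) - (xs - cx) * np.sin(theta)
    src_x = cx + (ys - cy) * np.sin(theta) + (xs - cx) * np.cos(theta)
    return src_y, src_x

def bilinear_sample(feat, grid):
    """Map each output pixel from up to four neighboring input pixels."""
    src_y, src_x = grid
    h, w = feat.shape
    y0 = np.clip(np.floor(src_y).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(src_x).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(src_y, 0, h - 1) - y0
    wx = np.clip(src_x, 0, w - 1) - x0
    return (feat[y0, x0] * (1 - wy) * (1 - wx) +
            feat[y0, x1] * (1 - wy) * wx +
            feat[y1, x0] * wy * (1 - wx) +
            feat[y1, x1] * wy * wx)

feat = np.arange(16, dtype=float).reshape(4, 4)   # toy "first image feature"
grid = make_rotation_grid(4, 4, 90)               # grid fixed per preset angle
rotated = bilinear_sample(feat, grid)             # "second image feature"
```

Because the grid and the interpolation are smooth functions of the inputs, this sampling is differentiable, which is the property that lets gradients flow through the rotation during training.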
  • adjusting the network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles includes: obtaining offset loss information respectively corresponding to the multiple preset angles according to the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles; and adjusting the network parameters of the offset extraction network based on the offset loss information respectively corresponding to the multiple preset angles.
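A minimal sketch of the per-angle loss aggregation just described. The disclosure does not fix the loss form, so a smooth-L1 offset loss is assumed here; the helper names are illustrative only.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Elementwise smooth-L1 (assumed loss form), summed over components."""
    d = np.abs(np.asarray(pred, float) - np.asarray(target, float))
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).sum()

def total_offset_loss(preds_by_angle, gts_by_angle):
    """One loss term per preset angle (predicted vs. rotated ground truth),
    aggregated into a single scalar for backpropagation."""
    return sum(smooth_l1(preds_by_angle[a], gts_by_angle[a])
               for a in gts_by_angle)

preds = {0: (2.1, -1.0), 90: (1.0, 2.0)}   # second predicted offsets
gts   = {0: (2.0, -1.0), 90: (1.0, 2.2)}   # second real offsets
loss = total_offset_loss(preds, gts)
```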
  • the method for generating the training sample set used to train the offset extraction network includes: for each of a plurality of regions, acquiring one or more frames of original sample images corresponding to the region, where, when a region corresponds to multiple frames of original sample images, at least two of those frames have different acquisition angles; taking one frame of the original sample images corresponding to the region as the first sample image corresponding to the region and annotating it with base area ground-truth information; determining the base area ground-truth information annotated in the first sample image corresponding to the region as the base area ground-truth information of each frame of the original sample images corresponding to the region; and obtaining the training sample set based on the original sample images and the first sample images respectively corresponding to the plurality of regions.
  • the present disclosure also proposes an image processing device, including: an acquisition module configured to acquire a first target image to be processed; an offset acquisition module configured to use an offset extraction network to obtain, from a plurality of second target images, second offsets respectively corresponding to multiple preset angles, where the offset extraction network includes a network trained by the neural network training method shown in any of the foregoing embodiments, each second offset indicates an offset between a roof and a base in a second target image, and the plurality of second target images are obtained by rotating the first target image by the multiple preset angles respectively; a reverse rotation module configured to reversely rotate, for each of the multiple preset angles, the second offset corresponding to the angle to obtain a reverse second offset corresponding to the angle; and a fusion module configured to fuse the reverse second offsets respectively corresponding to the multiple preset angles to obtain a first offset corresponding to the first target image.
  • an acquisition module configured to acquire a first target image to be processed
  • an offset acquisition module configured to use an offset extraction network to obtain, from a plurality of second target images, second offsets respectively corresponding to multiple preset angles
  • the device further includes: a roof area obtaining module configured to use a roof area extraction network included in a base area extraction network to obtain a roof area in the first target image, where the base area extraction network also includes the offset extraction network and is trained by the neural network training method shown in the foregoing embodiments; and a translation module configured to perform a translation transformation on the obtained roof area using the first offset corresponding to the first target image, to obtain a base area corresponding to the first target image.
  • a roof area obtaining module configured to use the roof area extraction network included in the base area extraction network to obtain the roof area in the first target image
  • the base area extraction network also includes the offset extraction network
  • the base area extraction network is trained by using the neural network training method shown in the foregoing embodiment
  • the translation module is configured to perform a translation transformation on the obtained roof area using the first offset corresponding to the first target image, to obtain a base area corresponding to the first target image.
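The translation step described in the bullets above can be sketched as follows. This is an illustrative NumPy version that shifts a binary roof mask by an integer (dx, dy) offset, not the device's actual code; pixels shifted outside the image are simply dropped.

```python
import numpy as np

def translate_mask(mask, dx, dy):
    """Shift a 2D binary mask by dx along x (columns) and dy along y (rows)."""
    base = np.zeros_like(mask)
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    # Keep only pixels that remain inside the image after the shift.
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    base[ys2[keep], xs2[keep]] = 1
    return base

roof = np.zeros((5, 5), dtype=int)
roof[1:3, 1:3] = 1                       # a 2x2 "roof" region
base = translate_mask(roof, dx=1, dy=2)  # predicted offset (1, 2)
```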
  • when the sample image is rotated by an angle, the offset is also rotated by that angle.
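The point above — rotating a sample image rotates its roof-to-base offset by the same angle — is what turns one labeled offset into several per-angle ground truths. A hedged sketch using a plain 2D rotation matrix (a hypothetical helper; image-coordinate conventions may differ in practice):

```python
import math

def rotate_offset(dx, dy, angle_deg):
    """Rotate an (x, y) offset vector by angle_deg degrees."""
    t = math.radians(angle_deg)
    return (dx * math.cos(t) - dy * math.sin(t),
            dx * math.sin(t) + dy * math.cos(t))

# For the 90-degree multiples used later (0/90/180/270), the rotation just
# permutes and negates the two components, so no precision is lost.
second_gt = {a: rotate_offset(3.0, 4.0, a) for a in (0, 90, 180, 270)}
```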
  • the effect of expanding the sample images together with their real offsets can be achieved. In this way, a small amount of data labeled with offsets can be used to train a high-precision offset extraction network.
  • the rotation process of the sample image can be placed inside the offset extraction network, so that sample image rotation is performed within the offset extraction network without affecting the training of the other branches of the comprehensive network, that is, without affecting the convergence speed of the other branches, which improves network training efficiency.
  • the comprehensive network includes the offset extraction network.
  • STN (Spatial Transformer Network)
  • the rotation process becomes differentiable, so that the gradient can be backpropagated normally, and the offset extraction network can then be trained directly.
  • Fig. 1 is a method flowchart of a neural network training method shown in the present disclosure
  • Fig. 2a is a schematic diagram of an offset shown in the present disclosure
  • FIG. 3 is a schematic diagram of a building base extraction process shown in the present disclosure
  • FIG. 6 is a schematic diagram of an offset extraction network training process shown in the present disclosure.
  • the first sample image may refer to a remote sensing image marked with a first real offset.
  • the offset refers to the offset between the roof and the base in the image.
  • the roof includes 10 pixels, and the base can be obtained by translating the 10 pixels according to the offset.
  • the first real offset may be information indicating the real offset between the roof and the base of the building in the first sample image.
  • the first real offset may be information in the form of (x, y) vector.
  • x and y represent the offsets of the pixel points in the roof region and the corresponding pixel points in the base region in the x-axis and y-axis directions, respectively.
  • the offset may be marked in advance according to the actual offset between the roof and the base of the building in the first sample image. The present disclosure does not specifically limit the labeling manner of the offset.
  • a Mask R-CNN, which has higher accuracy in region representation, can be used.
  • the Mask R-CNN may include an RPN (Region Proposal Network), an RoI Align (Region of Interest Align) unit, etc.
  • the preset angle can be set according to business requirements.
  • the number of preset angles can be determined according to the sample size that needs to be expanded. For example, if a large number of samples need to be expanded, a large number of preset angles can be set.
  • the present disclosure does not specifically limit the value and quantity of the preset angles.
  • the various preset angles are used to rotate the sample image or image features corresponding to the sample image.
  • in order to facilitate the training of the offset extraction network, the spatial transformation network can be used to rotate the image, so that the rotation process becomes differentiable, the gradient can be backpropagated normally, and the network can be trained directly.
  • FIG. 6 is a schematic diagram of an offset extraction network training process shown in the present disclosure.
  • the offset expansion unit may include 4 STN branches. As shown in FIG. 6 , the offset expansion unit can use the STN to respectively rotate the first image feature F0 by 0 degrees, 90 degrees, 180 degrees and 270 degrees to obtain the corresponding second image features F1-F4.
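For the four branches in FIG. 6 specifically, rotation by multiples of 90 degrees permutes pixels exactly and needs no interpolation; a toy NumPy sketch (the STN sampler generalizes this to arbitrary angles, and the names F0-F4 follow the figure):

```python
import numpy as np

F0 = np.random.rand(8, 8)  # first image feature (toy single-channel size)

# One rotated copy per branch: 0, 90, 180, 270 degrees.
branches = {a: np.rot90(F0, k=a // 90) for a in (0, 90, 180, 270)}
F1, F2, F3, F4 = (branches[a] for a in (0, 90, 180, 270))
```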
  • the offset expansion unit can also use a classifier to process the second image features F1-F4 to obtain the second predicted offsets corresponding to rotating the first sample image by 0 degrees, 90 degrees, 180 degrees and 270 degrees.
  • the classifier may include multiple convolutional layers, fully connected layers, and mapping units. In some implementation manners, in order to simplify the network structure, parameters of at least some convolutional layers and fully connected layers in multiple classifiers may be shared.
  • the first sample image is also marked with real roof area information.
  • the sample size can be expanded by rotating the sample images and the real offsets, so as to achieve the effect of training a high-precision offset extraction network using a small amount of labeled data.
  • the building frame information is introduced. Since the three extraction networks for the roof area, the offset and the building frame share the feature extraction network, on the one hand the three extraction networks can be associated with one another: through the shared feature extraction network, the supervision information of each task can be shared and the convergence of the network accelerated, achieving the effect of training a high-precision base area extraction network with a small amount of labeled data; on the other hand, the roof area and offset extraction networks can perceive the complete building area features, thereby improving extraction performance.
  • the first image features corresponding to each first sample image can be rotated by the STNs corresponding to 0 degrees, 90 degrees, 180 degrees and 270 degrees respectively, and offsets extracted from them, to obtain the second predicted offsets corresponding to each first sample image after rotation by 0 degrees, 90 degrees, 180 degrees and 270 degrees respectively.
  • the sample size can be expanded by rotating the sample image and the real offset, so as to achieve the effect of using a small amount of labeled data to train a high-precision offset extraction branch.
  • the rotation transformation of image features can be performed in the offset extraction branch without affecting the training of other branches, which improves the efficiency of network training.
  • the joint training method enables the network to learn various kinds of information, and the training of the branches supervises and promotes one another, which improves network training efficiency and achieves the effect of training a high-precision base area extraction network with a small amount of labeled data.
  • feature extraction networks such as the shared backbone network can extract features that are more beneficial to base region extraction, thereby improving the accuracy of base region extraction.
  • FIG. 8 is a method flowchart of an image processing method shown in the present disclosure. As shown in Figure 8, the method may include:
  • the present disclosure also proposes a neural network training device 90 .
  • FIG. 9 is a schematic structural diagram of a neural network training device shown in the present disclosure.
  • a rotation module 92 configured to rotate the first real offset by the various preset angles to obtain second real offsets respectively corresponding to the various preset angles;
  • the adjustment module 93 is configured to adjust network parameters of the offset extraction network based on the second real offset and the second predicted offset respectively corresponding to the various preset angles.
  • the spatial transformation network includes a sampler that performs image rotation based on interpolation; the sampler includes a sampling grid determined based on the preset angle corresponding to the spatial transformation network;
  • the sampling grid characterizes the pixel correspondence between the first image feature and the second image feature.
  • the obtaining module is configured to: use the sampler and the sampling grid to determine, in the first image feature, a plurality of pixels corresponding to each pixel in the second image feature, and map the pixel values of the plurality of pixels based on interpolation to obtain the pixel value of each pixel in the second image feature.
  • the adjusting module 93 is configured to: obtain offset loss information respectively corresponding to the multiple preset angles according to the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles; and adjust network parameters of the offset extraction network based on the offset loss information respectively corresponding to the multiple preset angles.
  • the device 90 further includes: a sample expansion module configured to acquire, for each of a plurality of regions, one or more frames of original sample images corresponding to the region, where, when a region corresponds to multiple frames of original sample images, at least two of those frames have different acquisition angles; take one frame of the original sample images corresponding to the region as the first sample image corresponding to the region and annotate it with base area ground-truth information; determine the base area ground-truth information annotated in the first sample image corresponding to the region as the base area ground-truth information of each frame of the original sample images corresponding to the region; and obtain the training sample set based on the original sample images and the first sample images respectively corresponding to the plurality of regions.
  • a sample expansion module configured to acquire, for each of a plurality of regions, one or more frames of original sample images corresponding to the region; where, when a region corresponds to multiple frames of original sample images, at least two frames of the original sample images have different acquisition angles; one frame of the original sample image corresponding to the region
  • the present disclosure further proposes an image processing device.
  • the device may include: an acquisition module configured to acquire a first target image to be processed; an offset acquisition module configured to use the offset extraction network to obtain, from a plurality of second target images, second offsets respectively corresponding to multiple preset angles, where the offset extraction network includes a network trained by the neural network training method shown in any of the foregoing embodiments, each second offset indicates an offset between a roof and a base in a second target image, and the plurality of second target images are obtained by rotating the first target image by the multiple preset angles respectively; a reverse rotation module configured, for each of the multiple preset angles, to reversely rotate the second offset corresponding to that angle to obtain a reverse second offset corresponding to the angle; and a fusion module configured to fuse the reverse second offsets respectively corresponding to the multiple preset angles to obtain a first offset corresponding to the first target image.
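At inference, the per-angle predictions are rotated back and fused as described above. The sketch below assumes simple averaging as the fusion rule, since the text does not specify one; the helper names are hypothetical.

```python
import math

def rotate(dx, dy, angle_deg):
    """Rotate an (x, y) offset vector by angle_deg degrees."""
    t = math.radians(angle_deg)
    return (dx * math.cos(t) - dy * math.sin(t),
            dx * math.sin(t) + dy * math.cos(t))

def fuse_offsets(second_offsets):
    """second_offsets: {angle: (dx, dy)} predicted on the rotated images.
    Rotate each prediction back by its angle, then average (assumed fusion)."""
    back = [rotate(dx, dy, -a) for a, (dx, dy) in second_offsets.items()]
    n = len(back)
    return (sum(v[0] for v in back) / n, sum(v[1] for v in back) / n)

# Toy predictions that all agree on (3, 4) once rotated back.
preds = {0: (3.0, 4.0), 90: (-4.0, 3.0), 180: (-3.0, -4.0), 270: (4.0, -3.0)}
first_offset = fuse_offsets(preds)
```

Averaging the back-rotated predictions is one natural choice because it cancels independent per-angle errors; a median or learned fusion would fit the same interface.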


Abstract

The present application provides a neural network training method and apparatus, an image processing method and apparatus, a device, and a storage medium. The method may comprise: obtaining, from a plurality of second sample images by means of an offset extraction network, second predicted offsets respectively corresponding to a plurality of preset angles, each second predicted offset indicating an offset between a roof and a base in a second sample image, a first sample image being labeled with a first real offset, and the plurality of second sample images being obtained by rotating the first sample image by the plurality of preset angles respectively; rotating the first real offset by the plurality of preset angles to obtain second real offsets respectively corresponding to the plurality of preset angles; and adjusting network parameters of the offset extraction network on the basis of the second real offsets and second predicted offsets respectively corresponding to the plurality of preset angles.

Description

Neural network training method and apparatus, image processing method and apparatus, device, and storage medium
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 2021106022362, filed on May 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a neural network training method and apparatus, an image processing method and apparatus, a device, and a storage medium.
Background
With the gradual increase of the urbanization rate, timely statistics on buildings are required for tasks such as urban planning, map drawing, and building change monitoring.
At present, building statistics are mainly performed by counting building bases. When counting building bases, it is necessary to first use an offset extraction network, generated based on a neural network, to extract an offset characterizing the offset between the roof and the base, and to use a roof area extraction network to extract the building roof; the roof is then transformed using the offset to obtain the base.
However, the cost of data annotation is high, so a large number of annotated samples including real offsets cannot be obtained, and it is difficult to train a high-precision offset extraction network with only a small number of annotated samples.
Summary of the Invention
In view of this, the present disclosure discloses at least a neural network training method. The method may include: using an offset extraction network to obtain, from a plurality of second sample images, second predicted offsets respectively corresponding to multiple preset angles, where each second predicted offset indicates an offset between a roof and a base in a second sample image, the first sample image is annotated with a first real offset, and the plurality of second sample images are obtained by rotating the first sample image by the multiple preset angles respectively; rotating the first real offset by the multiple preset angles respectively to obtain second real offsets respectively corresponding to the multiple preset angles; and adjusting network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles.
In some illustrated embodiments, using the offset extraction network to obtain, from the plurality of second sample images, the second predicted offsets respectively corresponding to the multiple preset angles includes: for each of the multiple preset angles, using the offset extraction network to rotate a first image feature corresponding to the first sample image by the preset angle to obtain a second image feature corresponding to the preset angle; and obtaining, based on the second image feature, a second predicted offset corresponding to the preset angle.
In some illustrated embodiments, using the offset extraction network to rotate the first image feature corresponding to the first sample image by the preset angle to obtain the second image feature corresponding to the preset angle includes: using a spatial transformation network, included in the offset extraction network and corresponding to the preset angle, to rotate the first image feature by the preset angle to obtain the second image feature corresponding to the preset angle.
In some illustrated embodiments, the spatial transformation network includes a sampler that performs image rotation based on interpolation, where the sampler includes a sampling grid determined based on the preset angle corresponding to the spatial transformation network, and the sampling grid characterizes the pixel correspondence between the first image feature and the second image feature. Using the spatial transformation network to rotate the first image feature by the preset angle to obtain the second image feature corresponding to the preset angle includes: using the sampler and the sampling grid to determine, in the first image feature, a plurality of pixels corresponding to each pixel in the second image feature, and mapping the pixel values of the plurality of pixels based on interpolation to obtain the pixel value of each pixel in the second image feature.
In some illustrated embodiments, adjusting the network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles includes: obtaining offset loss information respectively corresponding to the multiple preset angles according to the second real offsets and the second predicted offsets respectively corresponding to the multiple preset angles; and adjusting the network parameters of the offset extraction network based on the offset loss information respectively corresponding to the multiple preset angles.
In some illustrated embodiments, the first sample image is also annotated with real roof area information, and the method further includes: using a roof area extraction network to obtain roof area prediction information in the first sample image, where the roof area extraction network and the offset extraction network share a feature extraction network and belong to the same base area extraction network, and the base area extraction network is used to obtain a base area based on the obtained roof area and offset; and training the roof area extraction network based on the real roof area information and the roof area prediction information.
In some illustrated embodiments, the base area extraction network includes a building frame extraction network, the building frame extraction network includes the feature extraction network, and the first sample image is also annotated with real building frame information. The method further includes: using the building frame extraction network to obtain building frame prediction information in the first sample image; and training the building frame extraction network based on the real building frame information and the building frame prediction information.
In some illustrated embodiments, a method for generating the training sample set used to train the offset extraction network includes: for each of a plurality of regions, acquiring one or more frames of original sample images corresponding to the region, where, when a region corresponds to multiple frames of original sample images, at least two of those frames have different acquisition angles; taking one frame of the original sample images corresponding to the region as the first sample image corresponding to the region and annotating it with base area ground-truth information; determining the base area ground-truth information annotated in the first sample image corresponding to the region as the base area ground-truth information of each frame of the original sample images corresponding to the region; and obtaining the training sample set based on the original sample images and the first sample images respectively corresponding to the plurality of regions.
The present disclosure proposes an image processing method, including: acquiring a first target image to be processed; using an offset extraction network to obtain, from a plurality of second target images, second offsets respectively corresponding to multiple preset angles, where the offset extraction network includes a network trained by the neural network training method shown in any of the foregoing embodiments, each second offset indicates an offset between a roof and a base in a second target image, and the plurality of second target images are obtained by rotating the first target image by the multiple preset angles respectively; for each of the multiple preset angles, reversely rotating the second offset corresponding to the angle to obtain a reverse second offset corresponding to the angle; and fusing the reverse second offsets respectively corresponding to the multiple preset angles to obtain a first offset corresponding to the first target image.
In some illustrated embodiments, the method further includes: using a roof area extraction network included in a base area extraction network to obtain a roof area in the first target image, where the base area extraction network also includes the offset extraction network and is trained by the neural network training method shown in the foregoing embodiments; and performing a translation transformation on the obtained roof area using the first offset corresponding to the first target image, to obtain a base area corresponding to the first target image.
本公开还提出一种神经网络训练装置,包括:获得模块,用于利用偏移量提取网络从多个第二样本图像,获得与多种预设角度分别对应的第二预测偏移量;所述第二预测偏移量指示第二样本图像中屋顶与底座之间的偏移量;所述多个第二样本图像通过将第一样本图像分别旋转所述多种预设角度而得到;所述第一样本图像标注有第一真实偏移量;旋转模块,用于将所述第一真实偏移量分别旋转所述多种预设角度,得到与所述多 种预设角度分别对应的第二真实偏移量;调整模块,用于基于与所述多种预设角度分别对应的第二真实偏移量和所述第二预测偏移量,调整所述偏移量提取网络的网络参数。The present disclosure also proposes a neural network training device, including: an obtaining module, configured to use an offset extraction network to obtain second predicted offsets respectively corresponding to various preset angles from a plurality of second sample images; The second predicted offset indicates the offset between the roof and the base in the second sample image; the plurality of second sample images are obtained by rotating the first sample image by the various preset angles; The first sample image is marked with a first real offset; the rotation module is used to rotate the first real offset by the various preset angles respectively to obtain the various preset angles respectively A corresponding second real offset; an adjustment module, configured to adjust the offset extraction network based on the second real offset and the second predicted offset respectively corresponding to the various preset angles network parameters.
本公开还提出一种图像处理装置，包括：获取模块，用于获取待处理的第一目标图像；偏移量获得模块，用于利用偏移量提取网络从多个第二目标图像，获得与多种预设角度分别对应的第二偏移量；其中，所述偏移量提取网络包括利用如前述任一实施方式示出的神经网络训练方法训练得到的网络；所述第二偏移量指示第二目标图像中屋顶与底座之间的偏移量；所述多个第二目标图像通过将所述第一目标图像分别旋转所述多种预设角度而得到；逆向旋转模块，用于针对所述多种预设角度中的各角度，将所述角度对应的第二偏移量进行逆向旋转，得到所述角度对应的逆向第二偏移量；融合模块，用于对所述多种预设角度分别对应的逆向第二偏移量进行融合，得到所述第一目标图像对应的第一偏移量。The present disclosure further proposes an image processing apparatus, including: an acquisition module, configured to acquire a first target image to be processed; an offset obtaining module, configured to obtain, by using an offset extraction network, second offsets respectively corresponding to various preset angles from a plurality of second target images, where the offset extraction network includes a network trained by using the neural network training method shown in any one of the foregoing embodiments, the second offset indicates an offset between a roof and a base in a second target image, and the plurality of second target images are obtained by rotating the first target image by the various preset angles respectively; a reverse rotation module, configured to, for each of the various preset angles, reversely rotate the second offset corresponding to the angle to obtain a reverse second offset corresponding to the angle; and a fusion module, configured to fuse the reverse second offsets respectively corresponding to the various preset angles to obtain a first offset corresponding to the first target image.
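The reverse-rotation-and-fusion step performed by the apparatus above can be sketched as follows. This is a minimal numpy illustration under stated assumptions: the preset angles and per-angle offsets are made up for the example, and averaging is used as the fusion operation (the disclosure says the reverse second offsets are fused but does not fix a particular fusion method).

```python
import numpy as np

def fuse_offsets(second_offsets, angles_deg):
    # For each preset angle, rotate the predicted second offset back by the
    # inverse angle, then fuse the reverse second offsets. Averaging is an
    # assumption here; summation or weighting would also be possible fusions.
    fused = np.zeros(2)
    for off, ang in zip(second_offsets, angles_deg):
        t = np.deg2rad(-ang)  # inverse rotation
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        fused += rot @ np.asarray(off, dtype=float)
    return fused / len(second_offsets)

# Hypothetical per-angle predictions for a true first offset of (3, 0):
angles = [0, 90, 180, 270]
preds = [[3, 0], [0, 3], [-3, 0], [0, -3]]  # (3, 0) rotated by each angle
print(fuse_offsets(preds, angles))
```

When every per-angle prediction is consistent, the fused result reproduces the original offset; in practice the fusion averages out per-angle prediction noise.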
在示出的一些实施方式中，所述装置还包括：屋顶区域获得模块，用于利用底座区域提取网络包括的屋顶区域提取网络，获得所述第一目标图像中的屋顶区域；其中，所述底座区域提取网络还包括所述偏移量提取网络；所述底座区域提取网络利用如前述实施方式示出的神经网络训练方法训练得到；平移模块，用于利用所述第一目标图像对应的第一偏移量，对获得的所述屋顶区域进行平移变换，得到所述第一目标图像对应的底座区域。In some of the illustrated embodiments, the apparatus further includes: a roof area obtaining module, configured to obtain the roof area in the first target image by using a roof area extraction network included in a base area extraction network, where the base area extraction network further includes the offset extraction network and is trained by using the neural network training method shown in the foregoing embodiments; and a translation module, configured to perform a translation transformation on the obtained roof area by using the first offset corresponding to the first target image, to obtain the base area corresponding to the first target image.
本公开还提出一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器通过运行所述可执行指令以实现所述的神经网络训练方法和/或的图像处理方法。The present disclosure also proposes an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein, the processor executes the executable instructions to implement the neural network training method and/or image processing method.
本公开还提出一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于使处理器执行所述的神经网络训练方法和/或的图像处理方法。The present disclosure also proposes a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to make a processor execute the neural network training method and/or the image processing method.
在前述实施方式示出的方案中，第一，由于可以利用偏移量提取网络，获得与多种预设角度分别对应的第二预测偏移量，以及将所述第一真实偏移量分别旋转所述多种预设角度，得到与所述多种预设角度分别对应的第二真实偏移量，然后可以利用与所述多种预设角度分别对应的第二真实偏移量和获得的第二预测偏移量，调整所述偏移量提取网络的网络参数。In the solutions shown in the foregoing embodiments, first, the offset extraction network can be used to obtain the second predicted offsets respectively corresponding to the various preset angles, and the first real offset can be rotated by the various preset angles respectively to obtain the second real offsets respectively corresponding to the various preset angles; the network parameters of the offset extraction network can then be adjusted by using the second real offsets respectively corresponding to the various preset angles and the obtained second predicted offsets.
因此可以利用图像旋转一定角度后，偏移量也会旋转该角度的特性，通过对图像(或其图像特征)和真实偏移量进行旋转，达到扩充具有真实偏移量的样本图像的效果，从而可以利用少量标注了偏移量的标注数据，训练得到高精度偏移量提取网络。Therefore, the property that when an image is rotated by a certain angle the offset is also rotated by that angle can be exploited: by rotating the image (or its image features) and the real offset, the set of sample images with real offsets is effectively expanded, so that a high-precision offset extraction network can be trained with only a small amount of annotation data labeled with offsets.
第二，可以将样本图像的旋转过程置于偏移量提取网络中，由此可以在偏移量提取网络内部进行样本图像旋转，不会影响综合网络的其它分支的训练，即不会影响其它分支的收敛速度，进而提升了网络训练效率。所述综合网络包括所述偏移量提取网络。Second, the rotation of the sample image can be placed inside the offset extraction network, so that the sample image is rotated within the offset extraction network without affecting the training of the other branches of the integrated network, that is, without affecting the convergence speed of the other branches, thereby improving the network training efficiency. The integrated network includes the offset extraction network.
第三，可以利用STN(Spatial Transformer Network,空间变换网络)进行图像旋转，从而使旋转过程变得可导，使梯度可以正常反向传播，进而可以直接对偏移量提取网络进行训练。Third, an STN (Spatial Transformer Network) can be used for image rotation, which makes the rotation process differentiable so that gradients can be back-propagated normally, and the offset extraction network can thus be trained directly.
应当理解的是,以上所述的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
附图说明Description of drawings
为了更清楚地说明本公开一个或多个实施例或相关技术中的技术方案，下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本公开一个或多个实施例中记载的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the related art. Obviously, the accompanying drawings in the following description are merely some embodiments described in one or more embodiments of the present disclosure, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
图1为本公开示出的一种神经网络训练方法的方法流程图;Fig. 1 is a method flowchart of a neural network training method shown in the present disclosure;
图2a为本公开示出的一种偏移量示意图;Fig. 2a is a schematic diagram of an offset shown in the present disclosure;
图2b为本公开示出的一种图像旋转90度后偏移量示意图;Fig. 2b is a schematic diagram of the offset after an image is rotated by 90 degrees shown in the present disclosure;
图3为本公开示出的一种建筑物底座提取流程示意图;FIG. 3 is a schematic diagram of a building base extraction process shown in the present disclosure;
图4为本公开示出的一种偏移量提取流程示意图;FIG. 4 is a schematic diagram of an offset extraction process shown in the present disclosure;
图5为本公开示出的一种利用空间变换网络进行图像旋转的流程示意图;FIG. 5 is a schematic flow diagram of image rotation using a space transformation network shown in the present disclosure;
图6为本公开示出的一种偏移量提取网络训练流程示意图;FIG. 6 is a schematic diagram of an offset extraction network training process shown in the present disclosure;
图7为本公开示出的一种建筑物底座提取流程示意图;FIG. 7 is a schematic diagram of a building base extraction process shown in the present disclosure;
图8为本公开示出的一种图像处理方法的方法流程图;FIG. 8 is a method flowchart of an image processing method shown in the present disclosure;
图9为本公开示出的一种神经网络训练装置的结构示意图;FIG. 9 is a schematic structural diagram of a neural network training device shown in the present disclosure;
图10为本公开示出的一种电子设备的硬件结构示意图。FIG. 10 is a schematic diagram of a hardware structure of an electronic device shown in the present disclosure.
具体实施方式Detailed ways
下面将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的设备和方法的例子。Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with aspects of the present disclosure as recited in the appended claims.
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在可以包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。还应当理解,本文中所使用的词语“如果”,取决于语境,可以被解释成为“在……时”或“当……时”或“响应于确定”。The terminology used in the present disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "the" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if", as used herein, could be interpreted as "at" or "when" or "in response to a determination", depending on the context.
本公开旨在提出一种神经网络训练方法。该方法利用图像旋转一定角度后,偏移量也会旋转该角度的特性,通过对图像(或其图像特征)和真实偏移量进行旋转,达到扩充具有真实偏移量的样本图像的效果,从而可以利用少量标注了偏移量的标注数据,训练得到高精度偏移量提取网络。The present disclosure aims to propose a neural network training method. This method utilizes the characteristic that after the image is rotated by a certain angle, the offset will also rotate the angle. By rotating the image (or its image features) and the real offset, the effect of expanding the sample image with the real offset is achieved. In this way, a small amount of labeled data with offsets can be used to train a high-precision offset extraction network.
请参见图1,图1为本公开示出的一种神经网络训练方法的方法流程图。如图1所示,所述方法可以包括:Please refer to FIG. 1 . FIG. 1 is a method flowchart of a neural network training method shown in the present disclosure. As shown in Figure 1, the method may include:
S102，利用偏移量提取网络从多个第二样本图像，获得与多种预设角度分别对应的第二预测偏移量；所述第二预测偏移量指示所述第二样本图像中屋顶与底座之间的偏移量；所述多个第二样本图像通过将第一样本图像分别旋转所述多种预设角度而得到。S102: obtain, by using an offset extraction network, second predicted offsets respectively corresponding to various preset angles from a plurality of second sample images, where the second predicted offset indicates an offset between a roof and a base in the second sample image, and the plurality of second sample images are obtained by rotating a first sample image by the various preset angles respectively.
S104,将所述第一真实偏移量分别旋转所述多种预设角度,得到与所述多种预设角度分别对应的第二真实偏移量。S104. Rotate the first real offset by the multiple preset angles respectively to obtain second real offsets respectively corresponding to the multiple preset angles.
S106,基于与所述多种预设角度分别对应的所述第二真实偏移量和所述第二预测偏移量,调整所述偏移量提取网络的网络参数。S106. Adjust network parameters of the offset extraction network based on the second real offset and the second predicted offset respectively corresponding to the various preset angles.
所述神经网络训练方法可以应用于电子设备中。其中,所述电子设备可以通过搭载 与神经网络训练方法对应的软件装置执行所述方法。所述电子设备的类型可以是笔记本电脑,计算机,服务器,手机,PAD终端等。在本公开中不特别限定所述电子设备的类型。所述电子设备可以是客户端设备或服务端设备。所述服务端设备可以是云端。以下以执行主体为电子设备(以下简称设备)为例进行说明。The neural network training method can be applied to electronic equipment. Wherein, the electronic device can execute the method by carrying a software device corresponding to the neural network training method. The type of the electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal and the like. The type of the electronic device is not particularly limited in the present disclosure. The electronic device may be a client device or a server device. The server device may be a cloud. In the following, an electronic device (hereinafter referred to as device) is taken as an example for description.
在一些实现方式中,所述设备可以响应于网络训练请求,执行S102。In some implementation manners, the device may execute S102 in response to the network training request.
所述第一样本图像,可以是指标注了第一真实偏移量的遥感图像。本公开实施例中,偏移量指的是图像中屋顶与底座之间的偏移量。例如,屋顶包括10个像素点,将该10个像素点按照所述偏移量进行平移,即可得到底座。The first sample image may refer to a remote sensing image marked with a first real offset. In the embodiments of the present disclosure, the offset refers to the offset between the roof and the base in the image. For example, the roof includes 10 pixels, and the base can be obtained by translating the 10 pixels according to the offset.
所述第一真实偏移量可以是指示第一样本图像中建筑物屋顶与底座真实偏移量的信息。例如,所述第一真实偏移量可以是(x,y)向量形式的信息。其中,x和y分别表示屋顶区域的像素点与底座区域对应位置的像素点在x轴和y轴方向上的偏移。在一些实现方式中,可以预先根据第一样本图像中的建筑物屋顶与底座之间的真实偏移量,进行偏移量标注。本公开不对偏移量的标注方式进行特别限定。The first real offset may be information indicating the real offset between the roof and the base of the building in the first sample image. For example, the first real offset may be information in the form of (x, y) vector. Wherein, x and y represent the offsets of the pixel points in the roof region and the corresponding pixel points in the base region in the x-axis and y-axis directions, respectively. In some implementation manners, the offset may be marked in advance according to the actual offset between the roof and the base of the building in the first sample image. The present disclosure does not specifically limit the labeling manner of the offset.
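What the (x, y) offset annotation encodes can be illustrated with a minimal numpy sketch: translating every roof pixel by the offset yields the corresponding base pixel. The coordinates and offset below are made up for illustration.

```python
import numpy as np

# A hypothetical "roof" of four pixel coordinates in (x, y) form, and an
# annotated roof-to-base offset (dx, dy) as described in the disclosure.
roof_pixels = np.array([[10, 10], [11, 10], [10, 11], [11, 11]])
offset = np.array([3, -2])  # first real offset: +3 along x, -2 along y

# The base region is obtained by translating every roof pixel by the offset.
base_pixels = roof_pixels + offset
print(base_pixels.tolist())
```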
由于第一样本图像坐标系是保持不变的,因此将第一样本图像旋转一定角度后,偏移量也会旋转相同的角度。Since the coordinate system of the first sample image remains unchanged, after the first sample image is rotated by a certain angle, the offset will also be rotated by the same angle.
请参见图2a与图2b,其中,图2a为本公开示出的一种偏移量示意图;图2b为本公开示出的一种图像旋转90度后偏移量示意图。Please refer to FIG. 2 a and FIG. 2 b , wherein FIG. 2 a is a schematic diagram of an offset shown in the present disclosure; FIG. 2 b is a schematic diagram of an offset after an image is rotated by 90 degrees shown in the present disclosure.
在图像旋转前,图像中屋顶与底座之间的偏移量可以如图2a所示。如图2b所示,在图像逆时针旋转90度后,由于坐标系不变,因此,偏移量也会旋转90度。Before the image is rotated, the offset between the roof and the base in the image can be shown in Figure 2a. As shown in Fig. 2b, after the image is rotated 90 degrees counterclockwise, since the coordinate system remains unchanged, the offset will also be rotated 90 degrees.
利用图像旋转一定角度后，偏移量也会旋转相同的角度的特性，可以将样本图像以及对应的真实偏移量进行各种角度的旋转，由此可以简便地扩充标注了真实偏移量的有标注样本数据，从而可以提升网络训练效果。By using the property that after an image is rotated by a certain angle the offset is also rotated by the same angle, the sample image and the corresponding real offset can be rotated by various angles, thereby easily expanding the annotated sample data labeled with real offsets, which can improve the network training effect.
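The offset rotation described above is a plain 2D rotation of the (x, y) vector. The sketch below uses the standard counter-clockwise rotation matrix (x right, y up); with image coordinates where y points down, the sign of the angle flips, so the convention is an assumption of this example.

```python
import numpy as np

def rotate_offset(offset, theta_deg):
    # Rotate a roof-to-base offset by the same angle as the image,
    # using a standard counter-clockwise 2D rotation matrix.
    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return rot @ np.asarray(offset, dtype=float)

# Rotating the image by 90 degrees rotates the offset by 90 degrees as well,
# as in FIG. 2a/2b: (3, 0) becomes (0, 3).
print(np.round(rotate_offset([3.0, 0.0], 90), 6))
```

Applying the inverse angle recovers the original offset, which is exactly the reverse rotation used later at inference time.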
所述偏移量提取网络，可以是基于目标检测网络构建的网络。所述目标检测网络可以是RCNN(Region Convolutional Neural Network,区域卷积神经网络)，FAST-RCNN(Fast Region Convolutional Neural Network,快速区域卷积神经网络)，FASTER-RCNN(Faster Region Convolutional Neural Network,更快速的区域卷积神经网络)或MASK-RCNN(Mask Region Convolutional Neural Network,掩膜区域卷积神经网络)中的任一。The offset extraction network may be a network constructed based on a target detection network. The target detection network may be any one of RCNN (Region Convolutional Neural Network), FAST-RCNN (Fast Region Convolutional Neural Network), FASTER-RCNN (Faster Region Convolutional Neural Network), or MASK-RCNN (Mask Region Convolutional Neural Network).
在一些实现方式中,为了提升偏移量提取精度,可以采用对区域表征精度更高的MASK-RCNN。所述MASK-RCNN可以包括RPN(Region Proposal Network,候选框生成网络),以及RoI Align(Region of Interest Align,感兴趣区域对齐)单元等。In some implementations, in order to improve the accuracy of offset extraction, a MASK-RCNN with higher accuracy for region representation can be used. The MASK-RCNN may include RPN (Region Proposal Network, candidate frame generation network), and RoI Align (Region of Interest Align, region of interest alignment) unit, etc.
其中,所述RPN网络用于生成与图像中各建筑物对应的候选框。在得到候选框后,可以进行候选框的回归和分类,得到各建筑物对应的边框。所述RoI Align单元用于根据所述建筑物对应的边框,从所述图像中提取出与所述建筑物对应的视觉特征。之后可以利用所述建筑物对应的视觉特征,提取屋顶与底座之间的偏移量。Wherein, the RPN network is used to generate candidate frames corresponding to each building in the image. After the candidate frame is obtained, the regression and classification of the candidate frame can be performed to obtain the frame corresponding to each building. The RoI Align unit is used to extract visual features corresponding to the building from the image according to the frame corresponding to the building. The offset between the roof and the base can then be extracted using the corresponding visual features of the building.
所述预设角度,可以根据业务需求进行设定。所述预设角度的数量可以根据需要扩充的样本量进行确定。例如,需要扩充大量样本,则可以设置大量的预设角度。本公开不对预设角度的数值和数量进行特别限定。所述多种预设角度用于旋转样本图像或所述样本图像对应的图像特征。The preset angle can be set according to business requirements. The number of preset angles can be determined according to the sample size that needs to be expanded. For example, if a large number of samples need to be expanded, a large number of preset angles can be set. The present disclosure does not specifically limit the value and quantity of the preset angles. The various preset angles are used to rotate the sample image or image features corresponding to the sample image.
在一些实现方式中,在执行S102时,可以先利用各预设角度分别生成对应的旋转矩阵。然后针对各预设角度利用该预设角度对应的旋转矩阵,对第一样本图像包括的各像素点进行移位,得到旋转后的第二样本图像。之后,可以将各旋转后的第二样本图像 输入所述偏移量提取网络,提取出与各旋转后的第二样本图像分别对应的第二预测偏移量。需要说明的是,在一些实现方式中,在对第一样本图像进行旋转时,可以先利用偏移量提取网络包含的特征提取网络对第一样本图像进行特征提取,得到第一图像特征;之后,对得到的第一图像特征进行旋转。由此可以减少旋转过程的运算量,以及可以减少对旋转后图像进行特征提取时引入的旋转误差,有助于提升网络训练效果。In some implementation manners, when performing S102, each preset angle may be used to generate corresponding rotation matrices respectively. Then, for each preset angle, the rotation matrix corresponding to the preset angle is used to shift each pixel included in the first sample image to obtain a rotated second sample image. Afterwards, each rotated second sample image may be input into the offset extraction network, and second predicted offsets respectively corresponding to each rotated second sample image may be extracted. It should be noted that, in some implementations, when the first sample image is rotated, the feature extraction network included in the offset extraction network can be used to perform feature extraction on the first sample image to obtain the first image feature ; After that, rotate the obtained first image features. This can reduce the amount of calculation in the rotation process, and can reduce the rotation error introduced when extracting features from the rotated image, which helps to improve the network training effect.
在一些实现方式中，在执行S104时，可以利用各预设角度分别对应的旋转矩阵，对第一样本图像的第一真实偏移量进行旋转，得到将第一样本图像旋转多种预设角度后分别对应的第二真实偏移量。In some implementations, when S104 is performed, the first real offset of the first sample image can be rotated by using the rotation matrices respectively corresponding to the preset angles, so as to obtain the second real offsets respectively corresponding to the first sample image rotated by the various preset angles.
在得到将第一样本图像旋转多种预设角度后分别对应的第二真实偏移量和第二预测偏移量后,可以执行S106。After the second real offset and the second predicted offset corresponding to the rotation of the first sample image by various preset angles are obtained, S106 may be executed.
在一些实现方式中，在执行S106时，可以利用预设的损失函数(例如交叉熵损失函数)，针对每种预设角度，根据将第一样本图像的第一真实偏移量旋转该预设角度后对应的第二真实偏移量、和获得的与该预设角度对应的第二预测偏移量，得到将第一样本图像旋转该预设角度后对应的偏移量损失信息。然后，基于将第一样本图像旋转多种预设角度后分别对应的偏移量损失信息，利用诸如求和，求积，求平均数等方式，确定总损失，并利用确定的总损失计算下降梯度，通过反向传播调整所述偏移量提取网络的网络参数。In some implementations, when S106 is performed, a preset loss function (for example, a cross-entropy loss function) may be used to obtain, for each preset angle, the offset loss information corresponding to the first sample image rotated by the preset angle, according to the second real offset obtained by rotating the first real offset by the preset angle and the obtained second predicted offset corresponding to the preset angle. Then, based on the offset loss information respectively corresponding to the first sample image rotated by the various preset angles, the total loss is determined by means such as summation, multiplication, or averaging, and the determined total loss is used to compute the descent gradient, so as to adjust the network parameters of the offset extraction network through back-propagation.
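The per-angle loss computation and reduction above can be sketched in a few lines. A squared-error loss is used here as a stand-in (the text names cross-entropy as one example of a preset loss function), and the three reductions the text mentions are shown; none of these specific choices is fixed by the disclosure.

```python
import numpy as np

def offset_losses(pred_offsets, true_offsets):
    # One loss value per preset angle: squared error between the second
    # predicted offset and the second real offset (a stand-in loss).
    return [float(np.sum((np.asarray(p) - np.asarray(t)) ** 2))
            for p, t in zip(pred_offsets, true_offsets)]

def total_loss(losses, reduction="sum"):
    # Combine per-angle losses; the text mentions summation, multiplication,
    # and averaging as possible ways to determine the total loss.
    if reduction == "sum":
        return float(np.sum(losses))
    if reduction == "mean":
        return float(np.mean(losses))
    return float(np.prod(losses))

preds = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical second predicted offsets
gts   = [[1.0, 1.0], [0.0, 0.0]]   # hypothetical second real offsets
losses = offset_losses(preds, gts)
print(losses, total_loss(losses, "sum"))
```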
在所述方案中，由于可以利用偏移量提取网络，获得与多种预设角度分别对应的第二预测偏移量，以及将所述第一真实偏移量分别旋转所述多种预设角度，得到与所述多种预设角度分别对应的第二真实偏移量，然后可以利用与所述多种预设角度分别对应的第二真实偏移量和获得的第二预测偏移量，调整所述偏移量提取网络的网络参数。In this solution, the offset extraction network can be used to obtain the second predicted offsets respectively corresponding to the various preset angles, and the first real offset can be rotated by the various preset angles respectively to obtain the second real offsets respectively corresponding to the various preset angles; the network parameters of the offset extraction network can then be adjusted by using the second real offsets respectively corresponding to the various preset angles and the obtained second predicted offsets.
因此可以利用图像旋转一定角度后,偏移量也会旋转该角度的特性,通过对图像(或其图像特征)和真实偏移量进行旋转,达到扩充具有真实偏移量的样本图像的效果,从而可以利用少量标注了偏移量的标注数据,训练得到高精度偏移量提取网络。Therefore, after the image is rotated by a certain angle, the offset will also rotate the angle. By rotating the image (or its image features) and the real offset, the effect of expanding the sample image with the real offset can be achieved. In this way, a small amount of labeled data with offsets can be used to train a high-precision offset extraction network.
在一些实现方式中，所述偏移量提取网络可能为某一综合网络的一个分支。由此在进行样本图像与真实偏移量旋转的过程中，也会对样本图像中涵盖的其它信息进行旋转，在利用旋转后的样本图像对所述综合网络进行训练时，该综合网络的其它分支需要对旋转后的样本图像的其它信息进行拟合，由此增加了训练时间，降低了训练效率。In some implementations, the offset extraction network may be one branch of an integrated network. In that case, during the rotation of the sample image and the real offset, other information contained in the sample image is also rotated; when the rotated sample image is used to train the integrated network, the other branches of the integrated network need to fit this other information of the rotated sample image, which increases the training time and reduces the training efficiency.
请参见图3,图3为本公开示出的一种建筑物底座提取流程示意图。Please refer to FIG. 3 . FIG. 3 is a schematic diagram of a building foundation extraction process shown in the present disclosure.
如图3所示,将遥感图像输入图3示出的底座区域提取网络后,可以利用屋顶区域提取网络提取建筑物屋顶区域,以及利用偏移量提取网络提取偏移量。然后可以利用该偏移量,对屋顶区域进行变换(例如平移变换),得到底座区域。As shown in Figure 3, after the remote sensing image is input into the base area extraction network shown in Figure 3, the roof area extraction network can be used to extract the roof area of the building, and the offset extraction network can be used to extract the offset. The offset can then be used to transform (for example, translate) the roof area to obtain the base area.
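The final translation step of the base extraction pipeline in FIG. 3 can be sketched as a mask shift. Integer offsets are assumed for simplicity (a real pipeline would need to handle sub-pixel shifts), and the mask and offset values are made up for illustration.

```python
import numpy as np

def translate_mask(mask, dx, dy):
    # Shift a binary roof mask by an integer offset (dx, dy) to obtain the
    # base mask; pixels shifted outside the image are discarded.
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1
    return out

roof = np.zeros((5, 5), dtype=int)
roof[1:3, 1:3] = 1                      # a 2x2 roof region
base = translate_mask(roof, dx=1, dy=2)  # apply the extracted offset
print(base)
```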
此时，偏移量提取网络和屋顶区域提取网络是底座区域提取网络的两个分支。在利用前述方案训练底座区域提取网络时，由于第一样本图像发生旋转，其包含的屋顶区域也会发生相应旋转，因此，在训练该网络时，屋顶区域提取网络（即前述其它分支）也需要重新拟合，导致网络收敛速度下降。At this time, the offset extraction network and the roof area extraction network are two branches of the base area extraction network. When the base area extraction network is trained using the foregoing solution, since the first sample image is rotated, the roof area it contains is also rotated accordingly; therefore, when training this network, the roof area extraction network (i.e., the aforementioned other branch) also needs to be refitted, which slows down network convergence.
为了解决前述痛点，在一些实现方式中，可以将第一样本图像的旋转过程置于偏移量提取网络中，由此可以在偏移量提取网络内部进行图像旋转，不会影响其它分支的训练，即不会影响其它分支的收敛速度，进而提升了网络训练效率。To solve the foregoing pain point, in some implementations, the rotation of the first sample image can be placed inside the offset extraction network, so that image rotation is performed within the offset extraction network without affecting the training of other branches, that is, without affecting the convergence speed of other branches, thereby improving the network training efficiency.
请参见图4,图4为本公开示出的一种偏移量提取流程示意图。Please refer to FIG. 4 , which is a schematic diagram of an offset extraction process shown in the present disclosure.
如图4所示，在执行S102时，可以执行S402，针对所述多种预设角度中的每一预设角度，利用偏移量提取网络，将所述第一样本图像对应的第一图像特征旋转所述预设角度，得到与所述预设角度对应的第二图像特征。然后可以执行S404，基于所述第二图像特征，得到与所述预设角度对应的第二预测偏移量。As shown in FIG. 4, when S102 is performed, S402 may be performed: for each of the various preset angles, rotating, by using the offset extraction network, the first image feature corresponding to the first sample image by the preset angle to obtain a second image feature corresponding to the preset angle. Then S404 may be performed: obtaining, based on the second image feature, a second predicted offset corresponding to the preset angle.
所述第一图像特征,可以是指第一样本图像经过若干卷积层、池化层等的特征提取处理后得到的图像特征。在一些实现方式中,所述偏移量提取网络可以是基于MASK-RCNN构建的网络。所述偏移量提取网络可以通过包括的骨干网络以及RoI Align单元对第一样本图像进行特征提取得到所述第一图像特征。在一些实现方式中,可以通过特征图表征前述图像特征。The first image feature may refer to the image feature obtained after the first sample image undergoes feature extraction processing such as several convolutional layers and pooling layers. In some implementation manners, the offset extraction network may be a network constructed based on MASK-RCNN. The offset extraction network can perform feature extraction on the first sample image through the included backbone network and the RoI Align unit to obtain the first image features. In some implementation manners, the aforementioned image features may be characterized by a feature map.
在一些实现方式中，在执行S402时，可以通过多种预设角度分别对应的旋转矩阵对第一图像特征中的各像素点进行位置变换，得到与多种预设角度分别对应的第二图像特征。然后在执行S404时，可以通过诸如若干卷积层，池化层，全连接层以及映射单元(例如，softmax(柔性最大值传输函数))对第二图像特征进行处理，得到针对偏移量的提取结果，即第二预测偏移量。In some implementations, when S402 is performed, the positions of the pixels in the first image feature can be transformed by using the rotation matrices respectively corresponding to the various preset angles, to obtain the second image features respectively corresponding to the various preset angles. Then, when S404 is performed, the second image feature can be processed by, for example, several convolutional layers, pooling layers, fully connected layers, and a mapping unit (for example, softmax) to obtain the offset extraction result, i.e., the second predicted offset.
在一些实现方式中,第一样本图像只会在偏移量提取网络内进行旋转,对于屋顶区域提取网络则仍然利用未旋转的第一样本图像进行训练。由此,即可在偏移量提取网络内对第一样本图像进行旋转变化,从而不会影响其它分支的训练。In some implementations, the first sample image is only rotated within the offset extraction network, and the roof region extraction network is still trained with the unrotated first sample image. In this way, the rotation of the first sample image can be changed in the offset extraction network, so as not to affect the training of other branches.
在一些实现方式中，为了便于对偏移量提取网络进行训练，可以利用空间变换网络进行图像旋转，从而使旋转过程变得可导，使梯度可以正常反向传播，进而可以直接对网络进行训练。In some implementations, to facilitate training of the offset extraction network, a spatial transformer network can be used to perform the image rotation, which makes the rotation process differentiable so that gradients can be back-propagated normally, and the network can thus be trained directly.
请参见图5,图5为本公开示出的一种利用空间变换网络进行图像旋转的流程示意图。Please refer to FIG. 5 . FIG. 5 is a schematic flowchart of image rotation using a spatial transformation network shown in the present disclosure.
图5示出的空间变换网络(Spatial Transformer Network,STN)50可以包括旋转角生成网络51,采样网格52以及采样器53。The spatial transformation network (Spatial Transformer Network, STN) 50 shown in FIG. 5 may include a rotation angle generation network 51, a sampling grid 52 and a sampler 53.
其中,所述旋转角生成网络51可以用于通过自监督方式进行训练,在完成训练后,可以用于生成旋转角θ。在本例中,由于旋转角为指定的预设角度,因此并未使用旋转角生成网络51生成旋转角,而是直接指定旋转角θ。Wherein, the rotation angle generation network 51 can be used for training in a self-supervised manner, and can be used to generate the rotation angle θ after the training is completed. In this example, since the rotation angle is a specified preset angle, the rotation angle generation network 51 is not used to generate the rotation angle, but the rotation angle θ is directly specified.
所述采样网格52,可以根据旋转角,确定第二图像特征V中的像素点和第一图像特征U中各像素点之间的对应关系T θ(G)。 The sampling grid 52 can determine the corresponding relationship T θ (G) between the pixels in the second image feature V and the pixels in the first image feature U according to the rotation angle.
所述采样器53，可以分别针对第二图像特征V中各像素点，根据采样网格52表征的像素点对应关系，确定第一图像特征U中，与所述像素点对应的多个像素点，并基于插值方式对所述多个像素点的像素值进行映射，得到所述像素点对应的像素值，以完成图像特征旋转。所述插值方式可以包括多项式插值，线性插值，双线性插值等方式。在本例中，由于采用了插值方式进行图像旋转，从而使得图像旋转过程变得可导，使梯度可以正常反向传播，进而可以直接对网络进行训练。The sampler 53 can, for each pixel in the second image feature V, determine, according to the pixel correspondence characterized by the sampling grid 52, a plurality of pixels in the first image feature U corresponding to that pixel, and map the pixel values of the plurality of pixels based on interpolation to obtain the pixel value of that pixel, so as to complete the rotation of the image feature. The interpolation may include polynomial interpolation, linear interpolation, bilinear interpolation, and the like. In this example, since interpolation is used for the image rotation, the rotation process becomes differentiable, so that gradients can be back-propagated normally, and the network can thus be trained directly.
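The sampling-grid-plus-bilinear-sampler mechanism above can be sketched in plain numpy: each output pixel is inverse-mapped into the input feature map and its value is a bilinear blend of the four neighbouring input pixels. This is only an illustrative sketch (in a real STN these are differentiable tensor ops, e.g. affine-grid and grid-sample primitives), and the rotation direction here follows array row/column indexing rather than a particular image-axis convention.

```python
import numpy as np

def rotate_feature_map(feat, theta_deg):
    # Rotate an HxW feature map about its centre by bilinear resampling,
    # mirroring what the sampling grid + sampler of an STN do.
    h, w = feat.shape
    t = np.deg2rad(theta_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(feat, dtype=float)
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel into the input feature map U.
            xs = np.cos(t) * (x - cx) + np.sin(t) * (y - cy) + cx
            ys = -np.sin(t) * (x - cx) + np.cos(t) * (y - cy) + cy
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            # Blend the four neighbouring input pixels by bilinear weights.
            for dy in (0, 1):
                for dx in (0, 1):
                    xi, yi = x0 + dx, y0 + dy
                    if 0 <= xi < w and 0 <= yi < h:
                        wgt = (1 - abs(xs - xi)) * (1 - abs(ys - yi))
                        out[y, x] += wgt * feat[yi, xi]
    return out

feat = np.arange(9, dtype=float).reshape(3, 3)
print(rotate_feature_map(feat, 90))
```

Because the output is a smooth (piecewise-linear) function of the sampling coordinates, gradients can flow through the rotation, which is what makes the STN-based rotation trainable end to end.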
在执行S402时，可以针对所述多种预设角度中的每一预设角度，利用所述偏移量提取网络包括的与所述预设角度对应的空间变换网络，将所述第一图像特征旋转所述预设角度，得到与所述预设角度对应的第二图像特征。When S402 is performed, for each of the various preset angles, the spatial transformer network corresponding to the preset angle included in the offset extraction network may be used to rotate the first image feature by the preset angle, to obtain the second image feature corresponding to the preset angle.
在一些实现方式中,可以在偏移量提取网络中部署与不同预设角度分别对应的空间变换网络(以下简称STN),并指定各STN对应的旋转角θ。其中,各STN的输入可以为对第一样本图像进行特征提取得到的第一图像特征。在STN中,可以利用所述采样网格,分别针对第二图像特征的各像素点,确定所述第一图像特征中,与所述像素点对应的多个像素点,并通过所述采样器,基于插值方式对所述多个像素点的像素值进行映射,得到所述像素点对应的像素值。In some implementation manners, space transformation networks (hereinafter referred to as STNs) respectively corresponding to different preset angles may be deployed in the offset extraction network, and the rotation angle θ corresponding to each STN may be specified. Wherein, the input of each STN may be the first image feature obtained by performing feature extraction on the first sample image. In the STN, the sampling grid can be used to determine a plurality of pixel points corresponding to the pixel points in the first image feature for each pixel point of the second image feature, and through the sampler , mapping the pixel values of the plurality of pixel points based on an interpolation manner to obtain the pixel values corresponding to the pixel points.
经过STN旋转处理后,可以得到将第一样本图像旋转多种预设角度后分别对应的第二图像特征。After the STN rotation processing, the second image features respectively corresponding to the first sample image rotated by various preset angles can be obtained.
然后可以通过S404,提取出将第一样本图像旋转多种预设角度后分别对应的第二预测偏移量。Then, through S404, the second predicted offsets corresponding to the rotations of the first sample image by various preset angles can be extracted.
请参见图6,图6为本公开示出的一种偏移量提取网络训练流程示意图。Please refer to FIG. 6 , which is a schematic diagram of an offset extraction network training process shown in the present disclosure.
图6示出的偏移量提取网络可以包括特征提取单元与偏移量扩充单元。The offset extraction network shown in FIG. 6 may include a feature extraction unit and an offset expansion unit.
其中,特征提取单元可以包括骨干网络与RoI Align单元(图6未示出),用于提取出第一样本图像中的建筑物图像特征,即第一图像特征F0。Wherein, the feature extraction unit may include a backbone network and a RoI Align unit (not shown in FIG. 6 ), for extracting building image features in the first sample image, that is, the first image feature F0.
在本例中，偏移量扩充单元可以包括4条STN分支。如图6所示，偏移量扩充单元可以利用STN分别将第一图像特征F0旋转0度，90度，180度以及270度，得到对应的第二图像特征F1-F4。偏移量扩充单元还可以利用分类器对第二图像特征F1-F4进行分类，得到将第一样本图像旋转0度，90度，180度以及270度后分别对应的第二预测偏移量。所述分类器可以包括多个卷积层、全连接层和映射单元。在一些实现方式中，为了简化网络结构，可以将多个分类器中的至少部分卷积层、全连接层进行参数共享。需要说明的是，第一样本图像的旋转角度可以包括但不限于前述例举的几种情况，在此对于旋转角度的间隔、旋转次数等不予限定，可以基于所需样本图像的数量等因素动态调整。In this example, the offset expansion unit may include 4 STN branches. As shown in FIG. 6, the offset expansion unit can use the STNs to rotate the first image feature F0 by 0, 90, 180, and 270 degrees respectively, to obtain the corresponding second image features F1-F4. The offset expansion unit can also use classifiers to classify the second image features F1-F4, to obtain the second predicted offsets respectively corresponding to the first sample image rotated by 0, 90, 180, and 270 degrees. The classifier may include multiple convolutional layers, fully connected layers, and a mapping unit. In some implementations, to simplify the network structure, at least some of the convolutional layers and fully connected layers of the multiple classifiers may share parameters. It should be noted that the rotation angles of the first sample image may include but are not limited to the cases listed above; the interval between rotation angles, the number of rotations, and the like are not limited here and may be dynamically adjusted based on factors such as the number of required sample images.
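The four-branch expansion with a shared classifier head can be sketched as follows. Since the preset angles here are multiples of 90 degrees, exact 90-degree array rotations stand in for the STN branches; the feature map, head weights, and feature size are random stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared classifier parameters (one weight matrix used by all four branches),
# mirroring the parameter sharing mentioned for the classifiers. These are
# hypothetical untrained values for illustration.
W = rng.normal(size=(2, 16))   # maps a flattened 4x4 feature to an (x, y) offset
b = np.zeros(2)

def predict_offset(feat):
    return W @ feat.ravel() + b

F0 = rng.normal(size=(4, 4))   # stand-in for the first image feature
# The four branches: F0 rotated by 0, 90, 180, and 270 degrees. Exact rot90
# replaces the STN sampler since these angles are multiples of 90 degrees.
branches = [np.rot90(F0, -k) for k in range(4)]  # F1-F4
second_pred_offsets = [predict_offset(f) for f in branches]
print(len(second_pred_offsets), second_pred_offsets[0].shape)
```

One (x, y) prediction per branch then pairs with the correspondingly rotated ground-truth offset to give the per-angle losses L1-L4.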
As shown in FIG. 6, during training, the rotation matrices corresponding to 0, 90, 180, and 270 degrees may be used to rotate the first real offset of the first sample image, obtaining multiple second real offsets.
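The rotation of a real offset can be sketched as a standard 2-D rotation-matrix multiplication. This is an illustrative assumption: the offset is taken here to be a (dx, dy) vector, which the disclosure does not spell out in this form.

```python
import math

def rotate_offset(offset, angle_deg):
    """Rotate a 2-D (dx, dy) offset by angle_deg using the rotation
    matrix [[cos, -sin], [sin, cos]] to obtain a second real offset."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = offset
    return (c * dx - s * dy, s * dx + c * dy)

first_real_offset = (3.0, 4.0)  # hypothetical annotated roof-to-base offset
second_real_offsets = [rotate_offset(first_real_offset, a)
                       for a in (0, 90, 180, 270)]
```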
As shown in FIG. 6, during training, the offset loss information L1-L4 corresponding to rotating the first sample image by the various preset angles may be obtained from the second real offsets and second predicted offsets corresponding to those angles. A total loss is then determined based on the sum of the offset loss information L1-L4, and the network parameters of the offset extraction network are adjusted according to the total loss. In some implementations, the total loss may also be determined by taking the product or the average of the offset loss information L1-L4, which is not particularly limited here.
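The per-angle losses and the summed total loss can be sketched as follows. The choice of an L1 distance between predicted and real offsets is an assumption for illustration; the disclosure leaves the preset loss function open.

```python
def offset_loss(pred, real):
    """L1 loss between a predicted and a real (dx, dy) offset."""
    return abs(pred[0] - real[0]) + abs(pred[1] - real[1])

def total_loss(preds, reals):
    """Sum the per-angle offset losses L1-L4 to get the total loss."""
    return sum(offset_loss(p, r) for p, r in zip(preds, reals))

reals = [(1.0, 2.0), (-2.0, 1.0), (-1.0, -2.0), (2.0, -1.0)]
preds = [(1.0, 2.0), (-2.0, 1.0), (-1.0, -2.0), (2.0, -1.0)]
loss = total_loss(preds, reals)  # 0.0 when predictions match exactly
```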
In the training process shown in FIG. 6: first, rotating the first sample image and the first real offset expands the set of sample images that carry real offsets, so a high-precision offset extraction network can be trained from a small amount of offset-annotated data. Second, the rotation of the first sample image is performed inside the offset extraction network, so it does not affect the training of the other branches, that is, it does not affect their convergence speed, which improves network training efficiency. Third, the STN makes the rotation differentiable, so gradients can be back-propagated normally and the network can be trained directly.
Please continue to refer to FIG. 3. When training the base region extraction network shown in FIG. 3, the high cost of sample annotation makes it impossible to obtain a large amount of annotated sample data that includes both real offsets and real roof region information, and a small amount of annotated sample data cannot train a high-precision base region extraction network.
The base region extraction network is used to extract the base region based on the obtained roof region and offset. The roof region extraction network included in the base region extraction network shares a feature extraction network with the offset extraction network. The feature extraction network may include a backbone network and an RoI Align unit.
In some implementations, the fact that the base region of a given building does not change can be exploited to share base region ground-truth information among the multiple frames of sample images corresponding to the same region, thereby expanding the training sample set. This helps train a high-precision building base region extraction network from a small amount of annotated sample data.
A method for generating the training sample set used to train the base region extraction network may include: S302, for each of multiple regions, acquiring one or more frames of original sample images corresponding to the region, where, in the case that the region corresponds to multiple frames of original sample images, at least two of those frames have different acquisition angles.
The original sample images may be acquired by any image acquisition device capable of capturing images of the multiple regions. Among the multiple frames of original sample images acquired for the same region, at least two frames have different acquisition angles, which enriches the information contained in the training samples and improves the adaptability of the neural network.
The original sample images may be stored in a storage medium, classified by region. The device may acquire the original sample images from the storage medium.
In some implementations, the original sample images may include multi-temporal images acquired for the multiple regions. Multi-temporal images refer to multiple frames of remote sensing images acquired for the same area at different times.
S304 may then be performed: one frame of the original sample images corresponding to the region is used as the first sample image corresponding to the region and is annotated with base region ground-truth information.
The original sample image used as the first sample image may be any image of acceptable sharpness selected from the one or more frames of original sample images corresponding to the region.
In some implementations, at least one frame may be selected from the original sample images corresponding to each region, and its base region ground-truth information is then annotated in advance.
The base region ground-truth information may be pixel-level ground truth. For example, the value of each pixel inside the building base region in the remote sensing sample image may be set to 1, and the value of each pixel outside the base region may be set to 0.
S306 may then be performed: for each region, the base region ground-truth information annotated on the first sample image corresponding to the region is determined as the base region ground-truth information of every frame of original sample image corresponding to that region, and a training sample set is obtained based on the original sample images and the first sample images respectively corresponding to the multiple regions.
In some implementations, the base region ground-truth information annotated for the first sample image of each region in S304 may be used as the annotation information of every original sample image of that region, thereby expanding the training samples.
Since the building bases in a given region do not change, after image registration is performed on the original sample images acquired for the same region, the base regions and positions of the buildings are identical across those images. That is, annotating the base region ground truth on any one frame of original sample image of a region, which then serves as the first sample image of that region, can be regarded as annotating the base region ground truth on every frame of original sample image of that region. The samples are thereby expanded: a large number of training samples are obtained with a small number of annotation operations.
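The label-sharing step above can be sketched as follows. This is a simplified illustration: the images are assumed to be already registered, and the per-region data layout (one annotated mask plus a list of frames) is a hypothetical choice for the example.

```python
def expand_training_set(regions):
    """regions maps a region id to (annotated_mask, [frame1, frame2, ...]).
    Every frame of a region reuses the one manually annotated
    base-region mask as its ground truth."""
    samples = []
    for region_id, (mask, frames) in regions.items():
        for frame in frames:
            samples.append((frame, mask))  # same ground truth for every frame
    return samples

regions = {"r1": ("mask_r1", ["img_a", "img_b", "img_c"]),
           "r2": ("mask_r2", ["img_d"])}
training_set = expand_training_set(regions)  # 4 samples from 2 annotations
```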
In some implementations, the expanded training samples can be used for supervised training of the base region extraction network, which helps train a high-precision building base region extraction network from a small amount of annotated sample data. In some implementations, the training of the aforementioned offset extraction network may also be combined with the training of the roof region extraction network to jointly train the base region extraction network, so that a high-precision base region extraction network is trained with a small amount of annotated data.
The first sample image is also annotated with real roof region information.
When training the base region extraction network, on the one hand, the offset extraction network may be trained according to the offset extraction network training method shown in any of the foregoing implementations. On the other hand, the roof region extraction network may be used to obtain roof region prediction information from the first sample image, and the roof region extraction network is then trained based on the real roof region information and the obtained roof region prediction information. In some implementations, loss information may be determined from the real roof region information and the obtained roof region prediction information according to a preset loss function, and the network parameters are then adjusted by back-propagation according to the loss information.
Thus, first, when training the offset extraction network, the sample size can be expanded by rotating the sample images and the real offsets, achieving the effect of training a high-precision offset extraction network with a small amount of annotated data. Second, when training the base region extraction network, the roof region extraction network and the offset extraction network, which share a feature extraction network, can be jointly trained. This introduces learning information from multiple aspects, so the training of the branches both constrains and reinforces each other. On the one hand, this improves network training efficiency and allows a high-precision base region extraction network to be trained with a small amount of annotated data; on the other hand, it encourages the shared feature extraction network to extract features that are more useful for base region extraction, improving base region extraction accuracy.
In some implementations, building bounding box information may also be introduced during network training to impose additional constraints on the training, which improves training efficiency and helps the feature extraction network extract building-related features.
The first sample image is also annotated with real building bounding box information. The real building bounding box information may include the coordinates of the center pixel of the building region and the width and height of the building region.
The base region extraction network further includes a building bounding box extraction network, which includes the feature extraction network.
When training the base region extraction network, the building bounding box extraction network may be used to extract building bounding box prediction information from the first sample image, and the building bounding box extraction network is then trained based on the real building bounding box information and the obtained building bounding box prediction information.
Thus, during network training: first, when training the offset extraction network, the sample size can be expanded by rotating the sample images and the real offsets, so a high-precision offset extraction network can be trained with a small amount of annotated data. Second, when training the base region extraction network, building bounding box information is introduced; since the roof region, offset, and building bounding box extraction networks share the feature extraction network, the three extraction networks are linked. Through the shared feature extraction network, the supervision information of each task can be shared, which accelerates network convergence and allows a high-precision base region extraction network to be trained with a small amount of annotated data. In addition, the roof region and offset extraction networks can perceive the features of the complete building region, further improving extraction performance.
An embodiment is described below in combination with a training scenario.
Referring to FIG. 7, FIG. 7 is a schematic diagram of a building base extraction process according to the present disclosure. The training method in this example can be deployed on any type of electronic device.
The base region extraction network shown in FIG. 7 is built on Mask R-CNN. The network may include three branches that extract the roof region, the offset, and the building bounding box, respectively. The three branches share a backbone network, an RPN (Region Proposal Network) candidate box generation network (hereinafter RPN), and an RoI Align region feature extraction unit (hereinafter RoI Align). The backbone network may be a VGG (Visual Geometry Group) network, a ResNet (Residual Network), an HRNet (High-Resolution Network), or the like, which is not particularly limited in the present disclosure.
The offset extraction branch may include the offset expansion unit shown in FIG. 6. The base region can be obtained by translating the roof region by the obtained offset.
Before training the network, several first sample images annotated with the first real offset, the real roof region information, and the real building bounding box information can be acquired.
Then, according to the number of training iterations, multiple rounds of the following steps can be performed to complete the network training:
S71, input the first sample images into the base region extraction network.
The backbone network and the RoI Align unit included in the base region extraction network can be used to perform feature extraction on each first sample image, obtaining the first image feature corresponding to each first sample image.
Then, in the offset extraction branch, the STNs corresponding to 0, 90, 180, and 270 degrees rotate the first image feature of each first sample image, and offset extraction is performed, obtaining the second predicted offsets corresponding to rotating each first sample image by 0, 90, 180, and 270 degrees, respectively.
In the roof region extraction branch and the building bounding box extraction branch, the roof region prediction information and the building bounding box prediction information corresponding to each first sample image are obtained.
S72 can then be performed: jointly train the three branches using the real information.
When training the offset extraction branch, the rotation matrices corresponding to 0, 90, 180, and 270 degrees can be used to rotate the first real offset of each first sample image, obtaining multiple second real offsets. Then, according to a preset loss function, the loss information for this round of training is computed from the second real offsets and the obtained second predicted offsets corresponding to rotating each first sample image by 0, 90, 180, and 270 degrees. The descent gradient can then be determined, and the network parameters of the offset extraction branch are adjusted by back-propagation.
When training the roof region extraction branch and the building bounding box extraction branch, conventional training methods can be used: the two branches are trained with the real roof region information and the real building bounding box information, respectively.
In this scheme: first, the sample size can be expanded by rotating the sample images and the real offsets, so a high-precision offset extraction branch can be trained with a small amount of annotated data. Second, the rotation of the image features is performed inside the offset extraction branch and does not affect the training of the other branches, improving network training efficiency. Third, joint training lets the network learn information from multiple aspects; the training of the branches supervises and reinforces each other, improving training efficiency and allowing a high-precision base region extraction network to be trained with a small amount of annotated data. Fourth, the shared feature extraction networks, such as the backbone network, can extract features that are more useful for base region extraction, improving base region extraction accuracy.
The present disclosure further proposes an image processing method. The method extracts the second offsets corresponding to rotating a first target image to be processed by various preset angles, then inversely transforms the multiple second offsets and fuses them to obtain a more robust and accurate offset.
Please refer to FIG. 8, which is a flowchart of an image processing method according to the present disclosure. As shown in FIG. 8, the method may include:
S802, acquiring a first target image to be processed;
S804, using the offset extraction network to obtain, from multiple second target images, second offsets respectively corresponding to various preset angles;
where the offset extraction network is trained by the neural network training method shown in any of the foregoing implementations; a second offset indicates the offset between a roof and a base in a second target image; and the multiple second target images are obtained by rotating the first target image by the various preset angles, respectively.
S806, for each of the various preset angles, inversely rotating the second offset corresponding to the angle to obtain the inverse second offset corresponding to the angle;
S808, fusing the inverse second offsets respectively corresponding to the various preset angles to obtain the first offset corresponding to the first target image.
The method can be applied to any type of electronic device.
Take, as an example, an offset extraction network that includes the offset expansion unit shown in FIG. 6.
When S804 is performed, the STNs corresponding to 0, 90, 180, and 270 degrees can be used to rotate the first image feature of the first target image, and the classifiers are used to obtain the second offsets corresponding to rotating the first target image by 0, 90, 180, and 270 degrees, respectively.
Then, when S806 is performed, the rotation matrices corresponding to 0, 90, 180, and 270 degrees can be used to inversely rotate each second offset obtained in S804, obtaining the multiple inverse second offsets corresponding to the unrotated first target image. It should be noted that the inverse rotation is a rotation in the direction opposite to the rotation shown in S804; for example, if S804 rotates clockwise, the inverse rotation is counterclockwise.
Afterwards, when S808 is performed, the multiple inverse second offsets may be fused by summation, product, averaging, or the like, obtaining the first offset corresponding to the first target image. Thus, when extracting the offset of the first target image, the second offsets obtained by rotating the first target image by various angles are fused, making the obtained first offset more robust and more accurate.
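S806-S808 can be sketched as follows. This is a minimal illustration under two assumptions the disclosure leaves open: offsets are (dx, dy) vectors, and averaging is chosen as the fusion; inverse rotation is implemented by applying the rotation matrix of the negated angle.

```python
import math

def rotate_offset(offset, angle_deg):
    """Rotate a 2-D (dx, dy) offset by angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return (c * offset[0] - s * offset[1], s * offset[0] + c * offset[1])

def fuse_offsets(second_offsets, angles):
    """Inverse-rotate each angle's second offset back to the unrotated
    frame (S806), then fuse by averaging (S808) to get the first offset."""
    inv = [rotate_offset(o, -a) for o, a in zip(second_offsets, angles)]
    n = len(inv)
    return (sum(o[0] for o in inv) / n, sum(o[1] for o in inv) / n)

angles = (0, 90, 180, 270)
# Second offsets as they would appear if the true offset were (3, 4):
second = [rotate_offset((3.0, 4.0), a) for a in angles]
first_offset = fuse_offsets(second, angles)  # close to (3.0, 4.0)
```

Averaging makes the fused estimate robust: an error in any single angle's prediction is attenuated by the other three.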
In some implementations, base extraction may also be performed on the first target image. The method further includes:
S810, using a roof region extraction network included in a base region extraction network to obtain the roof region in the first target image.
S812, using the first offset corresponding to the first target image to translate the obtained roof region, obtaining the base region corresponding to the first target image.
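S812 can be sketched as shifting a binary roof mask by the offset, rounded to whole pixels. This is illustrative only: a real implementation may interpolate sub-pixel shifts, and the (dx, dy) convention (dx along columns, dy along rows) is an assumption of the example.

```python
import numpy as np

def translate_mask(roof_mask, offset):
    """Shift a binary roof mask by (dx, dy) pixels to obtain the
    base-region mask; areas shifted in from outside the image are 0."""
    dy, dx = int(round(offset[1])), int(round(offset[0]))
    base = np.zeros_like(roof_mask)
    h, w = roof_mask.shape
    # Crop the source so the shifted region stays inside the image.
    src = roof_mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    base[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return base

roof = np.zeros((5, 5), dtype=int)
roof[1, 1] = 1                          # a one-pixel "roof"
base = translate_mask(roof, (2.0, 1.0))  # shift 2 columns right, 1 row down
```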
The base region extraction network includes the offset extraction network in addition to the roof region extraction network. In some implementations, to improve base region extraction accuracy, the base region extraction network may further include a building bounding box extraction network. The base region extraction network can be trained with the neural network training method shown in the foregoing implementations.
In the embodiments of the present disclosure: first, since the base region extraction network is trained with a small amount of annotated sample data as in the above embodiments, network training cost can be reduced and training efficiency improved, which in turn reduces base extraction cost. Second, since a high-precision base region extraction network is used for base extraction, building base extraction accuracy can be improved, which in turn improves the accuracy of statistics about buildings. Third, the distinctive roof region and offset features can be exploited to obtain the building base indirectly, which helps obtain an accurate base.
Corresponding to any of the foregoing implementations, the present disclosure further proposes a neural network training apparatus 90.
Please refer to FIG. 9, which is a schematic structural diagram of a neural network training apparatus according to the present disclosure.
As shown in FIG. 9, the apparatus 90 may include:
an offset obtaining module 91, configured to use an offset extraction network to obtain, from multiple second sample images, second predicted offsets respectively corresponding to various preset angles, where a second predicted offset indicates the offset between a roof and a base in a second sample image, the multiple second sample images are obtained by rotating a first sample image by the various preset angles, respectively, and the first sample image is annotated with a first real offset;
a rotation module 92, configured to rotate the first real offset by the various preset angles, obtaining second real offsets respectively corresponding to the various preset angles; and
an adjustment module 93, configured to adjust network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the various preset angles.
In some illustrated implementations, the obtaining module is configured to: for each of the various preset angles, use the offset extraction network to rotate the first image feature corresponding to the first sample image by the preset angle, obtaining a second image feature corresponding to the preset angle; and obtain, based on the second image feature, a second predicted offset corresponding to the preset angle.
In some illustrated implementations, the obtaining module is configured to: for each of the various preset angles, use the spatial transformer network that is included in the offset extraction network and corresponds to the preset angle to rotate the first image feature by the preset angle, obtaining a second image feature corresponding to the preset angle.
In some illustrated implementations, the spatial transformer network includes a sampler that performs image rotation by interpolation, where the sampler includes a sampling grid determined based on the preset angle corresponding to the spatial transformer network, and the sampling grid characterizes the pixel correspondence between the first image feature and the second image feature. In this case, the obtaining module is configured to: through the sampler, use the sampling grid to determine the multiple pixels in the first image feature that correspond to each pixel in the second image feature, and map the pixel values of those multiple pixels by interpolation, obtaining the pixel value of each pixel in the second image feature.
In some illustrated implementations, the adjustment module 93 is configured to: obtain, from the second real offsets and the second predicted offsets respectively corresponding to the various preset angles, offset loss information respectively corresponding to the various preset angles; and adjust network parameters of the offset extraction network based on the offset loss information respectively corresponding to the various preset angles.
In some illustrated implementations, the first sample image is also annotated with real roof region information, and the apparatus 90 further includes: a roof region obtaining module, configured to use a roof region extraction network to obtain roof region prediction information from the first sample image, where the roof region extraction network and the offset extraction network share a feature extraction network and belong to the same base region extraction network, and the base region extraction network is used to obtain the base region based on the obtained roof region and offset; and a first training module, configured to train the roof region extraction network based on the real roof region information and the obtained roof region prediction information.
In some illustrated implementations, the base region extraction network includes a building bounding box extraction network, the building bounding box extraction network includes the feature extraction network, and the first sample image is also annotated with real building bounding box information; the apparatus 90 further includes: a building bounding box obtaining module, configured to use the building bounding box extraction network to obtain building bounding box prediction information from the first sample image; and a second training module, configured to train the building bounding box extraction network based on the real building bounding box information and the obtained building bounding box prediction information.
In some illustrated implementations, the apparatus 90 further includes a sample expansion module, configured to: for each of multiple regions, acquire one or more frames of original sample images corresponding to the region, where, in the case that the region corresponds to multiple frames of original sample images, at least two of those frames have different acquisition angles; use one frame of the original sample images corresponding to the region as the first sample image corresponding to the region and annotate it with base region ground-truth information; determine the base region ground-truth information annotated on the first sample image corresponding to the region as the base region ground-truth information of every frame of original sample image corresponding to the region; and obtain a training sample set based on the original sample images and the first sample images respectively corresponding to the multiple regions.
Corresponding to any of the foregoing implementations, the present disclosure further provides an image processing apparatus. The apparatus may include: an acquisition module, configured to acquire a first target image to be processed; an offset obtaining module, configured to obtain, from multiple second target images by using an offset extraction network, second offsets respectively corresponding to multiple preset angles, wherein the offset extraction network includes a network trained by the neural network training method described in any of the foregoing implementations, each second offset indicates the offset between the roof and the base in the corresponding second target image, and the multiple second target images are obtained by rotating the first target image by the multiple preset angles respectively; an inverse rotation module, configured to, for each of the multiple preset angles, inversely rotate the second offset corresponding to the angle to obtain an inverse second offset corresponding to the angle; and a fusion module, configured to fuse the inverse second offsets respectively corresponding to the multiple preset angles to obtain a first offset corresponding to the first target image.
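The rotate, predict, inversely rotate, and fuse pipeline these modules describe can be sketched as follows. This is an illustrative sketch only: `offset_net`, the multiple-of-90° preset angles, and mean fusion are assumptions for the example, not the disclosed implementation.

```python
import numpy as np

def rotate_offset(offset, angle_deg):
    """Rotate a 2-D (dx, dy) offset vector by angle_deg degrees."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return rot @ np.asarray(offset, dtype=float)

def predict_fused_offset(image, offset_net, preset_angles=(0, 90, 180, 270)):
    """Rotate the image by each preset angle, predict the roof-to-base offset
    on each rotated copy (the "second offsets"), rotate each prediction back
    by the opposite angle, and fuse the inverse offsets by averaging."""
    inverse_offsets = []
    for angle in preset_angles:
        rotated = np.rot90(image, k=angle // 90)  # a "second target image"
        second_offset = offset_net(rotated)       # (dx, dy) prediction
        inverse_offsets.append(rotate_offset(second_offset, -angle))
    return np.mean(inverse_offsets, axis=0)       # the "first offset"
```

With `preset_angles` limited to multiples of 90°, `np.rot90` rotates losslessly; arbitrary preset angles would need an interpolating rotation of the image instead.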
In some illustrated embodiments, the apparatus further includes: a roof area obtaining module, configured to obtain the roof area in the first target image by using a roof area extraction network included in a base area extraction network, wherein the base area extraction network further includes the offset extraction network and is trained by the neural network training method described in the foregoing implementations; and a translation module, configured to perform translation transformation on the obtained roof area by using the first offset corresponding to the first target image, to obtain the base area corresponding to the first target image.
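The translation transformation can be illustrated with a short mask-shifting sketch. The integer rounding of the offset and the out-of-bounds handling are assumptions made for this example, not part of the disclosure.

```python
import numpy as np

def translate_roof_to_base(roof_mask, first_offset):
    """Shift a binary (H, W) roof mask by a (dx, dy) offset to obtain the
    base mask; pixels shifted outside the image are dropped."""
    dx = int(round(first_offset[0]))
    dy = int(round(first_offset[1]))
    h, w = roof_mask.shape
    base = np.zeros_like(roof_mask)
    ys, xs = np.nonzero(roof_mask)        # roof pixel coordinates
    ys2, xs2 = ys + dy, xs + dx           # translated coordinates
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    base[ys2[keep], xs2[keep]] = 1
    return base
```

The same translation applies equally to a polygonal roof representation, where each vertex is simply offset by `(dx, dy)`.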
The embodiments of the neural network training apparatus and/or the image processing apparatus shown in the present disclosure can be applied to an electronic device. Accordingly, the present disclosure provides an electronic device, which may include a processor and a memory for storing processor-executable instructions. The processor is configured to invoke the executable instructions stored in the memory to implement the aforementioned neural network training method and/or image processing method.
Please refer to FIG. 10, which is a schematic diagram of the hardware structure of an electronic device shown in the present disclosure.
As shown in FIG. 10, the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing the instructions corresponding to the neural network training apparatus and/or the image processing apparatus.
The apparatus embodiments may be implemented by software, by hardware, or by a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them. At the hardware level, in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 10, the electronic device in which the apparatus of an embodiment is located may also include other hardware according to the actual function of that electronic device, which is not described in detail here.
It can be understood that, to increase processing speed, the instructions corresponding to the apparatus may also be stored directly in the memory, which is not limited herein.
The present disclosure provides a computer-readable storage medium storing a computer program, and the computer program can be used to cause a processor to execute the aforementioned neural network training method and/or image processing method.
Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
"And/or" in the present disclosure means at least one of the two; for example, "A and/or B" may cover three cases: A alone, B alone, and both A and B.
The embodiments in the present disclosure are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant parts reference may be made to the description of the method embodiment.
Specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
Embodiments of the subject matter and the functional operations described in the present disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in the present disclosure and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in the present disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random- or serial-access memory device, or a combination of one or more of them.
The processes and logic flows described in the present disclosure can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.
Computers suitable for the execution of a computer program may include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random-access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as primarily describing features of specific embodiments of particular disclosures. Certain features that are described in multiple embodiments within the present disclosure can also be implemented in combination in a single embodiment. Conversely, various features that are described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the described embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above are merely preferred embodiments of one or more embodiments of the present disclosure and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of the present disclosure shall be included within the scope of protection of one or more embodiments of the present disclosure.

Claims (15)

  1. A neural network training method, comprising:
    obtaining, from a plurality of second sample images by using an offset extraction network, second predicted offsets respectively corresponding to a plurality of preset angles, wherein each second predicted offset indicates an offset between a roof and a base in the corresponding second sample image, the plurality of second sample images are obtained by rotating a first sample image by the plurality of preset angles respectively, and the first sample image is annotated with a first real offset;
    rotating the first real offset by the plurality of preset angles respectively, to obtain second real offsets respectively corresponding to the plurality of preset angles; and
    adjusting network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the plurality of preset angles.
  2. The method according to claim 1, wherein obtaining, from the plurality of second sample images by using the offset extraction network, the second predicted offsets respectively corresponding to the plurality of preset angles comprises:
    for each preset angle of the plurality of preset angles,
    rotating, by using the offset extraction network, a first image feature corresponding to the first sample image by the preset angle, to obtain a second image feature corresponding to the preset angle; and
    obtaining, based on the second image feature, a second predicted offset corresponding to the preset angle.
  3. The method according to claim 2, wherein rotating, by using the offset extraction network, the first image feature corresponding to the first sample image by the preset angle, to obtain the second image feature corresponding to the preset angle comprises:
    rotating the first image feature by the preset angle by using a spatial transformer network that is included in the offset extraction network and corresponds to the preset angle, to obtain the second image feature corresponding to the preset angle.
  4. The method according to claim 3, wherein the spatial transformer network includes a sampler that performs image rotation based on interpolation, the sampler includes a sampling grid determined based on the preset angle corresponding to the spatial transformer network, and the sampling grid is capable of characterizing a pixel correspondence between the first image feature and the second image feature; and rotating the first image feature by the preset angle by using the spatial transformer network that is included in the offset extraction network and corresponds to the preset angle, to obtain the second image feature corresponding to the preset angle comprises:
    determining, by using the sampling grid, a plurality of pixels in the first image feature that respectively correspond to the pixels in the second image feature; and
    mapping, by the sampler, pixel values of the plurality of pixels based on interpolation, to obtain pixel values respectively corresponding to the pixels in the second image feature.
  5. The method according to any one of claims 1 to 4, wherein adjusting the network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the plurality of preset angles comprises:
    obtaining, according to the second real offsets and the second predicted offsets respectively corresponding to the plurality of preset angles, offset loss information respectively corresponding to the plurality of preset angles; and
    adjusting the network parameters of the offset extraction network based on the offset loss information respectively corresponding to the plurality of preset angles.
  6. The method according to any one of claims 1 to 5, wherein the first sample image is further annotated with roof area ground-truth information, and the method further comprises:
    obtaining roof area prediction information in the first sample image by using a roof area extraction network, wherein the roof area extraction network shares a feature extraction network with the offset extraction network; and
    training the roof area extraction network based on the roof area ground-truth information and the obtained roof area prediction information.
  7. The method according to claim 6, wherein the roof area extraction network and the offset extraction network belong to a same base area extraction network, the base area extraction network includes a building frame extraction network, the building frame extraction network includes the feature extraction network, and the first sample image is further annotated with building frame ground-truth information; and the method further comprises:
    obtaining building frame prediction information in the first sample image by using the building frame extraction network; and
    training the building frame extraction network based on the building frame ground-truth information and the obtained building frame prediction information.
  8. The method according to any one of claims 1 to 7, wherein a method for generating a training sample set used for training the offset extraction network comprises:
    for each region of a plurality of regions,
    acquiring one or more frames of original sample images corresponding to the region, wherein, in a case where the region corresponds to a plurality of frames of original sample images, at least two of the frames of original sample images have different acquisition angles;
    taking one frame of the original sample images corresponding to the region as the first sample image corresponding to the region, and annotating it with base area ground-truth information;
    determining the base area ground-truth information annotated on the first sample image corresponding to the region as the base area ground-truth information of each frame of the original sample images corresponding to the region; and
    obtaining the training sample set based on the original sample images and the first sample images respectively corresponding to the plurality of regions.
  9. An image processing method, comprising:
    acquiring a first target image to be processed;
    obtaining, from a plurality of second target images by using an offset extraction network, second offsets respectively corresponding to a plurality of preset angles, wherein the offset extraction network includes a network trained by the neural network training method according to any one of claims 1 to 8, each second offset indicates an offset between a roof and a base in the corresponding second target image, and the plurality of second target images are obtained by rotating the first target image by the plurality of preset angles respectively;
    for each angle of the plurality of preset angles, inversely rotating the second offset corresponding to the angle, to obtain an inverse second offset corresponding to the angle; and
    fusing the inverse second offsets respectively corresponding to the plurality of preset angles, to obtain a first offset corresponding to the first target image.
  10. The method according to claim 9, further comprising:
    obtaining a roof area in the first target image by using a roof area extraction network included in a base area extraction network, wherein the base area extraction network further includes the offset extraction network, and the base area extraction network is trained by the neural network training method according to claim 7; and
    performing translation transformation on the obtained roof area by using the first offset corresponding to the first target image, to obtain a base area corresponding to the first target image.
  11. A neural network training apparatus, comprising:
    an obtaining module, configured to obtain, from a plurality of second sample images by using an offset extraction network, second predicted offsets respectively corresponding to a plurality of preset angles, wherein each second predicted offset indicates an offset between a roof and a base in the corresponding second sample image, the plurality of second sample images are obtained by rotating a first sample image by the plurality of preset angles respectively, and the first sample image is annotated with a first real offset;
    a rotation module, configured to rotate the first real offset by the plurality of preset angles respectively, to obtain second real offsets respectively corresponding to the plurality of preset angles; and
    an adjustment module, configured to adjust network parameters of the offset extraction network based on the second real offsets and the second predicted offsets respectively corresponding to the plurality of preset angles.
  12. An image processing apparatus, comprising:
    an acquisition module, configured to acquire a first target image to be processed;
    an offset obtaining module, configured to obtain, from a plurality of second target images by using an offset extraction network, second offsets respectively corresponding to a plurality of preset angles, wherein the offset extraction network includes a network trained by the neural network training method according to any one of claims 1 to 8, each second offset indicates an offset between a roof and a base in the corresponding second target image, and the plurality of second target images are obtained by rotating the first target image by the plurality of preset angles respectively;
    an inverse rotation module, configured to, for each angle of the plurality of preset angles, inversely rotate the second offset corresponding to the angle, to obtain an inverse second offset corresponding to the angle; and
    a fusion module, configured to fuse the inverse second offsets respectively corresponding to the plurality of preset angles, to obtain a first offset corresponding to the first target image.
  13. The apparatus according to claim 12, further comprising:
    a roof area obtaining module, configured to obtain a roof area in the first target image by using a roof area extraction network included in a base area extraction network, wherein the base area extraction network further includes the offset extraction network, and the base area extraction network is trained by the neural network training method according to claim 7; and
    a translation module, configured to perform translation transformation on the obtained roof area by using the first offset corresponding to the first target image, to obtain a base area corresponding to the first target image.
  14. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions,
    wherein the processor runs the executable instructions to implement the neural network training method according to any one of claims 1 to 8, and/or the image processing method according to claim 9 or 10.
  15. A computer-readable storage medium storing a computer program, wherein the computer program is used to cause a processor to execute the neural network training method according to any one of claims 1 to 8, and/or the image processing method according to claim 9 or 10.
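Claims 3 and 4 describe rotating the first image feature through a spatial transformer whose sampler maps pixels via an angle-determined sampling grid and interpolation. As an informal illustration only, a simplified single-channel sketch of such a sampler might look as follows; the grid convention and the bilinear weights are assumptions for this example, not the claimed network.

```python
import numpy as np

def rotate_feature_map(feat, angle_deg):
    """Rotate an (H, W) feature map by angle_deg about its centre using a
    sampling grid plus bilinear interpolation, as a spatial-transformer
    sampler would."""
    h, w = feat.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # sampling grid: for each output pixel, the source location in the input
    sx = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    sy = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    out = np.zeros_like(feat, dtype=float)
    # bilinear interpolation over the four integer neighbours of each sample
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wgt = (1 - np.abs(sx - xi)) * (1 - np.abs(sy - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            out[valid] += wgt[valid] * feat[yi[valid], xi[valid]]
    return out
```

In the claimed arrangement one such sampler, with its grid fixed by the corresponding preset angle, exists per preset angle; the sketch parameterizes the angle instead for brevity.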
PCT/CN2021/137532 2021-05-31 2021-12-13 Neural network training method and apparatus, image processing method and apparatus, device, and storage medium WO2022252557A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110602236.2A CN113344195A (en) 2021-05-31 2021-05-31 Network training and image processing method, device, equipment and storage medium
CN202110602236.2 2021-05-31

Publications (1)

Publication Number Publication Date
WO2022252557A1 true WO2022252557A1 (en) 2022-12-08

Family

ID=77473197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137532 WO2022252557A1 (en) 2021-05-31 2021-12-13 Neural network training method and apparatus, image processing method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113344195A (en)
WO (1) WO2022252557A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109966A (en) * 2022-12-19 2023-05-12 中国科学院空天信息创新研究院 Remote sensing scene-oriented video large model construction method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344195A (en) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Network training and image processing method, device, equipment and storage medium
CN117291857B (en) * 2023-11-27 2024-03-22 武汉精立电子技术有限公司 Image processing method, moire eliminating equipment and moire eliminating device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898619A (en) * 2020-07-13 2020-11-06 上海眼控科技股份有限公司 Picture feature extraction method and device, computer equipment and readable storage medium
CN112396701A (en) * 2020-12-01 2021-02-23 腾讯科技(深圳)有限公司 Satellite image processing method and device, electronic equipment and computer storage medium
US20210056452A1 (en) * 2019-08-23 2021-02-25 Johnson Controls Technology Company Building system with probabilistic forecasting using a recurrent neural network sequence to sequence model
CN113344195A (en) * 2021-05-31 2021-09-03 上海商汤智能科技有限公司 Network training and image processing method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11205096B2 (en) * 2018-11-19 2021-12-21 Google Llc Training image-to-image translation neural networks
CN112149585A (en) * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, device, equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109966A (en) * 2022-12-19 2023-05-12 中国科学院空天信息创新研究院 Remote sensing scene-oriented video large model construction method
CN116109966B (en) * 2022-12-19 2023-06-27 中国科学院空天信息创新研究院 Remote sensing scene-oriented video large model construction method

Also Published As

Publication number Publication date
CN113344195A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
WO2022252557A1 (en) Neural network training method and apparatus, image processing method and apparatus, device, and storage medium
Zhang et al. Jaguar: Low latency mobile augmented reality with flexible tracking
WO2022062543A1 (en) Image processing method and apparatus, device and storage medium
US11210570B2 (en) Methods, systems and media for joint manifold learning based heterogenous sensor data fusion
CN109785298B (en) Multi-angle object detection method and system
US10929676B2 (en) Video recognition using multiple modalities
Li et al. Camera localization for augmented reality and indoor positioning: a vision-based 3D feature database approach
WO2022237811A1 (en) Image processing method and apparatus, and device
WO2022252558A1 (en) Methods for neural network training and image processing, apparatus, device and storage medium
CN114820655B (en) Weak supervision building segmentation method taking reliable area as attention mechanism supervision
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN112818969A (en) Knowledge distillation-based face pose estimation method and system
US11823432B2 (en) Saliency prediction method and system for 360-degree image
CN116453121B (en) Training method and device for lane line recognition model
WO2022206414A1 (en) Three-dimensional target detection method and apparatus
CN112434618A (en) Video target detection method based on sparse foreground prior, storage medium and equipment
WO2023083256A1 (en) Pose display method and apparatus, and system, server and storage medium
Yun et al. Panoramic vision transformer for saliency detection in 360° videos
CN113095316B (en) Image rotation target detection method based on multilevel fusion and angular point offset
CN104769643A (en) Method for initializing and solving the local geometry or surface normals of surfels using images in a parallelizable architecture
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
Ke et al. Dense small face detection based on regional cascade multi-scale method
CN112132880A (en) Real-time dense depth estimation method based on sparse measurement and monocular RGB (red, green and blue) image
Osuna-Coutiño et al. Structure extraction in urbanized aerial images from a single view using a CNN-based approach
US11954600B2 (en) Image processing device, image processing method and image processing system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21943909
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 21943909
Country of ref document: EP
Kind code of ref document: A1