WO2023077809A1 - Neural network training method, electronic device, and computer storage medium - Google Patents

Neural network training method, electronic device, and computer storage medium

Info

Publication number
WO2023077809A1
WO2023077809A1 · PCT/CN2022/098767 · CN2022098767W
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
neural network
map
intermediate feature
depth
Prior art date
Application number
PCT/CN2022/098767
Other languages
English (en)
French (fr)
Inventor
崔岩
常青玲
杨鑫
廖洹浩
王昱涵
Original Assignee
五邑大学
广东四维看看智能设备有限公司
中德(珠海)人工智能研究院有限公司
珠海市四维时代网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 五邑大学, 广东四维看看智能设备有限公司, 中德(珠海)人工智能研究院有限公司, 珠海市四维时代网络科技有限公司
Publication of WO2023077809A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Definitions

  • The invention relates to the field of artificial intelligence, and in particular to a neural network training method, an electronic device, and a computer storage medium.
  • Monocular depth estimation refers to predicting the depth corresponding to each pixel from a single image; however, the geometric information in a single image is limited, which limits the accuracy of the depth estimate.
  • Monocular depth estimation is widely used in many fields, such as indoor scene modeling, SLAM, and autonomous robot navigation.
  • In the related art, monocular depth estimation mainly uses a deep neural network to make predictions on the target image.
  • Deep neural networks perform well at extracting depth information, but scene structure information is noticeably lost during extraction. This loss of structural information in the scene, i.e. of features, blurs the depth map, which ultimately reduces prediction accuracy and also causes a series of problems such as pixel drift when projecting the point cloud, affecting the accuracy of the prediction results.
  • The present invention aims to solve at least one of the technical problems in the prior art. To this end, the present invention provides a neural network training method, an electronic device, and a computer storage medium that can avoid the problem of blurred boundaries and improve the accuracy of prediction results.
  • An embodiment of the first aspect of the present invention provides a neural network training method including the following steps: obtaining a preset training sample map; obtaining a predicted feature map; and, based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model.
  • The present invention has at least the following beneficial effects: by introducing the boundary-aware depth loss function, the neural network model's attention to boundary regions during training is effectively increased, ensuring that the depth and depth gradient of the boundary regions are correct, thereby effectively suppressing over-smoothing, avoiding the problem of blurred boundary regions, and ultimately improving boundary prediction accuracy; when applied to monocular depth estimation, this effectively improves the accuracy of the prediction results.
  • According to some embodiments of the first aspect, training the neural network model based on the preset boundary-aware depth loss function, with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model includes: inputting the preset training sample map into the neural network model for feature extraction to obtain multiple scale feature maps; convolving and compressing the scale feature maps to obtain multiple first intermediate feature maps; connecting and fusing each first intermediate feature map to obtain a global feature map; upsampling the global feature map to a preset scale and applying convolutional compression to obtain a second intermediate feature map; pooling and reducing the second intermediate feature map to obtain a third intermediate feature map; convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map; restoring the fourth intermediate feature map by deconvolution and activating it to obtain a mapped feature map; multiplying the mapped feature map with the second intermediate feature map and restoring the product by deconvolution to output the predicted feature map; calculating a loss function value from the output value corresponding to the predicted feature map and the preset boundary-aware depth loss function; and adjusting the weights of the neural network model according to the loss function value and a preset target value, and training the neural network model until the loss function value satisfies the stopping condition.
  • According to some embodiments of the first aspect, the boundary-aware depth loss function is computed from the boundary-aware weight ω, the true depth d, the predicted depth (denoted d̂ in this text), and the perception factor α; the exact expression is given by a formula image in the original filing.
  • According to some embodiments of the first aspect, the boundary-aware weight is computed from g_x and g_y, the x- and y-direction gradients on the real depth map, the corresponding gradients ĝ_x and ĝ_y on the predicted depth map, and N, the total number of pixels; it consists of a real term and an error term (both given by formula images in the original filing).
  • According to some embodiments of the first aspect, convolving and compressing the scale feature maps to obtain a plurality of first intermediate feature maps includes: using a first convolutional layer with at least one convolution kernel to convolve and compress each scale feature map, obtaining a plurality of low-size feature maps; and sampling each low-size feature map to the same resolution, obtaining the plurality of first intermediate feature maps.
  • According to some embodiments of the first aspect, connecting and fusing each first intermediate feature map to obtain a global feature map includes: using a fusion layer to connect and fuse each first intermediate feature map to obtain the global feature map, wherein the fusion layer includes two convolution kernels of different sizes.
  • According to some embodiments of the first aspect, pooling and reducing the second intermediate feature map to obtain a third intermediate feature map includes: pooling and reducing the second intermediate feature map with an average pooling layer to obtain the third intermediate feature map.
  • According to some embodiments of the first aspect, performing convolutional compression and activation on the third intermediate feature map to obtain a fourth intermediate feature map includes: using a second convolutional layer to convolve and compress the third intermediate feature map and using the ReLU function for non-linear activation to obtain the fourth intermediate feature map, wherein the convolution kernel size of the second convolutional layer is 1.
  • An embodiment of the second aspect of the present invention provides an electronic device, including:
  • a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the neural network training method of any one of the first aspect is implemented.
  • Since the electronic device of the second-aspect embodiment applies the neural network training method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
  • An embodiment of the third aspect of the present invention provides a computer storage medium in which computer-executable instructions are stored, the computer-executable instructions being used to execute the neural network training method of any one of the first aspect.
  • Since the computer storage medium of the third-aspect embodiment can execute the neural network training method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
  • Fig. 1 is a diagram of the main steps of the neural network training method according to an embodiment of the present invention;
  • Fig. 2 is a diagram of the steps for obtaining the neural network model in the neural network training method according to an embodiment of the present invention;
  • Fig. 3 is a diagram of the working principle of the neural network model according to an embodiment of the present invention;
  • Fig. 4 is a diagram of the working principle of the scene understanding module in Fig. 3;
  • Fig. 5 is a diagram of the working principle of the scale transformation module in Fig. 3.
  • Reference numerals: scene understanding module 100, scale transformation module 200.
  • As shown in Fig. 1, a neural network training method according to an embodiment of the first aspect of the present invention includes the following steps:
  • Step S100: obtaining a preset training sample map;
  • Step S200: obtaining a predicted feature map;
  • Step S300: based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model.
  • By introducing the boundary-aware depth loss function, the neural network model's attention to boundary regions during training is effectively increased, and the depth and depth gradient of the boundary regions are kept correct, thereby effectively suppressing over-smoothing and avoiding blurred boundary regions; this ultimately improves boundary prediction accuracy and, when applied to monocular depth estimation, effectively improves the accuracy of the prediction results.
  • Specifically, step S300 includes the following steps:
  • Step S310: The preset training sample map is input into the neural network model for feature extraction to obtain multiple scale feature maps. Specifically, the image to be processed is encoded by an encoder and then input into the neural network model, from which the multiple scale feature maps are extracted.
  • Step S320: All of the scale feature maps are convolved and compressed to obtain multiple first intermediate feature maps; that is, the feature map at each scale is compressed by convolution.
  • Step S330: The first intermediate feature maps are connected and fused to obtain a global feature map. Specifically, after the first intermediate feature maps are connected, they are fused by a fusion layer to obtain a global feature map carrying global scene structure information, which effectively avoids the loss of low-level features.
  • Step S340: The global feature map is upsampled to a preset scale and convolutionally compressed to obtain a second intermediate feature map, which can satisfy conversion requirements at different scales.
  • Step S350: The second intermediate feature map is pooled and reduced to obtain a third intermediate feature map; during the pooling reduction, the second intermediate feature map is processed into a single pixel.
  • Step S360: The third intermediate feature map is convolutionally compressed and activated to obtain a fourth intermediate feature map.
  • Step S370: The fourth intermediate feature map is restored by deconvolution and activated to obtain a mapped feature map.
  • Step S380: The mapped feature map is multiplied with the second intermediate feature map and restored by deconvolution to output the predicted feature map. Specifically, the predicted feature maps are sent to the corresponding decoding steps and jointly decoded.
  • Step S390: A loss function value is calculated from the output value corresponding to the predicted feature map and the preset boundary-aware depth loss function (Boundary-Aware Depth loss, BAD).
  • Whether the comparison between the loss function value and the preset target value satisfies the stopping condition is then judged; if not, the weights of the neural network model are adjusted and steps S310 to S390 are repeated to train the neural network model; if so, the trained neural network model is obtained directly.
  • By aggregating all scale features of the image to be processed and fusing them into a global feature map with global scene information, the loss of scene structure information is effectively avoided, and by scale-transforming the global feature map, feature maps at multiple scales are output, which meets the prediction requirements of neural network training. During scale conversion, not only is the resolution of the feature map changed, but the information of the global feature map is also scaled, which avoids introducing too much redundant information and reduces the model parameters, thereby effectively improving prediction performance. In addition, the boundary-aware depth loss function is introduced to increase the neural network model's attention to boundary regions during training, which effectively suppresses over-smoothing and ultimately improves boundary prediction accuracy.
  • Using the neural network model trained in the embodiment of the present invention to process the image can extract depth information, thereby realizing monocular depth estimation.
  • It can be understood that the depth gradient changes little in non-boundary regions of the depth map and changes greatly in boundary regions, while boundary regions occupy only a small proportion of the overall scene. The neural network model therefore tends to miss these relatively small regions during training, so that it predicts boundary regions with small gradients, which produces blurred boundaries in the depth map. Boundary blur not only reduces prediction accuracy but also causes pixel drift when projecting the point cloud, and may even cause an object to be misjudged as background and assigned a depth value close to that of the background.
  • Therefore, introducing the boundary-aware depth loss function during training of the neural network effectively increases the neural network model's attention to boundary regions, so that the model pays more attention to the errors contributed by boundary regions during training.
  • As a result, the accuracy of depth prediction is effectively improved and the problem of blurred depth-map boundaries is avoided.
  • It can be understood that, during training of the neural network model, the boundary-aware depth loss function is set up for the training iterations; it is defined in terms of the boundary-aware weight ω, the true depth d, the predicted depth d̂, and the perception factor α (the exact equation is given by a formula image in the original filing). If the boundary-aware weight is large, the corresponding pixels contribute a large loss, forcing the neural network model to attend to those regions.
  • Specifically, in step S390, the predicted depth values in the predicted feature map are input into the boundary-aware depth loss function to obtain the loss function value, and the loss function value is compared with the preset target value; if the stopping condition is not satisfied, the boundary-aware weights are adjusted and steps S310 to S390 are repeated to train the neural network model until the loss function value satisfies the condition for stopping training.
  • The preset target value is also called the preset threshold, and the stopping condition is generally set as the loss function value being less than or equal to the preset threshold. Optionally, α = 0.3 is set, and the gradients used to adjust the boundary-aware weights can be obtained with the Sobel operator.
  • It can be understood that the boundary-aware weight is defined in terms of g_x and g_y, the x- and y-direction gradients on the real depth map, the corresponding gradients ĝ_x and ĝ_y on the predicted depth map, and N, the total number of pixels (the exact expression is given by a formula image in the original filing); it consists of two terms, a real term and an error term.
  • When the corresponding pixel has a large gradient on the real depth map, the real term becomes larger; if there is a large gradient prediction error, the error term becomes larger, guiding the neural network model to focus on the regions with large gradient errors.
  • When the depths of the background and of an object in the image are simultaneously too large or too small, i.e. when the gradient error is small but the depth error is large, the larger real term guides the model to attend to those regions even when the gradient error is small.
  • By increasing the proportion of the object boundary regions of the image in the overall training loss, over-smoothing is effectively suppressed, and boundary prediction accuracy is ultimately improved.
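  • As a purely illustrative sketch (the patent's exact formulas appear only as images in the original filing, so the weighting scheme, the role of α, and all function names below are assumptions), a boundary-aware depth loss of this kind could be implemented along the following lines in PyTorch, using Sobel gradients on the true and predicted depth maps and a per-pixel weight built from a real term and an error term:

```python
import torch
import torch.nn.functional as F

def sobel_gradients(depth):
    """Return x- and y-direction gradients of a depth map (B, 1, H, W) via Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=depth.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(depth, kx, padding=1)
    gy = F.conv2d(depth, ky, padding=1)
    return gx, gy

def boundary_aware_depth_loss(pred, gt, alpha=0.3):
    """Illustrative boundary-aware depth loss (a sketch, not the patent's exact formula).

    The weight combines a "real term" (gradient magnitude of the ground-truth depth)
    and an "error term" (gradient prediction error), so boundary pixels and pixels
    with mispredicted gradients contribute more to the loss.
    """
    gx, gy = sobel_gradients(gt)        # gradients on the real depth map
    pgx, pgy = sobel_gradients(pred)    # gradients on the predicted depth map

    real_term = gx.abs() + gy.abs()                      # large at true depth boundaries
    error_term = (gx - pgx).abs() + (gy - pgy).abs()     # large where gradients are mispredicted
    weight = 1.0 + alpha * (real_term + error_term)      # "1 +" keeps non-boundary pixels supervised

    return (weight * (pred - gt).abs()).mean()           # mean over the N pixels

# Example call with random tensors standing in for predicted and ground-truth depth maps.
loss = boundary_aware_depth_loss(torch.rand(2, 1, 60, 80), torch.rand(2, 1, 60, 80))
```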
  • It can be understood that step S320, convolving and compressing all scale feature maps to obtain multiple first intermediate feature maps, includes but is not limited to the following steps:
  • Step S321: A first convolutional layer with at least one convolution kernel is used to convolve and compress the feature map at each scale, obtaining multiple low-size feature maps.
  • Step S322: Each low-size feature map is sampled to the same resolution, obtaining multiple first intermediate feature maps.
  • Specifically, first convolutional layers with two identical convolution kernels are used to compress each scale feature map into a low-size feature map with a reduced number of channels, and then methods such as secondary scaling are used to sample each low-size feature map to the same resolution, so that multiple first intermediate feature maps of identical resolution are obtained.
  • It can be understood that step S330, connecting and fusing each first intermediate feature map to obtain a global feature map, includes but is not limited to the following step:
  • A fusion layer composed of two convolution kernels of different sizes is used to connect and fuse each first intermediate feature map, obtaining the global feature map.
  • During training of the neural network model, the fusion layer adjusts adaptively, so that the trained neural network model can adaptively fuse the first intermediate feature maps into a global feature map carrying global scene information.
  • It can be understood that, in step S350, the second intermediate feature map is pooled and reduced by an average pooling layer and processed into a single pixel, thereby obtaining the third intermediate feature map.
  • It can be understood that step S360, performing convolutional compression and activation on the third intermediate feature map to obtain the fourth intermediate feature map, includes but is not limited to the following step: a second convolutional layer is used to convolve and compress the third intermediate feature map, and the ReLU function is used for non-linear activation, obtaining the fourth intermediate feature map.
  • It can be understood that step S370, restoring the fourth intermediate feature map by deconvolution and activating it to obtain the mapped feature map, includes but is not limited to the following step:
  • The fourth intermediate feature map is restored by a third convolutional layer with a 1×1 convolution kernel to increase the number of channels, and then activated with the sigmoid function to obtain the mapped feature map.
  • Specifically, after the mapped feature map and the second intermediate feature map are multiplied, the number of channels is increased by convolutional restoration.
  • In steps S340 to S380, the features are compressed and restored several times, and a channel attention mechanism is applied to adaptively adjust the corresponding weights; after the resulting target neural network module obtains the global feature map, it can adaptively transform the scale of the features.
  • As shown in Figs. 3 to 5, the neural network model in the embodiment of the present invention includes a scene understanding module 100 and a scale transformation module 200; the scene understanding module 100 is used to perform steps S310 to S330, and the scale transformation module 200 is used to perform steps S340 to S380.
  • The encoder encodes the image to be processed and inputs it into the scene understanding module 100, which extracts all of the scale feature maps, convolves and compresses them into the first intermediate feature maps, connects the first intermediate feature maps, and fuses them through the fusion layer; the scene understanding module 100 finally outputs the global feature map.
  • The scale transformation module 200 samples the global feature map to the same scale and applies convolutional compression to obtain the second intermediate feature map, pools the second intermediate feature map down to a single pixel to obtain the third intermediate feature map, convolves, compresses, and activates the third intermediate feature map to obtain the fourth intermediate feature map, restores the fourth intermediate feature map by deconvolution and activates it to obtain the mapped feature map, and finally multiplies the second intermediate feature map with the mapped feature map and restores the product by deconvolution to obtain the predicted feature map.
  • The above scene understanding module 100 and scale transformation module 200 are both parts of the neural network model; together they establish a feature pyramid and, in combination with the boundary-aware depth loss function, solve the boundary-blur problem of the depth map.
  • All scale features of the image to be processed are obtained through the scene understanding module 100 to form the global feature map, and the global feature map is scale-transformed by the scale transformation module 200 to obtain predicted feature maps at multiple scales.
  • Forming the global feature map avoids the loss of scene structure information and ensures the accuracy of the predicted depth map, while obtaining multi-scale predicted feature maps through scale conversion meets the needs of the decoding stage without introducing too much redundant information into the decoder, which would degrade prediction performance.
  • Step S300 of the neural network training method of the first aspect of the present invention will now be described in detail in a specific embodiment in combination with the structure of the neural network model. It should be understood that the following description is only illustrative and not a specific limitation of the invention.
  • As shown in Fig. 4, in step S310 the preset training sample map is input into the scene understanding module 100, which extracts all of the scale feature maps. In step S321, as shown in Fig. 3, the scene understanding module 100 is provided with a sampling module that uses two first convolutional layers with 3×3 convolution kernels to convolve and compress each scale feature map down to 64 channels, obtaining the low-size feature maps. In step S322, the sampling module in the scene understanding module 100 samples each low-size feature map to the same resolution, for example by secondary scaling, to obtain the first intermediate feature maps, all of which have the same resolution. In step S330, the fusion layer connects and fuses each first intermediate feature map, and the scene understanding module 100 finally outputs a 128-channel global feature map.
  • The fusion layer is the fusion module in Fig. 3; specifically, the fusion layer includes two convolutional layers, the first with a 5×5 convolution kernel and the second with a 3×3 convolution kernel.
  • As shown in Fig. 5, in step S340 the scale transformation module 200 uses a first convolutional layer with a 3×3 convolution kernel to convolve and compress the global feature map down to 64 channels, obtaining the second intermediate feature map. In step S350, the scale transformation module 200 uses an average pooling layer to process the second intermediate feature map into a single pixel, obtaining the third intermediate feature map. In step S360, the scale transformation module 200 uses a second convolutional layer with a 1×1 convolution kernel to convolve and compress the third intermediate feature map to 32 channels and then applies the ReLU function for non-linear activation, obtaining the fourth intermediate feature map. In step S370, the scale transformation module 200 uses a third convolutional layer with a 1×1 convolution kernel to deconvolve the fourth intermediate feature map back to 64 channels and then applies the sigmoid function for activation, obtaining the mapped feature map. In step S380, the scale transformation module 200 multiplies the second intermediate feature map with the mapped feature map, then uses a first convolutional layer with a 3×3 convolution kernel to deconvolve the result back to 128 channels, finally obtaining the predicted feature map.
  • Repeating steps S340 to S380 can transform and obtain prediction feature maps at different scales.
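  • Purely as an illustrative sketch of the data flow just described (the class name, the layer choices, and the bilinear upsampling are assumptions, not the patent's code), the scale transformation step could be organized roughly as follows: the global feature map is upsampled and compressed to 64 channels, averaged down to a single pixel, passed through a 1×1 convolution with ReLU and a 1×1 convolution with sigmoid to form a channel-attention map, multiplied back onto the compressed features, and restored to 128 channels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleTransformSketch(nn.Module):
    """Illustrative sketch of the scale transformation step (steps S340-S380); not the patent's code."""

    def __init__(self, in_channels=128):
        super().__init__()
        self.compress = nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)   # S340: 3x3 conv to 64 channels
        self.fc1 = nn.Conv2d(64, 32, kernel_size=1)                            # S360: 1x1 conv to 32 channels
        self.fc2 = nn.Conv2d(32, 64, kernel_size=1)                            # S370: 1x1 conv back to 64 channels
        self.restore = nn.ConvTranspose2d(64, 128, kernel_size=3, padding=1)   # S380: restore 128 channels

    def forward(self, global_feat, out_size):
        x = F.interpolate(global_feat, size=out_size, mode='bilinear',
                          align_corners=False)                # upsample the global feature map to the target scale
        second = self.compress(x)                             # second intermediate feature map
        third = F.adaptive_avg_pool2d(second, 1)              # S350: average-pool down to a single pixel
        fourth = F.relu(self.fc1(third))                      # S360: compress + ReLU
        mapped = torch.sigmoid(self.fc2(fourth))              # S370: restore channels + sigmoid (mapped feature map)
        return self.restore(second * mapped)                  # S380: channel-attention product, then restore

# Example: one 128-channel global feature map transformed into a 60x80 predicted feature map.
pred = ScaleTransformSketch()(torch.randn(1, 128, 30, 40), out_size=(60, 80))
```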
  • In the neural network model composed of the scene understanding module 100 and the scale transformation module 200, training is performed with channel attention applied, and the boundary-aware depth loss is specifically used as the loss function for iterative training. The boundary-aware depth loss is defined in terms of the boundary-aware weight ω, the true depth d, the predicted depth d̂, and the perception factor α (the exact expression is given by a formula image in the original filing).
  • Optionally, α = 3 is set, and the Sobel operator is used to extract the gradients. If the boundary-aware weight is large, the corresponding pixels contribute a large loss, forcing the neural network model to attend to those regions, which ultimately increases the proportion of the boundary regions in the overall training loss.
  • The boundary-aware weight is defined in terms of g_x and g_y, the x- and y-direction gradients on the real depth map, the corresponding x- and y-direction gradients ĝ_x and ĝ_y on the predicted depth map, and N, the total number of pixels (the exact expression is given by a formula image in the original filing). The boundary-aware weight includes two terms: a real term and an error term.
  • When the corresponding pixel has a large gradient on the real depth map, the real term becomes larger; if there is a large gradient prediction error, the error term becomes larger, guiding the neural network model to focus on the regions with large gradient errors.
  • When the depths of the background and of an object in the image are simultaneously too large or too small, i.e. when the gradient error is small but the depth error is large, the larger real term guides the model to attend to those regions even when the gradient error is small.
  • An embodiment of the second aspect of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • The processor and the memory may be connected by a bus or in other ways.
  • As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs.
  • In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • In some implementations, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the neural network training method of the first-aspect embodiment are stored in the memory and, when executed by the processor, perform the neural network training method of the above embodiments, for example method steps S100 to S300 and method steps S310 to S390 in Fig. 2 described above.
  • The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, i.e. they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • In addition, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or a controller, for example by a processor in the above device embodiment, cause the processor to execute the neural network training method in the above embodiments, for example method steps S100 to S300 and method steps S310 to S390 in Fig. 1 described above.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • In the description of this specification, references to the terms "one embodiment," "some embodiments," "exemplary embodiment," "example," "specific example," or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention.
  • In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example.
  • Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A neural network training method, an electronic device, and a computer storage medium, including the following steps: obtaining a preset training sample map (S100); obtaining a predicted feature map (S200); and, based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain a trained neural network model (S300). By introducing the boundary-aware depth loss function, the neural network model's attention to boundary regions during training is effectively increased, ensuring that the depth and depth gradient of the boundary regions are correct, thereby effectively suppressing over-smoothing, avoiding the problem of blurred boundary regions, and ultimately improving boundary prediction accuracy.

Description

Neural network training method, electronic device, and computer storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a neural network training method, an electronic device, and a computer storage medium.
Background Art
With the development of deep learning, the accuracy and efficiency of various visual estimation tasks have improved considerably. Monocular depth estimation refers to predicting the depth corresponding to each pixel from a single image, but the geometric information in a single image is limited, which limits the accuracy of its depth estimation. Monocular depth estimation is widely used in many fields, such as indoor scene modeling, SLAM, and autonomous robot navigation.
In the related art, monocular depth estimation mainly uses a deep neural network to make predictions on the target image. Deep neural networks perform well at extracting depth information, but scene structure information is noticeably lost during extraction; this loss of structural information in the scene, i.e. of features, blurs the depth map, which ultimately reduces prediction accuracy and also causes a series of problems such as pixel drift when projecting the point cloud, affecting the accuracy of the prediction results.
Summary of the Invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, the present invention provides a neural network training method, an electronic device, and a computer storage medium that can avoid the problem of blurred boundaries and improve the accuracy of prediction results.
An embodiment of the first aspect of the present invention provides a neural network training method, including the following steps:
obtaining a preset training sample map;
obtaining a predicted feature map;
based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model.
The above embodiments of the present invention have at least the following beneficial effects: by introducing the boundary-aware depth loss function, the neural network model's attention to boundary regions during training is effectively increased, ensuring that the depth and depth gradient of the boundary regions are correct, thereby effectively suppressing over-smoothing, avoiding the problem of blurred boundary regions, and ultimately improving boundary prediction accuracy; when applied to monocular depth estimation, this effectively improves the accuracy of the prediction results.
According to some embodiments of the first aspect, training the neural network model based on the preset boundary-aware depth loss function, with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model includes:
inputting the preset training sample map into the neural network model for feature extraction to obtain multiple scale feature maps;
convolving and compressing the scale feature maps to obtain multiple first intermediate feature maps;
connecting and fusing each first intermediate feature map to obtain a global feature map;
upsampling the global feature map to a preset scale and applying convolutional compression to obtain a second intermediate feature map;
pooling and reducing the second intermediate feature map to obtain a third intermediate feature map;
convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map;
restoring the fourth intermediate feature map by deconvolution and activating it to obtain a mapped feature map;
multiplying the mapped feature map with the second intermediate feature map and restoring the product by deconvolution to output the predicted feature map;
calculating a loss function value from the output value corresponding to the predicted feature map and the preset boundary-aware depth loss function;
adjusting the weights of the neural network model according to the loss function value and a preset target value, and training the neural network model until the loss function value satisfies the stopping condition.
According to some embodiments of the first aspect, the boundary-aware depth loss function is computed as:
[formula image PCTCN2022098767-appb-000001]
where ω is the boundary-aware weight, d is the true depth, d̂ is the predicted depth, and α is the perception factor.
According to some embodiments of the first aspect, the boundary-aware weight is computed as:
[formula image PCTCN2022098767-appb-000003]
where g_x is the x-direction gradient on the real depth map, g_y is the y-direction gradient on the real depth map, ĝ_x is the x-direction gradient on the predicted depth map, ĝ_y is the y-direction gradient on the predicted depth map, and N is the total number of pixels; the weight consists of a real term (formula image PCTCN2022098767-appb-000006) and an error term (formula image PCTCN2022098767-appb-000007).
According to some embodiments of the first aspect, convolving and compressing the scale feature maps to obtain multiple first intermediate feature maps includes:
using a first convolutional layer with at least one convolution kernel to convolve and compress each scale feature map, obtaining multiple low-size feature maps;
sampling each low-size feature map to the same resolution, obtaining the multiple first intermediate feature maps.
According to some embodiments of the first aspect, connecting and fusing each first intermediate feature map to obtain a global feature map includes:
using a fusion layer to connect and fuse each first intermediate feature map, obtaining the global feature map, wherein the fusion layer includes two convolution kernels of different sizes.
According to some embodiments of the first aspect, pooling and reducing the second intermediate feature map to obtain a third intermediate feature map includes:
pooling and reducing the second intermediate feature map with an average pooling layer, obtaining the third intermediate feature map.
According to some embodiments of the first aspect, convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map includes:
using a second convolutional layer to convolve and compress the third intermediate feature map and using the ReLU function for non-linear activation, obtaining the fourth intermediate feature map, wherein the convolution kernel size of the second convolutional layer is 1.
An embodiment of the second aspect of the present invention provides an electronic device, including:
a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the neural network training method of any one of the first aspect.
Since the electronic device of the second-aspect embodiment applies the neural network training method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
An embodiment of the third aspect of the present invention provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the neural network training method of any one of the first aspect.
Since the computer storage medium of the third-aspect embodiment can execute the neural network training method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
Additional aspects and advantages of the present invention will be set forth in part in the following description and will in part become apparent from the following description or be learned through practice of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a diagram of the main steps of the neural network training method according to an embodiment of the present invention;
Fig. 2 is a diagram of the steps for obtaining the neural network model in the neural network training method according to an embodiment of the present invention;
Fig. 3 is a diagram of the working principle of the neural network model according to an embodiment of the present invention;
Fig. 4 is a diagram of the working principle of the scene understanding module in Fig. 3;
Fig. 5 is a diagram of the working principle of the scale transformation module in Fig. 3.
Reference numerals:
scene understanding module 100, scale transformation module 200.
Detailed Description of the Embodiments
In the description of the present invention, unless otherwise explicitly defined, terms such as "provided," "installed," and "connected" should be understood broadly, and those skilled in the art can reasonably determine their specific meanings in the present invention in light of the specific content of the technical solution. In the description of the present invention, "several" means one or more, "multiple" means two or more, "greater than," "less than," "exceeding," and the like are understood as excluding the stated number, while "above," "below," "within," and the like are understood as including the stated number. In addition, features defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, unless otherwise stated, "multiple" means two or more.
Most monocular depth estimation work is devoted to improving the accuracy of depth prediction while neglecting the integrity of scene structure information, which leads to pixel shifts when projecting the point cloud and affects the accuracy of the prediction results.
The neural network training method, device, and computer storage medium of the present invention are described below with reference to Figs. 1 to 5.
As shown in Fig. 1, a neural network training method according to an embodiment of the first aspect of the present invention includes the following steps:
Step S100: obtaining a preset training sample map;
Step S200: obtaining a predicted feature map;
Step S300: based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model.
By introducing the boundary-aware depth loss function, the neural network model's attention to boundary regions during training is effectively increased, ensuring that the depth and depth gradient of the boundary regions are correct, thereby effectively suppressing over-smoothing, avoiding the problem of blurred boundary regions, and ultimately improving boundary prediction accuracy; when applied to monocular depth estimation, this effectively improves the accuracy of the prediction results.
Specifically, step S300 includes the following steps:
Step S310: The preset training sample map is input into the neural network model for feature extraction to obtain multiple scale feature maps. Specifically, the image to be processed is encoded by an encoder and then input into the neural network model, from which the multiple scale feature maps are extracted.
Step S320: All of the scale feature maps are convolved and compressed to obtain multiple first intermediate feature maps; that is, the feature map at each scale is compressed by convolution.
Step S330: The first intermediate feature maps are connected and fused to obtain a global feature map. Specifically, after the first intermediate feature maps are connected, they are fused by the fusion layer to obtain a global feature map carrying global scene structure information, which effectively avoids the loss of low-level features.
Step S340: The global feature map is upsampled to a preset scale and convolutionally compressed to obtain a second intermediate feature map, which can satisfy conversion requirements at different scales.
Step S350: The second intermediate feature map is pooled and reduced to obtain a third intermediate feature map; during the pooling reduction, the second intermediate feature map is processed into a single pixel.
Step S360: The third intermediate feature map is convolutionally compressed and activated to obtain a fourth intermediate feature map.
Step S370: The fourth intermediate feature map is restored by deconvolution and activated to obtain a mapped feature map.
Step S380: The mapped feature map is multiplied with the second intermediate feature map and restored by deconvolution to output the predicted feature map. Specifically, the predicted feature maps are sent to the corresponding decoding steps and jointly decoded.
Step S390: A loss function value is calculated from the output value corresponding to the predicted feature map and the preset boundary-aware depth loss function (Boundary-Aware Depth loss, BAD).
Whether the comparison between the loss function value and the preset target value satisfies the stopping condition is then judged; if not, the weights of the neural network model are adjusted and steps S310 to S390 are repeated to train the neural network model; if so, the trained neural network model is obtained directly.
By aggregating all scale features of the image to be processed and fusing them into a global feature map with global scene information, the loss of scene structure information is effectively avoided, and by scale-transforming the global feature map, feature maps at multiple scales are output, which meets the prediction requirements of neural network training. During scale conversion, not only is the resolution of the feature map changed, but the information of the global feature map is also scaled, which avoids introducing too much redundant information and reduces the model parameters, thereby effectively improving prediction performance. In addition, the boundary-aware depth loss function is introduced to increase the neural network model's attention to boundary regions during training, which effectively suppresses over-smoothing and ultimately improves boundary prediction accuracy.
Processing an image with the neural network model trained in the embodiment of the present invention can extract depth information, thereby realizing monocular depth estimation.
It can be understood that the depth gradient changes little in non-boundary regions of the depth map and changes greatly in boundary regions, while boundary regions occupy only a small proportion of the overall scene, so the neural network model tends to miss these relatively small regions during training; this causes the model to predict boundary regions with small gradients and produces blurred boundaries in the depth map. Boundary blur not only reduces prediction accuracy but also causes pixel drift when projecting the point cloud, and may even cause an object to be misjudged as background and assigned a depth value close to that of the background. Therefore, introducing the boundary-aware depth loss function during training of the neural network effectively increases the neural network model's attention to boundary regions, so that the model pays more attention to the errors contributed by boundary regions during training, thereby effectively improving the accuracy of depth prediction and avoiding blurred depth-map boundaries.
It can be understood that, during training of the neural network model, the boundary-aware depth loss function is set up for the training iterations and satisfies the following equation:
[formula image PCTCN2022098767-appb-000008]
where ω is the boundary-aware weight, d is the true depth, d̂ is the predicted depth, and α is the perception factor. If the boundary-aware weight is large, the corresponding pixels contribute a large loss, forcing the neural network model to attend to those regions.
Specifically, in step S390, the predicted depth values in the predicted feature map are input into the boundary-aware depth loss function to obtain the loss function value, and the loss function value is compared with the preset target value; if the stopping condition is not satisfied, the boundary-aware weights are adjusted and steps S310 to S390 are repeated to train the neural network model until the loss function value satisfies the condition for stopping training. The preset target value is also called the preset threshold, and the stopping condition is generally set as the loss function value being less than or equal to the preset threshold. Optionally, α = 0.3 is set, and the gradients used to adjust the boundary-aware weights can be obtained with the Sobel operator.
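Purely as an illustrative sketch of this threshold-based stopping rule (the model, the data, and the loss function below are placeholders, not the patent's implementation), the training loop could look roughly like this:

```python
import torch
import torch.nn as nn

# Placeholder model and data, standing in for the depth network and the training sample maps.
model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()   # stand-in for the boundary-aware depth loss
images = torch.randn(4, 3, 60, 80)
gt_depth = torch.rand(4, 1, 60, 80)

threshold = 0.05          # preset target value (preset threshold)
max_iters = 1000

for step in range(max_iters):
    pred_depth = model(images)              # steps S310-S380: predicted feature map / depth
    loss = criterion(pred_depth, gt_depth)  # step S390: loss function value
    if loss.item() <= threshold:            # stopping condition: loss <= preset threshold
        break
    optimizer.zero_grad()
    loss.backward()                         # otherwise adjust the model weights and repeat
    optimizer.step()
```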
It can be understood that the boundary-aware weight is defined as:
[formula image PCTCN2022098767-appb-000010]
where g_x is the x-direction gradient on the real depth map, g_y is the y-direction gradient on the real depth map, ĝ_x is the x-direction gradient on the predicted depth map, ĝ_y is the y-direction gradient on the predicted depth map, and N is the total number of pixels; the weight consists of a real term (formula image PCTCN2022098767-appb-000013) and an error term (formula image PCTCN2022098767-appb-000014).
When the corresponding pixel has a large gradient on the real depth map, the real term becomes larger; if there is a large gradient prediction error, the error term becomes larger, guiding the neural network model to focus on the regions with large gradient errors. When the depths of the background and of an object in the image are simultaneously too large or too small, i.e. when the gradient error is small but the depth error is large, the larger real term guides the model to attend to those regions even when the gradient error is small. By increasing the proportion of the object boundary regions of the image in the overall training loss, over-smoothing is effectively suppressed, and boundary prediction accuracy is ultimately improved.
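The exact expressions for the loss and the weight appear only as formula images in the original filing. Purely to make the variable roles concrete, one plausible instantiation consistent with the description above (a per-pixel weighted depth error, with a weight built from a real term and an error term) might look like the following; this is an assumption for illustration, not the patent's actual formulas:

```latex
% One plausible form, NOT the patent's exact formulas (those appear only as images):
\mathcal{L}_{\mathrm{BAD}} = \frac{1}{N}\sum_{i=1}^{N} \omega_i \,\bigl|d_i - \hat{d}_i\bigr|, \qquad
\omega_i = 1 + \alpha\Bigl(\underbrace{|g_x(i)| + |g_y(i)|}_{\text{real term}}
          \;+\; \underbrace{|g_x(i)-\hat{g}_x(i)| + |g_y(i)-\hat{g}_y(i)|}_{\text{error term}}\Bigr)
```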
It can be understood that step S320, convolving and compressing all scale feature maps to obtain multiple first intermediate feature maps, includes but is not limited to the following steps:
Step S321: using a first convolutional layer with at least one convolution kernel to convolve and compress each scale feature map, obtaining multiple low-size feature maps;
Step S322: sampling each low-size feature map to the same resolution, obtaining multiple first intermediate feature maps.
Specifically, first convolutional layers with two identical convolution kernels are used to compress each scale feature map into a low-size feature map with a reduced number of channels, and then methods such as secondary scaling are used to sample each low-size feature map to the same resolution, so that multiple first intermediate feature maps of identical resolution are obtained.
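A minimal PyTorch-style sketch of this compress-and-resample step is shown below; the 64-channel target matches the specific embodiment described later, while the bilinear resampling stands in for the "secondary scaling" mentioned above and is an assumption, as are the class and parameter names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressAndResample(nn.Module):
    """Sketch of steps S321-S322: compress each scale feature map, then bring all maps
    to a common resolution so they can later be connected and fused."""

    def __init__(self, in_channels_per_scale, out_channels=64, target_hw=(60, 80)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=3, padding=1) for c in in_channels_per_scale]
        )
        self.target_hw = target_hw

    def forward(self, scale_feats):
        firsts = []
        for conv, feat in zip(self.convs, scale_feats):
            low = conv(feat)                                        # S321: channel-reduced low-size feature map
            firsts.append(F.interpolate(low, size=self.target_hw,
                                        mode='bilinear', align_corners=False))  # S322: common resolution
        return firsts

# Example: three encoder outputs at different scales and channel counts.
feats = [torch.randn(1, 256, 15, 20), torch.randn(1, 128, 30, 40), torch.randn(1, 64, 60, 80)]
first_maps = CompressAndResample([256, 128, 64])(feats)   # each output is 1 x 64 x 60 x 80
```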
It can be understood that step S330, connecting and fusing each first intermediate feature map to obtain a global feature map, includes but is not limited to the following step:
A fusion layer composed of two convolution kernels of different sizes is used to connect and fuse each first intermediate feature map, obtaining the global feature map. During training of the neural network model, the fusion layer adjusts adaptively, so that the trained neural network model can adaptively fuse the first intermediate feature maps into a global feature map carrying global scene information.
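A hedged sketch of such a fusion layer is given below; the channel-wise concatenation of the first intermediate feature maps, the 5×5/3×3 kernel sizes, and the 128-channel output follow the specific embodiment described later and are not the only possible choices:

```python
import torch
import torch.nn as nn

class FusionLayerSketch(nn.Module):
    """Sketch of step S330: concatenate the first intermediate feature maps and fuse them
    with two convolutions of different kernel sizes into a global feature map."""

    def __init__(self, num_maps=3, channels=64, out_channels=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_maps * channels, out_channels, kernel_size=5, padding=2),  # first, larger kernel
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),         # second, smaller kernel
        )

    def forward(self, first_maps):
        x = torch.cat(first_maps, dim=1)   # connect the first intermediate feature maps along channels
        return self.fuse(x)                # 128-channel global feature map

# Example: fusing three 64-channel maps of the same resolution.
global_feat = FusionLayerSketch()([torch.randn(1, 64, 60, 80) for _ in range(3)])
```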
It can be understood that, in step S350, the second intermediate feature map is pooled and reduced by an average pooling layer and processed into a single pixel, thereby obtaining the third intermediate feature map.
It can be understood that step S360, convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map, includes but is not limited to the following step:
A second convolutional layer is used to convolve and compress the third intermediate feature map, and the ReLU function is used for non-linear activation, obtaining the fourth intermediate feature map, where the convolution kernel size of the second convolutional layer is 1×1.
It can be understood that step S370, restoring the fourth intermediate feature map by deconvolution and activating it to obtain a mapped feature map, includes but is not limited to the following step:
The fourth intermediate feature map is restored by a third convolutional layer with a 1×1 convolution kernel to increase the number of channels, and then activated with the sigmoid function to obtain the mapped feature map.
Specifically, after the mapped feature map and the second intermediate feature map are multiplied, the number of channels is increased by convolutional restoration.
In steps S340 to S380, the features are compressed and restored several times, and a channel attention mechanism is applied to adaptively adjust the corresponding weights; after the resulting target neural network module obtains the global feature map, it can adaptively transform the scale of the features.
As shown in Figs. 3 to 5, the neural network model in the embodiment of the present invention includes a scene understanding module 100 and a scale transformation module 200; the scene understanding module 100 is used to perform steps S310 to S330, and the scale transformation module 200 is used to perform steps S340 to S380. The encoder encodes the image to be processed and inputs it into the scene understanding module 100, which extracts all of the scale feature maps, convolves and compresses them into the first intermediate feature maps, connects the first intermediate feature maps, fuses them through the fusion layer, and finally outputs the global feature map. The scale transformation module 200 samples the global feature map to the same scale and applies convolutional compression to obtain the second intermediate feature map, pools the second intermediate feature map down to a single pixel to obtain the third intermediate feature map, convolves, compresses, and activates the third intermediate feature map to obtain the fourth intermediate feature map, restores the fourth intermediate feature map by deconvolution and activates it to obtain the mapped feature map, and finally multiplies the second intermediate feature map with the mapped feature map and restores the product by deconvolution to obtain the predicted feature map.
The above scene understanding module 100 and scale transformation module 200 are both parts of the neural network model; together they establish a feature pyramid and, in combination with the boundary-aware depth loss function, solve the boundary-blur problem of the depth map.
All scale features of the image to be processed are obtained through the scene understanding module 100 to form the global feature map, and the global feature map is scale-transformed by the scale transformation module 200 to obtain predicted feature maps at multiple scales. Forming the global feature map avoids the loss of scene structure information and ensures the accuracy of the predicted depth map, while obtaining multi-scale predicted feature maps through scale conversion meets the needs of the decoding stage without introducing too much redundant information into the decoder, which would degrade prediction performance.
Step S300 of the neural network training method of the first aspect of the present invention will now be described in detail with reference to Figs. 1 to 5 in a specific embodiment in combination with the structure of the neural network model. It should be understood that the following description is only illustrative and not a specific limitation of the invention.
As shown in Fig. 4, in step S310 the preset training sample map is input into the scene understanding module 100, which extracts all of the scale feature maps. In step S321, as shown in Fig. 3, the scene understanding module 100 is provided with a sampling module that uses two first convolutional layers with 3×3 convolution kernels to convolve and compress each scale feature map down to 64 channels, obtaining the low-size feature maps. In step S322, the sampling module in the scene understanding module 100 samples each low-size feature map to the same resolution, for example by secondary scaling, to obtain the first intermediate feature maps, all of which have the same resolution. In step S330, the fusion layer connects and fuses each first intermediate feature map, and the scene understanding module 100 finally outputs a 128-channel global feature map. The fusion layer is the fusion module in Fig. 3; specifically, the fusion layer includes two convolutional layers, the first with a 5×5 convolution kernel and the second with a 3×3 convolution kernel.
As shown in Fig. 5, in step S340 the scale transformation module 200 uses a first convolutional layer with a 3×3 convolution kernel to convolve and compress the global feature map down to 64 channels, obtaining the second intermediate feature map. In step S350, the scale transformation module 200 uses an average pooling layer to process the second intermediate feature map into a single pixel, obtaining the third intermediate feature map. In step S360, the scale transformation module 200 uses a second convolutional layer with a 1×1 convolution kernel to convolve and compress the third intermediate feature map to 32 channels and then applies the ReLU function for non-linear activation, obtaining the fourth intermediate feature map. In step S370, the scale transformation module 200 uses a third convolutional layer with a 1×1 convolution kernel to deconvolve the fourth intermediate feature map back to 64 channels and then applies the sigmoid function for activation, obtaining the mapped feature map. In step S380, the scale transformation module 200 multiplies the second intermediate feature map with the mapped feature map, then uses a first convolutional layer with a 3×3 convolution kernel to deconvolve the result back to 128 channels, finally obtaining the predicted feature map.
Repeating steps S340 to S380 transforms and obtains predicted feature maps at different scales.
In the neural network model composed of the scene understanding module 100 and the scale transformation module 200, training is performed with channel attention applied; during training, the boundary-aware depth loss is specifically used as the loss function for iterative training. The boundary-aware depth loss is defined as:
[formula image PCTCN2022098767-appb-000015]
where ω is the boundary-aware weight, d is the true depth, d̂ is the predicted depth, and α is the perception factor. Optionally, α = 3 is set, and the Sobel operator is used to extract the gradients. If the boundary-aware weight is large, the corresponding pixels contribute a large loss, forcing the neural network model to attend to those regions, which ultimately increases the proportion of the boundary regions in the overall training loss.
The boundary-aware weight is defined as:
[formula image PCTCN2022098767-appb-000017]
where g_x is the x-direction gradient on the real depth map, g_y is the y-direction gradient on the real depth map, ĝ_x is the x-direction gradient on the predicted depth map, ĝ_y is the y-direction gradient on the predicted depth map, and N is the total number of pixels. The boundary-aware weight includes two terms: a real term (formula image PCTCN2022098767-appb-000020) and an error term (formula image PCTCN2022098767-appb-000021).
When the corresponding pixel has a large gradient on the real depth map, the real term becomes larger; if there is a large gradient prediction error, the error term becomes larger, guiding the neural network model to focus on the regions with large gradient errors. When the depths of the background and of an object in the image are simultaneously too large or too small, i.e. when the gradient error is small but the depth error is large, the larger real term guides the model to attend to those regions even when the gradient error is small.
In addition, an embodiment of the second aspect of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
The processor and the memory may be connected by a bus or in other ways.
As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the neural network training method of the above first-aspect embodiment are stored in the memory and, when executed by the processor, perform the neural network training method of the above embodiments, for example, method steps S100 to S300 and method steps S310 to S390 in Fig. 2 described above.
The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, i.e. they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or a controller, for example by a processor in the above device embodiment, cause the processor to execute the neural network training method in the above embodiments, for example, method steps S100 to S300 and method steps S310 to S390 in Fig. 1 described above.
Those of ordinary skill in the art can understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
In the description of this specification, references to the terms "one embodiment," "some embodiments," "exemplary embodiment," "example," "specific example," or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art can understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention, and the scope of the present invention is defined by the claims and their equivalents.

Claims (10)

  1. A neural network training method, characterized by including the following steps:
    obtaining a preset training sample map;
    obtaining a predicted feature map;
    based on a preset boundary-aware depth loss function, training a neural network model with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model.
  2. The neural network training method according to claim 1, characterized in that training the neural network model based on the preset boundary-aware depth loss function, with the preset training sample map as input and the predicted feature map as output, to obtain the trained neural network model includes:
    inputting the preset training sample map into the neural network model for feature extraction to obtain multiple scale feature maps;
    convolving and compressing the scale feature maps to obtain multiple first intermediate feature maps;
    connecting and fusing each of the first intermediate feature maps to obtain a global feature map;
    upsampling the global feature map to a preset scale and applying convolutional compression to obtain a second intermediate feature map;
    pooling and reducing the second intermediate feature map to obtain a third intermediate feature map;
    convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map;
    restoring the fourth intermediate feature map by deconvolution and activating it to obtain a mapped feature map;
    multiplying the mapped feature map with the second intermediate feature map and restoring the product by deconvolution to output the predicted feature map;
    calculating a loss function value from the output value corresponding to the predicted feature map and the preset boundary-aware depth loss function;
    adjusting the weights of the neural network model according to the loss function value and a preset target value, and training the neural network model until the loss function value satisfies a stopping condition.
  3. The neural network training method according to claim 1 or 2, characterized in that the boundary-aware depth loss function is computed as:
    [formula image PCTCN2022098767-appb-100001]
    where ω is the boundary-aware weight, d is the true depth, d̂ is the predicted depth, and α is the perception factor.
  4. The neural network training method according to claim 3, characterized in that the boundary-aware weight is computed as:
    [formula image PCTCN2022098767-appb-100003]
    where g_x is the x-direction gradient on the real depth map, g_y is the y-direction gradient on the real depth map, ĝ_x is the x-direction gradient on the predicted depth map, ĝ_y is the y-direction gradient on the predicted depth map, and N is the total number of pixels; the weight consists of a real term (formula image PCTCN2022098767-appb-100006) and an error term (formula image PCTCN2022098767-appb-100007).
  5. The neural network training method according to claim 2, characterized in that convolving and compressing the scale feature maps to obtain multiple first intermediate feature maps includes:
    using a first convolutional layer with at least one convolution kernel to convolve and compress each of the scale feature maps to obtain multiple low-size feature maps;
    sampling each of the low-size feature maps to the same resolution to obtain the multiple first intermediate feature maps.
  6. The neural network training method according to claim 2, characterized in that connecting and fusing each of the first intermediate feature maps to obtain a global feature map includes:
    using a fusion layer to connect and fuse each of the first intermediate feature maps to obtain the global feature map, wherein the fusion layer includes two convolution kernels of different sizes.
  7. The neural network training method according to claim 2, characterized in that pooling and reducing the second intermediate feature map to obtain a third intermediate feature map includes:
    pooling and reducing the second intermediate feature map with an average pooling layer to obtain the third intermediate feature map.
  8. The neural network training method according to claim 2, characterized in that convolving, compressing, and activating the third intermediate feature map to obtain a fourth intermediate feature map includes:
    using a second convolutional layer to convolve and compress the third intermediate feature map and using the ReLU function for non-linear activation to obtain the fourth intermediate feature map, wherein the convolution kernel size of the second convolutional layer is 1.
  9. An electronic device, characterized by including:
    a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the neural network training method according to any one of claims 1 to 8.
  10. A computer storage medium, characterized in that it stores computer-executable instructions, the computer-executable instructions being used to execute the neural network training method according to any one of claims 1 to 8.
PCT/CN2022/098767 2021-11-05 2022-06-14 神经网络训练的方法、电子设备及计算机存储介质 WO2023077809A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111307529.4 2021-11-05
CN202111307529.4A CN114170438A (zh) 2021-11-05 2021-11-05 神经网络训练的方法、电子设备及计算机存储介质

Publications (1)

Publication Number Publication Date
WO2023077809A1 true WO2023077809A1 (zh) 2023-05-11

Family

ID=80478118

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098767 WO2023077809A1 (zh) 2021-11-05 2022-06-14 神经网络训练的方法、电子设备及计算机存储介质

Country Status (2)

Country Link
CN (1) CN114170438A (zh)
WO (1) WO2023077809A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403717A (zh) * 2023-06-08 2023-07-07 广州视景医疗软件有限公司 基于深度学习的近视预测方法、装置、终端设备和介质
CN116679161A (zh) * 2023-05-25 2023-09-01 国网江苏省电力有限公司南京供电分公司 一种电网线路故障诊断方法、设备和介质
CN117420209A (zh) * 2023-12-18 2024-01-19 中国机械总院集团沈阳铸造研究所有限公司 基于深度学习的全聚焦相控阵超声快速高分辨率成像方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170438A (zh) * 2021-11-05 2022-03-11 五邑大学 神经网络训练的方法、电子设备及计算机存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410261A (zh) * 2018-10-08 2019-03-01 浙江科技学院 基于金字塔池化模块的单目图像深度估计方法
CN111445432A (zh) * 2019-10-14 2020-07-24 浙江科技学院 一种基于信息融合卷积神经网络的图像显著性检测方法
CN111784628A (zh) * 2020-05-11 2020-10-16 北京工业大学 基于有效学习的端到端的结直肠息肉图像分割方法
US20210089807A1 (en) * 2019-09-25 2021-03-25 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation
CN114170438A (zh) * 2021-11-05 2022-03-11 五邑大学 神经网络训练的方法、电子设备及计算机存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410261A (zh) * 2018-10-08 2019-03-01 浙江科技学院 基于金字塔池化模块的单目图像深度估计方法
US20210089807A1 (en) * 2019-09-25 2021-03-25 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation
CN111445432A (zh) * 2019-10-14 2020-07-24 浙江科技学院 一种基于信息融合卷积神经网络的图像显著性检测方法
CN111784628A (zh) * 2020-05-11 2020-10-16 北京工业大学 基于有效学习的端到端的结直肠息肉图像分割方法
CN114170438A (zh) * 2021-11-05 2022-03-11 五邑大学 神经网络训练的方法、电子设备及计算机存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116679161A (zh) * 2023-05-25 2023-09-01 国网江苏省电力有限公司南京供电分公司 一种电网线路故障诊断方法、设备和介质
CN116403717A (zh) * 2023-06-08 2023-07-07 广州视景医疗软件有限公司 基于深度学习的近视预测方法、装置、终端设备和介质
CN116403717B (zh) * 2023-06-08 2023-09-05 广州视景医疗软件有限公司 基于深度学习的近视预测方法、装置、终端设备和介质
CN117420209A (zh) * 2023-12-18 2024-01-19 中国机械总院集团沈阳铸造研究所有限公司 基于深度学习的全聚焦相控阵超声快速高分辨率成像方法
CN117420209B (zh) * 2023-12-18 2024-05-07 中国机械总院集团沈阳铸造研究所有限公司 基于深度学习的全聚焦相控阵超声快速高分辨率成像方法

Also Published As

Publication number Publication date
CN114170438A (zh) 2022-03-11

Similar Documents

Publication Publication Date Title
WO2023077809A1 (zh) 神经网络训练的方法、电子设备及计算机存储介质
CN110232394B (zh) 一种多尺度图像语义分割方法
CN109753971B (zh) 扭曲文字行的矫正方法及装置、字符识别方法及装置
CN110349087B (zh) 基于适应性卷积的rgb-d图像高质量网格生成方法
CN109919110B (zh) 视频关注区域检测方法、装置及设备
CN112862877B (zh) 用于训练图像处理网络和图像处理的方法和装置
WO2023212997A1 (zh) 基于知识蒸馏的神经网络训练方法、设备及存储介质
CN110544214A (zh) 一种图像修复方法、装置及电子设备
CN114936605A (zh) 基于知识蒸馏的神经网络训练方法、设备及存储介质
CN112561792B (zh) 图像风格迁移方法、装置、电子设备及存储介质
WO2023077998A1 (zh) 卷积神经网络中自适应特征融合方法及系统
CN113724136B (zh) 一种视频修复方法、设备及介质
CN112861830B (zh) 特征提取方法、装置、设备、存储介质以及程序产品
CN114511487A (zh) 图像融合方法及装置、计算机可读存储介质、终端
CN114202648B (zh) 文本图像矫正方法、训练方法、装置、电子设备以及介质
CN107590790B (zh) 一种基于对称边缘填充的简单透镜边缘区域去模糊方法
CN111932466B (zh) 一种图像去雾方法、电子设备及存储介质
CN116503686B (zh) 图像矫正模型的训练方法、图像矫正方法、装置及介质
CN116757962A (zh) 一种图像去噪方法、装置
CN111738069A (zh) 人脸检测方法、装置、电子设备及存储介质
Zheng et al. Joint residual pyramid for joint image super-resolution
WO2023206343A1 (zh) 一种基于图像预训练策略的图像超分辨率方法
CN115861401A (zh) 一种双目与点云融合深度恢复方法、装置和介质
CN116363429A (zh) 图像识别模型的训练方法、图像识别方法、装置及设备
CN113344200B (zh) 用于训练可分离卷积网络的方法、路侧设备及云控平台

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22888845

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE