WO2023077998A1 - Adaptive feature fusion method and system in a convolutional neural network - Google Patents

Adaptive feature fusion method and system in a convolutional neural network

Info

Publication number: WO2023077998A1 (PCT/CN2022/121730)
Authority: WO (WIPO / PCT)
Prior art keywords: features, scale, feature, feature fusion, weighted
Application number: PCT/CN2022/121730
Other languages: English (en), French (fr)
Inventor
罗静
刘阳
孔祥斌
李洪研
沈志忠
李洁
王雪嵩
马黎文
陈树骏
Original Assignee: 通号通信信息集团有限公司
Application filed by 通号通信信息集团有限公司
Publication of WO2023077998A1 publication Critical patent/WO2023077998A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to a method and system for adaptive feature fusion in a convolutional neural network, electronic equipment, and a computer-readable storage medium.
  • CNN: Convolutional Neural Networks
  • Embodiments of the present disclosure provide an adaptive feature fusion method and system in a convolutional neural network, an electronic device, and a computer-readable storage medium, which can improve the adaptability and convergence of feature fusion to different training objectives and the overall accuracy of deep learning algorithms, while effectively saving manpower, material resources, and time costs.
  • In a first aspect, an embodiment of the present disclosure provides an adaptive feature fusion method in a convolutional neural network, which includes: obtaining the weight coefficient of the feature of at least one scale at the current feature fusion layer; activating and normalizing the weight coefficients of the features of the at least one scale at the current feature fusion layer; and performing weighted fusion on the features of the at least one scale at the current feature fusion layer and splicing the weighted fusion results to obtain an adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving detection accuracy.
  • the obtaining the weight coefficient of the feature of at least one scale of the current feature fusion layer includes:
  • the features of different scales from different feature extraction layers are fused, and the convolution maps corresponding to the features of all scales are scaled to the same size through downsampling or upsampling operations;
  • Convolution maps of features of different scales from at least one feature extraction layer are respectively sent to a lightweight convolution branch;
  • the values of the results of the different convolution branches at any pixel position are used as the weight coefficients of the features of at least one scale at that pixel position of the convolution map of the current feature fusion layer.
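As an illustration of these three sub-steps only, a minimal PyTorch-style sketch is given below. The module name WeightBranches and the bilinear resizing mode are assumptions made for the example; the disclosure itself only specifies lightweight convolution branches (for example with 1*1 kernels) and scaling of the convolution maps to a common size.

```python
# Illustrative sketch (PyTorch); names and the bilinear resize are assumptions.
import torch.nn as nn
import torch.nn.functional as F

class WeightBranches(nn.Module):
    """One lightweight 1x1 convolution branch per scale; each branch outputs a
    single-channel map of raw weight coefficients for its scale."""
    def __init__(self, in_channels_per_scale):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, 1, kernel_size=1) for c in in_channels_per_scale
        )

    def forward(self, feature_maps, out_size):
        # Scale every convolution map to the fusion layer's size (down/upsampling).
        resized = [F.interpolate(x, size=out_size, mode="bilinear",
                                 align_corners=False) for x in feature_maps]
        # Per-pixel raw weight coefficient for each scale, shape (B, 1, H, W).
        lambdas = [branch(x) for branch, x in zip(self.branches, resized)]
        return resized, lambdas
```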
  • In some implementations, activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer includes: performing nonlinear activation on the weight coefficients of the features of at least one scale at any pixel position on the convolution map of the current feature fusion layer; performing linear normalization on the activated weight coefficients to obtain the normalized weight coefficient of the feature of each scale; and obtaining the normalized weight coefficients at all pixel positions on the convolution map of the current feature fusion layer.
  • In some implementations, the weighted fusion of the features of at least one scale at the current feature fusion layer includes: performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer; and obtaining the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions on the convolution map of the current feature fusion layer.
  • In some implementations, the weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer includes: when the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is greater than or equal to the mean of the normalized weight coefficients of all M features of different scales, the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position is equal to the feature itself; where m and M are integers greater than or equal to 1.
  • In some implementations, the weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer includes: when the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is less than the mean of the normalized weight coefficients of all M features of different scales, determining the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position according to the normalized weight coefficient of the feature of the m-th scale at that pixel position, the M features of different scales, and the weighted mean of all features of the other scales at that pixel position whose normalized weight coefficients are greater than 1/M; where m, M, and l are integers greater than or equal to 1.
  • In some implementations, before the weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer, the method further includes: determining the weighted mean according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer and the number M of features; where the weighted mean is the mean of all features of the other scales at that pixel position whose normalized weight coefficients are greater than 1/M, and m and M are integers greater than or equal to 1.
  • In some implementations, the method further includes: when the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is less than the mean of the normalized weight coefficients of all M features of different scales and M is equal to 2, determining the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position according to the normalized weight coefficient of the feature of the m-th scale at that pixel position and the feature, among the features of the two scales at that pixel position, that is not the m-th feature.
  • In some implementations, splicing the weighted fusion results includes: splicing, in a preset order and along a preset dimension, the weighted fusion results of the features of at least one scale from the M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l, to obtain the adaptive fusion result of the features of the at least one scale.
  • an embodiment of the present disclosure provides an adaptive feature fusion system in a convolutional neural network, which includes: a weight coefficient acquisition module, a weight coefficient activation and normalization module, and a feature weighted fusion splicing module;
  • the weight coefficient obtaining module is used to obtain the weight coefficient of at least one scale feature of the current feature fusion layer
  • the weight coefficient activation and normalization module is configured to activate and normalize the weight coefficients of the features of at least one scale of the current feature fusion layer;
  • the feature weighted fusion splicing module is used to perform weighted fusion on the features of at least one scale at the current feature fusion layer and to splice the weighted fusion results to obtain an adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving detection accuracy.
  • In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any adaptive feature fusion method in a convolutional neural network of the embodiments of the present disclosure.
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, an adaptive feature fusion method in any convolutional neural network of the embodiments of the present disclosure is implemented. .
  • In the embodiments of the present disclosure, by obtaining the weight coefficient of the feature of at least one scale at the current feature fusion layer, activating and normalizing those weight coefficients, performing weighted fusion on the features of at least one scale at the current feature fusion layer, and splicing the weighted fusion results, an adaptive feature fusion result is obtained, adaptive feature fusion in the convolutional neural network is completed, and detection accuracy is improved.
  • Through nonlinear activation and linear normalization, the values of the weight coefficients of the features of at least one scale all lie between 0 and 1 and sum to 1. In particular, the saturation region of the nonlinear activation function prevents the gaps between the larger weight coefficients from being further enlarged too quickly and causing severe oscillations during training, and the subsequent linear normalization reduces the amount of computation, improving the stability and efficiency of the weight coefficient calculation.
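A small numeric check of this point, using made-up raw coefficients, is sketched below: with softmax the larger coefficient dominates almost completely, whereas Sigmoid activation followed by linear normalization keeps the two normalized weights close together.

```python
# Illustrative comparison on made-up raw weight coefficients.
import math

lams = [2.0, 4.0]                                   # example raw coefficients

exp = [math.exp(v) for v in lams]                   # softmax (as in ASFF)
softmax = [v / sum(exp) for v in exp]               # ~[0.119, 0.881]

act = [1.0 / (1.0 + math.exp(-v)) for v in lams]    # Sigmoid: ~[0.881, 0.982]
norm = [v / sum(act) for v in act]                  # linear norm: ~[0.473, 0.527]

print(softmax, norm)
```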
  • The loss for generating the weight coefficients of the features of at least one scale is integrated, through the lightweight convolution branches, with the entire convolutional neural network and participates in end-to-end training; during training there is no need for additional manual operations such as complex sample calibration or parameter adjustment based on intermediate results.
  • FIG. 1 is a schematic flowchart of an adaptive feature fusion method in a convolutional neural network provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of a method for performing weighted fusion of features of at least one scale by a feature fusion layer provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a principle of weighted fusion and concatenation of features of at least one scale at a feature fusion layer provided by an embodiment of the present disclosure.
  • FIG. 4 is a block diagram of an adaptive feature fusion system in a convolutional neural network provided by an embodiment of the present disclosure.
  • Fig. 5 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • Target detection solutions usually adopt two mainstream frameworks: one is the two-stage detection framework (Two-stage Detection Frameworks) represented by R-CNN, Fast-RCNN, Faster-RCNN, and R-FCN; the other is the one-stage detection framework (One-stage Detection Frameworks) represented by the YOLO (You Only Look Once) algorithm (i.e., the neural network only needs to look at the picture once to output the result), the Single Shot Detector (SSD) algorithm, the Retina-Net algorithm, and so on.
  • For the features of various scales from different feature extraction layers, the feature fusion mode based on vector splicing (concat) artificially forces the features of each scale to participate in feature fusion with equal weights. This cannot avoid contradictions between the training objectives of the detectors corresponding to adjacent feature extraction layers, which is not conducive to improving the adaptability and convergence of a convolutional neural network with feature fusion to different training objectives and reduces the accuracy of the convolutional neural network of the deep learning algorithm.
  • In particular, the one-stage detection framework has been more widely used in industry because of its huge speed advantage. To improve detection accuracy, SSD creatively tried to extract features of different scales in parallel from multiple convolutional layers, from low to high, to handle the detection of targets of different sizes, achieving results significantly better than earlier one-stage frameworks, represented by YOLO, that extract features only from a single convolutional layer at the end.
  • However, limited by its network structure and working principle, a one-stage detection framework cannot, like Faster RCNN, use cascaded detection opportunities to cut out regions of interest that may contain targets from the convolution map through a preliminary detection and thereby progressively match the receptive field of the detector to the scale of the target features; as a result, it is weaker than the two-stage framework in adapting to targets of different sizes.
  • Recently, structures based on the Feature Pyramid Network (FPN), represented by Retina-Net, extract feature information of different scales in multiple layers, use downsampling and upsampling (top-down) pathways to adjust the convolution maps corresponding to the feature information of different scales to the same size, and finally fuse deep semantic information with shallow position information. This has gradually become a common configuration of the one-stage detection framework, and the adaptability of the one-stage detection framework to target scale has improved significantly.
  • a typical feature fusion method includes performing feature fusion on the convolution map of features from different convolutional layers at various scales in the network in a pixel-by-pixel mode (element-wise), or The convolution map of the features of each scale from different convolutional layers in the network is subjected to vector splicing (concat) mode for feature fusion, etc.
  • A one-stage detection framework with a feature pyramid connects the detectors for targets of different sizes to the feature extraction convolutional layers of the corresponding feature scales (hereinafter: the feature extraction layers of the corresponding scales), which leads to the following problem: during training, target samples are forcibly assigned, according to the size of the target calibration box, to the detector behind the feature extraction layer of the corresponding scale. Even though the feature extraction layers of adjacent scales can also extract part of the target features at the target position of the original image, this type of algorithm forces the detectors behind these adjacent-scale feature extraction layers to judge the features of the target and its category as non-existent. This ultimately makes the detector's judgment of the target category inaccurate and biases the regression of the target position; furthermore, it makes training of the algorithm difficult to converge.
  • To solve this problem, researchers have proposed, for the element-wise mode in which the pixels of the convolution maps corresponding to features of different scales are added position by position during feature fusion, a data-driven feature fusion strategy that can flexibly change the weights of features of different scales, namely the Adaptively Spatial Feature Fusion (ASFF) target detection strategy.
  • The ASFF strategy uses a series of learnable parameters to adaptively learn and adjust, at each position on the convolution map, the weights of the features of different scales from each feature extraction layer in the forward feature fusion of the convolutional neural network, which effectively alleviates the contradiction between the training objectives of the detectors corresponding to adjacent feature extraction layers in the backward error propagation stage. Classical algorithms such as YOLOv3 using this method achieve a better speed-accuracy trade-off than the standard version on the MS COCO dataset.
  • In the Adaptively Spatial Feature Fusion (ASFF) method, for each pixel (i, j) on the convolution map of feature fusion layer l, the weights of the features of different scales from each feature extraction layer (the method uses features of 3 scales in total) at that pixel position are learned adaptively, where i and j are integers greater than or equal to 1. Let $x^{n\to l}_{ij}$ denote the value, at pixel (i, j) of the convolution map of layer l, of the feature vector resized from layer n to layer l; then the fused value at pixel (i, j) of the convolution map of feature fusion layer l is

$y^{l}_{ij} = \alpha^{l}_{ij}\, x^{1\to l}_{ij} + \beta^{l}_{ij}\, x^{2\to l}_{ij} + \gamma^{l}_{ij}\, x^{3\to l}_{ij}$

  • Here $\alpha^{l}_{ij}$, $\beta^{l}_{ij}$, and $\gamma^{l}_{ij}$ are the normalized weight coefficients of the features of the 3 scales at pixel (i, j) of the convolution map of feature fusion layer l. The method adds an additional convolution branch after each of the 3 feature extraction layers to obtain 3 convolution maps, and takes the values $\lambda^{l}_{\alpha,ij}$, $\lambda^{l}_{\beta,ij}$, $\lambda^{l}_{\gamma,ij}$ at pixel (i, j) of these maps as the weight coefficients of the features of the 3 scales. The softmax formula is used to normalize each weight coefficient into the [0, 1] interval with their sum equal to 1, for example

$\alpha^{l}_{ij} = \dfrac{e^{\lambda^{l}_{\alpha,ij}}}{e^{\lambda^{l}_{\alpha,ij}} + e^{\lambda^{l}_{\beta,ij}} + e^{\lambda^{l}_{\gamma,ij}}}$

  • Here $e^{x}$ is the exponential function with the constant e as its base, and x can be any of the above weight coefficients. Through these operations, features from different scales can be adaptively fused at any feature fusion layer l, and the fused result $y^{l}$ can be used as the initial input of subsequent detectors to further improve detection accuracy.
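For background only, a minimal sketch of this ASFF-style element-wise fusion for three scales is shown below; the tensor names and shapes are assumptions made for the example.

```python
# Illustrative ASFF-style element-wise fusion for 3 scales (PyTorch).
import torch

def asff_fuse(x1, x2, x3, lam1, lam2, lam3):
    """x1..x3: resized feature maps (B, C, H, W); lam1..lam3: raw weights (B, 1, H, W)."""
    # Per-pixel softmax over the three scale weights (the e^x formula above).
    weights = torch.softmax(torch.cat([lam1, lam2, lam3], dim=1), dim=1)
    a, b, g = weights[:, 0:1], weights[:, 1:2], weights[:, 2:3]
    # Weighted element-wise sum of the three scales at every pixel (i, j).
    return a * x1 + b * x2 + g * x3
```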
  • the self-adaptive feature fusion method and system in the convolutional neural network is used in the convolutional neural network for deep learning to perform feature fusion relying on adaptive weights.
  • The main steps include: obtaining the weight coefficients of the features of each scale at the current feature fusion layer; activating and normalizing the weight coefficients of the features of each scale at the current feature fusion layer; performing weighted fusion of the features of each scale at the current feature fusion layer; splicing the results of the weighted fusion of the features of each scale; and obtaining the adaptive feature fusion results of all feature fusion layers in the convolutional neural network.
  • This application uses the results of lightweight convolution branches, combined with operations such as activation and normalization, to generate normalized weight coefficients for the features of different scales from each feature extraction layer, and then uses these normalized weight coefficients to adaptively weight, fuse, and splice the features of each scale. This solves the problem of adaptive feature fusion in the vector splicing mode, improves the adaptability and convergence of convolutional neural networks to different training objectives and the overall accuracy of deep learning algorithms, and at the same time effectively saves manpower, material resources, and time costs.
  • the adaptive feature fusion method in the convolutional neural network in this application can be widely used in the fields of artificial intelligence technologies such as target detection, tracking, and semantic segmentation.
  • an embodiment of the present disclosure provides an adaptive feature fusion method in a convolutional neural network.
  • the adaptive feature fusion method in the convolutional neural network of the embodiment of the present disclosure can be executed by a corresponding adaptive feature fusion device in the convolutional neural network, which can be implemented in software and/or hardware, and can generally be integrated in electronic equipment .
  • In one embodiment of the present application, a method for adaptive feature fusion in a convolutional neural network is provided. The method is described as applied to a terminal for illustration; it can be understood that the method can also be applied to a server, or to a system including a terminal and a server, realized through interaction between the terminal and the server.
  • This embodiment uses lightweight convolution branches to generate, at the feature fusion layer of the convolutional neural network, weight coefficients for the features of different scales from each feature extraction layer, and ensures through nonlinear activation and normalization operations that the value of each weight coefficient lies between 0 and 1 and that their sum equals 1. These weight coefficients are then used to weight and fuse the feature of each scale with the features of the other scales, and finally the weighted fusion results of the features of each scale are spliced to obtain the adaptive feature fusion result.
  • the above method solves the problem of adaptive feature fusion in the feature fusion operation based on the vector splicing mode in the convolutional neural network, improves the accuracy of feature fusion without significantly increasing the computational complexity, and thus improves the performance of deep learning algorithms. overall performance.
  • Fig. 1 shows a schematic flowchart of an adaptive feature fusion method in a convolutional neural network in an embodiment of the present application. As shown in Figure 1, in this embodiment, the method includes the following steps:
  • Step S110 acquiring weight coefficients of features of at least one scale in the current feature fusion layer.
  • the way to obtain the weight coefficient includes: at the current feature fusion layer, the features of different scales from different feature extraction layers are fused, and the convolution maps corresponding to the features of all scales are scaled by downsampling or upsampling operations to the same size; the convolutional images of features of different scales from at least one feature extraction layer are sent to a lightweight convolution branch respectively; the values of the results of different convolution branches at any pixel position are used as the current feature fusion Weight coefficients for features of at least one scale at pixel positions in the layer's convolution map.
  • The weight coefficients calculated from the results of the above lightweight convolution branches, after the above activation and normalization operations, influence the weighted fusion result of the feature fusion layer on the features from at least one feature extraction layer, and in this way participate in the backward error propagation during training of the deep learning base network.
  • the entire training process is end-to-end mode, without additional manual intervention (such as additional labeling samples or specifying hyperparameters, etc.).
  • Step S120 activating and normalizing the weight coefficients of at least one scale feature of the current feature fusion layer.
  • step S120 can be implemented in the following manner: perform nonlinear activation on the weight coefficients of at least one scale feature at any pixel position on the convolution map of the current feature fusion layer; perform linear activation on the weight coefficients of the features after nonlinear activation. Normalization, to obtain the normalized weight coefficients of the features of each scale; to obtain the normalized weight coefficients at all pixel positions on the convolutional map of the current feature fusion layer.
  • To ensure that the weight coefficients are all greater than 0, the nonlinear activation function Sigmoid is used to activate the weight coefficients of the features of each scale, so that the activated weight coefficients change in a relatively rapid, approximately linear way near the center of the value range and show a slower, nonlinear saturation trend in the regions away from the center.
  • Since the saturation region of the nonlinear activation function already prevents the gaps between the larger weight coefficients from being further enlarged too quickly and causing severe oscillations during training, and since the nonlinearly activated weight coefficients of the features of at least one scale are all greater than zero, this step directly uses linear normalization to ensure that the weight coefficients of the features of different scales at pixel (i, j) of the convolution map sum to 1.
  • The reasons this application does not use a nonlinear normalization function such as the SoftMax of ASFF and similar algorithms include reducing the amount of computation and avoiding a negative impact on the effect of the saturation region of the nonlinear activation function.
  • Taking the activation weight coefficient $\lambda'_{l,m,ij}$ of the feature of the m-th scale at pixel (i, j) of the convolution map of feature fusion layer l as an example, its normalized weight coefficient $\alpha_{l,m,ij}$ is obtained with the following formula:

$\alpha_{l,m,ij} = \dfrac{\lambda'_{l,m,ij}}{\sum_{n=1}^{M} \lambda'_{l,n,ij}}$

  • Here n is the index of the scale of a feature, $n \in [1, \ldots, M]$. Since the activation weight coefficients of the features of at least one scale are all greater than zero, the denominator of the above formula is never equal to 0.
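A minimal sketch of step S120 under the same assumptions as the earlier snippet (M single-channel weight maps stacked along the channel dimension) might look as follows.

```python
# Illustrative sketch of step S120 (PyTorch): Sigmoid activation followed by
# linear normalization, so the M per-pixel coefficients are in (0, 1) and sum to 1.
import torch

def activate_and_normalize(lambdas):
    """lambdas: list of M raw weight maps, each of shape (B, 1, H, W)."""
    lam = torch.cat(lambdas, dim=1)                          # (B, M, H, W)
    activated = torch.sigmoid(lam)                           # all values > 0
    alpha = activated / activated.sum(dim=1, keepdim=True)   # linear normalization
    return alpha                                             # normalized weights
```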
  • Step S130 performing weighted fusion on features of at least one scale in the current feature fusion layer.
  • step S140 the weighted fusion results are spliced to obtain an adaptive feature fusion result.
  • the result after weighted fusion is the result after weighted fusion of features of at least one scale in the current feature fusion layer.
  • the adaptive feature fusion result may include adaptive feature fusion results of multiple feature fusion layers.
  • all feature fusion layers may be judged separately to determine whether the adaptive feature fusion results of all feature fusion layers have been obtained.
  • When it is determined that the adaptive feature fusion results of all feature fusion layers have been obtained, the process ends and adaptive feature fusion in the convolutional neural network is determined to be complete, thereby improving detection accuracy; otherwise, the method returns to step S110 and continues to obtain the weight coefficients of the features of at least one scale at the current feature fusion layer.
  • Steps S110 to S140 are repeated for each feature fusion layer that requires feature fusion in the convolutional neural network until the adaptive feature fusion results at all feature fusion layers are obtained.
  • In this embodiment, by obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer, activating and normalizing those weight coefficients, performing weighted fusion on the features of at least one scale at the current feature fusion layer, and splicing the weighted fusion results, an adaptive feature fusion result is obtained, adaptive feature fusion in the convolutional neural network is completed, and detection accuracy is improved. Relying on lightweight convolution branches and a simple calculation process, adaptive feature fusion is realized in the feature fusion mode based on vector splicing, which improves the adaptability and convergence of the convolutional neural network to different training objectives and thereby improves the overall accuracy of the deep learning algorithm.
  • the weighted fusion of features of at least one scale at the current feature fusion layer in step S130 may be implemented in the following manner.
  • FIG. 2 is a schematic flowchart of a method for performing weighted fusion of features of at least one scale by a feature fusion layer provided by an embodiment of the present disclosure. As shown in Figure 2, the method includes but is not limited to the following steps:
  • Step S131 performing weighted fusion of features of each scale and features of other scales at any pixel position on the convolution map of the current feature fusion layer.
  • Step S132 obtaining the weighted fusion results of the features of each scale and the features of other scales at all pixel positions on the convolution map of the current feature fusion layer.
  • The operation in step S131 is repeated at each pixel position of the convolution map of feature fusion layer l until the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions are obtained.
  • In step S131, performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer includes: if the normalized weight coefficient $\alpha_{l,m,ij}$ of the feature of the m-th scale at pixel (i, j) of the convolution map of feature fusion layer l is greater than or equal to the mean 1/M of the normalized weight coefficients of all M features of different scales, then the result $X_{l,m,ij}$ of the weighted fusion of the feature of this scale with the features of the other scales at position (i, j) is equal to the feature itself:

$X_{l,m,ij} = x'_{l,m,ij}, \quad \text{if } \alpha_{l,m,ij} \ge 1/M$

  • In step S131, performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer further includes: if the normalized weight coefficient $\alpha_{l,m,ij}$ of the feature of the m-th scale at pixel (i, j) of the convolution map of feature fusion layer l is less than the mean 1/M of the normalized weight coefficients of all M features of different scales, the result of the weighted fusion of the feature $x'_{l,m,ij}$ of this scale with the features of the other scales at position (i, j) is determined from the normalized weight coefficient of the m-th scale at that position, the M features of different scales, and the weighted mean $\bar{x}_{l,m,ij}$ of all features of the other scales at that position whose normalized weight coefficients are greater than 1/M, where m, M, and l are integers greater than or equal to 1. It can be calculated with the following formula:

$X_{l,m,ij} = (M\,\alpha_{l,m,ij})\, x'_{l,m,ij} + (1 - M\,\alpha_{l,m,ij})\, \bar{x}_{l,m,ij}, \quad \text{if } \alpha_{l,m,ij} < 1/M$

  • Before performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer in step S131, the method further includes: determining the weighted mean according to the normalized weight coefficients at the pixel position of the convolution map of feature fusion layer l and the number M of features. The weighted mean $\bar{x}_{l,m,ij}$ is the mean of all features of the other scales at that pixel position whose normalized weight coefficients are greater than 1/M, and m and M are integers greater than or equal to 1. The weighted mean can be calculated by the following formula (7):

$\bar{x}_{l,m,ij} = \dfrac{\sum_{n=1,\, n \neq m}^{M} \mathrm{Max}[\alpha_{l,n,ij} - 1/M,\, 0]\; x'_{l,n,ij}}{\sum_{n=1,\, n \neq m}^{M} \mathrm{Max}[\alpha_{l,n,ij} - 1/M,\, 0]}$

  • Max[*, *] means taking the larger of the two values in the brackets, and n is the index of the scale of a feature, for example the feature of the n-th scale.
  • Because the precondition for applying this formula is that the nonlinearly activated and normalized weight coefficient $\alpha_{l,m,ij}$ of the feature of the m-th scale is less than 1/M, at least one of the features of all scales has a normalized weight coefficient greater than 1/M, so the denominator of the above formula is never equal to 0.
  • In some implementations, the adaptive feature fusion method in the convolutional neural network also includes: when it is determined that the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of feature fusion layer l is less than the mean of the normalized weight coefficients of all M features of different scales and M is equal to 2, the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position is determined from the normalized weight coefficient of the m-th scale at that position and the feature, among the features of the two scales at that position, that is not the m-th feature; in this case there is no need to use formula (7) to further calculate the weighted mean of all features whose normalized weight coefficients are greater than 1/M.
  • step S131 is repeatedly performed for the features of each scale until the weighted fusion result of the features of each scale and the features of other scales is obtained.
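A vectorized sketch of this per-pixel fusion rule is given below. It assumes the Max-weighted form of the mean in formula (7) above and illustrative tensor shapes; it is not the only possible implementation.

```python
# Illustrative sketch of the fusion rule of step S131 (PyTorch), over all pixels.
# feats: (B, M, C, H, W) resized per-scale features; alpha: (B, M, H, W) weights.
import torch

def adaptive_weighted_fusion(feats, alpha):
    B, M, C, H, W = feats.shape
    a = alpha.unsqueeze(2)                                   # (B, M, 1, H, W)
    # Max[alpha - 1/M, 0] weighting of the *other* scales (leave-one-out).
    excess = (alpha - 1.0 / M).clamp(min=0).unsqueeze(2)     # (B, M, 1, H, W)
    tot_w = excess.sum(dim=1, keepdim=True)                  # (B, 1, 1, H, W)
    tot_wx = (excess * feats).sum(dim=1, keepdim=True)       # (B, 1, C, H, W)
    other_w = (tot_w - excess).clamp(min=1e-12)
    other_mean = (tot_wx - excess * feats) / other_w         # weighted mean, (7)
    # Keep the feature itself when alpha >= 1/M (5); otherwise blend it with
    # the weighted mean of the dominant other scales (6).
    blended = (M * a) * feats + (1 - M * a) * other_mean
    return torch.where(a >= 1.0 / M, feats, blended)         # (B, M, C, H, W)
```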
  • In some implementations, the splicing of the weighted fusion results in step S140 can be implemented as follows: the weighted fusion results of the features of at least one scale from the M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l are spliced along a preset dimension in a preset order, to obtain the adaptive fusion result of the features of the at least one scale. For example, the results are spliced in the order 1, ..., M along the preset dimension to obtain the adaptive fusion result $Y_l$:

$Y_l = (X_{l,1}, X_{l,2}, \ldots, X_{l,M})$

  • Here $X_{l,1}, X_{l,2}, \ldots, X_{l,M}$ are, at feature fusion layer l, the vector matrices composed of the values of the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions of its own convolution map. Splicing these vector matrices in concat mode along the preset dimension forms a new vector matrix $Y_l$, which is the adaptive feature fusion result of the features of at least one scale at feature fusion layer l.
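A sketch of this splicing step, under the common assumption that the preset dimension is the channel dimension, is shown below; the claims themselves only require a preset order and a preset dimension.

```python
# Illustrative sketch of step S140 (PyTorch): splice the M per-scale fused
# results X_{l,1..M} in the order 1..M along the (assumed) channel dimension.
import torch

def splice_fused(fused):
    """fused: (B, M, C, H, W) weighted fusion results from step S131."""
    B, M, C, H, W = fused.shape
    return torch.cat([fused[:, m] for m in range(M)], dim=1)   # (B, M*C, H, W)
```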
  • Steps S110 to S140 are repeated for each feature fusion layer that requires feature fusion in the convolutional neural network until the adaptive feature fusion results at all feature fusion layers are obtained.
  • FIG. 3 is a schematic diagram of a principle of performing weighted fusion and splicing of features of at least one scale at the feature fusion layer provided by an embodiment of the present disclosure. As shown in Figure 3, there are three scales: scale 1, scale 2, and scale 3.
  • In the base network, multiple sampled images are obtained by downsampling the input image; feature extraction is then performed on each sampled image (for example, feature extraction 1, feature extraction 2, and feature extraction 3 shown in Figure 3) to obtain convolution maps, and the convolution maps of the features of different scales from at least one feature extraction layer are sent to different feature fusion layers.
  • At feature fusion layer 3, the features of different scales (scale 1, scale 2, and scale 3) from different feature extraction layers are fused, and the fused feature map is input into detector 3 for detection; likewise, at feature fusion layer 2 the fused feature map is input into detector 2 for detection, and at feature fusion layer 1 the fused feature map is input into detector 1 for detection, so as to obtain the adaptive feature fusion results in the convolutional neural network.
  • In each feature fusion layer, the vector matrices of the respective fusion results of the features of scales 1, 2, and 3 are spliced, so as to obtain the spliced adaptive feature fusion result.
  • For example, during the splicing at feature fusion layer 1, at position (i, j) of the fusion result vector of scale 1, it is judged whether the normalized weight coefficient $\alpha_{1,1,ij}$ of the feature of scale 1 is greater than or equal to 1/3. If $\alpha_{1,1,ij}$ is greater than or equal to 1/3, the feature value of scale 1 is used directly; if $\alpha_{1,1,ij}$ is less than 1/3, the value is calculated as $(3\,\alpha_{1,1,ij})$ × (feature value of scale 1) + $(1 - 3\,\alpha_{1,1,ij})$ × (weighted mean of the features whose normalized weight coefficients are greater than 1/3).
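A purely numeric illustration of this rule, with made-up values: suppose the normalized weights of scales 1-3 at one pixel are 0.2, 0.5, and 0.3, and the resized feature values are 4.0, 8.0, and 6.0. Since only scale 2 exceeds 1/3, the weighted mean of the dominant other scales equals 8.0 regardless of the exact weighting, and the scale-1 result becomes 0.6*4.0 + 0.4*8.0 = 5.6, while the scale-2 result stays 8.0.

```python
# Worked numeric example of the scale-1 rule at one pixel (made-up values).
M = 3
alpha = [0.2, 0.5, 0.3]   # normalized weights of scales 1..3 at pixel (i, j)
x = [4.0, 8.0, 6.0]       # resized feature values of scales 1..3 at (i, j)

# Scale 1: alpha[0] = 0.2 < 1/3, so blend with the weighted mean of the other
# scales whose coefficient exceeds 1/3 (only scale 2 here, so the mean is 8.0).
w = [max(a - 1.0 / M, 0.0) for a in alpha[1:]]
mean_other = sum(wi * xi for wi, xi in zip(w, x[1:])) / sum(w)      # 8.0
fused_scale1 = (M * alpha[0]) * x[0] + (1 - M * alpha[0]) * mean_other
print(fused_scale1)       # 0.6 * 4.0 + 0.4 * 8.0 = 5.6

# Scale 2: alpha[1] = 0.5 >= 1/3, so its fused value is simply x[1] = 8.0.
```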
  • FIG. 4 is a block diagram of an adaptive feature fusion system in a convolutional neural network provided by an embodiment of the present disclosure.
  • the adaptive feature fusion system in the convolutional neural network includes but is not limited to the following modules: weight coefficient acquisition module 401, weight coefficient activation and normalization module 402, and feature weighted fusion splicing module 403;
  • the weight coefficient obtaining module 401 is configured to obtain the weight coefficient of at least one scale feature of the current feature fusion layer.
  • the weight coefficient activation and normalization module 402 is configured to activate and normalize the weight coefficients of at least one scale feature of the current feature fusion layer.
  • The feature weighted fusion splicing module 403 is configured to perform weighted fusion on the features of at least one scale at the current feature fusion layer and to splice the weighted fusion results to obtain the adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving detection accuracy.
  • the system provided in this embodiment is used to execute the above-mentioned method embodiments. Please refer to the above-mentioned embodiments for specific procedures and details, and details will not be repeated here.
  • FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • As shown in Fig. 5, the electronic device includes: at least one processor 501; at least one memory 502; and one or more I/O interfaces 503 connected between the processor 501 and the memory 502. The memory 502 stores one or more computer programs that can be executed by the at least one processor 501, and the one or more computer programs are executed by the at least one processor 501 so that the at least one processor 501 can perform the above adaptive feature fusion method in a convolutional neural network.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned adaptive feature fusion method in a convolutional neural network when executed by a processor/processing core .
  • Computer readable storage media may be volatile or nonvolatile computer readable storage media.
  • An embodiment of the present disclosure also provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above adaptive feature fusion method in a convolutional neural network.
  • the functional modules/units in the system, and the device can be implemented as software, firmware, hardware, and an appropriate combination thereof.
  • The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit .
  • Such software may be distributed on computer readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable program instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable Compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disk storage, magnetic cartridge, magnetic tape, magnetic disk storage or other magnetic storage device, or any other device that can be used to store desired information and can be accessed by a computer any other medium.
  • communication media typically embodies computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the latter case, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • An electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may also execute the computer-readable program instructions.
  • the computer program products described here can be specifically realized by means of hardware, software or a combination thereof.
  • In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) or the like.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause computers, programmable data processing apparatuses, and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture comprising instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified function or action , or may be implemented by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an adaptive feature fusion method and system in a convolutional neural network. The method includes: obtaining the weight coefficients of the features of each scale at the current feature fusion layer; activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer; performing weighted fusion on the features of at least one scale at the current feature fusion layer, and splicing the weighted fusion results to obtain an adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network.

Description

Adaptive feature fusion method and system in a convolutional neural network
Technical Field
Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to an adaptive feature fusion method and system in a convolutional neural network, an electronic device, and a computer-readable storage medium.
Background
In recent years, with the application of deep learning based on convolutional neural networks (Convolutional Neural Networks, CNN), research in computer vision directions such as image classification, target detection, and semantic segmentation has made remarkable progress. Compared with algorithms based on handcrafted features, a CNN can learn features with specific expressive power; therefore, CNNs are widely used in target detection pipelines to extract target features.
Summary
Embodiments of the present disclosure provide an adaptive feature fusion method and system in a convolutional neural network, an electronic device, and a computer-readable storage medium, which can improve the adaptability and convergence of feature fusion to different training objectives and the overall accuracy of deep learning algorithms, while effectively saving manpower, material resources, and time costs.
In a first aspect, an embodiment of the present disclosure provides an adaptive feature fusion method in a convolutional neural network, including: obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer; activating and normalizing the weight coefficients of the features of the at least one scale at the current feature fusion layer; and performing weighted fusion on the features of the at least one scale at the current feature fusion layer and splicing the weighted fusion results to obtain an adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving detection accuracy.
In some implementations, obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer includes:
at the current feature fusion layer, for the features of different scales from different feature extraction layers that are to be fused, scaling the convolution maps corresponding to the features of all scales to the same size through downsampling or upsampling operations;
sending the convolution maps of the features of different scales from at least one feature extraction layer to separate lightweight convolution branches; and
using the values of the results of the different convolution branches at any pixel position as the weight coefficients of the features of at least one scale at that pixel position of the convolution map of the current feature fusion layer.
In some implementations, activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer includes:
performing nonlinear activation on the weight coefficients of the features of at least one scale at any pixel position on the convolution map of the current feature fusion layer;
performing linear normalization on the activated weight coefficients to obtain the normalized weight coefficient of the feature of each scale; and
obtaining the normalized weight coefficients at all pixel positions on the convolution map of the current feature fusion layer.
In some implementations, performing weighted fusion on the features of at least one scale at the current feature fusion layer includes:
at any pixel position on the convolution map of the current feature fusion layer, performing weighted fusion of the feature of each scale with the features of the other scales; and
obtaining the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions on the convolution map of the current feature fusion layer.
In some implementations, performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer includes:
when it is determined that the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is greater than or equal to the mean of the normalized weight coefficients of all M features of different scales, the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position is equal to the feature itself; where m and M are integers greater than or equal to 1.
In some implementations, performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer includes:
when it is determined that the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is less than the mean of the normalized weight coefficients of all M features of different scales, determining the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position according to the normalized weight coefficient of the feature of the m-th scale at that pixel position, the M features of different scales, and the weighted mean of all features of the other scales at that pixel position whose normalized weight coefficients are greater than 1/M; where m, M, and l are integers greater than or equal to 1.
In some implementations, before performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position on the convolution map of the current feature fusion layer, the method further includes:
determining the weighted mean according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer and the number M of features;
where the weighted mean is the mean of all features of the other scales at that pixel position of the convolution map of feature fusion layer l whose normalized weight coefficients are greater than 1/M, and m and M are integers greater than or equal to 1.
In some implementations, the method further includes:
when it is determined that the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer is less than the mean of the normalized weight coefficients of all M features of different scales and M is equal to 2, determining the result of the weighted fusion of the feature of this scale with the features of the other scales at that pixel position according to the normalized weight coefficient of the feature of the m-th scale at that pixel position of feature fusion layer l and the feature, among the features of the two scales at that pixel position, that is not the m-th feature.
In some implementations, splicing the weighted fusion results includes:
splicing, in a preset order and along a preset dimension, the weighted fusion results of the features of at least one scale from the M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l, to obtain the adaptive fusion result of the features of the at least one scale.
In a second aspect, an embodiment of the present disclosure provides an adaptive feature fusion system in a convolutional neural network, including: a weight coefficient acquisition module, a weight coefficient activation and normalization module, and a feature weighted fusion splicing module.
The weight coefficient acquisition module is configured to obtain the weight coefficients of the features of at least one scale at the current feature fusion layer.
The weight coefficient activation and normalization module is configured to activate and normalize the weight coefficients of the features of the at least one scale at the current feature fusion layer.
The feature weighted fusion splicing module is configured to perform weighted fusion on the features of the at least one scale at the current feature fusion layer and to splice the weighted fusion results to obtain an adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving detection accuracy.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any adaptive feature fusion method in a convolutional neural network of the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements any adaptive feature fusion method in a convolutional neural network of the embodiments of the present disclosure.
In the embodiments of the present disclosure, by obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer, activating and normalizing those weight coefficients, performing weighted fusion on the features of the at least one scale at the current feature fusion layer, and splicing the weighted fusion results, an adaptive feature fusion result is obtained, adaptive feature fusion in the convolutional neural network is completed, and detection accuracy is improved. Relying on lightweight convolution branches and a simple calculation process, adaptive feature fusion is realized in the feature fusion mode based on vector splicing (concat), which improves the adaptability and convergence of the convolutional neural network to different training objectives and thereby improves the overall accuracy of the deep learning algorithm.
Further, through the nonlinear activation and linear normalization operations, the values of the weight coefficients of the features of at least one scale are kept between 0 and 1 and their sum equals 1. In particular, the saturation region of the nonlinear activation function prevents the gaps between the larger weight coefficients from being further enlarged too quickly and causing severe oscillations during training, and the subsequent linear normalization reduces the amount of computation, improving the stability and efficiency of the weight coefficient calculation. Through the lightweight convolution branches, the loss for generating the weight coefficients of the features of at least one scale is integrated with the entire convolutional neural network and participates in end-to-end training; during training, no additional manual operations such as complex sample calibration or parameter adjustment based on intermediate results are needed. Moreover, the method can be conveniently embedded in the convolutional neural networks of algorithms with feature fusion structures, such as target detection, tracking, and semantic segmentation; its improvement of the accuracy of the related algorithms does not come at the cost of a large sacrifice in running speed, because the lightweight convolution branch structure and the simple, efficient weighted feature fusion calculation keep the running speed of an algorithm incorporating the present application close to that of the corresponding original algorithm.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present disclosure. Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an adaptive feature fusion method in a convolutional neural network provided by an embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of a method for performing weighted fusion of features of at least one scale at a feature fusion layer provided by an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the principle of performing weighted fusion and splicing of features of at least one scale at a feature fusion layer provided by an embodiment of the present disclosure.
Fig. 4 is a block diagram of an adaptive feature fusion system in a convolutional neural network provided by an embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the described embodiments of the present application fall within the protection scope of the present application.
It should be noted that the terms used herein are only intended to describe specific embodiments and are not intended to limit the exemplary embodiments according to the present application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Target detection solutions usually adopt two mainstream frameworks: one is the two-stage detection framework (Two-stage Detection Frameworks) represented by R-CNN, Fast-RCNN, Faster-RCNN, and R-FCN; the other is the one-stage detection framework (One-stage Detection Frameworks) represented by the YOLO (You Only Look Once) algorithm (i.e., the neural network only needs to look at the picture once to output the result), the Single Shot Detector (SSD) algorithm, the Retina-Net algorithm, and so on.
In the convolutional neural networks of deep learning algorithms aimed at target detection, tracking, semantic segmentation, and so on, for the features of various scales from different feature extraction layers, the feature fusion mode based on vector splicing (concat) artificially forces the features of each scale to participate in feature fusion with equal weights, which cannot avoid contradictions between the training objectives of the detectors corresponding to adjacent feature extraction layers. This is not conducive to improving the adaptability and convergence of a convolutional neural network containing feature fusion to different training objectives and reduces the accuracy of the convolutional neural network of the deep learning algorithm.
The one-stage detection framework in particular has been more widely used in industry because of its huge speed advantage. Within the one-stage detection framework, to improve detection accuracy, SSD creatively tried to extract features of different scales in parallel from multiple convolutional layers, from low to high, to handle the detection of targets of different sizes, achieving results significantly better than the earlier one-stage frameworks, represented by YOLO, that extract features only from a single convolutional layer at the end. However, limited by its own network structure and working principle, the one-stage detection framework cannot, like Faster RCNN, use cascaded detection opportunities to cut out regions of interest that may contain targets from the convolution map through a preliminary detection, normalize the convolution map of each region to a specified size as the initial input of subsequent detection, and thereby progressively and precisely match the receptive field of the detector to the scale of the target features; as a result, it is weaker than the two-stage detection framework in adapting to targets of different sizes. Recently, structures based on the Feature Pyramid Network (FPN), represented by Retina-Net, extract feature information of different scales in multiple layers, use downsampling and upsampling (top-down) pathways to adjust the convolution maps corresponding to the feature information of different scales to the same size, and finally fuse the deep semantic information with the shallow position information. This has gradually become a common configuration of the one-stage detection framework, and the adaptability of the one-stage detection framework to target scale has improved significantly. In the feature pyramid structure, typical feature fusion methods include fusing the convolution maps of the features of various scales from different convolutional layers in the network in a pixel-wise addition (element-wise) mode, or fusing them in a vector splicing (concat) mode.
A one-stage detection framework with a feature pyramid connects the detectors for targets of different sizes to the feature extraction convolutional layers of the corresponding feature scales (hereinafter: the feature extraction layers of the corresponding scales), which leads to the following problem: during training, target samples are forcibly assigned, according to the size of the target calibration box, to the detector behind the feature extraction layer of the corresponding scale. Although the feature extraction layers of adjacent scales can also extract part of the target features at the target position of the original image, this type of algorithm forces the detectors behind these adjacent-scale feature extraction layers to judge the features of the target and its category as non-existent, which ultimately makes the detector's judgment of the target category inaccurate and biases the regression of the target position; furthermore, it makes training of the algorithm difficult to converge. To solve this problem, researchers have proposed, for the element-wise mode in which the pixels of the convolution maps corresponding to features of different scales are added position by position during feature fusion, a data-driven feature fusion strategy that can flexibly change the weights of features of different scales, namely the Adaptively Spatial Feature Fusion (ASFF) target detection strategy. The ASFF strategy uses a series of learnable parameters to adaptively learn and adjust, at each position on the convolution map, the weights of the features of different scales from each feature extraction layer in the forward feature fusion of the convolutional neural network, which effectively alleviates the contradiction between the training objectives of the detectors corresponding to adjacent feature extraction layers in the backward error propagation stage. Classical algorithms such as YOLOv3 using this method achieve a better speed-accuracy trade-off than the standard version on the MS COCO dataset.
In the Adaptively Spatial Feature Fusion (ASFF) method, for each pixel (i, j) on the convolution map of feature fusion layer l, the weights of the features of different scales from each feature extraction layer (the method uses features of 3 scales in total) at that pixel position are learned adaptively. Let $x^{n\to l}_{ij}$ denote the value, at pixel (i, j) of the convolution map of layer l, of the feature vector from layer n to layer l in the deep learning network; then the fused value $y^{l}_{ij}$ at pixel (i, j) of the convolution map of feature fusion layer l can be calculated by the following formula:

$y^{l}_{ij} = \alpha^{l}_{ij}\, x^{1\to l}_{ij} + \beta^{l}_{ij}\, x^{2\to l}_{ij} + \gamma^{l}_{ij}\, x^{3\to l}_{ij}$      (1)

where $\alpha^{l}_{ij}$, $\beta^{l}_{ij}$, and $\gamma^{l}_{ij}$ are the normalized weight coefficients used when the features of different scales from the 3 feature extraction layers are adaptively fused at pixel (i, j) of the convolution map of feature fusion layer l; i and j are integers greater than or equal to 1.
This method adds an additional convolution branch after each of the 3 feature extraction layers to obtain 3 convolution maps, and takes the values $\lambda^{l}_{\alpha,ij}$, $\lambda^{l}_{\beta,ij}$, and $\lambda^{l}_{\gamma,ij}$ at pixel (i, j) of these 3 convolution maps as the weight coefficients of the features of the 3 scales. The softmax formula is then used for normalization, so that the value of each weight coefficient is normalized into the [0, 1] interval and the sum of the weight coefficients is normalized to 1, finally obtaining the normalized weight coefficients $\alpha^{l}_{ij}$, $\beta^{l}_{ij}$, $\gamma^{l}_{ij}$ used for feature fusion of the features of the 3 scales, for example:

$\alpha^{l}_{ij} = \dfrac{e^{\lambda^{l}_{\alpha,ij}}}{e^{\lambda^{l}_{\alpha,ij}} + e^{\lambda^{l}_{\beta,ij}} + e^{\lambda^{l}_{\gamma,ij}}}$      (2)

where $e^{x}$ denotes the exponential function with the constant e as its base, and x can be any of the above weight coefficients such as $\lambda^{l}_{\alpha,ij}$.
Through the above operations, features from different scales can be adaptively fused at any feature fusion layer l, and the fused result $y^{l}$ can be used as the initial input of subsequent detectors to further improve detection accuracy.
With the development of feature fusion technology, more and more new-generation target detection algorithms, represented by YOLOv4, have gradually shown that the concat mode, which splices the vectors of the features of various scales from different feature extraction layers in the network, more completely preserves the detailed information of the features of each scale; therefore, feature fusion based on the vector splicing mode is significantly better than the traditional pixel-wise addition of the features of various scales at improving the accuracy of subsequent tasks such as target detection. However, the existing Adaptively Spatial Feature Fusion (ASFF) method can only be applied to the traditional feature fusion mode based on pixel-wise addition and cannot be used for the above vector splicing mode. At present, for the feature fusion mode based on vector splicing, there is no feature fusion method that can adaptively adjust the weights of features of different scales; the features of each scale are artificially forced to be spliced and to participate in feature fusion with equal weights, which cannot avoid contradictions between the training objectives of the detectors corresponding to adjacent feature extraction layers, is not conducive to improving the adaptability and convergence of a convolutional neural network containing a feature fusion structure to different training objectives, and ultimately affects the detection accuracy for the targets to be detected.
The adaptive feature fusion method and system in a convolutional neural network provided by the present application are used in convolutional neural networks for deep learning to perform feature fusion relying on adaptive weights. The main steps include: obtaining the weight coefficients of the features of each scale at the current feature fusion layer; activating and normalizing the weight coefficients of the features of each scale at the current feature fusion layer; performing weighted fusion on the features of each scale at the current feature fusion layer; splicing the results of the weighted fusion of the features of each scale at the current feature fusion layer; and obtaining the adaptive feature fusion results of all feature fusion layers in the convolutional neural network. The present application uses the results of lightweight convolution branches, combined with operations such as activation and normalization, to generate normalized weight coefficients for the features of different scales from each feature extraction layer, and then uses these normalized weight coefficients to adaptively weight, fuse, and splice the features of each scale. This solves the problem of adaptive feature fusion in the vector splicing mode, improves the adaptability and convergence of convolutional neural networks to different training objectives and the overall accuracy of deep learning algorithms, and at the same time effectively saves manpower, material resources, and time costs.
The adaptive feature fusion method in a convolutional neural network of the present application can be widely used in artificial intelligence fields such as target detection, tracking, and semantic segmentation.
In a first aspect, an embodiment of the present disclosure provides an adaptive feature fusion method in a convolutional neural network. The adaptive feature fusion method in a convolutional neural network of the embodiment of the present disclosure can be executed by a corresponding adaptive feature fusion apparatus in a convolutional neural network, which can be implemented in software and/or hardware and can generally be integrated in an electronic device.
In one embodiment of the present application, an adaptive feature fusion method in a convolutional neural network is provided. This embodiment is described with the method applied to a terminal; it can be understood that the method can also be applied to a server, or to a system including a terminal and a server and realized through interaction between the terminal and the server. This embodiment uses lightweight convolution branches to generate, at the feature fusion layer of the convolutional neural network, weight coefficients for the features of different scales from each feature extraction layer, and ensures through nonlinear activation and normalization operations that the value of each weight coefficient lies between 0 and 1 and that their sum equals 1. These weight coefficients are then used to weight and fuse the feature of each scale with the features of the other scales, and finally the weighted fusion results of the features of each scale are spliced to obtain the adaptive feature fusion result. The above method solves the problem of adaptive feature fusion in the feature fusion operation based on the vector splicing mode in a convolutional neural network, improves the accuracy of feature fusion without significantly increasing computational complexity, and thereby improves the overall performance of deep learning algorithms.
Fig. 1 shows a schematic flowchart of the adaptive feature fusion method in a convolutional neural network in an embodiment of the present application. As shown in Fig. 1, in this embodiment, the method includes the following steps:
Step S110: obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer.
For example, the weight coefficients are obtained as follows: at the current feature fusion layer, for the features of different scales from different feature extraction layers that are to be fused, the convolution maps corresponding to the features of all scales are scaled to the same size through downsampling or upsampling operations; the convolution maps of the features of different scales from at least one feature extraction layer are each sent to a lightweight convolution branch; and the values of the results of the different convolution branches at any pixel position are used as the weight coefficients of the features of at least one scale at that pixel position of the convolution map of the current feature fusion layer.
Suppose that at the current feature fusion layer l of the convolutional neural network, the features of M scales from M different feature extraction layers need to be fused. The convolution maps corresponding to the features of all scales are resized to the same size through downsampling or upsampling operations, and the convolution maps of the features of different scales from at least one feature extraction layer are then each sent to a lightweight convolution branch with a 1*1 kernel. The values of the results of these M convolution branches at any pixel (i, j) are used as the weight coefficients $\lambda_{l,1,ij}, \lambda_{l,2,ij}, \ldots, \lambda_{l,M,ij}$ of the features of at least one scale at pixel (i, j) of the convolution map of feature fusion layer l, where M is an integer greater than or equal to 1.
The weight coefficients calculated from the results of the above lightweight convolution branches, after the activation and normalization operations described below, influence the weighted fusion result of the feature fusion layer on the features from at least one feature extraction layer, and in this way participate in the backward error propagation during training of the deep learning base network. The entire training process is end-to-end and requires no additional manual intervention (such as additional labeling of samples or specifying hyperparameters).
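To illustrate the end-to-end point only: in a PyTorch-style implementation the weight branches would simply be ordinary trainable sub-modules, so the ordinary detection loss is enough to train them; the function and variable names below are assumptions.

```python
# Illustrative training sketch (PyTorch): the lightweight weight branches are
# trained end to end by the ordinary detection loss; no extra labels, loss
# terms, or manual tuning of the weight coefficients is required.
def train_step(model, detection_loss_fn, images, targets, optimizer):
    optimizer.zero_grad()
    predictions = model(images)          # forward pass includes adaptive fusion
    loss = detection_loss_fn(predictions, targets)
    loss.backward()                      # gradients also reach the 1*1 branches
    optimizer.step()
    return loss.item()
```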
Step S120: activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer.
For example, step S120 can be implemented as follows: perform nonlinear activation on the weight coefficients of the features of at least one scale at any pixel position on the convolution map of the current feature fusion layer; perform linear normalization on the activated weight coefficients to obtain the normalized weight coefficient of the feature of each scale; and obtain the normalized weight coefficients at all pixel positions on the convolution map of the current feature fusion layer.
Specifically, to prevent the gaps between the larger weight coefficients of the features of at least one scale from being further enlarged too quickly, which would cause unstable oscillations during training, and to ensure that the weight coefficients of the features of at least one scale are all greater than 0, the nonlinear activation function Sigmoid is used to activate the weight coefficient of the feature of each scale, so that the activated weight coefficients change in a relatively rapid, approximately linear way near the center of the value range and show a slower, nonlinear saturation trend in the regions away from the center. Taking the weight coefficient $\lambda_{l,m,ij}$ of the feature of the m-th scale at pixel (i, j) of the convolution map of feature fusion layer l as an example, its activation weight coefficient $\lambda'_{l,m,ij}$ is obtained by the following formula:

$\lambda'_{l,m,ij} = \mathrm{Sigmoid}(\lambda_{l,m,ij}) = \dfrac{1}{1 + e^{-\lambda_{l,m,ij}}}$      (3)

Repeating the above process yields the activation weight coefficients $\lambda'_{l,1,ij}, \lambda'_{l,2,ij}, \ldots, \lambda'_{l,M,ij}$ of the features of each scale at pixel (i, j) of the convolution map of feature fusion layer l, where $m \in [1, \ldots, M]$.
Since the saturation region of the nonlinear activation function already prevents the gaps between the larger weight coefficients from being further enlarged too quickly and causing severe oscillations during training, and since the nonlinearly activated weight coefficients of the features of at least one scale are all greater than zero, this step directly uses linear normalization to ensure that the weight coefficients of the features of different scales at pixel (i, j) of the convolution map sum to 1. The reasons this application does not use a nonlinear normalization function, represented by the SoftMax of ASFF and similar algorithms, include reducing the amount of computation and avoiding a negative impact on the effect of the saturation region of the nonlinear activation function. Taking the activation weight coefficient $\lambda'_{l,m,ij}$ of the feature of the m-th scale at pixel (i, j) of the convolution map of feature fusion layer l as an example, its normalized weight coefficient $\alpha_{l,m,ij}$ is obtained by the following formula:

$\alpha_{l,m,ij} = \dfrac{\lambda'_{l,m,ij}}{\sum_{n=1}^{M} \lambda'_{l,n,ij}}$      (4)

where $m \in [1, \ldots, M]$ and n is the index of the scale of a feature, $n \in [1, \ldots, M]$. Since the activation weight coefficients of the features of at least one scale are all greater than zero, the denominator of the above formula is never equal to 0.
Repeating the above process yields the normalized weight coefficients $\alpha_{l,1,ij}, \alpha_{l,2,ij}, \ldots, \alpha_{l,M,ij}$ of the features of each scale at pixel (i, j) of the convolution map of feature fusion layer l.
The above operations are repeated at each pixel position of the convolution map of feature fusion layer l until the normalized weight coefficients at all pixel positions are obtained.
Step S130: performing weighted fusion on the features of at least one scale at the current feature fusion layer.
Step S140: splicing the weighted fusion results to obtain an adaptive feature fusion result.
The weighted fusion results are the results of the weighted fusion of the features of at least one scale at the current feature fusion layer. The adaptive feature fusion result may include the adaptive feature fusion results of multiple feature fusion layers.
In some implementations, all feature fusion layers may be judged separately to determine whether the adaptive feature fusion results of all feature fusion layers have been obtained.
When it is determined that the adaptive feature fusion results of all feature fusion layers have been obtained, the process ends and adaptive feature fusion in the convolutional neural network is determined to be complete, thereby improving detection accuracy; otherwise, the method returns to step S110 and continues to obtain the weight coefficients of the features of at least one scale at the current feature fusion layer.
Steps S110 to S140 are repeated for each feature fusion layer in the convolutional neural network that requires feature fusion, until the adaptive feature fusion results at all feature fusion layers are obtained.
In this embodiment, by obtaining the weight coefficients of the features of at least one scale at the current feature fusion layer, activating and normalizing those weight coefficients, performing weighted fusion on the features of the at least one scale at the current feature fusion layer, and splicing the weighted fusion results, an adaptive feature fusion result is obtained, adaptive feature fusion in the convolutional neural network is completed, and detection accuracy is improved. Relying on lightweight convolution branches and a simple calculation process, adaptive feature fusion is realized in the feature fusion mode based on vector splicing, which improves the adaptability and convergence of the convolutional neural network to different training objectives and thereby improves the overall accuracy of the deep learning algorithm.
In some implementations, the weighted fusion of the features of at least one scale at the current feature fusion layer in step S130 can be implemented as follows.
Fig. 2 is a schematic flowchart of a method for performing weighted fusion of features of at least one scale at a feature fusion layer provided by an embodiment of the present disclosure. As shown in Fig. 2, the method includes but is not limited to the following steps:
Step S131: at any pixel position on the convolution map of the current feature fusion layer, performing weighted fusion of the feature of each scale with the features of the other scales.
Step S132: obtaining the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions on the convolution map of the current feature fusion layer.
The operation in step S131 is repeated at each pixel position of the convolution map of feature fusion layer l until the weighted fusion results of the feature of each scale with the features of the other scales at all pixel positions are obtained.
在一些具体实现中,步骤S131中的在当前特征融合层的卷积图上任意像素位置处分别将每个尺度的特征与其他尺度的特征进行加权融合,包括:
在确定第m个尺度的特征在第l特征融合层的卷积图像素位置处的归一化权重系数大于或等于全部M个不同尺度的特征的归一化权重系数的均值的情况下,在卷积图像素位置处的该尺度的特征与其他尺度的特征加权融合后的结果等于其自身;其中,m、M均为大于或等于1的整数。
For example, performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer includes: if the normalized weight coefficient α_{l,m,ij} of the feature of the m-th scale at pixel position (i, j) of the convolution map of feature fusion layer l is greater than or equal to the mean 1/M of the normalized weight coefficients of all M features of different scales, then the weighted-fusion result X_{l,m,ij} of the feature of that scale with the features of the other scales at position (i, j) is equal to the feature itself.
The normalized weight coefficients α_{l,1,ij}, α_{l,2,ij}, ..., α_{l,M,ij} of the features of the M scales at pixel position (i, j) of the convolution map of feature fusion layer l are used to perform weighted fusion of the features x'_{l,1,ij}, x'_{l,2,ij}, ..., x'_{l,M,ij} of each scale. For the value x'_{l,m,ij} of the feature of any m-th scale (m ∈ [1, ..., M]) at pixel position (i, j) of the convolution map of feature fusion layer l, the weighted fusion proceeds as follows:
If the normalized weight coefficient α_{l,m,ij} of the feature of the m-th scale at pixel position (i, j) of the convolution map of feature fusion layer l is greater than or equal to the mean 1/M of the normalized weight coefficients of the features of all M scales, the weighted-fusion result X_{l,m,ij} of the feature of that scale with the features of the other scales at position (i, j) is equal to the feature itself:
X_{l,m,ij} = x'_{l,m,ij}, \quad \text{if } \alpha_{l,m,ij} \ge \tfrac{1}{M}
In some implementations, the weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer in step S131 includes:
in a case where it is determined that the normalized weight coefficient of the feature of the m-th scale at a pixel position of the convolution map of the l-th feature fusion layer is smaller than the mean of the normalized weight coefficients of all M features of different scales, determining the result of the weighted fusion of the feature of that scale with the features of the other scales at that pixel position of the convolution map according to the normalized weight coefficient of the feature of the m-th scale at that pixel position of the convolution map of the l-th feature fusion layer, the M features of different scales, and the weighted mean of all those features, among the features of the other scales at that pixel position of the convolution map of the l-th feature fusion layer, whose normalized weight coefficients are greater than 1/M; where m, M and l are all integers greater than or equal to 1.
For example, if the normalized weight coefficient α_{l,m,ij} of the feature of the m-th scale at pixel position (i, j) of the convolution map of feature fusion layer l is smaller than the mean 1/M of the normalized weight coefficients of the features of all M scales, the weighted-fusion result X_{l,m,ij} of the feature x'_{l,m,ij} of that scale with the features of the other scales at position (i, j) can be computed with the following formula:
X_{l,m,ij} = M\,\alpha_{l,m,ij}\, x'_{l,m,ij} + \left(1 - M\,\alpha_{l,m,ij}\right) \bar{x}'_{l,m,ij}      (6)
where \bar{x}'_{l,m,ij} denotes the weighted mean of all those features, among the features of the other scales at pixel position (i, j) of the convolution map of feature fusion layer l, whose normalized weight coefficients are greater than 1/M.
In some implementations, before performing the weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer in step S131, the method further includes:
determining the weighted mean according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer and the number M of features.
The weighted mean is the mean of all those features, among the features of the other scales at the pixel position of the convolution map of feature fusion layer l, whose normalized weight coefficients are greater than 1/M; m and M are both integers greater than or equal to 1.
For example, the weighted mean can be computed with the following formula (7):
\bar{x}'_{l,m,ij} = \frac{\sum_{n=1}^{M} \mathrm{Max}\!\left[\alpha_{l,n,ij} - \tfrac{1}{M},\, 0\right] x'_{l,n,ij}}{\sum_{n=1}^{M} \mathrm{Max}\!\left[\alpha_{l,n,ij} - \tfrac{1}{M},\, 0\right]}      (7)
where Max[*, *] denotes taking the larger of the two values inside the brackets, and n denotes the index of a feature scale, e.g. the feature of the n-th scale.
Since the precondition for applying this formula is that the non-linearly activated, normalized weight coefficient α_{l,m,ij} of the feature of the m-th scale is smaller than 1/M, at least one of the features of the other scales has a normalized weight coefficient greater than 1/M, so the denominator of the above formula can never be equal to 0.
In some implementations, the adaptive feature fusion method in a convolutional neural network further includes: in a case where it is determined that the normalized weight coefficient of the feature of the m-th scale at a pixel position of the convolution map of the l-th feature fusion layer is smaller than the mean of the normalized weight coefficients of all M features of different scales and M is equal to 2, determining the result of the weighted fusion of the feature of that scale with the features of the other scales at that pixel position of the convolution map according to the normalized weight coefficient of the feature of the m-th scale at that pixel position of the convolution map of feature fusion layer l and the feature, among the features of the 2 scales at that pixel position of the convolution map of the l-th feature fusion layer, that is not the m-th feature.
In particular, when the number M of features of different scales is equal to 2, formula (6) can be further simplified into the following form:
X_{l,m,ij} = 2\,\alpha_{l,m,ij}\, x'_{l,m,ij} + \left(1 - 2\,\alpha_{l,m,ij}\right) x'_{l,n\neq m,ij}
where x'_{l,n≠m,ij} denotes, of the features of the 2 scales at pixel position (i, j) of the convolution map of feature fusion layer l, the one that is not the m-th; in this case there is no need to use formula (7) to further compute the weighted mean of the features whose normalized weight coefficients are greater than 1/M.
At pixel position (i, j) of the convolution map of feature fusion layer l, the operation of step S131 is repeated for the feature of each scale until the weighted-fusion result of the feature of each scale with the features of the other scales has been obtained.
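The per-pixel fusion rule above can be written vectorially over the whole map. The sketch below (again assuming PyTorch, with illustrative names, and not taken from the disclosure) computes the fused values for all scales and all pixel positions at once:
```python
import torch

def weighted_fuse(resized, alpha):
    """resized: list of M tensors [B, C, H, W] (features already resized);
    alpha: [B, M, H, W] normalized weight coefficients.

    A scale whose alpha >= 1/M keeps its own value; otherwise it is blended,
    per formula (6), with the weighted mean of the scales whose alpha
    exceeds 1/M (formula (7)).
    """
    M = len(resized)
    x = torch.stack(resized, dim=1)             # [B, M, C, H, W]
    a = alpha.unsqueeze(2)                      # [B, M, 1, H, W]

    # Weighted mean of the scales with alpha > 1/M (formula (7)).
    excess = (a - 1.0 / M).clamp(min=0)         # Max[alpha - 1/M, 0]
    mean_strong = (excess * x).sum(dim=1, keepdim=True) / (
        excess.sum(dim=1, keepdim=True) + 1e-12)   # eps only guards exact ties

    blended = M * a * x + (1 - M * a) * mean_strong   # formula (6)
    keep_self = a >= 1.0 / M                          # identity case
    return torch.where(keep_self, x, blended)          # [B, M, C, H, W]
```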
In some implementations, the concatenation of the weighted-fusion results in step S140 may be implemented as follows:
the weighted-fusion results of the features of at least one scale from the M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l are concatenated along a preset dimension in a preset order, so as to obtain the adaptive fusion result of the features of at least one scale.
For example, concatenating the weighted-fusion results of the features of at least one scale at the current feature fusion layer includes:
concatenating the weighted-fusion results of the features of at least one scale from the M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l, in the order 1, ..., M, along a preset dimension, so as to obtain the adaptive fusion result Y_l of the features of at least one scale:
Y_l = (X_{l,1}, X_{l,2}, \ldots, X_{l,M})      (9)
where X_{l,1}, X_{l,2}, ..., X_{l,M} are the vector matrices formed, at feature fusion layer l, by the values of the weighted-fusion results of the feature of each scale with the features of the other scales at all pixel positions of its own convolution map. Concatenating the above at least one vector matrix along the preset dimension in the concat mode yields a new vector matrix Y_l, which is the adaptive feature fusion result of the features of at least one scale at feature fusion layer l.
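A sketch of step S140 under the same assumptions; concatenating along the channel dimension is a design assumption of this sketch, since the disclosure only requires a preset dimension and order:
```python
import torch

def concat_fused(fused: torch.Tensor) -> torch.Tensor:
    """fused: [B, M, C, H, W] weighted-fusion results X_{l,1}..X_{l,M}.

    Concatenates the M fused maps in the order 1..M along the channel
    dimension to form Y_l (formula (9)).
    """
    b, m, c, h, w = fused.shape
    return fused.reshape(b, m * c, h, w)   # equivalent to torch.cat over the scales
```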
Steps S110 to S140 are repeated at every feature fusion layer of the convolutional neural network at which feature fusion is required, until the adaptive feature fusion results at all feature fusion layers have been obtained.
For example, Fig. 3 is a schematic diagram of the principle of performing weighted fusion and concatenation of features of at least one scale at a feature fusion layer according to an embodiment of the present disclosure. As shown in Fig. 3, there are 3 scales, namely scale 1, scale 2 and scale 3.
In the base network, the image is down-sampled to obtain multiple sampled images; feature extraction is then performed on each sampled image (e.g., feature extraction 1, feature extraction 2 and feature extraction 3 shown in Fig. 3) to obtain convolution maps; the convolution maps of the features of different scales from at least one feature extraction layer are sent to different feature fusion layers.
At feature fusion layer 3, the features of different scales (e.g., scale 1, scale 2 and scale 3) from different feature extraction layers are fused, and the fused feature map is input into detector 3 for detection.
Similarly, at feature fusion layer 2, the features of different scales (e.g., scale 1, scale 2 and scale 3) from different feature extraction layers are fused, and the fused feature map is input into detector 2 for detection; at feature fusion layer 1, the features of different scales (e.g., scale 1, scale 2 and scale 3) from different feature extraction layers are fused, and the fused feature map is input into detector 1 for detection, so that the adaptive feature fusion results in the convolutional neural network are obtained.
In each feature fusion layer, a vector-matrix concatenation operation is performed between the respective feature fusion results of the features of scales 1, 2 and 3, so as to obtain the concatenated adaptive feature fusion result.
For example, during the concatenation process at feature fusion layer 1, at position (i, j) of the vector of the feature fusion result of scale 1, it is necessary to determine whether the normalized weight coefficient α_{1,1,ij} of the feature of scale 1 is greater than or equal to 1/3. If α_{1,1,ij} is greater than or equal to 1/3, the fused value of scale 1 is the feature value of scale 1 itself; if α_{1,1,ij} is smaller than 1/3, the fused value of scale 1 is computed as: (3·α_{1,1,ij}) · (feature value of scale 1) + (1 − 3·α_{1,1,ij}) · (weighted mean of the features whose normalized weight coefficients are greater than 1/3).
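As a purely illustrative numeric check of this rule (the coefficient and feature values below are assumed for illustration and do not come from the disclosure): suppose that at some pixel position (i, j) of feature fusion layer 1 the normalized weight coefficients are α_{1,1,ij} = 0.2, α_{1,2,ij} = 0.45, α_{1,3,ij} = 0.35, and the resized feature values are x'_{1,1,ij} = 4, x'_{1,2,ij} = 10, x'_{1,3,ij} = 6. Scales 2 and 3 satisfy α ≥ 1/3 and keep their own values, while scale 1 is blended:
\bar{x}'_{1,1,ij} = \frac{\left(0.45 - \tfrac{1}{3}\right) \cdot 10 + \left(0.35 - \tfrac{1}{3}\right) \cdot 6}{\left(0.45 - \tfrac{1}{3}\right) + \left(0.35 - \tfrac{1}{3}\right)} = 9.5, \qquad X_{1,1,ij} = 3 \cdot 0.2 \cdot 4 + (1 - 3 \cdot 0.2) \cdot 9.5 = 6.2
so the three values entering the concatenation at this pixel position are 6.2, 10 and 6.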
It should be understood that the above embodiments may also be used in combination with any other manner of the embodiments of the present disclosure. The above embodiments are only specific examples of the present disclosure and do not limit the scope of protection of the present disclosure.
In a second aspect, embodiments of the present disclosure provide an adaptive feature fusion system in a convolutional neural network. Fig. 4 is a block diagram of the composition of an adaptive feature fusion system in a convolutional neural network according to an embodiment of the present disclosure. As shown in Fig. 4, the adaptive feature fusion system in a convolutional neural network includes, but is not limited to, the following modules: a weight coefficient obtaining module 401, a weight coefficient activation and normalization module 402, and a feature weighted-fusion and concatenation module 403.
The weight coefficient obtaining module 401 is configured to obtain the weight coefficients of the features of at least one scale at the current feature fusion layer.
The weight coefficient activation and normalization module 402 is configured to activate and normalize the weight coefficients of the features of at least one scale at the current feature fusion layer.
The feature weighted-fusion and concatenation module 403 is configured to perform weighted fusion of the features of at least one scale at the current feature fusion layer and to concatenate the weighted-fusion results to obtain the adaptive feature fusion result, completing adaptive feature fusion in the convolutional neural network and improving the detection accuracy.
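A minimal sketch of how the three modules could be composed into one fusion layer, reusing the illustrative functions defined in the sketches above (the class name, and the use of PyTorch, are assumptions of this sketch rather than part of the disclosure):
```python
import torch.nn as nn

class AdaptiveFeatureFusionLayer(nn.Module):
    """Composes modules 401-403 for one feature fusion layer l."""
    def __init__(self, in_channels_per_scale, target_size):
        super().__init__()
        self.target_size = target_size
        self.weight_branches = WeightCoefficientBranches(in_channels_per_scale)  # module 401

    def forward(self, features):
        resized, lambdas = self.weight_branches(features, self.target_size)  # step S110
        alpha = activate_and_normalize(lambdas)                              # module 402, step S120
        fused = weighted_fuse(resized, alpha)                                # module 403, step S130
        return concat_fused(fused)                                           # module 403, step S140 -> Y_l
```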
The system provided in this embodiment is configured to perform the above method embodiments; for the specific procedures and details, reference is made to the above embodiments, which are not repeated here.
In a third aspect, Fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure. As shown in Fig. 5, the electronic device includes: at least one processor 501; at least one memory 502; and one or more I/O interfaces 503 connected between the processor 501 and the memory 502. The memory 502 stores one or more computer programs executable by the at least one processor 501, and the one or more computer programs are executed by the at least one processor 501, so that the at least one processor 501 can perform the above adaptive feature fusion method in a convolutional neural network.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor/processing core, implements the above adaptive feature fusion method in a convolutional neural network. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above adaptive feature fusion method in a convolutional neural network.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses, may be implemented as software, firmware, hardware, or appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed jointly by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable program instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ and the like, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
The computer program product described here may be implemented in hardware, software or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
Various aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other devices, so that a series of operational steps are performed on the computer, other programmable data processing apparatus or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other devices implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or may be used in combination with features, characteristics and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (12)

  1. An adaptive feature fusion method in a convolutional neural network, comprising:
    obtaining weight coefficients of features of at least one scale at a current feature fusion layer;
    activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer;
    performing weighted fusion of the features of at least one scale at the current feature fusion layer, and concatenating the weighted-fusion results to obtain an adaptive feature fusion result, so as to complete adaptive feature fusion in the convolutional neural network.
  2. The method according to claim 1, wherein the obtaining weight coefficients of features of at least one scale at a current feature fusion layer comprises:
    at the current feature fusion layer, fusing features of different scales from different feature extraction layers, and scaling the convolution maps corresponding to the features of all scales to the same size through down-sampling or up-sampling operations;
    feeding the convolution maps of the features of different scales from at least one feature extraction layer each to a lightweight convolution branch;
    taking the values of the outputs of the different convolution branches at any pixel position as the weight coefficients of the features of at least one scale at that pixel position of the convolution map of the current feature fusion layer.
  3. The method according to claim 1, wherein the activating and normalizing the weight coefficients of the features of at least one scale at the current feature fusion layer comprises:
    applying non-linear activation to the weight coefficients of the features of at least one scale at any pixel position of the convolution map of the current feature fusion layer;
    applying linear normalization to the activated weight coefficients of the features to obtain a normalized weight coefficient of the feature of each scale;
    obtaining the normalized weight coefficients at all pixel positions of the convolution map of the current feature fusion layer.
  4. The method according to claim 1, wherein the performing weighted fusion of the features of at least one scale at the current feature fusion layer comprises:
    at any pixel position of the convolution map of the current feature fusion layer, performing weighted fusion of the feature of each scale with the features of the other scales;
    obtaining, at all pixel positions of the convolution map of the current feature fusion layer, the results of the weighted fusion of the feature of each scale with the features of the other scales.
  5. The method according to claim 4, wherein the performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer comprises:
    in a case where it is determined that the normalized weight coefficient of the feature of the m-th scale at a pixel position of the convolution map of the l-th feature fusion layer is greater than or equal to the mean of the normalized weight coefficients of all M features of different scales, the result of the weighted fusion of the feature of that scale with the features of the other scales at the pixel position of the convolution map is equal to the feature itself; wherein m and M are both integers greater than or equal to 1.
  6. The method according to claim 4, wherein the performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer comprises:
    in a case where it is determined that the normalized weight coefficient of the feature of the m-th scale at a pixel position of the convolution map of the l-th feature fusion layer is smaller than the mean of the normalized weight coefficients of all M features of different scales, determining the result of the weighted fusion of the feature of that scale with the features of the other scales at the pixel position of the convolution map according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer, the M features of different scales, and the weighted mean of all those features, among the features of the other scales at the pixel position of the convolution map of the l-th feature fusion layer, whose normalized weight coefficients are greater than 1/M; wherein m, M and l are all integers greater than or equal to 1.
  7. The method according to claim 6, wherein before performing weighted fusion of the feature of each scale with the features of the other scales at any pixel position of the convolution map of the current feature fusion layer, the method further comprises:
    determining the weighted mean according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of the l-th feature fusion layer and the number M of features;
    wherein the weighted mean is the mean of all those features, among the features of the other scales at the pixel position of the convolution map of feature fusion layer l, whose normalized weight coefficients are greater than 1/M, and m and M are both integers greater than or equal to 1.
  8. The method according to claim 6, further comprising:
    in a case where it is determined that the normalized weight coefficient of the feature of the m-th scale at a pixel position of the convolution map of the l-th feature fusion layer is smaller than the mean of the normalized weight coefficients of all M features of different scales, and M is equal to 2, determining the result of the weighted fusion of the feature of that scale with the features of the other scales at the pixel position of the convolution map according to the normalized weight coefficient of the feature of the m-th scale at the pixel position of the convolution map of feature fusion layer l and the feature, among the features of the 2 scales at the pixel position of the convolution map of the l-th feature fusion layer, that is not the m-th feature.
  9. The method according to claim 1, wherein the concatenating the weighted-fusion results comprises:
    concatenating, along a preset dimension and in a preset order, the weighted-fusion results of the features of at least one scale from M feature extraction layers with the features of the other scales on the convolution map of feature fusion layer l, so as to obtain an adaptive fusion result of the features of at least one scale.
  10. An adaptive feature fusion system in a convolutional neural network, comprising: a weight coefficient obtaining module, a weight coefficient activation and normalization module, and a feature weighted-fusion and concatenation module;
    the weight coefficient obtaining module is configured to obtain weight coefficients of features of at least one scale at a current feature fusion layer;
    the weight coefficient activation and normalization module is configured to activate and normalize the weight coefficients of the features of at least one scale at the current feature fusion layer;
    the feature weighted-fusion and concatenation module is configured to perform weighted fusion of the features of at least one scale at the current feature fusion layer and to concatenate the weighted-fusion results to obtain an adaptive feature fusion result, so as to complete adaptive feature fusion in the convolutional neural network and improve detection accuracy.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the adaptive feature fusion method in a convolutional neural network according to any one of claims 1 to 9.
  12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the adaptive feature fusion method in a convolutional neural network according to any one of claims 1 to 9.
PCT/CN2022/121730 2021-11-05 2022-09-27 卷积神经网络中自适应特征融合方法及系统 WO2023077998A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111310425.9A CN114092760A (zh) 2021-11-05 2021-11-05 卷积神经网络中自适应特征融合方法及系统
CN202111310425.9 2021-11-05

Publications (1)

Publication Number Publication Date
WO2023077998A1 (zh)

Family

ID=80299088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121730 WO2023077998A1 (zh) 2021-11-05 2022-09-27 卷积神经网络中自适应特征融合方法及系统

Country Status (2)

Country Link
CN (1) CN114092760A (zh)
WO (1) WO2023077998A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092760A (zh) * 2021-11-05 2022-02-25 通号通信信息集团有限公司 卷积神经网络中自适应特征融合方法及系统
CN115316982A (zh) * 2022-09-02 2022-11-11 中国科学院沈阳自动化研究所 一种基于多模态传感的肌肉形变智能检测系统及方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236411A1 (en) * 2016-09-14 2019-08-01 Konica Minolta Laboratory U.S.A., Inc. Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
CN111753752A (zh) * 2020-06-28 2020-10-09 重庆邮电大学 基于卷积神经网络多层特征融合的机器人闭环检测方法
CN111797779A (zh) * 2020-07-08 2020-10-20 兰州交通大学 基于区域注意力多尺度特征融合的遥感图像语义分割方法
CN112183295A (zh) * 2020-09-23 2021-01-05 上海眼控科技股份有限公司 行人重识别方法、装置、计算机设备及存储介质
CN113111975A (zh) * 2021-05-12 2021-07-13 合肥工业大学 基于多核尺度卷积神经网络的sar图像目标分类方法
CN114092760A (zh) * 2021-11-05 2022-02-25 通号通信信息集团有限公司 卷积神经网络中自适应特征融合方法及系统

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958774A (zh) * 2023-09-21 2023-10-27 北京航空航天大学合肥创新研究院 一种基于自适应空间特征融合的目标检测方法
CN116958774B (zh) * 2023-09-21 2023-12-01 北京航空航天大学合肥创新研究院 一种基于自适应空间特征融合的目标检测方法
CN117690128A (zh) * 2024-02-04 2024-03-12 武汉互创联合科技有限公司 胚胎细胞多核目标检测系统、方法和计算机可读存储介质
CN117690128B (zh) * 2024-02-04 2024-05-03 武汉互创联合科技有限公司 胚胎细胞多核目标检测系统、方法和计算机可读存储介质
CN117933309A (zh) * 2024-03-13 2024-04-26 西安理工大学 一种用于双时相遥感图像变化发现的三路神经网络及方法

Also Published As

Publication number Publication date
CN114092760A (zh) 2022-02-25

Similar Documents

Publication Publication Date Title
WO2023077998A1 (zh) 卷积神经网络中自适应特征融合方法及系统
WO2019223382A1 (zh) 单目深度估计方法及其装置、设备和存储介质
CN110837811B (zh) 语义分割网络结构的生成方法、装置、设备及存储介质
CN108710885B (zh) 目标对象的检测方法和装置
WO2023060746A1 (zh) 一种基于超分辨率的小图像多目标检测方法
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
KR20200015611A (ko) 시맨틱 분할 모델을 위한 훈련 방법 및 장치, 전자 기기, 저장 매체
CN112862877B (zh) 用于训练图像处理网络和图像处理的方法和装置
CN113361710B (zh) 学生模型训练方法、图片处理方法、装置及电子设备
US11030750B2 (en) Multi-level convolutional LSTM model for the segmentation of MR images
WO2022067668A1 (zh) 基于视频图像目标检测的火灾检测方法、系统、终端以及存储介质
CN112861830B (zh) 特征提取方法、装置、设备、存储介质以及程序产品
WO2023077809A1 (zh) 神经网络训练的方法、电子设备及计算机存储介质
US11393072B2 (en) Methods and systems for automatically correcting image rotation
CN114359289A (zh) 一种图像处理方法及相关装置
CN110633716A (zh) 一种目标对象的检测方法和装置
CN114913325B (zh) 语义分割方法、装置及计算机程序产品
Maslov et al. Fast depth reconstruction using deep convolutional neural networks
CN117372928A (zh) 一种视频目标检测方法、装置及相关设备
US20230046088A1 (en) Method for training student network and method for recognizing image
CN112990046B (zh) 差异信息获取方法、相关装置及计算机程序产品
WO2023102724A1 (zh) 图像的处理方法和系统
CN113139463B (zh) 用于训练模型的方法、装置、设备、介质和程序产品
US20230072641A1 (en) Image Processing and Automatic Learning on Low Complexity Edge Apparatus and Methods of Operation
CN115861755A (zh) 特征融合方法、装置、电子设备及自动驾驶车辆

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889024

Country of ref document: EP

Kind code of ref document: A1