CN111753849A - Detection method and system based on compact aggregation feature and cyclic residual learning - Google Patents
- Publication number
- CN111753849A (application CN202010606592.7A)
- Authority
- CN
- China
- Prior art keywords
- detection
- aggregation
- convolution
- features
- saliency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a detection method and system based on compact aggregation features and cyclic residual learning, belonging to the technical field of image processing. The system comprises a compact feature extraction module, an all-feature aggregation module and a cyclic residual optimization module. The method comprises the following steps: extracting compact convolution features by combining the output features of consecutive stages; applying an atrous spatial pyramid pooling module to the compact convolution features extracted from all layers to aggregate information across layers; and, under a deep supervision mechanism, continuously optimizing the prediction by residual learning. The whole cyclic residual network is tested on three visual saliency detection data sets; after testing, the cyclic residual network based on compact aggregation features can be applied to practical visual saliency detection in natural images. The invention improves visual saliency detection in complex scenes, strengthens the suppression of background noise, and enhances the continuity and integrity of the detected region.
Description
Technical Field
The invention relates to a detection method and system based on compact aggregation features and cyclic residual learning, and belongs to the technical field of image processing.
Background
Visual saliency detection aims to detect the most distinctive target in a natural image and extract its complete content. Because it helps reduce the complexity of computer understanding and analysis of natural images, visual saliency detection has become an important preprocessing step for many computer vision tasks, including image retrieval, visual tracking, scene classification and pedestrian re-identification.
Traditional algorithms rely on contrast or statistical information computed from hand-crafted features such as color, brightness and texture, accumulated from human prior knowledge. Convolutional neural networks, by contrast, can learn effective image features autonomously and quickly, and have ample room for development in the field of image processing. As convolutional and pooling layers are stacked, the resolution of the network's high-level output features gradually decreases while their semantic information is enhanced. When high-level features are applied directly to pixel-level visual saliency detection, they can locate salient objects accurately, but they lack detail information and produce a coarse overall appearance. Conversely, the shallow, high-resolution features of a convolutional neural network have the advantage of retaining spatial detail.
Disclosure of Invention
In order to improve visual saliency detection in complex scenes, strengthen the suppression of background noise, and enhance the continuity and integrity of the detected region, the invention provides a detection system comprising: a compact feature extraction module, which adopts dense connections within a single layer to aggregate information effectively, acting on the last convolutional layer features of the second to fifth stages of a ResNeXt101 network; an all-feature aggregation module, which uses an ASPP module to exchange and fuse feature information across layers of different resolutions; and a cyclic residual optimization module, which repeatedly reuses the compact aggregation features to continuously optimize the predicted saliency map under a deep supervision mechanism.
Another objective of the present invention is to provide a cyclic residual visual saliency detection method based on compact aggregation features. First, compact convolution features of different levels are extracted from the base network; then all multi-resolution features are aggregated; finally, under a deep supervision mechanism, the saliency map is continuously optimized by cyclically learning residuals. The method comprises the following steps:
S1, extracting compact convolution features from the base ResNeXt101 network: the output features of consecutive stages are combined in a densely connected manner, covering a larger receptive field and fusing information within each single layer;
S2, since aggregating only the basic features within a single layer neglects the fusion of information across features of different depths and resolutions in the deep neural network, which harms visual saliency detection, an atrous spatial pyramid pooling module is applied to the compact convolution features extracted from all layers to aggregate information across layers;
S3, under a deep supervision mechanism, the compact aggregation features are reused cyclically and the prediction is continuously optimized by residual learning, with a suitable number of cycles determined by experiment;
S4, the whole cyclic residual network is tested on three visual saliency detection data sets; after testing, the cyclic residual network based on compact aggregation features can be applied to practical visual saliency detection in natural images.
Optionally, the S1 includes:
First, for the 256-, 512-, 1024- and 2048-channel features of the last convolutional layer of the second to fifth stages of the ResNeXt101 network, a convolution with kernel size 3 and 128 channels performs dimensionality reduction. The reduced feature is multiplexed to each subsequent densely connected stage to guide the fusion of later information; that is, the input of each stage is the concatenation of the features of all previous stages, and a convolution with kernel size 3 and 64 channels uniformly extracts feature information. Finally, the output of the compact feature extraction module is obtained by concatenating the reduced feature with the intermediate outputs of the several stages.
Optionally, the S2 includes:
First, the compact convolution features extracted from all layers are concatenated and reduced in dimension by two convolutions with kernel size 3 and 256 channels. The result is fed into an atrous spatial pyramid pooling module for information fusion through five parallel branches: one convolution with kernel size 1 and 128 channels; three convolutions with kernel size 3, dilation rates 2, 4 and 6 respectively, and 128 channels; and a combination of global average pooling with a kernel-size-1, 128-channel convolution. Finally, the features of the five branches are concatenated and reduced by a convolution with kernel size 1 and 256 channels, yielding the compact aggregation features (DAF). These output features, aggregated both within single layers and across multiple layers, have strong representational power and contain rich saliency cues.
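As a rough aid to the aggregation step above, the following pure-Python sketch checks the arithmetic implied by the branch layout: the effective receptive field k + (k - 1)(d - 1) of a dilated convolution, and the channel count entering the final 1x1 reduction. The function names are illustrative, not from the patent.

```python
# Arithmetic behind the ASPP aggregation step: branch layout and channel
# counts follow the text (1x1/128, three dilated 3x3/128, global pooling
# + 1x1/128, concatenated and reduced to 256 by a 1x1 convolution).

def dilated_rf(kernel: int, dilation: int) -> int:
    """Effective receptive field of a single dilated convolution."""
    return kernel + (kernel - 1) * (dilation - 1)

def aspp_concat_channels(branch_channels=(128, 128, 128, 128, 128)) -> int:
    """Channels entering the final 1x1 reduction convolution."""
    return sum(branch_channels)

# 3x3 convolutions with dilation rates 2, 4 and 6 cover progressively
# larger windows without extra parameters.
fields = [dilated_rf(3, d) for d in (2, 4, 6)]
print(fields)                  # [5, 9, 13]
print(aspp_concat_channels())  # 640, reduced to 256 afterwards
```

The growing receptive fields (5, 9, 13) are what let the parallel branches mix context at several scales before the reduction convolution.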
Optionally, the S3 includes:
First, the compact aggregation features obtained in S2 pass through a convolution with kernel size 1 and 1 channel to produce an initial saliency map SM_0. The aggregation features and the saliency map are then repeatedly concatenated and fed to a residual convolution block (RCB) to learn the residual; the saliency map after the k-th cycle is SM_k = RCB(Cat(SM_{k-1}, DAF)) + SM_{k-1}, where RCB(·) consists of two convolutions with kernel size 3 and 128 channels followed by one convolution with kernel size 1 and 1 channel, and Cat(·) denotes channel-wise concatenation of the input features. After a suitable number of cycles K, the final saliency map is obtained through a sigmoid operation.
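A minimal numeric sketch of the cyclic residual update SM_k = RCB(Cat(SM_{k-1}, DAF)) + SM_{k-1} is given below. The residual convolution block is replaced by a hypothetical stand-in function (the real RCB is a learned network, and real maps are tensors rather than scalars); only the shape of the recurrence is illustrated.

```python
import math

def rcb_stub(sm_prev: float, daf: float) -> float:
    # Stand-in for RCB(Cat(SM_{k-1}, DAF)): some learned correction term
    # that pulls the prediction toward the aggregated-feature evidence.
    return 0.5 * (daf - sm_prev)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def refine(sm0: float, daf: float, cycles: int = 6) -> float:
    sm = sm0
    for _ in range(cycles):
        sm = rcb_stub(sm, daf) + sm  # residual update SM_k = RCB(...) + SM_{k-1}
    return sigmoid(sm)               # final map passes through a sigmoid

# Starting from a rough prediction, the iterate converges toward the
# evidence carried by the aggregated features.
print(round(refine(0.0, 4.0), 4))
```

With this stand-in, each cycle halves the remaining gap to the evidence, mirroring how the patent's K = 6 cycles progressively sharpen the saliency map.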
The whole network is trained with the standard cross-entropy loss L = -Σ_{t=1}^{T} [GT_t · log(SM_t) + (1 - GT_t) · log(1 - SM_t)], where SM_t and GT_t denote the saliency of the t-th pixel on the saliency map and the truth map respectively, T is the total number of pixels in the image, GT_t = 1 and GT_t = 0 mark salient and non-salient pixels respectively, and SM_t ∈ [0, 1] is the saliency predicted by the algorithm. The closer the saliency map is to the truth map, the smaller the loss. The deep supervision mechanism places constraints on the middle of the network, driving the whole toward more detailed learning and optimization. The cycling process generates a sequence of saliency maps {SM_0, SM_1, ..., SM_K}; cross entropy is computed between each output map and the truth map, so the total loss is L_total = Σ_{k=0}^{K} L_k. During each cycle of residual optimization, both the input and the output are constrained by a loss, which further facilitates the learning and fusion of the deeply aggregated features. By minimizing the total loss, the network parameters are continuously refined to obtain the final model.
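The deeply supervised loss described above can be sketched in pure Python: per-map binary cross entropy, summed over all K + 1 intermediate saliency maps. Maps are flattened pixel lists here, and the numeric values are purely illustrative.

```python
import math

def bce(sm, gt, eps=1e-7):
    """Standard cross entropy between a saliency map and the truth map."""
    total = 0.0
    for s, g in zip(sm, gt):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(s) + (1.0 - g) * math.log(1.0 - s))
    return total

def deep_supervision_loss(maps, gt):
    """Total loss: sum of per-map cross entropies over every cycle's output."""
    return sum(bce(sm, gt) for sm in maps)

gt = [1.0, 0.0, 1.0, 0.0]
coarse = [0.6, 0.4, 0.7, 0.3]    # early cycle: uncertain prediction
sharp  = [0.9, 0.1, 0.95, 0.05]  # later cycle: closer to the truth map
assert bce(sharp, gt) < bce(coarse, gt)  # closer map, smaller loss
print(deep_supervision_loss([coarse, sharp], gt))
```

Because every cycle's output contributes a term, gradients reach the middle of the network directly, which is what the deep supervision mechanism is for.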
The invention has the following beneficial effects:
The invention realizes compact aggregation both within single layers and across multiple layers of the convolutional network, and continuously optimizes the saliency prediction in a cyclic residual manner, alleviating the problems of regional integrity and continuity in current deep visual saliency detection and improving detection accuracy and smoothness.
The invention designs a compact feature extraction module for single-layer features of the base framework that is simple, effective and highly portable, effectively enhancing feature reusability and continuity.
The invention uses the atrous spatial pyramid pooling module to effectively aggregate the compactly extracted features of multiple levels and different resolutions, directly promoting cross-layer information fusion and improving the visual saliency detection results.
Drawings
Fig. 1 is a schematic diagram of the cyclic residual saliency detection network based on compact aggregation features according to the present invention.
Fig. 2 shows the compact feature extraction module, acting within a single layer, proposed by the present invention.
Fig. 3 shows the all-feature aggregation module, acting across layers, employed by the present invention.
Fig. 4 compares detection results of the present invention and other deep visual saliency detection algorithms on public data sets.
Detailed Description
The first embodiment is as follows:
A detection method based on compact aggregation features and cyclic residual learning, see fig. 1, comprising the following steps:
step 1: the common data set MSRA10K is set as a training set containing 10000 natural RGB images and corresponding binary true value maps. In order to enhance the robustness of the network to image transformation and to solve the over-fitting problem, the method adopts the modes of random rotation, random cutting and horizontal turnover to realize the expansion of the training sample.
For the MSRA10K data set, see Cheng Ming-Ming et al., "Global Contrast Based Salient Region Detection", IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 409-416.
Step 2: referring to fig. 1, training samples are first fed into the ResNeXt101 base network with its fully connected layer removed, and convolution features with 256, 512, 1024 and 2048 channels are taken from the last convolutional layer of the second to fifth stages respectively. The higher the level, the richer the semantic information of the convolution features, while shallow features better retain detail and texture information.
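A back-of-envelope sketch of what these backbone stages emit for a square input, assuming the standard ResNet/ResNeXt stage strides of 4, 8, 16 and 32 (an assumption; the patent only quotes the channel counts):

```python
def stage_shapes(input_size=224):
    """(channels, height, width) per backbone stage for a square input."""
    strides = {"stage2": 4, "stage3": 8, "stage4": 16, "stage5": 32}
    channels = {"stage2": 256, "stage3": 512, "stage4": 1024, "stage5": 2048}
    return {
        name: (channels[name], input_size // s, input_size // s)
        for name, s in strides.items()
    }

for name, shape in stage_shapes().items():
    print(name, shape)
# Deeper stages: more channels (richer semantics), lower resolution
# (spatial detail lost) -- the trade-off the aggregation modules address.
```

This resolution/semantics trade-off is exactly why the method fuses shallow and deep features instead of using the last stage alone.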
For the ResNeXt101 network, see Xie Saining et al., "Aggregated Residual Transformations for Deep Neural Networks", IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5987-5995.
Step 3: referring to fig. 2, for the features of different layers obtained in step 2, feature information within each single layer is exchanged through the compact feature extraction module. A convolution with kernel size 3 × 3 and 128 channels first reduces the dimension of the original feature; the reduced feature is then multiplexed to each subsequent densely connected stage to guide the fusion of later information, so the input of each stage is the concatenation of the features of all previous stages; a convolution with kernel size 3 × 3 and 64 channels uniformly extracts feature information; finally, the module output is the concatenation of the reduced feature and the intermediate outputs of the several stages.
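The channel bookkeeping of this dense connection pattern can be sketched as follows: a 128-channel reduction, stages whose 3 × 3 convolutions each emit 64 channels, and a final concatenation of the reduced feature with every intermediate output. The number of dense stages is an assumption for illustration; the patent does not state it.

```python
def dense_stage_inputs(reduced=128, stage_out=64, stages=3):
    """Input channel count seen by each densely connected stage."""
    inputs = []
    for k in range(stages):
        # Each stage sees the reduced feature plus all earlier stage outputs.
        inputs.append(reduced + k * stage_out)
    return inputs

def module_output_channels(reduced=128, stage_out=64, stages=3):
    """Channels of the concatenated module output."""
    return reduced + stages * stage_out

print(dense_stage_inputs())      # [128, 192, 256]
print(module_output_channels())  # 320
```

The linear growth of stage inputs is the signature of dense connectivity: every stage is conditioned on all earlier ones, which is what "multiplexing the reduced feature to each stage" buys.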
Step 4: the four layers of convolution features of different resolutions obtained in step 3 are fused across layers by the all-feature aggregation module, which consists of two convolutions with kernel size 3 × 3 and 256 channels followed by an atrous spatial pyramid pooling module. The pooling module processes its input through five parallel branches: the first is a convolution with kernel size 1 × 1 and 128 channels; the middle three are convolutions with kernel size 3 × 3, dilation rates 2, 4 and 6 respectively, and 128 channels; the last applies global average pooling followed by a convolution with kernel size 1 × 1 and 128 channels. The outputs of the five branches are concatenated and reduced by a convolution with kernel size 1 × 1 and 256 channels. The compact aggregation features obtained by aggregating feature information both within single layers and across multiple layers have strong representational power and contain rich saliency cues.
Step 5: referring to fig. 1, the compact aggregation features first pass through a convolution with kernel size 1 × 1 and 1 channel to obtain an initial saliency map. The saliency map from the previous cycle and the compact aggregation features are then repeatedly concatenated and fed into a residual convolution block, formed by two convolutions with kernel size 3 × 3 and 128 channels and one convolution with kernel size 1 × 1 and 1 channel, to continuously optimize the predicted saliency map; after a suitable number of cycles, the final saliency map is obtained by a sigmoid operation on the result.
Step 6: under the PyTorch deep learning framework, the whole network is trained with the stochastic gradient descent algorithm until the loss converges, and the optimal network model is saved.
Step 7: to determine the number of cycles, the whole network was trained with different cycle counts and tested on the public DUT-OMRON data set. Table 1 gives the results for three objective evaluation metrics, the F-measure, MAE and S-measure; detection performance improves to a certain extent as the number of cycles increases. The number of cycles was finally set to 6.
Table 1: evaluation metrics of the present application on the DUT-OMRON data set under different numbers of cycles
For the DUT-OMRON data set, see Radhakrishna Achanta et al., "Frequency-tuned Salient Region Detection", IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1597-1604.
Step 8: to demonstrate the performance of the cyclic residual visual saliency detection network based on compact aggregation features, the present application is compared on the ECSSD, HKU-IS and DUT-OMRON data sets with the current state-of-the-art methods RFCN, UCF, NLDF, GBR, MPFF, R3Net and RefineNet. The objective evaluation metrics on the different test sets are shown in table 2; the proposed method ranks among the best.
Table 2: comparison of evaluation metrics between the present application and different algorithms on different test sets
For a visual comparison on part of the test images, see FIG. 4: the first four rows of images are from the ECSSD data set, the middle four rows from the HKU-IS data set, and the last four rows from the DUT-OMRON data set. Visually, the different algorithms all detect part of the salient regions effectively, but problems such as incomplete regions, unclear boundaries and background interference remain. The results on the people, flowers and objects in the first, fourth and seventh rows show that the regions detected by the proposed method are more complete and smooth. The result on the fish in the third row shows that, even when the target closely resembles the background, the proposed method remains robust and locates the target well. The result on the oranges in the fifth row shows that, under interference from targets of the same category, the proposed method still localizes directly and maintains high accuracy. The result on the balloon in the twelfth row shows that the proposed method still detects the salient target accurately in a dark environment. Overall, the compact aggregation features effectively improve the integrity of the detected region, suppress background noise, and bring the spatial structure of the predicted saliency map closer to that of the truth map.
For the ECSSD data set, see Yan Qiong et al., "Hierarchical Saliency Detection", IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1155-1162.
For the HKU-IS data set, see Li Guanbin et al., "Visual Saliency Based on Multiscale Deep Features", IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5455-5463.
For RFCN, see Wang Linzhao et al., "Saliency Detection with Recurrent Fully Convolutional Networks", European Conference on Computer Vision, 2016, pp. 825-841.
For UCF, see Zhang Pingping et al., "Learning Uncertain Convolutional Features for Accurate Saliency Detection", IEEE International Conference on Computer Vision, 2017, pp. 212-221.
For NLDF, see Luo Zhiming et al., "Non-Local Deep Features for Salient Object Detection", IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6609-6617.
For GBR, see Tan Xin et al., "Saliency Detection by Deep Network with Boundary Refinement and Global Context", IEEE International Conference on Multimedia and Expo, 2018, pp. 1-6.
For MPFF, see Zhu Hengliang et al., "Multi-Path Feature Fusion Network for Saliency Detection", IEEE International Conference on Multimedia and Expo, 2018, pp. 1-6.
For R3Net, see Deng Zijun et al., "R3Net: Recurrent Residual Refinement Network for Saliency Detection", International Joint Conference on Artificial Intelligence, 2018, pp. 684-690.
For RefineNet, see Keren Fu et al., "Refinet: A Deep Segmentation Assisted Refinement Network for Salient Object Detection", IEEE Transactions on Multimedia, 2019, pp. 457-469.
Example two
A detection system applying the detection method of the first embodiment, comprising a compact feature extraction module, an all-feature aggregation module and a cyclic residual optimization module. The compact feature extraction module adopts dense connections within a single layer to aggregate information effectively, acting on the last convolutional layer features of the second to fifth stages of the ResNeXt101 network.
The all-feature aggregation module uses the ASPP module to exchange and fuse feature information across layers of different resolutions.
The cyclic residual optimization module repeatedly reuses the compact aggregation features to continuously optimize the predicted saliency map under a deep supervision mechanism.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A detection system, comprising:
a compact feature extraction module, which adopts dense connections within a single layer to aggregate information effectively, acting on the last convolutional layer features of the second to fifth stages of a ResNeXt101 network;
an all-feature aggregation module, which uses an ASPP module to exchange and fuse feature information across layers of different resolutions;
and a cyclic residual optimization module, which repeatedly reuses the compact aggregation features to continuously optimize the predicted saliency map under a deep supervision mechanism.
2. A method of detection, comprising the steps of:
S1, extracting compact convolution features from the base ResNeXt101 network: the output features of consecutive stages are combined in a densely connected manner, covering a larger receptive field and fusing information within each single layer;
S2, since aggregating only the basic features within a single layer neglects the fusion of information across features of different depths and resolutions in the deep neural network, which harms visual saliency detection, an atrous spatial pyramid pooling module is applied to the compact convolution features extracted from all layers to aggregate information across layers;
S3, under a deep supervision mechanism, the compact aggregation features are reused cyclically and the prediction is continuously optimized by residual learning, with a suitable number of cycles determined by experiment;
S4, the whole cyclic residual network is tested on three visual saliency detection data sets; after testing, the cyclic residual network based on compact aggregation features can be applied to practical visual saliency detection in natural images.
3. The detection method according to claim 2, wherein in S1, for the last convolutional layer of the second to fifth stages of the ResNeXt101 network, with 256, 512, 1024 and 2048 channels respectively, a convolution with kernel size 3 and 128 channels first performs dimensionality reduction; the reduced feature is multiplexed to each subsequent densely connected stage to guide the fusion of later information, that is, the input of each stage is the concatenation of the features of all previous stages; a convolution with kernel size 3 and 64 channels uniformly extracts feature information; and the output of the compact feature extraction module is finally obtained by concatenating the reduced feature with the intermediate outputs of the several stages.
4. The detection method according to claim 2, wherein in S2, the compact convolution features extracted from all layers are first concatenated and reduced in dimension by two convolutions with kernel size 3 and 256 channels; the result is fed into an atrous spatial pyramid pooling module for information fusion through five parallel branches, namely one convolution with kernel size 1 and 128 channels, three convolutions with kernel size 3, dilation rates 2, 4 and 6 respectively and 128 channels, and a combination of global average pooling with a kernel-size-1, 128-channel convolution; and finally the features of the five branches are concatenated and reduced by a convolution with kernel size 1 and 256 channels, yielding the compact aggregation features.
5. The detection method according to claim 2, wherein in S3, the compact aggregation features obtained in S2 first pass through a convolution with kernel size 1 and 1 channel to produce an initial saliency map SM_0; the aggregation features and the saliency map are then repeatedly fed to a residual convolution block to learn the residual, the saliency map after the k-th cycle being SM_k = RCB(Cat(SM_{k-1}, DAF)) + SM_{k-1}, where RCB(·) consists of two convolutions with kernel size 3 and 128 channels followed by one convolution with kernel size 1 and 1 channel, and Cat(·) denotes channel-wise concatenation of the input features; and after a suitable number of cycles K, the final saliency map is obtained through a sigmoid operation.
6. The detection method according to claim 5, wherein the whole network is trained with the standard cross-entropy loss L = -(1/T) Σ_{t=1}^{T} [GT_t·log(SM_t) + (1 - GT_t)·log(1 - SM_t)], where SM_t and GT_t respectively denote the saliency of the t-th pixel on the saliency map and on the ground-truth map, T denotes the total number of pixels of the image, GT_t = 1 and GT_t = 0 denote salient and non-salient pixels respectively, and SM_t ∈ [0, 1] denotes the saliency predicted by the algorithm.
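The loss of claim 6 is the standard pixel-wise binary cross-entropy; a minimal NumPy sketch (the epsilon clamp for numerical stability is an implementation detail not in the claim):

```python
import numpy as np

def saliency_bce_loss(sm, gt, eps=1e-7):
    """Standard cross-entropy over all T pixels (claim 6).

    sm: predicted saliency in [0, 1]; gt: ground truth in {0, 1}.
    """
    sm = np.clip(sm, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(gt * np.log(sm) + (1.0 - gt) * np.log(1.0 - sm))
```

For a pixel predicted at 0.5 against a salient ground truth the loss is ln 2 ≈ 0.693, and it approaches 0 as the prediction approaches the ground truth.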
8. The detection method according to claim 7, wherein during each cycle of residual optimization, both the input and the output of the cycle are constrained by the loss.
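The deep supervision of claim 8 amounts to applying the cross-entropy constraint to every saliency map in the refinement cycle (SM_0 through SM_K) and summing; a self-contained NumPy sketch (summing rather than averaging over cycles is an assumption):

```python
import numpy as np

def cycle_supervision_loss(saliency_maps, gt, eps=1e-7):
    """Sum of per-cycle cross-entropy losses (claim 8).

    saliency_maps: iterable of predicted maps SM_0 .. SM_K, each in [0, 1];
    gt: ground-truth map in {0, 1}.
    """
    total = 0.0
    for sm in saliency_maps:
        sm = np.clip(sm, eps, 1.0 - eps)
        # each cycle's input and output map gets its own loss constraint
        total += -np.mean(gt * np.log(sm) + (1.0 - gt) * np.log(1.0 - sm))
    return total
```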
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010606592.7A CN111753849B (en) | 2020-06-29 | 2020-06-29 | Detection method and system based on tight aggregation feature and cyclic residual error learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111753849A true CN111753849A (en) | 2020-10-09 |
CN111753849B CN111753849B (en) | 2024-06-28 |
Family
ID=72678047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010606592.7A Active CN111753849B (en) | 2020-06-29 | 2020-06-29 | Detection method and system based on tight aggregation feature and cyclic residual error learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753849B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113869371A (en) * | 2021-09-03 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, clothing fine-grained segmentation method and related device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875777A (en) * | 2018-05-03 | 2018-11-23 | 浙江大学 | Kinds of fibers and blending rate recognition methods in textile fabric based on two-way neural network |
CN109447976A (en) * | 2018-11-01 | 2019-03-08 | 电子科技大学 | A kind of medical image cutting method and system based on artificial intelligence |
US20200026942A1 (en) * | 2018-05-18 | 2020-01-23 | Fudan University | Network, System and Method for Image Processing |
CN111275718A (en) * | 2020-01-18 | 2020-06-12 | 江南大学 | Clothes amount detection and color protection washing discrimination method based on significant region segmentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111210443B (en) | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance | |
CN113052210B (en) | Rapid low-light target detection method based on convolutional neural network | |
CN110210539B (en) | RGB-T image saliency target detection method based on multi-level depth feature fusion | |
CN111582316B (en) | RGB-D significance target detection method | |
CN111291809B (en) | Processing device, method and storage medium | |
CN110533041B (en) | Regression-based multi-scale scene text detection method | |
CN110569851B (en) | Real-time semantic segmentation method for gated multi-layer fusion | |
CN111797841B (en) | Visual saliency detection method based on depth residual error network | |
CN111353544B (en) | Improved Mixed Pooling-YOLOV 3-based target detection method | |
CN113011329A (en) | Pyramid network based on multi-scale features and dense crowd counting method | |
CN112580458B (en) | Facial expression recognition method, device, equipment and storage medium | |
CN113033454B (en) | Method for detecting building change in urban video shooting | |
CN112597985A (en) | Crowd counting method based on multi-scale feature fusion | |
CN114022408A (en) | Remote sensing image cloud detection method based on multi-scale convolution neural network | |
CN112580480A (en) | Hyperspectral remote sensing image classification method and device | |
CN110852199A (en) | Foreground extraction method based on double-frame coding and decoding model | |
CN111401380A (en) | RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization | |
CN113139544A (en) | Saliency target detection method based on multi-scale feature dynamic fusion | |
CN116740439A (en) | Crowd counting method based on trans-scale pyramid convertors | |
CN113850324A (en) | Multispectral target detection method based on Yolov4 | |
CN111898614B (en) | Neural network system and image signal and data processing method | |
CN114092540A (en) | Attention mechanism-based light field depth estimation method and computer readable medium | |
Chua et al. | Visual IoT: ultra-low-power processing architectures and implications | |
CN116229406B (en) | Lane line detection method, system, electronic equipment and storage medium | |
CN111753849A (en) | Detection method and system based on compact aggregation feature and cyclic residual learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |