CN111179212A - Method for realizing micro target detection chip integrating distillation strategy and deconvolution - Google Patents


Info

Publication number
CN111179212A
CN111179212A (application CN201911091454.3A; granted as CN111179212B)
Authority
CN
China
Prior art keywords
layer
network
output
size
learning network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911091454.3A
Other languages
Chinese (zh)
Other versions
CN111179212B (en)
Inventor
Xiong Weihua (熊伟华)
Wu Hua (吴华)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jingmou Intelligent Technology Co Ltd
Original Assignee
Hangzhou Jingmou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jingmou Intelligent Technology Co Ltd filed Critical Hangzhou Jingmou Intelligent Technology Co Ltd
Publication of CN111179212A publication Critical patent/CN111179212A/en
Application granted granted Critical
Publication of CN111179212B publication Critical patent/CN111179212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A method for realizing a tiny-target detection chip that integrates a distillation strategy and deconvolution: several intermediate feature maps of a teaching network for high-resolution images train, via adversarial-loss learning, several layers of a learning network for low-resolution images that contains deconvolution layers, thereby enlarging the receptive field over the low-pixel image, improving the output accuracy of the learning network, and reducing the size of the chip. By casting the target detection task simply as a classification task in the learning network (only whether a target exists in a 20 × 20 pixel region needs to be judged), tiny-object detection can effectively eliminate false detections while keeping the chip area small and the required memory low in the hardware implementation.

Description

Method for realizing micro target detection chip integrating distillation strategy and deconvolution
Technical Field
The invention relates to a technology in the field of image detection, in particular to a method for realizing a tiny target detection chip integrating a distillation strategy and deconvolution.
Background
Although there are several methods for detecting objects with convolutional neural networks, most popular algorithms perform well only when the target occupies a large portion of the image (typically larger than 20 × 20 pixels). Recently, many algorithms have emerged that detect small, low-resolution objects (smaller than 20 × 20 pixels). These methods typically rely on multi-scale resolution, detecting target objects of different sizes at the corresponding resolutions. Such a structure detects multiple target objects simultaneously (multi-task), which helps with detecting tiny objects, but it requires larger storage and longer computation time in a hardware implementation.
Disclosure of Invention
To address these defects in the prior art, the invention provides a method for realizing a tiny-target detection chip that integrates a distillation strategy and deconvolution. The method uses several deconvolution layers to expand the receptive field, and these layers are distilled from a convolutional network pre-trained on high-resolution objects, so that the detection of tiny objects reaches an accuracy similar to that achieved on large objects.
The invention is realized by the following technical scheme:
the invention relates to a method for realizing a micro target detection chip integrating a distillation strategy and deconvolution, which trains a plurality of intermediate characteristic maps of a teaching network for a high-resolution image on a plurality of layers in a learning network for a low-resolution image and containing deconvolution layers in a loss-resistant learning mode, enlarges the receptive field of the low-pixel image, improves the output precision of the learning network and reduces the size of the chip.
The learning network contains no residual structure.
The intermediate feature maps used by the teaching network for adversarial-loss learning are produced by a residual network.
Technical effects
Compared with the prior art, the learning network contains no residual structure, so the hardware implementation needs no clock waiting and runs faster, avoids reading feature data of different scales for fusion operations, and reduces read/write power consumption. The learning network casts the target detection task simply as a classification task: only whether a target exists in a 20 × 20 pixel region needs to be judged, so the detection of tiny objects can effectively eliminate false detections while keeping the chip area small and the required memory low in the hardware implementation.
Drawings
FIG. 1 is a schematic diagram of the integrated distillation strategy and deconvolution micro target detection architecture of the present invention;
FIG. 2 is a schematic diagram of a residual structure used in the teaching network;
FIG. 3 is a schematic diagram illustrating the effects of the embodiment.
Detailed Description
As shown in Fig. 1, the tiny-target detection architecture integrating a distillation strategy and deconvolution according to this embodiment comprises: a learning network (StudentNet) containing deconvolution layers for low-resolution images, and a teaching network (TeacherNet) for high-resolution images. Several intermediate feature maps of the teaching network train multiple layers of the learning network via adversarial-loss learning, improving the output accuracy of the learning network while expanding the receptive field over the low-pixel image.
The learning network comprises, connected in sequence: a convolutional layer 400, a normalization layer 402 with an S-shaped rectified nonlinear activation unit, a convolutional layer 404, a deconvolution layer 406, a normalization layer 408 with an S-shaped rectified nonlinear activation unit, a convolutional layer 410, a normalization layer 412 with an S-shaped rectified nonlinear activation unit, a pooling layer 414, an ordinary convolutional layer 416, and a fully-connected layer 418. Convolutional layer 400 receives an input image of size 20 × 20 × 3 and outputs a feature map 401 of size 20 × 20 × 32 to normalization layer 402 for normalization. Convolutional layer 404 outputs a feature map 405 of size 40 × 40 × 32 from feature map 403; deconvolution layer 406 outputs a feature map 407 of size 40 × 40 × 32 to normalization layer 408 for normalization; convolutional layer 410 outputs a feature map 411 of size 40 × 40 × 32 from feature map 409, normalized by normalization layer 412. Pooling layer 414 takes the maximum over each 2 × 2 region of feature map 413, sampling every 2 pixels in the width and height directions, to obtain a feature map 415 of size 20 × 20 × 32, which passes through ordinary convolutional layer 416 and is output to fully-connected layer 418, finally producing a vector of size 1 × 4096, i.e., the final feature vector of the image.
The fully-connected layer 418 outputs the image feature vector, which serves as input to a subsequent classifier that determines the type of object (e.g., face, license plate) detected in the image.
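The data flow just described can be traced as a quick sanity check. The sketch below reproduces only the feature-map shapes stated in the text; kernel sizes, strides, and padding are not specified in the patent, so no layer parameters are assumed.

```python
# Shape trace of the learning network (StudentNet) described above.
# Only the shapes given in the text are reproduced; layer parameters
# (kernels, strides, padding) are not specified in the patent.
def trace_student_net(h=20, w=20):
    return [
        ("input",             (h, w, 3)),
        ("conv 400",          (h, w, 32)),          # feature map 401
        ("conv 404",          (2 * h, 2 * w, 32)),  # feature map 405
        ("deconv 406",        (2 * h, 2 * w, 32)),  # feature map 407
        ("conv 410",          (2 * h, 2 * w, 32)),  # feature map 411
        ("maxpool 414 (2x2)", (h, w, 32)),          # feature map 415
        ("fc 418",            (1, 4096)),           # final feature vector
    ]

for name, shape in trace_student_net():
    print(f"{name}: {shape}")
```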
The deconvolution layer 406 in the learning network expands the receptive field over the low-pixel image, and the feature extraction from the low-pixel image is guided by object detection on the high-resolution image.
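The receptive-field expansion relies on the deconvolution (transposed convolution) layer upsampling the 20 × 20 feature map to 40 × 40. A minimal sketch of the standard transposed-convolution output-size formula follows; the patent does not state kernel, stride, or padding values, so the numbers below are illustrative assumptions.

```python
def deconv_output_size(in_size, kernel, stride=2, padding=1, output_padding=0):
    """Spatial output size of a transposed convolution (deconvolution) layer."""
    return (in_size - 1) * stride - 2 * padding + kernel + output_padding

# Upsampling 20x20 to 40x40, as deconvolution layer 406 does in the text;
# the kernel/stride/padding values are assumptions, not from the patent.
print(deconv_output_size(20, kernel=4, stride=2, padding=1))  # 40
```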
The teaching network adopts a ResNet50 architecture: a convolutional layer 200, four serially connected ResNet blocks 202, 204, 206, 208, a max-pooling layer 210, a fifth ResNet block 212, and a fully-connected layer 214. Convolutional layer 200 receives a high-resolution image of size 40 × 40 × 3. The feature map 209 of size 40 × 40 × 32, output by the four serially connected ResNet blocks, guides the output feature map 413 of normalization layer 412 of the learning network; the feature map 213 of size 20 × 20 × 32, output by the fifth ResNet block, guides the output feature map 417 of ordinary convolutional layer 416; and fully-connected layer 214 outputs a vector of size 1 × 4096, the image feature vector, which guides the corresponding image feature vector output by fully-connected layer 418 of the learning network.
The guidance works as follows: feature maps 209, 213, 215 output by the teaching network are trained adversarially, through a discrimination network, against feature maps 413, 417, 419 output by normalization layer 412, pooling layer 414, and fully-connected layer 418 of the learning network. Whenever the two feature maps differ, the teaching network guides the learning network, so that the output of the learning network eventually becomes consistent with that of the teaching network. Specifically, each channel's feature map in the teaching network first finds the most similar channel among the channels of the corresponding layer of the learning network by cross-correlation: the variance of each channel's feature array is computed in each network, the variances are sorted, high-variance feature maps of the teaching network are matched by rank to high-variance feature maps of the learning network, and the matched feature maps are then fed into the discrimination network.
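The variance-based channel matching described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the patent's exact similarity measure (cross-correlation) is simplified here to pure variance ranking.

```python
import numpy as np

def match_channels_by_variance(teacher_fm, student_fm):
    """Pair teacher/student channels by ranking per-channel variance.

    Simplified sketch of the guidance step: feature maps are (C, H, W);
    channels are sorted by variance (descending) and paired by rank.
    """
    t_var = teacher_fm.reshape(teacher_fm.shape[0], -1).var(axis=1)
    s_var = student_fm.reshape(student_fm.shape[0], -1).var(axis=1)
    t_order = np.argsort(-t_var)  # teacher channels, high variance first
    s_order = np.argsort(-s_var)  # student channels, high variance first
    return list(zip(t_order.tolist(), s_order.tolist()))

rng = np.random.default_rng(0)
pairs = match_channels_by_variance(rng.normal(size=(32, 40, 40)),
                                   rng.normal(size=(32, 40, 40)))
print(len(pairs))  # 32 matched teacher/student channel pairs
```

Each pair would then be fed to the discrimination network for adversarial training.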
This embodiment uses an InceptionV3 network for the adversarial training; adversarial learning through the discriminator transfers the features of the teaching network to the learning network.
The convolutions in this embodiment all adopt the depthwise plus pointwise form, which significantly reduces computational complexity. For example, an original 64 × 64 × 3 × 3 convolution becomes two consecutive convolutions: a depthwise convolution with kernel 64 × 1 × 3 × 3 (an independent convolution on each input channel), followed by a 64 × 64 × 1 × 1 pointwise convolution for channel fusion. The parameter count drops from 36864 to 4672, and the computational complexity drops correspondingly; this is the convolution style commonly used by MobileNet.
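The 4672/36864 parameter reduction quoted above can be verified directly: a depthwise 64 × 1 × 3 × 3 convolution plus a pointwise 64 × 64 × 1 × 1 convolution replaces a standard 64 × 64 × 3 × 3 kernel.

```python
def standard_conv_params(c_in, c_out, k):
    # one k x k filter per (input channel, output channel) pair
    return c_out * c_in * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution fusing the channels
    return c_in * k * k + c_out * c_in

print(standard_conv_params(64, 64, 3))        # 36864
print(depthwise_separable_params(64, 64, 3))  # 576 + 4096 = 4672
```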
As shown in Fig. 2, a ResNet block in the teaching network adds the block's input X directly to the block's output F(X), so the final output is H(X) = F(X) + X, and the training target is F(X) = H(X) - X, i.e., the "residual" that remains after the input is subtracted from the final output.
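The residual relation H(X) = F(X) + X can be illustrated with a toy function standing in for the block's convolutional stack:

```python
import numpy as np

def residual_block(x, f):
    # H(x) = F(x) + x: the block only has to learn the residual F(x) = H(x) - x
    return f(x) + x

x = np.array([1.0, 2.0, 3.0])
h = residual_block(x, lambda v: 0.1 * v)  # toy F; in the patent F is a conv stack
print(np.allclose(h - x, 0.1 * x))  # the training target F(x) equals H(x) - x
```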
The learning network designed in this embodiment contains no residual structure, so the hardware implementation needs no clock waiting and runs faster, and the fusion operations that read feature data of different scales are avoided, reducing read/write power consumption. The learning network casts the target detection task simply as a classification task, needing only to judge whether a target exists in a 20 × 20 pixel region, so the on-chip area is small and little memory is required in the hardware implementation.
In this embodiment, a small-face data set (about 6000 faces) was constructed from an open-source face data set; it contains large faces and the corresponding small faces. Training the learning network both with and without the distillation method shows that the scheme of this embodiment improves accuracy by 5%.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (8)

1. A method for realizing a tiny-target detection chip integrating a distillation strategy and deconvolution, characterized in that several intermediate feature maps of a teaching network for high-resolution images train, via adversarial-loss learning, several layers of a learning network for low-resolution images that contains deconvolution layers, thereby enlarging the receptive field over the low-pixel image, improving the output accuracy of the learning network, and reducing the size of the chip; the learning network contains no residual structure, and the intermediate feature maps used by the teaching network for adversarial-loss learning are produced by a residual network.
2. The method of claim 1, wherein the teaching network adopts a ResNet50 architecture comprising a sequentially connected convolutional layer, four serially connected ResNet blocks, a max-pooling layer, a fifth ResNet block, and a fully-connected layer, wherein: the convolutional layer receives a high-resolution image of size 40 × 40 × 3; a feature map of size 40 × 40 × 32, output by the four serially connected ResNet blocks, guides the output feature map of the normalization layer of the learning network; a feature map of size 20 × 20 × 32, output by the fifth ResNet block, guides the output feature map of the ordinary convolutional layer of the learning network; and the fully-connected layer outputs a vector of size 1 × 4096, i.e., an image feature vector, which guides the corresponding image feature vector output by the fully-connected layer of the learning network.
3. The method of claim 1, wherein the learning network comprises, connected in sequence: a convolutional layer, a normalization layer with an S-shaped rectified nonlinear activation unit, a convolutional layer, a deconvolution layer, a normalization layer with an S-shaped rectified nonlinear activation unit, a convolutional layer, a normalization layer with an S-shaped rectified nonlinear activation unit, a pooling layer, an ordinary convolutional layer, and a fully-connected layer, wherein: the first convolutional layer receives an input image of size 20 × 20 × 3 and outputs a feature map of size 20 × 20 × 32 to a normalization layer for normalization; the second convolutional layer outputs a feature map of size 40 × 40 × 32; the deconvolution layer outputs a feature map of size 40 × 40 × 32 to a normalization layer for normalization; the third convolutional layer outputs a feature map of size 40 × 40 × 32, normalized by a normalization layer; the pooling layer takes the maximum over each 2 × 2 region, sampling every 2 pixels in the width and height directions, to obtain a feature map of size 20 × 20 × 32, which passes through the ordinary convolutional layer and is output to the fully-connected layer, which finally outputs a vector of size 1 × 4096, i.e., the final feature vector of the image.
4. The method of claim 1, 2 or 3, wherein the guidance comprises: feature maps output by the teaching network are trained adversarially, through a discrimination network, against the feature maps output by the normalization layer, the pooling layer, and the fully-connected layer of the learning network; whenever the feature maps differ, the teaching network guides the learning network, so that the output of the learning network becomes consistent with the teaching network.
5. The method of claim 4, wherein the guidance comprises: each channel's feature map in the teaching network first finds the most similar channel among the channels of the corresponding layer of the learning network by cross-correlation, namely: the variance of each channel's feature array is computed in each network, the variances are sorted, high-variance feature maps of the teaching network are matched by rank to high-variance feature maps of the learning network, and the matched feature maps are then fed into the discrimination network.
6. The method of claim 4, wherein an InceptionV3 network is used for the adversarial training, and adversarial learning through a discriminator transfers the features of the teaching network to the learning network.
7. The method of any preceding claim, wherein said convolutions take the depthwise and pointwise form to substantially reduce computational complexity.
8. The method of claim 2, wherein the ResNet block adds the block's input X directly to the block's output F(X), so the final output is H(X) = F(X) + X, and the training target is F(X) = H(X) - X, i.e., the residual that remains after the input is subtracted from the final output.
CN201911091454.3A 2018-11-10 2019-11-10 Method for realizing tiny target detection on-chip by integrating distillation strategy and deconvolution Active CN111179212B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862758517P 2018-11-10 2018-11-10
US 62/758,517 2018-11-10

Publications (2)

Publication Number Publication Date
CN111179212A true CN111179212A (en) 2020-05-19
CN111179212B CN111179212B (en) 2023-05-23

Family

ID=70656217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911091454.3A Active CN111179212B (en) 2018-11-10 2019-11-10 Method for realizing tiny target detection on-chip by integrating distillation strategy and deconvolution

Country Status (1)

Country Link
CN (1) CN111179212B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150065803A1 (en) * 2013-09-05 2015-03-05 Erik Scott DOUGLAS Apparatuses and methods for mobile imaging and analysis
US20170140248A1 (en) * 2015-11-13 2017-05-18 Adobe Systems Incorporated Learning image representation by distilling from multi-task networks
CN107247989A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of neural network training method and device
WO2018150083A1 (en) * 2017-02-16 2018-08-23 Nokia Technologies Oy A method and technical equipment for video processing
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US20180307894A1 (en) * 2017-04-21 2018-10-25 General Electric Company Neural network systems
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GUIBO ZHU ET AL.: "Feature Distilled Tracking" *
GUOBIN CHEN ET AL.: "Learning Efficient Object Detection Models with Knowledge Distillation" *
VASILEIOS BELAGIANNIS ET AL.: "Adversarial Network Compression" *
GE SHIMING et al.: "Face Recognition Based on Deep Feature Distillation" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7459425B2 2020-06-15 2024-04-02 Intel Corporation Input image size switchable networks for adaptive runtime efficient image classification
CN112183579A (en) * 2020-09-01 2021-01-05 国网宁夏电力有限公司检修公司 Method, medium and system for detecting micro target
CN112183579B (en) * 2020-09-01 2023-05-30 国网宁夏电力有限公司检修公司 Method, medium and system for detecting micro target

Also Published As

Publication number Publication date
CN111179212B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN112232232B (en) Target detection method
US20200218948A1 (en) Thundernet: a turbo unified network for real-time semantic segmentation
CN110782420A (en) Small target feature representation enhancement method based on deep learning
US11157764B2 (en) Semantic image segmentation using gated dense pyramid blocks
CN110879982B (en) Crowd counting system and method
KR102165273B1 (en) Method and system for channel pruning of compact neural networks
CN112215332B (en) Searching method, image processing method and device for neural network structure
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114708437B (en) Training method of target detection model, target detection method, device and medium
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN109815931A (en) A kind of method, apparatus, equipment and the storage medium of video object identification
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN111179212B (en) Method for realizing tiny target detection on-chip by integrating distillation strategy and deconvolution
CN113343989A (en) Target detection method and system based on self-adaption of foreground selection domain
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
CN110503002B (en) Face detection method and storage medium
US11704894B2 (en) Semantic image segmentation using gated dense pyramid blocks
CN116844032A (en) Target detection and identification method, device, equipment and medium in marine environment
CN114492634A (en) Fine-grained equipment image classification and identification method and system
Yu et al. Intelligent corner synthesis via cycle-consistent generative adversarial networks for efficient validation of autonomous driving systems
KR20210109327A (en) Method and apparatus for learning artificial neural network
CN113255459B (en) Lane line detection method based on image sequence
CN111582057B (en) Face verification method based on local receptive field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant