WO2021164751A1 - Perception network structure search method and apparatus - Google Patents

A perception network structure search method and apparatus

Info

Publication number
WO2021164751A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolutional layer
search space
network
sub
operation type
Prior art date
Application number
PCT/CN2021/076984
Other languages
English (en)
French (fr)
Inventor
韩凯
郭健元
王云鹤
陈醒濠
聂迎
许奕星
许春景
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21757765.9A (published as EP4109343A4)
Publication of WO2021164751A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 Salient features using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V 10/945 User interactive design; Environments; Toolboxes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/96 Management of image or video recognition tasks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a perception network structure search method and apparatus.
  • Computer vision is an inseparable part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and military. It studies how to use cameras/video cameras and computers to obtain the data and information about a photographed subject that we need. Figuratively speaking, it gives the computer eyes (a camera or video camera) and a brain (algorithms) so that it can identify, track, and measure targets in place of the human eye, allowing the computer to perceive the environment. Because perception can be seen as extracting information from sensory signals, computer vision can also be seen as the science of how to make artificial systems "perceive" from images or multi-dimensional data.
  • Computer vision uses various imaging systems in place of the visual organs to obtain input information, and then lets the computer, in place of the brain, process and interpret that input information.
  • the ultimate research goal of computer vision is to enable computers to observe and understand the world through vision like humans, and have the ability to adapt to the environment autonomously.
  • The visual perception network can complete more and more functions, including image classification, 2D detection, semantic segmentation (Mask), key point detection, linear object detection (such as lane line or stop line detection in automatic driving technology), drivable area detection, and so on.
  • The visual perception system has the characteristics of low cost, non-contact operation, small size, and a large amount of information. With the continuous improvement in the accuracy of visual perception algorithms, it has become a key technology of many of today's artificial intelligence systems and is more and more widely used, for example in advanced driving assistance systems (ADAS) and autonomous driving systems (ADS).
  • ADAS: advanced driving assistance system.
  • ADS: autonomous driving system.
  • NAS: neural architecture search.
  • Traditional solutions perform a separate structure search on only a certain part of the visual perception network, so the final perception network may not meet application requirements well.
  • In a first aspect, this application provides a perception network structure search method, including:
  • acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header;
  • the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • performing a structure search on the first convolutional layer, the second convolutional layer, and the third convolutional layer in the target search space to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the second convolutional layer included in the searched perception network corresponds to a second operation type, and the third convolutional layer included in the searched perception network corresponds to a third operation type;
  • the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
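  • As an illustration only (a minimal sketch in the style of differentiable architecture search, not a procedure prescribed by the patent), such a search can be realized by replacing every searchable convolutional layer in the backbone, FPN, and header with a "mixed" layer that combines all candidate operation types in the target search space, weighted by learnable architecture parameters. The candidate operation set and all names below are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical candidate operation types forming the target search space;
# the patent leaves the concrete set open, so this dictionary is an assumption.
CANDIDATE_OPS = {
    "conv3x3":      lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5":      lambda c: nn.Conv2d(c, c, 5, padding=2),
    "depthwise3x3": lambda c: nn.Conv2d(c, c, 3, padding=1, groups=c),
    "dilated3x3":   lambda c: nn.Conv2d(c, c, 3, padding=2, dilation=2),
}

class MixedConv(nn.Module):
    """A searchable convolutional layer: a weighted sum over all candidate
    operation types, with one learnable architecture weight per type."""
    def __init__(self, channels, op_names=tuple(CANDIDATE_OPS)):
        super().__init__()
        self.op_names = list(op_names)
        self.ops = nn.ModuleList(CANDIDATE_OPS[n](channels) for n in self.op_names)
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```

  • Under this sketch, the first, second, and third convolutional layers would each be a MixedConv instance, and jointly training the architecture weights of the backbone, FPN, and header layers realizes the joint structure search described above.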
  • performing a structure search on the first convolutional layer in the target search space includes: obtaining a first sub-search space corresponding to the first convolutional layer, where the first sub-search space includes some or all of the multiple operation types; and performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • performing a structure search on the second convolutional layer in the target search space includes: obtaining a second sub-search space corresponding to the second convolutional layer, where the second sub-search space includes some or all of the multiple operation types; and performing a structure search on the second convolutional layer in the second sub-search space to obtain the second operation type corresponding to the second convolutional layer,
  • where the second operation type is one of the operation types included in the second sub-search space; and/or,
  • performing a structure search on the third convolutional layer in the target search space includes: obtaining a third sub-search space corresponding to the third convolutional layer, where the third sub-search space includes some or all of the multiple operation types; and performing a structure search on the third convolutional layer in the third sub-search space to obtain the third operation type corresponding to the third convolutional layer,
  • where the third operation type is one of the operation types included in the third sub-search space.
  • the acquiring the first sub-search space corresponding to the first convolutional layer includes:
  • acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the ability of the operation type to influence the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • the acquiring the first sub-search space corresponding to the first convolutional layer includes:
  • according to a second weight value of each of the multiple operation types corresponding to the backbone network, the first sub-search space corresponding to the backbone network is determined in the target search space, where the second weight value represents the ability of the operation type to influence the output result of the perception network to be searched.
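  • Continuing the sketch above (again an assumption, not the patent's prescribed procedure), a sub-search space can be derived from such weight values by keeping, for a given layer or network part, only the operation types whose weights indicate the strongest influence on the network output:

```python
import torch

def sub_search_space(mixed_layer, keep=2):
    """Keep the `keep` operation types with the largest weight values
    (the softmax of the architecture parameters) as the sub-search space."""
    w = torch.softmax(mixed_layer.alpha, dim=0)
    top = torch.topk(w, k=keep).indices.tolist()
    return [mixed_layer.op_names[i] for i in top]
```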
  • the acquiring the second sub-search space corresponding to the second convolutional layer includes:
  • the second sub-search space corresponding to the second convolutional layer is obtained from the target search space.
  • the acquiring the second sub-search space corresponding to the second convolutional layer includes:
  • the acquiring a third sub-search space corresponding to the third convolutional layer includes:
  • the third sub-search space corresponding to the third convolutional layer is obtained from the target search space.
  • the acquiring a third sub-search space corresponding to the third convolutional layer includes:
  • the third sub-search space corresponding to the header is determined in the target search space.
  • the backbone network includes a plurality of convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network
  • performing a structure search on the first convolutional layer in the target search space, performing a structure search on the second convolutional layer in the target search space, and performing a structure search on the third convolutional layer in the target search space includes:
  • the FPN includes a plurality of convolutional layers
  • the second convolutional layer is one of the plurality of convolutional layers included in the FPN.
  • the header includes multiple convolutional layers
  • the third convolutional layer is one of the multiple convolutional layers included in the header.
  • performing a structure search on the first convolutional layer in the target search space, performing a structure search on the second convolutional layer in the target search space, and performing a structure search on the third convolutional layer in the target search space includes:
  • the method further includes:
  • performing a structure search on the perception network to be searched in the target search space includes: performing a structure search on the perception network to be searched according to a first model index and a preset loss until the preset loss satisfies a preset condition, to obtain the searched perception model, where the preset loss is related to a model index loss, the model index loss indicates the difference between a second model index of the perception network to be searched and the first model index, and the first model index and the second model index each include at least one of the following: model computation amount (FLOPs) or model parameter count (Parameters).
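  • One possible form of such a preset loss, sketched under the assumption that the model index loss is a normalized absolute gap and that the weighting factor lam is freely chosen:

```python
def preset_loss(task_loss, second_model_index, first_model_index, lam=0.1):
    """task_loss: training loss of the perception network being searched.
    second_model_index: current FLOPs or parameter count of the network.
    first_model_index: target FLOPs or parameter count (e.g. from the end side)."""
    model_index_loss = abs(second_model_index - first_model_index) / first_model_index
    return task_loss + lam * model_index_loss
```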
  • the method further includes:
  • the method further includes:
  • In a second aspect, this application provides a perception network structure search method, including:
  • the perception network to be searched includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, and the header includes a third convolutional layer.
  • the target search space includes multiple operation types;
  • performing a structure search on the first convolutional layer and on the third convolutional layer in the target search space to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the multiple operation types.
  • performing a structure search on the first convolutional layer in the target search space includes: obtaining a first sub-search space corresponding to the first convolutional layer, where the first sub-search space includes some or all of the multiple operation types; and performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • performing a structure search on the third convolutional layer in the target search space includes: obtaining a third sub-search space corresponding to the third convolutional layer, where the third sub-search space includes some or all of the multiple operation types; and performing a structure search on the third convolutional layer in the third sub-search space to obtain the third operation type corresponding to the third convolutional layer,
  • where the third operation type is one of the operation types included in the third sub-search space.
  • the acquiring the first sub-search space corresponding to the first convolutional layer includes:
  • acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the ability of the operation type to influence the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • the acquiring the first sub-search space corresponding to the first convolutional layer includes:
  • according to a second weight value of each of the multiple operation types corresponding to the backbone network, the first sub-search space corresponding to the backbone network is determined in the target search space, where the second weight value represents the ability of the operation type to influence the output result of the perception network to be searched.
  • the obtaining the third sub-search space corresponding to the third convolutional layer includes:
  • the third sub-search space corresponding to the third convolutional layer is obtained from the target search space.
  • the obtaining the third sub-search space corresponding to the third convolutional layer includes:
  • the third sub-search space corresponding to the header is determined in the target search space.
  • the backbone network includes a plurality of convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network
  • performing a structure search on the first convolutional layer in the target search space and performing a structure search on the third convolutional layer in the target search space includes:
  • the header includes multiple convolutional layers
  • the third convolutional layer is one of the multiple convolutional layers included in the header
  • performing a structure search on the first convolutional layer in the target search space and performing a structure search on the third convolutional layer in the target search space includes:
  • the method further includes:
  • performing a structure search on the perception network to be searched in the target search space includes: performing a structure search on the perception network to be searched according to a first model index and a preset loss until the preset loss satisfies a preset condition, to obtain the searched perception model, where the preset loss is related to a model index loss, the model index loss indicates the difference between a second model index of the perception network to be searched and the first model index, and the first model index and the second model index each include at least one of the following: model computation amount (FLOPs) or model parameter count (Parameters).
  • the method further includes:
  • the method further includes:
  • In a third aspect, this application provides a perception network structure search device, including:
  • the acquisition module is used to acquire the perception network to be searched and the target search space.
  • the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header.
  • the backbone network is connected to the FPN, and the FPN is connected to the header;
  • the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • the structure search module is configured to perform a structure search on the first convolutional layer in the target search space, perform a structure search on the second convolutional layer in the target search space, and perform a structure search on the third convolutional layer in the target search space, to obtain a searched perception network;
  • the first convolutional layer included in the searched perception network corresponds to a first operation type;
  • the second convolutional layer included in the searched perception network corresponds to a second operation type;
  • the third convolutional layer included in the searched perception network corresponds to a third operation type; and
  • the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • the structure search module is specifically used for:
  • performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • performing a structure search on the second convolutional layer in the second sub-search space to obtain the second operation type corresponding to the second convolutional layer, where the second operation type is one of the operation types included in the second sub-search space; and/or,
  • the structure search module is specifically used for:
  • acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the ability of the operation type to influence the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • according to a second weight value of each of the multiple operation types corresponding to the backbone network, the first sub-search space corresponding to the backbone network is determined in the target search space, where the second weight value represents the ability of the operation type to influence the output result of the perception network to be searched.
  • the structure search module is specifically used for:
  • the second sub-search space corresponding to the second convolutional layer is obtained from the target search space.
  • the structure search module is specifically used for:
  • the structure search module is specifically used for:
  • the structure search module is specifically used for:
  • the third sub-search space corresponding to the header is determined in the target search space.
  • the backbone network includes multiple convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network
  • the structure search module is specifically used for:
  • the FPN includes multiple convolutional layers
  • the second convolutional layer is one of the multiple convolutional layers included in the FPN
  • the structure search module is specifically used for:
  • the header includes multiple convolutional layers
  • the third convolutional layer is one of the multiple convolutional layers included in the header
  • the structure search module is specifically used for:
  • the device further includes:
  • the receiving module is used to receive the first model index from the end side;
  • the structure search module is specifically used for:
  • the preset loss is related to a model index loss;
  • the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index; and
  • the first model index and the second model index include at least one of the following: model computation amount (FLOPs) or model parameter count (Parameters).
  • the device further includes:
  • the sending module is configured to send the searched perception model to the end side.
  • the device further includes:
  • the weight training module is used to perform weight training on the searched perception model to obtain the trained perception model
  • In a fourth aspect, this application provides a perception network structure search device, including: an acquisition module, used to acquire the perception network to be searched and the target search space,
  • the perception network to be searched includes a backbone network and a head-end header, the backbone network is connected to the header, and the backbone network includes a first convolutional layer,
  • the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • the structure search module is used to perform a structure search on the first convolutional layer in the target search space and a structure search on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the multiple operation types.
  • the structure search module is specifically used for:
  • performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • the structure search module is specifically used for:
  • acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the ability of the operation type to influence the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • the structure search module is specifically used for:
  • according to a second weight value of each of the multiple operation types corresponding to the backbone network, the first sub-search space corresponding to the backbone network is determined in the target search space, where the second weight value represents the ability of the operation type to influence the output result of the perception network to be searched.
  • the structure search module is specifically used for:
  • the third sub-search space corresponding to the header is determined in the target search space.
  • the backbone network includes multiple convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network
  • the structure search module is specifically used for:
  • the header includes multiple convolutional layers
  • the third convolutional layer is one of the multiple convolutional layers included in the header
  • the structure search module is specifically used for:
  • the device further includes:
  • the receiving module is used to receive the first model index from the end side;
  • the structure search module is specifically used for:
  • the preset loss is related to a model index loss;
  • the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index; and
  • the first model index and the second model index include at least one of the following: model computation amount (FLOPs) or model parameter count (Parameters).
  • the device further includes:
  • the sending module is configured to send the searched perception model to the end side.
  • the device further includes:
  • the training module is used to perform weight training on the searched perception model to obtain the trained perception model
  • the sending module is also used to send the trained perception model to the terminal device.
  • In a fifth aspect, an embodiment of the present application provides an image processing method, including: acquiring a target image, and performing target detection on the target image through a perception network to obtain a detection result, where:
  • the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header
  • the backbone network is connected to the FPN
  • the FPN is connected to the header
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the first convolutional layer corresponds to the first operation type
  • the second convolutional layer corresponds to the second operation type
  • the third convolutional layer corresponds to the third operation type; the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained based on a structure search performed on the perception network to be searched in the target search space; or,
  • the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondence between the first convolutional layer and the first operation type and the correspondence between the third convolutional layer and the third operation type are obtained based on a structure search performed on the perception network to be searched in the target search space;
  • the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types in the multiple operation types.
  • the search space corresponding to the first convolutional layer is the first sub-search space
  • the search space corresponding to the second convolutional layer is the second sub-search space; or,
  • the search space corresponding to the third convolutional layer is a third sub-search space
  • the first sub-search space and the second sub-search space include some or all of the multiple operation types.
  • In a sixth aspect, an embodiment of the present application provides an image processing device, including:
  • the acquisition module is used to acquire the target image
  • the target detection module is used to perform target detection on the target image through the perception network to obtain the detection result;
  • the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header
  • the backbone network is connected to the FPN
  • the FPN is connected to the header
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the first convolutional layer corresponds to the first operation type
  • the second convolutional layer corresponds to the second operation type
  • the third convolutional layer corresponds to the third operation type; the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained based on a structure search performed on the perception network to be searched in the target search space; or,
  • the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondence between the first convolutional layer and the first operation type and the correspondence between the third convolutional layer and the third operation type are obtained based on a structure search performed on the perception network to be searched in the target search space;
  • the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types in the multiple operation types.
  • the search space corresponding to the first convolutional layer is the first sub-search space
  • the search space corresponding to the second convolutional layer is the second sub-search space; or,
  • the search space corresponding to the third convolutional layer is a third sub-search space
  • the first sub-search space and the second sub-search space include some or all of the multiple operation types.
  • An embodiment of the present application provides a perception network structure search device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory to perform the method of the above first aspect and any optional implementation thereof, or of the second aspect and any optional implementation thereof.
  • an embodiment of the present application provides an image processing device, which may include a memory, a processor, and a bus system.
  • the memory is used to store a program
  • the processor is used to execute the program in the memory to perform the method of the above fifth aspect and any optional implementation thereof.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when it runs on a computer, the computer is caused to execute the method of the above fifth aspect and any optional implementation thereof.
  • An embodiment of the present application provides a computer program that, when run on a computer, causes the computer to execute the above first aspect and any of its optional methods, or the second aspect and any of its optional methods.
  • the embodiments of the present application provide a computer program that, when run on a computer, causes the computer to execute the fifth aspect and any optional method thereof.
  • The present application provides a chip system that includes a processor for supporting an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • The present application provides a perception network structure search method, including: acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header;
  • the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types; and performing a structure search on the first convolutional layer, the second convolutional layer, and the third convolutional layer in the target search space to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the second convolutional layer included in the searched perception network corresponds to a second operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • In the above manner, a structure search is performed on some or all of the convolutional layers in all three network structures of the perception network to be searched (the backbone network, the FPN, and the header), so that the perception network obtained after the structure search has better performance; a sketch of the final step is given below.
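  • The final step of such a search can be pictured with the earlier MixedConv sketch (an assumption for illustration): once the joint search over backbone, FPN, and header converges, each searchable layer keeps only the operation type with the largest architecture weight, yielding the searched perception network.

```python
import torch

def discretize(mixed_layer):
    """Replace a mixed layer by its single strongest candidate operation."""
    best = int(torch.argmax(mixed_layer.alpha))
    return mixed_layer.op_names[best], mixed_layer.ops[best]
```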
  • Figure 1 is a schematic diagram of a structure of the main frame of artificial intelligence
  • FIGS. 2a and 2b are schematic diagrams of the application system framework of the present invention.
  • Figure 3 is a schematic diagram of an application scenario of this application.
  • Figure 4 is a schematic diagram of an application scenario of this application.
  • Figure 5 is a schematic diagram of a system architecture of this application.
  • FIG. 6 is a schematic diagram of the structure of a neural network according to an embodiment of the application.
  • FIG. 7 is a schematic diagram of the structure of a neural network according to an embodiment of the application.
  • FIG. 8a is a hardware structure of a chip provided by an embodiment of the application.
  • FIG. 8b is a system architecture provided by an embodiment of this application.
  • FIG. 9 is a schematic flowchart of a method for searching for a perceptual network structure provided by an embodiment of this application.
  • Figures 10a and 10b are schematic diagrams of the backbone network of an embodiment of the application.
  • Figure 11 is a schematic diagram of the structure of an FPN
  • Figure 12a is a schematic diagram of a header
  • Figure 12b is a schematic diagram of the RPN layer of a header
  • FIG. 13a is a schematic diagram of a structure search process in this embodiment.
  • FIG. 13b is a schematic diagram of a structure search process in this embodiment.
  • FIG. 14 is a schematic flowchart of a method for searching for a perceptual network structure provided by an embodiment of this application;
  • FIG. 16 is a schematic flowchart of a method for searching for a perceptual network structure provided by an embodiment of this application;
  • FIG. 17 is a schematic structural diagram of a device for searching a sensory network structure provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 19 is a schematic structural diagram of an execution device provided by an embodiment of this application.
  • FIG. 20 is a schematic structural diagram of a training device provided by an embodiment of this application.
  • FIG. 21 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through a condensing process of "data-information-knowledge-wisdom".
  • The "IT value chain", from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • Smart chips are hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs.
  • Basic platforms include distributed computing frameworks, networks, and related platform guarantees and support, and can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as the Internet of Things data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. Application fields mainly include: intelligent terminals, intelligent transportation, smart medical care, autonomous driving, safe city, and so on.
  • the embodiments of the present application are mainly applied in fields such as driving assistance, automatic driving, and mobile phone terminals that need to complete various perception tasks.
  • the application system framework of the present invention is shown in Figure 2a and Figure 2b.
  • The video is split into frames to obtain single pictures, and each picture is sent to the perception network shown in Figure 2a or Figure 2b of the present invention to obtain the objects of interest in the picture.
  • These detection results are output to the post-processing module for processing.
  • In an automatic driving system, they are sent to the planning control unit for decision-making; in a mobile phone terminal, they are sent to the beauty algorithm for processing to obtain beautified pictures.
  • the following is a brief introduction to the two application scenarios of ADAS/ADS visual perception system and mobile phone beauty.
  • Application scenario 1: ADAS/ADS visual perception system
  • Application scenario 2: mobile phone beauty function
  • The mask and key points of the human body are detected through the perception network provided by the embodiments of the present application, and the corresponding parts of the human body can be zoomed in or out, such as waist-slimming and hip-shaping operations, so as to output a beautified picture.
  • Application scenario 3 Image classification scenario:
  • After obtaining the image to be classified, the object recognition device adopts the object recognition method of the present application to obtain the category of the object in the image to be classified, and can then classify the image to be classified according to the category of the object in it.
  • For photographers, many photos are taken every day, of animals, people, and plants. Using the method of the present application, photos can be quickly classified according to their content, for example into photos containing animals, photos containing people, and photos containing plants.
  • After the object recognition device obtains the image of a commodity, it uses the object recognition method of the present application to obtain the category of the commodity in the image, and then classifies the commodity according to its category. For the wide variety of commodities in large shopping malls or supermarkets, the object recognition method of the present application can complete the classification of commodities quickly, reducing time and labor costs.
  • The method for training a CNN feature extraction model provided in the embodiments of this application involves computer vision processing, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning. It performs symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on training data (such as the image blocks and object categories in this application) to finally obtain a trained CNN feature extraction model. The embodiments of this application then input data (such as the image of an object in this application) into the trained CNN feature extraction model to obtain output data (such as the 2D, 3D, Mask, key-point, and other information of the object of interest in the image).
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes $x_{s}$ and an intercept of 1 as inputs.
  • the output of the arithmetic unit can be: $h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$;
  • where s = 1, 2, …, n, and n is a natural number greater than 1;
  • $W_{s}$ is the weight of $x_{s}$;
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
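  • A short numerical example of the neural-unit formula above (the inputs, weights, and bias are made up for illustration):

```python
import math

def neuron(xs, ws, b):
    z = sum(w * x for w, x in zip(ws, xs)) + b  # weighted sum plus bias
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid activation f

print(neuron([0.5, -1.0], [0.8, 0.2], b=0.1))   # sigmoid(0.3) ≈ 0.574
```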
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A Deep Neural Network (DNN) can be understood as a neural network with many hidden layers; there is no special metric for "many" here. The multi-layer neural networks and deep neural networks we often speak of are essentially the same thing. Divided by the position of different layers, the layers of a DNN can be grouped into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is not complicated.
  • For example, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^{3}_{24}$;
  • the superscript 3 represents the layer number of the coefficient W, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4;
  • in general, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W^{L}_{jk}$. Note that the input layer has no W parameters.
  • more hidden layers make the network more capable of portraying complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
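  • Using the coefficient notation above, a fully connected layer computes each of its outputs from all outputs of the previous layer. A tiny sketch with made-up sizes, where W[j][k] plays the role of $W^{L}_{jk}$:

```python
import math

def layer_forward(x_prev, W, b):
    """x_prev: outputs of layer L-1; W[j][k]: coefficient from neuron k in
    layer L-1 to neuron j in layer L; b[j]: bias of neuron j in layer L."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    return [sigmoid(sum(W[j][k] * x_prev[k] for k in range(len(x_prev))) + b[j])
            for j in range(len(b))]

# Layer L-1 has 4 neurons, layer L has 2 neurons, so W is a 2x4 matrix.
print(layer_forward([0.1, 0.2, 0.3, 0.4],
                    [[0.5, -0.2, 0.1, 0.0], [0.3, 0.3, -0.1, 0.2]],
                    [0.05, -0.05]))
```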
  • A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
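  • The parameter-count consequence of weight sharing can be checked directly. In the sketch below (channel sizes are made up), a convolution's parameter count depends only on the kernel size and channel counts, never on the image size:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
n_params = sum(p.numel() for p in conv.parameters())
# 16*32*3*3 weights + 32 biases = 4640, whether the input is 32x32 or 1024x1024.
print(n_params)
```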
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward-passing the input signal to the output produces an error loss, and the parameters in the initial super-resolution model are updated by backpropagating the error-loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation motion dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as a weight matrix.
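  • A generic sketch of this backpropagation-driven update (PyTorch-style; model, criterion, and data are placeholder names assumed for illustration, not defined by the patent):

```python
import torch

def train(model, criterion, data, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in data:
        loss = criterion(model(x), y)  # forward pass produces the error loss
        opt.zero_grad()
        loss.backward()                # backpropagate the error-loss information
        opt.step()                     # update parameters so the error loss converges
```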
  • RNN: recurrent neural network.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • The neural network can use the backpropagation (BP) algorithm to modify the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward-passing the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error-loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation motion dominated by error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect training data.
  • the training data includes: the image or image block and the category of the object; and the training data is stored in the database 130, and the training device 120 is trained based on the training data maintained in the database 130 to obtain a CNN feature extraction model (explanation: the feature extraction model here is the model trained in the training phase described above, and may be a neural network for feature extraction, etc.).
  • The CNN feature extraction model can be used to implement the neural network provided by the embodiments of this application: after relevant preprocessing, the image or image block to be recognized is input into the CNN feature extraction model to obtain the 2D, 3D, Mask, key-point, and other information of the object of interest in it.
  • the CNN feature extraction model in the embodiment of the present application may specifically be a CNN convolutional neural network.
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • The training device 120 does not necessarily train the CNN feature extraction model entirely based on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the target model/rule trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 5.
  • The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server or a cloud.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data in the embodiment of the present application may include: an image to be recognized or an image block or a picture.
  • the execution device 110 may call data, code, and the like in the data storage system 150 for corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the 2D, 3D, Mask, key point and other information of the object of interest in the image, image block, or picture obtained above, to the client device 140 to provide it to the user.
  • the client device 140 may be a planning control unit in an automatic driving system or a beauty algorithm module in a mobile phone terminal.
  • the training device 120 can generate corresponding target models/rules based on different training data for different goals or different tasks, and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks, so as to provide the user with the desired result.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal, collecting the input data of the I/O interface 112 and the output result of the I/O interface 112 as new sample data and storing them in the database 130, as shown in the figure.
  • alternatively, the I/O interface 112 directly stores the input data of the I/O interface 112 and the output result of the I/O interface 112, as shown in the figure, in the database 130 as new sample data.
  • FIG. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • for example, in FIG. 5, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • a CNN feature extraction model is obtained by training according to the training device 120.
  • the CNN feature extraction model may be a CNN convolutional neural network in the embodiment of the present application or a neural network that will be introduced in the following embodiment.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 5.
  • a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture.
  • the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • as a deep learning architecture, CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230.
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • the convolutional layer/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel; its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator may essentially be a weight matrix, which is usually pre-defined. In the process of performing convolution on the image, the weight matrix is usually moved along the horizontal direction on the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (row × column), that is, multiple homogeneous matrices, are applied.
  • the outputs of these weight matrices are stacked to form the depth dimension of the convolved image, where this dimension can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • the multiple weight matrices have the same size (row × column), so the convolution feature maps extracted by them also have the same size; the multiple extracted convolution feature maps of the same size are then merged to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications.
  • each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 makes correct predictions.
  • the initial convolutional layer (such as 221) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the subsequent convolutional layers (such as 226) become more and more complex, such as high-level semantic features, and features with higher semantics are more suitable for the problem to be solved.
  • the pooling layer can be one pooling layer following one convolutional layer, or one or more pooling layers following multiple convolutional layers.
  • in the image processing process, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the average of the pixel values in the image within a specific range as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
  • after processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not sufficient to output the required output information. As mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate the output of one or a group of required classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 6) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
  • after the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240.
  • the output layer 240 has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error.
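  • for illustration only, the layer sequence described above (input layer 210, alternating convolutional/pooling layers 220, hidden layers 230, output layer 240) can be sketched in PyTorch roughly as follows; the channel counts, layer sizes, and 10-class output are assumptions, not taken from the embodiment:

```python
import torch
import torch.nn as nn

# A minimal sketch of the CNN 200 described above: convolutional and
# pooling layers (220), hidden layers (230), and an output layer (240).
class SketchCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv_pool = nn.Sequential(            # layers 221-226
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.hidden = nn.Sequential(               # hidden layers 231..23n
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
        )
        self.output = nn.Linear(256, num_classes)  # output layer 240

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(self.hidden(self.conv_pool(x)))

# Example: one 224x224 RGB image; training would pair the logits with a
# cross-entropy loss, as described for the output layer 240.
logits = SketchCNN()(torch.randn(1, 3, 224, 224))
```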
  • it should be noted that the convolutional neural network 200 shown in FIG. 6 is only used as an example of a convolutional neural network; in specific applications, the convolutional neural network may also exist in the form of other network models.
  • a convolutional neural network (CNN) 200 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
  • unlike FIG. 6, in FIG. 7 the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • the convolutional neural network shown in FIGS. 6 and 7 is only used as an example of two possible convolutional neural networks in the image processing method of the embodiment of the present application.
  • the convolutional neural network used in the image processing method of the embodiments of the present application can also exist in the form of other network models.
  • the structure of the convolutional neural network obtained by using the search method of the neural network structure of the embodiment of the present application may be as shown in the convolutional neural network structure in FIG. 6 and FIG. 7.
  • FIG. 8a shows a hardware structure of a chip provided by an embodiment of the application; the chip includes a neural network processor NPU 50.
  • the chip can be set in the execution device 110 as shown in FIG. 5 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 5 to complete the training work of the training device 120 and output the target model/rule.
  • the algorithms of each layer in the convolutional neural network as shown in FIG. 6 and FIG. 7 can be implemented in the chip as shown in FIG. 8a.
  • the neural network processor NPU 50 is mounted as a coprocessor on a main central processing unit (CPU) (host CPU), and the main CPU distributes tasks.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 501, performs matrix operations on it with matrix B, and stores the partial result or final result of the obtained matrix in the accumulator 508.
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 507 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 507 can store the processed output vector in the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network.
  • the unified memory 506 is used to store input data and output data.
  • the storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
  • the bus interface unit (BIU) 510 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504;
  • the controller 504 is used to call the instructions cached in the memory 509 to control the working process of the computing accelerator.
  • the input data here in this application is a picture
  • the output data is information such as 2D, 3D, Mask, and key points of the object of interest in the picture.
  • the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip (On-Chip) memories.
  • the external memory is a memory external to the NPU.
  • the external memory can be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (HBM), or other readable and writable memory.
  • the execution device 110 in FIG. 5 introduced above can execute each step of the image processing method of the embodiment of the present application.
  • the CNN models shown in FIG. 6 and FIG. 7 and the chip shown in FIG. 8a can also be used to execute each step of the image processing method of the embodiment of the present application.
  • the image processing method of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
  • an embodiment of the present application provides a system architecture 300.
  • the system architecture includes a local device 301, a local device 302, an execution device 210 and a data storage system 250, where the local device 301 and the local device 302 are connected to the execution device 210 through a communication network.
  • the execution device 210 may be implemented by one or more servers.
  • the execution device 210 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 210 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 210 may use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the method for searching the neural network structure of the embodiment of the present application.
  • the execution device 210 may execute the following process:
  • acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header; the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • performing a structure search on the first convolutional layer in the target search space, performing a structure search on the second convolutional layer in the target search space, and performing a structure search on the third convolutional layer in the target search space, to obtain a searched perception network, wherein the first convolutional layer included in the searched perception network corresponds to a first operation type, the second convolutional layer included in the searched perception network corresponds to a second operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • a target neural network can be built, and the target neural network can be used for image classification or image processing.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device of each user can interact with the execution device 210 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the local device 301 and the local device 302 obtain the relevant parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing, and so on.
  • the target neural network can be directly deployed on the execution device 210.
  • the execution device 210 obtains the image to be processed from the local device 301 and the local device 302, and performs classification or another type of image processing on the image to be processed according to the target neural network.
  • the above-mentioned execution device 210 may also be referred to as a cloud device. At this time, the execution device 210 is generally deployed in the cloud.
  • the neural network construction method of the embodiment of the present application will be described in detail below in conjunction with FIG. 9.
  • the method shown in FIG. 9 can be executed by a neural network construction device; the neural network construction device can be a device with sufficient computing power to construct a neural network, such as a computer or a server.
  • the method shown in FIG. 9 includes steps 901 to 902, which are respectively described in detail below.
  • acquire a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header; the backbone network is connected to the FPN, and the FPN is connected to the header.
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the target search space includes multiple operation types.
  • the architecture of the perception network to be searched may be the architecture shown in FIG. 2a, which is mainly composed of a backbone network backbone, a feature pyramid network (FPN), and a head-end header.
  • the backbone network backbone is used to receive an input picture, perform convolution processing on the input picture, and output feature maps with different resolutions corresponding to the picture; that is to say, it outputs feature maps of different sizes corresponding to the picture. In other words, the backbone completes the extraction of basic features and provides corresponding features for subsequent detection.
  • the backbone network can perform a series of convolution processing on the input pictures to obtain feature maps at different scales. These feature maps will provide basic features for subsequent detection modules.
  • the backbone network can take many forms, such as visual geometry group (VGG), residual neural network (residual neural network, resnet), the core structure of GoogLeNet (Inception-net), and so on.
  • the backbone network backbone can perform convolution processing on the input image to generate several convolution feature maps of different scales.
  • each feature map is an H*W*C matrix, where H is the height of the feature map, W is the width of the feature map, and C is the number of channels of the feature map.
  • Backbone can use a variety of existing convolutional network frameworks, such as VGG16, Resnet50, Inception-Net, etc.
  • the process is shown in Figure 10a.
  • the resolution of the input picture is H*W*3 (height H, width W, and 3 channels, that is, the three RGB channels).
  • the input image undergoes a convolution operation through Resnet18's first convolution module Res18-Conv1 (convolution module 1 in the figure) to generate Featuremap (feature map) C1; this feature map is down-sampled twice with respect to the input image, and the number of channels is expanded to 64, so the resolution of C1 is H/4*W/4*64.
  • the convolution module 1 is composed of several convolutional layers (convolution layer 1 to convolution layer N), and the subsequent convolution modules are similar; the structure of a convolution module is shown in FIG. 10b. C1 then undergoes a convolution operation through Resnet18's second convolution module Res18-Conv2 (convolution module 2 in the figure) to obtain Featuremap C2, whose resolution is the same as that of C1. C2 continues to be processed by Resnet18's third convolution module Res18-Conv3 (convolution module 3 in the figure) to generate Featuremap C3; this feature map is further down-sampled relative to C2 and the number of channels is doubled, so its resolution is H/8*W/8*128. Finally, C3 is processed by Res18-Conv4 (convolution module 4 in the figure) to generate Featuremap C4 with a resolution of H/16*W/16*256.
  • Resnet18 performs multiple levels of convolution processing on the input image to obtain feature maps of different scales: C1/C2/C3/C4.
  • among them, the feature maps at the lower layers have relatively large width and height and a small number of channels, and mainly contain low-level features of the image (such as image edges and texture features); the feature maps at the higher layers have smaller width and height and more channels, and mainly contain high-level features of the image (such as shape and object features).
  • the subsequent 2D detection process will make further predictions based on these feature maps.
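  • purely as an illustration, a backbone producing feature maps with the strides and channel counts stated above (C1: H/4*W/4*64, C2 at the same scale, C3: H/8*W/8*128, C4: H/16*W/16*256) might be sketched as follows; the single-convolution modules are stand-ins for the multi-layer Res18-Conv modules of the text:

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int, stride: int) -> nn.Module:
    # A stand-in for one convolution module (e.g., Res18-ConvX); the
    # real module contains several convolutional layers.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SketchBackbone(nn.Module):
    """Emits multi-scale feature maps C1..C4 as described above."""
    def __init__(self):
        super().__init__()
        self.conv1 = conv_block(3, 64, stride=4)     # C1: H/4 x W/4 x 64
        self.conv2 = conv_block(64, 64, stride=1)    # C2: same scale as C1
        self.conv3 = conv_block(64, 128, stride=2)   # C3: H/8 x W/8 x 128
        self.conv4 = conv_block(128, 256, stride=2)  # C4: H/16 x W/16 x 256

    def forward(self, x: torch.Tensor):
        c1 = self.conv1(x)
        c2 = self.conv2(c1)
        c3 = self.conv3(c2)
        c4 = self.conv4(c3)
        return c1, c2, c3, c4

feats = SketchBackbone()(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])  # the four feature maps used by detection
```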
  • the backbone network includes multiple convolution modules, each convolution module includes multiple convolutional layers, and each convolution module can perform convolution processing on the input feature map to obtain feature maps of different resolutions; the first convolutional layer included in the backbone network in the embodiment of the present application is one of the multiple convolutional layers included in the backbone network.
  • the backbone network in the embodiments of the present application may also be referred to by other names, which is not limited here.
  • the FPN is connected to the backbone network backbone, and the FPN can perform convolution processing on multiple feature maps of different resolutions generated by the backbone network backbone to construct a feature pyramid.
  • for example, the convolution module 1 can use dilated (hole) convolution and 1×1 convolution to reduce the number of channels of the topmost feature map C4 to 256, and the result serves as the topmost feature map P4 of the feature pyramid.
  • the FPN includes multiple convolution modules, and each convolution module includes multiple convolution layers. Each convolution module can perform convolution processing on the input feature map.
  • the second convolutional layer included in the FPN is one of the multiple convolutional layers included in the FPN.
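  • as a minimal sketch of the top of the feature pyramid described above (the dilated-convolution parameters and input size are assumptions; the 256-channel reduction is taken from the text):

```python
import torch
import torch.nn as nn

# Sketch: a dilated 3x3 convolution followed by a 1x1 convolution maps
# the topmost backbone feature map C4 to the topmost pyramid map P4
# with 256 channels.
top = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),  # dilated conv
    nn.Conv2d(256, 256, kernel_size=1),                          # 1x1 conv
)
c4 = torch.randn(1, 256, 14, 14)  # H/16 x W/16 x 256 for a 224x224 input
p4 = top(c4)                      # same spatial size, 256 channels
```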
  • the header is connected with the FPN.
  • the header can complete the detection of the 2D frame of a task according to the feature map provided by the FPN, and output the 2D frame of the object of the task, the corresponding confidence, and the like. The following describes one implementation of the header.
  • FIG. 12a is a schematic diagram of a header.
  • the header includes three modules: a Region Proposal Network (RPN), ROI-ALIGN, and RCNN.
  • RCNN: region-based convolutional neural network.
  • the RPN module can be used to predict the region where the task object is located on one or more feature maps provided by the FPN, and output candidate 2D boxes matching the region. It can also be understood that the RPN predicts, on one or more feature maps provided by the FPN, the regions where the task object may exist, and gives the boxes of these regions; these regions are called candidate regions (Proposal).
  • for example, when the Header is responsible for detecting a car, its RPN layer predicts candidate boxes that may contain a car; when the Header is responsible for detecting a person, its RPN layer predicts candidate boxes that may contain a person.
  • these Proposals are not accurate. On the one hand, they do not necessarily contain the object of the task, and on the other hand, these frames are not compact.
  • the 2D candidate area prediction process can be implemented by the RPN module of the Header, which predicts the areas where the task object may exist based on the feature map provided by the FPN, and gives candidate frames (also called candidate areas, Proposal) for these areas.
  • for example, if the Header is responsible for detecting cars, its RPN layer predicts candidate boxes where cars may exist.
  • the basic structure of the RPN layer can be as shown in Figure 12b.
  • a feature map RPN Hidden is first generated by convolution module 1 (for example, a 3*3 convolution) from the feature map provided by the FPN; the RPN layer of each subsequent Header then predicts Proposals from the RPN Hidden.
  • the RPN layer of the Header respectively uses the convolution module 2 and the convolution module 3 (for example, a 1*1 convolution) to predict the coordinates and confidence of the proposal at each position of the RPN Hidden.
  • the higher the confidence, the greater the probability that the object of the task exists in the Proposal; for example, the greater the score of a Proposal in a Header, the greater the probability that the object exists.
  • the Proposals predicted by each RPN layer need to go through a Proposal merging module, where redundant Proposals are removed according to the degree of overlap between Proposals (this process can use, but is not limited to, the NMS algorithm), and the N (N<K) Proposals with the largest scores are selected from the remaining K Proposals as candidate regions where objects may exist. As can be seen from FIG. 12b, these Proposals are inaccurate: on the one hand, they do not necessarily contain the object of the task, and on the other hand, the boxes are not compact. Therefore, the RPN module performs only a coarse detection process, and the subsequent RCNN module is required for refinement. When the RPN module regresses the Proposal coordinates, it does not directly regress the absolute values of the coordinates, but regresses coordinates relative to Anchors; the higher the match between these Anchors and the actual objects, the greater the probability that the RPN can detect the objects.
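  • as noted above, the merging step can use, but is not limited to, the NMS algorithm; a minimal sketch (boxes as [x1, y1, x2, y2] with a score each; the IoU threshold is an assumption) is:

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thr: float = 0.5):
    """Minimal non-maximum suppression: repeatedly keep the highest-scoring
    box and drop every remaining box whose IoU with it exceeds iou_thr.
    boxes: (K, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        # Intersection of the best box with each remaining box.
        xy1 = torch.maximum(boxes[i, :2], boxes[rest, :2])
        xy2 = torch.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = (xy2 - xy1).clamp(min=0).prod(dim=1)
        area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
        area_r = (boxes[rest, 2:] - boxes[rest, :2]).prod(dim=1)
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # keep only sufficiently distinct boxes
    return keep
```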
  • the ROI-ALIGN module is used to extract, from a feature map provided by the FPN, the features of the region where the candidate 2D box is located, according to the region predicted by the RPN module; that is, the ROI-ALIGN module mainly extracts the features of the region where each Proposal is located on a feature map according to the regions given by the RPN module.
  • the ROI-ALIGN module can use, but is not limited to, feature extraction methods such as ROI-POOLING (region of interest pooling), ROI-ALIGN (region of interest extraction), PS-ROIPOOLING (position-sensitive region of interest pooling), and PS-ROIALIGN (position-sensitive region of interest extraction).
  • the RCNN module is used to perform convolution processing, through a neural network, on the features of the region where the candidate 2D box is located, to obtain the confidence that the candidate 2D box belongs to each object category; it adjusts the coordinates of the candidate 2D box through the neural network, so that the adjusted 2D candidate box matches the shape of the actual object better than the original candidate 2D box, and selects the adjusted 2D candidate boxes with a confidence greater than a preset threshold as the 2D boxes of the region.
  • the RCNN module mainly refines the features of each Proposal extracted by the ROI-ALIGN module, obtains the confidence that each Proposal belongs to each category (for example, for the Car task, four scores for Background/Car/Truck/Bus are given), and adjusts the coordinates of the 2D box of the Proposal to output a more compact 2D box. After these 2D boxes are merged by non-maximum suppression (NMS), they are output as the final 2D boxes.
  • the sub-classification of 2D candidate regions is mainly implemented by the RCNN module of the Header in FIG. 12a. According to the features of each Proposal extracted by the ROI-ALIGN module, it further regresses more compact 2D box coordinates, classifies the Proposal, and outputs the confidence that it belongs to each category. The RCNN module has many realizable forms, one of which is shown in FIG. 12b.
  • the feature size output by the ROI-ALIGN module can be N*14*14*256 (Feature of proposals). In the RCNN module, the features are first processed by the Resnet18 convolution module Res18-Conv5, and the output feature size is N*7*7*512; they then pass through a Global Avg Pool (average pooling layer), which averages the 7*7 features within each channel of the input to obtain N*512 features, where each 1*512-dimensional feature vector represents the feature of one Proposal. Then, through fully connected layers FC, the precise coordinates of the box are regressed (outputting an N*4 vector, where the 4 values respectively represent the x/y coordinates of the center point of the box and the width and height of the box), and the confidence of the box category is given (in Header0, the scores that the box is Background/Car/Truck/Bus need to be given).
  • finally, through a box merging operation, the several boxes with the largest scores are selected, and repeated boxes are removed through the NMS operation, so as to obtain a compact box output.
  • the perception network may also include other headers, and 3D/Mask/Keypoint detection can be further performed on the basis of detecting the 2D frame.
  • the ROI-ALIGN module extracts the features of the region where each 2D box is located on the feature map output by the FPN according to the accurate 2D box provided by the Header.
  • the feature size output by the ROI-ALIGN module is M*14*14*256. The features are first processed by Resnet18's convolution module 5 (for example, Res18-Conv5), and the output feature size is M*7*7*512; they are then processed through a Global Avg Pool (average pooling layer), which averages the 7*7 features of each channel in the input features to obtain M*512 features, where each 1*512-dimensional feature vector represents the feature of each 2D box.
  • then the orientation angle of the object in the box (orientation, an M*1 vector), the centroid point coordinates (centroid, an M*2 vector, where the 2 values represent the x/y coordinates of the centroid), and the length, width, and height (dimension) are respectively regressed.
  • the header includes at least one convolution module, and each convolution module includes at least one convolutional layer; each convolution module can perform convolution processing on the input feature map. The third convolutional layer included in the header is one of the multiple convolutional layers included in the header.
  • it should be noted that the header shown in FIG. 12a and FIG. 12b is only one implementation manner and does not constitute a limitation of the present application.
  • the target search space may be determined according to the application requirements of the perception network to be searched. Specifically, the above-mentioned target search space may be determined according to the type of processing data of the perception network to be searched.
  • the type and number of operations included in the aforementioned target search space should be adapted to the processing of the image data.
  • the foregoing target search space may contain multiple operation types in a pre-set convolutional neural network, and the operation types may be basic operations or a combination of basic operations, and these basic operations or combinations of basic operations may be collectively referred to as operation types.
  • the foregoing target search space may include, but is not limited to, operation types such as convolution, pooling, and residual connection; for example, it may include the following operation types:
  • 3x3 average pooling: average pooling with a kernel size of 3×3;
  • 3x3 max pooling: max pooling with a kernel size of 3×3;
  • 3x3 dilated convolution: a dilated convolution with a kernel size of 3×3;
  • 3x3 separable conv: a separable convolution with a kernel size of 3×3;
  • 5x5 separable conv: a separable convolution with a kernel size of 5×5.
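  • the candidate operation types listed above can be organized as a search space; the following is only a sketch (the constructor signatures, the depthwise-plus-pointwise realization of the separable convolutions, and the identity stand-in for the residual connection are assumptions):

```python
import torch.nn as nn

# Sketch of the target search space: each operation-type name maps to a
# constructor taking the channel count C; all operations preserve the
# spatial size and channel count so their outputs can be combined.
SEARCH_SPACE = {
    "3x3_avg_pool":  lambda C: nn.AvgPool2d(3, stride=1, padding=1),
    "3x3_max_pool":  lambda C: nn.MaxPool2d(3, stride=1, padding=1),
    "3x3_dilated":   lambda C: nn.Conv2d(C, C, 3, padding=2, dilation=2),
    "3x3_separable": lambda C: nn.Sequential(   # depthwise + pointwise
        nn.Conv2d(C, C, 3, padding=1, groups=C),
        nn.Conv2d(C, C, 1)),
    "5x5_separable": lambda C: nn.Sequential(
        nn.Conv2d(C, C, 5, padding=2, groups=C),
        nn.Conv2d(C, C, 1)),
    "skip_connect":  lambda C: nn.Identity(),   # residual connection
}
```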
  • the structure search may be used to determine the operation type corresponding to the convolutional layer included in the perceptual network to be searched.
  • a structure search is performed on the third convolutional layer to obtain the searched perception network, wherein the first convolutional layer included in the searched perception network corresponds to the first operation type, the second convolutional layer included in the searched perception network corresponds to the second operation type, the third convolutional layer included in the searched perception network corresponds to the third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • specifically, after acquiring the perception network to be searched and the target search space, a structure search may be performed on the first convolutional layer in the target search space, a structure search may be performed on the second convolutional layer in the target search space, and a structure search may be performed on the third convolutional layer in the target search space, to obtain a searched perception network, wherein the first convolutional layer included in the searched perception network corresponds to the first operation type, the second convolutional layer included in the searched perception network corresponds to the second operation type, the third convolutional layer included in the searched perception network corresponds to the third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • the perceptual network to be searched may include multiple convolutional layers, where the backbone network includes multiple convolutional layers, the FPN includes multiple convolutional layers, and the header includes multiple convolutional layers.
  • part or all of the convolutional layers included in the backbone network can be used as the objects of the structure search; at the same time, part or all of the convolutional layers included in the FPN can be used as the objects of the structure search; and part or all of the convolutional layers included in the header can be used as the objects of the structure search. It should be noted that the more convolutional layers are selected for structure search, the better the perception network obtained by the final search, but additional memory overhead is required.
  • FIG. 13a is a schematic diagram of a structure search process in this embodiment.
  • for a convolutional layer (the first convolutional layer, the second convolutional layer, or the third convolutional layer) on which structure search is to be performed, the corresponding search space includes operation type 1, operation type 2, ..., operation type N. When the perception network to be searched is fed forward and the feature map is input to the convolutional layer, convolution processing is performed through operation type 1, operation type 2, ..., operation type N included in the search space to obtain N feature maps after convolution processing, and a weighted average of the N feature maps is calculated. Each convolutional layer of the perception network to be searched on which structure search is performed uses the above processing method and uses the weighted-average feature map as the output feature map of the convolutional layer, where each operation type of the search space corresponding to each such convolutional layer has a weight value, and each weight value has an initial value. After the perception network to be searched is fed forward, the processing result can be obtained, and the processing result is compared with the true value to obtain the loss; the gradient is calculated, and this gradient is then used to update the weight value corresponding to each operation type included in the search space corresponding to each convolutional layer on which structure search is performed. After a certain number of iterations, the weight value corresponding to each operation type included in the search space corresponding to each such convolutional layer can be obtained, and for each convolutional layer that requires structure search, the operation type of the convolutional layer can be determined as the operation type corresponding to the largest weight value.
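  • this weighted-average search step resembles differentiable architecture search; the following is a minimal sketch only, assuming the SEARCH_SPACE dictionary sketched earlier and a softmax normalization of the weight values (the embodiment does not specify the normalization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedConvLayer(nn.Module):
    """One searchable convolutional layer: the input feature map is run
    through every candidate operation, and the N outputs are combined by
    a weighted average with one learnable weight value per operation type."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList(
            [build(channels) for build in SEARCH_SPACE.values()])
        # One weight value per operation type, all starting from the same
        # initial value; updated by the gradient of the network loss.
        self.arch_weights = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.arch_weights, dim=0)  # normalization is assumed
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def selected_op(self) -> str:
        # After the search iterations, keep the operation type whose
        # weight value is the largest.
        return list(SEARCH_SPACE)[int(self.arch_weights.argmax())]
```

  • after the feed-forward loss is back-propagated for a number of iterations, calling selected_op() on each searchable layer yields the operation type retained in the searched perception network.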
  • the backbone network includes multiple convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network.
  • in this way, a structure search is performed on the perception network to be searched to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network, the second operation type corresponding to the second convolutional layer, and the third operation type corresponding to the third convolutional layer.
  • the FPN includes multiple convolutional layers
  • the second convolutional layer is one of the multiple convolutional layers included in the FPN.
  • in this way, a structure search is performed on the perception network to be searched to obtain the first operation type corresponding to the first convolutional layer, the operation type corresponding to each of the multiple convolutional layers included in the FPN, and the third operation type corresponding to the third convolutional layer.
  • the header includes multiple convolutional layers
  • the third convolutional layer is one of the multiple convolutional layers included in the header.
  • in this way, a structure search is performed on the perception network to be searched to obtain the first operation type corresponding to the first convolutional layer, the second operation type corresponding to the second convolutional layer, and the operation type corresponding to each of the multiple convolutional layers included in the header.
  • a structure search may be performed on all the convolutional layers included in the backbone network, on part of the convolutional layers included in the FPN, and on part of the convolutional layers included in the header.
  • a structure search may be performed on part of the convolutional layers included in the backbone network, on all the convolutional layers included in the FPN, and on part of the convolutional layers included in the header.
  • a structure search may be performed on part of the convolutional layers included in the backbone network, on part of the convolutional layers included in the FPN, and on all the convolutional layers included in the header.
  • a structure search may be performed on all convolutional layers included in the backbone network, a structure search may be performed on all convolutional layers included in the FPN, and a structure search may be performed on part of the convolutional layers included in the header.
  • a structure search may be performed on part of the convolutional layers included in the backbone network, a structure search may be performed on all the convolutional layers included in the FPN, and a structure search may be performed on all the convolutional layers included in the header.
  • a structure search may be performed on all convolutional layers included in the backbone network, a structure search may be performed on a part of the convolutional layers included in the FPN, and a structure search may be performed on all the convolutional layers included in the header.
  • a structure search may be performed on all convolutional layers included in the backbone network, a structure search may be performed on all convolutional layers included in the FPN, and a structure search may be performed on all convolutional layers included in the header.
  • ⁇ , ⁇ , and ⁇ are L*8, M*8, N*8, respectively
  • the two-dimensional array of, each number represents the weight value of each operation type in the search space in each layer, the higher the weight value, the greater the probability of being selected.
  • the three parameters ⁇ , ⁇ , and ⁇ are updated together, and the corresponding loss function can be:
  • C(·) denotes the calculation amount or parameter amount; this part is not necessary, but if requirements on the calculation amount or parameter amount are specified in advance, this part can be included.
  • the optimized parameters ⁇ , ⁇ , and ⁇ are obtained. By selecting the operation type corresponding to ⁇ , ⁇ , and ⁇ with the largest value of each convolutional layer, you can get The structure of the three-part structure in the searched perception network that is searched out.
  • since the target search space corresponding to the convolutional layer search includes a large number of operation types, performing the structure search directly based on all the operation types included in the target search space consumes a large amount of memory.
  • the search space corresponding to the first convolutional layer is the first sub-search space; or, the search space corresponding to the second convolutional layer is the second sub-search space; Or, the search space corresponding to the third convolutional layer is a third sub-search space; wherein, the first sub-search space, the second sub-search space, and the third sub-search space are the target search A subset of space.
  • a first sub-search space corresponding to the first convolutional layer may be obtained, where the first sub-search space includes some or all of the multiple operation types; a structure search is performed on the first convolutional layer within the first sub-search space to obtain a first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space. And/or, a second sub-search space corresponding to the second convolutional layer is acquired, where the second sub-search space includes some or all of the multiple operation types; a structure search is performed on the second convolutional layer within the second sub-search space to obtain a second operation type corresponding to the second convolutional layer, where the second operation type is one of the operation types included in the second sub-search space. And/or, a third sub-search space corresponding to the third convolutional layer is acquired, where the third sub-search space includes some or all of the multiple operation types; a structure search is performed on the third convolutional layer within the third sub-search space to obtain a third operation type corresponding to the third convolutional layer, where the third operation type is one of the operation types included in the third sub-search space.
  • that is, a subset of the target search space (hereinafter referred to as a sub-search space) can be used as the search space when a structure search is performed on the corresponding convolutional layer or network structure.
  • this embodiment is described for the following two situations: 1. determining the sub-search space corresponding to each convolutional layer that requires structure search; 2. determining the sub-search space corresponding to each network structure (the backbone network, the FPN, and the header).
  • specifically, a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer can be obtained, where the first weight value indicates the influence of the operation type on the output result of the perception network to be searched, and the first sub-search space corresponding to the first convolutional layer is obtained from the target search space according to the first weight values corresponding to the first convolutional layer; a weight value of each of the multiple operation types included in the target search space corresponding to the second convolutional layer can be obtained, and the second sub-search space corresponding to the second convolutional layer is obtained from the target search space according to the weight values corresponding to the second convolutional layer; and a weight value of each of the multiple operation types included in the target search space corresponding to the third convolutional layer can be obtained, and the third sub-search space corresponding to the third convolutional layer is obtained from the target search space according to the weight values corresponding to the third convolutional layer.
  • the input feature map is x
  • the target search space can include N operation types as candidates.
  • the feature map x is input into these N operation types, and N feature maps {y1, y2, ..., yN} are output; λ can then be used as the weight values to perform a weighted summation on them:
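  • the summation formula is rendered as an image in the source; from the description it has the form:

```latex
y = \sum_{i=1}^{N} \lambda_i \, y_i
```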
  • the output feature map y of the convolutional layer is used as the input feature map of the next layer. After the perception network to be searched is fed forward, the inference result is obtained, and the loss function in the search process can be:
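  • this loss function is likewise rendered as an image; from the description below (recognition loss f(·) plus an L21 regularizer on the per-layer operation weights λ, with c an assumed coefficient and λ^(l) the weight vector of layer l), it plausibly has the form:

```latex
\mathcal{L}(\lambda) = f(\lambda) + c \sum_{l} \left\lVert \lambda^{(l)} \right\rVert_{2}
```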
  • f( ⁇ ) is the recognition loss function during search, such as but not limited to the cross-entropy loss function.
  • the L21 regularization can promote sparsity, making it easy to select a more appropriate sub-search space.
  • the sparse parameter ⁇ can be obtained.
  • the sub-search space corresponding to the convolutional layer can be extracted.
  • the above ⁇ may indicate the ability of the search type to influence the output results of the perceptual network to be searched. The larger the ⁇ is, the greater the ability of the search type to influence the output results of the perceptual network to be searched.
  • when determining the sub-search space corresponding to one convolutional layer, the weights of that convolutional layer can be set to preset values, and the remaining convolutional layers can be set to preset operation types and weights.
  • the first sub-search space corresponding to the first convolutional layer may be obtained from the target search space according to the weight value (that is, λ) of each operation type included in the target search space corresponding to the first convolutional layer, where the weight value represents the influence of the operation type on the output result of the perception network to be searched; the second sub-search space corresponding to the second convolutional layer is determined from the target search space according to the weight value of each operation type included in the target search space corresponding to the second convolutional layer; and the third sub-search space corresponding to the third convolutional layer is determined from the target search space according to the weight value of each operation type included in the target search space corresponding to the third convolutional layer.
  • part or all of the convolutional layers in the perceptual network to be searched may be used to determine the corresponding sub-search space in the above-mentioned manner.
  • for each convolutional layer on which structure search is performed, there are multiple operation types that process the input feature map. If multiple convolutional layers perform the above sub-search space determination step at the same time, a large amount of memory will be occupied. Therefore, the sub-search spaces of the convolutional layers can be determined serially, that is, the sub-search space of each convolutional layer can be determined in sequence.
  • the backbone network further includes a fourth convolutional layer
  • the FPN further includes a fifth convolutional layer
  • the header further includes a sixth convolutional layer.
  • the fourth convolutional layer and the first convolutional layer correspond to a fourth sub-search space; or, the fifth convolutional layer and the second convolutional layer correspond to a fifth sub-search space; or, the sixth convolutional layer and the third convolutional layer correspond to a sixth sub-search space;
  • the fourth sub-search space, the fifth sub-search space, and the sixth sub-search space are subsets of the target search space.
  • for convolutional layers that need to correspond to the same sub-search space, the weight values corresponding to the operation types can be shared during the structure search.
  • for example, when the weight values are updated, the weight value corresponding to operation type 1 in convolutional layer 1 changes from weight value 1 to weight value 2, and correspondingly, the weight value corresponding to operation type 1 in convolutional layer 2 is also updated from weight value 1 to weight value 2.
  • in this way, the sub-search space corresponding to convolutional layer 1 is the same as the sub-search space corresponding to convolutional layer 2. Because the weight values corresponding to the operation types are shared (for ease of description, the embodiments of this application may refer to the above method as the weight value sharing method), the number of training parameters in the training process is reduced. For details, refer to FIG. 13b.
  • it should be noted that convolutional layer 1 and convolutional layer 2 may be directly connected convolutional layers or non-directly connected convolutional layers, which is not limited in this application.
  • the corresponding sub-search space can be determined based on the weight value of each convolutional layer.
  • the sum of ⁇ 1 and ⁇ 1 can be used as the weight value of operation type 1 corresponding to convolutional layer 1 and convolutional layer 2, and the above processing can also be performed for other operation types.
  • the operation types corresponding to the M largest sums of weight values can then be determined as the sub-search space corresponding to convolutional layer 1 and convolutional layer 2 (the following embodiments may refer to this method as the weight value addition method). Through the above method, the sub-search space corresponding to convolutional layer 1 is the same as the sub-search space corresponding to convolutional layer 2.
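  • a minimal sketch of this weight value addition method (the tensor shapes and M are illustrative; one weight per operation type per layer is taken from the text):

```python
import torch

def shared_sub_search_space(lambda1: torch.Tensor,
                            lambda2: torch.Tensor, M: int):
    """Weight value addition method (sketch): sum the per-operation
    weights of two convolutional layers and keep the M operation types
    with the largest sums as their common sub-search space."""
    summed = lambda1 + lambda2        # one value per operation type
    top = summed.topk(M).indices      # indices of the M largest sums
    return sorted(top.tolist())

# Example with 6 candidate operation types and M = 3.
ops = shared_sub_search_space(torch.rand(6), torch.rand(6), M=3)
```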
  • in this embodiment, a second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network can be obtained, and the first sub-search space corresponding to the backbone network is determined from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched; a second weight value of each of the multiple operation types included in the target search space corresponding to the FPN can be obtained, and the second sub-search space corresponding to the FPN is determined from the target search space according to the second weight values corresponding to the FPN; and a second weight value of each of the multiple operation types included in the target search space corresponding to the header can be obtained, and the third sub-search space corresponding to the header is determined from the target search space according to the second weight values corresponding to the header.
  • specifically, the convolutional layers included in each network structure can share the weight values of the operation types, or the sub-search space corresponding to each network structure can be determined by adding the corresponding weight values.
  • the sub-search space determination methods may include, but are not limited to, the following:
  • the convolutional layers included in the same network structure do not share the weight value of the operation type, and the sub-search spaces corresponding to each convolutional layer are different, that is, the same network structure corresponds to multiple sub-search spaces;
  • the convolutional layers included in the same network structure do not share the weight values of the operation types, but through weight value addition, the sub-search spaces corresponding to some of the convolutional layers included in the same network structure are the same, and the same network structure corresponds to multiple sub-search spaces;
  • the convolutional layers included in the same network structure do not share the weight values of the operation types, but through weight value addition, the sub-search spaces corresponding to all the convolutional layers included in the same network structure are the same, and the same network structure corresponds to the same sub-search space;
  • some of the convolutional layers included in the same network structure share weight values, so that the sub-search spaces corresponding to those convolutional layers are the same, and the same network structure corresponds to multiple sub-search spaces;
  • All convolutional layers included in the same network structure share weight values. All convolutional layers included in the same network structure have the same sub-search space, and the same network structure corresponds to the same sub-search space.
  • the structure search may be performed based on the corresponding sub-search space.
  • the corresponding loss function during structure search can be:
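  • this formula is rendered as an image in the source; from the description of the penalty term below, it plausibly has the form, where C_target is the user-specified model calculation amount FLOPs or model parameter amount Parameters and μ is an assumed coefficient:

```latex
\mathcal{L} = f(\cdot) + \mu \, \bigl|\, C(\cdot) - C_{\mathrm{target}} \,\bigr|
```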
  • the penalty term can make the error between the calculation amount or parameter amount of the searched model and the user-specified model calculation amount FLOPs or model parameter amount Parameters smaller.
  • the searched perception model may also be sent to the end side.
  • a perception model obtained by performing a structure search on the COCO data set in the above manner has better performance under the same parameter amount. Compared with DetNAS-1.3G, which searches only the backbone network and header, NAS-FPN, which searches only the FPN, and Auto-FPN, which searches only the FPN and head, the structure search method of the embodiment of this application achieves a higher mAP. For details, refer to Table 1:
  • The present application provides a perceptual network structure search method, including: acquiring a perception network to be searched and a target search space.
  • The perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header.
  • The backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types. A structure search is performed on the first convolutional layer in the target search space, on the second convolutional layer in the target search space, and on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the second convolutional layer included in the searched perception network corresponds to a second operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • The three network structures in the perception network to be searched (the backbone network, the FPN, and the header, covering some or all of the convolutional layers they include) all undergo structure search, so that the perception network obtained after the structure search performs better. A minimal sketch of such a search follows.
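  • For intuition, a differentiable structure search of this kind is commonly implemented by relaxing each searchable convolutional layer into a softmax-weighted mixture of the candidate operations and, after training, keeping the operation with the largest weight. The PyTorch-style sketch below follows that common pattern; it is an assumption for illustration, not the patent's own code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def candidate_ops(channels):
    """Hypothetical candidate operations for one searchable layer."""
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1),              # conv 3x3
        nn.Conv2d(channels, channels, 5, padding=2),              # conv 5x5
        nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # dilated conv 3x3
        nn.Identity(),                                            # skip connect
    ])

class MixedLayer(nn.Module):
    """A searchable layer: a softmax-weighted sum of candidate operations."""

    def __init__(self, channels):
        super().__init__()
        self.ops = candidate_ops(channels)
        # One architecture weight per operation type for this layer.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def chosen_op(self):
        """After the search, the operation with the largest weight is kept."""
        return int(self.alpha.argmax())
```

A backbone, FPN, or header layer built this way can be searched jointly with the rest of the network, and the layer's final operation type is simply the result of `chosen_op()`.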
  • FIG. 14 is a schematic flowchart of a method for searching a perceptual network structure provided by an embodiment of the application. As shown in FIG. 14, the method for searching for a perceptual network structure provided by an embodiment of the present application includes:
  • The perception network to be searched includes a backbone network and a head-end header; the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types.
  • The difference from the embodiment corresponding to FIG. 9 above is that the architecture of the perception network to be searched in this embodiment does not include an FPN, and the backbone network is connected directly to the header.
  • For the remaining details, refer to the description of the embodiment corresponding to FIG. 9; they are not repeated here.
  • The first operation type and the third operation type are operation types among the multiple operation types.
  • a first sub-search space corresponding to the first convolutional layer may be obtained, where the first sub-search space is a subset of the target search space;
  • A structure search may be performed on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space.
  • A first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer may be obtained, where the first weight value indicates the influence of the operation type on the output result of the perception network to be searched; the first sub-search space corresponding to the first convolutional layer is then obtained from the target search space according to the first weight values corresponding to the first convolutional layer.
  • A second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network may be obtained, and the first sub-search space corresponding to the backbone network is determined from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched.
  • a third sub-search space corresponding to the third convolutional layer may be obtained, where the third sub-search space is a subset of the target search space;
  • A first weight value of each of the multiple operation types included in the target search space corresponding to the third convolutional layer may be obtained, and the third sub-search space corresponding to the third convolutional layer is obtained from the target search space according to the first weight values corresponding to the third convolutional layer.
  • A second weight value of each of the multiple operation types included in the target search space corresponding to the header may be obtained, and the third sub-search space corresponding to the header is determined from the target search space according to the second weight values corresponding to the header.
  • the backbone network includes multiple convolutional layers
  • the first convolutional layer is one of the multiple convolutional layers included in the backbone network.
  • A structure search may be performed on the perception network to be searched in the target search space to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network and the third operation type corresponding to the third convolutional layer.
  • the header includes multiple convolutional layers
  • The third convolutional layer is one of the multiple convolutional layers included in the header, and a structure search may be performed on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer and the operation type corresponding to each of the multiple convolutional layers included in the header.
  • A first model index from the end side may be received; a structure search is then performed on the perception network to be searched according to the first model index and a preset loss until the preset loss meets a preset condition, yielding the searched perception model. The preset loss is related to a model index loss, the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index, and the first model index and the second model index include at least one of the following: model computation amount FLOPs or model parameter amount Parameters. A sketch of such a constrained search loop follows.
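  • As a sketch only: a search loop constrained by a user-specified model index could look like the following, where `arch_parameters`, `expected_flops`, `loss`, and `discretize` are hypothetical helpers of a supernet, not APIs named by the patent.

```python
import torch

def constrained_search(supernet, data_loader, flops_target,
                       lam=0.1, tol=1e-2, max_steps=10_000):
    """Search until the preset loss meets a preset condition.

    The preset loss is the task loss plus a model index loss, here the
    absolute gap between the supernet's expected FLOPs and the first
    model index received from the end side (flops_target)."""
    opt = torch.optim.Adam(supernet.arch_parameters(), lr=3e-4)
    for step, (images, targets) in enumerate(data_loader):
        task_loss = supernet.loss(images, targets)           # detection loss
        index_loss = (supernet.expected_flops() - flops_target).abs()
        preset_loss = task_loss + lam * index_loss
        opt.zero_grad()
        preset_loss.backward()
        opt.step()
        if index_loss.item() < tol or step >= max_steps:     # preset condition
            break
    return supernet.discretize()  # keep the top-weighted op per layer
```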
  • the searched perception model may be sent to the end side.
  • weight training may be performed on the searched perception model to obtain the trained perception model; and the trained perception model may be sent to the terminal device.
  • The present application provides a perceptual network structure search method, including: acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types; and performing a structure search on the first convolutional layer in the target search space and on the third convolutional layer in the target search space to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the multiple operation types.
  • FIG. 15 is a schematic flowchart of an image processing method provided by an embodiment of the application.
  • an image processing method provided by an embodiment of the application includes:
  • the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header
  • the backbone network is connected to the FPN
  • the FPN is connected to the header
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the first convolutional layer corresponds to the first operation type
  • the second convolutional layer corresponds to the second operation type
  • the third convolutional layer corresponds to the third operation type
  • the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; or,
  • the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondence between the first convolutional layer and the first operation type and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space;
  • the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types in the multiple operation types.
  • The target image can be processed by the perception network obtained through the perceptual network structure search method corresponding to FIG. 9 or FIG. 14 (after weight value training) to obtain the detection result; a usage sketch follows this list.
  • the search space corresponding to the first convolutional layer is the first sub-search space
  • the search space corresponding to the second convolutional layer is the second sub-search space; or,
  • the search space corresponding to the third convolutional layer is a third sub-search space
  • the first sub-search space and the second sub-search space include some or all of the multiple operation types.
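  • As a usage illustration only (the file name and output format below are hypothetical), running a searched and weight-trained perception network on a target image could look like:

```python
import torch
from torchvision.io import read_image

# Hypothetical: a searched, weight-trained perception network saved earlier.
net = torch.load("searched_perception_net.pt")
net.eval()

# Acquire the target image and normalize it to a 1xCxHxW float batch.
image = read_image("target.jpg").float().unsqueeze(0) / 255.0

with torch.no_grad():
    detections = net(image)  # e.g. boxes, classes, scores from the header

print(detections)
```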
  • This application provides an image processing method, including: acquiring a target image; performing target detection on the target image through a perception network to obtain a detection result; wherein the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header ,
  • the backbone network is connected to the FPN
  • the FPN is connected to the header
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the first convolutional layer corresponds to the first operation type
  • the second convolutional layer corresponds to the second operation type
  • the third convolutional layer corresponds to the third operation type
  • the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; or, the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondences between the first convolutional layer and the first operation type and between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • The two network structures included in the perception network to be searched (the backbone network and the header) both undergo structure search, so that the perception network obtained after the structure search performs better.
  • FIG. 16 is a schematic flowchart of a perceptual network structure search method provided by an embodiment of the present application.
  • the perceptual network structure search method provided by an embodiment of the present application includes:
  • 1602. Perform a structure search on the perception network to be searched according to the first model index and the preset loss to obtain the searched perception model, where the preset loss is related to the model index loss, and the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index.
  • the first model index and the second model index include at least one of the following: model calculation amount FLOPs or model parameter amount.
  • During the structure search, the corresponding loss function can include a penalty term in addition to the task loss; the penalty term reduces the error between the computation amount or parameter amount of the searched model and the user-specified model computation amount FLOPs or model parameter amount Parameters.
  • the searched perception model may also be sent to the end side.
  • weight training may be performed on the searched perception model to obtain the trained perception model, and the trained perception model may be sent to the terminal device.
  • FIG. 17 is a schematic structural diagram of a perceptual network structure search apparatus provided by an embodiment of the application.
  • The perceptual network structure search apparatus may be a server, and the apparatus includes:
  • the acquiring module 1701 is configured to acquire the perception network to be searched and the target search space.
  • the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header.
  • the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • the structure search module 1702 is configured to perform a structure search on the first convolutional layer in the target search space, on the second convolutional layer in the target search space, and on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to the first operation type, the second convolutional layer included in the searched perception network corresponds to the second operation type, the third convolutional layer included in the searched perception network corresponds to the third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • the structure search module is specifically used for:
  • performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • performing a structure search on the second convolutional layer in the second sub-search space to obtain the second operation type corresponding to the second convolutional layer, where the second operation type is one of the operation types included in the second sub-search space; and/or,
  • the structure search module is specifically used for:
  • obtaining a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the influence of the operation type on the output result of the perception network to be searched, and obtaining the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • the structure search module is specifically used for:
  • obtaining a second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network, and determining the first sub-search space corresponding to the backbone network from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched.
  • the structure search module is specifically used for:
  • obtaining a first weight value of each of the multiple operation types included in the target search space corresponding to the second convolutional layer, and obtaining the second sub-search space corresponding to the second convolutional layer from the target search space according to the first weight values corresponding to the second convolutional layer.
  • the structure search module is specifically used for:
  • obtaining a second weight value of each of the multiple operation types included in the target search space corresponding to the FPN, and determining the second sub-search space corresponding to the FPN from the target search space according to the second weight values corresponding to the FPN.
  • the structure search module is specifically used for:
  • obtaining a first weight value of each of the multiple operation types included in the target search space corresponding to the third convolutional layer, and obtaining the third sub-search space corresponding to the third convolutional layer from the target search space according to the first weight values corresponding to the third convolutional layer.
  • the structure search module is specifically used for:
  • obtaining a second weight value of each of the multiple operation types included in the target search space corresponding to the header, and determining the third sub-search space corresponding to the header from the target search space according to the second weight values corresponding to the header.
  • the backbone network includes multiple convolutional layers, and the first convolutional layer is one of the multiple convolutional layers included in the backbone network; the structure search module is specifically configured to: perform a structure search on the perception network to be searched in the target search space to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network, the second operation type corresponding to the second convolutional layer, and the third operation type corresponding to the third convolutional layer.
  • the FPN includes multiple convolutional layers, and the second convolutional layer is one of the multiple convolutional layers included in the FPN; the structure search module is specifically configured to: perform a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer, the operation type corresponding to each of the multiple convolutional layers included in the FPN, and the third operation type corresponding to the third convolutional layer.
  • the header includes multiple convolutional layers, and the third convolutional layer is one of the multiple convolutional layers included in the header; the structure search module is specifically configured to: perform a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer, the second operation type corresponding to the second convolutional layer, and the operation type corresponding to each of the multiple convolutional layers included in the header.
  • the device further includes:
  • the receiving module is used to receive the first model index on the end side;
  • the structure search module is specifically used for:
  • performing a structure search on the perception network to be searched according to the first model index and a preset loss until the preset loss meets a preset condition, to obtain the searched perception model, where the preset loss is related to the model index loss, the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index, and the first model index and the second model index include at least one of the following: model computation amount FLOPs or model parameter amount Parameters.
  • the device further includes:
  • the sending module is configured to send the searched perception model to the end side.
  • the device further includes:
  • the weight training module is used to perform weight training on the searched perception model to obtain the trained perception model
  • the sending module is also used to send the trained perception model to the terminal device.
  • the acquisition module 1701 acquires the perceptual network to be searched and the target search space.
  • the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header.
  • the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • the structure search module 1702 performs a structure search on the first convolutional layer in the target search space, on the second convolutional layer in the target search space, and on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to the first operation type, the second convolutional layer included in the searched perception network corresponds to the second operation type, the third convolutional layer included in the searched perception network corresponds to the third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • an apparatus for searching perceptual network structure includes:
  • the acquiring module 1701 is configured to acquire a perception network to be searched and a target search space, the perception network to be searched includes a backbone network and a head-end header, the backbone network is connected to the header, and the backbone network includes a first convolutional layer , The header includes a third convolutional layer, and the target search space includes multiple operation types;
  • the structure search module 1702 is configured to perform a structure search on the first convolutional layer in the target search space and a structure search on the third convolutional layer in the target search space, to obtain the searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the multiple operation types.
  • the structure search module is specifically used for:
  • performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, where the first operation type is one of the operation types included in the first sub-search space; and/or,
  • the structure search module is specifically used for:
  • obtaining a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the influence of the operation type on the output result of the perception network to be searched, and obtaining the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.
  • the structure search module is specifically used for:
  • obtaining a second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network, and determining the first sub-search space corresponding to the backbone network from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched.
  • the structure search module is specifically used for:
  • obtaining a second weight value of each of the multiple operation types included in the target search space corresponding to the header, and determining the third sub-search space corresponding to the header from the target search space according to the second weight values corresponding to the header.
  • the backbone network includes multiple convolutional layers, and the first convolutional layer is one of the multiple convolutional layers included in the backbone network; the structure search module is specifically configured to: perform a structure search on the perception network to be searched in the target search space to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network and the third operation type corresponding to the third convolutional layer.
  • the header includes multiple convolutional layers, and the third convolutional layer is one of the multiple convolutional layers included in the header; the structure search module is specifically configured to: perform a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer and the operation type corresponding to each of the multiple convolutional layers included in the header.
  • the device further includes:
  • the receiving module is used to receive the first model index on the end side;
  • the structure search module is specifically used for:
  • performing a structure search on the perception network to be searched according to the first model index and a preset loss until the preset loss meets a preset condition, to obtain the searched perception model, where the preset loss is related to the model index loss, the model index loss indicates the difference between the second model index of the perception network to be searched and the first model index, and the first model index and the second model index include at least one of the following: model computation amount FLOPs or model parameter amount Parameters.
  • the device further includes:
  • the sending module is configured to send the searched perception model to the end side.
  • the device further includes:
  • the training module is used to perform weight training on the searched perception model to obtain the trained perception model
  • the sending module is also used to send the trained perception model to the terminal device.
  • the acquisition module 1701 acquires the perceptual network to be searched and the target search space.
  • the perceptual network to be searched includes a backbone network and a head-end header, and the backbone network is connected to the header.
  • the backbone network includes a first convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types; the structure search module 1702 performs a structure search on the first convolutional layer in the target search space and on the third convolutional layer in the target search space to obtain the searched perception network.
  • The two network structures in the perception network to be searched (the backbone network and the header, covering some or all of the convolutional layers they include) both undergo structure search, so that the perception network obtained after the structure search performs better.
  • FIG. 18 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the application.
  • the image processing apparatus may be a terminal device or a server, and the image processing apparatus includes:
  • the obtaining module 1801 is used to obtain a target image
  • the target detection module 1802 is configured to perform target detection on the target image through a perception network to obtain a detection result
  • the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header
  • the backbone network is connected to the FPN
  • the FPN is connected to the header
  • the backbone network includes a first convolutional layer
  • the FPN includes a second convolutional layer
  • the header includes a third convolutional layer
  • the first convolutional layer corresponds to the first operation type
  • the second convolutional layer corresponds to the second operation type
  • the third convolutional layer corresponds to the third operation type
  • the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; or,
  • the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondence between the first convolutional layer and the first operation type and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space;
  • the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types in the multiple operation types.
  • the search space corresponding to the first convolutional layer is the first sub-search space
  • the search space corresponding to the second convolutional layer is the second sub-search space; or,
  • the search space corresponding to the third convolutional layer is a third sub-search space
  • the first sub-search space and the second sub-search space include some or all of the multiple operation types.
  • the embodiment of the present application provides an image processing device.
  • the acquisition module 1801 acquires a target image;
  • the target detection module 1802 performs target detection on the target image through a perception network to obtain a detection result;
  • wherein the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header; the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the second convolutional layer corresponds to the second operation type, and the third convolutional layer corresponds to the third operation type; the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; or the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and the correspondences between the first convolutional layer and the first operation type and between the third convolutional layer and the third operation type are obtained by performing a structure search on the perception network to be searched in the target search space; the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • The two network structures (the backbone network and the header, covering some or all of their convolutional layers) or the three network structures (the backbone network, the FPN, and the header, covering some or all of their convolutional layers) in the perception network to be searched all undergo structure search, so that the perception network obtained after the structure search performs better.
  • FIG. 19 is a schematic structural diagram of an execution device provided by an embodiment of this application. The execution device may be, for example, a tablet, a laptop, a smart wearable device, a monitoring data processing device, or a server, which is not limited here.
  • the execution device 1900 includes: a receiver 1901, a transmitter 1902, a processor 1903, and a memory 1904 (the number of processors 1903 in the execution device 1900 may be one or more, and one processor is taken as an example in FIG. 19) , Where the processor 1903 may include an application processor 19031 and a communication processor 19032.
  • the receiver 1901, the transmitter 1902, the processor 1903, and the memory 1904 may be connected by a bus or other methods.
  • the memory 1904 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1903. A part of the memory 1904 may also include a non-volatile random access memory (NVRAM).
  • the memory 1904 stores a processor and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1903 controls the operation of the execution device.
  • the various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1903 or implemented by the processor 1903.
  • the processor 1903 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 1903 or instructions in the form of software.
  • The aforementioned processor 1903 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic devices, or discrete hardware components.
  • the processor 1903 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1904, and the processor 1903 reads the information in the memory 1904, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 1901 can be used to receive input digital or character information, and generate signal input related to the relevant settings and function control of the execution device.
  • The transmitter 1902 can be used to output digital or character information through a first interface; the transmitter 1902 can also be used to send instructions to a disk group through the first interface to modify the data in the disk group; the transmitter 1902 can also include a display device such as a display screen.
  • the processor 1903 is configured to obtain a target image, perform target detection on the target image through a perception network, and obtain a detection result;
  • wherein the perception network includes a backbone network, a feature pyramid network FPN, and a head-end header; the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the second convolutional layer corresponds to the second operation type, and the third convolutional layer corresponds to the third operation type; the correspondences between the convolutional layers and their operation types are obtained by performing a structure search on the perception network to be searched in the target search space; or the perception network includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, the first convolutional layer corresponds to the first operation type, the third convolutional layer corresponds to the third operation type, and these correspondences are obtained by performing a structure search on the perception network to be searched in the target search space; the target search space includes multiple operation types, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • the search space corresponding to the first convolutional layer is the first sub-search space; or, the search space corresponding to the second convolutional layer is the second sub-search space; or, The search space corresponding to the third convolutional layer is a third sub-search space; wherein, the first sub-search space and the second sub-search space include some or all of the multiple operation types.
  • FIG. 20 is a schematic structural diagram of a training device provided in an embodiment of the present application.
  • The training device 2000 may be deployed with the perceptual network structure search apparatus described in the embodiment corresponding to FIG. 17, and is used to realize the functions of the perceptual network structure search apparatus in that embodiment.
  • The training device 2000 is implemented by one or more servers, and may differ considerably depending on configuration or performance. It may include one or more central processing units (CPU) 2020 (for example, one or more processors), memory 2032, and one or more storage media 2030 (for example, one or more mass storage devices).
  • the memory 2032 and the storage medium 2030 may be short-term storage or persistent storage.
  • the program stored in the storage medium 2030 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device.
  • the central processing unit 2020 may be configured to communicate with the storage medium 2030, and execute a series of instruction operations in the storage medium 2030 on the training device 2000.
  • the training device 2000 may also include one or more power supplies 2026, one or more wired or wireless network interfaces 2050, and one or more input and output interfaces 2058; or, one or more operating systems 2041, such as Windows ServerTM, Mac OS XTM , UnixTM, LinuxTM, FreeBSDTM and so on.
  • the central processor 2020 is configured to execute the acquisition of the perceived network to be searched and the target search space.
  • the perceived network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header.
  • the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types;
  • a structure search is performed on the first convolutional layer in the target search space, on the second convolutional layer in the target search space, and on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to the first operation type, the second convolutional layer included in the searched perception network corresponds to the second operation type, the third convolutional layer included in the searched perception network corresponds to the third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.
  • The embodiments of the present application also provide a computer program product which, when running on a computer, causes the computer to execute the steps executed by the aforementioned execution device, or causes the computer to execute the steps executed by the aforementioned training device.
  • the embodiments of the present application also provide a computer-readable storage medium that stores a program for signal processing, and when it runs on a computer, the computer executes the steps performed by the aforementioned execution device. Or, make the computer execute the steps performed by the aforementioned training device.
  • the execution device, training device, or terminal device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, Pins or circuits, etc.
  • the processing unit can execute the computer-executable instructions stored in the storage unit to make the chip in the execution device execute the data processing method described in the foregoing embodiment, or to make the chip in the training device execute the data processing method described in the foregoing embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as Read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.
  • Figure 21 is a schematic diagram of a structure of a chip provided by an embodiment of the application. The chip may be a neural-network processing unit (NPU) mounted on a host CPU as a coprocessor, and the Host CPU assigns tasks.
  • the core part of the NPU is the arithmetic circuit 2103.
  • the arithmetic circuit 2103 is controlled by the controller 2104 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 2103 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 2103 is a two-dimensional systolic array. The arithmetic circuit 2103 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2103 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 2102 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches matrix A data and matrix B from the input memory 2101 to perform matrix operations, and the partial or final result of the obtained matrix is stored in an accumulator 2108.
  • the unified memory 2106 is used to store input data and output data.
  • the weight data directly passes through the memory unit access controller (Direct Memory Access Controller, DMAC) 2105, and the DMAC is transferred to the weight memory 2102.
  • the input data is also transferred to the unified memory 2106 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 2110, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 2109.
  • the bus interface unit 2110 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 2109 to obtain instructions from the external memory, and is also used for the storage unit access controller 2105 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2106 or to transfer the weight data to the weight memory 2102 or to transfer the input data to the input memory 2101.
  • the vector calculation unit 2107 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit 2103, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used in the calculation of non-convolutional/fully connected layer networks in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
  • the vector calculation unit 2107 can store the processed output vector to the unified memory 2106.
  • The vector calculation unit 2107 can apply a linear or nonlinear function to the output of the arithmetic circuit 2103, for example, performing linear interpolation on the feature plane extracted by a convolutional layer or, as another example, applying a function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 2107 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 2103, for example for use in a subsequent layer in a neural network.
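  • Purely as a conceptual model of the dataflow just described (not the hardware's actual interface; all names are assumptions), the matrix operation plus vector post-processing can be sketched as:

```python
import numpy as np

def npu_like_forward(A, B, gamma=1.0, beta=0.0):
    """Conceptual model of the NPU dataflow described above.

    The arithmetic circuit multiplies input matrix A by weight matrix B and
    accumulates partial results; the vector calculation unit then applies
    non-matrix post-processing (here a normalization and an activation)
    before the result is written back to unified memory."""
    acc = A @ B                                  # arithmetic circuit + accumulator
    mean, var = acc.mean(axis=0), acc.var(axis=0)
    normed = gamma * (acc - mean) / np.sqrt(var + 1e-5) + beta  # normalization
    return np.maximum(normed, 0.0)               # activation

A = np.random.rand(4, 8)      # input data from the input memory
B = np.random.rand(8, 16)     # weights from the weight memory
out = npu_like_forward(A, B)  # result stored to the unified memory
```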
  • the instruction fetch buffer 2109 connected to the controller 2104 is used to store instructions used by the controller 2104;
  • the unified memory 2106, the input memory 2101, the weight memory 2102, and the fetch memory 2109 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
  • The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • This application can be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • In general, all functions completed by a computer program can be easily implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • However, for this application, a software program implementation is the better implementation in more cases.
  • Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, training device, or network device, etc.) execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

A perceptual network structure search method, relating to the field of artificial intelligence, including: acquiring a perception network to be searched and a target search space, the perception network to be searched including a backbone network and a head-end header, the backbone network being connected to the header, the backbone network including a first convolutional layer, the header including a third convolutional layer, and the target search space including multiple operation types (901); and performing a structure search on the first convolutional layer in the target search space, performing a structure search on the second convolutional layer in the target search space, and performing a structure search on the third convolutional layer in the target search space, to obtain a searched perception network (902). The perception network obtained after the structure search has better performance.

Description

A perceptual network structure search method and apparatus therefor

This application claims priority to Chinese Patent Application No. 202010109254.2, filed with the China National Intellectual Property Administration on February 21, 2020 and entitled "A perceptual network structure search method and apparatus therefor", which is incorporated herein by reference in its entirety.

Technical Field

This application relates to the field of artificial intelligence, and in particular, to a perceptual network structure search method and an apparatus therefor.

Background

Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and the military. It is the study of how to use cameras/video cameras and computers to obtain the data and information we need about a photographed subject. Figuratively speaking, computer vision equips the computer with eyes (a camera or video camera) and a brain (algorithms) so that it can identify, track, and measure targets in place of the human eye, thereby enabling the computer to perceive its environment. Because perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of how to make artificial systems "perceive" from images or multi-dimensional data. In general, computer vision uses various imaging systems in place of the visual organs to obtain input information, and then uses the computer in place of the brain to process and interpret that input. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision in the way humans do, with the ability to adapt to the environment autonomously.

At present, visual perception networks can perform more and more functions, including image classification, 2D detection, semantic segmentation (Mask), key point detection, linear object detection (such as lane line or stop line detection in autonomous driving), and drivable area detection. In addition, visual perception systems have the advantages of low cost, non-contact operation, small size, and a large amount of information. With the continuous improvement in the accuracy of visual perception algorithms, they have become a key technology in many of today's artificial intelligence systems and are ever more widely applied, for example: recognizing dynamic obstacles (people or vehicles) and static objects (traffic lights, traffic signs, or traffic cones) on the road in advanced driving assistant systems (ADAS) and autonomous driving systems (ADS), and achieving a slimming effect in the photo beautification function of terminal vision by recognizing the Mask and key points of the human body.

With the rapid development of artificial intelligence technology, a neural network with excellent performance often has an elaborate network structure, which requires highly skilled and experienced human experts to spend a great deal of effort constructing. To construct neural networks better, neural architecture search (NAS) methods have been proposed to build neural networks, obtaining network structures with excellent performance by searching for neural network structures automatically.

However, for visual perception networks, conventional schemes perform a structure search only on one particular part of the visual perception network in isolation, so the finally constructed perception network may not satisfy application requirements well.

Summary
In a first aspect, the present application provides a perceptual network structure search method, including:

acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network, a feature pyramid network FPN, and a head-end header, the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network includes a first convolutional layer, the FPN includes a second convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types; and

performing a structure search on the first convolutional layer in the target search space, performing a structure search on the second convolutional layer in the target search space, and performing a structure search on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the second convolutional layer included in the searched perception network corresponds to a second operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type, the second operation type, and the third operation type are operation types among the multiple operation types.

Optionally, in a design of the first aspect, performing a structure search on the first convolutional layer in the target search space includes: acquiring a first sub-search space corresponding to the first convolutional layer, the first sub-search space including some or all of the multiple operation types; and performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, the first operation type being one of the operation types included in the first sub-search space; and/or,

performing a structure search on the second convolutional layer in the target search space includes: acquiring a second sub-search space corresponding to the second convolutional layer, the second sub-search space including some or all of the multiple operation types, and performing a structure search on the second convolutional layer in the second sub-search space to obtain the second operation type corresponding to the second convolutional layer, the second operation type being one of the operation types included in the second sub-search space; and/or,

performing a structure search on the third convolutional layer in the target search space includes: acquiring a third sub-search space corresponding to the third convolutional layer, the third sub-search space including some or all of the multiple operation types, and performing a structure search on the third convolutional layer in the third sub-search space to obtain the third operation type corresponding to the third convolutional layer, the third operation type being one of the operation types included in the third sub-search space.

Optionally, in a design of the first aspect, acquiring the first sub-search space corresponding to the first convolutional layer includes: acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the influence of the operation type on the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.

Optionally, in a design of the first aspect, acquiring the first sub-search space corresponding to the first convolutional layer includes: acquiring a second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network, and determining the first sub-search space corresponding to the backbone network from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched.

Optionally, in a design of the first aspect, acquiring the second sub-search space corresponding to the second convolutional layer includes: acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the second convolutional layer, and obtaining the second sub-search space corresponding to the second convolutional layer from the target search space according to the first weight values corresponding to the second convolutional layer.

Optionally, in a design of the first aspect, acquiring the second sub-search space corresponding to the second convolutional layer includes: acquiring a second weight value of each of the multiple operation types included in the target search space corresponding to the FPN, and determining the second sub-search space corresponding to the FPN from the target search space according to the second weight values corresponding to the FPN.

Optionally, in a design of the first aspect, acquiring the third sub-search space corresponding to the third convolutional layer includes: acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the third convolutional layer, and obtaining the third sub-search space corresponding to the third convolutional layer from the target search space according to the first weight values corresponding to the third convolutional layer.

Optionally, in a design of the first aspect, acquiring the third sub-search space corresponding to the third convolutional layer includes: acquiring a second weight value of each of the multiple operation types included in the target search space corresponding to the header, and determining the third sub-search space corresponding to the header from the target search space according to the second weight values corresponding to the header.

Optionally, in a design of the first aspect, the backbone network includes multiple convolutional layers, the first convolutional layer is one of the multiple convolutional layers included in the backbone network, and performing a structure search on the first, second, and third convolutional layers in the target search space includes: performing a structure search on the perception network to be searched in the target search space to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network, the second operation type corresponding to the second convolutional layer, and the third operation type corresponding to the third convolutional layer.

Optionally, in a design of the first aspect, the FPN includes multiple convolutional layers, the second convolutional layer is one of the multiple convolutional layers included in the FPN, and performing a structure search on the first, second, and third convolutional layers in the target search space includes: performing a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer, the operation type corresponding to each of the multiple convolutional layers included in the FPN, and the third operation type corresponding to the third convolutional layer.

Optionally, in a design of the first aspect, the header includes multiple convolutional layers, the third convolutional layer is one of the multiple convolutional layers included in the header, and performing a structure search on the first, second, and third convolutional layers in the target search space includes: performing a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer, the second operation type corresponding to the second convolutional layer, and the operation type corresponding to each of the multiple convolutional layers included in the header.

Optionally, in a design of the first aspect, the method further includes: receiving a first model index from the end side. Performing a structure search on the perception network to be searched in the target search space includes: performing a structure search on the perception network to be searched according to the first model index and a preset loss until the preset loss meets a preset condition, to obtain the searched perception model, where the preset loss is related to a model index loss, the model index loss indicates the difference between a second model index of the perception network to be searched and the first model index, and the first model index and the second model index include at least one of the following: model computation amount FLOPs or model parameter amount Parameters.

Optionally, in a design of the first aspect, the method further includes: sending the searched perception model to the end side.

Optionally, in a design of the first aspect, the method further includes: performing weight training on the searched perception model to obtain a trained perception model; and sending the trained perception model to the terminal device.
In a second aspect, the present application provides a perceptual network structure search method, including:

acquiring a perception network to be searched and a target search space, where the perception network to be searched includes a backbone network and a head-end header, the backbone network is connected to the header, the backbone network includes a first convolutional layer, the header includes a third convolutional layer, and the target search space includes multiple operation types; and

performing a structure search on the first convolutional layer in the target search space and performing a structure search on the third convolutional layer in the target search space, to obtain a searched perception network, where the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the multiple operation types.

Optionally, in a design of the second aspect, performing a structure search on the first convolutional layer in the target search space includes: acquiring a first sub-search space corresponding to the first convolutional layer, the first sub-search space including some or all of the multiple operation types; and performing a structure search on the first convolutional layer in the first sub-search space to obtain the first operation type corresponding to the first convolutional layer, the first operation type being one of the operation types included in the first sub-search space; and/or,

performing a structure search on the third convolutional layer in the target search space includes: acquiring a third sub-search space corresponding to the third convolutional layer, the third sub-search space including some or all of the multiple operation types, and performing a structure search on the third convolutional layer in the third sub-search space to obtain the third operation type corresponding to the third convolutional layer, the third operation type being one of the operation types included in the third sub-search space.

Optionally, in a design of the second aspect, acquiring the first sub-search space corresponding to the first convolutional layer includes: acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the first convolutional layer, where the first weight value represents the influence of the operation type on the output result of the perception network to be searched, and acquiring the first sub-search space corresponding to the first convolutional layer from the target search space according to the first weight values corresponding to the first convolutional layer.

Optionally, in a design of the second aspect, acquiring the first sub-search space corresponding to the first convolutional layer includes: acquiring a second weight value of each of the multiple operation types included in the target search space corresponding to the backbone network, and determining the first sub-search space corresponding to the backbone network from the target search space according to the second weight values corresponding to the backbone network, where the second weight value represents the influence of the operation type on the output result of the perception network to be searched.

Optionally, in a design of the second aspect, acquiring the third sub-search space corresponding to the third convolutional layer includes: acquiring a first weight value of each of the multiple operation types included in the target search space corresponding to the third convolutional layer, and obtaining the third sub-search space corresponding to the third convolutional layer from the target search space according to the first weight values corresponding to the third convolutional layer.

Optionally, in a design of the second aspect, acquiring the third sub-search space corresponding to the third convolutional layer includes: acquiring a second weight value of each of the multiple operation types included in the target search space corresponding to the header, and determining the third sub-search space corresponding to the header from the target search space according to the second weight values corresponding to the header.

Optionally, in a design of the second aspect, the backbone network includes multiple convolutional layers, the first convolutional layer is one of the multiple convolutional layers included in the backbone network, and performing a structure search on the first and third convolutional layers in the target search space includes: performing a structure search on the perception network to be searched in the target search space to determine the operation type corresponding to each of the multiple convolutional layers included in the backbone network and the third operation type corresponding to the third convolutional layer.

Optionally, in a design of the second aspect, the header includes multiple convolutional layers, the third convolutional layer is one of the multiple convolutional layers included in the header, and performing a structure search on the first and third convolutional layers in the target search space includes: performing a structure search on the perception network to be searched in the target search space to obtain the first operation type corresponding to the first convolutional layer and the operation type corresponding to each of the multiple convolutional layers included in the header.

Optionally, in a design of the second aspect, the method further includes: receiving a first model index from the end side. Performing a structure search on the perception network to be searched in the target search space includes: performing a structure search on the perception network to be searched according to the first model index and a preset loss until the preset loss meets a preset condition, to obtain the searched perception model, where the preset loss is related to a model index loss, the model index loss indicates the difference between a second model index of the perception network to be searched and the first model index, and the first model index and the second model index include at least one of the following: model computation amount FLOPs or model parameter amount Parameters.

Optionally, in a design of the second aspect, the method further includes: sending the searched perception model to the end side.

Optionally, in a design of the second aspect, the method further includes: performing weight training on the searched perception model to obtain a trained perception model; and sending the trained perception model to the terminal device.
第三方面,本申请提供了一种感知网络结构搜索装置,包括:
获取模块,用于获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作operation类型;
结构搜索模块,用于在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
获取所述第二卷积层对应的第二子搜索空间,所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第二子搜索空间内对所述第二卷积层进行结构搜索,得到所述第二卷积层对应的第二操作类型,所述第二操作类型为所述第二子搜索空间包括的操作类型中的一个;和/或,
获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值，并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间，其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第二卷积层的第一权重值，并根据所述对应于所述第二卷积层的第一权重值，从所述目标搜索空间中得到所述第二卷积层对应的第二子搜索空间。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述FPN的第二权重值，并根据所述对应于所述FPN的第二权重值从所述目标搜索空间中确定所述FPN对应的第二子搜索空间。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值,并根据所述对应于所述第三卷积层的第一权重值,从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
可选地,在第三方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
可选地,在第三方面的一种设计中,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型、所述第二卷积层对应的第二操作类型和所述第三卷积层对应的第三操作类型。
可选地,在第三方面的一种设计中,所述FPN包括多个卷积层,所述第二卷积层为所述FPN包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述FPN包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
可选地,在第三方面的一种设计中,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述第二卷积层对应的第二操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
可选地,在第三方面的一种设计中,所述装置还包括:
接收模块,用于接收端侧的第一模型指标;
所述结构搜索模块,具体用于:
根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索,直到所述预设损失满足预设条件,得到所述搜索后的感知模型,其中,所述预设损失与模型指标损失有关,所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异,所述第一模型指标和所述第二模型指标至少包括如下的一种:模型计算量FLOPs或模型参数量Parameters。
可选地,在第三方面的一种设计中,所述装置还包括:
发送模块,用于向所述端侧发送所述搜索后的感知模型。
可选地,在第三方面的一种设计中,所述装置还包括:
权重训练模块,用于对所述搜索后的感知模型进行权重训练,得到训练后的感知模型;
所述发送模块,还用于向所述终端设备发送所述训练后的感知模型。
第四方面,本申请提供了一种感知网络结构搜索装置,包括:
获取模块,用于获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;
结构搜索模块,用于在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在第四方面的一种设计中,所述结构搜索模块,具体用于:
获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
可选地,在第四方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
可选地,在第四方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值，并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间，其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力。
可选地,在第四方面的一种设计中,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
可选地,在第四方面的一种设计中,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
可选地,在第四方面的一种设计中,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
可选地,在第四方面的一种设计中,所述装置还包括:
接收模块,用于接收端侧的第一模型指标;
所述结构搜索模块,具体用于:
根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索,直到所述预设损失满足预设条件,得到所述搜索后的感知模型,其中,所述预设损失与模型指标损失有关,所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异,所述第一模型指标和所述第二模型指标至少包括如下的一种:模型计算量FLOPs或模型参数量Parameters。
可选地,在第四方面的一种设计中,所述装置还包括:
发送模块,用于向所述端侧发送所述搜索后的感知模型。
可选地,在第四方面的一种设计中,所述装置还包括:
训练模块,用于对所述搜索后的感知模型进行权重训练,得到训练后的感知模型;
所述发送模块,还用于向所述终端设备发送所述训练后的感知模型。
第五方面,本申请实施例提供了一种图像处理方法,所述方法包括:
获取目标图像;
通过感知网络对所述目标图像进行目标检测,得到检测结果;
其中，所述感知网络包括主干网络、特征金字塔网络FPN和头端header，所述主干网络与所述FPN连接，所述FPN与所述header连接，所述主干网络包括第一卷积层，所述FPN包括第二卷积层，所述header包括第三卷积层，所述第一卷积层对应于第一操作类型，所述第二卷积层对应于第二操作类型，第三卷积层对应于第三操作类型，所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的；或，
所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;
其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在第五方面的一种设计中,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,
所述第二卷积层对应的搜索空间为第二子搜索空间;或,
所述第三卷积层对应的搜索空间为第三子搜索空间;
其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
第六方面,本申请实施例提供了一种图像处理装置,所述装置包括:
获取模块,用于获取目标图像;
目标检测模块,用于通过感知网络对所述目标图像进行目标检测,得到检测结果;
其中,所述感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,所述第二卷积层对应于第二操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;或,
所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;
其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在第六方面的一种设计中,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,
所述第二卷积层对应的搜索空间为第二子搜索空间;或,
所述第三卷积层对应的搜索空间为第三子搜索空间;
其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
第七方面,本申请实施例提供了一种感知网络结构搜索装置,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,以执行如上述第一方面及其任一可选的方法或第二方面及其任一可选的方法。
第八方面,本申请实施例提供了一种图像处理装置,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,以执行如上述第五方面及其任一可选的方法。
第九方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面及其任一可选的方法或第二方面及其任一可选的方法。
第十方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第五方面及其任一可选的方法。
第十一方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面及其任一可选的方法或第二方面及其任一可选的方法。
第十二方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第五方面及其任一可选的方法。
第十三方面，本申请提供了一种芯片系统，该芯片系统包括处理器，用于支持执行设备或训练设备实现上述方面中所涉及的功能，例如，发送或处理上述方法中所涉及的数据或信息。在一种可能的设计中，所述芯片系统还包括存储器，所述存储器，用于保存执行设备或训练设备必要的程序指令和数据。该芯片系统，可以由芯片构成，也可以包括芯片和其他分立器件。
本申请提供了一种感知网络结构搜索方法,包括:获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作operation类型;在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式,对待搜索感知网络中的三个网络结构(主干网络、FPN以及header包括的部分和全部卷积层)都进行了结构搜索,使得结构搜索后得到的感知网络的性能更优。
附图说明
图1为人工智能主体框架的一种结构示意图;
图2a和图2b为本发明的应用系统框架示意;
图3为本申请的一种应用场景示意;
图4为本申请的一种应用场景示意;
图5为本申请的一种系统架构示意;
图6为本申请实施例的神经网络的结构示意;
图7为本申请实施例的神经网络的结构示意;
图8a为本申请实施例提供的一种芯片的硬件结构;
图8b为本申请实施例提供的一种系统架构;
图9为本申请实施例提供的一种感知网络结构搜索方法的流程示意;
图10a和图10b为本申请实施例的主干网络backbone;
图11为一种FPN的结构示意;
图12a为一种header的示意;
图12b为一种header的RPN层的示意;
图13a为本实施例中的一种结构搜索过程的示意;
图13b为本实施例中的一种结构搜索过程的示意;
图14为本申请实施例提供的一种感知网络结构搜索方法的流程示意;
图15为本申请实施例提供的一种图像处理方法的流程示意;
图16为本申请实施例提供的一种感知网络结构搜索方法的流程示意;
图17为本申请实施例提供的感知网络结构搜索装置的一种结构示意图;
图18为本申请实施例提供的图像处理装置的一种结构示意图;
图19为本申请实施例提供的执行设备的一种结构示意图;
图20为本申请实施例提供的训练设备一种结构示意图;
图21为本申请实施例提供的芯片的一种结构示意图。
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
首先对人工智能系统总体工作流程进行描述，请参见图1，图1示出的为人工智能主体框架的一种结构示意图，下面从“智能信息链”（水平轴）和“IT价值链”（垂直轴）两个维度对上述人工智能主体框架进行阐述。其中，“智能信息链”反映从数据的获取到处理的一系列过程。举例来说，可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中，数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息（提供和处理技术实现）到系统的产业生态过程，反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、平安城市等。
本申请实施例主要应用在驾驶辅助、自动驾驶、手机终端等需要完成多种感知任务的领域。本发明的应用系统框架如图2a和图2b所示，视频经过抽帧得到单张图片，该图片送入到本发明中图2a或图2b所示的感知网络，得到该图片中感兴趣物体的2D、3D、Mask（掩膜）、关键点等信息。这些检测结果输出到后处理模块进行处理，比如在自动驾驶系统中送入规划控制单元进行决策、在手机终端中送入美颜算法进行处理得到美颜后的图片。下面分别对ADAS/ADS视觉感知系统和手机美颜两种应用场景做简单的介绍。
应用场景1:ADAS/ADS视觉感知系统
如图3所示，在ADAS和ADS中，需要实时进行多类型的2D目标检测，包括：动态障碍物（行人（Pedestrian）、骑行者（Cyclist）、三轮车（Tricycle）、轿车（Car）、卡车（Truck）、公交车（Bus）），静态障碍物（交通锥标（TrafficCone）、交通棍标（TrafficStick）、消防栓（FireHydrant）、摩托车（Motocycle）、自行车（Bicycle）），交通标志（TrafficSign）（导向标志（GuideSign）、广告牌（Billboard）、红色交通灯（TrafficLight_Red）/黄色交通灯（TrafficLight_Yellow）/绿色交通灯（TrafficLight_Green）/黑色交通灯（TrafficLight_Black）、路标（RoadSign））。另外，为了准确获取动态障碍物在3维空间所占的区域，还需要对动态障碍物进行3D估计，输出3D框。为了与激光雷达的数据进行融合，需要获取动态障碍物的Mask，从而把打到动态障碍物上的激光点云筛选出来；为了进行精确的泊车，需要同时检测出泊车位的4个关键点；为了进行构图定位，需要检测出静态目标的关键点。使用本申请实施例提供的技术方案，可以在感知网络中完成上述的全部或一部分功能。
应用场景2:手机美颜功能
如图4所示,在手机中,通过本申请实施例提供的感知网络检测出人体的Mask和关键点,可以对人体相应的部位进行放大缩小,比如进行收腰和美臀操作,从而输出美颜的图片。
应用场景3:图像分类场景:
物体识别装置在获取待分类图像后，采用本申请的物体识别方法获取待分类图像中的物体的类别，然后可根据待分类图像中物体的类别对待分类图像进行分类。对于摄影师来说，每天会拍很多照片，有动物的、有人物的、有植物的。采用本申请的方法可以快速地将照片按照照片中的内容进行分类，可分成包含动物的照片、包含人物的照片和包含植物的照片。
对于图像数量比较庞大的情况，人工分类的方式效率比较低下，并且人在长时间处理同一件事情时很容易产生疲劳感，此时分类的结果会有很大的误差；而采用本申请的方法可以快速地将图像进行分类，并且误差更小。
应用场景4：商品分类
物体识别装置获取商品的图像后,然后采用本申请的物体识别方法获取商品的图像中商品的类别,然后根据商品的类别对商品进行分类。对于大型商场或超市中种类繁多的商品,采用本申请的物体识别方法可以快速完成商品的分类,降低了时间开销和人工成本。
下面从模型训练侧和模型应用侧对本申请提供的方法进行描述:
本申请实施例提供的训练CNN特征提取模型的方法，涉及计算机视觉的处理，具体可以应用于数据训练、机器学习、深度学习等数据处理方法，对训练数据（如本申请中的物体的图像或图像块和物体的类别）进行符号化和形式化的智能信息建模、抽取、预处理、训练等，最终得到训练好的CNN特征提取模型；并且，本申请实施例将输入数据（如本申请中的物体的图像）输入到所述训练好的CNN特征提取模型中，得到输出数据（如本申请中得到该图片中感兴趣物体的2D、3D、Mask、关键点等信息）。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
(1)物体识别,利用图像处理和机器学习、计算机图形学等相关方法,确定图像物体的类别。
(2)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以为:
$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
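为便于理解上述神经单元的运算过程，下面给出一个最小的Python示意（其中的输入xs、权重Ws与偏置b均为假设的示例数值，以sigmoid作为激活函数仅为举例，并非对激活函数的限定）：

```python
import math

def neuron_output(xs, ws, b):
    # 先计算加权和 s = sum(Ws*xs) + b，再经激活函数f将其转换为输出信号
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))  # 以sigmoid作为激活函数f的示例

# 假设的示例输入：3个输入分量及其对应权重
print(neuron_output(xs=[0.5, -1.0, 2.0], ws=[0.8, 0.2, -0.5], b=0.1))
```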
(3)深度神经网络
深度神经网络（Deep Neural Network，DNN），可以理解为具有很多层隐含层的神经网络，这里的“很多”并没有特别的度量标准，我们常说的多层神经网络和深度神经网络其本质上是同一个东西。从DNN按不同层的位置划分，DNN内部的神经网络可以分为三类：输入层，隐含层，输出层。一般来说第一层是输入层，最后一层是输出层，中间的层数都是隐含层。层与层之间是全连接的，也就是说，第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂，但是就每一层的工作来说，其实并不复杂，简单来说就是如下线性关系表达式：
$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$
其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，W是权重矩阵（也称系数），α()是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。
由于DNN层数多，则系数W和偏移向量$\vec{b}$的数量也就很多了。那么，具体的参数在DNN是如何定义的呢？首先我们来看看系数W的定义。以一个三层的DNN为例，如：第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$w_{24}^{3}$，上标3代表系数W所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。总结下，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$w_{jk}^{L}$。
注意,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。
（4）卷积神经网络（Convolutional Neural Network，CNN）是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器，卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面（feature map）做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中，一个神经元可以只与部分邻层神经元连接。一个卷积层中，通常包含若干个特征平面，每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重，这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是：图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置，我们都能使用同样的学习得到的图像信息。在同一卷积层中，可以使用多个卷积核来提取不同的图像信息，一般地，卷积核数量越多，卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(5)反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
(6)循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。
在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
(7)损失函数
在训练深度神经网络的过程中，因为希望深度神经网络的输出尽可能的接近真正想要预测的值，所以可以通过比较当前网络的预测值和真正想要的目标值，再根据两者之间的差异情况来更新每一层神经网络的权重向量（当然，在第一次更新之前通常会有初始化的过程，即为深度神经网络中的各层预先配置参数），比如，如果网络的预测值高了，就调整权重向量让它预测低一些，不断地调整，直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此，就需要预先定义“如何比较预测值和目标值之间的差异”，这便是损失函数（loss function）或目标函数（objective function），它们是用于衡量预测值和目标值的差异的重要方程。其中，以损失函数举例，损失函数的输出值（loss）越高表示差异越大，那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(8)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
下面介绍本申请实施例提供系统架构。
参见图5,本申请实施例提供了一种系统架构100。如所述系统架构100所示,数据采集设备160用于采集训练数据,本申请实施例中训练数据包括:物体的图像或者图像块及物体的类别;并将训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到CNN特征提取模型(解释说明:这里的特征提取模型就是前面介绍的经训练阶段训练得到的模型,可以是用于特征提取的神经网络等)。下面将以实施例一更详细地描述训练设备120如何基于训练数据得到CNN特征提取模型,该CNN特征提取模型能够用于实现本申请实施例提供的神经网络,即,将待识别图像或图像块通过相关预处理后输入该CNN特征提取模型,即可得到待识别图像或图像块感兴趣物体的2D、3D、Mask、关键点等信息。本申请实施例中的CNN特征提取模型具体可以为CNN卷积神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行CNN特征提取模型的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型/规则可以应用于不同的系统或设备中,如应用于图5所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图5中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:待识别图像或者图像块或者图片。
在执行设备110对输入数据进行预处理，或者在执行设备110的计算模块111执行计算等相关的处理（比如进行本申请中神经网络的功能实现）过程中，执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理，也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果,如上述得到的图像或图像块或者图片中感兴趣物体的2D、3D、Mask、关键点等信息返回给客户设备140,从而提供给用户。
可选地,客户设备140,可以是自动驾驶系统中的规划控制单元、手机终端中的美颜算法模块。
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则,该相应的目标模型/规则即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在图5中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
值得注意的是,图5仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图5中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
如图5所示，根据训练设备120训练得到CNN特征提取模型，该CNN特征提取模型在本申请实施例中可以是CNN卷积神经网络也可以是下面实施例即将介绍的神经网络。
由于CNN是一种非常常见的神经网络，下面结合图6重点对CNN的结构进行详细的介绍。如上文的基础概念介绍所述，卷积神经网络是一种带有卷积结构的深度神经网络，是一种深度学习（deep learning）架构，深度学习架构是指通过机器学习的算法，在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构，CNN是一种前馈（feed-forward）人工神经网络，该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
本申请实施例的图像处理方法具体采用的神经网络的结构可以如图6所示。在图6中,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及神经网络层230。其中,输入层210可以获取待处理图像,并将获取到的待处理图像交由卷积层/池化层220以及后面的神经网络层230进行处理,可以得到图像的处理结果。下面对图6中的CNN 200中内部的层结构进行详细的介绍。
卷积层/池化层220:
卷积层:
如图6所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。
卷积层221可以包括很多个卷积算子，卷积算子也称为核，其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器，卷积算子本质上可以是一个权重矩阵，这个权重矩阵通常被预先定义，在对图像进行卷积操作的过程中，权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素（或两个像素接着两个像素……这取决于步长stride的取值）地进行处理，从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关，需要注意的是，权重矩阵的纵深维度（depth dimension）和输入图像的纵深维度是相同的，在进行卷积运算的过程中，权重矩阵会延伸到输入图像的整个深度。因此，和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出，但是大多数情况下不使用单一权重矩阵，而是应用多个尺寸（行×列）相同的权重矩阵，即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度，这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征，例如一个权重矩阵用来提取图像边缘信息，另一个权重矩阵用来提取图像的特定颜色，又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸（行×列）相同，经过该多个尺寸相同的权重矩阵提取后的卷积特征图的尺寸也相同，再将提取到的多个尺寸相同的卷积特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量，因此卷积层之后常常需要周期性的引入池化层，在如图6中220所示例的221-226各层，可以是一层卷积层后面跟一层池化层，也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中，池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子，以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外，就像卷积层中权重矩阵的大小应该与图像尺寸相关一样，池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸，池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
神经网络层230:
在经过卷积层/池化层220的处理后，卷积神经网络200还不足以输出所需要的输出信息。因为如前所述，卷积层/池化层220只会提取特征，并减少输入图像带来的参数。然而为了生成最终的输出信息（所需要的类信息或其他相关信息），卷积神经网络200需要利用神经网络层230来生成一个或者一组所需要的类的数量的输出。因此，在神经网络层230中可以包括多层隐含层（如图6所示的231、232至23n）以及输出层240，该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到，例如该任务类型可以包括图像识别，图像分类，图像超分辨率重建等等。
在神经网络层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图6由210至240方向的传播为前向传播)完成,反向传播(如图6由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。
需要说明的是，如图6所示的卷积神经网络200仅作为一种卷积神经网络的示例，在具体的应用中，卷积神经网络还可以以其他网络模型的形式存在。
本申请实施例的图像处理方法具体采用的神经网络的结构可以如图7所示。在图7中，卷积神经网络（CNN）200可以包括输入层110，卷积层/池化层120（其中池化层为可选的），以及神经网络层130。与图6相比，图7中的卷积层/池化层120中的多个卷积层/池化层并行，将分别提取的特征均输入给神经网络层130进行处理。
需要说明的是,图6和图7所示的卷积神经网络仅作为一种本申请实施例的图像处理方法的两种可能的卷积神经网络的示例,在具体的应用中,本申请实施例的图像处理方法所采用的卷积神经网络还可以以其他网络模型的形式存在。
另外,采用本申请实施例的神经网络结构的搜索方法得到的卷积神经网络的结构可以如图6和图7中的卷积神经网络结构所示。
图8a为本申请实施例提供的一种芯片的硬件结构,该芯片包括神经网络处理器NPU 50。该芯片可以被设置在如图5所示的执行设备110中,用以完成计算模块111的计算工作。该芯片也可以被设置在如图5所示的训练设备120中,用以完成训练设备120的训练工作并输出目标模型/规则。如图6和图7所示的卷积神经网络中各层的算法均可在如图8a所示的芯片中得以实现。
神经网络处理器NPU 50，NPU作为协处理器挂载到主中央处理器（central processing unit，CPU）（host CPU）上，由主CPU分配任务。NPU的核心部分为运算电路503，控制器504控制运算电路503提取存储器（权重存储器或输入存储器）中的数据并进行运算。
在一些实现中,运算电路503内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路503是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器502中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)508中。
向量计算单元507可以对运算电路的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。例如，向量计算单元507可以用于神经网络中非卷积/非FC层的网络计算，如池化（pooling），批归一化（batch normalization），局部响应归一化（local response normalization）等。
在一些实现中，向量计算单元507能将经处理的输出的向量存储到统一缓存器506。例如，向量计算单元507可以将非线性函数应用到运算电路503的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元507生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路503的激活输入，例如用于在神经网络中的后续层中的使用。
统一存储器506用于存放输入数据以及输出数据。
存储单元访问控制器505（direct memory access controller，DMAC）将外部存储器中的输入数据搬运到输入存储器501和/或统一存储器506，将外部存储器中的权重数据存入权重存储器502，以及将统一存储器506中的数据存入外部存储器。
总线接口单元(bus interface unit,BIU)510,用于通过总线实现主CPU、DMAC和取指存储器509之间进行交互。
与控制器504连接的取指存储器(instruction fetch buffer)509,用于存储控制器504使用的指令;
控制器504，用于调用取指存储器509中缓存的指令，实现控制该运算加速器的工作过程。
可选地,本申请中此处的输入数据为图片,输出数据为图片中感兴趣物体的2D、3D、Mask、关键点等信息。
一般地，统一存储器506，输入存储器501，权重存储器502以及取指存储器509均为片上（On-Chip）存储器，外部存储器为该NPU外部的存储器，该外部存储器可以为双倍数据率同步动态随机存储器（double data rate synchronous dynamic random access memory，DDR SDRAM）、高带宽存储器（high bandwidth memory，HBM）或其他可读可写的存储器。
上文中介绍的图5中的执行设备110能够执行本申请实施例的图像处理方法或者图像处理方法的各个步骤,图6和图7所示的CNN模型和图8a所示的芯片也可以用于执行本申请实施例的图像处理方法或者图像处理方法的各个步骤。下面结合附图对本申请实施例的图像处理方法和本申请实施例的图像处理方法进行详细的介绍。
如图8b所示,本申请实施例提供了一种系统架构300。该系统架构包括本地设备301、本地设备302以及执行设备210和数据存储系统250,其中,本地设备301和本地设备302通过通信网络与执行设备210连接。
执行设备210可以由一个或多个服务器实现。可选的,执行设备210可以与其它计算设备配合使用,例如:数据存储器、路由器、负载均衡器等设备。执行设备210可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备210可以使用数据存储系统250中的数据,或者调用数据存储系统250中的程序代码来实现本申请实施例的搜索神经网络结构的方法。
具体地,执行设备210可以执行以下过程:
获取待搜索感知网络和目标搜索空间，所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header，所述主干网络与所述FPN连接，所述FPN与所述header连接，所述主干网络包括第一卷积层，所述FPN包括第二卷积层，所述header包括第三卷积层，所述目标搜索空间包括多个操作类型；在所述目标搜索空间内对所述第一卷积层进行结构搜索，在所述目标搜索空间内对所述第二卷积层进行结构搜索，在所述目标搜索空间内对所述第三卷积层进行结构搜索，得到搜索后的感知网络，其中，所述搜索后的感知网络包括的第一卷积层对应于第一操作类型，所述搜索后的感知网络包括的第二卷积层对应于第二操作类型，所述搜索后的感知网络包括的第三卷积层对应于第三操作类型，所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
通过上述过程执行设备210能够搭建成一个目标神经网络,该目标神经网络可以用于图像分类或者进行图像处理等等。
用户可以操作各自的用户设备(例如本地设备301和本地设备302)与执行设备210进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备210进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。
在一种实现方式中,本地设备301、本地设备302从执行设备210获取到目标神经网络的相关参数,将目标神经网络部署在本地设备301、本地设备302上,利用该目标神经网络进行图像分类或者图像处理等等。
在另一种实现中,执行设备210上可以直接部署目标神经网络,执行设备210通过从本地设备301和本地设备302获取待处理图像,并根据目标神经网络对待处理图像进行分类或者其他类型的图像处理。
上述执行设备210也可以称为云端设备,此时执行设备210一般部署在云端。
下面先结合图9对本申请实施例的神经网络的构建方法进行详细的介绍。图9所示的方法可以由神经网络构建装置来执行，该神经网络构建装置可以是电脑、服务器等运算能力足以进行神经网络构建的装置。
图9所示的方法包括步骤901至902,下面分别对这些步骤进行详细的描述。
901、获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作operation类型。
本申请实施例中,待搜索感知网络的架构可以为图2a中示出的架构,其主要由主干网络backbone、特征金字塔网络(feature pyramid network,FPN)以及头端header组成。
本申请实施例中,主干网络backbone用于接收输入的图片,并对输入的图片进行卷积处理,输出对应所述图片的具有不同分辨率的特征图;也就是说输出对应所述图片的不同大小的特征图,也就是说,Backbone完成基础特征的提取,为后续的检测提供相应的特征。
具体的，主干网络可以对输入的图片进行一系列的卷积处理，得到在不同的尺度下的特征图（feature map）。这些特征图将为后续的检测模块提供基础特征。主干网络可以采用多种形式，比如视觉几何组（visual geometry group，VGG）、残差神经网络（residual neural network，resnet）、GoogLeNet的核心结构（Inception-net）等。
主干网络backbone可以对输入的图像进行卷积处理,生成若干不同尺度的卷积特征图,每张特征图是一个H*W*C的矩阵,其中H是特征图的高度,W是特征图的宽度、C是特征图的通道数。
Backbone可以采用目前多种现有的卷积网络框架,比如VGG16、Resnet50、Inception-Net等,下面以Resnet18为Backbone为例进行说明。该流程如图10a所示。
假设输入的图片的分辨率为H*W*3（高度H，宽度W，通道数为3，也就是RGB三个通道）。输入图片经过Resnet18的第一个卷积模块Res18-Conv1（图中的卷积模块1）进行卷积运算，生成Featuremap（特征图）C1，这个特征图相对于输入图像进行了2次下采样，并且通道数扩充为64，因此C1的分辨率是H/4*W/4*64，该卷积模块1由若干卷积层组成，后面的卷积模块类似，参照图10b，图10b为卷积模块的结构示意，如图10b中示出的那样，卷积模块1可以包括多个卷积层（卷积层1至卷积层N）；C1经过Resnet18的第2个卷积模块Res18-Conv2（图中的卷积模块2）进行卷积运算，得到Featuremap C2，这个特征图的分辨率与C1一致；C2继续经过Resnet18的第3个卷积模块Res18-Conv3（图中的卷积模块3）处理，生成Featuremap C3，这个特征图相对C2进一步下采样，通道数增倍，其分辨率为H/8*W/8*128；最后C3经过Res18-Conv4（图中的卷积模块4）处理，生成Featuremap C4，其分辨率为H/16*W/16*256。
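为便于理解上述多尺度特征提取流程，下面用PyTorch给出一个最小示意：主干网络逐级下采样并输出特征图C1~C4，通道数与下采样倍数按上文示例设定；其中conv_block为假设的简化卷积模块，并非Resnet18各卷积模块的真实定义：

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    # 简化的“卷积模块”：一个卷积层 + BN + ReLU，stride控制下采样倍数
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ToyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = conv_block(3, 64, stride=4)     # C1: H/4 x W/4 x 64
        self.conv2 = conv_block(64, 64, stride=1)    # C2: 分辨率与C1一致
        self.conv3 = conv_block(64, 128, stride=2)   # C3: H/8 x W/8 x 128
        self.conv4 = conv_block(128, 256, stride=2)  # C4: H/16 x W/16 x 256

    def forward(self, x):
        c1 = self.conv1(x)
        c2 = self.conv2(c1)
        c3 = self.conv3(c2)
        c4 = self.conv4(c3)
        return c1, c2, c3, c4

feats = ToyBackbone()(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])  # 依次为C1/C2/C3/C4的多尺度特征图
```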
从图10a可以看出,Resnet18对输入图片进行多个层次的卷积处理,得到不同尺度的特征图:C1/C2/C3/C4。底层的特征图的宽度和高度比较大,通道数较少,其主要为图像的低层特征(比如图像边缘、纹理特征),高层的特征图的宽度和高度比较小,通道数较多,其主要为图像的高层特征(比如形状、物体特征)。后续的2D检测流程将会基于这些特征图进行进一步的预测。
本申请实施例中，主干网络backbone包括多个卷积模块，每个卷积模块包括多个卷积层，每个卷积模块可以对输入的特征图进行卷积处理，以得到不同分辨率的特征图，本申请实施例中主干网络backbone包括的第一卷积层为主干网络backbone包括的多个卷积层中的一个。
需要说明的是,本申请实施例中的主干网络也可以称为骨干网络,这里并不限定。
需要说明的是,图10a和图10b中示出的主干网络backbone仅为一种实现方式,并不构成对本申请的限定。
本申请实施例中,FPN与主干网络backbone连接,FPN可以对主干网络backbone生成的多个不同分辨率的特征图进行卷积处理,来构造特征金字塔。
参照图11，图11为一种FPN的结构示意，其中，使用卷积模块1对最顶层特征图C4进行处理，卷积模块1可以包括至少一个卷积层，示例性的，卷积模块1可以使用空洞卷积和1×1卷积将最顶层特征图C4的通道数下降为256，作为特征金字塔的最顶层特征图P4；横向连接最顶层下一层特征图C3的输出结果并使用1×1卷积（卷积模块2）降低通道数至256后，与特征图p4逐像素相加得到特征图p3；以此类推，从上到下，构建出特征金字塔Φp={特征图p4，特征图p3，特征图p2，特征图p1}。
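按照图11描述的自顶向下构建方式，下面给出一个最小的PyTorch示意（通道数统一降为256、横向连接使用1×1卷积均按上文设定；逐像素相加前的最近邻上采样为此处补充的假设实现，用于对齐相邻层级的分辨率）：

```python
import torch.nn as nn
import torch.nn.functional as F

class ToyFPN(nn.Module):
    def __init__(self, in_channels=(64, 64, 128, 256), out_channels=256):
        super().__init__()
        # 每个层级一个1x1卷积（横向连接），把通道数统一降到256
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, c1, c2, c3, c4):
        p4 = self.laterals[3](c4)  # 最顶层特征图P4
        # 自顶向下：上一层级上采样后与横向连接结果逐像素相加
        p3 = self.laterals[2](c3) + F.interpolate(p4, size=c3.shape[-2:])
        p2 = self.laterals[1](c2) + F.interpolate(p3, size=c2.shape[-2:])
        p1 = self.laterals[0](c1) + F.interpolate(p2, size=c1.shape[-2:])
        return p1, p2, p3, p4
```

将主干网络输出的C1~C4依次传入，即可得到特征金字塔{P1，P2，P3，P4}。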
本申请实施例中,FPN包括多个卷积模块,每个卷积模块包括多个卷积层,每个卷积模块可以对输入的特征图进行卷积处理,本申请实施例中FPN包括的第二卷积层为FPN包括的多个卷积层中的一个。
需要说明的是,图11中示出的FPN仅为一种实现方式,并不构成对本申请的限定。
本申请实施例中,header与FPN连接,header可以根据FPN提供的特征图,完成一个任务的2D框的检测,输出这个任务的物体的2D框以及对应的置信度等等,接下来描述一种header的结构示意,参照图12a,图12a为一种header的示意,如图12a中示出的那样,Header包括候选区域生成网络(Region Proposal Network,RPN)、ROI-ALIGN和RCNN三个模块。
其中，RPN模块可以用于在FPN提供的一个或者多个特征图上预测所述任务物体所在的区域，并输出匹配所述区域的候选2D框；或者可以这样理解，RPN在FPN输出的一个或者多个特征图上预测出可能存在该任务物体的区域，并且给出这些区域的框，这些区域称为候选区域（Proposal）。比如，当Header负责检测车时，其RPN层就预测出可能存在车的候选框；当Header负责检测人时，其RPN层就预测出可能存在人的候选框。当然，这些Proposal是不准确的，一方面其不一定含有该任务的物体，另一方面这些框也是不紧致的。
2D候选区域预测流程可以由Header的RPN模块实施,其根据FPN提供的特征图,预测出可能存在该任务物体的区域,并且给出这些区域的候选框(也可以叫候选区域,Proposal)。在本实施例中,若Header负责检测车,其RPN层就预测出可能存在车的候选框。
RPN层的基本结构可以如图12b所示。在FPN提供的特征图上通过卷积模块1（例如一个3*3的卷积），生成特征图RPN Hidden。后面Header的RPN层将会从RPN Hidden中预测Proposal。具体来说，Header的RPN层分别通过卷积模块2和卷积模块3（例如分别是一个1*1的卷积），预测出RPN Hidden每个位置处的Proposal的坐标以及置信度。这个置信度越高，表示这个Proposal存在该任务的物体的概率越大。比如，在Header中某个Proposal的score越大，就表示其存在车的概率越大。每个RPN层预测出来的Proposal需要经过Proposal合并模块，根据Proposal之间的重合程度去掉多余的Proposal（这个过程可以采用但不限于NMS算法），在剩余的K个Proposal中挑选出score最大的N（N<K）个Proposal作为候选的可能存在物体的区域。从图12b可以看出，这些Proposal是不准确的，一方面其不一定含有该任务的物体，另一方面这些框也是不紧致的。因此，RPN模块只是一个粗检测的过程，需要后续的RCNN模块进行细分。在RPN模块回归Proposal的坐标时，并不是直接回归坐标的绝对值，而是回归出相对于Anchor的坐标。当这些Anchor与实际的物体匹配越高，RPN能检测出物体的概率越大。
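与图12b对应，RPN层“卷积模块1生成RPN Hidden、卷积模块2/3分别预测置信度与坐标”的结构可以用如下最小示意表示（每个位置的Anchor数量设为假设的3个，坐标回归为相对Anchor的4个偏移量）：

```python
import torch
import torch.nn as nn

class ToyRPNHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.hidden = nn.Conv2d(in_channels, in_channels, 3, padding=1)  # 卷积模块1：生成RPN Hidden
        self.cls = nn.Conv2d(in_channels, num_anchors, 1)      # 卷积模块2：每个位置各Anchor的置信度
        self.reg = nn.Conv2d(in_channels, num_anchors * 4, 1)  # 卷积模块3：相对Anchor的坐标偏移

    def forward(self, feat):
        h = torch.relu(self.hidden(feat))
        return self.cls(h), self.reg(h)

scores, deltas = ToyRPNHead()(torch.randn(1, 256, 50, 50))
print(scores.shape, deltas.shape)  # (1,3,50,50) 与 (1,12,50,50)
```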
ROI-ALIGN模块用于根据所述RPN模块预测得到的区域，从所述FPN提供的一个特征图中扣取出所述候选2D框所在区域的特征；也就是说，ROI-ALIGN模块主要根据RPN模块提供的Proposal，在某个特征图上把每个Proposal所在的区域的特征扣取出来，并且resize到固定的大小，得到每个Proposal的特征。可以理解的是，ROI-ALIGN模块可以使用但不局限于ROI-POOLING（感兴趣区域池化）/ROI-ALIGN（感兴趣区域提取）/PS-ROIPOOLING（位置敏感的感兴趣区域池化）/PS-ROIALIGN（位置敏感的感兴趣区域提取）等特征抽取方法。
RCNN模块用于通过神经网络对所述候选2D框所在区域的特征进行卷积处理，得到所述候选2D框属于各个物体类别的置信度；通过神经网络对所述候选区域2D框的坐标进行调整，使得调整后的2D候选框比所述候选2D框与实际物体的形状更加匹配，并选择置信度大于预设阈值的调整后的2D候选框作为所述区域的2D框。也就是说，RCNN模块主要是对ROI-ALIGN模块提出的每个Proposal的特征进行细化处理，得到每个Proposal的属于各个类别置信度（比如对于车这个任务，会给出Background/Car/Truck/Bus 4个分数），同时对Proposal的2D框的坐标进行调整，输出更加紧致的2D框。这些2D框经过非极大值抑制（non maximum suppression，NMS）合并后，作为最后的2D框输出。
2D候选区域细分类主要由图12a中的Header的RCNN模块实施，其根据ROI-ALIGN模块提取出来的每个Proposal的特征，进一步回归出更加紧致的2D框坐标，同时对这个Proposal进行分类，输出其属于各个类别的置信度。RCNN的可实现形式很多，其中一种实现形式如图12b所示。ROI-ALIGN模块输出的特征大小可以为N*14*14*256（Feature of proposals），其在RCNN模块中首先经过Resnet18的卷积模块4（Res18-Conv5）处理，输出的特征大小为N*7*7*512，然后通过一个Global Avg Pool（平均池化层）进行处理，把输入特征中每个通道内的7*7的特征进行平均，得到N*512的特征，其中每个1*512维的特征向量代表每个Proposal的特征。接下来通过2个全连接层FC分别回归框的精确坐标（输出N*4的向量，这4个数值分别表示框的中心点x/y坐标，框的宽高），框的类别的置信度（在Header0中，需要给出这个框是Background/Car/Truck/Bus的分数）。最后通过框合并操作，选择分数最大的若干个框，并且通过NMS操作去除重复的框，从而得到紧致的框输出。
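上述RCNN细化流程（N*14*14*256的Proposal特征经卷积模块、Global Avg Pool和两个全连接分支，分别输出框坐标与类别置信度）可以用如下最小示意表示（其中用一个简化的卷积层代替Res18-Conv5，类别数4对应Background/Car/Truck/Bus）：

```python
import torch
import torch.nn as nn

class ToyRCNNHead(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # 简化代替Res18-Conv5：14x14 -> 7x7，通道256 -> 512
        self.conv = nn.Sequential(
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.pool = nn.AdaptiveAvgPool2d(1)        # Global Avg Pool，得到每个Proposal的512维特征
        self.fc_box = nn.Linear(512, 4)            # 回归框的中心点x/y坐标与宽高
        self.fc_cls = nn.Linear(512, num_classes)  # 各类别置信度

    def forward(self, roi_feats):                  # roi_feats: [N, 256, 14, 14]
        x = self.pool(self.conv(roi_feats)).flatten(1)
        return self.fc_box(x), self.fc_cls(x)

boxes, logits = ToyRCNNHead()(torch.randn(8, 256, 14, 14))
print(boxes.shape, logits.shape)  # (8,4) 与 (8,4)
```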
在一些实际应用场景中，该感知网络还可以包括其他Header，可以在检测出2D框的基础上，进一步进行3D/Mask/Keypoint检测。示例性的，以3D为例，ROI-ALIGN模块根据Header提供的准确的2D框，在FPN输出的特征图上提取出每个2D框所在区域的特征，假设2D框的个数为M，那么ROI-ALIGN模块输出的特征大小为M*14*14*256，其首先经过Resnet18的卷积模块5（例如为Res18-Conv5）处理，输出的特征大小为M*7*7*512，然后通过一个Global Avg Pool（平均池化层）进行处理，把输入特征中每个通道的7*7的特征进行平均，得到M*512的特征，其中每个1*512维的特征向量代表每个2D框的特征。接下来通过3个全连接层FC分别回归框中物体的朝向角（orientation，M*1向量）、质心点坐标（centroid，M*2向量，这2个数值表示质心的x/y坐标）和长宽高（dimension）。
本申请实施例中，header包括至少一个卷积模块，每个卷积模块包括至少一个卷积层，每个卷积模块可以对输入的特征图进行卷积处理，本申请实施例中header包括的第三卷积层为header包括的多个卷积层中的一个。
需要说明的是,图12a和图12b中示出的header仅为一种实现方式,并不构成对本申请的限定。
本申请实施例中,还需要获取目标搜索空间,其中,目标搜索空间可以是根据待搜索感知网络的应用需求确定的。具体地,上述目标搜索空间可以是根据待搜索感知网络的处理数据的类型确定的。
例如,当上述待搜索感知网络为用于处理图像数据的神经网络时,上述目标搜索空间包含的操作的种类和数量要与图像数据的处理相适应。上述目标搜索空间包含的可以是预先设定好的卷积神经网络中的多个操作类型,操作类型可以是基础运算或者基础运算的组合,这些基础运算或者基础运算的组合可以统称为操作类型。
示例性的，上述目标搜索空间可以包括但不限于卷积、池化、残差连接等操作类型，例如可以包括以下操作类型：
1x3和3x1 convolution、1x7和7x1 convolution、3x3 dilated convolution、3x3 average pooling、3x3 max pooling、5x5 max pooling、7x7 max pooling、1x1 convolution、3x3 convolution、3x3 separable conv、5x5 separable conv、7x7 separable conv、跳连接操作、置零操作（Zero，相应位置所有神经元置零）等等；
其中，示例性的，3x3 average pooling表示池化核大小为3×3的均值池化；3x3 max pooling表示池化核大小为3×3的最大值池化；3x3 dilated convolution表示卷积核大小为3×3且空洞率为2的空洞卷积；3x3 separable conv表示卷积核大小为3×3的分离卷积；5x5 separable conv表示卷积核大小为5×5的分离卷积。
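按照上面列出的操作类型，目标搜索空间可以组织为“操作名称→操作实例”的候选集合。下面给出一个最小的PyTorch示意（仅包含部分操作类型，各操作的构造方式为假设的简化实现，且均保持特征图分辨率不变）：

```python
import torch.nn as nn

def sep_conv(c, k):
    # 分离卷积：逐通道depthwise卷积 + 1x1 pointwise卷积
    return nn.Sequential(
        nn.Conv2d(c, c, k, padding=k // 2, groups=c),
        nn.Conv2d(c, c, 1))

def build_search_space(c):
    # c为该卷积层的通道数；返回候选操作类型的集合
    return {
        "1x1_conv": nn.Conv2d(c, c, 1),
        "3x3_conv": nn.Conv2d(c, c, 3, padding=1),
        "3x3_dil_conv": nn.Conv2d(c, c, 3, padding=2, dilation=2),  # 空洞率为2
        "3x3_sep_conv": sep_conv(c, 3),
        "5x5_sep_conv": sep_conv(c, 5),
        "3x3_avg_pool": nn.AvgPool2d(3, stride=1, padding=1),
        "3x3_max_pool": nn.MaxPool2d(3, stride=1, padding=1),
        "skip": nn.Identity(),  # 跳连接操作
    }
```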
本申请实施例中，在获取到待搜索感知网络和目标搜索空间之后，可以通过结构搜索来确定待搜索感知网络中包括的卷积层对应的操作类型。
902、在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
本申请实施例中,在获取到待搜索感知网络和目标搜索空间之后,可以在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
首先,描述要进行结构搜索的对象:
如上述实施例中描述的那样,待搜索感知网络可以包括多个卷积层,其中,主干网络包括多个卷积层,FPN包括多个卷积层,header包括多个卷积层,本实施例中,可以将主干网络包括的一部分或者全部卷积层作为结构搜索的对象,同时,将FPN包括的一部分或者全部卷积层作为结构搜索的对象,同时,将header包括的一部分或者全部卷积层作为结构搜索的对象,需要说明的是,选择的进行结构搜索的卷积层的数量越多,最后搜索得到的感知网络越好,但是需要消耗额外的内存开销。
本申请实施例中，参照图13a，图13a为本实施例中的一种结构搜索过程的示意，如图13a中示出的那样，该卷积层（第一卷积层、第二卷积层或第三卷积层）对应的搜索空间包括操作类型1、操作类型2、…、操作类型N。在对待搜索的感知网络进行前馈时，特征图输入到该卷积层时，分别通过搜索空间包括的操作类型1、操作类型2、…、操作类型N进行卷积处理，以得到卷积处理后的N个特征图，并对N个特征图进行加权平均。可以选择主干网络中的部分或全部卷积层进行结构搜索、FPN中的部分或全部卷积层进行结构搜索、header中的部分或全部卷积层进行结构搜索，其中待搜索感知网络中每个参与结构搜索的卷积层都采用上述处理方式，并将加权平均后的特征图作为该卷积层的输出特征图。其中，针对于每个卷积层（第一卷积层、第二卷积层或第三卷积层），在做加权平均时，其对应的搜索空间的每个操作类型都对应有一个权重值。在刚开始进行结构搜索时，各个权重值都有一个初始值；进行前馈后，可以得到处理结果，并将处理结果和真值进行比对得到损失loss，计算梯度，然后利用这个梯度更新待搜索感知网络中每个进行结构搜索的卷积层对应的搜索空间包括的各个操作类型对应的权重值。经过一定次数的迭代后，可以得到待搜索感知网络中每个进行结构搜索的卷积层对应的搜索空间包括的各个操作类型对应的权重值。针对于每个需要进行结构搜索的卷积层，可以确定该卷积层的操作类型为最大的权重值对应的操作类型。
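图13a所示的“多操作并行处理并加权平均”的过程可以用如下最小的PyTorch示意表示：每个待搜索卷积层持有一组可学习的结构参数α，前馈时对各候选操作的输出做加权求和，反向传播时α随损失一起更新（softmax归一化与简化的写法为此处的假设实现）：

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """一个待搜索卷积层：对所有候选操作的输出做加权平均。"""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        # 每个候选操作类型对应一个可学习的权重值（结构参数α）
        self.alpha = nn.Parameter(torch.zeros(len(ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)  # 将α归一化为各操作的权重
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def chosen_op_index(self):
        # 结构搜索结束后，选取权重值最大的操作类型
        return int(self.alpha.argmax())

# 假设的候选操作：3x3卷积、1x1卷积、跳连接
layer = MixedOp([nn.Conv2d(64, 64, 3, padding=1),
                 nn.Conv2d(64, 64, 1),
                 nn.Identity()])
out = layer(torch.randn(1, 64, 32, 32))  # 前馈：N个输出特征图的加权平均
out.mean().backward()                    # 反向传播时同时为α计算梯度
```

结构搜索完成后，对每个MixedOp取chosen_op_index()即可得到该层权重值最大的操作类型。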
本申请实施例中,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型、所述第二卷积层对应的第二操作类型和所述第三卷积层对应的第三操作类型。
本申请实施例中,所述FPN包括多个卷积层,所述第二卷积层为所述FPN包括的多个卷积层中的一个,在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述FPN包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
本申请实施例中,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述第二卷积层对应的第二操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
在一种实施例中,可以对主干网络包括的全部卷积层进行结构搜索,对FPN包括的部分卷积层进行结构搜索,对header包括的部分卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的部分卷积层进行结构搜索,对FPN包括的全部卷积层进行结构搜索,对header包括的部分卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的部分卷积层进行结构搜索,对FPN包括的部分卷积层进行结构搜索,对header包括的全部卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的全部卷积层进行结构搜索,对FPN包括的全部卷积层进行结构搜索,对header包括的部分卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的部分卷积层进行结构搜索,对FPN包括的全部卷积层进行结构搜索,对header包括的全部卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的全部卷积层进行结构搜索,对FPN包括的部分卷积层进行结构搜索,对header包括的全部卷积层进行结构搜索。
在一种实施例中,可以对主干网络包括的全部卷积层进行结构搜索,对FPN包括的全部卷积层进行结构搜索,对header包括的全部卷积层进行结构搜索。
示例性的,假设主干网络backbone有L层卷积层,FPN有M层卷积层,header有N层卷积层,那么α,β,γ分别为L*8、M*8、N*8的二维数组,每个数代表每一层中搜索空间的每个操作类型的权重值,权重值越高表示被选中的概率越大。结构搜索过程中三部分参数α,β,γ一起进行更新,对应的损失函数可以为:
$$\mathcal{L}=f(\alpha,\beta,\gamma)+\lambda\,C(\alpha,\beta,\gamma)$$
其中C()是计算量或参数量，该部分为非必须的，若预先指定了计算量或参数量要求，则可以包括该部分。使用这个损失函数进行结构搜索训练指定迭代次数后，就得到了优化后的参数α，β，γ，通过选取每一卷积层值最大的α，β，γ对应的操作类型，就可以得到结构搜索出来的待搜索感知网络中三部分的结构。
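按照“选取每一卷积层值最大的α、β、γ对应的操作类型”的方式，最终结构的导出可以用如下最小示意表示（其中alpha、beta、gamma为假设的已完成迭代的结构参数，形状分别为L×8、M×8、N×8，此处以随机张量代替）：

```python
import torch

def derive_architecture(alpha, beta, gamma, op_names):
    # 对每一层在8个候选操作类型上取argmax，得到该层选中的操作类型
    pick = lambda w: [op_names[i] for i in w.argmax(dim=1).tolist()]
    return {"backbone": pick(alpha), "fpn": pick(beta), "header": pick(gamma)}

op_names = ["1x1_conv", "3x3_conv", "3x3_dil_conv", "3x3_sep_conv",
            "5x5_sep_conv", "3x3_avg_pool", "3x3_max_pool", "skip"]
arch = derive_architecture(torch.randn(4, 8), torch.randn(2, 8),
                           torch.randn(3, 8), op_names)
print(arch)  # 主干网络、FPN、header三部分各层选中的操作类型
```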
本申请实施例中,由于卷积层搜索对应的目标搜索空间包括的操作类型的数量很多,若直接基于目标搜索空间包括的全部操作类型进行结构搜索,会消耗大量的内存。
本申请实施例中,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,所述第二卷积层对应的搜索空间为第二子搜索空间;或,所述第三卷积层对应的搜索空间为第三子搜索空间;其中,所述第一子搜索空间、所述第二子搜索空间和所述第三子搜索空间为所述目标搜索空间的子集。
具体的,可以获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,获取所述第二卷积层对应的第二子搜索空间,所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第二子搜索空间内对所述第二卷积层进行结构搜索,得到所述第二卷积层对应的第二操作类型,所述第二操作类型为所述第二子搜索空间包括的操作类型中的一个;和/或,获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
即,针对于卷积层或者网络结构(主干网络、FPN或header),可以将目标搜索空间的一个子集(下文可以称为子搜索空间)作为该卷积层或网络结构在结构搜索时对应使用的搜索空间。接下来描述如何确定上述子集,本实施例针对于下述两种情况进行说明:1、确定每个需要进行结构搜索的卷积层对应的子搜索空间;2、确定各个网络结构(主干网络、FPN以及header)对应的子搜索空间。
一、确定每个需要进行结构搜索的卷积层对应的子搜索空间
本申请实施例中，可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值，所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力，并根据所述对应于所述第一卷积层的第一权重值，从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间；获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第二卷积层的第一权重值，并根据所述对应于所述第二卷积层的第一权重值，从所述目标搜索空间中得到所述第二卷积层对应的第二子搜索空间；获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值，并根据所述对应于所述第三卷积层的第一权重值，从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
针对于每一卷积层，输入特征图为x，目标搜索空间可以包括N个操作类型作为候选，特征图x输入到这N个操作类型里，输出N个特征图$\{y_{1},y_{2},\ldots,y_{N}\}$，可以使用α作为权重值对它们做加权求和：
$$y=\alpha_{1}y_{1}+\alpha_{2}y_{2}+\cdots+\alpha_{N}y_{N}$$
该卷积层的输出特征图y作为下一层的输入特征图。对待搜索感知网络进行前馈后,得到推理结果,搜索过程中的损失函数可以为:
$$\mathcal{L}=f(\alpha)+\lambda\left\|\alpha\right\|_{21}$$
其中，f(α)是搜索时的识别损失函数，比如但不限于交叉熵损失函数，L21正则能够促使稀疏的出现，容易选择更合适的子搜索空间。使用这个损失函数进行搜索训练指定迭代次数后，可以得到稀疏的参数α，通过选取值最大的M个α对应的操作类型，就可以提炼出该卷积层对应的子搜索空间。上述α可以表示该操作类型对于待搜索感知网络的输出结果的影响能力，α越大，相当于该操作类型对于待搜索感知网络的输出结果的影响能力越大。
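上述“L21正则促使α稀疏、再选取值最大的M个α提炼子搜索空间”的过程可以用如下最小示意表示（识别损失f(α)与具体迭代过程从略，alpha为假设的已训练参数）：

```python
import torch

def select_sub_search_space(alpha, op_names, M):
    # alpha：某一卷积层上每个候选操作类型的权重值（一维张量）
    top = alpha.topk(M).indices.tolist()  # 选取值最大的M个α对应的操作类型
    return [op_names[i] for i in top]

def l21_penalty(alphas):
    # L21正则：对每个卷积层的α向量取L2范数后再求和，促使整层的稀疏
    return sum(a.norm(p=2) for a in alphas)

op_names = ["1x1_conv", "3x3_conv", "3x3_dil_conv", "3x3_sep_conv",
            "5x5_sep_conv", "3x3_avg_pool", "3x3_max_pool", "skip"]
alpha = torch.tensor([0.9, 0.1, 0.7, 0.05, 0.4, 0.02, 0.3, 0.6])
print(select_sub_search_space(alpha, op_names, M=4))
```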
需要说明的是,当进行某一卷积层的子搜索空间确定时,在进行前馈时,该卷积层的权重可以设置为预设值,其余卷积层设置为预设的操作类型和权重值。
本申请实施例中,可以根据所述目标搜索空间包括的各个操作类型对应于所述第一卷积层的权重值(即上述α),从所述目标搜索空间中得到所述第一卷积层对应的第一子搜索空间,其中所述权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力;根据所述目标搜索空间包括的各个操作类型对应于所述第二卷积层的权重值,从所述目标搜索空间中确定所述第二卷积层对应的第二子搜索空间;根据所述目标搜索空间包括的各个操作类型对应于所述第三卷积层的权重值,从所述目标搜索空间中确定所述第三卷积层对应的第三子搜索空间。
需要说明的是,可以对待搜索感知网络中的部分或全部卷积层利用上述方式确定对应的子搜索空间。
需要说明的是，由于针对于每个卷积层而言，每次结构搜索都存在多个操作类型对输入特征图进行处理，若多个卷积层同时进行上述子搜索空间的确定步骤，则会占用大量内存，因此，可以串行进行各个卷积层的子搜索空间确定步骤，即依次确定各个卷积层的子搜索空间。
在一种实施例中,所述主干网络还包括第四卷积层,所述FPN还包括第五卷积层,所述header还包括第六卷积层,在进行结构搜索时,所述第四卷积层和所述第一卷积层对应于第四子搜索空间;或,
所述第五卷积层和所述第二卷积层对应于第五子搜索空间;或,
所述第六卷积层和所述第三卷积层对应于第六子搜索空间;
其中,所述第四子搜索空间、所述第五子搜索空间和所述第六子搜索空间为所述目标搜索空间的子集。
本申请实施例中，在进行结构搜索时，若同一个网络结构包括多个卷积层需要进行结构搜索时，可以共享操作类型对应的权重值。具体的，若对于卷积层1而言，其加权求和的方式为：$y=\alpha_{1}y_{1}+\alpha_{2}y_{2}+\cdots+\alpha_{N}y_{N}$；对于和卷积层1属于同一个网络结构的卷积层2而言，其加权求和的方式为：$Y=\alpha_{1}Y_{1}+\alpha_{2}Y_{2}+\cdots+\alpha_{N}Y_{N}$。在每次迭代时，卷积层1和卷积层2中各个操作类型的权重值是同步更新的，即，若在某次迭代时，卷积层1的操作类型1对应于权重值1，则卷积层2中的操作类型1也对应于权重值1；进行权重值更新时，卷积层1中操作类型1对应的权重值由权重值1变为权重值2，则相应的，卷积层2中的操作类型1对应的权重值也由权重值1更新为权重值2。通过上述方式，使得卷积层1对应的子搜索空间与卷积层2对应的子搜索空间相同，由于操作类型对应的权重值共享（为了方便描述，本申请实施例可以将上述方式描述为权重值共享的方式），减少了训练过程中的训练参数量。具体可以参照图13b示出的那样。
需要说明的是,图13b中的结构搜索过程仅为一种示意,实际应用中,卷积层1和卷积层2可以为直连的卷积层或者是非直连的卷积层,本申请并不限定。
本申请实施例中，在进行结构搜索时，若同一个网络结构包括多个卷积层需要进行结构搜索时，也可以基于各个卷积层的权重值来确定对应的子搜索空间。具体的，若对于卷积层1而言，当完成迭代后其加权求和的方式为：$y=\alpha_{1}y_{1}+\alpha_{2}y_{2}+\cdots+\alpha_{N}y_{N}$；对于和卷积层1属于同一个网络结构的卷积层2而言，当完成迭代后其加权求和的方式为：$Y=\beta_{1}Y_{1}+\beta_{2}Y_{2}+\cdots+\beta_{N}Y_{N}$；其中，$\alpha_{1}$和$\beta_{1}$为对应于操作类型1的权重值，可以将$\alpha_{1}$和$\beta_{1}$的加和作为卷积层1和卷积层2共同对应的操作类型1的权重值，对于其他操作类型也可以进行上述处理。可以确定加和最大的M个权重值对应的操作类型作为卷积层1和卷积层2对应的子搜索空间（以下实施例可以将该方式称为权重值加和的方式）。通过上述方法，使得卷积层1对应的子搜索空间与卷积层2对应的子搜索空间相同。
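对于上述权重值加和的方式，其选取过程可以用如下最小示意表示（alpha1、alpha2为假设的两个卷积层完成迭代后的权重值向量）：

```python
import torch

def shared_sub_space(alpha1, alpha2, op_names, M):
    # 同一操作类型在两个卷积层上的权重值逐项相加，取加和最大的M个操作
    total = alpha1 + alpha2
    return [op_names[i] for i in total.topk(M).indices.tolist()]

op_names = ["1x1_conv", "3x3_conv", "3x3_dil_conv", "3x3_sep_conv",
            "5x5_sep_conv", "3x3_avg_pool", "3x3_max_pool", "skip"]
a1 = torch.rand(8)  # 卷积层1完成迭代后的α
a2 = torch.rand(8)  # 卷积层2完成迭代后的β
print(shared_sub_space(a1, a2, op_names, M=4))  # 两层共同的子搜索空间
```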
二、确定各个网络结构(主干网络、FPN以及header)对应的子搜索空间
本申请实施例中，可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值，并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间，其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力；获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述FPN的第二权重值，并根据所述对应于所述FPN的第二权重值从所述目标搜索空间中确定所述FPN对应的第二子搜索空间；获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
本申请实施例中,可以通过将各个网络结构包括的卷积层进行操作类型的权重值共享,或者通过对应的权重值的加和的方式来确定各个网络结构对应的子搜索空间,具体技术细节可以参照上述实施例中的描述,这里不再赘述。
综上,本申请实施例中,可以但不限定于包括如下子搜索空间的确定方式:
1、同一个网络结构包括的卷积层之间不共享操作类型的权重值,且各个卷积层对应的子搜索空间不同,即同一个网络结构对应于多个子搜索空间;
2、同一个网络结构包括的卷积层之间不共享操作类型的权重值,但是通过权重值加和的方式,使得同一个网络结构包括的部分卷积层对应的子搜索空间相同,同一个网络结构对应于多个子搜索空间;
3、同一个网络结构包括的卷积层之间不共享操作类型的权重值,但是通过权重值加和的方式,使得同一个网络结构包括的全部卷积层对应的子搜索空间相同,同一个网络结构对应于同一个子搜索空间;
4、同一个网络结构包括的部分卷积层之间共享权重值,使得同一个网络结构包括的部分卷积层对应的子搜索空间相同,同一个网络结构对应于多个子搜索空间;
5、同一个网络结构包括的部分卷积层之间共享权重值,但是通过权重值加和的方式,使得同一个网络结构包括的全部卷积层对应的子搜索空间相同,同一个网络结构对应于同一个子搜索空间;
6、同一个网络结构包括的全部卷积层之间共享权重值,同一个网络结构包括的全部卷积层对应的子搜索空间相同,同一个网络结构对应于同一个子搜索空间。
本申请实施例中,在得到各个卷积层或者网络结构对应的子搜索空间后,可以基于对应的子搜索空间进行结构搜索。
本申请实施例中，所述方法还可以包括：接收端侧的第一模型指标，所述第一模型指标和所述第二模型指标至少包括如下的一种：模型计算量FLOPs或模型参数量Parameters；根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索，得到搜索后的感知模型，其中，所述预设损失与所述模型指标损失有关，所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异。
本申请实施例中,如果用户给定了需要的计算量或参数量s,那么结构搜索时,对应的损失函数可以为:
$$\mathcal{L}=f(\alpha,\beta,\gamma)+\lambda\left(C(\alpha,\beta,\gamma)-s\right)^{2}$$
其中,惩罚项可以促使搜索的模型计算量或参数量和用户指定的模型计算量FLOPs或模型参数量Parameters之间误差较小。
本申请实施例中,还可以向所述端侧发送所述搜索后的感知模型。
本申请实施例中，还可以对所述搜索后的感知模型进行权重训练，得到训练后的感知模型，并向所述终端设备发送所述训练后的感知模型。
本申请实施例中在COCO数据集上进行结构搜索得到的感知模型，跟已有的方法相比，在同等参数量的情况下，本申请实施例能够得到性能更优的感知模型。可以看到，相比于只搜索主干网络backbone的DetNAS-1.3G方法、只搜索FPN的NAS-FPN、只搜索FPN和header的Auto-FPN，本申请实施例的结构搜索方法达到了更高的mAP。具体可以参照表1：
表1 本申请在COCO数据集上的结果统计
Table 2. Comparison of the number of parameters, FLOPs and mAP on COCO minival. The FLOPs is based on the 800×800 input and 1000 proposals in region proposal network. † means the 2x schedule in training.
（具体结果表格为图像，此处从略）
本申请提供了一种感知网络结构搜索方法,包括:获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式,对待搜索感知网络中的三个网络结构(主干网络、FPN以及header包括的部分和全部卷积层)都进行了结构搜索,使得结构搜索后得到的感知网络的性能更优。
参照图14,图14为本申请实施例提供的一种感知网络结构搜索方法的流程示意,如图14中示出的那样,本申请实施例提供的一种感知网络结构搜索方法包括:
1401、获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型。
参照图2b,和上述图9对应的实施例不同的是,本申请实施例中的待搜索感知网络的架构中不包括FPN,主干网络与所述header连接,其余技术细节可以参照图9对应的实施例中的描述,这里不再赘述。
1402、在所述目标搜索空间内对所述第一卷积层进行结构搜索，在所述目标搜索空间内对所述第三卷积层进行结构搜索，得到搜索后的感知网络，其中，所述搜索后的感知网络包括的第一卷积层对应于第一操作类型，所述搜索后的感知网络包括的第三卷积层对应于第三操作类型，所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
关于如何根据在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,可以参照上述图9对应的实施例中的描述,这里不再赘述。
可选地,可以获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间为所述目标搜索空间的子集;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个。
可选地,可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
可选地,可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值,并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间,其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力。
可选地,可以获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间为所述目标搜索空间的子集;
在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
可选地,可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值,并根据所述对应于所述第三卷积层的第一权重值,从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
可选地，可以获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
可选地,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,可以在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
可选地,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,可以在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
可选地，可以接收端侧的第一模型指标；根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索，直到所述预设损失满足预设条件，得到所述搜索后的感知模型，其中，所述预设损失与模型指标损失有关，所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异，所述第一模型指标和所述第二模型指标至少包括如下的一种：模型计算量FLOPs或模型参数量Parameters。
可选地,可以向所述端侧发送所述搜索后的感知模型。
可选地,可以对所述搜索后的感知模型进行权重训练,得到训练后的感知模型;向所述终端设备发送所述训练后的感知模型。
本申请提供了一种感知网络结构搜索方法,包括:获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式,对待搜索感知网络中的两个网络结构(主干网络以及header包括的部分和全部卷积层)都进行了结构搜索,使得结构搜索后得到的感知网络的性能更优。
参照图15,图15为本申请实施例提供的一种图像处理方法的流程示意,如图15中示出的那样,本申请实施例提供的一种图像处理方法包括:
1501、获取目标图像。
1502、通过感知网络对所述目标图像进行目标检测,得到检测结果;
其中,所述感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,所述第二卷积层对应于第二操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;或,
所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;
其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
本申请实施例中，可以利用通过图9或者图14对应的感知网络结构搜索方法得到的感知网络（并进行了权重值训练）对目标图像进行目标检测，得到检测结果。
可选地,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,
所述第二卷积层对应的搜索空间为第二子搜索空间;或,
所述第三卷积层对应的搜索空间为第三子搜索空间;
其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
本申请提供了一种图像处理方法，包括：获取目标图像；通过感知网络对所述目标图像进行目标检测，得到检测结果；其中，所述感知网络包括主干网络、特征金字塔网络FPN和头端header，所述主干网络与所述FPN连接，所述FPN与所述header连接，所述主干网络包括第一卷积层，所述FPN包括第二卷积层，所述header包括第三卷积层，所述第一卷积层对应于第一操作类型，所述第二卷积层对应于第二操作类型，第三卷积层对应于第三操作类型，所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的；或，所述感知网络包括主干网络和头端header，所述主干网络与所述header连接，所述主干网络包括第一卷积层，所述header包括第三卷积层，所述第一卷积层对应于第一操作类型，第三卷积层对应于第三操作类型，所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的；其中，所述目标搜索空间包括多个操作类型，所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式，对待搜索感知网络中的两个网络结构（主干网络以及header包括的部分和全部卷积层）或者三个网络结构（主干网络、FPN以及header包括的部分和全部卷积层）都进行了结构搜索，使得结构搜索后得到的感知网络的性能更优。
参照图16,图16为本申请实施例提供的一种感知网络结构搜索方法的流程示意,如图16中示出的那样,本申请实施例提供的一种感知网络结构搜索方法包括:
1601、接收端侧的第一模型指标;
1602、根据所述第一模型指标以及预设损失对待搜索感知网络进行结构搜索，得到搜索后的感知模型，其中，所述预设损失与模型指标损失有关，所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异，所述第一模型指标和所述第二模型指标至少包括如下的一种：模型计算量FLOPs或模型参数量Parameters。
本申请实施例中,如果用户给定了需要的计算量或参数量s,那么结构搜索时,对应的损失函数可以为:
$$\mathcal{L}=f(\alpha,\beta,\gamma)+\lambda\left(C(\alpha,\beta,\gamma)-s\right)^{2}$$
其中,惩罚项可以促使搜索的模型计算量或参数量和用户指定的模型计算量FLOPs或模型参数量Parameters之间误差较小。
可选地,还可以向所述端侧发送所述搜索后的感知模型。
可选地，还可以对所述搜索后的感知模型进行权重训练，得到训练后的感知模型，向所述终端设备发送所述训练后的感知模型。
在图1至图16所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图17,图17为本申请实施例提供的感知网络结构搜索装置的一种结构示意图,感知网络结构搜索装置可以是服务器,感知网络结构搜索装置包括:
获取模块1701,用于获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;
结构搜索模块1702,用于在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,所述结构搜索模块,具体用于:
获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
获取所述第二卷积层对应的第二子搜索空间,所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第二子搜索空间内对所述第二卷积层进行结构搜索,得到所述第二卷积层对应的第二操作类型,所述第二操作类型为所述第二子搜索空间包括的操作类型中的一个;和/或,
获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值,并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所 述主干网络对应的第一子搜索空间,其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第二卷积层的第一权重值，并根据所述对应于所述第二卷积层的第一权重值，从所述目标搜索空间中得到所述第二卷积层对应的第二子搜索空间。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述FPN的第二权重值，并根据所述对应于所述FPN的第二权重值从所述目标搜索空间中确定所述FPN对应的第二子搜索空间。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值,并根据所述对应于所述第三卷积层的第一权重值,从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
可选地,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型、所述第二卷积层对应的第二操作类型和所述第三卷积层对应的第三操作类型。
可选地,所述FPN包括多个卷积层,所述第二卷积层为所述FPN包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述FPN包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
可选地,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型、所述第二卷积层对应的第二操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
可选地,所述装置还包括:
接收模块,用于接收端侧的第一模型指标;
所述结构搜索模块,具体用于:
根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索，直到所述预设损失满足预设条件，得到所述搜索后的感知模型，其中，所述预设损失与模型指标损失有关，所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异，所述第一模型指标和所述第二模型指标至少包括如下的一种：模型计算量FLOPs或模型参数量Parameters。
可选地,所述装置还包括:
发送模块,用于向所述端侧发送所述搜索后的感知模型。
可选地,所述装置还包括:
权重训练模块,用于对所述搜索后的感知模型进行权重训练,得到训练后的感知模型;
所述发送模块,还用于向所述终端设备发送所述训练后的感知模型。
本申请提供了一种感知网络结构搜索装置,获取模块1701获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;结构搜索模块1702在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式,对待搜索感知网络中的三个网络结构(主干网络、FPN以及header包括的部分和全部卷积层)都进行了结构搜索,使得结构搜索后得到的感知网络的性能更优。
参照图17,本申请实施例提供的一种感知网络结构搜索装置,包括:
获取模块1701,用于获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;
结构搜索模块1702,用于在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,所述结构搜索模块,具体用于:
获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索, 得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值,并根据所述对应于所述主干网络的第二权重值从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间,其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力。
可选地,所述结构搜索模块,具体用于:
获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
可选地,所述主干网络包括多个卷积层,所述第一卷积层为所述主干网络包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,确定所述主干网络包括的多个卷积层中每个卷积层对应的操作类型和所述第三卷积层对应的第三操作类型。
可选地,所述header包括多个卷积层,所述第三卷积层为所述header包括的多个卷积层中的一个,所述结构搜索模块,具体用于:
在所述目标搜索空间内对所述待搜索感知网络进行结构搜索,得到所述第一卷积层对应的第一操作类型和所述header包括的多个卷积层中每个卷积层对应的操作类型。
可选地,所述装置还包括:
接收模块,用于接收端侧的第一模型指标;
所述结构搜索模块,具体用于:
根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索,直到所述预设损失满足预设条件,得到所述搜索后的感知模型,其中,所述预设损失与模型指标损失有关,所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异,所述第一模型指标和所述第二模型指标至少包括如下的一种:模型计算量FLOPs或模型参数量Parameters。
可选地,所述装置还包括:
发送模块,用于向所述端侧发送所述搜索后的感知模型。
可选地,所述装置还包括:
训练模块,用于对所述搜索后的感知模型进行权重训练,得到训练后的感知模型;
所述发送模块,还用于向所述终端设备发送所述训练后的感知模型。
本申请提供了一种感知网络结构搜索装置,获取模块1701获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作类型;结构搜索模块1702在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式,对待搜索感知网络中的两个网络结构(主干网络以及header包括的部分和全部卷积层)都进行了结构搜索,使得结构搜索后得到的感知网络的性能更优。
参阅图18,图18为本申请实施例提供的图像处理装置的一种结构示意图,图像处理装置可以是终端设备或者服务器,图像处理装置包括:
获取模块1801,用于获取目标图像;
目标检测模块1802,用于通过感知网络对所述目标图像进行目标检测,得到检测结果;
其中,所述感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,所述第二卷积层对应于第二操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;或,
所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;
其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,
所述第二卷积层对应的搜索空间为第二子搜索空间;或,
所述第三卷积层对应的搜索空间为第三子搜索空间;
其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
本申请实施例提供了一种图像处理装置，获取模块1801获取目标图像；目标检测模块1802通过感知网络对所述目标图像进行目标检测，得到检测结果；其中，所述感知网络包括主干网络、特征金字塔网络FPN和头端header，所述主干网络与所述FPN连接，所述FPN与所述header连接，所述主干网络包括第一卷积层，所述FPN包括第二卷积层，所述header包括第三卷积层，所述第一卷积层对应于第一操作类型，所述第二卷积层对应于第二操作类型，第三卷积层对应于第三操作类型，所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的；或所述感知网络包括主干网络和头端header，所述主干网络与所述header连接，所述主干网络包括第一卷积层，所述header包括第三卷积层，所述第一卷积层对应于第一操作类型，第三卷积层对应于第三操作类型，所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的；其中，所述目标搜索空间包括多个操作类型，所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。通过上述方式，对待搜索感知网络中的两个网络结构（主干网络以及header包括的部分和全部卷积层）或者三个网络结构（主干网络、FPN以及header包括的部分和全部卷积层）都进行了结构搜索，使得结构搜索后得到的感知网络的性能更优。
接下来介绍本申请实施例提供的一种执行设备，请参阅图19，图19为本申请实施例提供的执行设备的一种结构示意图，执行设备1900具体可以表现为虚拟现实VR设备、手机、平板、笔记本电脑、智能穿戴设备、监控数据处理设备或服务器等，此处不做限定。具体的，执行设备1900包括：接收器1901、发射器1902、处理器1903和存储器1904（其中执行设备1900中的处理器1903的数量可以是一个或多个，图19中以一个处理器为例），其中，处理器1903可以包括应用处理器19031和通信处理器19032。在本申请的一些实施例中，接收器1901、发射器1902、处理器1903和存储器1904可通过总线或其它方式连接。
存储器1904可以包括只读存储器和随机存取存储器,并向处理器1903提供指令和数据。存储器1904的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1904存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1903控制执行设备的操作。具体的应用中,执行设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1903中，或者由处理器1903实现。处理器1903可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器1903中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1903可以是通用处理器、数字信号处理器（digital signal processing，DSP）、微处理器或微控制器，还可进一步包括专用集成电路（application specific integrated circuit，ASIC）、现场可编程门阵列（field-programmable gate array，FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1903可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器，闪存、只读存储器，可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1904，处理器1903读取存储器1904中的信息，结合其硬件完成上述方法的步骤。
接收器1901可用于接收输入的数字或字符信息,以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1902可用于通过第一接口输出数字或字符信息;发射器1902还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1902还可以包括显示屏等显示设备。
本申请实施例中,在一种情况下,处理器1903,用于获取目标图像,通过感知网络对所述目标图像进行目标检测,得到检测结果;其中,所述感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,所述第二卷积层对应于第二操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;或所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
可选地,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,所述第二卷积层对应的搜索空间为第二子搜索空间;或,所述第三卷积层对应的搜索空间为第三子搜索空间;其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
本申请实施例还提供了一种训练设备，请参阅图20，图20是本申请实施例提供的训练设备一种结构示意图，训练设备2000上可以部署有图17对应实施例中所描述的感知网络结构搜索装置，用于实现图17对应实施例中感知网络结构搜索装置的功能，具体的，训练设备2000由一个或多个服务器实现，训练设备2000可因配置或性能不同而产生比较大的差异，可以包括一个或一个以上中央处理器（central processing units，CPU）2020（例如，一个或一个以上处理器）和存储器2032，一个或一个以上存储应用程序2042或数据2044的存储介质2030（例如一个或一个以上海量存储设备）。其中，存储器2032和存储介质2030可以是短暂存储或持久存储。存储在存储介质2030的程序可以包括一个或一个以上模块（图示没标出），每个模块可以包括对训练设备中的一系列指令操作。更进一步地，中央处理器2020可以设置为与存储介质2030通信，在训练设备2000上执行存储介质2030中的一系列指令操作。
训练设备2000还可以包括一个或一个以上电源2026,一个或一个以上有线或无线网络接口2050,一个或一个以上输入输出接口2058;或,一个或一个以上操作系统2041,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器2020,用于执行获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作operation类型;在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
本申请实施例中还提供一种包括计算机程序产品,当其在计算机上运行时,使得计算机执行如前述执行设备所执行的步骤,或者,使得计算机执行如前述训练设备所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述执行设备所执行的步骤,或者,使得计算机执行如前述训练设备所执行的步骤。
本申请实施例提供的执行设备、训练设备或终端设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使执行设备内的芯片执行上述实施例描述的数据处理方法,或者,以使训练设备内的芯片执行上述实施例描述的数据处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图21,图21为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 2100,NPU 2100作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2103,通过控制器2104控制运算电路2103提取存储器中的矩阵数据并进行乘法运算。
在一些实现中，运算电路2103内部包括多个处理单元（Process Engine，PE）。在一些实现中，运算电路2103是二维脉动阵列。运算电路2103还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中，运算电路2103是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2102中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2101中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2108中。
统一存储器2106用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器（Direct Memory Access Controller，DMAC）2105被搬运到权重存储器2102中。输入数据也通过DMAC被搬运到统一存储器2106中。
BIU为Bus Interface Unit，即总线接口单元2110，用于AXI总线与DMAC和取指存储器（Instruction Fetch Buffer，IFB）2109的交互。
总线接口单元2110(Bus Interface Unit,简称BIU),用于取指存储器2109从外部存储器获取指令,还用于存储单元访问控制器2105从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器2106或将权重数据搬运到权重存储器2102中或将输入数据搬运到输入存储器2101中。
向量计算单元2107包括多个运算处理单元,在需要的情况下,对运算电路2103的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。
在一些实现中，向量计算单元2107能将经处理的输出的向量存储到统一存储器2106。例如，向量计算单元2107可以将线性函数或非线性函数应用到运算电路2103的输出，例如对卷积层提取的特征平面进行线性插值，再例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元2107生成归一化的值、像素级求和的值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路2103的激活输入，例如用于在神经网络中的后续层中的使用。
控制器2104连接的取指存储器(instruction fetch buffer)2109,用于存储控制器2104使用的指令;
统一存储器2106,输入存储器2101,权重存储器2102以及取指存储器2109均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述程序执行的集成电路。
另外需说明的是，以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外，本申请提供的装置实施例附图中，模块之间的连接关系表示它们之间具有通信连接，具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (30)

  1. 一种感知网络结构搜索方法,其特征在于,包括:
    获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作(operation)类型;
    在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述多个操作类型包括所述第一操作类型、所述第二操作类型和所述第三操作类型。
  2. 根据权利要求1所述的方法,其特征在于,所述在所述目标搜索空间内对所述第一卷积层进行结构搜索,包括:获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
    所述在所述目标搜索空间内对所述第二卷积层进行结构搜索,包括:获取所述第二卷积层对应的第二子搜索空间,所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第二子搜索空间内对所述第二卷积层进行结构搜索,得到所述第二卷积层对应的第二操作类型,所述第二操作类型为所述第二子搜索空间包括的操作类型中的一个;和/或,
    所述在所述目标搜索空间内对所述第三卷积层进行结构搜索,包括:获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
  3. 根据权利要求2所述的方法,其特征在于,所述获取所述第一卷积层对应的第一子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
  4. 根据权利要求2所述的方法，其特征在于，所述获取所述第一卷积层对应的第一子搜索空间，包括：
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值,其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述主干网络的第二权重值,从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间。
  5. 根据权利要求2至4任一所述的方法,其特征在于,所述获取所述第二卷积层对应的第二子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第二卷积层的第一权重值，并根据所述对应于所述第二卷积层的第一权重值，从所述目标搜索空间中得到所述第二卷积层对应的第二子搜索空间。
  6. 根据权利要求2至4任一所述的方法,其特征在于,所述获取所述第二卷积层对应的第二子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述FPN的第二权重值，其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力，并根据所述对应于所述FPN的第二权重值从所述目标搜索空间中确定所述FPN对应的第二子搜索空间。
  7. 根据权利要求2至6任一所述的方法,其特征在于,所述获取所述第三卷积层对应的第三子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值,并根据所述对应于所述第三卷积层的第一权重值,从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
  8. 根据权利要求2至6任一所述的方法,其特征在于,所述获取所述第三卷积层对应的第三子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述header的第二权重值，并根据所述对应于所述header的第二权重值，从所述目标搜索空间中确定所述header对应的第三子搜索空间。
  9. 根据权利要求1至8任一所述的方法,其特征在于,所述方法还包括:
    接收端侧的第一模型指标;
    所述在所述目标搜索空间内对所述待搜索感知网络进行结构搜索，包括：根据所述第一模型指标以及预设损失对所述待搜索感知网络进行结构搜索，直到所述预设损失满足预设条件，得到所述搜索后的感知模型，其中，所述预设损失与模型指标损失有关，所述模型指标损失指示所述待搜索感知网络的第二模型指标与所述第一模型指标之间的差异，所述第一模型指标和所述第二模型指标至少包括如下的一种：模型计算量FLOPs或模型参数量Parameters。
  10. 一种感知网络结构搜索方法,其特征在于,包括:
    获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作(operation)类型;
    在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述第一操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
  11. 根据权利要求10所述的方法,其特征在于,所述在所述目标搜索空间内对所述第一卷积层进行结构搜索,包括:获取所述第一卷积层对应的第一子搜索空间,所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型;在所述第一子搜索空间内对所述第一卷积层进行结构搜索,得到所述第一卷积层对应的第一操作类型,所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个;和/或,
    所述在所述目标搜索空间内对所述第三卷积层进行结构搜索,包括:获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
  12. 根据权利要求11所述的方法,其特征在于,所述获取所述第一卷积层对应的第一子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
  13. 根据权利要求11或12所述的方法,其特征在于,所述获取所述第三卷积层对应的第三子搜索空间,包括:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第三卷积层的第一权重值,并根据所述对应于所述第三卷积层的第一权重值,从所述目标搜索空间中得到所述第三卷积层对应的第三子搜索空间。
  14. 一种图像处理方法,其特征在于,所述方法包括:
    获取目标图像;
    通过感知网络对所述目标图像进行目标检测,得到检测结果;
    其中,所述感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,所述第二卷积层对应于第二操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系、所述第二卷积层与所述第二操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;或,
    所述感知网络包括主干网络和头端header,所述主干网络与所述header连接,所述主干网络包括第一卷积层,所述header包括第三卷积层,所述第一卷积层对应于第一操作类型,第三卷积层对应于第三操作类型,所述第一卷积层与第一操作类型的对应关系以及所述第三卷积层与所述第三操作类型的对应关系为基于在所述目标搜索空间内对待搜索感知网络进行结构搜索得到的;
    其中,所述目标搜索空间包括多个操作类型,所述第一操作类型、所述第二操作类型和所述第三操作类型为所述多个操作类型中的操作类型。
  15. 根据权利要求14所述的方法,其特征在于,在进行结构搜索时,所述第一卷积层对应的搜索空间为第一子搜索空间;或,
    所述第二卷积层对应的搜索空间为第二子搜索空间;或,
    所述第三卷积层对应的搜索空间为第三子搜索空间;
    其中,所述第一子搜索空间和所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型。
  16. 一种感知网络结构搜索装置,其特征在于,包括:
    获取模块,用于获取待搜索感知网络和目标搜索空间,所述待搜索感知网络包括主干网络、特征金字塔网络FPN和头端header,所述主干网络与所述FPN连接,所述FPN与所述header连接,所述主干网络包括第一卷积层,所述FPN包括第二卷积层,所述header包括第三卷积层,所述目标搜索空间包括多个操作operation类型;
    结构搜索模块,用于在所述目标搜索空间内对所述第一卷积层进行结构搜索,在所述目标搜索空间内对所述第二卷积层进行结构搜索,在所述目标搜索空间内对所述第三卷积层进行结构搜索,得到搜索后的感知网络,其中,所述搜索后的感知网络包括的第一卷积层对应于第一操作类型,所述搜索后的感知网络包括的第二卷积层对应于第二操作类型,所述搜索后的感知网络包括的第三卷积层对应于第三操作类型,所述多个操作类型包括所述第一操作类型、所述第二操作类型和所述第三操作类型。
  17. 根据权利要求16所述的装置,其特征在于,所述结构搜索模块,具体用于:
    获取所述第一卷积层对应的第一子搜索空间，所述第一子搜索空间包括所述多个操作类型中的部分或全部操作类型；在所述第一子搜索空间内对所述第一卷积层进行结构搜索，得到所述第一卷积层对应的第一操作类型，所述第一操作类型为所述第一子搜索空间包括的操作类型中的一个；和/或，
    获取所述第二卷积层对应的第二子搜索空间,所述第二子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第二子搜索空间内对所述第二卷积层进行结构搜索,得到所述第二卷积层对应的第二操作类型,所述第二操作类型为所述第二子搜索空间包括的操作类型中的一个;和/或,
    获取所述第三卷积层对应的第三子搜索空间,所述第三子搜索空间包括所述多个操作类型中的部分或全部操作类型,在所述第三子搜索空间内对所述第三卷积层进行结构搜索,得到所述第三卷积层对应的第三操作类型,所述第三操作类型为所述第三子搜索空间包括的操作类型中的一个。
  18. 根据权利要求17所述的装置,其特征在于,所述结构搜索模块,具体用于:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述第一卷积层的第一权重值,所述第一权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述第一卷积层的第一权重值,从所述目标搜索空间中获取所述第一卷积层对应的第一子搜索空间。
  19. 根据权利要求17所述的装置,其特征在于,所述结构搜索模块,具体用于:
    获取所述目标搜索空间包括的多个操作类型中的每个操作类型对应于所述主干网络的第二权重值,其中所述第二权重值表示操作类型对于所述待搜索感知网络的输出结果的影响能力,并根据所述对应于所述主干网络的第二权重值,从所述目标搜索空间中确定所述主干网络对应的第一子搜索空间。
  20. The apparatus according to any one of claims 17 to 19, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a second weight value corresponding to the second convolutional layer, and obtain, from the target search space according to the second weight values corresponding to the second convolutional layer, the second sub-search space corresponding to the second convolutional layer.
  21. The apparatus according to any one of claims 17 to 19, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a second weight value corresponding to the FPN, wherein the second weight value indicates how strongly an operation type influences the output result of the to-be-searched perception network, and determine, from the target search space according to the second weight values corresponding to the FPN, the second sub-search space corresponding to the FPN.
  22. The apparatus according to any one of claims 17 to 21, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a first weight value corresponding to the third convolutional layer, and obtain, from the target search space according to the first weight values corresponding to the third convolutional layer, the third sub-search space corresponding to the third convolutional layer.
  23. The apparatus according to any one of claims 17 to 21, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a second weight value corresponding to the header, and determine, from the target search space according to the second weight values corresponding to the header, the third sub-search space corresponding to the header.
  24. The apparatus according to any one of claims 16 to 23, wherein the apparatus further comprises:
    a receiving module, configured to receive a first model metric from a device side; and
    the structure search module is specifically configured to:
    perform structure search on the to-be-searched perception network according to the first model metric and a preset loss until the preset loss satisfies a preset condition, to obtain the searched perception network, wherein the preset loss is related to a model-metric loss, the model-metric loss indicates a difference between a second model metric of the to-be-searched perception network and the first model metric, and the first model metric and the second model metric each comprise at least one of the following: a model computation amount (FLOPs) or a model parameter quantity (Parameters).
  25. A perception network architecture search apparatus, comprising:
    an obtaining module, configured to obtain a to-be-searched perception network and a target search space, wherein the to-be-searched perception network comprises a backbone network and a header, the backbone network is connected to the header, the backbone network comprises a first convolutional layer, the header comprises a third convolutional layer, and the target search space comprises a plurality of operation types; and
    a structure search module, configured to perform structure search on the first convolutional layer within the target search space and perform structure search on the third convolutional layer within the target search space, to obtain a searched perception network, wherein the first convolutional layer included in the searched perception network corresponds to a first operation type, the third convolutional layer included in the searched perception network corresponds to a third operation type, and the first operation type and the third operation type are operation types among the plurality of operation types.
  26. The apparatus according to claim 25, wherein the structure search module is specifically configured to:
    obtain a first sub-search space corresponding to the first convolutional layer, wherein the first sub-search space comprises some or all of the plurality of operation types, and perform structure search on the first convolutional layer within the first sub-search space, to obtain the first operation type corresponding to the first convolutional layer, wherein the first operation type is one of the operation types included in the first sub-search space; and/or
    obtain a third sub-search space corresponding to the third convolutional layer, wherein the third sub-search space comprises some or all of the plurality of operation types, and perform structure search on the third convolutional layer within the third sub-search space, to obtain the third operation type corresponding to the third convolutional layer, wherein the third operation type is one of the operation types included in the third sub-search space.
  27. The apparatus according to claim 26, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a first weight value corresponding to the first convolutional layer, wherein the first weight value indicates how strongly an operation type influences the output result of the to-be-searched perception network, and obtain, from the target search space according to the first weight values corresponding to the first convolutional layer, the first sub-search space corresponding to the first convolutional layer.
  28. The apparatus according to claim 26 or 27, wherein the structure search module is specifically configured to:
    obtain, for each operation type of the plurality of operation types included in the target search space, a first weight value corresponding to the third convolutional layer, and obtain, from the target search space according to the first weight values corresponding to the third convolutional layer, the third sub-search space corresponding to the third convolutional layer.
  29. An image processing apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain a target image; and
    a target detection module, configured to perform target detection on the target image through a perception network, to obtain a detection result;
    wherein the perception network comprises a backbone network, a feature pyramid network (FPN), and a header, the backbone network is connected to the FPN, the FPN is connected to the header, the backbone network comprises a first convolutional layer, the FPN comprises a second convolutional layer, the header comprises a third convolutional layer, the first convolutional layer corresponds to a first operation type, the second convolutional layer corresponds to a second operation type, the third convolutional layer corresponds to a third operation type, and the correspondence between the first convolutional layer and the first operation type, the correspondence between the second convolutional layer and the second operation type, and the correspondence between the third convolutional layer and the third operation type are obtained by performing structure search on a to-be-searched perception network within a target search space; or
    the perception network comprises a backbone network and a header, the backbone network is connected to the header, the backbone network comprises a first convolutional layer, the header comprises a third convolutional layer, the first convolutional layer corresponds to a first operation type, the third convolutional layer corresponds to a third operation type, and the correspondence between the first convolutional layer and the first operation type and the correspondence between the third convolutional layer and the third operation type are obtained by performing structure search on a to-be-searched perception network within a target search space;
    wherein the target search space comprises the first operation type, the second operation type, and the third operation type.
  30. The apparatus according to claim 29, wherein, when the structure search is performed, the search space corresponding to the first convolutional layer is a first sub-search space; or
    the search space corresponding to the second convolutional layer is a second sub-search space; or
    the search space corresponding to the third convolutional layer is a third sub-search space;
    wherein the first sub-search space and the second sub-search space comprise some or all of the operation types included in the target search space.
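Putting the method and apparatus claims together, one structure-search update could look like the following sketch, assuming PyTorch and a hypothetical supernet object that mixes candidate operations by their architecture weights and exposes an expected_flops() helper; both names, and the use of cross-entropy as the task loss, are assumptions rather than features recited above:

    import torch
    import torch.nn.functional as F

    def search_step(supernet, optimizer, images, labels, flops_target, lam=0.1):
        # Forward pass through the weight-mixed candidate operations.
        logits = supernet(images)
        task_loss = F.cross_entropy(logits, labels)
        # Differentiable FLOPs estimate of the current architecture
        # (assumed helper returning a tensor).
        flops = supernet.expected_flops()
        metric_loss = torch.abs(flops - flops_target) / flops_target
        loss = task_loss + lam * metric_loss  # preset loss as in claims 9 and 24
        optimizer.zero_grad()
        loss.backward()                       # updates the architecture weights
        optimizer.step()
        return loss.item()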
PCT/CN2021/076984, priority date 2020-02-21, filing date 2021-02-20: Perception network architecture search method and apparatus, WO2021164751A1 (zh)

Priority Applications (1)

EP21757765.9A (EP4109343A4, en), priority date 2020-02-21, filing date 2021-02-20: PERCEPTION NETWORK ARCHITECTURE SEARCH METHOD AND DEVICE

Applications Claiming Priority (2)

CN202010109254.2, priority date 2020-02-21
CN202010109254.2A (CN111401517B, zh), priority date 2020-02-21, filing date 2020-02-21: Perception network architecture search method and apparatus

Publications (1)

WO2021164751A1, published 2021-08-26

Family ID: 71436308

Family Applications (1)

PCT/CN2021/076984 (WO2021164751A1, zh), priority date 2020-02-21, filing date 2021-02-20: Perception network architecture search method and apparatus

Country Status (3)

EP (1): EP4109343A4
CN (1): CN111401517B
WO (1): WO2021164751A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
CN111401517B (zh) 2020-02-21 2023-11-03 华为技术有限公司: Perception network architecture search method and apparatus
CN112215073A (zh) 2020-09-10 2021-01-12 华蓝设计(集团)有限公司: Rapid traffic-marking recognition and tracking method for high-speed motion scenarios
CN112528123A (zh) 2020-12-18 2021-03-19 北京百度网讯科技有限公司: Model search method and apparatus, electronic device, storage medium, and program product
CN112766282B (zh) 2021-01-18 2024-04-12 上海明略人工智能(集团)有限公司: Image recognition method, apparatus, device, and computer-readable medium
CN113705320A (zh) 2021-05-24 2021-11-26 中国科学院深圳先进技术研究院: Training method, medium, and device for a surgical action recognition model
CN113657388B (zh) 2021-07-09 2023-10-31 北京科技大学: Image semantic segmentation method fusing image super-resolution reconstruction
CN115063635A (zh) 2022-06-23 2022-09-16 澜途集思生态科技集团有限公司: Ecological organism recognition method based on the DetNAS algorithm
CN117132761A (zh) 2023-08-25 2023-11-28 京东方科技集团股份有限公司: Target detection method and apparatus, storage medium, and electronic device


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
US11250314B2 (en) * 2017-10-27 2022-02-15 Cognizant Technology Solutions U.S. Corporation: Beyond shared hierarchies: deep multitask learning through soft layer ordering
US20190188285A1 (en) * 2017-12-19 2019-06-20 Facebook, Inc.: Image Search with Embedding-based Models on Online Social Networks
CN108694401B (zh) * 2018-05-09 2021-01-12 北京旷视科技有限公司: Target detection method, apparatus, and system
WO2019232099A1 (en) * 2018-05-29 2019-12-05 Google Llc: Neural architecture search for dense image prediction tasks
US10304009B1 (en) * 2018-10-08 2019-05-28 StradVision, Inc.: Learning method and testing method for object detector based on R-CNN, and learning device and testing device using the same
CN110119462B (zh) * 2019-04-03 2021-07-23 杭州中科先进技术研究院有限公司: Community search method for attribute networks
CN110427827A (zh) * 2019-07-08 2019-11-08 辽宁工程技术大学: Autonomous driving network with multi-scale perception and global planning
CN110569972A (zh) * 2019-09-11 2019-12-13 北京百度网讯科技有限公司: Supernetwork search space construction method and apparatus, and electronic device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
US20190286984A1 (en) * 2018-03-13 2019-09-19 Google Llc: Neural architecture search by proxy
CN110020667A (zh) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司: Neural network architecture search method, system, storage medium, and device
CN109919304A (zh) * 2019-03-04 2019-06-21 腾讯科技(深圳)有限公司: Neural network search method and apparatus, readable storage medium, and computer device
CN110533180A (zh) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司: Network architecture search method and apparatus, readable storage medium, and electronic device
CN111401517A (zh) * 2020-02-21 2020-07-10 华为技术有限公司: Perception network architecture search method and apparatus

Non-Patent Citations (1)

See also references of EP4109343A4

Also Published As

Publication number Publication date
EP4109343A1 (en) 2022-12-28
EP4109343A4 (en) 2023-11-08
CN111401517A (zh) 2020-07-10
CN111401517B (zh) 2023-11-03

Similar Documents

Publication, title:
WO2021164751A1 (zh): Perception network architecture search method and apparatus
US20220165045A1 (en): Object recognition method and apparatus
WO2020253416A1 (zh): Object detection method and apparatus, and computer storage medium
WO2020221200A1 (zh): Neural network construction method, image processing method, and apparatus
WO2021043112A1 (zh): Image classification method and apparatus
WO2021164750A1 (zh): Convolutional layer quantization method and apparatus
WO2022042713A1 (zh): Deep learning training method and apparatus for a computing device
CN110070107B (zh): Object recognition method and apparatus
WO2021238366A1 (zh): Neural network construction method and apparatus
US20220215227A1 (en): Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
WO2021147325A1 (zh): Object detection method and apparatus, and storage medium
WO2022052601A1 (zh): Neural network model training method, image processing method, and apparatus
WO2021218786A1 (zh): Data processing system, object detection method, and apparatus
WO2021218517A1 (zh): Method for obtaining a neural network model, image processing method, and apparatus
WO2022001805A1 (zh): Neural network distillation method and apparatus
WO2021008206A1 (zh): Neural network architecture search method, image processing method, and apparatus
WO2022007867A1 (zh): Neural network construction method and apparatus
CN113011562A (zh): Model training method and apparatus
CN112464930A (zh): Target detection network construction method, target detection method, apparatus, and storage medium
CN114764856A (zh): Image semantic segmentation method and image semantic segmentation apparatus
WO2023125628A1 (zh): Neural network model optimization method, apparatus, and computing device
WO2021136058A1 (zh): Video processing method and apparatus
US20230401826A1 (en): Perception network and data processing method
CN113065575A (zh): Image processing method and related apparatus
CN116258176A (zh): Data processing method and apparatus

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21757765; country of ref document: EP; kind code of ref document: A1.
NENP: Non-entry into the national phase. Ref country code: DE.
ENP: Entry into the national phase. Ref document number: 2021757765; country of ref document: EP; effective date: 20220921.