US20230064692A1 - Network Space Search for Pareto-Efficient Spaces - Google Patents

Network Space Search for Pareto-Efficient Spaces Download PDF

Info

Publication number
US20230064692A1
Authority
US
United States
Prior art keywords
network
space
spaces
flops
range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/846,007
Inventor
Hao Yun Chen
Min-Hung Chen
Min-Fong Horng
Yu-Syuan Xu
Hsien-Kai Kuo
Yi-Min Tsai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc filed Critical MediaTek Inc
Priority to US17/846,007 (US20230064692A1)
Assigned to MEDIATEK INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHEN, MIN-HUNG, KUO, HSIEN-KAI, CHEN, HAO YUN, HORNG, MIN-FONG, TSAI, YI-MIN, XU, YU-SYUAN
Priority to CN202210799314.7A (CN115713098A)
Priority to TW111126458A (TWI805446B)
Publication of US20230064692A1
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N7/005
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

According to a network space search method, an expanded search space is partitioned into multiple network spaces. Each network space includes a plurality of network architectures and is characterized by a first range of network depths and a second range of network widths. The performance of the network spaces is evaluated by sampling respective network architectures with respect to a multi-objective loss function. The evaluated performance is indicated as a probability associated with each network space. The method then identifies a subset of the network spaces that has the highest probabilities, and selects a target network space from the subset based on model complexity.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Application No. 63/235,221 filed on Aug. 20, 2021, the entirety of which is incorporated by reference herein.
  • TECHNICAL FIELD
  • Embodiments of the invention relate to neural networks; more specifically, to automatic searches for network spaces.
  • BACKGROUND
  • Recent architectural advances in deep convolutional neural networks consider several factors for network designs (e.g., types of convolutions, network depths, filter sizes, etc.), which are combined to form a network space. One can leverage such network spaces to design favorable networks or utilize them as the search spaces for Neural Architecture Search (NAS). In industry, efficiency considerations for architectures are also required for deploying products on various platforms, such as mobile, augmented reality (AR), and virtual reality (VR) devices.
  • Design spaces have lately been demonstrated to be a decisive factor in designing networks. Accordingly, several design principles are proposed to deliver promising networks. However, these design principles are based on human expertise and require extensive experiments for validation. In contrast to handcrafted designs, NAS automatically searches for favorable architectures within a predefined search space. The choice of the search space is a critical factor affecting the performance and efficiency of NAS approaches. It is common to reuse tailored search spaces developed in previous works. However, these approaches ignore the potential of exploring untailored spaces. On the other hand, defining a new, effective search space involves tremendous prior knowledge and/or manual effort. Hence, there is a need for automatic network space discovery.
  • SUMMARY
  • In one embodiment, a method is provided for network space search. The method comprises the step of partitioning an expanded search space into a plurality of network spaces. Each network space includes multiple network architectures and is characterized by a first range of network depths and a second range of network widths. The method further comprises the step of evaluating performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function. The evaluated performance is indicated as a probability associated with each network space. The method further comprises the steps of identifying a subset of the network spaces that has highest probabilities, and selecting a target network space from the subset based on model complexity.
  • In another embodiment, a system is provided for network space search. The system includes one or more processors, and a memory that stores instructions which, when executed by the one or more processors, cause the system to partition an expanded search space into multiple network spaces. Each network space includes a plurality of network architectures and is characterized by a first range of network depths and a second range of network widths. The instructions, when executed by the one or more processors, further cause the system to evaluate performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function, wherein the evaluated performance is indicated as a probability associated with each network space; identify a subset of the network spaces that has highest probabilities; and select a target network space from the subset based on model complexity.
  • Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • FIG. 1 is a diagram illustrating an overview of a Network Space Search (NSS) framework according to one embodiment.
  • FIG. 2 is a diagram illustrating a network architecture in Expanded Search Space according to one embodiment.
  • FIG. 3 illustrates a residual block in the network body of a network architecture according to one embodiment.
  • FIG. 4 is a flow diagram illustrating a method for network space search according to one embodiment.
  • FIG. 5 is a flow diagram illustrating a method for network space search according to another embodiment.
  • FIG. 6 is a block diagram illustrating a system operative to perform network space search according to one embodiment.
  • DETAILED DESCRIPTION
  • In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
  • A method and a system are provided for Network Space Search (NSS). The NSS method is performed automatically on an Expanded Search Space, which is a search space scalable with minimal assumptions in network designs. The NSS method automatically searches for Pareto-efficient network spaces in Expanded Search Space, instead of searching for a single architecture. The search for network spaces takes into account efficiency and computational costs. The NSS method is based upon differentiable approaches and incorporates multi-objectives into the search process to search for network spaces under given complexity constraints.
  • The network spaces output by the NSS method, named Elite Spaces, are Pareto-efficient spaces aligned with the Pareto front with respect to performance (e.g., error rates) and complexity (e.g., number of floating-point operations (FLOPs)). Moreover, Elite Spaces can further serve as NAS search spaces to improve NAS performance. Experimental results using the CIFAR-100 dataset show that NAS searches in Elite Spaces result in an average 2.3% lower error rate and come 3.7% closer to the target complexity than the baseline (e.g., Expanded Search Space), with around 90% fewer samples required to find satisfactory networks. Finally, the NSS method can search for superior spaces from various search spaces with different complexity, demonstrating its applicability to unexplored and untailored spaces. The NSS method automatically searches for favorable network spaces, reducing the human expertise involved in both designing networks and defining NAS search spaces.
  • FIG. 1 is a diagram illustrating an overview of a Network Space Search (NSS) framework 100 according to one embodiment. The NSS framework 100 executes the aforementioned NSS method. During the network space searching process, the NSS method searches for network spaces within Expanded Search Space 110 based on the feedback from space evaluation 120. Expanded Search Space 110 includes a large number of network spaces 140. A novel paradigm is disclosed to estimate the performance of each network space 140 by evaluating its constituent network architectures 130 against multiple objectives. The discovered network spaces, named Elite Spaces 150, can be further utilized for designing favorable networks and can serve as search spaces for NAS approaches.
  • Expanded Search Space 110 is a large-scale space with two main properties: automatability (i.e., minimal human expertise) and scalability (i.e., the capability of scaling networks). Expanded Search Space 110 serves as a search space for NSS to search for network spaces.
  • FIG. 2 is a diagram illustrating a network architecture 200 in Expanded Search Space (e.g., Expanded Search Space 110 in FIG. 1 ) according to one embodiment. A network architecture in Expanded Search Space includes a stem network 210, a network body 220, and a prediction network 230. The network body 220 defines network computation and determines network performance. A non-limiting example of the stem network 210 is a 3×3 convolution network. A non-limiting example of the prediction network 230 includes global average pooling followed by a fully connected layer. In one embodiment, the network body 220 includes N stages (e.g., stage 1, stage 2, and stage 3), and each stage further includes a sequence of identical blocks based on residual blocks. For each stage i (i ≤ N), the degrees of freedom include the network depth di (i.e., the number of blocks) and the block width wi (i.e., the number of channels), where di ≤ dmax and wi ≤ wmax. Thus, Expanded Search Space includes (dmax × wmax)^N possible networks in total. Expanded Search Space allows a wide range of candidates in each degree of freedom.
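  • As a rough illustration of the scale of Expanded Search Space, the following Python sketch (illustrative only; the class and function names are not from the patent text) represents a candidate architecture by its per-stage depths and widths and counts the total number of candidates.

```python
# Illustrative sketch: a candidate in Expanded Search Space is fully specified
# by a depth d_i and a width w_i for each of the N stages.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CandidateArchitecture:
    depths: Tuple[int, ...]   # d_i: number of blocks in each of the N stages
    widths: Tuple[int, ...]   # w_i: number of channels in each of the N stages

def count_candidates(d_max: int, w_max: int, num_stages: int) -> int:
    """Total number of networks in Expanded Search Space: (d_max * w_max) ** N."""
    return (d_max * w_max) ** num_stages

# Example with the values used later in the description: d_max=16, w_max=512, N=3.
print(count_candidates(16, 512, 3))  # 549755813888 possible networks
```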
  • FIG. 3 illustrates a residual block 300 in the network body 220 according to one embodiment. The residual block 300 includes two 3×3 convolution sub-blocks, and each convolution sub-block is followed by BatchNorm (BN) and ReLU. The block parameters, depth di and width wi, are discovered by the NSS framework.
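  • The residual block described above can be sketched in PyTorch as follows. This is a minimal illustration; the 1×1 projection shortcut for mismatched channel counts is an assumption rather than a detail given in the patent text.

```python
# Minimal PyTorch sketch of the residual block: two 3x3 convolution sub-blocks,
# each followed by BatchNorm and ReLU, plus an identity/projection shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Assumption: a 1x1 convolution aligns channel counts when they differ.
        self.shortcut = (
            nn.Identity() if in_channels == out_channels
            else nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.shortcut(x)
```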
  • Expanded Search Space is much more complex than conventional NAS search spaces in terms of the difficulty of selecting among candidates. This is because there are dmax possible blocks in network depths and wmax possible channels in network widths. Moreover, Expanded Search Space can potentially be extended by replacing the residual blocks with more sophisticated building blocks (e.g., complex bottleneck blocks). Thus, Expanded Search Space meets the goals of scalability in network designs and automatability with minimal human expertise.
  • After defining Expanded Search Space, the following question is addressed: how to search for network spaces given Expanded Search Space? To answer this, NSS is formulated as a differentiable problem of searching for an entire network space:
  • $$\min_{\mathcal{A}\in\mathbb{A}}\ \min_{w_{\mathcal{A}}}\ \mathcal{L}(\mathcal{A},\, w_{\mathcal{A}}) \tag{1}$$
  • where the optimal network space $\mathcal{A}^{*}\in\mathbb{A}$ is obtained from $\mathbb{A}$ along with its weights $w_{\mathcal{A}^{*}}$ to achieve the minimal loss $\mathcal{L}(\mathcal{A}^{*}, w_{\mathcal{A}^{*}})$. Here $\mathbb{A}$ is a space without any prior knowledge imposed in network designs (e.g., Expanded Search Space). To reduce the computational cost, probability sampling is adopted and Objective (1) is rewritten as:
  • $$\min_{\Theta}\ \min_{w_{\mathcal{A}}}\ \mathbb{E}_{\mathcal{A}\sim P_{\Theta},\, \mathcal{A}\in\mathbb{A}}\big[\mathcal{L}(\mathcal{A},\, w_{\mathcal{A}})\big] \tag{2}$$
  • where $\Theta$ contains parameters for sampling spaces $\mathcal{A}\in\mathbb{A}$. Although Objective (2), which is relaxed from Objective (1), can be used for optimization, the estimation of the expected loss of each space $\mathcal{A}$ is still lacking. To solve this, distributional sampling is adopted to optimize (2) through the inference of super networks. A super network is a network with $d_{\max}$ blocks in each stage and $w_{\max}$ channels in each block. More specifically, from a sampled space $\mathcal{A}\in\mathbb{A}$ in (2), architectures $a\in\mathcal{A}$ are sampled to evaluate the expected loss of $\mathcal{A}$. Therefore, Objective (2) is further extended accordingly:
  • $$\min_{\Theta}\ \min_{w_{\mathcal{A}}}\ \mathbb{E}_{\mathcal{A}\sim P_{\Theta},\, \mathcal{A}\in\mathbb{A}}\Big[\mathbb{E}_{a\sim P_{\theta},\, a\in\mathcal{A}}\big[\mathcal{L}(a,\, w_{a})\big]\Big] \tag{3}$$
  • where $P_{\theta}$ is a uniform distribution and $\theta$ contains parameters that determine the sampling probability $P_{\theta}$ of each architecture $a$. Objective (3) is the objective optimized for network space search, and the evaluation of the expected loss of a sampled space is based on (3) as well.
  • Instead of regarding a network space $\mathcal{A}$ as a set of individual architectures, $\mathcal{A}$ can be represented with the components in Expanded Search Space. Recalling that Expanded Search Space is composed of searchable network depths $d_i$ and widths $w_i$, a network space $\mathcal{A}$ can therefore be viewed as a subset of all possible numbers of blocks and channels. More formally, a network space is expressed as $\mathcal{A}=\{d_i^{\mathcal{A}}\in\mathbf{d},\ w_i^{\mathcal{A}}\in\mathbf{w}\}_{i=1}^{N}$, where $\mathbf{d}=\{1,2,\ldots,d_{\max}\}$, $\mathbf{w}=\{1,2,\ldots,w_{\max}\}$, and $d_i^{\mathcal{A}}$ and $w_i^{\mathcal{A}}$ respectively denote the sets of possible numbers of blocks and channels in $\mathcal{A}$. After the searching process, $d_i^{\mathcal{A}}$ and $w_i^{\mathcal{A}}$ are retained to represent the discovered network space.
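  • The nested expectation in Objective (3) can be approximated with a simple sampling loop. The toy sketch below is illustrative only: it uses a REINFORCE-style update on Θ and a made-up loss function, whereas the embodiments described here use a differentiable formulation (with Gumbel-Softmax) instead.

```python
# Toy sketch of Objective (3): sample a space A ~ P_Theta, sample architectures
# a ~ A uniformly, average their losses, and update Theta. Not the patent's code.
import numpy as np

rng = np.random.default_rng(0)

# 8 toy network spaces, each a (depth range, width range) pair.
spaces = [((d, d + 3), (w, w + 31)) for d in (1, 5, 9, 13) for w in (1, 257)]
theta = np.zeros(len(spaces))            # logits Theta parameterizing P_Theta

def toy_loss(depth: int, width: int) -> float:
    # Stand-in for L(a, w_a): smaller nets get a larger "error" in this toy.
    return 1.0 / (depth * width) + rng.normal(scale=1e-3)

for step in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()          # P_Theta over spaces
    idx = rng.choice(len(spaces), p=probs)               # A ~ P_Theta
    (d_lo, d_hi), (w_lo, w_hi) = spaces[idx]
    # a ~ P_theta (uniform) within the sampled space; average a few samples.
    losses = [toy_loss(rng.integers(d_lo, d_hi + 1), rng.integers(w_lo, w_hi + 1))
              for _ in range(4)]
    expected = float(np.mean(losses))
    # REINFORCE-style surrogate gradient step on Theta (an assumption; the
    # described embodiments use a differentiable Gumbel-Softmax path instead).
    grad = -expected * (np.eye(len(spaces))[idx] - probs)
    theta += 0.5 * grad

print("most probable space:", spaces[int(np.argmax(theta))])
```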
  • The NSS method searches for network spaces that satisfy a multi-objective loss function for further use in designing networks or defining NAS search spaces. In this way, the searched spaces enable downstream tasks to reduce the effort of refining tradeoffs and to concentrate on fine-grained objectives instead. In one embodiment, the NSS method discovers networks with satisfactory tradeoffs between accuracy and model complexity. The multi-objective search incorporates model complexity in terms of FLOPs into Objective (1) to search for network spaces fulfilling the constraints. The FLOPs loss is defined as:

  • $$\mathcal{L}_{\mathrm{FLOPs}}(\mathcal{A}) = \left|\,\mathrm{FLOPs}(\mathcal{A}) / \mathrm{FLOPs}_{\mathrm{target}} - 1\,\right| \tag{4}$$
  • where $|\cdot|$ denotes the absolute value and $\mathrm{FLOPs}_{\mathrm{target}}$ is the FLOPs constraint to be satisfied. The multi-objective losses are combined by weighted summation, and therefore $\mathcal{L}$ in (1) can be replaced with the following equation:
  • $$\mathcal{L}(\mathcal{A},\, w_{\mathcal{A}}) = \mathcal{L}_{\mathrm{task}}(\mathcal{A},\, w_{\mathcal{A}}) + \lambda\, \mathcal{L}_{\mathrm{FLOPs}}(\mathcal{A}) \tag{5}$$
  • where $\mathcal{L}_{\mathrm{task}}$ is the ordinary task-specific loss in (1), which can be optimized with (3) in practice, and $\lambda$ is the hyperparameter controlling the strength of the FLOPs constraint.
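  • Equations (4) and (5) translate directly into code. The sketch below assumes the caller supplies the task loss and the architecture's FLOPs count; the default λ value is illustrative, not taken from the patent.

```python
# Transcription of equations (4) and (5); inputs are placeholders from the caller.
def flops_loss(flops: float, flops_target: float) -> float:
    """Equation (4): normalized distance of the model's FLOPs from the target."""
    return abs(flops / flops_target - 1.0)

def multi_objective_loss(task_loss: float, flops: float,
                         flops_target: float, lam: float = 0.1) -> float:
    """Equation (5): task loss plus a lambda-weighted FLOPs penalty.
    The default lambda is illustrative; the patent leaves it as a hyperparameter."""
    return task_loss + lam * flops_loss(flops, flops_target)
```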
  • By optimizing (5), the NSS method produces the network spaces satisfying a multi-objective loss function. Elite Spaces are derived from the optimized probability distribution PΘ after the searching process. From PΘ, the n spaces having the highest probabilities are sampled, and the space whose FLOPs count is closest to the FLOPs constraint is selected as Elite Space.
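  • A minimal sketch of the Elite Space selection step described above, assuming the caller provides a FLOPs estimator for a network space:

```python
# Keep the n most probable spaces under P_Theta, then pick the one whose FLOPs
# count is closest to the target constraint. estimate_flops() is a placeholder.
from typing import Callable, Sequence

def select_elite_space(spaces: Sequence, probs: Sequence[float],
                       estimate_flops: Callable[[object], float],
                       flops_target: float, n: int = 5):
    ranked = sorted(range(len(spaces)), key=lambda i: probs[i], reverse=True)
    top_n = [spaces[i] for i in ranked[:n]]
    return min(top_n, key=lambda s: abs(estimate_flops(s) - flops_target))
```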
  • To improve the efficiency of the NSS framework, weight sharing techniques can be adopted in two aspects: 1) masking techniques can be used to simulate various numbers of blocks and channels by sharing a portion of the super components; and 2) to ensure well-trained super networks, warmup techniques can be applied to both block and channel search.
  • As Expanded Search Space includes a wide range of possible network depths and widths, simply enumerating each candidate is memory-prohibitive for either the kernels with various channel sizes or the stages with various block sizes. A masking technique can be used to efficiently search for channel sizes and block depths. A single super kernel is constructed with the largest possible number of channels (i.e., wmax). A smaller channel size w ≤ wmax is simulated by retaining the first w channels and zeroing out the remaining ones. Moreover, a single deepest stage with the largest possible number of blocks (i.e., dmax) is constructed, and a shallower block size d ≤ dmax is simulated by taking the output of the dth block as the output of the corresponding stage. The masking technique achieves the lower bound of memory consumption and, more importantly, is differentiation-friendly.
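  • The masking technique can be sketched in PyTorch as follows; module and function names are illustrative, and the weight-sharing details of the actual super network are omitted.

```python
# Sketch of masking: one super kernel with w_max channels is shared by all candidate
# widths, and one deepest stage with d_max blocks is shared by all candidate depths.
import torch
import torch.nn as nn

def mask_channels(features: torch.Tensor, w: int) -> torch.Tensor:
    """Simulate width w <= w_max by keeping the first w channels and zeroing the rest."""
    mask = torch.zeros_like(features)
    mask[:, :w] = 1.0
    return features * mask

class MaskedStage(nn.Module):
    """A stage built with d_max blocks; depth d <= d_max is simulated by returning
    the output of the d-th block instead of the last one."""
    def __init__(self, blocks: nn.ModuleList):
        super().__init__()
        self.blocks = blocks

    def forward(self, x: torch.Tensor, d: int) -> torch.Tensor:
        for block in self.blocks[:d]:
            x = block(x)
        return x
```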
  • To provide the maximum flexibility in network space search, a super network in Expanded Search Space is constructed to have dmax blocks in each stage and wmax channels in each convolutional kernel. Super network weights need to be sufficiently well-trained to ensure reliable performance estimation of each candidate network space. Therefore, several warmup techniques can be used to improve the quality of super network weights. For example, in the first 25% of epochs, only the network weights are updated and network space search is disabled since network weights cannot appropriately guide the searching process in the early period.
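  • A minimal sketch of the warmup schedule described above, with placeholder callbacks standing in for the weight-update and space-update steps:

```python
# For the first 25% of epochs only the super-network weights are updated; the
# space-probability parameters (Theta) stay frozen until warmup ends.
def run_search(num_epochs, weight_step, space_step):
    warmup_epochs = num_epochs // 4          # first 25% of epochs
    for epoch in range(num_epochs):
        weight_step(epoch)                   # always update super-network weights
        if epoch >= warmup_epochs:
            space_step(epoch)                # enable network space search afterwards
```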
  • The following description provides a non-limiting example of an experimental setup for NSS. A super network in Expanded Search Space is constructed to have dmax=16 blocks in each stage and wmax=512 channels in each convolutional kernel of all 3 stages. Each network space in the Expanded Search Space is defined as a continuous range of network depths and widths for simplicity. As an example, each network space covers a range of 4 possible block counts and 32 possible channel counts per stage, and therefore Expanded Search Space yields (16/4)^3 × (512/32)^3 = 2^18 possible network spaces. A searching process is performed on the 2^18 network spaces, with each network space assigned a probability according to a probability distribution. The probability assigned to each network space is updated by gradient descent. The top n network spaces having the highest probabilities are selected for further evaluation; e.g., n=5. In one embodiment, the network architectures in the n spaces are sampled. The network space having a FLOPs count closest to a predetermined FLOPs constraint is chosen as Elite Space.
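  • The example partitioning can be enumerated directly; the sketch below is illustrative and simply verifies the 2^18 count.

```python
# Depth ranges of 4 blocks and width ranges of 32 channels per stage give
# (16/4)**3 * (512/32)**3 = 2**18 candidate network spaces.
from itertools import product

d_max, w_max, num_stages = 16, 512, 3
depth_ranges = [(lo, lo + 3) for lo in range(1, d_max + 1, 4)]        # 4 ranges
width_ranges = [(lo, lo + 31) for lo in range(1, w_max + 1, 32)]      # 16 ranges

per_stage = list(product(depth_ranges, width_ranges))                 # 64 per stage
network_spaces = list(product(per_stage, repeat=num_stages))          # 64**3 spaces
assert len(network_spaces) == 2 ** 18
```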
  • The images in each of the CIFAR-10 and CIFAR-100 datasets are equally split into a training set and a validation set. These two sets are used for training the super network and searching for network spaces, respectively. The batch size is set to 64. The searching process lasts for 50 epochs, where the first 15 epochs are reserved for warmup. The temperature for Gumbel-Softmax is initialized to 5 and linearly annealed down to 0.001 throughout the searching process. The search cost for a single run of the NSS process is roughly 0.5 days under the above settings, and the subsequent NAS performed on Expanded Search Space and Elite Spaces requires 0.5 days and merely several hours to complete a searching process, respectively.
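  • A sketch of the linear temperature annealing described above, assuming one temperature value per epoch over the 50-epoch search:

```python
# Linear annealing of the Gumbel-Softmax temperature from 5 down to 0.001.
def gumbel_temperature(epoch: int, total_epochs: int = 50,
                       tau_start: float = 5.0, tau_end: float = 0.001) -> float:
    frac = epoch / max(total_epochs - 1, 1)
    return tau_start + frac * (tau_end - tau_start)

print([round(gumbel_temperature(e), 3) for e in (0, 25, 49)])  # [5.0, 2.449, 0.001]
```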
  • The performance of Elite Spaces is evaluated by the performance of their comprised architectures. The NSS method consistently discovers promising network spaces across different FLOPs constraints in both the CIFAR-10 and CIFAR-100 datasets. Elite Spaces achieve satisfactory tradeoffs between the error rates and meeting the FLOPs constraints, and are aligned with the Pareto front of Expanded Search Space. Since Elite Spaces discovered by the NSS method are guaranteed to contain superior networks across various FLOPs regimes, they can be utilized for designing promising networks. More importantly, Elite Spaces are searched by NSS automatically; therefore, the human effort involved in network designs is significantly reduced.
  • FIG. 4 is a flow diagram illustrating a method 400 for network space search according to one embodiment. The method 400 may be performed by a computing system, such as a system 600 to be described with reference to FIG. 6 . The system at step 410 partitions an expanded search space into multiple network spaces, with each network space including multiple network architectures. Each network space is characterized by a first range of network depths and a second range of network widths.
  • The system at step 420 evaluates the performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function. The evaluated performance is indicated as a probability associated with each network space. The system at step 430 identifies a subset of the network spaces that has the highest probabilities. The system at step 440 selects a target network space from the subset based on model complexity. In one embodiment, the target network space selected at step 440 is referred to as Elite network space.
  • FIG. 5 is a flow diagram illustrating a method 500 for network space search according to another embodiment, which may be an example of the method 400 in FIG. 4. The method 500 may be performed by a computing system, such as a system 600 to be described with reference to FIG. 6. The system at step 510 constructs and trains a super network in an expanded search space. The system at step 520 partitions the expanded search space into multiple network spaces, and assigns a probability to each network space. Steps 530 and 540 are repeated for multiple samples of a network space, and are also repeated for all network spaces. The system at step 530 randomly samples a network architecture in each network space using at least a portion of the super network's weights. The system at step 540 updates the network space's probability based on the performance of the sampled network architecture. The performance may be measured by the aforementioned multi-objective loss function. Furthermore, Gumbel-Softmax may be used to calculate a gradient vector of the probability of each network space. Gumbel-Softmax enables the subspace optimization and the network optimization to be performed in parallel, reducing computational cost. The system at step 550 identifies n network spaces with the highest probabilities. At step 560, the system samples network architectures in the n network spaces and chooses a network space having a FLOPs count closest to a predetermined FLOPs constraint as Elite Space.
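  • The following sketch shows one way Gumbel-Softmax can make the space-sampling step differentiable so that the space probabilities and the super-network weights can be optimized jointly; it is a schematic use of torch.nn.functional.gumbel_softmax, not the patented implementation.

```python
# Gumbel-Softmax over the space logits: hard=True gives a one-hot selection in the
# forward pass while keeping a soft gradient path to theta (straight-through).
import torch
import torch.nn.functional as F

num_spaces = 8
theta = torch.zeros(num_spaces, requires_grad=True)     # logits over network spaces

def sample_space_weights(tau: float) -> torch.Tensor:
    return F.gumbel_softmax(theta, tau=tau, hard=True)

selection = sample_space_weights(tau=5.0)
toy_loss = (selection * torch.arange(num_spaces, dtype=torch.float)).sum()
toy_loss.backward()                                      # gradients flow into theta
print(theta.grad)
```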
  • FIG. 6 is a block diagram illustrating a system 600 operative to perform network space search according to one embodiment. The system 600 includes processing hardware 610, which further includes one or more processors 630 such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), neural processing units (NPUs) 635, field-programmable gate arrays (FPGAs), and other general-purpose processors and/or special-purpose processors.
  • The processing hardware 610 is coupled to a memory 620, which may include memory devices such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, and other non-transitory machine-readable storage media; e.g., volatile or non-volatile memory devices. To simplify the illustration, the memory 620 is represented as one block; however, it is understood that the memory 620 may represent a hierarchy of memory components such as cache memory, system memory, solid-state or magnetic storage devices, etc. The processing hardware 610 executes instructions stored in the memory 620 to perform operating system functionalities and run user applications. For example, the memory 620 may store NSS parameters 625, which may be used by method 400 in FIG. 4 and method 500 in FIG. 5 to execute network space searches.
  • In some embodiments, the memory 620 may store instructions which, when executed by the processing hardware 610, cause the processing hardware 610 to perform network space search operations according to method 400 in FIG. 4 and method 500 in FIG. 5 .
  • The operations of the flow diagrams of FIGS. 4 and 5 have been described with reference to the exemplary embodiment of FIG. 6 . However, it should be understood that the operations of the flow diagrams of FIGS. 4 and 5 can be performed by embodiments of the invention other than the embodiment of FIG. 6 and the embodiment of FIG. 6 can perform operations different than those discussed with reference to the flow diagram. While the flow diagrams of FIGS. 4 and 5 show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
  • Various functional components, blocks, or modules have been described herein. As will be appreciated by persons skilled in the art, the functional blocks or modules may be implemented through circuits (either dedicated circuits or general-purpose circuits, which operate under the control of one or more processors and coded instructions), which will typically comprise transistors that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein.
  • While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims (20)

What is claimed is:
1. A method for network space search, comprising:
partitioning an expanded search space into a plurality of network spaces, wherein each network space includes a plurality of network architectures and is characterized by a first range of network depths and a second range of network widths;
evaluating performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function, wherein the evaluated performance is indicated as a probability associated with each network space;
identifying a subset of the network spaces that has highest probabilities; and
selecting a target network space from the subset based on model complexity.
2. The method of claim 1, wherein each network architecture in the expanded search space includes a stem network to receive an input, a prediction network to generate an output, and a network body that includes a predetermined number of stages.
3. The method of claim 1, wherein the multi-objective loss function includes a task-specific loss function and a model complexity function.
4. The method of claim 3, wherein the model complexity function calculates complexity of a network architecture in terms of the number of floating-point operations (FLOPs).
5. The method of claim 3, wherein the model complexity function calculates a ratio of a network architecture's floating-point operations (FLOPs) to a predetermined FLOPs constraint.
6. The method of claim 1, wherein selecting the target network space further comprises:
choosing the target network space that has a floating-point operations (FLOPs) count closest to a predetermined FLOPs constraint.
7. The method of claim 1, wherein each network architecture includes a predetermined number of stages, each stage including d blocks and each block including w channels, wherein each network space is characterized by a first range of d values and a second range of w values.
8. The method of claim 7, wherein each block is a residual block including two convolution sub-blocks.
9. The method of claim 1, further comprising:
training a super network with a maximum network depth and a maximum network width to obtain weights; and
sampling the network architectures in each network space using at least a portion of the weights of the super network.
10. The method of claim 1, wherein evaluating the performance further comprises:
optimizing a probability distribution over the network spaces.
11. A system operative to perform network space search, comprising:
one or more processors; and
memory to store instructions which, when executed by the one or more processors, cause the system to:
partition an expanded search space into a plurality of network spaces, wherein each network space includes a plurality of network architectures and is characterized by a first range of network depths and a second range of network widths;
evaluate performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function, wherein the evaluated performance is indicated as a probability associated with each network space;
identify a subset of the network spaces that has highest probabilities; and
select a target network space from the subset based on model complexity.
12. The system of claim 11, wherein each network architecture in the expanded search space includes a stem network to receive an input, a prediction network to generate an output, and a network body that includes a predetermined number of stages.
13. The system of claim 11, wherein the multi-objective loss function includes a task-specific loss function and a model complexity function.
14. The system of claim 13, wherein the model complexity function calculates complexity of a network architecture in terms of the number of floating-point operations (FLOPs).
15. The system of claim 13, wherein the model complexity function calculates a ratio of a network architecture's floating-point operations (FLOPs) to a predetermined FLOPs constraint.
16. The system of claim 11, wherein the instructions, when executed by the one or more processors, cause the system to:
choose the target network space that has a floating-point operations (FLOPs) count closest to a predetermined FLOPs constraint.
17. The system of claim 11, wherein each network architecture includes a predetermined number of stages, each stage including d blocks and each block including w channels, wherein each network space is characterized by a first range of d values and a second range of w values.
18. The system of claim 17, wherein each block is a residual block including two convolution sub-blocks.
19. The system of claim 11, wherein the instructions, when executed by the one or more processors, cause the system to:
train a super network with a maximum network depth and a maximum network width to obtain weights; and
sample the network architectures in each network space using at least a portion of the weights of the super network.
20. The system of claim 11, wherein the instructions, when executed by the one or more processors, cause the system to:
optimize a probability distribution over the network spaces.
US17/846,007 2021-08-20 2022-06-22 Network Space Search for Pareto-Efficient Spaces Pending US20230064692A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/846,007 US20230064692A1 (en) 2021-08-20 2022-06-22 Network Space Search for Pareto-Efficient Spaces
CN202210799314.7A CN115713098A (en) 2021-08-20 2022-07-06 Method and system for performing a cyber space search
TW111126458A TWI805446B (en) 2021-08-20 2022-07-14 Method and system for network space search

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163235221P 2021-08-20 2021-08-20
US17/846,007 US20230064692A1 (en) 2021-08-20 2022-06-22 Network Space Search for Pareto-Efficient Spaces

Publications (1)

Publication Number Publication Date
US20230064692A1 true US20230064692A1 (en) 2023-03-02

Family

ID=85230492

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/846,007 Pending US20230064692A1 (en) 2021-08-20 2022-06-22 Network Space Search for Pareto-Efficient Spaces

Country Status (3)

Country Link
US (1) US20230064692A1 (en)
CN (1) CN115713098A (en)
TW (1) TWI805446B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10496927B2 (en) * 2014-05-23 2019-12-03 DataRobot, Inc. Systems for time-series predictive data analytics, and related methods and apparatus
GB2606674B (en) * 2016-10-21 2023-06-28 Datarobot Inc System for predictive data analytics, and related methods and apparatus
CN110677433B (en) * 2019-10-23 2022-02-22 杭州安恒信息技术股份有限公司 Method, system, equipment and readable storage medium for predicting network attack
CN112784954A (en) * 2019-11-08 2021-05-11 华为技术有限公司 Method and device for determining neural network
CN112418392A (en) * 2020-10-21 2021-02-26 华为技术有限公司 Neural network construction method and device

Also Published As

Publication number Publication date
CN115713098A (en) 2023-02-24
TW202310588A (en) 2023-03-01
TWI805446B (en) 2023-06-11

Legal Events

Date Code Title Description
AS Assignment

Owner name: MEDIATEK INC., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHEN, HAO YUN;CHEN, MIN-HUNG;HORNG, MIN-FONG;AND OTHERS;SIGNING DATES FROM 20220606 TO 20220620;REEL/FRAME:060392/0231

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION