WO2024040941A1 - Neural network structure search method, device and storage medium

Neural network structure search method, device and storage medium

Info

Publication number
WO2024040941A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
model
network models
target
models
Prior art date
Application number
PCT/CN2023/081678
Other languages
English (en)
French (fr)
Inventor
田奇
尹浩然
彭君然
谢凌曦
张兆翔
Original Assignee
华为云计算技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024040941A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a neural network structure search method, device and storage medium.
  • AI: artificial intelligence
  • NAS: neural network structure search
  • CNN: convolutional neural network
  • a neural network structure search method, device and storage medium are proposed.
  • the search space indicates the correspondence between multiple model scales and multiple first neural network models.
  • the respective model structure parameters of the first neural network models are different.
  • the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model, so that, under a target constraint that limits the model scale, a target neural network model meeting the current constraint can be determined automatically.
  • in other words, the ideal hybrid neural network structure is searched for automatically, which improves the effect of neural network structure search.
  • embodiments of the present application provide a neural network structure search method, which includes:
  • obtaining a target constraint, the target constraint indicating the model scale limit imposed by the target instance for running a neural network model;
  • obtaining, according to a search space, a plurality of pre-trained first neural network models, where the search space indicates a correspondence between a plurality of model scales and the plurality of first neural network models, the respective model structure parameters of the plurality of first neural network models are different, and the model structure parameters indicate a ratio between at least two neural network structures of the first neural network model;
  • a target neural network model that meets the target constraint is selected from the plurality of first neural network models.
  • the embodiment of the present application pre-constructs a search space.
  • the search space indicates the correspondence between multiple model scales and multiple first neural network models.
  • the respective model structure parameters of the multiple first neural network models are different, and the model structure parameter indicates the ratio between at least two neural network structures of the first neural network model, so that, under a target constraint that limits the model scale, a target neural network model that meets the current constraint can be searched for automatically among the multiple first neural network models.
  • that is, an ideal hybrid neural network structure, together with the ratio between its at least two constituent neural network structures, is searched for, which improves the effect of neural network structure search.
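  • To make the above flow concrete, the following sketch (not part of the claims) shows how a search space and a constraint-based selection could look in Python; the Candidate fields, the numeric values and the function names are illustrative assumptions only.

```python
# Illustrative sketch only; names and numbers are not from the application.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str                 # identifier of a pre-trained first neural network model
    params_m: float           # model scale: parameter count in millions
    cnn_ratio: float          # model structure parameter: share of CNN parameters
    transformer_ratio: float  # model structure parameter: share of Transformer parameters

def feasible_models(search_space: List[Candidate], max_params_m: float) -> List[Candidate]:
    """Keep the candidates whose model scale satisfies the target constraint."""
    return [c for c in search_space if c.params_m <= max_params_m]

search_space = [
    Candidate("hybrid-a", 12.0, cnn_ratio=0.7, transformer_ratio=0.3),
    Candidate("hybrid-b", 35.0, cnn_ratio=0.4, transformer_ratio=0.6),
]
# Candidates meeting the constraint would then be evaluated on the target task.
print(feasible_models(search_space, max_params_m=20.0))
```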
  • selecting a target neural network model that meets the target constraints from the plurality of first neural network models includes:
  • selecting at least two second neural network models that meet the target constraint from the plurality of first neural network models;
  • training the at least two second neural network models on the target data set corresponding to the target task to obtain evaluation parameters corresponding to each of the at least two second neural network models, where the evaluation parameters indicate how well the second neural network model matches the target task;
  • the target neural network model is determined according to the corresponding evaluation parameters of the at least two second neural network models.
  • the at least two second neural network models that meet the target constraints are selected from multiple first neural network models
  • the at least two second neural network models are trained on the target data set corresponding to the target task to obtain evaluation parameters corresponding to each of the at least two second neural network models, and the target neural network model is determined based on these evaluation parameters.
  • that is, under a target constraint that limits the model scale, the optimal combination at that model scale, namely the target neural network model, can be searched for automatically, further improving the efficiency of neural network structure search.
  • before obtaining the plurality of pre-trained first neural network models according to the search space, the method further includes:
  • obtaining the search space of a super network, where the super network includes a plurality of neural network models sharing respective model structure parameters, and the search space also indicates that some neural network models in the super network are the plurality of first neural network models.
  • a search space of a super network is constructed.
  • the super network includes multiple neural network models that share their respective model structure parameters.
  • the search space also indicates that some neural network models in the super network are the multiple first neural network models, enabling subsequent neural network structure search based on the multiple first neural network models indicated by the search space and further improving the efficiency of neural network structure search.
  • the at least two neural network structures include a CNN structure and a Transformer structure.
  • the first neural network model involved in this case includes two branches, namely the CNN branch and the Transformer branch, which implements a search method for a hybrid neural network structure of the CNN structure and the Transformer structure.
  • before obtaining the plurality of pre-trained first neural network models according to the search space, the method further includes:
  • obtaining a plurality of preset original neural network models, the original neural network models indicating the at least two neural network structures and the ratio between the at least two neural network structures;
  • training the plurality of original neural network models according to a training sample set to obtain the plurality of trained first neural network models, where the training sample set includes multiple sample images.
  • multiple preset original neural network models are obtained, and the original neural network models indicate at least two neural network structures and the ratio between the at least two neural network structures; the multiple original neural network models are trained according to the training sample set to obtain multiple trained first neural network models.
  • that is, a training strategy for the hybrid neural network models, namely the first neural network models, is constructed, which can greatly improve model training efficiency.
  • the plurality of original neural network models are trained according to the training sample set to obtain the plurality of first neural network models that have been trained, including:
  • determining a sampling model from the plurality of original neural network models according to a preset sampling period; and for at least one sample image in the training sample set, training the currently determined sampling model according to the at least one sample image to obtain the corresponding first neural network model.
  • in this implementation, the sampling model is determined from the multiple original neural network models according to the preset sampling period, and for at least one sample image in the training sample set, the currently determined sampling model is trained based on the at least one sample image to obtain the corresponding first neural network model, which ensures the reliability of the first neural network models trained based on the sampling models.
  • the sampling model is the model with the largest model size among the multiple original neural network models, or the model with the smallest model size among the multiple original neural network models, or a randomly determined model among the multiple original neural network models.
  • sampling the largest, smallest, or a random model in this way further ensures the rationality and effectiveness of model training.
  • a neural network structure search device is provided, which includes:
  • a processor, and a memory used to store instructions executable by the processor;
  • wherein the processor is configured to implement the above-mentioned method when executing the instructions.
  • a neural network structure search device is provided, which includes:
  • a first acquisition unit, used to acquire a target constraint, where the target constraint indicates the model scale limit imposed by the target instance for running a neural network model;
  • a second acquisition unit, configured to acquire a plurality of pre-trained first neural network models according to a search space, where the search space indicates the correspondence between a plurality of model scales and the plurality of first neural network models, the model structure parameters of each of the plurality of first neural network models are different, and the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model;
  • a screening unit configured to select a target neural network model that meets the target constraint conditions from the plurality of first neural network models.
  • embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored.
  • when the computer program instructions are executed by a processor, the method provided by the above first aspect or any possible implementation of the first aspect is implemented.
  • embodiments of the present application provide a computer program product.
  • the computer program product includes computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code.
  • when the computer-readable code is run in a processor of a computing device, the processor in the computing device executes the method provided by the above first aspect or any possible implementation of the first aspect.
  • embodiments of the present application provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method provided by the above first aspect or any possible implementation of the first aspect.
  • embodiments of the present application provide a computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to execute the method provided by the above first aspect or any possible implementation of the first aspect.
  • embodiments of the present application provide a computer-readable storage medium, including computer program instructions.
  • when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method provided by the above first aspect or any possible implementation of the first aspect.
  • Figure 1 shows a schematic structural diagram of a computing device provided by an exemplary embodiment of the present application.
  • Figure 2 shows a flow chart of a neural network structure search method provided by an exemplary embodiment of the present application.
  • Figure 3 shows a flow chart of a neural network structure search method provided by another exemplary embodiment of the present application.
  • Figure 4 shows a schematic diagram of a hybrid neural network model including a CNN structure and a Transformer structure provided by an exemplary embodiment of the present application.
  • Figure 5 shows a schematic diagram of a search space of a supernetwork provided by an exemplary embodiment of the present application.
  • Figure 6 shows a flow chart of a neural network structure search method provided by another exemplary embodiment of the present application.
  • Figure 7 shows a block diagram of a neural network structure search device provided by an exemplary embodiment of the present application.
  • Figure 8 shows a schematic structural diagram of a computing device cluster provided by an exemplary embodiment of the present application.
  • Figure 9 shows a schematic diagram of a connection method between computing device clusters provided by an exemplary embodiment of the present application.
  • "exemplary" means "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or superior to other embodiments.
  • Super network: a collection of networks composed of multiple independent neural network models; all neural network models in the super network share their model structure parameters, where the shared model structure parameters are a specified subset of all the model structure parameters of the neural network models.
  • Search space: the space composed of the neural network models to be searched.
  • the search space of the super network indicates that some neural network models in the super network are the neural network models to be searched. That is, the search space indicates multiple neural network models that form a subset of the super network.
  • the embodiment of the present application does not limit the indication method of the search space.
  • for example, the search space directly includes the multiple indicated neural network models.
  • the search space includes search rules that indicate multiple neural network models.
  • CNN: a deep neural network with a convolutional structure.
  • CNN contains a feature extractor composed of convolutional layers and subsampling layers.
  • the feature extractor can be regarded as a filter, and the convolution process can be regarded as using a trainable filter to convolve with an input image or convolution feature plane (feature map).
  • the convolutional layer refers to the neuron layer in CNN that convolves the input signal.
  • a neuron can be connected to only some of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information independent of position.
  • the underlying principle is that the statistical information of one part of the image is the same as that of other parts. This means that the image information learned in one part can also be used in another part. So for all positions on the image, we can use the same learned image information.
  • multiple convolution kernels can be used to extract different image information. Generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a random-sized matrix, and the convolution kernel can obtain reasonable weights through learning during the CNN training process.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the CNN, while reducing the risk of overfitting.
  • BP: back propagation
  • during training, a CNN can use the error back propagation algorithm to correct the values of the parameters in the initial model (for example, an initial super-resolution model), so that the reconstruction error loss of the model becomes smaller and smaller.
  • specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the model, such as the weight matrix.
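  • As a minimal illustration of forward propagation, error loss, and back propagation (a generic PyTorch sketch, not the application's training procedure; the layer sizes and data are arbitrary):

```python
# Generic back propagation illustration with PyTorch; sizes are arbitrary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 32, 32)       # dummy input batch
labels = torch.randint(0, 10, (4,))      # dummy class labels

logits = model(images)                    # forward propagation up to the output
loss = criterion(logits, labels)          # error loss at the output
loss.backward()                           # back-propagate the error loss information
optimizer.step()                          # update parameters so that the loss converges
```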
  • the neural network structure search method for hybrid neural network structures (such as CNN+Transformer) proposed in the embodiments of this application can better solve the above problems of inconsistency and waste of computing resources. It can search for the ideal hybrid neural network structure under different model scales, for example small models (with a high proportion of CNN parameters) and large models (with a high proportion of Transformer parameters), while maintaining high efficiency and reliability.
  • the optimal neural network model can be determined through rapid pre-training, without the need to train the various networks from scratch and then select the optimal model. The hardware resources consumed by the entire training are much lower than those required to train the various sub-models from random initialization in each scenario.
  • the embodiment of the present application provides a neural network structure search method, and the execution subject is a computing device.
  • FIG. 1 shows a schematic structural diagram of a computing device provided by an exemplary embodiment of the present application.
  • the computing device may be a terminal or a server.
  • the terminal includes a mobile terminal or a fixed terminal.
  • the terminal can be a mobile phone, a tablet computer, a laptop computer, a desktop computer, etc.
  • the server can be an independent server, a server cluster composed of several servers, or a cloud computing service center.
  • the computing device involved in the model training phase is a server, such as a server including 8 GPUs.
  • the embodiment of this application uses a fixed large network and a flexible dynamic super network to complete the pre-training of hundreds of thousands of neural network models at one time.
  • neural network structure search can be performed on computing devices of various computing power sizes according to user requirements, and the computing device is a terminal or a server.
  • the computing device includes a processor 10 , a memory 20 , and a communication interface 30 .
  • the structure shown in Figure 1 does not constitute a limitation of the computing device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Among them:
  • the processor 10 is the control center of the computing device, using various interfaces and lines to connect various parts of the entire computing device, by running or executing software programs and/or modules stored in the memory 20, and calling data stored in the memory 20 , perform various functions of the computing device and process data, thereby providing overall control of the computing device.
  • the processor 10 can be implemented by a CPU or a graphics processor (Graphics Processing Unit, GPU).
  • Memory 20 may be used to store software programs and modules.
  • the processor 10 executes various functional applications and data processing by running software programs and modules stored in the memory 20 .
  • the memory 20 may mainly include a storage program area and a storage data area, wherein the storage program area may store the operating system 21, the first acquisition unit 22, the second acquisition unit 23, the filtering unit 24 and at least one application program 25 required for the function (such as neural network training, etc.); the storage data area can store data created based on the use of computing devices, etc.
  • the memory 20 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. Accordingly, the memory 20 may also include a memory controller to provide the processor 10 with access to the memory 20.
  • the processor 10 performs the following functions by running the first acquisition unit 22: acquiring a target constraint, which indicates the model scale limit imposed by the target instance for running a neural network model.
  • the processor 10 performs the following functions by running the second acquisition unit 23: obtaining, according to the search space, multiple first neural network models that have been pre-trained, where the search space indicates the correspondence between multiple model scales and the multiple first neural network models, the respective model structure parameters of the multiple first neural network models are different, and the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model.
  • the processor 10 performs the following functions by running the screening unit 24: selecting, from the multiple first neural network models, a target neural network model that meets the target constraint.
  • neural network structure search method provided by the embodiments of this application can be applied to a variety of application scenarios.
  • Several exemplary application scenarios of the technical solutions of this application are introduced below.
  • One possible application scenario is a photo album management system. Users store a large number of pictures in the mobile phone album and hope to classify and manage the pictures in the album. For example, users want their phones to automatically group all images of birds together and all photos of people together.
  • the technical solution provided by this application can be used to search for an image classification model that matches the computing resources of the user's mobile phone based on the computing resources of the mobile phone.
  • the target constraints include mobile phone computing resources
  • the target neural network model is an image classification model.
  • Another possible application scenario is target detection and segmentation. In autonomous driving, the detection and segmentation of pedestrians, vehicles and other targets on the street are crucial for the vehicle to make safe driving decisions.
  • the technical solution provided by this application can be used to search for a target detection and segmentation model that matches the vehicle's computing resources based on the vehicle's computing resources.
  • the target constraints include vehicle computing resources
  • the target neural network model is a target detection and segmentation model.
  • FIG. 2 shows a flow chart of a neural network structure search method provided by an exemplary embodiment of the present application. This embodiment illustrates that this method is used in the computing device shown in FIG. 1 .
  • the method includes the following steps.
  • Step 201 Obtain the target constraint, which indicates the model scale limit imposed by the target instance for running the neural network model.
  • the computing device obtains a target constraint, the target constraint indicating the model scale limit imposed by the target instance for running the neural network model.
  • the target instance is a physical instance or a virtual instance.
  • the physical instance may be a physical machine
  • the virtual instance may be a virtual machine, a container, or a bare metal server.
  • virtual machines can also be called cloud servers (Elastic Compute Service, ECS) or elastic instances.
  • the target instance and the computing device are the same device, or they are two different devices.
  • the computing device is a server and the target instance is a terminal.
  • the computing device and the target instance are the same terminal.
  • the target constraints include constraints on resources provided by the target instance for running the neural network model, and the resources include computing resources and/or storage resources of the target instance.
  • the target constraints include limited model scale information, and the model scale information includes model parameter information and model calculation information.
  • target constraints include Floating Point Operations per Second (FLOPS), or the amount of model parameters, or the maximum memory limit of the model.
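  • As a hedged sketch of how such a target constraint could be represented and checked in code (the field names and the model_stats layout are assumptions, not defined by the application):

```python
# Hedged sketch; field names and the model_stats layout are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TargetConstraint:
    max_flops: Optional[float] = None      # limit on floating point operations
    max_params: Optional[float] = None     # limit on the model parameter amount
    max_memory_mb: Optional[float] = None  # maximum memory limit of the model

def satisfies(model_stats: dict, c: TargetConstraint) -> bool:
    """model_stats is assumed to hold 'flops', 'params' and 'memory_mb' entries."""
    return ((c.max_flops is None or model_stats["flops"] <= c.max_flops)
            and (c.max_params is None or model_stats["params"] <= c.max_params)
            and (c.max_memory_mb is None or model_stats["memory_mb"] <= c.max_memory_mb))
```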
  • Step 202 Obtain multiple pre-trained first neural network models according to the search space.
  • the search space indicates the correspondence between the multiple model sizes and the multiple first neural network models.
  • the respective model structure parameters of the multiple first neural network models are different, and the model structure parameters indicate a ratio between at least two neural network structures of the first neural network model.
  • the computing device pre-trains a plurality of first neural network models.
  • the terminal obtains multiple trained first neural network models stored in itself, or obtains multiple trained first neural network models from the server.
  • the server obtains multiple trained first neural network models stored by itself.
  • the computing device pre-constructs a search space, and the search space indicates corresponding relationships between the plurality of model scales and the plurality of first neural network models. That is, the search space indicates the correspondence between multiple model scales and multiple model structure parameters.
  • the model structure parameters include parameters corresponding to each of the at least two neural network structures and/or the proportion of parameter amounts between the at least two neural network structures.
  • the correspondence between the multiple model scales and the multiple first neural network models indicated by the search space includes a one-to-many correspondence between the multiple model scales and the multiple first neural network models; that is, each model scale corresponds to a hybrid neural network model set, and the hybrid neural network model set includes at least two first neural network models.
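  • The one-to-many correspondence can be pictured with a simple mapping; the sketch below reuses the super network parameter ranges mentioned later in this document, while the model names are invented placeholders.

```python
# Hypothetical one-to-many mapping from a model-scale bucket to a set of
# hybrid first neural network models; model names are invented placeholders.
scale_to_models = {
    "ultra_small_5.5M_to_13.5M": ["hybrid_a1", "hybrid_a2", "hybrid_a3"],
    "small_15.1M_to_39.9M":      ["hybrid_b1", "hybrid_b2"],
    "base_23.2M_to_66.2M":       ["hybrid_c1", "hybrid_c2", "hybrid_c3"],
}

def candidates_for_scale(scale_key: str):
    # each model scale corresponds to a set of at least two first neural network models
    return scale_to_models.get(scale_key, [])
```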
  • the computing device acquires a search space of the super network, the super network includes multiple neural network models sharing respective model structure parameters, and the search space further indicates that some neural network models in the super network are multiple first neural network models.
  • the search space includes multiple first neural network models.
  • the search space includes search rules, and the search rules indicate that some neural network models in the super network are multiple first neural network models.
  • the first neural network model is a model obtained by training a hybrid neural network structure using sample images and correct result information.
  • the correct result information is the pre-marked correct result information corresponding to the sample image.
  • the first neural network model is a recurrent neural network model capable of identifying result information in images.
  • the first neural network model is used to convert input images into result information.
  • the first neural network model is used to represent the correlation between the image and the result information.
  • the first neural network model is a preset mathematical model, and the first neural network model includes model coefficients between the image and the result information.
  • the model coefficient can be a fixed value, a value that is dynamically modified over time, or a value that is dynamically modified according to the application scenario.
  • the result information is any one of classification result information, detection result information, and segmentation result information.
  • the embodiments of the present application are not limited to this.
  • the first neural network model is a hybrid neural network model including at least two neural network structures.
  • the first neural network model is a hybrid neural network model including two neural network structures.
  • the first neural network model is a hybrid neural network model including a CNN structure and a Transformer structure.
  • the first neural network model is a hybrid neural network model including a Graph Convolutional Networks (GCN) structure and a Transformer structure.
  • each first neural network model is a hybrid neural network model including a CNN structure and a Transformer structure.
  • the types of hybrid neural network structures of at least two neural network models are different, and/or, the corresponding model structure parameters of any two first neural network models are different.
  • At least one first neural network model is a hybrid neural network model including a CNN structure and a Transformer structure
  • at least one first neural network model is a hybrid neural network model including a GCN structure and a Transformer structure.
  • each first neural network model includes a hybrid neural network structure, and the corresponding model structure parameters of any two first neural network models are different.
  • the model structure parameters corresponding to a first neural network model are the parameters corresponding to each of the at least two neural network structures of that first neural network model and/or the parameter amount ratio between the at least two neural network structures; that is, the multiple first neural network models are multiple hybrid neural network models with different model structure parameters.
  • Step 203 Select the target neural network model that meets the target constraint conditions from multiple first neural network models.
  • based on the correspondence between the multiple model scales and the multiple first neural network models indicated by the search space, the computing device determines the first neural network model corresponding to the model scale indicated by the target constraint as the target neural network model.
  • the target neural network model is a neural network model among a plurality of first neural network models.
  • the target neural network model is a neural network model to be run that meets the target constraints
  • the target neural network model is a hybrid neural network model of at least two neural network structures.
  • the target neural network model indicates the at least two neural network structures it includes and the ratio between them.
  • the target neural network model includes two neural network structures, for example, the two neural network structures are a CNN structure and a Transformer structure.
  • the computing device selects a target neural network model that meets the target constraint from the plurality of first neural network models as follows: the computing device selects at least two second neural network models that meet the target constraint from the multiple first neural network models; trains the at least two second neural network models on the target data set corresponding to the target task to obtain evaluation parameters corresponding to each of the at least two second neural network models, where the evaluation parameters indicate the degree of matching between the second neural network model and the target task; and determines the target neural network model based on the evaluation parameters corresponding to the at least two second neural network models.
  • the target task is a pending migration downstream task.
  • the target task includes one of classification task, detection task and segmentation task. It should be noted that the target task can also be other types of tasks, which is not limited in the embodiments of the present application.
  • the target data set is a data set used for target task deployment.
  • the downstream migration data set is PASCAL VOC data set, or COCO data set or Pets data set.
  • the computing device randomly samples K second neural network models from the selected at least two second neural network models, and trains the K second neural network models on the downstream migration data set to obtain evaluation parameters corresponding to each of the K second neural network models.
  • according to the evaluation parameters corresponding to the K second neural network models, the target neural network model is determined, where K is a positive integer greater than 1.
  • the evaluation parameter is the accuracy value of the evaluation index
  • the computing device determines the target neural network model based on the evaluation parameters corresponding to the at least two second neural network models as follows: based on the accuracy values of the evaluation index corresponding to the at least two second neural network models, the second neural network model with the highest accuracy value is determined as the target neural network model.
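  • A hedged sketch of this selection step is shown below; train_and_evaluate is a placeholder for fine-tuning a candidate on the downstream migration data set and returning its accuracy value (for example mAP or Top-1), and is not defined by the application.

```python
# Hedged sketch; train_and_evaluate is a placeholder returning mAP or Top-1 accuracy.
import random

def select_target_model(second_models, downstream_dataset, k, train_and_evaluate):
    """Randomly sample K candidates, fine-tune each, and keep the most accurate one."""
    sampled = random.sample(second_models, k)
    scores = {model: train_and_evaluate(model, downstream_dataset) for model in sampled}
    return max(scores, key=scores.get)
```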
  • the evaluation index is the average AP value (Mean Average Precision, mAP) or Top1.
  • the embodiments of the present application are not limited to this.
  • the embodiments of the present application obtain multiple hybrid neural network models, that is, multiple first neural network models, through pre-training.
  • the model structure parameters of the multiple first neural network models are different, and the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model, which enables automatic search, under a target constraint that limits the model scale, for a target neural network model that meets the current target constraint among the multiple first neural network models.
  • that is, the ideal hybrid neural network structure and the ratio between its at least two constituent neural network structures are searched for, which improves the effect of neural network structure search.
  • Figure 3 shows a flow chart of a neural network structure search method provided by another exemplary embodiment of the present application.
  • this embodiment illustrates that the model training process and the model application process involved in this method are both used in the computing device shown in FIG. 1 .
  • the method includes the following steps.
  • Step 301 Obtain a plurality of preset original neural network models.
  • the original neural network models indicate at least two neural network structures and a ratio between at least two neural network structures.
  • the computing device acquires a plurality of preset original neural network models, and the original neural network model is a hybrid neural network model including at least two neural network structures.
  • the original neural network model is a hybrid neural network model including two neural network structures.
  • the original neural network model is a hybrid neural network model including a CNN structure and a Transformer structure.
  • the original neural network model is a hybrid neural network model including a GCN structure and a Transformer structure. The embodiments of the present application are not limited to this.
  • Step 302 Train multiple original neural network models according to the training sample set to obtain multiple first neural network models that have been trained.
  • the training sample set includes multiple sample images.
  • the computing device acquires a training sample set, which includes a plurality of sample images.
  • the computing device preprocesses each sample image in the training sample set to obtain preprocessed sample images, and trains the multiple original neural network models based on the multiple preprocessed sample images to obtain multiple trained first neural network models.
  • the preprocessing includes at least one of normalization processing, cropping processing (including randomly cropping to a uniform size, such as 224×224), and data enhancement processing.
  • the data enhancement processing includes at least one of flipping, random erasing, and random automatic augmentation. The embodiments of the present application are not limited to this.
  • the computing device determines a sampling model from the multiple original neural network models according to a preset sampling period; and for at least one sample image in the training sample set, trains the currently determined sampling model based on the at least one sample image to obtain the corresponding first neural network model.
  • the computing device inputs the preprocessed multiple sample images into the sampling model in batches for training.
  • the sampling model is a model determined from multiple original neural network models according to a preset sampling period.
  • the sampling model is a model with the largest model size among multiple original neural network models, or a model with the smallest model size among multiple original neural network models, or a randomly determined model among multiple original neural network models.
  • the computing device trains the currently determined sampling model based on the at least one sample image to obtain the corresponding first neural network model as follows: calling the currently determined sampling model to perform forward propagation on the at least one sample image, calculating the loss function, training according to the loss function using the back propagation algorithm and a preset optimizer, and obtaining the corresponding first neural network model after multiple training iterations; that is, the trained first neural network models corresponding to the multiple original neural network models are finally obtained.
  • a schematic diagram of a hybrid neural network model including a CNN structure and a Transformer structure is shown in Figure 4.
  • the hybrid neural network model includes a dual-branch network structure, which is a CNN structure and a Transformer structure.
  • a dynamic convolution layer module is used on the CNN structure, and the dynamic convolution layer module includes a dynamic convolution operator with variable width;
  • a dynamic Transformer module is used on the Transformer structure, and the dynamic Transformer module includes at least one variable parameter among the width and the number of heads.
  • d1, d2, d3, and d4 respectively represent the number of network layers in the four stages. d1, d2, d3, and d4 will also dynamically change during the model training process.
  • the value of d1 is 4 or 5, the value of d2 is 4 or 5, the value of d3 is 3 or 4, and the value of d4 is 1; the sum of the network layers in the four stages, d1+d2+d3+d4, is d-model, and the value of d-model is between 12 and 15.
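  • A small sketch of sampling the dynamic stage depths within the ranges just stated (the sampling routine itself is an assumption; only the ranges come from the text above):

```python
# Sketch of sampling the dynamic stage depths; the ranges come from the text above.
import random

def sample_stage_depths():
    d1 = random.choice([4, 5])
    d2 = random.choice([4, 5])
    d3 = random.choice([3, 4])
    d4 = 1
    d_model = d1 + d2 + d3 + d4
    assert 12 <= d_model <= 15   # the sum of the four stages stays in the stated range
    return d1, d2, d3, d4
```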
  • Step 303 Obtain the target constraint, which indicates the model scale limit imposed by the target instance for running the neural network model.
  • the computing device obtains the target constraint, and the target constraint indicates the model scale limit imposed by the target instance for running the neural network model.
  • Step 304 Obtain multiple pre-trained first neural network models according to the search space.
  • the search space indicates the correspondence between the multiple model sizes and the multiple first neural network models.
  • the respective model structure parameters of the multiple first neural network models are different, and the model structure parameters indicate a ratio between at least two neural network structures of the first neural network model.
  • the computing device obtains the search space of the super network and obtains the plurality of first neural network models indicated by the search space, where the search space indicates a correspondence between a plurality of model scales and the plurality of first neural network models, the respective model structure parameters of the plurality of first neural network models are different, and the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model.
  • a schematic diagram of the search space of the super network is shown in Figure 5.
  • the value of each triplet represents the starting point, end point, and step size of sampling in this dimension.
  • the network randomly samples structural parameters from the table and builds a structure, and then trains with this structure.
  • the model structure will also be searched according to the search space specified in this table.
  • width stem corresponds to the number of output channels of the convolution of the convolution layer in Figure 4;
  • width b1 is the number of output channels of the network corresponding to d1;
  • width b2 is the number of output channels of the network corresponding to d2;
  • width b3 is the number of output channels of the network corresponding to d3, which can also be called the number of output channels of the network corresponding to d4, since the number of output channels of the network corresponding to d3 is the same as that of the network corresponding to d4;
  • dim e is the number of feature channels in the self-attention layer;
  • n h is the number of attention heads in the self-attention layer;
  • ratio mlp is the hidden feature dimension ratio in the Multilayer Perceptron (MLP) layer, that is, the hidden feature dimension of the MLP layer is dim e *ratio mlp ;
  • d model is the number of blocks in the Transformer structure.
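  • The triplet convention (starting point, end point, step size) can be illustrated with the sketch below; the concrete triplet values are invented placeholders, since the actual table is given in Figure 5.

```python
# Each dimension maps to a (start, end, step) sampling triplet; the numbers below
# are invented placeholders, not the values of the table shown in Figure 5.
import random

SEARCH_SPACE = {
    "width_stem": (32, 64, 16),
    "width_b1":   (64, 128, 32),
    "dim_e":      (192, 384, 64),
    "n_h":        (3, 6, 1),
    "ratio_mlp":  (2, 4, 1),
    "d_model":    (12, 15, 1),
}

def sample_structure(space=SEARCH_SPACE):
    """Randomly sample one structural parameter per dimension from its triplet."""
    return {dim: random.choice(range(start, end + 1, step))
            for dim, (start, end, step) in space.items()}
```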
  • the first type of super network is also called an ultra-small super network, and its parameter range is 5.5M~13.5M;
  • the second type of super network is also called a small super network, and its parameter range is 15.1M~39.9M;
  • the third type of super network is also called a basic super network, and its parameter range is 23.2M~66.2M.
  • the embodiments of the present application are not limited to this.
  • Step 305 Select the target neural network model that meets the target constraint conditions from multiple first neural network models.
  • the computing device selects a target neural network model that meets the target constraint conditions from the plurality of first neural network models. It should be noted that, for the relevant details of steps 303 to 305, reference may be made to the relevant descriptions in the above embodiments and will not be described again here.
  • the embodiments of the present application construct a search space.
  • the search space indicates the correspondence between multiple model scales and multiple first neural network models.
  • the multiple first neural network models are hybrid neural network models with different structure proportions.
  • under different model scales, the ideal model, that is, the target neural network model that meets the target constraint, can be searched for.
  • a model training strategy is constructed, which can greatly improve the efficiency of model training and subsequent network search.
  • the neural network structure search method provided by the embodiments of this application can be applied on public clouds/hybrid clouds/private clouds to provide intelligent and efficient neural network structure search, production and deployment capabilities.
  • the neural network structure search method provided by the embodiments of the present application can involve various downstream tasks such as classification, detection, and segmentation.
  • FIG. 6 shows a flow chart of a neural network structure search method provided by another exemplary embodiment of the present application. This embodiment illustrates that this method is used in the computing device shown in FIG. 1 .
  • the method includes the following steps.
  • Step 601 Select the upstream image classification data set as the training sample set.
  • Step 602 Obtain the preset search space of the super network.
  • the super network includes multiple neural network models sharing respective model structure parameters, and the search space indicates that some neural network models in the super network are multiple first neural network models.
  • Step 603 Preprocess each sample image in the training sample set.
  • for example, each sample image in the training sample set is randomly cropped to a uniform size, and then data enhancement processing such as flipping, random erasing, and random automatic augmentation is performed.
  • Step 604 Randomly sample a model structure from the preset search space as a sampling model.
  • a model structure is randomly sampled from the preset search space.
  • the sampling method is as follows: T is used as a period to sample structure types, and within each preset sampling period, the model with the largest model size, the model with the smallest model size, and T-2 randomly sampled models in the search space are collected;
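  • A hedged sketch of this period-T sampling schedule; the helpers largest_model, smallest_model, and random_model are assumed to exist and are not defined by the application.

```python
# Hedged sketch of the period-T sampling schedule; the three helpers are assumptions.
def sampling_schedule(search_space, T, num_periods, largest_model, smallest_model, random_model):
    """Per period: the largest model, the smallest model, then T-2 randomly sampled models."""
    schedule = []
    for _ in range(num_periods):
        schedule.append(largest_model(search_space))
        schedule.append(smallest_model(search_space))
        schedule.extend(random_model(search_space) for _ in range(T - 2))
    return schedule
```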
  • Step 605 Divide the preprocessed multiple sample images into batches and send them to the sampling model for training to obtain multiple first neural network models.
  • the preprocessed multiple sample images are divided into batches (for example, 256 pictures) and sent to the model sampled in step 604 for forward propagation, and the cross-entropy loss function is calculated.
  • the back propagation algorithm and a preset optimizer (such as AdamW optimizer) are used to reduce the loss function to train the model.
  • the final first neural network model is obtained.
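  • A minimal PyTorch sketch of this training step, assuming a supernet whose sub-models share parameters and a sample_subnet helper corresponding to step 604; the hyperparameters are illustrative, not specified by the application.

```python
# Hedged PyTorch sketch of step 605; sample_subnet and hyperparameters are assumptions.
import torch
import torch.nn as nn

def train_supernet(supernet, sample_subnet, train_loader, num_epochs=1):
    """Forward pass, cross-entropy loss, back propagation, and AdamW update."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(supernet.parameters(), lr=1e-3, weight_decay=0.05)
    for _ in range(num_epochs):
        for images, labels in train_loader:         # batches of, e.g., 256 preprocessed images
            model = sample_subnet(supernet)          # structure sampled as in step 604
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # forward propagation and cross-entropy loss
            loss.backward()                          # back propagation of the loss
            optimizer.step()                         # AdamW step reduces the loss
```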
  • Step 606 After completing the training, traverse multiple first neural network models indicated in the search space, perform information statistics and tabulate them.
  • the information statistics include statistics on at least one of FLOPS, parameter amount, and maximum memory limit.
  • the upstream construction is completed, and the computing device can subsequently perform structure search and customization based on downstream scenarios (such as data, model size, etc.).
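  • One possible way to perform the information statistics and tabulation of step 606 (a hedged sketch; count_params, estimate_flops, and peak_memory_mb are assumed helper functions, not APIs defined by the application):

```python
# Hedged sketch of the statistics table; the three helpers are assumed, not defined here.
def tabulate_candidates(first_models, count_params, estimate_flops, peak_memory_mb):
    """Traverse the trained first neural network models and tabulate their statistics."""
    table = []
    for name, model in first_models.items():
        table.append({
            "model": name,
            "params": count_params(model),       # model parameter amount
            "flops": estimate_flops(model),      # floating point operations
            "memory_mb": peak_memory_mb(model),  # maximum memory limit
        })
    return table
```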
  • Step 607 According to the target constraints, the search space is traversed one by one, and at least two second neural network models that meet the target constraints are selected from the plurality of first neural network models.
  • Step 608 Select a specific target task.
  • the migration downstream task is the above-mentioned target task.
  • Step 609 Select a downstream migration data set.
  • the downstream migration data set is the above-mentioned target data set, which is the data set used for migrating downstream task deployment.
  • Step 610 Train at least two second neural network models on the downstream migration data set to obtain evaluation parameters corresponding to each of the at least two second neural network models.
  • the evaluation parameter is the precision value of the evaluation index, for example, the evaluation index is the mean AP value (Mean Average Precision, mAP) or Top1.
  • the embodiments of the present application are not limited to this.
  • Step 611 Determine the target neural network model based on the corresponding evaluation parameters of at least two second neural network models.
  • K second neural network models are randomly sampled from the selected at least two second neural network models, and the K second neural network models are trained on the downstream migration data set to obtain the evaluation parameters corresponding to each of the K second neural network models.
  • based on the accuracy value of the evaluation index corresponding to each second neural network model, the second neural network model with the highest accuracy value is determined as the target neural network model.
  • the plurality of first neural network models in the embodiments of the present application are hybrid neural network models including at least two neural network structures. Based on the plurality of first neural network models, a search space of the super network is constructed. The search space involves the proportion of parameters of the at least two neural network structures at different layers, which can maximize the model's local feature extraction and global modeling capabilities. Moreover, the neural network structure search method provided by the embodiments of the present application solves the problem of model structure inconsistency in related technologies, and can obtain a target neural network model with the best accuracy-parameter balance under multiple model scales.
  • FIG. 7 shows a block diagram of a neural network structure search device provided by an exemplary embodiment of the present application.
  • the device can be implemented as all or part of the computing device through software, hardware, or a combination of both.
  • the device may include: a first acquisition unit 710, a second acquisition unit 720, and a screening unit 730.
  • the first acquisition unit 710 is used to acquire the target constraint, which indicates the model scale limit imposed by the target instance for running the neural network model;
  • the second acquisition unit 720 is configured to acquire multiple first neural network models that have been pre-trained according to the search space.
  • the search space indicates the correspondence between the multiple model scales and the multiple first neural network models.
  • the respective model structure parameters of the multiple first neural network models are different, and the model structure parameters indicate the ratio between at least two neural network structures of the first neural network model;
  • the screening unit 730 is used to screen out the target neural network model that meets the target constraint conditions from the plurality of first neural network models.
  • the screening unit 730 is also used to:
  • select at least two second neural network models that meet the target constraint from the plurality of first neural network models; and train the at least two second neural network models on the target data set corresponding to the target task to obtain evaluation parameters corresponding to each of the at least two second neural network models.
  • the evaluation parameters indicate the matching degree of the second neural network model and the target task.
  • the target neural network model is determined according to the corresponding evaluation parameters of the at least two second neural network models.
  • the device further includes: a third acquisition unit;
  • the third acquisition unit is used to obtain the search space of the super network.
  • the super network includes multiple neural network models that share their respective model structure parameters.
  • the search space also indicates that some neural network models in the super network are the multiple first neural network models.
  • At least two neural network structures include a convolutional neural network CNN structure and a Transformer structure.
  • the device further includes: a model training unit; the model training unit is used for:
  • obtain multiple preset original neural network models, the original neural network models indicating at least two neural network structures and a ratio between the at least two neural network structures;
  • Multiple original neural network models are trained according to the training sample set to obtain multiple first neural network models that have been trained, and the training sample set includes multiple sample images.
  • the model training unit is used to:
  • determine a sampling model from the multiple original neural network models according to a preset sampling period; and for at least one sample image in the training sample set, train the currently determined sampling model according to the at least one sample image to obtain a corresponding first neural network model.
  • the sampling model is a model with the largest model size among multiple original neural network models, or a model with the smallest model size among multiple original neural network models, or a model with the smallest model size among multiple original neural network models. Randomly determined models.
  • An embodiment of the present application provides a neural network structure search device.
  • the device includes: a processor; and a memory used to store instructions executable by the processor; wherein the processor is configured to implement the above method performed by the computing device when executing the instructions.
  • Embodiments of the present application provide a computer program product, which includes computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code.
  • when the computer-readable code is run in a processor of a computing device, the processor in the computing device performs the above method performed by the computing device.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored.
  • when the computer program instructions are executed by a processor, the above method executed by the computing device is implemented.
  • the computing device cluster includes at least one computing device 800 .
  • Computing device 800 includes: bus 802, processor 804, memory 806, and communication interface 808.
  • the processor 804, the memory 806 and the communication interface 808 communicate through the bus 802.
  • the computing device 800 may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device 800 may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the same instructions for performing the neural network structure search method may be stored in the memory 806 of one or more computing devices 800 in the computing device cluster.
  • the memory 806 of one or more computing devices 800 in the computing device cluster may also store part of the instructions for executing the neural network structure search method.
  • a combination of one or more computing devices 800 may collectively execute instructions for performing a neural network structure search method.
  • the memories 806 in different computing devices 800 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the neural network structure search apparatus. That is, instructions stored in the memory 806 in different computing devices 800 may implement the functions of one or more units among the first acquisition unit, the second acquisition unit and the filtering unit.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 9 shows a possible implementation. As shown in Figure 9, two computing devices 800A and 800B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • the memory 806 in the computing device 800A stores instructions for performing the functions of the first acquisition unit, the second acquisition unit and the filtering unit.
  • Meanwhile, the memory 806 in computing device 800B stores instructions for executing the functions of the model training unit.
  • The connection manner shown in Figure 9 takes into account that the neural network structure search method provided by this application requires training a large number of hybrid neural network models and storing the plurality of trained first neural network models; the functions implemented by the model training unit are therefore assigned to computing device 800B for execution.
  • It should be understood that the functions of the computing device 800A shown in Figure 9 may also be performed by multiple computing devices 800.
  • Likewise, the functions of the computing device 800B may also be performed by multiple computing devices 800.
  • Embodiments of the present application also provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, So that the computing device cluster performs the above method performed by the computing device.
  • Embodiments of the present application also provide a computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to execute the above method performed by the computing devices.
  • Embodiments of the present application also provide a computer-readable storage medium, including computer program instructions.
  • When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the above method executed by the computing device.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • A more specific (non-exhaustive) list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above.
  • Computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device .
  • The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In some embodiments, electronic circuits, such as programmable logic circuits, Field-Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), are customized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that when the instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus is produced that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions cause the computer, programmable data processing apparatus and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function(s).
  • In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by hardware that performs the corresponding function or action, such as circuits or Application-Specific Integrated Circuits (ASICs), or can be implemented with a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the field of artificial intelligence, and in particular to a neural network structure search method, device and storage medium. The method includes: obtaining a target constraint, where the target constraint indicates the model scale to which a target instance is limited for running a neural network model; obtaining, according to a search space, a plurality of pre-trained first neural network models, where the search space indicates a correspondence between a plurality of model scales and the plurality of first neural network models, the respective model structure parameters of the plurality of first neural network models are different, and the model structure parameters indicate a ratio between at least two neural network structures of the first neural network model; and selecting, from the plurality of first neural network models, a target neural network model that meets the target constraint. The solution provided by the embodiments of this application can automatically find an ideal hybrid neural network structure under a target constraint limiting the model scale, improving the effect of neural network structure search.

Description

神经网络结构搜索方法、装置及存储介质
本申请要求于2022年08月25日提交中国专利局、申请号为202211026784.6、申请名称为“神经网络结构搜索方法、装置及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能(artificial intelligence,AI)领域,尤其涉及一种神经网络结构搜索方法、装置及存储介质。
背景技术
随着AI技术的快速发展,各种神经网络模型层出不穷。神经网络结构的性能对神经网络模型的任务执行效果具有重要的影响。神经网络结构的性能越优,神经网络模型的任务执行效果越好。因此,构建神经网络模型时,如何确定性能优的神经网络结构是本领域技术人员的研究热点。
神经网络结构搜索(neural architecture search,NAS)技术应用而生,NAS技术可以在预先定义的搜索空间中自动搜索到性能最优的神经网络结构。但是,在给定模型规模的情况下不同的神经网络结构的表现存在不一致性,比如在大模型规模下Transformer结构表现出优于卷积神经网络(convolutional neural network,CNN)的性能,又比如在小模型尺度下CNN表现出优于Transformer结构的性能。这表示在给定模型规模的特定约束条件下,很难确定该使用哪种神经网络结构以设计出符合当前约束条件下的理想神经网络结构,相关技术中尚未提供一种合理且有效的实现方式。
发明内容
有鉴于此,提出了一种神经网络结构搜索方法、装置及存储介质,通过预先构建搜索空间,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例,可以实现在限定模型规模的目标约束条件下,确定出符合当前目标约束条件的目标神经网络模型,也即自动搜索到理想的混合神经网络结构,提高了神经网络结构搜索的效果。
第一方面,本申请的实施例提供了一种神经网络结构搜索方法,所述方法包括:
获取目标约束条件,所述目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;
根据搜索空间获取预先训练完成的多个第一神经网络模型,所述搜索空间指示多个模型规模与所述多个第一神经网络模型之间的对应关系,所述多个第一神经网络模型各自的模型结构参数是不同的,所述模型结构参数指示所述第一神经网络模型的至少两个神经网络结构之间的比例;
在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型。
在该实现方式中,本申请实施例通过预先构建搜索空间,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例,使得在限定模型规模的目标约束条件下,能够在多个第一神经网络模型中自动搜索到符合当前目标约束条件的目标神经网络模型,也即搜索到理想的混合神经网络结构和混合的至少两个神经网络结构之间的比例,提高了神经网络结构搜索的效果。
在一种可能的实现方式中,所述在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型,包括:
在所述多个第一神经网络模型中筛选出符合所述目标约束条件的至少两个第二神经网络模型;
将所述至少两个第二神经网络模型在目标任务对应的目标数据集上进行训练,得到所述至少两个第二神经网络模型各自对应的评价参数,所述评价参数指示所述第二神经网络模型与所述目标任务的匹配程度;
根据所述至少两个第二神经网络模型各自对应的评价参数,确定所述目标神经网络模型。
在该实现方式中,在多个第一神经网络模型中筛选出符合目标约束条件的至少两个第二神经网络模型后,将至少两个第二神经网络模型在目标任务对应的目标数据集上进行训练,得到至少两个第二神经网络模型各自对应的评价参数,根据至少两个第二神经网络模型各自对应的评价参数,确定目标神经网络模型,即在限定模型规模的目标约束条件下,经过一次训练就能自动搜索出该模型规模下的最优组合即目标神经网络模型,进一步提高了神经网络结构搜索的效率。
在另一种可能的实现方式中,所述根据搜索空间获取预先训练完成的多个第一神经网络模型之前,还包括:
获取超网络的所述搜索空间,所述超网络包括共享各自的模型结构参数的多个神经网络模型,所述搜索空间还指示所述超网络中的部分神经网络模型为所述多个第一神经网络模型。
在该实现方式中,构建了超网络的搜索空间,超网络包括共享各自的模型结构参数的多个神经网络模型,搜索空间还指示超网络中的部分神经网络模型为多个第一神经网络模型,使得后续基于搜索空间指示的多个第一神经网络模型进行神经网络结构搜索,进一步提高了神经网络结构搜索的效率。
在另一种可能的实现方式中,所述至少两个神经网络结构包括CNN结构和Transformer结构。
在该实现方式中,本案涉及的第一神经网络模型包括两个分支即CNN分支和Transformer分支,实现了CNN结构和Transformer结构的混合神经网络结构的搜索方法。
在另一种可能的实现方式中,所述根据搜索空间获取预先训练完成的多个第一神经网络模型之前,还包括:
获取预设的多个原始神经网络模型,所述原始神经网络模型指示所述至少两个神经网络结构和所述至少两个神经网络结构之间的比例;
根据训练样本集对所述多个原始神经网络模型进行训练,得到训练完成的所述多个第 一神经网络模型,所述训练样本集中包括多个样本图像。
在该实现方式中,获取预设的多个原始神经网络模型,原始神经网络模型指示至少两个神经网络结构和至少两个神经网络结构之间的比例;根据训练样本集对多个原始神经网络模型进行训练,得到训练完成的多个第一神经网络模型,构建了混合神经网络模型即第一神经网络模型的训练策略,可以大幅提高模型训练效率。
在另一种可能的实现方式中,所述根据训练样本集对所述多个原始神经网络模型进行训练,得到训练完成的所述多个第一神经网络模型,包括:
按照预设采样周期从所述多个原始神经网络模型中确定采样模型;
对于所述训练样本集中的至少一个样本图像,根据所述至少一个样本图像对当前确定出的所述采样模型进行训练得到对应的所述第一神经网络模型。
在该实现方式中,按照预设采样周期从多个原始神经网络模型中确定采样模型,对于训练样本集中的至少一个样本图像,根据至少一个样本图像对当前确定出的采样模型进行训练得到对应的第一神经网络模型,保证了基于采样模型训练得到的第一神经网络模型的可靠性。
在另一种可能的实现方式中,所述采样模型为所述多个原始神经网络模型中模型规模最大的模型、或者为所述多个原始神经网络模型中模型规模最小的模型、或者为所述多个原始神经网络模型中随机确定的模型。
在该实现方式中,采样模型为多个原始神经网络模型中模型规模最大的模型、或者为多个原始神经网络模型中模型规模最小的模型、或者为多个原始神经网络模型中随机确定的模型,进一步保证了模型训练的合理性和有效性。
第二方面,本申请的实施例提供了一种神经网络结构搜索装置,所述装置包括:
处理器;
用于存储处理器可执行指令的存储器;
其中,所述处理器被配置为执行所述指令时实现权利要求上述的方法。
第三方面,本申请的实施例提供了一种神经网络结构搜索装置,所述装置包括:
第一获取单元,用于获取目标约束条件,所述目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;
第二获取单元,用于根据搜索空间获取预先训练完成的多个第一神经网络模型,所述搜索空间指示多个模型规模与所述多个第一神经网络模型之间的对应关系,所述多个第一神经网络模型各自的模型结构参数是不同的,所述模型结构参数指示所述第一神经网络模型的至少两个神经网络结构之间的比例;
筛选单元,用于在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型。
第四方面,本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述第一方面或第一方面的任意一种可能的实现方式所提供的方法。
第五方面,本申请的实施例提供了一种计算机程序产品,所述计算机程序产品包括计算机可读代码,或者承载有所述计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在计算设备中运行时,所述计算设备中的处理器执行上述第一方面或第一 方面的任意一种可能的实现方式所提供的方法。
第六方面,本申请的实施例提供了一种计算设备集群,包括至少一个计算设备,每个计算设备包括处理器和存储器;所述至少一个计算设备的处理器用于执行所述至少一个计算设备的存储器中存储的指令,以使得所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所提供的方法。
第七方面,本申请的实施例提供了一种包含指令的计算机程序产品,当所述指令被计算设备集群运行时,使得所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所提供的方法。
第八方面,本申请的实施例提供了一种计算机可读存储介质,包括计算机程序指令,当所述计算机程序指令由计算设备集群执行时,所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所提供的方法。
附图说明
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实施例、特征和方面,并且用于解释本申请的原理。
图1示出了本申请一个示例性实施例提供的计算设备的结构示意图。
图2示出了本申请一个示例性实施例提供的神经网络结构搜索方法的流程图。
图3示出了本申请另一个示例性实施例提供的神经网络结构搜索方法的流程图。
图4示出了本申请一个示例性实施例提供的包括CNN结构和Transformer结构的混合神经网络模型的示意图。
图5示出了本申请一个示例性实施例提供的超网络的搜索空间的示意图。
图6示出了本申请另一个示例性实施例提供的神经网络结构搜索方法的流程图。
图7示出了本申请一个示例性实施例提供的神经网络结构搜索装置的框图。
图8示出了本申请一个示例性实施例提供的计算设备集群的结构示意图。
图9示出了本申请一个示例性实施例提供的计算设备集群之间的连接方式的示意图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。
首先,对本申请实施例涉及的一些名词进行介绍。
1、超网络:由多个独立的神经网络模型组成的网络集合,超网络中的所有神经网络 模型共享各自的模型结构参数,其中模型结构参数为该神经网络模型的所有模型结构参数中指定的部分模型结构参数。
2、搜索空间:神经网络结构搜索过程中,由待搜索的神经网络模型组成的空间。超网络的搜索空间指示超网络中的部分神经网络模型为待搜索的神经网络模型。也即,搜索空间指示的多个神经网络模型为超网络的子集。可选地,本申请实施例对搜索空间的指示方式不加以限定。比如,搜索空间包括指示的多个神经网络模型。又比如,搜索空间包括搜索规则,该搜索规则指示多个神经网络模型。
3、CNN:是一种带有卷积结构的深度神经网络。CNN包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(英文:feature map)做卷积。卷积层是指CNN中对输入信号进行卷积处理的神经元层。在CNN的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,我们都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在CNN的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少CNN各层之间的连接,同时又降低了过拟合的风险。
4、反向传播(back propagation,BP)算法:CNN可以采用误差反向传播算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
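To make the supernet and weight-sharing notions defined above (items 1 and 2) concrete, here is a minimal sketch in PyTorch. It assumes, purely for illustration, that a narrower sub-model reuses a slice of the widest model's weights; the application does not prescribe this particular sharing mechanism, and the layer shapes are placeholders:

```python
import torch
import torch.nn as nn

class SliceableLinear(nn.Module):
    """Stores weights at the maximum width; every sub-model uses a slice of them."""
    def __init__(self, max_in, max_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, max_in) * 0.02)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, out_width):
        w = self.weight[:out_width, : x.shape[-1]]  # shared slice of the supernet weights
        b = self.bias[:out_width]
        return x @ w.t() + b

layer = SliceableLinear(max_in=64, max_out=128)
x = torch.randn(8, 64)
y_narrow = layer(x, out_width=32)   # a small sub-model
y_wide = layer(x, out_width=128)    # the widest sub-model
# Backpropagation (item 4 above) through either output updates the same shared parameters.
y_narrow.pow(2).mean().backward()
```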
相关技术中,大部分方法聚焦于单个模型尺度和模型规模下的模型结构搜索,然而不同的神经网络模型在不同模型规模下的优越性不同,因此以上方法无法解决不同模型规模下的结构的不一致性。其次,在不同的应用场景中,可用的计算资源差异十分明显,能够部署的模型也不相同,针对不同模型需要进行单独的预训练,模型的复用性很差,十分浪费计算资源。
本申请实施例提出的关于混合神经网络结构(比如CNN+Transformer)的神经网络结构搜索方法,可以较好的解决上述不一致性和计算资源浪费问题。可做到在小模型(比如CNN参数比例高)和大模型(比如Transformer参数比例高)等不同模型规模下均能搜索出理想的混合神经网络结构,且保持高效可靠。并且,本申请实施例根据下游硬件以及部署时限定的参数量和计算量等目标约束条件进行筛选之后,进行快速的预训练即可确定出最佳神经网络模型,不需要对各种网络进行从头的训练再选出最优模型。整个训练花费的硬件资源远远低于在各种场景中从随机初始化训练各种子模型所需的硬件资源。
下面,对本申请涉及的应用场景进行介绍。
本申请实施例提供了一种神经网络结构搜索方法,执行主体为计算设备。请参考图1,其示出了本申请一个示例性实施例提供的计算设备的结构示意图。
该计算设备可以是终端或者服务器。终端包括移动终端或者固定终端,比如终端可以是手机、平板电脑、膝上型便携计算机和台式计算机等等。服务器可以是一台服务器,或者由若干台服务器组成的服务器集群,或者是一个云计算服务中心。
可选地,在模型训练阶段涉及的计算设备为服务器,比如为包括8块GPU的服务器。本申请实施例通过固定大网络与灵活的动态超网,一次性完成十几万个神经网络模型的预训练。
可选地,在模型推理阶段可根据用户要求,在各种算力大小的计算设备上进行神经网络结构搜索,计算设备为终端或者服务器。
如图1所示,计算设备包括处理器10、存储器20以及通信接口30。本领域技术人员可以理解,图1中示出的结构并不构成对该计算设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:
处理器10是计算设备的控制中心,利用各种接口和线路连接整个计算设备的各个部分,通过运行或执行存储在存储器20内的软件程序和/或模块,以及调用存储在存储器20内的数据,执行计算设备的各种功能和处理数据,从而对计算设备进行整体控制。处理器10可以由CPU实现,也可以由图形处理器(Graphics Processing Unit,GPU)实现。
存储器20可用于存储软件程序以及模块。处理器10通过运行存储在存储器20的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器20可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统21、第一获取单元22、第二获取单元23、筛选单元24和至少一个功能所需的应用程序25(比如神经网络训练等)等;存储数据区可存储根据计算设备的使用所创建的数据等。存储器20可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(Static Random Access Memory,SRAM),电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM),可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM),可编程只读存储器(Programmable Read-Only Memory,PROM),只读存储器(Read Only Memory,ROM),磁存储器,快闪存储器,磁盘或光盘。相应地,存储器20还可以包括存储器控制器,以提供处理器10对存储器20的访问。
其中,处理器10通过运行第一获取单元22执行以下功能:获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;处理器10通过运行第二获取单元23执行以下功能:根据搜索空间获取预先训练完成的多个第一神经网络模型,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例;处理器10通过运行筛选单元24执行以下功能:在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型。
需要说明的是,本申请实施例提供的神经网络结构搜索方法可以应用于多种应用场景。下面介绍本申请的技术方案的几种示例性应用场景。
一种可能的应用场景为相册管理系统。用户在手机相册中存储了大量的图片,并希望能够对相册中的图片进行分类管理。例如,用户希望手机能够自动将所有鸟类图像归类在一起,将所有人物照片归类在一起。这种应用场景下,可以利用本申请提供的技术方案,基于用户的手机计算资源,搜索出与该手机计算资源匹配的图像分类模型。这样,在手机上运行具有该图像分类模型,就可以对手机相册中的不同类别的图片进行分类管理,从而方便用户的查找,节省用户的管理时间,提高相册管理的效率。即目标约束条件包括手机计算资源,目标神经网络模型为图像分类模型。
另一种可能的应用场景为目标检测与分割。在自动驾驶中,对街道的行人、车辆等目标进行检测和分割,对车辆做出安全驾驶决策至关重要。这种应用场景下,可以利用本申请提供的技术方案,基于车辆计算资源,搜索出与该车辆计算资源匹配的目标检测与分割模型。这样,在车辆上运行具有该模型结构的目标检测与分割模型,即可对车辆采集到的图像中目标进行准确的检测定位与分割。即目标约束条件包括车辆计算资源,目标神经网络模型为目标检测与分割模型。
下面,采用几个示例性实施例对本申请实施例提供的神经网络结构搜索方法进行介绍。
请参考图2,其示出了本申请一个示例性实施例提供的神经网络结构搜索方法的流程图,本实施例以该方法用于图1所示的计算设备中来举例说明。该方法包括以下几个步骤。
步骤201,获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模。
可选地,计算设备获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模。
可选地,目标实例为物理实例或虚拟实例。示意性地,物理实例可以是物理机,虚拟实例可以是虚拟机或者容器或者裸金属服务器。其中,虚拟机也可称为云服务器(Elastic Compute Service,ECS)、或者弹性实例。本申请实施例对此不加以限定。
可选地,目标实例和计算设备为同一个设备,或者为不同的两个设备。比如,计算设备为服务器,目标实例为终端。又比如,计算设备和目标实例为同一个终端。
可选地,目标约束条件包括目标实例为运行神经网络模型提供的资源的约束条件,资源包括目标实例的计算资源和/或存储资源。
可选地,目标约束条件包括限定的模型规模信息,模型规模信息包括模型参数信息和模型计算信息。比如,目标约束条件包括每秒浮点运算次数(Floating Point Operations per Second,FLOPS)、或者模型参数量、或者模型的最大内存限制量。
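As an illustration only, such a constraint could be represented and checked as follows; the field names are assumptions, and the text above merely lists FLOPS, parameter count, and maximum memory as possible forms of the model-scale limit:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TargetConstraint:
    max_flops: Optional[float] = None      # floating point operations per forward pass
    max_params: Optional[float] = None     # number of model parameters
    max_memory_mb: Optional[float] = None  # peak memory allowed on the target instance

    def satisfied_by(self, stats: dict) -> bool:
        """`stats` holds the measured scale of one candidate model."""
        if self.max_flops is not None and stats["flops"] > self.max_flops:
            return False
        if self.max_params is not None and stats["params"] > self.max_params:
            return False
        if self.max_memory_mb is not None and stats["memory_mb"] > self.max_memory_mb:
            return False
        return True

constraint = TargetConstraint(max_params=15e6, max_flops=2e9)
print(constraint.satisfied_by({"flops": 1.6e9, "params": 13.5e6, "memory_mb": 180}))  # True
```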
步骤202,根据搜索空间获取预先训练完成的多个第一神经网络模型,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例。
可选地,计算设备预先训练完成多个第一神经网络模型。在一种可能的实现方式中,当计算设备为终端时,终端获取自身存储的训练好的多个第一神经网络模型,或者从服务器中获取训练好的多个第一神经网络模型。在另一种可能的实现方式中,当计算设备为服务器时,服务器获取自身存储的训练好的多个第一神经网络模型。
可选地,计算设备预先构建搜索空间,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系。也即,搜索空间指示多个模型规模与多种模型结构参数之间的对应关系,模型结构参数包括至少两个神经网络结构各自对应的参数和/或至少两个神经网络结构之间的参数量比例。
可选地,搜索空间指示的多个模型规模与多个第一神经网络模型之间的对应关系,包括:多个模型规模与多个第一神经网络模型之间一对多的对应关系,即每个模型规模对应一个混合神经网络模型集合,混合神经网络集合包括至少两个第一神经网络模型。
示意性的,在多个模型规模中,任意两个模型规模各自对应的混合神经网络模型集合不存在交集。或者,在多个模型规模中,存在至少两个模型规模各自对应的混合神经网络模型集合不存在交集。
可选地,计算设备获取超网络的搜索空间,超网络包括共享各自的模型结构参数的多个神经网络模型,搜索空间还指示超网络中的部分神经网络模型为多个第一神经网络模型。在一种可能的实现方式中,搜索空间包括多个第一神经网络模型。在另一种可能的实现方式中,搜索空间包括搜索规则,搜索规则指示超网络中的部分神经网络模型为多个第一神经网络模型。
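One way to picture the one-to-many correspondence between model scales and first neural network models is a lookup table keyed by scale. The bucket names, ratios, and parameter counts below are placeholders, since the application allows the search space to be given either as an explicit model list or as a search rule:

```python
# Each model scale (bucketed here by parameter count) maps to a set of hybrid
# candidates identified only by their model structure parameters.
search_space = {
    "tiny":  [{"cnn_ratio": 0.7, "transformer_ratio": 0.3, "params": 6.1e6},
              {"cnn_ratio": 0.6, "transformer_ratio": 0.4, "params": 7.4e6}],
    "small": [{"cnn_ratio": 0.5, "transformer_ratio": 0.5, "params": 18.0e6},
              {"cnn_ratio": 0.4, "transformer_ratio": 0.6, "params": 24.3e6}],
}

def candidates_under(space, max_params):
    """Collect every candidate whose parameter count fits under the limit."""
    return [m for bucket in space.values() for m in bucket if m["params"] <= max_params]

print(candidates_under(search_space, max_params=10e6))
```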
可选地，第一神经网络模型为采用样本图像和正确结果信息对混合神经网络结构进行训练得到的模型。其中，正确结果信息为预先标注的与样本图像对应的正确结果信息。
第一神经网络模型是具有对图像中结果信息进行识别的循环神经网络模型。
第一神经网络模型用于将输入的图像转化为结果信息。第一神经网络模型用于表示图像与结果信息之间的相关关系。
第一神经网络模型为预设的数学模型，该第一神经网络模型包括图像与结果信息之间的模型系数。模型系数可以为固定值，也可以是随时间动态修改的值，还可以是随着应用场景动态修改的值。
可选地,结果信息为分类结果信息、检测结果信息、分割结果信息中的任意一种。本申请实施例对此不加以限定。
第一神经网络模型为包括至少两个神经网络结构的混合神经网络模型。可选地,第一神经网络模型为包括两个神经网络结构的混合神经网络模型,比如第一神经网络模型为包括CNN结构和Transformer结构的混合神经网络模型。又比如,第一神经网络模型为包括图卷积网络(Graph Convolutional Networks,GCN)结构和Transformer结构的混合神经网络模型。本申请实施例对此不加以限定。
可选地,多个第一神经网络模型中,每个第一神经网络模型所包括的混合神经网络结构的类型是相同的。比如,多个第一神经网络模型中,每个第一神经网络模型均为包括CNN结构和Transformer结构的混合神经网络模型。
可选地,多个第一神经网络模型中,存在至少两个神经网络模型的混合神经网络结构的类型是不同的,和/或,任意两个第一神经网络模型各自对应的模型结构参数是不同的。
比如,多个第一神经网络模型中,存在至少一个第一神经网络模型为包括CNN结构和Transformer结构的混合神经网络模型,存在至少一个第一神经网络模型为包括GCN结构和Transformer结构的混合神经网络模型。
可选地,在多个第一神经网络模型中,每个第一神经网络模型所包括的混合神经网络 结构的类型是相同的情况下,任意两个第一神经网络模型各自对应的模型结构参数是不同的,第一神经网络模型对应的模型结构参数为该第一神经网络模型所包括的至少两个神经网络结构各自对应的参数和/或至少两个神经网络结构之间的参数量比例。即,多个第一神经网络模型为具有不同的模型结构参数的多个混合神经网络模型。
步骤203,在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型。
可选地,计算设备根据搜索空间指示的多个模型规模与多个第一神经网络模型之间的对应关系,将目标约束条件指示的模型规模所对应的第一神经网络模型确定为目标神经网络模型,目标神经网络模型为多个第一神经网络模型中的一个神经网络模型。
其中,目标神经网络模型为符合目标约束条件的待运行的神经网络模型,目标神经网络模型包括至少两个神经网络结构的混合神经网络模型,目标神经网络模型指示所包括的至少两个神经网络结构之间的比例。
可选地,目标神经网络模型包括两个神经网络结构,比如两个神经网络结构为CNN结构和Transformer结构。
可选地,计算设备在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型,包括:计算设备在多个第一神经网络模型中筛选出符合目标约束条件的至少两个第二神经网络模型;将至少两个第二神经网络模型在目标任务对应的目标数据集上进行训练,得到至少两个第二神经网络模型各自对应的评价参数,评价参数指示第二神经网络模型与目标任务的匹配程度;根据至少两个第二神经网络模型各自对应的评价参数,确定目标神经网络模型。
可选地,目标任务为待处理的迁移下游任务。目标任务包括分类任务、检测任务和分割任务中的一种。需要说明的是,目标任务还可以是其他类型的任务,本申请实施例对此不加以限定。
可选地,目标数据集为目标任务部署所用的数据集。比如,下游迁移数据集为PASCAL VOC数据集、或者COCO数据集或者Pets数据集。
可选地,计算设备在筛选出的至少两个第二神经网络模型中随机采样出K个第二神经网络模型,将K个第二神经网络模型在下游迁移数据集上进行训练,得到K个第二神经网络模型各自对应的评价参数。根据K个第二神经网络模型各自对应的评价参数,确定目标神经网络模型,K为大于1的正整数。
可选地,评价参数为评价指标的精度值;计算设备根据至少两个第二神经网络模型各自对应的评价参数,确定目标神经网络模型,包括:根据至少两个第二神经网络模型各自对应的评价指标的精度值,将精度值最高的第二神经网络模型确定为目标神经网络模型。
比如评价指标为平均AP值(Mean Average Precision,mAP)或者Top1。本申请实施例对此不加以限定。
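The screen-then-evaluate procedure above can be condensed into a few lines; `finetune_and_score` is an assumed helper standing in for training one candidate on the target dataset and reading back its mAP or Top-1 accuracy, and `constraint` follows the earlier TargetConstraint sketch:

```python
import random

def select_target_model(candidates, constraint, finetune_and_score, k=None):
    """Keep candidates that satisfy the target constraint, optionally subsample K
    of them, fine-tune each on the downstream dataset, and return the best one."""
    feasible = [m for m in candidates if constraint.satisfied_by(m["stats"])]
    if not feasible:
        raise ValueError("no candidate satisfies the target constraint")
    if k is not None and len(feasible) > k:
        feasible = random.sample(feasible, k)
    scored = [(finetune_and_score(m), m) for m in feasible]
    best_score, best_model = max(scored, key=lambda pair: pair[0])
    return best_model, best_score
```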
综上所述,本申请实施例通过预先训练完成多个混合神经网络模型即多个第一神经网络模型,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例,使得在限定模型规模的目标约束条件下,能够在多个第一神经网络模型中自动搜索到符合当前目标约束条件的目标神经网络模型,也即搜索到理想的混合神经网络结构和混合的至少两个神经网络结构之间的比例,提高了神经网络结构搜索的效果。
请参考图3,其示出了本申请另一个示例性实施例提供的神经网络结构搜索方法的流程图。为了方便说明,本实施例以该方法涉及的模型训练过程和模型应用过程均用于图1所示的计算设备中来举例说明。该方法包括以下几个步骤。
步骤301,获取预设的多个原始神经网络模型,原始神经网络模型指示至少两个神经网络结构和至少两个神经网络结构之间的比例。
计算设备获取预设的多个原始神经网络模型,原始神经网络模型为包括至少两个神经网络结构的混合神经网络模型。可选地,原始神经网络模型为包括两个神经网络结构的混合神经网络模型,比如原始神经网络模型为包括CNN结构和Transformer结构的混合神经网络模型。又比如,原始神经网络模型为包括GCN结构和Transformer结构的混合神经网络模型。本申请实施例对此不加以限定。
步骤302,根据训练样本集对多个原始神经网络模型进行训练,得到训练完成的多个第一神经网络模型,训练样本集中包括多个样本图像。
可选地,计算设备获取训练样本集,训练样本集中包括多个样本图像。计算设备将训练样本集中的每个样本图像进行预处理得到预处理后的样本图像,根据预处理后的多个样本图像对多个原始神经网络模型进行训练,得到训练完成的多个第一神经网络模型。
可选地,预处理包括归一化处理、裁剪处理(包括以随机裁剪的方式裁剪至统一的大小,比如224×224)、数据增强处理中的至少一种,数据增强处理包括翻转、随机擦除、随机自动增强中的至少一种。本申请实施例对此不加以限定。
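A sketch of such a preprocessing pipeline with torchvision; the crop size of 224, the ImageNet normalization statistics, and the AutoAugment policy are assumptions, as the text only names the operation types (normalization, random crop to a unified size, flipping, random erasing, random auto-augmentation):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                               # random crop to a unified size
    transforms.RandomHorizontalFlip(),                               # flipping
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),   # random auto-augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),                                # random erasing on the tensor
])
```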
可选地,计算设备按照预设采样周期从多个原始神经网络模型中确定采样模型;对于训练样本集中的至少一个样本图像,根据至少一个样本图像对当前确定出的采样模型进行训练得到对应的第一神经网络模型。
可选地,计算设备将预处理后的多个样本图像分批次输入至采样模型中进行训练。采样模型是按照预设采样周期从多个原始神经网络模型中确定的模型。示意性的,采样模型为多个原始神经网络模型中模型规模最大的模型、或者为多个原始神经网络模型中模型规模最小的模型、或者为多个原始神经网络模型中随机确定的模型。
可选地,计算设备根据至少一个样本图像对当前确定出的采样模型进行训练得到对应的第一神经网络模型,包括:根据至少一个样本图像,调用当前确定出的采样模型进行前向传播,并计算损失函数,根据损失函数采用反向传播算法和预设优化器进行训练,经过多次迭代训练得到对应的第一神经网络模型,即最终得到多个原始神经网络模型各自对应的训练完成的第一神经网络模型。
在一个示意性的例子中,包括CNN结构和Transformer结构的混合神经网络模型的示意图如图4所示。该混合神经网络模型包括双分支的网络结构,分别为CNN结构和Transformer结构。其中,在CNN结构上采用动态卷积层模块,动态卷积层模块包括可变宽度的动态卷积算子;在Transformer结构上采用动态Transformer模块,动态Transformer模块包括可变宽度、头部(英文:head)数目中的至少一种参数。其中,d1,d2,d3,d4分别表示四个阶段的网络层数,d1、d2、d3、d4也会在模型训练过程中动态变化,比如d1的数值为4或5,d2的数值为4或5,d3的数值为3或4,d4的数值为1,四个阶段的网络层数相加d1+d2+d3+d4即为d-model,d-model的数值位于12-15之间。
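The stage-depth example just given can be written out as a tiny sampler; the value ranges are the ones quoted in the text, and the assertion simply checks that d-model stays within the stated range of 12 to 15:

```python
import random

def sample_stage_depths():
    """Sample the four stage depths of the dual-branch hybrid backbone."""
    d1 = random.choice([4, 5])
    d2 = random.choice([4, 5])
    d3 = random.choice([3, 4])
    d4 = 1
    d_model = d1 + d2 + d3 + d4   # total depth, quoted as lying between 12 and 15
    assert 12 <= d_model <= 15
    return (d1, d2, d3, d4), d_model

print(sample_stage_depths())
```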
步骤303,获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模。
计算设备获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模。
步骤304,根据搜索空间获取预先训练完成的多个第一神经网络模型,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例。
计算设备获取超网络的搜索空间,获取搜索空间指示的多个第一神经网络模型,其中,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例。
在一个示意性的例子中,基于图4提供的包括CNN结构和Transformer结构的混合神经网络模型,超网络的搜索空间的示意图如图5所示。每个三元组的值分别表示该维度采样的起始点、终止点、步长。在训练过程中每个迭代过程中,网络会随机从表中采样结构参数并构建结构,然后以此结构进行训练。在测试过程中,也将依照此表制定的搜索空间进行模型结构搜索。其中,widthstem对应图4中的卷积层的卷积的输出通道数;widthb1为d1所对应网络的输出通道数;widthb2为d2所对应网络的输出通道数;widthb3为d3所对应网络的输出通道数,或者也称为d4所对应网络的输出通道数,d3所对应的网络的输出通道数与d4所对应的网络的输出通道数是相同的;dime为自注意力层的特征通道数;nh为自注意力层中注意力头的数量;ratioqkv为自注意力层中q、k、v矩阵的特征维度比例,即q、k、v矩阵的特征维度=n-h*ratioqkv;ratiomlp为多层感知器(Multilayer Perceptron,MLP)层中的隐藏特征维度比例,即MLP层隐藏特征维度为dime*ratiomlp;dmodel为Transformer结构中块的数量。比如,根据超网的大小定义,其中第一类型的超网络也称超小超网,参数量范围为5.5M~13.5M;第二类型的超网络也称小超网,参数量范围为15.1M~39.9M;第三类型的超网络也称基础超网,参数量范围为23.2M~66.2M。本申请实施例对此不加以限定。
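The (start, stop, step) triples that define each searchable dimension translate directly into candidate value lists; the concrete numbers below are placeholders rather than the values of Figure 5, which are not reproduced here:

```python
import random

dimension_triples = {
    "width_stem": (16, 48, 16),    # output channels of the stem convolution
    "dim_e":      (192, 384, 64),  # feature channels of the self-attention layers
    "n_h":        (3, 6, 1),       # number of attention heads
    "ratio_mlp":  (2, 4, 1),       # hidden-dimension ratio of the MLP block
}

def expand(triple):
    start, stop, step = triple
    return list(range(start, stop + 1, step))

def sample_structure(triples):
    """Pick one value per dimension at random, as done once per training iteration."""
    return {name: random.choice(expand(t)) for name, t in triples.items()}

print(sample_structure(dimension_triples))
```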
步骤305,在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型。
计算设备在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型。需要说明的是,步骤303至步骤305的相关细节可参考上述实施例中的相关描述,在此不再赘述。
综上所述,本申请实施例构建了搜索空间,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型为具有不同比例的混合神经网络模型,在不同的模型规模下可以搜索到理想模型即符合目标约束条件的目标神经网络模型。构建了模型训练策略,可大幅提高模型训练和后续的网络搜索效率。本申请实施例提供的神经网络结构搜索方法可以在公有云/混合云/私有云上应用,提供智能高效的神经网络结构搜索和生产能力并部署。本申请实施例提供的神经网络结构搜索方法可涉及各种分类、检测、分割等各种下游任务。
请参考图6,其示出了本申请另一个示例性实施例提供的神经网络结构搜索方法的流程图,本实施例以该方法用于图1所示的计算设备中来举例说明。该方法包括以下几个步骤。
步骤601,选取上游的图像分类数据集作为训练样本集。
步骤602,获取预设的超网络的搜索空间。
其中,超网络包括共享各自的模型结构参数的多个神经网络模型,搜索空间指示超网络中的部分神经网络模型为多个第一神经网络模型。
步骤603,将训练样本集中每个样本图像进行预处理。
可选地,将训练样本集中每个样本图像进行归一化并以随机裁剪的方式裁剪至统一的大小,然后进行翻转,随机擦除,随机自动增强等数据增强处理。
步骤604,从预设的搜索空间中随机采样一个模型结构作为采样模型。
可选地,在每个批次的样本图像送入超网络之前,从预设的搜索空间中随机采样一个模型结构。采样方法如下,我们以T为周期进行结构类型的采样,在一个预设采样周期内分别采集搜索空间内模型规模最大的模型、模型规模最小的模型、随机采样的T-2个模型;
步骤605,将预处理后的多个样本图像分成各个批次送入采样模型中进行训练,得到多个第一神经网络模型。
可选地,将预处理后的多个样本图像分成各个批次(比如256张图片)送入步骤S3中被采样模型进行前向传播,并计算交叉熵损失函数。采用反向传播算法和预设优化器(比如AdamW优化器)来减小损失函数以训练该模型,经过多次迭代训练得到最终的第一神经网络模型。
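Steps 604 and 605 together amount to a weight-sharing training loop of roughly the following shape; the `supernet(images, structure)` call signature and the hyperparameter values are assumptions, while the cross-entropy loss and the AdamW optimizer are the ones named in the text:

```python
import torch
import torch.nn.functional as F

def train_supernet(supernet, loader, num_steps, sample_structure, lr=1e-3, weight_decay=0.05):
    """Per batch: sample a structure from the search space, run that sub-model
    forward, and update the shared weights with cross-entropy loss and AdamW."""
    optimizer = torch.optim.AdamW(supernet.parameters(), lr=lr, weight_decay=weight_decay)
    for step, (images, labels) in zip(range(num_steps), loader):
        structure = sample_structure(step)     # e.g. largest / smallest / random per period
        logits = supernet(images, structure)   # forward pass of the sampled sub-model
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()                        # backpropagation through the shared weights
        optimizer.step()
```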
步骤606,在完成训练后,遍历搜索空间指示的多个第一神经网络模型,进行信息统计并制表。
可选地,信息统计包括FLOPS、参数量、最大内存限制中的至少一种信息的统计。完成以上步骤,即完成了上游构建,计算设备后续可以根据下游场景(比如数据、模型大小等)进行结构搜索及定制化。
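Step 606 then reduces to enumerating the trained sub-models once and tabulating their scale statistics, so that later searches are plain table lookups; `measure` is an assumed helper returning the FLOPS, parameter count, and peak memory of one structure:

```python
def build_model_table(structures, measure):
    """Build the lookup table used for downstream structure search."""
    table = []
    for s in structures:
        stats = measure(s)   # e.g. {"flops": ..., "params": ..., "memory_mb": ...}
        table.append({"structure": s, **stats})
    return table
```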
步骤607,根据目标约束条件,在搜索空间中进行逐一遍历,在多个第一神经网络模型中筛选出符合目标约束条件的至少两个第二神经网络模型。
步骤608,选取特定的目标任务。
其中迁移下游任务即为上述的目标任务。
步骤609,选取下游迁移数据集。
其中下游迁移数据集即为上述的目标数据集,为迁移下游任务部署所用的数据集。
步骤610,将至少两个第二神经网络模型在下游迁移数据集上进行训练,得到至少两个第二神经网络模型各自对应的评价参数。
可选地,评价参数为评价指标的精度值,比如评价指标为平均AP值(Mean Average Precision,mAP)或者Top1。本申请实施例对此不加以限定。
步骤611,根据至少两个第二神经网络模型各自对应的评价参数,确定目标神经网络模型。
可选地,在筛选出的至少两个第二神经网络模型中随机采样出K个第二神经网络模型,将K个第二神经网络模型在下游迁移数据集上进行训练,得到K个第二神经网络模型各自 对应的评价指标的精度值,将精度值最高的第二神经网络模型确定为目标神经网络模型。
综上所述,本申请实施例中的多个第一神经网络模型为包括至少两个神经网络结构的混合神经网络模型,基于多个第一神经网络模型构建了超网络的搜索空间,搜索空间涉及不同层的至少两个神经网络结构的参数量比例,能够最大程度上使模型具备良好的局部特征提取和全局建模能力。并且,本申请实施例提供的神经网络结构搜索方法解决了相关技术中模型结构不一致性的问题,能够在多个模型规模下,得出具有最佳的准确度-参数量均衡比的目标神经网络模型。
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。
请参考图7,其示出了本申请一个示例性实施例提供的神经网络结构搜索装置的框图。该装置可以通过软件、硬件或者两者的结合实现成为计算设备的全部或者一部分。该装置可以包括:第一获取单元710、第二获取单元720和筛选单元730。
第一获取单元710,用于获取目标约束条件,目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;
第二获取单元720,用于根据搜索空间获取预先训练完成的多个第一神经网络模型,搜索空间指示多个模型规模与多个第一神经网络模型之间的对应关系,多个第一神经网络模型各自的模型结构参数是不同的,模型结构参数指示第一神经网络模型的至少两个神经网络结构之间的比例;
筛选单元730,用于在多个第一神经网络模型中筛选出符合目标约束条件的目标神经网络模型。
在一种可能的实现方式中,筛选单元730,还用于:
在多个第一神经网络模型中筛选出符合目标约束条件的至少两个第二神经网络模型;
将至少两个第二神经网络模型在目标任务对应的目标数据集上进行训练,得到至少两个第二神经网络模型各自对应的评价参数,评价参数指示第二神经网络模型与目标任务的匹配程度;
根据至少两个第二神经网络模型各自对应的评价参数,确定目标神经网络模型。
在另一种可能的实现方式中,该装置还包括:第三获取单元;
第三获取单元,用于获取超网络的搜索空间,超网络包括共享各自的模型结构参数的多个神经网络模型,搜索空间还指示超网络中的部分神经网络模型为多个第一神经网络模型。
在另一种可能的实现方式中,至少两个神经网络结构包括卷积神经网络CNN结构和Transformer结构。
在另一种可能的实现方式中,该装置还包括:模型训练单元;模型训练单元用于:
获取预设的多个原始神经网络模型,原始神经网络模型指示至少两个神经网络结构和至少两个神经网络结构之间的比例;
根据训练样本集对多个原始神经网络模型进行训练,得到训练完成的多个第一神经网络模型,训练样本集中包括多个样本图像。
在另一种可能的实现方式中,模型训练单元用于:
按照预设采样周期从多个原始神经网络模型中确定采样模型;
对于训练样本集中的至少一个样本图像,根据至少一个样本图像对当前确定出的采样模型进行训练得到对应的第一神经网络模型。
在另一种可能的实现方式中,采样模型为多个原始神经网络模型中模型规模最大的模型、或者为多个原始神经网络模型中模型规模最小的模型、或者为多个原始神经网络模型中随机确定的模型。
需要说明的是,上述实施例提供的装置,在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本申请实施例提供了一种神经网络结构搜索装置,该装置包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述由计算设备执行的方法。
本申请实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当计算机可读代码在计算设备的处理器中运行时,计算设备中的处理器执行上述由计算设备执行的方法。
本申请实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,计算机程序指令被处理器执行时实现上述由计算设备执行的方法。
本申请实施例还提供了一种计算设备集群。如图8所示,该计算设备集群包括至少一台计算设备800。计算设备800包括:总线802、处理器804、存储器806和通信接口808。处理器804、存储器806和通信接口808之间通过总线802通信。该计算设备800可以是服务器,例如是中心服务器、边缘服务器,或者是本地数据中心中的本地服务器。在一些实施例中,计算设备800也可以是台式机、笔记本电脑或者智能手机等终端设备。该计算设备集群中的一个或多个计算设备800中的存储器806中可以存有相同的用于执行神经网络结构搜索方法的指令。
在一些可能的实现方式中,该计算设备集群中的一个或多个计算设备800的存储器806中也可以分别存有用于执行神经网络结构搜索方法的部分指令。换言之,一个或多个计算设备800的组合可以共同执行用于执行神经网络结构搜索方法的指令。
需要说明的是,计算设备集群中的不同的计算设备800中的存储器806可以存储不同的指令,分别用于执行神经网络结构搜索装置的部分功能。也即,不同的计算设备800中的存储器806存储的指令可以实现第一获取单元、第二获取单元和筛选单元中的一个或多个单元的功能。
在一些可能的实现方式中,计算设备集群中的一个或多个计算设备可以通过网络连接。其中,所述网络可以是广域网或局域网等等。图9示出了一种可能的实现方式。如图9所示,两个计算设备800A和800B之间通过网络进行连接。具体地,通过各个计算设备中的通信接口与所述网络进行连接。在这一类可能的实现方式中,计算设备800A中的存储器806中存有执行第一获取单元、第二获取单元和筛选单元的功能的指令。同时,计算设备 800B中的存储器806中存有执行模型训练单元的功能的指令。
图9所示的计算设备集群之间的连接方式可以是考虑到本申请提供的神经网络结构搜索方法需要大量地训练混合神经网络模型并存储训练完成的多个第一神经网络模型,因此考虑将模型训练单元实现的功能交由计算设备800B执行。
应理解,图9中示出的计算设备800A的功能也可以由多个计算设备800完成。同样,计算设备800B的功能也可以由多个计算设备800完成。
本申请的实施例还提供了一种计算设备集群,包括至少一个计算设备,每个计算设备包括处理器和存储器;至少一个计算设备的处理器用于执行至少一个计算设备的存储器中存储的指令,以使得计算设备集群执行上述由计算设备执行的方法。
本申请的实施例还提供了一种包含指令的计算机程序产品,当指令被计算设备集群运行时,使得计算设备集群执行上述由计算设备执行的方法。
本申请的实施例还提供了一种计算机可读存储介质,包括计算机程序指令,当计算机程序指令由计算设备集群执行时,计算设备集群执行上述由计算设备执行的方法。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)、可擦式可编程只读存储器(Electrically Programmable Read-Only-Memory,EPROM或闪存)、静态随机存取存储器(Static Random-Access Memory,SRAM)、便携式压缩盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、数字多功能盘(Digital Video Disc,DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。
这里所描述的计算机可读程序指令或代码可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本申请操作的计算机程序指令可以是汇编指令、指令集架构(Instruction Set Architecture,ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服 务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或可编程逻辑阵列(Programmable Logic Array,PLA),该电子电路可以执行计算机可读程序指令,从而实现本申请的各个方面。
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本申请的多个实施例的装置、系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。
也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行相应的功能或动作的硬件(例如电路或ASIC(Application Specific Integrated Circuit,专用集成电路))来实现,或者可以用硬件和软件的组合,如固件等来实现。
尽管在此结合各实施例对本申请进行了描述,然而,在实施所要求保护的本申请过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理解并实现所述公开实施例的其它变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其它单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
以上已经描述了本申请的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (13)

  1. 一种神经网络结构搜索方法,其特征在于,所述方法包括:
    获取目标约束条件,所述目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;
    根据搜索空间获取预先训练完成的多个第一神经网络模型,所述搜索空间指示多个模型规模与所述多个第一神经网络模型之间的对应关系,所述多个第一神经网络模型各自的模型结构参数是不同的,所述模型结构参数指示所述第一神经网络模型中的至少两个神经网络结构之间的比例;
    在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型。
  2. 根据权利要求1所述的方法,其特征在于,所述在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型,包括:
    在所述多个第一神经网络模型中筛选出符合所述目标约束条件的至少两个第二神经网络模型;
    将所述至少两个第二神经网络模型在目标任务对应的目标数据集上进行训练,得到所述至少两个第二神经网络模型各自对应的评价参数,所述评价参数指示所述第二神经网络模型与所述目标任务的匹配程度;
    根据所述至少两个第二神经网络模型各自对应的评价参数,确定所述目标神经网络模型。
  3. 根据权利要求1或2所述的方法，其特征在于，所述根据搜索空间获取预先训练完成的多个第一神经网络模型，所述搜索空间指示多个模型规模与所述多个第一神经网络模型之间的对应关系之前，还包括：
    获取超网络的所述搜索空间,所述超网络包括共享各自的模型结构参数的多个神经网络模型,所述搜索空间还指示所述超网络中的部分神经网络模型为所述多个第一神经网络模型。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述至少两个神经网络结构包括卷积神经网络CNN结构和Transformer结构。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述根据搜索空间获取预先训练完成的多个第一神经网络模型之前,还包括:
    获取预设的多个原始神经网络模型,所述原始神经网络模型指示所述至少两个神经网络结构和所述至少两个神经网络结构之间的比例;
    根据训练样本集对所述多个原始神经网络模型进行训练,得到训练完成的所述多个第一神经网络模型,所述训练样本集中包括多个样本图像。
  6. 根据权利要求5所述的方法,其特征在于,所述根据训练样本集对所述多个原始神经网络模型进行训练,得到训练完成的所述多个第一神经网络模型,包括:
    按照预设采样周期从所述多个原始神经网络模型中确定采样模型;
    对于所述训练样本集中的至少一个样本图像,根据所述至少一个样本图像对当前确定出的所述采样模型进行训练得到对应的所述第一神经网络模型。
  7. 根据权利要求6所述的方法,其特征在于,
    所述采样模型为所述多个原始神经网络模型中模型规模最大的模型、或者为所述多个原始神经网络模型中模型规模最小的模型、或者为所述多个原始神经网络模型中随机确定的模型。
  8. 一种神经网络结构搜索装置,其特征在于,所述装置包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令时实现权利要求1-7任意一项所述的方法。
  9. 一种神经网络结构搜索装置,其特征在于,所述装置包括:
    第一获取单元,用于获取目标约束条件,所述目标约束条件指示目标实例为运行神经网络模型所限定的模型规模;
    第二获取单元,用于根据搜索空间获取预先训练完成的多个第一神经网络模型,所述搜索空间指示多个模型规模与所述多个第一神经网络模型之间的对应关系,所述多个第一神经网络模型各自的模型结构参数是不同的,所述模型结构参数指示所述第一神经网络模型中的至少两个神经网络结构之间的比例;
    筛选单元,用于在所述多个第一神经网络模型中筛选出符合所述目标约束条件的目标神经网络模型。
  10. 一种非易失性计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1-7中任意一项所述的方法。
  11. 一种计算设备集群,其特征在于,包括至少一个计算设备,每个计算设备包括处理器和存储器;
    所述至少一个计算设备的处理器用于执行所述至少一个计算设备的存储器中存储的指令,以使得所述计算设备集群执行如权利要求1-7中任意一项所述的方法。
  12. 一种包含指令的计算机程序产品,其特征在于,当所述指令被计算设备集群运行时,使得所述计算设备集群执行如权利要求1-7中任意一项所述的方法。
  13. 一种计算机可读存储介质,其特征在于,包括计算机程序指令,当所述计算机程序指令由计算设备集群执行时,所述计算设备集群执行如权利要求1-7中任意一项所述的 方法。
PCT/CN2023/081678 2022-08-25 2023-03-15 神经网络结构搜索方法、装置及存储介质 WO2024040941A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211026784.6 2022-08-25
CN202211026784.6A CN117688984A (zh) 2022-08-25 2022-08-25 神经网络结构搜索方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2024040941A1 true WO2024040941A1 (zh) 2024-02-29

Family

ID=90012289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081678 WO2024040941A1 (zh) 2022-08-25 2023-03-15 神经网络结构搜索方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN117688984A (zh)
WO (1) WO2024040941A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829083A (zh) * 2024-03-01 2024-04-05 上海励驰半导体有限公司 基于神经网络的布线方法、装置、电子设备及存储介质
CN118033732A (zh) * 2024-04-12 2024-05-14 中国石油大学(华东) 一种基于空域频域融合架构的地震数据重建方法

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382868A (zh) * 2020-02-21 2020-07-07 华为技术有限公司 神经网络结构搜索方法和神经网络结构搜索装置
CN111767990A (zh) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 神经网络的处理方法和装置
CN111767988A (zh) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 神经网络的融合方法和装置
CN112116090A (zh) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 神经网络结构搜索方法、装置、计算机设备及存储介质
US20210056420A1 (en) * 2018-05-10 2021-02-25 Panasonic Semiconductor Solutions Co., Ltd. Neural network construction device, information processing device, neural network construction method, and recording medium
CN112434462A (zh) * 2020-10-21 2021-03-02 华为技术有限公司 一种模型的获取方法及设备
CN112445823A (zh) * 2019-09-04 2021-03-05 华为技术有限公司 神经网络结构的搜索方法、图像处理方法和装置
CN112949842A (zh) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 神经网络结构搜索方法、装置、计算机设备以及存储介质
CN113505883A (zh) * 2021-05-31 2021-10-15 华为技术有限公司 一种神经网络训练方法以及装置
US20210383223A1 (en) * 2020-06-03 2021-12-09 Google Llc Joint Architecture And Hyper-Parameter Search For Machine Learning Models
CN113781518A (zh) * 2021-09-10 2021-12-10 商汤集团有限公司 神经网络结构搜索方法及装置、电子设备和存储介质
CN113869496A (zh) * 2021-09-30 2021-12-31 华为技术有限公司 一种神经网络的获取方法、数据处理方法以及相关设备
CN114330692A (zh) * 2021-12-30 2022-04-12 科大讯飞股份有限公司 神经网络模型的部署方法、装置、设备及存储介质
CN114492767A (zh) * 2022-03-28 2022-05-13 深圳比特微电子科技有限公司 用于搜索神经网络的方法、装置及存储介质
CN115034449A (zh) * 2022-05-25 2022-09-09 国网安徽省电力有限公司电力科学研究院 架空线停电预测神经网络结构搜索方法及装置

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210056420A1 (en) * 2018-05-10 2021-02-25 Panasonic Semiconductor Solutions Co., Ltd. Neural network construction device, information processing device, neural network construction method, and recording medium
CN112445823A (zh) * 2019-09-04 2021-03-05 华为技术有限公司 神经网络结构的搜索方法、图像处理方法和装置
CN111382868A (zh) * 2020-02-21 2020-07-07 华为技术有限公司 神经网络结构搜索方法和神经网络结构搜索装置
US20210383223A1 (en) * 2020-06-03 2021-12-09 Google Llc Joint Architecture And Hyper-Parameter Search For Machine Learning Models
CN111767988A (zh) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 神经网络的融合方法和装置
CN111767990A (zh) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 神经网络的处理方法和装置
CN112116090A (zh) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 神经网络结构搜索方法、装置、计算机设备及存储介质
CN112434462A (zh) * 2020-10-21 2021-03-02 华为技术有限公司 一种模型的获取方法及设备
CN112949842A (zh) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 神经网络结构搜索方法、装置、计算机设备以及存储介质
CN113505883A (zh) * 2021-05-31 2021-10-15 华为技术有限公司 一种神经网络训练方法以及装置
CN113781518A (zh) * 2021-09-10 2021-12-10 商汤集团有限公司 神经网络结构搜索方法及装置、电子设备和存储介质
CN113869496A (zh) * 2021-09-30 2021-12-31 华为技术有限公司 一种神经网络的获取方法、数据处理方法以及相关设备
CN114330692A (zh) * 2021-12-30 2022-04-12 科大讯飞股份有限公司 神经网络模型的部署方法、装置、设备及存储介质
CN114492767A (zh) * 2022-03-28 2022-05-13 深圳比特微电子科技有限公司 用于搜索神经网络的方法、装置及存储介质
CN115034449A (zh) * 2022-05-25 2022-09-09 国网安徽省电力有限公司电力科学研究院 架空线停电预测神经网络结构搜索方法及装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829083A (zh) * 2024-03-01 2024-04-05 上海励驰半导体有限公司 基于神经网络的布线方法、装置、电子设备及存储介质
CN117829083B (zh) * 2024-03-01 2024-05-28 上海励驰半导体有限公司 基于神经网络的布线方法、装置、电子设备及存储介质
CN118033732A (zh) * 2024-04-12 2024-05-14 中国石油大学(华东) 一种基于空域频域融合架构的地震数据重建方法
CN118033732B (zh) * 2024-04-12 2024-06-07 中国石油大学(华东) 一种基于空域频域融合架构的地震数据重建方法

Also Published As

Publication number Publication date
CN117688984A (zh) 2024-03-12

Similar Documents

Publication Publication Date Title
WO2022083536A1 (zh) 一种神经网络构建方法以及装置
WO2022042002A1 (zh) 一种半监督学习模型的训练方法、图像处理方法及设备
WO2021078027A1 (zh) 构建网络结构优化器的方法、装置及计算机可读存储介质
WO2021057056A1 (zh) 神经网络架构搜索方法、图像处理方法、装置和存储介质
CN110321910B (zh) 面向点云的特征提取方法、装置及设备
WO2024040941A1 (zh) 神经网络结构搜索方法、装置及存储介质
WO2020073951A1 (zh) 用于图像识别的模型的训练方法、装置、网络设备和存储介质
WO2021238366A1 (zh) 一种神经网络构建方法以及装置
WO2021147325A1 (zh) 一种物体检测方法、装置以及存储介质
WO2021218517A1 (zh) 获取神经网络模型的方法、图像处理方法及装置
CN110991311A (zh) 一种基于密集连接深度网络的目标检测方法
WO2021051987A1 (zh) 神经网络模型训练的方法和装置
WO2021218470A1 (zh) 一种神经网络优化方法以及装置
WO2022100165A1 (zh) 神经网络模型的训练方法、图像处理方法及装置
WO2021027157A1 (zh) 基于图片识别的车险理赔识别方法、装置、计算机设备及存储介质
CN113420651B (zh) 深度卷积神经网络的轻量化方法、系统及目标检测方法
WO2021042857A1 (zh) 图像分割模型的处理方法和处理装置
CN113609337A (zh) 图神经网络的预训练方法、训练方法、装置、设备及介质
CN113240683A (zh) 基于注意力机制的轻量化语义分割模型构建方法
WO2023020214A1 (zh) 检索模型的训练和检索方法、装置、设备及介质
CN113987236A (zh) 基于图卷积网络的视觉检索模型的无监督训练方法和装置
CN111079930A (zh) 数据集质量参数的确定方法、装置及电子设备
CN112418256A (zh) 分类、模型训练、信息搜索方法、系统及设备
WO2021057690A1 (zh) 构建神经网络的方法与装置、及图像处理方法与装置
CN111079900B (zh) 一种基于自适应连接神经网络的图像处理方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856038

Country of ref document: EP

Kind code of ref document: A1