CN115759197A - Neural network searching method and device and computer equipment

Info

Publication number
CN115759197A
Authority
CN
China
Prior art keywords
neural network
network model
target
initialized neural
initialized
Legal status
Pending
Application number
CN202211423554.3A
Other languages
Chinese (zh)
Inventor
尹首一 (Shouyi Yin)
韩振华 (Zhenhua Han)
韩慧明 (Huiming Han)
魏少军 (Shaojun Wei)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211423554.3A
Publication of CN115759197A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a neural network searching method and apparatus and a computer device. The method includes the following steps: determining a target number of network layers; determining a plurality of initialized neural network models according to the target number of network layers, and evaluating the expressive capability of each initialized neural network model to obtain a corresponding evaluation result; for any initialized neural network model, determining the on-chip storage information and the off-chip storage information of the initialized neural network model when the evaluation result indicates that its expressive capability satisfies a preset condition, and determining the performance data of the initialized neural network model according to a target bandwidth utilization rate when its on-chip storage information and off-chip storage information satisfy a storage condition; and determining a target neural network model according to the performance data of each initialized neural network model. With this method, a target neural network model that performs well and achieves high precision on the deployment chip can be obtained quickly.

Description

Neural network searching method and device and computer equipment
Technical Field
The present application relates to the field of computer technology, and in particular to a neural network searching method and apparatus and a computer device.
Background
The design of the network structure is key to improving a network's classification performance. Neural Architecture Search (NAS) is a means of designing network structures; in essence, it turns the manual tuning of a neural network into an automatically executed task that discovers more sophisticated neural network architectures.
NAS generally falls into two directions: first, neural network structure search based on reinforcement learning and evolutionary algorithms; second, neural network structure search based on gradient descent. For reinforcement-learning-based search, taking NASNet as an example, because the operator of each layer, the specification parameters, the connection relations, and the number of network layers are all not fixed, the dimensionality of the search space is very large, so the architecture search takes a very long time. For gradient-descent-based search, for example DARTS, the required search time is short, but hardware information cannot be searched through gradients and no connection can be established between the searched model and a hardware data model, so the searched neural network model performs poorly on the deployment hardware.
Therefore, existing neural network structure searching methods cannot simultaneously achieve a fast search and good performance and precision of the searched neural network model on the deployment chip.
Disclosure of Invention
In view of the above, it is necessary to provide a neural network searching method, apparatus, computer device, computer-readable storage medium, and computer program product that account for both the precision and the performance of the searched neural network model on the deployment chip.
In a first aspect, the present application provides a neural network searching method, including:
determining a target number of network layers;
in the i-th iterative search process, determining a plurality of initialized neural network models according to the target number of network layers, and evaluating the expressive capability of each initialized neural network model to obtain an evaluation result corresponding to each initialized neural network model, where i is an integer greater than 0;
for any initialized neural network model, determining on-chip storage information and off-chip storage information of the initialized neural network model when the evaluation result indicates that the expressive capability of the initialized neural network model satisfies a preset condition;
for any initialized neural network model, determining performance data of the initialized neural network model according to a target bandwidth utilization rate when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy a storage condition; and
determining a target neural network model according to the performance data of each initialized neural network model.
In one embodiment, the method further comprises:
when the evaluation results corresponding to all of the initialized neural network models indicate that their expressive capability does not satisfy the preset condition, entering the next iterative search process and repeating the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model whose evaluation result indicates that its expressive capability satisfies the preset condition is obtained, or until the number of iterations reaches a target number;
and, after the target number of network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In one embodiment, determining the on-chip storage information and the off-chip storage information of the initialized neural network model includes:
determining a chip computation graph according to the original network layers of the initialized neural network model;
establishing a layer mapping relation between the original network layers and the chip computation graph;
acquiring the on-chip storage information of each layer of the chip computation graph, and acquiring the off-chip storage information of the initialized neural network model; and
determining the on-chip storage information of the initialized neural network model according to the layer mapping relation and the on-chip storage information of each layer of the computation graph.
In one embodiment, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate, when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy the storage condition, includes:
acquiring target storage parameters, where the target storage parameters include a target off-chip storage parameter and a target on-chip storage parameter;
comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target on-chip storage parameter and the target off-chip storage parameter to obtain a comparison result for the initialized neural network model, where the comparison result is either a first comparison result, indicating that the on-chip storage information and the off-chip storage information satisfy the storage condition, or a second comparison result, indicating that they do not; and
when the first comparison result is obtained, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate.
In one embodiment, comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target on-chip storage parameter and the target off-chip storage parameter to obtain the comparison result includes:
obtaining the first comparison result when the on-chip storage information is less than or equal to the target on-chip storage parameter and the off-chip storage information is less than or equal to the target off-chip storage parameter; or
obtaining the second comparison result when the on-chip storage information is greater than the target on-chip storage parameter and/or the off-chip storage information is greater than the target off-chip storage parameter.
In one embodiment, the method further comprises:
when the comparison results corresponding to all of the initialized neural network models are the second comparison result, entering the next iterative search process and, taking those comparison results as constraint conditions, repeating the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model corresponding to the first comparison result is obtained, or until the number of iterations reaches the target number;
and, after the target number of network layers and/or the target storage parameters are adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In one embodiment, the performance data includes an inference time, and determining the performance data of the initialized neural network model according to the target bandwidth utilization rate includes:
determining a target bandwidth utilization rate; and
determining the inference time of the initialized neural network model according to the target bandwidth utilization rate and the target storage parameters, where the inference time represents the time taken by the initialized neural network model to complete a target scene task.
In one embodiment, determining the target neural network model according to the performance data of each initialized neural network model includes:
taking an initialized neural network model as the target neural network model when its performance data meets a target performance condition; or,
when the performance data of none of the initialized neural network models meets the target performance condition, entering the next iterative search process and repeating the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until the target neural network model is obtained, or until the number of iterations reaches the target number;
and, after the target number of network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In a second aspect, the present application also provides a neural network searching apparatus, including:
a network layer number determining module, configured to determine a target number of network layers;
an expressive capability evaluation module, configured to determine, in the i-th iterative search process, a plurality of initialized neural network models according to the target number of network layers, and to evaluate the expressive capability of each initialized neural network model to obtain an evaluation result corresponding to each initialized neural network model, where i is an integer greater than 0;
a storage information determining module, configured to determine on-chip storage information and off-chip storage information of the initialized neural network model when the evaluation result indicates that the expressive capability of the initialized neural network model satisfies a preset condition;
a performance data determining module, configured to determine, for any initialized neural network model, performance data of the initialized neural network model according to a target bandwidth utilization rate when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy a storage condition; and
a target network determining module, configured to determine a target neural network model according to the performance data of each initialized neural network model.
In an embodiment, the neural network searching apparatus further includes an iteration device. The iteration device is configured to: when the evaluation results corresponding to all of the initialized neural network models indicate that their expressive capability does not satisfy the preset condition, enter the next iterative search process and repeat the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model whose evaluation result indicates that its expressive capability satisfies the preset condition is obtained, or until the number of iterations reaches the target number; and, after the target number of network layers is adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In an embodiment, the storage information determining module is further configured to: determine a chip computation graph according to the original network layers of the initialized neural network model; establish a layer mapping relation between the original network layers and the chip computation graph; acquire the on-chip storage information of each layer of the chip computation graph and the off-chip storage information of the initialized neural network model; and determine the on-chip storage information of the initialized neural network model according to the layer mapping relation and the on-chip storage information of each layer of the computation graph.
In one embodiment, the performance data determining module is further configured to: acquire target storage parameters, where the target storage parameters include a target off-chip storage parameter and a target on-chip storage parameter; compare the on-chip storage information and the off-chip storage information of the initialized neural network model with the target on-chip storage parameter and the target off-chip storage parameter to obtain a comparison result for the initialized neural network model, where the comparison result is either a first comparison result, indicating that the on-chip storage information and the off-chip storage information satisfy the storage condition, or a second comparison result, indicating that they do not; and, when the first comparison result is obtained, determine the performance data of the initialized neural network model according to the target bandwidth utilization rate.
In one embodiment, the performance data determining module is further configured to obtain the first comparison result when the on-chip storage information is less than or equal to the target on-chip storage parameter and the off-chip storage information is less than or equal to the target off-chip storage parameter; or to obtain the second comparison result when the on-chip storage information is greater than the target on-chip storage parameter and/or the off-chip storage information is greater than the target off-chip storage parameter.
In an embodiment, the iteration device is further configured to: when the comparison results corresponding to all of the initialized neural network models are the second comparison result, enter the next iterative search process and, taking those comparison results as constraint conditions, repeat the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model corresponding to the first comparison result is obtained, or until the number of iterations reaches the target number; and, after the target number of network layers and/or the target storage parameters are adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In one embodiment, the performance data includes an inference time, and the performance data determining module is further configured to determine a target bandwidth utilization rate, and to determine the inference time of the initialized neural network model according to the target bandwidth utilization rate and the target storage parameters, where the inference time represents the time taken by the initialized neural network model to complete a target scene task.
In one embodiment, the target network determining module is further configured to: take an initialized neural network model as the target neural network model when its performance data meets a target performance condition; or, when the performance data of none of the initialized neural network models meets the target performance condition, enter the next iterative search process and repeat the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until the target neural network model is obtained, or until the number of iterations reaches the target number; and, after the target number of network layers is adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
In a third aspect, the present application further provides a computer device including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the steps in the foregoing method embodiments.
In a fourth aspect, the present application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the above-described method embodiments.
In a fifth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, performs the steps of the above-described method embodiments.
The neural network searching method and apparatus, the computer device, the computer-readable storage medium, and the computer program product described above determine a target number of network layers; in the i-th iterative search process, determine a plurality of initialized neural network models according to the target number of network layers and evaluate the expressive capability of each initialized neural network model to obtain the corresponding evaluation results; for any initialized neural network model, determine its on-chip storage information and off-chip storage information when the evaluation result indicates that its expressive capability satisfies a preset condition, and determine its performance data according to a target bandwidth utilization rate when its on-chip storage information and off-chip storage information satisfy a storage condition; and determine a target neural network model according to the performance data of each initialized neural network model. Because the search starts from a manually set target number of network layers, the search speed is improved; and because the target neural network model is searched by jointly considering the expressive capability, the on-chip storage information, the off-chip storage information, and the target bandwidth utilization rate, a target neural network model with good performance and high precision on the deployment chip can be obtained quickly.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a neural network search method in one embodiment;
FIG. 2 is a schematic flow chart of step 106 in one embodiment;
FIG. 3 is a schematic flow chart of step 108 in one embodiment;
FIG. 4 is a schematic flow chart of step 108 in one embodiment;
FIG. 5 is a schematic flow chart of a neural network searching method in one embodiment;
FIG. 6 is a block diagram showing the structure of a neural network searching apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Neural network model search is currently a widely used model design means, mainly divided into two directions: first, neural network model search based on reinforcement learning and evolutionary algorithms; second, neural network model search based on gradient descent. Taking the differentiable neural network model as an example of the gradient-descent direction, such a search cannot feed chip-level information back to the model design level.
For reinforcement-learning-based architecture search, because the operator of each layer, the specification parameters, the connection relations, and the number of network layers are all not fixed, the dimensionality of the search space is large and the architecture search takes a very long time: NASNet, the reinforcement-learning-based search first proposed by Google, required more than 1000 GPU (graphics processing unit) days, i.e., more than 1000 days of running on a single GPU. For gradient-descent-based search, taking DARTS (Differentiable Architecture Search) as an example, the required search time is not long, but hardware information cannot be searched through gradients and no connection can be established between the searched model and a hardware data model; that is, although the neural network model searched out by DARTS is small, its actual running time on hardware is long.
Research on model expressiveness, another approach to searching neural network models, is currently becoming an important direction. A model-expressiveness method takes data conforming to a Gaussian distribution as input and outputs a judgment of the model's expressive power, so that optimal network structure characteristics can be found quickly, with the search performed in a discrete space. However, this method lacks constraints tying the searched network to the target task, does not search against a targeted data set, yields networks with poor task specificity, and lacks hardware-information constraints.
Because the storage footprint and bandwidth utilization occupied by neural network models of different sizes can be evaluated accurately while the models are compiled and deployed on hardware, and because the width of a neural network model strongly affects its precision, a searching method that balances the parameter-count and precision changes brought by the depth and width of the model, maximizing precision while reducing the parameter count and balancing hardware storage against bandwidth utilization, has great practical value: it yields models with a small storage footprint and high bandwidth utilization.
Based on this, the embodiments of the present application provide a neural network searching method to solve the above problems, achieving a fast search of the neural network structure and obtaining a neural network model with good performance and high precision when deployed on a chip.
In one embodiment, as shown in FIG. 1, a neural network searching method is provided. This embodiment is illustrated by applying the method to a server; it should be understood that the method may also be applied to a terminal, or to a system including the terminal and the server, where it is implemented through interaction between the terminal and the server. The neural network searching process can be completed on a large GPU server. In this embodiment, the method includes the following steps:
Step 102, determining a target number of network layers.
The target number of network layers is the number of network layers of the target neural network model to be searched. The initial target number of network layers can be set manually according to the application scenario; the embodiments of the present application do not specifically limit it. For example, the application scenario may be face recognition, object classification, object detection, image recognition, and the like.
Step 104, in the i-th iterative search process, determining a plurality of initialized neural network models according to the target number of network layers, and evaluating the expressive capability of each initialized neural network model to obtain an evaluation result corresponding to each initialized neural network model, where i is an integer greater than 0.
In the embodiments of the present application, i is the iteration count, a positive integer starting from 1. First, a search space may be selected; the embodiments of the present application do not specifically limit the search space. An initialized neural network model is a neural network model searched out of the search space whose number of network layers matches the target number of network layers. For example, the expressive capability may include the precision of the neural network model. Taking a classification task as the application scenario, on the open data set CIFAR-10, the accuracy of the classification results of the initialized neural network model is obtained from the data in CIFAR-10, i.e., the precision of the initialized neural network model, and the evaluation result may be used to indicate whether this precision reaches a preset target precision.
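A minimal Python sketch of step 104 under stated assumptions: the search space OPS, the model encoding, and the output-diversity proxy used as the expressiveness score are illustrative stand-ins, not the patent's actual search space or scoring rule.

```python
import random
import numpy as np

OPS = ["conv3x3", "conv5x5", "depthwise", "maxpool"]  # hypothetical search space

def sample_model(num_layers):
    """Draw one initialized architecture with the target number of network layers."""
    return [random.choice(OPS) for _ in range(num_layers)]

def expressiveness_score(model, trials=8):
    """Toy expressiveness proxy: feed Gaussian-distributed inputs through
    randomly initialized layers and measure the spread of the outputs."""
    outputs = []
    for _ in range(trials):
        x = np.random.randn(32)                # input drawn from a Gaussian
        for _op in model:                      # stand-in for running each layer
            w = np.random.randn(32, 32) * 0.1  # Gaussian-initialized weights
            x = np.maximum(w @ x, 0.0)         # linear map followed by ReLU
        outputs.append(x)
    return float(np.std(np.stack(outputs)))    # larger spread ~ richer expression

# i-th iterative search: sample a population and score each candidate
candidates = [sample_model(num_layers=4) for _ in range(16)]
scored = [(m, expressiveness_score(m)) for m in candidates]
```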
Step 106, for any initialized neural network model, determining the on-chip storage information and the off-chip storage information of the initialized neural network model when the evaluation result indicates that the expressive capability of the initialized neural network model satisfies the preset condition.
In the embodiments of the present application, taking the case where the expressive capability includes the precision of the neural network model as an example, the preset condition may be a preset target precision: if the precision of the initialized neural network model is equal to or greater than the preset target precision, an evaluation result indicating that its expressive capability satisfies the preset condition is obtained. A model whose expressive capability satisfies the preset condition can, after training, meet the precision requirement placed on the neural network model. The on-chip storage information may include the amount of on-chip storage occupied by the initialized neural network model after it is deployed on the chip, and the off-chip storage information may include the amount of off-chip storage it occupies after deployment. The initialized neural network model can be quantized and compiled to calculate its on-chip storage information and off-chip storage information.
Step 108, for any initialized neural network model, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy the storage condition.
In the embodiments of the present application, the storage condition may be given by manually set target storage parameters. The target storage parameters may include a target off-chip storage parameter and a target on-chip storage parameter: the target off-chip storage parameter is the off-chip storage space parameter, i.e., the maximum amount of off-chip storage the target neural network model may occupy, and the target on-chip storage parameter is the on-chip storage space parameter, i.e., the maximum amount of on-chip storage it may occupy. The target bandwidth utilization rate is the maximum bandwidth the target neural network model to be searched may occupy when deployed on hardware; it can be set manually according to the application scenario.
When the on-chip storage information of the initialized neural network model does not exceed the target on-chip storage parameter and its off-chip storage information does not exceed the target off-chip storage parameter, parameters such as the target bandwidth utilization rate can be set on a simulator to run a performance simulation and obtain the performance data.
Step 110, determining a target neural network model according to the performance data of each initialized neural network model.
In the embodiments of the present application, if the performance data of at least one of the initialized neural network models meets the preset performance condition, that initialized neural network model is taken as the target neural network model. The performance data may be the chip inference time, i.e., the time the neural network model needs when running on the chip to complete a scene task, and the preset performance condition may be a manually set target chip inference time. If the performance data of none of the initialized neural network models meets the preset performance condition, the performance data is fed back to the model search system for a new search. The model search system is the algorithm that carries this neural network searching method.
The neural network searching method provided in the embodiments of the present application determines a target number of network layers; in the i-th iterative search process, determines a plurality of initialized neural network models according to the target number of network layers and evaluates the expressive capability of each initialized neural network model to obtain the corresponding evaluation results; for any initialized neural network model, determines its on-chip storage information and off-chip storage information when the evaluation result indicates that its expressive capability satisfies the preset condition, and determines its performance data according to the target bandwidth utilization rate when its on-chip storage information and off-chip storage information satisfy the storage condition; and determines a target neural network model according to the performance data of each initialized neural network model. Searching from a manually set target number of network layers improves the search speed, and jointly constraining the search with the expressive capability, the on-chip storage information, the off-chip storage information, and the target bandwidth utilization rate quickly yields a target neural network model with good performance and high precision on the deployment chip.
In one embodiment, the neural network searching method further comprises:
when the evaluation results corresponding to all of the initialized neural network models indicate that their expressive capability does not satisfy the preset condition, entering the next iterative search process and repeating the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model whose evaluation result indicates that its expressive capability satisfies the preset condition is obtained, or until the number of iterations reaches the target number;
and, after the target number of network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
The target number is the target number of search iterations and can be set according to actual requirements.
As an example, let the target number of network layers be 4 and the target number of iterations be 10. If, in the 1st iterative search, the evaluation results of all of the initialized neural network models determined according to the target number of network layers indicate that their expressive capability does not satisfy the preset condition, the 2nd iterative search is entered: a plurality of initialized neural network models are again determined according to the target number of network layers and their expressive capability is evaluated. If the evaluation results obtained in the 2nd iterative search still all indicate that the expressive capability does not satisfy the preset condition, the 3rd iterative search is entered, and so on, until an initialized neural network model whose expressive capability satisfies the preset condition is obtained. If no such model has been obtained when the 10th iterative search completes, the target number of network layers can be increased to 5, and the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating their expressive capability to obtain the corresponding evaluation results are repeated.
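The iterate-then-adjust control flow of this example can be sketched as follows. This is a sketch only: it reuses the hypothetical sample_model and expressiveness_score helpers from the earlier sketch, and the threshold is an assumed stand-in for the preset condition.

```python
def search_with_retries(num_layers, target_iters=10, population=16, threshold=1.0):
    """Repeat the sample-and-evaluate step up to the target number of iterations."""
    for i in range(1, target_iters + 1):              # i-th iterative search
        candidates = [sample_model(num_layers) for _ in range(population)]
        passing = [m for m in candidates if expressiveness_score(m) >= threshold]
        if passing:                                   # preset condition satisfied
            return passing, i
    return None, target_iters                         # target number of iterations reached

models, rounds = search_with_retries(num_layers=4)
if models is None:
    # no qualifying model after 10 rounds: adjust the layer count (4 -> 5) and retry
    models, rounds = search_with_retries(num_layers=5)
```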
According to this embodiment of the disclosure, an initialized neural network model whose expressive capability satisfies the preset condition can be searched out within a limited number of attempts for a given target number of network layers, which speeds up the search; using the target number of network layers and the expressive capability as simultaneous constraints also improves the precision and performance of the finally obtained target neural network model.
In one embodiment, as shown in FIG. 2, determining the on-chip storage information and the off-chip storage information of the initialized neural network model in step 106 may include:
Step 202, determining a chip computation graph according to the original network layers of the initialized neural network model.
The original network layers may be obtained by quantizing the initialized neural network model and may include the network structure of each layer, where the number of layers equals the target number of network layers. The network structure of each layer in the original network layers is marked and numbered. Performing operator fusion and computation-graph optimization on the original network layers of the initialized neural network model in a compiler yields the chip computation graph. The operator fusion disclosed here mainly aims at efficient fusion across different computation networks to improve algorithm execution efficiency, and the computation-graph optimization is the process of adjusting the computation for different images and different network computations. The chip computation graph is obtained by compiling the operations abstracted from the original network layers of the initialized neural network model into operation processes conforming to the chip's instruction language. The chip computation graph may include multiple layers of computation graphs, each of which may include an operation process, and the operation processes of the computation-graph layers correspond one-to-one with the layers of the network structure in the original network layers.
For example, the weight parameters of an initialized neural network model whose expressive capability satisfies the preset condition may be initialized with random data following a Gaussian distribution and then quantized against random-number input, giving the quantized original network layers. Quantization may use uniform 8-bit or 16-bit quantization, or mixed-precision quantization. Operator fusion can, for example, combine a convolution with a ReLU function (Rectified Linear Unit, a common activation function in artificial neural networks), fusing the two operators into a single computation-graph layer.
Step 204, establishing a layer mapping relation between the original network layers and the chip computation graph.
The operation process of each layer of the chip computation graph corresponds to a layer of the network structure in the original network layers. During operator fusion and computation-graph optimization of the original network layers of the initialized neural network model, each layer of the chip computation graph can be checked and numbered against the numbers of the network-structure layers in the original network layers. The correspondence between the numbers of the computation-graph layers in the chip computation graph and the numbers of the network-structure layers in the original network layers is the layer mapping relation. As an example, the optimized chip computation graph is compared with the quantized original network layers through their marked numbers, establishing the layer mapping relation between the original network layers and the chip computation graph.
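A minimal sketch of steps 202 and 204 under an assumed fusion rule (a convolution followed by a ReLU fuses into one computation-graph layer); the list-and-dict representation is illustrative, not the patent's compiler data structures.

```python
original_layers = ["conv", "relu", "conv", "relu"]  # numbered 0..3 after quantization

chip_graph = []       # fused chip computation graph
layer_mapping = {}    # original layer number -> computation-graph layer number
i = 0
while i < len(original_layers):
    if (original_layers[i] == "conv"
            and i + 1 < len(original_layers)
            and original_layers[i + 1] == "relu"):
        node = len(chip_graph)
        chip_graph.append("conv+relu")  # operator fusion of conv and ReLU
        layer_mapping[i] = node
        layer_mapping[i + 1] = node
        i += 2
    else:
        node = len(chip_graph)
        chip_graph.append(original_layers[i])
        layer_mapping[i] = node
        i += 1

# chip_graph == ["conv+relu", "conv+relu"]
# layer_mapping == {0: 0, 1: 0, 2: 1, 3: 1}
```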
Step 206, acquiring the on-chip storage information of each layer of the chip computation graph, and acquiring the off-chip storage information of the initialized neural network model.
The compiler compiles each layer of the chip computation graph to obtain an instruction set corresponding to each computation-graph layer and performs memory allocation for each instruction set, so that the instruction set of each computation-graph layer can run in simulation within its allocated storage space; the memory allocated to a computation-graph layer's instruction set during this run is that layer's on-chip storage information. The off-chip storage information of the initialized neural network model can be obtained directly as the off-chip storage occupied when the initialized neural network model is deployed on the chip. For example, while memory space is allocated to compile the chip computation graph, the on-chip storage information of each computation-graph layer can be checked and recorded; this per-layer on-chip storage information includes the occupied size of the on-chip SRAM (Static Random-Access Memory).
Step 208, determining the on-chip storage information of the initialized neural network model according to the layer mapping relation and the on-chip storage information of each layer of the computation graph.
After the on-chip storage information of each computation-graph layer is determined, the on-chip storage information of each corresponding network-structure layer in the original network layers can be obtained through the layer mapping relation. The on-chip storage information of the initialized neural network model may include the per-layer on-chip storage information of each network-structure layer in the original network layers, or the sum of the per-layer on-chip storage information of all layers.
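Continuing the sketch above (steps 206 and 208), per-layer SRAM allocations recorded at compile time are mapped back through layer_mapping and summed; all figures are invented for illustration.

```python
# on-chip SRAM allocated to each computation-graph layer's instruction set (bytes)
sram_per_graph_layer = {0: 96_000, 1: 64_000}        # hypothetical compile-time record

# per-layer on-chip storage of each original network layer, via the layer mapping
per_original_layer = {orig: sram_per_graph_layer[node]
                      for orig, node in layer_mapping.items()}

on_chip_total = sum(sram_per_graph_layer.values())   # 160_000 bytes in this toy case
off_chip_total = 2_500_000                           # assumed off-chip footprint (bytes)
```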
In this embodiment of the disclosure, the off-chip storage information of the initialized neural network model is determined, and its on-chip storage information is determined through quantized compilation, so that the width (storage) index requirement of the neural network model can be added to the neural network searching process, enabling the target neural network model to make full use of the chip's on-chip and off-chip storage.
In one embodiment, as shown in FIG. 3, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate in step 108, when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy the storage condition, may include:
Step 302, acquiring target storage parameters, where the target storage parameters include a target off-chip storage parameter and a target on-chip storage parameter.
The target off-chip storage parameter is the off-chip storage space parameter, i.e., the maximum amount of off-chip storage the target neural network model may occupy, and the target on-chip storage parameter is the on-chip storage space parameter, i.e., the maximum amount of on-chip storage it may occupy. The target storage parameters may be set according to actual requirements.
Step 304, comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target on-chip storage parameter and the target off-chip storage parameter to obtain a comparison result for the initialized neural network model, where the comparison result is either a first comparison result, indicating that the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy the storage condition, or a second comparison result, indicating that they do not.
The off-chip storage information of the initialized neural network model can be compared with the target off-chip storage parameter, and the resulting comparison may include the margin or overflow of the off-chip storage. For the on-chip side, either the per-layer on-chip storage information of each network-structure layer or the sum of the per-layer on-chip storage information of all layers can be compared with the target on-chip storage parameter, and the resulting comparison may include the margin or overflow of the on-chip storage. The storage condition is that the off-chip storage information of the initialized neural network model is less than or equal to the target off-chip storage parameter and its on-chip storage information is less than or equal to the target on-chip storage parameter.
Step 306, when the first comparison result is obtained, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate.
The target bandwidth utilization rate can be set according to the application scenario. When the first comparison result is obtained, i.e., the initialized neural network model satisfies the storage condition, the target bandwidth utilization rate and the target storage parameters can be set in the simulator and a performance simulation of the initialized neural network model run to obtain the performance data. The performance data may include the chip inference time, i.e., the time the initialized neural network model takes to complete an application-scenario task, such as a classification task, on the chip.
According to this embodiment of the disclosure, the width (storage) index requirement of the neural network model is added to the neural network searching process, so that during the search the performance data is calculated only when the expressiveness, on-chip storage, off-chip storage, and similar index requirements are met; this speeds up the search of the neural network model and lets the searched target neural network model combine performance with precision.
In one embodiment, comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target on-chip storage parameter and the target off-chip storage parameter in step 304 to obtain the comparison result may include:
obtaining the first comparison result when the on-chip storage information is less than or equal to the target on-chip storage parameter and the off-chip storage information is less than or equal to the target off-chip storage parameter; or obtaining the second comparison result when the on-chip storage information is greater than the target on-chip storage parameter and/or the off-chip storage information is greater than the target off-chip storage parameter.
Here, the off-chip storage information of the initialized neural network model may be compared with the target off-chip storage parameter, and either the per-layer on-chip storage information of each network-structure layer or the sum of the per-layer on-chip storage information of all layers may be compared with the target on-chip storage parameter. If either the on-chip storage information or the off-chip storage information exceeds the corresponding target storage parameter, the second comparison result is obtained, the subsequent performance inference is terminated, and the storage information is fed back to the model search system algorithm for a new search.
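The comparison itself reduces to a simple predicate; a sketch follows, with the target parameters as assumed example values and the storage totals taken from the earlier sketch.

```python
def storage_check(on_chip, off_chip, target_on_chip, target_off_chip):
    """True = first comparison result (storage condition met);
    False = second comparison result (terminate performance inference)."""
    return on_chip <= target_on_chip and off_chip <= target_off_chip

fits = storage_check(on_chip_total, off_chip_total,
                     target_on_chip=512_000,        # hypothetical on-chip SRAM budget
                     target_off_chip=8_000_000)     # hypothetical off-chip budget
```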
In one embodiment, the neural network searching method may further include:
when the comparison results corresponding to all of the initialized neural network models are the second comparison result, entering the next iterative search process and, taking those comparison results as constraint conditions, repeating the steps of determining a plurality of initialized neural network models according to the target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results, until an initialized neural network model corresponding to the first comparison result is obtained, or until the number of iterations reaches the target number;
and, after the target number of network layers and/or the target storage parameters are adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating the expressive capability of each initialized neural network model to obtain the corresponding evaluation results.
As an example, let the target number of network layers be 4 and the target number of iterations be 10. If, in the 1st iterative search, the comparison results of all of the initialized neural network models whose expressive capability satisfies the preset condition are the second comparison result, the 2nd iterative search is entered with each second comparison result used as a constraint condition, and a plurality of initialized neural network models are again determined according to the target number of network layers. For example, if a second comparison result indicates that the on-chip storage information of the 4th layer exceeds the target storage parameter, the storage parameter amount at that layer position can be limited in the 2nd iterative search; if the total off-chip storage information exceeds the target storage parameter, the parameter amounts can be limited layer by layer, in decreasing order of per-layer storage, during the re-search. The expressive capability of the newly searched initialized neural network models is then evaluated and the comparison results of those whose expressive capability satisfies the preset condition are determined; if the comparison results obtained in the 2nd search are still all the second comparison result, the 3rd iterative search is entered, and so on, until an initialized neural network model corresponding to the first comparison result is obtained. If no such model has been obtained when the 10th iterative search completes, the target number of network layers can be increased to 5 and/or the target storage parameters increased, and the steps of determining a plurality of initialized neural network models according to the adjusted target number of network layers and evaluating their expressive capability to obtain the corresponding evaluation results are repeated.
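A sketch of feeding a second comparison result back as a constraint on the next round; the constraint format and all figures are assumptions, and it reuses OPS and the random module from the earlier sketch.

```python
# hypothetical constraint derived from a second comparison result:
# cap the parameter budget of the layer position that overflowed on-chip storage
constraints = {"layer_param_cap": {3: 32_000}}   # bytes; assume layer index 3 overflowed

def sample_with_constraints(num_layers, constraints):
    """Re-sample a model while enforcing per-layer parameter caps."""
    caps = constraints.get("layer_param_cap", {})
    model = []
    for idx in range(num_layers):
        layer = {"op": random.choice(OPS),
                 "params": random.randint(8_000, 64_000)}  # hypothetical budget range
        if idx in caps:
            layer["params"] = min(layer["params"], caps[idx])  # enforce the cap
        model.append(layer)
    return model
```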
According to this embodiment of the disclosure, an initialized neural network model whose expressive capability and on-chip and off-chip storage satisfy the conditions can be searched out within a limited number of attempts for a given target number of network layers, which speeds up the search and improves the precision and performance of the finally obtained target neural network model.
In one embodiment, the performance data includes an inference time. As shown in fig. 4, determining the performance data of the initialized neural network model according to the target bandwidth utilization in step 108 may include:
step 402, determining a target bandwidth utilization.
The target bandwidth utilization is the maximum bandwidth that the target neural network model to be searched may occupy when the model is deployed on hardware; it can be set according to the application scenario.
Step 404, determining the inference time of the initialized neural network model according to the target bandwidth utilization and the target storage parameters, wherein the inference time represents the time the initialized neural network model takes to complete a target scene task.
The target scene task is the operation that the neural network model needs to complete in a given application scenario, such as face recognition or classification.
For example, the bandwidth utilization may be set in the simulator as a percentage according to the application scenario, such as a target bandwidth utilization of 40%. Performance simulation is then run on each initialized neural network model that meets the storage requirement to obtain its inference time. In a classification scenario, if an initialized neural network model completes classification in 50 s, its inference time is 50 s.
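As a rough illustration of how a simulator might derive such a time, the sketch below assumes that inference time is dominated by the data traffic crossing the memory interface divided by the effective (utilization-capped) bandwidth; the formula and all parameter names are assumptions, since the patent leaves the simulator internals unspecified.

```python
def estimate_inference_time(traffic_bytes, peak_bw_bytes_per_s, utilization):
    """Time to move a model's weight/activation traffic at a capped bandwidth."""
    effective_bw = peak_bw_bytes_per_s * utilization
    return traffic_bytes / effective_bw

# E.g. 4 GB of traffic over a 25.6 GB/s interface at the 40% target utilization:
print(f"{estimate_inference_time(4e9, 25.6e9, 0.40):.2f} s")  # ~0.39 s
```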
According to this embodiment of the disclosure, performance index requirements are added to the neural network search process, so that a target neural network model is accepted only when it satisfies the expression capability, on-chip storage, off-chip storage, and performance requirements. This accelerates the search for the neural network model while balancing the performance and precision of the resulting target neural network model.
In one embodiment, in step 110, determining the target neural network model according to the performance data of each initialized neural network model may include:
taking the initialized neural network model as the target neural network model when its performance data meets the target performance condition; or, when the performance data of none of the initialized neural network models meets the target performance condition, entering a next iterative search process and repeating the steps of determining a plurality of initialized neural network models according to the number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results, until the target neural network model is obtained or until the number of iterations reaches the target number;
or, after the number of target network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results.
Illustratively, the target performance condition may be a preset target inference time, for example 10 s: an initialized neural network model whose inference time is less than or equal to the target inference time meets the target performance condition. If the inference time of an initialized neural network model obtained in the simulator is 8 s, less than the 10 s target, that model simultaneously meets the expression capability, storage, and performance requirements and is taken as the target neural network model. When the performance data of the initialized neural network models do not meet the target performance condition, the iterative search is repeated. If, after the target number of searches (that is, once reinforcement learning or the evolutionary algorithm has converged to its best result), no ideal neural network model has been found, the number of network layers can be increased and the above steps searched again.
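A minimal sketch of this accept-or-re-search decision follows, assuming the simulator reports inference times as plain floats in seconds; the names are hypothetical.

```python
def select_target_model(candidates, target_inference_time=10.0):
    """Return the first model meeting the target time, or None to re-search."""
    for model, inference_time in candidates:
        if inference_time <= target_inference_time:
            return model   # meets expression capability, storage and performance
    return None            # triggers the next iterative search

print(select_target_model([("net_a", 12.5), ("net_b", 8.0)]))  # -> net_b
```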
According to this embodiment of the disclosure, performance index requirements are added to the neural network search process, so that a target neural network model is obtained only when the expressivity, on-chip storage, off-chip storage, and performance index requirements are all met, accelerating the search while balancing the performance and precision of the searched target neural network model.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and whose execution order need not be sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
To facilitate a further understanding of the embodiments of the present application, reference is made to fig. 5, which illustrates a complete embodiment. The neural network searching method provided by this embodiment makes full use of the storage and bandwidth of the deployment chip to guide the neural network search, accelerating the convergence of the model search (the speed at which a neural network is found) while safeguarding the chip inference time (the inference time of the neural network on the chip) and the model precision. The method proceeds as follows. First, different bandwidth utilizations, off-chip space parameters, and on-chip space parameters are set manually according to the application scenario; a search space and a target data set are selected for the network; and an initial number of search layers is set manually, for example 20 layers. Image and voice data flows occupy bandwidth differently: the bandwidth utilization computed for the neural network model is 70-80% in some scenarios and 20-30% in others. Next, whether the expression capability of the initially searched network structure meets the standard is judged. If it does, the precision requirement of the neural network model can be met through training, and network layer marking and compilation optimization follow; if not, the information is fed back to the search system so that the network structure parameters (width, number of layers, and the like) are increased and the search is performed again.
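A hedged sketch of this first step is given below: a scenario-dependent configuration plus an expressivity gate. The dataclass fields, the score threshold, and the gate function are illustrative assumptions; the patent does not prescribe a particular expressivity metric.

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    bandwidth_utilization: float     # e.g. 0.2-0.3 or 0.7-0.8 by scenario
    off_chip_bytes: int              # preset off-chip space parameter
    on_chip_bytes: int               # preset on-chip space parameter
    init_num_layers: int = 20        # manually set initial search depth

def expressivity_ok(score: float, threshold: float = 0.5) -> bool:
    """Gate: proceed to quantization and compilation only if the score passes;
    otherwise feed back so width/depth are increased and the search reruns."""
    return score >= threshold

cfg = SearchConfig(bandwidth_utilization=0.75,
                   off_chip_bytes=64 << 20, on_chip_bytes=2 << 20)
print(cfg, expressivity_ok(0.62))
```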
Secondly, according to the network structure meeting the requirements acquired in the first step, the weight parameters of the network are initialized with Gaussian-distributed random data, and the inputs are quantized from random numbers. The quantization may be uniform 8-bit or 16-bit quantization, or mixed-precision quantization, which yields the original quantized model layers. The compiler then optimizes and compiles. The optimization mainly includes operator fusion (for example, convolution + relu can be fused into one layer) and computation-graph optimization; meanwhile, the model layer marks recorded during optimization are compared against the original quantized layers, so that each layer of the optimized computation graph can still be placed in correspondence with a layer of the original model. In this way, a layer mapping relation is established between the computation graph produced by operator fusion and graph optimization and the original network graph. Memory allocation then proceeds: memory is allocated to each layer of the computation graph, an instruction set is generated, and compilation completes. At the same time, the on-chip storage redundancy of each layer of the current computation graph is counted against the preset on-chip space parameters; if on-chip storage is insufficient, the shortfall of the affected layer is fed back. After compilation completes, the parameter storage information of each layer of the original network graph is calculated through the layer mapping relation from all of the on-chip storage information (redundancy or shortfall), and whether the total off-chip storage stays within a reasonable size is judged against the preset off-chip space parameters. Whenever either on-chip or off-chip data cannot be stored, the next performance-inference step is stopped, the storage information is fed back to the model search system, and the search is performed again until the expression, on-chip storage, and off-chip storage requirements are all met; only then does the method proceed. In summary, the compiled computation graph is allocated space, the storage information of each layer (SRAM redundancy and storage overflow) is checked and recorded, and whether the compiled file size exceeds the reasonable off-chip storage range is fed back. If either on-chip or off-chip storage exceeds its range, the network search system is notified to search again, readjusting according to the per-layer storage information: for example, if the parameter storage of the 5th layer of the current search result exceeds the standard, the storage parameters of the 5th layer are limited in the next search; if the total off-chip storage exceeds the standard, the parameter sizes are limited progressively, in descending order of per-layer storage, before re-searching.
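To make the layer-mapping bookkeeping concrete, the sketch below fuses conv + relu pairs into single compute-graph layers, records which original layers each fused layer covers, and reports per-layer on-chip redundancy or shortfall. The data representation and fusion rule are illustrative assumptions, not the compiler's actual implementation.

```python
def fuse_layers(original_layers):
    """Fuse each 'conv' followed by 'relu' into one graph layer; record the
    mapping from graph-layer index back to original-layer indices."""
    graph, mapping, i = [], {}, 0
    while i < len(original_layers):
        if (original_layers[i] == "conv" and i + 1 < len(original_layers)
                and original_layers[i + 1] == "relu"):
            mapping[len(graph)] = [i, i + 1]
            graph.append("conv+relu")
            i += 2
        else:
            mapping[len(graph)] = [i]
            graph.append(original_layers[i])
            i += 1
    return graph, mapping

def on_chip_verdicts(per_graph_layer_bytes, on_chip_cap):
    """Per-graph-layer redundancy (cap - used); negative means a shortfall."""
    return [on_chip_cap - b for b in per_graph_layer_bytes]

graph, mapping = fuse_layers(["conv", "relu", "pool", "conv", "relu"])
print(graph, mapping)                     # 3 graph layers mapped to 5 originals
print(on_chip_verdicts([1.5e6, 0.4e6, 2.3e6], on_chip_cap=2e6))
```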
Thirdly, model performance inference is carried out: the simulator reads and sets the bandwidth utilization according to the upper-level application scenario, then sets related parameters such as the off-chip and on-chip space parameters, performs a performance simulation, and finally feeds the performance data (inference time) back to the model search system for re-searching. Fourthly, because the number of network layers is set manually at the start, a fixed number of search iterations over steps one to three is configured. If, after that many iterations (that is, once reinforcement learning or the evolutionary algorithm has converged to its best result), no ideal network structure model has been found, the number of network layers can be increased and the above steps searched again. The final result is a neural network model that makes full use of on-chip and off-chip storage, meets the precision requirement, and has optimal inference performance.
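The control flow of the four steps can be compressed into the following hypothetical sketch, with the expressivity, storage, and simulation stages stubbed out by random placeholders; only the loop structure mirrors the description above.

```python
import random

def run_search(init_layers=20, max_iters=50, target_time=10.0):
    num_layers = init_layers
    while True:
        for _ in range(max_iters):              # steps one to three
            model = {"layers": num_layers, "seed": random.random()}
            if random.random() > 0.9:           # stub: expressivity gate fails
                continue
            if random.random() > 0.7:           # stub: storage check fails
                continue
            t = random.uniform(5.0, 20.0)       # stub: simulated inference time
            if t <= target_time:
                return model, t                 # meets all index requirements
        num_layers += 1                         # step four: deepen and retry

model, t = run_search()
print(model["layers"], round(t, 1))
```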
The neural network searching method provided by the embodiments of the present application incorporates the influence of the storage size and bandwidth of the deployment hardware on model search, giving the search a better-informed direction. It addresses how to exploit hardware information and how to search a network structure quickly, by guiding the search with the utilization of hardware storage and bandwidth. To accelerate the search, the method first searches based on a manually set number of network layers, then judges expressive capability, performs quantized compilation, calculates the per-layer on-chip data sizes and the total off-chip storage, and finally feeds the performance of the searched network model, combined with the bandwidth, back to the model search system through a simulator. A network with better performance and higher precision when deployed on the chip can thus be obtained quickly. Designing efficient network structures for different hardware architectures that fully exploit the compute, storage, and bandwidth resources of the underlying chip is of considerable value and significance.
Based on the same inventive concept, an embodiment of the present application further provides a neural network searching apparatus for implementing the above-mentioned neural network searching method. The solution provided by the apparatus is similar to that described in the method above, so for the specific limitations in one or more embodiments of the neural network searching apparatus below, reference may be made to the limitations on the neural network searching method above; details are not repeated here.
In one embodiment, referring to fig. 6, a neural network searching apparatus 600 is provided. The neural network searching apparatus 600 includes:
a network layer number determining module 602, configured to determine a target network layer number;
an expression capacity judgment module 604, configured to determine multiple initialized neural network models according to the number of target network layers in the ith iterative search process, and judge an expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model, where i is an integer greater than 0;
a storage information determining module 606, configured to determine, for any initialized neural network model, the on-chip storage information and off-chip storage information of the initialized neural network model when the judgment result represents that its expression capability meets a preset condition;
a performance data determining module 608, configured to determine, for any initialized neural network model, performance data of the initialized neural network model according to the target bandwidth utilization rate when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy the storage condition;
and a target network determining module 610, configured to determine a target neural network model according to the performance data of each initialized neural network model.
The neural network searching apparatus provided by the embodiments of the present application determines the number of target network layers; in the ith iterative search process, determines a plurality of initialized neural network models according to the number of target network layers and judges the expression capability of each to obtain a corresponding judgment result; for any initialized neural network model, determines its on-chip storage information and off-chip storage information when the judgment result represents that its expression capability meets the preset condition; determines its performance data according to the target bandwidth utilization when its on-chip and off-chip storage information satisfy the storage condition; and determines the target neural network model according to the performance data of each initialized neural network model. By first searching based on a manually set number of target network layers, the apparatus improves search speed; by combining expression capability, on-chip storage information, off-chip storage information, and target bandwidth utilization in the search, it quickly obtains a target neural network model with better performance and higher precision when deployed on a chip.
In one embodiment, the neural network searching apparatus 600 further includes an iterative device. The iterative device is configured to, when the judgment results corresponding to all of the initialized neural network models represent that their expression capability does not meet the preset condition, enter a next iterative search process and repeat the steps of determining a plurality of initialized neural network models according to the number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results, until an initialized neural network model whose judgment result represents that its expression capability meets the preset condition is obtained, or until the number of iterations reaches the target number; or, after the number of target network layers is adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results.
In one embodiment, the storage information determining module 606 is further configured to determine a chip computation graph according to the original network graph layers of the initialized neural network model; establish a layer mapping relation between the original network layers and the chip computation graph; acquire the on-chip storage information of each layer of the computation graph in the chip computation graph and acquire the off-chip storage information of the initialized neural network model; and determine the on-chip storage information of the initialized neural network model according to the layer mapping relation and the on-chip storage information of each layer of the computation graph.
In one embodiment, the performance data determination module 608 is further configured to obtain target storage parameters, where the target storage parameters include a target off-chip storage parameter and a target on-chip storage parameter; comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target off-chip storage parameters and the target on-chip storage parameters to obtain a comparison result of the initialized neural network model, wherein the comparison result comprises a first comparison result or a second comparison result, the first comparison result represents that the on-chip storage information and the off-chip storage information of the initialized neural network model meet storage conditions, and the second comparison result represents that the on-chip storage information and the off-chip storage information of the initialized neural network model do not meet the storage conditions; and under the condition of obtaining the first comparison result, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate.
In one embodiment, the performance data determining module 608 is further configured to obtain the first comparison result when the on-chip storage information is less than or equal to the target on-chip storage parameter and the off-chip storage information is less than or equal to the target off-chip storage parameter; or obtain the second comparison result when the on-chip storage information is larger than the target on-chip storage parameter and/or the off-chip storage information is larger than the target off-chip storage parameter.
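This comparison reduces to a single predicate, sketched below; True corresponds to the first comparison result and False to the second (the function name is an illustrative assumption).

```python
def storage_ok(on_chip_used, off_chip_used, on_chip_cap, off_chip_cap) -> bool:
    """First comparison result iff both storage figures fit their targets."""
    return on_chip_used <= on_chip_cap and off_chip_used <= off_chip_cap

print(storage_ok(1.8e6, 50e6, on_chip_cap=2e6, off_chip_cap=64e6))  # True
```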
In an embodiment, the iterative device is further configured to, when the comparison results corresponding to all of the initialized neural network models are second comparison results, enter a next iterative search process and, taking those comparison results as constraint conditions, repeat the steps of determining a plurality of initialized neural network models according to the number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results, until an initialized neural network model corresponding to the first comparison result is obtained, or until the number of iterations reaches the target number; or, after the number of target network layers and/or the target storage parameters are adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results.
In one embodiment, the performance data includes an inference time. The performance data determination module 608 is further configured to determine a target bandwidth utilization; and determining the inference time of the initialized neural network model according to the target bandwidth utilization rate and the target storage parameters, wherein the inference time represents the time for completing the target scene task by the initialized neural network model.
In one embodiment, the target network determining module 610 is further configured to take an initialized neural network model as the target neural network model when its performance data meets the target performance condition; or, when the performance data of none of the initialized neural network models meets the target performance condition, enter a next iterative search process and repeat the steps of determining a plurality of initialized neural network models according to the number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results, until the target neural network model is obtained or until the number of iterations reaches the target number; or, after the number of target network layers is adjusted, repeat the steps of determining a plurality of initialized neural network models according to the adjusted number of target network layers and judging the expression capability of each initialized neural network model to obtain the corresponding judgment results.
The modules in the above neural network searching apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a neural network search method.
It will be appreciated by those skilled in the art that the configuration shown in fig. 7 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments. In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, carries out the steps in the method embodiments described above.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as there is no contradiction in a combination, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within its protection scope. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A neural network searching method, the method comprising:
determining the number of target network layers;
in the ith iterative search process, determining a plurality of initialized neural network models according to the number of the target network layers, and judging the expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model, wherein i is an integer greater than 0;
for any initialized neural network model, determining on-chip storage information and off-chip storage information of the initialized neural network model under the condition that the judgment result represents that the expression capacity of the initialized neural network model meets a preset condition;
for any initialized neural network model, determining performance data of the initialized neural network model according to a target bandwidth utilization rate under the condition that the on-chip storage information and the off-chip storage information of the initialized neural network model meet storage conditions;
and determining a target neural network model according to the performance data of each initialized neural network model.
2. The method of claim 1, further comprising:
entering a next iterative search process under the condition that the judgment results corresponding to the initialized neural network models all represent that the expression capacity of the initialized neural network models does not meet the preset condition, repeatedly determining a plurality of initialized neural network models according to the number of the target network layers, and judging the expression capacity of the initialized neural network models to obtain the judgment results corresponding to the initialized neural network models until the initialized neural network models corresponding to the judgment results representing that the expression capacity of the initialized neural network models meets the preset condition are obtained, or until the iteration times reach the target times;
and after the number of the target network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted number of the target network layers, and judging the expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model.
3. The method of claim 1, wherein determining on-chip storage information and off-chip storage information for the initialized neural network model comprises:
determining a chip calculation graph according to the original network graph layer of the initialized neural network model;
establishing a layer mapping relation between the original network layer and the chip computing graph;
acquiring the on-chip storage information of each layer of the computation graph in the chip computation graph, and acquiring the off-chip storage information of the initialized neural network model;
and determining the on-chip storage information of the initialized neural network model according to the layer mapping relation and the on-chip storage information of each layer of the computation graph.
4. The method of claim 1, wherein determining the performance data of the initialized neural network model according to the target bandwidth utilization in case that the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy a storage condition comprises:
acquiring target storage parameters, wherein the target storage parameters comprise target off-chip storage parameters and target on-chip storage parameters;
comparing the on-chip storage information and the off-chip storage information of the initialized neural network model with the target off-chip storage parameters and the target on-chip storage parameters to obtain a comparison result of the initialized neural network model, wherein the comparison result comprises a first comparison result or a second comparison result, the first comparison result represents that the on-chip storage information and the off-chip storage information meet the storage condition, and the second comparison result represents that the on-chip storage information and the off-chip storage information of the initialized neural network model do not meet the storage condition;
and under the condition of obtaining the first comparison result, determining the performance data of the initialized neural network model according to the target bandwidth utilization rate.
5. The method of claim 4, further comprising:
under the condition that the comparison result corresponding to each initialized neural network model is the second comparison result, entering a next iterative search process, taking the comparison result corresponding to each initialized neural network model as a constraint condition, repeatedly determining a plurality of initialized neural network models according to the number of target network layers, and judging the expression capacity of each initialized neural network model to obtain the judgment result corresponding to each initialized neural network model until the initialized neural network model corresponding to the first comparison result is obtained or until the iteration number reaches the target number;
and after the target network layer number and/or the target storage parameter are/is adjusted, the steps of determining a plurality of initialized neural network models according to the adjusted target network layer number, and judging the expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model are repeated.
6. The method of claim 4, wherein the performance data comprises inference time, and wherein determining the performance data of the initialized neural network model based on the target bandwidth utilization comprises:
determining a target bandwidth utilization;
and determining the reasoning time of the initialized neural network model according to the target bandwidth utilization rate and the target storage parameters, wherein the reasoning time represents the time for completing a target scene task by the initialized neural network model.
7. The method of claim 1, wherein determining a target neural network model from the performance data for each of the initialized neural network models comprises:
taking the initialized neural network model as a target neural network model if the performance data of the initialized neural network model meets a target performance condition; or,
under the condition that the performance data of each initialized neural network model does not meet the target performance condition, entering a next iterative search process, repeating the steps of determining a plurality of initialized neural network models according to the number of target network layers, judging the expression capacity of each initialized neural network model and obtaining a judgment result corresponding to each initialized neural network model until the target neural network model is obtained or until the iteration number reaches the target number;
and after the number of the target network layers is adjusted, repeating the steps of determining a plurality of initialized neural network models according to the adjusted number of the target network layers, and judging the expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model.
8. An apparatus for neural network searching, the apparatus comprising:
the network layer number determining module is used for determining the number of target network layers;
the expression capacity judging module is used for determining a plurality of initialized neural network models according to the number of the target network layers in the ith iterative search process, and judging the expression capacity of each initialized neural network model to obtain a judgment result corresponding to each initialized neural network model, wherein i is an integer larger than 0;
the storage information determining module is used for determining on-chip storage information and off-chip storage information of the initialized neural network model under the condition that the judgment result represents that the expression capability of the initialized neural network model meets a preset condition;
a performance data determining module, configured to determine, for any one of the initialized neural network models, performance data of the initialized neural network model according to a target bandwidth utilization rate when the on-chip storage information and the off-chip storage information of the initialized neural network model satisfy a storage condition;
and the target network determining module is used for determining a target neural network model according to the performance data of each initialized neural network model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202211423554.3A 2022-11-15 2022-11-15 Neural network searching method and device and computer equipment Pending CN115759197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211423554.3A CN115759197A (en) 2022-11-15 2022-11-15 Neural network searching method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN115759197A true CN115759197A (en) 2023-03-07

Family

ID=85370755



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination