WO2021103977A1 - Neural network search method, apparatus, and device - Google Patents

Neural network search method, apparatus, and device Download PDF

Info

Publication number
WO2021103977A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
evolution
neural
neural networks
networks
Prior art date
Application number
PCT/CN2020/126795
Other languages
English (en)
French (fr)
Inventor
徐航
陈泽伟
李震国
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2021103977A1 publication Critical patent/WO2021103977A1/zh
Priority to US17/826,873 priority Critical patent/US20220292357A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • the present invention relates to the technical field of machine learning, and in particular to a neural network search method, apparatus, and device.
  • Machine learning is widely used in various fields.
  • the construction of machine learning models places high demands on machine learning experts, who must manually design and debug the model; this incurs high labor and time costs and lengthens the product iteration cycle.
  • In order to make machine learning easier to apply, reduce the required expertise, and improve model performance, automatic machine learning (AutoML) came into being.
  • AutoML usually uses model search methods in the process of model construction, model training and evaluation to realize automatic optimization of model structure and model parameters.
  • the current search method selects some models in the search space for training, evaluates the trained models, and then adjusts the structure and parameters of the models according to the evaluation results.
  • this method requires training and evaluating every selected model, which is time-consuming and makes automatic machine learning inefficient.
  • the embodiments of the present invention provide a neural network search method, device, and equipment to solve the technical problem of low efficiency of automatic machine learning.
  • an embodiment of the present invention provides a neural network search method.
  • the method includes: a computing device obtains a data set and N neural networks, where N is a positive integer, and performs K evolutions on the N neural networks to obtain the neural network produced by the K-th evolution, where K is a positive integer.
  • the i-th evolution includes: the computing device mutates the network structure of the neural networks obtained from the (i-1)-th evolution to obtain mutated neural networks;
  • from the mutated neural networks, the computing device selects those whose network structure is better than the neural networks obtained from the (i-1)-th evolution as candidate neural networks; according to the P evaluation parameters corresponding to each neural network in the set consisting of the neural networks obtained from the (i-1)-th evolution and the candidate neural networks, it selects the neural networks of the i-th evolution from that set. The P evaluation parameters are used to evaluate the performance of each neural network in the set after training and testing on the data set, where i and P are positive integers and 1 ≤ i ≤ K.
  • the above method applies the partial order hypothesis to prune the search space of networks in each evolution, eliminating neural networks with poor network structures. This reduces the number of models that need to be trained and evaluated, avoids wasting computing resources and time on poor networks, and improves the efficiency of automatic machine learning.
  • In a possible implementation, when the neural networks obtained from the (i-1)-th evolution are CNNs, the computing device mutating the neural networks obtained from the (i-1)-th evolution may include at least the following step:
  • deleting one or more pooling layers from one or more of the neural networks obtained from the (i-1)-th evolution.
  • In this way, the network structure of the mutated neural network and the network structure of the neural network before mutation have similar topologies, satisfying the partial order hypothesis; this avoids pruning networks with excellent network structures and improves the accuracy of pruning.
  • In a possible implementation, when the neural networks obtained from the (i-1)-th evolution are ResNets, the computing device mutating the neural networks obtained from the (i-1)-th evolution may similarly include one or more structural mutation steps.
  • Here too, the network structure of the mutated neural network and the network structure of the neural network before mutation have similar topologies, satisfying the partial order hypothesis; this avoids pruning networks with excellent network structures and improves the accuracy of pruning.
  • the computing device selects, from the mutated neural networks, the candidate neural networks whose network structure is better than that of the neural networks obtained from the (i-1)-th evolution.
  • One implementation can be: for a first neural network, which is any one of the neural networks obtained from the (i-1)-th evolution, the computing device selects from the mutated versions of the first neural network those whose network structure is better than that of the first neural network; the candidate neural networks include these selected mutated networks.
  • the network structure of a mutated version of the first neural network is better than the network structure of the first neural network when:
  • the number of channels of the mutated neural network is greater than the number of channels of the first neural network, or
  • the number of convolutional layers of the mutated neural network is greater than the number of convolutional layers of the first neural network.
  • pruning mutated neural networks by comparing the number of channels and the number of convolutional layers only requires counting channels and layers, so pruning is efficient, further improving the efficiency of automatic machine learning; a count-based sketch follows.
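  • As an illustrative sketch (not the claimed implementation), the check can be written as a simple count comparison; the NetworkStructure fields below are hypothetical summaries of a network:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkStructure:
    channels: int      # hypothetical: representative channel count (width)
    conv_layers: int   # hypothetical: number of convolutional layers (depth)

def structure_is_better(mutated: NetworkStructure, base: NetworkStructure) -> bool:
    """Keep a mutant for training only if it is wider or deeper than its
    parent, per the partial order hypothesis; otherwise prune it untrained."""
    return mutated.channels > base.channels or mutated.conv_layers > base.conv_layers

print(structure_is_better(NetworkStructure(64, 10), NetworkStructure(32, 10)))  # True
```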
  • An implementation of the computing device selecting the neural networks of the i-th evolution from the set, according to the P evaluation parameters corresponding to each neural network in the set consisting of the neural networks obtained from the (i-1)-th evolution and the candidate neural networks, may be: the computing device performs non-dominated sorting of the neural networks in the set according to their P evaluation parameters, and then determines the neural networks of the i-th evolution to be the networks that are not dominated in the set. Here, where the second neural network and the third neural network are two neural networks in the set, the second neural network dominates the third neural network if the second neural network is not inferior to the third neural network for each of the P evaluation parameters and is better than the third neural network for at least one of the P evaluation parameters.
  • In this way, the non-dominated (Pareto-optimal) networks are selected from the set of the neural networks obtained from the previous evolution and the candidate neural networks obtained by mutating them, reducing the number of neural networks entering the next evolution, greatly reducing the amount of computation in each evolution, and further improving the efficiency of automatic machine learning.
  • one implementation of the computing device obtaining the N neural networks may be: the computing device randomly generates M neural networks, where M is a positive integer; trains and tests each of the M neural networks to obtain the P evaluation parameters corresponding to each of them; and then, according to those P evaluation parameters, selects N neural networks from the M neural networks, where N is not greater than M.
  • the P evaluation parameters include at least one of running time, accuracy, and parameter amount.
  • the above method avoids the situation where, in the neural networks obtained by the K-th evolution, one evaluation parameter is excellent while the other evaluation parameters are poor; multi-objective optimization is realized, and neural networks that balance the P evaluation parameters are obtained.
  • an embodiment of the present application also provides an object recognition method, including: a user device or client device obtains an image to be recognized, and inputs the image to be recognized into an object recognition neural network to obtain the object type corresponding to the image to be recognized.
  • the object recognition neural network is a network determined in the search space by the neural network search method described in the first aspect or any implementation of the first aspect, and the search space is constructed from basic units and the parameters of the basic units.
  • the image to be recognized is an image of the surrounding environment of the vehicle to recognize objects in the surrounding environment of the vehicle.
  • the parameters of the basic unit include at least one of the type of the basic unit, the number of channels parameter, and the size parameter.
  • the basic unit is used to perform a first operation and a second operation on the feature maps input to it, where the feature maps are feature maps of the image to be recognized;
  • the first operation doubles the number of feature maps input to the basic unit or keeps it unchanged;
  • the second operation changes the size of the feature maps input to the basic unit from an original first size to a second size or keeps the first size unchanged, where the first size is greater than the second size; a sketch of these two operations follows.
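  • A hedged illustration of the first and second operations, assuming a strided 3×3 convolution as the basic unit (this module is an assumption for illustration, not the patent's exact unit):

```python
import torch
import torch.nn as nn

class BasicUnitSketch(nn.Module):
    """Illustrative basic unit: the first operation can double the number of
    feature maps; the second operation can halve their spatial size."""
    def __init__(self, in_ch: int, double_channels: bool, halve_size: bool):
        super().__init__()
        out_ch = in_ch * 2 if double_channels else in_ch   # first operation
        stride = 2 if halve_size else 1                    # second operation
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              stride=stride, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x))

x = torch.randn(1, 8, 32, 32)                   # 8 feature maps of size 32x32
print(BasicUnitSketch(8, True, True)(x).shape)  # torch.Size([1, 16, 16, 16])
```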
  • the neural network in the search space is ResNet
  • the basic unit includes a residual module
  • the residual module is used to add the feature map input to the basic unit to the feature map obtained after the basic unit processes that input.
  • the neural network in the search space is a CNN
  • the type of the basic unit includes a convolutional layer and a pooling layer.
  • the data set described in the first aspect includes a plurality of samples, and each sample in the data set includes a sample image and an object type corresponding to the sample image.
  • an embodiment of the present application also provides a gesture recognition method, including: a user device or client device obtains an image to be recognized, and inputs the image to be recognized into a gesture recognition neural network to obtain the gesture type corresponding to the image to be recognized.
  • the gesture recognition neural network is a network determined in the search space by the neural network search method described in the first aspect or any implementation of the first aspect, and the search space is constructed from basic units and the parameters of the basic units.
  • the parameters of the basic unit include at least one of the type of the basic unit, the number of channels parameter, and the size parameter.
  • the basic unit is used to perform a first operation and a second operation on the feature maps input to it, where the feature maps are feature maps of the image to be recognized;
  • the first operation doubles the number of feature maps input to the basic unit or keeps it unchanged;
  • the second operation changes the size of the feature maps input to the basic unit from an original first size to a second size or keeps the first size unchanged, where the first size is greater than the second size.
  • the neural network in the search space is ResNet
  • the basic unit includes a residual module
  • the residual module is used to add the feature map input to the basic unit to the feature map obtained after the basic unit processes that input.
  • the neural network in the search space is a CNN
  • the type of the basic unit includes a convolutional layer and a pooling layer.
  • the data set described in the first aspect includes a plurality of samples, and each sample in the data set includes a sample image and a gesture type corresponding to the sample image.
  • the embodiments of the present application also provide a data prediction method, which may include: a user device or client device acquires data to be predicted, and inputs the data to be predicted into a target neural network model to obtain the prediction result corresponding to the data to be predicted.
  • the target neural network may be a neural network obtained by the K-th evolution described in the first aspect, or a machine learning model obtained by combining such a neural network with data cleaning and feature engineering algorithms.
  • In a possible implementation, the target neural network is a network determined in the search space by the neural network search method described in the first aspect or any implementation of the first aspect, and the search space is constructed from basic units and the parameters of the basic units.
  • an embodiment of the present application also provides a neural network search device, including:
  • the acquisition module is used to acquire data sets and N neural networks, where N is a positive integer;
  • the evolution module is used to perform K evolutions on the N neural networks to obtain the neural network obtained by the Kth evolution, and K is a positive integer;
  • the evolution module includes a mutation unit, a first screening unit, and a second screening unit, wherein,
  • the mutation unit is used to: in the i-th evolution, mutate the network structure of the neural networks obtained from the (i-1)-th evolution to obtain mutated neural networks, where the neural networks obtained from the 0th evolution are the N neural networks;
  • the first screening unit is used to: in the i-th evolution, select from the mutated neural networks the candidate neural networks whose network structure is better than that of the neural networks obtained from the (i-1)-th evolution;
  • the second screening unit is used to: in the i-th evolution, according to the P evaluation parameters corresponding to each neural network in the set consisting of the neural networks obtained from the (i-1)-th evolution and the candidate neural networks, select the neural networks of the i-th evolution from the set; the P evaluation parameters are used to evaluate the performance of each neural network in the set after training and testing on the data set, where i is a positive integer not greater than K and P is a positive integer.
  • an embodiment of the present application also provides an object recognition device, including: a functional unit for implementing the object recognition method described in the second aspect.
  • an embodiment of the present application also provides a gesture recognition device, including: a functional unit for implementing the gesture recognition method described in the third aspect.
  • an embodiment of the present application also provides a data prediction device, including: a functional unit for implementing the data prediction method described in the fourth aspect.
  • an embodiment of the present application also provides a neural network search device, including a processor and a memory; the memory is used to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the neural network search device implements the method described in the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present application also provides an object recognition device, including a processor and a memory; the memory is used to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the object recognition device implements the method described in the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present application also provides a gesture recognition device, including a processor and a memory; the memory is used to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the gesture recognition device implements the method described in the third aspect or any possible implementation of the third aspect.
  • an embodiment of the present application also provides a data prediction device, including a processor and a memory; the memory is used to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the data prediction device implements the method described in the fourth aspect or any possible implementation of the fourth aspect.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, the computer implements the method described in any possible implementation of the first aspect.
  • the embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions; when called by a computer, the computer-executable instructions cause the computer to implement the method described in any possible implementation of the first aspect.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, the computer implements the method described in any possible implementation of the second aspect.
  • the embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions; when called by a computer, the computer-executable instructions cause the computer to implement the method described in any possible implementation of the second aspect.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, the computer implements the method described in any possible implementation of the third aspect.
  • the embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions; when called by a computer, the computer-executable instructions cause the computer to implement the method described in any possible implementation of the third aspect.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, the computer implements the method described in any possible implementation of the fourth aspect.
  • the embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions; when called by a computer, the computer-executable instructions cause the computer to implement the method described in any possible implementation of the fourth aspect.
  • FIG. 1 is a system architecture diagram of AutoML provided by an embodiment of the present application
  • Figure 2A is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the architecture of a system provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the architecture of a convolutional neural network provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a chip hardware structure provided by an embodiment of the present invention.
  • FIG. 6A is a schematic flowchart of a neural network search method provided by an embodiment of the present application.
  • FIG. 6B is a schematic flowchart of an implementation manner in which a computing device provided by an embodiment of the present application selects a neural network obtained by the i-th evolution from the set;
  • FIG. 7 is a schematic structural diagram of ResNet before and after mutation provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a CNN before and after mutation provided by an embodiment of the present application.
  • FIG. 9A is a schematic flowchart of an object recognition method provided by an embodiment of the present application.
  • FIG. 9B is a schematic flowchart of a gesture recognition method provided by an embodiment of the present application.
  • FIG. 10A is a schematic explanatory diagram of the running time and top1 accuracy of a model obtained according to an embodiment of the present application.
  • FIG. 10B is a schematic explanatory diagram of the parameter amount and top1 accuracy of a model obtained according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a neural network search device provided by an embodiment of the present application.
  • FIG. 12A is a schematic structural diagram of an object recognition device provided by an embodiment of the present application.
  • FIG. 12B is a schematic structural diagram of a gesture recognition device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another neural network search device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a system architecture diagram of AutoML provided by an embodiment of this application; the main flow of AutoML may include the following processes:
  • Data preparation may include data collection and data cleaning.
  • data collection includes receiving raw data sent by user equipment; data can also be obtained from existing databases such as imageNet and Labelme, or through other methods;
  • data cleaning mainly includes handling missing values in the raw data, determining data types, outlier detection, text encoding, data segmentation, and so on, to obtain data that machine learning models can operate on.
  • the original data can be images, voice, text, video, or a combination thereof.
  • Feature engineering is the process of using domain knowledge of the data to create features that enable machine learning algorithms to achieve their best performance; it transforms raw data into features, with the goal of extracting as many features as possible from the raw data for use by algorithms and models.
  • This process can include feature construction, feature extraction, and feature selection, where: feature construction manually builds new features from the raw data; feature extraction automatically builds new features, transforming the original features into a set of features with clear physical or statistical significance, or into core features, for example by reducing the number of distinct values of a feature through value transformation; feature selection selects the most statistically significant subset of features from the feature set and deletes irrelevant features, achieving dimensionality reduction.
  • feature engineering is an iterative process that requires repeated feature construction, feature extraction, feature selection, model selection, model training, and model evaluation before the final machine learning model can be obtained.
  • a data set that can be input to the machine learning model can be obtained.
  • the data set can be divided into a training data set and a test data set.
  • the training data set is used to train the constructed machine learning model to obtain the trained machine learning model;
  • the test data set is used to test the trained machine learning model to evaluate its performance, such as accuracy and running time; a minimal split example follows.
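  • A minimal sketch of the split; the use of scikit-learn and the 80/20 ratio are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 16)              # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)    # placeholder labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% training / 20% test
print(X_train.shape, X_test.shape)        # (800, 16) (200, 16)
```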
  • feature engineering is not a necessary process for AutoML.
  • the original data can be cleaned to obtain a data set.
  • After feature engineering, a machine learning model needs to be selected from the search space of machine learning models, and hyperparameters need to be set for the selected model; all possible machine learning models form the search space of machine learning models.
  • the machine learning models in the search space can be constructed in advance or during the search process; there is no limitation here.
  • the initialized machine learning model can be trained on the training data set and then evaluated on the test data set; the feedback from the evaluation results then guides the construction, selection, and hyperparameter setting of machine learning models, finally yielding the best one or more machine learning models.
  • Neural network search (neural architecture search, NAS)
  • the machine learning model in the embodiments of this application is a neural network, which may be a deep neural network, such as a convolutional neural network (CNN), a deep residual network (ResNet), a recurrent neural network, or another neural network.
  • NAS is an algorithm for searching for the best neural network architecture; it mainly covers automatic optimization of model structure and model parameters.
  • In the embodiments of this application, an evolutionary algorithm is used: one or more neural networks are constructed; the one or more neural networks are randomly mutated, for example by randomly adding or deleting a layer, or randomly changing the number of channels of one or more layers; based on the partial order hypothesis, the neural networks whose network structure is better than that of the pre-mutation networks are selected from the mutated networks as candidate neural networks; each candidate neural network is trained and tested to obtain its P evaluation parameters; based on these P evaluation parameters, the candidates with better evaluation parameters are selected; and, starting from the selected networks, the mutation, structure-based selection, training and testing, and parameter-based selection steps are executed iteratively, so that the networks get better and better. A runnable sketch of this loop follows.
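  • The loop can be sketched end to end as below; every helper here (the mutation rule, the stand-in scoring function, and the (channels, layers) network summary) is a toy assumption, not the claimed implementation:

```python
def mutate(net):
    """Hypothetical mutations of a (channels, conv_layers) summary."""
    c, l = net
    return [(c * 2, l), (c, l + 1), (max(1, c // 2), l)]

def structure_is_better(mutant, parent):
    """Partial-order pruning: keep only wider or deeper mutants."""
    return mutant[0] > parent[0] or mutant[1] > parent[1]

def train_and_eval(net):
    """Stand-in for real training/testing; returns P = 2 evaluation
    parameters, here (accuracy-like score, negated cost), larger is better."""
    c, l = net
    return (min(1.0, 0.5 + 0.01 * (c + l)), -(c * l))

def dominates(p, q):
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def evolve(population, K):
    for _ in range(K):
        candidates = [m for net in population for m in mutate(net)
                      if structure_is_better(m, net)]        # prune untrained
        pool = list(set(population) | set(candidates))
        scores = {net: train_and_eval(net) for net in pool}  # train and test
        population = [n for n in pool                        # non-dominated set
                      if not any(dominates(scores[m], scores[n])
                                 for m in pool if m != n)]
    return population

print(evolve([(8, 2)], K=3))
```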
  • the embodiments of this application mainly introduce the neural network search method. It should be understood that this method can be combined with other steps or processes, such as feature engineering and hyperparameter optimization, to obtain an optimal model; for combinations with other processes, refer to the related content in the prior art, which is not limited in the embodiments of the present application.
  • Pareto optimality refers to an ideal state of resource allocation: given a fixed group of people and allocable resources, a change from one allocation state to another that makes at least one person better off without making anyone worse off is called a Pareto improvement. A state is Pareto optimal when no further Pareto improvement is possible; in other words, it is impossible to improve some people's situation without harming anyone else.
  • For example, taking (accuracy, running time) as the evaluation parameters, a model with evaluation parameters (0.8, 2) is better than a model with evaluation parameters (0.7, 3); but among models with evaluation parameters (0.8, 2), (0.9, 2.5), and (0.7, 1), no model is better than another in every parameter, so their relative merits cannot be compared.
  • In that case, the models with evaluation parameters (0.8, 2), (0.9, 2.5), and (0.7, 1) are Pareto-optimal models.
  • A model has multiple evaluation parameters, such as accuracy, running time, and parameter amount. When the accuracy of a model is the best, its parameter amount or running time may be the worst: optimizing one evaluation parameter may weaken the other evaluation parameters.
  • the set of models whose evaluation parameters are not dominated by those of any other model is the Pareto front; that is to say, the Pareto front is the set of Pareto-optimal models.
  • Non-dominated sorting is a commonly used sorting method for multiple objectives. Suppose the optimization objectives are (A, B, C) and larger values are better: point 1 (A1, B1, C1) dominates point 2 (A2, B2, C2) if and only if A1 ≥ A2, B1 ≥ B2, C1 ≥ C2, and at least one of the inequalities is strict. Point 1 dominating point 2 means point 1 is better than point 2. A point that is not dominated by any other point is a point on the Pareto front, that is, a non-dominated point.
  • In the embodiments of this application, the optimization objectives are the P evaluation parameters of the models: model 1 dominates model 2 if and only if model 1 is not inferior to model 2 in each of the P evaluation parameters and is better than model 2 in at least one of them. A minimal sketch of this test follows.
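  • A minimal sketch of the dominance test and non-dominated filtering, assuming all objectives are normalized so that larger is better (running time is negated for that reason):

```python
from typing import List, Sequence

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q iff p is no worse in every objective and strictly
    better in at least one (larger is better here)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points dominated by no other point: the Pareto front."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# (accuracy, -running_time): (0.7, -3) is dominated by (0.8, -2);
# the other three points are mutually non-dominated.
print(non_dominated([(0.8, -2), (0.9, -2.5), (0.7, -1), (0.7, -3)]))
```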
  • the partial order hypothesis states that, among networks with similar topologies, narrower and shallower networks are worse than deeper and wider networks; that is, a deeper and wider network is better than a narrower and shallower one.
  • "wide" and "narrow" describe the number of channels of the network; "deep" and "shallow" describe the number of layers of the network.
  • the partial order pruning algorithm applies the partial order hypothesis to narrow the search space of models and improve the efficiency of model search.
  • Dilated ("hole") convolution inserts zeros into an ordinary convolution kernel to obtain a larger kernel; the parameter amount remains unchanged while a larger range of information is captured.
  • the ordinary convolution process is: M feature maps P1 pass through an ordinary convolution kernel, i.e., a high-dimensional matrix of size (Dk, Dk, M, N), turning the M feature maps into N feature maps P2.
  • depthwise separable convolution first convolves the M feature maps P1 with a kernel of size (Dk, Dk, M, 1), turning the M feature maps P1 into M feature maps P3, and then convolves the M feature maps P3 with a kernel of size (1, 1, M, N) to obtain N feature maps P4. This method greatly reduces the parameter amount while achieving good results; the arithmetic below makes the saving concrete.
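  • A quick check of the parameter counts under the sizes above; the example values Dk = 3, M = 64, N = 128 are illustrative:

```python
def conv_params(dk: int, m: int, n: int) -> int:
    """Parameters of an ordinary (Dk, Dk, M, N) convolution."""
    return dk * dk * m * n

def depthwise_separable_params(dk: int, m: int, n: int) -> int:
    """Depthwise (Dk, Dk, M, 1) followed by pointwise (1, 1, M, N)."""
    return dk * dk * m + m * n

print(conv_params(3, 64, 128))                 # 73728
print(depthwise_separable_params(3, 64, 128))  # 8768, roughly 8.4x fewer
```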
  • a neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes inputs xs (s = 1, 2, ..., n, where n is a natural number greater than 1) and an intercept of 1, and the output of the arithmetic unit can be f(Σ_s Ws·xs + b), where Ws is the weight of xs, b is the bias of the neural unit, and f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network and convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer. The activation function can be, for example, a sigmoid function; a short numerical sketch follows.
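  • A one-function numerical sketch of such a unit with a sigmoid activation (the input values are arbitrary):

```python
import numpy as np

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Output of a single neural unit: f(sum_s Ws*xs + b) with f = sigmoid."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

print(neural_unit(np.array([0.5, -1.0]), np.array([0.8, 0.3]), b=0.1))
```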
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here. Divided by the position of different layers, the layers of a DNN fall into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers. The layers are fully connected; that is to say, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated.
  • The parameters in a DNN are defined as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_24, where the superscript 3 is the index of the layer where the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as W^L_jk. It should be noted that the input layer has no W parameters. In deep neural networks, more hidden layers give the network a greater ability to model complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers). A forward-pass sketch using this convention follows.
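  • The layer-by-layer computation with the W^L_jk convention can be sketched as repeated matrix-vector products; the layer sizes and the ReLU activation are illustrative assumptions:

```python
import numpy as np

def dnn_forward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Fully connected forward pass: each layer computes f(W @ x + b), where
    W[j, k] is the coefficient from neuron k of the previous layer to
    neuron j of the current layer (the input layer has no W)."""
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, W @ x + b)  # ReLU as an illustrative activation
    return x

# Illustrative 3-layer DNN: 4 inputs -> 5 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)), rng.normal(size=(2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(dnn_forward(rng.normal(size=4), weights, biases))
```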
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of location; the underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part, and the same learned image information can be used at all positions of the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a random-sized matrix. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • the depth of a network is critical to the performance of the model. As the number of network layers increases, the network extracts more complex features and its performance keeps improving, so in theory the deeper the network, the better the effect. In practice, however, an overly deep network is difficult to train and its effect becomes worse than that of a relatively shallow network; this is called the degradation problem. The reason is that as the network becomes deeper and deeper, training and optimizing it becomes more and more difficult.
  • ResNet can include multiple cascaded residual units (also called residual blocks) and several fully connected layers.
  • ResNet the output and input of the previous residual unit are simultaneously input to the next residual unit.
  • In ResNet, x_{l+1} = f(h(x_l) + F(x_l, W_l)), where F(x_l, W_l) is the output of the l-th residual unit, x_l is the input of the l-th residual unit, W_l is the weight matrix of the multi-layer convolution included in the l-th residual unit, and f(·) is the activation function applied between residual units. A sketch of this computation follows.
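  • The residual computation can be sketched as below; PyTorch is an assumed framework, and h is taken as the identity shortcut:

```python
import torch
import torch.nn as nn

class ResidualUnitSketch(nn.Module):
    """Sketch of x_{l+1} = f(h(x_l) + F(x_l, W_l)) with h = identity."""
    def __init__(self, ch: int):
        super().__init__()
        self.F = nn.Sequential(          # F(x_l, W_l): multi-layer convolution
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.f = nn.ReLU(inplace=True)   # activation f between residual units

    def forward(self, x):
        return self.f(x + self.F(x))     # shortcut addition, then activation

x = torch.randn(1, 16, 8, 8)
print(ResidualUnitSketch(16)(x).shape)   # torch.Size([1, 16, 8, 8])
```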
  • the client can use the client device to send raw data or a data set to a computing device, such as a cloud server, and request that the cloud server train on the provided raw data or data set to obtain a target neural network that can complete a specific task.
  • the cloud server can use its powerful computing resources and AutoML architecture to automatically generate the target neural network required by the customer by using the original data or data set provided by the customer.
  • the data set is the data obtained after the original data has been cleaned and feature engineering, including training data set and test data set.
  • the original data and data set can also be data obtained from an existing database, such as pictures obtained from imageNet.
  • the customer wants a neural network that can identify the type of objects.
  • In scene A, the neural network is applied to autonomous or semi-autonomous vehicles to identify objects in the vehicle's field of view observed by the camera. Safe driving must be ensured while the vehicle is moving, so both the real-time performance and the accuracy of recognizing objects in the vehicle's surroundings are highly demanded. The customer can therefore require the neural network to predict object types with high accuracy and low time consumption.
  • the client sends the data set to the cloud server through the client device, and requests multi-objective optimization (that is, high accuracy and low time-consuming) to find the optimal neural network.
  • the data set includes multiple types of sample images, and each sample image is labeled with the type of object it belongs to; object types can include: people, dogs, vehicles, red traffic lights, buildings, traffic lines, trees, roads, roadsides, and so on.
  • the cloud server, through its AutoML architecture, can use the above data set to select neural networks in the search space of neural networks, train and evaluate the selected networks, and, according to the customer's requirements of high accuracy and low time consumption, further screen out networks with high accuracy and low time consumption based on the accuracy and time consumption of each trained network. Through multiple rounds of screening, the Pareto-optimal object recognition neural network required by the customer is obtained. The cloud server then sends the object recognition neural network to the client device, and the client device can send the object recognition neural network to the vehicle. Optionally, when the client device is a server, the vehicle can also download the object recognition neural network from the client device.
  • After the vehicle receives the object recognition neural network, it can execute the object type recognition method.
  • the method may include the following steps: the vehicle obtains an image to be recognized through a camera, where the image to be recognized may be an image of the vehicle's surroundings; the image is input to the object recognition neural network, which predicts the object type corresponding to the image. Further, the vehicle can perform corresponding safe-driving actions based on the recognized object types in the surroundings, for example decelerating or braking when a person is recognized ahead to improve the safety of vehicle operation, or passing through the intersection when the traffic light ahead is recognized as green.
  • the customer wants a neural network that can recognize dynamic gestures.
  • In scene B, the neural network is applied to terminals, such as portable devices (mobile phones, tablets, etc.), wearable devices (smart bracelets, smart watches, VR glasses, etc.), or smart home devices (smart TVs, smart speakers, smart lamps, monitors, etc.), to recognize gestures in the field of view observed by the device's camera. Since the computing power and storage resources of a terminal are limited, the neural network applied on it is required to have high accuracy and a low parameter amount.
  • the client sends the data set to the cloud server through the client device, and requests multi-objective optimization (that is, high accuracy, low parameter amount) to find the optimal neural network.
  • the data set includes sample images of multiple gestures, and each sample image is labeled with the gesture type to which it belongs, and the gesture type may include a variety of different gestures.
  • the cloud server, through its AutoML architecture, can use the above data set to select neural networks in the search space of neural networks, train and evaluate the selected networks, and, according to the customer's requirements of high accuracy and low parameter amount, further screen out networks with high accuracy and low parameter amount based on the accuracy and parameter amount of each trained network. Through multiple rounds of screening, the Pareto-optimal gesture recognition neural network required by the customer is obtained. The cloud server then sends the gesture recognition neural network to the client device, and the client device can send the gesture recognition neural network to the terminal. Optionally, when the client device is a server, the terminal can also download the gesture recognition neural network from the client device.
  • After the terminal receives the gesture recognition neural network, it can execute the gesture recognition method.
  • the method may include the following steps: the terminal obtains an image to be recognized through a camera; inputs the image into the gesture recognition neural network; and predicts the gesture type corresponding to the image. Further, the terminal may perform a corresponding operation based on the recognized gesture type, for example opening the application "camera" when the first gesture is recognized.
  • the first gesture may be any one of a variety of different gestures that can be recognized by the gesture recognition neural network.
  • For specific implementations of the cloud server automatically generating an object recognition neural network or a gesture recognition neural network based on the data set, refer to the related descriptions in the following method embodiments; details are not repeated here.
  • Figure 3 is a schematic diagram of the architecture of a system provided by an embodiment of the present application, in which:
  • the computing device 32 may include part or all of the AutoML architecture shown in FIG. 1.
  • the computing device 32 may automatically generate machine learning models that execute specific functions based on the raw data or data sets stored in the database 33, or on the raw data or data sets sent by the client device 31, such as the object recognition neural network in scene A above, the gesture recognition neural network in scene B above, and so on.
  • the computing device 32 may include multiple nodes. In one case, the computing device 32 may be a distributed computing system, and the multiple nodes it includes may be computer devices with computing capability; in another case, the computing device 32 may be a single device, and the multiple nodes it includes may be functional modules/units within the computing device 32, and so on.
  • the preprocessing node 321 is used to preprocess the received raw data, such as data cleaning, etc.; the feature engineering node 322 performs feature engineering on the preprocessed raw data to obtain a data set.
  • the preprocessed original data is the data set.
  • the data set can be divided into training data set and test data set.
  • the model construction node 323 is used to randomly generate neural network architectures according to the training data set and configure hyperparameters for them, obtaining initialized neural networks; the model search node 324 is used to execute the neural network search method and perform multiple evolutions on the initialized neural networks to obtain the finally evolved neural networks.
  • In the evolution process, the model construction node 323 is used to mutate the neural networks to obtain candidate neural networks; the model training node 325 can train the initialized neural networks, candidate neural networks, etc. to obtain trained neural networks; the model evaluation node 326 is used to test the trained neural networks on the test data set and obtain their evaluation parameters, such as accuracy, running time, and parameter amount.
  • Before the neural networks are trained and tested, the model search node 324 screens the candidate neural networks based on the partial order pruning algorithm, so that only those whose network structure is better than the network before mutation are trained and tested, narrowing the neural network search space and improving the search efficiency of neural networks.
  • the model search node 324 is also used to screen out the optimal one or more neural networks, or the Pareto-optimal neural networks, based on the evaluation parameters of the trained neural networks obtained by the model evaluation node 326, as the neural networks entering the next evolution.
  • After multiple evolutions, one or more neural networks are obtained; combining the obtained neural networks with modules such as feature engineering and preprocessing can form a target neural network.
  • the computing device 32 can send the target neural network to the client device 31.
  • the system may also include a user device 34.
  • the user device 34 can download the target neural network from the client device 31 or the computing device 32 and use the target neural network to predict the data to be predicted, obtaining a prediction result; alternatively, the user device 34 can send the data to be predicted to the client device 31.
  • After the client device 31 receives the data to be predicted, it inputs the data into the target neural network to obtain the prediction result and then sends the prediction result to the user device 34.
  • the target neural network may be the object recognition neural network in scene A, the gesture recognition neural network in scene B described above, and the data to be predicted may be the image to be recognized in scene A or scene B.
  • Each node in the foregoing computing device 32 and computing device 32 may be a cloud server, a server, a computer device, a terminal device, etc., which will not be repeated here.
  • the aforementioned client device 31 or user device 34 may be a mobile phone, a tablet computer, a personal computer, a vehicle, an on-board unit, a point of sale (POS), a personal digital assistant (PDA), a drone, a smart watch, smart glasses, a VR device, etc.; there is no limitation here.
  • the client device 31 may also be a server.
  • the preprocessing node 321, the feature engineering node 322, the model construction node 323, the model training node 325, and the model evaluation node 326 are not necessary nodes of the computing device 32.
  • the functions implemented by one or more of the aforementioned preprocessing node 321, feature engineering node 322, model construction node 323, model training node 325, and model evaluation node 326 can also be integrated into the model search node 324.
  • the client device 31, the user device 34, and the database 33 in the system are likewise not necessary devices of the system; the system may not include the above devices, or may also include other devices or functional units, which is not limited in the embodiments of the present application.
  • a convolutional neural network is a deep neural network with a convolutional structure; it is a deep learning architecture. A deep learning architecture refers to multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the input image.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (the pooling layer is optional), and a neural network layer 230.
  • the convolutional layer/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator is essentially a weight matrix, which is usually predefined. During convolution on an image, the weight matrix is moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during convolution, the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), i.e., multiple homogeneous matrices, are applied.
  • the outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • the multiple weight matrices have the same size (rows × columns), so the feature maps they extract also have the same size; the extracted feature maps of the same size are then merged to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through extensive training in practical applications.
  • Each weight matrix formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network 200 to make correct predictions. The shape computation below illustrates the stacking of multiple weight matrices.
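  • A quick shape check of how multiple weight matrices stack into the output depth; the framework and sizes are illustrative:

```python
import torch
import torch.nn as nn

# 8 weight matrices of size 3x3 over a 3-channel input: each produces one
# single-depth output, and the 8 outputs are stacked, so the output depth is 8.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
image = torch.randn(1, 3, 32, 32)  # (batch, depth, height, width)
print(conv(image).shape)           # torch.Size([1, 8, 32, 32])
```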
  • When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (such as 221) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (such as 226) become more and more complex, for example high-level semantic features; features with higher-level semantics are more suitable for the problem to be solved.
  • the layers 221-226 illustrated at 220 in Figure 4 can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
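• As an illustrative sketch only, the average and maximum pooling operators described above can be expressed as follows; the window size and input values are hypothetical:

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample with a non-overlapping size x size window, as an average
    or maximum pooling operator."""
    h, w = feature_map.shape
    h, w = h // size * size, w // size * size            # drop ragged edges
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))                   # max of each sub-region
    return blocks.mean(axis=(1, 3))                      # mean of each sub-region

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, 2, "max"))   # each output pixel = max of a 2x2 sub-region
print(pool2d(fm, 2, "avg"))   # each output pixel = mean of a 2x2 sub-region
```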
• after processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate the output of one required class or a group of required classes. Therefore, the neural network layer 230 can include multiple hidden layers (231, 232 to 23n as shown in FIG. 4) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
• after the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240.
• the output layer 240 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error.
  • the convolutional neural network 200 shown in FIG. 4 is only used as an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.
  • FIG. 5 is a hardware structure of a chip provided by an embodiment of the present invention.
  • the chip includes a neural network processor 30.
  • the chip can be set in the computing device 32 as shown in FIG. 3 to complete the calculation work of neural network training and testing.
  • the chip can also be set in the client device 31 or the user device 34 as shown in FIG. 3 to complete the prediction work of the predicted data through the target neural network.
  • the algorithms of each layer in the convolutional neural network or deep residual neural network as shown in FIG. 4 can be implemented in the chip as shown in FIG. 5.
  • the neural network processor 30 may be any processor suitable for large-scale XOR operation processing, such as NPU, TPU, or GPU.
  • NPU can be mounted on the host CPU (Host CPU) as a coprocessor, and the host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 303.
  • the arithmetic circuit 303 is controlled by the controller 304 to extract matrix data in the memory (301 and 302) and perform multiplication and addition operations.
  • the arithmetic circuit 303 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the weight data of the matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
• the arithmetic circuit 303 fetches the input data of matrix A from the input memory 301 and performs matrix operations based on the input data of matrix A and the weight data of matrix B, and the partial or final results of the obtained matrix are stored in the accumulator 308.
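• The following is a hedged numpy sketch of how partial matrix products can be accumulated, in the spirit of the accumulator 308 described above; the tile size and shapes are hypothetical, and numpy stands in for the hardware:

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Accumulate partial products of A @ B tile by tile; `acc` plays the
    role of the accumulator that stores partial and final results."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))
    for t in range(0, k, tile):
        acc += A[:, t:t + tile] @ B[t:t + tile, :]   # partial result accumulated
    return acc

A = np.random.rand(4, 6)
B = np.random.rand(6, 3)
assert np.allclose(tiled_matmul(A, B), A @ B)
```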
  • the unified memory 306 is used to store input data and output data.
• the weight data is directly transferred to the weight memory 302 through the storage unit access controller (Direct Memory Access Controller, DMAC) 305.
  • the input data is also transferred to the unified memory 306 through the DMAC.
• the bus interface unit (BIU, Bus Interface Unit) 310 is used for the interaction between the DMAC and the instruction fetch buffer (Instruction Fetch Buffer) 309; the bus interface unit 310 is also used for the instruction fetch memory 309 to obtain instructions from the external memory, and for the storage unit access controller 305 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 306, or to transfer the weight data to the weight memory 302, or to transfer the input data to the input memory 301.
• the vector calculation unit 307 has multiple arithmetic processing units and, if necessary, performs further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
• the vector calculation unit 307 is mainly used for the calculation of non-convolutional layers or fully connected (FC) layers in the neural network; specifically, it can process calculations such as pooling and normalization.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the vector calculation unit 307 stores the processed vector to the unified memory 306.
  • the vector processed by the vector calculation unit 307 can be used as the activation input of the arithmetic circuit 303, for example, for use in subsequent layers in a neural network, as shown in FIG. 4, if the current processing layer is a hidden layer 1 (231), the vector processed by the vector calculation unit 307 can also be used for calculation in the hidden layer 2 (232).
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the unified memory 306, the input memory 301, the weight memory 302, and the fetch memory 309 are all On-Chip memories.
  • the external memory is independent of the NPU hardware architecture.
• the calculation of each layer in the convolutional neural network shown in FIG. 4, or the calculation of each residual unit in the deep residual network, may be performed by the arithmetic circuit 303 or the vector calculation unit 307.
  • a neural network search method provided by an embodiment of this application can be used to search for the object recognition neural network in scene A or the gesture recognition neural network in scene B.
• the neural network search method provided by the embodiment of this application can be applied to the AutoML architecture to realize the automatic generation of machine learning models.
  • the method 60 may be executed by the computing device 320 shown in FIG. 3.
• the computing device may be a distributed computing device, including a preprocessing node 321, a feature engineering node 322, a model construction node 323, a model search node 324, a model training node 325, a model evaluation node 326, and so on.
  • step S6021 of obtaining the data set in step S602 in the method 60 can be executed by the preprocessing node 321 or the feature engineering node 322; the obtaining of N neural networks in step S602 and step S6042 can be executed by the model construction node 323;
  • the training process in S6022, S6046 may be executed by the model training node 325, the testing process in steps S6022, S6046 may be executed by the model evaluation node 326, and steps S6023, S604, S6044, and S6048 may be executed by the model search node 324.
  • step S602 and step S6042 may also be executed by the model evaluation node 326.
• the method 60 or each step in the method may be processed by the CPU alone, or jointly by the CPU and GPU; alternatively, the GPU may not be used and another processor suitable for neural network calculations may be used instead, for example, the neural network processor 30 shown in FIG. 5, which is not limited here.
• the following takes the case in which the execution subject is a computing device as an example.
  • the method 60 may include some or all of the following steps:
• S602: the computing device obtains a data set and N neural networks, where N is a positive integer.
  • the data set in the method 60 can be the original data after data cleaning or the data set obtained by feature engineering of the original data.
• the original data or data set can be derived from the database 330 as shown in FIG. 3, or it can be collected or acquired by the client device 31.
  • the data set may include a training data set and a test data set.
  • the training data set is used to train the initialized neural network
  • the test data set is used to test the performance of the trained neural network, such as accuracy and running time.
  • the training data set includes multiple training samples
  • the test data set may include multiple test samples
  • one training sample or one test sample may include input data and labels.
• the input data of a training sample is input to the initialized neural network to obtain the prediction result corresponding to the input data; the label is the real result corresponding to the input data, and the error between the real result and the prediction result is used to feed back and adjust the model parameters of the initialized neural network to obtain the trained neural network.
• the input data of a test sample is input to the trained neural network to obtain the corresponding prediction result, and the accuracy of the trained neural network is evaluated according to the error between the prediction result and the real result; alternatively, the input data is input to the trained neural network to test its running time, and so on.
  • the N neural networks may be one or more neural networks constructed artificially, or one or more neural networks randomly generated by a computing device.
  • the N neural networks may also be N neural networks selected from randomly generated M neural networks, and M is a positive integer not less than N.
  • An implementation of obtaining N neural networks by a computing device may include but is not limited to the following steps:
• S6021: the computing device randomly generates M neural networks, where M is a positive integer.
• for details about randomly generating M neural networks, refer to the relevant description in the following embodiment of the method for randomly generating neural networks, which will not be repeated here.
• S6022: the computing device separately trains and tests the M neural networks through the data set, and obtains the P evaluation parameters corresponding to each of the M neural networks.
• S6023: the computing device selects N neural networks from the M neural networks according to the P evaluation parameters corresponding to each of the M neural networks, where N is not greater than M.
• specifically, the computing device selects, from the M neural networks, the neural networks whose evaluation parameters meet preset conditions, for example, the neural networks whose accuracy is greater than a preset threshold (such as 90%) and whose running time is less than a first duration (such as 2s), to obtain the N neural networks.
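• A hedged sketch of this selection step is given below; the field names are hypothetical, and the thresholds merely mirror the example above:

```python
def select_initial_population(evaluated, min_acc=0.90, max_time_s=2.0, n=None):
    """evaluated: list of dicts like {"net": ..., "acc": float, "time": float}.
    Keep networks meeting the preset conditions, best accuracy first."""
    kept = [e for e in evaluated
            if e["acc"] > min_acc and e["time"] < max_time_s]
    kept.sort(key=lambda e: e["acc"], reverse=True)
    return kept[:n] if n is not None else kept

# Example with made-up evaluation results for M = 5 random networks.
population = select_initial_population(
    [{"net": f"nn{i}", "acc": 0.85 + 0.03 * i, "time": 1.0 + 0.3 * i}
     for i in range(5)],
    n=2)
print([e["net"] for e in population])   # the N selected networks
```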
• S604: the computing device performs K evolutions on the N neural networks to obtain the neural networks obtained by the Kth evolution, where K is a positive integer.
  • the i-th evolution includes but is not limited to the following steps:
• S6042: the computing device mutates the network structures of the neural networks obtained in the (i-1)th evolution to obtain the mutated neural networks, where the neural networks obtained in the 0th evolution are the N neural networks.
  • the computing device may mutate one or more neural networks in the neural network obtained by the i-1th evolution, and may also mutate each neural network in the neural network obtained by the i-1th evolution.
• for details about mutating the neural network, refer to the relevant description in the following neural network mutation method embodiment, which will not be repeated here.
• S6044: the computing device selects, from the mutated neural networks, the neural networks whose network structures are better than those of the neural networks obtained by the (i-1)th evolution, to obtain the candidate neural networks.
• a neural network and the neural network obtained after its mutation are networks with similar topologies.
  • a wider and deeper network is better than a narrower and shallower network. Therefore, the network can be initially screened based on the depth and width of the network to filter out the poor network.
  • “wide” and “narrow” respectively describe the number of network channels; “deep” and “shallow” respectively describe the number of layers of the network.
• that is, for networks with similar topologies, the more layers and the more channels, the better the network.
• for example, among ResNets with similar topological structures, the greater the number of residual units and the greater the number of channels, the better the network.
• for example, each neural network among the neural networks obtained by the (i-1)th evolution can be mutated, and the mutated neural networks whose network structures are better than the source networks are selected as the candidate neural networks. It should be understood that the candidate neural networks are the selected neural networks and include at least one neural network.
• the neural network is mutated to generate neural networks with similar topologies, and the characteristics of neural networks with similar topologies are used to prune the search space, which reduces the number of neural networks that need to be trained and tested and improves the efficiency of automatic machine learning.
• S6046: the computing device trains and tests each neural network among the candidate neural networks, and obtains the P evaluation parameters corresponding to each candidate neural network.
  • P is a positive integer.
  • the data set can be divided into training data set and test data set.
  • the computing device trains each neural network in the candidate neural network through the training data set, and then uses the test data set to evaluate the trained neural network to obtain P evaluation parameters corresponding to each neural network.
  • the evaluation parameter is used to evaluate the performance of the neural network trained through the training data set, such as at least one of accuracy, running time, parameter amount, and the like.
• S6048: the computing device screens the neural networks obtained by the i-th evolution from the set consisting of the neural networks obtained by the (i-1)th evolution and the candidate neural networks, according to the P evaluation parameters corresponding to each neural network in the set.
  • the preset accuracy and the preset duration may be set by the customer and sent by the client device to the computing device to indicate the accuracy and running time of the target neural network required by the client.
  • the neural network obtained by the K-th evolution may be a trained neural network.
• the computing device can select, from the neural networks obtained by the Kth evolution, or from the neural networks obtained by combining the Kth-evolution neural networks with the feature engineering module and the data preprocessing module, a target neural network that meets the customer's requirements for the P evaluation parameters, and then send the target neural network to the client device; the computing device can also send the neural networks obtained by the Kth evolution, or the neural networks obtained by combining them with the feature engineering module and the data preprocessing module, to the client device as the target neural network, which is not limited here.
  • the target neural network may be an object recognition neural network in scene A.
  • the data set includes a plurality of samples, and each sample includes a sample image and an object type corresponding to the sample image.
  • the target neural network may also be the gesture recognition neural network in the aforementioned scene B.
  • the data set includes a plurality of samples, and each sample includes a sample image and a gesture type corresponding to the sample image.
• the neural networks obtained from the (i-1)th evolution have already been trained and tested during the (i-1)th evolution, and the P evaluation parameters corresponding to each neural network obtained by the (i-1)th evolution have been obtained.
  • the neural network obtained by the 0th evolution is the above N neural networks.
• the computing device can train each of the N neural networks on the training data set, and then evaluate the trained neural networks with the test data set to obtain the P evaluation parameters corresponding to each of the N neural networks.
• in an implementation, the neural networks obtained by the i-th evolution can be selected from the set according to accuracy. For example, the first Q neural networks with the highest accuracy can be selected from the set as the neural networks obtained by the i-th evolution; alternatively, the neural networks whose accuracy is greater than a preset value (such as 90%) can be selected from the set as the neural networks obtained by the i-th evolution.
• the computing device performs non-dominated sorting of the neural networks in the set according to the P evaluation parameters corresponding to each neural network in the set; furthermore, the neural networks obtained by the i-th evolution are determined to be the neural networks that are not dominated in the set.
• a neural network dominates another neural network when each of its P evaluation parameters is not inferior to the corresponding parameter of the dominated neural network, and at least one of its P evaluation parameters is superior. For example, suppose the P evaluation parameters are accuracy and running time.
• Neural network A and neural network B are two neural networks in the set. Neural network A dominates neural network B when they meet at least one of the following two conditions:
  • the accuracy of neural network A is higher than that of neural network B and the running time of neural network A is not higher than the running time of neural network B;
  • the running time of neural network A is lower than that of neural network B and the accuracy of neural network A is not lower than that of neural network B.
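• For the two example evaluation parameters, these conditions can be expressed by the following sketch; the field names are hypothetical:

```python
def dominates(a, b):
    """True if network a dominates network b: a is no worse on every
    evaluation parameter and strictly better on at least one
    (accuracy: higher is better; running time: lower is better)."""
    no_worse = a["acc"] >= b["acc"] and a["time"] <= b["time"]
    strictly_better = a["acc"] > b["acc"] or a["time"] < b["time"]
    return no_worse and strictly_better

nn_a = {"acc": 0.93, "time": 4.4}
nn_b = {"acc": 0.93, "time": 8.1}
assert dominates(nn_a, nn_b)       # same accuracy, lower running time
assert not dominates(nn_b, nn_a)
```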
• if each neural network obtained by the (i-1)th evolution is not dominated by any other neural network obtained by the (i-1)th evolution, the neural networks obtained by the (i-1)th evolution are also said to be the neural networks on the Pareto front.
• a schematic flow of an implementation in which the computing device selects, from the set, the neural networks obtained by the i-th evolution is described below.
  • the implementation manner may include but is not limited to the following steps:
• S60481: determine the jth neural network from the candidate neural networks, where j is a positive integer and j is not greater than the total number of candidate neural networks.
• initially, the neural networks on the Pareto front are the neural networks obtained by the (i-1)th evolution.
• in step S6048, after step S60485, S60486 is further executed.
• if the neural network NN1 dominates a neural network NN2 on the Pareto front, the dominated neural network NN2 is removed from the Pareto front and the dominating neural network NN1 is added to the Pareto front; if the neural network NN1 neither dominates any neural network on the Pareto front nor is dominated by any neural network on the Pareto front, then the neural network NN1 is a new Pareto optimum, and NN1 is directly added to the Pareto front; if the neural network NN1 is dominated by a neural network on the Pareto front, the Pareto front is not updated.
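• The Pareto-front update rule just described can be sketched as follows, reusing the dominates() function from the previous sketch:

```python
def update_pareto_front(front, nn1):
    """Apply the update rule above for one candidate network nn1."""
    dominated = [nn2 for nn2 in front if dominates(nn1, nn2)]
    if dominated:
        # nn1 dominates some front members: remove them, add nn1.
        return [nn for nn in front if nn not in dominated] + [nn1]
    if not any(dominates(nn2, nn1) for nn2 in front):
        # nn1 neither dominates nor is dominated: a new Pareto optimum.
        return front + [nn1]
    return front   # nn1 is dominated: the front is unchanged
```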
• in this way, as the evolution proceeds, the neural networks obtained become better and better.
• a multi-objective optimization scheme is adopted, so that the neural networks obtained by the Kth evolution can reach a balance among the P evaluation parameters, thereby avoiding the situation in which one evaluation parameter of a neural network obtained by the Kth evolution is excellent while the other evaluation parameters are poor.
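• Putting the pieces together, one evolution round can be sketched as below; mutate(), structurally_better(), and train_and_test() are hypothetical stand-ins for the operations described in this section, and dominates() is reused from the earlier sketch:

```python
def evolve_once(parents, mutate, structurally_better, train_and_test):
    # S6042: mutate the networks obtained by the (i-1)th evolution.
    mutated = [m for p in parents for m in mutate(p)]
    # S6044: initial structural screening (keep wider/deeper mutants).
    candidates = [m for m in mutated
                  if any(structurally_better(m, p) for p in parents)]
    # S6046: train and test each candidate to get its evaluation parameters;
    # parents already carry theirs from the previous evolution.
    evaluated = [train_and_test(c) for c in candidates] + list(parents)
    # S6048: keep the non-dominated networks, i.e. the new Pareto front.
    return [a for a in evaluated
            if not any(dominates(b, a) for b in evaluated if b is not a)]
```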
• the following takes ResNet and CNN as examples to introduce the method of randomly generating a neural network and the method of mutating a neural network involved in the embodiments of the present application.
  • the neural network obtained by the above K-th evolution is a network determined in the search space by the neural network search method described in the first embodiment.
  • the search space is constructed by the basic unit and the parameters of the basic unit.
  • the search space is used to search the neural network obtained by the Kth evolution.
• the parameters of the basic unit include at least one of the type of the basic unit, the channel number parameter, and the size parameter. The basic unit is used to perform a first operation and a second operation on the feature map input to the basic unit: the first operation is used to double or keep unchanged the number of feature maps input to the basic unit, and the second operation is used to change the size of the feature map input to the basic unit from the original first size to a second size or to keep the first size unchanged, where the first size is greater than the second size.
  • the size here can refer to the side length or area of the feature map.
• the channel number parameter is used to indicate the change in the number of feature maps processed by the basic unit, such as doubling or remaining unchanged; the size parameter is used to indicate the change in the size of the feature map processed by the basic unit, such as doubling, remaining unchanged, etc.
  • the neural network may be ResNet
  • the basic unit is also called residual unit
• ResNet may include multiple residual units and at least one fully connected layer, and each residual unit may be composed of at least two (for example, 3) convolutional layers, where the number of fully connected layers can be preset or changed, which is not limited here.
  • the parameters of the residual unit are used to encode the network structure of ResNet. For example, the order of symbols is used to refer to the order of the residual units in ResNet.
  • the residual unit coded as "1" means that the number of channels of the residual unit remains unchanged, and the residual unit coded as "2" It means that the number of channels of the residual unit is doubled, and the residual unit coded with "-" in front means that the feature map size of the residual unit is reduced by half.
• the network structure of the ResNet coded "121-211-121" is shown in Figure 7, where the width of a residual unit reflects its number of channels and the length of a residual unit reflects the size of its feature map.
• in this way, a ResNet can be obtained from any combination of the symbols "1", "2", and "-".
  • the process of randomly generating ResNet by a computing device can be converted into a process of randomly generating a character string. It should be understood that when the computing device randomly generates a character string, it needs to add constraint conditions, or it needs to filter the randomly generated character string to remove ResNet that does not meet the requirements. For example, two characters "-" cannot be arranged consecutively.
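• A hedged sketch of such random string generation, and of decoding a code into per-unit channel counts and feature-map sizes, is given below; the initial channel count, initial size, string length, and the extra rule that a code does not start or end with "-" are hypothetical choices:

```python
import random

def random_resnet_code(n_units=9, rng=random):
    """Randomly generate an encoding string over '1', '2', '-', filtering
    out strings with consecutive '-' (and, as an assumption here, strings
    that start or end with '-')."""
    while True:
        code = "".join(rng.choice("12-") for _ in range(n_units))
        if ("--" not in code and not code.startswith("-")
                and not code.endswith("-")):
            return code

def decode(code, channels=16, size=32):
    """Walk the encoding: '1' keeps the channel count, '2' doubles it,
    and '-' halves the feature-map size of the following unit."""
    units, halve_next = [], False
    for ch in code:
        if ch == "-":
            halve_next = True
            continue
        channels = channels * 2 if ch == "2" else channels
        size = size // 2 if halve_next else size
        halve_next = False
        units.append((channels, size))
    return units

print(decode("121-211-121"))   # per-unit (channels, feature-map size)
```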
• for one ResNet, the computing device can generate multiple mutated ResNets, where each mutated neural network is generated from the ResNet through one mutation.
  • the one-time mutation of ResNet by the computing device may specifically be one of the following implementation manners:
• the specific implementation may be to randomly transform a "1" in the ResNet code into a "2". For example, as shown in Figure 7, if the sixth residual unit of the ResNet shown in figure (a) is changed from keeping the number of channels unchanged to doubling the number of channels, the ResNet coded "121-111-211" is mutated into the ResNet coded "121-112-211", as shown in figure (b). It should be understood that the number of channels of all residual units located after the sixth residual unit is doubled on the original basis.
• the step size (stride) of a residual unit is determined by the strides of the convolution kernels of the at least two convolutional layers it includes. For example, suppose the residual unit includes two convolutional layers whose corresponding convolution kernels each have a stride of 1; then the stride of the residual unit is 1. If the stride of the residual unit is to be changed to 2, the stride of one of the two convolutional layers needs to be changed to 2, for example, the stride of the first convolutional layer is changed to 2.
• Randomly delete a residual unit with a constant number of channels in the ResNet: the specific implementation may be to randomly delete a "1" in the encoding of the ResNet. For example, as shown in Figure 7, the fifth residual unit of the ResNet shown in figure (a) is deleted, and the ResNet coded "121-111-211" is mutated into the ResNet coded "121-11-211", as shown in figure (f).
• the embodiments of this application may also include other mutation methods, for example, randomly adding a "-" to or randomly removing a "-" from the ResNet code; for another example, randomly deleting a "2" from or randomly adding a "2" to the ResNet code, and so on. The specific structure of the mutated ResNet can be inferred from the meaning of the code of each residual unit, and will not be repeated here.
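• The string-level mutations described above can be sketched as follows; the guards and random choices are illustrative, and validity filtering (for example, no consecutive "-") would be applied afterwards, as in the generation step:

```python
import random

def double_random_channel(code, rng=random):
    """'1' -> '2': double the channels of a randomly chosen residual unit."""
    ones = [i for i, c in enumerate(code) if c == "1"]
    if not ones:                     # nothing to mutate in this code
        return code
    i = rng.choice(ones)
    return code[:i] + "2" + code[i + 1:]

def delete_random_unit(code, rng=random):
    """Delete a randomly chosen channel-preserving residual unit ('1')."""
    ones = [i for i, c in enumerate(code) if c == "1"]
    if not ones:
        return code
    i = rng.choice(ones)
    return code[:i] + code[i + 1:]

def toggle_random_downsample(code, rng=random):
    """Randomly remove a '-' or insert a '-' at a random position."""
    if "-" in code and rng.random() < 0.5:
        i = rng.choice([i for i, c in enumerate(code) if c == "-"])
        return code[:i] + code[i + 1:]
    i = rng.randrange(len(code) + 1)
    return code[:i] + "-" + code[i:]

print(double_random_channel("121-111-211"))   # e.g. "221-111-211"
```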
  • the neural network may be a convolutional neural network, and the basic unit may be called a layer structure.
• the convolutional neural network is composed of convolutional layers, pooling layers, and fully connected layers, where the number of fully connected layers can be preset or changed, which is not limited here.
• the order of the symbols is used to indicate the order of the layer structures in the CNN; the layer structure coded "1" means that the layer structure is a convolutional layer and the number of channels remains unchanged; the layer structure coded "2" means that the layer structure is a convolutional layer and the number of channels is doubled; a layer structure whose code is preceded by "-" means that the stride of the convolution kernel in that layer structure is changed from 1 to 2; the layer structures coded "3", "4", and "5" mean that the layer structure is a pooling layer that reduces the feature map size by half, where the pooling layer coded "3" uses maximum pooling, the pooling layer coded "4" uses average pooling, and the pooling layer coded "5" uses LP pooling.
• the following takes as an example a pooling layer that selects a 2×2 area of the input to perform the pooling operation, whose function is to reduce the size of the feature map generated by convolution to 1/4 of its original size.
  • other types of pooling layers can also be coded, and the selected regions of the pooling operation can also be distinguished by coding, which is not limited here.
  • the network structure of the CNN encoding "121-113-211" is shown in Figure 8.
  • the width of a layer structure in FIG. 8 reflects the number of channels
  • the length of a layer structure reflects the size of its feature map.
• similarly, the process of randomly generating a CNN by the computing device can be converted into a process of randomly generating a character string. It should be understood that when the computing device randomly generates a character string, it needs to add constraint conditions, or it needs to filter the randomly generated character strings to remove CNNs that do not meet the requirements. For example, two characters "-" cannot be arranged consecutively, "3" cannot be adjacent after "-", and pooling layers cannot appear consecutively, that is, "3", "4", and "5" are not adjacent to one another.
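• A sketch of such a constraint filter is given below, encoding the three constraints just listed:

```python
import re

def is_valid_cnn_code(code):
    """Constraint filter for randomly generated CNN encodings."""
    if "--" in code:                    # no consecutive '-'
        return False
    if re.search(r"-[345]", code):      # no pooling layer right after '-'
        return False
    if re.search(r"[345][345]", code):  # no two adjacent pooling layers
        return False
    return True

assert is_valid_cnn_code("121-113-211")
assert not is_valid_cnn_code("121--13-211")   # consecutive '-'
assert not is_valid_cnn_code("121-311-211")   # pooling right after '-'
```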
• for one CNN, the computing device can generate multiple mutated CNNs, where each mutated neural network is generated from the CNN through one mutation.
  • the one-time mutation of the CNN by the computing device may specifically be one of the following implementation manners:
• the specific implementation can be to randomly transform a "1" in the CNN code into a "2". For example, as shown in Figure 8, the CNN coded "121-113-211" is mutated into the CNN coded "121-113-212", as shown in figure (b). It should be understood that the number of channels of all layer structures after the eighth layer structure is doubled on the original basis. It should also be understood that the numbers of channels of multiple convolutional layers in the CNN can be doubled at random, which is not limited in the embodiment of the present application.
• the specific implementation can be to randomly exchange the positions of a "1" and a "2" in the CNN code to obtain the code of the mutated CNN, for example, exchanging the first "1" and the "2" in the CNN coded "121-113-211" to obtain the CNN coded "211-113-211".
• the specific implementation can be to randomly change the position of a symbol "-" in the CNN code, moving it from before one convolutional layer to before another convolutional layer, so that the stride of the layer structure that loses the preceding "-" changes from 2 to 1 and the stride of the layer structure that gains the preceding "-" changes from 1 to 2. For example, as shown in Figure 8, the CNN coded "121-113-211" in figure (a) is mutated in this way into the CNN coded "121-1132-11", as shown in figure (d).
• Randomly double the stride of one or more convolutional layers in the CNN: the specific implementation can be to randomly insert one or more symbols "-" into the CNN code, where the code after insertion does not include two adjacent symbols "-" and the symbol "-" is not adjacent to "3".
• a convolutional layer can also be randomly inserted into the CNN.
  • the inserted convolutional layer can be a convolutional layer with a constant number of channels, or a convolutional layer with a doubled number of channels.
• the specific implementation can be to randomly insert a "1" or a "2" into the CNN encoding. For example, as shown in Figure 8, adding a "1" after the fifth convolutional layer in the CNN encoding shown in figure (a) mutates the CNN coded "121-113-211" into the CNN coded "121-1131-211", as shown in figure (e).
• similarly, a convolutional layer can also be randomly deleted from the CNN; the deleted convolutional layer can be a convolutional layer with a constant number of channels or a convolutional layer with a doubled number of channels.
• the specific implementation can be to randomly delete one or more symbols "1" or one or more symbols "2" from the CNN code. For example, the eighth layer structure of the CNN is deleted, so that the CNN coded "121-123-211" is mutated into the CNN coded "121-21-11", as shown in figure (f).
• One or more pooling layers can be randomly added to the CNN, or one or more pooling layers can be randomly deleted. It should be understood that a pooling layer will not be added immediately before or after an existing pooling layer; that is to say, the mutated CNN will not have two adjacent pooling layers.
• the specific implementation can be to randomly delete a "3" from, or randomly add a "3" to, the CNN encoding to obtain the mutated CNN. It should be understood that CNNs obtained by adding a "3" immediately before or after another "3" need to be filtered out.
• the embodiments of this application may also include other mutation operations on the CNN, which will not be repeated here.
• the residual unit in the above-mentioned ResNet may include one or more of a normal convolutional layer, a dilated (atrous) convolutional layer, a depthwise separable convolutional layer, a fully connected layer, and the like.
• the convolutional layer in the above-mentioned CNN may be a normal convolutional layer, a dilated convolutional layer, a depthwise separable convolutional layer, and the like.
  • the internal network structure of each residual unit in ResNet may be the same or different, and the type of each convolutional layer in CNN may be the same or different, which is not limited in the embodiment of the present application.
  • the residual unit in the above ResNet may only include a normal convolutional layer or a combination of a normal convolutional layer and a fully connected layer;
• the convolutional layers of the above-mentioned CNN may be only normal convolutional layers, without dilated convolutional layers or depthwise separable convolutional layers, to avoid the situation where an NPU chip that does not support dilated or depthwise separable convolutional layers makes the searched neural network inapplicable to the hardware platform, so that the neural network search method provided by this application can be universally applied to various devices or platforms.
  • the target neural network model can be sent to the client device or user device, and further, the client device and the user device can implement corresponding functions based on the target neural network model.
  • the neural network search method of the embodiment of the present application can be applied to the field of automatic driving.
  • a vehicle obtains images through a camera to observe obstacles in the surrounding environment of the vehicle in real time.
  • the vehicle or the device communicating with the vehicle can make decisions based on the identified objects in the surrounding environment to drive safely.
• an object recognition method provided by this embodiment of the present application can be executed by the vehicle in FIG. 2A, or by the client device 31 or the user device 34 in FIG. 3. The method includes but is not limited to the following steps:
  • S902 Acquire an image to be recognized.
  • S904 Input the image to be recognized into the object recognition neural network to obtain the object type corresponding to the image to be recognized.
  • the to-be-recognized image may be an image of the surrounding environment acquired by the vehicle through a camera, and the to-be-recognized image is processed by an object recognition neural network to recognize objects in the surrounding environment of the vehicle.
  • the object recognition neural network may be a network determined by the neural network search method described in the first embodiment in the search space. At this time, each sample in the data set in the first embodiment includes a sample image and the sample. The type of object corresponding to the image.
  • the search space is constructed by the basic unit and the parameters of the basic unit.
  • the search space is used to search the object recognition neural network.
  • the parameters of the basic unit include at least one of the type of the basic unit, the number of channels, and the size parameter.
• the basic unit is used to perform a first operation and a second operation on the feature map input to the basic unit, where the feature map is the feature map of the image to be recognized. The first operation is used to double or keep unchanged the number of feature maps input to the basic unit, and the second operation is used to change the size of the feature map input to the basic unit from the original first size to a second size or to keep the first size unchanged, where the first size is greater than the second size; for example, the first size is twice the second size, and the size here refers to the side length of the feature map.
  • the channel number parameter is used to indicate the change in the number of feature maps processed by the basic unit, such as doubling or remaining unchanged;
• the size parameter is used to indicate the change in the size of the feature map processed by the basic unit, such as doubling, remaining unchanged, etc.
  • the neural network in the search space may be ResNet.
  • the basic unit is also called a residual unit, and the residual unit may be composed of at least two convolutional layers.
• the residual unit also includes a residual module, which adds the feature map input to the residual unit and the feature map obtained after that input is processed by the residual unit, and inputs the result of the addition to the next residual unit.
  • a neural network can be constructed by encoding the residual unit, and the search space can be expanded by mutation. For specific implementation, please refer to the related description in the foregoing FIG. 7 and will not be repeated here.
  • the neural network in the search space may be a CNN.
  • the basic unit is also called a layer structure, and the layer structure may be a convolutional layer, a pooling layer, and so on.
  • the neural network can be constructed through coding, and the search space can be expanded through mutation. For specific implementation, please refer to the related description in Figure 8 above, which will not be repeated here.
  • the neural network search method of the embodiment of the present application can be applied to the field of image recognition, for example, the user equipment obtains an image through a camera, and further, can be based on the recognized surrounding environment In the object, make a decision to drive safely.
• a gesture recognition method provided by this embodiment of the application can be executed by user equipment such as the monitor, mobile phone, or smart TV in FIG. 2B, or by the client device 31 or the user device 34 in FIG. 3. The method includes but is not limited to the following steps:
  • S908 Input the image to be recognized into the gesture recognition neural network to obtain the gesture type corresponding to the image to be recognized.
• the user equipment can also perform the operation corresponding to the recognized gesture type; for example, when the first gesture is recognized, the music player is turned on; for another example, when a call comes in, if the second gesture is recognized, the call is answered, and so on.
  • the gesture recognition neural network may be a network determined by the neural network search method described in the first embodiment in the search space. At this time, each sample in the data set in the first embodiment includes a sample image and the sample. The type of gesture corresponding to the image.
  • the search space in the embodiment of this application is constructed by the basic unit and the parameters of the basic unit.
• the search space is used to search for the gesture recognition neural network.
  • the parameters of the basic unit include the type of the basic unit and the number of channels.
• the basic unit is used to perform a first operation and a second operation on the feature map input to the basic unit, where the feature map is the feature map of the image to be recognized. The first operation is used to double or keep unchanged the number of feature maps input to the basic unit, and the second operation is used to change the size of the feature map input to the basic unit from the original first size to a second size or to keep the first size unchanged, where the first size is greater than the second size; for example, the first size is twice the second size, and the size here refers to the side length of the feature map.
  • the channel number parameter is used to indicate the change in the number of feature maps processed by the basic unit, such as doubling or remaining unchanged;
• the size parameter is used to indicate the change in the size of the feature map processed by the basic unit, such as doubling, remaining unchanged, etc.
  • the neural network in the search space may be ResNet.
  • the basic unit is also called a residual unit, and the residual unit may be composed of at least two convolutional layers.
• the residual unit also includes a residual module, which adds the feature map input to the residual unit and the feature map obtained after that input is processed by the residual unit, and inputs the result of the addition to the next residual unit.
  • a neural network can be constructed by encoding the residual unit, and the search space can be expanded by mutation. For specific implementation, please refer to the related description in the foregoing FIG. 7 and will not be repeated here.
  • the neural network in the search space may be a CNN.
  • the basic unit is also called a layer structure, and the layer structure may be a convolutional layer, a pooling layer, and so on.
  • the neural network can be constructed through coding, and the search space can be expanded through mutation. For specific implementation, please refer to the related description in Figure 8 above, which will not be repeated here.
• in FIG. 10A, the horizontal axis is the running time of the architecture on the chip platform, and the vertical axis is the top1 accuracy on the data set (ImageNet).
• ResNet18 is an expert model; the figure marks its running time on the chip platform and its top1 accuracy on the data set (ImageNet) after training for 40 epochs.
• the other points are the best models found by the method of this application at the same running speed. It can be seen from Fig. 10A that all models in the wireframe 1001 are superior to the existing ResNet18 model in terms of both speed and accuracy. Taking the leftmost point in the wireframe 1001 as an example, while ensuring the same accuracy, the speed of the searched model is 4.42 milliseconds, while that of ResNet is 8.11 milliseconds; the speed is nearly 2 times faster.
• in Fig. 10B, the horizontal axis is the parameter quantity of the model, and the vertical axis is the top1 accuracy on the data set (ImageNet).
  • Point B and point C are ResNet18-1/4 and ResNet18-1/8, respectively, which are expert models. Other points are the models obtained by the neural network search method of this application.
• it can be seen from Fig. 10B that all models in wireframe 1002 are superior to the existing ResNet18-1/8 model in terms of parameter quantity and accuracy, and all models in wireframe 1003 are superior to the ResNet18-1/4 model.
  • the device 1100 may be the computing device 32 in the system shown in FIG. 3, and the device 1100 may include but is not limited to the following functional units:
  • the obtaining module 1110 is used to obtain a data set and N neural networks, where N is a positive integer;
  • the evolution module 1120 is used to perform K evolutions on the N neural networks to obtain the neural network obtained by the Kth evolution, and K is a positive integer;
  • the evolution module 1120 includes a mutation unit 1121, a first screening unit 1122, and a second screening unit 1123, wherein:
  • the mutation unit 1121 is used to: in the i-th evolution process, mutate the network structure of the neural network obtained from the i-1th evolution to obtain the mutated neural network;
  • the first screening unit 1122 is configured to: in the i-th evolution process, select a neural network whose network structure is better than the neural network obtained from the i-1th evolution from the mutated neural network, to obtain Candidate neural network;
• the second screening unit 1123 is configured to: in the i-th evolution process, according to the P evaluation parameters corresponding to each neural network in the set consisting of the neural networks obtained from the (i-1)th evolution and the candidate neural networks, select the neural networks obtained by the i-th evolution from the set; wherein the P evaluation parameters are used to evaluate the performance of each neural network in the set after training and testing with the data set, i and P are positive integers, and 1 ≤ i ≤ K.
• in a possible implementation, the mutation unit 1121 is specifically configured to mutate a first neural network among the neural networks obtained by the (i-1)th evolution, and when mutating the first neural network, the mutation unit 1121 performs at least one of the following steps:
• one or more pooling layers are deleted from one or more neural networks among the neural networks obtained by the (i-1)th evolution.
• in another possible implementation, the mutation unit 1121 is specifically configured to mutate a first neural network among the neural networks obtained by the (i-1)th evolution, and when mutating the first neural network, the mutation unit 1121 performs at least one of the following steps:
• one or more residual units are deleted from one or more neural networks among the neural networks obtained by the (i-1)th evolution.
• the first screening unit 1122 is specifically configured to: select, from the neural networks obtained after the mutation of the first neural network, the neural networks whose network structures are better than that of the first neural network; the candidate neural networks include the neural networks, among those obtained after the mutation of the first neural network, whose network structures are better than that of the first neural network, and the first neural network is any one of the neural networks obtained from the (i-1)th evolution.
• that the network structure of a neural network obtained after the mutation of the first neural network is better than the network structure of the first neural network includes:
  • the number of channels of the neural network after the mutation of the first neural network is greater than the number of channels of the first neural network
  • the number of convolutional layers in the neural network after the mutation of the first neural network is greater than the number of convolutional layers in the first neural network.
• the second screening unit 1123 is specifically configured to: perform non-dominated sorting of the neural networks in the set according to the P evaluation parameters corresponding to each neural network in the set, and determine that the neural networks obtained by the i-th evolution are the neural networks that are not dominated in the set; where the second neural network and the third neural network are two neural networks in the set, and if, for each of the P evaluation parameters, the second neural network is not inferior to the third neural network, and, for at least one of the P evaluation parameters, the second neural network is better than the third neural network, the second neural network dominates the third neural network.
• the acquisition module 1110 is specifically configured to: randomly generate M neural networks, where M is a positive integer; train and test the M neural networks through the data set to obtain the P evaluation parameters corresponding to each of the M neural networks; and, based on the P evaluation parameters corresponding to each of the M neural networks, select N neural networks from the M neural networks, where N is not greater than M.
  • the P evaluation parameters include at least one of running time, accuracy, and parameter amount.
  • Fig. 12A shows an object recognition apparatus provided by an embodiment of this application.
  • the apparatus 1200 may be the client equipment 31 or the user equipment 34 in the system shown in Fig. 3, and the apparatus 1200 may include, but is not limited to, the following functional units:
  • the obtaining unit 1210 is configured to obtain an image to be recognized, and the image to be recognized is an image of the surrounding environment of the vehicle;
  • the recognition unit 1220 is configured to input the image to be recognized into the object recognition neural network to obtain the object type corresponding to the image to be recognized;
  • the object recognition neural network is a network determined by the neural network search method as described in the first embodiment in the search space, and the search space is constructed by the basic unit and the parameters of the basic unit.
  • the parameters of the basic unit include at least one of the type of the basic unit, the number of channels parameter, and the size parameter.
  • the basic unit is used to perform a first operation and a second operation on the feature map input to the basic unit
  • the feature map is the feature map of the image to be recognized
• the first operation is used to double or keep unchanged the number of feature maps input to the basic unit,
• the second operation is used to change the size of the feature map input to the basic unit from the original first size to a second size or to keep the first size unchanged, where the first size is greater than the second size.
  • FIG. 12B shows a gesture recognition device provided by an embodiment of the application.
  • the device 1201 may be the client equipment 31 or the user equipment 34 in the system shown in FIG. 3, and the device 1201 may include but is not limited to the following functional units:
  • the obtaining unit 1230 is used to obtain the image to be recognized
• the recognition unit 1240 is configured to input the image to be recognized into the gesture recognition neural network to obtain the gesture type in the image to be recognized;
  • the gesture recognition neural network is a network determined by the neural network search method described in the first embodiment in a search space, and the search space is constructed by basic units and parameters of the basic units.
  • the parameters of the basic unit include at least one of the type of the basic unit, the number of channels parameter, and the size parameter.
  • the basic unit is used to perform a first operation and a second operation on the feature map input to the basic unit
  • the feature map is the feature map of the image to be recognized
• the first operation is used to double or keep unchanged the number of feature maps input to the basic unit,
• the second operation is used to change the size of the feature map input to the basic unit from the original first size to a second size or to keep the first size unchanged, where the first size is larger than the second size.
  • the first size is twice the second size.
  • FIG. 13 is a schematic diagram of the hardware structure of a neural network search device provided by an embodiment of the present application.
• the neural network search apparatus 1300 shown in FIG. 13 may include a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304.
  • the memory 1301, the processor 1302, and the communication interface 1303 implement communication connections between each other through the bus 1304.
  • the memory 1301 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 1301 may store a program. When the program stored in the memory 1301 is executed by the processor 1302, the processor 1302 and the communication interface 1303 are used to execute all or part of the steps in the neural network search method of the embodiment of the present application.
• the processor 1302 may adopt a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (Graphics Processing Unit, GPU), or one or more integrated circuits, and is used to execute related programs to realize the functions required by the units in the neural network search apparatus of the embodiment of the present application, or to execute all or part of the steps of the neural network search method in the first method embodiment of the present application.
• the processor 1302 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the neural network search method of the present application can be completed by the integrated logic circuit of hardware in the processor 1302 or by instructions in the form of software.
• the aforementioned processor 1302 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
• the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1301, and the processor 1302 reads the information in the memory 1301, and combines its hardware to complete the functions required by the units included in the neural network search device of the embodiment of the present application, or perform the functions of the method embodiment of the present application. All or part of the steps in the neural network search method.
• the communication interface 1303 uses a transceiving apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1300 and other devices or a communication network.
  • the data set can be obtained through the communication interface 1303.
  • the bus 1304 may include a path for transferring information between various components of the device 1300 (for example, the memory 1301, the processor 1302, and the communication interface 1303).
  • the acquisition module 1110 in the neural network search device 1100 may be equivalent to the communication interface 1303 in the neural network search device 1300, and the evolution module 1120 may be equivalent to the processor 1302.
  • FIG. 14 is a schematic block diagram of an electronic device in an embodiment of the present invention
• the electronic device 1400 shown in FIG. 14 includes a memory 1401, a baseband chip 1402, a radio frequency (RF) module 1403, a peripheral system 1404, and a sensor 1405.
• the baseband chip 1402 includes at least one processor 14021 (such as a CPU), a clock module 14022, and a power management module 14023; the peripheral system 1404 includes a camera 14041, an audio module 14042, a touch screen 14043, and the like; the sensor 1405 may include a light sensor 14051, an acceleration sensor 14052, a fingerprint sensor 14053, and the like. The modules included in the peripheral system 1404 and the sensor 1405 can be increased or decreased according to actual needs.
  • Any two connected modules may be connected by a bus, which can be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • The radio frequency module 1403 may include an antenna and a transceiver (including a modem).
  • The transceiver is used to convert the electromagnetic waves received by the antenna into an electric current and finally into a digital signal.
  • Correspondingly, the transceiver is also used to convert the digital signal to be output by the apparatus 1400 into an electric current and then into electromagnetic waves, and finally to emit the electromagnetic waves into free space through the antenna.
  • The radio frequency module 1403 may further include at least one amplifier for amplifying signals.
  • Wireless transmission can generally be performed through the radio frequency module 1403, such as Bluetooth transmission, Wireless Fidelity (Wi-Fi) transmission, third-generation mobile communication technology (3G) transmission, or fourth-generation mobile communication technology (4G) transmission.
  • The touch screen 14043 can be used to display information input by the user or to show information to the user.
  • The touch screen 14043 can include a touch panel and a display panel.
  • The display panel can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • The touch panel can cover the display panel. When the touch panel detects a touch operation on or near it, the operation is sent to the processor 14021 to determine the type of the touch event, and the processor 14021 then provides a corresponding visual output on the display panel according to the type of the touch event.
  • The touch panel and the display panel are used here as two independent components to realize the input and output functions of the apparatus 1400; in some embodiments, however, the touch panel and the display panel may be integrated to realize those input and output functions.
  • The camera 14041 is used to obtain images for input to the object recognition neural network.
  • In this case, the object recognition neural network is a deep neural network used to process images.
  • The audio input module 14042 may specifically be a microphone, which can acquire speech.
  • The apparatus 1400 can convert speech into text, and then input the text into the compressed neural network.
  • In this case, the compressed neural network is a deep neural network used to process text, for example, the neural network obtained after compressing the text recognition network in scenario C.
  • The sensor 1405 may include a light sensor 14051, an acceleration sensor 14052, and a fingerprint sensor 14053.
  • The light sensor 14051 is used to obtain the light intensity of the environment, and the acceleration sensor 14052 (such as a gyroscope) can obtain the motion state of the apparatus 1400.
  • The fingerprint sensor 14053 can receive input fingerprint information; after the sensor 1405 senses a relevant signal, it quantizes the signal into a digital signal and transfers it to the processor 14021 for further processing.
  • The memory 1401 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • The memory 1401 may also include at least one storage apparatus located away from the aforementioned processor 14021.
  • The memory 1401 may specifically include an instruction storage area and a data storage area.
  • The instruction storage area may store programs such as an operating system, a user interface program, and a communication interface program; the data storage area can store the data required by the processor to perform related operations, or the data generated by performing related operations.
  • The processor 14021 is the control center of the apparatus 1400. It uses various interfaces and lines to connect the various parts of the entire device, and executes the various functions of the apparatus 1400 by running the programs stored in the memory 1401 and calling the data stored in the memory 1401.
  • The processor 14021 may include one or more application processors, which mainly handle the operating system, the user interface, application programs, and the like.
  • The processor 14021 reads the information in the memory 1401 and, in combination with its hardware, completes the functions required by the units included in the object recognition apparatus 1200 or the gesture recognition apparatus 1201 of the embodiments of this application, or performs the object recognition method or gesture recognition method of the method embodiments of this application.
  • The communication function of the apparatus 1400 is realized through the radio frequency module 1403.
  • Specifically, the apparatus 1400 can receive the target neural network or other data sent by the client device 31 or the computing device 32 in FIG. 3.
  • Although the apparatuses 1300 and 1400 shown in FIG. 13 and FIG. 14 show only a memory, a processor, and a communication interface, in a specific implementation process those skilled in the art should understand that the apparatuses 1300 and 1400 also include the other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the apparatuses 1300 and 1400 may also include hardware devices that implement other additional functions. In addition, the apparatuses 1300 and 1400 may also include only the devices necessary for implementing the embodiments of this application, and need not include all the devices shown in FIG. 13 and FIG. 14.
  • The apparatus 1300 is equivalent to the computing device 32 in FIG. 3, or a node in the computing device 32, and the apparatus 1400 is equivalent to the client device 31 or the user equipment 34 in FIG. 3.
  • A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium that includes any medium facilitating the transfer of a computer program from one place to another (for example, according to a communication protocol).
  • In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve the instructions, code, and/or data structures for implementing the techniques described in this application.
  • The computer program product may include a computer-readable medium.
  • By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media.
  • As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • Accordingly, the term "processor" may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • In addition, in some aspects, the functions described in connection with the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • Moreover, the techniques may be fully implemented in one or more circuits or logic elements.
  • The techniques of this application can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset).
  • Various components, modules, or units are described in this application to emphasize the functional aspects of an apparatus configured to perform the disclosed techniques, but they do not necessarily need to be realized by different hardware units.
  • Rather, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).


Abstract

A neural network search method, apparatus, and device, relating to the field of artificial intelligence technology and in particular to automatic machine learning. The method includes: a computing device obtains a data set and N neural networks (S602), N being a positive integer; K evolutions are performed on the N neural networks to obtain the neural networks produced by the K-th evolution, K being a positive integer (S604). In each evolution, the network structures of the neural networks obtained in the previous evolution are mutated, and the mutated networks are screened based on the partial-order hypothesis to obtain candidate neural networks; the neural networks of the current evolution are then selected from the candidate neural networks and the neural networks obtained in the previous evolution. By applying the partial-order hypothesis in every evolution, the method prunes the network search space and improves the efficiency of automatic machine learning.

Description

Neural Network Search Method, Apparatus, and Device
This application claims priority to Chinese Patent Application No. 201911209275.5, filed with the China National Intellectual Property Administration on November 30, 2019 and entitled "神经网络的搜索方法、装置及设备" ("Neural network search method, apparatus, and device"), which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the technical field of machine learning, and in particular to a neural network search method, apparatus, and device.
Background
Machine learning is widely applied in many fields. However, building a machine learning model places high demands on machine learning experts, who must manually design and debug the model at considerable labor and time cost, lengthening the product iteration cycle. To make machine learning easier to apply, reduce the expertise required, and improve model performance, automatic machine learning has emerged.
Automatic machine learning (AutoML) provides a fully automated solution for every stage of machine learning, including data cleaning, feature engineering, model construction, and model training and evaluation, trading computing power for labor and time and reducing the dependence on machine learning engineers.
At present, AutoML usually uses model search methods during model construction, training, and evaluation to automatically optimize model structures and parameters. Existing search methods select some models from the search space for training, evaluate the trained models, and adjust model structures and parameters according to the evaluation results. However, this approach must train and evaluate every selected model, which is time-consuming, so the efficiency of automatic machine learning is low.
Summary
Embodiments of the present invention provide a neural network search method, apparatus, and device to solve the technical problem of low efficiency in automatic machine learning.
According to a first aspect, an embodiment of the present invention provides a neural network search method, including: a computing device obtains a data set and N neural networks, N being a positive integer; K evolutions are performed on the N neural networks to obtain the neural networks produced by the K-th evolution, K being a positive integer. The i-th evolution includes: the computing device mutates the network structures of the neural networks obtained by the (i-1)-th evolution to obtain mutated neural networks; selects, from the mutated neural networks, the networks whose network structure is superior to that of the neural networks obtained by the (i-1)-th evolution, to obtain candidate neural networks; and selects the neural networks of the i-th evolution from the set formed by the neural networks obtained by the (i-1)-th evolution and the candidate neural networks, according to P evaluation parameters corresponding to each neural network in the set. The P evaluation parameters are used to evaluate the performance of each neural network in the set after it is trained and tested with the data set; i and P are positive integers, and 1≤i≤K.
The above method applies the partial-order hypothesis in every evolution to prune the network search space, excluding neural networks with inferior structures, reducing the number of models that need to be trained and evaluated, avoiding the computing resources and time consumed by inferior networks, and improving the efficiency of automatic machine learning.
With reference to the first aspect, in a possible implementation, the neural networks obtained by the (i-1)-th evolution are CNNs, and the computing device mutating the neural networks obtained by the (i-1)-th evolution may include at least one of the following steps:
exchanging the positions of two convolutional layers in one or more of the neural networks obtained by the (i-1)-th evolution;
doubling the channel count of one or more convolutional layers in one or more of the neural networks obtained by the (i-1)-th evolution;
doubling the stride of the convolution kernels of one or more convolutional layers in one or more of the neural networks obtained by the (i-1)-th evolution;
inserting one or more convolutional layers into one or more of the neural networks obtained by the (i-1)-th evolution;
deleting one or more convolutional layers from one or more of the neural networks obtained by the (i-1)-th evolution;
inserting one or more pooling layers into one or more of the neural networks obtained by the (i-1)-th evolution;
deleting one or more pooling layers from one or more of the neural networks obtained by the (i-1)-th evolution.
These mutation modes give the mutated neural network a topology similar to that of the network before mutation, so as to conform to the partial-order hypothesis, avoid pruning networks with superior structures, and improve pruning accuracy.
With reference to the first aspect, in a possible implementation, the neural networks obtained by the (i-1)-th evolution are ResNets, and the computing device mutating the neural networks obtained by the (i-1)-th evolution may include at least one of the following steps:
exchanging the positions of two residual units in one or more of the neural networks obtained by the (i-1)-th evolution;
doubling the channel count of one or more residual units in one or more of the neural networks obtained by the (i-1)-th evolution;
doubling the stride of the convolution kernels of one or more residual units in one or more of the neural networks obtained by the (i-1)-th evolution;
inserting one or more residual units into one or more of the neural networks obtained by the (i-1)-th evolution;
deleting one or more residual units from one or more of the neural networks obtained by the (i-1)-th evolution.
These mutation modes likewise give the mutated neural network a topology similar to that of the network before mutation, so as to conform to the partial-order hypothesis, avoid pruning networks with superior structures, and improve pruning accuracy.
With reference to the first aspect, in a possible implementation, one way for the computing device to select, from the mutated neural networks, the candidate neural networks whose structure is superior to the networks obtained by the (i-1)-th evolution may be: the computing device selects, from the neural networks mutated from a first neural network, those whose network structure is superior to the first neural network; the candidate neural networks include those networks, and the first neural network is any one of the neural networks obtained by the (i-1)-th evolution.
Optionally, the network structure of a neural network mutated from the first neural network is superior to that of the first neural network when at least one of the following conditions is met:
the channel count of the mutated neural network is greater than that of the first neural network;
the number of convolutional layers in the mutated neural network is greater than that in the first neural network.
In the above method, pruning mutated networks by channel count and convolutional-layer count requires only counting the channels and layers of each network, so pruning is efficient, further improving the efficiency of automatic machine learning.
With reference to the first aspect, in a possible implementation, one way for the computing device to select the neural networks of the i-th evolution from the set formed by the networks obtained by the (i-1)-th evolution and the candidate networks, according to the P evaluation parameters of each network in the set, may be: the computing device performs non-dominated sorting on the networks in the set according to the P evaluation parameters of each, and determines the neural networks of the i-th evolution to be the networks in the set that are not dominated; where, for a second and a third neural network in the set, the second neural network dominates the third if the second is no worse than the third on every one of the P evaluation parameters and better on at least one of them.
In the above method, in each evolution the Pareto-optimal networks are selected from the set formed by the networks of the previous evolution and the candidate networks mutated from them, reducing the number of networks entering the next evolution, greatly reducing the computation of each evolution, and further improving the efficiency of automatic machine learning.
With reference to the first aspect, in a possible implementation, one way for the computing device to obtain the N neural networks may be: the computing device randomly generates M neural networks, M being a positive integer; trains and tests the M networks with the data set to obtain the P evaluation parameters of each of the M networks; and then selects N networks from the M networks according to those parameters, N being no greater than M.
With reference to the first aspect, in a possible implementation, the P evaluation parameters include at least one of runtime, accuracy, and parameter count.
The above method can avoid the situation where one evaluation parameter of a network obtained by the K-th evolution is good while the others are poor; it realizes multi-objective optimization and yields neural networks that balance the P evaluation parameters.
According to a second aspect, an embodiment of this application further provides an object recognition method, including: a user equipment or client device obtains an image to be recognized; the image to be recognized is input into an object recognition neural network to obtain the object type corresponding to the image.
The object recognition neural network is a network determined in a search space through the neural network search method of the first aspect or any implementation thereof, and the search space is constructed from basic units and the parameters of the basic units.
Optionally, the image to be recognized is an image of the surroundings of a vehicle, so as to recognize objects in the vehicle's surroundings.
Optionally, the parameters of the basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter.
Optionally, the basic unit is used to perform a first operation and a second operation on the feature maps input to it, the feature maps being feature maps of the image to be recognized; the first operation doubles the number of input feature maps or keeps it unchanged, and the second operation changes the size of the input feature maps from an original first size to a second size or keeps the first size unchanged, the first size being larger than the second size.
Optionally, the neural networks in the search space are ResNets, and the basic unit includes a residual module that adds the feature maps input to the basic unit to the feature maps obtained after the input feature maps are processed by the basic unit.
Optionally, the neural networks in the search space are CNNs, and the types of the basic unit include convolutional layer and pooling layer.
In this case, the data set described in the first aspect includes multiple samples, each sample including a sample image and the object type corresponding to the sample image.
According to a third aspect, an embodiment of this application further provides a gesture recognition method, including: a user equipment or client device obtains an image to be recognized; the image is input into a gesture recognition neural network to obtain the gesture type corresponding to the image.
The gesture recognition neural network is a network determined in a search space through the neural network search method of the first aspect or any implementation thereof, and the search space is constructed from basic units and their parameters.
Optionally, as in the second aspect: the parameters of the basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter; the basic unit performs the first and second operations described above on the feature maps of the image to be recognized; the networks in the search space may be ResNets whose basic units include residual modules, or CNNs whose basic-unit types include convolutional and pooling layers.
In this case, the data set described in the first aspect includes multiple samples, each sample including a sample image and the gesture type corresponding to the sample image.
According to a fourth aspect, an embodiment of this application further provides a data prediction method, which may include: a user equipment or client device obtains data to be predicted; the data to be predicted is input into a target neural network model to obtain the prediction result corresponding to the data.
The target neural network may be the neural networks obtained by the K-th evolution of the first aspect, or one of them, or a machine learning model obtained by combining the networks of the K-th evolution with data cleaning and feature engineering algorithms.
As in the second or third aspect, the target neural network is a network determined in a search space through the neural network search method of the first aspect or any implementation thereof, the search space being constructed from basic units and their parameters.
According to a fifth aspect, an embodiment of this application further provides a neural network search apparatus, including:
an acquisition module, configured to obtain a data set and N neural networks, N being a positive integer;
an evolution module, configured to perform K evolutions on the N neural networks to obtain the neural networks produced by the K-th evolution, K being a positive integer;
where the evolution module includes a mutation unit, a first screening unit, and a second screening unit:
the mutation unit is configured to, in the i-th evolution, mutate the network structures of the neural networks obtained by the (i-1)-th evolution to obtain mutated neural networks, the networks obtained by the 0-th evolution being the N neural networks;
the first screening unit is configured to, in the i-th evolution, select from the mutated networks the candidate neural networks whose network structure is superior to the networks obtained by the (i-1)-th evolution, the candidate neural networks being the selected networks;
the second screening unit is configured to, in the i-th evolution, select the networks of the i-th evolution from the set formed by the networks obtained by the (i-1)-th evolution and the candidate networks, according to the P evaluation parameters of each network in the set; the P evaluation parameters are used to evaluate the performance of each network in the set after training and testing with the data set, i is a positive integer no greater than K, and P is a positive integer.
For the specific implementation of the above units, refer to the method of the first aspect or any implementation thereof.
According to a sixth aspect, an embodiment of this application further provides an object recognition apparatus, including functional units for implementing the object recognition method of the second aspect; for the units included and their specific implementation, refer to the second aspect or any implementation thereof.
According to a seventh aspect, an embodiment of this application further provides a gesture recognition apparatus, including functional units for implementing the gesture recognition method of the third aspect; for the units included and their specific implementation, refer to the third aspect or any implementation thereof.
According to an eighth aspect, an embodiment of this application further provides a data prediction apparatus, including functional units for implementing the data prediction method of the fourth aspect; for the units included and their specific implementation, refer to the fourth aspect or any implementation thereof.
According to a ninth aspect, an embodiment of this application further provides a neural network search apparatus, including a processor and a memory, the memory storing a program; the processor executes the program stored in the memory, and when the program is executed, the neural network search apparatus implements the method of the first aspect or any possible implementation thereof.
According to a tenth, eleventh, and twelfth aspect, embodiments of this application likewise provide an object recognition apparatus, a gesture recognition apparatus, and a data prediction apparatus, each including a processor and a memory storing a program which, when executed by the processor, causes the apparatus to implement the method of the second, third, or fourth aspect respectively, or any possible implementation thereof.
According to a thirteenth through twentieth aspect, embodiments of this application further provide computer program products containing instructions, and computer-readable storage media storing computer-executable instructions, which, when run on or invoked by a computer, cause the computer to implement the method of any possible implementation of the first, second, third, or fourth aspect respectively.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings needed in the description of the embodiments. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a system architecture diagram of AutoML according to an embodiment of this application;
FIG. 2A is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 2B is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 3 is a schematic architecture diagram of a system according to an embodiment of this application;
FIG. 4 is a schematic architecture diagram of a convolutional neural network according to an embodiment of this application;
FIG. 5 is a schematic diagram of a chip hardware structure according to an embodiment of the present invention;
FIG. 6A is a schematic flowchart of a neural network search method according to an embodiment of this application;
FIG. 6B is a schematic flowchart of an implementation in which the computing device selects the neural networks of the i-th evolution from the set, according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a ResNet before and after mutation according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a CNN before and after mutation according to an embodiment of this application;
FIG. 9A is a schematic flowchart of an object recognition method according to an embodiment of this application;
FIG. 9B is a schematic flowchart of a gesture recognition method according to an embodiment of this application;
FIG. 10A is a schematic illustration of the runtime and top-1 accuracy of the obtained models according to an embodiment of this application;
FIG. 10B is a schematic illustration of the parameter count and top-1 accuracy of the obtained models according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a neural network search apparatus according to an embodiment of this application;
FIG. 12A is a schematic structural diagram of an object recognition apparatus according to an embodiment of this application;
FIG. 12B is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of another neural network search apparatus according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description
First, the architecture of automatic machine learning (AutoML) involved in this application is briefly described.
FIG. 1 is a system architecture diagram of AutoML provided by an embodiment of this application; the main flow of AutoML may include the following processes:
a) Data preparation
Data preparation may include data collection and data cleaning. Data collection includes receiving raw data sent by user equipment; the data may also come from an existing database such as ImageNet or LabelMe, or be obtained in other ways. Data cleaning mainly includes missing-value handling, data-type checking, outlier detection, text encoding, data splitting, and so on, to obtain data that a machine learning model can operate on. The raw data may be images, speech, text, video, or a combination thereof.
b) Feature engineering
Feature engineering is the process of using domain knowledge to create the features that allow machine learning algorithms to achieve their best performance; it converts raw data into features, aiming to extract as many features as possible from the raw data for use by algorithms and models. The process may include feature construction, feature extraction, and feature selection: feature construction builds new features manually from the raw data; feature extraction builds new features automatically, converting raw features into a set of features with clear physical or statistical meaning (for example, transforming feature values to reduce the number of values a feature takes); feature selection picks the most statistically meaningful subset of features from the feature set and removes irrelevant features, achieving dimensionality reduction. In practice, feature engineering is an iterative process of feature construction, feature extraction, feature selection, model selection, model training, and model evaluation, repeated until the final machine learning model is obtained.
After the above feature engineering, a data set that can be input to machine learning models is obtained from the raw data. It should be understood that the data set can be divided into a training data set and a test data set: the training set is used to train the constructed machine learning model to obtain a trained model, and the test set is used to test the trained model to evaluate its performance, such as accuracy and runtime.
In some embodiments, feature engineering is not a mandatory process of AutoML; in that case the data set is obtained from the raw data after data cleaning.
c) Model generation
After feature engineering, a machine learning model must be selected from the model search space and its hyperparameters set. All possible machine learning models constitute the search space. The models in the search space may be pre-built or built during the search, which is not limited here.
d) Model training and model evaluation
After a machine learning model is selected and its hyperparameters set, the initialized model can be trained with the training data set and then evaluated with the test data set; the evaluation results are fed back to guide model construction, selection, and hyperparameter setting, finally yielding one or more optimal machine learning models.
e) Neural architecture search (NAS)
In the embodiments of this application the machine learning model is a neural network, which may be a deep neural network such as a convolutional neural network (CNN), a deep residual network (ResNet), or a recurrent neural network. Model search and selection are realized through NAS, an algorithm for searching for the best neural network architecture, which mainly covers automatic optimization of model structure and model parameters. In the embodiments of this application, an evolutionary algorithm is adopted when searching and selecting neural network models through NAS, namely: construct one or more neural networks; randomly mutate them, for example randomly adding or deleting a layer structure, or randomly changing the channel count of one or more layer structures; based on the partial-order hypothesis, select from the mutated networks those whose structure is better than the networks before mutation, which are the candidate neural networks; train and test each candidate network to obtain its P evaluation parameters; select from the candidates the networks with better evaluation parameters; and then, based on the selected networks, iterate the processes of mutation, screening of mutated networks, training and testing of candidates, and screening of candidates, so that the finally selected networks become better and better.
It should be noted that in building a machine learning model for a specific task, the above processes are usually interdependent; for example, the choice of model affects the feature transformations applied to certain features.
The embodiments of this application mainly describe the neural network search method. It should be understood that the method can be combined with other steps or processes, such as feature engineering and hyperparameter optimization, to obtain an optimal model; for such combinations, refer to the related content in the prior art, which is not limited in the embodiments of this application.
Some key terms involved in the embodiments of this application are explained below.
(1) Pareto optimality
Pareto optimality refers to an ideal state of resource allocation: given a fixed group of people and allocatable resources, if moving from one allocation state to another makes at least one person better off without making anyone worse off, the move is called a Pareto improvement. A Pareto-optimal state is one in which no further Pareto improvement is possible; in other words, it is impossible to improve anyone's situation without harming someone else.
For example, given a group of models whose evaluation parameters (accuracy, runtime in seconds) are (0.8, 2), (0.7, 3), (0.9, 2.5), and (0.7, 1): since a model is better when its accuracy is higher and its runtime is lower, the model with (0.8, 2) is better than the one with (0.7, 3), while the models with (0.8, 2), (0.9, 2.5), and (0.7, 1) cannot be ranked against one another; these latter models are the Pareto-optimal models.
(2) Pareto front
In the embodiments of this application, a model's multiple evaluation parameters, such as accuracy, runtime, and parameter count, conflict and are incomparable during optimization; when a model's accuracy is best, its parameter count or runtime may be worst. In improving a model through NAS, improving one evaluation parameter weakens the others. The set of models that are optimal across the evaluation parameters is the Pareto front; that is, the Pareto front is the set of Pareto-optimal models.
(3) Non-dominated sorting
Non-dominated sorting is a common ranking method for multiple objectives. Suppose the objectives being optimized are (A, B, C); point 1 (A1, B1, C1) dominates point 2 (A2, B2, C2) if and only if A1≥A2, B1≥B2, C1≥C2, and at least one of the equalities does not hold. Point 1 dominating point 2 means point 1 is better than point 2. Points not dominated by any other point are the points on the Pareto front, i.e., the non-dominated points.
In the embodiments of this application, the optimized objectives are the P evaluation parameters of the models: model 1 dominates model 2 if and only if none of model 1's P evaluation parameters is worse than model 2's and at least one of them is better.
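Purely as an illustration (not part of the original application text), a minimal Python sketch of this dominance test and a brute-force non-dominated filter might look as follows; the convention that every objective is normalized so that larger is better (runtime negated) is an assumption made for the example:

```python
from typing import Sequence

def dominates(p1: Sequence[float], p2: Sequence[float]) -> bool:
    """p1 dominates p2 iff p1 is no worse on every objective
    and strictly better on at least one (larger = better here)."""
    no_worse = all(a >= b for a, b in zip(p1, p2))
    strictly_better = any(a > b for a, b in zip(p1, p2))
    return no_worse and strictly_better

def non_dominated(points: list) -> list:
    """Return the points not dominated by any other point (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# The models above, scored as (accuracy, -runtime) so that larger = better.
models = [(0.8, -2.0), (0.7, -3.0), (0.9, -2.5), (0.7, -1.0)]
print(non_dominated(models))  # -> [(0.8, -2.0), (0.9, -2.5), (0.7, -1.0)]
```

Consistent with the worked example above, only the model with accuracy 0.7 and runtime 3 s is filtered out.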
(4) Partial-order hypothesis
The partial-order hypothesis states that, among networks with similar topologies, a narrower and shallower network is worse than a deeper and wider one; in other words, deeper and wider networks are better than narrower and shallower ones. Here "wide" and "narrow" describe a network's channel count, and "deep" and "shallow" describe its number of layers.
(5) Partial-order pruning algorithm
The partial-order pruning algorithm applies the principle of the partial-order hypothesis to shrink the model search space. Some embodiments of this application apply this principle to shrink the search space and improve model search efficiency.
(6) Dilated convolution
Dilated convolution inserts zeros into an ordinary convolution kernel to obtain a larger kernel while keeping the parameter count unchanged, so as to capture information over a larger range.
(7) Depthwise separable convolution
An ordinary convolution works as follows: M feature maps P1 are convolved by an ordinary kernel, e.g., a high-dimensional matrix of size (Dk, Dk, M, N), turning the M feature maps into N feature maps P2. A depthwise separable convolution instead first convolves the M feature maps P1 with a matrix of size (Dk, Dk, M, 1), turning them into M feature maps P3, and then convolves the M feature maps P3 with a kernel of size (1, 1, M, N) to obtain N feature maps P4. This method greatly reduces the parameter count while achieving good results.
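For intuition, the following sketch (not part of the application text; the kernel size and channel counts are example values chosen for illustration) compares the parameter counts of the two convolutions described above:

```python
def standard_conv_params(dk: int, m: int, n: int) -> int:
    # One (dk, dk, m, n) kernel: every output channel sees all m input maps.
    return dk * dk * m * n

def depthwise_separable_params(dk: int, m: int, n: int) -> int:
    # A (dk, dk, m, 1) depthwise stage plus a (1, 1, m, n) pointwise stage.
    return dk * dk * m + m * n

dk, m, n = 3, 64, 128
print(standard_conv_params(dk, m, n))        # 73728
print(depthwise_separable_params(dk, m, n))  # 8768, roughly 8.4x fewer parameters
```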
(8) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

  h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s · x_s + b )

where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network and convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units, i.e., the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
(9) Deep neural network
A deep neural network (DNN), also called a multilayer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many". Divided by the positions of different layers, the layers inside a DNN fall into three classes: input layer, hidden layers, and output layer. Generally the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is connected to every neuron of the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is not: it is simply the linear relational expression y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Since a DNN has many layers, there are many coefficient matrices W and offset vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 denotes the layer of the coefficient W and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. Note that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to depict complex real-world situations. In theory, a model with more parameters has higher complexity and larger "capacity", meaning it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
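A single DNN layer as described by y = α(Wx + b) can be sketched in a few lines of NumPy; this is illustrative only, and the sizes and the choice of sigmoid for α are arbitrary assumptions:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # y = alpha(W x + b), with a sigmoid as the activation alpha
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)         # input vector with 4 features
W = rng.standard_normal((3, 4))    # W[j, k]: from input neuron k to output neuron j
b = rng.standard_normal(3)
print(dense_layer(x, W, b).shape)  # (3,)
```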
(10) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is a layer of neurons in a CNN that convolves the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons of the adjacent layers. A convolutional layer usually contains several feature maps, and each feature map may be composed of rectangularly arranged neural units. Neural units of the same feature map share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as extracting image information in a position-independent way; the underlying principle is that the statistics of one part of an image are the same as those of other parts, meaning image information learned in one part can also be used in another, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels, the richer the image information reflected by the convolution operation.
A convolution kernel can be initialized as a matrix of random size, and reasonable weights can be learned during the training of the CNN. A direct benefit of weight sharing is reducing the connections between the layers of the CNN while also lowering the risk of overfitting.
(11) Deep residual network (ResNet)
The depth of a network is crucial to model performance: as the number of layers increases, the network extracts more complex features and its performance keeps improving, so in theory a deeper network should perform better. In practice, however, because of training difficulty, an over-deep network degrades and performs worse than a relatively shallow one; this is called the degradation problem. The reason is that as the network gets deeper, training becomes harder and harder, and optimizing the network becomes harder and harder.
To solve this problem, skip connections (also called shortcut connections) were introduced, yielding ResNet. A ResNet may include multiple cascaded residual units (also called residual blocks) and several fully connected layers. In ResNet, the output and the input of the previous residual unit are fed together into the next residual unit; for the l-th residual unit, x_{l+1} = f(h(x_l) + F(x_l, W_l)), where F(x_l, W_l) is the output of the l-th residual unit, x_l is its input, W_l is the weight matrix formed by the multiple convolutional layers the unit contains, and each residual unit is activated by a function f().
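A minimal PyTorch-style sketch of one residual unit matching the formula above follows; this is an illustrative assumption rather than the application's reference implementation, with h taken as the identity mapping and f as ReLU:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """x_{l+1} = f(h(x_l) + F(x_l, W_l)), with h = identity and f = ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(            # F(x_l, W_l): two 3x3 conv layers
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.body(x))   # f(h(x) + F(x))

y = ResidualUnit(64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```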
The application scenarios involved in the embodiments of this application are described below.
FIG. 2A and FIG. 2B are schematic diagrams of two application scenarios provided by embodiments of this application. A customer can use a client device to send raw data or a data set to a computing device, such as a cloud server, and request the cloud server to train, based on the provided raw data or data set, a target neural network that can accomplish a specific task. Using its powerful computing resources and its AutoML architecture, the cloud server can automatically generate, from the raw data or data set provided by the customer, the target neural network the customer needs. The data set is the data obtained from the raw data after data cleaning and feature engineering, and includes a training data set and a test data set; for descriptions of the raw data and data set, refer to the related description of FIG. 1 above, which is not repeated here. The raw data and data set may also be obtained from an existing database, such as images obtained from ImageNet. Two scenarios provided by embodiments of this application are as follows.
Scenario A:
As shown in FIG. 2A, a customer wants a neural network that can recognize object types, applied to an autonomous or semi-autonomous vehicle to recognize objects in the field of view observed through the vehicle's camera. Since the vehicle is in constant motion and safe driving must be guaranteed, the requirements on real-time performance and on the accuracy of recognizing objects in the vehicle's surroundings are high; the customer can therefore require the neural network to predict object types with high accuracy and low latency. Through the client device the customer sends the data set to the cloud server and requests a search for the optimal neural network with multi-objective optimization (i.e., high accuracy and low latency). The data set includes sample images of multiple types, each sample image annotated with the object type it belongs to; object types may include person, dog, vehicle, red traffic light, building, traffic line, tree, curb, and so on.
Using the data set and its AutoML architecture, and following the customer's requirements of high accuracy and low latency, the cloud server selects neural networks in the search space, trains and evaluates the selected networks, and further screens out networks with high accuracy and low latency according to the accuracy and latency of each trained network; through repeated screening it obtains the Pareto-optimal object recognition neural network the customer needs. The cloud server then sends the object recognition neural network to the client device, and the client device can send it to the vehicle. Optionally, when the client device is a server, the vehicle may also download the object recognition neural network from the client device.
After receiving the object recognition neural network, the vehicle can execute an object type recognition method, which may include the following steps: the vehicle obtains an image to be recognized through its camera, which may be an image of the vehicle's surroundings; the image is input to the object recognition neural network to predict the object type corresponding to the image. Further, the vehicle can execute a corresponding safe-driving action based on the recognized object types in the surroundings, for example, decelerating or braking when a person ahead is recognized, to improve driving safety, or passing through the intersection when the light ahead is recognized as green.
Scenario B:
As shown in FIG. 2B, a customer wants a neural network that can recognize dynamic gestures, applied to terminals such as portable devices (e.g., mobile phones and tablets), wearables (e.g., smart bands, smart watches, and VR glasses), or smart-home devices (e.g., smart TVs, smart speakers, smart lamps, and monitors), to recognize gestures in the field of view observed through the device's camera. Since a terminal's computing power and storage resources are limited, the neural network applied on it must have high accuracy and a low parameter count. Through the client device the customer sends the data set to the cloud server and requests a search for the optimal neural network with multi-objective optimization (i.e., high accuracy and low parameter count). The data set includes sample images of multiple gestures, each annotated with the gesture type it belongs to; the gesture types may include multiple different gestures.
Using the data set and its AutoML architecture, and following the customer's requirements of high accuracy and low parameter count, the cloud server selects neural networks in the search space, trains and evaluates them, further screens them according to the accuracy and parameter count of each trained network, and through repeated screening obtains the Pareto-optimal gesture recognition neural network the customer needs. The cloud server then sends the gesture recognition neural network to the client device, which can send it to the terminal; optionally, when the client device is a server, the terminal may also download the gesture recognition neural network from the client device.
After receiving the gesture recognition neural network, the terminal can execute a gesture recognition method, which may include the following steps: the terminal obtains an image to be recognized through its camera; the image is input into the gesture recognition neural network to predict the corresponding gesture type. Further, the terminal can perform an operation corresponding to the recognized gesture type, for example, opening the "Camera" application when a first gesture is recognized, where the first gesture may be any one of the different gestures the network can recognize.
It should be understood that, in Scenario A or Scenario B, for the specific implementation in which the cloud server automatically generates the object recognition or gesture recognition neural network based on the data set, refer to the related descriptions in the following method embodiments, which are not repeated here.
The system architecture involved in the embodiments of this application is described below. FIG. 3 is a schematic architecture diagram of a system provided by an embodiment of this application, in which:
The computing device 32 may include part or all of the AutoML architecture shown in FIG. 1. Based on raw data or data sets stored in the database 33, or sent by the client device 31, the computing device 32 can automatically generate machine learning models that perform specific functions, such as the object recognition neural network in Scenario A or the gesture recognition neural network in Scenario B.
The computing device 32 may include multiple nodes. On the one hand, the computing device 32 may be a distributed computing system whose nodes are computer devices with computing capability; on the other hand, the computing device 32 may be a single device whose nodes are functional modules/components within it. The preprocessing node 321 preprocesses the received raw data, e.g., data cleaning; the feature engineering node 322 performs feature engineering on the preprocessed raw data to obtain a data set. In other embodiments, the preprocessed raw data is itself the data set. The data set can be divided into a training data set and a test data set.
The model construction node 323 randomly generates neural network architectures from the training data set and configures their hyperparameters to obtain initialized neural networks. The model search node 324 executes the neural network search method, performing multiple evolutions on the initialized networks to obtain the finally evolved networks. The model construction node 323 mutates the networks during evolution to obtain candidate networks; the model training node 325 can train the initialized networks, candidate networks, and so on, to obtain trained networks; and the model evaluation node 326 tests the trained networks with the test data set to obtain their evaluation parameters, such as accuracy, runtime, and parameter count. Before training and testing, the model search node 324 screens the candidate networks with the partial-order pruning algorithm, training and testing only those whose structure is superior to the networks before mutation, so as to shrink the neural network search space and improve search efficiency. The model search node 324 is also configured to select, based on the evaluation parameters obtained by the model evaluation node 326 for the trained networks, the optimal one or more networks, or the Pareto-optimal networks, as the networks entering the next evolution. After multiple evolutions, one or more neural networks are obtained; combining the obtained networks with the feature engineering, preprocessing, and other modules can form the target neural network. The computing device 32 can send the target neural network to the client device 31.
The system may further include user equipment 34. After the client device 31 or computing device 32 obtains the target neural network, the user equipment 34 can download the target neural network from either of them, so as to predict data to be predicted and obtain prediction results; alternatively, the user equipment 34 may send the data to be predicted to the client device 31, which, after receiving the data, inputs it into the target neural network to obtain the prediction result and then sends the prediction result to the user equipment 34. The target neural network may be the object recognition neural network of Scenario A or the gesture recognition neural network of Scenario B, and the data to be predicted may be the images to be recognized in Scenario A or Scenario B.
The computing device 32 and the nodes in it may be cloud servers, servers, computer devices, terminal devices, and so on, which are not elaborated here.
The client device 31 or user equipment 34 may be a mobile phone, tablet, personal computer, vehicle, on-board unit, point-of-sale (POS) terminal, personal digital assistant (PDA), drone, smart watch, smart glasses, VR device, etc., which is not limited here. The client device 31 may also be a server.
It should be noted that the preprocessing node 321, feature engineering node 322, model construction node 323, model training node 325, and model evaluation node 326 are not mandatory for the computing device 32; the functions realized by one or more of these nodes may also be integrated into the model search node 324.
The client device 31, user equipment 34, and database 33 are likewise not mandatory for the system; the system may omit the above devices, or may include other devices or functional units, which is not limited in the embodiments of this application.
As stated in the introduction of basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture; a deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
As shown in FIG. 4, a convolutional neural network (CNN) 200 may include an input layer 210, convolutional/pooling layers 220 (where the pooling layers are optional), and neural network layers 230.
Convolutional/pooling layers 220:
Convolutional layers:
As shown in FIG. 4, the convolutional/pooling layers 220 may include, as examples, layers 221-226. In one implementation, layer 221 is a convolutional layer, layer 222 a pooling layer, layer 223 a convolutional layer, layer 224 a pooling layer, layer 225 a convolutional layer, and layer 226 a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer. That is, the output of a convolutional layer may serve as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal working principle of a convolutional layer is introduced below, taking layer 221 as an example.
Convolutional layer 221 may include many convolution operators, also called kernels, whose role in image processing is like a filter extracting specific information from the input image matrix. A convolution operator may essentially be a weight matrix, usually predefined. During convolution on an image, the weight matrix usually processes the input image pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride) along the horizontal direction, completing the extraction of a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as that of the input image; during convolution, the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolved output of a single depth dimension, but in most cases multiple weight matrices of the same size (rows × columns), i.e., multiple homotype matrices, are applied instead of a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as determined by the "multiple" above. Different weight matrices can be used to extract different features of the image: for example, one weight matrix to extract edge information, another to extract a specific color, yet another to blur unwanted noise, and so on. The multiple weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size; the extracted feature maps of the same size are then merged to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices must be obtained through extensive training; the weight matrices formed by the trained weight values can extract information from the input image, enabling the CNN 200 to make correct predictions.
When the CNN 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) often extract more general features, which may also be called low-level features; as the depth of the CNN 200 increases, the features extracted by later convolutional layers (e.g., 226) become more and more complex, such as high-level semantic features, and features with higher semantics are more applicable to the problem to be solved.
Pooling layers:
Since it is often necessary to reduce the number of training parameters, pooling layers often need to be introduced periodically after convolutional layers. In layers 221-226 shown at 220 in FIG. 4, there may be one pooling layer after each convolutional layer, or one or more pooling layers after multiple convolutional layers. In image processing, the sole purpose of a pooling layer is to reduce the spatial size of the image. A pooling layer may include an average pooling operator and/or a max pooling operator, used to sample the input image to obtain a smaller image: the average pooling operator computes the average of the pixel values within a specific range as the result of average pooling, and the max pooling operator takes the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in a pooling layer should also be related to the image size. The size of the image output after pooling may be smaller than that of the image input to the pooling layer; each pixel of the output image represents the average or maximum value of the corresponding sub-region of the input image.
Neural network layers 230:
After processing by the convolutional/pooling layers 220, the CNN 200 is not yet sufficient to output the required output information, because, as described above, the convolutional/pooling layers 220 only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other related information), the CNN 200 needs the neural network layers 230 to generate one output, or a group of outputs whose number equals the number of required classes. Therefore, the neural network layers 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 4) and an output layer 240; the parameters contained in the multiple hidden layers may be pre-trained on relevant training data of a specific task type, where the task type may include, for example, image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layers 230, i.e., as the last layer of the whole CNN 200, comes the output layer 240, which has a loss function similar to the categorical cross-entropy, specifically used to compute the prediction error. Once the forward propagation of the whole CNN 200 (propagation in the direction from 210 to 240 in FIG. 4) is completed, back propagation (propagation in the direction from 240 to 210 in FIG. 4) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the CNN 200 and the error between the result output by the CNN 200 through the output layer and the ideal result.
It should be noted that the CNN 200 shown in FIG. 4 is only an example of a convolutional neural network; in specific applications, a convolutional neural network may also exist in the form of other network models.
A chip hardware structure provided by an embodiment of this application is introduced below.
FIG. 5 shows a chip hardware structure provided by an embodiment of the present invention; the chip includes a neural network processor 30. The chip may be provided in the computing device 32 shown in FIG. 3 to complete the computation for training and testing the neural networks. The chip may also be provided in the client device 31 or user equipment 34 shown in FIG. 3 to complete the prediction of the data to be predicted through the target neural network. The algorithms of the layers of the convolutional neural network or deep residual network shown in FIG. 4 can all be implemented in the chip shown in FIG. 5.
The neural network processor 30 may be an NPU, a TPU, a GPU, or any processor suitable for large-scale exclusive-or operation processing. Taking the NPU as an example: the NPU may be mounted as a coprocessor on a host CPU, which assigns tasks to it. The core of the NPU is the operation circuit 303; the controller 304 controls the operation circuit 303 to fetch matrix data from the memories (301 and 302) and perform multiply-accumulate operations.
In some implementations, the operation circuit 303 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 303 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 303 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 303 fetches the weight data of matrix B from the weight memory 302 and caches it on each PE in the operation circuit 303. The operation circuit 303 fetches the input data of matrix A from the input memory 301, performs matrix operations between the input data of matrix A and the weight data of matrix B, and stores the partial or final results of the resulting matrix in the accumulator 308.
The unified memory 306 stores input data and output data. Weight data is transferred to the weight memory 302 directly through the direct memory access controller (DMAC) 305; input data is also transferred to the unified memory 306 through the DMAC.
The bus interface unit (BIU) 310 is used for the interaction between the DMAC and the instruction fetch buffer 309; the bus interface unit 310 is also used by the instruction fetch buffer 309 to obtain instructions from the external memory, and by the memory access controller 305 to obtain the original data of input matrix A or weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 306, to transfer weight data to the weight memory 302, or to transfer input data to the input memory 301.
The vector computation unit 307 includes multiple operation processing units and, when needed, further processes the output of the operation circuit 303, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. The vector computation unit 307 is mainly used for the computation of the non-convolutional layers or fully connected (FC) layers of the neural network, and can specifically handle computations such as pooling and normalization. For example, the vector computation unit 307 may apply a nonlinear function to the output of the operation circuit 303, such as a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 307 generates normalized values, merged values, or both.
In some implementations, the vector computation unit 307 stores the processed vectors to the unified memory 306. In some implementations, the vectors processed by the vector computation unit 307 can be used as activation inputs of the operation circuit 303, for example for use in subsequent layers of the neural network; as shown in FIG. 4, if the current processing layer is hidden layer 1 (231), the vectors processed by the vector computation unit 307 may also be used in the computation of hidden layer 2 (232).
The instruction fetch buffer 309 connected to the controller 304 stores the instructions used by the controller 304.
The unified memory 306, input memory 301, weight memory 302, and instruction fetch buffer 309 are all on-chip memories; the external memory is independent of this NPU hardware architecture.
The operations of the layers of the convolutional neural network shown in FIG. 4, or the computations of the residual units of the deep residual network, may be performed by the operation circuit 303 or the vector computation unit 307.
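Numerically, the multiply-accumulate flow described above (weights cached per PE, partial sums collected in the accumulator) computes C = A·B; the following NumPy sketch of the accumulation is purely illustrative and not a model of the hardware:

```python
import numpy as np

A = np.arange(6, dtype=np.float32).reshape(2, 3)  # input matrix A
B = np.ones((3, 4), dtype=np.float32)             # weight matrix B
acc = np.zeros((2, 4), dtype=np.float32)          # plays the role of accumulator 308
for k in range(A.shape[1]):                       # accumulate partial products
    acc += np.outer(A[:, k], B[k, :])
assert np.allclose(acc, A @ B)                    # same result as a single matmul
```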
Embodiment 1:
FIG. 6A shows a neural network search method provided by an embodiment of this application, which can be used to search for the object recognition neural network in Scenario A or the gesture recognition neural network in Scenario B; the neural network search method provided by the embodiments of this application can be applied to the AutoML architecture to automatically generate machine learning models. The method 60 may be executed by the computing device 32 shown in FIG. 3. In another implementation, the computing device may be a distributed computing device, including the preprocessing node 321, feature engineering node 322, model construction node 323, model search node 324, model training node 325, model evaluation node 326, and so on. The step S6021 of obtaining the data set in step S602 of the method 60 may be executed by the preprocessing node 321 or the feature engineering node 322; obtaining the N neural networks in step S602 and step S6042 may be executed by the model construction node 323; the training in steps S6022 and S6046 may be executed by the model training node 325; the testing in steps S6022 and S6046 may be executed by the model evaluation node 326; and steps S6023, S604, S6044, and S6048 may be executed by the model search node 324. Optionally, steps S602 and S6042 may also be executed by the model evaluation node 326. Optionally, the method 60, or each step in it, may be processed by a CPU, or jointly by a CPU and a GPU, or, instead of a GPU, by another processor suitable for neural network computation, such as the neural network processor 30 shown in FIG. 5, which is not limited here. The embodiments of this application take a computing device as the executing entity for illustration. The method 60 may include some or all of the following steps:
S602: The computing device obtains a data set and N neural networks, N being a positive integer.
The data set in method 60 may be raw data after data cleaning, or a data set obtained from raw data through feature engineering; the raw data or data set may come from the database 33 shown in FIG. 3, or be collected or obtained by the client device 31.
The data set may include a training data set and a test data set: the training set is used to train initialized neural networks, and the test set is used to test the performance of trained networks, such as accuracy and runtime. The training set includes multiple training samples and the test set may include multiple test samples; a training or test sample may include input data and a label. The input data of a training sample is input to the initialized network to obtain the prediction result corresponding to the input data; the label is the true result corresponding to the input data, and the error between the true result and the prediction result is fed back to adjust the model parameters of the initialized network to obtain a trained network. The input data of a test sample is input to the trained network to obtain its prediction result; the accuracy of the trained network is evaluated from the error between the prediction and the true result, or the input data is fed to the trained network to measure its runtime, and so on.
In some embodiments, the N neural networks may be one or more manually constructed neural networks, or one or more networks randomly generated by the computing device.
In other embodiments, the N neural networks may also be N networks selected from M randomly generated networks, M being a positive integer no smaller than N. One implementation of the computing device obtaining the N networks may include, but is not limited to, the following steps:
S6021: The computing device randomly generates M neural networks, M being a positive integer. For the specific implementation of randomly generating the M networks, refer to the description in the method embodiment of randomly generating neural networks below, which is not repeated here.
S6022: The computing device trains and tests the M networks with the data set to obtain the P evaluation parameters of each of the M networks.
S6023: The computing device selects N networks from the M networks according to the P evaluation parameters of each, N being no greater than M.
In a specific implementation, the computing device selects from the M networks those whose P evaluation parameters satisfy preset conditions, for example, networks whose accuracy is greater than a preset threshold (e.g., 90%) and whose runtime is less than a first duration (e.g., 2 s), to obtain the N networks (a small sketch of such a threshold screen follows).
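Purely as an illustration (not part of the application text; the dictionary field names and the threshold values are assumptions taken from the example above), the threshold screen could be expressed as:

```python
def initial_screen(models: list, min_acc: float = 0.90, max_time: float = 2.0) -> list:
    """Keep the randomly generated models whose evaluation parameters
    meet the preset conditions (accuracy above, runtime below, the thresholds)."""
    return [m for m in models
            if m["accuracy"] > min_acc and m["runtime"] < max_time]

pool = [{"accuracy": 0.93, "runtime": 1.2}, {"accuracy": 0.88, "runtime": 0.9}]
print(initial_screen(pool))  # -> [{'accuracy': 0.93, 'runtime': 1.2}]
```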
S604: The computing device performs K evolutions on the N neural networks to obtain the neural networks produced by the K-th evolution, K being a positive integer. The i-th evolution is taken as an example to describe the K evolutions, i being a positive integer no greater than K; the i-th evolution includes, but is not limited to, the following steps:
S6042: The computing device mutates the network structures of the neural networks obtained by the (i-1)-th evolution to obtain mutated neural networks; the networks obtained by the 0-th evolution are the N neural networks.
The computing device may mutate one or more of the networks obtained by the (i-1)-th evolution, or every one of them. For the specific implementation of mutating a neural network, refer to the description in the neural network mutation method embodiments below, which is not repeated here.
S6044: The computing device selects, from the mutated networks, those whose network structure is superior to the networks obtained by the (i-1)-th evolution, obtaining candidate neural networks.
It should be understood that a neural network and the networks mutated from it are networks with similar topologies. Among networks with similar topologies, wider and deeper networks are better than narrower and shallower ones; therefore, networks can be preliminarily screened by depth and width to filter out poor networks. Here "wide" and "narrow" describe the channel count and "deep" and "shallow" the layer count; that is, among similar topologies, networks with more layers and more channels are better. For example, for CNNs with similar topologies, the more layers and channels, the better the network; for ResNets with similar topologies, the more residual units and channels, the better the network (a code sketch of this structural comparison follows this step).
In the embodiments of this application, every network obtained by the (i-1)-th evolution may be mutated; when screening the networks mutated from one of them, only those whose structure is superior to the network before mutation are kept as candidate networks. It should be understood that the candidate neural networks are the selected networks and include at least one neural network.
It can be seen that, in the embodiments of this application, mutating neural networks produces networks with similar topologies, and the properties of topologically similar networks are used to prune the neural network search space, reducing the number of networks that must be trained and tested and improving the efficiency of automatic machine learning.
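For concreteness, using the string encodings introduced later in this embodiment (see FIG. 7 and FIG. 8), the structural screen of S6044 can be sketched as follows; this is an illustrative assumption about the representation, not the application's reference implementation. Per the optional conditions stated above, a mutated network counts as superior when it has more channel doublings or more layers than its parent:

```python
def depth_and_width(code: str) -> tuple:
    """Depth = number of conv/residual units ('1' or '2');
    width proxy = number of channel-doubling units ('2')."""
    return sum(c in "12" for c in code), code.count("2")

def structure_superior(mutated: str, parent: str) -> bool:
    # Keep the mutated network only if it is deeper or wider than its parent.
    d_m, w_m = depth_and_width(mutated)
    d_p, w_p = depth_and_width(parent)
    return d_m > d_p or w_m > w_p

print(structure_superior("121-1131-211", "121-113-211"))  # True: one unit deeper
print(structure_superior("121-13-211", "121-113-211"))    # False: one unit shallower
```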
S6046: The computing device trains and tests each candidate neural network to obtain the P evaluation parameters of each candidate network, P being a positive integer.
The data set can be divided into a training data set and a test data set. The computing device trains each candidate network with the training set and then evaluates the trained networks with the test set, obtaining the P evaluation parameters of each. The evaluation parameters evaluate the performance of the networks trained with the training set, e.g., at least one of accuracy, runtime, and parameter count.
S6048: The computing device selects the neural networks of the i-th evolution from the set formed by the networks obtained by the (i-1)-th evolution and the candidate networks, according to the P evaluation parameters of each network in the set.
After S6048, the computing device may judge whether i equals K, i.e., whether the i-th evolution is the last; if so, it outputs the networks of the K-th evolution; otherwise it sets i = i+1 and repeats from S6042, to perform the next evolution based on the networks of the i-th evolution. In another embodiment of this application, it may instead judge whether the evaluation parameters of the networks of the i-th evolution satisfy conditions, for example, whether the accuracies of the networks of the i-th evolution are all greater than a preset accuracy and their runtimes all less than a preset duration; if so, it outputs the networks of the K-th evolution; otherwise it sets i = i+1 and repeats S6042. The preset accuracy and duration may be set by the customer and sent by the client device to the computing device, to indicate the accuracy, runtime, etc. of the target neural network the customer needs.
It should be understood that the networks obtained by the K-th evolution may be trained neural networks. According to the customer's requirements on the P evaluation parameters, the computing device may select, from the networks of the K-th evolution, or from the networks obtained by combining them respectively with the feature engineering module and the data preprocessing module, the target neural network meeting the customer's requirements, and then send that network to the client device; the computing device may also send the networks of the K-th evolution, or the networks obtained by combining them respectively with the feature engineering and data preprocessing modules, to the client device as the target neural network, which is not limited here. The target neural network may be the object recognition neural network of Scenario A, in which case the data set includes multiple samples, each sample including a sample image and the object type corresponding to it; it may also be the gesture recognition neural network of Scenario B, in which case each sample includes a sample image and the gesture type corresponding to it.
The specific implementation of S6048 is described in detail below.
It should be understood that the networks obtained by the (i-1)-th evolution were already trained and tested during the (i-1)-th evolution, yielding the P evaluation parameters of each of them. The networks of the 0-th evolution are the N neural networks above; during or before the first evolution, the computing device can train each of the N networks with the training data set and then evaluate the trained networks with the test data set, obtaining the P evaluation parameters of each of the N networks.
In one implementation, P = 1; for example, the evaluation parameter is accuracy. In this case the networks of the i-th evolution can be selected from the set by accuracy: for example, the top Q networks with the highest accuracy are selected from the set as the networks of the i-th evolution; or, for example, the networks whose accuracy exceeds a preset value, such as 90%, are selected as the networks of the i-th evolution.
In another implementation, P > 1. The computing device performs non-dominated sorting on the networks in the set according to the P evaluation parameters of each, and then determines the networks of the i-th evolution to be those in the set that are not dominated. A dominating network is no worse than the dominated network on every one of the P evaluation parameters, and better on at least one of them. For example, when the P evaluation parameters are accuracy and runtime, and network A and network B are two networks in the set, network A dominates network B if at least one of the following two conditions holds:
(1) network A's accuracy is higher than network B's, and network A's runtime is not higher than network B's;
(2) network A's runtime is lower than network B's, and network A's accuracy is not lower than network B's.
In a specific implementation, each network obtained by the (i-1)-th evolution is not dominated by any other network obtained by the (i-1)-th evolution; in this case the networks of the (i-1)-th evolution are also said to lie on the Pareto front. FIG. 6B is a schematic flowchart of one implementation in which the computing device selects the networks of the i-th evolution from the set (a code sketch of this loop follows the example below); this implementation may include, but is not limited to, the following steps:
S60481: Pick the j-th network among the candidate neural networks, where j is a positive integer no greater than the total number of candidate networks.
S60482: Judge the dominance relation between the j-th network and the k-th network on the Pareto front, where k is a positive integer no greater than the total number of networks obtained by the (i-1)-th evolution. If the k-th network dominates the j-th network, the j-th network cannot lie on the Pareto front; in that case there is no need to compare the j-th network with the remaining networks on the front one by one, so set j = j+1 and repeat S60482. If the j-th network dominates the k-th network, execute S60483; if the j-th network does not dominate the k-th network and the k-th network does not dominate the j-th network, execute S60484.
When j = 1 and k = 1, the networks on the Pareto front are the networks obtained by the (i-1)-th evolution.
When judging the dominance relation between the j-th and k-th networks: if every one of the j-th network's P evaluation parameters is no worse than the k-th network's and at least one is better, the j-th network dominates the k-th; conversely, if every one of the k-th network's P evaluation parameters is no worse than the j-th network's and at least one is better, the k-th network dominates the j-th; and if each of the two networks has at least one evaluation parameter no worse than the other's, the j-th and k-th networks do not dominate each other.
S60483: Remove the k-th network from the current Pareto front.
S60484: Judge whether the k-th network is the last network on the Pareto front; if not, the j-th network must still be compared with the next network on the front, so set k = k+1 and repeat S60482; otherwise, the k-th network is the last network on the front, and S60485 is executed.
S60485: Add the j-th network to the Pareto front. After step S60485, further execute S60486.
S60486: Judge whether the j-th network is the last candidate network; if so, the i-th evolution is complete, and the next evolution can proceed based on the networks of the i-th evolution; otherwise, set j = j+1 and repeat S60482.
For example, taking accuracy and runtime as the P evaluation parameters: if a candidate network NN1 has shorter runtime and higher accuracy than a network NN2 on the Pareto front, NN1 dominates NN2, so the dominated NN2 is removed from the Pareto front and the dominating NN1 is added to it; if NN1 neither dominates nor is dominated by the networks on the front, NN1 is a new Pareto optimum and is directly added to the front; if NN1 is dominated by a network on the front, the front is not updated.
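A minimal Python sketch of this front-update loop follows, purely for illustration; it assumes each network is represented by its evaluation-parameter tuple, normalized so that larger is better, and it reuses the dominates() helper from the earlier sketch:

```python
def update_pareto_front(front: list, candidates: list) -> list:
    """Merge candidate networks into an existing Pareto front (S60481-S60486)."""
    for cand in candidates:                  # S60481 / S60486: visit each candidate
        dominated_by_front = False
        survivors = []
        for member in front:                 # S60482 / S60484: scan the front
            if dominates(member, cand):      # the candidate cannot join the front
                dominated_by_front = True
                break
            if not dominates(cand, member):  # S60483: drop dominated members
                survivors.append(member)
        if not dominated_by_front:
            survivors.append(cand)           # S60485: the candidate joins the front
            front = survivors
    return front

front = [(0.80, -8.0)]                 # e.g. (accuracy, -runtime) of one network
cands = [(0.82, -5.0), (0.70, -9.0)]   # the first dominates, the second is dominated
print(update_pareto_front(front, cands))  # -> [(0.82, -5.0)]
```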
It should be noted that, after multiple evolutions, the obtained networks become better and better. After a fixed number of evolutions (for example, K = 10), or when the P evaluation parameters of the networks obtained after K evolutions meet the customer's requirements, the evolution can stop and the networks of the K-th evolution are output.
Moreover, the embodiments of this application adopt a multi-objective optimization scheme, so the networks obtained by the K-th evolution can reach a balance among the P evaluation parameters, avoiding the situation where one evaluation parameter of an obtained network of the K-th evolution is good while the others are poor.
The method of randomly generating neural networks and the method of mutating neural networks involved in the embodiments of this application are introduced below, taking ResNet and CNN as examples.
The networks obtained by the K-th evolution above are networks determined in a search space through the neural network search method described in Embodiment 1. The search space is constructed from basic units and the parameters of the basic units, and is used to search for the networks of the K-th evolution. The parameters of a basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter. A basic unit performs a first operation and a second operation on the feature maps input to it: the first operation doubles the number of input feature maps or keeps it unchanged, and the second operation changes the size of the input feature maps from an original first size to a second size, or keeps the first size unchanged, the first size being larger than the second size. Size here may refer to the side length or area of a feature map. The channel-count parameter indicates the change in the number of feature maps after processing by the basic unit, e.g., doubled or unchanged; the size parameter indicates the change in the size of the feature maps after processing, e.g., halved or unchanged.
In some embodiments the neural network may be a ResNet, whose basic unit is also called a residual unit; a ResNet may include multiple residual units and at least one fully connected layer, and each residual unit may consist of at least two (e.g., 3) convolutional layers, where the number of fully connected layers may be preset or variable, which is not limited here. The network structure of the ResNet is encoded with the parameters of the residual units. For example, ordered symbols denote the order of the residual units in the ResNet: a residual unit encoded "1" keeps its channel count unchanged; a residual unit encoded "2" doubles its channel count; and a residual unit preceded by "-" halves the size of its feature maps. For example, the network structure of the ResNet encoded "121-211-121" is shown in FIG. 7, where the width of a residual unit reflects its channel count and its length reflects the size of its feature maps.
A combination of several "1", "2", and "-" symbols thus yields a ResNet, so the process of the computing device randomly generating a ResNet can be converted into the process of randomly generating a string. It should be understood that, when randomly generating strings, the computing device must add constraints, or filter the generated strings, to remove ResNets that do not meet the requirements; for example, two "-" characters cannot be adjacent.
The computing device can mutate one ResNet into multiple mutated ResNets, each mutated network being produced by one mutation of the ResNet. In the embodiments of this application, one mutation of a ResNet may specifically be one of the following implementations (a string-level sketch of these mutations is given below):
(1) Randomly change one residual unit from keeping its channel count unchanged to doubling it; concretely, randomly change one "1" in the ResNet's encoding to "2". For example, as shown in FIG. 7, changing the 6th residual unit of the ResNet in diagram (a) from unchanged to doubled channel count mutates the ResNet encoded "121-111-211" into the ResNet encoded "121-112-211", as in diagram (b); it should be understood that the channel counts of all residual units after the 6th are doubled relative to before.
(2) Randomly change one residual unit from doubling its channel count to keeping it unchanged; concretely, randomly change one "2" in the encoding to "1". For example, as shown in FIG. 7, changing the 7th residual unit of the ResNet in diagram (a) from doubled to unchanged channel count mutates the ResNet encoded "121-111-211" into the ResNet encoded "121-111-111", as in diagram (c); it should be understood that the channel counts of all residual units after the 7th are reduced by 50% relative to before.
(3) Change the stride of one residual unit from 2 to 1 and that of another residual unit from 1 to 2; concretely, randomly move the position of one "-" in the encoding. As shown in FIG. 7, changing the stride of the 7th residual unit of the ResNet encoded "121-111-211" in diagram (a) from 2 to 1, and that of the 8th residual unit from 1 to 2, yields the mutated ResNet, i.e., the ResNet encoded "121-1112-11", as in diagram (d). The stride of a residual unit is determined by the strides of the convolution kernels of the at least two convolutional layers it contains; for example, suppose a residual unit includes two convolutional layers, each with a kernel stride of 1, so the unit's stride is 1; to change the unit's stride to 2, the stride of one of the two convolutional layers, e.g., the first, must be changed to 2.
(4) Randomly insert a residual unit with unchanged channel count into the ResNet; concretely, randomly insert a "1" into the encoding. For example, as shown in FIG. 7, adding a residual unit with unchanged channel count at the 9th position mutates the ResNet encoded "121-111-211" into the ResNet encoded "121-111-2111", as in diagram (e).
(5) Randomly delete a residual unit with unchanged channel count from the ResNet; concretely, randomly delete a "1" from the encoding. For example, as shown in FIG. 7, deleting the 5th residual unit of the ResNet in diagram (a) mutates the ResNet encoded "121-111-211" into the ResNet encoded "121-11-211", as in diagram (f).
Mutation is not limited to the above five modes; the embodiments of this application may also include other mutation modes, for example, randomly adding or removing a "-" in the ResNet's encoding, or randomly deleting or adding a "2"; the specific structure of the mutated ResNet can be inferred from the meaning of the residual-unit encoding and is not repeated here.
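As an illustration of how these mutations reduce to string edits, a minimal Python sketch follows; it is not part of the application text, and the validity check encodes only the constraint mentioned above, that two "-" characters may not be adjacent, plus the assumption that an encoding does not begin or end with "-":

```python
import random

def is_valid(code: str) -> bool:
    return ("--" not in code and not code.startswith("-")
            and not code.endswith("-"))

def mutate_resnet(code: str, rng: random.Random) -> str:
    """Apply one random mutation to a ResNet string encoding."""
    while True:
        chars = list(code)
        op = rng.choice(["1to2", "2to1", "move_dash", "insert1", "delete1"])
        target = {"1to2": "1", "2to1": "2", "move_dash": "-",
                  "insert1": "1", "delete1": "1"}[op]
        idxs = [i for i, c in enumerate(chars) if c == target]
        if not idxs:
            continue                      # mutation not applicable; retry
        i = rng.choice(idxs)
        if op == "1to2":
            chars[i] = "2"
        elif op == "2to1":
            chars[i] = "1"
        elif op == "move_dash":           # move '-' to a random new position
            del chars[i]
            chars.insert(rng.randrange(len(chars) + 1), "-")
        elif op == "insert1":             # insert a channel-preserving unit
            chars.insert(rng.randrange(len(chars) + 1), "1")
        else:                             # delete1: remove one such unit
            del chars[i]
        new_code = "".join(chars)
        if is_valid(new_code) and new_code != code:
            return new_code

print(mutate_resnet("121-111-211", random.Random(0)))  # one mutated encoding
```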
In some embodiments the neural network may be a convolutional neural network, whose basic unit may be called a layer structure; the CNN consists of convolutional layers, pooling layers, and fully connected layers, where the number of fully connected layers may be preset or variable, which is not limited here. The network structure of the CNN is encoded with the parameters of the layer structures, where a layer structure may be a convolutional layer or a pooling layer. For example, ordered symbols denote the order of the layer structures in the CNN: a layer structure encoded "1" is a convolutional layer with unchanged channel count; a layer structure encoded "2" is a convolutional layer with doubled channel count; a layer structure preceded by "-" has the stride of its convolution kernels changed from 1 to 2; and a layer structure encoded "3", "4", or "5" is a pooling layer that halves the feature-map size, where a pooling layer encoded "3" uses average pooling, one encoded "4" uses max pooling, and one encoded "5" uses LP pooling. The embodiments of this application take, as an example, a pooling layer that pools 2×2 regions of the input image, reducing the size of the feature maps produced by convolution to 1/4 of the original; in other implementations of this application, other kinds of pooling layers may also be encoded, and the selected pooling regions may also be distinguished by the encoding, which is not limited here.
For example, the network structure of the CNN encoded "121-113-211" is shown in FIG. 8, where the width of a layer structure reflects its channel count and its length reflects the size of its feature maps.
A combination of several "1", "2", "-", "3", "4", and "5" symbols thus yields a CNN; the process of the computing device randomly generating a CNN can be converted into the process of randomly generating a string. It should be understood that constraints must be added, or the generated strings filtered, to remove CNNs that do not meet the requirements: for example, two "-" characters cannot be adjacent, a "-" cannot be immediately followed by a "3", and pooling layers cannot appear consecutively, i.e., "3", "4", and "5" are not adjacent.
The computing device can mutate one CNN into multiple mutated CNNs, each mutated network being produced by one mutation of the CNN. In the embodiments of this application, one mutation of a CNN may specifically be one of the following implementations (a validity filter for such encodings is sketched after this list):
(1) Randomly change the channel count of one convolutional layer in the CNN from unchanged to doubled; concretely, randomly change one "1" in the CNN's encoding to "2". For example, as shown in FIG. 8, changing the 8th layer structure of the CNN in diagram (a) from unchanged to doubled channel count mutates the CNN encoded "121-113-211" into the CNN encoded "121-113-212", as in diagram (b); it should be understood that the channel counts of all layer structures after the mutated one are doubled relative to before. The channel counts of multiple convolutional layers in the CNN may also be doubled at random, which is not limited in the embodiments of this application.
(2) Randomly change the channel count of one convolutional layer in the CNN from doubled to unchanged; concretely, randomly change one "2" in the encoding to "1". For example, as shown in FIG. 8, changing the 7th layer structure of the CNN in diagram (a) from doubled to unchanged channel count mutates the CNN encoded "121-113-211" into the CNN encoded "121-113-111", as in diagram (c); the channel counts of all layer structures after the 7th are reduced by 50% relative to before.
(3) Randomly swap the positions of two convolutional layers in the CNN; concretely, randomly swap the positions of one symbol "1" and one symbol "2" in the encoding to obtain the mutated encoding. For example, swapping the first two layer structures of the CNN encoded "121-113-211" yields the mutated encoding "211-113-211".
(4) Randomly change the stride of one convolutional layer in the CNN from 2 to 1 and that of another from 1 to 2; concretely, randomly move one symbol "-" in the encoding from before one convolutional layer to before another. As shown in FIG. 8, changing the stride of the 3rd layer structure of the CNN encoded "121-113-211" in diagram (a) from 2 to 1, and that of the 4th layer structure from 1 to 2, yields the mutated CNN, i.e., the CNN encoded "121-1132-11", as in diagram (d).
(5) Randomly double the stride of one or more convolutional layers in the CNN; concretely, randomly insert one or more symbols "-" into the encoding, such that the resulting encoding contains no two adjacent "-" symbols and no "-" immediately followed by a "3".
(6) Randomly swap the positions of a convolutional layer and a pooling layer in the CNN; it should be understood that the pooling layer must not end up at the initial position of the CNN. For example, swapping the 5th convolutional layer and the 1st pooling layer of the CNN encoded "121-113-211" yields the mutated CNN encoded "121-131-211".
(7) Randomly insert a convolutional layer into the CNN; the inserted layer may be one with unchanged channel count or one with doubled channel count; concretely, randomly insert a "1" or "2" into the encoding. For example, as shown in FIG. 8, inserting a "1" after the 5th convolutional layer in the encoding of the CNN in diagram (a) mutates the CNN encoded "121-113-211" into the CNN encoded "121-1131-211", as in diagram (e).
(8) Randomly delete a convolutional layer from the CNN; the deleted layer may be one with unchanged or doubled channel count; concretely, randomly delete a "1" or "2" from the encoding. For example, deleting the 8th layer structure mutates the CNN encoded "121-113-211" into the CNN encoded "121-113-21". One or more "1" symbols, or one or more "2" symbols, may be deleted at random from the encoding.
(9) Randomly add one or more pooling layers to the CNN, or randomly delete one or more pooling layers. It should be understood that a pooling layer is never added immediately before or after another pooling layer; that is, the mutated CNN never has two adjacent pooling layers. Concretely, randomly delete a "3" from the encoding, or randomly add a "3", to obtain the mutated CNN; it should be understood that CNNs obtained by adding a "3" immediately before or after another "3" must be filtered out. As shown in FIG. 8, adding a pooling layer after the 8th layer structure of the CNN in diagram (a) mutates the CNN encoded "121-113-211" into the CNN encoded "121-113-2131", as in diagram (f).
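The constraints above can be captured in a small validity filter applied after each random generation or mutation of a CNN encoding; the following Python sketch is illustrative only, with the rule set taken from the constraints stated in this embodiment (the first-position pooling rule comes from mutation (6)):

```python
def is_valid_cnn_code(code: str) -> bool:
    """Check a CNN string encoding against the constraints stated above."""
    pools = set("345")
    if not code or code[0] in pools:      # a pooling layer cannot come first
        return False
    for a, b in zip(code, code[1:]):
        if a == b == "-":                 # two '-' cannot be adjacent
            return False
        if a == "-" and b == "3":         # '-' is never immediately followed by '3'
            return False
        if a in pools and b in pools:     # pooling layers never appear consecutively
            return False
    return True

print(is_valid_cnn_code("121-113-211"))  # True
print(is_valid_cnn_code("121--13-211"))  # False: adjacent '-'
print(is_valid_cnn_code("121-313-211"))  # False: '-' immediately before a pooling '3'
```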
Mutation is not limited to the above operations; a CNN may also include other mutation operations, which are not elaborated here.
It should be noted that a residual unit in the ResNet above may include one or more of ordinary convolutional layers, dilated convolutional layers, depthwise separable convolutional layers, fully connected layers, and so on; a convolutional layer in the CNN above may be an ordinary convolutional layer, a dilated convolutional layer, a depthwise separable convolutional layer, and so on. The internal network structures of the residual units in a ResNet may be the same or different, and the kinds of convolutional layers in a CNN may be the same or different, which is not limited in the embodiments of this application.
In one implementation, the residual units of the ResNet may include only ordinary convolutional layers, or a combination of ordinary convolutional and fully connected layers, and the convolutional layers of the CNN may be ordinary convolutional layers, excluding dilated and depthwise separable convolutional layers. This avoids the situation in which an NPU chip does not support dilated or depthwise separable convolutions, which would make the neural network search method inapplicable to the hardware platform; the neural network search method provided in this application can thus be applied universally across devices and platforms.
Embodiment 2
After the computing device obtains the target neural network model, it can send the target neural network model to the client device or user equipment, which can then implement the corresponding functions based on the model.
In one embodiment, in Scenario A shown in FIG. 2A, the neural network search method of the embodiments of this application can be applied to the autonomous driving field. For example, a vehicle obtains images through a camera to observe obstacles in its surroundings in real time; the vehicle, or a device communicating with the vehicle, can then make decisions for safe driving based on the recognized objects in the surroundings. FIG. 9A shows an object recognition method provided by an embodiment of this application, which may be executed by the vehicle in FIG. 2A, or by the client device 31 or user equipment 34 in FIG. 3; the method includes, but is not limited to, the following steps:
S902: Obtain an image to be recognized.
S904: Input the image to be recognized into the object recognition neural network to obtain the object type corresponding to the image.
In some embodiments, the image to be recognized may be an image of the surroundings obtained by the vehicle through its camera, and the object recognition neural network processes the image to recognize objects in the vehicle's surroundings.
The object recognition neural network may be a network determined in a search space through the neural network search method described in Embodiment 1; in this case, each sample in the data set of Embodiment 1 includes a sample image and the object type corresponding to that sample image.
The search space is constructed from basic units and the parameters of the basic units, and is used to search for the object recognition neural network. The parameters of a basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter. The basic unit performs a first operation and a second operation on the feature maps input to it, the feature maps here being those of the image to be recognized: the first operation doubles the number of input feature maps or keeps it unchanged, and the second operation changes the size of the input feature maps from an original first size to a second size or keeps the first size unchanged, the first size being larger than the second size, e.g., twice the second size; size here refers to the side length of a feature map. The channel-count parameter indicates the change in the number of feature maps after processing by the basic unit, e.g., doubled or unchanged; the size parameter indicates the change in their size, e.g., halved or unchanged.
In one implementation, the neural networks in the search space may be ResNets; the basic unit is then also called a residual unit, which may consist of at least two convolutional layers. The residual unit further includes a residual module, which adds the feature maps input to the residual unit to the feature maps obtained after the input is processed by the residual unit, and inputs the sum to the next residual unit. In the embodiments of this application, the neural network can be constructed by encoding the residual units, and the search space expanded through mutation; for the specific implementation, refer to the related description of FIG. 7 above, which is not repeated here.
In one implementation, the neural networks in the search space may be CNNs; the basic unit is then also called a layer structure, which may be a convolutional layer, a pooling layer, and so on. The neural network can be constructed through encoding and the search space expanded through mutation; for the specific implementation, refer to the related description of FIG. 8 above, which is not repeated here.
Embodiment 3
In one embodiment, in Scenario B shown in FIG. 2B, the neural network search method of the embodiments of this application can be applied to the image recognition field. For example, user equipment obtains images through a camera and can then make decisions based on the recognized gestures. FIG. 9B shows a gesture recognition method provided by an embodiment of this application, which may be executed by user equipment such as the monitor, mobile phone, or smart TV in FIG. 2B, or by the client device 31 or user equipment 34 in FIG. 3; the method includes, but is not limited to, the following steps:
S906: Obtain an image to be recognized.
S908: Input the image to be recognized into the gesture recognition neural network to obtain the gesture type corresponding to the image.
Further, the user equipment can perform the operation corresponding to the recognized gesture type; for example, opening a music player when a first gesture is recognized, or, when a call comes in, answering the phone when a second gesture is recognized.
The gesture recognition neural network may be a network determined in a search space through the neural network search method described in Embodiment 1; in this case, each sample in the data set of Embodiment 1 includes a sample image and the gesture type corresponding to that sample image.
As in Embodiment 2, the search space here is constructed from basic units and the parameters of the basic units, and is used to search for the gesture recognition neural network. The parameters of a basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter; the basic unit performs, on the feature maps of the image to be recognized, the first and second operations described in Embodiment 2, and the channel-count and size parameters have the same meaning as there; for example, the first size may be twice the second size, where size refers to the side length of a feature map.
In one implementation, the neural networks in the search space may be ResNets whose residual units include residual modules as described in Embodiment 2; in another implementation, they may be CNNs whose layer structures are convolutional or pooling layers; for the specific implementations, refer to the related descriptions of FIG. 7 and FIG. 8 above, which are not repeated here.
It should be noted that for the descriptions of the scenarios in Embodiments 2 and 3, refer to the related descriptions of Scenario A and Scenario B above respectively, which are not repeated here.
The models obtained by applying the neural network search method of this application are described below in connection with Scenario A and Scenario B.
In FIG. 10A the horizontal axis is the runtime of the architecture on the chip platform, and the vertical axis is the top-1 accuracy on the ImageNet data set. ResNet18 marks the runtime on the chip platform and the top-1 accuracy on ImageNet (trained for 40 epochs) of the expert model; the other points are the best models we found at the same running speed. As can be seen from FIG. 10A, all the models in box 1001 are better than the existing ResNet18 model in both speed and accuracy. Taking the leftmost point in box 1001 as an example: at the same accuracy, the model we searched runs at 4.42 ms per image, while ResNet18 takes 8.11 ms per image, nearly 2x faster. This shows that the search space we designed for the hardware platform can indeed find many architectures that run faster and are more accurate than the expert model. Table 1 compares the runtime, top-1 accuracy, and top-5 accuracy, after full training on ImageNet, of some of the ResNets in box 1001 and the expert model ResNet18:

Model name | Runtime (ms) | Top-1 | Top-5
ResNet18 | 8.113 | 69.70 | 89.30
12-11112-1112 | 4.292 | 69.98 | 89.39
1-21112-111121 | 4.635 | 70.21 | 89.55
12-1111-21121 | 4.430 | 70.32 | 89.59
112-1111-21111112 | 4.644 | 70.45 | 89.68
12-11121-121 | 5.352 | 70.61 | 89.76
112-2-1111121 | 4.921 | 70.82 | 89.90
12-2-11111211 | 5.547 | 71.14 | 90.06
21-111112-1211 | 6.415 | 71.24 | 90.23
21-111121-2111 | 7.690 | 72.04 | 90.60
1211-11112-1111121 | 7.268 | 72.18 | 90.79
Table 1

As Table 1 shows, the models obtained by the neural network search method of this application are, after full training, all faster and better than the original ResNet18. The fastest model cuts the runtime from 8.11 ms to 4.29 ms (a 48% speedup) while still improving accuracy by 0.28%. Moreover, these models use only the common conv1x1 and conv3x3 operations (already present in ResNet18) and no special convolution operations, which is hardware-friendly.
In FIG. 10B the horizontal axis is the parameter count of the model, and the vertical axis is the top-1 accuracy on ImageNet. Points B and C are the expert models ResNet18-1/4 and ResNet18-1/8 respectively; the other points are models obtained by the neural network search method of this application. As can be seen from FIG. 10B, all the models in box 1002 are better than the existing ResNet18-1/8 model in both speed and accuracy, and all the models in box 1003 are better than the ResNet18-1/4 model in both speed and accuracy.
As Scenario A and Scenario B show, the neural network search method provided by the embodiments of this application can effectively improve on the results of expert models in different scenarios; the method therefore has a degree of generality.
The apparatuses and devices involved in the embodiments of this application are introduced below.
FIG. 11 shows a neural network search apparatus provided by an embodiment of this application. The apparatus 1100 may be the computing device 32 in the system shown in FIG. 3 and may include, but is not limited to, the following functional units:
an acquisition module 1110, configured to obtain a data set and N neural networks, N being a positive integer;
an evolution module 1120, configured to perform K evolutions on the N neural networks to obtain the neural networks produced by the K-th evolution, K being a positive integer;
where the evolution module 1120 includes a mutation unit 1121, a first screening unit 1122, and a second screening unit 1123:
the mutation unit 1121 is configured to, in the i-th evolution, mutate the network structures of the neural networks obtained by the (i-1)-th evolution to obtain mutated neural networks;
the first screening unit 1122 is configured to, in the i-th evolution, select from the mutated networks those whose network structure is superior to the networks obtained by the (i-1)-th evolution, obtaining candidate neural networks;
the second screening unit 1123 is configured to, in the i-th evolution, select the networks of the i-th evolution from the set formed by the networks obtained by the (i-1)-th evolution and the candidate networks, according to the P evaluation parameters of each network in the set; the P evaluation parameters are used to evaluate the performance of each network in the set after training and testing with the data set; i and P are positive integers, and 1≤i≤K.
In a possible implementation, the mutation unit 1121 is specifically configured to mutate a first neural network among the networks obtained by the (i-1)-th evolution, performing at least one of the following steps: exchanging the positions of two convolutional layers, doubling the channel count of one or more convolutional layers, doubling the stride of the convolution kernels of one or more convolutional layers, inserting one or more convolutional layers, deleting one or more convolutional layers, inserting one or more pooling layers, or deleting one or more pooling layers, in each case in one or more of the neural networks obtained by the (i-1)-th evolution.
In a possible implementation, the mutation unit 1121 is specifically configured to mutate a first neural network among the networks obtained by the (i-1)-th evolution, performing at least one of the following steps: exchanging the positions of two residual units, doubling the channel count of one or more residual units, doubling the stride of the convolution kernels of one or more residual units, inserting one or more residual units, or deleting one or more residual units, in each case in one or more of the neural networks obtained by the (i-1)-th evolution.
In a possible implementation, the first screening unit 1122 is specifically configured to select, from the networks mutated from the first neural network, those whose network structure is superior to the first neural network; the candidate neural networks include those networks, and the first neural network is any one of the networks obtained by the (i-1)-th evolution.
In a possible implementation, the network structure of a network mutated from the first neural network is superior to that of the first neural network when at least one of the following conditions is met: the channel count of the mutated network is greater than that of the first neural network; the number of convolutional layers in the mutated network is greater than that in the first neural network.
In a possible implementation, the second screening unit 1123 is specifically configured to: perform non-dominated sorting on the networks in the set according to the P evaluation parameters of each; and determine the networks of the i-th evolution to be the networks in the set that are not dominated; where, for a second and a third neural network in the set, if the second is no worse than the third on every one of the P evaluation parameters and better on at least one of them, the second neural network dominates the third.
In a possible implementation, the acquisition module 1110 is specifically configured to: randomly generate M neural networks, M being a positive integer; train and test the M networks with the data set to obtain the P evaluation parameters of each of the M networks; and select N networks from the M networks according to the P evaluation parameters of each, N being no greater than M.
In a possible implementation, the P evaluation parameters include at least one of runtime, accuracy, and parameter count.
It should be noted that for the specific implementation of the above units, refer to the related descriptions in the neural network search method of Embodiment 1, which are not repeated here.
FIG. 12A shows an object recognition apparatus provided by an embodiment of this application. The apparatus 1200 may be the client device 31 or user equipment 34 in the system shown in FIG. 3 and may include, but is not limited to, the following functional units:
an acquisition unit 1210, configured to obtain an image to be recognized, the image being an image of the surroundings of a vehicle;
a recognition unit 1220, configured to input the image to be recognized into the object recognition neural network to obtain the object type corresponding to the image;
where the object recognition neural network is a network determined in a search space through the neural network search method described in Embodiment 1, the search space being constructed from basic units and the parameters of the basic units.
Optionally, the parameters of a basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter.
Optionally, the basic unit performs a first operation and a second operation on the feature maps input to it, the feature maps being those of the image to be recognized; the first operation doubles the number of input feature maps or keeps it unchanged, and the second operation changes the size of the input feature maps from an original first size to a second size or keeps the first size unchanged, the first size being larger than the second size.
It should be noted that for the specific implementation of the above units, refer to the related descriptions in the object recognition method of Embodiment 2, which are not repeated here.
FIG. 12B shows a gesture recognition apparatus provided by an embodiment of this application. The apparatus 1201 may be the client device 31 or user equipment 34 in the system shown in FIG. 3 and may include, but is not limited to, the following functional units:
an acquisition unit 1230, configured to obtain an image to be recognized;
a recognition unit 1240, configured to input the image to be recognized into the gesture recognition neural network to obtain the gesture type in the image;
where the gesture recognition neural network is a network determined in a search space through the neural network search method described in Embodiment 1, the search space being constructed from basic units and the parameters of the basic units.
Optionally, the parameters of a basic unit include at least one of the type of the basic unit, a channel-count parameter, and a size parameter.
Optionally, the basic unit performs the first and second operations described above on the feature maps of the image to be recognized; in one example, the first size is twice the second size.
It should be noted that for the specific implementation of the above units, refer to the related descriptions in the gesture recognition method of Embodiment 3, which are not repeated here.
FIG. 13 is a schematic diagram of the hardware structure of a neural network search apparatus provided by an embodiment of this application. The neural network search apparatus 1300 shown in FIG. 13 (which may specifically be a computer device) may include a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304; the memory 1301, the processor 1302, and the communication interface 1303 are communicatively connected to one another through the bus 1304.
The memory 1301 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1301 may store a program; when the program stored in the memory 1301 is executed by the processor 1302, the processor 1302 and the communication interface 1303 are used to perform all or some of the steps of the neural network search method of the embodiments of this application.
The processor 1302 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, used to execute related programs to implement the functions required by the units in the neural network search apparatus of the embodiments of this application, or to perform all or some of the steps of the neural network search method of method Embodiment 1 of this application.
The processor 1302 may alternatively be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network search method of this application may be completed by an integrated logic circuit of hardware in the processor 1302 or by instructions in the form of software. The processor 1302 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly executed and completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 1301; the processor 1302 reads the information in the memory 1301 and, in combination with its hardware, completes the functions required by the units included in the neural network search apparatus of the embodiments of this application, or performs all or some of the steps of the neural network search method of the method embodiments of this application.
The communication interface 1303 uses a transceiving apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1300 and other devices or a communication network; for example, the data set can be obtained through the communication interface 1303.
The bus 1304 may include a path for transferring information between the components of the apparatus 1300 (for example, the memory 1301, the processor 1302, and the communication interface 1303).
It should be understood that the acquisition module 1110 in the neural network search apparatus 1100 may be equivalent to the communication interface 1303 in the neural network search apparatus 1300, and the evolution module 1120 may be equivalent to the processor 1302.
FIG. 14 is a schematic block diagram of an electronic device according to an embodiment of the present invention. The electronic device 1400 shown in FIG. 14 (which may specifically be a terminal, vehicle, server, or other device) includes a memory 1401, a baseband chip 1402, a radio frequency module 1403, a peripheral system 1404, and a sensor 1405. The baseband chip 1402 includes at least one processor 14021 (e.g., a CPU), a clock module 14022, and a power management module 14023; the peripheral system 1404 includes a camera 14041, an audio module 14042, a touch screen 14043, and the like; further, the sensor 1405 may include a light sensor 14051, an acceleration sensor 14052, a fingerprint sensor 14053, and the like; the modules included in the peripheral system 1404 and the sensor 1405 may be increased or decreased according to actual needs. Any two connected modules may specifically be connected by a bus, which may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
The radio frequency module 1403 may include an antenna and a transceiver (including a modem); the transceiver is used to convert the electromagnetic waves received by the antenna into an electric current and finally into digital signals, and, correspondingly, to convert the digital signals to be output by the apparatus 1400 into an electric current and then into electromagnetic waves, finally emitting the electromagnetic waves into free space through the antenna. The radio frequency module 1403 may further include at least one amplifier for amplifying signals. Generally, wireless transmission can be performed through the radio frequency module 1403, such as Bluetooth transmission, Wireless Fidelity (Wi-Fi) transmission, third-generation mobile communication technology (3G) transmission, or fourth-generation mobile communication technology (4G) transmission.
The touch screen 14043 may be used to display information input by the user or to show information to the user; it may include a touch panel and a display panel. Optionally, the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, the operation is transferred to the processor 14021 to determine the type of the touch event, and the processor 14021 then provides a corresponding visual output on the display panel according to the type of the touch event. The touch panel and the display panel here implement the input and output functions of the apparatus 1400 as two independent components, but in some embodiments they may be integrated to implement those input and output functions.
The camera 14041 is used to obtain images for input to the object recognition neural network; it should be understood that in this case the object recognition neural network is a deep neural network used to process images.
The audio input module 14042 may specifically be a microphone and can acquire speech. In this embodiment, the apparatus 1400 can convert speech into text and then input the text into the compressed neural network; it should be understood that in this case the compressed neural network is a deep neural network used to process text, for example, the neural network obtained after compressing the text recognition network in scenario C.
The sensor 1405 may include a light sensor 14051, an acceleration sensor 14052, and a fingerprint sensor 14053: the light sensor 14051 is used to obtain the light intensity of the environment, the acceleration sensor 14052 (such as a gyroscope) can obtain the motion state of the apparatus 1400, and the fingerprint sensor 14053 can receive input fingerprint information; after sensing a relevant signal, the sensor 1405 quantizes the signal into a digital signal and transfers it to the processor 14021 for further processing.
The memory 1401 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory. The memory 1401 may optionally further include at least one storage apparatus located away from the aforementioned processor 14021; it may specifically include an instruction storage area and a data storage area, where the instruction storage area may store programs such as an operating system, a user interface program, and a communication interface program, and the data storage area may store the data required for performing related operations, or the data generated by performing related operations.
The processor 14021 is the control center of the apparatus 1400; it connects the various parts of the entire device using various interfaces and lines, and executes the various functions of the apparatus 1400 by running the programs stored in the memory 1401 and calling the data stored in the memory 1401. Optionally, the processor 14021 may include one or more application processors, which mainly handle the operating system, the user interface, application programs, and the like. In the embodiments of this application, the processor 14021 reads the information in the memory 1401 and, in combination with its hardware, completes the functions required by the units included in the object recognition apparatus 1200 or the gesture recognition apparatus 1201 of the embodiments of this application, or performs the object recognition method or gesture recognition method of the method embodiments of this application.
The communication function of the apparatus 1400 is realized through the radio frequency module 1403; specifically, the apparatus 1400 can receive the target neural network or other data sent by the client device 31 or the computing device 32 in FIG. 3.
For the specific implementation of the functional units described in FIG. 14, refer to the related descriptions in Embodiment 2 or Embodiment 3 above, which are not repeated in the embodiments of this application.
It should be noted that, although the apparatuses 1300 and 1400 shown in FIG. 13 and FIG. 14 show only a memory, a processor, and a communication interface, in a specific implementation process those skilled in the art should understand that the apparatuses 1300 and 1400 also include the other devices necessary for normal operation; at the same time, according to specific needs, those skilled in the art should understand that the apparatuses 1300 and 1400 may also include hardware devices implementing other additional functions; furthermore, those skilled in the art should understand that the apparatuses 1300 and 1400 may also include only the devices necessary for implementing the embodiments of this application, without including all the devices shown in FIG. 13 and FIG. 14.
It can be understood that the apparatus 1300 is equivalent to the computing device 32 in FIG. 3, or a node in the computing device 32, and the apparatus 1400 is equivalent to the client device 31 or the user equipment 34 in FIG. 3. A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware; whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, corresponding to a tangible medium such as a data storage medium, or a communication medium including any medium that facilitates transferring a computer program from one place to another (for example, according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium accessible by one or more computers or one or more processors to retrieve the instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can store the desired program code in the form of instructions or data structures and can be accessed by a computer; and any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers; combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described in connection with the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec; moreover, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this application to emphasize the functional aspects of an apparatus configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units; rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
The above are merely exemplary specific implementations of this application, and the protection scope of this application is not limited thereto; any variation or replacement readily conceivable by those skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. 一种神经网络的搜索方法,其特征在于,包括:
    计算设备获取数据集和N个神经网络,N为正整数;
    所述计算设备对所述N个神经网络进行K次演化,得到第K次演化得到的神经网络,K为正整数;
    其中,第i次演化包括:
    所述计算设备对第i-1次演化得到的神经网络的网络结构进行变异,得到变异后的神经网络;
    所述计算设备从所述变异后的神经网络中筛选出网络结构优于所述第i-1次演化得到的神经网络的神经网络,得到候选神经网络;
    所述计算设备根据所述第i-1次演化得到的神经网络和所述候选神经网络的集合中每一个神经网络对应的P个评价参数,从所述集合中筛选出第i次演化得到的神经网络;其中,所述P个评价参数用于评价所述集合中每一个神经网络的通过所述数据集训练和测试后的神经网络的性能,i、P为正整数,1≤i≤K。
  2. 如权利要求1所述的方法,其特征在于,所述计算设备对第i-1次演化得到的神经网络的网络结构进行变异,包括如下至少一个步骤:
    将所述第i-1次演化得到的神经网络中的一个或多个神经网络中的两层卷积层的位置进行交换;
    将所述第i-1次演化得到的神经网络中的一个或多个神经网络中的一层或多层卷积层的通道数加倍;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中的一层或多层卷积层的卷积核的步长加倍;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中插入一层或多层卷积层;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中删除一层或多层卷积层;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中插入一层或多层池化层;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中删除一层或多层池化层。
  3. 如权利要求1所述的方法,其特征在于,所述第i-1次演化得到的神经网络为深度残差网络,所述计算设备对第i-1次演化得到的神经网络的网络结构进行变异,包括如下至少一个步骤:
    将所述第i-1次演化得到的神经网络中的一个或多个神经网络中的两个残差单元的位置进行交换;
    将所述第i-1次演化得到的神经网络中的一个或多个神经网络中的一个或多个残差单元的通道数加倍;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中的一个或多个残差单元的卷积核的步长加倍;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中插入一个或多个残差单元;
    在所述第i-1次演化得到的神经网络中的一个或多个神经网络中删除一个或多个残差单元。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述计算设备从所述变异后的神经网络中筛选出网络结构优于所述第i-1次演化得到的神经网络的神经网络,得到候选神经网络,包括:
    所述计算设备从第一神经网络变异后的神经网络中筛选出网络结构优于所述第一神经网络的神经网络,所述候选神经网络包括所述第一神经网络变异后的神经网络中网络结构优于所述第一神经网络的神经网络,所述第一神经网络为所述第i-1次演化得到的神经网络中的任意一个神经网络。
  5. 如权利要求4所述的方法,其特征在于,在满足如下条件中的至少一种时,所述第一神经网络变异后的神经网络的网络结构优于所述第一神经网络的网络结构:
    所述第一神经网络变异后的神经网络的通道数大于所述第一神经网络的通道数;
    所述第一神经网络变异后的神经网络中卷积层的层数大于所述第一神经网络中卷积层的层数。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述计算设备根据所述第i-1次演化得到的神经网络和所述候选神经网络的集合中每一个神经网络对应的P个评价参数,从所述集合中筛选出第i次演化得到的神经网络,具体包括:
    所述计算设备根据所述集合中每一个神经网络对应的P个评价参数对所述集合中的神经网络进行非主导排序;
    所述计算设备确定所述第i次演化得到的神经网络为所述集合中不被主导的神经网络;
    其中,第二神经网络和第三神经网络为所述集合中的两个神经网络,若针对所述P个评价参数中每一个评价参数所述第二神经网络都不劣于所述第三神经网络且针对所述P个评价参数中的至少一个评价参数所述第二神经网络优于所述第三神经网络时,则所述第二神经网络主导所述第三神经网络。
  7. 如权利要求1-6任一项所述的方法,其特征在于,所述计算设备获取N个神经网络,具体包括:
    所述计算设备随机生成M个神经网络,M为正整数;
    所述计算设备通过所述数据集对所述M个神经网络分别进行训练和测试,得到所述M 个神经网络中每一个神经网络对应的P个评价参数;
    所述计算设备根据所述M个神经网络的中每一个神经网络对应的P个评价参数,从所述M个神经网络中选出N个神经网络,N不大于M。
  8. 如权利要求1-7任一项所述的方法,其特征在于,所述P个评价参数包括运行时间、精确度、参数量中的至少一个。
  9. A neural network search apparatus, comprising:
    an obtaining module, configured to obtain a data set and N neural networks, wherein N is a positive integer;
    an evolution module, configured to perform K evolutions on the N neural networks to obtain neural networks resulting from the K-th evolution, wherein K is a positive integer;
    wherein the evolution module comprises a mutation unit, a first selection unit, and a second selection unit, wherein
    the mutation unit is configured to: during the i-th evolution, mutate network structures of the neural networks resulting from the (i-1)-th evolution to obtain mutated neural networks, wherein the neural networks resulting from the 0-th evolution are the N neural networks;
    the first selection unit is configured to: during the i-th evolution, select, from the mutated neural networks, neural networks whose network structures are superior to those of the neural networks resulting from the (i-1)-th evolution, to obtain candidate neural networks;
    the second selection unit is configured to: during the i-th evolution, select the neural networks resulting from the i-th evolution from a set consisting of the neural networks resulting from the (i-1)-th evolution and the candidate neural networks, based on P evaluation parameters corresponding to each neural network in the set, wherein the P evaluation parameters are used to evaluate the performance of each neural network in the set after it has been trained and tested on the data set, i and P are positive integers, and 1≤i≤K.
  10. The apparatus according to claim 9, wherein the mutation unit is specifically configured to perform at least one of the following steps:
    swapping the positions of two convolutional layers in one or more of the neural networks resulting from the (i-1)-th evolution;
    doubling the number of channels of one or more convolutional layers in one or more of the neural networks resulting from the (i-1)-th evolution;
    doubling the stride of the convolution kernels of one or more convolutional layers in one or more of the neural networks resulting from the (i-1)-th evolution;
    inserting one or more convolutional layers into one or more of the neural networks resulting from the (i-1)-th evolution;
    deleting one or more convolutional layers from one or more of the neural networks resulting from the (i-1)-th evolution;
    inserting one or more pooling layers into one or more of the neural networks resulting from the (i-1)-th evolution;
    deleting one or more pooling layers from one or more of the neural networks resulting from the (i-1)-th evolution.
  11. The apparatus according to claim 9, wherein the neural networks resulting from the (i-1)-th evolution are deep residual networks, and the mutation unit is specifically configured to perform at least one of the following steps:
    swapping the positions of two residual units in one or more of the neural networks resulting from the (i-1)-th evolution;
    doubling the number of channels of one or more residual units in one or more of the neural networks resulting from the (i-1)-th evolution;
    doubling the stride of the convolution kernels of one or more residual units in one or more of the neural networks resulting from the (i-1)-th evolution;
    inserting one or more residual units into one or more of the neural networks resulting from the (i-1)-th evolution;
    deleting one or more residual units from one or more of the neural networks resulting from the (i-1)-th evolution.
  12. The apparatus according to any one of claims 9 to 11, wherein the first selection unit is specifically configured to:
    select, from neural networks obtained by mutating a first neural network, neural networks whose network structures are superior to that of the first neural network, wherein the candidate neural networks comprise those neural networks, among the ones obtained by mutating the first neural network, whose network structures are superior to that of the first neural network, and the first neural network is any one of the neural networks resulting from the (i-1)-th evolution.
  13. The apparatus according to claim 12, wherein the network structure of a neural network obtained by mutating the first neural network is superior to the network structure of the first neural network when at least one of the following conditions is met:
    the number of channels of the neural network obtained by mutating the first neural network is greater than the number of channels of the first neural network;
    the number of convolutional layers in the neural network obtained by mutating the first neural network is greater than the number of convolutional layers in the first neural network.
  14. The apparatus according to any one of claims 9 to 13, wherein the second selection unit is specifically configured to:
    perform non-dominated sorting on the neural networks in the set based on the P evaluation parameters corresponding to each neural network in the set;
    determine the neural networks resulting from the i-th evolution to be the neural networks in the set that are not dominated;
    wherein, for a second neural network and a third neural network that are two neural networks in the set, the second neural network dominates the third neural network if the second neural network is no worse than the third neural network with respect to every one of the P evaluation parameters and is better than the third neural network with respect to at least one of the P evaluation parameters.
  15. The apparatus according to any one of claims 9 to 14, wherein the obtaining module is specifically configured to:
    randomly generate M neural networks, wherein M is a positive integer;
    train and test each of the M neural networks on the data set to obtain the P evaluation parameters corresponding to each of the M neural networks;
    select the N neural networks from the M neural networks based on the P evaluation parameters corresponding to each of the M neural networks, wherein N is not greater than M.
  16. The apparatus according to any one of claims 9 to 15, wherein the P evaluation parameters comprise at least one of running time, accuracy, and number of parameters.
  17. A neural network search apparatus, comprising a processor and a memory, wherein the memory is configured to store a program and the processor executes the program stored in the memory; when the program stored in the memory is executed, the neural network search apparatus is caused to implement the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, wherein the computer-readable medium is configured to store computer-executable instructions, and the computer-executable instructions, when invoked by a computer, cause the computer to implement the method according to any one of claims 1 to 8.
  19. An object recognition method, comprising:
    obtaining a to-be-recognized image;
    inputting the to-be-recognized image into an object recognition neural network to obtain an object type corresponding to the to-be-recognized image;
    wherein the object recognition neural network is a network determined in a search space through the neural network search method according to any one of claims 1 to 8, and the search space is constructed from basic units and parameters of the basic units.
  20. The method according to claim 19, wherein the parameters of the basic unit comprise at least one of a type of the basic unit, a channel-number parameter, and a size parameter.
  21. The method according to claim 19, wherein the basic unit is configured to perform a first operation and a second operation on feature maps input to the basic unit, the feature maps being feature maps of the to-be-recognized image; the first operation is used to double, or keep unchanged, the number of feature maps input to the basic unit; and the second operation is used to change the size of the feature maps input to the basic unit from an original first size to a second size, or to keep the first size unchanged, wherein the first size is larger than the second size.
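Concretely, the basic unit of claim 21 makes two independent choices: double the number of feature maps or keep it, and shrink their spatial size or keep it. A sketch in PyTorch follows; the 3×3 convolution, the use of stride 2 for the size reduction, and the class name BasicUnit are all illustrative assumptions, not the claimed design.

    import torch.nn as nn

    class BasicUnit(nn.Module):
        """Hypothetical basic unit: the first operation optionally doubles
        the channel count; the second optionally reduces the spatial size
        (second size < first size, realized here with stride 2)."""
        def __init__(self, in_channels, double_channels=False, reduce_size=False):
            super().__init__()
            out_channels = in_channels * 2 if double_channels else in_channels
            stride = 2 if reduce_size else 1
            self.conv = nn.Conv2d(in_channels, out_channels,
                                  kernel_size=3, stride=stride, padding=1)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.conv(x))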
  22. An object recognition apparatus, comprising a processor and a memory, wherein the memory is configured to store a program and the processor executes the program stored in the memory; when the program stored in the memory is executed, the object recognition apparatus is caused to implement the method according to any one of claims 19 to 21.
PCT/CN2020/126795 2019-11-30 2020-11-05 Neural network search method, apparatus, and device WO2021103977A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/826,873 US20220292357A1 (en) 2019-11-30 2022-05-27 Neural Network Search Method, Apparatus, And Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911209275.5A CN112884118A (zh) 2019-11-30 2019-11-30 神经网络的搜索方法、装置及设备
CN201911209275.5 2019-11-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/826,873 Continuation US20220292357A1 (en) 2019-11-30 2022-05-27 Neural Network Search Method, Apparatus, And Device

Publications (1)

Publication Number Publication Date
WO2021103977A1 true WO2021103977A1 (zh) 2021-06-03

Family

ID=76039379

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126795 WO2021103977A1 (zh) Neural network search method, apparatus, and device

Country Status (3)

Country Link
US (1) US20220292357A1 (zh)
CN (1) CN112884118A (zh)
WO (1) WO2021103977A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240055B (zh) * 2021-06-18 2022-06-14 桂林理工大学 Pigmented skin lesion image classification method based on macro-operation mutation neural architecture search


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6470261B1 (en) * 1998-07-31 2002-10-22 Cet Technologies Pte Ltd Automatic freeway incident detection system and method using artificial neural network and genetic algorithms
US6553357B2 (en) * 1999-09-01 2003-04-22 Koninklijke Philips Electronics N.V. Method for improving neural network architectures using evolutionary algorithms
US20190138901A1 (en) * 2017-11-06 2019-05-09 The Royal Institution For The Advancement Of Learning/Mcgill University Techniques for designing artificial neural networks
CN108875904A (zh) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus, and computer-readable storage medium
DE102018109835A1 (de) * 2018-04-24 2019-10-24 Albert-Ludwigs-Universität Freiburg Method and apparatus for determining a network configuration of a neural network
CN108960411A (zh) * 2018-06-27 2018-12-07 郑州云海信息技术有限公司 Convolutional neural network adjustment and related apparatus
CN108985386A (zh) * 2018-08-07 2018-12-11 北京旷视科技有限公司 Method for obtaining an image processing model, image processing method, and corresponding apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365557A1 (en) * 2016-03-09 2018-12-20 Sony Corporation Information processing method and information processing apparatus
US20190057309A1 (en) * 2016-04-28 2019-02-21 Sony Corporation Information processing apparatus and information processing method
CN108021983A (zh) * 2016-10-28 2018-05-11 Google LLC Neural architecture search
US20190122119A1 (en) * 2017-10-25 2019-04-25 SparkCognition, Inc. Adjusting automated neural network generation based on evaluation of candidate neural networks
US20190180186A1 (en) * 2017-12-13 2019-06-13 Sentient Technologies (Barbados) Limited Evolutionary Architectures For Evolution of Deep Neural Networks
CN108334949A (zh) * 2018-02-11 2018-07-27 浙江工业大学 Fast evolution method for optimizing the structure of a deep convolutional neural network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115506783A (zh) * 2021-06-21 2022-12-23 中国石油化工股份有限公司 Lithology identification method
CN117668701A (zh) * 2024-01-30 2024-03-08 云南迅盛科技有限公司 Artificial intelligence (AI) machine learning system and method
CN117668701B (zh) * 2024-01-30 2024-04-12 云南迅盛科技有限公司 Artificial intelligence (AI) machine learning system and method

Also Published As

Publication number Publication date
CN112884118A (zh) 2021-06-01
US20220292357A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
WO2020200213A1 (zh) Image generation method, neural network compression method, and related apparatus and device
WO2021103977A1 (zh) Neural network search method, apparatus, and device
WO2020221200A1 (zh) Neural network construction method, image processing method, and apparatus
WO2021078027A1 (zh) Method and apparatus for constructing a network structure optimizer, and computer-readable storage medium
WO2022068623A1 (zh) Model training method and related device
WO2022116933A1 (zh) Model training method, data processing method, and apparatus
WO2022042713A1 (zh) Deep learning training method and apparatus for a computing device
WO2021022521A1 (zh) Data processing method, and method and device for training a neural network model
WO2021218517A1 (zh) Method for obtaining a neural network model, image processing method, and apparatus
CN113326930B (zh) Data processing method, neural network training method, and related apparatus and device
WO2024041479A1 (zh) Data processing method and apparatus
WO2021190296A1 (zh) Dynamic gesture recognition method and device
CN107229942A (zh) Fast convolutional neural network classification method based on multiple classifiers
WO2023231794A1 (zh) Neural network parameter quantization method and apparatus
JP2023510566A (ja) Adaptive search method and apparatus for neural networks
CN110795618B (zh) Content recommendation method, apparatus, device, and computer-readable storage medium
WO2022111617A1 (zh) Model training method and apparatus
CN112487217A (zh) Cross-modal retrieval method, apparatus, device, and computer-readable storage medium
WO2021175278A1 (zh) Model update method and related apparatus
WO2022012668A1 (zh) Training set processing method and apparatus
WO2023231954A1 (zh) Data denoising method and related device
WO2023207487A1 (zh) Circuit routing determination method and related device
CN116368796A (zh) Data processing method and apparatus
CN113191479A (zh) Joint learning method, system, node, and storage medium
WO2022063076A1 (zh) Method and apparatus for identifying adversarial examples

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20893961

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20893961

Country of ref document: EP

Kind code of ref document: A1