CN111695673A - Method for training neural network predictor, image processing method and device

Method for training neural network predictor, image processing method and device

Info

Publication number
CN111695673A
Authority
CN
China
Prior art keywords
network structure
neural network
network
feature vector
predictor
Legal status
Pending
Application number
CN202010387976.4A
Other languages
Chinese (zh)
Inventor
许奕星 (Xu Yixing)
唐业辉 (Tang Yehui)
王云鹤 (Wang Yunhe)
许春景 (Xu Chunjing)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202010387976.4A
Publication of CN111695673A
Priority to PCT/CN2021/088254 (WO2021227787A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N 3/00: Computing arrangements based on biological models
        • G06N 3/02: Neural networks
        • G06N 3/04: Architecture, e.g. interconnection topology
        • G06N 3/045: Combinations of networks
        • G06N 3/08: Learning methods
        • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
        • G06F 18/00: Pattern recognition
        • G06F 18/20: Analysing
        • G06F 18/24: Classification techniques
        • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The application relates to a method for training a neural network predictor, an image processing method, and an apparatus in the field of artificial intelligence. The method for training the neural network predictor includes: acquiring a first network structure of a first neural network and a second network structure of a second neural network, where the first network structure is a labeled network structure and the label indicates the performance of the first network structure; acquiring the similarity between the first network structure and the second network structure; and training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label, where the neural network predictor is used to predict the performance of a network structure. In the method of the embodiments of the application, the neural network predictor is trained using the relationships among network structures, so the prediction accuracy of the trained neural network predictor can be improved even when only a small amount of labeled data is used.

Description

Method for training neural network predictor, image processing method and device
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method for training a neural network predictor, an image processing method, and an apparatus.
Background
Computer vision is an integral part of intelligent/autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and the military. It studies how to use cameras/video cameras and computers to acquire the data and information about a photographed object that we need. Figuratively speaking, the computer is given eyes (a camera/video camera) and a brain (algorithms) so that it can recognize, track, and measure targets in place of human eyes, enabling the computer to perceive its environment. Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of making an artificial system "perceive" from images or multidimensional data. Generally, computer vision uses various imaging systems in place of the visual organs to obtain input information, and then uses the computer in place of the brain to process and interpret that information. The ultimate research goal of computer vision is to enable computers to observe and understand the world visually, as humans do, and to adapt to the environment autonomously.
With the rapid development of artificial intelligence technology, neural networks (e.g., convolutional neural networks) have been widely used in the field of computer vision. The performance of a neural network is often related to its network structure. At present, the network structure of a neural network may be determined by a neural architecture search (NAS) method, for example, by searching a preset search space for a network structure that meets the performance requirements of a specific task.
However, it is very time consuming to determine whether the performance of a network structure meets the performance requirements. Currently, one approach is to use a neural network predictor to predict the performance of a network structure on a given data set.
Therefore, how to improve the prediction accuracy of the neural network predictor has become a technical problem that urgently needs to be solved.
Disclosure of Invention
The application provides a method for training a neural network predictor, an image processing method, and an image processing apparatus, which help to improve the prediction accuracy of the neural network predictor.
In a first aspect, a method of training a neural network predictor is provided, the method comprising:
acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a labeled network structure and the label indicates the performance of the first network structure; acquiring the similarity between the first network structure and the second network structure; and training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label, wherein the neural network predictor is used to predict the performance of a network structure.
In the embodiments of the present application, the relationship between the first network structure and the second network structure (e.g., their similarity) is used to assist in training the neural network predictor, so the training effect can be improved with only a small amount of labeled data (e.g., at least one labeled network structure); that is, the prediction accuracy of the trained neural network predictor can be improved.
With reference to the first aspect, in certain implementations of the first aspect, the obtaining the similarity between the first network structure and the second network structure includes: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and acquiring the similarity according to the first feature vector and the second feature vector.
In the embodiments of the present application, the similarity is obtained from the first feature vector and the second feature vector, so it describes the relationship between the two network structures more accurately; using this similarity therefore further improves the training effect of the neural network predictor, i.e., the prediction accuracy of the trained predictor.
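As a concrete illustration of this step, the sketch below computes a similarity score from two feature vectors. It is a minimal sketch assuming PyTorch, a Euclidean distance, and an exponential mapping to (0, 1]; none of these specific choices are mandated by the application, which only requires that the similarity be derived from the two feature vectors.

```python
import torch

def structure_similarity(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Similarity of two network-structure feature vectors (illustrative)."""
    dist = torch.norm(z1 - z2, p=2, dim=-1)  # Euclidean distance between embeddings
    return torch.exp(-dist)                  # closer vectors give similarity near 1
```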
With reference to the first aspect, in certain implementations of the first aspect, the obtaining a first feature vector according to the first network structure includes: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain the feature vector representing the network structure; the obtaining a second feature vector according to the second network structure includes: and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded through the encoder, and the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a Recurrent Neural Network (RNN).
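For instance, a minimal RNN encoder along these lines could look as follows. This is a sketch under assumed conventions: the network structure is serialized as a sequence of operation IDs, and the vocabulary size, embedding width, and hidden size are illustrative, not taken from the application.

```python
import torch
import torch.nn as nn

class StructureEncoder(nn.Module):
    """Hypothetical RNN encoder: network structure -> feature vector."""

    def __init__(self, num_ops: int = 16, emb_dim: int = 32, hid_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_ops, emb_dim)  # one ID per candidate operation
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, op_seq: torch.Tensor) -> torch.Tensor:
        # op_seq: (batch, seq_len) integer operation IDs describing the structure
        _, h = self.rnn(self.embed(op_seq))
        return h.squeeze(0)                          # (batch, hid_dim) feature vectors
```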
With reference to the first aspect, in certain implementations of the first aspect, the encoder is obtained by training: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; training the encoder according to a difference between the second network structure and the third network structure.
In this embodiment, the second feature vector is decoded by a decoder to obtain a third network structure, and the encoder can then be conveniently trained according to the difference between the second network structure and the third network structure, without any labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on a difference between the second network structure and the third network structure.
For example, if the second network structure is expected to be as consistent as possible with the third network structure, the difference between the two structures can be taken as a loss value, and the encoder (and the decoder) can then be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, because this training requires no manual operation, the training of the encoder (and the decoder) can be more automated.
Further, training the encoder with such a learning method improves the accuracy of the feature vectors it extracts (for the first and the second network structure); that is, different feature vectors can more accurately reflect the characteristics of different network structures (for example, the computing power of different network structures).
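A minimal sketch of this unsupervised reconstruction training is shown below, reusing the hypothetical StructureEncoder above. The decoder head, sequence length, and cross-entropy reconstruction loss are illustrative assumptions; the application only requires that the difference between the second and the (decoded) third network structure drive the training.

```python
import torch
import torch.nn as nn

class StructureDecoder(nn.Module):
    """Hypothetical decoder: feature vector -> network structure (op IDs)."""

    def __init__(self, num_ops: int = 16, hid_dim: int = 64, seq_len: int = 8):
        super().__init__()
        self.seq_len, self.num_ops = seq_len, num_ops
        self.head = nn.Linear(hid_dim, seq_len * num_ops)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.head(z).view(-1, self.seq_len, self.num_ops)  # per-position logits

encoder, decoder = StructureEncoder(), StructureDecoder()
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
loss_fn = nn.CrossEntropyLoss()

op_seq = torch.randint(0, 16, (4, 8))      # four unlabeled structures (assumed format)
logits = decoder(encoder(op_seq))          # decoded "third" network structures
loss = loss_fn(logits.reshape(-1, 16), op_seq.reshape(-1))  # structure difference as loss
opt.zero_grad()
loss.backward()
opt.step()                                 # updates encoder and decoder jointly
```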
With reference to the first aspect, in certain implementations of the first aspect, the training the neural network predictor according to the first network structure, the second network structure, the similarities, and the labels includes: determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity; training the neural network predictor according to the performance of the first network structure and the label.
In the embodiments of the present application, the performance of the first network structure is predicted using the first feature vector, the second feature vector, and the relationship between the two network structures (e.g., their similarity), so the predicted performance of the first network structure is more accurate; training the neural network predictor according to this predicted performance and the label then improves the training effect, i.e., the prediction accuracy of the trained neural network predictor.
With reference to the first aspect, in certain implementations of the first aspect, the similarity is a distance between the first feature vector and the second feature vector.
In this embodiment, the distance between the first feature vector and the second feature vector may more accurately represent the similarity between the first network structure and the second network structure, and the training effect of the neural network predictor may be further improved by training the neural network predictor according to the similarity.
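The application does not fix the distance metric; one standard choice would be the Euclidean distance,
$$ d(z_1, z_2) = \lVert z_1 - z_2 \rVert_2 = \sqrt{\sum_i (z_{1,i} - z_{2,i})^2}, $$
where $z_1$ and $z_2$ are the first and second feature vectors; a smaller distance then indicates more similar network structures. Cosine or Manhattan distance would fit the same scheme.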
With reference to the first aspect, in certain implementations of the first aspect, the neural network predictor is a graph convolutional neural network (GCN).
In the embodiments of the application, by using a graph convolutional neural network, the relationship between the first network structure and the second network structure (for example, their similarity) can be better utilized during training, which improves the training effect of the neural network predictor.
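The following is a minimal sketch of what such a graph-convolutional predictor could look like, assuming each graph node is one network structure (labeled or unlabeled), node features are the encoder embeddings, and the edge weights are the pairwise similarities; the layer sizes and row normalization are illustrative choices, not details from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNPredictor(nn.Module):
    """Hypothetical GCN that predicts a scalar performance per structure."""

    def __init__(self, in_dim: int = 64, hid_dim: int = 32):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, 1)

    def forward(self, z: torch.Tensor, sim: torch.Tensor) -> torch.Tensor:
        # z: (n, in_dim) feature vectors; sim: (n, n) pairwise similarities
        a_hat = sim / sim.sum(dim=1, keepdim=True).clamp(min=1e-8)  # normalized adjacency
        h = F.relu(a_hat @ self.w1(z))       # propagate features between similar structures
        return (a_hat @ self.w2(h)).squeeze(-1)

# The loss would use only the labeled nodes, so unlabeled structures still
# contribute through the similarity graph, e.g.:
#   loss = F.mse_loss(predictor(z, sim)[labeled_idx], labels)
```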
With reference to the first aspect, in certain implementations of the first aspect, the neural network predictor is to predict a performance of a network structure of a target neural network, the target neural network being used for image processing.
Wherein the target neural network may be a Convolutional Neural Network (CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a second aspect, there is provided an image processing method, comprising:
acquiring an image to be processed; using a neural network to perform image processing on the image to be processed; wherein the neural network is determined according to a neural network predictor, and the neural network predictor is obtained by training through the method in any one of the implementation manners of the first aspect.
In the embodiments of the present application, the relationship between the first network structure and the second network structure (e.g., their similarity) is used to assist in training the neural network predictor, so the training effect can be improved with only a small amount of labeled data (e.g., at least one labeled network structure); that is, the prediction accuracy of the trained neural network predictor can be improved.
Meanwhile, because the neural network is determined according to the neural network predictor, using it can improve the effect of image processing.
Optionally, the image processing may include image classification, image segmentation, image detection, image super-resolution, and the like.
In a third aspect, an apparatus for training a neural network predictor is provided, including:
a first obtaining module, configured to obtain a first network structure of a first neural network and a second network structure of a second neural network, where the first network structure is a labeled network structure, and the label is used to indicate performance of the first network structure; a second obtaining module, configured to obtain a similarity between the first network structure and the second network structure; a training module for training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, the neural network predictor being used for predicting the performance of the network structure.
In the embodiments of the present application, the relationship among a plurality of network structures (for example, the similarity between the first network structure and the second network structure) is used to assist in training the neural network predictor, so the training effect can be improved with only a small amount of labeled data (e.g., at least one labeled network structure); that is, the prediction accuracy of the trained neural network predictor can be improved.
With reference to the third aspect, in some implementation manners of the third aspect, the second obtaining module is specifically configured to: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and acquiring the similarity according to the first feature vector and the second feature vector.
In the embodiments of the present application, the similarity is obtained from the first feature vector and the second feature vector, so it describes the relationship between the two network structures more accurately; using this similarity therefore further improves the training effect of the neural network predictor, i.e., the prediction accuracy of the trained predictor.
With reference to the third aspect, in some implementation manners of the third aspect, the second obtaining module is specifically configured to: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain the feature vector representing the network structure; and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded through the encoder, and the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a Recurrent Neural Network (RNN).
With reference to the third aspect, in certain implementations of the third aspect, the encoder is trained by: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; training the encoder according to a difference between the second network structure and the third network structure.
In this embodiment, the second feature vector is decoded by a decoder to obtain a third network structure, and the encoder can then be conveniently trained according to the difference between the second network structure and the third network structure, without any labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on a difference between the second network structure and the third network structure.
For example, if the second network structure is expected to be as consistent as possible with the third network structure, the difference between the two structures can be taken as a loss value, and the encoder (and the decoder) can then be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, because this training requires no manual operation, the training of the encoder (and the decoder) can be more automated.
Further, training the encoder with such a learning method improves the accuracy of the feature vectors it extracts (for the first and the second network structure); that is, different feature vectors can more accurately reflect the characteristics of different network structures (for example, the computing power of different network structures).
With reference to the third aspect, in some implementations of the third aspect, the training module is specifically configured to: determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity; training the neural network predictor according to the performance of the first network structure and the label.
In the embodiments of the present application, the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (e.g., their similarity) are used to predict the performance of the first network structure, so the predicted performance information is more accurate; training the neural network predictor according to this predicted performance and the label then improves the training effect, i.e., the prediction accuracy of the trained neural network predictor.
With reference to the third aspect, in certain implementations of the third aspect, the similarity is a distance between the first feature vector and the second feature vector.
In this embodiment, the distance between the first feature vector and the second feature vector may more accurately represent the similarity between the first network structure and the second network structure, and the training effect of the neural network predictor may be further improved by training the neural network predictor according to the similarity.
With reference to the third aspect, in certain implementations of the third aspect, the neural network predictor is a graph convolution neural network.
In the embodiments of the application, by using the graph convolutional neural network, the relationship between the first network structure and the second network structure (for example, their similarity) can be better utilized during training, which improves the training effect of the neural network predictor.
With reference to the third aspect, in certain implementations of the third aspect, the neural network predictor is to predict a performance of a network structure of a target neural network, the target neural network being used for image processing.
Wherein the target neural network may be a Convolutional Neural Network (CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a fourth aspect, there is provided an image processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed; the image processing module is used for carrying out image processing on the image to be processed by using a neural network; wherein the neural network is determined according to a neural network predictor, and the neural network predictor is obtained by training through the method in any one of the implementation manners of the first aspect.
In the embodiments of the present application, the relationship between the first network structure and the second network structure (e.g., their similarity) is used to assist in training the neural network predictor, so the training effect can be improved with only a small amount of labeled data (e.g., at least one labeled network structure); that is, the prediction accuracy of the trained neural network predictor can be improved.
Meanwhile, because the neural network is determined according to the neural network predictor, using it can improve the effect of image processing.
Optionally, the image processing may include image classification, image segmentation, image detection, image super-resolution, and the like.
In a fifth aspect, a method of training a neural network is provided, the method comprising:
acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a labeled network structure, and the label is used for indicating the performance of the first network structure; acquiring the similarity between the first network structure and the second network structure; training the neural network according to the first network structure, the second network structure, the similarity and the label.
In the embodiments of the present application, the relationship between the first network structure and the second network structure (e.g., their similarity) is used to assist in training the neural network, so the training effect of the neural network can be improved with only a small amount of labeled data (e.g., at least one network structure with a performance label).
With reference to the fifth aspect, in some implementations of the fifth aspect, the obtaining the similarity between the first network structure and the second network structure includes: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and acquiring the similarity according to the first feature vector and the second feature vector.
In the embodiments of the application, the similarity is obtained from the first feature vector and the second feature vector, so it describes the relationship between the two network structures more accurately; using this similarity therefore further improves the training effect of the neural network.
With reference to the fifth aspect, in some implementations of the fifth aspect, the obtaining a first feature vector according to the first network structure includes: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain the feature vector representing the network structure; the obtaining a second feature vector according to the second network structure includes: and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded through the encoder, and the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a Recurrent Neural Network (RNN).
With reference to the fifth aspect, in certain implementations of the fifth aspect, the encoder is trained by: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; training the encoder according to a difference between the second network structure and the third network structure.
In this embodiment, the second feature vector is decoded by a decoder to obtain a third network structure, and the encoder can then be conveniently trained according to the difference between the second network structure and the third network structure, without any labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on a difference between the second network structure and the third network structure.
For example, if the second network structure is expected to be as consistent as possible with the third network structure, the difference between the two structures can be taken as a loss value, and the encoder (and the decoder) can then be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, because this training requires no manual operation, the training of the encoder (and the decoder) can be more automated.
Further, training the encoder with such a learning method improves the accuracy of the feature vectors it extracts (for the first and the second network structure); that is, different feature vectors can more accurately reflect the characteristics of different network structures (for example, the computing power of different network structures).
With reference to the fifth aspect, in some implementations of the fifth aspect, the training the neural network according to the first network structure, the second network structure, the similarity, and the label includes: determining the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and training the neural network according to the performance of the first network structure and the label.
In the embodiments of the present application, the performance of the first network structure is predicted using the first feature vector, the second feature vector, and the relationship between the two network structures (e.g., their similarity), so the predicted performance of the first network structure is more accurate; training the neural network according to this predicted performance and the label then improves the training effect of the neural network.
With reference to the fifth aspect, in certain implementations of the fifth aspect, the similarity is a distance between the first feature vector and the second feature vector.
In this embodiment, the distance between the first feature vector and the second feature vector may more accurately represent the similarity between the first network structure and the second network structure, and the training effect of the neural network may be further improved by training the neural network according to the similarity.
With reference to the fifth aspect, in some implementations of the fifth aspect, the neural network is a graph convolution neural network (GCN).
In the embodiment of the present application, by using the graph convolution neural network, the relationship between the first network structure and the second network structure (e.g., the similarity between the first network structure and the second network structure) can be better utilized in the training process, so that the training effect of the neural network can be improved.
With reference to the fifth aspect, in certain implementations of the fifth aspect, the neural network is used to predict performance of a network structure of a target neural network, the target neural network being used for image processing.
Wherein the target neural network may be a Convolutional Neural Network (CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a sixth aspect, an apparatus for training a neural network predictor is provided, the apparatus comprising: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the method of any one of the implementations of the first aspect when the memory-stored program is executed.
The processor in the sixth aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural-network computing processor, where the neural-network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit, fully customized by Google for machine learning, that serves as an artificial intelligence accelerator.
In a seventh aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the method of any one of the implementations of the second aspect when the memory-stored program is executed.
The processor in the seventh aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural-network computing processor, where the neural-network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit, fully customized by Google for machine learning, that serves as an artificial intelligence accelerator.
In an eighth aspect, there is provided an apparatus for training a neural network, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementation manners of the fifth aspect.
The processor in the above eighth aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural-network computing processor, where the neural-network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit, fully customized by Google for machine learning, that serves as an artificial intelligence accelerator.
In a ninth aspect, a computer readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first aspect, or the second aspect, or the third aspect.
In a tenth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the instructions cause the computer to perform the method in any one of the implementations of the first aspect or the second aspect.
In an eleventh aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method in any one implementation manner of the first aspect, the second aspect, or the third aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation manner of the first aspect, the second aspect, or the third aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In the embodiments of the present application, the relationship between the first network structure and the second network structure (e.g., their similarity) is used to assist in training the neural network predictor, so the training effect can be improved with only a small amount of labeled data (e.g., at least one labeled network structure); that is, the prediction accuracy of the trained neural network predictor can be improved.
Drawings
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a chip hardware structure according to an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of another system architecture provided in the embodiment of the present application.
FIG. 5 is a schematic flow chart diagram of a method for training a neural network predictor provided by one embodiment of the present application.
FIG. 6 is a schematic flow chart diagram of a method for training a neural network predictor provided in another embodiment of the present application.
FIG. 7 is a schematic block diagram of a method of training a neural network predictor provided by one embodiment of the present application.
Fig. 8 is a schematic flowchart of an image processing method according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for training a neural network predictor provided in an embodiment of the present application.
Fig. 10 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The embodiment of the application can be applied to many fields in artificial intelligence, such as intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe cities and other fields.
Specifically, the embodiments of the application can be applied to photographing, video recording, safe cities, human-computer interaction, and other scenarios requiring image processing, such as image classification, image segmentation, image detection, image super-resolution, and the like.
The method for training a neural network predictor in the embodiments of the application can be applied to neural architecture search (NAS): the trained neural network predictor can predict the performance of a network structure quickly and accurately, which saves the time spent on neural architecture search. Meanwhile, the network structure found by the neural architecture search can be used to construct a neural network for image processing scenarios, improving the image processing effect.
For example, the neural network constructed by the method in the embodiment of the application can be applied to the scene of image classification, and the accuracy and efficiency of image classification can be improved by using the neural network, so that the user experience can be improved.
For another example, the neural network constructed by the method in the embodiment of the present application may be applied to a scene of image recognition, and the accuracy and efficiency of image recognition may be improved by using the neural network, so that user experience may be improved.
It should be understood that the method in the embodiments of the present application is not limited to the above two scenarios; the neural network constructed by the method in the embodiments of the present application may also be used in photographing, video recording, safe cities, human-computer interaction, and other scenarios requiring image processing, such as image classification, image segmentation, image detection, image super-resolution, and the like.
The method for training a neural network predictor in the embodiments of the present application may also be applied to other scenarios in which the performance of a network structure needs to be predicted, to other scenarios in which a neural network needs to be trained, or to other scenarios in which a neural network needs to be used (for example, speech recognition, machine translation, semantic segmentation, and the like); this is not limited in the embodiments of the present application.
It should be noted that the image in the embodiment of the present application may be a still image (or referred to as a still picture) or a moving image (or referred to as a moving picture), for example, the image in the present application may be a video or a moving picture, or the image in the present application may also be a still picture or a photo. For convenience of description, the present application collectively refers to a still image or a moving image as an image in the following embodiments.
The embodiments of the present application relate to a large number of related applications of neural networks, and in order to better understand the scheme of the embodiments of the present application, the following first introduces related terms and concepts of neural networks that may be related to the embodiments of the present application.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and its output can be expressed by the following formula:

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field; a local receptive field may be a region composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the positions of the layers, the layers of a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected; that is, any neuron at the ith layer is connected to every neuron at the (i+1)th layer.
Although a DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset (bias) vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$, where the superscript 3 represents the layer of the coefficient $W$, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the kth neuron at layer $L-1$ to the jth neuron at layer $L$ is defined as $W_{jk}^{L}$.

Note that the input layer has no $W$ parameter. In a deep neural network, more hidden layers make the network better able to model complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices; its final goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
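The indexing convention above can be made concrete with a small sketch (illustrative layer sizes, not from the application):

```python
import torch

# A layer mapping 5 inputs to 3 outputs has a weight matrix W of shape (3, 5);
# W[j, k] is the coefficient from the kth neuron of the previous layer to the
# jth neuron of the current layer, matching the W^L_jk notation above.
W = torch.randn(3, 5)
b = torch.randn(3)
x = torch.randn(5)
y = torch.sigmoid(W @ x + b)  # one layer's "simple operation": alpha(Wx + b)
print(W[1, 3])                # coefficient to output neuron j=2 from input neuron k=4 (0-based)
```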
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers, which can be regarded as a filter. A convolutional layer is a layer of neurons that performs convolution processing on the input signal in a convolutional neural network. In a convolutional layer, a neuron may be connected to only some of the neurons in the neighboring layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of several neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. A convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be learned during the training of the convolutional neural network. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
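A small numerical illustration of weight sharing (a sketch with arbitrary layer sizes): the parameter count of a convolutional layer depends only on the kernels, not on the image size, because the same kernel weights are reused at every location.

```python
import torch.nn as nn

# 16 kernels of size 3x3 over 3 input channels: 16*3*3*3 weights + 16 biases = 448
# parameters, whatever the spatial size of the input image.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(sum(p.numel() for p in conv.parameters()))  # 448
```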
(4) Recurrent neural networks (RNNs) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network has solved many problems, it is still powerless for many others. For example, to predict the next word in a sentence, one generally needs to use the previous words, because the words in a sentence are not independent of each other. The RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as that of a conventional CNN or DNN.
Why is a recurrent neural network needed when the convolutional neural network already exists? The reason is simple: a convolutional neural network has a precondition that the elements are independent of each other, and so are the inputs and outputs, such as cats and dogs. However, in the real world many elements are interconnected, such as stock prices changing over time. Another example: someone says, "I like traveling; my favorite place is Yunnan; in the future, when I have the chance, I will go to ___." To fill in the blank here, humans all know to fill in "Yunnan", because humans infer from the context; but how can a machine do that? The RNN was created for this purpose: its aim is to give machines the ability to remember as humans do. Therefore, the output of an RNN needs to depend on the current input information as well as historical memory information.
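For reference, the standard textbook recurrence behind this "memory" (not a formula from this application) is
$$ h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = g(W_{hy} h_t), $$
where the hidden state $h_{t-1}$ carries the historical memory information into the computation of the current output $y_t$.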
(5) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really desired, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, there is usually an initialization process before the first update, i.e., parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to predict lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
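As a minimal concrete example (mean squared error is one common choice of loss function, assumed here for illustration):

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([0.9, 0.2, 0.4])    # predicted values
target = torch.tensor([1.0, 0.0, 0.5])  # really desired target values
loss = F.mse_loss(pred, target)         # mean of squared differences
print(loss.item())                      # 0.02: smaller loss means closer predictions
```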
(6) Back propagation algorithm
The neural network can use the back propagation (BP) algorithm to correct the values of the parameters of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters of the initial model are updated by back-propagating the error-loss information, so that the error loss converges. The back propagation algorithm is a back-propagation process dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
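A minimal sketch of one such update (a single parameter and plain gradient descent; illustrative only):

```python
import torch

w = torch.tensor([0.5], requires_grad=True)       # parameter of the "model"
x, target = torch.tensor([2.0]), torch.tensor([3.0])
loss = (w * x - target).pow(2).mean()             # forward pass produces the error loss
loss.backward()                                   # back-propagate the error loss to w
with torch.no_grad():
    w -= 0.1 * w.grad                             # update w to make the loss converge
```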
(7) Pixel value
The pixel value of an image may be a red-green-blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, a pixel value is 256*Red + 100*Green + 76*Blue, where * represents multiplication, Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value means lower luminance and a larger value means higher luminance. For a grayscale image, the pixel value may be a grayscale value.
As shown in fig. 1, the present embodiment provides a system architecture 100. In fig. 1, a data acquisition device 160 is used to acquire training data. For the method for training a neural network predictor in the embodiments of the present application, the training data may include unlabeled network structures, labeled network structures, and the ground truth (GT) corresponding to each labeled network structure, where the ground truth corresponding to a labeled network structure may be its manually pre-labeled performance (for example, the performance of the labeled network structure on a specified data set).
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input network structures (e.g., the unlabeled network structures and the labeled network structures) to obtain the performance of each network structure, and compares the predicted performance of a network structure with the corresponding ground truth (e.g., the predicted performance of a labeled network structure with the ground truth corresponding to that labeled network structure) until the difference between the performance output by the training device 120 and the corresponding ground truth is smaller than a certain threshold, thereby completing the training of the target model/rule 101 (i.e., the neural network predictor).
The target model/rule 101 can be used to implement the trained neural network predictor; that is, after relevant preprocessing, a network structure can be input into the target model/rule 101 to predict its performance. The predicted performance of the network structure may then be used to determine a neural network, and that neural network may be used for image processing.
It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the data acquisition device 160; some may be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training.
The target model/rule 101 obtained by the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud device. In fig. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include the network structure input by the client device.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing on the input data (such as a network structure) received by the I/O interface 112. In this embodiment, the preprocessing module 113 and the preprocessing module 114 may be omitted (or only one of them may be used), and the computing module 111 may process the input data directly.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the performance of the network structure obtained as described above, to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
For example, the target model/rule 101 in the embodiments of the present application may specifically be the image processing apparatus in the embodiments of the present application, and the image processing apparatus may be determined according to the performance of the network structures predicted by the neural network predictor. For the image processing apparatus, the training data may include images to be processed and the ground truth corresponding to each image to be processed.
In the case shown in fig. 1, the user may manually provide the input data through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112; if automatic sending requires the user's authorization, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and so on. The client device 140 may also serve as a data collection terminal, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data and storing them in the database 130. Of course, the input data of the I/O interface 112 and its output results, as shown in the figure, may also be stored directly into the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 1, a target model/rule 101 is obtained by training according to a training device 120, where the target model/rule 101 may be a neural network predictor obtained by training based on a method for training a neural network predictor in the present application in this embodiment, or the target model/rule 101 may also be an image processing apparatus in this embodiment.
Specifically, the neural network predictor obtained after training based on the method for training a neural network predictor in the present application can be used to search for a neural network, and the neural network can be used for image processing, speech processing, natural language processing, and the like. For example, the neural network predictor may be used to search for a convolutional neural network (CNN), a deep convolutional neural network (DCNN), and/or a recurrent neural network (RNN), among others.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 2, taking image processing as an example. As described in the introduction of the basic concept above, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, where the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to data (e.g., images) input thereto.
As shown in fig. 2, Convolutional Neural Network (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling is optional), and a neural network layer 230. The relevant contents of these layers are described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
The convolutional layer/pooling layer 220 shown in fig. 2 may include layers 221 to 226. For example, in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, and 226 is a pooling layer; in another implementation, 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
The convolutional layer 221 may include a number of convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix usually slides over the input image pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride) in the horizontal direction, so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends over the entire depth of the input image during the convolution operation. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows × columns) are applied, and the outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the depth dimension is determined by the number of weight matrices. Different weight matrices may be used to extract different features from the image: for example, one weight matrix extracts image edge information, another weight matrix extracts a particular color of the image, and yet another weight matrix blurs unwanted noise in the image. The plurality of weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct prediction.
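For example, the multi-kernel behavior described above may be sketched with a standard convolutional layer in PyTorch (the channel counts, kernel size, and image size below are illustrative assumptions, not values from the present application):

```python
import torch
import torch.nn as nn

# 32 weight matrices (kernels), each 3x3 and spanning the full depth
# (3 channels) of the input image.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3,
                 stride=1, padding=1)

image = torch.randn(1, 3, 224, 224)  # one RGB image
features = conv(image)

# Each kernel produces one single-depth output map; stacking the 32 maps
# forms the depth dimension of the convolved output.
print(features.shape)  # torch.Size([1, 32, 224, 224])
```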
When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 200 increases, the later convolutional layers (e.g., 226) extract more complex features, such as features with high-level semantics; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, a pooling layer is often periodically introduced after a convolutional layer. In the layers 221 to 226 illustrated by 220 in fig. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image into an image of smaller size. The average pooling operator computes the average of the pixel values in the image over a certain range as the result of the average pooling. The maximum pooling operator takes the pixel with the largest value in a particular range as the result of the maximum pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of a corresponding sub-region of the input image.
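For example, the downsampling effect of the two pooling operators may be sketched as follows (a minimal illustration; the feature-map sizes are assumptions):

```python
import torch
import torch.nn as nn

avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)  # average over each 2x2 region
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)  # largest pixel in each 2x2 region

feature_map = torch.randn(1, 32, 224, 224)
print(avg_pool(feature_map).shape)  # torch.Size([1, 32, 112, 112]): spatial size halved
print(max_pool(feature_map).shape)  # torch.Size([1, 32, 112, 112])
```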
The neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information. This is because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the number of parameters brought by the input image. However, to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate an output of one class or of a set of the required number of classes. Accordingly, the neural network layer 230 may include a plurality of hidden layers (231, 232 to 23n shown in fig. 2) and an output layer 240, and the parameters included in the hidden layers may be pre-trained according to the related training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 230, the last layer of the whole convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross entropy, which is specifically used for calculating the prediction error. Once the forward propagation of the whole convolutional neural network 200 is completed (i.e., the propagation in the direction from 210 to 240 in fig. 2), the backward propagation (i.e., the propagation in the direction from 240 to 210 in fig. 2) starts to update the weight values and the biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200, that is, the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
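For example, the forward propagation, loss computation, and backward propagation described above may be sketched with a minimal PyTorch model (the layer sizes and the optimizer are illustrative assumptions, not the exact structure of fig. 2):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 64), nn.ReLU(),  # hidden layer
    nn.Linear(64, 10),                     # output layer for 10 classes
)
loss_fn = nn.CrossEntropyLoss()  # a loss similar to categorical cross entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

logits = model(images)          # forward propagation (input -> output layer)
loss = loss_fn(logits, labels)  # prediction error at the output layer
loss.backward()                 # backward propagation of the error
optimizer.step()                # update weight values and biases to reduce the loss
```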
It should be noted that the convolutional neural network 200 shown in fig. 2 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
In the embodiment of the present application, the neural network predictor obtained after training based on the method for training a neural network predictor in the present application may be used to search (a network structure of) a neural network, and the neural network obtained by searching the neural network structure may include the convolutional neural network 200 shown in fig. 2; alternatively, the image processing apparatus in the embodiment of the present application may include the convolutional neural network 200 shown in fig. 2, and the image processing apparatus may perform image processing on the image to be processed to obtain a processing result of the image to be processed.
Fig. 3 is a hardware structure of a chip provided in an embodiment of the present application, where the chip includes a neural network processor 50. The chip may be provided in the execution device 110 as shown in fig. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 1 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithms for the various layers in the convolutional neural network shown in fig. 2 can all be implemented in a chip as shown in fig. 3.
The neural network processor NPU 50 is mounted as a coprocessor on a main CPU (host CPU), which allocates tasks. The core portion of the NPU is an arithmetic circuit 503, and the controller 504 controls the arithmetic circuit 503 to extract data in a memory (weight memory or input memory) and perform an operation.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 503 fetches the data corresponding to the matrix B from the weight memory 502 and buffers it in each PE in the arithmetic circuit 503. The arithmetic circuit 503 takes the matrix a data from the input memory 501 and performs matrix operation with the matrix B, and partial or final results of the obtained matrix are stored in an accumulator (accumulator) 508.
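For example, the accumulation of partial matrix products may be modeled in a few lines of code (a toy software analogy of the operation circuit, not a description of the actual hardware; the matrix sizes are assumptions):

```python
import numpy as np

A = np.random.randn(4, 8)   # input matrix A (from the input memory)
B = np.random.randn(8, 16)  # weight matrix B (from the weight memory)

C = np.zeros((4, 16))       # plays the role of the accumulator
for k in range(A.shape[1]):
    # Accumulate one rank-1 partial product per step, as partial results
    # would be accumulated before the final matrix is produced.
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B)
```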
The vector calculation unit 507 may further process the output of the operation circuit 503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculation of non-convolution/non-FC layers in a neural network, such as pooling (Pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 507 can store the processed output vector to the unified buffer 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 503, for example, for use in subsequent layers in a neural network.
The unified memory 506 is used to store input data as well as output data.
A direct memory access controller (DMAC) 505 is used to transfer input data in the external memory to the input memory 501 and/or the unified memory 506, to store the weight data in the external memory into the weight memory 502, and to store the data in the unified memory 506 into the external memory.
A Bus Interface Unit (BIU) 510, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through a bus.
An instruction fetch buffer 509 connected to the controller 504 for storing instructions used by the controller 504;
the controller 504 is configured to call the instruction cached in the instruction storage 509 to implement controlling the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are On-Chip memories, and the external memory is a memory external to the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
The operations of the layers in the convolutional neural network shown in fig. 2 may be performed by the operation circuit 503 or the vector calculation unit 507.
The training device 120 in fig. 1 described above can perform the steps of the method for training a neural network predictor according to the embodiment of the present application, the execution device 110 in fig. 1 can perform the steps of the image processing method according to the embodiment of the present application, the CNN model shown in fig. 2 and the chip shown in fig. 3 can also be used to perform the steps of the image processing method according to the embodiment of the present application, and the chip shown in fig. 3 can also be used to perform the steps of the method for training a neural network predictor according to the embodiment of the present application.
As shown in fig. 4, the present embodiment provides a system architecture 300. The system architecture includes a local device 301, a local device 302, and an execution device 210 and a data storage system 250, wherein the local device 301 and the local device 302 are connected with the execution device 210 through a communication network.
The execution device 210 may be implemented by one or more servers. Optionally, the execution device 210 may cooperate with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 may be disposed at one physical site or distributed across multiple physical sites. The execution device 210 may use data in the data storage system 250 or call program code in the data storage system 250 to implement the method for training the neural network predictor or the image processing method according to the embodiments of the present application.
Specifically, the execution device 210 may perform the following process:
acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a labeled network structure, and the label is used for indicating the performance of the first network structure; acquiring the similarity between the first network structure and the second network structure; training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, wherein the neural network predictor is used for predicting the performance of the network structure.
Through the above process, the execution device 210 can build a neural network predictor, and the neural network predictor can be used to search for a neural network; the neural network can be used for image processing, speech processing, natural language processing, and the like.
Alternatively, the execution device 210 may also execute the following process:
acquiring an image to be processed; performing image processing on the image to be processed by using a neural network; wherein the neural network is determined according to a neural network predictor trained by the method for training a neural network predictor in the present application.
Through the above process, the execution device 210 can build an image processing apparatus, and the image processing apparatus can be used for image processing.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local device of each user may interact with the execution device 210 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a peer-to-peer connection, or any combination thereof.
In one implementation, the local device 301 or the local device 302 acquires relevant parameters of the neural network predictor from the execution device 210, deploys the neural network predictor on the local device 301 or the local device 302, and predicts the performance of the network structure by using the neural network predictor.
In another implementation, the neural network predictor may be directly deployed on the execution device 210, and the execution device 210 predicts the performance of the network structure by obtaining the network structure from the local device 301 and the local device 302 and using the neural network predictor.
In one implementation manner, the local device 301 or the local device 302 acquires the relevant parameters of the image processing apparatus from the execution device 210, deploys the image processing apparatus on the local device 301 or the local device 302, and performs image processing on the image to be processed by using the image processing apparatus.
In another implementation, the execution device 210 may directly deploy an image processing apparatus, and the execution device 210 performs image processing on the image to be processed by acquiring the image to be processed from the local device 301 and the local device 302.
That is, the execution device 210 may also be a cloud device, and at this time, the execution device 210 may be deployed in the cloud; alternatively, the execution device 210 may also be a terminal device, in which case, the execution device 210 may be deployed at a user terminal side, which is not limited in this embodiment of the application.
The following describes a method for training a neural network predictor and an image processing method in detail in accordance with an embodiment of the present invention with reference to the accompanying drawings.
FIG. 5 is a schematic flow chart diagram of a method of training a neural network predictor of the present application. The method 500 of training a neural network predictor in FIG. 5 may include step 510, step 520, and step 530.
In some examples, the method 500 may be performed by devices such as the training device 120 of fig. 1, the chip shown in fig. 3, or the execution device 210 of fig. 4.
S510, a first network structure of the first neural network and a second network structure of the second neural network are obtained.
Wherein the first network structure may be a tagged network structure, and the tag may be used to indicate the performance of the first network structure.
The label herein is understood to be a real label corresponding to the network structure, and the label can be used to represent the real performance corresponding to the network structure.
For example, a neural network formed by a network structure may be trained, and when the neural network is trained to converge, the true performance corresponding to the network structure may be determined according to the converged neural network, so as to obtain a performance label corresponding to the network structure.
At present, this way of obtaining a performance label by first training a neural network to convergence is time-consuming.
Alternatively, the network structure with the performance label may be obtained by manually labeling the performance corresponding to the network structure.
Optionally, the second network structure may be an untagged network structure.
For example, in S510, a plurality of network structures may also be obtained, wherein the plurality of network structures may include a small number of network structures with performance labels and a large number of network structures without performance labels.
S520, obtaining the similarity between the first network structure and the second network structure.
Wherein the similarity may be used to represent a degree of similarity between the first network structure and the second network structure.
It should be noted that, in the embodiment of the present application, a plurality of network structures may be obtained in the above step S510, and at this time, a similarity between partial network structures (for example, at least two network structures) in the plurality of network structures may be obtained; alternatively, the similarity between every two of all the network structures in the multiple network structures may also be obtained, which is not limited in this embodiment of the application.
For example, a similarity between any two network structures of the plurality of network structures may be obtained.
As another example, a similarity between (each) tagged network structure and (each) untagged network structure of the plurality of network structures is obtained.
Optionally, a first feature vector may be obtained according to the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and acquiring the similarity according to the first feature vector and the second feature vector.
Wherein the first feature vector may be used to represent the first network structure, e.g. the first feature vector represents a characteristic of the first network structure (i.e. a network feature), or the first feature vector may be used to represent the computational power of the first network structure.
The second feature vector may be used to represent the second network structure, e.g. the second feature vector represents a characteristic of the second network structure (i.e. a network feature), or the second feature vector may be used to represent the computational power of the second network structure.
It should be noted that, through the feature vectors of different network structures, the characteristics of different network structures (e.g., the computing capabilities of different network structures) can be more accurately embodied.
Meanwhile, the feature vector can be used to represent a network structure, and the neural network (e.g., neural network predictor) can also process the network structure (represented by the feature vector) through the feature vector.
In the embodiment of the present application, the similarity is obtained through the first feature vector and the second feature vector, so that the similarity can more accurately describe the relationship between the two network structures, and therefore, the training effect of the neural network predictor, that is, the prediction accuracy of the trained neural network predictor can be further improved by using the similarity.
Optionally, the first feature vector of the first network structure may be an embedding, or a feature vector in a form similar to an embedding.
The second feature vector of the second network structure is similar to the first feature vector of the first network structure, and is not described herein again.
Optionally, the similarity may be a distance between the first feature vector and the second feature vector. For example, the similarity may be a cosine distance between the first feature vector and the second feature vector.
In this embodiment, the distance between the first feature vector and the second feature vector may more accurately represent the similarity between the first network structure and the second network structure, and the training effect of the neural network predictor may be further improved by training the neural network predictor according to the similarity.
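For example, the cosine-distance-based similarity between two feature vectors may be sketched as follows (the feature dimension is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

def structure_similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> float:
    """Similarity between two network structures from their feature vectors;
    the corresponding cosine distance is 1 minus this value."""
    return F.cosine_similarity(feat_a.unsqueeze(0), feat_b.unsqueeze(0)).item()

first_feature = torch.randn(64)   # first feature vector (e.g., an encoder output)
second_feature = torch.randn(64)  # second feature vector
print(structure_similarity(first_feature, second_feature))
```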
Further, an encoder may be used to encode the first network structure, resulting in the first feature vector; the encoder may be used to encode the second network structure resulting in the second feature vector.
The encoder may be configured to encode a feature vector representing a network structure, and the encoder may be implemented by a Neural Network (NN).
For example, the Encoder may be an Encoder in an Auto-Encoder (AE), and the Auto-Encoder may further include a decoder.
The encoder may be a Recurrent Neural Network (RNN).
In the embodiment of the application, the network structure is encoded by the encoder, so that the feature vector of the network structure can be obtained conveniently.
Alternatively, the encoder may be trained by:
decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; training the encoder according to a difference between the second network structure and the third network structure.
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on a difference between the second network structure and the third network structure.
In this embodiment, the second eigenvector is decoded by a decoder to obtain a third network structure, and the encoder and the decoder can be conveniently trained according to the difference between the second network structure and the third network structure without labeling data (for example, without labeling eigenvectors output by the encoder).
For example, assuming that the second network structure is expected to be as consistent as possible with the third network structure, the encoder and the decoder can be conveniently trained by using the difference between the second network structure and the third network structure as a loss value without manually labeling the feature vector output by the encoder.
Meanwhile, the training process does not need manual operation, so that the training process of the encoder (and the decoder) can be more automated.
Further, by training the encoder through a learning method, the accuracy of the feature vectors (of the first network structure and of the second network structure) extracted by using the encoder can be improved, that is, different feature vectors can more accurately embody the characteristics of different network structures (for example, the computing power of different network structures).
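For example, one training step of the encoder and the decoder on the reconstruction difference may be sketched as follows (for brevity, the network structures are assumed to be serialized into fixed-length vectors, and the RNN-based encoder/decoder is replaced by linear layers; all names and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

struct_dim, feat_dim = 128, 64
encoder = nn.Sequential(nn.Linear(struct_dim, feat_dim), nn.Tanh())
decoder = nn.Linear(feat_dim, struct_dim)
optimizer = torch.optim.Adam(
    [*encoder.parameters(), *decoder.parameters()], lr=1e-3)

x = torch.randn(32, struct_dim)  # a batch of serialized network structures
z = encoder(x)                   # feature vectors (no labels required)
x_rec = decoder(z)               # decoded-back (third) network structures

# The difference between the input structures and the decoded structures
# serves directly as the loss value; no manual labeling of z is needed.
loss = ((x - x_rec) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```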
S530, training the neural network predictor according to the first network structure, the second network structure, the similarity and the label.
Wherein the neural network predictor may be used to predict the performance of the network structure.
Alternatively, the neural network predictor may be used to predict the performance of the network structure of the target neural network, which may be used for image processing, speech processing, natural language processing, and the like.
For example, the target neural network may be a convolutional neural network as shown in fig. 2, and the target neural network may be used for image classification, image segmentation, image detection, image super-segmentation, and the like.
Alternatively, the neural network predictor may be a graph convolutional neural network (GCN).
In the embodiment of the application, by the graph convolution neural network, the relationship between the first network structure and the second network structure (for example, the similarity between the first network structure and the second network structure) can be better utilized in the training process, so that the training effect of the neural network predictor can be improved.
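For example, a minimal graph convolution predictor may be sketched as follows (a simplified stand-in for the predictor in the present application; the normalization scheme and layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SimpleGCNPredictor(nn.Module):
    """Each layer mixes a structure's feature vector with those of similar
    structures through the similarity matrix, so labeled and unlabeled
    network structures can inform each other."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, 1)  # one scalar performance per structure

    def forward(self, features: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        adj = adj / adj.sum(dim=1, keepdim=True)  # row-normalize the similarity graph
        h = torch.relu(self.w1(adj @ features))
        return self.w2(adj @ h).squeeze(-1)       # predicted performance of N structures
```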
Optionally, the training the neural network predictor according to the first network structure, the second network structure, the similarity and the label may include:
determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity; training the neural network predictor according to the performance of the first network structure and the label.
For example, a plurality of network structures may be obtained in S510, and the performance of the plurality of network structures may be obtained according to the feature vectors of the plurality of network structures and the similarity; the neural network predictor is trained using as a loss value a difference between a performance of a labeled network structure in the plurality of network structures and a corresponding label of the network structure (e.g., the label may be used to indicate a true performance of the network structure).
In the process of training the neural network predictor, the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (e.g., the similarity between the two network structures) are used for predicting the performance of the first network structure, so that the (predicted) performance information of the first network structure can be more accurate. Training the neural network predictor according to the performance of the first network structure and the label can therefore improve the training effect of the neural network predictor, that is, improve the prediction accuracy of the trained neural network predictor.
In the method 500 shown in fig. 5, the relationship between the first network structure and the second network structure (e.g., the similarity between the first network structure and the second network structure) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor can be improved, i.e., the prediction accuracy of the trained neural network predictor can be improved, with a small amount of labeled data (e.g., at least one labeled network structure).
FIG. 6 is a schematic flow chart diagram of a method of training a neural network predictor of the present application. The method 600 of training a neural network predictor in FIG. 6 may include steps 610, 620, and 630.
In some examples, the method 600 may be performed by the training device 120 of fig. 1, the chip shown in fig. 3, or the execution device 210 of fig. 4.
S610, obtaining the network characteristics of the network structure.
Optionally, network features of a plurality of network structures in a set of network structures may be extracted, where the set may include a small number of network structures with performance labels and a large number of network structures without performance labels.
The network feature of the network structure may be a feature vector of the network structure in the method 500 in fig. 5, which may specifically refer to the description in the method 500 and is not described herein again.
For example, the set of network structures X includes N network structures (the N network structures include $N_l$ network structures with performance labels and $N_u$ network structures without performance labels), i.e., $X = X_l \cup X_u$, where $X_l = \{x_i^l\}_{i=1}^{N_l}$ is the set of network structures with performance labels, $Y_l = \{y_i^l\}_{i=1}^{N_l}$ is the set of performance labels corresponding to the network structures with performance labels, $X_u = \{x_i^u\}_{i=1}^{N_u}$ is the set of network structures without performance labels, and $N_l$ and $N_u$ are both positive integers.
Optionally, an auto-encoder may be used to extract the network features of the N network structures in the set X of network structures, where the auto-encoder may include an encoder E and a decoder D.
Wherein the encoder E and the decoder D may be implemented by a Neural Network (NN). For example, the encoder E and the decoder D may be a Recurrent Neural Network (RNN).
For example, the encoder E may be used to encode the N network structures in the network structure set X, so as to obtain the network features of the N network structures.
For another example, the decoder D may be used to decode the network features of the N network structures to obtain N candidate network structures, where the N candidate network structures correspond to the N network structures; the encoder E and the decoder D may be trained on N candidate network structures.
Alternatively, the decoder D may also be used to decode only the network features of the $N_l$ network structures with performance labels, so as to obtain $N_l$ candidate network structures corresponding to the $N_l$ network structures with performance labels; the encoder E and the decoder D may then be trained based on the $N_l$ candidate network structures.
Alternatively, the following loss function may be constructed to train the auto-encoder (i.e., the encoder E and the decoder D):

$$\min_{W_e, W_d} L_{rc} = \frac{1}{N_l + N_u} \sum_{x_i \in X} \left\| x_i - D\big(E(x_i)\big) \right\|^2$$

where $W_e$ is the parameter of the encoder, $W_d$ is the parameter of the decoder, $E(x_i)$ represents the output of the encoder, $D(E(x_i))$ represents the output of the decoder, and $N_l$ and $N_u$ are both positive integers.

At this time, the output of the encoder, $E(x_i)$, may be used as the network feature of the extracted network structure.
The encoder may be the encoder in the method 500 in fig. 5, and the decoder may be the decoder in the method 500 in fig. 5, and the specific description may refer to an embodiment in the method 500, which is not repeated here.
S620, constructing a network relation graph according to the network characteristics.
Alternatively, a network relationship graph may be constructed according to the network characteristics acquired in S610.
In S610, N network features corresponding to the N network structures may be obtained, and then an NxN network relationship diagram may be constructed according to the N network features.
The N×N network relationship graph may include the similarity between each of the N network features and the other N−1 network features, as well as the similarity of each network feature with itself.
For example, the value range of the similarity may be [0, 1], where 0 may indicate that the two are completely different (or that the similarity between the two is the lowest) and 1 may indicate that the two are completely the same.
Since each network feature is identical to itself, the similarity of each network feature with itself may be 1.
For example, for a network structure $x_i$ and a network structure $x_j$ in the network structure set X, $s(x_i, x_j)$ may represent the similarity between the network structure $x_i$ and the network structure $x_j$, and $s(x_i, x_j)$ can be calculated by the following distance formula:

$$s(x_i, x_j) = \exp\left(-\frac{d\big(E(x_i), E(x_j)\big)}{2\sigma^2}\right)$$

where $d(\cdot,\cdot)$ is an arbitrary distance metric function, $\sigma$ is a hyper-parameter, and $\exp(\cdot)$ is the exponential function.
The meaning of the above formula is that, for the network feature E(x_i) of a given network structure x_i and the network feature E(x_j) of a network structure x_j, the farther the distance between E(x_i) and E(x_j), the lower the similarity; conversely, the closer the distance between E(x_i) and E(x_j), the higher the similarity.
The similarity may be the similarity in the method 500 in fig. 5, and the specific description may refer to an embodiment in the method 500, which is not repeated here.
Assuming that there are N network structures in total, and a similarity can be calculated between every two of the N network structures according to the above method, an N×N relationship graph is obtained. Each element in the relationship graph represents the similarity between (the network features of) two network structures.
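For example, the construction of the N×N relationship graph from the encoder outputs may be sketched as follows (Euclidean distance is assumed as the distance metric d(·); N and the feature dimension are illustrative):

```python
import torch

def build_relation_graph(features: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """N x N graph whose (i, j) entry is exp(-d(E(x_i), E(x_j)) / (2 * sigma**2));
    the diagonal is (numerically) 1, since each feature's distance to itself is 0."""
    dists = torch.cdist(features, features)  # pairwise distances d(E(x_i), E(x_j))
    return torch.exp(-dists / (2 * sigma ** 2))

features = torch.randn(100, 64)  # N = 100 network features output by the encoder
A = build_relation_graph(features)
print(A.shape, A.diagonal().min())  # torch.Size([100, 100]), diagonal ~ 1
```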
And S630, predicting the performance through a graph convolution neural network.
Wherein the graph convolutional neural network can be regarded as a neural network predictor. The graph convolution neural network may be the neural network predictor in the method 500 in fig. 5, and the specific description may refer to the embodiment in the method 500, which is not repeated here.
As shown in fig. 7, the network features E(x_i) of the N network structures output by the encoder and the N×N network relationship graph (for example, the N×N network relationship graph may be an N×N matrix) may be input into the graph convolution neural network, so as to obtain the performance of the N network structures.
For example, assume that an input network structure $x_i^l$ has a performance label $y_i^l$ and that the predicted performance is $\hat{y}_i^l$. It is desirable that the predicted performance $\hat{y}_i^l$ be as close as possible to the true value $y_i^l$, so the following loss function may be constructed to train the graph convolution neural network (i.e., the neural network predictor):

$$\min_{W_p} L_{rg} = \frac{1}{N_l} \sum_{i=1}^{N_l} \left( y_i^l - \hat{y}_i^l \right)^2$$
Alternatively, the following loss function may be constructed to train the encoder, the decoder, and the graph convolution neural network simultaneously:

$$\min_{W_e, W_d, W_p} L = L_{rg} + \lambda L_{rc}$$

where $W_e$ is the parameter of the encoder, $W_d$ is the parameter of the decoder, $W_p$ is the parameter of the graph convolution neural network, $L_{rc}$ is the loss function of the auto-encoder, $L_{rg}$ is the loss function of the graph convolution neural network, and $\lambda$ is a hyper-parameter used to adjust the weights of the two loss functions.
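For example, one joint training step over the combined loss may be sketched as follows (reusing the encoder, decoder, batch x, SimpleGCNPredictor, and build_relation_graph from the sketches above; the value of λ and the assumption that the first n_l structures in x are labeled are illustrative):

```python
import torch

n_l = 10               # the first n_l structures in x carry performance labels
y_l = torch.rand(n_l)  # their performance labels (illustrative values)
lam = 0.5              # the hyper-parameter lambda

predictor = SimpleGCNPredictor(feat_dim, 32)
optimizer = torch.optim.Adam(
    [*encoder.parameters(), *decoder.parameters(), *predictor.parameters()],
    lr=1e-3)

z = encoder(x)                 # network features E(x_i)
adj = build_relation_graph(z)  # N x N relationship graph
y_pred = predictor(z, adj)     # predicted performance for all N structures

loss_rc = ((x - decoder(z)) ** 2).mean()      # auto-encoder loss L_rc
loss_rg = ((y_pred[:n_l] - y_l) ** 2).mean()  # predictor loss L_rg (labeled only)
loss = loss_rg + lam * loss_rc                # L = L_rg + lambda * L_rc

optimizer.zero_grad()
loss.backward()
optimizer.step()
```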
Next, the method of training the neural network predictor in the embodiment of the present application is tested on the data set NAS-Bench-101 in several ways, as described below.
The data set NAS-Bench-101 contains about 423,000 different network structures, together with the actual accuracy of each of these network structures obtained after training on the data set CIFAR-10.
Method 1:
In the first method, in order to fully reflect the effects of the neural network predictors obtained by different methods, the effects of the neural network predictors can be evaluated by using the following indexes:
(1) Kendall's Tau (KTau):
KTau is an index on ranking, with a value range of [-1, 1]. When the predicted ranking of a set of samples is identical to the true ranking, KTau = 1; when the predicted ranking is completely opposite to the true ranking, KTau = -1; and when the predicted ranking is unrelated to the true ranking, KTau is around 0.
(2) Mean square error (MSE):
MSE is used to evaluate the prediction accuracy at a single sample point. For a given sample point, the more accurate the predicted value (predicted performance), the smaller the MSE; when the predicted value is identical to the true value, MSE = 0.
(3) Correlation coefficient (r):
r is the correlation coefficient, with a value range of [-1, 1], and is used to evaluate the degree of correlation between the predicted value (predicted performance) and the true value. The larger the value of r, the more accurate the predicted value (predicted performance).
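For example, the three indexes may be computed with SciPy as follows (the accuracy values are illustrative, not results from the application):

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

y_true = np.array([0.90, 0.87, 0.93, 0.85])  # true accuracies (illustrative values)
y_pred = np.array([0.89, 0.88, 0.92, 0.84])  # predicted accuracies

ktau, _ = kendalltau(y_pred, y_true)   # ranking agreement, in [-1, 1]
mse = np.mean((y_pred - y_true) ** 2)  # per-sample prediction error
r, _ = pearsonr(y_pred, y_true)        # correlation coefficient, in [-1, 1]
print(ktau, mse, r)
```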
Table 1 below shows the KTau, MSE, and r values of the neural network predictors obtained by different methods when different amounts of labeled data in the NAS-Bench-101 data set are used:
TABLE 1
(Table 1 is provided as an image in the original publication; it reports the KTau, MSE, and r values of each method with 1000, 10000, and 100000 labeled samples.)
The labeled samples in table 1 are the network structures with performance labels in the above embodiments.
As can be seen from Table 1, with 1000, 10000, and 100000 labeled samples, the method of the present application is superior to the other methods (Peephole [5] and E2EPP [32]) on all three indexes (KTau, MSE, and r).
Method 2:
In the second method, 1000 labeled samples and all unlabeled samples can be used as training data to train the neural network predictor, and the obtained predictor is then used to search for a neural network. The search results are shown in Table 2 below:
TABLE 2
Method	Accuracy	Ranking position
Peephole[5] 93.41±0.34 1.64
E2EPP[32] 93.77±0.13 0.15
This application 94.01±0.12 0.01
The precision in table 2 may refer to the precision of the searched network structure, for example, the precision may be Top-1 Accuracy (%), and the Ranking position may be the Ranking position of the network structure in the current search space, for example, the Ranking position may be Ranking (%).
As can be seen from Table 2, the method in the present application is significantly better than the other methods (Peephole [5] and E2EPP [32]) in terms of both the accuracy of the searched network structure and the ranking position of the network structure in the current search space.
Method 3:
In the third method, the performance of the method in the present application on an unknown search space is verified.
For example, 1000 network structures can be randomly selected from the data set NAS-Bench-101, and each of the 1000 network structures is trained on the CIFAR-100 data set to obtain its true accuracy; these true accuracies are then used to train the predictor.
The predictor obtained after training can be used for predicting the performance of the network model on the CIFAR-100 data set, and the predicted result is shown in the following table 3:
TABLE 3
Method	Top-1 Accuracy (%)	Top-5 Accuracy (%)
Peephole[5] 74.21±0.32 92.04±0.15
E2EPP[32] 75.86±0.19 93.11±0.10
This application 78.64±0.16 94.23±0.08
As can be seen from Table 3, the method of the present application is significantly superior to the other methods (Peephole [5] and E2EPP [32]) in terms of both Top-1 Accuracy (%) and Top-5 Accuracy (%).
Fig. 8 is a schematic flowchart of an image processing method of the present application. The method 800 in fig. 8 includes steps 810 and 820.
In some examples, the method 800 may be performed by devices such as the execution device 110 of fig. 1, the chip shown in fig. 3, or the execution device 210 of fig. 4.
And S810, acquiring an image to be processed.
And S820, carrying out image processing on the image to be processed by using a neural network.
The neural network may be determined according to a neural network predictor trained by the method 500 in fig. 5 or the method 600 in fig. 6.
For example, the neural network may be a neural network that meets the performance requirement and is searched in a preset search space by a neural network structure search method.
During the searching process of the neural network structure, the performance of the network structure can be predicted by using the neural network predictor obtained after the training by using the method 500 in fig. 5 or the method 600 in fig. 6.
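For example, using the trained predictor inside a simple search loop may be sketched as follows (a random-sampling search over serialized candidate structures, reusing the assumed encoder, predictor, and build_relation_graph from the sketches above; the search strategy of the application is not limited to this):

```python
import torch

candidate_pool = torch.randn(1000, 128)  # 1000 serialized candidate structures
with torch.no_grad():
    z = encoder(candidate_pool)                     # network features
    scores = predictor(z, build_relation_graph(z))  # predicted performance

best = candidate_pool[scores.argmax()]
# `best` would then be instantiated as a neural network and trained on the
# target task (e.g., image processing) -- no per-candidate training is
# needed just to rank the pool.
```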
Fig. 9 is a hardware structural diagram of an apparatus for training a neural network predictor provided in an embodiment of the present application. An apparatus 3000 for training a neural network predictor (the apparatus 3000 may specifically be a computer device) shown in fig. 9 includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to each other via a bus 3004.
The memory 3001 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 3001 may store a program, and the processor 3002 is configured to perform the steps of the method for training a neural network predictor according to the embodiment of the present application when the program stored in the memory 3001 is executed by the processor 3002.
The processor 3002 may be a general Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the method for training the neural network predictor according to the embodiment of the present invention.
The processor 3002 may also be an integrated circuit chip having signal processing capabilities, such as the chip shown in fig. 3. In implementation, the steps of the method for training the neural network predictor of the present application can be implemented by integrated logic circuits of hardware in the processor 3002 or by instructions in the form of software.
The processor 3002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001 and completes, in combination with its hardware, the functions that need to be performed by the units included in the apparatus for training a neural network predictor, or performs the method for training a neural network predictor according to the embodiment of the present application.
The communication interface 3003 enables communication between the apparatus 3000 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver. For example, information of the neural network predictor to be constructed and training data required in training the neural network predictor may be acquired through the communication interface 3003.
The bus 3004 may include a pathway to transfer information between various components of the apparatus 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 10 is a schematic diagram of a hardware configuration of an image processing apparatus according to an embodiment of the present application. An image processing apparatus 4000 shown in fig. 10 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002 and the communication interface 4003 are communicatively connected to each other via a bus 4004.
Memory 4001 may be a ROM, a static storage device, and a RAM. The memory 4001 may store a program, and the processor 4002 and the communication interface 4003 are used to execute the steps of the image processing method according to the embodiment of the present application when the program stored in the memory 4001 is executed by the processor 4002.
The processor 4002 may be a general-purpose, CPU, microprocessor, ASIC, GPU or one or more integrated circuits, and is configured to execute a relevant program to implement the functions required to be executed by the units in the image processing apparatus according to the embodiment of the present application, or to execute the image processing method according to the embodiment of the method of the present application.
The processor 4002 may also be an integrated circuit chip having signal processing capabilities, such as the chip shown in fig. 3. In implementation, the steps of the image processing method according to the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 4002 or by instructions in the form of software.
The processor 4002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 4001, and the processor 4002 reads the information in the memory 4001 and completes, in combination with its hardware, the functions that need to be performed by the units included in the image processing apparatus of the embodiment of the present application, or performs the image processing method of the method embodiment of the present application.
Communication interface 4003 enables communication between apparatus 4000 and other devices or a communication network using transceiver means such as, but not limited to, a transceiver. For example, the image to be processed may be acquired through the communication interface 4003.
Bus 4004 may include a pathway to transfer information between various components of apparatus 4000 (e.g., memory 4001, processor 4002, communication interface 4003).
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or the computer program are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into units is only one logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A method of training a neural network predictor, comprising:
acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a labeled network structure, and the label is used for indicating the performance of the first network structure;
acquiring the similarity between the first network structure and the second network structure;
training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, wherein the neural network predictor is used for predicting the performance of the network structure.
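By way of illustration only (not part of the claims): one possible training step for such a predictor is sketched below in Python, assuming PyTorch. The predictor signature, the encoder, and the use of a negative L2 distance as the similarity are assumptions made for this sketch, not limitations of the claim.

    import torch
    import torch.nn.functional as F

    def train_step(predictor, encoder, labeled_structure, unlabeled_structure,
                   label, optimizer):
        # Encode the first (labeled) and second (unlabeled) network structures.
        v1 = encoder(labeled_structure)
        v2 = encoder(unlabeled_structure)
        # Similarity between the two structures; here the negative L2 distance
        # between the two feature vectors is used as one concrete example.
        sim = -torch.dist(v1, v2)
        # The predictor estimates the performance of the labeled structure
        # from both representations and their similarity.
        predicted = predictor(v1, v2, sim)
        # The label is the measured performance of the first structure.
        loss = F.mse_loss(predicted, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()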
2. The method of claim 1, wherein acquiring the similarity between the first network structure and the second network structure comprises:
acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure;
acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure;
and acquiring the similarity according to the first feature vector and the second feature vector.
3. The method of claim 2, wherein acquiring the first feature vector according to the first network structure comprises:
encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain the feature vector representing the network structure;
and wherein acquiring the second feature vector according to the second network structure comprises:
and encoding the second network structure by using the encoder to obtain the second feature vector.
4. The method of claim 3, wherein the encoder is trained by:
decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector;
training the encoder according to a difference between the second network structure and the third network structure.
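By way of illustration only: the encoder training described in claims 3 and 4 resembles an autoencoder, in which the decoder reconstructs a network structure from its feature vector and the encoder is trained on the reconstruction error. A minimal sketch, assuming PyTorch and a fixed-length vector encoding of a network structure (all dimensions here are hypothetical):

    import torch
    import torch.nn as nn

    class StructureAutoencoder(nn.Module):
        def __init__(self, struct_dim=64, feat_dim=16):
            super().__init__()
            # Encoder: network structure -> feature vector.
            self.encoder = nn.Sequential(nn.Linear(struct_dim, feat_dim), nn.ReLU())
            # Decoder: feature vector -> reconstructed ("third") structure.
            self.decoder = nn.Linear(feat_dim, struct_dim)

        def forward(self, structure):
            feature = self.encoder(structure)
            reconstructed = self.decoder(feature)
            return feature, reconstructed

    model = StructureAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    structure = torch.randn(8, 64)  # a batch of structure encodings
    _, reconstructed = model(structure)
    # Train on the difference between the second and third structures.
    loss = nn.functional.mse_loss(reconstructed, structure)
    opt.zero_grad(); loss.backward(); opt.step()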
5. The method of any one of claims 2 to 4, wherein training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label comprises:
determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity;
training the neural network predictor according to the performance of the first network structure and the label.
6. The method according to any one of claims 2 to 5, wherein the similarity is a distance between the first feature vector and the second feature vector.
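By way of illustration only (not part of the claims): the claim does not fix a particular distance; one common concrete choice would be the Euclidean (L2) distance between the first feature vector v_1 and the second feature vector v_2:

    d(v_1, v_2) = \lVert v_1 - v_2 \rVert_2 = \sqrt{ \sum_i (v_{1,i} - v_{2,i})^2 }

where a smaller distance indicates more similar network structures (a distance can also be converted into a similarity score, e.g., by negation or by exp(-d)).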
7. The method of any one of claims 1 to 6, wherein the neural network predictor is a graph neural network.
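By way of illustration only: a graph neural network predictor can be read as treating each network structure as a node, with pairwise similarities as edge weights, so that labeled and unlabeled structures exchange information before regression. A minimal one-layer sketch, assuming PyTorch (the layer sizes and the single propagation step are assumptions of this sketch):

    import torch
    import torch.nn as nn

    class GNNPredictor(nn.Module):
        def __init__(self, feat_dim=16, hidden=32):
            super().__init__()
            self.mix = nn.Linear(feat_dim, hidden)
            self.head = nn.Linear(hidden, 1)

        def forward(self, features, similarity):
            # features: (num_structures, feat_dim) feature vectors;
            # similarity: (num_structures, num_structures) similarity matrix.
            # Row-normalize so each node averages over its neighbors.
            norm = similarity / similarity.sum(dim=1, keepdim=True).clamp(min=1e-8)
            propagated = norm @ features  # one message-passing step
            return self.head(torch.relu(self.mix(propagated))).squeeze(-1)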
8. The method of any one of claims 1 to 7, wherein the neural network predictor is configured to predict a performance of a network structure of a target neural network, the target neural network being configured for image processing.
9. An image processing method, comprising:
acquiring an image to be processed;
using a neural network to perform image processing on the image to be processed;
wherein the neural network is determined from a neural network predictor trained by the method of any one of claims 1 to 8.
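By way of illustration only: once trained, the predictor can rank candidate network structures for an image processing task without fully training each candidate. A minimal selection sketch, assuming PyTorch; the predictor is assumed here to map a single feature vector to a scalar score, and instantiating and training the chosen network is left out:

    import torch

    def predict_performance(predictor, encoder, structure):
        # Score one candidate structure with the trained predictor.
        with torch.no_grad():
            return predictor(encoder(structure)).item()

    def choose_structure(candidates, predictor, encoder):
        # Keep the candidate with the highest predicted performance; the
        # chosen structure would then be built and used for image processing.
        return max(candidates,
                   key=lambda s: predict_performance(predictor, encoder, s))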
10. An apparatus for training a neural network predictor, comprising:
a first obtaining module, configured to obtain a first network structure of a first neural network and a second network structure of a second neural network, where the first network structure is a labeled network structure, and the label is used to indicate performance of the first network structure;
a second obtaining module, configured to obtain a similarity between the first network structure and the second network structure;
a training module for training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, the neural network predictor being used for predicting the performance of the network structure.
11. The apparatus of claim 10, wherein the second obtaining module is specifically configured to:
acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure;
acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure;
and acquiring the similarity according to the first feature vector and the second feature vector.
12. The apparatus of claim 11, wherein the second obtaining module is specifically configured to:
encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain the feature vector representing the network structure;
and encoding the second network structure by using the encoder to obtain the second feature vector.
13. The apparatus of claim 12, wherein the encoder is trained by:
decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector;
training the encoder according to a difference between the second network structure and the third network structure.
14. The apparatus according to any one of claims 11 to 13, wherein the training module is specifically configured to:
determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity;
training the neural network predictor according to the performance of the first network structure and the label.
15. The apparatus according to any one of claims 11 to 14, wherein the similarity is a distance between the first feature vector and the second feature vector.
16. The apparatus of any one of claims 10 to 15, wherein the neural network predictor is a graph neural network.
17. The apparatus of any one of claims 10 to 16, wherein the neural network predictor is configured to predict a performance of a network structure of a target neural network, the target neural network being configured for image processing.
18. An image processing apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
the image processing module is used for carrying out image processing on the image to be processed by using a neural network;
wherein the neural network is determined from a neural network predictor trained by the method of any one of claims 1 to 8.
19. An apparatus for training a neural network predictor, comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any one of claims 1 to 8.
20. An image processing apparatus comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of claim 9.
21. A computer-readable storage medium, wherein the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 8 or claim 9.
22. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1 to 8 or 9.
CN202010387976.4A 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device Pending CN111695673A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010387976.4A CN111695673A (en) 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device
PCT/CN2021/088254 WO2021227787A1 (en) 2020-05-09 2021-04-20 Neural network predictor training method and apparatus, and image processing method and apparatus

Publications (1)

Publication Number Publication Date
CN111695673A (en) 2020-09-22

Family

ID=72477513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387976.4A Pending CN111695673A (en) 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device

Country Status (2)

Country Link
CN (1) CN111695673A (en)
WO (1) WO2021227787A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111819580A (en) * 2018-05-29 2020-10-23 谷歌有限责任公司 Neural architecture search for dense image prediction tasks
CN110232434A (en) * 2019-04-28 2019-09-13 吉林大学 A kind of neural network framework appraisal procedure based on attributed graph optimization
CN110210558B (en) * 2019-05-31 2021-10-26 北京市商汤科技开发有限公司 Method and device for evaluating performance of neural network
CN110555514B (en) * 2019-08-20 2022-07-12 北京迈格威科技有限公司 Neural network model searching method, image identification method and device
CN111695673A (en) * 2020-05-09 2020-09-22 华为技术有限公司 Method for training neural network predictor, image processing method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011280A1 (en) * 2015-07-07 2017-01-12 Xerox Corporation Extracting gradient features from neural networks
KR101725684B1 (en) * 2015-12-01 2017-04-27 연세대학교 산학협력단 Apparatus and method for evaluating energy performance of buildings for energy incentive plan
KR101951595B1 (en) * 2018-05-18 2019-02-22 한양대학교 산학협력단 Vehicle trajectory prediction system and method based on modular recurrent neural network architecture
CN110309856A (en) * 2019-05-30 2019-10-08 华为技术有限公司 Image classification method, the training method of neural network and device
CN110532871A (en) * 2019-07-24 2019-12-03 华为技术有限公司 The method and apparatus of image procossing
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEORGE ADAM et al.: "Understanding Neural Architecture Search Techniques", arXiv *
MIN Rui et al.: "A Survey of Efficient Deep Neural Networks", Telecommunications Science *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021227787A1 (en) * 2020-05-09 2021-11-18 华为技术有限公司 Neural network predictor training method and apparatus, and image processing method and apparatus
CN112381147A (en) * 2020-11-16 2021-02-19 虎博网络技术(上海)有限公司 Dynamic picture similarity model establishing method and device and similarity calculating method and device
CN112381147B (en) * 2020-11-16 2024-04-26 虎博网络技术(上海)有限公司 Dynamic picture similarity model establishment and similarity calculation method and device
CN112957013A (en) * 2021-02-05 2021-06-15 江西国科美信医疗科技有限公司 Dynamic vital sign signal acquisition system, monitoring device and equipment
CN112957013B (en) * 2021-02-05 2022-11-11 江西国科美信医疗科技有限公司 Dynamic vital sign signal acquisition system, monitoring device and equipment

Also Published As

Publication number Publication date
WO2021227787A1 (en) 2021-11-18

Similar Documents

Publication Publication Date Title
CN110378381B (en) Object detection method, device and computer storage medium
CN110188795B (en) Image classification method, data processing method and device
CN110222717B (en) Image processing method and device
CN112446270A (en) Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN113011575A (en) Neural network model updating method, image processing method and device
CN112236779A (en) Image processing method and image processing device based on convolutional neural network
CN111914997B (en) Method for training neural network, image processing method and device
CN112446834A (en) Image enhancement method and device
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN112446398A (en) Image classification method and device
US20220157046A1 (en) Image Classification Method And Apparatus
CN111797882A (en) Image classification method and device
CN111401517B (en) Method and device for searching perceived network structure
CN111882031A (en) Neural network distillation method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN111291809A (en) Processing device, method and storage medium
CN114255361A (en) Neural network model training method, image processing method and device
CN112287954A (en) Image classification method, training method of image classification model and device thereof
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN110222718A (en) The method and device of image procossing
CN112215332A (en) Searching method of neural network structure, image processing method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN113011562A (en) Model training method and device
CN113807183A (en) Model training method and related equipment
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination