CN111695673B - Method for training neural network predictor, image processing method and device - Google Patents

Method for training neural network predictor, image processing method and device

Info

Publication number
CN111695673B
Authority
CN
China
Prior art keywords
network structure
neural network
network
feature vector
predictor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010387976.4A
Other languages
Chinese (zh)
Other versions
CN111695673A (en)
Inventor
许奕星
唐业辉
王云鹤
许春景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010387976.4A
Publication of CN111695673A
Priority to PCT/CN2021/088254 (WO2021227787A1)
Application granted
Publication of CN111695673B
Active legal status
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method for training a neural network predictor, an image processing method, and an apparatus in the field of artificial intelligence. The method for training the neural network predictor includes the following steps: acquiring a first network structure of a first neural network and a second network structure of a second neural network, where the first network structure is a labeled network structure and the label indicates the performance of the first network structure; obtaining the similarity between the first network structure and the second network structure; and training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label, where the neural network predictor is used to predict the performance of a network structure. In the method provided by the embodiments of the application, the relationship between network structures is used to assist in training the neural network predictor, so that the prediction accuracy of the trained neural network predictor can be improved while using only a small amount of labeled data.

Description

Method for training neural network predictor, image processing method and device
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method of training a neural network predictor, an image processing method and apparatus.
Background
Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and the military; it is the study of how to use cameras/video cameras and computers to acquire the data and information about a subject. Figuratively speaking, eyes (cameras/video cameras) and a brain (algorithms) are installed on a computer to identify, track, and measure targets in place of human eyes, so that the computer can perceive its environment. Because perception can be seen as the extraction of information from sensory signals, computer vision can also be seen as the science of making artificial systems "perceive" from images or multi-dimensional data. In general, computer vision acquires input information using various imaging systems in place of visual organs, and then processes and interprets that input information using a computer in place of the brain. The ultimate goal of computer vision is to enable computers to observe and understand the world visually, like humans, with the ability to adapt to the environment autonomously.
With the rapid development of artificial intelligence technology, neural networks (e.g., convolutional neural networks) have found widespread use in the field of computer vision. The performance of the neural network is often related to the network structure of the neural network, and currently, the network structure of the neural network can be determined by a neural network structure search (neural architecture search, NAS) method, for example, for a specific task, the network structure meeting the performance requirement is searched out in a preset search space.
Determining whether the performance of a network structure meets the performance requirements is time-consuming. Currently, one approach is to use a neural network predictor to predict the performance of a network structure on a specified data set.
Therefore, how to improve the prediction accuracy of the neural network predictor is a technical problem to be solved.
Disclosure of Invention
The application provides a method for training a neural network predictor, an image processing method and an image processing device, which are beneficial to improving the prediction accuracy of the neural network predictor.
In a first aspect, a method of training a neural network predictor is provided, the method comprising:
Acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure; obtaining the similarity between the first network structure and the second network structure; and training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, wherein the neural network predictor is used for predicting the performance of the network structure.
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between them) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor, i.e., the prediction accuracy of the trained neural network predictor, can be improved even when only a small amount of labeled data (for example, at least one labeled network structure) is used.
With reference to the first aspect, in certain implementation manners of the first aspect, the acquiring a similarity between the first network structure and the second network structure includes: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and obtaining the similarity according to the first feature vector and the second feature vector.
In the embodiment of the application, the similarity is obtained through the first feature vector and the second feature vector, so that the similarity can describe the relationship between the two network structures more accurately, and therefore, the training effect of the neural network predictor can be further improved by using the similarity, namely, the prediction accuracy of the trained neural network predictor is improved.
With reference to the first aspect, in certain implementation manners of the first aspect, the acquiring, according to the first network structure, a first feature vector includes: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain a feature vector representing the network structure; the obtaining a second feature vector according to the second network structure includes: and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded by the encoder, so that the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a recurrent neural network (recurrent neural network, RNN).
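For illustration only (this sketch is not part of the patent text), an RNN-based encoder of this kind could look like the following in Python/PyTorch, assuming each network structure is serialized as a sequence of integer operation codes; all class and parameter names here are hypothetical:

```python
import torch
import torch.nn as nn

class StructureEncoder(nn.Module):
    """Encodes a network structure, given as a sequence of operation IDs, into a feature vector."""
    def __init__(self, num_ops=16, embed_dim=32, feat_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_ops, embed_dim)       # one embedding per operation type
        self.rnn = nn.GRU(embed_dim, feat_dim, batch_first=True)

    def forward(self, op_sequence):
        # op_sequence: (batch, seq_len) tensor of integer operation IDs
        x = self.embed(op_sequence)
        _, h = self.rnn(x)                                  # final hidden state of the GRU
        return h.squeeze(0)                                 # (batch, feat_dim) feature vectors
```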
With reference to the first aspect, in certain implementation manners of the first aspect, the encoder is obtained after training by: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; the encoder is trained based on differences between the second network structure and the third network structure.
In the embodiment of the application, the second feature vector is decoded by the decoder to obtain the third network structure, and the encoder can be conveniently trained according to the difference between the second network structure and the third network structure without labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on differences between the second network structure and the third network structure.
For example, when the goal is for the third network structure to be as consistent as possible with the second network structure, the difference between the second network structure and the third network structure can be taken as a loss value, so that the encoder (and the decoder) can be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, manual operation is not needed in the training process, so that the training process of the encoder (and the decoder) can be more automated.
Further, training the encoder in this way can improve the accuracy of the feature vectors (of the first network structure and the second network structure) extracted by the encoder, that is, it can make different feature vectors reflect more accurately the characteristics of different network structures (for example, the computing power of different network structures).
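A minimal sketch of this self-supervised encoder/decoder training, reusing the hypothetical StructureEncoder above; the decoder design (feeding the feature vector at every step) is an assumption for illustration, and the reconstruction difference serves as the loss:

```python
import torch.nn.functional as F

class StructureDecoder(nn.Module):
    """Decodes a feature vector back into per-step operation logits (the third network structure)."""
    def __init__(self, num_ops=16, feat_dim=64, seq_len=8):
        super().__init__()
        self.seq_len = seq_len
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_ops)

    def forward(self, feat):
        x = feat.unsqueeze(1).repeat(1, self.seq_len, 1)    # feed the feature vector at every step
        out, _ = self.rnn(x)
        return self.head(out)                               # (batch, seq_len, num_ops)

def autoencoder_step(encoder, decoder, op_sequence, optimizer):
    feat = encoder(op_sequence)                             # second feature vector
    logits = decoder(feat)                                  # reconstructed (third) structure
    # The difference between the input structure and its reconstruction is the loss value.
    loss = F.cross_entropy(logits.transpose(1, 2), op_sequence)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```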
With reference to the first aspect, in certain implementation manners of the first aspect, the training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label includes: determining the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and training the neural network predictor based on the performance of the first network structure and the label.
In the embodiment of the present application, the performance of the first network structure is predicted by using the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (for example, the similarity between the two network structures), so that the performance of the first network structure (obtained by prediction) is more accurate, and at this time, the neural network predictor is trained according to the performance of the first network structure and the label, so that the training effect of the neural network predictor can be improved, that is, the prediction accuracy of the trained neural network predictor is improved.
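Continuing the illustrative sketch (the fusion performed by `predictor` is an assumption here; one plausible instance is the GCN sketched further below), a single training step of the predictor on one labeled structure could look like:

```python
def predictor_step(encoder, predictor, labeled_seq, unlabeled_seq, label, optimizer):
    f1 = encoder(labeled_seq)            # first feature vector (labeled structure)
    f2 = encoder(unlabeled_seq)          # second feature vector (unlabeled structure)
    sim = torch.cdist(f1, f2)            # pairwise distances as the similarity signal
    pred = predictor(f1, f2, sim)        # predicted performance of the first structure
    loss = F.mse_loss(pred, label)       # compare the prediction with the label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```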
With reference to the first aspect, in certain implementations of the first aspect, the similarity is a distance between the first feature vector and the second feature vector.
In the embodiment of the application, the distance between the first feature vector and the second feature vector can more accurately represent the similarity between the first network structure and the second network structure, and training the neural network predictor according to the similarity can further improve the training effect of the neural network predictor.
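For instance (one possible choice, not mandated by the text), continuing with the feature vectors f1 and f2 from the sketch above, the Euclidean distance can be used directly, or mapped monotonically into a similarity score:

```python
dist = torch.norm(f1 - f2, dim=-1)   # Euclidean distance between the two feature vectors
similarity = torch.exp(-dist)        # smaller distance maps to higher similarity
```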
With reference to the first aspect, in certain implementations of the first aspect, the neural network predictor is a graph convolutional neural network (graph convolutional network, GCN).
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between the first network structure and the second network structure) can be better utilized in the training process through the graph convolution neural network, so that the training effect of the neural network predictor can be improved.
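A minimal one-layer graph convolution sketch, assuming the pairwise similarities between structures form a row-normalized adjacency matrix; the exact GCN formulation used by the patent is not specified here, so this is only one plausible instance:

```python
class GCNPredictor(nn.Module):
    """Propagates feature vectors between structures along similarity edges, then predicts performance."""
    def __init__(self, feat_dim=64, hidden_dim=64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, feats, adj):
        # feats: (num_structures, feat_dim); adj: (num_structures, num_structures) similarities
        adj = adj / adj.sum(dim=1, keepdim=True).clamp_min(1e-8)  # row-normalize the graph
        h = torch.relu(self.proj(adj @ feats))                    # one graph-convolution step
        return self.head(h).squeeze(-1)                           # predicted performance per node
```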
With reference to the first aspect, in certain implementations of the first aspect, the neural network predictor is configured to predict performance of a network structure of a target neural network, the target neural network being configured for image processing.
The target neural network may be a convolutional neural network (convolutional neural network, CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a second aspect, there is provided an image processing method, the method comprising:
Acquiring an image to be processed; performing image processing on the image to be processed by using a neural network; wherein the neural network is determined according to a neural network predictor, and the neural network predictor is obtained after training by the method in any implementation manner of the first aspect.
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between them) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor, i.e., the prediction accuracy of the trained neural network predictor, can be improved even when only a small amount of labeled data (for example, at least one labeled network structure) is used.
Meanwhile, since the neural network is determined according to such a neural network predictor, using this neural network can improve the effect of image processing.
Alternatively, the image processing may include image classification, image segmentation, image detection, image super-resolution, and the like.
In a third aspect, an apparatus for training a neural network predictor is provided, comprising:
The first acquisition module is used for acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure; the second acquisition module is used for acquiring the similarity between the first network structure and the second network structure; and the training module is used for training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, and the neural network predictor is used for predicting the performance of the network structure.
In the embodiment of the application, the relationship between a plurality of network structures (for example, the similarity between the first network structure and the second network structure) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor, i.e., the prediction accuracy of the trained neural network predictor, can be improved even when only a small amount of labeled data (for example, at least one labeled network structure) is used.
With reference to the third aspect, in some implementations of the third aspect, the second obtaining module is specifically configured to: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and obtaining the similarity according to the first feature vector and the second feature vector.
In the embodiment of the application, the similarity is obtained through the first feature vector and the second feature vector, so that the similarity can describe the relationship between the two network structures more accurately, and therefore, the training effect of the neural network predictor can be further improved by using the similarity, namely, the prediction accuracy of the trained neural network predictor is improved.
With reference to the third aspect, in some implementations of the third aspect, the second obtaining module is specifically configured to: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain a feature vector representing the network structure; and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded by the encoder, so that the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a recurrent neural network (recurrent neural network, RNN).
With reference to the third aspect, in certain implementations of the third aspect, the encoder is trained by: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; the encoder is trained based on differences between the second network structure and the third network structure.
In the embodiment of the application, the second feature vector is decoded by the decoder to obtain the third network structure, and the encoder can be conveniently trained according to the difference between the second network structure and the third network structure without labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on differences between the second network structure and the third network structure.
For example, when the goal is for the third network structure to be as consistent as possible with the second network structure, the difference between the second network structure and the third network structure can be taken as a loss value, so that the encoder (and the decoder) can be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, manual operation is not needed in the training process, so that the training process of the encoder (and the decoder) can be more automated.
Further, training the encoder in this way can improve the accuracy of the feature vectors (of the first network structure and the second network structure) extracted by the encoder, that is, it can make different feature vectors reflect more accurately the characteristics of different network structures (for example, the computing power of different network structures).
With reference to the third aspect, in some implementations of the third aspect, the training module is specifically configured to: determine the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and train the neural network predictor based on the performance of the first network structure and the label.
In the embodiment of the present application, the performance of the first network structure is predicted by using the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (for example, the similarity between the two network structures), so that the performance information of the first network structure (obtained by prediction) is more accurate, and at this time, the neural network predictor is trained according to the performance of the first network structure and the label, so that the training effect of the neural network predictor can be improved, that is, the prediction accuracy of the trained neural network predictor is improved.
With reference to the third aspect, in certain implementations of the third aspect, the similarity is a distance between the first feature vector and the second feature vector.
In the embodiment of the application, the distance between the first feature vector and the second feature vector can more accurately represent the similarity between the first network structure and the second network structure, and training the neural network predictor according to the similarity can further improve the training effect of the neural network predictor.
With reference to the third aspect, in certain implementations of the third aspect, the neural network predictor is a graph convolutional neural network.
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between the first network structure and the second network structure) can be better utilized in the training process through the graph convolution neural network, so that the training effect of the neural network predictor can be improved.
With reference to the third aspect, in certain implementations of the third aspect, the neural network predictor is configured to predict performance of a network structure of a target neural network, where the target neural network is used for image processing.
The target neural network may be a convolutional neural network (convolutional neural network, CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a fourth aspect, there is provided an image processing apparatus including:
The acquisition module is used for acquiring the image to be processed; the image processing module is used for performing image processing on the image to be processed by using a neural network; wherein the neural network is determined according to a neural network predictor, and the neural network predictor is obtained after training by the method in any implementation manner of the first aspect.
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between them) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor, i.e., the prediction accuracy of the trained neural network predictor, can be improved even when only a small amount of labeled data (for example, at least one labeled network structure) is used.
Meanwhile, since the neural network is determined according to such a neural network predictor, using this neural network can improve the effect of image processing.
Alternatively, the image processing may include image classification, image segmentation, image detection, image super-resolution, and the like.
In a fifth aspect, a method of training a neural network is provided, the method comprising:
Acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure; obtaining the similarity between the first network structure and the second network structure; training the neural network according to the first network structure, the second network structure, the similarity, and the tag.
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between them) is used to assist in training the neural network, so that the training effect of the neural network can be improved even when only a small amount of labeled data (for example, at least one network structure with a performance label) is used.
With reference to the fifth aspect, in certain implementation manners of the fifth aspect, the acquiring a similarity between the first network structure and the second network structure includes: acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and obtaining the similarity according to the first feature vector and the second feature vector.
In the embodiment of the application, the similarity is obtained through the first feature vector and the second feature vector, so that the similarity can describe the relationship between the two network structures more accurately, and the training effect of the neural network can be further improved by using the similarity.
With reference to the fifth aspect, in some implementations of the fifth aspect, the acquiring, according to the first network structure, a first feature vector includes: encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain a feature vector representing the network structure; the obtaining a second feature vector according to the second network structure includes: and encoding the second network structure by using the encoder to obtain the second feature vector.
In the embodiment of the application, the network structure is encoded by the encoder, so that the feature vector of the network structure can be conveniently obtained.
Alternatively, the encoder may be implemented by a Neural Network (NN). For example, the encoder may be a recurrent neural network (recurrent neural network, RNN).
With reference to the fifth aspect, in certain implementations of the fifth aspect, the encoder is trained by: decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; the encoder is trained based on differences between the second network structure and the third network structure.
In the embodiment of the application, the second feature vector is decoded by the decoder to obtain the third network structure, and the encoder can be conveniently trained according to the difference between the second network structure and the third network structure without labeled data (for example, without labeling the feature vectors output by the encoder).
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on differences between the second network structure and the third network structure.
For example, when the goal is for the third network structure to be as consistent as possible with the second network structure, the difference between the second network structure and the third network structure can be taken as a loss value, so that the encoder (and the decoder) can be conveniently trained without manually labeling the feature vectors output by the encoder.
Meanwhile, manual operation is not needed in the training process, so that the training process of the encoder (and the decoder) can be more automated.
Further, training the encoder in this way can improve the accuracy of the feature vectors (of the first network structure and the second network structure) extracted by the encoder, that is, it can make different feature vectors reflect more accurately the characteristics of different network structures (for example, the computing power of different network structures).
With reference to the fifth aspect, in certain implementations of the fifth aspect, the training the neural network according to the first network structure, the second network structure, the similarity, and the label includes: determining the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and training the neural network based on the performance of the first network structure and the label.
In the embodiment of the present application, the performance of the first network structure is predicted by using the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (for example, the similarity between the two network structures), so that the performance of the first network structure (obtained by prediction) is more accurate, and at this time, the neural network is trained according to the performance of the first network structure and the label, so that the training effect of the neural network can be improved.
With reference to the fifth aspect, in certain implementations of the fifth aspect, the similarity is a distance between the first feature vector and the second feature vector.
In the embodiment of the application, the distance between the first feature vector and the second feature vector can more accurately represent the similarity between the first network structure and the second network structure, and the training effect of the neural network can be further improved by training the neural network according to the similarity.
With reference to the fifth aspect, in certain implementations of the fifth aspect, the neural network is a graph convolutional neural network (graph convolutional network, GCN).
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between the first network structure and the second network structure) can be better utilized in the training process through the graph convolution neural network, so that the training effect of the neural network can be improved.
With reference to the fifth aspect, in certain implementations of the fifth aspect, the neural network is used to predict performance of a network structure of a target neural network, the target neural network being used for image processing.
The target neural network may be a convolutional neural network (convolutional neural network, CNN).
For example, the target neural network may be used for image classification, image segmentation, image detection, image super-resolution, and the like.
In a sixth aspect, there is provided an apparatus for training a neural network predictor, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of any one of the implementations of the first aspect when the program stored in the memory is executed.
The processor in the sixth aspect may be a central processing unit (central processing unit, CPU), or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (graphics processing unit, GPU), a neural-network processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a seventh aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being for executing the method in any one of the implementations of the second aspect when the program stored in the memory is executed.
The processor in the seventh aspect may be a central processing unit (central processing unit, CPU), or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (graphics processing unit, GPU), a neural-network processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In an eighth aspect, there is provided an apparatus for training a neural network, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of any one of the implementation manners of the fifth aspect when the program stored in the memory is executed.
The processor in the eighth aspect may be a central processing unit (central processing unit, CPU), or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (graphics processing unit, GPU), a neural-network processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a ninth aspect, there is provided a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first or second or third aspects.
In a tenth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first or second aspects described above.
In an eleventh aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, performing the method in any implementation of the first aspect or the second aspect or the third aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect or the second aspect or the third aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between them) is used to assist in training the neural network predictor, so that the training effect of the neural network predictor, i.e., the prediction accuracy of the trained neural network predictor, can be improved even when only a small amount of labeled data (for example, at least one labeled network structure) is used.
Drawings
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a chip hardware structure according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of another system architecture according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of a method of training a neural network predictor provided by an embodiment of the present application.
Fig. 6 is a schematic flow chart of a method of training a neural network predictor provided in another embodiment of the present application.
Fig. 7 is a schematic block diagram of a method of training a neural network predictor provided by an embodiment of the present application.
Fig. 8 is a schematic flow chart of an image processing method provided in one embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for training a neural network predictor provided by an embodiment of the present application.
Fig. 10 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
The embodiment of the application can be applied to various fields in artificial intelligence, such as intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe city and the like.
Specifically, the embodiment of the application can be applied to photographing, video recording, safe city, man-machine interaction, and other scenarios requiring image processing, such as image classification, image segmentation, image detection, image super-resolution, and the like.
The method for training the neural network predictor in the embodiment of the application can be applied to neural architecture search (neural architecture search, NAS); using the trained neural network predictor, the performance of a network structure can be predicted quickly and accurately, thereby saving the time spent on the neural architecture search. Meanwhile, the network structure obtained through the neural architecture search can be used to construct a neural network applied to image processing scenarios, improving the image processing effect.
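As a schematic of how the trained predictor saves search time (the candidate ranking strategy below is an assumption for illustration, not the patent's procedure), candidates are scored by the predictor rather than trained one by one:

```python
import torch

def rank_candidates(encoder, predictor_head, candidates, top_k=5):
    """Scores candidate structures with the predictor; only the best few need full training."""
    with torch.no_grad():
        feats = torch.stack([encoder(c.unsqueeze(0)).squeeze(0) for c in candidates])
        scores = predictor_head(feats).squeeze(-1)   # predicted performance per candidate
    order = torch.argsort(scores, descending=True)
    return [candidates[i] for i in order[:top_k]]    # best candidates by predicted performance
```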
For example, the neural network constructed by the method in the embodiment of the application can be applied to scenes of image classification, and the accuracy of image classification and the efficiency of image classification can be improved by using the neural network, so that the user experience can be improved.
For another example, the neural network constructed by the method in the embodiment of the application can be applied to the scene of image recognition, and the accuracy and the efficiency of image recognition can be improved by using the neural network, so that the user experience can be improved.
It should be understood that the application of the method in the embodiment of the present application is not limited to the above two scenarios; the neural network constructed by the method in the embodiment of the present application may also be used for photographing, video recording, safe city, man-machine interaction, and other scenarios requiring image processing, such as image classification, image segmentation, image detection, image super-resolution, and the like.
The method for training the neural network predictor in the embodiment of the present application may also be applied to other scenes where the performance of the network structure needs to be predicted, or the method for training the neural network predictor in the embodiment of the present application may also be applied to other scenes where the neural network needs to be trained, or the method for training the neural network predictor in the embodiment of the present application may also be applied to other scenes where the neural network needs to be used (for example, speech recognition, machine translation, semantic segmentation, etc.), which is not limited in the embodiment of the present application.
It should be noted that, in the embodiment of the present application, the image may be a still image (or referred to as a still picture) or a moving image (or referred to as a moving picture), for example, the image in the present application may be a video or a moving picture, or the image in the present application may be a still picture or a photo. For convenience of description, the present application refers to static images or dynamic images collectively as images in the following embodiments.
The embodiments of the present application relate to a number of related applications of neural networks, and in order to better understand the schemes of the embodiments of the present application, related terms and concepts of the neural networks to which the embodiments of the present application may relate are first described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an arithmetic unit that takes $x_s$ and an intercept 1 as inputs, and the output of the arithmetic unit may be represented by the following formula:

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function (activation function) of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal in the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together many of the above single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be an area composed of several neural units.
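A worked numeric check of the single-unit formula above, with a sigmoid as $f$ and arbitrarily chosen illustration values:

```python
import math

def neural_unit(x, w, b):
    """Computes f(sum_s(W_s * x_s) + b) with a sigmoid activation f."""
    s = sum(ws * xs for ws, xs in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))

print(neural_unit(x=[0.5, -1.0, 2.0], w=[0.1, 0.4, 0.3], b=0.2))  # f(0.45) ≈ 0.611
```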
(2) Deep neural network
A deep neural network (deep neural network, DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the positions of the different layers, the layers inside the DNN can be divided into three types: input layer, hidden layer, and output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the ith layer must be connected to any neuron in the (i+1)th layer.
Although DNN appears to be complex, the work of each layer is not complex; it is simply the following linear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since the number of DNN layers is large, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows. Taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.

In summary, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as $W^L_{jk}$.
It should be noted that the input layer has no $W$ parameter. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", meaning that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrix of every layer of the trained deep neural network (a weight matrix formed by the vectors $W$ of many layers).
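A small numeric sketch of one fully-connected layer using the $W^L_{jk}$ convention just described (a ReLU stands in for the activation $\alpha$; the values are arbitrary):

```python
import numpy as np

def layer_forward(x, W, b):
    """One DNN layer: y = alpha(W x + b), here with ReLU as alpha."""
    return np.maximum(0.0, W @ x + b)

# W[j, k] is the coefficient from neuron k of the previous layer to neuron j of this layer.
x = np.array([1.0, 2.0])                 # outputs of the previous layer
W = np.array([[0.5, -0.2],
              [0.1,  0.3]])              # this layer's weight matrix
b = np.array([0.0, 0.1])
print(layer_forward(x, W, b))            # [0.1 0.8]
```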
(3) Convolutional neural network
A convolutional neural network (convolutional neural network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer, which can be regarded as a filter. The convolutional layer refers to the neuron layer in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons of the adjacent layer. A convolutional layer typically contains several feature planes, and each feature plane may be composed of a number of neural units arranged in a rectangular pattern. Neural units of the same feature plane share weights, where the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. The convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be obtained through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
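To illustrate weight sharing, a minimal 2D cross-correlation in plain Python: the same kernel (the shared weights) is applied at every image position:

```python
def conv2d_single(image, kernel):
    """Slides one shared kernel over the image; the same weights are reused at every position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for i in range(ih - kh + 1):
        for j in range(iw - kw + 1):
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out

# A 3x3 image filtered by a 2x2 kernel yields a 2x2 feature plane.
print(conv2d_single([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
```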
(4) A recurrent neural network (recurrent neural network, RNN) is used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this common neural network solves many problems, it is still powerless for many others. For example, to predict what the next word of a sentence will be, the previous words are generally needed, because the words in a sentence are not independent of one another. RNN is called a recurrent neural network in the sense that the current output for a sequence is related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, RNNs are able to process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN.
Why are recurrent neural networks needed when convolutional neural networks already exist? The reason is simple: in convolutional neural networks, there is a precondition assumption that the elements are independent of each other, and that the inputs and outputs are also independent, such as cats and dogs. However, in the real world, many elements are interconnected. For example, stock prices change over time. As another example, someone says: "I like traveling, and my favorite place is Yunnan; in the future, when I have the chance, I will go to ____." Here, a human knows that the blank should be filled in with "Yunnan", because humans infer from the context. But how can a machine do this? RNNs were developed for this purpose. RNNs aim to give machines the ability to memorize as humans do. Thus, the output of an RNN needs to rely on the current input information and on historical memory information.
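A minimal sketch of the recurrence just described: the hidden state at each step depends on the current input and on the previous hidden state (the memory), with the same weights reused at every time step:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h): current input plus historical memory."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    h = h0
    for x_t in xs:                       # the same weights are applied at every time step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h                             # the final state summarizes the whole sequence
```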
(5) Loss function
In training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value actually desired, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the actually desired target value (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the purpose of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so the training of the deep neural network becomes the process of reducing this loss as much as possible.
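For example, a mean squared error, one common way to "compare the difference between the predicted value and the target value"; a higher output means a larger difference, and training minimizes it:

```python
def mse_loss(predicted, target):
    """Mean squared error between predicted values and target values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

print(mse_loss([0.9, 0.2], [1.0, 0.0]))   # 0.025
```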
(6) Back propagation algorithm
In the training process, a neural network can use the back propagation (back propagation, BP) algorithm to correct the parameter values in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation motion dominated by the error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
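A one-parameter illustration of the update direction that backpropagation produces; here the gradient of a quadratic loss is written out analytically:

```python
def gradient_step(w, x, target, lr=0.1):
    """For loss = (w*x - target)^2, dloss/dw = 2*(w*x - target)*x; move w against the gradient."""
    grad = 2.0 * (w * x - target) * x
    return w - lr * grad

w = 0.0
for _ in range(50):
    w = gradient_step(w, x=2.0, target=4.0)   # the error loss converges as w approaches 2
print(round(w, 3))                            # 2.0
```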
(7) Pixel value
The pixel value of an image may be a red green blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, the pixel value is 256 × Red + 100 × Green + 76 × Blue, where × represents multiplication, Blue represents the blue component, Green represents the green component, and Red represents the red component. For each color component, the smaller the value, the lower the luminance; the larger the value, the higher the luminance. For a grayscale image, the pixel value may be a gray value.
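A small check of the example encoding above (the component weights 256, 100, and 76 are taken directly from the text):

```python
def pack_pixel(red, green, blue):
    """Packs the color components into one long integer using the example weights."""
    return 256 * red + 100 * green + 76 * blue

print(pack_pixel(red=10, green=2, blue=1))   # 2836
```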
As shown in fig. 1, an embodiment of the present application provides a system architecture 100. In fig. 1, a data acquisition device 160 is used to acquire training data. For the method for training a neural network predictor according to the embodiment of the present application, the training data may include an unlabeled network structure, a labeled network structure, and a true value (ground truth, GT) corresponding to the labeled network structure, where the true value corresponding to the labeled network structure may be a performance of the labeled network structure (for example, a performance of the labeled network structure on a specified data set) that is labeled in advance by a person.
After the training data is collected, the data collection device 160 stores the training data in the database 130 and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input network structures (for example, the unlabeled network structure and the labeled network structure) to obtain the performance of each network structure, and compares the predicted performance of a network structure with the true value corresponding to that network structure (for example, the true value corresponding to the labeled network structure), until the difference between the performance output by the training device 120 and the corresponding true value is smaller than a certain threshold, thereby completing the training of the target model/rule 101 (i.e., the neural network predictor).
The target model/rule 101 can be used to implement a neural network predictor obtained after training, that is, the network structure is input into the target model/rule 101 after related preprocessing, so that the performance of the network structure can be predicted. The predicted performance of the network structure may be used to determine a neural network, which may be used for image processing.
In practical applications, the training data maintained in the database 130 is not necessarily collected by the data collecting device 160, but may be received from other devices. It should be noted that the training device 120 is not necessarily completely based on the training data maintained by the database 130 to perform training of the target model/rule 101, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 1, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/Virtual Reality (VR), a vehicle-mounted terminal, or may also be a server or cloud device. In fig. 1, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140, where the input data may include in an embodiment of the present application: network architecture entered by the client device.
The preprocessing module 113 and the preprocessing module 114 are used for preprocessing according to the input data (such as a network structure) received by the I/O interface 112, and in the embodiment of the present application, the preprocessing module 113 and the preprocessing module 114 (or only one of the preprocessing modules) may be omitted, and the computing module 111 may be directly used for processing the input data.
In preprocessing input data by the execution device 110, or in performing processing related to computation or the like by the computation module 111 of the execution device 110, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results, such as the performance of the network structure obtained as described above, to the client device 140, thereby providing the user with the processing results.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule 101 for different targets or different tasks, where the corresponding target model/rule 101 may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result.
For example, the object model/rule 101 in the embodiment of the present application may specifically be an image processing apparatus in the embodiment of the present application, and the image processing apparatus may be determined according to the performance of the network structure predicted by the neural network predictor. For the image processing apparatus, the training data may include an image to be processed and a true value corresponding to the image to be processed.
In the case shown in fig. 1, the user may manually give input data, and this manual operation may be performed through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if such automatic sending requires the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also serve as a data collection terminal, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data, as shown in the figure, and storing the new sample data in the database 130. Of course, instead of being collected by the client device 140, the input data of the I/O interface 112 and the output results of the I/O interface 112 may be directly stored by the I/O interface 112 into the database 130 as new sample data.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 1, the target model/rule 101 is obtained by training according to the training device 120, where the target model/rule 101 may be a neural network predictor obtained by training based on the method for training a neural network predictor in the present application in the embodiment of the present application, or the target model/rule 101 may be an image processing apparatus in the embodiment of the present application.
Specifically, the neural network predictor obtained after training based on the method for training a neural network predictor in the present application can be used to search for a neural network, and the neural network can be used for image processing, speech processing, natural language processing, and the like. For example, the neural network predictor may be used to search convolutional neural networks (convolutional neural network, CNN), deep convolutional neural networks (deep convolutional neural network, DCNN), and/or recurrent neural networks (recurrent neural network, RNN), among others.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 2, taking image processing as an example. As described in the basic concept introduction above, the convolutional neural network is a deep neural network with a convolutional structure and is a deep learning (deep learning) architecture, where a deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which the individual neurons can respond to the data (e.g., images) input thereto.
As shown in fig. 2, convolutional Neural Network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230. The relevant contents of these layers are described in detail below.
Convolution layer/pooling layer 220:
Convolution layer:
The convolutional layer/pooling layer 220 shown in fig. 2 may include layers 221-226, for example: in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer may be used as input to a subsequent pooling layer, or as input to another convolutional layer to continue the convolution operation.
The internal principle of operation of one convolution layer will be described below using the convolution layer 221 as an example.
The convolution layer 221 may include a plurality of convolution operators, also known as kernels. In image processing, a convolution operator functions as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is typically processed over the input image in the horizontal direction pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride), so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends to the entire depth of the input image during the convolution operation. Therefore, convolving with a single weight matrix produces a convolved output of a single depth dimension; in most cases, however, a single weight matrix is not used, but rather multiple weight matrices of the same size (rows × columns) are applied. The outputs of the individual weight matrices are stacked to form the depth dimension of the convolved image, where this dimension is understood to be determined by the "multiple" described above. Different weight matrices may be used to extract different features in the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a particular color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the feature maps extracted by these weight matrices also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
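For ease of understanding, the following minimal sketch (plain Python with NumPy; the 8×8 input, the 3×3 edge-detection kernel, and the stride value are illustrative assumptions rather than features of the embodiment) shows how a single weight matrix slides over a single-channel image to produce one feature map:

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one weight matrix (kernel) over a single-channel image.

    Each output pixel is the sum of an element-wise product between the
    kernel and the image patch it currently covers.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(8, 8)                  # illustrative 8x8 input
edge_kernel = np.array([[-1, -1, -1],         # one weight matrix that
                        [-1,  8, -1],         # responds to edges
                        [-1, -1, -1]], dtype=float)
feature_map = conv2d_single(image, edge_kernel, stride=1)
print(feature_map.shape)                      # (6, 6)
```

Stacking the outputs of several such kernels along a new axis would yield the depth dimension described above.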
The weight values in the weight matrices are required to be obtained through a large amount of training in practical application, and each weight matrix formed by the weight values obtained through training can be used for extracting information from an input image, so that the convolutional neural network 200 can perform correct prediction.
When convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (e.g., 221) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 200 increases, features extracted by the later convolutional layers (e.g., 226) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, pooling layers often need to be introduced periodically after convolutional layers; in the layers 221-226 illustrated as 220 in fig. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator may compute the pixel values within a particular range of the image to produce an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value within a particular range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the input image.
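As an illustrative sketch (plain Python with NumPy; the 4×4 input and the 2×2 window are assumptions made only for the example), the following code shows both the average pooling operator and the maximum pooling operator reducing the spatial size of an image:

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Down-sample a single-channel image with max or average pooling."""
    ih, iw = image.shape
    oh = (ih - size) // stride + 1
    ow = (iw - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))   # each output pixel: max of its 2x2 region
print(pool2d(x, mode="avg"))   # each output pixel: mean of its 2x2 region
```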
Neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information. This is because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the number of parameters brought by the input image. However, in order to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate an output of one required class or a set of required classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in fig. 2) and an output layer 240, where the parameters included in the multiple hidden layers may be pre-trained according to the relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 230, the final layer of the overall convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, specifically for calculating the prediction error. Once the forward propagation of the overall convolutional neural network 200 (e.g., propagation from 210 to 240 as shown in fig. 2) is completed, the backward propagation (e.g., propagation from 240 to 210 as shown in fig. 2) starts to update the weights and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the desired result.
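For illustration, the following sketch (NumPy; the toy feature matrix, the single linear layer standing in for the output layer 240, and the learning rate are all assumptions made for the example) shows how a forward pass can compute a categorical cross-entropy loss and how a backward pass then updates the weights to reduce it:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true classes
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))         # outputs of the hidden layers
labels = np.array([0, 2, 1, 2])            # true classes
W = rng.normal(size=(8, 3)) * 0.1          # weights of the toy output layer

for step in range(200):
    probs = softmax(features @ W)          # forward propagation
    loss = cross_entropy(probs, labels)
    grad_logits = probs.copy()             # backward propagation:
    grad_logits[np.arange(len(labels)), labels] -= 1.0
    grad_logits /= len(labels)             # d(loss)/d(logits)
    W -= 0.1 * (features.T @ grad_logits)  # weight update reduces the loss
print(round(loss, 4))
```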
It should be noted that the convolutional neural network 200 shown in fig. 2 is only an example of a convolutional neural network, and the convolutional neural network may also exist in the form of other network models in a specific application.
In the embodiment of the present application, the neural network predictor obtained after training based on the method for training the neural network predictor in the present application may be used to search for (a network structure of) a neural network, and the neural network obtained by searching for the neural network structure may include the convolutional neural network 200 shown in fig. 2; or the image processing device in the embodiment of the application may include the convolutional neural network 200 shown in fig. 2, and the image processing device may perform image processing on the image to be processed to obtain a processing result of the image to be processed.
Fig. 3 is a hardware structure of a chip according to an embodiment of the present application, where the chip includes a neural network processor 50. The chip may be provided in an execution device 110 as shown in fig. 1 for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 1 to complete the training work of the training device 120 and output the target model/rule 101. The algorithms of the various layers in the convolutional neural network shown in fig. 2 may be implemented in a chip as shown in fig. 3.
The neural network processor NPU 50 is mounted as a coprocessor on a main CPU (host CPU), which distributes tasks. The core part of the NPU is the arithmetic circuit 503; the controller 504 controls the arithmetic circuit 503 to extract data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) inside. In some implementations, the operation circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 503 takes the data corresponding to the matrix B from the weight memory 502 and buffers the data on each PE in the arithmetic circuit 503. The arithmetic circuit 503 performs a matrix operation on the matrix A data from the input memory 501 and the matrix B data, and the partial results or the final result of the matrix are stored in an accumulator (accumulator) 508.
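The matrix operation with accumulated partial results can be illustrated with the following small sketch (NumPy; the matrix contents are arbitrary), where C = A·B is built up by summing one partial product per step rather than being computed in one shot:

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)   # input matrix A
B = np.arange(12, dtype=float).reshape(3, 4)  # weight matrix B

# Accumulate rank-1 partial products, in the spirit of partial results
# being summed into an accumulator step by step.
C = np.zeros((2, 4))
for k in range(A.shape[1]):
    C += np.outer(A[:, k], B[k, :])           # partial result for step k

assert np.allclose(C, A @ B)                  # same as the full product
```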
The vector calculation unit 507 may further process the output of the operation circuit 503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit 507 can store the vector of processed outputs to the unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to an output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 507 generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operational circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used for storing input data and output data.
The storage unit access controller (direct memory access controller, DMAC) 505 transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
A bus interface unit (bus interface unit, BIU) 510 is used for interaction between the main CPU, the DMAC, and the instruction fetch memory 509 via a bus.
An instruction fetch memory (instruction fetch buffer) 509 connected to the controller 504 for storing instructions used by the controller 504;
And a controller 504 for calling the instruction cached in the instruction memory 509 to control the operation of the operation accelerator.
Typically, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are on-chip (On-Chip) memories, and the external memory is a memory external to the NPU, which may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
The operations of the layers in the convolutional neural network shown in fig. 2 may be performed by the arithmetic circuit 503 or the vector calculation unit 507.
The training device 120 in fig. 1 described above is capable of performing the steps of the method for training a neural network predictor according to the embodiments of the present application, and the execution device 110 in fig. 1 is capable of performing the steps of the image processing method according to the embodiments of the present application. The CNN model shown in fig. 2 and the chip shown in fig. 3 may also be used to perform the steps of the image processing method according to the embodiments of the present application, and the chip shown in fig. 3 may also be used to perform the steps of the method for training a neural network predictor according to the embodiments of the present application.
As shown in fig. 4, an embodiment of the present application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 210, and a data storage system 250, where the local device 301 and the local device 302 are connected to the execution device 210 through a communication network.
The execution device 210 may be implemented by one or more servers. Alternatively, the execution device 210 may be used with other computing devices, such as: data storage, routers, load balancers, etc. The execution device 210 may be disposed on one physical site or distributed across multiple physical sites. The execution device 210 may use data in the data storage system 250 or invoke program code in the data storage system 250 to implement the method of training a neural network predictor or the image processing method of embodiments of the present application.
Specifically, the execution device 210 may execute the following procedure:
Acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure; obtaining the similarity between the first network structure and the second network structure; and training the neural network predictor according to the first network structure, the second network structure, the similarity and the label, wherein the neural network predictor is used for predicting the performance of the network structure.
Through the above process, the execution device 210 can build a neural network predictor, and the neural network predictor can be used to search for a neural network; the searched neural network can be used for image processing, speech processing, natural language processing, and the like.
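For ease of understanding, the following simplified sketch (plain NumPy; the feature dimensions, the random stand-ins for encoded network structures, the single propagation step, and all hyper-parameters are assumptions for illustration, not the claimed implementation) mirrors the procedure above: labeled and unlabeled network structures are placed in one similarity graph, features are propagated over the graph, and a predictor is fitted with a loss computed only on the labeled structures:

```python
import numpy as np

rng = np.random.default_rng(0)
n_labeled, n_unlabeled, dim = 8, 32, 16
feats = rng.normal(size=(n_labeled + n_unlabeled, dim))  # stand-ins for E(x_i)
labels = rng.uniform(0.8, 0.95, size=n_labeled)          # accuracies of labeled nets

# Similarity graph over ALL structures (labeled and unlabeled together).
d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
S = np.exp(-d2 / (2 * 1.0 ** 2))                         # sigma = 1.0 assumed
S_hat = S / S.sum(axis=1, keepdims=True)                 # row-normalize

H = S_hat @ feats                                        # one propagation step
w = np.zeros(dim)                                        # linear readout

for _ in range(1000):                                    # minimize labeled MSE only
    pred = H[:n_labeled] @ w
    grad = 2 * H[:n_labeled].T @ (pred - labels) / n_labeled
    w -= 0.05 * grad

print("predicted performance of unlabeled nets:", (H[n_labeled:] @ w)[:3])
```

The point of the sketch is that the unlabeled structures still shape the graph and the propagated features, even though the loss is computed only on the labeled ones.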
Or the execution device 210 may also execute the following procedure:
Acquiring an image to be processed; performing image processing on the image to be processed by using a neural network; wherein the neural network is determined from a neural network predictor, the neural network predictor being trained by the method of any one of claims 1 to 8.
Through the above process, the execution device 210 can build an image processing apparatus, and the image processing apparatus can be used for image processing.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.
The local device of each user may interact with the execution device 210 through a communication network of any communication mechanism/communication standard; the communication network may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
In one implementation, the local device 301 and the local device 302 obtain the relevant parameters of the neural network predictor from the execution device 210, deploy the neural network predictor on the local device 301 and the local device 302, and use the neural network predictor to predict the performance of a network structure.
In another implementation, the neural network predictor may be deployed directly on the execution device 210, and the execution device 210 predicts the performance of the network structure by taking the network structure from the local device 301 and the local device 302 and using the neural network predictor.
In one implementation, the local device 301 and the local device 302 acquire relevant parameters of the image processing apparatus from the execution device 210, and the image processing apparatus is disposed on the local device 301 and the local device 302, and is used for performing image processing on an image to be processed.
In another implementation, the image processing apparatus may be disposed directly on the execution device 210, and the execution device 210 performs image processing on the image to be processed by acquiring the image to be processed from the local device 301 and the local device 302.
That is, the execution device 210 may be a cloud device, in which case the execution device 210 may be deployed in the cloud; or the execution device 210 may be a terminal device, in which case the execution device 210 may be deployed on the user terminal side, which is not limited in the embodiments of the present application.
The method for training the neural network predictor and the image processing method according to the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 5 is a schematic flow chart of a method of training a neural network predictor of the present application. The method 500 of training a neural network predictor in fig. 5 may include steps 510, 520, and 530.
In some examples, the method 500 may be performed by the training device 120 in fig. 1, the chip shown in fig. 3, the execution device 210 in fig. 4, and other such devices.
S510, acquiring a first network structure of the first neural network and a second network structure of the second neural network.
Wherein the first network structure may be a tagged network structure, and the tag may be used to indicate a performance of the first network structure.
The label here may be understood as a real label corresponding to the network structure, which may be used to represent the real performance corresponding to the network structure.
For example, a neural network formed by a network structure can be trained, and when the neural network is trained to be converged, the real performance corresponding to the network structure can be determined according to the converged neural network, so that the performance label corresponding to the network structure is obtained.
At present, obtaining a performance label by training a neural network to convergence is time-consuming.
Alternatively, the performance label corresponding to the network structure may be obtained through manual labeling, so as to obtain a network structure with a performance label.
Alternatively, the second network structure may be a non-tagged network structure.
For example, in S510, a plurality of network structures may be acquired, where the plurality of network structures may include a small number of network structures with performance tags and a large number of network structures without performance tags.
S520, obtaining the similarity between the first network structure and the second network structure.
Wherein the similarity may be used to represent a degree of similarity between the first network structure and the second network structure.
It should be noted that in the embodiment of the present application, a plurality of network structures may be acquired in the step S510, where a similarity between some network structures (such as at least two network structures) of the plurality of network structures may be acquired; or the similarity between every two network structures in the plurality of network structures may be obtained, which is not limited in the embodiment of the present application.
For example, a similarity between any two network structures of the plurality of network structures may be obtained.
For another example, a similarity between the (each) tagged network structure and the (each) untagged network structure of the plurality of network structures is obtained.
Optionally, a first feature vector may be acquired according to the first network structure; acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure; and obtaining the similarity according to the first feature vector and the second feature vector.
Wherein the first feature vector may be used to represent the first network structure, e.g. the first feature vector represents a characteristic of the first network structure (i.e. a network feature), or the first feature vector may be used to represent a computing power of the first network structure.
The second feature vector may be used to represent the second network structure, for example, the second feature vector represents a characteristic of the second network structure (i.e., a network feature), or the second feature vector may be used to represent computing power of the second network structure.
It should be noted that, by using the feature vectors of different network structures, features of different network structures (for example, computing capabilities of different network structures) may be more accurately represented.
At the same time, feature vectors may be used to represent network structures, which may also be used to facilitate the processing of the network structure (represented by the feature vector) by a neural network (e.g., a neural network predictor).
In the embodiment of the application, the similarity is obtained through the first feature vector and the second feature vector, so that the similarity can describe the relationship between the two network structures more accurately, and therefore, the training effect of the neural network predictor can be further improved by using the similarity, namely, the prediction accuracy of the trained neural network predictor is improved.
Alternatively, the first feature vector of the first network structure may be an embedding (embedding), or a feature vector in a form similar to an embedding.
The second eigenvector of the second network structure is similar to the first eigenvector of the first network structure, and will not be described here again.
Alternatively, the similarity may be a distance between the first feature vector and the second feature vector. For example, the similarity may be a cosine distance between the first feature vector and the second feature vector.
In the embodiment of the application, the distance between the first feature vector and the second feature vector can more accurately represent the similarity between the first network structure and the second network structure, and training the neural network predictor according to the similarity can further improve the training effect of the neural network predictor.
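As a small illustration of the cosine distance mentioned above (the feature vectors below are arbitrary values chosen only for the example):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors; 1 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    """Distance in [0, 2]; 0 means the vectors point the same way."""
    return 1.0 - cosine_similarity(u, v)

first_feature = np.array([0.2, 0.9, 0.1, 0.4])   # stand-in for the first feature vector
second_feature = np.array([0.3, 0.8, 0.0, 0.5])  # stand-in for the second feature vector
print(cosine_distance(first_feature, second_feature))
```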
Further, an encoder may be used to encode the first network structure to obtain the first feature vector; the second network structure may be encoded using the encoder to obtain the second feature vector.
Wherein the encoder may be configured to encode a feature vector representing a network structure, and the encoder may be implemented by a Neural Network (NN).
For example, the encoder may be the encoder in an autoencoder (auto-encoder, AE), and the autoencoder may also include a decoder.
The encoder may be a recurrent neural network (recurrent neural network, RNN).
In the embodiment of the application, the network structure is encoded by the encoder, so that the feature vector of the network structure can be obtained conveniently.
Alternatively, the encoder may be trained by:
Decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by the feature vector; the encoder is trained based on differences between the second network structure and the third network structure.
Alternatively, the decoder may be trained during the training of the encoder.
For example, the encoder and the decoder may be trained simultaneously based on differences between the second network structure and the third network structure.
In the embodiment of the application, the second feature vector is decoded by the decoder to obtain the third network structure, so that the encoder and the decoder can be conveniently trained according to the difference between the second network structure and the third network structure, without labeling data (for example, without labeling the feature vector output by the encoder).
For example, if it is desired that the second network structure be as consistent as possible with the third network structure, the difference between the second network structure and the third network structure can be used as a loss value, and the encoder and the decoder can then be trained easily without manually labeling the feature vectors output by the encoder.
Meanwhile, manual operation is not needed in the training process, so that the training process of the encoder (and the decoder) can be more automated.
Further, training the encoder in this way can improve the accuracy of the feature vectors (of the first network structure and the second network structure) extracted by the encoder, that is, it can make different feature vectors reflect the characteristics of different network structures (for example, the computing power of different network structures) more accurately.
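The idea that the encoder and the decoder can be trained from the reconstruction difference alone, with no manual labels, can be sketched as follows (a linear encoder/decoder on random stand-in vectors is a simplifying assumption made for the example; the embodiment suggests RNN-based encoders and decoders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))        # stand-ins for encoded network structures
We = rng.normal(size=(10, 4)) * 0.1  # encoder parameters
Wd = rng.normal(size=(4, 10)) * 0.1  # decoder parameters

for _ in range(2000):
    H = X @ We                        # encoder output: feature vectors
    X_hat = H @ Wd                    # decoder output: reconstructed structures
    err = X_hat - X                   # the difference alone drives training
    grad_Wd = 2 * H.T @ err / len(X)  # gradients of the mean squared error
    grad_We = 2 * X.T @ (err @ Wd.T) / len(X)
    Wd -= 0.05 * grad_Wd
    We -= 0.05 * grad_We

print("reconstruction MSE:", float((err ** 2).mean()))
```

No label ever appears in the loop: the loss value is simply the mismatch between input structures and reconstructed structures.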
And S530, training the neural network predictor according to the first network structure, the second network structure, the similarity and the label.
Wherein the neural network predictor may be used to predict the performance of the network structure.
Alternatively, the neural network predictor may be used to predict the performance of a network structure of a target neural network, which may be used for image processing, voice processing, natural language processing, and the like.
For example, the target neural network may be a convolutional neural network as shown in fig. 2, which may be used for image classification, image segmentation, image detection, image superdivision, and the like.
Alternatively, the neural network predictor may be a graph convolutional neural network (graph convolutional network, GCN).
In the embodiment of the application, the relationship between the first network structure and the second network structure (for example, the similarity between the first network structure and the second network structure) can be better utilized in the training process through the graph convolution neural network, so that the training effect of the neural network predictor can be improved.
Optionally, the training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label may include:
Determining the performance of the first network structure according to the first feature vector, the second feature vector and the similarity; training the neural network predictor based on the performance of the first network structure and the tag.
For example, a plurality of network structures may be acquired in S510, and the performance of the plurality of network structures may be obtained according to the feature vectors and the similarity of the plurality of network structures; the neural network predictor is trained using as a loss value a difference in performance of a tagged network structure of the plurality of network structures and a corresponding tag of the network structure (e.g., the tag may be used to indicate a true performance of the network structure).
In the process of training the neural network predictor, the performance of the first network structure is predicted using the first feature vector, the second feature vector, and the relationship between the first network structure and the second network structure (for example, the similarity between the two network structures), so the predicted performance information of the first network structure is more accurate. Training the neural network predictor according to this predicted performance and the label can therefore improve the training effect of the neural network predictor, that is, improve the prediction accuracy of the trained neural network predictor.
In the method 500 shown in fig. 5, training the neural network predictor is assisted by using the relationship between the first network structure and the second network structure (e.g., the similarity between the first network structure and the second network structure), so that the training effect of the neural network predictor, that is, the prediction accuracy of the trained neural network predictor, can be improved when a small amount of marking data (e.g., at least one tagged network structure) is used.
Fig. 6 is a schematic flow chart of a method of training a neural network predictor of the present application. The method 600 of training a neural network predictor in fig. 6 may include steps 610, 620, and 630.
In some examples, the method 600 may be performed by the training device 120 in fig. 1, the chip shown in fig. 3, the execution device 210 in fig. 4, and other such devices.
S610, acquiring network characteristics of the network structure.
Alternatively, network characteristics of a plurality of network structures in a set of network structures may be extracted, where the plurality of network structures may include a small number of network structures with performance tags and a large number of network structures without performance tags.
The network characteristics of the network structure may be the characteristic vector of the network structure in the method 500 in fig. 5, and specific reference may be made to the description in the method 500, which is not repeated herein.
For example, the network structure set X includes N network structures ($N_l$ network structures with performance labels and $N_u$ network structures without performance labels), i.e., $X = X_l \cup X_u$, where $X_l = \{x_1, \ldots, x_{N_l}\}$ denotes the $N_l$ network structures with performance labels, $Y_l = \{y_1, \ldots, y_{N_l}\}$ denotes the performance labels corresponding to the $N_l$ labeled network structures, $X_u = \{x_{N_l+1}, \ldots, x_N\}$ denotes the $N_u$ network structures without performance labels, and $N_l$, $N_u$ are positive integers with $N = N_l + N_u$.
Alternatively, the network features of the N network structures in the network structure set X may be extracted using an autoencoder, where the autoencoder may include an encoder E and a decoder D.
Wherein both the encoder E and the decoder D may be implemented by a Neural Network (NN). For example, the encoder E and the decoder D may be a recurrent neural network (recurrent neural network, RNN).
For example, the encoder E may be used to encode the N network structures in the network structure set X, so as to obtain the network features of the N network structures.
For another example, the decoder D may be used to decode the network features of the N network structures to obtain N candidate network structures, where the N candidate network structures correspond to the N network structures; the encoder E and the decoder D may be trained on N candidate network structures.
Or the decoder D can be used for decoding the network characteristics of the N l network structures with the performance labels to obtain N l candidate network structures, wherein the N l candidate network structures correspond to the N l network structures with the performance labels; the encoder E and the decoder D may be trained on N l candidate network structures.
Alternatively, the following loss function may be constructed to train the autoencoder (i.e., the encoder E and the decoder D):
$$\min_{W_e, W_d} L_{rc} = \frac{1}{N_l + N_u} \sum_{i=1}^{N_l + N_u} \left\lVert \hat{x}_i - x_i \right\rVert_2^2$$

where $W_e$ denotes the parameters of the encoder, $W_d$ denotes the parameters of the decoder, $E(x_i)$ denotes the output of the encoder, $\hat{x}_i = D(E(x_i))$ denotes the output of the decoder, and $N_l$, $N_u$ are positive integers.
At this time, the output of the encoder may be taken as the extracted network feature of the network structure; for ease of description, the output of the encoder is subsequently abbreviated as E(x_i).
The encoder may be the encoder in the method 500 in fig. 5, and the decoder may be the decoder in the method 500 in fig. 5, and the detailed description may refer to the embodiment in the method 500, which is not repeated herein.
S620, constructing a network relation diagram according to the network characteristics.
Alternatively, a network relationship graph may be constructed according to the network characteristics acquired in S610.
In S610, N network features corresponding to the N network structures may be obtained, and an N×N network relationship graph may be constructed according to the N network features.
The N×N network relationship graph may include the similarity between each of the N network features and the other N−1 network features, as well as the similarity between each network feature and its own network structure.
For example, the value range of the similarity may be [0, 1], where 0 may represent that the two network structures are completely different (or that their similarity is the lowest), and 1 may represent that the two network structures are identical.
Since each network feature is identical to its own network structure, the similarity of each network feature to its own network structure may be 1.
For example, for network structure $x_i$ and network structure $x_j$ in the network structure set X, the similarity between $x_i$ and $x_j$ may be represented by $s(x_i, x_j)$, and $s(x_i, x_j)$ may be calculated by the following distance formula:

$$s(x_i, x_j) = \exp\left( -\frac{d\big(E(x_i), E(x_j)\big)^2}{2\sigma^2} \right)$$

where $d(\cdot)$ is an arbitrary distance metric function, $\sigma$ is a hyper-parameter, and $\exp(\cdot)$ is the exponential function.
The meaning of the above formula is that, for a given network feature E(x_i) of network structure x_i and network feature E(x_j) of network structure x_j, the farther the distance between E(x_i) and E(x_j), the lower the similarity; conversely, the closer the distance between E(x_i) and E(x_j), the higher the similarity.
The similarity may be the similarity in the method 500 in fig. 5, and the specific description may refer to the embodiment in the method 500, which is not repeated herein.
Assuming a total of N network structures, with a similarity calculated for each pair of network structures according to the above method, an N×N relationship graph is obtained. Each element in the relationship graph represents the similarity between (the network features of) two network structures.
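A minimal sketch of constructing the N×N relationship graph (NumPy; the Euclidean distance as the metric d(·), σ = 1.0, and the random network features are assumptions made only for the example):

```python
import numpy as np

def relation_graph(features, sigma=1.0):
    """N x N similarity matrix with s(x_i, x_j) = exp(-d(E(x_i), E(x_j))^2 / (2 sigma^2)),
    using the Euclidean distance as the metric d; diagonal entries equal 1."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

E_x = np.random.default_rng(1).normal(size=(5, 8))  # 5 network features, assumed
S = relation_graph(E_x)
print(S.shape, S[0, 0])   # (5, 5) 1.0 -- each structure is fully similar to itself
```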
S630, predicting performance through the graph convolutional neural network.
The graph convolutional neural network can be regarded as the neural network predictor. The graph convolutional neural network may be the neural network predictor in the method 500 in fig. 5, and the detailed description may refer to the embodiment in the method 500, which is not repeated herein.
As shown in fig. 7, the performance of the N network structures can be obtained by inputting the network features E(x_i) of the N network structures output by the encoder, together with the N×N network relationship graph (for example, the network relationship graph may be represented as an N×N matrix), into the graph convolutional neural network.
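For illustration, a minimal two-layer graph convolution forward pass might look as follows (NumPy; the symmetric normalization of the relationship graph, the layer sizes, and the random weights are assumptions for the sketch, not the specific predictor of the embodiment):

```python
import numpy as np

def gcn_forward(S, H, W0, w1):
    """Two-layer graph convolution: perf = A_hat @ relu(A_hat @ H @ W0) @ w1,
    where A_hat is the symmetrically normalized relationship graph."""
    deg = S.sum(axis=1)
    A_hat = S / np.sqrt(np.outer(deg, deg))     # D^{-1/2} S D^{-1/2}
    H1 = np.maximum(A_hat @ H @ W0, 0.0)        # hidden features, ReLU
    return A_hat @ H1 @ w1                      # one predicted score per structure

rng = np.random.default_rng(2)
N, dim = 6, 8
H = rng.normal(size=(N, dim))                   # network features E(x_i), assumed
S = np.exp(-((H[:, None] - H[None]) ** 2).sum(-1) / 2)   # relationship graph
perf = gcn_forward(S, H, rng.normal(size=(dim, 4)), rng.normal(size=4))
print(perf.shape)                               # (6,) one performance per structure
```

Because every prediction mixes features over the graph, the unlabeled structures influence the predictions for the labeled ones, which is what the loss below exploits.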
For example, assume that an input network structure $x_i$ has performance label $y_i$ and predicted performance $\hat{y}_i$. The predicted performance $\hat{y}_i$ is expected to be as close as possible to the true value $y_i$, so the graph convolutional neural network (i.e., the neural network predictor) can be trained by constructing the following loss function over the labeled network structures:

$$\min_{W_p} L_{rg} = \frac{1}{N_l} \sum_{i=1}^{N_l} \left( \hat{y}_i - y_i \right)^2$$
Alternatively, the following loss function may be constructed to train the encoder, the decoder, and the graph convolutional neural network simultaneously:

$$\min_{W_e, W_d, W_p} L = L_{rg} + \lambda L_{rc}$$

where $W_e$ denotes the parameters of the encoder, $W_d$ denotes the parameters of the decoder, $W_p$ denotes the parameters of the graph convolutional neural network, $L_{rc}$ is the loss function of the autoencoder, $L_{rg}$ is the loss function of the graph convolutional neural network, and $\lambda$ is a hyper-parameter used to adjust the weights of the two loss functions.
Next, the method of training the neural network predictor in embodiments of the present application may be tested on the data set NAS-Bench-101 by several methods described below.
The data set NAS-Bench-101 contains about 423,000 different network structures, together with the true accuracy that each of these network structures achieves after being trained on the data set CIFAR-10.
The method comprises the following steps:
In the first method, in order to fully embody the effect of the neural network predictor obtained by the different methods, the effect of the neural network predictor may be evaluated using several indexes as follows:
(1)Kendall’s Tau(KTau):
KTau is a ranking-based index with a value range of [−1, 1]. When the predicted ordering of a group of samples is exactly the same as the true ordering, KTau = 1; when the predicted ordering is exactly opposite to the true ordering, KTau = −1; when the predicted ordering is unrelated to the true ordering, KTau is around 0.
(2)mean square error(MSE):
MSE is used to evaluate the prediction accuracy at individual sample points. For a given sample value (or sample point), the more accurate the predicted value (predicted performance), the smaller the MSE; when the predicted value is exactly the same as the true value, MSE = 0.
(3)correlation coefficient(r):
r is the correlation coefficient, with a value range of [−1, 1], and is used to evaluate the correlation between the predicted value (predicted performance) and the true value. The larger the value of r, the more accurate the predicted value (predicted performance).
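The three indices can be computed, for example, as follows (the true and predicted accuracies are illustrative values; SciPy's kendalltau is used for KTau and NumPy for MSE and r):

```python
import numpy as np
from scipy.stats import kendalltau

y_true = np.array([0.91, 0.88, 0.93, 0.90, 0.86])   # true accuracies, illustrative
y_pred = np.array([0.90, 0.87, 0.94, 0.89, 0.88])   # predicted accuracies

ktau, _ = kendalltau(y_pred, y_true)                # rank agreement in [-1, 1]
mse = float(((y_pred - y_true) ** 2).mean())        # 0 for a perfect prediction
r = float(np.corrcoef(y_pred, y_true)[0, 1])        # correlation in [-1, 1]
print(ktau, mse, r)
```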
Table 1 below shows the KTau, MSE, and r values of the neural network predictors obtained by the different methods when different numbers of labeled data from the NAS-Bench-101 data set are used:
TABLE 1
The labeled samples in table 1 are the network structures with performance labels described in the foregoing embodiments.
As can be seen from table 1, the effect of the method of the present application (in terms of the three indices KTau, MSE, and r) is better than that of the other methods (Peephole [5] and E2EPP [32]) in all three cases where the number of labeled samples is 1000, 10000, and 100000.
The second method is as follows:
In the second method, 1000 labeled samples and all unlabeled samples may be used as training data to train the neural network predictor, and the resulting predictor is used to search for a neural network; the search results are shown in table 2 below:
TABLE 2
Method | Accuracy | Ranking position
Peephole [5] | 93.41±0.34 | 1.64
E2EPP [32] | 93.77±0.13 | 0.15
The present application | 94.01±0.12 | 0.01
The accuracy in table 2 refers to the accuracy of the searched network structure, for example, Top-1 Accuracy (%); the ranking position refers to the ranking of the searched network structure in the current search space, for example, Ranking (%).
As can be seen from table 2, the method of the present application is significantly better than the other methods (Peephole [5] and E2EPP [32]), both in the accuracy of the searched network structure and in the ranking of the network structure in the current search space.
And a third method:
in method three, the performance of the method of the present application for an unknown search space can be verified.
For example, 1000 network structures may be randomly selected from the data set NAS-Bench-101, and these 1000 network structures may be trained on the CIFAR-100 data set to obtain their true accuracies, which are then used to train the predictor.
The predictor obtained after training can be used to predict the performance of network structures on the CIFAR-100 data set; the prediction results are shown in table 3 below:
TABLE 3
Method | Top-1 Accuracy (%) | Top-5 Accuracy (%)
Peephole [5] | 74.21±0.32 | 92.04±0.15
E2EPP [32] | 75.86±0.19 | 93.11±0.10
The present application | 78.64±0.16 | 94.23±0.08
As can be seen from table 3, the method of the present application is significantly better than the other methods (Peephole [5] and E2EPP [32]) in terms of both Top-1 Accuracy (%) and Top-5 Accuracy (%).
Fig. 8 is a schematic flow chart of the image processing method of the present application. The method 800 in fig. 8 includes steps 810 and 820.
In some examples, the method 800 may be performed by the execution device 110 in fig. 1, the chip shown in fig. 3, the execution device 210 in fig. 4, and other such devices.
S810, acquiring an image to be processed.
S820, performing image processing on the image to be processed by using a neural network.
The neural network may be determined according to a neural network predictor, which is trained by the method 500 in fig. 5 or the method 600 in fig. 6.
For example, the neural network may be a neural network that meets performance requirements and is searched in a preset search space by a neural network structure search method.
In the process of searching the neural network structure, the neural network predictor obtained after training by the method 500 in fig. 5 or the method 600 in fig. 6 may be used to predict the performance of the network structure.
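For illustration, using a trained predictor inside a search loop might look as follows (the encode function, the linear predictor, and the random candidate codes are hypothetical stand-ins for the trained encoder and neural network predictor, not the claimed search algorithm):

```python
import numpy as np

def search_best(candidates, encode, predictor, top_k=3):
    """Rank candidate network structures by predicted performance and keep the
    best ones; only the top candidates then need real training/evaluation."""
    feats = np.stack([encode(c) for c in candidates])
    scores = predictor(feats)
    order = np.argsort(scores)[::-1]         # highest predicted performance first
    return [candidates[i] for i in order[:top_k]]

# Hypothetical stand-ins: structures as random codes, a fixed linear predictor.
rng = np.random.default_rng(3)
candidates = [rng.normal(size=8) for _ in range(100)]
w = rng.normal(size=8)
best = search_best(candidates, encode=lambda c: c, predictor=lambda F: F @ w)
print(len(best))   # 3 most promising structures, by predicted performance
```

The benefit is that the predictor scores candidates in milliseconds, so the expensive train-to-convergence step is reserved for the few top-ranked structures.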
Fig. 9 is a schematic hardware structure of an apparatus for training a neural network predictor according to an embodiment of the present application. The apparatus 3000 for training a neural network predictor shown in fig. 9 (the apparatus 3000 may be a computer device in particular) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are connected to each other by a bus 3004.
The memory 3001 may be a read-only memory (read only memory, ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 3001 may store a program, and when the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 is configured to perform the steps of the method of training a neural network predictor in the embodiments of the present application.
The processor 3002 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application specific integrated circuit, ASIC), a graphics processor (graphics processing unit, GPU), or one or more integrated circuits for executing the relevant programs, so as to implement the method of training a neural network predictor in the method embodiments of the present application.
The processor 3002 may also be an integrated circuit chip with signal processing capability, for example, the chip shown in fig. 3. In implementation, the steps of the method of training a neural network predictor of the present application may be completed by integrated logic circuits of hardware in the processor 3002 or by instructions in the form of software.
The processor 3002 may also be a general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, performs the functions required to be performed by the units included in the apparatus for training a neural network predictor, or performs the method of training a neural network predictor according to the method embodiments of the present application.
The communication interface 3003 enables communications between the apparatus 3000 and other devices or communication networks using a transceiving apparatus such as, but not limited to, a transceiver. For example, information of the neural network predictor to be constructed and training data required in training the neural network predictor may be acquired through the communication interface 3003.
A bus 3004 may include a path to transfer information between various components of the device 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 10 is a schematic diagram of a hardware configuration of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 4000 shown in fig. 10 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002 and the communication interface 4003 are connected to each other by a bus 4004.
The memory 4001 may be a ROM, a static storage device, or a RAM. The memory 4001 may store a program, and when the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to perform the respective steps of the image processing method of the embodiments of the present application.
The processor 4002 may employ a general-purpose CPU, microprocessor, ASIC, GPU, or one or more integrated circuits for executing associated programs to perform the functions required by the elements in the image processing apparatus of the present application or to perform the image processing methods of the method embodiments of the present application.
The processor 4002 may also be an integrated circuit chip with signal processing capability, for example, the chip shown in fig. 3. In implementation, the steps of the image processing method according to the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 4002 or by instructions in the form of software.
The processor 4002 may also be a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 4001, and the processor 4002 reads information in the memory 4001, and in combination with hardware thereof, performs functions required to be executed by units included in the image processing apparatus of the embodiment of the present application, or executes the image processing method of the embodiment of the method of the present application.
The communication interface 4003 enables communication between the apparatus 4000 and other devices or communication networks using a transceiving apparatus such as, but not limited to, a transceiver. For example, the image to be processed can be acquired through the communication interface 4003.
Bus 4004 may include a path for transferring information between various components of device 4000 (e.g., memory 4001, processor 4002, communication interface 4003).
It should be appreciated that the processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and may also be another general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
It should also be appreciated that the memory in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (read-only memory, ROM), a programmable ROM (programmable ROM, PROM), an erasable programmable ROM (erasable PROM, EPROM), an electrically erasable programmable ROM (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM), which serves as an external cache. By way of example and not limitation, many forms of random access memory (random access memory, RAM) are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or the computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave, etc.) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of" the following items or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one (item) of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c; where a, b, and c may each be singular or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether such functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be in electrical, mechanical, or other form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method of training a neural network predictor, the method performed by a computing device, the method comprising:
acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure;
obtaining a similarity between the first network structure and the second network structure; and
training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label, wherein the neural network predictor is used for predicting the performance of a network structure of a target neural network, and the computing device deploys the target neural network to process an image to be processed.
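For illustration only, the following is a minimal PyTorch sketch of the training step recited in claim 1. The flat 64-dimensional structure encodings, the linear encoder and predictor, and the negative-distance similarity are assumptions made for the example, not details fixed by the claim.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D = 32                                              # assumed feature-vector size
    encoder = nn.Linear(64, D)                          # stand-in for a structure encoder
    predictor = nn.Linear(2 * D + 1, 1)                 # consumes both vectors plus the similarity
    opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()))

    x_first = torch.randn(8, 64)                        # encodings of labelled (first) structures
    x_second = torch.randn(8, 64)                       # encodings of unlabelled (second) structures
    labels = torch.rand(8, 1)                           # measured performance of the labelled structures

    z1, z2 = encoder(x_first), encoder(x_second)        # feature vectors for both structures
    sim = -torch.norm(z1 - z2, dim=-1, keepdim=True)    # similarity from the vector distance
    pred = predictor(torch.cat([z1, z2, sim], dim=-1))  # predicted performance of the first structure
    loss = F.mse_loss(pred, labels)                     # supervise with the available labels only
    opt.zero_grad(); loss.backward(); opt.step()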
2. The method of claim 1, wherein the obtaining the similarity between the first network structure and the second network structure comprises:
acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure;
acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure;
and obtaining the similarity according to the first feature vector and the second feature vector.
3. The method of claim 2, wherein the obtaining a first feature vector from the first network structure comprises:
encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain a feature vector representing the network structure;
the obtaining a second feature vector according to the second network structure includes:
and encoding the second network structure by using the encoder to obtain the second feature vector.
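A minimal sketch of claims 2 and 3, assuming the network structures have already been serialized into flat vectors (an assumption; the claims do not fix the encoding): a single shared encoder maps each structure to its feature vector.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 32))
    first_structure = torch.randn(1, 64)    # assumed flat encoding of the first network structure
    second_structure = torch.randn(1, 64)   # assumed flat encoding of the second network structure
    z1 = encoder(first_structure)           # first feature vector
    z2 = encoder(second_structure)          # second feature vector, from the same shared encoder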
4. The method of claim 3, wherein the encoder is trained by:
decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by a feature vector; and
training the encoder based on a difference between the second network structure and the third network structure.
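Claim 4 describes an autoencoder-style reconstruction objective. Below is a hedged sketch: a decoder maps the second feature vector back to a "third" structure, and the difference between the second and third structures supplies the training signal. The linear layers and mean-squared-error difference are assumptions for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Linear(64, 32)              # maps a structure encoding to a feature vector
    decoder = nn.Linear(32, 64)              # maps a feature vector back to a structure encoding
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

    second_structure = torch.randn(8, 64)    # assumed flat encoding of the second network structure
    z2 = encoder(second_structure)           # second feature vector
    third_structure = decoder(z2)            # decoded "third" network structure
    loss = F.mse_loss(third_structure, second_structure)  # difference between second and third structures
    opt.zero_grad(); loss.backward(); opt.step()          # this difference trains the encoder (and decoder)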
5. The method of any one of claims 2 to 4, wherein the training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label comprises:
determining the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and
training the neural network predictor based on the performance of the first network structure and the label.
6. The method according to any one of claims 2 to 4, wherein the similarity is a distance between the first feature vector and the second feature vector.
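A one-line illustration of claim 6; Euclidean distance is used here as an example, but the claim does not fix the metric.

    import torch

    z1, z2 = torch.randn(32), torch.randn(32)   # the two feature vectors
    similarity = torch.norm(z1 - z2)            # L2 distance between the feature vectors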
7. The method of any one of claims 1 to 4, wherein the neural network predictor is a graph convolutional neural network.
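The claim states only that the predictor is a graph convolutional neural network. As a sketch, the following implements one common normalized-aggregation graph-convolution layer over a structure graph; the mean aggregation, layer sizes, and toy graph are assumptions, not part of the claim.

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            self.lin = nn.Linear(d_in, d_out)

        def forward(self, x, adj):
            # x: node features (N, d_in); adj: adjacency matrix with self-loops (N, N)
            deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
            return torch.relu(self.lin(adj @ x / deg))   # mean-aggregate neighbours, then transform

    layer = GCNLayer(16, 16)
    x, adj = torch.randn(5, 16), torch.eye(5)            # a toy 5-node structure graph
    out = layer(x, adj)                                  # per-node embeddings of the network structure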
8. An image processing method, comprising:
acquiring an image to be processed; and
performing image processing on the image to be processed by using a neural network;
wherein the neural network is determined according to a neural network predictor, the neural network predictor being trained by the method of any one of claims 1 to 7.
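A hedged sketch of the deployment flow in claim 8: predicted performance scores (here fixed constants standing in for predictor outputs) select a candidate network, which then processes the image. The toy convolutional candidates are assumptions for the example.

    import torch
    import torch.nn as nn

    candidates = [nn.Conv2d(3, 3, k, padding=k // 2) for k in (1, 3, 5)]  # toy candidate networks
    scores = torch.tensor([0.70, 0.85, 0.80])     # performance predicted by the trained predictor
    net = candidates[int(scores.argmax())]        # the neural network determined from the predictor
    image = torch.randn(1, 3, 32, 32)             # the image to be processed
    with torch.no_grad():
        processed = net(image)                    # image processing with the selected network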
9. An apparatus for training a neural network predictor, the apparatus being applied to a computing device, the apparatus comprising:
The first acquisition module is used for acquiring a first network structure of a first neural network and a second network structure of a second neural network, wherein the first network structure is a network structure with a label, and the label is used for indicating the performance of the first network structure;
The second acquisition module is used for acquiring the similarity between the first network structure and the second network structure;
The training module is used for training the neural network predictor according to the first network structure, the second network structure, the similarity, and the label, wherein the neural network predictor is used for predicting the performance of a network structure of a target neural network, and the computing device deploys the target neural network to process the image to be processed.
10. The apparatus of claim 9, wherein the second acquisition module is specifically configured to:
acquiring a first feature vector according to the first network structure, wherein the first feature vector is used for representing the first network structure;
acquiring a second feature vector according to the second network structure, wherein the second feature vector is used for representing the second network structure;
and obtaining the similarity according to the first feature vector and the second feature vector.
11. The apparatus of claim 10, wherein the second acquisition module is specifically configured to:
encoding the first network structure by using an encoder to obtain the first feature vector, wherein the encoder is used for encoding to obtain a feature vector representing the network structure;
and encoding the second network structure by using the encoder to obtain the second feature vector.
12. The apparatus of claim 11, wherein the encoder is trained by:
decoding the second feature vector by using a decoder to obtain a third network structure, wherein the decoder is used for decoding to obtain the network structure represented by a feature vector; and
training the encoder based on a difference between the second network structure and the third network structure.
13. The apparatus according to any one of claims 10 to 12, wherein the training module is specifically configured to:
determine the performance of the first network structure according to the first feature vector, the second feature vector, and the similarity; and
train the neural network predictor based on the performance of the first network structure and the label.
14. The apparatus according to any one of claims 10 to 12, wherein the similarity is a distance between the first feature vector and the second feature vector.
15. The apparatus of any one of claims 9 to 12, wherein the neural network predictor is a graph convolutional neural network.
16. An image processing apparatus, comprising:
The acquisition module is used for acquiring the image to be processed;
The image processing module is used for performing image processing on the image to be processed by using a neural network;
wherein the neural network is determined according to a neural network predictor, the neural network predictor being trained by the method of any one of claims 1 to 7.
17. An apparatus for training a neural network predictor, comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any one of claims 1 to 7.
18. An image processing apparatus comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of claim 8.
19. A computer readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 7 or 8.
20. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the method of any one of claims 1 to 7 or 8.
CN202010387976.4A 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device Active CN111695673B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010387976.4A CN111695673B (en) 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device
PCT/CN2021/088254 WO2021227787A1 (en) 2020-05-09 2021-04-20 Neural network predictor training method and apparatus, and image processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010387976.4A CN111695673B (en) 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device

Publications (2)

Publication Number Publication Date
CN111695673A CN111695673A (en) 2020-09-22
CN111695673B (en) 2024-05-24

Family

ID=72477513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387976.4A Active CN111695673B (en) 2020-05-09 2020-05-09 Method for training neural network predictor, image processing method and device

Country Status (2)

Country Link
CN (1) CN111695673B (en)
WO (1) WO2021227787A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695673B (en) * 2020-05-09 2024-05-24 华为技术有限公司 Method for training neural network predictor, image processing method and device
CN112381147B (en) * 2020-11-16 2024-04-26 虎博网络技术(上海)有限公司 Dynamic picture similarity model establishment and similarity calculation method and device
CN112957013B (en) * 2021-02-05 2022-11-11 江西国科美信医疗科技有限公司 Dynamic vital sign signal acquisition system, monitoring device and equipment
CN113807183B (en) * 2021-08-17 2024-06-14 华为技术有限公司 Model training method and related equipment
CN114462524A (en) * 2022-01-19 2022-05-10 北京工业大学 Clustering method for data center batch processing operation
WO2024138661A1 (en) * 2022-12-30 2024-07-04 Oppo广东移动通信有限公司 Performance determination method and apparatus, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101725684B1 (en) * 2015-12-01 2017-04-27 Yonsei University Industry-Academic Cooperation Foundation Apparatus and method for evaluating energy performance of buildings for energy incentive plan
KR101951595B1 (en) * 2018-05-18 2019-02-22 Hanyang University Industry-University Cooperation Foundation Vehicle trajectory prediction system and method based on modular recurrent neural network architecture
CN110309856A (en) * 2019-05-30 2019-10-08 华为技术有限公司 Image classification method, the training method of neural network and device
CN110532871A (en) * 2019-07-24 2019-12-03 华为技术有限公司 The method and apparatus of image procossing
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9792492B2 (en) * 2015-07-07 2017-10-17 Xerox Corporation Extracting gradient features from neural networks
CN111819580A (en) * 2018-05-29 2020-10-23 谷歌有限责任公司 Neural architecture search for dense image prediction tasks
CN110232434A (en) * 2019-04-28 2019-09-13 吉林大学 A kind of neural network framework appraisal procedure based on attributed graph optimization
CN110210558B (en) * 2019-05-31 2021-10-26 北京市商汤科技开发有限公司 Method and device for evaluating performance of neural network
CN110555514B (en) * 2019-08-20 2022-07-12 北京迈格威科技有限公司 Neural network model searching method, image identification method and device
CN111695673B (en) * 2020-05-09 2024-05-24 华为技术有限公司 Method for training neural network predictor, image processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Understanding Neural Architecture Search Techniques; George Adam et al.; arXiv; full text *
A Survey of Efficient Deep Neural Networks (《高效深度神经网络综述》); Min Rui et al.; Telecommunications Science (《电信科学》); full text *

Also Published As

Publication number Publication date
CN111695673A (en) 2020-09-22
WO2021227787A1 (en) 2021-11-18

Similar Documents

Publication Publication Date Title
CN110188795B (en) Image classification method, data processing method and device
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN111695673B (en) Method for training neural network predictor, image processing method and device
CN110378381B (en) Object detection method, device and computer storage medium
CN112446398B (en) Image classification method and device
CN112446834B (en) Image enhancement method and device
CN110309856A (en) Image classification method, the training method of neural network and device
CN111914997B (en) Method for training neural network, image processing method and device
CN111797881B (en) Image classification method and device
US12062158B2 (en) Image denoising method and apparatus
CN110222717B (en) Image processing method and device
CN111797882B (en) Image classification method and device
CN112215332B (en) Searching method, image processing method and device for neural network structure
CN113011562B (en) Model training method and device
US20220148291A1 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN112639828A (en) Data processing method, method and equipment for training neural network model
WO2022001805A1 (en) Neural network distillation method and device
CN114255361A (en) Neural network model training method, image processing method and device
CN110222718B (en) Image processing method and device
CN113807183B (en) Model training method and related equipment
US20230401838A1 (en) Image processing method and related apparatus
CN113128285A (en) Method and device for processing video
CN116888605A (en) Operation method, training method and device of neural network model
CN114693986A (en) Training method of active learning model, image processing method and device
WO2023272431A1 (en) Image processing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant