CN113570029A - Method for obtaining neural network model, image processing method and device


Info

Publication number
CN113570029A
CN113570029A
Authority
CN
China
Prior art keywords: network model, sub-network models, hyper-network
Prior art date
Legal status: Pending
Application number
CN202010357935.0A
Other languages
Chinese (zh)
Inventor
田沈晶
黄泽毅
徐凯翔
唐少华
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010357935.0A (CN113570029A)
Priority to PCT/CN2021/083371 (WO2021218517A1)
Publication of CN113570029A

Classifications

    • G PHYSICS — G06 COMPUTING; CALCULATING OR COUNTING — G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application discloses a method for obtaining a neural network model, an image processing method, and an image processing device in the field of artificial intelligence. The method for obtaining the neural network model includes: obtaining a pre-trained super network model, where the pre-trained super network model is trained based on a source data set; obtaining a target data set, where the task corresponding to the target data set is the same as the task corresponding to the source data set; performing transfer learning on the pre-trained super network model based on the target data set to obtain a transfer-learned super network model; and searching for a sub-network model in the transfer-learned super network model to obtain a target neural network model. The method can reduce training cost and improve the performance of the obtained neural network model.

Description

Method for obtaining neural network model, image processing method and device
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method for acquiring a neural network model, an image processing method, and an apparatus.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
With the rapid development of artificial intelligence technology, the performance of neural network models (for example, convolutional neural network models) has been continuously improved, and neural network models have achieved great success in processing and analyzing various media signals such as images, videos, and voices. Taking image recognition as an example, deep neural network models have surpassed traditional computer vision methods by an overwhelming margin. However, training a good deep neural network model requires a great deal of expert experience. In recent years, automatically searching for neural network models by means of automatic machine learning (AutoML) technology has become a hot spot in the field of computer vision. AutoML can obtain better neural network models than manual design. However, the training resources required by AutoML, such as training machines and training data, are often much larger than those required by an ordinary neural network model, and the training cost of AutoML is much higher. In small-data scenarios, it is often difficult for AutoML to directly train an excellent neural network model because there is not enough training data.
Therefore, how to obtain the required neural network model through the AutoML becomes a problem to be solved urgently.
Disclosure of Invention
The application provides a method for acquiring a neural network model, an image processing method and an image processing device, which can reduce training cost and improve the performance of the neural network model in the process of acquiring the required neural network model.
In a first aspect, a method for obtaining a neural network model is provided, the method including: obtaining a pre-trained super network model, where the pre-trained super network model is trained based on a source data set; obtaining a target data set, where the task corresponding to the target data set is the same as the task corresponding to the source data set; performing transfer learning on the pre-trained super network model based on the target data set to obtain a transfer-learned super network model; and searching for a sub-network model in the transfer-learned super network model to obtain a target neural network model.
The source data set may be a data set with a large data volume, which ensures sufficient training of the super network model and yields a super network model with higher accuracy.
It should be noted that the source data set may be a data set related to the task that the target neural network model needs to perform. That is, the sub-network models in the pre-trained super network model perform the same task as the target neural network model; for example, both are used for image classification, both are used for image segmentation, or both are used for target detection.
For example, when the target neural network model is used for image classification, the source data set may be the public data set ImageNet.
The target data set may be a data set input by a user or a data set acquired from another device.
Performing transfer learning on the pre-trained super network model based on the target data set may be fine-tuning the pre-trained super network model based on the target data set.

The transfer learning of the pre-trained super network model refers to transferring the weights of the pre-trained super network model.
The target neural network model may refer to a neural network model whose performance indicators meet the target performance indicators. That is, a target sub-network model can be searched for in the transfer-learned super network model, and the target neural network model can be determined based on the target sub-network model. The target sub-network model may be a sub-network model whose performance indicators meet the target performance indicators, and may be one sub-network model or multiple sub-network models.
The performance indicators of a sub-network model may include the inference accuracy, hardware overhead, or inference duration of the sub-network model. The target performance indicators may include a target accuracy, a target overhead, a target inference duration, and the like.
For example, the target neural network model may be obtained by searching for a sub-network model in the transfer-learned super network model through a reinforcement learning algorithm.
In the embodiment of the application, the pre-trained super network model is transferred to the target data set, so that a super network model with good performance can be obtained even when the target data set is small, and the target neural network model is then obtained by searching. This enables applications in small-data scenarios and greatly improves the accuracy of AutoML in such scenarios.
Meanwhile, for different user requirements, for example, the user's overhead/accuracy requirements, a neural network model that both fits the target data set and meets those requirements can be obtained by searching the sub-network models in the super network model.
Meanwhile, the weights of the super network model are shared among different data sets. Because the source data set and the target data set relate to the same task, efficient transfer learning of AutoML can be achieved: only the weights of the super network model are fine-tuned during transfer, and its structure does not need to be adjusted. This can greatly improve the transfer efficiency of AutoML and reduce the training time of AutoML by at least one order of magnitude, even down to the training time of an ordinary neural network model.
In addition, the transfer time of the super network model provided in the embodiment of the application is close to the transfer time of an ordinary neural network model. That is to say, compared with obtaining a target neural network model through transfer learning of an ordinary neural network model, under the same training duration, the method of the embodiment of the application can better meet fine-grained user overhead/accuracy requirements, and, under the same overhead, obtain a target neural network model with higher accuracy.
In addition, for the same task, such as an image classification task, when a user needs multiple neural network models, there is no need to design and train a neural network model separately for each deployment scheme or user requirement. The super network model only needs to be trained once; its weights are then shared among, or transferred to, different data sets to obtain neural network models meeting the user's different overhead/accuracy requirements, which greatly reduces the training cost.
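As an illustration only, the overall workflow of the first aspect can be sketched as follows in Python. All function names and return values here are hypothetical stand-ins for the pre-training, transfer-learning, and search stages described above, not part of the application itself.

```python
# A minimal sketch of the first-aspect workflow. All names are illustrative;
# the real stages operate on actual super network weights and data sets.

def pretrain_supernet(source_dataset):
    """Train the super network model once, on a large source data set."""
    return {"weights": "pretrained on " + source_dataset, "structure": "elastic"}

def transfer_supernet(supernet, target_dataset):
    """Fine-tune only the weights (not the structure) on the target data set."""
    transferred = dict(supernet)
    transferred["weights"] = "fine-tuned on " + target_dataset
    return transferred

def search_subnet(supernet, requirement):
    """Search the transferred super network for a sub-network meeting the target."""
    return {"parent_weights": supernet["weights"], "meets": requirement}

supernet = pretrain_supernet("ImageNet")              # trained once
supernet = transfer_supernet(supernet, "user data")   # repeated per target data set
model = search_subnet(supernet, {"overhead": "low"})  # repeated per user requirement
print(model)
```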
With reference to the first aspect, in certain implementations of the first aspect, the pre-trained super network model is obtained through progressive shrinkage training.
Specifically, training the super network model by progressive shrinkage may include: first training the largest sub-network model, and then progressively training the sub-network models with variable convolution kernel sizes, variable numbers of layers, and variable numbers of channels.
The largest sub-network model refers to the sub-network model with the largest convolution kernel (kernel), the largest number of layers (depth), and the largest number of channels (width) in the super network model.
Specifically, the sub-network models with variable convolution kernel sizes, variable numbers of layers, and variable numbers of channels can be trained through knowledge distillation from the largest sub-network model.
Because weights are shared between the sub-network models, different sub-network models may interfere with each other when the super network model is trained. According to the scheme in the embodiment of the application, progressive shrinkage training reduces the mutual influence of sub-network models of different sizes during training, and the resulting super network model can support many different architecture settings, for example, sub-network models with different numbers of layers, different numbers of channels, and different convolution kernel sizes. After training of the super network model is completed, a suitable sub-network model can be selected from it; no additional training or retraining of the searched sub-network model is needed, and its accuracy still meets the pre-training requirement. During training of the super network model, each sub-network model does not need to be trained separately, yet a sub-network model in the super network model can reach an accuracy similar to that of the same sub-network model trained independently.
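A rough sketch of such a progressive-shrinkage schedule is shown below. The phase ordering (largest sub-network first, then elastic kernel, depth, and width) follows the description above; the concrete dimension values and the `train_phase` helper are assumptions for illustration.

```python
# Illustrative progressive-shrinkage schedule: each phase enlarges the set of
# architecture choices that sampled sub-networks may use during training.
PHASES = [
    {"kernel": [7],       "depth": [4],       "width": [6]},        # largest sub-network only
    {"kernel": [7, 5, 3], "depth": [4],       "width": [6]},        # elastic convolution kernel
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6]},        # elastic number of layers
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6, 4, 3]},  # elastic number of channels
]

def train_phase(choices, teacher):
    # A real implementation would sample sub-networks restricted to `choices`
    # and train them, distilling knowledge from the largest sub-network.
    print("training with choices", choices, "| distilling:", teacher is not None)

teacher = None
for phase in PHASES:
    train_phase(phase, teacher)
    teacher = "largest sub-network"  # later phases distill from the largest sub-network
```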
With reference to the first aspect, in some implementations of the first aspect, performing transfer learning on the pre-trained super network model based on the target data set to obtain a transfer-learned super network model includes: selecting a sub-network model from the pre-trained super network model, computing the weight gradient of the sub-network model based on the target data set, updating the weights of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model, and obtaining an updated super network model based on the updated sub-network model; and repeating the above steps until the updated super network model meets a termination condition, to obtain the transfer-learned super network model, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to a first number of iterations; and the inference accuracy of the updated super network model is greater than or equal to a first inference accuracy.
That is, in each iteration of training, only a single-path sub-network model is activated; in other words, only one sub-network model is selected from the super network model, its weights are updated, and the iteration continues until training is completed. In each iteration, only the weights of the selected sub-network model are activated and updated.
According to the scheme of the embodiment of the application, the super network model is transferred through a single-path algorithm, so that the sub-network models can be uniformly sampled and trained, improving the training effect. In addition, the memory footprint can be reduced, achieving efficient training.
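A minimal single-path fine-tuning loop might look as follows, assuming the super network is represented as a mapping from layers to candidate choices with weights; `grad_fn` and `eval_fn` are hypothetical stand-ins for back propagation on the target data set and for accuracy evaluation.

```python
import random

def sample_subnet(supernet):
    """Activate one single-path sub-network: pick one choice per layer."""
    return {layer: random.choice(list(choices)) for layer, choices in supernet.items()}

def finetune_single_path(supernet, data, max_iters, target_acc, grad_fn, eval_fn, lr=0.01):
    for _ in range(max_iters):                              # first termination condition
        path = sample_subnet(supernet)                      # only one sub-network is active
        grads = grad_fn(path, data)                         # gradients for this path only
        for layer, choice in path.items():                  # update only the selected weights
            supernet[layer][choice] -= lr * grads[(layer, choice)]
        if eval_fn(supernet, data) >= target_acc:           # second termination condition
            return supernet
    return supernet

# Toy usage with stub gradient/evaluation functions.
toy = {"layer1": {"k3": 0.5, "k5": 0.7}, "layer2": {"d2": 0.1, "d4": 0.3}}
finetune_single_path(
    toy, data=None, max_iters=10, target_acc=1.0,
    grad_fn=lambda path, d: {(l, c): 0.01 for l, c in path.items()},
    eval_fn=lambda net, d: 0.0,  # never reaches the target in this toy run
)
```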
With reference to the first aspect, in certain implementations of the first aspect, performing transfer learning on the pre-trained super network model based on the target data set to obtain a transfer-learned super network model includes: selecting N_b sub-network models from the pre-trained super network model, computing the weight gradients of the N_b sub-network models based on the target data set, and updating the weights of the N_b sub-network models based on the weight gradients of the N_b sub-network models to obtain an updated super network model, where N_b is a positive integer;

repeating the above steps until the updated super network model meets a termination condition, to obtain the transfer-learned super network model, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to a first number of iterations; and the inference accuracy of the updated super network model is greater than or equal to a first inference accuracy.

Only a single-path sub-network model is activated each time a sub-network model is selected; in other words, only one sub-network model is selected from the super network model at a time. Selecting N_b sub-network models can also be understood as selecting a sub-network model N_b times.

Updating the weights of the N_b sub-network models updates the weights of the super network model.

Illustratively, updating the weights of the super network model may include: subtracting the accumulated weight gradients of the N_b sub-network models from the current weights of the super network model.

Alternatively, updating the weights of the super network model may include: subtracting the product of the learning rate and the accumulated weight gradients of the N_b sub-network models from the current weights of the super network model.
Illustratively, the inference accuracy of the updated super network model may be the inference accuracy of at least one sub-network model in the super network model.
Because weights may be shared between the sub-network models, if the weights of the current sub-network model were updated with the gradient computed by each back propagation, the other sub-network models sharing those weights could be disturbed. According to the scheme of the embodiment of the application, multiple forward and backward propagations are performed in each iteration, the weight gradients of multiple sub-network models are accumulated within one iteration, and the weights of the super network model are updated only once. This reduces the mutual interference among different sub-network models, improves the accuracy of the super network model, and speeds up its training.
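Building on the single-path sketch above (and reusing its `sample_subnet`), the N_b-sample variant can be sketched as follows: N_b forward/backward passes accumulate gradients, and the super network weights are updated once per iteration. Again, `grad_fn` is a hypothetical stand-in.

```python
def finetune_accumulated(supernet, data, n_b, iters, grad_fn, lr=0.01):
    for _ in range(iters):
        accum = {}
        for _ in range(n_b):                          # N_b forward/backward passes
            path = sample_subnet(supernet)
            for key, g in grad_fn(path, data).items():
                accum[key] = accum.get(key, 0.0) + g  # accumulate weight gradients
        for (layer, choice), g in accum.items():      # one update per iteration:
            supernet[layer][choice] -= lr * g         # W <- W - lr * (summed gradients)
    return supernet
```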
With reference to the first aspect, in some implementations of the first aspect, searching for a sub-network model in the transfer-learned super network model to obtain the target neural network model includes the following steps:

Step one: determining n first sub-network models according to the transfer-learned super network model, where n is an integer greater than 1;

Step two: adjusting the structures of the n first sub-network models to obtain n second sub-network models;

Step three: selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and using the n third sub-network models as the n first sub-network models in step two;

repeating step two to step three until the n third sub-network models meet a search termination condition, where the search termination condition includes at least one of the following: the number of repetitions is greater than or equal to a second number of iterations, or the inference accuracy of at least p of the n third sub-network models is greater than or equal to a target accuracy; and

determining the target neural network model from the n third sub-network models.
For example, n sub-network models are extracted from the transfer-learned super network model; these n sub-network models are the n first sub-network models and may be regarded as a population.

For example, the structures of the n first sub-network models may be adjusted through crossover, mutation, and the like, as in the evolutionary sketch below.
With reference to the first aspect, in certain implementations of the first aspect, determining the n first sub-network models according to the transfer-learned super network model includes: selecting n fourth sub-network models from the transfer-learned super network model; obtaining the hardware overhead of the n fourth sub-network models on a target device; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
Optionally, the n fourth sub-network models may be selected randomly from the transfer-learned super network model.

Optionally, adjusting the structures of the n fourth sub-network models according to their hardware overhead on the target device may include: adjusting the structure of a sub-network model according to an adjustment probability, so that the adjusted sub-network model can meet the target overhead, where the adjustment probability of a sub-network model's structure is determined according to the hardware overhead of the sub-network model.

For example, for a sub-network model with high hardware overhead, the probability of adjusting it to a smaller sub-network model is greater than the probability of adjusting it to a larger one; for a sub-network model with low hardware overhead, the probability of adjusting it to a larger sub-network model is greater than the probability of adjusting it to a smaller one. The size of the hardware overhead may be determined relative to the target overhead: a sub-network model whose overhead exceeds the target overhead may be regarded as having high hardware overhead, and one whose overhead is below the target overhead as having low hardware overhead. Alternatively, the size of the hardware overhead may be determined relative to other references, which is not limited in this embodiment of the application.

For example, the target device may include a GPU, an NPU, or the like.
According to the scheme of the embodiment of the application, a heuristic search is used: the hardware overhead of a sub-network model on the target device is perceived, the structure of the sub-network model is adjusted based on that overhead, and the search then proceeds, so that the final sub-network model can meet the target overhead.
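The hardware-overhead-aware adjustment might be sketched as follows: a sub-network whose measured overhead exceeds the target overhead is shrunk with high probability, and one below it is grown with high probability. The probability values and the `measure_cost` helper are assumptions for illustration.

```python
import random

def adjust_for_cost(subnet_size, target_cost, measure_cost):
    cost = measure_cost(subnet_size)               # hardware overhead on the target device
    p_shrink = 0.8 if cost > target_cost else 0.2  # illustrative probabilities only
    if random.random() < p_shrink:
        return max(1, subnet_size - 1)             # bias toward a smaller sub-network
    return subnet_size + 1                         # bias toward a larger sub-network
```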
In a second aspect, an image processing method is provided, including: obtaining an image to be processed; and processing the image to be processed by using a target neural network model to obtain a processing result of the image to be processed, where the target neural network model is obtained by searching for a sub-network model in a super network model, the super network model is obtained by performing transfer learning on a pre-trained super network model based on a target data set, the pre-trained super network model is trained based on a source data set, and the task corresponding to the target data set is the same as the task corresponding to the source data set.
In the present application, because the target neural network model is obtained by using the method of the first aspect and therefore matches, or is close to, the application requirements of the neural network model, image classification using such a neural network model can achieve a better effect (for example, more accurate classification results). Even when the target data set is small, a super network model with good performance can be obtained, enabling small-data scenarios, greatly improving the accuracy of AutoML in such scenarios, and yielding target neural network models that meet different user requirements.
With reference to the second aspect, in some implementations of the second aspect, the pre-trained super network model is trained by progressive shrinkage.
With reference to the second aspect, in some implementations of the second aspect, performing transfer learning on the pre-trained super network model based on the target data set includes: selecting a sub-network model from the pre-trained super network model, computing the weight gradient of the sub-network model based on the target data set, updating the weights of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model, and obtaining an updated super network model based on the updated sub-network model; and repeating the above steps until the updated super network model meets a termination condition, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to a first number of iterations; and the inference accuracy of the updated super network model is greater than or equal to a first inference accuracy.
With reference to the second aspect, in some implementations of the second aspect, performing transfer learning on the pre-trained super network model based on the target data set includes: selecting N_b sub-network models from the pre-trained super network model, computing the weight gradients of the N_b sub-network models based on the target data set, updating the weights of the N_b sub-network models based on the weight gradients of the N_b sub-network models to obtain N_b updated sub-network models, and obtaining an updated super network model based on the N_b updated sub-network models, where N_b is a positive integer; and repeating the above steps until the updated super network model meets a termination condition, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to a first number of iterations; and the inference accuracy of the updated super network model is greater than or equal to a first inference accuracy.
With reference to the second aspect, in some implementations of the second aspect, obtaining the target neural network model by searching for a sub-network model in the super network model includes: determining n first sub-network models according to the super network model, where n is an integer greater than 1; adjusting the structures of the n first sub-network models to obtain n second sub-network models; selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and using the n third sub-network models as the n first sub-network models; repeating the above steps until the n third sub-network models meet a search termination condition; and determining the target neural network model according to the n third sub-network models, where the search termination condition includes at least one of the following: the number of repetitions is greater than or equal to a second number of iterations, or the inference accuracy of at least p of the n third sub-network models is greater than or equal to a target accuracy.
With reference to the second aspect, in some implementations of the second aspect, determining the n first sub-network models according to the super network model includes: selecting n fourth sub-network models from the super network model; obtaining the hardware overhead of the n fourth sub-network models on a target device; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
In a third aspect, an apparatus for obtaining a neural network model is provided, and the apparatus includes a module or a unit configured to perform the method in any one of the implementations of the first aspect and the first aspect.
In a fourth aspect, an image processing apparatus is provided, which includes means or unit for performing the method of any one of the implementations of the second aspect and the second aspect.
It is to be understood that the extensions, definitions, explanations, and descriptions of relevant content in the first aspect above also apply to the same content in the second, third, and fourth aspects.
In a fifth aspect, an apparatus for obtaining a neural network model is provided, the apparatus comprising: a memory for storing a program; a processor configured to execute the memory-stored program, the processor being configured to perform the first aspect and the method of any one of the implementations of the first aspect when the memory-stored program is executed.
The processor in the fifth aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural network model arithmetic processor, where the neural network model arithmetic processor may include a graphics processing unit (GPU), a neural network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the second aspect and the method of any one of the implementations of the second aspect when the memory-stored program is executed.
The processor in the sixth aspect may be a central processing unit, or may be a combination of a CPU and a neural network model arithmetic processor, where the neural network model arithmetic processor may include a graphics processing unit, a neural network processing unit, a tensor processing unit, and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first aspect or the second aspect.
In an eighth aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method of any one of the implementations of the first or second aspect.
In a ninth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory, to perform the method in any one implementation of the first aspect or the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation manner of the first aspect or the second aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence agent framework provided by an embodiment of the present application;
fig. 2 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of another convolutional neural network model provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating a system architecture according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of AutoML provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a system for obtaining a neural network model according to an embodiment of the present disclosure;
FIG. 9 is a schematic flow chart diagram of a method for obtaining a neural network model according to an embodiment of the present application;
FIG. 10 is a schematic block diagram of a hyper-network model provided by an embodiment of the present application;
FIG. 11 is a schematic flow chart of a progressive contraction method provided by an embodiment of the present application;
FIG. 12 is a schematic flow chart diagram of a method for obtaining a neural network model provided by an embodiment of the present application;
fig. 13 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
FIG. 14 is a schematic block diagram of an apparatus for obtaining a neural network model provided by an embodiment of the present application;
fig. 15 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application;
FIG. 16 is a schematic block diagram of an apparatus for obtaining a neural network model provided by an embodiment of the present application;
fig. 17 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements of the artificial intelligence field.

The artificial intelligence main framework is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).
The "smart information chain" reflects a list of processes processed from the acquisition of data. For example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making and intelligent execution and output can be realized. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure:
the infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform.
The infrastructure may communicate with the outside through sensors, and the computing power of the infrastructure may be provided by a smart chip.
The intelligent chip may be a hardware acceleration chip such as a Central Processing Unit (CPU), a neural-Network Processing Unit (NPU), a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).
The infrastructure platform may include distributed computing framework and network, and may include cloud storage and computing, interworking network, and the like.
For example, for an infrastructure, data may be obtained through sensors and external communications and then provided to an intelligent chip in a distributed computing system provided by the base platform for computation.
(2) Data:
data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphics, images, voice and text, and also relates to internet of things data of traditional equipment, including service data of an existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing:
the data processing generally includes processing modes such as data training, machine learning, deep learning, searching, reasoning, decision making and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system: based on an inference control strategy, the machine uses formalized information to think about and solve problems, with search and matching as typical functions.
Decision-making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, sorting, and prediction.
(4) General-purpose capability:
after the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent products and industrial applications:
the intelligent product and industry application refers to the product and application of an artificial intelligence system in various fields, and is the encapsulation of an artificial intelligence integral solution, the intelligent information decision is commercialized, and the landing application is realized, and the application field mainly comprises: intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe city, intelligent terminal and the like.
The embodiment of the application can be applied to many fields in artificial intelligence, such as intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe cities and other fields.
In particular, the method for acquiring a neural network model in the embodiment of the present application may be specifically applied to the fields that require the use of a (deep) neural network model, such as automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, and natural language processing.
The following briefly introduces two application scenarios: album picture classification and safe city.
Classifying photo album pictures:
when a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud disk, the images in the photo album are identified, so that the user or a system can conveniently classify and manage the photo album, and the user experience is improved.
By using the method for acquiring the neural network model, the neural network model suitable for photo album classification can be acquired or optimized. Then, the neural network model can be used for classifying the pictures, so that the pictures of different categories are labeled, and the pictures can be conveniently checked and searched by a user. In addition, the classification labels of the pictures can also be provided for the album management system to perform classification management, so that the management time of a user is saved, the album management efficiency is improved, and the user experience is improved.
Attribute recognition in a safe-city scenario:
under the scene of a safe city, various attribute identifications, such as pedestrian attribute identification and riding attribute identification, are required to be carried out, and the deep neural network model plays an important role in various attribute identifications by virtue of strong capability of the deep neural network model. By adopting the method for acquiring the neural network model, the neural network model suitable for attribute recognition in the scene of the safe city can be acquired or optimized. The neural network model can then be used to process the input road picture to identify different attribute information in the road picture.
Since the embodiments of the present application relate to the application of a large number of neural network models, for the sake of understanding, the following description will be made about terms and concepts related to the neural network models to which the embodiments of the present application may relate.
(1) Neural network model
The neural network model may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a non-linear characteristic into the neural network model to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network model is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
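As a worked instance of this formula (with a sigmoid as the activation function $f$ and arbitrary illustrative values for $x_s$, $W_s$, and $b$):

```python
import math

def neuron(x, w, b):
    z = sum(ws * xs for ws, xs in zip(w, x)) + b  # W^T x + b
    return 1.0 / (1.0 + math.exp(-z))             # f: sigmoid activation

print(neuron(x=[0.5, -1.0, 2.0], w=[0.2, 0.4, 0.1], b=0.3))  # output of one neural unit
```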
(2) Deep neural network model
A deep neural network model (DNN), also called a multi-layer neural network model, may be understood as a neural network model with multiple hidden layers. The DNN is divided according to the positions of different layers, and neural network models inside the DNN can be divided into three types: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i + 1) th layer.
Although DNN appears complex, the work of each layer is actually not complex. Simply put, it is the following linear relational expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$.

Because a DNN has many layers, the coefficients $W$ and the offset vectors $\vec{b}$ are also numerous. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$: the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the kth neuron of layer L−1 to the jth neuron of layer L is defined as $W^{L}_{jk}$.

Note that the input layer has no $W$ parameter. In a deep neural network model, more hidden layers make the network more capable of depicting complex situations in the real world. Theoretically, a model with more parameters has higher complexity and a larger "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network model is the process of learning the weight matrices, and its final purpose is to obtain the weight matrices of all layers of the trained deep neural network model (the weight matrices formed by the vectors $W$ of many layers).
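A small numeric check of the per-layer formula, where row j, column k of W plays the role of $W^{L}_{jk}$ (output index j, input index k); shapes and values are illustrative only:

```python
import numpy as np

W = np.array([[0.1, 0.2, 0.3, 0.4],   # row j=1: coefficients from inputs k=1..4
              [0.5, 0.6, 0.7, 0.8]])  # row j=2
x = np.array([1.0, 0.0, -1.0, 2.0])   # input vector
b = np.array([0.1, -0.1])             # offset vector

y = np.tanh(W @ x + b)                # y = alpha(W x + b), alpha chosen as tanh here
print(y)                              # output vector of this layer
```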
(3) Convolutional neural network model
A convolutional neural network (CNN) is a deep neural network model with a convolutional structure. The convolutional neural network model contains a feature extractor consisting of convolutional layers and sub-sampling layers, and the feature extractor can be regarded as a filter. A convolutional layer is a neuron layer that performs convolution processing on an input signal in the convolutional neural network model. In a convolutional layer, one neuron may be connected to only some of the neurons in neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of location. The convolution kernel may be initialized in the form of a matrix of random size, and obtains reasonable weights through learning during training of the convolutional neural network model. In addition, the direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network model while also reducing the risk of overfitting.
(4) A recurrent neural network (RNN) model is used to process sequence data. In the traditional neural network model, from the input layer through the hidden layers to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although such ordinary neural network models have solved many problems, they are still powerless for many others. For example, to predict the next word in a sentence, the previous words are usually needed, because the words in a sentence are not independent of each other. The RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes previous information and applies it to the computation of the current output; that is, the nodes within a hidden layer are no longer unconnected but connected, and the input of a hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN.
(5) Loss function
In the process of training a deep neural network model, because the output of the deep neural network model is expected to be as close as possible to the value truly desired to be predicted, the predicted value of the current network can be compared with the truly desired target value, and the weight vector of each layer of the neural network model can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network model). Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the purpose of loss functions or objective functions, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network model becomes a process of minimizing this loss.
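For example, a minimal loss function is the mean squared error between the predicted value and the target value (one common choice; the application does not prescribe a specific loss):

```python
import numpy as np

def mse_loss(predicted, target):
    # Mean squared error: a larger output means a larger predicted/target difference.
    return np.mean((np.asarray(predicted) - np.asarray(target)) ** 2)

print(mse_loss([0.9, 0.1], [1.0, 0.0]))  # small difference -> small loss
```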
(6) Back propagation algorithm
During training, a neural network model may use the back propagation (BP) algorithm to revise the values of its parameters so that its reconstruction error loss becomes smaller and smaller. Specifically, an input signal is propagated forward until the output produces an error loss, and the parameters in the neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward-propagation motion dominated by the error loss, aiming at obtaining the optimal parameters of the neural network model, for example, the weight matrix.
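A single gradient-descent update on a one-layer linear model, as a minimal stand-in for one forward/backward pass of back propagation (values are illustrative):

```python
import numpy as np

W = np.full((2, 3), 0.1)        # weight matrix to be learned
x = np.array([1.0, -0.5, 2.0])  # input signal
target = np.array([0.5, -0.2])  # truly desired output

pred = W @ x                    # forward propagation
err = pred - target             # error loss at the output
grad_W = np.outer(err, x)       # dL/dW for L = 0.5 * ||err||^2
W -= 0.1 * grad_W               # update against the gradient direction
```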
As shown in fig. 2, the present embodiment provides a system architecture 100. In fig. 2, a data acquisition device 160 is used to acquire training data. For a neural network model for image classification, the training data may include training images and corresponding classification results of the training images, where the results of the training images may be manually pre-labeled results.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes an input original image and compares the output image with the original image until the difference between the two is smaller than a certain threshold, thereby completing the training of the target model/rule 101. In this embodiment, the training device 120 may be configured to obtain a pre-trained super network model, transfer the pre-trained super network model based on the target data set, and search for a sub-network model in the transferred super network model to obtain the target model/rule 101. The target data set may be stored in the database 130. In some possible implementations, the training device 120 may also be used to pre-train the super network model. The super network model is trained based on a source data set, which may also be stored in the database 130.
The above-described target model/rule 101 can be used to implement the image processing method of the embodiment of the present application. The target model/rule 101 in the embodiment of the present application may specifically be a neural network model. It should be noted that, in practical applications, the training data maintained in the database 130 may not all come from the collection of the data collection device 160, but may be received from other devices, such as the target data set input by the client device 140. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 2. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like. In fig. 2, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include the image to be processed input by the client device.
The preprocessing module 113 is configured to perform preprocessing according to the input data (such as an image to be processed) received by the I/O interface 112. In this embodiment, the preprocessing module 113 may be omitted, and the computing module 111 may be used directly to process the input data.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the classification result of the image obtained as described above, to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 2, the user may manually give the input data, for example through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112; if the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form may be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data and storing them in the database 130. Of course, instead of being collected by the client device 140, the input data of the I/O interface 112 and the output results of the I/O interface 112 as shown in the figure may also be directly stored in the database 130 as new sample data by the I/O interface 112.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 2, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 2, the target model/rule 101 is obtained by training with the training device 120. The target model/rule 101 may be the neural network model in this embodiment; specifically, the neural network model constructed in this embodiment may include a convolutional neural network (CNN), a deep convolutional neural network (DCNN), a recurrent neural network (RNN), and the like.
Since CNN is a very common neural network model, the structure of CNN is described in detail below with reference to fig. 3. As described in the introduction to the basic concepts above, the convolutional neural network model is a deep neural network model with a convolutional structure and is a deep learning architecture. The deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network model in which individual neurons can respond to the images input into it.
The structure of the neural network model specifically adopted in the image processing method according to the embodiment of the present application may be as shown in fig. 3. In fig. 3, convolutional neural network model (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling layer is optional), and a neural network model layer 230. The input layer 210 may obtain an image to be processed, and deliver the obtained image to be processed to the convolutional layer/pooling layer 220 and the following neural network model layer 230 for processing, so as to obtain a processing result of the image. The following describes the internal layer structure in CNN 200 in fig. 3 in detail.
Convolutional layer/pooling layer 220:
Convolutional layer:
the convolutional layer/pooling layer 220 shown in fig. 3 may include layers such as 221 and 226, for example: in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, 226 is a pooling layer; in another implementation, 221, 222 are convolutional layers, 223 is a pooling layer, 224, 225 are convolutional layers, and 226 is a pooling layer. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
Convolutional layer 221 may include a number of convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby extracting specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends across the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows × columns) are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where this dimension is determined by the "plurality" described above. Different weight matrices may be used to extract different features from the image: for example, one weight matrix to extract image edge information, another to extract a particular color of the image, yet another to blur unwanted noise in the image, and so on. Because the plurality of weight matrices have the same size (rows × columns), the convolution feature maps extracted by them also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
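As an illustrative aid (not part of the patent itself), the following Python sketch uses the PyTorch library to show how a plurality of weight matrices of the same size produce a stacked depth dimension; the channel counts, kernel size, and image size are assumed values chosen only for this example.

    import torch
    import torch.nn as nn

    # A convolutional layer over an RGB input (depth 3) with 16 kernels of
    # size 3x3; each kernel spans the full input depth, and the 16 outputs
    # are stacked to form the depth dimension of the convolved image.
    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1)

    image = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image (assumed size)
    features = conv(image)
    print(features.shape)                # torch.Size([1, 16, 222, 222])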
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network model 200 can make correct prediction.
When the convolutional neural network model 200 has multiple convolutional layers, the earlier convolutional layers (e.g., 221) tend to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network model 200 increases, the features extracted by the later convolutional layers (e.g., 226) become more complex, such as features with high-level semantics. Features with higher-level semantics are better suited to the problem to be solved.
A pooling layer:
Since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 221 to 226 illustrated by 220 in fig. 3, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image into an image of smaller size. The average pooling operator computes the average of the pixel values within a certain range of the image as the result of the average pooling. The max pooling operator takes the pixel with the largest value within a particular range as the result of the max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the output image represents the average or maximum value of a corresponding sub-region of the input image.
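As a companion sketch under the same assumed sizes as above, average pooling and max pooling can be illustrated as follows; each output pixel summarizes a 2×2 sub-region of the input feature map.

    import torch
    import torch.nn as nn

    pool_avg = nn.AvgPool2d(kernel_size=2)  # average of each 2x2 sub-region
    pool_max = nn.MaxPool2d(kernel_size=2)  # largest pixel of each 2x2 sub-region

    features = torch.randn(1, 16, 222, 222)
    print(pool_avg(features).shape)  # torch.Size([1, 16, 111, 111])
    print(pool_max(features).shape)  # torch.Size([1, 16, 111, 111])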
Neural network model layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network model 200 is not yet sufficient to output the required output information. As previously described, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (required class information or other relevant information), the convolutional neural network model 200 needs to use the neural network model layer 230 to generate one output, or a group of outputs whose number equals the number of required classes. Therefore, the neural network model layer 230 may include a plurality of hidden layers (231, 232 to 23n shown in fig. 3) and an output layer 240. The parameters contained in the hidden layers may be obtained by pre-training on training data related to a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the plurality of hidden layers in the neural network model layer 230, that is, as the last layer of the entire convolutional neural network model 200, comes the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy and is specifically used for calculating the prediction error. Once the forward propagation of the entire convolutional neural network model 200 (i.e., propagation in the direction from 210 to 240 in fig. 3) is completed, backward propagation (i.e., propagation in the direction from 240 to 210 in fig. 3) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network model 200 and the error between the result output by the model through the output layer and the ideal result.
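For illustration only, one forward-propagation/backward-propagation step with a categorical cross-entropy loss might look like the following Python sketch; the model shape, batch size, and learning rate are assumptions, not values prescribed by the embodiment.

    import torch
    import torch.nn as nn

    # Hypothetical classifier head standing in for the hidden/output layers.
    model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 111 * 111, 10))
    loss_fn = nn.CrossEntropyLoss()  # similar to categorical cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    features = torch.randn(4, 16, 111, 111)   # a batch of pooled feature maps
    labels = torch.randint(0, 10, (4,))       # ideal results (class labels)

    logits = model(features)        # forward propagation (210 -> 240)
    loss = loss_fn(logits, labels)  # prediction error
    optimizer.zero_grad()
    loss.backward()                 # backward propagation (240 -> 210)
    optimizer.step()                # update weights and biases to reduce the loss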
The structure of the neural network model specifically adopted in the image processing method according to the embodiment of the present application may be as shown in fig. 4. In fig. 4, convolutional neural network model (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling layer is optional), and a neural network model layer 230. Compared with fig. 3, in the convolutional layer/pooling layer 220 in fig. 4, a plurality of convolutional layers/pooling layers are parallel, and the features extracted respectively are all input to the neural network model layer 230 for processing.
It should be noted that the convolutional neural network models shown in fig. 3 and 4 are only examples of two possible convolutional neural network models of the image processing method according to the embodiment of the present application. In a specific application, the neural network model adopted by the image processing method of the embodiment of the present application may also exist in the form of other network models.
In addition, the neural network model obtained by the method for obtaining the neural network model in the embodiment of the present application may be used in the image processing method in the embodiment of the present application.
Fig. 5 is a hardware structure of a chip provided in an embodiment of the present application, where the chip includes a neural network model processor 50. The chip may be provided in the execution device 110 as shown in fig. 2 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 2 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithms for each layer in the convolutional neural network model shown in fig. 3 and 4 can be implemented in a chip as shown in fig. 5.
The neural network model processor NPU 50 is mounted as a coprocessor to a host CPU, which allocates tasks to it. The core part of the NPU is the arithmetic circuit 503; the controller 504 controls the arithmetic circuit 503 to fetch data from a memory (weight memory or input memory) and perform operations.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it in each PE of the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 501, performs a matrix operation with matrix B, and stores partial or final results of the obtained matrix in an accumulator 508.
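A minimal Python sketch of this accumulate-as-you-go matrix multiplication is given below; the tiling of the shared dimension is an assumption made to illustrate how partial results collect in an accumulator, not a description of the actual circuit 503.

    import numpy as np

    def matmul_accumulate(A, B, tile=2):
        # Multiply A (m x k) by B (k x n), accumulating partial results over
        # tiles of the shared dimension k, the way an accumulator collects them.
        m, k = A.shape
        _, n = B.shape
        C = np.zeros((m, n))  # plays the role of accumulator 508
        for start in range(0, k, tile):
            end = min(start + tile, k)
            C += A[:, start:end] @ B[start:end, :]  # partial result
        return C

    A = np.random.rand(4, 6)
    B = np.random.rand(6, 5)
    assert np.allclose(matmul_accumulate(A, B), A @ B)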
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculation of non-convolution/non-FC layers in the neural network model, such as pooling (Pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 507 can store the processed output vector to the unified buffer 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 503, for example for use in subsequent layers in a neural network model.
The unified memory 506 is used to store input data as well as output data.
A direct memory access controller (DMAC) 505 transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
A Bus Interface Unit (BIU) 510, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through a bus.
An instruction fetch memory 509 connected to the controller 504, for storing instructions used by the controller 504.
The controller 504 is configured to call the instructions cached in the instruction fetch memory 509 to control the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are On-Chip memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
The operations of the layers in a neural network model, for example, the convolutional neural network models shown in fig. 3 and fig. 4, may be performed by the arithmetic circuit 503 or the vector calculation unit 507. For example, the operation of pre-training the hyper-network model in the embodiment of the present application may be performed by the arithmetic circuit 503 or the vector calculation unit 507. For example, the operation of migrating the pre-trained hyper-network model based on the target data set in the embodiment of the present application may be performed by the arithmetic circuit 503 or the vector calculation unit 507. For example, the operations of the layers in the target neural network model in the embodiment of the present application may be performed by the arithmetic circuit 503 or the vector calculation unit 507.
The execution device 110 in fig. 2 described above is capable of executing the steps of the image processing method according to the embodiment of the present application, and the chip shown in fig. 5 may also be used for executing the steps of the image processing method according to the embodiment of the present application.
Similarly, the training device 120 in fig. 2 described above is capable of performing the steps of the method for obtaining a neural network model according to the embodiment of the present application, and the chip shown in fig. 5 may also be used for performing the steps of that method.
As shown in fig. 6, the present embodiment provides a system architecture 300. The system architecture includes a local device 301, a local device 302, and an execution device 310 and a data storage system 350, wherein the local device 301 and the local device 302 are connected with the execution device 310 through a communication network.
The execution device 310 may be implemented by one or more servers. Optionally, the execution device 310 may cooperate with other computing devices, such as data storage devices, routers, load balancers, and the like. The execution device 310 may be disposed at one physical site or distributed across multiple physical sites. The execution device 310 may use data in the data storage system 350 or call program code in the data storage system 350 to implement the method for acquiring a neural network model or the image processing method according to the embodiment of the present application.
Specifically, in one implementation, the execution device 310 may perform the following processes:
acquiring a pre-trained super network model, wherein the pre-trained super network model is obtained based on source data set training;
acquiring a target data set;
performing transfer learning on the pre-trained hyper-network model based on the target data set to obtain a transfer-learned hyper-network model;
and searching the sub-network model in the ultra-network model after the transfer learning to obtain a target neural network model.
Through the above process, the execution device 310 can obtain a target neural network model, and the target neural network model can be used for image classification, image processing, or the like.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 310. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local devices of each user may interact with the enforcement device 310 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a peer-to-peer connection, etc., or any combination thereof.
In one implementation, the local device 301 or the local device 302 acquires relevant parameters of the target neural network model from the execution device 310, deploys the target neural network model on the local device 301 or the local device 302, and performs image classification or image processing and the like by using the target neural network model.
In another implementation, the target neural network model may be directly deployed on the execution device 310, and the execution device 310 classifies or otherwise processes the images to be processed by acquiring the images to be processed from the local device 301 and the local device 302 and using the target neural network model.
The execution device 310 may also be a cloud device, and at this time, the execution device 310 may be deployed in the cloud; alternatively, the execution device 310 may also be a terminal device, in which case, the execution device 310 may be deployed at a user terminal side, which is not limited in this embodiment of the application.
In an automated machine learning (AutoML) cloud service platform, a user can customize a neural network model according to their own requirements and tasks. A cloud platform based on automated machine learning can perform network design and search according to the constraints set by the user, train the network model obtained by the design and search, and provide it to the user. The constraints may include the type of the network model, the accuracy of the network model, the latency of the network model, the running platform of the network model, and the like.
By using the method for acquiring the neural network model provided by the embodiment of the application, the neural network model can be acquired according to the requirements of the user, the performance of the acquired neural network model is improved, and the processing efficiency in the process of acquiring the neural network model is improved.
Fig. 7 shows a schematic structural diagram of the AutoML framework. As shown in fig. 7, AutoML generally defines a search space (search space) in advance, and the search space refers to a searchable range. The AutoML continuously generates the sub-network model configuration in the search space and forms a closed loop of evaluation-feedback-regeneration of the sub-network model configuration until finally searching to obtain an excellent neural network model.
Specifically, the search space is determined according to the specific AutoML task. For example, when the specific task is to design a neural network model, the search space may include a plurality of neural network model structural units, and the final neural network model is formed by combining these structural units in the search space.
The controller 710 is configured to select different configurations within the search space, assign them to the evaluator 720 for evaluation, and then perform a policy update (i.e., a configuration update) according to the evaluation results fed back by the evaluator 720. For example, the controller 710 may select neural network model structural units in the search space, or search for neural network model structural units, combine them to obtain one or more sub-network models, select a sub-network model from the combined sub-network models, and assign the configuration of that sub-network model to the evaluator 720 for evaluation.
The evaluator 720 is configured to evaluate the performance indicators of different configurations and feed the evaluation results back to the controller 710. For example, the evaluator 720 may evaluate a performance indicator of the sub-network model selected by the controller 710; the performance indicators may include the accuracy of the neural network model, the latency of the network model, and the like. The evaluation results are fed back to the controller 710 so that the controller 710 can update the configuration, until the target neural network model is obtained.
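The closed loop of generation, evaluation, and feedback can be sketched as follows in Python; here a plain random sampling policy stands in for the controller 710, and a user-supplied evaluate function stands in for the evaluator 720, both being assumptions made only for illustration.

    import random

    def automl_search(search_space, evaluate, num_rounds=100):
        # Closed loop: generate a configuration, evaluate it, feed the result
        # back, and generate again until the best configuration is found.
        best_config, best_score = None, float("-inf")
        for _ in range(num_rounds):
            config = [random.choice(ops) for ops in search_space]  # one sub-network
            score = evaluate(config)   # e.g. accuracy and/or latency indicators
            if score > best_score:     # feedback step
                best_config, best_score = config, score
        return best_config

    # Assumed search space: three layers, each with candidate structural units.
    space = [["conv3x3", "conv5x5"], ["conv3x3", "skip"], ["conv3x3", "conv5x5"]]
    best = automl_search(space, evaluate=lambda cfg: random.random())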
However, the training cost of AutoML is much higher than that of a common neural network model. For example, the training time and computation amount of AutoML are at least one order of magnitude higher than those of a common neural network model.
In addition, the AutoML usually needs a large amount of data to obtain an excellent neural network model, and under a part of small data scenes, the AutoML is often difficult to directly train to generate an excellent model.
Through transfer learning, knowledge or patterns learned in a certain domain or task can be applied to a different but related domain or problem. Generally speaking, transfer learning can enable a neural network model to work with a small data set and reduce training resources. However, a neural network model obtained by transfer learning may not meet the user's requirements. For example, it may not meet the user's overhead requirement: the neural network model before migration does not necessarily meet the user's overhead requirement, and the overhead of the migrated model is basically the same as that of the model before migration; if the model before migration cannot meet the overhead requirement, the migrated model naturally cannot meet it either. For another example, it may not meet the user's accuracy requirement: the neural network model before migration is obtained by training on the source data set, and the architecture of the model obtained after migration is basically unchanged. If the architecture itself is insufficient or unsuited to the target task, merely tuning the parameters based on the target data set cannot significantly improve the effect of the neural network model; that is, the accuracy of a neural network model with that architecture may not meet the user's accuracy requirement.
Therefore, how to effectively reduce the training resources required by AutoML and enable AutoML to work with small data sets is an important challenge that must be overcome for the practical deployment of AutoML.
The embodiment of the application provides a method for acquiring a neural network model based on AutoML, which can be executed by a system for acquiring a neural network model. Fig. 8 is a schematic block diagram of a system 800 for obtaining a neural network model according to an embodiment of the present application.
In order to better understand the implementation process of the method for obtaining the neural network model in the embodiment of the present application, the functions of the respective modules in fig. 8 are briefly described below.
The system 800 for acquiring the neural network model may be a cloud service device or a mobile terminal, for example, a device with sufficient computing power to acquire the neural network model, such as a computer or a server, or it may be a system formed by a cloud service device and a mobile terminal.
The system 800 for obtaining a neural network model mainly includes: a pre-training module 810, an input module 820, a migration module 830, a search module 840, a test module 850, and an output module 860.
The pre-training module 810 can be configured to pre-train the hyper-network model to obtain weights of the hyper-network model.
Wherein a hyper network model refers to a model that can cover all sub network models in the search space. The weight of the super network model is the weight of all the sub network models. That is, the weights of the subnetwork model can be obtained from the super network model.
The super network model may be a pre-defined hyper network model of AutoML. For example, the hyper-network model may be defined in terms of the tasks that the neural network model needs to perform.
It should be noted that the pre-training module 810 is an optional module, and the pre-training process may be performed by other devices. In this case, the migration module 830 may receive a hyper-network model pre-trained by other devices.
Alternatively, the pre-training may be off-line training, i.e. the pre-training process may be completed in an off-line phase. It should be understood that online and offline in embodiments of the present application may be at different stages relative to the user. Alternatively, it can be appreciated that the system 800 in the offline phase is not affected by the user, and the models of the hyper-network resulting from the offline training can be stored for later processing in the online phase. The system 800 in the online phase may accept the user's input and perform corresponding operations according to the user's input. For example, the pre-training process may be performed in an off-line phase, and when the user acquires the required neural network model by using the system 800, the user may directly acquire the pre-trained hyper-network model through the migration module 830 without performing the pre-training operation on line.
Illustratively, the pre-training module 810 may be located on a cloud server or on a local device.
The pre-training module 810 pre-trains the super-network model based on the source data set.
The source data set may be a data set relating to the tasks that the target neural network model needs to perform. For example, when the target neural network model is used for image classification, the source data set may include the source sample image and the classification label for the source sample image. For example, the source data set may be the public data set ImageNet.
The input module 820 may be used to receive input data from a user. For example, the input module 820 may receive any one or more of: target data set, hyper-parameters, target overhead, target accuracy, target search duration or target loss function, etc.
This target data set is used to fine-tune (fine-tuning) the hyper-network model output by pre-training module 810.
It should be noted that the hyper-parameters of the neural network model include parameters that are not changed during the training of the neural network model. The hyper-parameters are not derived by training of the neural network model, and are typically determined prior to training of the neural network model.
Illustratively, the hyper-parameters of the neural network model include: a learning rate of the neural network model, a label smoothing (label smoothing) coefficient of the neural network model, a drop (dropout) parameter of the neural network model, or the like.
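Purely as an illustration, such hyper-parameters could be collected in a configuration fixed before training; the specific values below are assumptions.

    # Assumed hyper-parameter configuration, fixed before training begins;
    # none of these values are changed or learned during training itself.
    hyperparameters = {
        "learning_rate": 0.01,     # learning rate of the neural network model
        "label_smoothing": 0.1,    # label smoothing coefficient
        "dropout": 0.2,            # drop parameter
    }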
The target overhead refers to the hardware overhead of the target neural network model output by the output module 860 on the target device.
The target accuracy refers to the inference accuracy of the target neural network model output by the output module 860.
The target search duration refers to a search duration for searching the subnetwork model in the super network model to obtain the target neural network model.
The objective loss function is used to fine tune the hyper-network model output by the pre-training module 810.
The migration module 830 can be configured to perform migration learning on the pre-trained hyper-network model based on the target dataset. Alternatively, it can be appreciated that the weights of the hyper-network model obtained by the pre-training module 810 are migrated to the target data set.
The migration module 830 may be located on the cloud server or on the local device.
Illustratively, the migration module 830 may receive the pre-trained hyper-network model sent by the pre-training module 810. For example, if the migration module 830 and the pre-training module 810 are located on different devices, the pre-trained hyper-network model may be transmitted between the migration module 830 and the pre-training module 810 over a communication network.
Specifically, the migration module 830 may fine-tune the weights of the hyper-network model obtained by the pre-training module 810 based on the target data set. For example, the target data set may be the data set input by the input module 820.
For example, the migration module 830 may load the weights of the super network model output by the pre-training module 810 and fine-tune the super network model according to the target data set input by the user.
As another example, the migration module 830 may load the weights of the super network model output by the pre-training module 810 and fine-tune the super network model according to the objective loss function input by the user.
The searching module 840 may be configured to search for a sub-network model in the super network model output by the migration module 830, so as to obtain a target neural network model.
The search module 840 may be located on a cloud server or may be located on a local device.
Illustratively, the performance metrics of the target neural network model may satisfy the target performance metrics. That is, the searching module 840 may search the super network model for a target sub-network model whose performance index satisfies the target performance index, and determine the target neural network model according to the target sub-network model. The performance index of the subnetwork model may include inference accuracy of the subnetwork model, hardware overhead of the subnetwork model, or inference duration of the subnetwork model. The target performance index may include a target accuracy, a target cost, or a target inference duration, etc.
The target performance metric may be a default or input via input module 820. For example, a user may input a desired target performance metric via input module 820.
For example, the target subnetwork model can be a subnetwork model whose inference accuracy reaches a target accuracy.
For another example, the target subnetwork model can be a subnetwork model with hardware overhead up to a target overhead and inference precision up to a target precision. Wherein the testing of the hardware overhead of the subnetwork model may be performed by the testing module 850. The test module 850 resides on the target device. That is, the subnet model is deployed on the target device, and its hardware overhead size is tested by the test module 850.
Illustratively, the search duration of the search module 840 may satisfy the target search duration.
For example, the searching module 840 may search the super network model for a target sub-network model whose performance indicator satisfies the target performance indicator and whose search duration satisfies the target search duration, and determine the target neural network model according to the target sub-network model.
The testing module 850 is used to test the hardware overhead of different sub-network models on the target device. It should be appreciated that the test module 850 is an optional module. The test module 850 resides on the target device.
The output module 860 is used for outputting the target neural network model obtained by the search module 840.
A method 900 for obtaining a neural network model according to an embodiment of the present application is described in detail below with reference to fig. 9. The method shown in FIG. 9 may be performed by an apparatus for obtaining a neural network model, for example, by the training device 120 shown in FIG. 2, or by the system 800 shown in FIG. 8. The device for acquiring the neural network model may be a cloud service device, or may be a mobile terminal, for example, a computer, a server, or the like, which has sufficient computing power to execute the method 900, or may be a system formed by the cloud service device and the mobile terminal. The method 900 includes steps S910 to S940. The following describes steps S910 to S940 in detail.
S910, pre-training the hyper-network model.
In the embodiment of the present application, the pre-trained hyper-network model may also be understood as a weight of the pre-trained hyper-network model.
Illustratively, step S910 may be performed by pre-training module 810 in fig. 8 or training device 120 in fig. 2. It should be understood that this is merely an illustration, and in the embodiment of the present application, step S910 may also be performed by other devices. That is, the pre-trained hyper-network model obtained in step S930 may be a model trained by other devices.
In embodiments of the present application, the hyper-network model may also be referred to as a hyper-network or hyper-model.
Wherein a hyper network model refers to a model that can cover all sub network models in the search space. The weight of the super network model is the weight of all the sub network models. That is, the weights of the subnetwork model can be obtained from the super network model.
A neural network model is formed by stacking multiple layers of operators and can be represented by a directed acyclic graph; each layer in the directed acyclic graph is a node, and in an ordinary neural network model the operator at each node is a single operator.
Each layer in the super network model includes a plurality of operators, that is, each node has a plurality of candidate operators; the operators between layers are connected in a fully connected manner, and one path through this full connection is a sub-network model. For example, one operator is selected in each layer, and the neural network model composed of the operators selected across the layers is a sub-network model. Updating the weights on a path means updating the weights of that sub-network model and, at the same time, updating the weights of the super network model, thereby achieving the effect of training the super network model. For example, in the hyper-network model shown in fig. 10, the candidate operators in one layer include operator 411 and operator 412; both candidate operators in that layer may be convolutions, and the numbers of channels of operator 411 and operator 412 may be different.
The operator refers to a basic unit of calculation of the neural network model, and in this embodiment, the operator may also be understood as a structural unit of the neural network model or a "block" in the neural network model. The relationship between the above-described super network model and the sub network model may also be understood as that each layer of the super network model comprises a plurality of selectable blocks, one block being selected in each layer, the selected blocks being combined to form a sub network model.
Illustratively, the operators may include: activation operators, feature extraction operators, normalization operators, anti-overfitting operators, and the like. For example, activation operators may include rectified linear unit (ReLU), sigmoid, and the like. Feature extraction operators may include convolution, full connection, and the like. Normalization operators may include batch normalization and the like. Anti-overfitting operators may include pooling and the like.
The network topologies of the sub-network models in the super-network model may be identical. In particular, the direction of data flow between the blocks that make up the subnetwork model may be the same. The sub-network models in the super-network model may differ in convolution kernel size, number of layers, or number of channels.
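A minimal Python/PyTorch sketch of such a hyper-network layer is given below; the candidate operators, channel numbers, and kernel sizes are assumed values, and selecting one candidate per layer yields one path, i.e., one sub-network model.

    import torch
    import torch.nn as nn

    class SuperNetLayer(nn.Module):
        # One layer of a hyper-network: several candidate operators (blocks),
        # of which exactly one is active on any given path.
        def __init__(self, candidates):
            super().__init__()
            self.candidates = nn.ModuleList(candidates)

        def forward(self, x, choice):
            return self.candidates[choice](x)  # select one block in this layer

    # Two layers whose candidates share topology but differ in kernel size.
    layer1 = SuperNetLayer([nn.Conv2d(3, 16, 3, padding=1),
                            nn.Conv2d(3, 16, 5, padding=2)])
    layer2 = SuperNetLayer([nn.Conv2d(16, 16, 3, padding=1),
                            nn.Conv2d(16, 16, 5, padding=2)])

    x = torch.randn(1, 3, 32, 32)
    y = layer2(layer1(x, choice=0), choice=1)  # one path = one sub-network model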
The hyper network model may be predefined. Illustratively, the search space may be predefined according to the tasks that the target neural network model needs to perform; this search space serves as the searchable range in step S940. Alternatively, it can be understood that the hyper-network model may be defined according to the tasks that the target neural network model needs to perform. Specifically, the number or kinds of selectable operators in the hyper-network model can be determined according to the task to be performed by the target neural network model. Illustratively, the tasks to be performed by the target neural network model may include image classification, image segmentation, target detection, and the like. Exemplarily, step S910 may be completed in an offline stage. For example, when acquiring the required neural network model, a user can directly obtain the pre-trained hyper-network model without performing the pre-training operation online.
Exemplarily, step S910 is an optional step. Method 900 may begin execution at step S920.
Illustratively, the super network model may be pre-trained based on the source data set.
The source data set can adopt a data set with large data volume, so that the training of the super-network model can be ensured to be sufficient, and the super-network model with higher accuracy can be obtained.
It should be noted that the source data set may be a data set related to a task that the target neural network model needs to perform. The source data set may include source sample data and tags corresponding to the source sample data.
For example, when the target neural network model is used for image classification, the source data set may include the source sample images and classification labels for the source sample images. For example, the source data set may be the public data set ImageNet.
As another example, when the target neural network model is used for image segmentation, the source data set may include the source sample image and classification labels for pixels in the source sample image.
For another example, when the target neural network model is used for target detection, the source data set may include the source sample images and object classification labels in the source sample images, as well as bounding boxes for the objects.
Optionally, step S910 includes: the hyper-network model is pre-trained based on a single path (single path) algorithm.
Specifically, in each iterative training, only one single-path subnetwork model is activated, or it can be understood that only one subnetwork model is selected from the super-network models, the weights of the subnetwork models are updated, and the iteration is continued until the training is completed.
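A sketch of this single-path pre-training loop, reusing the hypothetical SuperNetLayer above, might look as follows; the data source and loss function are assumed to be supplied by the caller in compatible shapes.

    import random
    import torch

    def pretrain_single_path(layers, parameters, data, loss_fn, lr=0.01):
        # In each iteration one random path (sub-network model) is activated
        # and only the weights on that path receive gradient updates.
        optimizer = torch.optim.SGD(parameters, lr=lr)
        for x, y in data:
            choices = [random.randrange(len(l.candidates)) for l in layers]
            out = x
            for layer, c in zip(layers, choices):
                out = layer(out, c)      # forward through the selected path only
            loss = loss_fn(out, y)
            optimizer.zero_grad()
            loss.backward()              # gradients flow only through this path
            optimizer.step()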
Optionally, the hyper-network model is pre-trained based on a progressive shrinking (PS) algorithm. Fig. 11 shows a schematic diagram of the progressive shrinking algorithm. As shown in fig. 11, the maximum sub-network model is trained first, and then sub-network models with variable convolution kernels, variable numbers of layers, and variable numbers of channels are trained step by step.
The maximum subnetwork model refers to a subnetwork model with the maximum convolution kernel (kernel), the maximum number of layers (depth) and the maximum number of channels (width) in the super network model.
Specifically, the sub-network models with variable convolution kernels may be trained by sampling from the super network model a plurality of sub-network models that have the maximum number of layers and the maximum number of channels. That is, in this training phase, the number of layers of the trained sub-network models is D and the number of channels is W, while the convolution kernel sizes of the sub-network models may differ. Here, D denotes the maximum number of layers in the super network model, and W denotes the maximum number of channels in the super network model.
Illustratively, the sub-network models with variable convolution kernels may be trained by using a random single-path method to sample a plurality of sub-network models with the maximum number of layers, the maximum number of channels, and different convolution kernels.
Specifically, the sub-network models with variable numbers of layers may be trained by sampling from the super network model a plurality of sub-network models with the maximum number of channels. That is, in this training phase, the number of channels of the trained sub-network models is W, while the convolution kernel sizes and the numbers of layers of the sub-network models may differ. Here, W denotes the maximum number of channels in the super network model.
Illustratively, the sub-network models with variable numbers of layers may be trained by using a random single-path method to sample a plurality of sub-network models with the maximum number of channels, different numbers of layers, and different convolution kernels.
Specifically, the sub-network models with variable numbers of channels may be trained by sampling a plurality of sub-network models from the super network model. That is, in this training phase, the convolution kernel sizes, the numbers of layers, and the numbers of channels of the trained sub-network models may all differ.
Illustratively, the sub-network models with variable numbers of channels may be trained by using a random single-path method to sample a plurality of different sub-network models.
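The staged sampling described above can be sketched as follows; the candidate kernel sizes, depths, and widths are assumed values used only to illustrate which dimensions vary in each progressive-shrinking stage.

    import random

    KERNELS, DEPTHS, WIDTHS = [3, 5, 7], [2, 3, 4], [32, 48, 64]  # assumed options
    D_MAX, W_MAX = max(DEPTHS), max(WIDTHS)

    def sample_subnet(stage):
        # Sample a sub-network configuration for each progressive-shrinking stage.
        if stage == "max":      # maximum kernel, number of layers and channels
            return dict(kernel=max(KERNELS), depth=D_MAX, width=W_MAX)
        if stage == "kernel":   # kernel varies; layers and channels stay maximal
            return dict(kernel=random.choice(KERNELS), depth=D_MAX, width=W_MAX)
        if stage == "depth":    # kernel and layers vary; channels stay maximal
            return dict(kernel=random.choice(KERNELS),
                        depth=random.choice(DEPTHS), width=W_MAX)
        return dict(kernel=random.choice(KERNELS),      # "width" stage:
                    depth=random.choice(DEPTHS),        # everything varies
                    width=random.choice(WIDTHS))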
Specifically, the subnet model with variable convolution kernels, the subnet model with variable layer number, or the subnet model with variable channel number can be trained by means of knowledge distillation.
Knowledge distillation refers to transferring the knowledge of one neural network model to another neural network model. The knowledge of a neural network model can be understood as its mapping from inputs to outputs, and this mapping is determined by the parameters of the model. That is, knowledge distillation can be understood as transferring the parameters of one neural network model to another neural network model.
Specifically, knowledge distillation refers to training a student (student) network model with the output of a trained teacher network model and the real labels of training samples. In the embodiment of the present application, the teacher network model refers to a maximum subnetwork model, and the student network model refers to a subnetwork model with a variable convolution kernel, a subnetwork model with a variable number of layers, or a subnetwork model with a variable number of channels.
For example, training a sub-network model with variable convolution kernels by knowledge distillation means inputting the source data into the trained maximum sub-network model to obtain its output values, and training the sub-network model with variable convolution kernels based on these output values and the labels corresponding to the source sample data.
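A common form of distillation loss, given here only as a hedged illustration (the temperature T and mixing weight alpha are assumptions, not values fixed by the embodiment), combines the hard-label loss with a soft loss against the teacher's outputs:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Hard loss against the real labels plus a soft loss against the
        # trained maximum sub-network (teacher); T and alpha are assumed values.
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        return alpha * hard + (1.0 - alpha) * soft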
Because weights are shared between sub-network models, different sub-network models may interfere with each other when the super network model is trained. Training with the progressive shrinking algorithm reduces the mutual interference of sub-network models of different sizes during training, and the resulting super network model can support a variety of architecture settings, for example, sub-network models with different numbers of layers, different numbers of channels, and different convolution kernel sizes. After the training of the super network model is completed, a suitable sub-network model can be selected from it without additional training or retraining of the searched sub-network model, while still ensuring that the accuracy of the sub-network model meets the pre-training requirement. During the training of the super network model, each sub-network model does not need to be trained separately, yet a sub-network model in the super network model can reach an accuracy similar to that of the same sub-network model trained independently.
S920, input data is obtained, and the input data comprises a target data set.
Optionally, the input data may also include any one or more of: hyper-parameters, target overhead, target accuracy, target search duration, or target loss function, etc.
Illustratively, this step may be performed by input module 820 in FIG. 8.
The target data set may be determined according to the tasks that the target neural network model needs to perform. That is, the sub-network models in the pre-trained hyper-network model are consistent with the tasks performed by the target neural network model. Alternatively, it can be understood that the task corresponding to the source data set is the same as the task corresponding to the target data set. For example, both are used for image classification; or, both are used for image segmentation; alternatively, both are used for target detection.
Illustratively, when the target neural network model is used to implement image classification, then the target dataset may include the target sample image and the classification label for the target sample image. For example, a target neural network model is used to implement vehicle identification. The target data set may include the target vehicle image and the classification label for the target vehicle image.
For another example, when the target neural network model is used for image segmentation, the target dataset may include the target sample image and classification labels for pixels in the target sample image. As another example, if a target neural network model is used to implement target detection, then the target dataset may comprise a target detection dataset. The target detection dataset may include a target sample image, an object classification label in the target sample image, and a bounding box for the object.
Illustratively, the target data set may be a data set input by a user or a data set acquired from another device. For example, step S920 is performed by a cloud service device, and then the other device may be a target device. The target device may be a device to be deployed by the target neural network model.
S930, carrying out transfer learning on the pre-trained super network model based on the target data set to obtain the super network model after the transfer learning.
The transfer learning of the pre-trained hyper-network model based on the target dataset may be a fine-tuning of the pre-trained hyper-network model based on the target dataset.
Fine-tuning refers to applying a pre-trained model to a target data set and adapting the parameters of the model to that data set.
The step of performing transfer learning on the pre-trained super network model specifically refers to transferring the weight of the pre-trained super network model.
The knowledge learned on the source data set is migrated to the target data set by migration learning. That is, the weights of the pre-trained hyper-network model are migrated to the target data set.
Illustratively, step S930 may be performed by the migration module 830 in fig. 8 or the training device 120 in fig. 2. It should be understood that this is merely an illustration, and in the embodiment of the present application, step S930 may also be performed by other devices.
Specifically, a pre-trained hyper-network model is obtained, or the weights of the pre-trained hyper-network model are loaded, and the hyper-network model is finely adjusted based on the target data set. The device for executing step S910 and the device for executing step S930 may be the same or different. Illustratively, when the device performing step S910 and the device performing step S930 are different devices, the pre-trained hyper network model may be transmitted through a communication network.
Optionally, step S930 includes fine-tuning the pre-trained hyper-network model by a single-path algorithm. In this way, the sub-network models can be sampled and trained uniformly, improving the training effect. In addition, the memory footprint can be reduced, enabling efficient training.
Fine-tuning a pre-trained hyper-network model through a single-path algorithm includes:
selecting a sub-network model from the pre-trained super-network models, calculating the weight gradient of the sub-network model based on the target data set, and updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated super-network model; and repeating the steps until the updated hyper-network model meets the termination condition to obtain the hyper-network model after the transfer learning.
That is, in each iterative training, only the sub-network model of a single path is activated, or it can be understood that only one sub-network model is selected from the super-network models, the weight of the sub-network model is updated, and the iteration is continued until the training is completed. At each iterative training, only the weights of the selected sub-network models are activated and updated.
Wherein the selecting of one sub-network model from the super-network models may be a random selection of one sub-network model from the super-network models.
The following takes an iterative training process as an example, and illustrates an updating method of the weights of the currently selected subnetwork model.
In an iterative training process, one sub-network model in the super network model is selected for each forward propagation; that is, target sample data is input into the sub-network model, the loss value corresponding to the output of the sub-network model is calculated through the loss function, the weight gradient of the current sub-network model is calculated by back-propagating the loss value, and the weights of the sub-network model are adjusted according to the weight gradient.
The function value of the loss function indicates the difference between the classification label of the target sample image and the prediction label output by the sub-network model. The weights of the sub-network model are updated according to this difference until the prediction labels of the neural network model are very close to the labels of the training data. For example, a higher function value of the loss function indicates a larger difference, so the training of the neural network model becomes a process of reducing this function value as much as possible. In some cases, the loss function may also be the objective function.
Optionally, step S930 includes: selecting N_b sub-network models from the pre-trained hyper-network model, calculating the weight gradients of the N_b sub-network models based on the target data set, and updating the weights of the pre-trained hyper-network model based on the weight gradients of the N_b sub-network models to obtain an updated hyper-network model; and repeating the above steps until the updated hyper-network model meets the termination condition, so as to obtain the hyper-network model after transfer learning.
Each time a sub-network model is selected, only the sub-network model of a single path is activated; in other words, only one sub-network model is selected from the super network model at a time.
The following describes a method for updating the weights of the hyper-network model by taking an iterative process as an example.
For each forward propagation, one sub-network model is selected from the super network model; that is, a target sample image is input into the sub-network model and the function value of the loss function is calculated. The weight gradient of the current sub-network model is then calculated by back-propagating this function value. This procedure is executed N_b times, i.e., N_b sub-network models are selected, the weight gradients of the N_b sub-network models are calculated, and the N_b resulting weight gradients are accumulated. The weights of the hyper-network model are then updated once according to the accumulated weight gradient; this whole process can be regarded as one iteration. The iterations continue until the termination condition is met, yielding the hyper-network model after transfer learning, i.e., completing the migration of the hyper-network model. Here, N_b is a positive integer; N_b may be preset or input by the user. The function value of the loss function indicates the difference between the classification label of the target sample image and the prediction label output by the sub-network model.
The weight gradient accumulated over the N_b passes may satisfy:

dW = Σ_{i=1..N_b} ∂L_i/∂W

where dW denotes the weight gradient of the hyper-network model, W denotes the weights of the hyper-network model, and L_i denotes the function value of the loss function during the i-th forward propagation within one iteration.
Illustratively, updating the weights of the hyper-network model once may include: subtracting the accumulated weight gradient from the current weights of the hyper-network model.

Alternatively, updating the weights of the hyper-network model once may include: subtracting the product of the accumulated weight gradient and the learning rate from the current weights of the hyper-network model. For example, the weights of the hyper-network model may satisfy:

W_j = W_{j-1} - lr * dW

where W_j denotes the weights of the hyper-network model after the j-th iteration, W_{j-1} denotes the weights after the (j-1)-th iteration, and lr denotes the learning rate.
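A Python sketch of one such iteration, again reusing the hypothetical SuperNetLayer above, is shown below; PyTorch accumulates gradients across backward passes by default, so calling backward() N_b times before a single optimizer step realizes the accumulated update described above.

    import random
    import torch

    def finetune_one_iteration(layers, batches, loss_fn, optimizer):
        # One iteration: accumulate the gradients of N_b randomly selected
        # sub-network models, then update the hyper-network weights once.
        optimizer.zero_grad()
        for x, y in batches:             # N_b (x, y) batches
            choices = [random.randrange(len(l.candidates)) for l in layers]
            out = x
            for layer, c in zip(layers, choices):
                out = layer(out, c)
            loss_fn(out, y).backward()   # .grad accumulates across the N_b passes
        optimizer.step()                 # single update: W_j = W_{j-1} - lr * dW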
Optionally, the termination condition comprises a number of repetitions being greater than or equal to the first number of iterations. In this case, the number of iterations may also be understood as the number of updates of the weights of the hyper-network model.
Optionally, the termination condition includes that the inference precision of the updated hyper-network model is greater than or equal to the first inference precision. For example, the inference accuracy of the super network model may be an inference accuracy of at least one sub-network model in the super network model.
In the embodiments of the present application, "inference" may also be referred to as "prediction".
Alternatively, the termination condition may include: within a preset time interval, the variation value of the inference precision of the z sub-network models is smaller than a set threshold value. The z sub-network models may be pre-specified z sub-network models. That is, z sub-network models may be pre-specified, and the precision of the z sub-network models is tested in each iteration, and if the precision of the z sub-network models does not change significantly within a period of time or within a certain number of iterations, the migration may be terminated, that is, the migration of the super-network model is completed.
Since weight sharing may exist between the sub-network models, if the weight of the current sub-network model is updated according to the weight gradient calculated by each back propagation, interference may be generated on other sub-network models sharing the weight. In the scheme of the embodiment of the application, multiple times of forward propagation and backward propagation are executed in each iteration, the weight gradients of a plurality of sub-network models are accumulated in one iteration, and the weight of the super-network model is updated only once, so that the mutual interference among different sub-network models can be reduced, the precision of the super-network model is improved, and the training speed of the super-network model is increased.
The loss function in step S930 may be a preset loss function or a target loss function input by the user.
It should be understood that the above trimming manner is merely an example, and other manners capable of trimming the pre-trained hyper-network model are all applicable to step S930, and the embodiment of the present application does not limit the manner of trimming the hyper-network model.
And S940, searching the sub-network model from the super-network model after the migration learning to obtain the target neural network model.
Illustratively, this step may be performed by the search module 840 of FIG. 8 or the training device 120 of FIG. 2.
Illustratively, the performance index of the target neural network model may satisfy the target performance index. That is, a target sub-network model can be searched for in the super-network model after the transfer learning, and the target neural network model can be determined based on the target sub-network model. The target sub-network model may be a sub-network model whose performance index meets the target performance index. The performance index of a sub-network model may include its inference accuracy, its hardware overhead, or its inference duration. The target performance index may include a target accuracy, a target overhead, or a target inference duration, etc.
The target overhead refers to the hardware overhead of the target neural network model on the target device.
The target accuracy refers to the inference accuracy of the target neural network model.
The target inference duration refers to an inference duration of the target neural network model.
The target performance index may be a preset target performance index or a target performance index input by a user. Illustratively, step S920 further includes obtaining a target cost, a target precision or a target inference duration, etc.
For example, if the target performance metric is a target accuracy, the target sub-network model may be a sub-network model whose inference accuracy reaches the target accuracy.
For another example, if the target performance indicator includes a target accuracy and a target cost, the target subnetwork model may be the subnetwork model with the hardware cost reaching the target cost and the inference accuracy reaching the target accuracy. Illustratively, the testing of hardware overhead of the subnetwork model may be performed by the testing module 850. Specifically, when testing the hardware overhead of the sub-network model, the sub-network model may be deployed on the target device, and the hardware overhead thereof is tested.
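As an illustration of such a test, the sketch below measures the average inference latency of a sub-network deployed on the target device; this is only one assumed overhead metric, not a procedure prescribed by the embodiment:

    import time

    def measure_latency(subnet, sample_input, warmup=10, runs=100):
        for _ in range(warmup):             # warm-up passes to stabilize caches and clocks
            subnet(sample_input)
        start = time.perf_counter()
        for _ in range(runs):
            subnet(sample_input)
        return (time.perf_counter() - start) / runs   # average seconds per inference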
Illustratively, the search duration for the target subnetwork model can satisfy the target search duration.
The target search duration refers to a search duration for searching the super network model to obtain the sub network model.
For example, a target sub-network model whose performance index meets the target performance index is searched for in the super-network model within the target search duration, and the target neural network model is determined according to the target sub-network model.
The target search time length may be a preset target search time length or a target search time length input by a user.
Illustratively, the sub-network model is searched in the super network model after the transfer learning through a reinforcement learning algorithm to obtain a target neural network model.
Illustratively, the subnetwork model is searched in the super network model after the transfer learning through an evolutionary algorithm to obtain a target neural network model.
Searching the subnet model in the super network model after the migration learning through an evolutionary algorithm may include:
Step one: determining n first sub-network models according to the super-network model after the transfer learning. The n first sub-network models may be used as an initial population.
For example, n sub-network models are extracted from the super-network model after the migration learning, and the n sub-network models are n first sub-network models.
Step two: and adjusting the structures of the n first sub-network models to obtain n second sub-network models.
For example, adjusting the structures of the n first sub-network models may be performed by applying crossover, mutation, or similar operations to the first sub-network models.
Step three: and selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and taking the n third sub-network models as the n first sub-network models in the second step. The n third sub-network models are new populations.
And repeating the second step to the third step until the n third sub-network models meet the search termination condition.
And determining the target neural network model according to the n third sub-network models.
Wherein n is a positive integer greater than 1. n may be preset or may be input by the user. For example, the value of n may be obtained through the input module 820.
The search termination condition may be preset or determined based on input data of the user.
For example, the search termination condition may be that the number of repetitions is greater than or equal to the second number of iterations. The second number of iterations may be predetermined or may be user input.
For another example, the search termination condition may be that the precision of at least p of the n third subnetwork models meets the target precision. Wherein p is a positive integer and is less than or equal to n. p may be preset or may be input by the user. For example, the value of p may be obtained through the input module 820.
For another example, the search termination condition may be that the search duration reaches the target search duration.
For another example, the search termination condition may be that the hardware costs of at least q third subnet models out of the n third subnet models satisfy the target costs. Wherein q is a positive integer and is not more than n. q may be preset or may be input by the user. For example, the value of q may be obtained through the input module 820.
In the embodiments of the present application, "decimation" may also be understood as "sampling".
Further, in step one, determining n first subnetwork models according to the transition-learned super-network model may include:
selecting n fourth sub-network models from the super-network models after the migration learning;
acquiring the hardware overhead of the n fourth sub-network models on the target equipment;
and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain n first sub-network models.
The following illustrates a method for obtaining a target neural network model by searching a subnetwork model in a super network model after migration learning.
Illustratively, n sub-network models are extracted from the super-network model after the transfer learning, and their structures are adjusted according to the hardware overhead of the n sub-network models on the target device, so as to obtain n adjusted sub-network models. The n adjusted sub-network models are taken as an initial population, n new sub-network models are generated by crossover and mutation, n sub-network models are selected from the resulting 2n sub-network models to form a new population, and this process is iterated until a search termination condition is met. The n sub-network models finally obtained are the search result.
Wherein n is a positive integer greater than 1. n may be preset or may be input by the user. For example, the value of n may be obtained through the input module 820.
The search termination condition may be preset or determined based on an input of the user. For example, the search termination condition may be that the current iteration number reaches the second iteration number. The second number of iterations may be predetermined or may be user input.
For another example, the search termination condition may be that the precision of at least p of the n sub-network models obtained by the current iteration satisfies the target precision. Wherein p is a positive integer and is less than or equal to n. p may be preset or may be input by the user. For example, the value of p may be obtained through the input module 820.
As another example, the iteration termination condition may be that the search duration reaches the target search duration.
For another example, the iteration termination condition may be that the hardware cost of at least q sub-network models in the n sub-network models obtained by the current iteration satisfies the target cost. Wherein q is a positive integer and is not more than n. q may be preset or may be input by the user. For example, the value of q may be obtained through the input module 820.
It should be understood that the above search termination condition is merely an example, and the search termination condition may be set as needed, for example, the search termination condition may include the above two conditions.
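Putting steps one to three together, a minimal sketch of the evolutionary search could look as follows; sample_subnets, crossover_mutate, and fitness are hypothetical helpers (fitness could, for example, rank by inference accuracy within the hardware-overhead limit):

    import random

    def evolutionary_search(supernet, n, max_iters, sample_subnets,
                            crossover_mutate, fitness):
        population = sample_subnets(supernet, n)          # step one: initial population
        for _ in range(max_iters):                        # e.g. second-iteration-count termination
            offspring = [crossover_mutate(*random.sample(population, 2))
                         for _ in range(n)]               # step two: n new sub-networks
            population = sorted(population + offspring,   # step three: select n of the 2n models
                                key=fitness, reverse=True)[:n]
        return population                                 # the final n sub-network models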
Illustratively, extracting n sub-network models from the super-network model may include: and randomly extracting n sub-network models from the super-network model.
Illustratively, selecting n sub-network models from the 2n sub-network models to form a new population may include a variety of ways. The following illustrates a way of selecting n sub-network models from 2n sub-network models.
For example, selecting n sub-network models from the 2n sub-network models to form a new population may include: and randomly selecting n sub-network models from the 2n sub-network models to form a new population.
Or, selecting n sub-network models from the 2n sub-network models to form a new population may include: and testing the hardware expenditure of the 2n sub-network models, and selecting the n sub-network models to form a new population according to the limitation of the hardware expenditure.
Or, selecting n sub-network models from the 2n sub-network models to form a new population may include: and testing the reasoning precision of the 2n sub-network models, and selecting the n sub-network models with the highest reasoning precision to form a new population.
Or, selecting n sub-network models from the 2n sub-network models to form a new population may include: and testing the reasoning precision and hardware expense of the 2n sub-network models, and selecting the n sub-network models with the highest reasoning precision within the limit range of the hardware expense to form a new population.
Illustratively, adjusting the structures of the sub-network models according to the hardware overhead of the n sub-network models on the target device to obtain the n adjusted sub-network models includes: the structure of a sub-network model may be adjusted according to an adjustment probability, so that the adjusted sub-network model can meet the target overhead, where the adjustment probability is determined according to the hardware overhead of the sub-network model.
For example, for a sub-network model with large hardware overhead, the probability of adjusting the current sub-network model to a smaller sub-network model is greater than the probability of adjusting it to a larger sub-network model. For a sub-network model with small hardware overhead, the probability of adjusting the current sub-network model to a larger sub-network model is greater than the probability of adjusting it to a smaller sub-network model. Whether the hardware overhead is large or small may be determined relative to the target overhead. For example, a sub-network model whose hardware overhead exceeds the target overhead may be regarded as a sub-network model with large hardware overhead, and a sub-network model whose hardware overhead is below the target overhead may be regarded as a sub-network model with small hardware overhead. Alternatively, the size of the hardware overhead may also be determined relative to other references, which is not limited in this embodiment of the present application.
For example, the target device may include a GPU, NPU, or the like.
In this way, a heuristic search mode is adopted, the hardware cost of the sub-network model on the target equipment can be sensed, and the structure of the sub-network model is adjusted based on the hardware cost, so that the final sub-network model can meet the target cost.
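A sketch of this probability-biased adjustment follows, with hypothetical shrink and grow operators and illustrative probability values:

    import random

    def adjust_structure(subnet, target_cost, cost_fn, shrink, grow, p_bias=0.8):
        # Overhead above the target biases the adjustment toward a smaller model;
        # overhead below the target biases it toward a larger model.
        p_shrink = p_bias if cost_fn(subnet) > target_cost else 1.0 - p_bias
        return shrink(subnet) if random.random() < p_shrink else grow(subnet)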
Further, the target subnetwork model is fed back to the user. The target subnetwork model can be one subnetwork model or a plurality of subnetwork models. For example, the target subnetwork model comprises m subnetwork models. Wherein m is a positive integer and is not more than n. The n sub-network models in the search results include the m sub-network models. m may be preset or may be input by the user.
Specifically, the search result can be fed back to the user according to the user's needs. The following illustrates a specific form of feeding back the search results to the user.
Illustratively, the m sub-network models may be fed back to the user, who selects the desired sub-network model as the target neural network model. For example, the m sub-network models fed back to the user may be m sub-network models randomly selected from the search results. For another example, the m sub-network models fed back to the user may be m sub-network models with the highest precision in the search result, and further, the m sub-network models may be fed back to the user based on the precision ranking. For another example, the m sub-network models fed back to the user may be the m sub-network models with the smallest overhead in the search result, and further, the m sub-network models may be fed back to the user based on the ranking of the overhead. As another example, the m sub-network models fed back to the user may be the m sub-network models with the highest accuracy within the target cost range.
Further, the accuracy of the m sub-network models may be fed back to the user, who selects the desired sub-network model as the target neural network model.
Further, the costs of the m sub-network models may be fed back to the user, and the user may select a desired sub-network model as the target neural network model.
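For example, selecting the m sub-network models with the highest accuracy within the target overhead range could be sketched as follows, where eval_acc and cost_fn are hypothetical evaluation helpers:

    def select_feedback(search_results, m, eval_acc, cost_fn, target_cost=None):
        pool = [s for s in search_results                      # the n models from the search
                if target_cost is None or cost_fn(s) <= target_cost]
        return sorted(pool, key=eval_acc, reverse=True)[:m]    # top m by inference accuracy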
According to the scheme of the embodiment of the application, the pre-trained super-network model is migrated to the target data set, so that a super-network model with better performance can be obtained even if the target data set is small. This enables applications in small-data scenarios and greatly improves the precision of AutoML in such scenarios.
Meanwhile, for different user requirements, such as overhead/precision requirements, a neural network model that is adapted to the target data set and meets those requirements can be obtained by searching the sub-network models in the super-network model.
Meanwhile, because the weights of the super-network model are shared among different data sets, and the source data set and the target data set relate to the same task, efficient transfer learning of AutoML can be achieved: during transfer, only the weights of the super-network model are fine-tuned, and its structure does not need to be adjusted. This greatly improves the transfer efficiency of AutoML and reduces the time required for AutoML training by at least an order of magnitude, even approaching the training time of an ordinary neural network model.
In addition, the migration time of the super-network model provided in the embodiment of the application is close to that of an ordinary neural network model. That is to say, compared with obtaining a target neural network model through transfer learning of an ordinary neural network model, the method of the embodiment of the present application can, under the same training duration, better meet a user's fine-grained overhead/precision requirements, and, under the same overhead, obtain a target neural network model with higher precision.
In addition, for the same task, such as an image classification task, when a user needs multiple neural network models, there is no need to design and train a neural network model separately for each deployment scenario or user requirement. The super-network model only needs to be trained once; its weights are then shared among, or migrated to, different data sets to obtain neural network models that meet the user's different overhead/precision requirements, which greatly reduces the training cost.
In addition, the super-network model obtained through progressive shrinkage training can support a variety of architecture settings. After the super-network model training is completed, a suitable sub-network model can be selected from it without additional training; that is, the sub-network model does not need to be retrained, and its accuracy can still meet the requirement attained in pre-training.
The process of deploying multiple neural network models by the method of the embodiments of the present application is illustrated below by fig. 12. Fig. 12 is a schematic flow chart diagram illustrating a method of obtaining a neural network model according to an embodiment of the present application. The method of fig. 12 includes steps S1110 to S1140. The method of fig. 12 may be regarded as an embodiment of the method 900 in fig. 9, and specific implementation may refer to the foregoing method 900, and in order to avoid unnecessary repetition, a repeated description is appropriately omitted when the method of fig. 12 is introduced.
The method of fig. 12 is described as being divided into an off-line phase and an on-line phase.
An off-line stage:
s1110, pre-training the hyper-network model.
In particular, the search space, i.e. the hyper network model, is predefined. The hyper-network model is pre-trained based on the source dataset. The source data set may be a data set relating to the tasks that the target neural network model needs to perform.
For example, when the target neural network model is used for image classification, the source data set may include the source sample images and classification labels for the source sample images. As shown in fig. 12, the source data set may be the public data set ImageNet.
An online stage:
s1120, acquiring a pre-trained hyper-network model.
For example, a pre-trained hyper-network model may be loaded.
S1130, a target data set is obtained.
The target data set may be determined according to the tasks that the target neural network model needs to perform.
Illustratively, when the target neural network model is used for image classification, the target dataset may include the target sample image and a classification label for the target sample image.
Illustratively, the target data set may be a target data set input by a user.
S1140, the pre-trained hyper-network model is subjected to transfer learning based on the target data set.
It should be noted that only one target data set is shown in fig. 12, and the number of target data sets is not limited in the embodiment of the present application. If the user enters multiple target data sets, the pre-trained hyper-network model may be migrated to a different target data set.
For example, the plurality of target data sets includes bird data sets or vehicle data sets, and the like. Step S1140 includes performing transfer learning on the pre-trained super network model based on the bird data set to obtain a super network model 1 after the transfer learning, and performing transfer learning on the pre-trained super network model based on the vehicle data set to obtain a super network model 2 after the transfer learning.
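A sketch of this one-to-many transfer, assuming a hypothetical transfer_learn helper that runs the iteration loop of step S930 and a loader for the pre-trained super-network model:

    def migrate_all(load_pretrained_supernet, target_datasets, transfer_learn):
        migrated = {}
        for name, dataset in target_datasets.items():          # e.g. "birds", "vehicles"
            supernet = load_pretrained_supernet()              # fresh copy per target data set
            migrated[name] = transfer_learn(supernet, dataset) # step S1140
        return migrated                                        # super-network model 1, 2, ...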
S1150, searching is carried out in the super network model after the transfer learning, and a target neural network model is obtained.
Specifically, the hyper-network model after the transfer learning can be searched according to different user requirements, so that a target neural network model meeting different user requirements is obtained.
For example, as shown in fig. 12, different user requirements may include that the inference accuracy of the target neural network 1 reaches the target accuracy 1, and that the inference accuracy of the target neural network 2 reaches the target accuracy 2.
It should be noted that fig. 12 only uses two target accuracies as examples of two user requirements, and the embodiment of the present application does not limit the number of user requirements and the specific content of the user requirements.
Thus, for the same task, for example, an image classification task, in the case that a user needs a plurality of neural network models, the neural network models do not need to be designed and trained respectively for each deployment scenario or user requirement, and the super network models only need to be trained once. The pre-trained hyper-network model is migrated to a target data set, and searching is carried out according to different user requirements, so that a neural network model meeting different overhead/precision requirements of users is obtained, and the training cost is greatly reduced.
It should be understood that the above description is made only by taking the application of the target neural network model to image classification as an example. The method for acquiring the neural network model provided by the embodiment of the application can be applied to other tasks needing computer vision. Such as object detection, image segmentation, etc.
Illustratively, the target neural network model may also be applied to non-visual tasks. Such as natural language processing or speech recognition.
In different application scenarios, the source data set and the target data set may be determined according to the application scenario.
For example, when the target neural network model is applied to speech recognition, the source data set may include the source sample audio signal and the class labels corresponding to the source sample audio signal, and the target data set may include: and the target sample audio signal and the classification label corresponding to the target sample audio signal.
Fig. 13 shows a schematic flowchart of an image processing method 1200 provided in an embodiment of the present application, which may be executed by an apparatus or device capable of performing image processing, for example, the method may be executed by a terminal device, a computer, a server, or the like.
The target neural network model used in the image processing method 1200 in fig. 13 may be constructed by the method in fig. 9 or the method in fig. 12 described above. The method 1200 includes steps S1210 to S1220. Specific implementations of the method 1200 may refer to the method 900 described above, and in order to avoid unnecessary repetition, the following description of the method 1200 is appropriately omitted.
And S1210, acquiring an image to be processed.
The image to be processed may be an image captured by the terminal device (or other devices or apparatuses such as a computer and a server) through a camera, or the image to be processed may also be an image obtained from inside the terminal device (or other devices or apparatuses such as a computer and a server) (for example, an image stored in an album of the terminal device, or an image obtained by the terminal device from a cloud end), which is not limited in this embodiment of the application.
And S1220, processing the image to be processed by adopting the target neural network model to obtain a processing result of the image to be processed.
Wherein the target neural network model is obtained by searching the subnetwork model in the super network model. The super network model is obtained by carrying out transfer learning on the pre-trained super network model based on the target data set. The pre-trained hyper-network model is trained based on a source data set. The source data set and the target data set are both data sets that are relevant to the task of image processing.
Optionally, the target neural network model is adopted to classify the image to be processed, and a classification result is output.
The target neural network model is obtained by searching a subnetwork model in a super network model, and the super network model is obtained by carrying out transfer learning on a pre-trained super network model based on a target data set. The pre-trained hyper-network model is trained based on a source data set. The target data set comprises a target sample image and a classification label of the target sample image, and the source data set comprises a source sample image and a classification label of the source sample image.
The detailed steps for obtaining the neural network model can be found in the aforementioned method 900, and are not described herein.
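A minimal usage sketch of steps S1210 to S1220 for image classification, assuming a PyTorch-style target model; the preprocessing sizes and normalization here are illustrative assumptions, not values specified by the embodiment:

    import torch
    import torchvision.transforms as T
    from PIL import Image

    preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

    def classify(target_model, image_path):
        image = Image.open(image_path).convert("RGB")   # S1210: acquire the image to be processed
        x = preprocess(image).unsqueeze(0)              # add a batch dimension
        target_model.eval()
        with torch.no_grad():
            logits = target_model(x)                    # S1220: process with the target neural network model
        return int(logits.argmax(dim=-1))               # predicted classification label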
It should be understood that the above description is made only by taking the application of the target neural network model to image classification as an example. The method for acquiring the neural network model provided by the embodiment of the application can be applied to other tasks needing computer vision. Such as object detection, image segmentation, etc.
Illustratively, the target neural network model may also be applied to non-visual tasks. Such as natural language processing or speech recognition.
In different application scenarios, the source data set and the target data set may be determined according to the application scenario.
For example, when the target neural network model is applied to speech recognition, the source data set may include the source sample audio signal and the class labels corresponding to the source sample audio signal, and the target data set may include: and the target sample audio signal and the classification label corresponding to the target sample audio signal.
The device embodiment of the present application will be described in detail below with reference to fig. 14 to 17. It should be understood that the apparatus in the embodiment of the present application may perform the method in the embodiment of the present application, that is, the following specific working processes of various products, and reference may be made to the corresponding processes in the embodiment of the foregoing method.
Fig. 14 is a schematic block diagram of an apparatus 1300 for obtaining a neural network model provided in an embodiment of the present application. It is understood that the apparatus 1300 may perform the method of acquiring a neural network model of fig. 9 or 12. For example, the apparatus 1300 may be the training device 120 in fig. 1, or the performing device 310 in fig. 6, or the system 800 in fig. 8. The apparatus 1300 includes: an acquisition unit 1310 and a processing unit 1320.
The obtaining unit 1310 is configured to obtain a pre-trained super network model, where the pre-trained super network model is obtained by training based on a source data set; acquiring a target data set, wherein a task corresponding to the target data set is the same as a task corresponding to the source data set; the processing unit 1320 is configured to migrate the pre-trained hyper-network model based on the target data set to obtain a migrated and learned hyper-network model; and searching a sub-network model in the super-network model after the transfer learning to obtain a target neural network model.
Optionally, as an embodiment, the pre-trained hyper-network model is obtained by training through a progressive shrinkage method.
Optionally, as an embodiment, the processing unit 1320 is specifically configured to: selecting a sub-network model from the pre-trained super-network model, calculating the weight gradient of the sub-network model based on the target data set, updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model, and obtaining the updated super-network model based on the updated sub-network model; repeating the steps until the updated hyper-network model meets a termination condition to obtain the hyper-network model after the transfer learning; wherein the termination condition comprises at least one of: the number of repetitions is greater than or equal to the first number of iterations; and the inference precision of the updated hyper-network model is greater than or equal to the first inference precision.
Optionally, as an embodiment, the processing unit 1320 is specifically configured to: select N_b sub-network models from the pre-trained super-network model, calculate the weight gradients of the N_b sub-network models based on the target data set, update the weights of the N_b sub-network models based on the weight gradients of the N_b sub-network models to obtain N_b updated sub-network models, and obtain an updated super-network model based on the N_b updated sub-network models, where N_b is a positive integer; repeat the above steps until the updated super-network model meets a termination condition, to obtain the super-network model after the transfer learning, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to the first number of iterations; and the inference precision of the updated super-network model is greater than or equal to the first inference precision.
Optionally, as an embodiment, the processing unit 1320 is specifically configured to: the method comprises the following steps: determining n first sub-network models according to the hyper-network model after the transfer learning, wherein n is an integer larger than 1; step two: adjusting the structures of the n first sub-network models to obtain n second sub-network models; step three: selecting n third sub-network models from the n first sub-network models and the n second sub-network models, the n third sub-network models being the n first sub-network models in step two; repeating the second step to the third step until the n third sub-network models meet a search termination condition, wherein the search termination condition includes at least one of the following: the repetition times are larger than or equal to the second iteration times, or the inference precision of at least p third sub-network models in the n third sub-network models is larger than or equal to the target precision; determining a target neural network model from the n third sub-network models.
Optionally, as an embodiment, the processing unit 1320 is specifically configured to: selecting n fourth sub-network models from the super-network models after the migration learning; acquiring hardware overhead of the n fourth sub-network models on target equipment; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
Fig. 15 is a schematic block diagram of an image processing apparatus 1400 provided in an embodiment of the present application. It should be understood that the apparatus 1400 may perform the image processing method of fig. 13. For example, the apparatus 1400 may be the execution device 110 in fig. 1, or the local device 301 or the execution device 310 in fig. 6. The apparatus 1400 comprises: an acquisition unit 1410 and a processing unit 1420.
The acquiring unit 1410 is configured to acquire an image to be processed; the processing unit 1420 is configured to perform image processing on the image to be processed by using a target neural network model, and output a processing result; the target neural network model is obtained by searching a subnetwork model in a super network model, the super network model is obtained by performing transfer learning on a pre-trained super network model based on a target data set, the pre-trained super network model is obtained by training based on a source data set, and a task corresponding to the target data set is the same as a task corresponding to the source data set.
Optionally, as an embodiment, the pre-trained hyper-network model is obtained by training through a progressive shrinkage method.
Optionally, as an embodiment, the super-network model is obtained by performing transfer learning on a pre-trained super-network model based on a target data set, and includes: the super network model is obtained by selecting a sub network model from the pre-trained super network model, calculating the weight gradient of the sub network model based on the target data set, updating the weight of the sub network model based on the weight gradient of the sub network model to obtain an updated sub network model, and obtaining the updated super network model based on the updated sub network model; repeating the steps until the updated hyper-network model meets the termination condition; wherein the termination condition comprises at least one of: the number of repetitions is greater than or equal to the first number of iterations; and the inference precision of the updated hyper-network model is greater than or equal to the first inference precision.
Optionally, as an embodiment, the super-network model is obtained by performing transfer learning on a pre-trained super-network model based on a target data set, and includes: the super-network model is obtained by selecting N_b sub-network models from the pre-trained super-network model, calculating the weight gradients of the N_b sub-network models based on the target data set, updating the weights of the N_b sub-network models based on the weight gradients of the N_b sub-network models to obtain N_b updated sub-network models, and obtaining an updated super-network model based on the N_b updated sub-network models, where N_b is a positive integer; and repeating the steps until the updated super-network model meets a termination condition, where the termination condition includes at least one of the following: the number of repetitions is greater than or equal to the first number of iterations; and the inference precision of the updated super-network model is greater than or equal to the first inference precision.
Optionally, as an embodiment, the target neural network model is obtained by searching a subnetwork model in a super network model, and includes: the target neural network model is obtained by determining n first sub-network models according to the hyper-network model, wherein n is an integer greater than 1; adjusting the structures of the n first sub-network models to obtain n second sub-network models; selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and updating the n third sub-network models to the n first sub-network models; repeating the steps until the n third sub-network models meet the search termination condition; determined from the n third sub-network models; wherein the search termination condition comprises at least one of: the repetition times are larger than or equal to the second iteration times, or the inference precision of at least p third sub-network models in the n third sub-network models is larger than or equal to the target precision.
Optionally, as an embodiment, the determining n first sub-network models according to the super-network model includes: selecting n fourth sub-network models among the super-network models; acquiring hardware overhead of the n fourth sub-network models on target equipment; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
It should be noted that the apparatus 1300 and the apparatus 1400 are embodied in the form of functional units. The term "unit" herein may be implemented in software and/or hardware, and is not particularly limited thereto.
For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implement the above-described functions. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
Accordingly, the units of the respective examples described in the embodiments of the present application can be realized in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 16 is a hardware structural diagram of an apparatus for acquiring a neural network model according to an embodiment of the present application. An apparatus 3000 for acquiring a neural network model shown in fig. 16 (the apparatus 3000 may be a computer device) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to each other via a bus 3004. For example, the apparatus 3000 may be the training device 120 of fig. 1, or the performing device 310 of fig. 6, or the system 800 of fig. 8.
The memory 3001 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 3001 may store a program, and the processor 3002 is configured to perform the steps of the method of acquiring a neural network model according to the embodiment of the present application when the program stored in the memory 3001 is executed by the processor 3002. Exemplarily, the processor 3002 may perform steps S920 to S940 in the method illustrated in fig. 9 or steps S1120 to S1150 in the method illustrated in fig. 12 above.
The processor 3002 may be a general Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the method for acquiring a neural network model according to the embodiment of the present disclosure.
The processor 3002 may also be an integrated circuit chip having signal processing capabilities, such as the chip shown in FIG. 5. In implementation, the steps of the method for obtaining a neural network model of the present application may be implemented by integrated logic circuits of hardware in the processor 3002 or by instructions in the form of software.
The processor 3002 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001, and completes the functions required to be executed by the units included in the apparatus for acquiring a neural network model according to the embodiment of the present application, or executes the method for acquiring a neural network model according to the embodiment of the present application, in combination with the hardware thereof.
The communication interface 3003 enables communication between the apparatus 3000 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver. For example, a pre-trained hyper-network model or a target data set, etc. may be obtained via the communication interface 3003.
The bus 3004 may include a pathway to transfer information between various components of the apparatus 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 17 is a schematic diagram of a hardware configuration of an image processing apparatus according to an embodiment of the present application. An image processing apparatus 4000 shown in fig. 17 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002 and the communication interface 4003 are communicatively connected to each other via a bus 4004. For example, the apparatus 4000 may be the execution device 110 in fig. 1, or the local device 301 or the execution device 310 in fig. 6.
Memory 4001 may be a ROM, a static storage device, and a RAM. The memory 4001 may store a program, and the processor 4002 and the communication interface 4003 are used to execute the steps of the image processing method according to the embodiment of the present application when the program stored in the memory 4001 is executed by the processor 4002. Specifically, the processor 4002 may perform steps S1210 to S1220 in the method illustrated in fig. 13 above.
The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute a relevant program to implement the functions required to be executed by the units in the image processing apparatus according to the embodiment of the present application, or to execute the image processing method according to the embodiment of the method of the present application.
The processor 4002 may also be an integrated circuit chip having signal processing capabilities, such as the chip shown in fig. 5. In implementation, the steps of the image processing method according to the embodiment of the present application may be implemented by an integrated logic circuit of hardware in the processor 4002 or by instructions in the form of software.
The processor 4002 may also be a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The memory medium is located in the memory 4001, and the processor 4002 reads information in the memory 4001, and completes functions required to be executed by units included in the image processing apparatus of the embodiment of the present application in combination with hardware thereof, or executes the image processing method of the embodiment of the method of the present application.
Communication interface 4003 enables communication between apparatus 4000 and other devices or a communication network using transceiver means such as, but not limited to, a transceiver. For example, the image to be processed may be acquired through the communication interface 4003.
Bus 4004 may include a pathway to transfer information between various components of apparatus 4000 (e.g., memory 4001, processor 4002, communication interface 4003).
It should be noted that although the above-described apparatus 3000 and apparatus 4000 only show memories, processors, and communication interfaces, in particular implementations, those skilled in the art will appreciate that the apparatus 3000 and apparatus 4000 may also include other devices necessary for normal operation. Also, those skilled in the art will appreciate that the apparatus 3000, 4000 may also include hardware components for performing other additional functions, according to particular needs. Furthermore, it should be understood by those skilled in the art that the apparatus 3000, 4000 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in fig. 16 and 17.
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of Random Access Memory (RAM) are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

1. A method of obtaining a neural network model, comprising:
acquiring a pre-trained super network model, wherein the pre-trained super network model is obtained based on source data set training;
acquiring a target data set, wherein a task corresponding to the target data set is the same as a task corresponding to the source data set;
performing transfer learning on the pre-trained hyper-network model based on the target data set to obtain a transfer-learned hyper-network model;
and searching a sub-network model in the super-network model after the transfer learning to obtain a target neural network model.
2. The method of claim 1, wherein the pre-trained hyper-network model is trained by progressive shrinkage.
3. The method according to claim 1 or 2, wherein the performing transfer learning on the pre-trained hyper-network model based on the target data set to obtain a transfer-learned hyper-network model comprises:
selecting a sub-network model from the pre-trained hyper-network model, and calculating a weight gradient of the sub-network model based on the target data set;
updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model;
obtaining an updated hyper-network model based on the updated sub-network model;
repeating the steps until the updated hyper-network model meets a termination condition to obtain the migrated hyper-network model;
wherein the termination condition comprises at least one of: the repetition times are larger than or equal to the first iteration times, and the inference precision of the updated hyper-network model is larger than or equal to the first inference precision.
4. The method according to claim 1 or 2, wherein the performing transfer learning on the pre-trained hyper-network model based on the target data set to obtain a transfer-learned hyper-network model comprises:
selecting N_b sub-network models from the pre-trained hyper-network model, and calculating weight gradients of the N_b sub-network models based on the target data set;
updating the weights of the N_b sub-network models based on the weight gradients of the N_b sub-network models to obtain N_b updated sub-network models;
obtaining an updated hyper-network model based on the N_b updated sub-network models, wherein N_b is a positive integer;
repeating the steps until the updated hyper-network model meets a termination condition to obtain the migrated hyper-network model;
wherein the termination condition comprises at least one of: the repetition times are larger than or equal to the first iteration times, and the inference precision of the updated hyper-network model is larger than or equal to the first inference precision.
5. The method according to any one of claims 1 to 4, wherein searching the sub-network model in the super network model after the migration learning to obtain a target neural network model comprises:
the method comprises the following steps: determining n first sub-network models according to the hyper-network model after the transfer learning, wherein n is an integer larger than 1;
step two: adjusting the structures of the n first sub-network models to obtain n second sub-network models;
step three: selecting n third sub-network models from the n first sub-network models and the n second sub-network models, the n third sub-network models being the n first sub-network models in step two;
repeating the second step to the third step until the n third sub-network models meet a search termination condition, wherein the search termination condition includes at least one of the following: the repetition times are larger than or equal to the second iteration times, or the inference precision of at least p third sub-network models in the n third sub-network models is larger than or equal to the target precision;
determining a target neural network model from the n third sub-network models.
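(The step-one-to-three loop of claim 5 reads as an evolutionary search over architectures: mutate n candidates, then keep the best n of parents and children. A hedged sketch follows, with `mutate` and `accuracy` supplied as assumed callbacks, since the claim fixes neither a mutation operator nor an evaluator.)

```python
# Hedged sketch of the claim-5 search loop; mutate/accuracy are assumed callbacks.
import copy

def search_subnetworks(supernet, n, p, target_acc, max_rounds, mutate, accuracy):
    """Keep n candidate architectures, mutate them into n more, and retain
    the best n of parents and children each round."""
    first = [supernet.random_config() for _ in range(n)]        # n first models
    for _ in range(max_rounds):                                 # iteration budget
        second = [mutate(copy.deepcopy(cfg)) for cfg in first]  # n second models
        ranked = sorted(first + second, key=accuracy, reverse=True)
        first = ranked[:n]                                      # n third models
        if sum(accuracy(cfg) >= target_acc for cfg in first) >= p:
            break                                               # precision condition met
    return first[0]                                             # best candidate = target model
```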
6. The method of claim 5, wherein the determining n first sub-network models from the transfer-learned hyper-network model comprises:
selecting n fourth sub-network models from the transfer-learned hyper-network model;
acquiring the hardware overhead of the n fourth sub-network models on a target device;
and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
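(Claim 6 grounds the initial population in measured hardware cost. One plausible reading is sketched below, where each candidate is repeatedly shrunk until it meets a latency budget on the target device; `measure_latency` and `shrink` are assumed callbacks, and the millisecond budget is illustrative.)

```python
# Hedged sketch of claim-6-style hardware-aware adjustment; callbacks are assumptions.
def adjust_for_hardware(supernet, configs, budget_ms, measure_latency, shrink):
    """Measure each candidate on the target device and shrink it
    until it meets the hardware budget."""
    adjusted = []
    for cfg in configs:                              # the n fourth sub-network models
        for _ in range(16):                          # bounded shrink attempts
            if measure_latency(supernet.sample(cfg)) <= budget_ms:
                break
            cfg = shrink(cfg)                        # e.g. reduce width or depth one notch
        adjusted.append(cfg)
    return adjusted                                  # the n first sub-network models
```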
7. An image processing method, comprising:
acquiring an image to be processed;
performing image processing on the image to be processed by using a target neural network model, and outputting a processing result;
wherein the target neural network model is obtained by searching for a sub-network model in a super network model, the super network model is obtained by performing transfer learning on a pre-trained super network model based on a target data set, the pre-trained super network model is obtained by training based on a source data set, and a task corresponding to the target data set is the same as a task corresponding to the source data set.
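(As a usage illustration of claim 7, the deployed target neural network model is applied to a single image to be processed. The 224x224 input size and the classification-style output are assumptions; the claim does not fix the task details.)

```python
# Hedged usage sketch; input resolution and output interpretation are assumptions.
import torch
from PIL import Image
from torchvision import transforms

def process_image(target_model, image_path):
    """Run the searched target neural network model on one image
    and return the processing result."""
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),    # assumed input resolution
        transforms.ToTensor(),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    target_model.eval()
    with torch.no_grad():
        return target_model(x)            # the processing result, e.g. logits
```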
8. The method of claim 7, wherein the pre-trained hyper-network model is trained by progressive shrinkage.
9. The method of claim 7 or 8, wherein the obtaining of the super network model by performing transfer learning on the pre-trained super network model based on the target data set comprises:
selecting a sub-network model from the pre-trained super network model, calculating a weight gradient of the sub-network model based on the target data set, updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model, and obtaining an updated super network model based on the updated sub-network model; and repeating the above steps until the updated super network model meets a termination condition;
wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated super network model is greater than or equal to a first inference precision.
10. The method of claim 7 or 8, wherein the obtaining of the super network model by performing transfer learning on the pre-trained super network model based on the target data set comprises:
selecting Nb sub-network models from the pre-trained super network model, wherein Nb is a positive integer; calculating weight gradients of the Nb sub-network models based on the target data set; updating the weights of the Nb sub-network models based on the weight gradients of the Nb sub-network models to obtain Nb updated sub-network models; obtaining an updated super network model based on the Nb updated sub-network models; and repeating the above steps until the updated super network model meets a termination condition, wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated super network model is greater than or equal to a first inference precision.
11. The method of any one of claims 7 to 10, wherein the obtaining of the target neural network model by searching for a sub-network model in the super network model comprises:
determining n first sub-network models from the super network model, wherein n is an integer greater than 1; adjusting the structures of the n first sub-network models to obtain n second sub-network models; selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and taking the n third sub-network models as the new n first sub-network models; repeating the above steps until the n third sub-network models meet a search termination condition; and determining the target neural network model from the n third sub-network models;
wherein the search termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a second number of iterations, or the inference precision of at least p of the n third sub-network models is greater than or equal to a target precision.
12. The method of claim 11, wherein the determining n first sub-network models from the super network model comprises: selecting n fourth sub-network models from the super network model; acquiring the hardware overhead of the n fourth sub-network models on a target device; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
13. An apparatus for obtaining a neural network model, comprising:
an acquisition unit configured to:
acquiring a pre-trained hyper-network model, wherein the pre-trained hyper-network model is obtained by training based on a source data set;
acquiring a target data set, wherein a task corresponding to the target data set is the same as a task corresponding to the source data set;
a processing unit to:
performing transfer learning on the pre-trained hyper-network model based on the target data set to obtain a transfer-learned hyper-network model;
and searching for a sub-network model in the transfer-learned hyper-network model to obtain a target neural network model.
14. The apparatus of claim 13, wherein the pre-trained hyper-network model is trained by progressive shrinkage.
15. The apparatus according to claim 13 or 14, wherein the processing unit is specifically configured to:
selecting a sub-network model from the pre-trained hyper-network model, and calculating a weight gradient of the sub-network model based on the target data set;
updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model;
obtaining an updated hyper-network model based on the updated sub-network model;
repeating the above steps until the updated hyper-network model meets a termination condition, to obtain the transfer-learned hyper-network model;
wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated hyper-network model is greater than or equal to a first inference precision.
16. The apparatus according to claim 13 or 14, wherein the processing unit is specifically configured to:
selecting Nb sub-network models from the pre-trained hyper-network model, and calculating weight gradients of the Nb sub-network models based on the target data set, wherein Nb is a positive integer;
updating the weights of the Nb sub-network models based on the weight gradients of the Nb sub-network models to obtain Nb updated sub-network models;
obtaining an updated hyper-network model based on the Nb updated sub-network models;
repeating the above steps until the updated hyper-network model meets a termination condition, to obtain the transfer-learned hyper-network model;
wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated hyper-network model is greater than or equal to a first inference precision.
17. The apparatus according to any one of claims 13 to 16, wherein the processing unit is specifically configured to:
step one: determining n first sub-network models from the transfer-learned hyper-network model, wherein n is an integer greater than 1;
step two: adjusting the structures of the n first sub-network models to obtain n second sub-network models;
step three: selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and taking the n third sub-network models as the n first sub-network models for the next execution of step two;
repeating step two to step three until the n third sub-network models meet a search termination condition, wherein the search termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a second number of iterations, or the inference precision of at least p of the n third sub-network models is greater than or equal to a target precision;
determining a target neural network model from the n third sub-network models.
18. The apparatus according to claim 17, wherein the processing unit is specifically configured to:
selecting n fourth sub-network models from the transfer-learned hyper-network model;
acquiring the hardware overhead of the n fourth sub-network models on a target device;
and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
19. An image processing apparatus characterized by comprising:
the acquisition unit is used for acquiring an image to be processed;
a processing unit to:
performing image processing on the image to be processed by using a target neural network model, and outputting a processing result;
wherein the target neural network model is obtained by searching for a sub-network model in a super network model, the super network model is obtained by performing transfer learning on a pre-trained super network model based on a target data set, the pre-trained super network model is obtained by training based on a source data set, and a task corresponding to the target data set is the same as a task corresponding to the source data set.
20. The apparatus of claim 19, wherein the pre-trained hyper-network model is trained by progressive shrinkage.
21. The apparatus of claim 19 or 20, wherein the obtaining of the super network model by performing transfer learning on the pre-trained super network model based on the target data set comprises:
selecting a sub-network model from the pre-trained super network model, calculating a weight gradient of the sub-network model based on the target data set, updating the weight of the sub-network model based on the weight gradient of the sub-network model to obtain an updated sub-network model, and obtaining an updated super network model based on the updated sub-network model; and repeating the above steps until the updated super network model meets a termination condition;
wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated super network model is greater than or equal to a first inference precision.
22. The apparatus of claim 19 or 20, wherein the obtaining of the super network model by performing transfer learning on the pre-trained super network model based on the target data set comprises:
selecting Nb sub-network models from the pre-trained super network model, wherein Nb is a positive integer; calculating weight gradients of the Nb sub-network models based on the target data set; updating the weights of the Nb sub-network models based on the weight gradients of the Nb sub-network models to obtain Nb updated sub-network models; obtaining an updated super network model based on the Nb updated sub-network models; and repeating the above steps until the updated super network model meets a termination condition, wherein the termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a first number of iterations, or the inference precision of the updated super network model is greater than or equal to a first inference precision.
23. The apparatus of any one of claims 19 to 22, wherein the obtaining of the target neural network model by searching for a sub-network model in the super network model comprises:
determining n first sub-network models from the super network model, wherein n is an integer greater than 1; adjusting the structures of the n first sub-network models to obtain n second sub-network models; selecting n third sub-network models from the n first sub-network models and the n second sub-network models, and taking the n third sub-network models as the new n first sub-network models; repeating the above steps until the n third sub-network models meet a search termination condition; and determining the target neural network model from the n third sub-network models;
wherein the search termination condition comprises at least one of the following: the number of repetitions is greater than or equal to a second number of iterations, or the inference precision of at least p of the n third sub-network models is greater than or equal to a target precision.
24. The apparatus of claim 23, wherein the determining n first sub-network models from the super network model comprises: selecting n fourth sub-network models from the super network model;
acquiring the hardware overhead of the n fourth sub-network models on a target device; and adjusting the structures of the n fourth sub-network models based on the hardware overhead to obtain the n first sub-network models.
25. An apparatus for obtaining a neural network model, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method of any one of claims 1 to 6.
26. An image processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method of any one of claims 7 to 12.
27. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 6 or 7 to 12.
28. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory, to perform the method of any one of claims 1 to 6 or 7 to 12.
CN202010357935.0A 2020-04-29 2020-04-29 Method for obtaining neural network model, image processing method and device Pending CN113570029A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010357935.0A CN113570029A (en) 2020-04-29 2020-04-29 Method for obtaining neural network model, image processing method and device
PCT/CN2021/083371 WO2021218517A1 (en) 2020-04-29 2021-03-26 Method for acquiring neural network model, and image processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010357935.0A CN113570029A (en) 2020-04-29 2020-04-29 Method for obtaining neural network model, image processing method and device

Publications (1)

Publication Number Publication Date
CN113570029A (en) 2021-10-29

Family

ID=78158592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010357935.0A Pending CN113570029A (en) 2020-04-29 2020-04-29 Method for obtaining neural network model, image processing method and device

Country Status (2)

Country Link
CN (1) CN113570029A (en)
WO (1) WO2021218517A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494815B (en) * 2022-01-27 2024-04-09 北京百度网讯科技有限公司 Neural network training method, target detection method, device, equipment and medium
CN114239754B (en) * 2022-02-24 2022-05-03 中国科学院自动化研究所 Pedestrian attribute identification method and system based on attribute feature learning decoupling
WO2024031986A1 (en) * 2022-08-12 2024-02-15 华为云计算技术有限公司 Model management method and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3583553A1 (en) * 2017-07-21 2019-12-25 Google LLC Neural architecture search for convolutional neural networks
CN110443352B (en) * 2019-07-12 2023-12-15 创新先进技术有限公司 Semi-automatic neural network optimization method based on transfer learning
CN110533180A (en) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司 Network structure searching method and device, readable storage medium storing program for executing, electronic equipment
CN110428046B (en) * 2019-08-28 2023-12-15 腾讯科技(深圳)有限公司 Method and device for acquiring neural network structure and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023142886A1 (en) * 2022-01-28 2023-08-03 华为技术有限公司 Expression transfer method, model training method, and device
CN115017377A (en) * 2022-08-05 2022-09-06 深圳比特微电子科技有限公司 Method, device and computing equipment for searching target model
CN115017377B (en) * 2022-08-05 2022-11-08 深圳比特微电子科技有限公司 Method, device and computing equipment for searching target model
CN115238880A (en) * 2022-09-21 2022-10-25 山东大学 Self-adaptive deployment method, system, equipment and storage medium of power transmission inspection terminal
CN115238880B (en) * 2022-09-21 2022-12-23 山东大学 Self-adaptive deployment method, system, equipment and storage medium of power transmission inspection terminal

Also Published As

Publication number Publication date
WO2021218517A1 (en) 2021-11-04

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
WO2022083536A1 (en) Neural network construction method and apparatus
EP4064130A1 (en) Neural network model update method, and image processing method and device
CN111507378A (en) Method and apparatus for training image processing model
CN112445823A (en) Searching method of neural network structure, image processing method and device
CN111797983A (en) Neural network construction method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN112183718A (en) Deep learning training method and device for computing equipment
CN113705769A (en) Neural network training method and device
CN111783937A (en) Neural network construction method and system
CN111882031A (en) Neural network distillation method and device
WO2022007867A1 (en) Method and device for constructing neural network
CN112215332A (en) Searching method of neural network structure, image processing method and device
CN111368972A (en) Convolution layer quantization method and device thereof
CN111340190A (en) Method and device for constructing network structure, and image generation method and device
CN111612215A (en) Method for training time sequence prediction model, time sequence prediction method and device
CN113592060A (en) Neural network optimization method and device
WO2023280113A1 (en) Data processing method, training method for neural network model, and apparatus
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
CN111428854A (en) Structure searching method and structure searching device
CN111797970A (en) Method and apparatus for training neural network
CN114492723A (en) Neural network model training method, image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination