CN112445823A - Searching method of neural network structure, image processing method and device - Google Patents
- Publication number: CN112445823A
- Application number: CN201910834158.1A
- Authority: CN (China)
- Prior art keywords: network, sub, structures, target, search
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/2455: Query execution (information retrieval; query processing)
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06N3/08: Neural networks; learning methods
Abstract
The application provides a neural network structure search method, an image processing method, and an image processing apparatus in the field of computer vision within the field of artificial intelligence. The neural network structure search method includes the following steps: determining a search network according to a target task, where the search network includes a structure space and a parameter space; updating the parameter space according to training data of the target task to obtain an updated parameter space; determining a sub-network set from the search network; updating the sub-network set to obtain an updated sub-network set; and determining a plurality of target neural networks corresponding to the target task according to the updated parameter space and the updated sub-network set. With this technical solution, a plurality of target neural network models can be obtained through a single search, so that a user can select a suitable model according to the resource constraints of the application scenario.
Description
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a neural network structure search method, an image processing method, and related apparatuses.
Background
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
With the rapid development of artificial intelligence technology, neural networks (for example, deep neural networks) have achieved great success in recent years in processing and analyzing various media signals such as images, videos, and speech. A well-performing neural network usually has an elaborate structure, which requires highly skilled and experienced human experts to spend considerable effort designing. To construct neural networks more effectively, neural architecture search (NAS) methods have been proposed, which automatically search for a neural network architecture and thereby obtain an architecture with excellent performance.
In a conventional neural network structure search method, a search space covering, in principle, all representable network architectures must first be defined; candidate network structures are then found in the search space by a search strategy and evaluated, the next round of search is performed according to the feedback, and after multiple rounds of search only one sub-network structure is finally retained. However, resource constraints differ across application scenarios; the single sub-network structure obtained by the conventional method can only be applied to the scenario it matches and cannot meet differentiated requirements.
Disclosure of Invention
The present application provides a neural network structure search method, an image processing method, and an image processing apparatus, which can obtain a plurality of target neural network models through a single search, so that a user can select a suitable model according to the resource constraints of the application scenario.
In a first aspect, a neural network structure search method is provided, the method including: determining a search network according to a target task, where the search network includes a structure space and a parameter space, the structure space includes a plurality of sub-network structures, the parameter space includes a plurality of parameters, and each sub-network structure corresponds to at least one parameter in the parameter space; updating the parameter space according to training data of the target task to obtain an updated parameter space; determining a sub-network set from the search network, where the sub-network set includes a plurality of first sub-network structures among the plurality of sub-network structures; updating the sub-network set to obtain an updated sub-network set, where the updated sub-network set includes a plurality of second sub-network structures among the plurality of sub-network structures; and determining a plurality of target neural networks corresponding to the target task according to the updated parameter space and the updated sub-network set.
The search network is determined according to the application requirements of the target neural network corresponding to the target task.
That is, the search network may be determined according to application requirements of the target neural network. Specifically, the search network may be determined according to a data type of data that the target neural network needs to process.
The search network includes a variety of connections and operations. Generally, the kinds and number of operations included in the search network match the application requirements of the target neural network: when the target neural network is used to process image data, the search network includes kinds and numbers of operations suited to processing image data; when the target neural network is used to process voice data, it includes kinds and numbers of operations suited to processing voice data.
For example, when the target neural network is a neural network for processing image data, the search network may include a long short-term memory (LSTM) unit or an LSTM network, and the like; specifically, the search network may include a convolution operation, a pooling operation, a skip-connection operation, an activation function, and the like.
For another example, when the target neural network is a neural network for processing voice data, the search network may include a long short-term memory (LSTM) unit or an LSTM network, and the like; specifically, the search network may include an activation function (e.g., ReLU or Tanh), and the like.
The search space is determined according to the application requirements of the target neural network and the video memory resources of the device that performs the neural network structure search.
The video memory resource condition of the device that performs the neural network structure search may refer to the size of that device's video memory.
That is, in the present application, the search space may be determined by jointly considering the application requirements of the target neural network and the video memory resources of the device that performs the neural network structure search.
Specifically, the types and number of operations included in the search space may first be determined according to the application requirements of the target neural network, and then adjusted according to the video memory resources of the device that performs the neural network structure search, so as to determine the types and number of operations finally included in the search space.
For example, after the types and number of operations in the search space have been determined according to the application requirements of the target neural network, if the video memory resources of the device performing the neural network structure search are limited, some less important operations may be deleted from the search space; if the video memory resources are sufficient, the types and number of operations in the search space may be kept unchanged or even increased.
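As a concrete illustration of this adjustment, the following minimal Python sketch assembles a candidate operation set for a task and prunes it against a video memory budget. The operation names and the memory-estimation helper are illustrative assumptions, not part of this application.

```python
# Illustrative sketch only: the operation names and the estimate_memory()
# helper are hypothetical, not taken from this application.

IMAGE_OPS = ["conv_3x3", "sep_conv_3x3", "dil_conv_5x5", "max_pool_3x3", "skip_connect"]
VOICE_OPS = ["lstm_cell", "relu", "tanh", "skip_connect"]

def build_search_space(task_type, memory_budget, estimate_memory):
    """Pick candidate operations for the task, then drop less important
    operations until the estimated video memory use fits the budget."""
    ops = list(IMAGE_OPS if task_type == "image" else VOICE_OPS)
    # Assume the lists are ordered from most to least important.
    while len(ops) > 1 and estimate_memory(ops) > memory_budget:
        ops.pop()  # delete a less important operation from the search space
    return ops
```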
The search network in the embodiment of the present application may include basic modules for constructing a neural network, such as convolution blocks, as well as preset basic operations or combinations of basic operations adapted to the application requirements of the target neural network; these basic operations and combinations of basic operations are collectively referred to as basic operations. The size, depth, width, connection mode, and so on of the search network can be designed manually.
The search network comprises a structure space comprising a plurality of sub-network structures and a parameter space comprising a plurality of parameters, wherein each sub-network structure corresponds to at least one parameter in the parameter space.
The structure space in the embodiment of the present application includes all network architectures that can be characterized in principle; it can also be understood as the complete set of network architectures, where each possible network architecture may be referred to as a sub-network structure. A sub-network structure includes all or some of the basic modules and all or some of the basic operations in the search network; different sub-network structures include different types and numbers of basic modules and basic operations, and the parameters of the basic modules and basic operations are shared among different sub-network structures. That is, in the embodiment of the present application, the parameter space includes the parameters of the basic modules and basic operations in the search network, for example, the parameters of a separable convolution with a 3x3 convolution kernel, the parameters of a dilated convolution with a 5x5 convolution kernel and a dilation rate of 2, the parameters of a skip connection, and so on. Different sub-network structures may include different types and numbers of basic modules and basic operations and may have different connection modes, but they share the parameters in the parameter space; for example, if different sub-network structures include the same basic operation, they share the parameters in the parameter space that correspond to that basic operation. Each basic module may correspond to one or more parameters, and each basic operation may correspond to one or more parameters; each sub-network structure may include one or more basic modules and one or more basic operations, and therefore each sub-network structure in the structure space in the embodiment of the present application corresponds to at least one parameter in the parameter space, for example, to one or more parameters in the parameter space.
To better understand the structure space and the sub-network structures of the embodiment of the present application, consider an example: assume that the search network has 22 layers and each layer includes 5 basic operations (denoted by the numbers 1 to 5). A 22-dimensional vector can then be used to represent a sub-network structure, for example (1, 2, 4, 5, 3, 5, 1, ..., 3, 2); different vectors represent different sub-network structures, and all sub-network structures that can be represented by such 22-dimensional vectors form the structure space of the embodiment of the present application.
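The vector encoding in this example can be sketched in Python as follows; this is only an illustration of the representation, not code from this application.

```python
import random

NUM_LAYERS = 22      # depth of the search network in the example above
NUM_BASIC_OPS = 5    # basic operations per layer, numbered 1 to 5

def sample_sub_network_structure():
    """A sub-network structure is a 22-dimensional vector whose i-th entry
    indicates which of the 5 basic operations layer i uses."""
    return tuple(random.randint(1, NUM_BASIC_OPS) for _ in range(NUM_LAYERS))

# The structure space is the set of all such vectors (5**22 of them).
structure = sample_sub_network_structure()   # e.g. (1, 2, 4, 5, 3, ...)
```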
Optionally, when the search network corresponding to the target task is determined, the parameter values in the parameter space may be set randomly, to default initialization values, or manually; this is not specifically limited in the embodiment of the present application.
It should be understood that, in the embodiment of the present application, updating the parameter space may be understood as updating the parameter in the parameter space, or may be understood as updating the value of the parameter in the parameter space.
Optionally, the updating of the parameter space according to the training data of the target task may be to update part of the parameters in the parameter space, or to update all the parameters in the parameter space.
In this embodiment of the present application, the training data of the target task may be some of the data that the target neural network needs to process, all of that data, or other data of the same type as the data that the target neural network needs to process; this is not particularly limited in the embodiment of the present application. For example, when the target neural network is used to process image data, the training data of the target task may be image data such as pictures; when the target neural network is used to process voice data, the training data of the target task may be voice data such as voice segments; when the target neural network is used to process text data, the training data of the target task may be text data such as text sequences.
In this embodiment, the sub-network set includes a plurality of sub-networks, where a sub-network consists of a sub-network structure and the parameters corresponding to that sub-network structure; the sub-network structure belongs to the structure space, and the parameters corresponding to it belong to the parameter space. In other words, a sub-network may be expressed by a sub-network structure together with the parameters corresponding to that structure, that is, the parameters corresponding to the basic modules and basic operations the structure includes. In the embodiment of the application, the sub-network set includes a plurality of first sub-network structures among the plurality of sub-network structures included in the structure space; it should be understood that the sub-network set also includes the parameters corresponding to each of the plurality of first sub-network structures.
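The parameter sharing described here can be illustrated with the sketch below, which reuses the 22-layer, 5-operation example; the data layout and weight shapes are assumptions chosen only for illustration.

```python
import numpy as np

NUM_LAYERS, NUM_BASIC_OPS = 22, 5

# The parameter space is a single table keyed by (layer, operation);
# the weight shapes are placeholders chosen only for illustration.
parameter_space = {
    (layer, op): np.random.randn(16, 16)
    for layer in range(NUM_LAYERS)
    for op in range(1, NUM_BASIC_OPS + 1)
}

def sub_network(structure):
    """A sub-network is its structure vector plus the shared parameters of
    the operations it selects; different structures that pick the same
    (layer, operation) pair share the same entry of parameter_space."""
    return [(layer, op, parameter_space[(layer, op)])
            for layer, op in enumerate(structure)]
```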
It should be understood that, in the embodiment of the present application, updating the sub-network set may be understood as replacing the sub-network structure included in the sub-network set, or may be understood as re-determining or selecting the sub-network structure included in the sub-network set.
In this embodiment, the updated sub-network set includes a plurality of second sub-network structures among the plurality of sub-network structures included in the structure space, where a first sub-network structure and a second sub-network structure may be the same or different. In other words, the updated sub-network set may include some or all of the plurality of first sub-network structures in the pre-update sub-network set, or the plurality of second sub-network structures it includes may be completely different from the plurality of first sub-network structures in the pre-update sub-network set.
According to the neural network structure search method of this application, a single sub-network set is maintained during the search, and a plurality of target neural networks can be obtained from a single search. Because the plurality of sub-network structures in the sub-network set share the parameters in the parameter space, the search speed and search efficiency are improved, so that the plurality of target neural networks can be obtained efficiently from one search.
Further, the plurality of target neural networks may have different attributes, providing models that satisfy different computing and/or storage resource constraints and allowing a user to select a corresponding target neural network based on the resource constraints in practical applications.
With reference to the first aspect, in a possible implementation manner, updating the parameter space according to the training data of the target task to obtain an updated parameter space includes: inputting the training data of the target task; and updating the parameter space based on a gradient back-propagation algorithm, where the gradient of the parameter space is the average of the parameter gradients corresponding to some of the plurality of first sub-network structures.
In the embodiment of the application, directly accumulating the parameter gradients corresponding to all first sub-network structures in the sub-network set may impose a large computational burden; instead, the parameter gradients corresponding to a small batch (mini-batch) of first sub-network structures may be accumulated, which reduces the computational burden and speeds up the updating of the parameter space, thereby increasing the search speed and improving the search efficiency of the neural network structure search.
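A minimal PyTorch-style sketch of this update is given below; the super-network interface is an assumption, and the sketch only illustrates averaging the gradient over a mini-batch of first sub-network structures before taking one step on the shared parameter space.

```python
import torch

def update_parameter_space(super_net, sampled_structures, batch, optimizer, loss_fn):
    """Sketch of one update step, assuming super_net(inputs, structure)
    runs only the operations selected by `structure` on the shared
    parameters. Dividing each loss by the number of sampled structures
    makes the accumulated gradient the average over the mini-batch."""
    inputs, labels = batch
    optimizer.zero_grad()
    for structure in sampled_structures:   # a small subset of the sub-network set
        outputs = super_net(inputs, structure)
        loss = loss_fn(outputs, labels) / len(sampled_structures)
        loss.backward()                    # gradients accumulate across structures
    optimizer.step()                       # one step on the shared parameter space
```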
With reference to the first aspect, in a possible implementation manner, updating the sub-network set to obtain an updated sub-network set includes: processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain next-generation sub-network structures; ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a first metric target based on a non-dominated sorting algorithm to obtain a first ranking result; ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a second metric target based on the non-dominated sorting algorithm to obtain a second ranking result; merging the sub-network structures included in the same level of the first ranking result and the second ranking result to obtain a merged ranking result; and selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
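The following Python sketch illustrates this two-pass ranking and level-wise merging; the objective functions, the minimization convention, and the selection size are illustrative assumptions, and the evolutionary step that produces the offspring is omitted.

```python
from itertools import zip_longest

def dominates(a, b, objectives):
    """a dominates b if it is no worse on every objective (all to be
    minimized here) and strictly better on at least one."""
    return (all(f(a) <= f(b) for f in objectives)
            and any(f(a) < f(b) for f in objectives))

def non_dominated_levels(candidates, objectives):
    """Level 0 is the Pareto front; level 1 is the front of what remains, etc."""
    levels, remaining = [], list(candidates)
    while remaining:
        front = [c for c in remaining
                 if not any(dominates(o, c, objectives) for o in remaining)]
        levels.append(front)
        remaining = [c for c in remaining if c not in front]
    return levels

def update_subnetwork_set(parents, offspring, first_objectives, second_objectives, set_size):
    """Rank the pooled structures twice (once per metric target), merge the
    two rankings level by level, and keep the top structures."""
    pool = parents + offspring
    levels_a = non_dominated_levels(pool, first_objectives)
    levels_b = non_dominated_levels(pool, second_objectives)
    selected, seen = [], set()
    for lvl_a, lvl_b in zip_longest(levels_a, levels_b, fillvalue=[]):
        for cand in lvl_a + lvl_b:          # merge the same level of both rankings
            if id(cand) not in seen and len(selected) < set_size:
                seen.add(id(cand))
                selected.append(cand)
    return selected
```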
It should be understood that the particle swarm algorithm, the genetic algorithm, and the fireworks algorithm are all evolutionary algorithms, and other evolutionary algorithms may also be used in the embodiment of the present application to perform the same procedure.
With reference to the first aspect, in a possible implementation manner, updating the sub-network set to obtain an updated sub-network set includes: ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a first metric target based on a non-dominated sorting algorithm to obtain a first ranking result; ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a second metric target based on the non-dominated sorting algorithm to obtain a second ranking result; and selecting a plurality of sub-network structures from the top levels of the first ranking result and the second ranking result as the plurality of second sub-network structures.
In the present application, non-dominated-sorting ranking is performed twice on the current-generation and next-generation sub-network structures according to different metric targets, and the second sub-network structures are then selected according to the two ranking results. This protects models whose current performance is poor but that may achieve better performance in the future, prevents them from being dominated and discarded too early in the search, and thus expands the coverage of the sub-network set.
With reference to the first aspect, in a possible implementation manner, the first metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and an accuracy, and the second metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and a rate of increase of the accuracy.
With reference to the first aspect, in a possible implementation manner, determining the plurality of target neural networks corresponding to the target task according to the updated parameter space and the updated sub-network set includes: selecting some or all of the second sub-network structures from the updated sub-network set as the plurality of target neural networks, where the parameters corresponding to these second sub-network structures belong to the updated parameter space.
With reference to the first aspect, in a possible implementation manner, the method further includes: acquiring target task data to be processed; selecting a first target neural network from the plurality of target neural networks according to computing resources and/or storage resources of the terminal device; and processing the target task data to be processed according to the first target neural network to obtain a processing result corresponding to the target task.
In the embodiment of the application, a target neural network that matches the resource constraints of the terminal device can be selected, according to its computing and/or storage resources, from models with different sizes and accuracies, so that a wider range of terminal devices and application scenarios can be supported.
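As a sketch of this selection step, the helper below picks a first target neural network under the device's resource constraints; the dictionary fields and the greedy accuracy-maximizing choice are assumptions for illustration.

```python
def select_first_target_network(target_networks, storage_budget, latency_budget=None):
    """Return the most accurate searched model whose size (and, optionally,
    runtime) fits the terminal device's resources; None if nothing fits.
    The field names are illustrative assumptions."""
    feasible = [net for net in target_networks
                if net["size"] <= storage_budget
                and (latency_budget is None or net["latency_ms"] <= latency_budget)]
    return max(feasible, key=lambda net: net["accuracy"]) if feasible else None
```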
In a second aspect, a neural network structure search method is provided, the method including: determining a search network according to a target task, where the search network includes a structure space and the structure space includes a plurality of sub-network structures; determining a sub-network set from the search network, where the sub-network set includes a plurality of first sub-network structures among the plurality of sub-network structures; updating the sub-network set to obtain an updated sub-network set, where the updated sub-network set includes a plurality of second sub-network structures among the plurality of sub-network structures; and determining a plurality of target neural networks corresponding to the target task according to the updated sub-network set.
According to this neural network structure search method, a sub-network set is maintained during the search, and a plurality of target neural networks can be provided through a single search. The target neural networks can have different sizes and accuracies and satisfy different computing and/or storage resource limitations, so that a user can select a corresponding target neural network according to the resource constraints in practical applications.
With reference to the second aspect, in a possible implementation manner, updating the sub-network set to obtain an updated sub-network set includes: processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain next-generation sub-network structures; ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a first metric target based on a non-dominated sorting algorithm to obtain a first ranking result; ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a second metric target based on the non-dominated sorting algorithm to obtain a second ranking result; merging the sub-network structures included in the same level of the first ranking result and the second ranking result to obtain a merged ranking result; and selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
With reference to the second aspect, in a possible implementation manner, the first metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and an accuracy, and the second metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and a rate of increase of the accuracy.
In the present application, non-dominated-sorting ranking is performed twice on the current-generation and next-generation sub-network structures according to different metric targets, and when the plurality of second sub-network structures are then selected according to the two ranking results, models whose current performance is poor but that may perform better in the future can be protected and prevented from being dominated and discarded too early in the search, thereby expanding the coverage of the sub-network set.
With reference to the second aspect, in a possible implementation manner, determining the plurality of target neural networks corresponding to the target task according to the updated sub-network set includes: selecting some or all of the second sub-network structures from the updated sub-network set as the plurality of target neural networks.
In a third aspect, an image processing method applied to a terminal device is provided, including: acquiring an image to be processed; selecting a first target neural network from a plurality of target neural networks according to computing resources and/or storage resources of the terminal device; and classifying the image to be processed according to the first target neural network to obtain a classification result of the image to be processed. The plurality of target neural networks are derived from an updated parameter space and an updated sub-network set. The updated parameter space is obtained by updating, according to training data of the image classification task, the parameter space included in a search network corresponding to the image classification task, where the parameter space includes a plurality of parameters, the search network further includes a structure space, the structure space includes a plurality of sub-network structures, and each sub-network structure corresponds to at least one parameter in the parameter space. The updated sub-network set is obtained by updating a sub-network set determined from the search network, where the sub-network set includes a plurality of first sub-network structures among the plurality of sub-network structures, and the updated sub-network set includes a plurality of second sub-network structures among the plurality of sub-network structures.
The plurality of target neural networks are determined by updating the parameter space and the sub-network set, with the plurality of sub-network structures in the sub-network set sharing the parameters in the parameter space. The training of a sub-network structure therefore does not start from randomly initialized parameters; instead it inherits the parameters already trained, namely the parameters in the updated parameter space. This effectively improves the search efficiency, and a plurality of target neural networks can be obtained from a single search. These target neural networks can have different attributes and provide models that satisfy different computing and/or storage resource limitations, so that a user can select a corresponding target neural network according to the resource constraints in practical applications.
With reference to the third aspect, in a possible implementation manner, the updated parameter space is obtained by updating the parameter space based on a gradient back-propagation algorithm, where the gradient of the parameter space is the average of the parameter gradients corresponding to some of the plurality of first sub-network structures.
With reference to the third aspect, in a possible implementation manner, the updated sub-network set is obtained by processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain next-generation sub-network structures, and then selecting a plurality of sub-network structures from a merged ranking result as the plurality of second sub-network structures. The merged ranking result is obtained by merging the sub-network structures included in the same level of a first ranking result and a second ranking result, where the first ranking result is obtained by ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a first metric target based on a non-dominated sorting algorithm, and the second ranking result is obtained by ranking the plurality of first sub-network structures and the next-generation sub-network structures according to a second metric target based on the non-dominated sorting algorithm.
With reference to the third aspect, in a possible implementation manner, the first metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and an accuracy, and the second metric target includes at least one of a parameter amount, a network runtime, the memory required to run the network, a network computation amount, energy consumption, a number of floating-point operations, and a rate of increase of the accuracy.
In a fourth aspect, there is provided an image processing method, comprising: acquiring an image to be processed; and processing the image to be processed according to the target neural network to obtain a processing result of the image to be processed.
Processing the image may refer to recognizing, classifying, or detecting the image, and the like.
In a fifth aspect, an image processing method is provided, including: acquiring a road picture; performing convolution processing on the road picture according to the target neural network to obtain a plurality of convolutional feature maps of the road picture; and performing deconvolution processing on the plurality of convolutional feature maps of the road picture according to the target neural network to obtain a semantic segmentation result of the road picture.
Wherein the target neural network is a target neural network searched according to the first aspect or any possible implementation manner of the first aspect.
In a sixth aspect, an image processing method is provided, including: acquiring a face image; performing convolution processing on the face image according to the target neural network to obtain a convolutional feature map of the face image; and comparing the convolutional feature map of the face image with the convolutional feature map of an identity document image to obtain a verification result for the face image.
The convolutional feature map of the identity document image can be acquired in advance and stored in a corresponding database; for example, the identity document image is subjected to convolution processing in advance, and the obtained convolutional feature map is stored in the database.
In addition, the target neural network is a target neural network searched according to the first aspect or any possible implementation manner of the first aspect.
It is to be understood that the extensions, definitions, explanations, and descriptions of relevant content in the first aspect above also apply to the same content in the second, third, fourth, fifth, and sixth aspects.
In a seventh aspect, a neural network structure search apparatus is provided, which includes means for performing the method in any one of the above-mentioned first aspect or any one of the above-mentioned possible implementations of the first aspect, or includes means for performing the method in any one of the above-mentioned second aspect or any one of the above-mentioned possible implementations of the second aspect.
In an eighth aspect, there is provided a neural network structure searching apparatus, including: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the method of the first aspect or any one of the possible implementations of the first aspect, or the method of the second aspect or any one of the possible implementations of the second aspect, when the memory-stored program is executed.
In a ninth aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor configured to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to perform the method in any one of the possible implementation manners of the third aspect to the sixth aspect.
In a tenth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for performing the method in any one of the implementations of the first to sixth aspects.
In an eleventh aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method in any one of the implementations of the first to sixth aspects.
In a twelfth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the method in any one implementation manner of the first aspect to the sixth aspect.
Optionally, as a possible implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation manner of the first aspect to the sixth aspect.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main framework according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a specific application according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a specific application according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a specific application according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of a neural network structure search method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a neural network structure search method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of effects of a neural network structure search method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of effects of a neural network structure search method according to an embodiment of the present application;
FIG. 14 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 15 is a schematic block diagram of a neural network structure search apparatus according to an embodiment of the present application;
FIG. 16 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 17 is a schematic block diagram of a neural network training apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.
The artificial intelligence main framework is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data-information-knowledge-wisdom".
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure:
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a base platform.
The infrastructure may communicate with the outside through sensors, and the computing power of the infrastructure may be provided by a smart chip.
The intelligent chip may be a hardware acceleration chip such as a Central Processing Unit (CPU), a neural-Network Processing Unit (NPU), a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).
The base platform may include a distributed computing framework and a network, and may provide cloud storage and computing, an interconnection network, and the like.
For example, for an infrastructure, data may be obtained through sensors and external communications and then provided to an intelligent chip in a distributed computing system provided by the base platform for computation.
(2) Data:
data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphics, images, voice and text, and also relates to internet of things data of traditional equipment, including service data of an existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing:
the data processing generally includes processing modes such as data training, machine learning, deep learning, searching, reasoning, decision making and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system, where a machine uses formalized information to reason about and solve problems according to an inference control strategy; typical functions are search and matching.
Decision making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking, and prediction.
(4) General-purpose capability:
After the data processing described above, some general capabilities may further be formed based on the results of the data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industrial applications:
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, commercialize intelligent information decision making, and realize practical applications. The main application fields include intelligent manufacturing, intelligent transportation, smart home, intelligent medical care, intelligent security, automatic driving, safe cities, intelligent terminals, and the like.
The embodiments of the present application can be applied to many fields of artificial intelligence, such as intelligent manufacturing, intelligent transportation, smart home, intelligent medical care, intelligent security, automatic driving, and safe cities.
Specifically, the embodiments of the present application may be applied to fields that require the use of (deep) neural networks, such as image classification, image retrieval, image semantic segmentation, image super-resolution, and natural language processing, and may also be applied to cloud services.
To better understand the solutions of the embodiments of the present application, possible application scenarios are briefly described below with reference to FIG. 2 to FIG. 4.
Cloud service:
The embodiment of the present application can be deployed on a cloud service to provide an automatic neural network structure search service for users. A user uploads training data and specifies the required indicators (such as accuracy and model size), and a series of different models can then be automatically searched out by the cloud service for the user to choose from.
For example, as shown in FIG. 2, the neural network structure search method of the embodiment of the present application may be deployed on a server. A user may upload training pictures to a training picture library on the server and specify the required indicators for the neural network structure. Using the neural network structure search method, a plurality of neural network structures suitable for image recognition can be obtained; the neural networks are then trained on the training pictures in the library to obtain a plurality of image recognition neural networks, which may have different model sizes and accuracies. The user can select a matching image recognition neural network according to the computing resources and/or storage resources of the terminal and deploy it on the terminal. For example, as shown in FIG. 2, the server may search out image recognition neural network structures with sizes of 2.7M, 3.2M, 3.7M, and so on through the neural network structure search system (corresponding to the neural network structure search method of the embodiment of the present application); if the computing resources of the terminal device are less than 3.0M, the 2.7M image recognition neural network structure may be selected from the plurality of structures searched out by the server. The image recognition neural network structure deployed on the terminal can then recognize the image to be processed; as shown in FIG. 2, the image recognition neural network processes the input picture and obtains the recognition result "bird". The cloud service of the embodiment of the present application can search out a series of neural network structures with different model sizes and accuracies, and the user can select a suitable neural network structure to deploy according to the application scenario and the resource constraints in practical applications.
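The model selection in this example can be illustrated with the toy snippet below; the accuracy values are invented for illustration, and the size unit is kept as the "M" used in the example above.

```python
# Toy illustration of the FIG. 2 example: three searched image-recognition
# models and a terminal whose resources allow a model under 3.0M
# (the accuracy values are made up for illustration).

searched_models = [
    {"name": "net_a", "size_m": 2.7, "accuracy": 0.91},
    {"name": "net_b", "size_m": 3.2, "accuracy": 0.93},
    {"name": "net_c", "size_m": 3.7, "accuracy": 0.94},
]
budget_m = 3.0
deployable = max((m for m in searched_models if m["size_m"] <= budget_m),
                 key=lambda m: m["accuracy"])
print(deployable["name"])   # -> net_a, the 2.7M model
```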
Classifying photo album pictures:
specifically, when a user stores a large number of pictures on a terminal device (e.g., a mobile phone) or a cloud disk, the images in the album are identified, so that the user or the system can conveniently classify and manage the album, and the user experience is improved.
By utilizing the searching method of the neural network structure, a plurality of neural network structures suitable for photo album classification can be obtained through searching, and then the neural network is trained according to the training pictures in the training picture library, so that a plurality of photo album classification neural networks can be obtained. And selecting an album classification neural network matched with the computing resources of the current terminal equipment from the plurality of album classification neural networks, and classifying the pictures by using the selected album classification neural network, so that labels are marked on the pictures of different categories, and the pictures can be conveniently checked and searched by a user. In addition, the classification labels of the pictures can also be provided for the album management system to perform classification management, so that the management time of a user is saved, the album management efficiency is improved, and the user experience is improved.
For example, as shown in fig. 3, a plurality of neural networks suitable for album classification can be searched by the searching method of the neural network structure according to the embodiment of the present application. After obtaining a plurality of neural networks suitable for photo album classification, the neural networks can be trained according to the training pictures to obtain a plurality of corresponding photo album classification neural networks. And selecting an album classification neural network matched with the computing resource and/or the storage resource of the terminal equipment from the plurality of album classification neural networks, and then classifying the pictures to be processed by using the selected album classification neural network. As shown in fig. 3, the photo album classification neural network processes the input picture to obtain the type of the picture as tulip. Or, the photo album classification neural network classifies the input pictures, for example, classifying the pictures of birds together and displaying them together, and classifying the pictures of swimming together and displaying them together.
Target detection and segmentation in an automatic driving scene:
in automatic driving, it is important for the vehicle to make driving decisions to detect and segment objects such as pedestrians and vehicles on the street. As shown in fig. 4, the neural network structure provided by the neural network structure search method according to the embodiment of the present application may be embedded into a target detection and segmentation framework, and the target detection and segmentation framework including the neural network structure may process an input road picture, may accurately detect, locate and segment a target on a street, and may output a detection and segmentation result.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function (activation function) of the neural unit, which is used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of the above-mentioned single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, where the local receptive field may be a region composed of several neural units.
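Purely as an illustrative aid (not part of the original disclosure), the following minimal Python sketch shows how the output of a single neural unit could be computed under the formula above; the function name and the choice of a sigmoid activation are assumptions for illustration only.

```python
import math

def neural_unit_output(x, w, b):
    """Compute f(sum_s W_s * x_s + b) for one neural unit.

    x: list of inputs x_1..x_n
    w: list of weights W_1..W_n
    b: bias of the neural unit
    f: sigmoid activation, as mentioned in the text
    """
    s = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))   # sigmoid activation f

# Example: a unit with three inputs
print(neural_unit_output([0.5, -1.0, 2.0], [0.1, 0.4, -0.3], b=0.2))
```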
(2) Deep neural network
Deep neural networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the middle layers are hidden layers. The layers are fully connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i+1)th layer.
Although DNN appears complex, the work of each layer is actually not complex, and is simply the following linear relational expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has a large number of layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows: taking the coefficient $W$ as an example, assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, while the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
In summary, the coefficient from the kth neuron at layer L-1 to the jth neuron at layer L is defined as $W^{L}_{jk}$.
Note that the input layer has no $W$ parameter. In a deep neural network, more hidden layers make the network better able to characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrix of every layer of the trained deep neural network (the weight matrix formed by the vectors $W$ of many layers).
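As a hedged illustration of the layer-wise expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$ above, the sketch below runs an input vector through several layers; the layer sizes, the tanh activation and the use of NumPy are assumptions made only for this example.

```python
import numpy as np

def dnn_forward(x, weights, biases, alpha=np.tanh):
    """Forward pass of a multi-layer network: each layer computes y = alpha(W @ x + b)."""
    y = x
    for W, b in zip(weights, biases):
        y = alpha(W @ y + b)
    return y

# Example: 3-layer DNN with randomly initialized W and b (illustrative sizes only)
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # input dim 4, two hidden layers of 8, output dim 2
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(dnn_forward(rng.standard_normal(4), weights, biases))
```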
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers, and the feature extractor can be regarded as a filter. The convolutional layer is a neuron layer in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons in the neighboring layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of several neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is independent of location. The convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be learned during the training of the convolutional neural network. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
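The weight-sharing idea described above can be illustrated with a minimal sketch: the same kernel is slid over every location of the image. This is only an illustrative example with assumed stride 1 and no padding, not a description of any particular layer of the embodiment.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared convolution kernel over the image (stride 1, no padding).

    The same kernel (shared weights) is applied at every location, which is the
    weight-sharing property described above.
    """
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 kernel initialized randomly, as the text notes, then refined by training
rng = np.random.default_rng(0)
print(conv2d(rng.random((6, 6)), rng.standard_normal((3, 3))).shape)  # (4, 4)
```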
(4) Recurrent neural networks (RNNs) are used to process sequence data. In a traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network solves many problems, it is still incapable of handling many others. For example, to predict the next word in a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. The reason the RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but are connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, RNNs can process sequence data of any length. Training an RNN is the same as training a conventional CNN or DNN.
Now that convolutional neural networks exist, why are recurrent neural networks needed? The reason is simple: a convolutional neural network has a precondition assumption that the elements are independent of each other, and so are the inputs and outputs, such as cats and dogs. However, in the real world many elements are interconnected, such as stock prices changing over time. Another example: a person says, "I like to travel, and my favorite place is Yunnan; when I get the chance I will go ___." To fill in the blank here, humans all know to fill in "Yunnan", because humans infer from the context. But how can a machine do this? This is why the RNN emerged. RNNs aim to give machines the ability to remember like humans. Therefore, the output of an RNN needs to depend on the current input information and the historical memory information.
(5) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really desired to be predicted, the weight vectors of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower the prediction, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which leads to the loss function (loss function) or objective function (objective function); these are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
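As a minimal sketch of one common loss function, the example below computes the mean squared error between predictions and targets; the choice of MSE is an assumption made purely for illustration, since the embodiment does not prescribe a specific loss.

```python
def mse_loss(predictions, targets):
    """Mean squared error: a larger output means a larger gap between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# The training objective is to reduce this loss as much as possible
print(mse_loss([0.9, 0.2, 0.4], [1.0, 0.0, 0.5]))
```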
(6) Back propagation algorithm
A neural network can use a back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
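A hedged toy sketch of the forward/backward idea follows: one linear unit with a squared-error loss, where the gradient of the loss is propagated back to the weight and bias and used to update them. The learning rate, the single-unit model and the function name are assumptions for illustration only.

```python
def sgd_step(w, b, x, target, lr=0.1):
    """One forward/backward pass for a single linear unit y = w*x + b with squared error.

    The gradient of the loss with respect to w and b is propagated back
    and used to update the parameters, as in back propagation.
    """
    y = w * x + b                       # forward pass
    error = y - target                  # prediction error
    grad_w, grad_b = error * x, error   # backward pass: dL/dw, dL/db for L = 0.5*error**2
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(50):                     # repeated updates make the error loss converge
    w, b = sgd_step(w, b, x=2.0, target=4.0)
print(w, b)                             # approaches a solution with w*2 + b close to 4
```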
(7) Genetic algorithm (genetic algorithm, GA)
The genetic algorithm follows the principle of survival of the fittest in nature, and is a randomized search algorithm that borrows from natural selection and natural genetic mechanisms in biology. The basic idea of the genetic algorithm is to simulate the natural evolution process: genetic operations are performed on individuals with a certain structural form in a population so as to generate a new population and gradually approach the optimal solution. In the solving process, a population with a fixed size is set, and each individual in the population represents a possible solution of the problem. The degree to which an individual adapts to the environment is judged by a fitness function; individuals with poor fitness are eliminated while individuals with good fitness continue to propagate, and a new population is formed through selection, crossover and mutation during propagation. These steps are repeated to obtain progressively better solutions. The main steps of the genetic algorithm include: encoding, population initialization, individual fitness calculation, evolution calculation, and decoding. In the encoding step, a candidate solution of the problem is represented by a chromosome, realizing the mapping from the solution space to the encoding space; a genetic algorithm does not work directly on the decision variables of the solution space, but converts them into chromosomes consisting of genes according to a certain structure. In the population initialization step, an initial population (a set of codes) is generated, which represents a set of potential solutions to the problem. In the individual fitness calculation step, the fitness of each individual is calculated using the fitness function. In the evolution calculation step, a population representing a new solution set is generated through selection, crossover and mutation. In the decoding step, the optimal individuals in the last generation of the population are decoded, realizing the mapping from the encoding space back to the solution space, and can be used as an approximate optimal solution of the problem.
In addition, in the evolution calculation step, selection (selection) eliminates unsuitable individuals according to individual fitness and the principle of keeping the superior and eliminating the inferior; crossover (crossover) is cross-recombination of the codes, similar to the crossover of chromosomes; mutation (mutation) is a small-probability perturbation of the codes, similar to gene mutation.
Fitness: it may essentially be a cost function, or a rule; by calculating the fitness of the individuals in the initial population, a measure of how good or bad the individuals in the initial population are can be obtained.
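To make the encoding/selection/crossover/mutation loop concrete, here is a minimal, generic genetic-algorithm skeleton. The chromosome length, number of operations, population size, mutation rate and toy fitness function are all invented for illustration and are not parameters of the embodiment.

```python
import random

def genetic_algorithm(fitness, length=22, ops=5, pop_size=20, generations=30):
    """Minimal genetic-algorithm skeleton: encode, initialize a population, evaluate
    fitness, then evolve by selection, crossover and mutation."""
    pop = [[random.randint(1, ops) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]                  # selection: keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]                     # crossover: recombine two codes
            if random.random() < 0.1:                     # mutation: small-probability change
                child[random.randrange(length)] = random.randint(1, ops)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)                          # the best individual of the last generation

# Toy fitness: prefer chromosomes whose genes sum to a large value
print(genetic_algorithm(lambda c: sum(c)))
```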
(8) Evolutionary Algorithm (EAS)
The evolutionary algorithm is not one specific algorithm but a "family of algorithms". Evolutionary algorithms draw inspiration from the evolutionary operations of organisms in nature, and generally include basic operations such as gene encoding, population initialization, crossover and mutation operators, and survivor retention mechanisms. The basic framework of an evolutionary algorithm is the framework described by the simple genetic algorithm, but the evolution modes differ greatly, with many variations in selection, crossover, mutation, population control, and the like. Compared with traditional optimization algorithms such as calculus-based methods and exhaustive methods, evolutionary computation is a mature global optimization method with high robustness and wide applicability. It has the characteristics of self-organization, self-adaptation and self-learning, is not limited by the nature of the problem, and can effectively handle complex problems that are difficult for traditional optimization algorithms to solve.
(9) Particle Swarm Optimization (PSO)
The particle swarm algorithm is also called a particle swarm optimization algorithm, a particle swarm optimization algorithm or a bird swarm foraging algorithm.
Particle swarm optimization is an evolutionary computation technique (evolutionary computation) derived from the study of the foraging behavior of bird flocks. The basic idea of the particle swarm optimization algorithm is to find the optimal solution through cooperation and information sharing among the individuals in a group. Colloquially, consider the following scenario: a flock of birds randomly searches for food. There is only one piece of food in the area. None of the birds knows where the food is, but they know how far their current location is from the food. What, then, is the optimal strategy for finding the food? The simplest and most effective way is to search the area surrounding the bird that is currently closest to the food. During the whole search process, the birds let each other know their positions by passing on their respective information; through such cooperation they judge whether they have found the optimal solution, and the information about the optimal solution is passed to the whole flock, so that eventually the whole flock gathers around the food source, that is, the optimal solution is found. In PSO, the solution of each optimization problem is a bird in the search space, called a "particle". All particles have a fitness value (fitness value) determined by the function being optimized, and each particle also has a velocity that determines the direction and distance of its flight; the particles then follow the current optimal particle to search in the solution space. PSO is initialized as a group of random particles (random solutions), and the optimal solution is then found through iteration. In each iteration, a particle updates itself by tracking two "extrema": the first is the optimal solution found by the particle itself, called the individual extremum pBest; the other is the optimal solution currently found by the whole population, called the global extremum gBest. Alternatively, instead of the whole population, only a part of it may be used as the neighborhood of the particle, and the extremum over all the neighborhoods is then the local extremum. PSO searches for the global optimum by following the currently found optimal value. The algorithm has the advantages of being easy to implement, highly accurate and fast to converge, and has shown its superiority in solving practical problems.
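The following minimal sketch shows the standard pBest/gBest update described above. The inertia weight and acceleration coefficients (w, c1, c2), the search range and the toy objective are assumptions chosen only for this example.

```python
import random

def pso(objective, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimization: particles track their own best (pBest)
    and the swarm's best (gBest) and move toward them."""
    xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    gbest = min(pbest, key=objective)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])   # pull toward individual extremum
                            + c2 * r2 * (gbest[d] - xs[i][d]))     # pull toward global extremum
                xs[i][d] += vs[i][d]
            if objective(xs[i]) < objective(pbest[i]):
                pbest[i] = list(xs[i])
        gbest = min(pbest, key=objective)
    return gbest

# Toy objective: minimize the sphere function
print(pso(lambda x: sum(v * v for v in x)))
```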
(10) Firework algorithm (FWA)
The firework algorithm is a swarm intelligence algorithm inspired by fireworks exploding in the night sky. The implementation of the firework algorithm starts an iteration and applies, in sequence, the explosion operator, the mutation operator, the mapping rule and the selection strategy until a termination condition is reached, i.e., the accuracy requirement of the problem is met or the maximum number of function evaluations is reached. The implementation of the firework algorithm may include the following steps. 1) Fireworks are randomly generated in a particular solution space, each firework representing one solution of the solution space. 2) The fitness value of each firework is calculated according to the fitness function, and sparks are generated according to the fitness value; the number of sparks is calculated based on the idea of immune concentration in immunology, i.e., fireworks with better fitness values generate more sparks. 3) According to the actual properties of fireworks and the actual situation of the search problem, sparks are generated in the radiation space of the fireworks (the explosion amplitude of a firework is determined by its fitness value on the function: the larger the fitness value, the larger the explosion amplitude, and vice versa). Each spark represents one solution in the solution space. To ensure the diversity of the population, the fireworks are subjected to appropriate variation, such as Gaussian variation. 4) The optimal solution of the population is calculated and checked against the requirements; if it meets the requirements, the search stops, otherwise the iteration continues, with the initial values of the next iteration being the best solution obtained in this loop together with the other selected solutions.
(11) Non-dominant sorting algorithm (NSGA)
The main difference between NSGA and a simple genetic algorithm is that the population is stratified according to the dominance relationships between individuals before the selection operator is executed. Its selection, crossover and mutation operators are no different from those of a simple genetic algorithm. Before the selection operation is performed, the population is sorted according to the dominance and non-dominance relationships between individuals: first, all non-dominated individuals in the population are found and assigned a shared virtual fitness value, giving the first non-dominated front; then, ignoring this already stratified group of individuals, the remaining individuals in the population continue to be stratified according to the dominance and non-dominance relationships and are assigned a new virtual fitness value smaller than that of the previous layer; this operation continues for the remaining individuals until all individuals in the population are stratified.
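The stratification by dominance can be sketched as follows. This is an illustrative sketch only: it assumes a minimization problem with two objectives and assigns decreasing virtual fitness values layer by layer, as described above; the helper names are invented.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(population):
    """Stratify the population into fronts, assigning each front a shared virtual
    fitness value that decreases from layer to layer."""
    remaining = list(population)
    fronts, virtual_fitness = [], {}
    fitness_value = len(population)            # the first layer gets the largest value
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        fronts.append(front)
        for p in front:
            virtual_fitness[tuple(p)] = fitness_value
        fitness_value -= 1                     # the next layer gets a smaller value
        remaining = [p for p in remaining if p not in front]
    return fronts, virtual_fitness

# Two objectives to minimize, e.g., (error, model size)
pop = [[0.2, 3.7], [0.1, 3.2], [0.3, 2.7], [0.15, 2.9]]
print(non_dominated_sort(pop))
```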
For convenience of understanding and explanation, the target neural network model is described as a CNN model (or CNN network) in the embodiment of the present application, but it should be understood that the type of the target neural network model in the embodiment of the present application is not limited thereto, and may be any of the models described above or another model not shown.
Referring to fig. 5, a system architecture 100 is provided in an embodiment of the present application. In fig. 5, the data acquisition device 160 is used for acquiring training data, and for the image processing method of the embodiment of the present application, the training data may include a training image and a classification result corresponding to the training image, where the result of the training image may be a result of manual pre-labeling.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input original image and compares the output image with the original image until the difference between the image output by the training device 120 and the original image is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
The above target model/rule 101 can be used to implement the image processing method according to the embodiment of the present application. The target model/rule 101 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 5, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In fig. 5, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include: the image to be processed, input by the client device.
The preprocessing module 113 and/or the preprocessing module 114 are configured to perform preprocessing according to input data (such as an image to be processed) received by the I/O interface 112, and in this embodiment of the application, the input data may be processed directly by the computing module 111 without the preprocessing module 113 and the preprocessing module 114 (or only one of them is available).
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing results, such as the image classification results obtained above, to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 5, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112; if the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting the input data of the input I/O interface 112 and the output results of the output I/O interface 112 as new sample data, and storing them in the database 130. Of course, instead of being collected by the client device 140, the input data input to the I/O interface 112 and the output results output from the I/O interface 112 as shown in the figure may also be directly stored in the database 130 as new sample data by the I/O interface 112.
It should be noted that fig. 5 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation. For example, in FIG. 5, the data storage system 150 is an external memory with respect to the execution device 110, in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 5, a target model/rule 101 is obtained according to training of the training device 120, where the target model/rule 101 may be a neural network in the present application in this embodiment, and specifically, the neural network provided in this embodiment may be CNN, Deep Convolutional Neural Network (DCNN), Recurrent Neural Network (RNN), or the like.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 6. As described in the introduction of the basic concept above, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, where the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
The structure of the neural network specifically adopted in the image processing method according to the embodiment of the present application may be as shown in fig. 6. In fig. 6, Convolutional Neural Network (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling is optional), and a neural network layer 230. The input layer 210 may obtain an image to be processed, and deliver the obtained image to be processed to the convolutional layer/pooling layer 220 and the following neural network layer 230 for processing, so as to obtain a processing result of the image. The following describes the internal layer structure in CNN 200 in fig. 6 in detail.
Convolutional layer/pooling layer 220:
Convolutional layer:
The convolutional layer/pooling layer 220 shown in fig. 6 may include layers 221 to 226. For example: in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, and 226 is a pooling layer; in another implementation, 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or may be used as the input of another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
The weight values in the weight matrices used by the convolutional layer need to be obtained through a large amount of training in practical applications, and the weight matrices formed by the trained weight values can extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 200 increases, the later convolutional layers (e.g., 226) extract more and more complex features, such as features with high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
A pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. In the layers 221 to 226 illustrated as 220 in fig. 6, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator may compute the pixel values in the image over a certain range to produce an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value within a certain range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
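A hedged sketch of the two pooling operators described above follows; the 2x2 window size and the NumPy-based implementation are assumptions for illustration only.

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Reduce the spatial size of the image: each output pixel is the maximum
    or the average of the corresponding sub-region of the input."""
    H, W = image.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(img, mode="max"))   # 2x2 output, each value the maximum of a 2x2 sub-region
print(pool2d(img, mode="avg"))   # 2x2 output, each value the average of a 2x2 sub-region
```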
The neural network layer 230:
after processing by convolutional layer/pooling layer 220, convolutional neural network 200 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (class information or other relevant information as needed), the convolutional neural network 200 needs to generate one or a set of outputs of the number of classes as needed using the neural network layer 230. Accordingly, a plurality of hidden layers (231, 232 to 23n shown in fig. 6) and an output layer 240 may be included in the neural network layer 230, and parameters included in the hidden layers may be pre-trained according to related training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 230, the last layer of the whole convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the whole convolutional neural network 200 is completed (the propagation from 210 to 240 in fig. 6 is forward propagation), back propagation (the propagation from 240 to 210 in fig. 6 is back propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
The structure of the neural network specifically adopted in the image processing method according to the embodiment of the present application may be as shown in fig. 7. In fig. 7, a convolutional neural network (CNN) 300 may include an input layer 310, a convolutional layer/pooling layer 320 (where the pooling layer is optional), and a neural network layer 330. Compared with fig. 6, in the convolutional layer/pooling layer 320 in fig. 7, multiple convolutional layers/pooling layers are parallel, and the features they extract are all input to the neural network layer 330 for processing.
It should be noted that the convolutional neural networks shown in fig. 6 and fig. 7 are only examples of two possible convolutional neural networks of the image processing method according to the embodiment of the present application, and in a specific application, the convolutional neural networks used in the image processing method according to the embodiment of the present application may also exist in the form of other network models.
In addition, the structure of the convolutional neural network obtained by the neural network structure search method according to the embodiment of the present application may be as shown in the convolutional neural network structures in fig. 6 and 7.
Fig. 8 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure. The chip includes a neural Network Processor (NPU) 40. The chip may be provided in the execution device 110 as shown in fig. 5 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 5 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithms for the various layers in the convolutional neural networks shown in fig. 6 and 7 can be implemented in a chip as shown in fig. 8.
The various modules and units in the NPU 40 are briefly described below.
The neural network processor NPU 40, as a coprocessor, may be mounted on a host central processing unit (CPU) (host CPU), and tasks are assigned by the host CPU. The core part of the NPU 40 is the arithmetic circuit 403; when the NPU 40 operates, the controller 404 in the NPU 40 may control the arithmetic circuit 403 to extract data from a memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 403 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 403 is a two-dimensional systolic array. The arithmetic circuit 403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 403 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the data of matrix A from the input memory 401, performs a matrix operation with matrix B, and stores the partial or final result of the obtained matrix in the accumulator (accumulator) 408.
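Conceptually, the operation performed here is a matrix multiplication with accumulated partial products, which the hedged sketch below illustrates in plain Python (it is not a model of the hardware itself).

```python
def matmul_accumulate(A, B):
    """Conceptual view of the operation: multiply matrix A by matrix B,
    accumulating partial products as an accumulator would."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0                      # partial result kept in an accumulator
            for k in range(inner):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_accumulate(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```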
The vector calculation unit 407 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 407 may be used for network calculation of non-convolution/non-fully connected layers (FC) in a neural network, such as pooling (Pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 407 can store the processed output vector to the unified buffer 406. For example, the vector calculation unit 407 may apply a non-linear function to the output of the arithmetic circuit 403, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 407 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 403, for example for use in subsequent layers in a neural network.
The unified memory 406 is used to store input data as well as output data.
A storage unit access controller 405 (direct memory access controller, DMAC) is used to transfer the input data in the external memory to the input memory 401 and/or the unified memory 406, to store the weight data in the external memory into the weight memory 402, and to store the data in the unified memory 406 into the external memory.
A Bus Interface Unit (BIU) 410, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 409 through a bus.
An instruction fetch buffer 409, connected to the controller 404, is used to store the instructions used by the controller 404. The controller 404 is configured to call the instructions cached in the instruction fetch memory 409 to control the working process of the operation accelerator.
Generally, the unified memory 406, the input memory 401, the weight memory 402, and the instruction fetch memory 409 may all be on-chip memories. The external memory of the NPU may be memory external to the NPU, and the external memory may be double data rate synchronous dynamic random access memory (DDR SDRAM), High Bandwidth Memory (HBM), or other readable and writable memory.
The operation of each layer in the convolutional neural network shown in fig. 6 or fig. 7 may be performed by the operation circuit 403 or the vector calculation unit 407.
The execution device 110 in fig. 5 described above is capable of executing the steps of the image processing method according to the embodiment of the present application, and the CNN model shown in fig. 6 and 7 and the chip shown in fig. 8 may also be used for executing the steps of the image processing method according to the embodiment of the present application. The following describes a neural network structure search method and an image processing method according to an embodiment of the present application in detail with reference to the drawings.
As shown in fig. 9, the present embodiment provides a system architecture 500. The system architecture comprises a local device 501, a local device 502, an execution device 510 and a data storage system 550, wherein the local device 501 and the local device 502 are connected with the execution device 510 through a communication network.
The execution device 510 may be implemented by one or more servers. Optionally, the execution device 510 may be used in cooperation with other computing devices, such as data storage, routers, load balancers, and the like. The execution device 510 may be disposed at one physical site or distributed across multiple physical sites. The execution device 510 may use the data in the data storage system 550 or call the program code in the data storage system 550 to implement the neural network structure search method of the embodiment of the present application.
Specifically, the execution device 510 may perform the following process:
determining a search network according to a target task, wherein the search network comprises a structure space and a parameter space, the structure space comprises a plurality of sub-network structures, the parameter space comprises a plurality of parameters, and each sub-network structure corresponds to at least one parameter in the parameter space; updating the parameter space according to the training data of the target task to obtain an updated parameter space; determining a set of subnetworks from the search network, the set of subnetworks comprising a first plurality of subnetwork structures of the plurality of subnetwork structures; updating the set of subnetworks to obtain an updated set of subnetworks, the updated set of subnetworks comprising a plurality of second ones of the plurality of subnetwork structures; and determining a plurality of target neural networks corresponding to the target tasks according to the updated parameter space and the updated sub-network set.
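Purely as an illustration of the overall flow (and not the claimed method itself), the toy sketch below mirrors the steps above: a structure space of operation vectors, a shared parameter space, alternating updates of the parameter space and of the sub-network set, and a final set of target networks. Every name, the scoring function and both update rules are invented stand-ins.

```python
import random

def search_structures(training_score, num_layers=22, num_ops=5,
                      set_size=8, num_rounds=5):
    """Toy, runnable sketch: structures are vectors over operations 1..num_ops,
    the parameter space holds one shared weight per operation, and the parameter
    space and the sub-network set are updated in alternation.
    training_score(structure, params) stands in for training/evaluation on the target task."""
    parameter_space = {op: random.random() for op in range(1, num_ops + 1)}  # shared parameters

    def sample_structure():
        return [random.randint(1, num_ops) for _ in range(num_layers)]

    subnetwork_set = [sample_structure() for _ in range(set_size)]           # first sub-network structures

    for _ in range(num_rounds):
        # update the parameter space using the training data (here: nudge the shared weights)
        for structure in subnetwork_set:
            for op in structure:
                parameter_space[op] += 0.01 * (training_score(structure, parameter_space) - 0.5)
        # update the sub-network set: keep the better half, refill by new sampling
        subnetwork_set.sort(key=lambda s: training_score(s, parameter_space), reverse=True)
        subnetwork_set = subnetwork_set[:set_size // 2] + \
                         [sample_structure() for _ in range(set_size - set_size // 2)]

    # target networks: the final structures paired with the shared parameters
    return [(s, dict(parameter_space)) for s in subnetwork_set]

# Toy score: average shared weight of the operations used in the structure
toy_score = lambda s, p: sum(p[op] for op in s) / len(s)
targets = search_structures(toy_score)
print(len(targets), targets[0][0][:5])
```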
The execution device 510 can search out a plurality of target neural networks, which can be used for image classification or image processing, and the like.
The user may operate respective user devices (e.g., local device 501 and local device 502) to interact with the execution device 510. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local devices of each user may interact with the execution device 510 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a peer-to-peer connection, etc., or any combination thereof.
In one implementation, the local device 501 or the local device 502 acquires relevant parameters of a plurality of target neural networks from the execution device 510, selects one target neural network matching with resource constraints of the local device from the plurality of target neural networks, deploys the selected target neural network on the local device 501 or the local device 502, and performs image classification or image processing and the like by using the selected target neural network.
In another implementation, the execution device 510 may select one target neural network from a plurality of target neural networks, where the target neural network matches the resource constraint condition of the execution device, and directly deploy the target neural network, and the execution device 510 classifies or otherwise processes the to-be-processed image by acquiring the to-be-processed image from the local device 501 and the local device 502.
The execution device 510 may also be referred to as a cloud device, and in this case, the execution device 510 is generally deployed in the cloud.
The following first describes the method for searching the neural network structure according to the embodiment of the present application in detail with reference to fig. 10. The method shown in fig. 10 may be executed by a neural network structure searching apparatus, which may be a computer, a server, a cloud device, or other device with sufficient computing power to implement the search of the neural network structure.
The method shown in fig. 10 includes steps 1001 to 1005, which are described in detail below.
1001. Determining a search network according to a target task, the search network comprising a structure space and a parameter space, the structure space comprising a plurality of sub-network structures, the parameter space comprising a plurality of parameters, wherein each sub-network structure corresponds to at least one parameter in the parameter space.
There are various ways to determine a search for a network based on a target task.
As a possible implementation manner, the search network is determined according to the application requirement of the target neural network corresponding to the target task.
That is, the search network may be determined according to application requirements of the target neural network. Specifically, the search network may be determined according to a data type of data that the target neural network needs to process.
The search network includes a variety of connection modes and different operations. Generally, the types and number of operations included in the search network match the application requirements of the target neural network. When the target neural network is used to process image data, the search network includes types and numbers of operations adapted to image data processing; when the target neural network is used to process voice data, the search network includes types and numbers of operations adapted to voice data processing.
For example, when the target neural network is a neural network for processing image data, the search network may include a long and short term memory unit or a long and short term memory network (LSTM), and the like, and specifically, the search network may include a convolution operation, a pooling operation, a skip-connect operation, an activation function, and the like.
For another example, when the target neural network is a neural network for processing voice data, the search network may include a long and short term memory unit, or a long and short term memory network (LSTM), and the like, and specifically, the search network may include an activation function (e.g., ReLU, Tanh), and the like.
As another possible implementation manner, the search space is determined according to the application requirement of the target neural network and the video memory resource condition of the device performing the neural network structure search.
The video memory resource condition of the device that performs the neural network structure search may refer to a video memory resource size of the device that performs the neural network structure search.
That is, in the present application, the search space may be determined comprehensively according to the application requirements of the target neural network and the video memory resource conditions of the device performing the neural network structure search.
Specifically, the operation type and number included in the search space may be determined according to the application requirement of the target neural network, and then the operation type and number included in the search space may be adjusted according to the video memory resource condition of the device performing the neural network structure search, so as to determine the operation type and number finally included in the search space.
For example, after determining the type and number of operations contained in the search space according to the application requirements of the target neural network, if the video memory resources of the device performing the neural network structure search are less, some less important operations in the search space may be deleted. If the video memory resource of the device executing the neural network structure search is sufficient, the operation types and the number contained in the search space can be kept, or the operation types and the number contained in the search space can be increased.
As another possible implementation, the search space is determined according to the application requirements of the target neural network and the size of the model storage space.
That is, in the present application, the search space may be determined comprehensively according to the application requirements of the target neural network and the size of the model storage space. Specifically, the operation type and number included in the search space may be determined according to the application requirement of the target neural network, and then the operation type and number included in the search space may be adjusted according to the size of the model storage space to determine the operation type and number finally included in the search space. For example, after determining the type and number of operations contained in the search space according to the application requirements of the target neural network, if the model storage space is small, some less important operations in the search space may be eliminated. If the model storage space is large, the types and the number of the operations contained in the search space can be maintained or increased.
The search network in the embodiment of the present application may include basic modules for constructing a neural network, such as convolution, a preset basic operation or a combination of basic operations adapted to the application requirement of the target neural network, where these basic operations or combinations of basic operations may be collectively referred to as basic operations. The size, depth, width, connection mode, etc. of the search network can be designed manually.
The search network comprises a structure space comprising a plurality of sub-network structures and a parameter space comprising a plurality of parameters, wherein each sub-network structure corresponds to at least one parameter in the parameter space.
The structural space in the embodiment of the present application includes all network architectures that can be characterized in principle, and may also be understood as a structural space that is a complete set of network architectures, where each possible network architecture may be referred to as a sub-network architecture. The sub-network structure comprises all or part of basic modules and all or part of basic operations in the search network, different sub-network structures comprise different types and numbers of the basic modules and the basic operations, and parameters of the basic modules and the basic operations of different sub-network structures are shared. That is, in the embodiment of the present application, the parameter space includes parameters of a basic module and a basic operation in the search network, for example, a parameter related to a separation convolution with a convolution kernel size of 3 × 3 for the basic operation, a parameter related to a hole convolution with a convolution kernel size of 5 × 5 and a hole rate of 2 for the basic operation, a parameter related to a jump connection for the basic operation, and the like. Different sub-network structures may include different types and numbers of basic modules and basic operations, different sub-network structures may have different connection modes, but different sub-network structures share parameters in the parameter space, for example, different sub-network structures include the same basic operation, and different sub-network structures share parameters corresponding to the same basic operation included in the parameter space. Each base module may correspond to one or more parameters, each base operation may correspond to one or more parameters, each sub-network structure may comprise one or more base modules, may comprise one or more base operations, and thus each sub-network structure in the structure space in the embodiments of the present application corresponds to at least one parameter in the parameter space, e.g. each sub-network structure corresponds to one or more parameters in the parameter space.
For better understanding of the structure space and the sub-network structure of the embodiment of the present application, for example, assuming that the number of layers of the search network is 22, each layer includes 5 basic operations (the 5 basic operations are expressed by numbers 1 to 5 respectively), a 22-dimensional vector may be used to express a sub-network structure, for example, (1,2,4,5,3,5,1 … … 3,2), the sub-network structures expressed by different vectors are different, and all the sub-network structures expressed by 22-dimensional vectors may form the structure space of the embodiment of the present application.
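The vector encoding in this example can be sketched directly; the sketch below is illustrative only and simply mirrors the numbers given in the text (22 layers, 5 basic operations per layer).

```python
import random

NUM_LAYERS = 22   # number of layers in the search network, per the example above
NUM_OPS = 5       # basic operations per layer, numbered 1 to 5

def sample_subnetwork_structure():
    """A sub-network structure is expressed as a 22-dimensional vector,
    one chosen basic operation (1..5) per layer."""
    return [random.randint(1, NUM_OPS) for _ in range(NUM_LAYERS)]

structure = sample_subnetwork_structure()
print(structure)              # e.g. [1, 2, 4, 5, 3, 5, 1, ..., 3, 2]
print(NUM_OPS ** NUM_LAYERS)  # size of the structure space formed by all such vectors
```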
Optionally, when determining a search network corresponding to the target task, the parameter value in the parameter space may be randomly set, or a default initialized value, or a manually set value, which is not specifically limited in the embodiment of the present application.
1002. And updating the parameter space according to the training data of the target task to obtain an updated parameter space.
It should be understood that, in the embodiment of the present application, updating the parameter space may be understood as updating the parameter in the parameter space, or may be understood as updating the value of the parameter in the parameter space.
Optionally, the updating of the parameter space according to the training data of the target task may be to update part of the parameters in the parameter space, or to update all the parameters in the parameter space.
In this embodiment of the present application, the training data of the target task may be a part of the data that needs to be processed by the target neural network, or may be all of the data that needs to be processed by the target neural network, or may be other data of the same type as the data that needs to be processed by the target neural network, and the implementation of the present application is not particularly limited. For example, when the target neural network is used to process image data, the training data of the target task may be image data such as a picture; when the target neural network is used for processing voice data, the training data of the target task may be voice data such as voice segments; when the target neural network is used for processing text data, the training data of the target task may be text data such as a text sequence.
Alternatively, for a sub-network structure, the training data of the target task may be used as the input of the sub-network structure, the output of the sub-network structure is compared with the correct result, and the parameter value corresponding to the sub-network structure in the parameter space is adjusted according to the difference between the output of the sub-network structure and the correct result, so that the difference between the output of the sub-network structure and the correct result satisfies a preset condition, for example, the difference between the output of the sub-network structure and the correct result is smaller than a certain threshold, or the difference between the output of the sub-network structure and the correct result is kept unchanged or no longer reduced.
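As a hedged illustration of this update loop, the sketch below adjusts the parameters corresponding to one sub-network structure until the difference between its output and the correct result falls below a threshold. The numeric-gradient update, the learning rate and all helper names are assumptions for illustration; the embodiment does not prescribe a particular update rule.

```python
def update_parameters_for_subnetwork(params, forward, training_pairs,
                                     lr=0.05, threshold=1e-3, max_steps=1000):
    """Adjust the parameters corresponding to one sub-network structure until the
    difference between its output and the correct result satisfies a preset
    condition (here: mean squared error below a threshold)."""
    for _ in range(max_steps):
        loss = sum((forward(params, x) - y) ** 2 for x, y in training_pairs) / len(training_pairs)
        if loss < threshold:                      # preset condition met
            break
        eps = 1e-6
        for k in list(params):                    # approximate numeric gradient per shared parameter
            params[k] += eps
            loss_plus = sum((forward(params, x) - y) ** 2 for x, y in training_pairs) / len(training_pairs)
            params[k] -= eps
            params[k] -= lr * (loss_plus - loss) / eps
    return params

# Toy sub-network: output = w1*x + w0, trained to fit y = 2x + 1
params = {"w1": 0.0, "w0": 0.0}
forward = lambda p, x: p["w1"] * x + p["w0"]
print(update_parameters_for_subnetwork(params, forward, [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))
```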
1003. Determining a set of subnetworks from the search network, the set of subnetworks comprising a first plurality of subnetwork structures of the plurality of subnetwork structures.
In this embodiment, the sub-network set includes a plurality of sub-networks, where a sub-network includes a sub-network structure and a parameter corresponding to the sub-network structure, the sub-network structure belongs to a structure space, and the parameter corresponding to the sub-network structure belongs to a parameter space. In other words, a sub-network may be expressed with parameters corresponding to the sub-network structure and the sub-network structure. The parameters corresponding to the sub-network structure may be understood as parameters corresponding to the basic modules and basic operations comprised by the sub-network structure. In an embodiment of the application, the set of sub-networks comprises a plurality of first sub-network structures of a plurality of sub-network structures comprised in the structure space, it being understood that the set of sub-networks further comprises parameters corresponding to each first sub-network structure of the plurality of first sub-network structures.
It should be understood that the set of subnetworks in this step may be understood as the set of subnetworks prior to the update.
1004. Updating the set of subnetworks to obtain an updated set of subnetworks, the updated set of subnetworks comprising a second plurality of subnetwork structures of the plurality of subnetwork structures.
It should be understood that, in the embodiment of the present application, updating the sub-network set may be understood as replacing the sub-network structure included in the sub-network set, or may be understood as re-determining or selecting the sub-network structure included in the sub-network set.
In this embodiment, the update sub-network set includes a plurality of second sub-network structures of a plurality of sub-network structures included in the structure space, where the first sub-network structure and the second sub-network structure may be the same or different. In other words, part or all of the plurality of first sub-network structures in the pre-update sub-network set may be included in the update sub-network set, or a plurality of second sub-network structures included in the update sub-network set may be completely different from the plurality of first sub-network structures included in the pre-update sub-network set.
It should be noted that step 1002 may be understood as a process of updating the parameter space, and steps 1003 and 1004 may be understood as a process of updating the sub-network set (or a process of updating the sub-network structures). In this embodiment of the present application, the process of updating the parameter space is relatively independent of the process of updating the sub-network set, so the sequence numbers of the steps in fig. 10 do not limit the execution order of the steps, and the execution order of steps 1002, 1003, and 1004 may be determined according to the actual application. In some embodiments, steps 1002, 1003, and 1004 may be executed multiple times; for example, the process of updating the parameter space (step 1002) and the process of updating the sub-network set (steps 1003 and 1004) are executed alternately until a preset condition is met, and then step 1005 is executed.
In step 1003, determining the set of subnetworks from the search network may include an initialization process of the set of subnetworks and an update process of the set of subnetworks.
When this step corresponds to the initialization process of the sub-network set, the sub-networks included in the sub-network set (each sub-network includes a sub-network structure and the parameters corresponding to the sub-network structure) may be sampled from the search network. Specifically, the plurality of first sub-network structures included in the sub-network set may be sampled from the structure space, and the parameters corresponding to each of the first sub-network structures may be obtained from the parameter space. Optionally, at this time the parameter space may not have been updated yet, and the parameters in the parameter space may be initialized values or randomly set values, that is, the parameters corresponding to each of the plurality of first sub-network structures are initialized values or randomly set values. It should be noted that sub-networks may be sampled from the search network by random sampling according to an existing method, and the embodiment of the present application is not limited thereto.
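As a hedged illustration of this sampling step, the sketch below assumes that a sub-network structure can be encoded as a binary mask over the candidate connections of the search network (in line with the binary parameter C_i described later); the function and argument names are hypothetical.

import numpy as np

def initialize_subnetwork_set(num_connections, set_size, rng=None):
    # Sample an initial sub-network set from the structure space: each first
    # sub-network structure is a random binary mask over the candidate connections
    # of the search network, and the kept connections share parameters in W.
    rng = rng if rng is not None else np.random.default_rng()
    return [rng.integers(0, 2, size=num_connections) for _ in range(set_size)]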
When this step corresponds to an update process of the sub-network set, the updated sub-network set obtained in the previous round of the sub-network set update process may be determined as the sub-network set in the current round (i.e., the pre-update sub-network set in the current round). Alternatively, a part of the sub-network structures may be selected from the plurality of sub-network structures included in the updated sub-network set obtained in the previous round to form the sub-network set in the current round, that is, the sub-network set in the current round includes part of the sub-network structures in the updated sub-network set obtained in the previous round.
For convenience of understanding, the following descriptions of the processes are all premised on the determination of the completed search network and the initialization of the set of subnetworks, and the updating process of the parameter space and the updating process of the set of subnetworks are alternately performed.
Updating of parameter space
In the current round of parameter space updating process, step 1002, updating the parameter space according to the training data of the target task to obtain an updated parameter space, including: the parameter space is updated according to the training data of the target task and the subnetwork set (i.e. the subnetwork set determined in the updating process of the subnetwork set in the step 1003) to obtain an updated parameter space.
Optionally, the parameter space may be updated based on a gradient back-propagation algorithm. Specifically, the process of updating the parameter space based on the gradient back-propagation algorithm consists of a forward propagation process and a back propagation process. In the forward propagation process, the training data of the target task is input into each sub-network structure included in the sub-network set, processed layer by layer through the input layer and the hidden layers, and transmitted to the output layer. If the expected output value cannot be obtained at the output layer, an objective function is taken and back propagation is performed. The choice of objective function corresponds to the target task; for example, when the target task is a classification task, the cross entropy between the output and the expectation may be taken as the objective function, and for a regression problem, the sum-of-squared-errors loss between the output and the expectation may be taken as the objective function. The partial derivative of the objective function with respect to each neuron weight (each neuron weight belongs to a parameter in the parameter space) is obtained layer by layer to form the gradient of the objective function with respect to the weight vector, namely the parameter gradient corresponding to each sub-network structure. The gradient of the parameter space can then be obtained from the parameter gradients corresponding to the sub-network structures, and this gradient is used as the basis for modifying the weights (that is, the basis for updating the parameter space); the learning of the network is completed in the weight modification process. When the error reaches the expected value or a certain number of iterations is reached, the network learning ends, the weight modification process ends, and the update of the parameter space is completed.
In the embodiment of the present application, the gradient of the parameter space of the search network may be determined in various ways.
As a possible implementation, the gradient of the parameter space may be the average value of the parameter gradients corresponding to the plurality of first sub-network structures included in the sub-network set. That is, the gradient of the parameter space may be obtained by summing the parameter gradients corresponding to all of the first sub-network structures included in the sub-network set and then averaging.
It should be understood that, in the above possible implementation manner, when the parameter space is updated, the parameters corresponding to the plurality of first sub-network structures included in the sub-network set in the parameter space are updated.
As another possible implementation manner, the gradient of the parameter space may be the average value of the parameter gradients corresponding to some of the first sub-network structures included in the sub-network set. That is, a part of the first sub-network structures is selected from the plurality of first sub-network structures included in the sub-network set, and the parameter gradients corresponding to this part of the first sub-network structures are summed and then averaged to obtain the gradient of the parameter space.
It should be understood that, in the above possible implementation manner, when the parameter space is updated, the parameters corresponding to a part of the first sub-network structures in the plurality of first sub-network structures included in the sub-network set in the parameter space are updated.
It should be noted that, in the embodiment of the present application, the gradient of the parameter corresponding to the sub-network structure has the same meaning as the gradient expression of the sub-network, where the sub-network includes the sub-network structure and the parameter corresponding to the sub-network structure.
In the embodiment of the application, directly summing the parameter gradients corresponding to all the first sub-network structures in the sub-network set may bring a large calculation burden; instead, the parameter gradients corresponding to a small batch (mini-batch) of first sub-network structures may be summed, which reduces the calculation burden and increases the update speed of the parameter space, thereby increasing the search speed of the neural network structure and improving the search efficiency of the neural network structure.
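The following sketch illustrates, under assumptions, how the shared parameter space could be updated with the averaged gradients of a mini-batch of sub-networks. It assumes PyTorch and that each sub-network's forward pass uses the shared parameter tensors; the names subnets, shared_params, and mini_batch are introduced here for illustration only.

import random
import torch

def update_parameter_space(subnets, shared_params, batch, loss_fn, lr=0.01, mini_batch=4):
    # One update of the shared parameter space W: average the parameter gradients of
    # a mini-batch of first sub-network structures rather than of all of them.
    inputs, labels = batch
    sampled = random.sample(subnets, min(mini_batch, len(subnets)))
    grads = [torch.zeros_like(p) for p in shared_params]
    for subnet in sampled:
        outputs = subnet(inputs)                 # forward propagation through this structure
        loss = loss_fn(outputs, labels)
        subnet_grads = torch.autograd.grad(loss, shared_params, allow_unused=True)
        for acc, g in zip(grads, subnet_grads):
            if g is not None:                    # parameters unused by this structure get no gradient
                acc += g
    with torch.no_grad():
        for p, g in zip(shared_params, grads):
            p -= lr * g / len(sampled)           # gradient of W = average of the sub-network gradients
    return shared_params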
Updating of a set of subnetworks
In the embodiment of the present application, updating the sub-network set may be understood as replacing the sub-network structure included in the sub-network set, or may be understood as re-determining or selecting the sub-network structure included in the sub-network set. In the embodiment of the present application, the subnet set is maintained, and therefore, updating the subnet set can also be understood as optimizing the subnet structure in the subnet set.
In the updating process of the current round of the sub-network sets, after the sub-network sets are determined in step 1003, the parameter space of the current round is updated according to the sub-network sets, so that when the current round of the sub-network sets is updated, the parameter space is already updated, and the updated parameter space is obtained.
Optionally, the plurality of first sub-network structures included in the sub-network set may be processed based on any one of a particle swarm algorithm, a genetic algorithm, and a firework algorithm to obtain next-generation sub-network structures, where the plurality of first sub-network structures included in the sub-network set may be referred to as parent sub-network structures or current-generation sub-network structures. A plurality of sub-network structures may then be selected from the current-generation sub-network structures and the next-generation sub-network structures to form the updated sub-network set, that is, the plurality of second sub-network structures included in the updated sub-network set are selected from the current-generation sub-network structures and the next-generation sub-network structures. It should be understood that both the current-generation sub-network structures and the next-generation sub-network structures belong to the structure space determined in step 1001, i.e., the plurality of second sub-network structures included in the updated sub-network set belong to the structure space determined in step 1001. It should also be understood that the next generation comprises at least one sub-network structure.
It should be understood that the embodiments of the present application may also obtain the next generation subnetwork structure based on other evolutionary algorithms, and the embodiments of the present application are not listed.
Specifically, taking any one evolutionary algorithm as an example, next-generation sub-network structures may be generated by performing crossover and mutation operations on the plurality of first sub-network structures included in the sub-network set, where the plurality of first sub-network structures are the current-generation sub-network structures; the fitness of the current-generation sub-network structures and the next-generation sub-network structures, such as the parameter quantity and the recognition rate, is evaluated according to the verification data of the target task; and a plurality of sub-network structures are selected from the current-generation sub-network structures and the next-generation sub-network structures according to the evaluation result to form the updated sub-network set, where the plurality of second sub-network structures included in the updated sub-network set are selected from the current-generation sub-network structures and the next-generation sub-network structures.
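As a non-authoritative sketch of the crossover and mutation step, assuming again that each sub-network structure is encoded as a binary connection mask (the names parent_masks, t, and mutation_rate are illustrative assumptions):

import numpy as np

def generate_offspring(parent_masks, t=1.0, mutation_rate=0.05, rng=None):
    # Produce next-generation sub-network structures from the current generation by
    # crossover and mutation of their binary connection masks.
    rng = rng if rng is not None else np.random.default_rng()
    num_offspring = int(t * len(parent_masks))
    offspring = []
    for _ in range(num_offspring):
        a, b = rng.choice(len(parent_masks), size=2, replace=False)
        take_from_a = rng.integers(0, 2, size=parent_masks[a].shape).astype(bool)
        child = np.where(take_from_a, parent_masks[a], parent_masks[b])   # uniform crossover
        flip = rng.random(child.shape) < mutation_rate
        child = np.where(flip, 1 - child, child)                          # mutation: flip a few bits
        offspring.append(child)
    return offspring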
There are various methods of selecting the plurality of second sub-network structures.
As a possible implementation manner, sub-network structures that perform better on the training data of the target task may be selected from the current-generation sub-network structures and the next-generation sub-network structures according to any one of the metric targets, such as the parameter quantity, network running time, running memory required by the network, network computation amount, energy consumption, number of floating-point operations, and accuracy. For example, sub-network structures with high accuracy may be selected from the current-generation sub-network structures and the next-generation sub-network structures; or sub-network structures with a small parameter quantity may be selected from the current-generation sub-network structures and the next-generation sub-network structures, and so on.
As another possible implementation, the plurality of second sub-network structures may be selected from the current-generation sub-network structures and the next-generation sub-network structures based on a non-dominated sorting algorithm, such as NSGA-III. Specifically, based on the non-dominated sorting algorithm NSGA-III, the current-generation sub-network structures and the next-generation sub-network structures may be ranked (or layered) according to the dominance relationship between individuals with respect to a plurality of metric targets. Specifically, all non-dominated individuals in the population (i.e., the current-generation sub-network structures and the next-generation sub-network structures in the embodiment of the present application) are found and given a shared virtual fitness value to obtain the first level (or the first non-dominated optimal layer); this group of layered individuals is then ignored, the other individuals in the population continue to be layered according to the dominance and non-dominance relationships and are given a new virtual fitness value that is smaller than the value of the previous layer; and the above operations continue for the remaining individuals until all individuals in the population are layered. The sub-network structures in a preceding level are not dominated by the following levels; for example, the sub-network structures in the first level are not dominated by other levels, and the sub-network structures in the second level are not dominated by the third level and below, i.e., there will not be sub-network structures in the following levels that perform better than the sub-network structures in a preceding level. For example, taking accuracy and parameter quantity into account, there will not be a sub-network structure in a later level that has the same accuracy but a smaller parameter quantity than a sub-network structure in an earlier level. Thus, the plurality of second sub-network structures may be selected from the preceding levels.
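The layering described above can be illustrated with a minimal non-dominated sorting sketch; the metrics callback and the choice of objectives are assumptions, and all objectives are treated as values to be minimized.

def non_dominated_sort(candidates, metrics):
    # Rank candidates into levels by Pareto dominance. metrics(c) returns a tuple of
    # objective values for candidate c, all to be minimized (e.g. parameter count
    # and error rate = 1 - accuracy).
    scores = [metrics(c) for c in candidates]

    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    remaining = list(range(len(candidates)))
    levels = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)]
        levels.append([candidates[i] for i in front])    # the non-dominated layer
        remaining = [i for i in remaining if i not in front]
    return levels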
Because the parameters in the parameter space of the search network are not yet sufficiently trained, the performance of a sub-network structure during the search process does not completely match the performance of the final network structure, so a small model with higher accuracy may dominate a large model whose accuracy is temporarily lower. Even though these large models may eventually achieve higher performance, they may be discarded prematurely during the search process. Therefore, to avoid large models being eliminated too early, the large models can be protected during the search process.
As yet another possible implementation, the plurality of second sub-network structures may be selected from the current-generation sub-network structures and the next-generation sub-network structures based on a non-dominated sorting algorithm, such as the improved non-dominated sorting algorithm of the embodiments of the present application (denoted pNSGA-III for convenience of description).
Specifically, based on a non-dominated sorting algorithm, a plurality of first sub-network structures and a next generation sub-network structure are graded according to a first metric target to obtain a first grading result; based on a non-dominated sorting algorithm, carrying out hierarchical classification on the plurality of first sub-network structures and the next generation sub-network structure according to a second metric target to obtain a second hierarchical classification result; respectively merging the sub-network structures included in the same grade in the first-time grading result and the second-time grading result to obtain merged grading results; and selecting a plurality of sub-network structures from the merged grading result as a plurality of second sub-network structures.
In other words, based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures can be ranked according to the dominance relationship between individuals with respect to the first metric target to obtain a first ranking result; based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures can be ranked according to the dominance relationship between individuals with respect to the second metric target to obtain a second ranking result. The sub-network structures included in the first level of the first ranking result and the sub-network structures included in the first level of the second ranking result are merged to obtain a merged first level; similarly, the sub-network structures included in the second level of the first ranking result and the sub-network structures included in the second level of the second ranking result are merged to obtain a merged second level, and the other identical levels are merged in the same way, finally obtaining a merged ranking result. A plurality of sub-network structures are then selected from the merged ranking result as the plurality of second sub-network structures; for example, the plurality of second sub-network structures are selected from the front levels of the merged ranking result.
Alternatively, the sub-network structures in the previous levels may be directly selected from the first-time ranking result and the second-time ranking result without merging the first-time ranking result and the second-time ranking result, and the sub-network structures selected from the first-time ranking result and the second-time ranking result may be used as the plurality of second sub-network structures included in the updated sub-network set.
Specifically, based on a non-dominated sorting algorithm, a plurality of first sub-network structures and a next generation sub-network structure are graded according to a first metric target to obtain a first grading result; based on a non-dominated sorting algorithm, carrying out hierarchical classification on the plurality of first sub-network structures and the next generation sub-network structure according to a second metric target to obtain a second hierarchical classification result; and selecting a plurality of sub-network structures from the previous levels in the first-time grading result and the second-time grading result to be a plurality of second sub-network structures.
When the non-dominated sorting algorithm improved by the embodiment of the application selects a plurality of second sub-network structures from the current generation sub-network structure and the next generation sub-network structure, the protection of some models which have poor temporary performance but may obtain better performance in the future can be realized, and the models are prevented from being dominated and discarded prematurely in the early stage of searching, so that the coverage range of the sub-network set can be expanded.
Optionally, the first metric target may include at least one of a parameter amount, a network runtime, a network required running memory, a network computation amount, an energy consumption, a number of floating point operations, and an accuracy rate, and the second metric target may include at least one of a parameter amount, a network runtime, a network required running memory, a network computation amount, an energy consumption, a number of floating point operations, and an increase rate of the accuracy rate.
For ease of understanding, take the case where the first metric target includes the parameter quantity and the accuracy rate, and the second metric target includes the parameter quantity and the growth rate of the accuracy rate. Based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures are ranked according to the dominance relationship between individuals with respect to the parameter quantity and the accuracy rate, so as to obtain a first ranking result. Based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures are ranked according to the dominance relationship between individuals with respect to the parameter quantity and the growth rate of the accuracy rate, so as to obtain a second ranking result. The sub-network structures included in the same level of the first ranking result and the second ranking result are merged respectively to obtain a merged ranking result. The sub-network structures in the front levels are then selected from the merged ranking result, that is, sub-network structures with fewer parameters or higher accuracy and sub-network structures with more parameters or higher accuracy are selected, so that in the later levels there is no sub-network structure whose accuracy is higher and whose parameter quantity is smaller than those of the selected sub-network structures, and no sub-network structure whose accuracy is higher and whose parameter quantity is larger than those of the selected sub-network structures. That is, the first metric target selects networks with few parameters and low accuracy as well as networks with medium parameters and slightly higher accuracy, and the second metric target selects networks with many parameters and low accuracy (which may have a large room for subsequent improvement) as well as networks with medium parameters and slightly higher accuracy.
Alternatively, again taking the case where the first metric target includes the parameter quantity and the accuracy rate, and the second metric target includes the parameter quantity and the growth rate of the accuracy rate: based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures are ranked according to the dominance relationship between individuals with respect to the parameter quantity and the accuracy rate, so as to obtain a first ranking result; based on the non-dominated sorting algorithm, the current-generation sub-network structures and the next-generation sub-network structures are ranked according to the dominance relationship between individuals with respect to the parameter quantity and the growth rate of the accuracy rate, so as to obtain a second ranking result. Sub-network structures in the front levels are selected from the first ranking result, that is, sub-network structures with fewer parameters or higher accuracy, such that no sub-network structure in the later levels has higher accuracy and fewer parameters than the selected sub-network structures; at the same time, sub-network structures in the front levels are selected from the second ranking result, that is, sub-network structures with more parameters or higher accuracy, such that no sub-network structure in the later levels has higher accuracy and more parameters than the selected sub-network structures. The sub-network structures selected from the first ranking result and the sub-network structures selected from the second ranking result are then merged to serve as the second sub-network structures in the updated sub-network set. In this way, it can be ensured that, during the search process, both sub-network structures with fewer parameters or higher accuracy and sub-network structures with more parameters or higher accuracy are retained, so that among the remaining sub-network structures there is neither a sub-network structure with higher accuracy and fewer parameters than the selected ones, nor a sub-network structure with higher accuracy and more parameters than the selected ones. This expands the covered range of parameter quantities and helps the search finally find sub-network structures with many parameters and high accuracy as well as sub-network structures with few parameters and high accuracy, so that a target neural network with higher accuracy and a target neural network with fewer parameters can both be determined, and the small-model-trap problem can be effectively alleviated.
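A simplified, non-authoritative sketch of this two-ranking selection idea is given below; it reuses the non_dominated_sort sketch above, and first_metrics, second_metrics, and set_size are hypothetical names. It illustrates the level-by-level merge of the two rankings rather than the exact pNSGA-III procedure of the embodiment.

from itertools import zip_longest

def select_second_subnetworks(candidates, first_metrics, second_metrics, set_size):
    # Rank the candidates twice, once per metric target, merge the same levels of the
    # two rankings, and fill the updated sub-network set from the front levels.
    levels_1 = non_dominated_sort(candidates, first_metrics)    # e.g. (params, 1 - accuracy)
    levels_2 = non_dominated_sort(candidates, second_metrics)   # e.g. (params, -accuracy growth rate)
    selected, seen = [], set()
    for lvl_1, lvl_2 in zip_longest(levels_1, levels_2, fillvalue=[]):
        for cand in list(lvl_1) + list(lvl_2):                  # merge the same level of both rankings
            if id(cand) not in seen:
                seen.add(id(cand))
                selected.append(cand)
            if len(selected) == set_size:
                return selected
    return selected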
In addition, in the process of updating the sub-network set, when the fitness of the current generation sub-network structure and the fitness of the next generation sub-network structure are evaluated according to the verification data of the target task after the next generation sub-network structure is generated, the verification data of the target task needs to be input into the current generation sub-network structure and the next generation sub-network structure respectively, and an output result is obtained. It is mentioned above that in the current round of updating the set of subnetworks, the parameter space has already been updated, and what is obtained is the updated parameter space. Therefore, the parameters corresponding to the current generation sub-network structure and the next generation sub-network structure are parameters in the updated parameter space, that is, the parameters corresponding to the current generation sub-network structure are updated, and the parameters corresponding to the next generation sub-network structure inherit the parameters trained in the previous round. In this way, the training process of the current generation sub-network structure and the next generation sub-network structure is not trained from the beginning after randomly initializing the parameters, but inherits the parameters which are trained before, so that the search efficiency can be effectively improved.
1005. And determining a plurality of target neural networks corresponding to the target tasks according to the updated parameter space and the updated sub-network set.
In this step, the updated parameter space may be understood as the parameter space in which the last update is completed, and the updated sub-network set may be understood as the sub-network set in which the last update is completed.
The manner in which the plurality of target neural networks corresponding to the target task are determined may be varied.

As an example, part or all of the second sub-network structures included in the updated sub-network set may be selected as the target neural networks corresponding to the target task, where the parameter corresponding to each of the plurality of second sub-network structures included in the updated sub-network set belongs to the parameters in the updated parameter space.
As still another example, a part or all of the plurality of second sub-network structures included in the update sub-network set may be selected to be widened or expanded to obtain a target neural network corresponding to the target task, where a parameter corresponding to each of the plurality of second sub-network structures included in the update sub-network set belongs to a parameter in the update parameter space.
As another example, a part or all of the second sub-network structures may be selected from the plurality of second sub-network structures included in the update sub-network set, and training data of a larger scale or more target tasks is input to perform training, so as to obtain a target neural network corresponding to the target task, where a parameter corresponding to each of the plurality of second sub-network structures included in the update sub-network set belongs to a parameter in the update parameter space.
According to the neural network structure search method of the embodiment of the present application, one sub-network set is maintained during the search process, so a plurality of target neural networks can be obtained in a single search, and the plurality of sub-network structures included in the sub-network set share the parameters in the parameter space. This increases the search speed and improves the search efficiency, so that a plurality of target neural networks can be obtained efficiently from one search. The plurality of target neural networks can have different attributes such as parameter quantity, network running time, running memory required by the network, network computation amount, energy consumption, number of floating-point operations, and accuracy, thereby providing models that can satisfy different computation and/or storage resource constraints, and a user can select the corresponding target neural network according to the resource constraints in practical application.
Optionally, after step 1005, the method for searching a neural network structure according to the embodiment of the present application further includes: acquiring target task data to be processed; selecting a first target neural network from the plurality of target neural networks according to computing resources and/or storage resources of the terminal device; and processing the target task data to be processed according to the first target neural network to obtain a processing result corresponding to the target task.
In the embodiment of the application, the target neural network matched with the resource constraint condition of the terminal device can be selected from the models with different sizes and accuracies according to the computing resource and/or the storage resource of the terminal device, so that the applicable terminal device and application scenarios are wider.
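As an illustrative sketch of such resource-constrained selection (the dictionary keys memory_mb, latency_ms, and accuracy are assumptions about how the attributes of each target neural network might be recorded):

def select_target_network(target_networks, memory_budget_mb, latency_budget_ms):
    # Pick, from the searched target neural networks, the most accurate one whose
    # running memory and running time fit the terminal device's resources.
    feasible = [n for n in target_networks
                if n["memory_mb"] <= memory_budget_mb and n["latency_ms"] <= latency_budget_ms]
    if not feasible:
        return min(target_networks, key=lambda n: n["memory_mb"])   # fall back to the smallest model
    return max(feasible, key=lambda n: n["accuracy"])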
For better understanding of the searching method of the neural network structure according to the embodiment of the present application, the following briefly introduces an overall process of the searching method of the neural network structure according to the embodiment of the present application with reference to fig. 11. Taking a target task as an image classification as an example, an overall process of the neural network structure searching method provided by the embodiment of the present application is shown in fig. 11.
The neural network structure search method in the embodiment of the application can be deployed on a computing node of a server, and the target neural network obtained by the search method can be deployed on terminals such as a mobile phone terminal, a monitoring device, an autonomous-driving vehicle, or any other device that can deploy the target neural network.
At 1101, a search network and training data are determined based on the target task and used as input to a neural network structure search device, such as a server.
The search network in the embodiment of the present application may be a super network (supernet), in which a large number of connection modes and different operations are included, the search network includes a structure space and a parameter space, the structure space includes a plurality of sub-network structures, and the parameter space includes a plurality of parameters, where each sub-network structure corresponds to at least one parameter in the parameter space. Different connections and different operations connected may constitute a sub-network structure.
For convenience of understanding, in the embodiment of the present application, the search network is denoted as N, the parameter space is denoted as W, and the maintained sub-network set includes N sub-network structures, denoted as {N_1, N_2, …, N_N}. In the sub-network set initialization process, the N sub-network structures may be sampled from the search network N, and the N sub-network structures share the parameters in the parameter space W.
The training data is image data corresponding to the picture classification task, such as pictures.
At 1102, a parameter space update of the network is searched.
In this step, a gradient back propagation algorithm may be used to update the parameter space W of the search network with the subnetwork structure in the subnetwork set.
Specifically, the process of updating the parameter space based on the gradient back-propagation algorithm consists of a forward propagation process and a back propagation process. In the forward propagation process, the training data is input into each sub-network structure included in the sub-network set, passes from the input layer through the hidden layers, is processed layer by layer, and is transmitted to the output layer. If the expected output value cannot be obtained at the output layer, the cross entropy between the output and the expectation is taken as the objective function and back propagation is performed. The partial derivative of the objective function with respect to each neuron weight (each neuron weight belongs to a parameter in the parameter space W) is calculated layer by layer to form the gradient of the objective function with respect to the weight vector, namely the parameter gradient corresponding to each sub-network structure. The gradient of the parameter space W of the search network N can be obtained from the parameter gradients corresponding to the sub-network structures, and this gradient is used as the basis for modifying the weights (that is, the basis for updating the parameter space); the learning of the network is completed in the weight modification process. When the error reaches the expected value, the network learning ends, the weight modification process ends, and the update of the parameter space is completed.
For example, the gradient of the parameter space W may be the average value of the parameter gradients corresponding to the N sub-network structures included in the sub-network set, and can be calculated by the following formulas.

For a sub-network structure $N_i$ of the N sub-network structures, the corresponding parameters may be expressed as

$$W_i = W \odot C_i$$

where $W_i$ is the parameter corresponding to the sub-network structure $N_i$, $C_i$ is a binary parameter (taking values in $\{0,1\}$) indicating whether the corresponding connection is reserved, and $i$ ranges over the positive integers from 1 to N.

For the sub-network structure $N_i$ with an output loss of $L_i$, the gradient of the parameter $W_i$ corresponding to the sub-network structure $N_i$ is

$$\frac{\partial L_i}{\partial W_i}$$

The gradient of the parameter space W of the search network N is the average of the parameter gradients corresponding to the N sub-network structures:

$$\nabla W = \frac{1}{N}\sum_{i=1}^{N}\frac{\partial L_i}{\partial W_i}$$

For example, the gradient of the parameter space W may instead be the average value of the parameter gradients corresponding to M sub-network structures of the N sub-network structures included in the sub-network set, which can be calculated by the following formula:

$$\nabla W \approx \frac{1}{M}\sum_{j=1}^{M}\frac{\partial L_j}{\partial W_j}$$

where M is a positive integer greater than or equal to 1 and less than N.
In the embodiment of the application, directly summing the parameter gradients corresponding to all N sub-network structures in the sub-network set may bring a large calculation burden; summing the parameter gradients corresponding to a small batch (mini-batch) of M first sub-network structures can reduce the calculation burden and increase the update speed of the parameter space, thereby increasing the search speed of the neural network structure and improving the search efficiency of the neural network structure.
At 1103, the set of subnetworks is updated.
In this step, the sub-network set is updated by using an evolutionary algorithm as an example. The sub-network set comprising N sub-network structures is denoted as {N_1, N_2, …, N_N}, and the P metric targets adopted in the update process of the sub-network set are denoted as {F_1, F_2, …, F_P}. The P metric targets may include any one or more of the parameter quantity, network running time, running memory required by the network, network computation amount, energy consumption, number of floating-point operations, and accuracy.
Specifically, based on an evolutionary algorithm, the next-generation t×N sub-network structures may be generated by performing crossover and mutation operations on the N sub-network structures included in the sub-network set, where t is greater than 0 and may be set according to the target task or chosen based on experience.
The fitness, such as the parameter quantity and the recognition rate, of the (t+1)×N sub-network structures consisting of the parent samples (namely the N sub-network structures) and the child samples (namely the next-generation t×N sub-network structures) is then evaluated according to the verification data.
The improved non-dominated sorting algorithm proposed by the embodiment of the present application is used to select N sub-network structures from the (t+1)×N sub-network structures containing the parent samples and child samples as a new solution set, i.e., as the updated sub-network set.
It should be understood that, in this step, other numbers of sub-network structures may be selected from the (t +1) × N sub-network structures including the parent sample and the child sample, and the embodiment of the present application is not limited thereto.
Taking the measurement target including the parameter number and the accuracy as an example, a specific process applying the improved non-dominated sorting algorithm proposed by the embodiment of the present application may be as follows:
based on a non-domination sorting algorithm, carrying out hierarchical classification on (t +1) xN sub-network structures containing parent samples and child samples according to the domination relation among individuals according to the parameter number and the accuracy rate to obtain a first hierarchical classification result;
based on a non-domination sorting algorithm, carrying out level division on (t +1) x N sub-network structures containing parent samples and child samples according to the domination relation between individuals according to the increase rate of the parameter quantity and the accuracy rate to obtain a second level division result;
selecting the sub-network structures in the former level from the first-time grading result, namely selecting the sub-network structures with less parameter quantity or higher accuracy, wherein the sub-network structures which are higher in accuracy than the selected sub-network structures and less in parameter quantity than the selected sub-network structures do not exist in the later level;
selecting the sub-network structures in the previous level from the second-time grading result, namely selecting the sub-network structures with more parameter quantity or higher accuracy, wherein the sub-network structures which are higher in accuracy than the selected sub-network structures and have more parameter quantity than the selected sub-network structures do not exist in the next level;
merging the selected sub-network structures from the first ranking result and the selected sub-network structures from the second ranking result to form a new solution set {N_1, N_2, …, N_N}.
In this way, it can be ensured that, during the search process, both sub-network structures with fewer parameters or higher accuracy and sub-network structures with more parameters or higher accuracy are retained, so that among the remaining sub-network structures there is neither a sub-network structure with higher accuracy and fewer parameters than the selected ones, nor a sub-network structure with higher accuracy and more parameters than the selected ones. This expands the covered range of parameter quantities and helps the search finally find sub-network structures with many parameters and high accuracy as well as sub-network structures with few parameters and high accuracy, so that a target neural network with higher accuracy and a target neural network with fewer parameters can both be determined, and the small-model-trap problem can be effectively alleviated.
After step 1103, step 1102 may be executed again, and in step 1102 executed again, the set of subnets used when updating the parameter space is the updated set of subnets obtained in step 1103 last time, that is, the new solution set. And the parameters corresponding to the sub-network structures in the new solution set used are the parameters in the update parameter space.
Similarly, after the step 1102 is executed again, the step 1103 may be executed again, and in the step 1103 executed again, the parameters corresponding to the sub-network structures used when the sub-network set is updated are the parameters in the parameter space updated again after the step 1102 is executed again. Step 1102 and step 1103 are performed alternately, that is, the parameter space update of the search network and the sub-network set update are performed alternately.
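A minimal sketch of this alternation, with the concrete update steps supplied by the caller (the callable names are hypothetical and the sketch is not the claimed procedure):

def alternate_search(update_parameters, update_subnet_set, initial_set, stop_condition, max_rounds=100):
    # Alternate step 1102 (parameter-space update) and step 1103 (sub-network set
    # update) until a preset condition is met, then return the updated sub-network set.
    subnet_set = initial_set
    for round_idx in range(max_rounds):
        update_parameters(subnet_set)                 # step 1102: uses the current sub-network set
        subnet_set = update_subnet_set(subnet_set)    # step 1103: yields the updated set
        if stop_condition(subnet_set, round_idx):
            break
    return subnet_set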
At 1104, after a plurality of times of alternate updates, when the subnet structures in the updated subnet set satisfy a preset condition, a plurality of target neural network models may be output, for example, N subnet structures may be obtained, where the N subnet structures respectively have different accuracies and attributes, such as parameters, network running time, running memory required by the network, network computation amount, energy consumption, and floating point operation number.
In 1105, in practical applications, a target neural network model may be selected from a plurality of target neural network models according to hardware resource limitations and deployed on a terminal, and a target task, such as image classification, is performed by using the selected target neural network model, and a corresponding classification result is output.
In order to illustrate the effect of the neural network structure search method according to the embodiment of the present application, the following compares the neural network structure search method according to the embodiment of the present application with a conventional scheme.
Fig. 12 shows a schematic diagram of verification effects of the network structures searched when the improved non-dominated sorting algorithm pNSGA-iii is adopted in the neural network structure searching method according to the embodiment of the present application and when the conventional non-dominated sorting algorithm NSGA-iii is adopted in the conventional scheme in the data set CIFAR-10 under similar constraints, and table 1 shows statistics of verification results of the network structures searched by the conventional method and the method of the present application on the data set CIFAR-10.
TABLE 1
CARS-A to CARS-I in Table 1 are network structures obtained by searching with the improved non-dominated sorting algorithm pNSGA-III, and the other network structures are obtained by searching with the conventional non-dominated sorting algorithm NSGA-III. As can be seen from Table 1 and fig. 12, the network structures obtained on CIFAR-10 by the neural network structure search method provided in the embodiment of the present application can cover the parameter range of 2.4M to 3.6M, and network structures under different computation constraints can be obtained in a single search for the user to select from. Compared with the conventional method, under the condition of the same parameter quantity, the method provided in the present application can obtain a network structure with better performance, and the search time only needs 0.4 GPU days, which greatly reduces the search overhead.
Fig. 13 is a schematic diagram illustrating a verification effect of a network structure searched by the neural network structure searching method in the embodiment of the present application on a data set ImageNet, and table 2 shows statistics of verification results of a network structure searched by a conventional method and the method of the present application on the data set ImageNet.
TABLE 2
CARS-A to CARS-I in Table 2 are network structures obtained by searching with the neural network structure search method provided in the present application, and the other network structures are obtained by searching with conventional methods. As can be seen from Table 2 and fig. 13, the network structures obtained on the data set ImageNet by the neural network structure search method provided in the embodiment of the present application can cover the parameter range of 3.7M to 5.1M and the computation range from 430-. Compared with the conventional method, the network structures obtained by the method provided in the present application perform better under the condition of the same parameter quantity.
The embodiment of the application also provides a searching method of the neural network structure. The method can be executed by a neural network structure searching device, and the neural network structure searching device can be equipment with enough computing power to realize searching of a neural network structure, such as a computer, a server, cloud equipment and the like.
Step one, determining a search network according to a target task, wherein the search network comprises a structural space, and the structural space comprises a plurality of sub-network structures.
In this step, the manner of determining a search network according to the target task is similar to the manner mentioned in step 1001, and specific reference is made to the above description, which is not repeated for brevity.
The search network in the embodiment of the present application may include basic modules for constructing a neural network, such as convolution, a preset basic operation or a combination of basic operations adapted to the application requirement of the target neural network, where these basic operations or combinations of basic operations may be collectively referred to as basic operations. The size, depth, width, connection mode, etc. of the search network can be designed manually.
The search network includes a structure space that includes a plurality of sub-network structures. The structural space in the embodiment of the present application includes all network architectures that can be characterized in principle, and may also be understood as a structural space that is a complete set of network architectures, where each possible network architecture may be referred to as a sub-network architecture. The sub-network structure comprises all or part of basic modules and all or part of basic operations in the search network, and different sub-network structures comprise different types and numbers of the basic modules and the basic operations.
And secondly, determining a sub-network set from the search network, wherein the sub-network set comprises a plurality of first sub-network structures in the plurality of sub-network structures.
Determining the set of subnetworks from the search network may include an initialization process of the set of subnetworks and an update process of the set of subnetworks.
When this step corresponds to the initialization process of the sub-network set, the sub-network structures included in the sub-network set can be sampled from the structure space. The parameter corresponding to each of the plurality of first sub-network structures may be an initialized value or a randomly set value. It should be noted that sub-networks may be sampled from the search network by random sampling according to an existing method, and the embodiment of the present application is not limited thereto.
When this step corresponds to an update process of the sub-network set, the updated sub-network set obtained in the previous round of the sub-network set update process may be determined as the sub-network set in the current round (i.e., the pre-update sub-network set in the current round). Alternatively, a part of the sub-network structures may be selected from the plurality of sub-network structures included in the updated sub-network set obtained in the previous round to form the sub-network set in the current round, that is, the sub-network set in the current round includes part of the sub-network structures in the updated sub-network set obtained in the previous round.
And step three, updating the sub-network set to obtain an updated sub-network set, wherein the updated sub-network set comprises a plurality of second sub-network structures in the plurality of sub-network structures.
The process of updating the sub-network set may be similar to the process of updating the sub-network set in step 1004, and reference may be made to the related description above. The difference from the process described above is that, in this embodiment of the present application, the process of updating the sub-network set does not involve operations on a parameter space, and the parameters corresponding to the first sub-network structures in the sub-network set may be values that are re-initialized at each training.
Optionally, updating the set of subnetworks to obtain an updated set of subnetworks may include:
processing the plurality of first sub-network structures based on an evolutionary algorithm, such as any one of a particle swarm algorithm, a genetic algorithm, and a firework algorithm, so as to obtain next-generation sub-network structures; ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structures according to a first metric target to obtain a first ranking result; ranking, based on the non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structures according to a second metric target to obtain a second ranking result; merging the sub-network structures included in the same level of the first ranking result and the second ranking result respectively to obtain a merged ranking result; and selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
Optionally, updating the set of subnetworks to obtain an updated set of subnetworks may include:
based on a non-dominated sorting algorithm, carrying out hierarchical classification on a plurality of first sub-network structures and next generation sub-network structures according to a first metric target to obtain a first hierarchical classification result; based on a non-dominated sorting algorithm, carrying out hierarchical classification on the plurality of first sub-network structures and the next generation sub-network structure according to a second metric target to obtain a second hierarchical classification result; and selecting a plurality of sub-network structures from the previous levels in the first-time grading result and the second-time grading result to be a plurality of second sub-network structures.
Optionally, the first metric target may include at least one of a parameter amount, a network runtime, a network required running memory, a network computation amount, an energy consumption, a number of floating point operations, and an accuracy rate, and the second metric target may include at least one of a parameter amount, a network runtime, a network required running memory, a network computation amount, an energy consumption, a number of floating point operations, and an increase rate of the accuracy rate.
When the non-dominated sorting algorithm improved by the embodiment of the application selects a plurality of second sub-network structures from the current generation sub-network structure and the next generation sub-network structure, the protection of some models which have poor temporary performance but may obtain better performance in the future can be realized, and the models are prevented from being dominated and discarded prematurely in the early stage of searching, so that the coverage range of the sub-network set can be expanded.
It should be understood that the above description of the improved non-dominated ranking algorithm is equally applicable to the embodiments of the present application, and reference is not made to the above description where it is not done in detail.
And fourthly, determining a plurality of target neural networks corresponding to the target tasks according to the updated sub-network set.
According to the searching method of the neural network structure, a sub-network set is maintained in the searching process, a model which can meet different calculation and/or storage resource limitations can be provided through one-time searching, and a user can select a corresponding target neural network according to resource constraints in practical application.
Fig. 14 is a schematic flowchart of an image processing method according to an embodiment of the present application. It should be understood that the same definitions, explanations and extensions as described above with respect to the method shown in fig. 11 apply to the method shown in fig. 14, and the repetitive description will be appropriately omitted when describing the method shown in fig. 14. The method shown in fig. 14 may be applied to a terminal device, and includes:
2001. and acquiring an image to be processed.
2002. A first target neural network is selected from a plurality of target neural networks based on computing resources and/or storage resources of the terminal device.
2003. And classifying the images to be processed according to the first target neural network to obtain a classification result of the images to be processed.
Wherein the plurality of target neural networks are derived from the updated parameter space and the updated sub-network set; the updated parameter space is obtained by updating the parameter space included in the search network according to the training data of the image classification task, the search network corresponds to the image classification task, the parameter space includes a plurality of parameters, and the search network further includes a structure space, the structure space including a plurality of sub-network structures, where each sub-network structure corresponds to at least one parameter in the parameter space; the updated sub-network set is obtained by updating the sub-network set determined from the search network, where the sub-network set includes a plurality of first sub-network structures of the plurality of sub-network structures, and the updated sub-network set includes a plurality of second sub-network structures of the plurality of sub-network structures.
It should be understood that, in the method shown in fig. 14, the plurality of target neural networks are determined by maintaining one sub-network set, the plurality of sub-network structures in this sub-network set share the parameters in the parameter space, and the plurality of target neural networks are determined by updating the parameter space and the sub-network set. Thus, rather than training from scratch after randomly initializing the parameters, the training process of the sub-network structures inherits the parameters that have already been trained, i.e., the parameters in the updated parameter space, so that the search efficiency can be effectively improved and a plurality of target neural networks can be obtained in a single search. The plurality of target neural networks can have different attributes, such as size or number of floating-point operations, thereby providing models that can satisfy different computation and/or storage resource constraints, so that a user can select the corresponding target neural network according to resource constraints in practical application.
Fig. 15 is a schematic hardware configuration diagram of a neural network structure search apparatus according to an embodiment of the present application. The neural network structure search apparatus 3000 shown in fig. 15 (the apparatus 3000 may specifically be a computer device) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to each other via the bus 3004.
The memory 3001 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 3001 may store a program, and the processor 3002 is configured to perform the steps of the neural network structure searching method according to the embodiment of the present application when the program stored in the memory 3001 is executed by the processor 3002.
The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the neural network structure search method according to the embodiments of the present application.
The processor 3002 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the neural network structure searching method of the present application may be implemented by integrated logic circuits of hardware in the processor 3002 or instructions in the form of software.
The processor 3002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, performs the functions required to be performed by the units included in the neural network structure search apparatus, or performs the neural network structure search method according to the method embodiment of the present application.
The communication interface 3003 enables communication between the apparatus 3000 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver. For example, information of the target neural network to be determined and training data required in determining the target neural network may be acquired through the communication interface 3003.
The bus 3004 may include a pathway to transfer information between various components of the apparatus 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 16 is a schematic diagram of a hardware configuration of an image processing apparatus according to an embodiment of the present application. An image processing apparatus 4000 shown in fig. 16 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002 and the communication interface 4003 are communicatively connected to each other via a bus 4004.
The memory 4001 may be a ROM, a static storage device, or a RAM. The memory 4001 may store a program, and when the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to perform the steps of the image processing method according to the embodiment of the present application.
The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute a relevant program to implement the functions required to be performed by the units in the image processing apparatus according to the embodiment of the present application, or to perform the image processing method according to the method embodiment of the present application.
Processor 4002 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the image processing method according to the embodiment of the present application may be implemented by an integrated logic circuit of hardware in the processor 4002 or by instructions in the form of software.
The processor 4002 may also be a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The memory medium is located in the memory 4001, and the processor 4002 reads information in the memory 4001, and completes functions required to be executed by units included in the image processing apparatus of the embodiment of the present application in combination with hardware thereof, or executes the image processing method of the embodiment of the method of the present application.
Communication interface 4003 enables communication between apparatus 4000 and other devices or a communication network using transceiver means such as, but not limited to, a transceiver. For example, the image to be processed may be acquired through the communication interface 4003.
Bus 4004 may include a pathway to transfer information between various components of apparatus 4000 (e.g., memory 4001, processor 4002, communication interface 4003).
Fig. 17 is a schematic hardware configuration diagram of a neural network training device according to an embodiment of the present application. Similar to the apparatus 3000 and the apparatus 4000 described above, the neural network training apparatus 5000 shown in fig. 17 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004. The memory 5001, the processor 5002 and the communication interface 5003 are connected to each other via a bus 5004.
After the neural network is searched by the neural network structure search device shown in fig. 15, the neural network may be trained by the neural network training device 5000 shown in fig. 17, and the trained neural network may be used to execute the image processing method according to the embodiment of the present application.
Specifically, the apparatus shown in fig. 17 may acquire training data and a neural network to be trained from the outside through the communication interface 5003, and the processor 5002 then trains the neural network to be trained according to the training data.
It should be noted that although the above-described apparatus 3000, 4000 and 5000 merely illustrate a memory, a processor, and a communication interface, in a specific implementation, those skilled in the art will appreciate that the apparatus 3000, 4000 and 5000 may also include other devices necessary to achieve normal operation. Also, those skilled in the art will appreciate that the apparatus 3000, the apparatus 4000 and the apparatus 5000 may also comprise hardware components for performing other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that apparatus 3000, apparatus 4000 and apparatus 5000 may also include only those components necessary to implement embodiments of the present application, and need not include all of the components shown in fig. 15, 16 and 17.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (16)
1. A method for searching a neural network structure, comprising:
determining a search network according to a target task, wherein the search network comprises a structure space and a parameter space, the structure space comprises a plurality of sub-network structures, the parameter space comprises a plurality of parameters, and each sub-network structure corresponds to at least one parameter in the parameter space;
updating the parameter space according to the training data of the target task to obtain an updated parameter space;
determining a sub-network set from the search network, the sub-network set comprising a plurality of first sub-network structures of the plurality of sub-network structures;
updating the sub-network set to obtain an updated sub-network set, the updated sub-network set comprising a plurality of second sub-network structures of the plurality of sub-network structures;
and determining a plurality of target neural networks corresponding to the target tasks according to the updated parameter space and the updated sub-network set.
2. The searching method according to claim 1, wherein the updating the parameter space according to the training data of the target task to obtain an updated parameter space comprises:
inputting training data of the target task;
updating the parameter space based on a gradient back propagation algorithm, wherein the gradient of the parameter space is an average value of the parameter gradients corresponding to part of the plurality of first sub-network structures.
3. The method according to claim 1 or 2, wherein the updating the sub-network set to obtain an updated sub-network set comprises:
processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain a next-generation sub-network structure;
ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structure according to a first metric target to obtain a first ranking result;
ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structure according to a second metric target to obtain a second ranking result;
merging the sub-network structures included in the same level in the first ranking result and the second ranking result, respectively, to obtain a merged ranking result;
selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
4. The search method of claim 3, wherein the first metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an accuracy rate, and wherein the second metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an increase rate of the accuracy rate.
5. The search method according to any one of claims 1 to 4, further comprising:
acquiring target task data to be processed;
selecting a first target neural network from the plurality of target neural networks according to computing resources and/or storage resources of the terminal device;
and processing the target task data to be processed according to the first target neural network to obtain a processing result corresponding to the target task.
6. A method for searching a neural network structure, comprising:
determining a search network according to a target task, the search network comprising a structural space comprising a plurality of sub-network structures;
determining a sub-network set from the search network, the sub-network set comprising a plurality of first sub-network structures of the plurality of sub-network structures;
updating the sub-network set to obtain an updated sub-network set, the updated sub-network set comprising a plurality of second sub-network structures of the plurality of sub-network structures;
and determining a plurality of target neural networks corresponding to the target tasks according to the updated sub-network set.
7. The method of claim 6, wherein the updating the sub-network set to obtain an updated sub-network set comprises:
processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain a next-generation sub-network structure;
ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structure according to a first metric target to obtain a first ranking result;
ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structure according to a second metric target to obtain a second ranking result;
merging the sub-network structures included in the same level in the first ranking result and the second ranking result, respectively, to obtain a merged ranking result;
selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
8. The search method of claim 7, wherein the first metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an accuracy rate, and wherein the second metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an increase rate of the accuracy rate.
9. An image processing method applied to a terminal device is characterized by comprising the following steps:
acquiring an image to be processed;
selecting a first target neural network from a plurality of target neural networks according to computing resources and/or storage resources of the terminal device;
classifying the image to be processed according to the first target neural network to obtain a classification result of the image to be processed;
wherein the plurality of target neural networks are derived from an updated parameter space and an updated sub-network set, the updated parameter space is obtained by updating a parameter space included in a search network according to training data of the image classification task, the search network corresponds to the image classification task, the parameter space comprises a plurality of parameters, the search network further comprises a structure space, the structure space comprises a plurality of sub-network structures, each sub-network structure corresponds to at least one parameter in the parameter space, the updated sub-network set is obtained by updating a sub-network set determined from the search network, the sub-network set comprises a plurality of first sub-network structures of the plurality of sub-network structures, and the updated sub-network set comprises a plurality of second sub-network structures of the plurality of sub-network structures.
10. The method of claim 9, wherein the updating of the parameter space is based on a gradient back propagation algorithm, wherein the gradient of the parameter space is an average of the parameter gradients corresponding to some of the plurality of first sub-network structures.
11. The method according to claim 9 or 10, wherein the updated sub-network set is obtained by: processing the plurality of first sub-network structures based on any one of a particle swarm algorithm, a genetic algorithm, and a fireworks algorithm to obtain a next-generation sub-network structure; ranking, based on a non-dominated sorting algorithm, the plurality of first sub-network structures and the next-generation sub-network structure according to a first metric target to obtain a first ranking result, and ranking them according to a second metric target to obtain a second ranking result; merging the sub-network structures included in the same level in the first ranking result and the second ranking result, respectively, to obtain a merged ranking result; and selecting a plurality of sub-network structures from the merged ranking result as the plurality of second sub-network structures.
12. The method of claim 11, wherein the first metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an accuracy rate, and wherein the second metric target comprises at least one of a parameter amount, a network runtime, an operating memory required by the network, a network computation amount, an energy consumption, a number of floating point operations, and an increase rate of the accuracy rate.
13. A neural network structure search apparatus, comprising:
a memory for storing a program;
a processor for executing the memory-stored program, the processor for performing the following processes when the memory-stored program is executed:
determining a search network according to a target task, wherein the search network comprises a structure space and a parameter space, the structure space comprises a plurality of sub-network structures, the parameter space comprises a plurality of parameters, and each sub-network structure corresponds to at least one parameter in the parameter space;
updating the parameter space according to the training data of the target task to obtain an updated parameter space;
determining a sub-network set from the search network, the sub-network set comprising a plurality of first sub-network structures of the plurality of sub-network structures;
updating the sub-network set to obtain an updated sub-network set, the updated sub-network set comprising a plurality of second sub-network structures of the plurality of sub-network structures;
and determining a plurality of target neural networks corresponding to the target tasks according to the updated parameter space and the updated sub-network set.
14. An image processing apparatus characterized by comprising:
a memory for storing a program;
a processor for executing the memory-stored program, the processor for performing the following processes when the memory-stored program is executed:
acquiring an image to be processed;
selecting a first target neural network from a plurality of target neural networks according to computing resources and/or storage resources of the terminal device;
classifying the image to be processed according to the first target neural network to obtain a classification result of the image to be processed;
wherein the plurality of target neural networks are derived from an updated parameter space and an updated sub-network set, the updated parameter space is obtained by updating a parameter space included in a search network according to training data of the image classification task, the search network corresponds to the image classification task, the parameter space comprises a plurality of parameters, the search network further comprises a structure space, the structure space comprises a plurality of sub-network structures, each sub-network structure corresponds to at least one parameter in the parameter space, the updated sub-network set is obtained by updating a sub-network set determined from the search network, the sub-network set comprises a plurality of first sub-network structures of the plurality of sub-network structures, and the updated sub-network set comprises a plurality of second sub-network structures of the plurality of sub-network structures.
15. A computer-readable storage medium, characterized in that the computer-readable medium stores program code for execution by a device, the program code comprising instructions for performing the method of any of claims 1-8 or 9-12.
16. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1-8 or 9-12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910834158.1A CN112445823A (en) | 2019-09-04 | 2019-09-04 | Searching method of neural network structure, image processing method and device |
PCT/CN2020/113146 WO2021043193A1 (en) | 2019-09-04 | 2020-09-03 | Neural network structure search method and image processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910834158.1A CN112445823A (en) | 2019-09-04 | 2019-09-04 | Searching method of neural network structure, image processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112445823A true CN112445823A (en) | 2021-03-05 |
Family
ID=74734164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910834158.1A Pending CN112445823A (en) | 2019-09-04 | 2019-09-04 | Searching method of neural network structure, image processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112445823A (en) |
WO (1) | WO2021043193A1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949842A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Neural network structure searching method, apparatus, computer device and storage medium |
CN112949662A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Image processing method and device, computer equipment and storage medium |
CN113033784A (en) * | 2021-04-18 | 2021-06-25 | 沈阳雅译网络技术有限公司 | Method for searching neural network structure for CPU and GPU equipment |
CN113312175A (en) * | 2021-04-27 | 2021-08-27 | 北京迈格威科技有限公司 | Operator determining and operating method and device |
CN113408718A (en) * | 2021-06-07 | 2021-09-17 | 厦门美图之家科技有限公司 | Device processor selection method, system, terminal device and storage medium |
CN113411821A (en) * | 2021-06-18 | 2021-09-17 | 北京航空航天大学 | System reconfiguration capability evaluation method and system for complex network |
CN113656563A (en) * | 2021-07-15 | 2021-11-16 | 华为技术有限公司 | Neural network searching method and related equipment |
CN113673695A (en) * | 2021-07-07 | 2021-11-19 | 华南理工大学 | Crowd behavior rule automatic extraction method based on novel feature automatic construction |
CN114298272A (en) * | 2021-12-23 | 2022-04-08 | 安谋科技(中国)有限公司 | Neural network model construction method, image processing method, device and medium |
CN114419738A (en) * | 2022-03-29 | 2022-04-29 | 北京市商汤科技开发有限公司 | Attitude detection method and apparatus, electronic device and storage medium |
CN115099393A (en) * | 2022-08-22 | 2022-09-23 | 荣耀终端有限公司 | Neural network structure searching method and related device |
CN115113814A (en) * | 2022-06-21 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Neural network model online method and related device |
WO2023125934A1 (en) * | 2021-12-31 | 2023-07-06 | 维沃移动通信有限公司 | Ai network information transmission method and apparatus, and communication device |
WO2024040941A1 (en) * | 2022-08-25 | 2024-02-29 | 华为云计算技术有限公司 | Neural architecture search method and device, and storage medium |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326922B (en) * | 2021-05-31 | 2023-06-13 | 北京市商汤科技开发有限公司 | Neural network generation method and device, electronic equipment and storage medium |
CN113591961A (en) * | 2021-07-22 | 2021-11-02 | 深圳市永吉星光电有限公司 | Minimally invasive medical camera image identification method based on neural network |
CN113537399B (en) * | 2021-08-11 | 2024-07-05 | 西安电子科技大学 | Polarized SAR image classification method and system for multi-target evolutionary graph convolution neural network |
CN114241244B (en) * | 2021-12-21 | 2024-09-24 | 北京邮电大学 | System and method for scheduling generation countermeasure network model based on hand drawing generated image |
CN114842328B (en) * | 2022-03-22 | 2024-03-22 | 西北工业大学 | Hyperspectral change detection method based on collaborative analysis autonomous perception network structure |
CN115002139B (en) * | 2022-05-11 | 2023-05-12 | 国网区块链科技(北京)有限公司 | Cache policy generation method and device, electronic equipment and storage medium |
CN114997360B (en) * | 2022-05-18 | 2024-01-19 | 四川大学 | Evolution parameter optimization method, system and storage medium of neural architecture search algorithm |
CN115019166B (en) * | 2022-05-24 | 2024-02-09 | 深圳大学 | Method, device, medium and terminal for extracting marsh wetland information |
CN114943866B (en) * | 2022-06-17 | 2024-04-02 | 之江实验室 | Image classification method based on evolutionary neural network structure search |
CN117611974B (en) * | 2024-01-24 | 2024-04-16 | 湘潭大学 | Image recognition method and system based on searching of multiple group alternative evolutionary neural structures |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080228680A1 (en) * | 2007-03-14 | 2008-09-18 | Halliburton Energy Services Inc. | Neural-Network Based Surrogate Model Construction Methods and Applications Thereof |
CN109165720A (en) * | 2018-09-05 | 2019-01-08 | 深圳灵图慧视科技有限公司 | Neural network model compression method, device and computer equipment |
US20190130275A1 (en) * | 2017-10-26 | 2019-05-02 | Magic Leap, Inc. | Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks |
CN109919304A (en) * | 2019-03-04 | 2019-06-21 | 腾讯科技(深圳)有限公司 | Neural network searching method, device, readable storage medium storing program for executing and computer equipment |
CN110020667A (en) * | 2019-02-21 | 2019-07-16 | 广州视源电子科技股份有限公司 | Neural network structure search method, system, storage medium, and device |
CN110175671A (en) * | 2019-04-28 | 2019-08-27 | 华为技术有限公司 | Construction method, image processing method and the device of neural network |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3583553A1 (en) * | 2017-07-21 | 2019-12-25 | Google LLC | Neural architecture search for convolutional neural networks |
CN109657780A (en) * | 2018-06-15 | 2019-04-19 | 清华大学 | A kind of model compression method based on beta pruning sequence Active Learning |
CN109063759A (en) * | 2018-07-20 | 2018-12-21 | 浙江大学 | A kind of neural network structure searching method applied to the more attribute forecasts of picture |
CN109242098A (en) * | 2018-07-25 | 2019-01-18 | 深圳先进技术研究院 | Limit neural network structure searching method and Related product under cost |
CN110197258B (en) * | 2019-05-29 | 2021-10-29 | 北京市商汤科技开发有限公司 | Neural network searching method, image processing device, neural network searching apparatus, image processing apparatus, and recording medium |
- 2019-09-04: CN CN201910834158.1A, patent/CN112445823A/en, active Pending
- 2020-09-03: WO PCT/CN2020/113146, patent/WO2021043193A1/en, active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080228680A1 (en) * | 2007-03-14 | 2008-09-18 | Halliburton Energy Services Inc. | Neural-Network Based Surrogate Model Construction Methods and Applications Thereof |
US20190130275A1 (en) * | 2017-10-26 | 2019-05-02 | Magic Leap, Inc. | Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks |
CN109165720A (en) * | 2018-09-05 | 2019-01-08 | 深圳灵图慧视科技有限公司 | Neural network model compression method, device and computer equipment |
CN110020667A (en) * | 2019-02-21 | 2019-07-16 | 广州视源电子科技股份有限公司 | Neural network structure search method, system, storage medium, and device |
CN109919304A (en) * | 2019-03-04 | 2019-06-21 | 腾讯科技(深圳)有限公司 | Neural network searching method, device, readable storage medium storing program for executing and computer equipment |
CN110175671A (en) * | 2019-04-28 | 2019-08-27 | 华为技术有限公司 | Construction method, image processing method and the device of neural network |
Non-Patent Citations (2)
Title |
---|
何明捷; 张杰; 山世光: "Overview of Advances in Neural Architecture Search" (神经结构搜索进展概述), Telecommunications Science (电信科学), no. 05, 15 May 2019 (2019-05-15) *
墨蒙; 赵龙章; 龚嫒雯; 吴扬: "Research and Application of a BP Neural Network Optimized by a Genetic Algorithm" (基于遗传算法优化的BP神经网络研究应用), Modern Electronics Technique (现代电子技术), no. 09, 3 May 2018 (2018-05-03) *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033784A (en) * | 2021-04-18 | 2021-06-25 | 沈阳雅译网络技术有限公司 | Method for searching neural network structure for CPU and GPU equipment |
CN113312175A (en) * | 2021-04-27 | 2021-08-27 | 北京迈格威科技有限公司 | Operator determining and operating method and device |
CN112949842A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Neural network structure searching method, apparatus, computer device and storage medium |
CN112949662A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Image processing method and device, computer equipment and storage medium |
CN112949842B (en) * | 2021-05-13 | 2021-09-14 | 北京市商汤科技开发有限公司 | Neural network structure searching method, apparatus, computer device and storage medium |
CN113408718A (en) * | 2021-06-07 | 2021-09-17 | 厦门美图之家科技有限公司 | Device processor selection method, system, terminal device and storage medium |
CN113408718B (en) * | 2021-06-07 | 2024-05-31 | 厦门美图之家科技有限公司 | Device processor selection method, system, terminal device and storage medium |
CN113411821A (en) * | 2021-06-18 | 2021-09-17 | 北京航空航天大学 | System reconfiguration capability evaluation method and system for complex network |
CN113411821B (en) * | 2021-06-18 | 2021-12-03 | 北京航空航天大学 | System reconfiguration capability evaluation method and system for complex network |
CN113673695A (en) * | 2021-07-07 | 2021-11-19 | 华南理工大学 | Crowd behavior rule automatic extraction method based on novel feature automatic construction |
CN113673695B (en) * | 2021-07-07 | 2023-07-21 | 华南理工大学 | Crowd behavior rule automatic extraction method based on novel feature automatic construction |
CN113656563A (en) * | 2021-07-15 | 2021-11-16 | 华为技术有限公司 | Neural network searching method and related equipment |
WO2023284716A1 (en) * | 2021-07-15 | 2023-01-19 | 华为技术有限公司 | Neural network searching method and related device |
CN114298272A (en) * | 2021-12-23 | 2022-04-08 | 安谋科技(中国)有限公司 | Neural network model construction method, image processing method, device and medium |
WO2023125934A1 (en) * | 2021-12-31 | 2023-07-06 | 维沃移动通信有限公司 | Ai network information transmission method and apparatus, and communication device |
CN114419738A (en) * | 2022-03-29 | 2022-04-29 | 北京市商汤科技开发有限公司 | Attitude detection method and apparatus, electronic device and storage medium |
CN115113814A (en) * | 2022-06-21 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Neural network model online method and related device |
CN115113814B (en) * | 2022-06-21 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Neural network model online method and related device |
CN115099393A (en) * | 2022-08-22 | 2022-09-23 | 荣耀终端有限公司 | Neural network structure searching method and related device |
WO2024040941A1 (en) * | 2022-08-25 | 2024-02-29 | 华为云计算技术有限公司 | Neural architecture search method and device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021043193A1 (en) | 2021-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112445823A (en) | Searching method of neural network structure, image processing method and device | |
CN110175671B (en) | Neural network construction method, image processing method and device | |
WO2022083536A1 (en) | Neural network construction method and apparatus | |
WO2021218517A1 (en) | Method for acquiring neural network model, and image processing method and apparatus | |
WO2021120719A1 (en) | Neural network model update method, and image processing method and device | |
WO2021238366A1 (en) | Neural network construction method and apparatus | |
WO2022042713A1 (en) | Deep learning training method and apparatus for use in computing device | |
CN111507378A (en) | Method and apparatus for training image processing model | |
CN113011562B (en) | Model training method and device | |
CN111368972B (en) | Convolutional layer quantization method and device | |
CN114255361A (en) | Neural network model training method, image processing method and device | |
KR102517513B1 (en) | Artificial intelligence based tree data management system and tree data management method | |
CN112446398A (en) | Image classification method and device | |
CN113705769A (en) | Neural network training method and device | |
CN110222718B (en) | Image processing method and device | |
CN112561027A (en) | Neural network architecture searching method, image processing method, device and storage medium | |
WO2021218470A1 (en) | Neural network optimization method and device | |
CN112215332A (en) | Searching method of neural network structure, image processing method and device | |
CN111797895A (en) | Training method of classifier, data processing method, system and equipment | |
WO2022007867A1 (en) | Method and device for constructing neural network | |
CN113807399A (en) | Neural network training method, neural network detection method and neural network detection device | |
CN112464930A (en) | Target detection network construction method, target detection method, device and storage medium | |
CN114492723A (en) | Neural network model training method, image processing method and device | |
CN111340190A (en) | Method and device for constructing network structure, and image generation method and device | |
CN111797970A (en) | Method and apparatus for training neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||