CN116776925A - Network structure searching method and device, storage medium and electronic equipment - Google Patents


Info

Publication number: CN116776925A
Application number: CN202310961004.5A
Authority: CN (China)
Prior art keywords: network, network structure, searching, search, block
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 周恩慈, 张骞, 黄畅
Current assignee: Beijing Horizon Information Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Beijing Horizon Information Technology Co Ltd
Application filed by Beijing Horizon Information Technology Co Ltd
Priority claimed to CN202310961004.5A (the priority date is an assumption and is not a legal conclusion)
Publication of CN116776925A


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 18/00 Pattern recognition
                    • G06F 18/20 Analysing
                        • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/04 Architecture, e.g. interconnection topology
                            • G06N 3/0464 Convolutional networks [CNN, ConvNet]
                        • G06N 3/08 Learning methods
                            • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
                            • G06N 3/092 Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A network structure searching method and apparatus, a storage medium, and an electronic device are disclosed. The method includes the following steps: expanding a network structure of a target network to obtain a first network, wherein the network structure includes the number and types of network layers in a neural network and/or the manner in which they are connected; based on hardware resources of a processor, performing a network structure search on the first network using a plurality of training set samples corresponding to a plurality of tasks to obtain a second network, wherein the second network is a sub-network of the first network; determining a third network based on the network parameters of the first network and the network structure of the second network; and retraining the third network using the plurality of training set samples to obtain a fourth network. In the technical solution provided by the present disclosure, both the network structure search and the retraining are performed in a multi-task scenario, which improves the consistency between the search and the retraining, so that the searched neural network performs better in multi-task scenarios.

Description

Network structure searching method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a network structure searching method and apparatus, a storage medium, and an electronic device.
Background
Neural network structure search (Neural Architecture Search, NAS) is a technique for automatically designing neural networks. Its basic principle is to define a set of candidate network structures, called the search space, and then use a search strategy to find within that set an optimal network structure that meets the requirements of a specific task.
Currently, in practical applications of neural network structure search, the search is generally carried out for a single task, yielding a network structure suited to that single task. For example, performing the search according to a certain detection task yields a network structure suited to that detection task, and performing the search according to a certain classification task yields a network structure suited to that classification task. However, because different tasks place different demands on the network structure, a structure obtained by searching according to a single task often performs poorly in a multi-task scenario.
Disclosure of Invention
A neural network structure obtained by searching according to a single task often performs poorly in a multi-task scenario.
To solve this technical problem, the present disclosure provides a network structure searching method and apparatus, a storage medium, and an electronic device.
In a first aspect of the present disclosure, a network structure searching method is provided, including: expanding a network structure of a target network to obtain a first network, wherein the network structure comprises the number, the type and/or the connection mode of network layers in a neural network; based on hardware resources of a processor, searching a network structure of a first network by utilizing a plurality of training set samples corresponding to a plurality of tasks to obtain a second network, wherein the second network is a sub-network of the first network; determining a third network based on the network parameters of the first network and the network structure of the second network; and retraining the third network by utilizing the plurality of training set samples to obtain a fourth network.
According to the method provided by the embodiments of the present disclosure, a network structure search is performed on the first network using a plurality of training set samples corresponding to a plurality of tasks to obtain the second network, and the third network is determined based on the network parameters of the first network and the network structure of the second network, so that the third network inherits the information learned by the first network during the network structure search with those training set samples. On this basis, the third network is retrained, starting from the inherited information, with the plurality of training set samples of the plurality of tasks to obtain the fourth network. This improves the consistency between the network structure search and the retraining, so that the fourth network performs better in multi-task scenarios.
In a second aspect of the present disclosure, there is provided a network structure search apparatus including: the expansion module is used for expanding a network structure of the target network to obtain a first network, wherein the network structure comprises the number, the type and/or the connection mode of network layers in the neural network; the searching module is used for searching the network structure of the first network by utilizing a plurality of training set samples corresponding to a plurality of tasks based on the hardware resources of the processor to obtain a second network, wherein the second network is a sub-network of the first network; a determining module, configured to determine a third network based on the network parameter of the first network and the network structure of the second network; and the training module is used for retraining the third network by utilizing the plurality of training set samples to obtain a fourth network.
In a third aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the network structure search method provided in any one of the above aspects.
In a fourth aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the network structure searching method provided in any of the above aspects.
An embodiment of a fifth aspect of the present disclosure provides a computer program product which, when its instructions are executed by a processor, performs the network structure searching method provided by the embodiment of the first aspect of the present disclosure.
Drawings
Fig. 1 is a network configuration diagram of a target network according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart of a network structure searching method provided in an exemplary embodiment of the present disclosure.
Fig. 3 is a logic block diagram of a network structure search method provided by an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a first network provided by an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a first network and a second network provided by an exemplary embodiment of the present disclosure.
Fig. 6 is an iterative diagram of a network structure search provided by an exemplary embodiment of the present disclosure.
Fig. 7 is a flowchart of step S230 of the network structure searching method provided in an exemplary embodiment of the present disclosure.
Fig. 8 is a flowchart of step S231 of the network structure searching method provided in an exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram of a network structure search apparatus provided in an exemplary embodiment of the present disclosure.
Fig. 10 is another block diagram of a network structure searching apparatus provided in an exemplary embodiment of the present disclosure.
Fig. 11 is a further block diagram of a network structure search apparatus provided in an exemplary embodiment of the present disclosure.
Fig. 12 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
For the purpose of illustrating the present disclosure, exemplary embodiments of the present disclosure are described in detail below with reference to the drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by these exemplary embodiments.
It should be noted that the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Summary of the application
The neural network may be described in terms of both network architecture and network parameters.
The network structure refers to the overall framework or topology of the neural network. It defines the number and types of network layers in the neural network and/or the connections among different layers, and it determines the paths along which information flows through the network and the dependencies among the layers. For example, in a convolutional neural network (CNN), the structure may consist of multiple convolutional layers, pooling layers, and fully connected layers; the arrangement of these layers and the connections between them constitute the network structure.
The network parameters include the weights, biases, and other parameters of each layer in the neural network. These parameters are used to adjust the feature representation of the input data so that the network can make predictions or classifications according to the given training data. The network parameters are learned through a training process whose goal is to minimize a loss function and optimize the performance of the network. For example, gradients may be computed by a back-propagation algorithm and the network parameters updated by an optimization algorithm. By continually adjusting the network parameters, the neural network gradually improves its predictive and generalization capabilities.
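As an illustration of the parameter-update process described above (a minimal sketch, not part of the disclosed method), one gradient step in PyTorch might look as follows; the toy model, data, and hyper-parameters are all assumptions:

```python
# Minimal sketch of one parameter-update step: back propagation computes the
# gradients, and an optimizer adjusts the weights and biases to reduce the loss.
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                         # toy network with weights and biases
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 8)                           # a mini-batch of input samples
y = torch.tensor([0, 1, 0, 1])                  # the corresponding target outputs

optimizer.zero_grad()
loss = loss_fn(model(x), y)                     # loss between prediction and target
loss.backward()                                 # back propagation computes gradients
optimizer.step()                                # optimizer updates the parameters
```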
NAS, referred to below simply as network structure search, is a technique for automatically designing neural networks. Unlike conventional approaches in which neural networks are designed manually or from experience, network structure search can automatically find, using available computing resources such as GPU resources, an optimal network structure that meets specific task requirements within a given set of candidate neural networks called the search space. The basic concepts involved in network structure search are explained below.
1. Search space.
The search space refers to the set of possible neural network structures that are considered during the network structure search. In general, the search space may contain the following choices and configurations (a concrete sketch of such a description follows this list):
Network layer types: convolutional layers, fully connected layers, recurrent layers, pooling layers, and the like.
Connection modes between layers: series connections, parallel connections, skip connections, and the like.
Number of layers: determines the depth of the network, i.e., how many layers the network contains.
Number of nodes: the number of nodes in each layer.
Activation functions: the choice of activation functions used to introduce nonlinear transformations, such as ReLU, sigmoid, tanh, and the like.
Other hyper-parameters: such as the learning rate, regularization parameters, the choice of optimizer, and the like.
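For illustration only, such a search space could be written down as a plain configuration object; the field names and value ranges below are assumptions, not the search space actually used in the embodiments:

```python
# Illustrative search-space description; every name and value range is an assumption.
search_space = {
    "layer_types":     ["conv", "fully_connected", "recurrent", "pooling"],
    "connections":     ["series", "parallel", "skip"],
    "num_layers":      list(range(4, 33)),        # network depth
    "nodes_per_layer": [64, 128, 256, 512],
    "activations":     ["relu", "sigmoid", "tanh"],
    "learning_rates":  [1e-4, 1e-3, 1e-2],        # other hyper-parameters
}
```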
2. Search strategy.
The search strategy refers to the specific methods and algorithms used to find the optimal network structure within the search space. In the network structure searching method provided by the present disclosure, the optional search strategies include, but are not limited to, at least one of the following:
Evolutionary algorithms: for example genetic algorithms, which iteratively search and optimize within the search space using concepts borrowed from biological evolution. New network structures are generated through operations such as selection, crossover, and mutation, and their fitness is evaluated according to their performance.
Reinforcement learning: the network structure search is treated as a decision process, and a reward mechanism is defined so that the search algorithm keeps exploring the search space and adjusts its strategy according to the feedback.
Gradient-based optimization algorithms: gradient information is used to adjust the hyper-parameters or the connectivity of the network structure.
3. Performance evaluation strategy.
The performance evaluation strategy refers to the method used to evaluate the performance and effectiveness of different network structures. An optional performance evaluation strategy is to train the searched network structure on a training set and evaluate it on a validation set, for example by measuring metrics such as accuracy, recall, and F1 score, so as to select the network structure with the best performance.
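As a small illustration (not the evaluation procedure claimed by the disclosure), candidate structures could simply be ranked by such a validation metric; `evaluate` below is an assumed callable:

```python
def pick_best(candidates, evaluate):
    """Return the candidate network structure with the highest validation metric.

    `evaluate` is an assumed callable that trains a candidate and returns, e.g.,
    its accuracy or F1 score on the validation set.
    """
    return max(candidates, key=evaluate)
```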
Currently, in practical applications of neural network structure search, the search is generally carried out for a single task, yielding a network structure suited to that single task. For example, performing the search according to a certain detection task yields a network structure suited to that detection task, and performing the search according to a certain classification task yields a network structure suited to that classification task.
In general, as a neural network proceeds from its input layer to its output layer, its feature learning passes through multiple stages that gradually transition from low-level to high-level features. When network structure search is carried out for different single tasks, the resulting network structures differ, and the difference is mainly reflected in how the amount of computation is distributed across the stages. For example, in classification tasks the supervisory signal is usually present only at the last stage, so the computation is usually concentrated in the high-level stages; in detection tasks, supervisory signals exist at multiple stages, so the computation is distributed across different levels.
Because different tasks have different optimal network structures, a network structure obtained by searching based on one single task is likely to perform poorly on another single task, which makes it difficult to apply such a structure to a multi-task scenario.
Exemplary System
Fig. 1 is a network configuration diagram of a target network according to an exemplary embodiment of the present disclosure.
The target network may be any pre-set neural network. For example, any known neural network may be used as the target network. As shown in fig. 1, the network structure of the target network may include a plurality of blocks, for example a backbone network 10, a neck network 20, and a head network 30, and each block may include at least one network layer. Wherein:
The backbone network consists of one or more layers of the neural network, and its main functions are feature representation and feature extraction. Illustratively, as shown in fig. 1, the backbone network 10 may include an input layer, a stem layer, and at least one stage structure. The stem layer may include one or more layers for the preliminary processing and feature extraction of data coming from the input layer, such as normalization, size scaling, convolution operations, and pooling operations, so as to provide suitable input for the subsequent network structures. Each stage structure may include one or more network layers, such as convolutional layers, pooling layers, normalization layers, and activation functions, for performing specific computing operations and feature extraction on the data; by stacking multiple stage structures, higher-level features can be extracted progressively and the expressive capacity of the network increased.
The neck network 20 is an intermediate network connected between the backbone network 10 and the head network 30, and is used to fuse the high-dimensional and low-dimensional features extracted by the backbone network 10 so as to enhance the expressive capability of the model and better meet the requirements of specific tasks. The neck network 20 may include one or more block structures, and each block structure may include one or more layers of different types, such as convolutional layers, pooling layers, normalization layers, and residual connections. The layers within a block structure are organized in a particular order and connection manner to implement specific computing operations and feature processing.
The head network 30 is the last layer or last few layers of the neural network and is used to make predictions for a specific task. The head network 30 may receive the feature representation from the neck network 20 and decouple the features to obtain the prediction results. The structure of the head network 30 differs depending on the task. For example, for a classification task, the head network 30 may be a classifier, such as one that includes a fully connected layer and a Softmax activation function. For an object detection or segmentation task, the head network 30 may be a structure, such as a convolutional layer, used for regression or classification predictions. For other tasks, the head network 30 may be some other output structure suited to the particular task.
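The following PyTorch sketch (layer sizes, channel counts, and task heads are all assumptions, not the network of fig. 1) illustrates how a backbone, a neck, and one head per task can be composed:

```python
import torch
import torch.nn as nn

class ToyTargetNet(nn.Module):
    """Toy backbone / neck / per-task-head layout; every size here is an assumption."""
    def __init__(self, num_tasks: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(                      # stem + one stage
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.neck = nn.Sequential(nn.Conv2d(32, 64, 1), nn.ReLU())  # feature fusion
        self.heads = nn.ModuleList([                        # one prediction head per task
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        features = self.neck(self.backbone(x))
        return [head(features) for head in self.heads]      # one output per task

outputs = ToyTargetNet(num_tasks=3)(torch.randn(1, 3, 64, 64))
```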
Exemplary method
Fig. 2 is a flowchart of a network structure searching method provided in an exemplary embodiment of the present disclosure.
Fig. 3 is a logic block diagram of a network structure search method provided by an exemplary embodiment of the present disclosure.
The network structure searching method provided by the embodiments of the present disclosure may run on a processor. The processor may be, for example, a graphics processing unit (Graphics Processing Unit, GPU, also referred to as a graphics processor), a tensor processing unit (Tensor Processing Unit, TPU), a dedicated hardware accelerator, or the like. To handle the large-scale neural network training tasks that network structure search may involve, a high-performance computing cluster may be constructed that includes a plurality of computing nodes, each of which may include one or more processors, e.g., one or more GPUs. In this way, the large-scale neural network training tasks involved in the network structure search can be distributed to different computing nodes for processing, which speeds up the search.
The computing device or high-performance computing cluster with one or more processors (such as GPUs) may be deployed locally or in the cloud; accordingly, the network structure searching method provided by the present disclosure may run on one or more local processors or on one or more processors in the cloud.
As shown in fig. 2 and 3, the network structure search method may include the steps of:
Step S210, expanding the network structure of the target network to obtain a first network.
In one embodiment, expanding the network structure of the target network includes, but is not limited to: adding more network layers to the network structure, for example more convolutional layers, pooling layers, or fully connected layers; adjusting the width of the network layers in the network structure, for example increasing the number of feature channels of a network layer; introducing residual connections and skip connections into the network structure; adding branch networks or sub-networks to the network structure; increasing the number of stage structures and/or block structures and the number of stacked layers; and adjusting hyper-parameters, for example, for a convolutional layer, the number of convolution kernels, the number of channels, the width, the height, the horizontal stride, the vertical stride, and the like may be adjusted; for a pooling layer, the height of the pooling kernel, the width of the pooling kernel, the horizontal stride, the vertical stride, and the like may be adjusted; for an activation function, the type of the activation function and its parameters may be adjusted.
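One way to picture this expansion step (a schematic sketch under assumed field names, not the expansion rule claimed by the disclosure) is as widening and deepening a structural description of the target network:

```python
# Schematic expansion of a structural description; all fields and factors are assumptions.
target_structure = {
    "stages":      [{"blocks": 2, "channels": 32}, {"blocks": 2, "channels": 64}],
    "neck_blocks": 1,
    "head_layers": 1,
}

def expand(structure, depth_mult=2, width_mult=2, extra_stages=2):
    """Return a larger structure: deeper stacks, wider layers, and extra stages."""
    expanded = {
        "stages": [{"blocks": s["blocks"] * depth_mult,
                    "channels": s["channels"] * width_mult}
                   for s in structure["stages"]],
        "neck_blocks": structure["neck_blocks"] * depth_mult,
        "head_layers": structure["head_layers"] + 1,
    }
    last = expanded["stages"][-1]
    for _ in range(extra_stages):                    # append higher-level stages
        last = {"blocks": last["blocks"], "channels": last["channels"] * 2}
        expanded["stages"].append(last)
    return expanded

first_network_structure = expand(target_structure)   # the search space (first network)
```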
Fig. 4 is a schematic diagram of a first network provided by an exemplary embodiment of the present disclosure.
The first network shown in fig. 4 may be obtained by expanding the target network shown in fig. 1. For example, the number of stacked stage structures in the backbone network 10 may be increased on the basis of the target network shown in fig. 1, so that the backbone network 10 can extract higher-level features, for example from the original stage 1 to stage 1, stage 2, stage 3, stage 4, ..., stage n. The number of block structures and stacked layers in the neck network 20, the connection modes between the expanded block structures and the stage structures, the connection modes among the expanded block structures, and the like may also be increased. The number of network layers in the head network 30, the connection modes between the network layers of the head network 30 and the network layers of the neck network 20, and the like may also be expanded according to the number of tasks. For example, the output layer of the head network may be expanded to task 1, task 2, task 3, task 4, ..., task n according to the number of tasks.
In the embodiment of the disclosure, the first network is a search space for searching the neural network structure.
In the embodiment of the present disclosure, the target network may be a known single-task neural network or a known multi-task neural network, which is not limited herein. For example, with a known single-task neural network as a target network, the network structure search method provided by the present disclosure may be used to obtain a multi-task neural network. For another example, with a known multi-tasking neural network as a target network, the network structure search method provided by the present disclosure may be used to obtain a neural network that can be applied to more tasks.
In the disclosed embodiment, the target network may be an initial untrained network, in which case the network parameters in the target network may be random or preset. Alternatively, the target network may be a pre-trained neural network, in which case the network parameters in the target network may be parameters after pre-training.
Step S220, based on the hardware resources of the processor, searching the network structure of the first network by utilizing a plurality of training set samples corresponding to a plurality of tasks to obtain a second network, wherein the second network is a sub-network of the first network.
Fig. 5 is a schematic diagram of a first network and a second network provided by an exemplary embodiment of the present disclosure.
As shown in fig. 5, the first network may include, as an example, an input layer, an output layer, and a super network between the input layer and the output layer. The input layer is configured to receive the plurality of training set samples corresponding to the plurality of tasks, and may include, for example, a convolutional neural network Conv that performs preliminary processing and feature extraction on the input data before passing it to the super network. The output layer is used to output a number of results corresponding to the number of tasks, for example for tasks task 0 to task n respectively. The super network is the network obtained by expanding the target network. After a network structure search of the first network, a sub-network, i.e. the second network, may be determined from the first network; the second network comprises part of the network nodes and part of the connection relations of the first network, e.g. the network nodes drawn with solid boxes in fig. 4 and the connections indicated by solid arrows.
In the field of deep learning, a neural network may be applied to one or more tasks, including but not limited to: image classification tasks, target detection tasks, semantic segmentation tasks, instance segmentation tasks, object tracking tasks, generation tasks, natural language processing tasks, recommendation tasks, speech recognition tasks, and the like. The embodiments of the present disclosure aim to obtain neural networks applicable to multiple tasks, and therefore the plurality of training set samples may be determined based on the specific requirements of those tasks.
For example, if the multitasking includes an image classification task, a target detection task, a semantic segmentation task, and an object tracking task, in step S220, a network structure search may be performed on the first network by using a training sample set 1 corresponding to the image classification task, a training sample set 2 corresponding to the target detection task, a training sample set 3 corresponding to the semantic segmentation task, and a training sample set 4 corresponding to the object tracking task. Wherein each training sample set may include a training set and a validation set, for example: training sample set 1 may comprise training set 1 and verification set 1, training sample set 2 may comprise training set 2 and verification set 2, training sample set 3 may comprise training set 3 and verification set 3, and training sample set 4 may comprise training set 4 and verification set 4.
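For illustration, the per-task sample sets described above could be organized as a simple mapping from each task to its training and validation splits; the names below are placeholders, not datasets referenced by the disclosure:

```python
# Placeholder organization of the multi-task sample sets (all names are assumptions).
training_samples = {
    "image_classification":  {"train": "training_set_1", "val": "validation_set_1"},
    "target_detection":      {"train": "training_set_2", "val": "validation_set_2"},
    "semantic_segmentation": {"train": "training_set_3", "val": "validation_set_3"},
    "object_tracking":       {"train": "training_set_4", "val": "validation_set_4"},
}
```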
Fig. 6 is an iterative diagram of a network structure search provided by an exemplary embodiment of the present disclosure.
In some embodiments, as shown in fig. 6, the network structure search on the first network using the plurality of training set samples corresponding to the plurality of tasks may be an iterative process. In each iteration, a sub-network is obtained by sampling from the first network according to a preset search strategy; the sub-network is then trained with the training sets corresponding to the plurality of tasks, its weights and biases being updated during training by algorithms such as back propagation so as to minimize a loss function; the trained sub-network is then evaluated with the validation sets corresponding to the plurality of tasks; in addition, the network parameters of the first network may be optimized based on the evaluation results of the sub-network, so that the first network absorbs the information from the sub-networks that have already been evaluated and produces better sub-networks in later iterations.
In this way, during the network structure search on the first network, a new sub-network is obtained in each iteration, and the parameters of the first network are continually optimized through the training and evaluation of the sub-networks, until an optimal sub-network is obtained from the first network or a preset search termination condition is reached. The optimal sub-network is the second network, and the network structure of the second network is the optimal sub-network structure.
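A highly simplified sketch of this loop is shown below; `sample_subnet`, `train_on_tasks`, `evaluate_on_tasks`, and `update_supernet` are hypothetical callables standing in for the concrete search strategy, multi-task trainer, multi-task evaluator, and supernet update rule, none of which are specified here:

```python
def search_second_network(supernet, sample_subnet, train_on_tasks,
                          evaluate_on_tasks, update_supernet, max_iters=100):
    """Schematic iterative search over the first network (supernet).

    All callables are assumed helpers: in each iteration a sub-network is sampled,
    trained on the multi-task training sets, evaluated on the multi-task validation
    sets, and the result is fed back into the supernet.
    """
    best_subnet, best_score = None, float("-inf")
    for _ in range(max_iters):
        subnet = sample_subnet(supernet)             # search strategy picks a sub-network
        train_on_tasks(subnet)                       # train on all tasks' training sets
        score = evaluate_on_tasks(subnet)            # multi-task validation score
        update_supernet(supernet, subnet, score)     # feed the evaluation back
        if score > best_score:                       # keep the best sub-network so far
            best_subnet, best_score = subnet, score
    return best_subnet                               # this becomes the second network
```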
It can be understood that in step S220, the network structure search on the first network includes determining sub-networks within the first network using the search strategy, training the searched sub-networks with the training sets, and evaluating the performance of the searched sub-networks with the validation sets. These processes involve a large amount of computation, which consumes processor hardware resources.
Taking a GPU as an example of the processor, the hardware resources of the GPU may include, but are not limited to:
Number of stream processors (Stream Processors): stream processors are the small processing units inside the GPU that perform parallel computations; in general, the greater the number of stream processors, the stronger the computing capability of the GPU.
Size of video memory: the video memory stores the model parameters and intermediate results of the neural network; in general, the larger the video memory, the larger the neural network that can be accommodated, which improves the efficiency of the network structure search.
Parallel memory access (Parallel Memory Access) capability: also referred to as bit width, it indicates the GPU's ability to read and write multiple data elements simultaneously; a large bit width helps speed up the hyper-parameter updates and gradient computations in the network structure search.
Number of tensor cores (Tensor Cores): tensor cores are dedicated hardware units provided by some GPUs to accelerate matrix operations, which speeds up the matrix multiplications and convolution operations involved in the network structure search.
The traditional network structure searching method is based on the training sample set of a single task: training and performance evaluation are carried out on the single task's training set and validation set respectively, and the amount of computation is relatively small. In the network structure searching method provided by the embodiments of the present disclosure, the search on the first network uses training set samples corresponding to a plurality of tasks; during the search, training and performance evaluation are carried out on the training sets and validation sets of the plurality of tasks respectively, so the amount of computation is relatively large.
Since the hardware resources of the processor are limited, when the calculation amount in the neural network structure search process is large, there is a possibility that a problem of insufficient hardware resources of the processor occurs. For example: when the parallel computation amount is large, the resources of the stream processor may be insufficient; when the intermediate results generated in the calculation process are more, insufficient memory resources may be caused; when the super-parameter updating and gradient calculation amount involved in the process of optimizing the network structure is large, the bit width resource is possibly insufficient; when the computation amount of matrix multiplication and convolution operations is large, tensor core resources may be insufficient.
When the hardware resources of the processor are insufficient, a series of problems may occur during the network structure search, for example: the search is slow, the search fails and cannot continue, a large-scale network structure cannot be loaded, data is lost, and the like.
In order to avoid the above problem, in the embodiments of the present disclosure, the process of searching the network structure of the first network by using a plurality of training set samples corresponding to a plurality of tasks may be optimized based on the hardware resources of the processor. For example: the first network may be searched in blocks, that is, only a certain portion of the first network is searched in each iteration process, while other portions of the first network are temporarily fixed, so as to reduce the amount of computation and reduce the occupation of processor resources. Also for example: the first network may be searched using smaller-scale training set samples, or the training set may be scaled up stepwise to reduce the amount of computation and reduce the occupation of processor resources. Also for example: in the process of searching the network structure of the first network, redundant connection or neurons can be removed by adopting a pruning algorithm and the like, so that the calculated amount is reduced, and the occupation of processor resources is reduced.
Step S230, determining a third network based on the network parameters of the first network and the network structure of the second network.
In some embodiments, network parameters of the first network may be mapped into a network structure of the second network to obtain a third network. The network parameters of the first network may include, for example, a weight, an offset value, etc. of each layer; the network structure of the second network may for example comprise the number, type and/or manner of connection between the different layers of the network layer etc.
In the traditional network structure searching method, in order to obtain a multi-task neural network, the search is performed with the training sample set of a single task, and the searched sub-network is then retrained with multi-task training samples. As a result, the consistency between the search stage and the retraining stage is poor, and the retrained neural network performs poorly on the multiple tasks.
In the embodiment of the disclosure, in order to improve the consistency of the search phase and the retraining phase, the network parameters of the first network may be mapped into the network structure of the second network to obtain the third network on the basis of searching the second network by using the multitasking training sample set.
It can be appreciated that in the process of searching the network structure of the first network, the sub-network is trained and performance evaluated by using a plurality of multitasking training set samples, and the network parameters of the first network are optimized based on the performance evaluation result of the sub-network. It can be seen that in the embodiments of the present disclosure, the network parameters of the first network are optimized based on the multi-tasking scenario.
Therefore, compared with the second network, the third network contains the network parameters of the first network, inherits the information learned by the first network during the network structure search with the plurality of training set samples, and is better suited to the multi-task scenario consistent with those training set samples. Consequently, if the third network is then retrained with the plurality of multi-task training set samples, the consistency between the search and the retraining is higher, which helps improve the performance of the retrained neural network in the multi-task scenario.
Step S240, retraining the third network using the plurality of training set samples to obtain a fourth network.
In some embodiments, the third network may be retrained with training sets corresponding to the plurality of tasks, and the performance of the trained fourth network on each task may also be verified with verification sets corresponding to the plurality of tasks.
Specifically, the training set corresponding to each task may include a plurality of input samples and the target output corresponding to each input sample. The input samples are the input data of the neural network; depending on the task type, they may be images, text, sound, or other types of data, and they may be represented in a form the neural network can process, such as a vector or a multi-dimensional matrix. The target output is the desired output associated with an input sample; depending on the task type, it contains the category information, target information, semantic information, or the like corresponding to that input sample.
During training, the input samples in the training set corresponding to each task are fed into the third network to obtain the output result corresponding to each input sample, and the network parameters of the third network are then updated by gradient descent via a back-propagation algorithm based on the loss (such as mean squared error or cross entropy) between the output results and the target outputs. These steps are repeated, continually updating the network parameters of the third network and gradually reducing the loss, until a preset termination condition is met. The trained third network is the fourth network.
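A hedged sketch of one such retraining step over several tasks is given below; the per-task losses are weighted and summed before a single backward pass. The model interface `model(x, task=...)`, the loss functions, and the weights are all assumptions used only for illustration:

```python
import torch

def retrain_step(model, optimizer, task_batches, task_loss_fns, task_weights):
    """One multi-task retraining step (a sketch, not the claimed training schedule).

    task_batches maps each task name to an (input, target) mini-batch; the assumed
    model routes its forward pass by task and returns that task's prediction.
    """
    optimizer.zero_grad()
    total_loss = torch.zeros(())
    for task, (x, y) in task_batches.items():
        prediction = model(x, task=task)                       # assumed task-routing API
        total_loss = total_loss + task_weights[task] * task_loss_fns[task](prediction, y)
    total_loss.backward()                                      # one backward pass for all tasks
    optimizer.step()
    return float(total_loss)
```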
In addition, the performance of the fourth network may be evaluated with the validation set corresponding to each task, for example by evaluating metrics such as its accuracy, precision, and recall, so that the fourth network can be further optimized according to the evaluation results.
As can be seen from the above technical solution, in the network structure searching method of the embodiments of the present disclosure, a network structure search is performed on the first network using a plurality of training set samples corresponding to a plurality of tasks to obtain the second network, and the third network is determined based on the network parameters of the first network and the network structure of the second network, so that the third network inherits the information learned by the first network during the search with those training set samples. On this basis, the third network is retrained, starting from the inherited information, with the plurality of training set samples of the plurality of tasks to obtain the fourth network. This improves the consistency between the network structure search and the retraining, so that the fourth network performs better in multi-task scenarios.
In some embodiments, on the basis of the embodiment shown in fig. 2, step S230 may specifically include the following step: assigning values to the network structure of the second network using the network parameters of the first network to obtain the third network.
In a specific implementation, the network structure of the second network may be assigned using parameters, such as the weights and biases, of the network layers in the first network that correspond to the network structure of the second network.
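One plausible reading of this assignment (a sketch under the assumption that matching parameters can be identified by name and shape, which need not hold for every implementation) is to copy every matching parameter from the first network into the second network's structure:

```python
import torch

def inherit_parameters(first_network, second_network):
    """Copy parameters with matching name and shape from the first network into the
    second network's structure; the result plays the role of the third network."""
    source = first_network.state_dict()
    target = second_network.state_dict()
    inherited = {
        name: source[name]
        for name in target
        if name in source and source[name].shape == target[name].shape
    }
    target.update(inherited)
    second_network.load_state_dict(target)
    return second_network                          # the third network
```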
In order to obtain a multi-task neural network, the conventional network structure searching method generally performs the search on a single task and retrains the searched sub-network on the multiple tasks. Because the task scenarios of the search and the retraining are different, retraining the sub-network is in effect a brand-new training in the multi-task scenario; the network parameters used at the start of retraining are therefore usually initial values, such as random initial values, so the search stage and the retraining stage have no consistency.
In the embodiments of the present disclosure, both the network structure search and the retraining are performed in a multi-task scenario. On this basis, the network structure of the second network is assigned values using the network parameters of the first network to obtain the third network, so that the information learned by the first network during the search stage is passed on to the third network. The third network can then continue training on the same multi-task training sample set starting from the learned information, which improves the consistency between the search stage and the retraining stage.
In some embodiments, on the basis of the embodiment shown in fig. 2, in step S230, performing the network structure search on the first network using the plurality of training set samples corresponding to the plurality of tasks to obtain the second network may specifically include the following steps S310 to S320:
Step S310, determining the training weight corresponding to each task in the training phase as the search weight corresponding to that task.
Step S320, performing the network structure search on the first network using the plurality of training set samples and the search weights to obtain the second network.
In a multi-task training scenario, the neural network is typically given a total loss function, which may be obtained as a weighted sum of each task's loss function with that task's training weight. The training weight of each task therefore determines how much that task contributes to the overall loss function of the neural network. By adjusting the training weights of different tasks, balance and flexibility across tasks can be achieved during multi-task training, so that the neural network better adapts to the requirements of the different tasks and achieves better performance on all of them.
In the embodiment of the disclosure, training weights may be set for each task, so as to retrain the third network by using training set samples and training weights corresponding to each task. In addition, the searching weight of each task can be set, so that the contribution degree of each task to the searched second network can be adjusted by using the searching weight.
Wherein the training weight of each task can be used as the searching weight.
For example, if the multitasking includes an image classification task, a target detection task, a semantic segmentation task, and an object tracking task, then the training weight and the search weight of the image classification task may be set to w1, the training weight and the search weight of the target detection task may be set to w2, the training weight and the search weight of the semantic segmentation task may be set to w3, and the training weight and the search weight of the object tracking task may be set to w4.
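With these weights, the total multi-task loss can be written as a weighted sum of the per-task losses; the formula below is the standard formulation implied by the description, not a formula quoted from the claims:

```latex
L_{\text{total}} = \sum_{i=1}^{n} w_i \, L_i
                 = w_1 L_{\text{classification}} + w_2 L_{\text{detection}}
                 + w_3 L_{\text{segmentation}} + w_4 L_{\text{tracking}}
```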
According to the method provided by the embodiment of the disclosure, the training weights corresponding to the tasks in the training stage are used as the searching weights corresponding to the tasks in the network structure searching stage, so that the training weights are identical to the searching weights, and the consistency of the network structure searching stage and the retraining stage is improved.
Fig. 7 is a flowchart of step S230 of the network structure searching method provided in an exemplary embodiment of the present disclosure.
As shown in fig. 7, in some embodiments, on the basis of the embodiment shown in fig. 2 above, step S230 may specifically include the following steps S231-S232:
Step S231, based on the hardware resources of the processor, performing a network structure search on each block of the first network using the plurality of training set samples to obtain a sub-network corresponding to each block of the first network.
Taking the first network shown in fig. 3 as an example, based on the hardware resources of the processor, a network structure search is first performed on the backbone network of the first network using the plurality of training set samples to find the sub-network corresponding to the backbone network; then a network structure search is performed on the neck network of the first network using the plurality of training set samples to find the sub-network corresponding to the neck network; finally a network structure search is performed on the head network of the first network using the plurality of training set samples to find the sub-network corresponding to the head network.
Step S232, obtaining a second network based on the sub-networks corresponding to the blocks in the first network.
In a specific implementation, the sub-networks corresponding to the blocks of the first network may be combined to obtain the second network. The combination may, for example, consist of connecting the sub-networks corresponding to the blocks in order, or merging them with weights, which is not limited here.
In the method provided by the embodiments of the present disclosure, it is taken into account that the hardware resources of the processor are limited, and that searching the first network as a whole might exhaust them. The network structure search is therefore performed on each block of the first network separately, which occupies fewer processor resources, avoids the problem of insufficient processor hardware resources, and improves the stability of the network structure search.
Fig. 8 is a flowchart of step S231 of the network structure searching method provided in an exemplary embodiment of the present disclosure.
As shown in fig. 8, in some embodiments, on the basis of the embodiment shown in fig. 7 above, step S231 may specifically include the following steps S2311-S2313:
In step S2311, at least one block is selected from the blocks of the first network as a block to be searched, based on the hardware resources of the processor.
Wherein the number of blocks selected from the first network at a time may be determined based on hardware resources of the processor. If the hardware resources of the processor are relatively large, for example, a network structure search is performed on a high-performance computing cluster containing a large number of GPUs, more blocks may be selected as the blocks to be searched at a time. If the hardware resources of the processor are small, for example, when a network structure search is performed on a computing device containing only a small number of GPUs, then fewer blocks may be selected as blocks to be searched at a time.
For example, considering that a network structure search based on multiple tasks involves a large amount of computation, one block may be selected from the first network at a time as the block to be searched, for example: the backbone network is first selected from the first network as the block to be searched; after the backbone network has been searched, the neck network is selected as the block to be searched; and after the neck network has been searched, the head network is selected as the block to be searched.
In step S2312, a search subspace corresponding to the block to be searched is determined based on the block to be searched and the blocks of the target network other than the one corresponding to the block to be searched.
The target network has not been expanded: compared with the first network, it has fewer network layers, fewer connections, fewer stage structures and/or block structures, fewer stacked layers, and less complex hyper-parameters. Therefore, when performing the search on each block to be searched, the blocks that are not being searched can adopt the simpler network structure of the target network; that is, the network formed by the block to be searched from the first network and the blocks of the target network other than the one corresponding to the block to be searched is used as the search subspace for the block to be searched. This reduces the amount of computation during the search and improves the search efficiency.
For example, when the backbone network of the first network is being searched, the neck network and the head network may adopt the network structure of the target network; when the neck network of the first network is being searched, the backbone network and the head network may adopt the network structure of the target network; and when the head network of the first network is being searched, the backbone network and the neck network may adopt the network structure of the target network.
Step S2313, based on the plurality of training set samples, searching the network structure of the search subspace corresponding to the block to be searched, and obtaining the subnetwork corresponding to the block to be searched.
For example, a network structure search may first be performed on the backbone network of the first network, with the neck network and the head network adopting the network structure of the target network, to obtain the sub-network corresponding to the backbone network; then a search is performed on the neck network of the first network, with the backbone network and the head network adopting the network structure of the target network, to obtain the sub-network corresponding to the neck network; finally a search is performed on the head network of the first network, with the backbone network and the neck network adopting the network structure of the target network, to obtain the sub-network corresponding to the head network.
After the network structures of the three blocks of the first network, namely the backbone network, the neck network, and the head network, have been searched, the sub-networks corresponding to the respective blocks may be combined to obtain the second network.
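The following sketch summarizes this block-by-block procedure; `search_block` and `merge` are hypothetical callables, and the dictionary-of-blocks representation is an assumption made only for illustration:

```python
def blockwise_search(first_net_blocks, target_net_blocks, search_block, merge):
    """Search each block of the first network in turn.

    For the block currently being searched, the search subspace combines that block
    from the first network with the simpler target-network versions of the others.
    `search_block` and `merge` are assumed helpers, not APIs from the disclosure.
    """
    searched = {}
    for name in first_net_blocks:                         # e.g. "backbone", "neck", "head"
        subspace = {n: (first_net_blocks[n] if n == name else target_net_blocks[n])
                    for n in first_net_blocks}
        searched[name] = search_block(subspace, name)     # sub-network for this block
    return merge(searched)                                # combine into the second network
```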
According to the method provided by the embodiment of the disclosure, network structure searching is respectively carried out on each block of the first network, so that occupation of system resources of a processor in a searching process can be reduced, and searching efficiency and searching stability are improved.
In some embodiments, on the basis of the embodiment shown in fig. 3, step S210 may specifically include: expanding the network structure of the target network based on the distribution of each task's computation across the network structure to obtain the first network.
In the embodiment of the disclosure, based on the characteristic of the calculation amount distribution of each task in the neural network, each stage in the network structure of the target network can be expanded in a targeted manner.
For example, if classification tasks account for a relatively large share of the plurality of tasks or carry relatively high weights, the high-level stages of the target network's structure may be expanded more aggressively, for example by, compared with the other stages, adding more network layers, introducing more residual and skip connections, adding more branch networks or sub-networks, giving the convolution kernels more channels, and so on.
For example, if target detection tasks account for a relatively large share of the plurality of tasks or carry relatively high weights, the stages at every level of the target network's structure may be expanded in a balanced manner.
According to the method provided by the embodiment of the disclosure, the network structure of the target network is expanded in a targeted manner according to the characteristics of the calculated amount distribution of each task in the neural network, so that the calculated amount of the searched neural network on each stage is matched with the calculated amount distribution of the multiple tasks, and the performance of the neural network is improved.
Exemplary apparatus
Fig. 9 is a block diagram of a network structure search apparatus provided in an exemplary embodiment of the present disclosure. The network structure searching apparatus may be run in a processor (e.g., GPU) for performing the network structure searching method of any of the above embodiments of the present disclosure, for example.
As shown in fig. 9, the network structure search apparatus may include: an expansion module 410, a search module 420, a determination module 430, and a training module 440. Wherein:
an expansion module 410, configured to expand a network structure of the target network to obtain a first network, where the network structure includes a number, a type, and/or a connection manner of network layers in the neural network;
the searching module 420 is configured to perform a network structure search on the first network by using a plurality of training set samples corresponding to a plurality of tasks based on hardware resources of the processor, to obtain a second network, where the second network is a sub-network of the first network;
a determining module 430, configured to determine a third network based on the network parameters of the first network and the network structure of the second network;
the training module 440 is configured to retrain the third network by using the plurality of training set samples to obtain a fourth network.
In some embodiments, the determining module 430 is specifically configured to assign the network parameters of the first network to the network structure of the second network, so as to obtain the third network.
In some embodiments, the searching module 420 is specifically configured to determine a training weight corresponding to each task in the training phase as a searching weight corresponding to each task, and perform a network structure search on the first network by using a plurality of training set samples and the searching weights to obtain the second network.
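As a brief hedged sketch (keying losses and weights by task name and the helper name search_loss are assumptions), reusing the training weights as search weights amounts to optimizing the same weighted multi-task objective during the structure search as during training:

def search_loss(task_losses, train_weights):
    """Weighted multi-task loss used during the structure search; the search
    weight of each task is simply its training-phase weight."""
    return sum(train_weights[task] * loss for task, loss in task_losses.items())

# search_loss({'classification': 0.7, 'detection': 1.2},
#             {'classification': 0.5, 'detection': 0.5})  # -> 0.95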
In some embodiments, the expansion module 410 is specifically configured to expand the network structure of the target network based on the computation distribution of each task over the network structure, so as to obtain the first network.
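For illustration, the four modules could be composed roughly as follows. This is a sketch under the assumption of PyTorch-style modules exposing state_dict / load_state_dict; the class NetworkStructureSearcher and the helper assign_parameters are hypothetical names, not the patent's implementation:

class NetworkStructureSearcher:
    """Hypothetical wrapper wiring the four modules together."""

    def __init__(self, expand_fn, search_fn, train_fn):
        self.expand_fn = expand_fn    # expansion module 410
        self.search_fn = search_fn    # search module 420
        self.train_fn = train_fn      # training module 440

    def run(self, target_net, train_sets):
        first_net = self.expand_fn(target_net)
        second_net = self.search_fn(first_net, train_sets)
        # Determination module 430: the second network's structure inherits
        # the parameters already trained inside the first network.
        third_net = assign_parameters(second_net, first_net)
        return self.train_fn(third_net, train_sets)   # retrain -> fourth network


def assign_parameters(structure_net, weight_net):
    """Copy the first network's parameters into the second network's
    structure, keeping only keys whose shapes match."""
    ref = structure_net.state_dict()
    keep = {k: v for k, v in weight_net.state_dict().items()
            if k in ref and v.shape == ref[k].shape}
    structure_net.load_state_dict(keep, strict=False)
    return structure_net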
Fig. 10 is another block diagram of a network structure searching apparatus provided in an exemplary embodiment of the present disclosure.
In one embodiment, as shown in fig. 10, the search module 420 specifically includes:
a searching unit 421, configured to perform network structure searching on each block in the first network by using a plurality of training set samples based on the hardware resource of the processor, so as to obtain a sub-network corresponding to each block in the first network;
a determining unit 422, configured to obtain the second network based on the sub-networks corresponding to the respective blocks in the first network.
Fig. 11 is a further block diagram of a network structure search apparatus provided in an exemplary embodiment of the present disclosure. In some embodiments, as shown in fig. 11, the search unit 421 may specifically include:
a selecting subunit 4211, configured to select, based on the hardware resource of the processor, at least one block from the blocks in the first network as a block to be searched;
a determining subunit 4212, configured to determine a search subspace corresponding to the block to be searched based on the block to be searched and the blocks in the target network other than the block corresponding to the block to be searched, where the search subspace represents the search range of the network structure search;
the searching subunit 4213 is configured to perform network structure searching on the searching subspace corresponding to the block to be searched based on the multiple training set samples, to obtain a subnetwork corresponding to the block to be searched.
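A minimal sketch of the three sub-units follows; the greedy selection heuristic, the estimate_cost helper, and the run_nas callback are assumptions introduced only for illustration:

def select_blocks_to_search(first_net_blocks, resource_budget, estimate_cost):
    """Selecting subunit 4211: keep adding blocks while they fit the
    processor's resource budget (e.g. estimated GPU memory)."""
    chosen, used = [], 0.0
    for name, block in first_net_blocks.items():
        cost = estimate_cost(block)
        if used + cost <= resource_budget:
            chosen.append(name)
            used += cost
    return chosen


def build_search_subspace(block_name, first_net_blocks, target_net_blocks):
    """Determining subunit 4212: the searchable part is the expanded block;
    every other block is pinned to the target structure, which bounds the
    search range."""
    return {
        "searchable": first_net_blocks[block_name],
        "fixed": {k: v for k, v in target_net_blocks.items() if k != block_name},
    }


def search_subspace(subspace, train_sets, run_nas):
    """Searching subunit 4213: run NAS only inside the block's subspace."""
    return run_nas(subspace["searchable"], subspace["fixed"], train_sets)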
For the beneficial technical effects of the exemplary embodiments of the apparatus, reference may be made to the corresponding beneficial technical effects in the foregoing exemplary method section, which are not repeated here.
Exemplary electronic device
Fig. 12 is a block diagram of an electronic device according to an embodiment of the present disclosure, including at least one processor 111 and a memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), and the like. Non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 111 to implement the network structure search methods of the various embodiments of the disclosure above and/or other desired functions.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input device 113 may include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various kinds of information to the outside and may include, for example, a display, a speaker, a printer, a communication network, and a remote output apparatus connected thereto, and the like.
Of course, for simplicity, only some of the components of the electronic device 11 relevant to the present disclosure are shown in fig. 12; components such as buses and input/output interfaces are omitted. In addition, the electronic device 11 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also provide a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the network structure search method of the various embodiments of the present disclosure described in the "exemplary methods" section above.
The computer program product may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the network structure search method of the various embodiments of the present disclosure described in the "exemplary methods" section above.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but the advantages, benefits, effects, etc. mentioned in the embodiments of the present disclosure are merely examples and are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
Various modifications and alterations to this disclosure may be made by those skilled in the art without departing from the spirit and scope of the application. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A network structure search method, comprising:
expanding a network structure of a target network to obtain a first network, wherein the network structure comprises the number, the type and/or the connection mode of network layers in a neural network;
based on hardware resources of a processor, searching a network structure of the first network by utilizing a plurality of training set samples corresponding to a plurality of tasks to obtain a second network, wherein the second network is a sub-network of the first network;
determining a third network based on network parameters of the first network and a network structure of the second network;
and retraining the third network by utilizing the plurality of training set samples to obtain a fourth network.
2. The method of claim 1, wherein determining a third network based on network parameters of the first network and a network structure of the second network comprises:
and assigning a network structure of the second network by using the network parameters of the first network to obtain the third network.
3. The method of claim 1, wherein searching the first network for the network structure using a plurality of training set samples corresponding to a plurality of tasks to obtain the second network comprises:
determining the training weight corresponding to each task in the training phase as the searching weight corresponding to each task;
and searching the network structure of the first network by utilizing the training set samples and the searching weight to obtain the second network.
4. The method of claim 1, wherein performing a network structure search on the first network based on hardware resources of the processor using a plurality of training set samples corresponding to a plurality of tasks to obtain the second network comprises:
based on the hardware resources of the processor, respectively carrying out network structure search on each block in the first network by utilizing the plurality of training set samples to obtain a sub-network corresponding to each block in the first network;
and obtaining the second network based on the sub-network corresponding to each block in the first network.
5. The method of claim 4, wherein performing a network structure search on each block in the first network based on the hardware resources of the processor using the plurality of training set samples, respectively, to obtain a sub-network corresponding to each block in the first network, comprises:
selecting at least one block from the blocks of the first network as a block to be searched based on the hardware resource of the processor;
determining a search subspace corresponding to the block to be searched based on the block to be searched and other blocks except the block to be searched in the target network, wherein the search subspace represents a search range of network structure search;
and searching a network structure of a search subspace corresponding to the block to be searched based on the training set samples to obtain a subnetwork corresponding to the block to be searched.
6. The method of claim 1, wherein expanding the network structure of the target network to obtain the first network comprises:
and expanding the network structure of the target network based on the computation distribution of each task over the network structure to obtain the first network.
7. A network structure search apparatus comprising:
the expansion module is used for expanding a network structure of the target network to obtain a first network, wherein the network structure comprises the number, the type and/or the connection mode of network layers in the neural network;
the searching module is used for searching the network structure of the first network by utilizing a plurality of training set samples corresponding to a plurality of tasks based on hardware resources of the processor to obtain a second network, wherein the second network is a sub-network of the first network;
a determining module, configured to determine a third network based on a network parameter of the first network and a network structure of the second network;
and the training module is used for retraining the third network by utilizing the plurality of training set samples to obtain a fourth network.
8. The apparatus of claim 7, wherein the search module comprises:
the searching unit is used for searching the network structure of each block in the first network by utilizing the plurality of training set samples based on the hardware resources of the processor to obtain a sub-network corresponding to each block in the first network;
and the determining unit is used for obtaining the second network based on the sub-network corresponding to each block in the first network.
9. A computer-readable storage medium storing a computer program for executing the network structure search method according to any one of the preceding claims 1-6.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the network structure search method according to any one of claims 1-6.