CN113988272A - Method and device for generating a neural network, computer device, and storage medium


Info

Publication number
CN113988272A
Authority
CN
China
Prior art keywords: neural network, network structure, search space, preset, target
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Application number: CN202111314991.7A
Original language: Chinese (zh)
Inventors: 刘吉豪, 刘宇, 宋广录, 黄鑫
Current Assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority application: CN202111314991.7A
Publication: CN113988272A


Classifications

    • G06N 3/045 Combinations of networks (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/21 Design or setup of recognition systems or techniques)
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/08 Learning methods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method, an apparatus, a computer device, and a storage medium for generating a neural network. The method comprises: determining a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit, and a third search space corresponding to size information of the neural network structure, where the preset network structure unit and the downsampling processing unit are used to construct the neural network structure; performing a network structure search based on the first search space, the second search space, the third search space, and a training data set corresponding to a preset deep learning task, to obtain a target neural network structure matching an expected result of the deep learning task; and generating, based on the target neural network structure, a target neural network for processing the deep learning task.

Description

Method and device for generating a neural network, computer device, and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method and an apparatus for generating a neural network, a computer device, and a storage medium.
Background
A deep learning task, such as image classification, image detection, or image segmentation, is typically executed by a deep neural network built jointly from several different types of network structures. Each structure offers many choices, and different combinations yield neural networks of different architectures; how well a network performs on the corresponding task depends on the quality of the chosen structures. At present these combinations are built mainly by hand, relying on human experience, so the accuracy, resource consumption, and other aspects of the trained networks' performance on deep learning tasks still leave room for improvement.
Disclosure of Invention
The embodiments of the present disclosure provide at least a method, an apparatus, a computer device, and a storage medium for generating a neural network.
In a first aspect, an embodiment of the present disclosure provides a method for generating a neural network, including:
determining a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit and a third search space corresponding to size information of a neural network structure, wherein the preset network structure unit and the downsampling processing unit are used for constructing the neural network structure;
performing network structure search based on the first search space, the second search space, the third search space and a training data set corresponding to a preset deep learning task to obtain a target neural network structure corresponding to an expected result of the deep learning task;
generating a target neural network for processing a deep learning task based on the target neural network structure.
In this aspect, search spaces for neural network structure search are constructed in advance, and each search space holds multiple candidate networks or multiple kinds of size information. The multiple search spaces therefore jointly provide a larger search space: the network structure of a preset network structure unit is searched from the first search space, the network structure of a downsampling processing unit from the second search space, and the size information of the neural network structure from the third search space. Searching in this larger space yields neural network structures of many different kinds, improving the diversity of the searched structures and thus the processing performance of the final target neural network on the deep learning task. Further, a neural network structure is built from the searched network structures and size information; combined with the training data set corresponding to the preset deep learning task and the task's expected result, a target neural network structure matching that expected result, i.e., a target neural network whose accuracy meets expectations, can be obtained.
In an alternative embodiment, the first search space comprises at least two of: a convolution structure unit, an encoding-decoding structure unit based on a self-attention mechanism, and a multi-layer perceptron (MLP) structure unit.
In an alternative embodiment, the second search space includes a local downsampling unit for sampling local context information, a global downsampling unit for sampling global context information, and a global-local downsampling unit for sampling both global and local context information.
In an alternative embodiment, the local downsampling unit comprises a first convolution operator for performing a convolution operation;
the global downsampling unit comprises a first processing unit based on a self-attention mechanism and a second convolution operator used for carrying out convolution operation on data input into the first processing unit;
the global-local downsampling unit includes a second processing unit based on a self-attention mechanism and a third convolution operator for performing a convolution operation on data input to the second processing unit.
In an optional embodiment, the performing a network structure search based on the first search space, the second search space, and the third search space, and a training data set corresponding to a preset deep learning task includes:
constructing a target search space based on the first search space, the second search space, and the third search space;
determining a plurality of candidate neural network structures based on the target search space, where each candidate neural network structure comprises at least one preset network structure unit selected from the first search space and at least one downsampling processing unit selected from the second search space, and the size information of each preset network structure unit in each candidate neural network structure is selected from the third search space;
constructing a corresponding candidate neural network based on the candidate neural network structure;
determining a processing result of the candidate neural network on the deep learning task by adopting a training data set corresponding to the preset deep learning task;
and searching for a target neural network structure in the target search space based on the candidate neural network structures, with the goal of minimizing the difference between the candidate neural network's processing result on the deep learning task and the expected result.
In an alternative embodiment, the searching for the target neural network structure in the target search space based on the candidate neural network structures includes:
updating, based on the processing result on the deep learning task of each target candidate neural network structure among part of the candidate neural network structures, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit;
searching a network structure based on the updated first selection probability, second selection probability and third selection probability, and reselecting a plurality of new target candidate neural network structures;
and, in a case where the differences between the processing results of the plurality of new target candidate neural network structures on the deep learning task and the expected result do not satisfy a first preset condition, returning to the step of updating the selection probabilities.
In an optional embodiment, the updating of the first, second, and third selection probabilities based on the processing result on the deep learning task of each target candidate neural network structure among the part of the candidate neural network structures includes:
determining, based on those processing results, the target candidate neural network structures whose processing-result accuracy satisfies a second preset condition;
and updating, based on the target candidate neural network structures whose processing-result accuracy satisfies the second preset condition, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit.
In an optional embodiment, the searching for the target neural network structure in the target search space based on the candidate neural network structures further includes:
and, in a case where the differences between the processing results of the plurality of new target candidate neural network structures on the deep learning task and the expected result satisfy the first preset condition, taking the new target candidate neural network structure whose difference satisfies the first preset condition as the target neural network structure.
In an alternative embodiment, the candidate neural network structure comprises a plurality of feature extraction nodes and a plurality of downsampling nodes arranged alternately; each feature extraction node comprises a preset network structure unit selected from the first search space; each downsampling node comprises a downsampling processing unit selected from the second search space; and the size information of each preset network structure unit is selected from the third search space.
In an alternative embodiment, the generating a target neural network for processing a deep learning task based on the target neural network structure includes:
obtaining a plurality of training samples;
and training the to-be-trained target neural network corresponding to the target neural network structure with the training samples until a training cut-off condition is met, to obtain the trained target neural network.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for generating a neural network, including:
a search space determining module, configured to determine a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit, and a third search space corresponding to size information of a neural network structure, where the preset network structure unit and the downsampling processing unit are used to construct the neural network structure;
a network structure searching module, configured to perform a network structure search based on the first search space, the second search space, the third search space, and a training data set corresponding to a preset deep learning task, to obtain a target neural network structure corresponding to an expected result of the deep learning task;
and a neural network generating module, configured to generate a target neural network for processing the deep learning task based on the target neural network structure.
In an alternative embodiment, the first search space comprises at least two of: a convolution structure unit, an encoding-decoding structure unit based on a self-attention mechanism, and a multi-layer perceptron (MLP) structure unit.
In an alternative embodiment, the second search space includes a local downsampling unit for sampling local context information, a global downsampling unit for sampling global context information, and a global-local downsampling unit for sampling both global and local context information.
In an alternative embodiment, the local downsampling unit comprises a first convolution operator for performing a convolution operation;
the global downsampling unit comprises a first processing unit based on a self-attention mechanism and a second convolution operator used for carrying out convolution operation on data input into the first processing unit;
the global-local downsampling unit includes a second processing unit based on a self-attention mechanism and a third convolution operator for performing a convolution operation on data input to the second processing unit.
In an optional embodiment, the network structure search module is configured to construct a target search space based on the first search space, the second search space, and the third search space;
determining a plurality of candidate neural network structures based on the target search space, where each candidate neural network structure comprises at least one preset network structure unit selected from the first search space and at least one downsampling processing unit selected from the second search space, and the size information of each preset network structure unit in each candidate neural network structure is selected from the third search space;
constructing a corresponding candidate neural network based on the candidate neural network structure;
determining a processing result of the candidate neural network on the deep learning task by adopting a training data set corresponding to the preset deep learning task;
and searching for a target neural network structure in the target search space based on the candidate neural network structures, with the goal of minimizing the difference between the candidate neural network's processing result on the deep learning task and the expected result.
In an optional implementation manner, the network structure searching module is configured to update, based on the processing result on the deep learning task of each target candidate neural network structure among part of the candidate neural network structures, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit;
searching a network structure based on the updated first selection probability, second selection probability and third selection probability, and reselecting a plurality of new target candidate neural network structures;
and, in a case where the differences between the processing results of the plurality of new target candidate neural network structures on the deep learning task and the expected result do not satisfy a first preset condition, returning to the step of updating the selection probabilities.
In an optional embodiment, the network structure searching module is configured to determine, based on the processing result on the deep learning task of each target candidate neural network structure among part of the candidate neural network structures, the target candidate neural network structures whose processing-result accuracy satisfies a second preset condition;
and to update, based on the target candidate neural network structures whose processing-result accuracy satisfies the second preset condition, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit.
In an optional implementation manner, the network structure searching module is further configured to, in a case where the differences between the processing results of the plurality of new target candidate neural network structures on the deep learning task and the expected result satisfy the first preset condition, take the new target candidate neural network structure whose difference satisfies the first preset condition as the target neural network structure.
In an alternative embodiment, the candidate neural network structure comprises a plurality of feature extraction nodes and a plurality of downsampling nodes arranged alternately; each feature extraction node comprises a preset network structure unit selected from the first search space; each downsampling node comprises a downsampling processing unit selected from the second search space; and the size information of each preset network structure unit is selected from the third search space.
In an optional embodiment, the neural network generating module is configured to obtain a plurality of training samples;
and train the to-be-trained target neural network corresponding to the target neural network structure with the training samples until a training cut-off condition is met, to obtain the trained target neural network.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for generating a neural network in the first aspect or in any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program performs the steps of the method for generating a neural network in the first aspect or in any possible implementation of the first aspect.
For the effects of the apparatus, the computer device, and the storage medium for generating a neural network, reference is made to the description of the method for generating a neural network above; details are not repeated here.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. The following drawings depict only certain embodiments of the disclosure and should not be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a method of generating a neural network provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a neural network structure constructed based on a target search space according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating a network structure of different downsampling processing units provided in an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a specific flow of network structure search provided by an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating an apparatus for generating a neural network according to an embodiment of the present disclosure;
fig. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described below completely and clearly with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the figures, can be arranged and designed in a wide variety of configurations; the following detailed description is therefore not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present disclosure without creative effort fall within its protection scope.
Furthermore, the terms "first", "second", and the like in the description, the claims, and the drawings of the embodiments of the present disclosure are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so termed are interchangeable in appropriate circumstances, so that the embodiments described here can be implemented in orders other than those illustrated or described.
Reference herein to "a plurality" or "a number" means two or more. "And/or" describes an association between objects and covers three cases; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Research has shown that for neural networks built by hand from human experience, performance such as accuracy and resource consumption on deep learning tasks after training still needs improvement.
Based on this observation, the present disclosure provides a method of generating a neural network in which search spaces for neural network structure search are constructed in advance, each holding multiple candidate networks or multiple kinds of size information. The multiple search spaces thus jointly provide a larger search space: the network structure of a preset network structure unit is searched from the first search space, the network structure of a downsampling processing unit from the second search space, and the size information of the neural network structure from the third search space. Searching this larger space yields neural network structures of many different kinds, improving the diversity of the searched structures and thereby the processing performance of the final target neural network on the deep learning task. Further, a neural network structure is built from the searched network structures and size information; combined with the training data set corresponding to the preset deep learning task and the task's expected result, a target neural network structure matching that expected result, i.e., a target neural network whose accuracy meets expectations, can be obtained.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The following is a detailed description of specific terms related to embodiments of the disclosure:
1. Multi-layer perceptron (MLP): a feedforward artificial neural network model that maps multiple input data sets onto a single output data set.
2. Transformer: an encoding-decoding network structure based on a self-attention mechanism. Because it uses self-attention rather than the sequential structure of a recurrent neural network (RNN), the model can be trained in parallel and has access to global information.
3. Convolutional neural network (CNN): a multi-layer neural network designed specifically for processing two-dimensional data. It can extract topological structure from a two-dimensional image, optimize the network structure with the backpropagation algorithm, and solve for the unknown parameters of the network.
To facilitate understanding of the present embodiment, the method for generating a neural network disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the method is generally a computer device with certain computing power. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The method for generating a neural network provided by the embodiments of the present disclosure is described below by taking an execution subject as a computer device as an example.
Referring to fig. 1, a flowchart of a method for generating a neural network provided in an embodiment of the present disclosure is shown, where the method includes steps S101 to S103:
S101: Determine a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit, and a third search space corresponding to size information of the neural network structure, where the preset network structure unit and the downsampling processing unit are used to construct the neural network structure.
In this step, a preset network structure unit may be a basic network structure unit used to build the feature extraction part of a complete neural network structure, and a downsampling processing unit may be a network structure unit that performs downsampling.
Here, the first, second, and third search spaces may be parts of a pre-constructed target search space for neural network structures. The first search space contains the network structures available for the preset network structure units, i.e., the structures that perform the main operations of deep learning tasks, such as CNN, Transformer, and MLP network structures. The second search space contains the network structures available for the downsampling processing units, i.e., structures that perform the downsampling part of a deep learning task, such as local, global-local, and global downsampling network structures. The third search space stores preset size information of the neural network structure, such as the number of repetitions (repeats), the number of channels (channels), and the channel adjustment parameter (expansion).
The size information in the third search space may be used to configure a preset network structure unit, i.e., to set that unit's size. Configuring the network structure of every preset unit with the same uniform form of size information makes it convenient to test the resulting neural network structure on the deep learning task.
The preset network structure unit and the down-sampling processing unit are used for constructing a neural network structure. The size information is used for configuring size data for a network structure corresponding to a preset network structure unit.
For example, if several preset network structure units are searched from the first search space and their network structures are a CNN structure, a Transformer structure, and an MLP structure, each can be configured in the uniform form (repeats, channels, expansion): the CNN structure with repeats-1, channels-1, and expansion-1; the Transformer structure with repeats-2, channels-2, and expansion-2; and the MLP structure with repeats-3, channels-3, and expansion-3. Here repeats-1, repeats-2, and repeats-3 denote different values, as do channels-1/2/3 and expansion-1/2/3.
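As an illustration of the three search spaces and the uniform (repeats, channels, expansion) configuration form described above, the following is a minimal Python sketch; the concrete option lists and value ranges are assumptions for illustration, not values fixed by this disclosure.

```python
import random

# Illustrative option lists; the concrete choices and value ranges are
# assumptions for this sketch, not values from the disclosure.
BLOCK_TYPES = ["cnn", "transformer", "mlp"]              # first search space
DOWNSAMPLE_TYPES = ["local", "global", "global_local"]   # second search space
SIZE_SPACE = {                                           # third search space
    "repeats":   [1, 2, 3, 4],         # how many times a unit repeats
    "channels":  [64, 128, 256, 512],  # number of channels of the unit
    "expansion": [1, 2, 4],            # channel expansion/reduction ratio
}

def sample_unit():
    """Sample one preset network structure unit together with its
    uniform-form size configuration (repeats, channels, expansion)."""
    return {
        "block": random.choice(BLOCK_TYPES),
        "size": {key: random.choice(vals) for key, vals in SIZE_SPACE.items()},
    }
```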
S102: and searching a network structure based on the first search space, the second search space and the third search space and a preset training data set corresponding to the deep learning task to obtain a target neural network structure corresponding to an expected result of the deep learning task.
In specific implementation, the network structure corresponding to a preset network structure unit can be searched from the first search space; the network structure corresponding to a downsampling processing unit can be searched from the second search space; and the size information of the preset network structure units can be searched from the third search space. A complete neural network structure is then constructed based on the searched network structures and size information.
Here, the network structures corresponding to a plurality of preset network structure units may be searched from the first search space, yielding a plurality of preset network structure units. Their number may be set according to empirical values or the deep learning task in the actual application; the embodiments of the present disclosure do not limit it. For a plurality of preset network structure units, the order in which they execute during processing of the deep learning task may follow the temporal order in which they were searched from the first search space, or may follow preset position information of the units, where the position information of different units implies an execution order between them.
Referring to fig. 2, a schematic diagram of a neural network structure constructed based on the target search space is shown. The preset network structure units 21 comprise network structures searched from the first search space 23, e.g., a Transformer network structure 211, a CNN network structure 212, an MLP network structure 213, a CNN network structure 214, and a Transformer network structure 215 (connected according to the order in which they were searched, or according to the preset position information of the units). The downsampling processing units 22 comprise network structures searched from the second search space 24, e.g., a local downsampling network structure 221, a global downsampling network structure 222, a global-local downsampling network structure 223, and a local downsampling network structure 224. The Transformer network structure 211 is further configured with the size information repeats-1, channels-1, and expansion-1 searched from the third search space 25; the CNN network structure 212 with repeats-2, channels-2, and expansion-2; the MLP network structure 213 with repeats-3, channels-3, and expansion-3; the CNN network structure 214 with repeats-4, channels-4, and expansion-4; and the Transformer network structure 215 with repeats-5, channels-5, and expansion-5.
For example, for a CNN network structure, the structure may be executed in a loop the number of times indicated by repeats, on top of its original execution order; the number of convolution channels is set according to channels; and the channels are expanded or reduced according to expansion, so that the number of convolution channels varies across convolutional layers.
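A minimal PyTorch sketch of how repeats, channels, and expansion could configure a CNN stage as just described; the block layout (a 3x3 convolution expanding the channels, followed by a 1x1 projection back) is an assumption for illustration.

```python
import torch.nn as nn

def build_cnn_stage(in_ch: int, channels: int, repeats: int,
                    expansion: int) -> nn.Sequential:
    """Repeat a convolution block `repeats` times; inside each block the
    channel count is expanded by `expansion` and then projected back."""
    layers, ch = [], in_ch
    for _ in range(repeats):
        hidden = channels * expansion  # expanded channel count
        layers += [
            nn.Conv2d(ch, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),  # project back
        ]
        ch = channels
    return nn.Sequential(*layers)
```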
After the neural network structure is obtained, it is given parameters, and the test neural network corresponding to the structure is preliminarily trained using the training data set corresponding to the preset deep learning task. The preliminarily trained test neural network then processes the deep learning task, producing a processing result for the task.
Here, the training data set may be used both for the preliminary training of the test neural network and for testing it on the deep learning task.
Illustratively, take the preliminarily trained test neural network 20 shown in fig. 2 as an example. The Transformer corresponding to the first preset network structure unit in 21 first processes the test data of the deep learning task, and the remaining units then execute in order: the local downsampling network corresponding to the first downsampling processing unit in 22 further processes the Transformer's output, and so on, until the Transformer corresponding to the Transformer network structure 215 outputs the task processing result. Besides the preset network structure units and the downsampling processing units, the neural network may also contain a fully connected layer and/or a classifier to produce the final processing result of the deep learning task, such as an image classification result or a target detection result.
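A sketch of the alternating layout of fig. 2 in PyTorch. The unit modules are placeholders for whatever structures were searched; for simplicity the sketch assumes every unit consumes and produces (B, C, H, W) feature maps (sequence-based units would need reshaping), and the pooling and classifier head are illustrative additions.

```python
import torch.nn as nn

class SearchedNetwork(nn.Module):
    """Feature-extraction units interleaved with downsampling units,
    followed by a classifier head (cf. fig. 2). Expects
    len(units) == len(downsamples) + 1, as in the figure."""
    def __init__(self, units, downsamples, dim=512, num_classes=1000):
        super().__init__()
        blocks = []
        for unit, down in zip(units, list(downsamples) + [nn.Identity()]):
            blocks += [unit, down]          # each unit, then its downsampler
        self.body = nn.Sequential(*blocks)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                   # x: (B, C, H, W)
        feats = self.body(x)
        return self.head(feats.mean(dim=(2, 3)))  # global average pool
```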
Then, if the expected result is achieved while processing the deep learning task, the neural network structure constructed this time can be taken as the target neural network structure. The condition for achieving the expected result may include the network's actual processing result coinciding with the expected result, or the difference between the two satisfying a preset condition. The expected result may be, for example, the label information of the training data in the training data set of the deep learning task.
In some embodiments, preset conditions on the network structure search result may further serve as conditions for the expected result of the deep learning task, for example: the first selection probability of a preset network structure unit is greater than or equal to a first preset threshold; and/or the second selection probability of a downsampling processing unit is greater than or equal to a second preset threshold; and/or the third selection probability of the size information corresponding to a preset network structure unit is greater than or equal to a third preset threshold; and/or the accuracy of the processing result of the deep learning task is greater than or equal to a fourth preset threshold.
Here, the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold may be set according to empirical values, and the embodiment of the present disclosure is not particularly limited.
S103: a target neural network for processing the deep learning task is generated based on the target neural network structure.
Once the target neural network structure is determined, it can further be given parameters and, after multiple rounds of iterative training, becomes the target neural network for processing the deep learning task.
In specific implementation, the training data set corresponding to the preset deep learning task may be used to further train the preliminarily trained test neural network corresponding to the target neural network structure from S102, generating a target neural network that processes the deep learning task and produces accurate task processing results.
In some embodiments, multiple training samples may also be obtained; the to-be-trained target neural network corresponding to the target neural network structure is then trained with these samples until a training cut-off condition is met, yielding the trained target neural network.
Here, the training samples may be a subset of the training data set corresponding to the deep learning task, e.g., samples other than those used during the preliminary training of the target neural network structure. The target neural network to be trained may be one that has completed preliminary training. The training cut-off condition may be that the accuracy of the target network's processing result exceeds a preset accuracy, that the number of training iterations exceeds a set value, or the like.
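A sketch of the training cut-off logic described above, assuming a standard supervised setup; the optimizer, loss, and both cut-off thresholds are illustrative assumptions.

```python
import torch

def train_until_cutoff(model, loader, max_iters=10_000, target_acc=0.95):
    """Train until the batch accuracy exceeds a preset value or the
    iteration budget is exhausted (both thresholds are assumed here)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    it = 0
    while it < max_iters:
        for x, y in loader:               # loops over epochs until cut-off
            logits = model(x)
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            acc = (logits.argmax(dim=1) == y).float().mean().item()
            if acc >= target_acc or it >= max_iters:
                return model
    return model
```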
In S101 to S103 above, search spaces for neural network structure search are constructed in advance, each holding multiple networks or multiple kinds of size information, so the multiple search spaces jointly provide a larger search space: the network structure of a preset network structure unit is searched from the first search space, the network structure of a downsampling processing unit from the second search space, and the size information of the neural network structure from the third search space. Searching this wider space yields neural network structures of many different kinds and improves the diversity of the searched structures, which helps improve the performance of the final target neural network on the deep learning task. Further, a neural network structure is built from the searched network structures and size information; combined with the training data set corresponding to the preset deep learning task and the task's expected result, a target neural network structure matching that expected result, i.e., a target neural network meeting the expected processing result, can be obtained.
For the first search space in S101: it may include at least two of a convolution structure unit, an encoding-decoding structure unit based on a self-attention mechanism, a multi-layer perceptron (MLP) structure unit, and the like. Illustratively, the convolution structure unit may be a CNN network structure; the encoding-decoding structure unit may be a Transformer network structure; and the MLP structure unit may be an MLP network structure.
For the second search space in S101: it may include a local downsampling unit for sampling local context information, a global downsampling unit for sampling global context information, a global-local downsampling unit for sampling both global and local context information, and the like.
Exemplarily, the local downsampling unit samples local features of the input data, i.e., extracts features from the local information of the input to obtain local-information features. The global downsampling unit samples global features, i.e., obtains features related to the global information of the input. The global-local downsampling unit samples both and fuses them: it extracts local-information features, samples global-information features, and then fuses the two to obtain a global-local fused feature.
Reference may be made to fig. 3, which shows the network structures of the different downsampling processing units: a local downsampling network structure 31, a global downsampling network structure 32, and a global-local downsampling network structure 33. The local downsampling structure 31 may comprise a first convolution operator 311 for performing a convolution operation; it may be a two-dimensional convolution, e.g., Conv2d with stride 2 (denoted s2). The global downsampling structure 32 may comprise a first processing unit 321 based on a self-attention mechanism, e.g., Multi-Head Attention, and a second convolution operator 322 that performs a convolution operation on the data input to the first processing unit; it may be a one-dimensional convolution, e.g., Conv1d with stride 2. Here, the data input to the first processing unit may be the input of the global downsampling structure 32 itself, i.e., the output of the preceding network structure unit in the constructed neural network. The first processing unit 321 and the second convolution operator 322 can process the same input data; the first processing unit 321 can also process the output of the second convolution operator 322 and fuse the two processing results to produce the output data.
The global-local downsampling network structure 33 may comprise a second processing unit 331 based on a self-attention mechanism, e.g., Multi-Head Attention, and a third convolution operator 332 that performs a convolution operation on the data input to the second processing unit; it may be a two-dimensional convolution, e.g., Conv2d with stride 2. Here, the data input to the second processing unit may be the input of the global-local downsampling structure 33 itself, i.e., the output of the preceding network structure unit in the constructed neural network. The second processing unit 331 and the third convolution operator 332 can process the same input data; the second processing unit 331 can also process the output of the third convolution operator 332 and fuse the two processing results to produce the output data.
Illustratively, the local downsampling structure, i.e., Conv2d, may be implemented as a convolution with stride 2 and a 3 x 3 kernel. For the global downsampling structure, Conv1d (e.g., with stride 2) first convolves the input data, and the sampled query data (query) is denoted Q; the input data is then linearly transformed with different weights to obtain the key data (key), denoted K, and the value data (value), denoted V; Multi-Head Attention then fuses Q, K, and V, e.g., by weighted fusion of the corresponding data, to produce the output of the global downsampling structure. The global-local downsampling structure follows the same sampling process as the global one; repeated details are omitted.
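The two simpler units of fig. 3 might look as follows in PyTorch; the head count, the 1x1 kernel for the query convolution, and the use of plain linear projections for K and V are assumptions, and the fusion is reduced to a single attention call.

```python
import torch.nn as nn

class LocalDownsample(nn.Module):
    """Local unit: one strided 3x3 convolution (stride 2), cf. fig. 3."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1)

    def forward(self, x):                    # x: (B, C, H, W)
        return self.conv(x)

class GlobalDownsample(nn.Module):
    """Global unit: Conv1d with stride 2 produces the query Q; K and V are
    linear projections of the input; Multi-Head Attention fuses them."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.q_conv = nn.Conv1d(dim, dim, kernel_size=1, stride=2)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (B, N, C) token sequence
        q = self.q_conv(x.transpose(1, 2)).transpose(1, 2)  # (B, N/2, C)
        out, _ = self.attn(q, self.k_proj(x), self.v_proj(x))
        return out                           # halved sequence length
```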
For the network structure search in S102, see fig. 4, a schematic diagram of the specific flow of the search, comprising S401 to S405:
S401: Construct a target search space based on the first search space, the second search space, and the third search space.
Here, the target search space may include a plurality of sub-search spaces, which are a first search space, a second search space, a third search space, and the like, as shown in fig. 2.
S402: a plurality of candidate neural network structures is determined based on the target search space.
Here, a schematic diagram of a candidate neural network structure may be as shown in fig. 2: each candidate neural network structure may include at least one preset network structure unit selected from the first search space and at least one downsampling processing unit selected from the second search space, and the size information of each preset network structure unit in each candidate neural network structure is selected from the third search space.
The candidate neural network structure is used for constructing a candidate neural network to execute a deep learning task.
S403: and constructing a corresponding candidate neural network based on the candidate neural network structure.
Specifically, a network parameter may be given to the candidate neural network structure, and a candidate neural network corresponding to the candidate neural network structure may be constructed.
S404: and determining a processing result of the candidate neural network on the deep learning task by adopting a training data set corresponding to the preset deep learning task.
The data in the training data set corresponding to the preset deep learning task can be used as training data to preliminarily train a candidate neural network, and the preliminarily trained candidate network then processes the deep learning task to produce its processing result.
S405: With the goal of minimizing the difference between the candidate neural network's processing result on the deep learning task and the expected result, search for the target neural network structure in the target search space based on the candidate neural network structures.
The processing result of the deep learning task may include the task result output by the candidate neural network, e.g., the classification result of an image classification task. It may also include the updated first selection probability of a preset network structure unit, the updated second selection probability of a downsampling processing unit, and the updated third selection probability of the size information corresponding to a preset network structure unit.
The expected result of the deep learning task can be a preset maximum accuracy threshold for the processing result. Alternatively, it may be that the first selection probability of a preset network structure unit in the current candidate neural network structure is greater than or equal to a first preset threshold; and/or the second selection probability of a downsampling processing unit is greater than or equal to a second preset threshold; and/or the third selection probability of the size information corresponding to a preset network structure unit is greater than or equal to a third preset threshold; and so on.
In the case where the desired result is a maximum accuracy threshold, the minimization of variance may be that a variance between the accuracy of the processing result and the maximum accuracy threshold is less than or equal to a preset minimum variance threshold; in the case that the expected result is that the first selection probability is greater than or equal to a first preset threshold, the minimization of the difference may be that the first selection probability of any preset network structure unit in the current candidate neural network structure is greater than or equal to the first preset threshold; in the case that the expected result is that the second selection probability is greater than or equal to a second preset threshold, the minimization of the difference may be that the second selection probability of any downsampling processing unit in the current candidate neural network structure is greater than or equal to the second preset threshold; in the case that the expected result is that the third selection probability is greater than or equal to the third preset threshold, the minimum difference may be that the third selection probability of the size information corresponding to any preset network structure unit in the current candidate neural network structure is greater than or equal to the third preset threshold.
For each of the plurality of candidate neural network structures, the target neural network structure can be searched for in the target search space based on that candidate structure, with the goal of minimizing the difference between the candidate network's processing result on the deep learning task and the expected result.
Take an image classification task as an example: its expected result may be the maximum accuracy of the image classification result, e.g., 100%. The minimum difference value may be set to 0, 5%, 10%, etc.; the specific value can be set empirically and is not limited by the embodiments of the present disclosure. If the difference between a candidate network's accuracy and the maximum accuracy is less than or equal to the minimum difference value, that candidate structure may serve as the target neural network structure for this image classification task. If the difference is greater than the minimum difference value, a feedback value (reward) can be computed from the difference, and guided by this feedback value, candidate structures are searched again in the target search space using a genetic algorithm or a reinforcement learning method. The network structure search is repeated until the difference between the candidate network's accuracy and the maximum accuracy is less than or equal to the minimum difference value.
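A sketch of the feedback-guided search loop just described. `space.sample` and `space.mutate` stand in for the concrete genetic or reinforcement-learning sampler, and `eval_fn` for preliminary training plus evaluation; all of these, and the default thresholds, are assumptions.

```python
def search_structure(space, eval_fn, max_acc=1.0, min_gap=0.05,
                     max_rounds=100):
    """Sample candidate structures, evaluate them, and stop once the gap
    to the maximum accuracy is at most `min_gap`; otherwise keep searching,
    guided by the feedback value and the best candidate found so far."""
    best, best_acc = space.sample(), 0.0
    cand = best
    for _ in range(max_rounds):
        acc = eval_fn(cand)                # accuracy of the processing result
        if max_acc - acc <= min_gap:       # difference small enough: done
            return cand
        reward = acc                       # feedback value from this round
        if acc > best_acc:
            best, best_acc = cand, acc
        cand = space.mutate(best, reward)  # genetic / RL re-sampling step
    return best
```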
Optionally, when the difference between a candidate network's accuracy and the maximum accuracy is greater than the minimum difference value, the candidate structure corresponding to the most accurate image classification result among the plurality of candidates is determined; the probabilities that each preset network structure unit, each downsampling processing unit, and each unit's size information will be selected in the next search round are updated accordingly; and the network search step is then repeated with the updated probabilities, round after round, until the target neural network structure is found.
In some embodiments, part of the candidate neural network structures may be selected from the plurality of candidates, and the target neural network structure determined by searching these candidates in groups. In specific implementation, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit can be updated based on the processing results on the deep learning task of the target candidate neural network structures among the partial candidates. A network structure search is then performed with the updated first, second, and third selection probabilities, and a plurality of new target candidate neural network structures are reselected using, e.g., a genetic algorithm or a reinforcement learning algorithm. If the differences between the new target candidate structures' processing results on the deep learning task and the expected result do not satisfy the first preset condition, the procedure returns to the step of updating the selection probabilities.
For example, the processing result with the highest accuracy may be screened out from the processing results of the deep learning task corresponding to the partial candidate neural network structures, and it may be determined whether the difference between this processing result and the maximum accuracy threshold indicated by the expected result is less than or equal to the preset minimum difference threshold. If not, the candidate neural network structure corresponding to that processing result is determined, and then the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit in that candidate structure are determined. When the first, second, or third selection probability does not satisfy the expected result, each of these selection probabilities is updated. For example, the first, second, and third selection probabilities may each be increased; the increase range may be set according to the actual situation and empirical values, and is not specifically limited in the embodiments of the present disclosure. Based on the updated first, second, and third selection probabilities, the network structure search is performed again from the target search space, that is, the network search in S102 is repeated, so that a plurality of new target candidate neural network structures can be reselected.
Then, when the differences between the processing results of the new target candidate neural network structures on the deep learning task and the expected result do not satisfy the first preset condition, the process returns to the step of updating the selection probabilities. Illustratively, when the expected result is the maximum accuracy threshold, the process returns to the step of updating the selection probabilities if the difference between the accuracy of the processing result and the maximum accuracy threshold is greater than the preset minimum difference threshold. And/or, when the expected result is that the first selection probability is greater than or equal to a first preset threshold, the process returns to that step if the updated first selection probability of a preset network structure unit is less than the first preset threshold. And/or, when the expected result is that the second selection probability is greater than or equal to a second preset threshold, the process returns to that step if the updated second selection probability is less than the second preset threshold. And/or, when the expected result is that the third selection probability is greater than or equal to a third preset threshold, the process returns to that step if the updated third selection probability is less than the third preset threshold.
In some embodiments, when the difference between the processing result of a new target candidate neural network structure on the deep learning task and the expected result satisfies the first preset condition, that new target candidate neural network structure is taken as the target neural network structure.
Continuing the above example, the new target candidate neural network structure is taken as the target neural network structure when the difference between the accuracy of its processing result on the deep learning task and the maximum accuracy threshold is less than or equal to the preset minimum difference threshold, or when the first selection probability of any preset network structure unit in the structure is greater than or equal to the first preset threshold, or when the second selection probability of any downsampling processing unit in the structure is greater than or equal to the second preset threshold, or when the third selection probability of the size information corresponding to any preset network structure unit in the structure is greater than or equal to the third preset threshold.
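These four alternative conditions can be summarized in a short sketch; the threshold values and the CandidateResult fields below are illustrative assumptions only.

```python
# A hedged sketch of the alternative stopping conditions; thresholds
# and data layout are assumed, not fixed by the disclosure.
from dataclasses import dataclass

MIN_DIFF = 0.05               # preset minimum difference threshold
P1, P2, P3 = 0.9, 0.9, 0.9    # first/second/third preset thresholds

@dataclass
class CandidateResult:
    accuracy: float    # accuracy of the deep learning task result
    unit_probs: list   # first selection probabilities (structure units)
    dsm_probs: list    # second selection probabilities (downsampling units)
    size_probs: list   # third selection probabilities (size information)

def is_target(result: CandidateResult, max_accuracy: float = 1.0) -> bool:
    # Any one of the four conditions suffices to accept the candidate
    # as the target neural network structure.
    return (
        max_accuracy - result.accuracy <= MIN_DIFF
        or any(p >= P1 for p in result.unit_probs)
        or any(p >= P2 for p in result.dsm_probs)
        or any(p >= P3 for p in result.size_probs)
    )
```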
In some embodiments, based on the processing result of each target candidate neural network structure, among the partial candidate neural network structures, on the deep learning task, the target candidate neural network structures whose processing-result accuracy satisfies a second preset condition are determined; based on those structures, the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit are updated respectively.
Illustratively, from the processing results of the deep learning task corresponding to the partial candidate neural network structures, the processing results with higher accuracy are screened out; for example, the processing results are sorted by accuracy and the two most accurate are kept. The target candidate neural network structures corresponding to the selected processing results are then determined, and, on the basis of the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit in those structures, each of these selection probabilities is updated in preparation for the next network search.
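A hedged sketch of this top-two screening and probability update follows; the step size DELTA and the data layout are assumptions chosen for illustration, not disclosed values.

```python
# Illustrative sketch of top-2 screening and selection-probability update.
DELTA = 0.01   # increase range, set empirically

def update_probabilities(results, probs):
    """results: list of (candidate_structure, accuracy) pairs, where each
    candidate_structure is an iterable of the unit/DSM/size choices it uses;
    probs: dict mapping each choice to its current selection probability."""
    # Sort by accuracy and keep the two most accurate candidates.
    top_two = sorted(results, key=lambda r: r[1], reverse=True)[:2]
    for structure, _accuracy in top_two:
        for choice in structure:
            # Raise the probability of every choice used by a top candidate.
            probs[choice] = min(1.0, probs[choice] + DELTA)
    return probs
```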
In some embodiments, the candidate neural network structure may include a plurality of feature extraction nodes and a plurality of downsampling nodes distributed at intervals. Each feature extraction node may include a preset network structure unit selected from the first search space, such as a General Operator (GOP) in fig. 2; each downsampling node includes a downsampling processing unit selected from the second search space, such as the downsampling processing unit (DSM) shown in fig. 2; and the size information of each preset network structure unit is selected from the third search space.
Here, the number of intervals at which the nodes are distributed may be set according to the actual application process, and is not specifically limited in the embodiments of the present disclosure.
Alternatively, the candidate neural network may include a plurality of feature extraction nodes that successively perform feature extraction tasks and a plurality of downsampling nodes that successively perform downsampling tasks.
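The interleaved arrangement of feature extraction and downsampling nodes can be illustrated with a simple sampling sketch; the search-space contents, stage counts, and size values below are assumptions for illustration, not values fixed by the disclosure.

```python
# A sketch of sampling one candidate structure with interleaved feature
# extraction (GOP) and downsampling (DSM) nodes; contents are assumed.
import random

FIRST_SPACE = ["conv_unit", "self_attention_unit", "mlp_unit"]   # GOP choices
SECOND_SPACE = ["local_dsm", "global_dsm", "global_local_dsm"]   # DSM choices
THIRD_SPACE = [56, 28, 14, 7]                                    # size information

def sample_candidate(num_stages=4, nodes_per_stage=2):
    structure = []
    for _ in range(num_stages):
        # Several feature extraction nodes, each with its own size choice...
        for _ in range(nodes_per_stage):
            unit = random.choice(FIRST_SPACE)    # from the first search space
            size = random.choice(THIRD_SPACE)    # from the third search space
            structure.append(("GOP", unit, size))
        # ...followed by one downsampling node from the second search space.
        structure.append(("DSM", random.choice(SECOND_SPACE)))
    return structure
```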
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a device for generating a neural network corresponding to the above method. Since the principle by which the device solves the problem is similar to that of the method for generating a neural network in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are omitted.
Referring to fig. 5, a schematic diagram of an apparatus for generating a neural network according to an embodiment of the present disclosure is shown. The apparatus includes: a search space determination module 501, a network structure search module 502, and a neural network generation module 503; wherein,
a search space determining module 501, configured to determine a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit, and a third search space corresponding to size information of a neural network structure, where the preset network structure unit and the downsampling processing unit are used to construct the neural network structure;
a network structure search module 502, configured to perform network structure search based on the first search space, the second search space, and the third search space, and a training data set corresponding to a preset deep learning task, so as to obtain a target neural network structure corresponding to an expected result of the deep learning task;
a neural network generating module 503, configured to generate a target neural network for processing a deep learning task based on the target neural network structure.
In an alternative embodiment, the first search space comprises at least two of a convolution structure unit, a self-attention mechanism based coding-decoding structure unit and a multi-layer perceptron MLP structure unit.
In an alternative embodiment, the second search space includes a local downsampling unit for sampling local context information, a global downsampling unit for sampling global context information, and a global-local downsampling unit for sampling global context information and local context information.
In an alternative embodiment, the local downsampling unit comprises a first convolution operator for performing a convolution operation;
the global downsampling unit comprises a first processing unit based on a self-attention mechanism and a second convolution operator used for carrying out convolution operation on data input into the first processing unit;
the global-local downsampling unit includes a second processing unit based on a self-attention mechanism and a third convolution operator for performing a convolution operation on data input to the second processing unit.
In an optional embodiment, the network structure searching module 502 is configured to construct a target search space based on the first search space, the second search space, and the third search space;
determining a plurality of candidate neural network structures based on the target search space; each candidate neural network structure comprises at least one preset network structure unit selected from a first search space and at least one down-sampling processing unit selected from a second search space, and the size information of each preset network structure unit in each candidate neural network is the size information selected from a third search space;
constructing a corresponding candidate neural network based on the candidate neural network structure;
determining a processing result of the candidate neural network on the deep learning task by adopting a training data set corresponding to the preset deep learning task;
and searching a target neural network structure in the target search space based on the candidate neural network structure by taking the minimization of the difference between the processing result of the candidate neural network on the deep learning task and the expected result as a target.
In an optional implementation manner, the network structure searching module 502 is configured to update the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit, respectively, based on a processing result of each target candidate neural network structure in a part of candidate neural network structures on the deep learning task;
searching a network structure based on the updated first selection probability, second selection probability and third selection probability, and reselecting a plurality of new target candidate neural network structures;
and under the condition that the difference between the processing result and the expected result of the deep learning task respectively by the plurality of new target candidate neural network structures does not meet a first preset condition, returning to the step of updating the selection probability.
In an optional embodiment, the network structure searching module 502 is configured to determine, based on a processing result of each target candidate neural network structure in a part of candidate neural network structures on the deep learning task, a target candidate neural network structure whose accuracy degree corresponds to the processing result meets a second preset condition;
and respectively updating the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit and the third selection probability of the size information corresponding to each preset network structure unit based on the target candidate neural network structure of which the accuracy corresponding to the processing result meets a second preset condition.
In an optional implementation manner, the network structure searching module 502 is further configured to, in a case that differences between processing results of the deep learning task and the expected results of the plurality of new target candidate neural network structures respectively satisfy the first preset condition, take a new target candidate neural network structure corresponding to the difference that satisfies the first preset condition as the target neural network structure.
In an alternative embodiment, the candidate neural network structure comprises a plurality of feature extraction nodes and a plurality of down-sampling nodes which are distributed at intervals; the feature extraction node comprises a preset network structure unit selected from a first search space; the down-sampling node comprises a down-sampling processing unit selected from the second search space; and the size information of each preset network structure unit is selected from the third search space.
In an optional embodiment, the neural network generating module 503 is configured to obtain a plurality of training samples;
and training the target neural network to be trained corresponding to the target neural network structure by using the training sample until a training cut-off condition is met, and obtaining the trained target neural network.
The description of the processing flow of each module in the apparatus for generating a neural network and the interaction flow between each module may refer to the related description in the above-mentioned method embodiment for generating a neural network, and will not be described in detail here.
Based on the same technical concept, an embodiment of the present application further provides a computer device. Referring to fig. 6, a schematic structural diagram of a computer device provided in an embodiment of the present application is shown; the computer device includes:
a processor 61, a memory 62 and a bus 63. The memory 62 stores machine-readable instructions executable by the processor 61, and the processor 61 is configured to execute the machine-readable instructions stored in the memory 62. When executed by the processor 61, the machine-readable instructions cause the processor 61 to perform the following steps:
S101: determining a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit, and a third search space corresponding to the size information of the neural network structure, wherein the preset network structure unit and the downsampling processing unit are used for constructing the neural network structure;
S102: performing network structure search based on the first search space, the second search space, the third search space, and a training data set corresponding to a preset deep learning task, to obtain a target neural network structure corresponding to an expected result of the deep learning task;
S103: generating a target neural network for processing the deep learning task based on the target neural network structure.
The memory 62 includes an internal memory 621 and an external memory 622. The internal memory 621 temporarily stores operation data for the processor 61 and data exchanged with the external memory 622, such as a hard disk; the processor 61 exchanges data with the external memory 622 through the internal memory 621. When the computer device runs, the processor 61 communicates with the memory 62 through the bus 63, so that the processor 61 executes the instructions mentioned in the above method embodiments.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method for generating a neural network described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program product including computer instructions which, when executed by a processor, implement the steps of the above-described method for generating a neural network. The computer program product may be any product that can implement the method; the aspects of the product that contribute to the prior art may be embodied in the form of a software product, such as a software development kit (SDK), which may be stored in a storage medium and which causes an associated device or processor to perform some or all of the steps of the method for generating a neural network.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiments and is not described again here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into modules is only one logical division, and other divisions are possible in actual implementation. For example, a plurality of modules or components may be combined, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or modules through communication interfaces, and may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate its technical solutions and not to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure, and shall all be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A method of generating a neural network, comprising:
determining a first search space corresponding to a preset network structure unit, a second search space corresponding to a downsampling processing unit and a third search space corresponding to size information of a neural network structure, wherein the preset network structure unit and the downsampling processing unit are used for constructing the neural network structure;
performing network structure search based on the first search space, the second search space, the third search space and a training data set corresponding to a preset deep learning task to obtain a target neural network structure corresponding to an expected result of the deep learning task;
generating a target neural network for processing a deep learning task based on the target neural network structure.
2. The method of claim 1, wherein the first search space comprises at least two of a convolutional structural unit, a self-attention mechanism-based coding-decoding structural unit, and a multi-layered perceptron MLP structural unit.
3. The method of claim 1 or 2, wherein the second search space comprises a local downsampling unit for sampling local context information, a global downsampling unit for sampling global context information, and a global-local downsampling unit for sampling global context information and local context information.
4. The method of claim 3, wherein the local downsampling unit includes a first convolution operator to perform a convolution operation;
the global downsampling unit comprises a first processing unit based on a self-attention mechanism and a second convolution operator used for carrying out convolution operation on data input into the first processing unit;
the global-local downsampling unit includes a second processing unit based on a self-attention mechanism and a third convolution operator for performing a convolution operation on data input to the second processing unit.
5. The method of claim 1, wherein the performing a network structure search based on the first search space, the second search space, and the third search space, and a training data set corresponding to a preset deep learning task comprises:
constructing a target search space based on the first search space, the second search space, and the third search space;
determining a plurality of candidate neural network structures based on the target search space; each candidate neural network structure comprises at least one preset network structure unit selected from a first search space and at least one down-sampling processing unit selected from a second search space, and the size information of each preset network structure unit in each candidate neural network is the size information selected from a third search space;
constructing a corresponding candidate neural network based on the candidate neural network structure;
determining a processing result of the candidate neural network on the deep learning task by adopting a training data set corresponding to the preset deep learning task;
and searching a target neural network structure in the target search space based on the candidate neural network structure by taking the minimization of the difference between the processing result of the candidate neural network on the deep learning task and the expected result as a target.
6. The method of claim 5, wherein the searching for the target neural network structure within the target search space based on the candidate neural network structures comprises:
respectively updating the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit and the third selection probability of the size information corresponding to each preset network structure unit based on the processing result of each target candidate neural network structure in part of candidate neural network structures on the deep learning task;
searching a network structure based on the updated first selection probability, second selection probability and third selection probability, and reselecting a plurality of new target candidate neural network structures;
and under the condition that the difference between the processing result and the expected result of the deep learning task respectively by the plurality of new target candidate neural network structures does not meet a first preset condition, returning to the step of updating the selection probability.
7. The method of claim 6, wherein the updating the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit, and the third selection probability of the size information corresponding to each preset network structure unit based on the processing result of each target candidate neural network structure in the partial candidate neural network structure on the deep learning task respectively comprises:
determining a target candidate neural network structure of which the accuracy corresponding to the processing result meets a second preset condition based on the processing result of each target candidate neural network structure in part of candidate neural network structures on the deep learning task;
and respectively updating the first selection probability of each preset network structure unit, the second selection probability of each downsampling processing unit and the third selection probability of the size information corresponding to each preset network structure unit based on the target candidate neural network structure of which the accuracy corresponding to the processing result meets a second preset condition.
8. The method of claim 6 or 7, wherein the searching for a target neural network structure within the target search space based on the candidate neural network structures further comprises:
and under the condition that the difference between the processing result and the expected result of the deep learning task by the plurality of new target candidate neural network structures respectively meets the first preset condition, taking the new target candidate neural network structure corresponding to the difference meeting the first preset condition as the target neural network structure.
9. The method of any one of claims 1 to 8, wherein the candidate neural network structure comprises a plurality of feature extraction nodes and a plurality of down-sampling nodes distributed at intervals; the feature extraction node comprises a preset network structure unit selected from a first search space; the down-sampling node comprises a down-sampling processing unit selected from the second search space; and the size information of each preset network structure unit is selected from the third search space.
10. The method of any one of claims 1 to 9, wherein the generating a target neural network for processing a deep learning task based on the target neural network structure comprises:
obtaining a plurality of training samples;
and training the target neural network to be trained corresponding to the target neural network structure by using the training sample until a training cut-off condition is met, and obtaining the trained target neural network.
11. An apparatus to generate a neural network, comprising:
the device comprises a search space determining module, a down-sampling processing unit and a neural network structure, wherein the search space determining module is used for determining a first search space corresponding to a preset network structure unit, a second search space corresponding to the down-sampling processing unit and a third search space corresponding to the size information of the neural network structure, and the preset network structure unit and the down-sampling processing unit are used for constructing the neural network structure;
the network structure searching module is used for searching a network structure based on the first searching space, the second searching space and the third searching space and a training data set corresponding to a preset deep learning task to obtain a target neural network structure corresponding to an expected result of the deep learning task;
and the neural network generating module is used for generating a target neural network for processing a deep learning task based on the target neural network structure.
12. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when a computer device is run, the machine-readable instructions when executed by the processor performing the steps of the method of generating a neural network of any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the method of generating a neural network as claimed in any one of claims 1 to 10.
CN202111314991.7A 2021-11-08 2021-11-08 Method and device for generating neural network, computer equipment and storage medium Pending CN113988272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111314991.7A CN113988272A (en) 2021-11-08 2021-11-08 Method and device for generating neural network, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111314991.7A CN113988272A (en) 2021-11-08 2021-11-08 Method and device for generating neural network, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113988272A true CN113988272A (en) 2022-01-28

Family

ID=79747121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111314991.7A Pending CN113988272A (en) 2021-11-08 2021-11-08 Method and device for generating neural network, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113988272A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926698A (en) * 2022-07-19 2022-08-19 深圳市南方硅谷半导体股份有限公司 Image classification method for neural network architecture search based on evolutionary game theory
CN116431421A (en) * 2023-06-13 2023-07-14 成都登临科技有限公司 Neural network generation method and generator, verification method and system and storage medium
CN116431421B (en) * 2023-06-13 2023-08-29 成都登临科技有限公司 Neural network generation method and generator, verification method and system and storage medium

Similar Documents

Publication Publication Date Title
Joshi et al. Learning the travelling salesperson problem requires rethinking generalization
JP7343568B2 (en) Identifying and applying hyperparameters for machine learning
JP6889270B2 (en) Neural network architecture optimization
CA3116782C (en) Multiobjective coevolution of deep neural network architectures
KR102523472B1 (en) Method and apparatus for searching new material
Chen et al. Techniques for automated machine learning
US20190205792A1 (en) Automated generation of workflows
CN114730618A (en) Systems and methods for designing organic synthesis pathways for desired organic molecules
CN113988272A (en) Method and device for generating neural network, computer equipment and storage medium
CN109685204B (en) Image processing method and device, storage medium and electronic equipment
Papamarkou et al. Position paper: Challenges and opportunities in topological deep learning
CN112905809B (en) Knowledge graph learning method and system
Cheng et al. Swiftnet: Using graph propagation as meta-knowledge to search highly representative neural architectures
CN110674183A (en) Scientific research community division and core student discovery method, system, medium and terminal
CN107798331B (en) Method and device for extracting characteristics of off-zoom image sequence
Billings et al. Identification of the neighborhood and CA rules from spatio-temporal CA patterns
Bennequin et al. Few-shot image classification benchmarks are too far from reality: Build back better with semantic task sampling
Traoré et al. A data-driven approach to neural architecture search initialization
Folini et al. Cluster Analysis: a Comprehensive and Versatile Qgis Plugin for Pattern Recognition in Geospatial Data
Liu et al. Learning chordal extensions
CN115983377A (en) Automatic learning method, device, computing equipment and medium based on graph neural network
WO2014145341A2 (en) K-grid for clustering data objects
Xu et al. CoSimGNN: towards large-scale graph similarity computation
CN114202669A (en) Neural network searching method for medical image segmentation
JP7134526B1 (en) Matching device, matching method, program, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination